Suggestion for speeding up index_for_timestep by removing sequential nonzero() calls in samplers #9417

Open
@ethanweber

Description

Is your feature request related to a problem? Please describe.
First off, thanks for the great codebase and for providing so many resources! I just wanted to share an improvement I made for myself, in case you'd like to include it for all samplers. I'm using the FlowMatchEulerDiscreteScheduler and, after profiling, I noticed that it was unexpectedly slowing down my training. I'll describe the issue and the proposed solution here rather than making a PR, since the change would touch a lot of code and someone on the diffusers team may prefer to implement it.

Describe the solution you'd like.
This line in particular is very slow because it is a Python for loop, and each call to self.index_for_timestep() runs a nonzero() search:

step_indices = [self.index_for_timestep(t, schedule_timesteps) for t in timestep]
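
As a rough illustration of the scaling (a minimal standalone sketch rather than the scheduler's actual code; the schedule length and batch sizes here are made up), the per-element nonzero() cost grows with batch size:

import time

import torch

schedule_timesteps = torch.linspace(1000.0, 1.0, 1000)  # hypothetical schedule

for batch_size in (1, 64, 256):
    timestep = schedule_timesteps[torch.randint(0, 1000, (batch_size,))]
    t0 = time.perf_counter()
    # one nonzero() search per batch element, as in the list comprehension above
    step_indices = [(schedule_timesteps == t).nonzero()[0].item() for t in timestep]
    elapsed_ms = (time.perf_counter() - t0) * 1e3
    print(f"batch_size={batch_size}: {elapsed_ms:.2f} ms for {len(step_indices)} lookups")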

Describe alternatives you've considered.
I've changed the code as follows:

# huggingface code
def index_for_timestep(self, timestep, schedule_timesteps=None):
    if schedule_timesteps is None:
        schedule_timesteps = self.timesteps

    indices = (schedule_timesteps == timestep).nonzero()

    # The sigma index that is taken for the **very** first `step`
    # is always the second index (or the last index if there is only 1)
    # This way we can ensure we don't accidentally skip a sigma in
    # case we start in the middle of the denoising schedule (e.g. for image-to-image)
    pos = 1 if len(indices) > 1 else 0

    return indices[pos].item()

changed to =>

# my code
def index_for_timestep(self, timestep, schedule_timesteps=None):
    if schedule_timesteps is None:
        schedule_timesteps = self.timesteps

    # Map the timestep(s) onto [0, num_steps - 1] by linear interpolation between
    # the first and last schedule values; this works on a whole batch at once.
    num_steps = len(schedule_timesteps)
    start = schedule_timesteps[0].item()
    end = schedule_timesteps[-1].item()
    indices = torch.round(((timestep - start) / (end - start)) * (num_steps - 1)).long()

    return indices
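
As a quick sanity check (a standalone sketch with both versions written as free functions; it assumes a uniformly spaced schedule_timesteps, which is what the linear interpolation relies on), the vectorized lookup reproduces the original nonzero() lookup without the per-element loop:

import torch

def index_for_timestep_nonzero(timestep, schedule_timesteps):
    # original lookup: exact match via nonzero(), one timestep at a time
    indices = (schedule_timesteps == timestep).nonzero()
    pos = 1 if len(indices) > 1 else 0
    return indices[pos].item()

def index_for_timestep_interp(timestep, schedule_timesteps):
    # proposed lookup: linear interpolation, handles a whole batch at once
    num_steps = len(schedule_timesteps)
    start = schedule_timesteps[0].item()
    end = schedule_timesteps[-1].item()
    return torch.round(((timestep - start) / (end - start)) * (num_steps - 1)).long()

schedule_timesteps = torch.linspace(1000.0, 1.0, 50)        # hypothetical uniform schedule
timestep = schedule_timesteps[torch.randint(0, 50, (16,))]  # batch of training timesteps

old = [index_for_timestep_nonzero(t, schedule_timesteps) for t in timestep]
new = index_for_timestep_interp(timestep, schedule_timesteps)
assert new.tolist() == old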

and

# huggingface code
# self.begin_index is None when scheduler is used for training, or pipeline does not implement set_begin_index
if self.begin_index is None:
    step_indices = [self.index_for_timestep(t, schedule_timesteps) for t in timestep]

changed to =>

# my code
# self.begin_index is None when scheduler is used for training, or pipeline does not implement set_begin_index
if self.begin_index is None:
    step_indices = self.index_for_timestep(timestep, schedule_timesteps)
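
With this change, step_indices comes back as a single LongTensor rather than a Python list of ints. As a small illustration of how it would be consumed (a hedged sketch with made-up values; the exact downstream code in diffusers may differ), tensor indexing for the subsequent sigma gather works the same way as list indexing:

import torch

# hypothetical values standing in for the scheduler's state
schedule_timesteps = torch.linspace(1000.0, 1.0, 50)
sigmas = torch.linspace(1.0, 0.0, 50)
timestep = schedule_timesteps[torch.randint(0, 50, (8,))]  # batch of training timesteps

num_steps = len(schedule_timesteps)
start, end = schedule_timesteps[0].item(), schedule_timesteps[-1].item()
step_indices = torch.round(((timestep - start) / (end - start)) * (num_steps - 1)).long()

# a LongTensor indexes sigmas just like a list of ints would
sigma = sigmas[step_indices].flatten()
print(step_indices.dtype, step_indices.shape, sigma.shape)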

Additional context.
Just wanted to bring this modification to your attention since it could be a training speedup for folks. 🙂 It should matter especially when someone is using a large batch size (> 1), since the for loop runs a nonzero() search once per batch element. Some other small changes might be necessary to keep the rest of the code compatible with the new function signature, but I suspect it could help everyone. Thanks for the consideration!

Labels

contributions-welcome, help wanted, performance, wip
