
Wrong learning rate scheduler training step count in multi-GPU examples when setting --num_train_epochs #8236

Closed

Description

@geniuspatrick

Describe the bug

I think there are still some problems with the learning rate scheduler step count. Setting --max_train_steps works around this, as discussed in #3954, but the issue is not completely resolved.

For example, consider the snippet at https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py#L816-L833, pasted here:

# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
    args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    overrode_max_train_steps = True

lr_scheduler = get_scheduler(
    args.lr_scheduler,
    optimizer=optimizer,
    num_warmup_steps=args.lr_warmup_steps * accelerator.num_processes,
    num_training_steps=args.max_train_steps * accelerator.num_processes,
)

# Prepare everything with our `accelerator`.
unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    unet, optimizer, train_dataloader, lr_scheduler
)

When --num_train_epochs is set instead of --max_train_steps, the calculation of num_update_steps_per_epoch is incorrect because train_dataloader has not yet been wrapped by accelerator.prepare, so len(train_dataloader) still counts batches over the full, unsharded dataset. Consequently, args.max_train_steps ends up roughly num_processes times the intended value, and that inflated value is then passed into get_scheduler (where it is multiplied by accelerator.num_processes once more). For example, with 2 processes, 1000 batches, and gradient_accumulation_steps=1, each process actually performs 500 update steps per epoch, but the code computes 1000.

In fact, the ordering logic here is quite confusing; a refactor may be necessary. A sketch of one possible reordering follows.
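
This is a minimal sketch of one such reordering, assuming the surrounding script's names (args, accelerator, unet, optimizer, train_dataloader, get_scheduler); it illustrates the idea and is not a tested patch. Preparing the dataloader first means len(train_dataloader) already reflects the per-process shard by the time max_train_steps is derived:

import math

# Prepare the model, optimizer, and dataloader first, so that
# len(train_dataloader) reflects the per-process shard under multi-GPU.
unet, optimizer, train_dataloader = accelerator.prepare(unet, optimizer, train_dataloader)

# The epoch math is now based on the sharded dataloader length.
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
    args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch

# The num_processes multiplier is still needed: the scheduler wrapped by
# accelerate advances num_processes steps per optimizer step.
lr_scheduler = get_scheduler(
    args.lr_scheduler,
    optimizer=optimizer,
    num_warmup_steps=args.lr_warmup_steps * accelerator.num_processes,
    num_training_steps=args.max_train_steps * accelerator.num_processes,
)
lr_scheduler = accelerator.prepare(lr_scheduler)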

Reproduction

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  ...
-  --max_train_steps=15000 \
+  --num_train_epochs=100 \
  ...
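
As a quick sanity check under a multi-GPU launch, a hypothetical probe (not part of the script) can show the dataloader length shrinking once it is prepared:

accelerator.print("before prepare:", len(train_dataloader))  # batches over the full dataset
train_dataloader = accelerator.prepare(train_dataloader)
accelerator.print("after prepare:", len(train_dataloader))   # roughly 1/num_processes of the above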

Logs

No response

System Info

  • diffusers version: 0.27.2
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.9.17
  • PyTorch version (GPU?): 2.0.1 (False)
  • Huggingface_hub version: 0.20.3
  • Transformers version: 4.30.0
  • Accelerate version: 0.21.0
  • xFormers version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul @yiyixuxu @eliphatfs
