Describe the bug
For cosine learning rate schedules, the learning rate should approach zero by the end of training. However, because of accelerate's prepare logic, the learning rate scheduler is stepped N times for each optimization step when training on N GPUs. We therefore need to multiply the scheduler's step counts by num_processes when constructing it (see the sketch after this paragraph).
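A minimal sketch of the proposed adjustment, assuming the examples' `get_scheduler` call from `diffusers.optimization` (the model, optimizer, and step counts below are placeholders, not the actual example-script values):

```python
import torch
from accelerate import Accelerator
from diffusers.optimization import get_scheduler

accelerator = Accelerator()

# Placeholder model/optimizer so the sketch is self-contained.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

max_train_steps = 1000  # placeholder for args.max_train_steps
lr_warmup_steps = 100   # placeholder for args.lr_warmup_steps

# The prepared scheduler is stepped once per process for every optimization
# step, so scale both horizons by num_processes so that the cosine schedule
# decays to zero exactly when training ends.
lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=lr_warmup_steps * accelerator.num_processes,
    num_training_steps=max_train_steps * accelerator.num_processes,
)
```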
Sample learning rate curve from running the dreambooth example on 4 GPUs:
Reproduction
Run the example training scripts with a cosine learning rate schedule on multiple GPUs; a rough single-process simulation of the stepping behavior is sketched below.
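A rough single-process simulation of the issue, assuming that accelerate's prepared scheduler receives one step() call per process for each optimization step (num_processes and max_train_steps below are placeholder values, not taken from the example scripts):

```python
import torch
from diffusers.optimization import get_scheduler

num_processes = 4       # hypothetical GPU count
max_train_steps = 1000  # hypothetical training length

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=max_train_steps,  # not scaled by num_processes
)

for _ in range(max_train_steps):
    optimizer.step()
    for _ in range(num_processes):  # mimic one scheduler.step() per rank
        lr_scheduler.step()

# The schedule runs through its horizon num_processes times too fast,
# so the learning rate does not end near zero.
print(lr_scheduler.get_last_lr()[0])
```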
Logs
No response
System Info
- `diffusers` version: 0.17.1
- Platform: Linux-5.4.0-150-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.13
- PyTorch version (GPU?): 1.12.1 (True)
- Huggingface_hub version: 0.15.1
- Transformers version: 4.30.2
- Accelerate version: 0.20.3
- xFormers version: 0.0.20+1dc3d7a.d20230628
- Using GPU in script?: Y
- Using distributed or parallel set-up in script?: DDP