Question about non-convergence when training AutoencoderKL #11221

Closed
@xiaoli1996

Describe the bug

When training the AutoencoderKL model on the ImageNet dataset, the loss does not converge, unlike
this.

Reproduction

Script

accelerate launch --multi_gpu --num_processes=2 --gpu_ids=0,1 \
    train_autoencoderkl.py \
    --pretrained_model_name_or_path stabilityai/sd-vae-ft-mse \
    --max_train_steps 850000 \
    --validation_steps 100 \
    --checkpointing_steps 1000 \
    --gradient_accumulation_steps 2 \
    --learning_rate 4.5e-6 \
    --lr_scheduler cosine \
    --report_to wandb \
    --mixed_precision bf16 \
    --train_batch_size 8 \
    --dataloader_num_workers 16 \
    --output_dir autoencoderkl-model/imagenet \
    --train_data_dir /datasets/image/imagenet-test/train \
    --validation_image ./val/ILSVRC2012_val_00000293.JPEG ./val/ILSVRC2012_val_00002138.JPEG \
    --resolution 128
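
As a reading aid for the flags above, the effective global batch size can be worked out as in the short sketch below. It assumes the usual Accelerate convention that one optimizer step consumes per-process batch × number of processes × gradient accumulation steps, and that `--max_train_steps` counts optimizer steps. The variable names are illustrative and are not taken from `train_autoencoderkl.py`.

```python
# Values copied from the launch command above.
num_processes = 2                  # --num_processes
train_batch_size = 8               # --train_batch_size (per process)
gradient_accumulation_steps = 2    # --gradient_accumulation_steps
max_train_steps = 850_000          # --max_train_steps

# Assumption: one optimizer step consumes
# per-process batch x processes x accumulation steps samples.
effective_batch_size = (
    num_processes * train_batch_size * gradient_accumulation_steps
)

# Assumption: max_train_steps counts optimizer steps, not micro-batches.
samples_seen = effective_batch_size * max_train_steps

print(effective_batch_size)  # 32
print(samples_seen)          # 27200000
```

Under these assumptions each optimizer step sees 32 images at a learning rate of 4.5e-6.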

Logs

Image

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.30.1
  • Transformers version: 4.46.3
  • Accelerate version: 1.0.1
  • PEFT version: not installed
  • Bitsandbytes version: 0.45.4
  • Safetensors version: 0.5.3
  • xFormers version: 0.0.28.post1
  • Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
    NVIDIA GeForce RTX 3090, 24576 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@lavinal712

Labels: bug, stale
