Fixes training resuming: Advanced Dreambooth LoRa Training #6566

steverhoades · 2024-01-13T17:59:30Z

What does this PR do?

Fixes #6482
Part of #6552

Train

!accelerate launch scripts/train_dreambooth_lora_sdxl_advanced_orig.py \
  --report_to="wandb" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="./training_set" \
  --output_dir="father_lora_v21" \
  --cache_dir="./dataset_cache_dir" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of Brian de palma" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --snr_gamma=5.0 \
  --lr_scheduler="cosine_with_restarts" \
  --lr_warmup_steps=0 \
  --repeats=10 \
  --max_train_steps=20 \
  --checkpointing_steps=10 \
  --validation_prompt="a photo of Brian de palma in a suit, looking at the camera" \
  --validation_epochs=1 \
  --with_prior_preservation \
  --class_data_dir="./prior_preservation-man-v2" \
  --num_class_images=110 \
  --class_prompt="a photo of a man" \
  --rank=32 \
  --optimizer="prodigy" \
  --prodigy_safeguard_warmup=True \
  --prodigy_use_bias_correction=True \
  --adam_beta1=0.9 \
  --adam_beta2=0.99 \
  --adam_weight_decay=0.01 \
  --train_text_encoder \
  --learning_rate=1 \
  --text_encoder_lr=1 \
  --resume_from_checkpoint="checkpoint-10" \
  --seed="0"
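Both commands pass `--resume_from_checkpoint="checkpoint-10"` together with `--checkpointing_steps=10`. Assuming this script follows the same pattern as the other diffusers training examples, checkpoints are saved as `<output_dir>/checkpoint-<global_step>`, and the resume flag accepts either a directory name (looked up under `--output_dir`) or `"latest"`. The sketch below illustrates only that resolution step and how the resumed step and epoch follow from the directory name. The helper names and the dataloader size are hypothetical, and the real script additionally restores state with `accelerator.load_state`.

```python
import math
import os
import tempfile
from typing import Optional, Tuple


def resolve_resume_checkpoint(output_dir: str, resume_from_checkpoint: str) -> Optional[str]:
    """Hypothetical helper: name of the checkpoint directory to resume from, or None."""
    if resume_from_checkpoint != "latest":
        # An explicit value is reduced to its basename and looked up under output_dir.
        return os.path.basename(resume_from_checkpoint)
    if not os.path.isdir(output_dir):
        return None
    dirs = [d for d in os.listdir(output_dir) if d.startswith("checkpoint")]
    # Sort numerically by step so that checkpoint-20 comes after checkpoint-5.
    dirs.sort(key=lambda d: int(d.split("-")[1]))
    return dirs[-1] if dirs else None


def resume_position(checkpoint_name: str, num_batches: int, grad_accum_steps: int) -> Tuple[int, int]:
    """Hypothetical helper: (global_step, first_epoch) encoded by a checkpoint-<step> name."""
    # One optimizer update (one "step") happens every grad_accum_steps batches.
    num_update_steps_per_epoch = math.ceil(num_batches / grad_accum_steps)
    global_step = int(checkpoint_name.split("-")[1])
    first_epoch = global_step // num_update_steps_per_epoch
    return global_step, first_epoch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        for step in (5, 10, 20):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
        print(resolve_resume_checkpoint(out, "latest"))         # checkpoint-20
        print(resolve_resume_checkpoint(out, "checkpoint-10"))  # checkpoint-10
    # Hypothetical dataloader of 40 batches with --gradient_accumulation_steps=4.
    print(resume_position("checkpoint-10", num_batches=40, grad_accum_steps=4))  # (10, 1)
```

Under this assumption, `checkpoint-10` is resolved relative to the `--output_dir` of the command being run, so that directory has to contain it.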

Resume

!accelerate launch scripts/train_dreambooth_lora_sdxl_advanced.py \
  --report_to="wandb" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --dataset_name="./training_set_ti" \
  --output_dir="father_lora_v1-test" \
  --cache_dir="./dataset_cache_dir" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of TOK man" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=20 \
  --checkpointing_steps=10 \
  --checkpoints_total_limit=10 \
  --validation_prompt="a photo of TOK man in a suit, looking directly at the camera" \
  --validation_epochs=1 \
  --with_prior_preservation \
  --class_data_dir="./prior_preservation-man-v2" \
  --num_class_images=100 \
  --class_prompt="a photo of man" \
  --rank=32 \
  --optimizer="prodigy" \
  --prodigy_safeguard_warmup=True \
  --prodigy_use_bias_correction=True \
  --adam_beta1=0.9 \
  --adam_beta2=0.99 \
  --adam_weight_decay=0.01 \
  --learning_rate=1 \
  --text_encoder_lr=1 \
  --train_text_encoder_ti \
  --train_text_encoder_frac=0.5 \
  --resume_from_checkpoint="checkpoint-10" \
  --seed="0"
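The Resume command also sets `--checkpoints_total_limit=10`. Assuming the same pattern as the other diffusers training examples, the limit is enforced just before a new checkpoint is saved: the oldest `checkpoint-<step>` directories are pruned so that at most `limit - 1` remain. The hypothetical helper below only computes which directories would be removed; it does not touch the file system.

```python
from typing import List


def checkpoints_to_remove(existing: List[str], total_limit: int) -> List[str]:
    """Hypothetical helper: checkpoint dirs to prune before saving a new one."""
    ckpts = [d for d in existing if d.startswith("checkpoint")]
    # Oldest first, ordered by the numeric step suffix.
    ckpts.sort(key=lambda d: int(d.split("-")[1]))
    # Keep at most total_limit - 1, so the new checkpoint brings the count to total_limit.
    if len(ckpts) >= total_limit:
        num_to_remove = len(ckpts) - total_limit + 1
        return ckpts[:num_to_remove]
    return []


if __name__ == "__main__":
    dirs = ["checkpoint-30", "checkpoint-10", "logs", "checkpoint-20"]
    print(checkpoints_to_remove(dirs, total_limit=3))  # ['checkpoint-10']
```

Non-checkpoint entries such as `logs` are ignored, and the numeric sort avoids the string-order trap where `checkpoint-100` would sort before `checkpoint-20`.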

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

sayakpaul

Thanks for the PR. Let's also include the training example commands as instructed here.

linoytsaban · 2024-01-16T08:42:40Z

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

LGTM! Thanks for the PR! 🔥

sayakpaul · 2024-01-16T09:01:01Z

Thanks for getting this in. Much appreciated!

…ce#6566)

* Fixes huggingface#6418 Advanced Dreambooth LoRa Training
* change order of import to fix nit
* fix nit, use cast_training_params
* remove torch.compile fix, will move to a new PR
* remove unnecessary import
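The squashed commits mention switching to `cast_training_params`. In diffusers, `diffusers.training_utils.cast_training_params` upcasts only parameters with `requires_grad=True` (the LoRA weights here) to FP32, leaving the frozen base weights in the `--mixed_precision="fp16"` weight dtype. In the diffusers LoRA examples this cast is repeated after LoRA weights are reloaded from a checkpoint, which is presumably what matters when resuming. The sketch below mirrors only the selection logic, using a hypothetical `Param` stand-in instead of `torch.nn.Parameter` so it runs without PyTorch; it is illustrative, not the library code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Param:
    """Hypothetical stand-in for torch.nn.Parameter with only the fields needed here."""
    name: str
    dtype: str
    requires_grad: bool


def cast_trainable_params(params: Iterable[Param], dtype: str = "float32") -> List[Param]:
    """Upcast only trainable params to `dtype`; frozen params keep their dtype.

    Mirrors the idea behind diffusers' cast_training_params, which reassigns
    `param.data = param.to(dtype)` for every param with requires_grad=True.
    """
    params = list(params)
    for p in params:
        if p.requires_grad:
            p.dtype = dtype
    return params


if __name__ == "__main__":
    # Hypothetical parameter names: after loading in fp16, both may be float16.
    model = [
        Param("unet.conv.weight", "float16", requires_grad=False),
        Param("unet.attn.to_q.lora_A.weight", "float16", requires_grad=True),
    ]
    for p in cast_trainable_params(model):
        print(p.name, p.dtype)
    # unet.conv.weight float16
    # unet.attn.to_q.lora_A.weight float32
```

Casting only the trainable subset keeps the memory savings of FP16 for the large frozen model while giving the optimizer full-precision LoRA weights to update.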

steverhoades mentioned this pull request Jan 13, 2024

[Tracker] fix training resuming problem when using FP16 in the examples #6552 (Closed, 5 tasks)

linoytsaban self-assigned this Jan 15, 2024