Subjectively, enabling it improved the outputs on average: a few of the comparisons look worse, but the results are better more often than they are worse. PhotoMerge is the model that was finetuned, so it is included here as a reference.
I trained for 1 epoch and only trained the U-Net. Full settings are below.
Generation settings:
512x512, 99 steps, DPM++ 2M SDE (Exponential), 3.5 CFG
a photo of a dog
a photo of a cat
a photo of a wild animal
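For reference, a rough way to reproduce these generation settings with diffusers (a sketch, not the exact setup I used: the checkpoint path is hypothetical, and mapping "DPM++ 2M SDE (Exponential)" to `algorithm_type="sde-dpmsolver++"` with `use_exponential_sigmas=True` assumes a recent diffusers release):

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Hypothetical path to one of the two finetuned checkpoints.
pipe = StableDiffusionPipeline.from_single_file(
    "./AFHQ_v2-With-Debias/AFHQ_v2-With-Debias.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M SDE with exponential sigmas (use_exponential_sigmas needs a
# recent diffusers version; drop it to fall back to the default schedule).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_exponential_sigmas=True,
)

for prompt in ("a photo of a dog", "a photo of a cat", "a photo of a wild animal"):
    image = pipe(prompt, width=512, height=512,
                 num_inference_steps=99, guidance_scale=3.5).images[0]
    image.save(prompt.replace(" ", "_") + ".png")
```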
Training settings:
The only difference between the two models is that one has Debias Estimation Loss enabled and the other does not; the same settings and seed were used to train both.
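For anyone curious what the option does: as implemented in sd-scripts, debiased estimation loss reweights the per-sample noise-prediction MSE by 1/sqrt(SNR(t)), down-weighting low-noise timesteps. A minimal sketch of that weighting (variable names are mine; sd-scripts clamps the SNR at 1000 to avoid the divergence near t = 0):

```python
import torch

def debiased_estimation_weight(timesteps, alphas_cumprod, max_snr=1000.0):
    """Per-sample loss weight 1/sqrt(SNR(t)) for debiased estimation."""
    abar = alphas_cumprod[timesteps]
    snr = abar / (1.0 - abar)      # SNR(t) for the DDPM noise schedule
    snr = snr.clamp(max=max_snr)   # SNR blows up as t -> 0, so cap it
    return 1.0 / torch.sqrt(snr)

# Usage inside a training step (loss kept per-sample before reweighting):
# loss = F.mse_loss(model_pred, noise, reduction="none").mean(dim=(1, 2, 3))
# loss = (debiased_estimation_weight(timesteps, alphas_cumprod) * loss).mean()
```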
Captions:
1_cat: a photo of a cat
1_dog: a photo of a dog
1_wild animal: a photo of a wild animal
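For context, the `1_` prefix on each folder is the sd-scripts DreamBooth convention `<repeats>_<name>`, so every image is seen once per epoch, and with `--caption_extension=".txt"` each image reads its caption from a sibling .txt file. The layout looks roughly like this (file names are hypothetical; image counts are taken from the step log below):

```
AFHQ_v2_512x512/
├── 1_cat/            # 5065 images, each captioned "a photo of a cat"
│   ├── img_0001.png
│   └── img_0001.txt
├── 1_dog/            # 4678 images
└── 1_wild animal/    # 4593 images
```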
11:45:40-002875 INFO Valid image folder names found in: ./AFHQ_v2_512x512/
11:45:40-016875 INFO Folder 1_cat : steps 5065
11:45:40-026928 INFO Folder 1_dog : steps 4678
11:45:40-037220 INFO Folder 1_wild animal : steps 4593
11:45:40-038179 INFO max_train_steps (14336 / 4 / 1 * 1 * 1) = 3584
11:45:40-039539 INFO stop_text_encoder_training = 0
11:45:40-040372 INFO lr_warmup_steps = 0
11:45:40-041709 INFO Saving training config to ./AFHQ_v2-With-Debias/AFHQ_v2-With-Debias_20231118-114540.json...
11:45:40-042814 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="./PhotoMerge_v1-2.safetensors"
--train_data_dir="./AFHQ_v2_512x512/" --resolution="512,512"
--output_dir="./AFHQ_v2-With-Debias" --logging_dir="./AFHQ_v2-With-Debias/logs"
--save_model_as=safetensors --vae="./ft-mse_fp16.safetensors" --output_name="AFHQ_v2-With-Debias" --lr_scheduler_num_cycles="1" --max_data_loader_n_workers="0"
--learning_rate_te="0.0" --learning_rate="1e-06" --lr_scheduler="constant" --train_batch_size="4" --max_train_steps="3584" --save_every_n_epochs="1" --mixed_precision="fp16"
--save_precision="fp16" --seed="6969" --caption_extension=".txt" --cache_latents --cache_latents_to_disk --optimizer_type="PagedAdamW" --max_data_loader_n_workers="0"
--bucket_reso_steps=64 --save_every_n_steps="1024" --save_last_n_steps="1" --mem_eff_attn --gradient_checkpointing --sdpa --bucket_no_upscale --noise_offset=0.0 --log_with wandb
--wandb_api_key="REDACTED" --sample_sampler=dpmsolver++
--sample_prompts="./AFHQ_v2-With-Debias/sample/prompt.txt" --sample_every_n_steps="64"
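The max_train_steps line in the log is the usual sd-scripts bookkeeping: total image steps (images × repeats, summed over the folders), divided by batch size and gradient accumulation, times the number of epochs (the trailing factors in the log are both 1 here). Reconstructed, with my own variable names:

```python
image_steps = 5065 + 4678 + 4593   # per-folder steps from the log = 14336
batch_size  = 4                    # --train_batch_size
grad_accum  = 1
epochs      = 1

max_train_steps = image_steps // batch_size // grad_accum * epochs
print(max_train_steps)             # 3584, matching the log
```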
Plots shown with Gaussian 16 smoothing.
Both versions can be downloaded here.