Hi, @yunkchen. I noticed you use 10M images from the SAM dataset to adapt HunyuanDiT to the new VAE. However, SAM images do not come with text annotations. How do you use them to adapt HunyuanDiT? I am concerned this may degrade its T2I ability.
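For context, one common workaround for caption-less data such as SAM (this is an assumption on my part, not necessarily what the authors did) is to auto-caption the images with an off-the-shelf captioner before using them for T2I adaptation. A minimal sketch with BLIP from `transformers`; the model name and file path are placeholders:

```python
# Minimal sketch: auto-caption SAM images with BLIP so they can be paired with text.
# Not the authors' pipeline; model choice and paths are assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image_path: str) -> str:
    """Generate a short caption for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

print(caption("sam_image_000001.jpg"))  # hypothetical file name
```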
Hi @yunkchen, I need to contact you again about the VAE adaptation. I tried to fine-tune the transformer at 256x256, but HunyuanDiT does not seem to provide any initialization for resolution 256. After training, image generation quality is largely degraded (at both resolution 256 and resolution 1024). Any suggestions on this? Thanks.
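One generic trick people use when no checkpoint exists for the target resolution (purely illustrative, and only an assumption here, since HunyuanDiT may rely on rotary position encoding rather than a learned embedding table) is to resize the learned 2D positional embeddings to the new patch grid before fine-tuning:

```python
# Generic ViT/DiT-style sketch: bicubically resize a learned 2D positional-embedding
# table to a new grid size before fine-tuning at a different resolution.
# Assumes learned absolute embeddings; does not apply if the model uses RoPE.
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """pos_embed: (1, old_grid*old_grid, dim) -> (1, new_grid*new_grid, dim)."""
    dim = pos_embed.shape[-1]
    grid = pos_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)  # (1, dim, H, W)
    grid = F.interpolate(grid, size=(new_grid, new_grid), mode="bicubic", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)

# e.g. a 1024x1024 checkpoint (64x64 patch grid) adapted down to 256x256 (16x16 grid);
# the hidden dimension below is a placeholder, not the real model width.
pe_1024 = torch.randn(1, 64 * 64, 1152)
pe_256 = resize_pos_embed(pe_1024, 64, 16)
print(pe_256.shape)  # torch.Size([1, 256, 1152])
```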