The downsampling ratio of the VAE in MAISI is 4, i.e. a [128, 128, 128] patch results in a [32, 32, 32] latent. However, 32^3 = 32768 tokens is very large for a transformer-based model.
Is there a pre-trained VAE with a larger downsampling ratio, such as 8 ([128, 128, 128] -> [16, 16, 16])?
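For illustration, here is a minimal PyTorch sketch of what an 8x downsampling path would look like (this is not MAISI's actual encoder; the channel widths and activation are arbitrary assumptions): three stride-2 Conv3d stages take a [128, 128, 128] patch to a [16, 16, 16] latent, shrinking the token count from 32768 to 4096.

```python
# Hypothetical sketch of an 8x-downsampling 3D encoder, NOT MAISI's architecture.
# Each stride-2 Conv3d halves the spatial size, so three stages give 2^3 = 8x.
import torch
import torch.nn as nn

class Encoder8x(nn.Module):
    def __init__(self, in_channels: int = 1, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, stride=2, padding=1),  # 128 -> 64
            nn.SiLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),           # 64 -> 32
            nn.SiLU(),
            nn.Conv3d(64, latent_channels, kernel_size=3, stride=2, padding=1),  # 32 -> 16
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

x = torch.randn(1, 1, 128, 128, 128)
z = Encoder8x()(x)
print(z.shape)  # torch.Size([1, 4, 16, 16, 16])
print(f"tokens at 4x: {32**3}, tokens at 8x: {16**3}")  # 32768 vs 4096
```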
Thank you!