Just a quick comparison of the speed and image quality of the two new fp8 options added today.
Testing was done on an RTX2060 6GB with counterfeitXLv2 as my test model. The same workflow was used for each test.
The workflow I used is very basic and not ideal in most cases. I wanted raw unassisted outputs.
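For reference, the fp8 options are passed as launch flags. A minimal sketch of how I started ComfyUI for each run (assuming a standard install where the entry point is `main.py`; adjust the path for your setup):

```shell
# e4m3fn run
python main.py --fp8_e4m3fn-unet

# e5m2 run
python main.py --fp8_e5m2-unet

# fp16 baseline: no fp8 flag, default precision
python main.py
```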
--fp8_e4m3fn-unet:
seed: 420
27.98s, 25 steps, dpm++2m, 1.03 it/s
--fp8_e5m2-unet:
seed: 420
25.68s, 25 steps, dpm++2m, 1.09 it/s
fp16:
seed: 420
25 steps, speed irrelevant for reasons explained in my notes.
Other notes:
Model loading time was a bit longer than with fp16, as expected. I did multiple generations before recording the final speed to ensure that everything was cached. Because of my limited vram, some ram offloading is required for fp16, so it is much slower than both fp8 options. That test was only done to compare image quality/similarity.
I've yet to find something that doesn't work with the new fp8 options. I've gotten great results when using LCM and turbo.
Conclusion:
They're both fine. Neither is objectively better; each maintains some aspects of the fp16 version that the other lacks. From other seeds I've tried (not included here to avoid filling this post with images), I think I prefer --fp8_e5m2-unet a little bit more. If you have limited vram, fp8 is a no-brainer.
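For context on why the two formats trade off differently: e4m3 spends its 8 bits as 4 exponent + 3 mantissa (more precision, smaller range), while e5m2 uses 5 exponent + 2 mantissa (more range, less precision). A quick sketch computing the largest finite value of each from the standard bit layouts (these are the usual e4m3fn/e5m2 definitions, not anything specific to ComfyUI):

```python
# e5m2: IEEE-754 style, top exponent value reserved for inf/nan.
# Max finite = (2 - 2^-2) * 2^(30 - 15)
e5m2_max = (2 - 2 ** -2) * 2 ** ((2 ** 5 - 2) - 15)

# e4m3fn: "finite" variant, no inf; only mantissa-all-ones at the top
# exponent encodes NaN. Max finite = (2 - 2*2^-3) * 2^(15 - 7)
e4m3fn_max = (2 - 2 * 2 ** -3) * 2 ** ((2 ** 4 - 1) - 7)

print(e5m2_max)    # 57344.0
print(e4m3fn_max)  # 448.0
```

So e5m2 can hold much larger weight/activation magnitudes before clipping, while e4m3fn resolves small differences more finely, which fits the observation that each preserves different aspects of the fp16 output.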