New Quant: flux1-dev-bnb-nf4-v2.safetensors #1079
Replies: 19 comments 26 replies
-
Please add support for a 6 GB Flux model, thanks.
-
I was testing how the Queue/Async/CPU/Shared options affect my speed when this post appeared, so I downloaded v2 and did another git pull to benchmark its speed too. But perhaps the option and model differences matter at 8 GB rather than 12 GB, since all the speeds look similar to me.
flux1-dev-bnb-nf4: batch count 4
flux1-dev-bnb-nf4-v2: batch count 2, batch size 2
If anyone's 4070 gets faster it/s, please share your secret with me.
-
Does this model now work with LoRAs?
-
Downloading this model from inside China is really slow, fuck the GFW!
-
I've noticed a clear increase in VRAM usage with v2. For example, when generating a relatively high-res image (1792x1008), v1 used around 7.4 GB of VRAM on my 4060, while v2 gave me an OOM error: VRAM usage went up to about 8.8 GB, causing some slowdown in inference speed.
-
Does "Distilled CFG Scale" slider works for anyone? I changing it (with fixed seed and CFG Scale=1), but generated image always stay the same. Both checkpoints flux1-dev-bnb-nf4 and flux1-dev-bnb-nf4-v2.version: f2.0.1v1.10.1-previous-260-gaadc0f04 • python: 3.10.6 • torch: 2.3.1+cu121 • xformers: N/A • gradio: 4.40.0 |
-
@lllyasviel for President 2024!
-
My comparison between NF4 V1, V2, and FP8 (Euler + Beta, seed 114, 1024x1024, CFG 4).
-
Have to test whether it's better than NF4 v1, which loses a lot of detail compared to FP8.
-
@lllyasviel is there a quant like this for schnell?
-
JFYI, on simple prompts NF4 v2 seems somewhat more balanced; compared to FP8, the composition is still very different.
-
Waiting for more tests from the gods.
-
fp16 / fp8 / nf4 v2 / nf4 series on an RTX 3090, shared/async swap. I can't comment on memory usage in this case (because of async?), and maybe some leak kept increasing usage, which eventually reached full VRAM for fp16 and fp8 but simply slowed to a crawl instead of erroring out. The best case for fp16 was 18 GB VRAM; the worst was an OOM error, sometimes right on the first generation after launching the UI. I can't reproduce this behavior reliably; it feels almost random. Best case speed:
fp8 stayed closest to fp16 in general composition; the nf4 variants are significantly different from it but extremely similar to each other, although some objects from fp16 appear in nf4 but not in fp8 (the billboard obviously, also some tree branches). Only fp16 didn't fail the text (it did fail with 20 steps, which is the only reason for using 30), but this is not the rule: in other prompts fp16 failed and fp8 didn't. nf4 v2 is the only one that produced damaged letters; even though fp8 added nonsense, it was at least readable.
Prompt: Very fancy detailed photorealistic forest, mountains in background, with large sign in ground that says "this is a text test", five old people standing for a group photo
Short prompt:
-
I was testing a merged version of schnell+dev and was getting good results with only 4 steps and an average of 18 s to generate each image. Now I'm testing this new dev-bnb-nf4-v2 version and I only get reasonably good images with 10 steps, which takes up to 50 s per image, a significant increase in generation time. How many steps are you using with this new version to get a reasonable image?
My setup is:
Please leave your comments. Thanks in advance.
-
Comparison between different Distilled CFGs using the nf4 (v1) model: https://www.dropbox.com/scl/fi/ewdrxrggz8u01ld681emh/CFG-Flux.jpg?rlkey=qppaj6mfsn2s3usfzktm0nck8&st=bg0hgs2e&dl=0
-
How do I use this with diffusers?
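(A minimal sketch for anyone looking for a starting point, assuming a recent diffusers release with bitsandbytes quantization support. The model ID, prompt, and parameters are illustrative, and this quantizes FLUX.1-dev to NF4 on the fly rather than loading the single-file checkpoint from this thread.)
```python
# Hypothetical example: on-the-fly NF4 quantization of FLUX.1-dev with diffusers + bitsandbytes.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,   # mirrors the v2 recipe (no second quant stage)
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize only the transformer (the largest component) while loading.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to keep VRAM usage low

image = pipe(
    "a photorealistic forest with a wooden sign",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_nf4.png")
```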
-
Can someone please explain what all of these different versions of the Flux models are? For NF4 alone there are so many other variants, like GGUF4, BNB4, Schnell_Dev, BNB_NF4, GGUF4_K, GGUF4_K_S, etc., and the more I search the more there are, all with different file sizes. What is the difference between all of these in terms of speed and quality, and which ones are best optimized for speed and which for quality? Is there some article or page that explains these in simple terms?
-
Hello! I'm very green. I'm trying to start using flux1-dev-bnb-nf4 with Python, but I can't figure it out. It's really difficult for me.
Can someone help me?
-
See also: https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4-v2.safetensors
Copied from readme:
Always use V2 by default.
V2 is quantized in a better way to turn off the second stage of double quant.
V2 is 0.5 GB larger than the previous version, since the chunk 64 norm is now stored in full precision float32, making it much more precise than the previous version. Also, since V2 does not have second compression stage, it now has less computation overhead for on-the-fly decompression, making the inference a bit faster.
The only drawback of V2 is being 0.5 GB larger.
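To make the double-quant point concrete, here is a toy sketch (not the actual bitsandbytes implementation, and the helper names are made up) of block-wise 4-bit quantization with one scale per chunk of 64 values: V2 keeps those per-chunk scales in float32, while V1's second quantization stage compresses the scales themselves, costing a little precision and an extra decode step at inference time.
```python
# Toy illustration only; real NF4 uses non-uniform quantization levels and bitsandbytes kernels.
import torch

BLOCK = 64  # weights are quantized in chunks of 64 values

def quantize_blocks(w: torch.Tensor):
    """Per-block 4-bit-style quantization: one scale (the "chunk 64 norm") per block."""
    blocks = w.reshape(-1, BLOCK)
    scales = blocks.abs().amax(dim=1)                          # one float32 scale per block
    q = torch.round(blocks / scales[:, None] * 7).clamp(-8, 7)
    return q.to(torch.int8), scales

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() / 7.0) * scales[:, None]

w = torch.randn(4096)
q, scales = quantize_blocks(w)

# V2-style: per-chunk scales stay in float32 (about 0.5 GB larger overall, more precise, no extra decode).
recon_v2 = dequantize(q, scales)

# V1-style second stage ("double quant"): compress the scales to 8 bit, decode them again at runtime.
s_scale = scales.max() / 127.0
scales_q8 = torch.round(scales / s_scale).to(torch.uint8)
recon_v1 = dequantize(q, scales_q8.float() * s_scale)

print("mean abs error, fp32 scales (V2-like):     ", (w - recon_v2.flatten()).abs().mean().item())
print("mean abs error, quantized scales (V1-like):", (w - recon_v1.flatten()).abs().mean().item())
```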