Quantized custom flux model was still bfloat16 #27

Open
samedii opened this issue Nov 20, 2024 · 8 comments

Comments

@samedii

samedii commented Nov 20, 2024

Hi, thanks for sharing your very efficient quantization method!

I was trying it out on a custom flux model and was surprised to see the saved model was the same size as the original bfloat16. I suspect the errors might be large and it decided to keep bfloat16 rather than quantizing.
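For scale, here is a quick estimate of how much smaller a genuinely 4-bit checkpoint should be than a bf16 one. It is a sketch under stated assumptions: roughly 12B parameters (the published size of FLUX.1-class transformers), every parameter quantized, and one 2-byte scale per group of 64 weights. Real layouts (unquantized norms, extra low-rank weights, different group sizes) will shift the numbers.

```python
def checkpoint_bytes(num_params: int, bits: int, group_size: int = 0,
                     scale_bytes: int = 2) -> int:
    """Approximate storage for `num_params` weights at `bits` bits each,
    plus one `scale_bytes`-wide scale per group (group_size=0: no scales)."""
    weight_bytes = num_params * bits // 8
    scale_total = (num_params // group_size) * scale_bytes if group_size else 0
    return weight_bytes + scale_total


n = 12_000_000_000  # assumed ~12B parameters, all treated as quantizable
bf16 = checkpoint_bytes(n, 16)
int4 = checkpoint_bytes(n, 4, group_size=64)  # group size 64 is an assumption
print(f"bf16: {bf16 / 1e9:.1f} GB, int4 + scales: {int4 / 1e9:.1f} GB")
# bf16: 24.0 GB, int4 + scales: 6.4 GB
```

A checkpoint that stays at the bf16 size therefore cannot be holding packed 4-bit integers, whatever precision its values effectively have.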

When I looked in model.pt, everything was bfloat16, and the wgts.pt file showed this:

single_transformer_blocks.37.attn.to_q {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
single_transformer_blocks.37.attn.to_k {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
single_transformer_blocks.37.attn.to_v {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
single_transformer_blocks.37.proj_out.linears.0 {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
single_transformer_blocks.37.proj_mlp {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
single_transformer_blocks.37.proj_out.linears.1.linear {'channels_dim': None, 'scale': None, 'zero': None, 'dynamic_range': None, 'range_bound': None, 'quant_range': None}
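A minimal sketch for listing every layer whose recorded quantizer fields are all None, like the entries above. It works on a plain dict shaped like the printout (layer name mapped to a dict of fields); that wgts.pt loads (e.g. via `torch.load`) into exactly this shape is an assumption based only on the output shown here, and an all-None entry does not by itself prove the layer was skipped.

```python
from typing import Any, Dict, List


def layers_without_params(state: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of layers whose recorded quantizer fields are all None."""
    return [name for name, params in state.items()
            if all(value is None for value in params.values())]


# Structure mirrors the printed wgts.pt entries (assumed, not verified).
FIELDS = ("channels_dim", "scale", "zero", "dynamic_range", "range_bound", "quant_range")
empty = {field: None for field in FIELDS}
example = {
    "single_transformer_blocks.37.attn.to_q": dict(empty),
    "single_transformer_blocks.37.proj_mlp": dict(empty),
    "hypothetical.calibrated_layer": {**empty, "scale": 0.02},  # made-up entry
}
print(layers_without_params(example))
```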

These are some logs from running quantization:

24-11-20 23:53:46 | D |           - beta        = [    0.0000,     0.0000,     0.0000,     0.0000,     0.0000]
24-11-20 23:53:46 | D |           - sum  error  = [ 2540.9990,  2869.0414,  3201.9792,  3683.2138,  4107.1776]
24-11-20 23:53:46 | D |           - best error  = [ 1343.7403,  1343.7403,  1343.7403,  1343.7403,  1343.7403]
24-11-20 23:53:46 | D |           - alpha       = [    0.0500,     0.1000,     0.1500,     0.2000,     0.2500]
24-11-20 23:53:46 | D |           - beta        = [    0.9500,     0.9000,     0.8500,     0.8000,     0.7500]
24-11-20 23:53:46 | D |           - sum  error  = [ 6694.6653,  6413.5553,  5777.3939,  4848.7699,  3967.3542]
24-11-20 23:53:46 | D |           - best error  = [ 1343.7403,  1343.7403,  1343.7403,  1343.7403,  1343.7403]
24-11-20 23:53:46 | D |           - alpha       = [    0.3000,     0.3500,     0.4000,     0.4500,     0.5000]
24-11-20 23:53:46 | D |           - beta        = [    0.7000,     0.6500,     0.6000,     0.5500,     0.5000]
24-11-20 23:53:46 | D |           - sum  error  = [ 3603.3264,  3263.8088,  2801.4785,  2564.5048,  2464.6804]
24-11-20 23:53:46 | D |           - best error  = [ 1343.7403,  1343.7403,  1343.7403,  1343.7403,  1343.7403]
24-11-20 23:53:46 | D |           - alpha       = [    0.5500,     0.6000,     0.6500,     0.7000,     0.7500]
24-11-20 23:53:46 | D |           - beta        = [    0.4500,     0.4000,     0.3500,     0.3000,     0.2500]
24-11-20 23:53:46 | D |           - sum  error  = [ 2553.2303,  2514.6087,  2637.9077,  2787.7934,  3063.0457]
24-11-20 23:53:46 | D |           - best error  = [ 1343.7403,  1343.7403,  1343.7403,  1343.7403,  1343.7403]
24-11-20 23:53:46 | D |           - alpha       = [    0.8000,     0.8500,     0.9000,     0.9500]
24-11-20 23:53:46 | D |           - beta        = [    0.2000,     0.1500,     0.1000,     0.0500]
24-11-20 23:53:46 | D |           - sum  error  = [ 3492.8151,  3789.2714,  3881.0138,  4274.4637]
24-11-20 23:53:46 | D |           - best error  = [ 1343.7403,  1343.7403,  1343.7403,  1343.7403]
24-11-20 23:53:46 | D |         + error = 1343.7403
24-11-20 23:53:46 | D |         + scale = [min=0.3113, max=2.8596]
24-11-20 23:53:47 | D |       - transformer_blocks.1.attn add_qkv_proj
24-11-20 23:53:47 | D |         + w: sint4
24-11-20 23:53:47 | D |         + x: sint4
24-11-20 23:53:47 | D |         + y: None
24-11-20 23:53:47 | D |         + tensor_type: TensorType.Weights, objective: SearchBasedCalibObjective.OutputsError, granularity: SearchBasedCalibGranularity.Layer
24-11-20 23:53:47 | D |         + finished parsing calibration arguments, ram usage: 15.7
24-11-20 23:53:47 | D |         + x - AbsMax
24-11-20 23:53:47 | D |         + x  = [min=0.1094, max=39.5000]
24-11-20 23:53:47 | D |         + w - AbsMax
24-11-20 23:53:47 | D |         + w  = [min=0.1177, max=0.3984]
24-11-20 23:53:47 | D |         + finished reseting calibrator, ram usage: 15.7
24-11-20 23:53:48 | D |         + finished calculating the original outputs, ram usage: 15.7

I'm trying again tonight, but I suspect I will see the same issue.

Do you have any suggestions?

@samedii
Author

samedii commented Nov 21, 2024

It did not work, but maybe the output is already correct and I just need to convert it?
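One way to test that hypothesis is a simple heuristic: within each quantization group, a 4-bit fake-quantized weight can take at most 16 distinct values even when it is stored as bf16. The pure-Python sketch below assumes groups are 64 contiguous elements of the row-major flattened weight (in PyTorch, `tensor.flatten().tolist()` gives such a list) and that stored values are exactly scale × integer; smoothing factors or a low-rank branch folded back into the weights would break the check.

```python
from typing import Sequence


def max_distinct_per_group(values: Sequence[float], group_size: int = 64) -> int:
    """Largest number of distinct values in any contiguous group of `group_size`."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return max((len(set(values[i:i + group_size]))
                for i in range(0, len(values), group_size)), default=0)


def looks_fake_quantized(values: Sequence[float], bits: int = 4,
                         group_size: int = 64) -> bool:
    """Heuristic: each b-bit fake-quantized group holds at most 2**bits values."""
    return max_distinct_per_group(values, group_size) <= 2 ** bits


scale = 0.01
fake = [scale * ((i * 7) % 16 - 8) for i in range(128)]  # integers in [-8, 7] times a scale
dense = [i / 1000 for i in range(128)]                   # 64 distinct values per group
print(looks_fake_quantized(fake), looks_fake_quantized(dense))  # True False
```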

I tried reproducing the flux.1-dev results, and I got very similar error magnitudes:

24-11-21 20:32:31 | D |         + x  = [min=0.0598, max=1.6406]                                                                                                                                            
24-11-21 20:32:31 | D |         + w - AbsMax                                                                                                                                                               
24-11-21 20:32:31 | D |         + w  = [min=0.0981, max=0.4082]                                                                                                                                            
24-11-21 20:32:31 | D |         + finished reseting calibrator, ram usage: 14.7                                                                                                                            
24-11-21 20:32:33 | D |         + finished calculating the original outputs, ram usage: 14.7                                                                                                               
24-11-21 20:34:44 | D |           - x / w range = AbsMax / AbsMax                                                                                                                                          
24-11-21 20:34:44 | D |           - alpha       = [    0.0000,     0.0500,     0.1000,     0.1500,     0.2000]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.0000,     0.0000,     0.0000,     0.0000,     0.0000]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1828.6648,  1724.5615,  1632.4606,  1553.8120,  1493.8046]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1828.6648,  1724.5615,  1632.4606,  1553.8120,  1493.8046]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.2500,     0.3000,     0.3500,     0.4000,     0.4500]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.0000,     0.0000,     0.0000,     0.0000,     0.0000]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1439.3716,  1404.2956,  1382.7623,  1370.8294,  1373.6909]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1439.3716,  1404.2956,  1382.7623,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.5000,     0.5500,     0.6000,     0.6500,     0.7000]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.0000,     0.0000,     0.0000,     0.0000,     0.0000]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1380.6241,  1400.4539,  1425.3653,  1461.7289,  1518.9166]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.7500,     0.8000,     0.8500,     0.9000,     0.9500]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.0000,     0.0000,     0.0000,     0.0000,     0.0000]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1575.1668,  1651.5349,  1726.3799,  1817.4276,  1918.2578]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.0500,     0.1000,     0.1500,     0.2000,     0.2500]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.9500,     0.9000,     0.8500,     0.8000,     0.7500]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 2283.5662,  2121.8141,  1968.3675,  1835.7584,  1728.6590]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.3000,     0.3500,     0.4000,     0.4500,     0.5000]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.7000,     0.6500,     0.6000,     0.5500,     0.5000]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1634.0025,  1553.8838,  1489.0616,  1451.9831,  1437.2101]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.5500,     0.6000,     0.6500,     0.7000,     0.7500]                                                                                             
24-11-21 20:34:44 | D |           - beta        = [    0.4500,     0.4000,     0.3500,     0.3000,     0.2500]                                                                                             
24-11-21 20:34:44 | D |           - sum  error  = [ 1425.6216,  1439.6716,  1465.1432,  1497.7786,  1554.3564]                                                                                             
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                             
24-11-21 20:34:44 | D |           - alpha       = [    0.8000,     0.8500,     0.9000,     0.9500]                                                                                                         
24-11-21 20:34:44 | D |           - beta        = [    0.2000,     0.1500,     0.1000,     0.0500]                                                                                                         
24-11-21 20:34:44 | D |           - sum  error  = [ 1625.8374,  1704.7249,  1792.3400,  1906.8528]                                                                                                         
24-11-21 20:34:44 | D |           - best error  = [ 1370.8294,  1370.8294,  1370.8294,  1370.8294]                                                                                                         
24-11-21 20:34:44 | D |         + error = 1370.8294                                                                                                                                                        
24-11-21 20:34:44 | D |         + scale = [min=0.3241, max=1.2190]                                                                                                                                         
24-11-21 20:34:44 | D |       - transformer_blocks.0.ff.up_proj                                                                                                                                            
24-11-21 20:34:44 | D |         + w: sint4                                                                                                                                                                 
24-11-21 20:34:44 | D |         + x: sint4                                                                                                                                                                 
24-11-21 20:34:44 | D |         + y: None                                                                                                                                                                  
24-11-21 20:34:44 | D |         + tensor_type: TensorType.Weights, objective: SearchBasedCalibObjective.OutputsError, granularity: SearchBasedCalibGranularity.Layer                                       
24-11-21 20:34:44 | D |         + finished parsing calibration arguments, ram usage: 14.6                                                                                                                  
24-11-21 20:34:45 | D |         + x - AbsMax                                                                                                                                                               
24-11-21 20:34:45 | D |         + x  = [min=0.0312, max=5.2188]                                                                                                                                            
24-11-21 20:34:45 | D |         + w - AbsMax                                                                                                                                                               
24-11-21 20:34:45 | D |         + w  = [min=0.0309, max=0.4395]                                                                                                                                            
24-11-21 20:34:45 | D |         + finished reseting calibrator, ram usage: 14.7                                                                                                                            
24-11-21 20:34:58 | D |         + finished calculating the original outputs, ram usage: 17.9 
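Both logs follow the same search pattern: a sweep over alpha with beta = 0, then a sweep with alpha + beta = 1, where "best error" is the running minimum of "sum error". The sketch below reproduces that grid, assuming a SmoothQuant-style per-channel factor s = x_max^alpha / w_max^beta; the `evaluate` callback is hypothetical and this is not DeepCompressor's code. The assumed formula is consistent with the second log: the minimum error 1370.8294 occurs at alpha = 0.40, beta = 0, and 0.0598^0.4 ≈ 0.3241, 1.6406^0.4 ≈ 1.2190 match the logged scale range.

```python
from typing import Callable, List, Sequence, Tuple


def candidate_grid(step: float = 0.05) -> List[Tuple[float, float]]:
    """(alpha, beta) pairs in log order: beta = 0 for alpha = 0.00..0.95,
    then beta = 1 - alpha for alpha = 0.05..0.95."""
    n = round(1 / step)
    grid = [(round(i * step, 4), 0.0) for i in range(n)]
    grid += [(round(i * step, 4), round(1 - i * step, 4)) for i in range(1, n)]
    return grid


def smooth_scale(x_max: Sequence[float], w_max: Sequence[float],
                 alpha: float, beta: float) -> List[float]:
    """Assumed per-channel factor x_max**alpha / w_max**beta (ranges must be > 0)."""
    return [x ** alpha / w ** beta for x, w in zip(x_max, w_max)]


def search(x_max: Sequence[float], w_max: Sequence[float],
           evaluate: Callable[[List[float]], float]):
    """Return (best_error, alpha, beta, scale) over the candidate grid."""
    best = (float("inf"), 0.0, 0.0, [])
    for alpha, beta in candidate_grid():
        scale = smooth_scale(x_max, w_max, alpha, beta)
        err = evaluate(scale)
        if err < best[0]:  # the logged "best error" is this running minimum
            best = (err, alpha, beta, scale)
    return best


# Logged AbsMax ranges from the second log; with beta = 0 the w values drop out.
s = smooth_scale([0.0598, 1.6406], [0.0981, 0.4082], alpha=0.4, beta=0.0)
print([round(v, 4) for v in s])  # ≈ [0.3241, 1.219]
```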

@samedii
Author

samedii commented Nov 22, 2024

Do you have any advice? @lmxyy @synxlin Any help would be greatly appreciated 🙏

@Howe2018

I have the same question: how can a quantized checkpoint be converted into a safetensors model that can be loaded in Nunchaku? I hope @lmxyy can provide some assistance.

@synxlin
Contributor

synxlin commented Nov 25, 2024

Hi @samedii and @Howe2018,

As for your question, DeepCompressor dumps the floating-point dequantized weights in the checkpoint model.pt. You can convert these dequantized weights back to low-bit integers via standard quantization; scale.pt contains the quantization scaling information (e.g., the min/max values searched during quantization).
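A minimal sketch of that "standard quantization" step for signed 4-bit weights (the logs report `w: sint4`): divide each dequantized value by its group's scale, round, clamp to [-8, 7], then pack two values per byte. Symmetric quantization without a zero point, a per-group step size (e.g. derived from a searched max value in scale.pt), and low-nibble-first packing are all assumptions; Nunchaku's actual on-disk layout is not described in this thread and may well differ.

```python
from typing import List, Sequence


def quantize_group(values: Sequence[float], scale: float) -> List[int]:
    """Symmetric sint4: q = clamp(round(w / scale), -8, 7). Dequantized inputs
    are (near) exact multiples of `scale`, so rounding recovers the integers."""
    return [max(-8, min(7, round(v / scale))) for v in values]


def dequantize_group(q: Sequence[int], scale: float) -> List[float]:
    """Inverse mapping w = q * scale, i.e. what a fake-quantized checkpoint stores."""
    return [x * scale for x in q]


def pack_int4(q: Sequence[int]) -> bytes:
    """Pack pairs of sint4 values into one byte each, low nibble first (assumed)."""
    if len(q) % 2:
        raise ValueError("need an even number of 4-bit values")
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4)
                 for i in range(0, len(q), 2))


scale = 0.02                                  # hypothetical per-group step size
original = [-8, -3, 0, 5, 7, 1]
stored = dequantize_group(original, scale)    # floating-point values, as in model.pt
recovered = quantize_group(stored, scale)
print(recovered == original, pack_int4(recovered).hex())  # True d85017
```

Rounding before clamping keeps the mapping exact for values that really are multiples of the scale, while out-of-range inputs saturate instead of wrapping around.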

We are currently working on a script to convert DeepCompressor checkpoints to the Nunchaku format. We'll keep this issue updated and notify you when the conversion script is released.

Let us know if you have any specific requirements or suggestions!

@kelisiya

mark

@BBuf

BBuf commented Dec 5, 2024

Any progress on the script to convert DeepCompressor checkpoints to the Nunchaku format? Thanks!

@yyfcc17

yyfcc17 commented Dec 10, 2024

I noticed this issue too. After running the code, there are:

branch.pt
model.pt
scale.pt
smooth.pt
wgts.pt

but the result cannot be loaded in Nunchaku. Is the conversion script missing?

@vvmatorin

vvmatorin commented Dec 20, 2024

@lmxyy @synxlin Hi, do you have any updates on the scripts for converting diffusion models to the Nunchaku format?
Since converted models are already on HF, doesn't that imply you already have these scripts and have used them? What's holding back the release? I would be glad to contribute.

7 participants