how to extract int8 weights from quantized model #1817
when loading the quantized model (smoothquant) with

I got

RecursiveScriptModule(original_name=QuantizationDispatchModule)

I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?

Comments
Hi @chensterliu, can you provide more details on the model that you quantized, the strategy, and the versions of neural_compressor and intel_extension_for_pytorch?
Hello, I used

for the smoothquant. What I've done is just running the script

I successfully got the quan_out dir with 2 files inside: best_configure.json and best_model.pt. My question is how to get the quantized int8 weight matrices from those files. The method in my first post doesn't work, since the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but its weights can't be retrieved via state_dict(). I'd appreciate any method to obtain those quantized integers, similar to named_parameters() on a normal torch.nn model.
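For contrast, this is the eager-mode access pattern being asked for, shown here as a sketch on a plain fp32 module rather than the quantized artifact:

```python
import torch

# Eager-mode modules expose their weights directly as named parameters;
# the TorchScript artifact produced by quantization does not.
model = torch.nn.Linear(4, 4)
for name, param in model.named_parameters():
    print(name, param.dtype, param.shape)  # e.g. "weight", torch.float32
```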
Hi @chensterliu, I am able to run the command that you used to quantize, and I am able to load the model using

The command I used to quantize:

If you are still facing issues, you can directly try to load the model using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
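For reference, a minimal sketch of loading the saved artifact directly, assuming the save layout described above (the quantized TorchScript model written to quan_out/best_model.pt):

```python
import torch

# Load the TorchScript model produced by the quantization run.
qmodel = torch.jit.load("quan_out/best_model.pt")
qmodel.eval()
print(qmodel)  # prints a RecursiveScriptModule wrapper, as reported in this thread
```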
Hi @srinarayan-srikanthan, loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information. Please see the attached figure: do you also get this RecursiveScriptModule? How do you get int8 weights from the qmodel?
The torch.jit model is packed for inference, so you cannot unpack it and see its weights.
My goal is to extract those quantized int8 weights. Do you have a workaround to achieve this, or is it technically not possible?
Yes, can you try this workaround:
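The snippet itself did not survive the page capture. What follows is a reconstruction in the same spirit, not the original code: it walks the TorchScript graph and dumps every quantized constant tensor. It assumes the model was saved to quan_out/best_model.pt, and note that torch._C._jit_pass_inline is an internal PyTorch pass that may change between versions; layers whose weights live inside opaque packed-param objects will not be visible to this loop.

```python
import torch

qmodel = torch.jit.load("quan_out/best_model.pt")
qmodel.eval()

# Inline submodule calls so constants from all layers appear in one graph.
# Internal API: subject to change across PyTorch versions.
torch._C._jit_pass_inline(qmodel.graph)

idx = 0
for node in qmodel.graph.nodes():
    # Weight tensors are stored as tensor-typed prim::Constant nodes.
    if node.kind() != "prim::Constant" or not node.hasAttribute("value"):
        continue
    if node.kindOf("value") != "t":
        continue
    tensor = node.t("value")
    if not tensor.is_quantized:
        continue
    # int_repr() exposes the raw int8 payload of a quantized tensor.
    if tensor.qscheme() in (torch.per_tensor_affine, torch.per_tensor_symmetric):
        scales = tensor.q_scale()
        zero_points = tensor.q_zero_point()
    else:  # per-channel quantization uses vector scales/zero points
        scales = tensor.q_per_channel_scales()
        zero_points = tensor.q_per_channel_zero_points()
    print(idx, tensor.shape, tensor.int_repr(), scales, zero_points)
    idx += 1
```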
Thank you, the code works. The only subtle thing is that the printed names are indices, which makes it difficult to trace which tensor belongs to which layer.
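One possible way to get more context than a bare index, building on the inlined graph from the sketch above (an assumption, not a confirmed recipe from this thread): inspect which ops consume each constant, since a weight constant's consumer is usually the quantized op for that layer.

```python
# Print the op(s) that consume each quantized constant; the consumer kind
# (e.g. a quantized linear/conv op) usually identifies the layer type.
for node in qmodel.graph.nodes():
    if node.kind() != "prim::Constant" or not node.hasAttribute("value"):
        continue
    if node.kindOf("value") != "t":
        continue
    tensor = node.t("value")
    if tensor.is_quantized:
        consumers = [use.user.kind() for use in node.output().uses()]
        print(tensor.shape, "consumed by", consumers)
```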