how to extract int8 weights from quantized model #1817

Open
chensterliu opened this issue May 25, 2024 · 8 comments
Assignees
Labels
aitce AI TCE to handle it firstly

Comments

@chensterliu

When loading the quantized model (SmoothQuant) with

from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)

I got
RecursiveScriptModule(original_name=QuantizationDispatchModule)
I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?

@srinarayan-srikanthan

Hi @chensterliu, can you provide more details on the model that you quantized, the quantization strategy, and the versions of neural_compressor and intel_extension_for_pytorch?

@srinarayan-srikanthan srinarayan-srikanthan self-assigned this May 28, 2024
@srinarayan-srikanthan srinarayan-srikanthan added the aitce AI TCE to handle it firstly label May 28, 2024
@chensterliu
Author

Hello, I used

neural_compressor             2.5.1
intel-extension-for-pytorch   2.3.0 

for SmoothQuant. What I did was simply run the script
neural-compressor/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py with the following arguments:

    python -u run_clm_no_trainer.py \
        --model "facebook/opt-125m" \
        --dataset "lambada" \
        --approach "static" \
        --output_dir "quan_out" \
        --quantize \
        --batch_size 16 \
        --ipex --int8_bf16_mixed --sq --alpha 0.5

I successfully got the quan_out directory with two files inside: best_configure.json and best_model.pt.

My question is: how do I get the quantized int8 weight matrices from those files? The method in my first post doesn't work, since the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but its weights can't be retrieved via state_dict(). I would appreciate it if you could offer any method to obtain those quantized integers, similar to named_parameters() of a normal torch.nn model.
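For comparison, this is the kind of access I mean, illustrated with eager-mode PyTorch dynamic quantization (just an example of what I'm after, not the IPEX flow above):

import torch
import torch.nn as nn

# Illustration only: eager-mode dynamic quantization, not the IPEX TorchScript flow above.
model = nn.Sequential(nn.Linear(16, 8))
qmodel_eager = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

qlinear = qmodel_eager[0]
qweight = qlinear.weight()                        # quantized weight tensor
print(qweight.int_repr())                         # raw int8 values
print(qweight.q_scale(), qweight.q_zero_point())  # quantization parameters (per-tensor here)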

@srinarayan-srikanthan

Hi @chensterliu, I am able to run the command that you used to quantize, and I am able to load the model using
from neural_compressor.utils.pytorch import load
qmodel = load("./saved_results")

The command I used to quantize:
python run_clm_no_trainer.py --dataset "lambada" --model facebook/opt-125m --quantize --batch_size 16 --sq --alpha 0.5 --ipex --output_dir "./saved_results" --int8_bf16_mixed

If you are still facing issues, can you try to load the model directly using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
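If I read that code path correctly, the IPEX branch essentially deserializes the saved TorchScript file directly; a rough sketch (not the exact library code, and the path assumes the output directory above):

import os
import torch

# Rough sketch, assuming best_model.pt is the TorchScript archive written by the IPEX path.
qmodel = torch.jit.load(os.path.join("./saved_results", "best_model.pt"))
qmodel.eval()
print(type(qmodel))  # torch.jit.RecursiveScriptModule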

@chensterliu
Author

Hi @srinarayan-srikanthan, loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information to me. Please see the attached figure: do you also get this RecursiveScriptModule? How do you get int8 weights from the qmodel?
[Screenshot attached: Screenshot_2024-06-06_15-07-34]

@srinarayan-srikanthan

The torch.jit model is packed for inference, so you cannot unpack it and see its weights.
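You can see this on the loaded module itself; a quick check, assuming qmodel is the RecursiveScriptModule returned by load():

# The usual nn.Module accessors come back (nearly) empty because freezing folds
# the weights into the graph as constants.
print(len(list(qmodel.named_parameters())))  # typically 0 for a frozen TorchScript model
print(qmodel.inlined_graph)                  # weights appear as prim::Constant nodes instead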

@chensterliu
Author

My goal is to extract those quantized int8 weights. Do you have a workaround to achieve this, or is it technically not possible?

@srinarayan-srikanthan

srinarayan-srikanthan commented Jun 26, 2024

Yes, can you try this workaround:

import torch

# Function to extract constant tensors baked into the frozen graph
def extract_constants(frozen_model):
    constants = {}
    for node in frozen_model.graph.nodes():
        if node.output().type().isSubtypeOf(torch._C.TensorType.get()):
            constant_name = node.output().debugName()
            constant_value = node.output().toIValue()
            constants[constant_name] = constant_value
    return constants

# Extract and print constants
constants = extract_constants(a)  # your loaded model
print("Frozen Model Constants:")
for name, value in constants.items():
    print(f"{name}: {value}")

@chensterliu
Author

Thank you, the code works. The only subtle thing is that the printed names are just indices, which makes it difficult to trace which tensor belongs to which layer.
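One possible refinement (a sketch under the same assumptions as above, not verified on this exact model): also record which ops consume each constant, so a given weight can at least be traced to the quantized op it feeds.

import torch

# Sketch: map each constant tensor in the frozen graph to the kinds of ops that
# consume it, as a hint about which layer it belongs to. Assumes qmodel is the
# loaded RecursiveScriptModule.
def extract_constants_with_users(frozen_model):
    results = {}
    for node in frozen_model.graph.nodes():
        if node.kind() != "prim::Constant":
            continue
        out = node.output()
        if not out.type().isSubtypeOf(torch._C.TensorType.get()):
            continue
        consumers = [use.user.kind() for use in out.uses()]
        results[out.debugName()] = (out.toIValue(), consumers)
    return results

for name, (value, consumers) in extract_constants_with_users(qmodel).items():
    shape = tuple(value.shape) if value is not None else None
    print(f"{name}: shape={shape} -> used by {consumers}")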
