Pretrained Model Reload + SparseGPT Support #31

Merged
merged 3 commits into main from sa/model_reload on Apr 23, 2024

Conversation

@Satrat (Contributor) commented on Apr 19, 2024

Adding helper functions to support reloading a quantized model from its config with SparseAutoModel. This has a few steps:

  1. Load the model, apply any sparsity decompression, and modify save_pretrained (currently done in SparseML, but this will move here soon!)
  2. Apply the quantization config to the model if one exists in config.json. This does NOT initialize any of the scales and zero points; those Parameters are still empty after apply_quantization_config() is called.
  3. Fill in the scales and zero points from the safetensors file with a new load_pretrained_quantization() function. This loops through the leaf modules and grabs the scale/zp from the safetensors file(s) at model_path (see the sketch after this list).
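
A minimal sketch of what step 3 could look like, for illustration only: the parameter names "weight_scale" / "weight_zero_point", the helper name, and the single "model.safetensors" file are assumptions here, not the exact keys or code in this PR.

import os
from safetensors import safe_open

def load_pretrained_quantization_sketch(model, model_path):
    # assumption: a single safetensors checkpoint at model_path
    weights_file = os.path.join(model_path, "model.safetensors")
    with safe_open(weights_file, framework="pt", device="cpu") as f:
        available_keys = set(f.keys())
        for name, module in model.named_modules():
            if len(list(module.children())) > 0:
                continue  # only leaf modules hold the quantization parameters
            for param_name in ("weight_scale", "weight_zero_point"):
                key = f"{name}.{param_name}"
                if key in available_keys and hasattr(module, param_name):
                    # fill the empty Parameter created by apply_quantization_config
                    getattr(module, param_name).data = f.get_tensor(key)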

Example usage (this would live in SparseAutoModel.from_pretrained):

quantization_config = QuantizationConfig.from_model_config(pretrained_model_name_or_path)
model = super(AutoModelForCausalLM, cls).from_pretrained(pretrained_model_name_or_path)

# deal with sparsity compression, model modification here...

apply_quantization_config(model, quantization_config)
load_pretrained_quantization(model, pretrained_model_name_or_path)

@dbogunowicz I know a lot of the UX is going to change with your refactor, but I needed to get something up and running for testing. This is just adding the helper functions that your UX will eventually call.

Associated SparseML branch: neuralmagic/sparseml#2246

Quick Note on SparseGPT/OBCQ

In the new fake_quantize implementation, we overwrite the weight parameter in the forward call (forward.py):

self.weight.data = _maybe_calibrate_or_quantize(module, self.weight, "weight", scheme.weights)

This didn't happen in the old implementation: we never overwrote the actual parameter, so the original unquantized weight was preserved. The new implementation breaks OBCQ because we rely on the error between the unquantized and the quantized weight. As a workaround for now, I'm cloning the original weight and restoring it after the forward pass (a rough sketch is shown below).
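
A minimal sketch of that workaround, assuming a generic torch module and a placeholder calibration_input; this is illustrative, not the exact code from this PR.

def calibrate_with_weight_restore(module, calibration_input):
    # the new fake_quantize overwrites module.weight during the forward pass,
    # so snapshot the unquantized weight first
    original_weight = module.weight.data.clone()
    output = module(calibration_input)
    # OBCQ relies on the error between the quantized and unquantized weight
    weight_error = module.weight.data - original_weight
    # restore the original weight so it is not lost for later passes
    module.weight.data = original_weight
    return output, weight_error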

@dbogunowicz (Contributor) left a comment:

This looks good to me. Let's chat today about the design of the new SparseAutoModelForCausalLM; I think it is high time we pieced all the elements together.

@Satrat merged commit 67005d7 into main on Apr 23, 2024 (2 checks passed)
@Satrat deleted the sa/model_reload branch on April 23, 2024 at 13:46