Feat: Update LLM entry-point #987

Merged: 14 commits merged into Xilinx:dev on Aug 20, 2024
Conversation

@nickfraser (Collaborator) commented Jul 16, 2024

Addresses #889. Updates the entry-point to leverage many features of our optimum-amd integration effort, and updates the example to use the available quantizers. Builds on #977 (now merged).

Todo:

  • Use dataset utils from optimum-amd
  • Update to use the fx tracing method from optimum-amd (not necessary, already implemented)
  • Decompose quantizer generation and model quantization (like the SDXL example)
  • Test and fix all pre-quantization and PTQ techniques:
    • --ln-affine-merge (fails)
    • --weight-equalization
    • --act-equalization layerwise
    • --act-equalization fx
    • --bias-corr
    • --act-calibration
    • --gptq
  • Test --replace-mha
  • Update the interface and add MX datatypes (depends on Feat: Support for Groupwise (MX) quantization #971)
  • Allow optional quantization of the first (embedding) layer (disabled, see comment)
  • Allow optional quantization of the last layer
  • Test various export flows (a hedged export sketch follows this list):
    • ONNX QCDQ
    • torch QCDQ
    • torchmlir (fails, won't fix in this PR)
    • torchmlir with packed weights (fails, won't fix in this PR)
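
For context, a minimal sketch of the two QCDQ export flows named above, using a toy QuantLinear as a stand-in for the quantized LLM. This is only an illustration of the standard Brevitas export helpers, not code from this PR, and the exact arguments used by the entry-point may differ.

# Hedged sketch: export a quantized module in ONNX QCDQ and torch QCDQ formats.
import torch
from brevitas.export import export_onnx_qcdq, export_torch_qcdq
from brevitas.nn import QuantLinear

model = QuantLinear(16, 16, bias=True)  # stand-in for the quantized LLM
sample_input = torch.randn(1, 16)

export_onnx_qcdq(model, args=sample_input, export_path='model_qcdq.onnx')
export_torch_qcdq(model, args=sample_input, export_path='model_qcdq.pt')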

@nickfraser added the "next release" label (PRs which should be merged for the next release) on Aug 14, 2024
@nickfraser self-assigned this on Aug 15, 2024
@nickfraser marked this pull request as ready for review on August 15, 2024 15:54
@nickfraser (Collaborator, Author) commented:
@Giuseppe5, I've removed the quant_embedding support. It is currently broken: channelwise scaling is not supported for QuantEmbedding layers, but all of the weight quantizers used in the generative examples are channelwise (a sketch of the supported per-tensor case follows the list below).

Quantizing the embedding seems to have limited utility anyway, because:

  • If "input quantization" is enabled, the linear layers in the first decoder layer are quantized anyway
    • It only adds quantization of the first residual path in the first attention layer
  • If input/embedding quantization is enabled, re-quantization may occur at the first attention layer
  • The storage benefit from quantizing the embedding lookup is usually minimal
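
For reference, a minimal sketch of the per-tensor case that QuantEmbedding does support. This is illustrative only, not code from this PR; the vocabulary and hidden sizes are made up, and the Brevitas names are assumed from the public API.

# Hedged sketch: embedding quantization with per-tensor weight scaling.
import torch
from brevitas.nn import QuantEmbedding
from brevitas.quant import Int8WeightPerTensorFloat

quant_emb = QuantEmbedding(
    num_embeddings=32000,  # illustrative vocabulary size
    embedding_dim=4096,    # illustrative hidden size
    weight_quant=Int8WeightPerTensorFloat)

token_ids = torch.randint(0, 32000, (1, 16))
embedded = quant_emb(token_ids)  # int8-quantized lookup, dequantized output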



@torch.no_grad()
def add_zero_bias_to_linear(model: torch.nn.Module) -> torch.nn.Module:

Collaborator:
Is this here for loading a checkpoint + bias correction?
We have a context manager for that now (load_quant_model in graph/calibrate).

Collaborator:
Ah no it's for accelerate compatibility, nevermind.

Collaborator (Author):
No, this is to make bias correction work with accelerate properly. Can this also be handled by the context manager you mentioned?
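
For context, a hedged sketch of what a helper with this name and signature typically does; the actual implementation in the example may differ (for instance, in how it interacts with accelerate's offloading hooks).

# Hedged sketch: give every bias-less Linear an explicit zero bias so that
# bias correction has an in-place parameter to update, even when accelerate
# manages the model's parameters and devices.
import torch

@torch.no_grad()
def add_zero_bias_to_linear(model: torch.nn.Module) -> torch.nn.Module:
    for module in model.modules():
        if isinstance(module, torch.nn.Linear) and module.bias is None:
            zero_bias = torch.zeros(
                module.out_features,
                dtype=module.weight.dtype,
                device=module.weight.device)
            module.register_parameter('bias', torch.nn.Parameter(zero_bias))
    return model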


from brevitas.graph.calibrate import bias_correction_mode


@torch.no_grad()
def apply_bias_correction(model, dataloader):
    with bias_correction_mode(model):
-       for inps in dataloader:
+       for inps in tqdm(dataloader):

Collaborator:
I think we need to add tqdm as a required dependency (in the examples requirements, but at this point maybe everywhere).
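
For readers following along, a hedged sketch of the full helper with this change applied; how each batch is unpacked into the forward call is an assumption, not taken from the PR.

# Hedged sketch: run calibration batches under bias_correction_mode so that
# Brevitas can estimate and correct the quantization-induced output bias.
import torch
from tqdm import tqdm
from brevitas.graph.calibrate import bias_correction_mode

@torch.no_grad()
def apply_bias_correction(model, dataloader):
    with bias_correction_mode(model):
        for inps in tqdm(dataloader):
            model(**inps)  # assumes the dataloader yields keyword-arg dicts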

    default=None,
    help="Filename to save checkpoint. If `None`, no checkpoint is saved (default: %(default)s)")
add_bool_arg(
    parser, 'use-ocp', default=False, help='Use OCP format for float quantization. Default: False')

Collaborator:
Let's merge this, and then I'll update the entry-point here in the same style as the stable diffusion one in #971.
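
As an aside, the add_bool_arg helper shown in the diff above is not defined in this excerpt; a common implementation of such a helper (a guess, not necessarily the one used in the example) registers paired --flag / --no-flag options:

# Hedged sketch: boolean CLI flag helper with an explicit negated form.
import argparse

def add_bool_arg(parser: argparse.ArgumentParser, name: str, default: bool, help: str):
    dest = name.replace('-', '_')
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--' + name, dest=dest, action='store_true', help=help)
    group.add_argument('--no-' + name, dest=dest, action='store_false')
    parser.set_defaults(**{dest: default})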

@Giuseppe5 (Collaborator) commented:
Save for the tqdm requirement, LGTM.

@nickfraser (Collaborator, Author) commented Aug 20, 2024:

There currently is no requirements file for the LLM example; I'm adding one in #1002 and will add tqdm to the dependencies there.

@nickfraser merged commit b9eecf7 into Xilinx:dev on Aug 20, 2024
337 checks passed
@nickfraser deleted the llm_entrypoint_update branch on August 20, 2024 13:39