Feat: Update LLM entry-point #987

Merged: 14 commits merged into Xilinx:dev on Aug 20, 2024
Conversation

@nickfraser (Collaborator) commented Jul 16, 2024

Addresses #889. Updates the entry-point to leverage many features of our optimum-amd integration effort, and updates the example to use the available quantizers. Builds on #977 (now merged).

Todo:

  • Use dataset utils from optimum-amd
  • Update to use the fx tracing method from optimum-amd (not necessary, already implemented)
  • Decompose quantizer generation and model quantization (like the SDXL example)
  • Test and fix all pre-quantization and PTQ techniques:
    • --ln-affine-merge (fails)
    • --weight-equalization
    • --act-equalization layerwise
    • --act-equalization fx
    • --bias-corr
    • --act-calibration
    • --gptq
  • Test --replace-mha
  • Update the interface and add MX datatypes (depends on Feat: Support for Groupwise (MX) quantization #971)
  • Allow optional quantization of the first (embedding) layer (disabled, see comment)
  • Allow optional quantization of the last layer
  • Test various export flows (a hedged export sketch follows this list):
    • ONNX QCDQ
    • torch QCDQ
    • torchmlir (fails, won't fix in this PR)
    • torchmlir with packed weights (fails, won't fix in this PR)
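
For context, a minimal sketch of the two QCDQ export flows named above, using a toy QuantLinear as a stand-in for the quantized LLM. This is only an illustration of the standard Brevitas export helpers, not code from this PR, and the exact arguments used by the entry-point may differ.

# Hedged sketch: export a quantized module in ONNX QCDQ and torch QCDQ formats.
import torch
from brevitas.export import export_onnx_qcdq, export_torch_qcdq
from brevitas.nn import QuantLinear

model = QuantLinear(16, 16, bias=True)  # stand-in for the quantized LLM
sample_input = torch.randn(1, 16)

export_onnx_qcdq(model, args=sample_input, export_path='model_qcdq.onnx')
export_torch_qcdq(model, args=sample_input, export_path='model_qcdq.pt')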

@nickfraser added the "next release" label (PRs which should be merged for the next release) on Aug 14, 2024
@nickfraser self-assigned this on Aug 15, 2024
@nickfraser marked this pull request as ready for review on August 15, 2024 15:54
@nickfraser (Collaborator, Author) commented:
@Giuseppe5, I've removed the quant_embedding support. It is currently broken: channelwise scaling is not supported for QuantEmbedding layers, but all of the weight quantizers used in the generative examples are channelwise (a sketch of the supported per-tensor case follows the list below).

Quantizing the embedding seems to have limited utility anyway, because:

  • If "input quantization" is enabled, the linear layers in the first decoder layer are quantized anyway
    • It only adds quantization of the first residual path in the first attention layer
  • If input/embedding quantization is enabled, re-quantization may occur at the first attention layer
  • The storage benefit from quantizing the embedding lookup is usually minimal
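
For reference, a minimal sketch of the per-tensor case that QuantEmbedding does support. This is illustrative only, not code from this PR; the vocabulary and hidden sizes are made up, and the Brevitas names are assumed from the public API.

# Hedged sketch: embedding quantization with per-tensor weight scaling.
import torch
from brevitas.nn import QuantEmbedding
from brevitas.quant import Int8WeightPerTensorFloat

quant_emb = QuantEmbedding(
    num_embeddings=32000,  # illustrative vocabulary size
    embedding_dim=4096,    # illustrative hidden size
    weight_quant=Int8WeightPerTensorFloat)

token_ids = torch.randint(0, 32000, (1, 16))
embedded = quant_emb(token_ids)  # int8-quantized lookup, dequantized output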



@torch.no_grad()
def add_zero_bias_to_linear(model: torch.nn.Module) -> torch.nn.Module:

Collaborator:
Is this here for loading a checkpoint + bias correction?
We have a context manager for that now (load_quant_model in graph/calibrate).

Collaborator:
Ah no it's for accelerate compatibility, nevermind.

Collaborator (Author):
No, this is to make bias correction work with accelerate properly. Can this also be handled by the context manager you mentioned?
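
For context, a hedged sketch of what a helper with this name and signature typically does; the actual implementation in the example may differ (for instance, in how it interacts with accelerate's offloading hooks).

# Hedged sketch: give every bias-less Linear an explicit zero bias so that
# bias correction has an in-place parameter to update, even when accelerate
# manages the model's parameters and devices.
import torch

@torch.no_grad()
def add_zero_bias_to_linear(model: torch.nn.Module) -> torch.nn.Module:
    for module in model.modules():
        if isinstance(module, torch.nn.Linear) and module.bias is None:
            zero_bias = torch.zeros(
                module.out_features,
                dtype=module.weight.dtype,
                device=module.weight.device)
            module.register_parameter('bias', torch.nn.Parameter(zero_bias))
    return model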


from brevitas.graph.calibrate import bias_correction_mode


@torch.no_grad()
def apply_bias_correction(model, dataloader):
    with bias_correction_mode(model):
-       for inps in dataloader:
+       for inps in tqdm(dataloader):

Collaborator:
I think we need to add tqdm as a required dependency (in the examples requirements, but at this point maybe everywhere).
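
For readers following along, a hedged sketch of the full helper with this change applied; how each batch is unpacked into the forward call is an assumption, not taken from the PR.

# Hedged sketch: run calibration batches under bias_correction_mode so that
# Brevitas can estimate and correct the quantization-induced output bias.
import torch
from tqdm import tqdm
from brevitas.graph.calibrate import bias_correction_mode

@torch.no_grad()
def apply_bias_correction(model, dataloader):
    with bias_correction_mode(model):
        for inps in tqdm(dataloader):
            model(**inps)  # assumes the dataloader yields keyword-arg dicts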

    default=None,
    help="Filename to save checkpoint. If `None`, no checkpoint is saved (default: %(default)s)")
add_bool_arg(
    parser, 'use-ocp', default=False, help='Use OCP format for float quantization. Default: False')

Collaborator:
Let's merge this, and then I'll update the entry-point here in the same style as the stable diffusion one in #971.
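
As an aside, the add_bool_arg helper shown in the diff above is not defined in this excerpt; a common implementation of such a helper (a guess, not necessarily the one used in the example) registers paired --flag / --no-flag options:

# Hedged sketch: boolean CLI flag helper with an explicit negated form.
import argparse

def add_bool_arg(parser: argparse.ArgumentParser, name: str, default: bool, help: str):
    dest = name.replace('-', '_')
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--' + name, dest=dest, action='store_true', help=help)
    group.add_argument('--no-' + name, dest=dest, action='store_false')
    parser.set_defaults(**{dest: default})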

@Giuseppe5 (Collaborator) commented:
Save for the tqdm requirement, LGTM.

@nickfraser (Collaborator, Author) commented Aug 20, 2024:

There currently is no requirements file for the LLM example; I'm adding one in #1002 and will add tqdm to the dependencies there.

@nickfraser merged commit b9eecf7 into Xilinx:dev on Aug 20, 2024
337 checks passed
@nickfraser deleted the llm_entrypoint_update branch on August 20, 2024 13:39