Added AutoGPTQ readme and test #1147
base: main
Conversation
* Added AutoGPTQ UINT4 to README.md
* Weight only quantization
* Added SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED envar to the readme, in AutoGPTQ (#305)
* Update README.md with temp solution remark
* Update README.md
Llama2-7b in UINT4 weight only quantization is enabled using [AutoGPTQ Fork](https://github.com/HabanaAI/AutoGPTQ), which provides quantization capabilities in PyTorch.
Currently, the support is for UINT4 inference of pre-quantized models only.
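As a minimal sketch of what "UINT4 inference of a pre-quantized model" looks like, assuming the HabanaAI fork keeps the upstream `auto_gptq` loading API (the checkpoint id is a hypothetical placeholder, not one referenced in this PR):

```python
# Minimal sketch, assuming the HabanaAI AutoGPTQ fork keeps the upstream
# auto_gptq API; the checkpoint id is a hypothetical pre-quantized (GPTQ,
# 4-bit) Llama2-7b repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "some-org/llama-2-7b-gptq-4bit"  # hypothetical pre-quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# No quantization step happens here: the weights on disk are already UINT4.
model = AutoGPTQForCausalLM.from_quantized(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```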
What does "pre-quantized model" mean? By which process is it pre-quantized, and with which precision?
It means that we currently don't support the quantization process itself, but only loading an existing quantized model.
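To make the distinction concrete, a sketch assuming the fork mirrors upstream `auto_gptq` names (the repo ids are placeholders):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Not supported here: running the GPTQ quantization process itself.
# quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
# model = AutoGPTQForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", quantize_config)
# model.quantize(calibration_samples)  # GPTQ needs calibration data

# Supported: loading a checkpoint that was already quantized elsewhere.
model = AutoGPTQForCausalLM.from_quantized("some-org/llama-2-7b-gptq-4bit")  # placeholder id
```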
DO NOT merge it yet, need to get approval first
@HolyFalafel, is this PR ready for v1.17? Please resolve conflicts against latest main.
@MrGeva Can you clarify if this is targeted for v1.17?
Please sync your PR with main/upstream and fix any merge conflicts. Thank you.
@HolyFalafel @libinta
Can you rebase and revise your patch if necessary, so that it merges cleanly with main? Also run `pip install -U ruff; make style` and check for any issues. In addition, please run `tests/ci/fast_tests.sh` after installing.
@HolyFalafel |
* Added AutoGPTQ UINT4 to README.md ([#279](https://github.com/huggingface/optimum-habana/pull/279))
* Added SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED envar to the readme, in AutoGPTQ (#305, usage sketched below)
* Added gptq to test_text_gen ([#307](https://github.com/huggingface/optimum-habana/pull/307))
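For reference, a sketch of how the environment variable from #305 would be applied; whether the temporary workaround sets it to "0" or "1" is not stated in this excerpt, so the value below is an assumption:

```python
import os

# Assumption: the "temp solution" remark in the readme disables this slicer
# optimization; the correct value is not confirmed by this PR excerpt.
# Set it before the Habana runtime is initialized.
os.environ["SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED"] = "0"
```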