Added AutoGPTQ readme and test #1147
base: main
Conversation
* Added AutoGPTQ UINT4 to README.md
* Weight only quantization
* Added SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED envar to the readme, in AutoGPTQ (#305)
* Update README.md with temp solution remark
* Update README.md
Llama2-7b in UINT4 weight only quantization is enabled using [AutoGPTQ Fork](https://github.com/HabanaAI/AutoGPTQ), which provides quantization capabilities in PyTorch.
Currently, the support is for UINT4 inference of pre-quantized models only.
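As a minimal sketch of what "UINT4 inference of a pre-quantized model" looks like, assuming the HabanaAI fork keeps the upstream `auto_gptq` loading API (the checkpoint id is a hypothetical placeholder, not one referenced in this PR):

```python
# Minimal sketch, assuming the HabanaAI AutoGPTQ fork keeps the upstream
# auto_gptq API; the checkpoint id is a hypothetical pre-quantized (GPTQ,
# 4-bit) Llama2-7b repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "some-org/llama-2-7b-gptq-4bit"  # hypothetical pre-quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# No quantization step happens here: the weights on disk are already UINT4.
model = AutoGPTQForCausalLM.from_quantized(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```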
What does "pre-quantized model" mean? By which process is it pre-quantized, and with which precision?
It means that we currently don't support the quantization process itself, but only loading an existing quantized model.
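To make the distinction concrete, a sketch assuming the fork mirrors upstream `auto_gptq` names (the repo ids are placeholders):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Not supported here: running the GPTQ quantization process itself.
# quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
# model = AutoGPTQForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", quantize_config)
# model.quantize(calibration_samples)  # GPTQ needs calibration data

# Supported: loading a checkpoint that was already quantized elsewhere.
model = AutoGPTQForCausalLM.from_quantized("some-org/llama-2-7b-gptq-4bit")  # placeholder id
```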
DO NOT merge it yet, need to get approval first
@HolyFalafel, is this PR ready for v1.17? Please resolve conflicts against latest main.
@MrGeva Can you clarify if this is targeted for v1.17?
Please sync your PR with main/upstream and fix any merge conflicts. Thank you.
@HolyFalafel @libinta
Can you rebase and revise your patch if necessary, so that it merges cleanly with main? Also run `pip install -U ruff; make style` and check for any issues. In addition, please run `tests/ci/fast_tests.sh` after installing.
@HolyFalafel |
* Added AutoGPTQ UINT4 to README.md ([#279](https://github.com/huggingface/optimum-habana/pull/279))
* Added SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED envar to the readme, in AutoGPTQ (#305, usage sketched below)
* Added gptq to test_text_gen ([#307](https://github.com/huggingface/optimum-habana/pull/307))
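For reference, a sketch of how the environment variable from #305 would be applied; whether the temporary workaround sets it to "0" or "1" is not stated in this excerpt, so the value below is an assumption:

```python
import os

# Assumption: the "temp solution" remark in the readme disables this slicer
# optimization; the correct value is not confirmed by this PR excerpt.
# Set it before the Habana runtime is initialized.
os.environ["SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED"] = "0"
```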