
Load an AWQ model via Python API #1849

Open
GhostXu11 opened this issue Dec 3, 2024 · 2 comments
Labels
question Further information is requested

Comments

@GhostXu11

Hi, I recently wanted to deploy a service with LitServe and load the model with LitGPT. However, due to GPU memory constraints, I wanted to use an AWQ model, and I did not see a way to load an AWQ model in the Python API documentation. Is there a way to do this?

GhostXu11 added the question label Dec 3, 2024
@Andrei-Aksionov
Collaborator

If you want to quantize an original model to lower VRAM consumption, you can use bitsandbytes quantization, which does it on-the-fly: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/quantize.md
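
For reference, a minimal sketch of what on-the-fly quantization looks like through the Python API; the `distribute()` call and the `"bnb.nf4"` / precision values mirror the CLI options described in quantize.md, but double-check the exact argument names against your installed LitGPT version:

```python
# Minimal sketch: on-the-fly bitsandbytes quantization via the LitGPT
# Python API. Assumes distribute() accepts the same quantize/precision
# values as the CLI (see tutorials/quantize.md); verify against your
# installed version.
from litgpt.api import LLM

# Load the checkpoint without placing it on a device yet, then
# distribute it with 4-bit NormalFloat quantization to cut VRAM usage.
llm = LLM.load("microsoft/phi-2", distribute=None)
llm.distribute(devices=1, quantize="bnb.nf4", precision="bf16-true")

print(llm.generate("What do Llamas eat?"))
```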

But if you already have weights in AWQ format and want to load them into LitGPT, that's not supported.
There was an attempt to support AutoGPTQ (which should also cover the AWQ format) in #924, but it was never merged.
(Bringing that PR up to date could be a cool contribution 😉.)

@GhostXu11
Author

Thanks for your reply :)
