Feat (gptq): optimizing CPU to GPU memory transfer #1009

i-colbert · 2024-08-26T23:55:37Z

No description provided.

nickfraser · 2024-08-27T12:02:08Z

I'm slightly worried about this messing with our interop with HuggingFace accelerate. Would you test this in a multi-GPU setup with accelerate? Easiest way is to use this: https://github.com/huggingface/optimum-amd/tree/main/examples/quantization/brevitas

nickfraser · 2024-09-12T10:56:16Z

I ran a small multi-GPU test with accelerate and this seems to work.

Feat (gptq): optimizing CPU to GPU memory transfer

059fdc1

i-colbert requested a review from Giuseppe5 August 26, 2024 23:55

Fix (gptq): pin_memory only available with CUDA

1f6432b
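
The PR has no description, so the following is only a minimal sketch of the general technique the two commit titles point at, not the code merged here. It assumes PyTorch, and the helper name `move_to_device` is hypothetical. Pinned (page-locked) host memory allows asynchronous host-to-device copies, but `Tensor.pin_memory()` needs a CUDA-capable runtime, so the call is guarded by `torch.cuda.is_available()`.

```python
import torch


def move_to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: copy a tensor to `device`, pinning host memory when CUDA is present."""
    # Pinning is only attempted when CUDA is available; on CPU-only
    # setups the tensor is moved with an ordinary blocking copy.
    use_pinned = torch.cuda.is_available() and device.type == "cuda"
    t = t.cpu()
    if use_pinned:
        # Page-locked host memory lets the copy below run asynchronously.
        t = t.pin_memory()
    return t.to(device, non_blocking=use_pinned)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 4)
y = move_to_device(x, device)
```

With `non_blocking=True` the copy may still be in flight when the call returns. Kernels queued afterwards on the same CUDA stream are ordered behind it, but code that reads the result from the host side should synchronize first.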

i-colbert requested review from Giuseppe5 and removed request for Giuseppe5 August 27, 2024 01:12

Giuseppe5 merged commit 10dcee3 into Xilinx:dev Sep 12, 2024
337 checks passed

i-colbert deleted the feat/gptq branch September 12, 2024 21:45
