Optimize Int8 Woq for CPU #161

yanbing-j · 2024-04-23T06:10:48Z

This PR optimizes Int8 WOQ (weight-only quantization) on CPU in both gpt-fast and mixtral-moe.

For now, we use torch.ops.aten._weight_int8pack_mm as a workaround; it will be removed once pytorch/pytorch#120985 lands in a PyTorch stable release. This PR also updates the int8 weight dimension to match what torch.ops.aten._weight_int8pack_mm expects after pytorch/pytorch#118056, and adds CPU profiling.
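For readers unfamiliar with int8 WOQ, here is a minimal pure-Python sketch of what an int8 weight-only-quantized linear layer computes. It assumes symmetric per-output-channel scales and a weight stored as [out_features, in_features] (the nn.Linear convention); whether the fused op expects exactly this layout is not confirmed here. The helper names `quantize_per_channel` and `int8_woq_linear` are hypothetical. This is not the gpt-fast code or the ATen kernel, only an unfused reference for the kind of computation a fused CPU op such as torch.ops.aten._weight_int8pack_mm performs without materializing a dequantized weight.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Quantize each output row of `weight` to int8 with its own symmetric scale.

    Assumption: weight is [out_features, in_features], one scale per output row.
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_woq_linear(x: Matrix, q_weight: List[List[int]], scales: List[float]) -> Matrix:
    """Unfused reference: y[m][n] = scales[n] * sum_k x[m][k] * q_weight[n][k].

    Activations stay in floating point; only the weight is int8. The per-channel
    scale is applied once per output element, after the dot product.
    """
    out: Matrix = []
    for x_row in x:
        out.append([
            scales[n] * sum(xv * qv for xv, qv in zip(x_row, q_weight[n]))
            for n in range(len(q_weight))
        ])
    return out


# Usage: compare against the float result x @ weight.T, which is [[-0.75, -4.0]].
weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
x = [[1.0, 2.0, 3.0]]
q, s = quantize_per_channel(weight)
y = int8_woq_linear(x, q, s)  # approximately [[-0.75, -4.0]]
```

Because each scale is constant along the reduction dimension, it factors out of the inner sum, so a kernel can accumulate the int8 products against the activations and multiply by the scale once per output element.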


yanbing-j · 2024-04-23T06:13:59Z

@HDCharles could you please take a look? Thanks!

yanbing-j · 2024-05-07T05:26:22Z

Hi @yanboliang, could you please take a look? Thanks!

Add int8 Woq for CPU

1e51c00

update int4 weight dim Add CPU profiling

facebook-github-bot added the CLA Signed label Apr 23, 2024
