Support microsoft contrib ONNX operator MatMulNBits #3390

Open
hgaspar opened this issue Aug 19, 2024 · 0 comments
hgaspar commented Aug 19, 2024

This operator appears in LLMs quantized to int4 via the genai tool. Those models also contain GroupQueryAttention nodes.

Only N=4 (i.e. 4 bits) needs to be supported in the near term.

For reference, see the operator description in:

https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits

Here, "support" means:

  1. Parsing models that contain the operator.
  2. Implementing it in terms of known operators.
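
One way to read point 2 is as a dequantize-then-MatMul decomposition: unpack the 4-bit weights, subtract the zero point, multiply by the per-block scale, then run an ordinary matrix multiply against the transposed result. Below is a minimal pure-Python sketch of that idea, not the ONNX Runtime implementation. It rests on assumptions taken from my reading of the linked operator description and should be checked against it: bits=4, no `zero_points` input (so a default zero point of 8 applies), the low nibble of each byte holding the earlier element, B laid out as `[N, n_blocks_per_col, blob_size]`, and one scale per (column, block). The function names are hypothetical.

```python
def unpack_nibbles(blob):
    """Split each byte into two 4-bit values.

    Assumption: the low nibble holds the earlier element.
    """
    values = []
    for byte in blob:
        values.append(byte & 0x0F)  # low nibble
        values.append(byte >> 4)    # high nibble
    return values


def dequantize_4bit(B, scales, K, N, block_size, zero_point=8):
    """Dequantize packed weights into N rows of K floats.

    B:      N rows, each a list of n_blocks blobs; each blob is a list of
            block_size // 2 bytes (two 4-bit values per byte).
    scales: flat list of N * n_blocks floats, one per (row, block).
    zero_point=8 is the assumed default when no zero_points input is given.
    """
    n_blocks = (K + block_size - 1) // block_size
    weights = []
    for n in range(N):
        row = []
        for blk in range(n_blocks):
            scale = scales[n * n_blocks + blk]
            for q in unpack_nibbles(B[n][blk]):
                row.append((q - zero_point) * scale)
        weights.append(row[:K])  # drop padding in the last block, if any
    return weights


def matmul_nbits_reference(A, B, scales, K, N, block_size):
    """Compute Y = A x dequant(B)^T for a 2-D A of shape [M, K].

    Each output element Y[m][n] is the dot product of row m of A with
    dequantized weight row n, which is what MatMul with a transposed
    right-hand side would produce.
    """
    W = dequantize_4bit(B, scales, K, N, block_size)
    return [[sum(a[k] * W[n][k] for k in range(K)) for n in range(N)] for a in A]
```

In graph terms this corresponds to a chain of bitwise unpacking, Sub, Mul, Transpose and MatMul, so a parser could lower MatMulNBits to operators it already handles. The explicit `zero_points`, `g_idx` and `bias` inputs are deliberately left out of the sketch.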
@hgaspar hgaspar added the UAI label Aug 19, 2024
@gyulaz-htec gyulaz-htec changed the title Support microsoft contib ONNX operator MatMulNBits Support microsoft contrib ONNX operator MatMulNBits Aug 21, 2024