Help! Want a toy example to run matmul with q40 weight by cuda kernel #9435
Sorry, I am not familiar with the library. I want to run a matmul between a tensor created by PyTorch and a q4_0 weight read from a GGUF file.
Answered by slaren, Sep 11, 2024
Answer selected by Eutenacity
Some resources:
https://huggingface.co/blog/introduction-to-ggml
https://github.com/ggerganov/ggml/blob/master/examples/simple/simple-backend.cpp