Why is vLLM CPU backend using oneDNN kernels? #10694

Answered by bigPYJ1151
sanketkaleoss asked this question in Q&A
Two components use oneDNN:

  • PyTorch's nn.Linear on CPU uses oneDNN by default.
  • vLLM INT8 models require an INT8 GEMM kernel. The CUDA backend builds it on CUTLASS, and the CPU backend builds it on oneDNN.
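
For anyone wondering what an "INT8 GEMM kernel" has to compute, here is a rough pure-Python sketch. It is only an illustration of the arithmetic (int8 inputs, integer accumulation, then rescaling to float), assuming symmetric per-tensor scales. It is not how oneDNN, CUTLASS, or vLLM implement it, and the names quantize_per_tensor and int8_gemm are made up for this example.

```python
def quantize_per_tensor(matrix, scale):
    """Symmetric quantization: round(v / scale), clamped to the int8 range."""
    return [[max(-128, min(127, round(v / scale))) for v in row] for row in matrix]


def int8_gemm(a_q, b_q, a_scale, b_scale):
    """Multiply two int8 matrices with integer accumulation, then dequantize.

    Hypothetical helper for illustration only; real kernels do the same math
    with vectorized int8 instructions and an int32 accumulator.
    """
    rows, inner, cols = len(a_q), len(b_q), len(b_q[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Integer dot product (Python ints stand in for the int32 accumulator).
            acc = sum(a_q[i][k] * b_q[k][j] for k in range(inner))
            # Rescale the integer result back to floating point.
            out_row.append(acc * a_scale * b_scale)
        out.append(out_row)
    return out


# Example values chosen so that quantization is exact (assumed scales).
a = [[0.5, -1.0], [1.5, 0.25]]
b = [[1.0, 2.0], [-0.5, 0.5]]
a_scale, b_scale = 0.25, 0.5

a_q = quantize_per_tensor(a, a_scale)   # [[2, -4], [6, 1]]
b_q = quantize_per_tensor(b, b_scale)   # [[2, 4], [-1, 1]]
result = int8_gemm(a_q, b_q, a_scale, b_scale)
print(result)  # [[1.0, 0.5], [1.375, 3.125]], same as the float matmul of a and b
```

The heavy part is the integer dot product in the inner loop. That is the piece the CPU backend hands to oneDNN and the CUDA backend hands to CUTLASS, according to the answer above.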

Replies: 1 comment 8 replies

Answer selected by sanketkaleoss