what is the purpose of ggml_half2 dm in quantization structure? #887

Answered by ggerganov
PenutChen asked this question in Q&A
It's used in GPU code that supports half2 ops. For example in CUDA:

const half2 d4d8_m4s8 = K_q4_1[ib].dm * Q_ds[k_KQ_0/WARP_SIZE];

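Since this is a union, the dm member is just an alias for the pair of half-precision factors stored next to it (d and m in q4_1, d and dmin in the K-quant blocks), so GPU code can load and multiply both with a single half2 operation.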
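
To make the aliasing concrete, here is a minimal CPU-side sketch in C. It is not the actual ggml definition: `demo_half`, `demo_half2`, `demo_block`, and `dm_aliases_d_and_m` are hypothetical names, and the half values are modeled as raw 16-bit patterns because plain C has no half type. It only illustrates that writing the two scalar members and then reading the packed member yields the same bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the GPU types (assumption, not ggml code):
 * a raw 16-bit pattern for half, and a pair of them for half2. */
typedef uint16_t demo_half;
typedef struct { demo_half x, y; } demo_half2;

#define DEMO_QK 32

/* Modeled loosely on a q4_1-style block: d and m share storage with dm. */
typedef struct {
    union {
        struct { demo_half d; demo_half m; }; /* anonymous struct (C11) */
        demo_half2 dm;                        /* same 4 bytes, viewed as a pair */
    };
    uint8_t qs[DEMO_QK / 2];                  /* packed 4-bit quants */
} demo_block;

/* The packed view must be exactly two halves wide for the alias to line up. */
_Static_assert(sizeof(demo_half2) == 2 * sizeof(demo_half), "half2 is two halves");

/* Returns 1 when dm.x / dm.y read back the bits stored through d / m. */
int dm_aliases_d_and_m(demo_half d_bits, demo_half m_bits) {
    demo_block b;
    memset(&b, 0, sizeof b);
    b.d = d_bits;   /* write through the scalar members */
    b.m = m_bits;
    return b.dm.x == d_bits && b.dm.y == m_bits; /* read through the packed member */
}
```

For example, `dm_aliases_d_and_m(0x3C00, 0x4000)` stores the half bit patterns for 1.0 and 2.0 and checks that they come back through `dm`. On the GPU the same layout lets a kernel fetch both factors as one half2 value instead of two separate half loads.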

Answer selected by PenutChen