As the title suggests, all weights (except the layer norms) are quantized to Q4_K, except for attn_v and ffn_down, which are kept at Q6_K.
What is the reason for keeping these two tensor types at higher precision?
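For reference, this is the pattern I mean, written as a minimal sketch (the enum and helper names here are hypothetical, chosen for illustration; this is not the actual llama.cpp quantize logic):

```cpp
// Illustrative sketch only -- not the actual llama.cpp quantization code.
// Shows the kind of name-based override I am asking about: most tensors get
// Q4_K, while attn_v and ffn_down are bumped to Q6_K.
#include <string>

enum class ggml_type_sketch { Q4_K, Q6_K, F32 };

// Hypothetical helper: pick a quantization type for a tensor by name.
static ggml_type_sketch pick_tensor_type(const std::string & name) {
    // Layer norms stay unquantized (full precision).
    if (name.find("_norm.weight") != std::string::npos) {
        return ggml_type_sketch::F32;
    }
    // These two tensor families are kept at higher precision.
    if (name.find("attn_v.weight") != std::string::npos ||
        name.find("ffn_down.weight") != std::string::npos) {
        return ggml_type_sketch::Q6_K;
    }
    // Everything else gets the base Q4_K type.
    return ggml_type_sketch::Q4_K;
}
```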