[stateless-llm]Query num_key_value_heads when available to support GQA models. #388

raikonenfnu · 2024-02-02T01:15:10Z

Models that implement GQA (grouped-query attention) do not use the same number of heads for K and V as for the queries. We therefore need to query the `num_key_value_heads` attribute to determine whether the KV cache needs a different size.

Not all models have `num_key_value_heads` in their config; QWEN, for example, does not. Phi has it, but it is set to null, so the code is structured to handle both cases.
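
For readers who want the fallback spelled out, here is a minimal sketch of the lookup described above. It is not the code from this PR: `resolve_kv_heads` and `kv_cache_shape` are hypothetical helper names, the configs are stand-in `SimpleNamespace` objects rather than real model configs, and the cache layout `(batch, seq, heads, head_dim)` is an assumption made only for illustration.

```python
from types import SimpleNamespace


def resolve_kv_heads(config):
    """Return the head count to use for K/V (hypothetical helper)."""
    # getattr with a default covers configs that lack the attribute
    # entirely (the QWEN case described above).
    kv_heads = getattr(config, "num_key_value_heads", None)
    # An explicit None covers configs that define the attribute but
    # leave it null (the Phi case described above).
    if kv_heads is None:
        kv_heads = config.num_attention_heads
    return kv_heads


def kv_cache_shape(config, batch_size, max_seq_len):
    """Shape of one K or V cache tensor; the layout is an assumption."""
    head_dim = config.hidden_size // config.num_attention_heads
    return (batch_size, max_seq_len, resolve_kv_heads(config), head_dim)


# Stand-in configs covering the three cases.
gqa_cfg = SimpleNamespace(
    hidden_size=4096, num_attention_heads=32, num_key_value_heads=8
)
null_cfg = SimpleNamespace(
    hidden_size=2048, num_attention_heads=32, num_key_value_heads=None
)
missing_cfg = SimpleNamespace(hidden_size=2048, num_attention_heads=16)

for cfg in (gqa_cfg, null_cfg, missing_cfg):
    print(resolve_kv_heads(cfg), kv_cache_shape(cfg, 1, 512))
```

Checking `is None` rather than relying on truthiness keeps the intent explicit: only a missing or null value falls back to the query head count.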

IanNod

Looks good to me

raikonenfnu requested review from dan-garvey and IanNod February 2, 2024 01:15

IanNod approved these changes Feb 2, 2024

raikonenfnu force-pushed the gqaSupport branch from bc603f8 to c3bbed8 Compare February 6, 2024 17:50

IanNod merged commit 4c42076 into main Feb 6, 2024
4 checks passed

IanNod deleted the gqaSupport branch February 6, 2024 18:31
