[LLAMA_CPP] Enable batch size > 1 #905

vshampor · 2024-04-17T12:11:50Z

Passing tensors with batch size > 1 should now work, with each batch index corresponding to an independent prompt and generation sequence. The association of KV-cache entries with individual sequences is handled internally at the llama.cpp level.
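
To illustrate the intended semantics, here is a toy Python sketch of how batch rows can map to independent sequences, each with its own cache entries. It is not the plugin's code and does not use the llama.cpp API; `ToyKVCache` and `feed_batch` are hypothetical names introduced only for this example, and the "cache" stores plain token ids rather than real key/value tensors.

```python
from collections import defaultdict


class ToyKVCache:
    """Toy stand-in for a per-sequence KV cache (hypothetical, not llama.cpp's structure)."""

    def __init__(self):
        # seq_id -> list of (position, token) entries owned by that sequence
        self._entries = defaultdict(list)

    def append(self, seq_id, token):
        # The next position is counted per sequence, independently of other sequences.
        pos = len(self._entries[seq_id])
        self._entries[seq_id].append((pos, token))
        return pos

    def tokens(self, seq_id):
        return [tok for _, tok in self._entries[seq_id]]


def feed_batch(cache, input_ids):
    """Treat row b of input_ids as sequence b and append its tokens to that sequence's entries."""
    positions = []
    for seq_id, row in enumerate(input_ids):
        positions.append([cache.append(seq_id, tok) for tok in row])
    return positions


cache = ToyKVCache()
# Two independent prompts passed as one batch of size 2.
prompt_positions = feed_batch(cache, [[11, 12, 13], [21, 22, 23]])
# One generation step: a single new token per batch row.
step_positions = feed_batch(cache, [[14], [24]])
```

In this sketch the caller only ever supplies a batch index; the mapping from that index to a sequence's cache entries stays inside the cache object, which mirrors the division of responsibility described above.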

vshampor added 3 commits April 17, 2024 14:00

Add tests

412238b

Add implementation

e95ba3b

Apply comments from previous PR

81574ef

vshampor requested a review from a team as a code owner April 17, 2024 12:11

github-actions bot added the category: build OpenVINO cmake script / infra label Apr 17, 2024

ilya-lavrenov approved these changes Apr 18, 2024

ilya-lavrenov merged commit c4b3ef9 into openvinotoolkit:master Apr 18, 2024
4 of 6 checks passed
