fix(llama.cpp): enable cont batching when parallel is set (mudler#1622)
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler authored Jan 21, 2024
1 parent 94261b1 commit 697c769
Showing 1 changed file with 1 addition and 1 deletion.
backend/cpp/llama/grpc-server.cpp (1 addition, 1 deletion)

```diff
@@ -2465,10 +2465,10 @@ static void params_parse(const backend::ModelOptions* request,
     const char *env_parallel = std::getenv("LLAMACPP_PARALLEL");
     if (env_parallel != NULL) {
         params.n_parallel = std::stoi(env_parallel);
+        params.cont_batching = true;
     } else {
         params.n_parallel = 1;
     }
-    params.cont_batching = true;
     // TODO: Add yarn

     if (!request->tensorsplit().empty()) {
```
