For the streaming case, we cannot clamp the generated tokens and recompute them. Moreover, since the clamping logic is done in the worker but not in the main process, a discrepancy arises between the main process and the worker. See #158 and #164.

We need to either

* Require that generation never grows beyond `max_num_batched_tokens` (see the sketch below), or
* Recover a long evicted request with the `evaluate_multi_query` function from "Add new Relax function to the batched model for evaluating query tokens over multiple time steps in parallel" (#156).

@elvin-n @sunggg
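As a rough sketch of the first option, the engine could enforce at admission time that a request can never need more than `max_num_batched_tokens` tokens in total, so an evicted request always fits into a single recovery batch and the main process and the worker never disagree about clamping. The names below (`EngineConfig`, `Request`, `validate_request`) are hypothetical and not part of the actual mlc_serve API:

```python
# Hypothetical sketch: reject requests at admission time whose worst-case
# token count (prompt + maximum generated tokens) could exceed
# max_num_batched_tokens. These class/function names are illustrative only.
from dataclasses import dataclass
from typing import List


@dataclass
class EngineConfig:
    max_num_batched_tokens: int


@dataclass
class Request:
    prompt_token_ids: List[int]
    max_new_tokens: int


def validate_request(req: Request, cfg: EngineConfig) -> None:
    # If this invariant holds, an evicted request can always be restored by
    # recomputing its tokens in a single batch, so no clamping is needed in
    # the worker and the main process stays consistent with it.
    worst_case = len(req.prompt_token_ids) + req.max_new_tokens
    if worst_case > cfg.max_num_batched_tokens:
        raise ValueError(
            f"Request may need up to {worst_case} tokens, which exceeds "
            f"max_num_batched_tokens={cfg.max_num_batched_tokens}."
        )
```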
…/ Update config name (octoml#163)
This PR updates three places for a better experience.
* Unify the `--model-path` and `--model` args in build.py. Now we only take `--model`.
* Hardcode the rotary embedding size for LLaMA to 2048. This enables us to build a model with a different max sequence length without changing the built weights.
* Update the generated config file name to `mlc-chat-config.json`.
masahi changed the title from "[Bug] Recovering logic of a long evicted request is broken for streaming case" to "[Bug] Recovering logic of a long evicted request is broken" on Feb 1, 2024.
@elvin-n After #157 lands, you can follow a similar strategy and use multiple `EvalMultiQueryRequest`s to split the restoration of a long request into several batches, each of which fits into `max_num_batched_tokens`.
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L385-L399
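A minimal sketch of that strategy, assuming the long request's tokens can be recomputed chunk by chunk: one `EvalMultiQueryRequest`-like request per chunk, each no larger than `max_num_batched_tokens`. The `make_eval_multi_query_request` callable and the chunking helper below are hypothetical, not the actual engine API:

```python
# Illustrative sketch: restore a long evicted request in several chunks,
# each of which fits into max_num_batched_tokens. Helper names are
# hypothetical; only the overall splitting strategy mirrors the comment above.
from typing import Callable, List


def chunk_token_ids(token_ids: List[int], max_num_batched_tokens: int) -> List[List[int]]:
    """Split the tokens to restore into consecutive chunks that each fit one batch."""
    return [
        token_ids[i : i + max_num_batched_tokens]
        for i in range(0, len(token_ids), max_num_batched_tokens)
    ]


def restore_in_batches(
    token_ids: List[int],
    max_num_batched_tokens: int,
    make_eval_multi_query_request: Callable[[List[int]], object],
) -> List[object]:
    # Build one EvalMultiQueryRequest-like object per chunk. Earlier chunks
    # only repopulate the KV cache; the last chunk's output is what decoding
    # resumes from.
    return [
        make_eval_multi_query_request(chunk)
        for chunk in chunk_token_ids(token_ids, max_num_batched_tokens)
    ]
```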