[V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction #11973

robertgshaw2-neuralmagic · 2025-01-12T18:47:05Z

SUMMARY:

The vLLM V1 design minimizes the number of Python loops over all items in the batch for performance. As we add metrics and logging, we would otherwise need to loop over all items in the batch another time.
This PR renames Detokenizer to OutputProcessor.
- All functionality that needs to touch each item should implement XXXClass.update_from_output and be called in the OutputProcessor.process_outputs loop.
- Moves self._process_request_outputs into this loop (previously this was a separate loop in output_handler).
- Adds IterationStats.update_from_output() to this loop.
- Adds more tests for abort.
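
A minimal sketch of the single-loop idea described above, not the code in this PR. `EngineCoreOutput`, `RequestState`, and the fields shown here are simplified, hypothetical stand-ins; only the names `OutputProcessor.process_outputs`, `IterationStats`, and `update_from_output` come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EngineCoreOutput:
    # Hypothetical, simplified stand-in for one request's output in a batch.
    request_id: str
    new_token_ids: List[int]
    finished: bool = False


@dataclass
class IterationStats:
    # Simplified stand-in: counts tokens generated in one iteration.
    num_generation_tokens: int = 0

    def update_from_output(self, output: EngineCoreOutput) -> None:
        self.num_generation_tokens += len(output.new_token_ids)


@dataclass
class RequestState:
    # Hypothetical per-request state (the real one also detokenizes).
    token_ids: List[int] = field(default_factory=list)

    def update_from_output(self, output: EngineCoreOutput) -> None:
        self.token_ids.extend(output.new_token_ids)


class OutputProcessor:
    def __init__(self) -> None:
        self.request_states: Dict[str, RequestState] = {}

    def add_request(self, request_id: str) -> None:
        self.request_states[request_id] = RequestState()

    def process_outputs(self,
                        outputs: List[EngineCoreOutput]) -> IterationStats:
        stats = IterationStats()
        # The only pass over the batch: every per-item consumer hooks in here.
        for output in outputs:
            state = self.request_states.get(output.request_id)
            if state is None:
                continue  # unknown or aborted request: ignore late output
            stats.update_from_output(output)
            state.update_from_output(output)
            if output.finished:
                del self.request_states[output.request_id]
        return stats


processor = OutputProcessor()
processor.add_request("a")
processor.add_request("b")
stats = processor.process_outputs([
    EngineCoreOutput("a", [1, 2]),
    EngineCoreOutput("b", [3], finished=True),
    EngineCoreOutput("gone", [4]),  # never added, so it is skipped
])
```

Adding another per-item consumer then means adding one more `update_from_output` call inside the loop rather than a second pass over the batch.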

NOTES:

Follow on to: [V1][Core][1/n] Logging and Metrics #11962
Previous experiments showed that having Detokenizer in a separate process hurts performance ([V1] [7/N] API Server: Multiprocessing Detokenizer [ DO NOT MERGE ] #11636), so I feel confident that adding all of this to a single loop is the right approach.

Signed-off-by: [email protected] <[email protected]>

github-actions · 2025-01-12T18:47:16Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add ready label to the PR
- Enable auto-merge.

🚀

robertgshaw2-neuralmagic · 2025-01-12T22:23:45Z

vllm/v1/engine/output_processor.py

+            queue=queue,
+        )
+
+


NOTE: this was previously called Detokenizer

robertgshaw2-neuralmagic · 2025-01-12T22:29:43Z

vllm/v1/engine/async_llm.py

@@ -59,9 +59,6 @@ def __init__(
            lora_config=vllm_config.lora_config)
        self.tokenizer.ping()

-        # Request streams (map of request_id -> queue).


NOTE: these queues are held in OutputProcessor
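
For illustration only, a sketch of what holding the per-request queues inside the processor could look like. The class body, the `(request_id, text)` tuple shape, and the non-streaming fallback path are assumptions made for this example, not the code in this PR.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


class OutputProcessor:
    """Hypothetical sketch: owns the request_id -> queue map."""

    def __init__(self) -> None:
        self.queues: Dict[str, Optional[asyncio.Queue]] = {}

    def add_request(self, request_id: str,
                    queue: Optional[asyncio.Queue] = None) -> None:
        self.queues[request_id] = queue

    def abort_requests(self, request_ids: List[str]) -> None:
        for request_id in request_ids:
            self.queues.pop(request_id, None)

    def process_outputs(
            self, outputs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
        returned = []
        for request_id, text in outputs:
            if request_id not in self.queues:
                continue  # aborted or unknown request
            queue = self.queues[request_id]
            if queue is not None:
                queue.put_nowait(text)  # streaming caller reads the queue
            else:
                returned.append((request_id, text))  # assumed non-streaming path
        return returned


async def demo():
    processor = OutputProcessor()
    queue: asyncio.Queue = asyncio.Queue()
    processor.add_request("stream", queue)
    processor.add_request("batch")
    processor.add_request("cancelled")
    processor.abort_requests(["cancelled"])
    returned = processor.process_outputs([
        ("stream", "hi"), ("batch", "yo"), ("cancelled", "late"),
    ])
    return returned, queue.get_nowait()


returned, streamed = asyncio.run(demo())
```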

mgoin

Nice structure and comments, this LGTM. It would be nice to have a test verifying that IterationStats gets updated within the OutputProcessor.

rickyyx · 2025-01-12T22:53:34Z

vllm/v1/engine/output_processor.py

+        If you need to touch every element of the batch, implement a
+        method called XXXClass.update_from_output() to be called
+        within the loop below. For examples, see:
+            * IterationStats.update_from_output()


This is a great abstraction IMO.

nit: I wonder if we also want to make it more explicit by having something like an OutputHandler protocol that takes in the engine core output and maybe the current request state?

I will add a comment suggesting that we do this once we add RequestStats. I want to keep flexibility while we are in the development stage.
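
One possible shape for the protocol suggested in this thread, sketched with `typing.Protocol`. The `OutputHandler` signature, the `TokenCounter` handler, and the simplified data classes are all hypothetical; nothing like this was added in the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class EngineCoreOutput:
    # Hypothetical, simplified stand-in for the engine core output.
    request_id: str
    new_token_ids: List[int]


@dataclass
class RequestState:
    # Hypothetical per-request state.
    is_prefilling: bool = True


@runtime_checkable
class OutputHandler(Protocol):
    """Anything that wants to see each item in the batch."""

    def update_from_output(self, output: EngineCoreOutput,
                           state: Optional[RequestState]) -> None:
        ...


class TokenCounter:
    """Example handler; satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.total = 0

    def update_from_output(self, output: EngineCoreOutput,
                           state: Optional[RequestState]) -> None:
        self.total += len(output.new_token_ids)


def process_outputs(outputs: List[EngineCoreOutput],
                    handlers: List[OutputHandler],
                    states: Dict[str, RequestState]) -> None:
    # Still a single pass over the batch; handlers are called per item.
    for output in outputs:
        state = states.get(output.request_id)
        for handler in handlers:
            handler.update_from_output(output, state)


counter = TokenCounter()
process_outputs(
    [EngineCoreOutput("a", [1, 2]), EngineCoreOutput("b", [3])],
    [counter],
    {"a": RequestState(), "b": RequestState(is_prefilling=False)},
)
```

The trade-off raised in the reply still applies: a fixed protocol makes the extension point explicit, but it locks in a signature while the set of per-item consumers is still changing.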

rickyyx · 2025-01-13T00:19:42Z

vllm/v1/metrics/stats.py

+            return
+
+        self.num_generation_tokens += len(output.new_token_ids)
+        if is_prefilling:


I think one scenario that might complicate things is when a request has a long prompt and its prefill spans multiple iterations. With the current architecture, the prompt throughput stats are not accurate in that case.

I wonder if we should have a path where we propagate each scheduler iteration, rather than each engine core iteration, to the frontend process for more accurate stats.

We don't send outputs from EngineCore until it generates a token.

So from the POV of the engine, len(new_token_ids) > 0 always holds. We should add an assert for this invariant.

Added a comment and an assert.
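
A sketch of the accounting and invariant discussed in this thread, built around the three lines visible in the diff excerpt. The `log_stats` flag, the `prompt_len` parameter, the `num_prompt_tokens` field, and the placement of the assert are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineCoreOutput:
    # Hypothetical, simplified stand-in for the engine core output.
    request_id: str
    new_token_ids: List[int]


@dataclass
class IterationStats:
    log_stats: bool  # assumed switch guarding the early return
    num_generation_tokens: int = 0
    num_prompt_tokens: int = 0

    def update_from_output(self, output: EngineCoreOutput,
                           is_prefilling: bool, prompt_len: int) -> None:
        if not self.log_stats:
            return

        self.num_generation_tokens += len(output.new_token_ids)
        if is_prefilling:
            # Invariant from the thread: EngineCore sends no output until a
            # token is generated, so a multi-iteration prefill shows up here
            # exactly once, together with its first generated token.
            assert len(output.new_token_ids) > 0
            self.num_prompt_tokens += prompt_len


stats = IterationStats(log_stats=True)
# First output for a request with a 1000-token prompt: the prefill is done.
stats.update_from_output(EngineCoreOutput("r", [7]),
                         is_prefilling=True, prompt_len=1000)
# A later decode step for the same request.
stats.update_from_output(EngineCoreOutput("r", [8]),
                         is_prefilling=False, prompt_len=1000)
```

Under this scheme the whole prompt is attributed to the iteration that emits the first token, which is the inaccuracy raised above for prefills that span several scheduler steps.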

robertgshaw2-neuralmagic · 2025-01-13T02:36:38Z

@mgoin. Tests are added. Could I get an approval?

tests/v1/engine/test_async_llm.py

tests/v1/engine/test_output_processor.py

Signed-off-by: [email protected] <[email protected]>

robertgshaw2-neuralmagic added 13 commits January 11, 2025 22:06

added code

cfa8c2b

Signed-off-by: [email protected] <[email protected]>

fixed

6d8e4f3

fixed

c78a56f

updated

7b39705

updated

6e9cd1c

fixed

2657b7f

updated

249b9ff

refactoring metrics

c1f9292

updated

c641866

updated

1ce7a5f

Merge branch 'v1-metrics' into v1-metrics-2

c1baa6d

added output processor

f8de299

added all files

49ca9bb

robertgshaw2-neuralmagic requested review from WoosukKwon, njhill, ywang96, comaniac and alexm-neuralmagic as code owners January 12, 2025 18:47

robertgshaw2-neuralmagic changed the title ~~[V1] [2/n] Logging and Metrics - Output Processor Abstraction~~ [V1] [2/n] Logging and Metrics - OutputProcessor Abstraction Jan 12, 2025

robertgshaw2-neuralmagic added 10 commits January 12, 2025 18:58

stash

86d33a1

working again

4066fc8

Merge branch 'v1-metrics' into v1-metrics-2

5ef374c

fixed sorting

c9ffc60

Merge branch 'main' into v1-metrics-2

5f3f3b7

merged

e34b9dc

reduce number of changes

dd6e3d6

reduce changes

dbd86b8

reduce changes

ebf3530

updared

7b6d9b3

robertgshaw2-neuralmagic requested a review from russellb January 12, 2025 22:26

updared

b9683d1

mgoin reviewed Jan 12, 2025

stash

92c3b0c

rickyyx reviewed Jan 12, 2025

robertgshaw2-neuralmagic added 6 commits January 12, 2025 23:00

added logging and comment

a985a73

starting to fix tests - stash

6c36d87

updated tests

595fd12

make tests pass

5ecfe8e

reduce LOC changes

5f37918

updated

1d9b233

rickyyx reviewed Jan 13, 2025

robertgshaw2-neuralmagic added 10 commits January 13, 2025 01:26

added IterationStats test

2880962

codespell

7de7c00

add comment about invairant

eec573c

updated

0427e03

tweak

9b49133

formatting and added test

bffa5d0

passing

605c5f0

ruff ruff

d0013a4

format

e01d236

run isort

a53f089

mgoin approved these changes Jan 13, 2025

tests/v1/engine/test_async_llm.py (resolved)

tests/v1/engine/test_output_processor.py (outdated, resolved)

undo fat finger

3e45fc6

Signed-off-by: [email protected] <[email protected]>

robertgshaw2-neuralmagic enabled auto-merge (squash) January 13, 2025 03:00

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 13, 2025

robertgshaw2-neuralmagic merged commit 619ae26 into vllm-project:main Jan 13, 2025
61 of 64 checks passed
