Fast PA teardown #674
base: main
In the future, can you suggest an order of files to review, along with some line-specific PR comments explaining at a high level what each change is for?

One high-level thing that crossed my mind: do we want to just drop requests, as this PR implements? Maybe we can instead add a print statement explaining that profiling is complete but that PA is waiting for unfinished requests, and that the user can ^C out of the wait. A rough sketch of that alternative follows below.

Longer term, what we need is a proper way to cancel requests. Short term, this is holding up some GenAI-Perf scripting, where they are running with different request rates in a bash script; ^C isn't going to help them.
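A rough sketch of what that suggested alternative could look like (an illustration only, not code from this PR — the function name, message text, and the free-standing counter parameter are assumptions; in PA the equivalent counter appears as `stats_->num_active_infer_calls` in the diff below):

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <thread>

// Hypothetical sketch of the suggested alternative (not code from this PR):
// instead of dropping outstanding requests at teardown, announce that
// profiling is done and drain them, telling the user they can Ctrl-C out.
void WaitForOutstandingRequests(
    const std::atomic<std::size_t>& num_active_infer_calls)
{
  if (num_active_infer_calls.load() > 0) {
    std::cout << "Profiling is complete, but PA is waiting for "
              << num_active_infer_calls.load()
              << " unfinished request(s). Press Ctrl-C to exit early."
              << std::endl;
  }
  while (num_active_infer_calls.load() > 0) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

This keeps the current behavior of waiting for in-flight requests but makes the wait visible to the user instead of silently blocking.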
@@ -489,6 +489,14 @@ class NaggyMockClientBackend : public ClientBackend {
     });
   }
 
+  ~NaggyMockClientBackend()
+  {
+    // Make sure no requests carry over to the next test
+    while (stats_->num_active_infer_calls) {
+      std::this_thread::sleep_for(std::chrono::milliseconds(1));
+    }
+  }
+
   MOCK_METHOD(
       Error, ModelConfig,
       (rapidjson::Document*, const std::string&, const std::string&),

Review comment on the new wait loop: Do you want a configurable timeout? Otherwise, it's going to be hard to know why the test times out for someone seeing it fail for that reason.
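A bounded wait along the lines that comment suggests might look something like this (a sketch under assumptions — the helper name, the 10-second default, and the error message are not from the PR):

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <thread>

// Sketch of the reviewer's suggestion (not code from this PR): bound the
// teardown drain loop so a stuck request produces a clear diagnostic instead
// of hanging until the test runner's own timeout fires.
void DrainActiveCallsWithTimeout(
    const std::atomic<std::size_t>& num_active_infer_calls,
    std::chrono::seconds timeout = std::chrono::seconds(10))
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (num_active_infer_calls.load() > 0) {
    if (std::chrono::steady_clock::now() > deadline) {
      std::cerr << "Teardown timed out waiting for "
                << num_active_infer_calls.load()
                << " active inference call(s) to finish." << std::endl;
      break;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

Breaking out with a message keeps the destructor from hanging indefinitely while still surfacing which test leaked requests.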
The first run took 1m54s; the second run took 24s. Is the additional time for the first run mainly after the last pass, or is it distributed among the passes?

Yes, it was 100% after the final results were printed. The old behavior was that it just sat there.