Time to first byte? #592
Comments
Can you provide some examples of how you’re using SSE and what metrics you’d want to measure? What technologies are you using?
Hmmm, I can try and get more specific if you need, but one example would be load testing an API like ChatGPT, which uses SSE so that you can start to see the response streaming back as it is generated, rather than simply staring at a blank page for a long time before the entire response is complete. In these types of use cases, time-to-first-token (essentially time-to-first-byte) is the interesting metric, as that represents the latency between asking a query and when the user can begin to receive a response. This metric is often what dictates how responsive a streaming LLM API feels to a user. https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices says more about this if you go to the "Important Metrics for LLM Serving" heading.
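To make the metric concrete, here is a rough sketch using only the Rust standard library. It is not Goose code and does not rely on any Goose API; `StreamTimings` and `measure_stream` are hypothetical names chosen for illustration. It assumes the response body is exposed as something implementing `std::io::Read` and that the caller captured an `Instant` just before sending the request.

```rust
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Timings collected while draining one streamed response body.
/// (Hypothetical type for illustration; not part of Goose.)
#[derive(Debug)]
pub struct StreamTimings {
    /// Elapsed time until the first non-empty read; `None` if the body was empty.
    pub time_to_first_byte: Option<Duration>,
    /// Elapsed time until the stream reached end-of-file.
    pub total: Duration,
    /// Total number of body bytes read.
    pub bytes: usize,
}

/// Drain `body`, recording when the first byte arrives relative to `started`.
/// `started` should be captured immediately before the request is sent.
pub fn measure_stream<R: Read>(mut body: R, started: Instant) -> io::Result<StreamTimings> {
    let mut buf = [0u8; 4096];
    let mut time_to_first_byte = None;
    let mut bytes = 0usize;

    loop {
        let n = match body.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // The first successful non-empty read marks time-to-first-byte.
        if time_to_first_byte.is_none() {
            time_to_first_byte = Some(started.elapsed());
        }
        bytes += n;
    }

    Ok(StreamTimings {
        time_to_first_byte,
        total: started.elapsed(),
        bytes,
    })
}
```

For an SSE endpoint, the gap between `time_to_first_byte` and `total` is the streaming phase. A load test that records only `total` would hide how quickly the first token reaches the user.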
So, the question then is: can Goose measure this metric? Does that help clarify?
That’s very helpful, yes. I’ll find some time to test and see what can be done. I expect it will take some code changes/additions to be useful.
Suppose we want to load-test an API which uses server-sent events (SSE). Is it possible to measure the time-to-first-byte using Goose?