[CONSVC-2052] feat: add timeout handling for query tasks #112
I haven't read the code yet, but does this mean that completed providers will be returned in the response?
Correct.
lookups: list[Task] = []
for p in search_from:
    task = metrics_client.timeit_task(
        p.query(srequest), f"providers.{p.name}.query"
    )
So technically we are cancelling this task when we cancel the pending tasks upon timeout. Does task cancellation propagate?
When the wait timeout occurs, we cancel all the pending tasks (one per provider) one by one here, so there is no cancellation propagation involved.
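For readers following along, the wait-then-cancel pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `query`, `run_with_timeout`, and the provider names are hypothetical stand-ins, and the 0.2 second timeout simply mirrors the 200ms figure mentioned in the description.

```python
import asyncio


async def query(name: str, delay: float) -> str:
    """Hypothetical stand-in for a provider's query coroutine."""
    await asyncio.sleep(delay)
    return f"{name}: suggestions"


async def run_with_timeout(timeout: float) -> tuple[list[str], list[str]]:
    # One task per provider, named so the outcome can be reported per provider.
    tasks = [
        asyncio.create_task(query("fast", 0.01), name="fast"),
        asyncio.create_task(query("slow", 5.0), name="slow"),
    ]
    # asyncio.wait() does not raise on timeout; it returns (done, pending).
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    # Cancel each still-pending task individually.
    for task in pending:
        task.cancel()
    completed = sorted(t.get_name() for t in done)
    timed_out = sorted(t.get_name() for t in pending)
    return completed, timed_out


completed, timed_out = asyncio.run(run_with_timeout(0.2))
print(completed, timed_out)
```

Each provider gets its own top-level task, so cancelling one pending task has no effect on the tasks that already finished.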
Ok, makes sense. I have another couple of questions.
- The aiodogstatsd library uses a done_callback to record the timing of the underlying coroutine. Does the done callback get called when a task is cancelled?
- According to the Task docs, calling Task.cancel should throw a CancelledError exception, but I'm not seeing any exception catching. What's going on there?
The aiodogstatsd library uses a done_callback to record the timing of the underlying coroutine. Does the done callback get called when a task is cancelled?
Yes, it will be called as usual, meaning that it will time a cancelled task in this case. I thought about clearing the done_callback before it gets cancelled, but then figured we might want to keep it for completeness. I am open to other options, too.
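A small sketch of that behaviour, using only the standard library: a callback registered with `add_done_callback` still runs when the task is cancelled. The `on_done` callback and `slow_query` coroutine are hypothetical; `on_done` only stands in for whatever a metrics library registers and is not aiodogstatsd's actual implementation.

```python
import asyncio
import time


async def slow_query() -> str:
    """Hypothetical slow provider query."""
    await asyncio.sleep(5.0)
    return "never returned"


async def main() -> dict:
    record = {}
    started = time.perf_counter()
    task = asyncio.create_task(slow_query())

    def on_done(t: asyncio.Task) -> None:
        # Runs on normal completion, failure, and cancellation alike.
        record["cancelled"] = t.cancelled()
        record["elapsed"] = time.perf_counter() - started

    task.add_done_callback(on_done)
    await asyncio.sleep(0.05)  # let the task start
    task.cancel()
    # Yield to the loop so the cancellation is delivered and the callback runs.
    await asyncio.sleep(0.05)
    return record


record = asyncio.run(main())
print(record["cancelled"])
```

The recorded duration here covers the time until cancellation, not the time the query would have taken, which is worth keeping in mind when reading the resulting timing metrics.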
According to the Task docs, calling Task.cancel should throw a CancelledError exception, but I'm not seeing any exception catching. What's going on there?
We only call cancel on the pending tasks, which should be fine, and no CancelledError will be raised. Are you concerned about calling cancel when the task has somehow already completed at that point? In that case, it might raise; I think we can add a check for that.
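On the completed-task case specifically: as far as the asyncio documentation describes it, calling `cancel()` on a task that has already finished returns `False` and leaves the result untouched rather than raising. A minimal check, with a hypothetical `quick` coroutine:

```python
import asyncio


async def quick() -> int:
    """Hypothetical query that finishes immediately."""
    return 42


async def main() -> tuple[bool, int]:
    task = asyncio.create_task(quick())
    await task  # the task is now done
    requested = task.cancel()  # no-op on a finished task
    return requested, task.result()


requested, result = asyncio.run(main())
print(requested, result)
```

So an explicit `task.done()` check before cancelling would be harmless but is probably not required for correctness.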
Are you concerned about calling cancel when the task has somehow already completed at that point?
Not particularly, the docs just make it seem like CancelledErrors are automatically thrown when you call cancel on a Task instance, regardless of its completion status.
https://docs.python.org/3.11/library/asyncio-task.html#asyncio.Task.cancel
This arranges for a CancelledError exception to be thrown into the wrapped coroutine on the next cycle of the event loop.
Ah, I see what you mean. I think it will raise a CancelledError if you await it or call task.result() once it gets cancelled, but neither is the case here, as it just cancels them and returns them as a separate task list along with the done tasks. If the consumer decides to use them regardless, then yeah, CancelledError will be thrown.
Does that make sense?
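To make the distinction concrete, here is a small sketch (the `slow_query` coroutine is hypothetical) showing that `cancel()` itself returns without raising in the caller, while awaiting the cancelled task or calling `result()` on it does raise `CancelledError`:

```python
import asyncio


async def slow_query() -> str:
    """Hypothetical slow provider query."""
    await asyncio.sleep(5.0)
    return "never returned"


async def main() -> list[str]:
    observed = []
    task = asyncio.create_task(slow_query())
    await asyncio.sleep(0)  # let the task start running
    task.cancel()  # returns immediately; nothing is raised here
    observed.append("cancel() returned")
    try:
        await task
    except asyncio.CancelledError:
        observed.append("await raised CancelledError")
    try:
        task.result()
    except asyncio.CancelledError:
        observed.append("result() raised CancelledError")
    return observed


observed = asyncio.run(main())
print(observed)
```

The exception is thrown into the wrapped coroutine, so the code that called `cancel()` only sees it if it later awaits the task or asks for its result.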
Yes thanks 😅
@ncloudioj I've merged #106, I apologize for the conflicts this creates. I'll note two things:
Yep, sounds good, will do.
If you don't strongly disagree, I'd like to keep it in a separate test file, as I am almost certain that we will add more related functionality and test cases in the future, which might mess up those generic tests in
9c7cfd4 to 31ae51b
Things LGTM on the test side.
This adds a query handling timeout for the suggest endpoint. The rationale is that since Firefox uses a timeout (currently 200ms) for each query request to Merino, Merino should do the same to ensure that slow providers do not prevent Merino from serving suggestions from other providers. When a timeout occurs, Merino will cancel all ongoing query tasks and log/record metrics accordingly.
Note that:
- asyncio.gather() can also be given a timeout (for example by wrapping it in asyncio.wait_for()), but it doesn't provide fine-grained control over timeout handling. Therefore, I ended up using the asyncio wait primitives. The benchmark shows that the overhead is negligible compared to asyncio.gather().
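The difference in control can be illustrated with a short sketch. It assumes the gather-based alternative would be wrapped in `asyncio.wait_for()`; the `query` coroutine, the two helper functions, and the delays are hypothetical and not taken from this PR.

```python
import asyncio


async def query(name: str, delay: float) -> str:
    """Hypothetical stand-in for a provider query."""
    await asyncio.sleep(delay)
    return name


async def with_gather(timeout: float) -> list[str]:
    try:
        return await asyncio.wait_for(
            asyncio.gather(query("fast", 0.01), query("slow", 5.0)), timeout
        )
    except asyncio.TimeoutError:
        # All-or-nothing: the fast provider's result is not available here.
        return []


async def with_wait(timeout: float) -> list[str]:
    tasks = [
        asyncio.create_task(query("fast", 0.01)),
        asyncio.create_task(query("slow", 5.0)),
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # Results from providers that finished in time are still usable.
    return sorted(t.result() for t in done)


gather_results = asyncio.run(with_gather(0.2))
wait_results = asyncio.run(with_wait(0.2))
print(gather_results, wait_results)
```

With the wait-based version, the caller decides per task what to log, record, or return, which is the fine-grained handling referred to above.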
This fixes CONSVC-2052.