Warn if websocket connection can't be made #15261
CodSpeed Performance Report
Merging #15261 will not alter performance
This makes sense to me; I requested a name change on the methods but am otherwise curious for @chrisguidry's review.
This approach makes sense to me. It's stronger than I was originally thinking, where we'll actually raise an exception and stop execution. I was initially thinking we'd put this check in TaskRunContext, so that folks who aren't even using tasks won't have to worry about it. Honestly, though, events are so core to Prefect now that we should probably ensure that we can emit them.
If this does cause folks irreconcilable trouble (like they are in environments where they just literally can't use websockets for some reason, which would be surprising), we can look into making an events client that uses the REST APIs to POST batches of events (like the way APILogWorker does it).
I think this makes sense!
My only worry is that some users will be frustrated by not being able to do anything until we fix the issue, and ultimately even without events workflows can still execute properly. So what I think we should do is make a setting that defaults to raising this error but can be toggled off as an escape hatch, and we should include a message about this setting in the exception that the utilities raise.
@cicdw revisiting this PR after a little bit. I've come around to thinking that it is likely too strict to error as the default behavior. I've updated this PR to log an informative warning (which also allows for a much simpler implementation, since we don't need to error out the main thread anymore). I've updated the PR description with an example of the warning being logged. Would appreciate another look if you get a chance.
lgtm!
I think I'll need an approval from @cicdw since changes were originally requested on the original error-out implementation.
LGTM, thank you for the tag! I added a note about how the exception is displayed as the "Reason", up to you which you think is more helpful.
"Reason: %s. "
"Set PREFECT_DEBUG_MODE=1 to see the full error.",
self._events_socket_url,
str(e),
not a blocker - it depends entirely on how you want this displayed - but str(e) hides the exception type whereas repr(e) preserves it; totally up to you which you think is better for this situation, e.g.,
str(ValueError("foo")) # 'foo'
repr(ValueError("foo")) # "ValueError('foo')"
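The difference is most visible for exceptions raised without a message, where str(e) yields an empty string. A minimal sketch of the two formatting choices follows; the helper name describe_failure and the "Reason:" template are illustrative assumptions, not the code in this PR:

```python
def describe_failure(exc: Exception) -> tuple[str, str]:
    """Format the same exception with %s (str) and with %r (repr)."""
    # Wrap exc in a one-element tuple so % formatting treats it as a single value.
    with_str = "Reason: %s" % (exc,)
    with_repr = "Reason: %r" % (exc,)
    return with_str, with_repr


# An exception that carries a message: str() drops the type, repr() keeps it.
msg_str, msg_repr = describe_failure(ValueError("foo"))
print(msg_str)   # Reason: foo
print(msg_repr)  # Reason: ValueError('foo')

# An exception with no message: str() leaves nothing to read at all.
empty_str, empty_repr = describe_failure(TimeoutError())
print(repr(empty_str))  # 'Reason: '
print(empty_repr)       # Reason: TimeoutError()
```

With %s, a bare TimeoutError would leave the "Reason" blank in the log line, which is the case where repr is most helpful.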
My fault, I had enabled auto-merge here. I think repr is probably more correct, let me open another PR to update.
With the introduction of client-side orchestration, Events have become more critical to the client-side machinery of Prefect.
Previously, if a client could not establish a websocket connection, then because the EventsWorker is an auxiliary QueueService, we would only log individual failures when sending events (e.g., Service 'EventsWorker' failed with 1 pending item.), without raising an error that would crash the process or logging an informative warning.
Some users may not have noticed this was happening, but they are missing out on critical functionality of Prefect. This PR ensures the EventsClient logs a warning when it cannot establish a websocket connection. This is important to surface because exceptions from the EventsWorker thread will not propagate to the main thread, so the user may be unaware there is a problem.
Note: this PR also makes adjustments to fix some fields deprecated with the release of websockets 13.1: https://websockets.readthedocs.io/en/stable/project/changelog.html#backwards-incompatible-changes
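As a rough illustration of the warn-instead-of-raise behavior described above, here is a minimal, self-contained sketch. It is not the implementation in this PR: the function check_events_connection, the injected connect callable, the logger name, the message wording, and the URL are all hypothetical stand-ins, and no real websocket library is used.

```python
import asyncio
import logging
from typing import Awaitable, Callable

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("events_sketch")


async def check_events_connection(
    connect: Callable[[str], Awaitable[None]], url: str
) -> bool:
    """Attempt one connection; log a warning rather than raising on failure."""
    try:
        await connect(url)
    except Exception as e:
        # %r keeps the exception type visible in the logged reason.
        logger.warning(
            "Unable to connect to %r. Events may not be delivered. Reason: %r",
            url,
            e,
        )
        return False
    return True


async def failing_connect(url: str) -> None:
    # Stand-in for a websocket handshake that cannot complete.
    raise ConnectionRefusedError("connection refused")


# The caller keeps running after the failure; only a warning is emitted.
connected = asyncio.run(
    check_events_connection(failing_connect, "ws://example.invalid/events")
)
print("connected:", connected)
```

Because the failure is reported through the logger and a return value, the calling code can continue executing, which matches the intent of surfacing the problem without stopping workflows.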