Warn if websocket connection can't be made #15261

Merged: jakekaplan merged 25 commits into main from ensure-websocket-connection-can-be-made on Sep 26, 2024

Contributor

@jakekaplan jakekaplan commented Sep 6, 2024

With the introduction of client-side orchestration, Events have become more critical to the client-side machinery of Prefect.

Previously, if a client could not establish a websocket connection, we would only log individual failures when sending events (e.g., Service 'EventsWorker' failed with 1 pending item.). Because the EventsWorker is an auxiliary QueueService, there was neither an error that would crash the process nor an informative warning.

Some users may not have noticed this was happening, but they are missing out on critical Prefect functionality. This PR ensures the EventsClient logs a warning when it cannot establish a websocket connection. This is important to surface because exceptions from the EventsWorker thread do not propagate to the main thread, so the user may be unaware there is a problem.
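To illustrate the point about threads, here is a minimal, standalone sketch using only the standard library. It does not use Prefect's EventsWorker; `failing_worker` and `run_in_background` are hypothetical names invented for this example. It shows that an exception raised in a background thread is handed to `threading.excepthook` rather than being re-raised in the thread that called `join()`.

```python
import threading


def failing_worker() -> None:
    # Stand-in for a background service whose connection attempt fails.
    raise ConnectionError("websocket handshake failed")


def run_in_background() -> list:
    """Run the failing worker in a thread and return the exceptions observed."""
    seen = []
    previous_hook = threading.excepthook
    # threading.excepthook (Python 3.8+) receives uncaught thread exceptions.
    threading.excepthook = lambda args: seen.append(args.exc_value)
    try:
        thread = threading.Thread(target=failing_worker)
        thread.start()
        thread.join()  # join() returns normally; it does not re-raise.
    finally:
        threading.excepthook = previous_hook
    return seen
```

Calling `run_in_background()` returns normally in the main thread even though the worker failed, which is why an explicit warning is needed to make the failure visible.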

[Screenshot 2024-09-22: example of the warning being logged]

Note: this PR also updates usage of some fields deprecated as of the websockets 13.1 release: https://websockets.readthedocs.io/en/stable/project/changelog.html#backwards-incompatible-changes

codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #15261 will not alter performance

Comparing ensure-websocket-connection-can-be-made (74e3503) with main (23262eb)

Summary

✅ 3 untouched benchmarks

@jakekaplan jakekaplan force-pushed the ensure-websocket-connection-can-be-made branch from 92c4385 to 73687fd Compare September 6, 2024 15:58
Member

@cicdw cicdw left a comment

this makes sense to me; I requested a name change on the methods but am otherwise curious for @chrisguidry's review

src/prefect/events/clients.py (outdated, resolved)
Collaborator

@chrisguidry chrisguidry left a comment

This approach makes sense to me. It's stronger than I was originally thinking, where we'll actually raise an exception and stop execution. I was initially thinking we'd put this check in TaskRunContext, so that folks who aren't even using tasks won't have to worry about it. Honestly, though, events are so core to Prefect now that we should probably ensure that we can emit them.

If this does cause folks irreconcilable trouble (like they are in environments where they just literally can't use websockets for some reason, which would be surprising), we can look into making an events client that uses the REST APIs to POST batches of events (like the way APILogWorker does it).
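As a rough sketch of the fallback idea mentioned here, the following shows one way a client could buffer events and hand them off in batches. Everything in it is an assumption for illustration: `BatchingEventSender` and the `post_batch` callable are hypothetical, no real Prefect endpoint or class is used, and this is not how APILogWorker is implemented.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class BatchingEventSender:
    """Hypothetical sender that buffers events and hands them off in batches."""

    def __init__(
        self,
        post_batch: Callable[[List[Event]], None],
        batch_size: int = 3,
    ) -> None:
        # post_batch stands in for an HTTP POST of a list of events.
        self._post_batch = post_batch
        self._batch_size = batch_size
        self._pending: List[Event] = []

    def emit(self, event: Event) -> None:
        # Buffer the event and send once the batch is full.
        self._pending.append(event)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, then clear the buffer.
        if self._pending:
            self._post_batch(list(self._pending))
            self._pending.clear()
```

In a real client, `post_batch` would wrap the HTTP request and `flush()` would also be called on shutdown so that a partially filled batch is not lost.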

src/prefect/events/clients.py (outdated, resolved)
@jakekaplan jakekaplan force-pushed the ensure-websocket-connection-can-be-made branch from 61a2722 to 73687fd Compare September 6, 2024 21:45
Member

@cicdw cicdw left a comment

I think this makes sense!

My only worry is that some users will be frustrated by not being able to do anything until we fix the issue, and ultimately, even without events, workflows can still execute properly. So what I think we should do is make a setting that defaults to raising this error but can be toggled off as an escape hatch, and we should include a message about this setting in the exception that the utilities raise.
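A minimal sketch of the escape hatch proposed in this comment might look like the following. The setting name `EXAMPLE_RAISE_ON_EVENTS_CONNECTION_FAILURE` and the function `handle_connection_failure` are invented for illustration and are not real Prefect settings or APIs; the sketch only shows the pattern of raising by default while naming the opt-out in the error message.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger("events_escape_hatch_example")

# Hypothetical setting name, invented for this sketch; not a real Prefect setting.
SETTING_NAME = "EXAMPLE_RAISE_ON_EVENTS_CONNECTION_FAILURE"


def handle_connection_failure(
    exc: Exception, environ: Optional[Mapping[str, str]] = None
) -> None:
    env = os.environ if environ is None else environ
    # Default is strict: raise unless the user explicitly opts out.
    strict = env.get(SETTING_NAME, "1").strip().lower() not in {"0", "false"}
    if strict:
        # Name the setting in the message so users can find the escape hatch.
        raise RuntimeError(
            f"Unable to connect to the events websocket: {exc!r}. "
            f"Set {SETTING_NAME}=0 to log a warning instead of raising."
        ) from exc
    logger.warning("Unable to connect to the events websocket: %r", exc)
```

Passing `environ` explicitly keeps the function easy to test without mutating the process environment.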

@jakekaplan
Contributor Author

jakekaplan commented Sep 22, 2024

@cicdw revisiting this PR after a little bit. I've come around to thinking that it is likely too strict to error as the default behavior. I've updated this PR to log an informative warning (which also allows for a much simpler implementation since we don't need to error out the main thread anymore).

I've updated the PR description with an example of the warning being logged. Would appreciate another look if you get a chance.

@jakekaplan jakekaplan requested a review from cicdw September 22, 2024 20:02
@jakekaplan jakekaplan changed the title ensure websocket connection can be made Warn if websocket connection can't be made Sep 22, 2024
@jakekaplan jakekaplan marked this pull request as ready for review September 22, 2024 20:10
Collaborator

@zzstoatzz zzstoatzz left a comment

lgtm!

@jakekaplan jakekaplan enabled auto-merge (squash) September 25, 2024 19:57
@jakekaplan
Contributor Author

I think I'll need an approval from @cicdw since changes were requested on the original error-out implementation

Member

@cicdw cicdw left a comment

LGTM, thank you for the tag! I added a note about how the exception is displayed as the "Reason", up to you which you think is more helpful.

"Reason: %s. "
"Set PREFECT_DEBUG_MODE=1 to see the full error.",
self._events_socket_url,
str(e),
Member

not a blocker - it depends entirely on how you want this displayed - but str(e) hides the exception type whereas repr(e) preserves it; totally up to you which you think is better for this situation, e.g.,

str(ValueError("foo")) # 'foo'
repr(ValueError("foo")) # "ValueError('foo')"
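One case where the difference matters is an exception raised with no message, where `str(e)` is empty. The sketch below uses %-style formatting, the same style as the log call under review; `reason_with_str` and `reason_with_repr` are hypothetical helper names for this example only.

```python
def reason_with_str(exc: Exception) -> str:
    # Mirrors passing str(e) to a "%s" placeholder.
    return "Reason: %s." % (str(exc),)


def reason_with_repr(exc: Exception) -> str:
    # "%r" applies repr(), which keeps the exception type.
    return "Reason: %r." % (exc,)


empty = TimeoutError()
print(reason_with_str(empty))   # Reason: .
print(reason_with_repr(empty))  # Reason: TimeoutError().
```

With `str`, a message-less exception leaves the reason blank, while `repr` still names the exception type.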

Contributor Author

My fault, I had enabled auto-merge here. I think repr is probably more correct; let me open another PR to update

@jakekaplan jakekaplan merged commit 953b733 into main Sep 26, 2024
30 checks passed
@jakekaplan jakekaplan deleted the ensure-websocket-connection-can-be-made branch September 26, 2024 19:13
5 participants