Kubernetes worker use none for _request_timeout #16841

Draft
wants to merge 4 commits into base: main

Conversation

jeanluciano
Contributor

@jeanluciano jeanluciano commented Jan 23, 2025

Overrides the default total of the ClientTimeout that gets passed to list_namespaced_job. This avoids timeouts when no event is sent for longer than ClientTimeout's default total of 300s.
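
To illustrate the failure mode described above, here is a minimal stdlib-only sketch. It does not use the Kubernetes client or aiohttp; `quiet_event_stream` and `consume` are hypothetical names, and `asyncio.wait_for` stands in for a total timeout on the whole watch request. With a finite total, a stream that stays quiet for longer than the budget is cut off; with `None`, it is read to completion.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def quiet_event_stream(gap: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for a job watch: one event, a quiet gap, another event."""
    yield "ADDED"
    await asyncio.sleep(gap)  # no events are sent during this gap
    yield "MODIFIED"


async def consume(total: Optional[float], gap: float) -> List[str]:
    """Read the whole stream under a total time budget; None means no budget."""
    seen: List[str] = []

    async def drain() -> None:
        async for event in quiet_event_stream(gap):
            seen.append(event)

    try:
        # With timeout=None, asyncio.wait_for waits until drain() finishes.
        await asyncio.wait_for(drain(), timeout=total)
    except asyncio.TimeoutError:
        seen.append("TIMEOUT")
    return seen


if __name__ == "__main__":
    # Budget shorter than the quiet gap: the stream is dropped mid-watch.
    print(asyncio.run(consume(total=0.05, gap=0.2)))
    # No total budget: the quiet gap is tolerated.
    print(asyncio.run(consume(total=None, gap=0.2)))
```

The durations are scaled down so the sketch runs quickly; in the scenario this PR targets, the budget is the 300s default and the gap is however long the job goes without emitting an event.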

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@jeanluciano jeanluciano changed the title from "Kubernetes worker `" to "Kubernetes worker use none for _request_timeout" Jan 23, 2025
@jeanluciano jeanluciano marked this pull request as ready for review January 23, 2025 21:20
@zzstoatzz zzstoatzz marked this pull request as draft January 23, 2025 21:51
@zzstoatzz
Collaborator

moving to draft for now until we can articulate the need for this given #15744

cc @kevingrismore

…rnetes-worker' of https://github.com/PrefectHQ/prefect into jean/oss-5995-response-payload-is-not-completed-in-kubernetes-worker
@kevingrismore
Contributor

We should be ok removing all the ClientTimeout instances we're passing to _request_timeout with the bumped dependency. I would verify by doing the following:
With the updated async k8s package installed and all ClientTimeouts removed:

  • Try to start a flow run with an impossibly large CPU request and a job start timeout of 10+ minutes. Ensure the timeout is enforced as intended rather than the connection dropping
  • Run a flow that doesn't log for 10+ minutes and ensure the connection doesn't drop
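
For the second check, the body of the test flow could be as small as the sketch below. This is an assumption about how one might set it up, not something taken from the PR: `quiet_run` and its `sleep` parameter are hypothetical, and the function uses only the standard library so it can be tried locally. To exercise the worker, it would still need to be wrapped with Prefect's `flow` decorator and run on a Kubernetes work pool, which is omitted here.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("quiet_run")


def quiet_run(
    quiet_seconds: float = 11 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Log once, emit nothing for quiet_seconds, then log again.

    The sleep function is injectable so the logic can be checked
    without actually waiting 10+ minutes.
    """
    logger.info("starting; no further logs for %s seconds", quiet_seconds)
    sleep(quiet_seconds)  # the silent window the worker connection must survive
    logger.info("finished quiet period")
    return quiet_seconds
```

The default of 11 minutes is chosen only to exceed the 10-minute threshold mentioned in the checklist above.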

3 participants