Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SYS_PTRACE setting to Co-pilot container config. #5746

Closed
wants to merge 5 commits into from
Closed

Add SYS_PTRACE setting to Co-pilot container config. #5746

wants to merge 5 commits into from

Conversation

dylanspag-lmco
Copy link

@dylanspag-lmco dylanspag-lmco commented Sep 13, 2024

Tracking issue

Related to #5462

Why are the changes needed?

The issue #5462 describes the problem in detail.

What changes were proposed in this pull request?

I made SYS_PTRACE an optional setting for co-pilot containers.

How was this patch tested?

I added the default setting to the "happy" and "happy stow backend" Co-pilot Container tests. I also added a test to ensure the SYS_PTRACE capability is present when the setting is enabled.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

See #5556.

Copy link

welcome bot commented Sep 13, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@dylanspag-lmco dylanspag-lmco marked this pull request as draft September 13, 2024 21:46
@dylanspag-lmco dylanspag-lmco marked this pull request as ready for review September 13, 2024 21:47
@neverett
Copy link
Contributor

Pinging @davidmirror-ops for deployment docs changes.

Copy link

codecov bot commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 31.29%. Comparing base (7989209) to head (8a7ece0).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5746      +/-   ##
==========================================
- Coverage   36.29%   31.29%   -5.00%     
==========================================
  Files        1305      921     -384     
  Lines      109991    88051   -21940     
==========================================
- Hits        39918    27556   -12362     
+ Misses      65918    57691    -8227     
+ Partials     4155     2804    -1351     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.59% <ø> (ø)
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl ?
unittests-flyteidl 7.12% <ø> (ø)
unittests-flyteplugins ?
unittests-flytepropeller 41.87% <ø> (ø)
unittests-flytestdlib 55.21% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@davidmirror-ops
Copy link
Contributor

@dylanspag-lmco Thank you for contributing! I was wondering if this wouldn't be a breaking change once released? Shouldn't we default to opt-out?

@dylanspag-lmco
Copy link
Author

@davidmirror-ops I'm not sure of all of the downstream implications. The conversation in #5462 seemed to indicate removing it might be relatively low impact, which is why I had set the default as opt-in. Which would you prefer?

@davidmirror-ops
Copy link
Contributor

@dylanspag-lmco I think the goal is to make it configurable as opposed to hardcoded. Disabling it by default may break functionality for current users.

@dylanspag-lmco
Copy link
Author

dylanspag-lmco commented Sep 17, 2024

@davidmirror-ops makes sense. I just flipped the default behavior and updated the tests accordingly.

@davidmirror-ops
Copy link
Contributor

@dylanspag-lmco thanks so much for your effort so far. I've been talking to other maintainers about this change and we think the best path forward is to no accept it as it would break RawContainer functionality, the only instance where the copilot container is used in Flyte.
Flyte needs a way to determine that the user code finished execution and to capture the outcome (Success/Failure) and three mechanisms were envisioned to achieve that goal as described in the original flytecopilot design

  1. Poll the kube-api: it would require less process on the user code but a deeper investigation/benchmarking to determine the sane defaults to configure the kube-client and avoid overloading the kube-api server. Recent testing showed that this won't work, and it would require to spend time updating the code.
  2. Implement a file-based protocol. With this option, the user code had to include logic to drop the outputs to a certain directory and make the User code use a FileWatcher to determine the result of its own execution. Depending on the outcome (SUCCESS/FAILURE) it would need to write a different file, all of that before returning. Recent testing shows that it works butthere are potential performance implications going this route.Also, there is now wat to inspect that user code ran, just waiting for a file to drop.

Notice that for the two options above there is no way to activate them or select them, no flag.

  1. Share process namespace between copilot and the user code container (current default). With this mechanism, the copilot container downloads the inputs, watches the user code container for a result (using the shared process namespace and, hence, requiring the SYS_PTRACE capability), and copies the outputs, or updates flyteadmin, depending on whether the task succeeded or failed, respectively.

Considering this context, we cannot make SYS_PTRACE configurable because then we'd also have to disable RawContainer, and also there is no flag today to do so.
The closest acceptable change would be about creating that flag and exposing it on the Helm chart.

I hope the above explanation help in some way, and please let us know how we can help you with your current use case.

@dylanspag-lmco
Copy link
Author

@davidmirror-ops Thank you for explanation. I'll consult with my organization's flyte admins to see if there's anything we can do internally to get ContainerTasks working without disabling this.

@vlibov
Copy link

vlibov commented Nov 27, 2024

Considering this context, we cannot make SYS_PTRACE configurable because then we'd also have to disable RawContainer, and also there is no flag today to do so. The closest acceptable change would be about creating that flag and exposing it on the Helm chart.

What is the principal difference between the proposed changes in this PR and "exposing on the Helm chart"?
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants