Add the ability to detach an eBPF program #986

Open
Wants to merge 1 commit into base: main

reyzell (Contributor) commented Jul 11, 2024

Our user-space software agent has, until now, relied on catching the SIGINT signal (i.e. Ctrl+C) to know when to exit. Exiting causes handles in Rust to go out of scope, which invokes Drop handlers and unloads eBPF programs and maps from the kernel. However, if any unhandled signal were to cause the application to exit ungracefully, Drop handlers would not be invoked, and eBPF resources would remain loaded in the kernel. The number of loaded programs would build up over time as the application is restarted.

Gracefully handling more types of signals is a good approach, but it is best effort. If the drop logic were to fail, or a signal that cannot be handled were received, resources would be left behind. A more robust approach is to perform cleanup on application startup, where previously loaded eBPF programs are identified and unloaded from the kernel.

To support that effort, this change adds a public detach_program() function, which wraps the internal bpf_prog_detach().
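
The concern about Drop handlers can be illustrated without any eBPF code. The following is a minimal sketch, not aya code: `ProgramHandle` is a hypothetical stand-in for a handle that owns a kernel resource, and `std::mem::forget` is used only to simulate a process that dies before its destructors run.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a handle that owns a kernel resource.
/// The shared counter records how many times cleanup actually ran.
struct ProgramHandle {
    unloaded: Rc<Cell<u32>>,
}

impl Drop for ProgramHandle {
    fn drop(&mut self) {
        // A real agent would detach or unload the program here.
        self.unloaded.set(self.unloaded.get() + 1);
    }
}

/// Graceful path: the handle goes out of scope, so `drop` runs.
fn graceful_exit(unloaded: &Rc<Cell<u32>>) {
    let _handle = ProgramHandle {
        unloaded: Rc::clone(unloaded),
    };
}

/// Ungraceful path, simulated: `mem::forget` skips `drop`, much as an
/// unhandled signal ends the process before any destructor can run.
fn simulated_kill(unloaded: &Rc<Cell<u32>>) {
    let handle = ProgramHandle {
        unloaded: Rc::clone(unloaded),
    };
    std::mem::forget(handle);
}
```

With a shared counter starting at 0, calling `graceful_exit` leaves it at 1, and a subsequent `simulated_kill` leaves it unchanged. Cleanup that lives only in `Drop` is therefore skipped on that path.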


netlify bot commented Jul 11, 2024

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 4d953dc
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/668f268059c73700088af8d1
😎 Deploy Preview https://deploy-preview-986--aya-rs-docs.netlify.app

@mergify mergify bot added the aya This is about aya (userspace) label Jul 11, 2024
dave-tucker (Member) commented Jul 11, 2024

> Our user-space software agent has, until now, relied on catching the SIGINT signal (i.e. Ctrl+C) to know when to exit. Exiting causes handles in Rust to go out of scope, which invokes Drop handlers and unloads eBPF programs and maps from the kernel. However, if any unhandled signal were to cause the application to exit ungracefully, Drop handlers would not be invoked, and eBPF resources would remain loaded in the kernel. The number of loaded programs would build up over time as the application is restarted.

That holds true only in very specific circumstances. Any BPF syscall that returns an FD sets CLOEXEC on it, so those FDs get cleaned up when your process exits, gracefully or not.

The specific cases where this doesn't work are things like TC (which uses netlink), and my guess is that you're using one of these types of program.

> Gracefully handling more types of signals is a good approach, but it is best effort. If the drop logic were to fail, or a signal that cannot be handled were received, resources would be left behind. A more robust approach is to perform cleanup on application startup, where previously loaded eBPF programs are identified and unloaded from the kernel.
>
> To support that effort, this change adds a public detach_program() function, which wraps the internal bpf_prog_detach().

bpf_prog_detach() is not a generic detach function. It's a specific detach function for links that were created with bpf_prog_attach(), so this approach isn't going to work.

If you let me know what type of program you're having issues with, I can point you in the right direction.

EDIT: Based on your other open PR I'll assume it's SockOps.
See: #987

reyzell (Contributor, Author) commented Jul 12, 2024

I think you're saying that once SockOps programs are updated to attach with FdLink (issue #987), that I'll no longer see lingering maps and programs when the user-space process terminates ungracefully. Is that right?

dave-tucker (Member) commented

> I think you're saying that once SockOps programs are updated to attach with FdLink (issue #987), that I'll no longer see lingering maps and programs when the user-space process terminates ungracefully. Is that right?

Yep, I'm 99% sure that's the case.

viveksb007 commented

I am seeing a similar problem with TC programs. I have attached an eBPF program to the TC egress hook, and sometimes it lingers around. @dave-tucker can you share references on how to clean up existing TC eBPF programs during startup?

alessandrod (Collaborator) commented

> I think you're saying that once SockOps programs are updated to attach with FdLink (issue #987), that I'll no longer see lingering maps and programs when the user-space process terminates ungracefully. Is that right?

Correct, bpf_link fixes this, at which point the only problematic program left is TC when attached with netlink, for which we already provide a utility to detach at startup: https://docs.aya-rs.dev/aya/programs/tc/fn.qdisc_detach_program cc @viveksb007
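
A startup cleanup pass along these lines might look like the following minimal sketch. It deliberately uses only the standard library: `cleanup_stale` is a hypothetical helper, and the `detach` closure stands in for whatever detach call the application uses. For TC that could be a wrapper around the `qdisc_detach_program` utility linked above, on the assumption that it reports failures as `std::io::Error`. Treating `NotFound` as "nothing to clean up" is also an assumption of this sketch, not documented behaviour.

```rust
use std::io;

/// Hypothetical helper: tries to detach every program name left over from
/// a previous run. A `NotFound` error means nothing was attached under
/// that name, so it is ignored; any other error aborts the cleanup.
/// Returns how many programs were actually detached.
fn cleanup_stale<F>(names: &[&str], mut detach: F) -> io::Result<usize>
where
    F: FnMut(&str) -> io::Result<()>,
{
    let mut removed = 0;
    for &name in names {
        match detach(name) {
            Ok(()) => removed += 1,
            // Nothing lingering under this name: a clean previous exit.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            // Permission problems and the like should not be hidden.
            Err(e) => return Err(e),
        }
    }
    Ok(removed)
}
```

Running this before attaching anything makes startup idempotent: it succeeds whether the previous run exited cleanly or left programs behind.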

nahuelfilipuzzi commented Oct 16, 2024

This feature is still needed if you want to use a kernel older than 5.7, right? If the app crashes, the program will not be deleted.

https://github.com/aya-rs/aya/blob/main/aya/src/programs/sock_ops.rs

    pub fn attach<T: AsFd>(
        &mut self,
        cgroup: T,
        mode: CgroupAttachMode,
    ) -> Result<SockOpsLinkId, ProgramError> {
        // ...
        if KernelVersion::current().unwrap() >= KernelVersion::new(5, 7, 0) {
            let link_fd = bpf_link_create(...)
            // ...
        } else {
            let link = ProgAttachLink::attach(prog_fd, cgroup_fd, attach_type, mode)?;
            // ...
        }
    }
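
The version gate in the excerpt above can be sketched in isolation. This is a standard-library-only illustration, not aya's implementation: `Version`, `AttachPath`, `parse_release`, and `choose_attach_path` are hypothetical names, and the only detail taken from the excerpt is the 5.7.0 threshold. The comparison is numeric per component, so 5.10 correctly sorts after 5.7.

```rust
/// Hypothetical version triple. The derived ordering compares fields in
/// declaration order: major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

/// Which attach mechanism the excerpt would pick.
#[derive(Debug, PartialEq, Eq)]
enum AttachPath {
    /// Link-based attach, cleaned up when the link FD is closed.
    Link,
    /// Legacy attach, which can outlive a crashed process.
    LegacyProgAttach,
}

/// Parses the leading "X.Y[.Z]" of a release string such as
/// "5.4.0-150-generic". A missing patch component is treated as 0.
fn parse_release(release: &str) -> Option<Version> {
    let core = release
        .split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next().flatten().unwrap_or(0);
    Some(Version { major, minor, patch })
}

/// Mirrors the `>= 5.7.0` check from the excerpt.
fn choose_attach_path(v: Version) -> AttachPath {
    let threshold = Version { major: 5, minor: 7, patch: 0 };
    if v >= threshold {
        AttachPath::Link
    } else {
        AttachPath::LegacyProgAttach
    }
}
```

Under these assumptions, a release string of "5.4.0-150-generic" selects the legacy path, which is the case the question above is about.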

mergify bot commented Nov 24, 2024

@reyzell, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Nov 24, 2024