[DISCUSSION] New base_syscalls.exclude_enter_exit_set config #2960

Open
incertum opened this issue Dec 9, 2023 · 13 comments

@incertum
Contributor

incertum commented Dec 9, 2023

Motivation

The hardware landscape is evolving towards machines with 96, 128, or more CPUs. However, Falco currently faces usability challenges on such machines, particularly under heavy traffic from network and file-related activity.

One potential solution could involve allowing end users to specify a subset of enter or exit syscall events they want to drop on the kernel side. This feature would be flagged as very risky to use, similar to the existing base_syscalls feature.

For instance, users might opt to drop enter syscall events for open* and connect syscalls, even though they are aware that doing so could expose them to TOCTOU attacks (mitigated by default via this PR). Nevertheless, this trade-off might be preferable to completely disabling Falco.

Feature

Introduce a new config base_syscalls.exclude_enter_exit_set, allowing exclusion of specific enter or exit events that are part of the custom_set syscalls. This exclusion is limited to scenarios where it makes sense for enter or exit events. Ensure good documentation.
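
To make the proposal concrete, here is a minimal sketch of how such an exclusion list might resolve into the set of enabled events. Everything in it is an assumption made for illustration: the `<syscall>:enter` / `<syscall>:exit` entry format, the function name `resolve_enabled_events`, and the validation rules are hypothetical and are not part of the Falco config schema.

```python
from typing import Iterable, Set, Tuple

DIRECTIONS = ("enter", "exit")


def resolve_enabled_events(
    custom_set: Iterable[str], exclude: Iterable[str]
) -> Set[Tuple[str, str]]:
    """Return the (syscall, direction) pairs that would stay enabled.

    Hypothetical entry format for `exclude`: "<syscall>:enter" or "<syscall>:exit".
    """
    syscalls = set(custom_set)
    # Start with both directions enabled for every syscall in the custom set.
    enabled = {(name, direction) for name in syscalls for direction in DIRECTIONS}
    for entry in exclude:
        name, sep, direction = entry.partition(":")
        if not sep or direction not in DIRECTIONS:
            raise ValueError(
                f"expected '<syscall>:enter' or '<syscall>:exit', got {entry!r}"
            )
        # The proposal only covers syscalls that are part of the custom set.
        if name not in syscalls:
            raise ValueError(f"{name!r} is not part of custom_set")
        enabled.discard((name, direction))
    return enabled


enabled = resolve_enabled_events(
    custom_set=["open", "openat", "connect", "execve"],
    exclude=["open:enter", "openat:enter", "connect:enter"],
)
```

In this sketch the exit events for open, openat, and connect stay enabled, and execve keeps both directions. The sketch does not model the rule that exclusion should only be allowed where it makes sense for a given enter or exit event; that check would need per-syscall knowledge.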

Additional context

falcosecurity/libs#1557

CC @falcosecurity/libs-maintainers

@incertum
Contributor Author

incertum commented Dec 9, 2023

@stevenbrz let's see if the other maintainers are on board. If yes, it could be a great "warm up" contribution for you to take on 😉

@falcosecurity falcosecurity deleted a comment from poiana Dec 9, 2023
@Andreagit97
Member

Yes, Falco doesn't scale on these huge servers, and we need to find a way to mitigate this case. One idea could be:

  1. Adapt our sinsp state so that it is populated only by exit events; enter events are needed only to mitigate TOCTOU or on old kernel versions.
  2. Once sinsp can reconstruct the state with only exit events, we can disable all enter events, informing our users that this will turn Falco into a best-effort detection mode that could be vulnerable to some attacks. I would prefer to remove all enter events to reduce complexity, instead of having a sort of simple consumer just for enter events 🤯. This point will halve our kernel events, which is already a great result.
  3. With event throughputs of 20 million/s the previous point is not enough: we would get down to 10 million/s, but Falco cannot handle that either, so we need a sort of hash table in the drivers to filter exit events. My idea would be to expose some API in sinsp that allows different filters (on the comm, on the exepath, on the cmdline, ...). These filters are evaluated in userspace as events are read: if we have a match, we add the pid of this process to the hash table used by the drivers, so the following events will be excluded kernel side. Of course, we need to evaluate how many filters we can process, because it could be quite heavy. Moreover, I would avoid filtering clone/execve/proc_exit events, we have already seen these don't cause perf overhead and we need them to keep a reliable process tree inside sinsp.

This is just an idea, but maybe it could work.
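
The third point can be sketched as a small simulation. This is only an illustration of the idea under stated assumptions, not sinsp or driver code: the `Event` fields, the `PidExclusionFilter` class, and the `run` loop are hypothetical names, and a plain Python set stands in for the driver-side hash table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set

# Event kinds the comment suggests never filtering, so that the
# process tree in userspace stays reliable.
NEVER_FILTERED = {"clone", "execve", "proc_exit"}


@dataclass
class Event:
    pid: int
    kind: str  # e.g. "openat", "connect", "execve"
    comm: str


class PidExclusionFilter:
    """Hypothetical model: userspace predicates feed a pid exclusion table."""

    def __init__(self, predicates: Iterable[Callable[[Event], bool]]):
        self.predicates = list(predicates)
        # Stand-in for the hash table that the drivers would consult.
        self.excluded_pids: Set[int] = set()

    def driver_passes(self, ev: Event) -> bool:
        """Simulated kernel-side check: drop events from excluded pids."""
        return ev.kind in NEVER_FILTERED or ev.pid not in self.excluded_pids

    def userspace_consume(self, ev: Event) -> None:
        """Simulated userspace step: on a filter match, exclude the pid."""
        if any(pred(ev) for pred in self.predicates):
            self.excluded_pids.add(ev.pid)


def run(events: Iterable[Event], flt: PidExclusionFilter) -> List[Event]:
    delivered = []
    for ev in events:
        if not flt.driver_passes(ev):
            continue  # dropped before reaching userspace
        delivered.append(ev)
        flt.userspace_consume(ev)
    return delivered


flt = PidExclusionFilter([lambda ev: ev.comm == "noisy-db"])
events = [
    Event(100, "openat", "noisy-db"),   # delivered, then pid 100 is excluded
    Event(100, "read", "noisy-db"),     # dropped by the simulated driver
    Event(200, "connect", "curl"),      # delivered, no filter matches
    Event(100, "execve", "noisy-db"),   # delivered, this kind is never filtered
    Event(100, "write", "noisy-db"),    # dropped by the simulated driver
]
delivered = run(events, flt)
```

Note that the first matching event still reaches userspace, because the filter is only evaluated there; only the following events from that pid are dropped. The sketch leaves open how entries would be removed from the table, for example when a pid exits and is later reused.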

@incertum
Contributor Author

Moreover, I would avoid filtering clone/execve/proc_exit events, we have already seen these don't cause perf overhead and we need them to keep a reliable process tree inside sinsp.

Big +1, those aren't an issue.

@Andreagit97 Andreagit97 added this to the TBD milestone Dec 14, 2023
@cccsss01

I'm in support of this.

@leogr
Member

leogr commented Jan 15, 2024

I'm in favor of investigating this front 👍

@poiana
Contributor

poiana commented Apr 14, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@incertum
Contributor Author

/remove-lifecycle stale

@poiana
Contributor

poiana commented Jul 13, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Member

/remove-lifecycle stale

@poiana
Contributor

poiana commented Oct 13, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@leogr
Member

leogr commented Oct 14, 2024

/remove-lifecycle stale

@Andreagit97
Member

See falcosecurity/libs#2068

@poiana
Contributor

poiana commented Jan 13, 2025

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale


5 participants