Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] legacy ebpf SCHEMA VERSION doesn't follow our semver rules #1283

Open
Andreagit97 opened this issue Aug 7, 2023 · 22 comments
Open

[BUG] legacy ebpf SCHEMA VERSION doesn't follow our semver rules #1283

Andreagit97 opened this issue Aug 7, 2023 · 22 comments
Labels
kind/bug Something isn't working
Milestone

Comments

@Andreagit97
Copy link
Member

Andreagit97 commented Aug 7, 2023

Describe the bug

When the scap bpf engine loads the .o file it scans all elf sections. Right now when the engine finds a new filler section it expects to have the corresponding enum in userspace tables

but this is not always true. Consider this example:

Let's imagine SCAP_MINIMUM_DRIVER_SCHEMA_VERSION is like today:

#define SCAP_MINIMUM_DRIVER_SCHEMA_VERSION PPM_API_VERSION(2, 0, 0)

and we have a driver with SCHEMA VERSION 2.0.0. Now we add a new filler like in this PR #1256 sys_listen_e and we bump the driver SCHEMA VERSION to 2.1.0. When libscap tries to load the new .o with SCHEMA_VERSION 2.1.0 it will fail because it will find a sys_listen_e section not known and will print invalid filler name. For this reason, when we add a new filler we need to bump also a major for the SCHEMA_VERSION until we fix this issue

int prog_id = lookup_filler_id(event);
if(prog_id == -1)
{
	return scap_errprintf(handle->m_lasterr, 0, "invalid filler name: %s", event);
}

this issue happens when we use an old libscap version and a driver with at least one new filler, not a very common case but BTW it is a bug.

How to reproduce it

Build libscap with the commit before this PR (#1256) and build the bpf probe over this PR, you will see the invalid filler name error when you load the bpf probe

@Andreagit97 Andreagit97 added the kind/bug Something isn't working label Aug 7, 2023
@Andreagit97 Andreagit97 added this to the 0.13.0 milestone Aug 7, 2023
@Andreagit97 Andreagit97 modified the milestones: 0.13.0, 0.14.0 Aug 23, 2023
@Andreagit97 Andreagit97 modified the milestones: 0.14.0, TBD Sep 4, 2023
@Andreagit97 Andreagit97 changed the title [BUG] scap bpf engine requires SCHEMA VERSION major bumps when it shouldn't [BUG] scap bpf engine requires API VERSION major bumps when it shouldn't Nov 21, 2023
@poiana
Copy link
Contributor

poiana commented Feb 19, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member Author

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented May 26, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member Author

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented Aug 25, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member Author

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented Nov 24, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member Author

/remove-lifecycle stale

@Andreagit97
Copy link
Member Author

This could be related to this #2068

@Andreagit97
Copy link
Member Author

It is possible that we will need to add new filler bpf side so we need to bump the right version

@Andreagit97
Copy link
Member Author

The solution would be this:

diff --git a/userspace/libscap/engine/bpf/scap_bpf.c b/userspace/libscap/engine/bpf/scap_bpf.c
index 3f7ea6e58..a88aa9515 100644
--- a/userspace/libscap/engine/bpf/scap_bpf.c
+++ b/userspace/libscap/engine/bpf/scap_bpf.c
@@ -569,8 +569,6 @@ static int32_t load_single_prog(struct bpf_engine *handle,
                                BPF_PROGS_TAIL_CALLED_MAX);
                }
 
-               handle->m_tail_called_fds[handle->m_tail_called_cnt++] = fd;
-
                event += sizeof("filler/") - 1;
                if(*event == 0) {
                        return scap_errprintf(handle->m_lasterr, 0, "filler name cannot be empty");
@@ -578,7 +576,11 @@ static int32_t load_single_prog(struct bpf_engine *handle,
 
                int prog_id = lookup_filler_id(event);
                if(prog_id == -1) {
-                       return scap_errprintf(handle->m_lasterr, 0, "invalid filler name: %s", event);
+                       // This is possible when we load a new ebpf probe `.o` with an old scap version.
+                       // The ebpf probe defines a filler that is not present in the scap version. This should
+                       // be safe until we keep the old fillers in the probe like we are doing today
+                       close(fd);
+                       return SCAP_SUCCESS;
                } else if(prog_id >= BPF_PROGS_TAIL_CALLED_MAX) {
                        return scap_errprintf(handle->m_lasterr,
                                              0,
@@ -587,6 +589,8 @@ static int32_t load_single_prog(struct bpf_engine *handle,
                                              BPF_PROGS_TAIL_CALLED_MAX);
                }
 
+               handle->m_tail_called_fds[handle->m_tail_called_cnt++] = fd;
+
                /* Fill the tail table. The key is our filler internal code extracted
                 * from `g_filler_names` in `lookup_filler_id` function. The value
                 * is the program fd.

In this way what we need to do is to bump a MINOR for the SCAP_MINIMUM_DRIVER_SCHEMA_VERSION because old drivers won't be compatible with new libscap versions, but if we do things correctly new drivers should be compatible with old scap versions.

Let's consider this example:

https://github.com/falcosecurity/libs/pull/1256/files -> We added a new filler sys_listen_e

Let's say the driver version is 2.8.0 like in the PR and scap is #define SCAP_MINIMUM_DRIVER_SCHEMA_VERSION PPM_API_VERSION(2, 0, 0) like today.

With the above fix, the old scap should be able to address the case of a new filler sys_listen_e it will simply ignore it and configure the probe to work with the old filler autofill. Of course, we should always remember to keep the old logic so that we can guarantee this behavior.

In the opposite case, the new scap version is no longer compatible with the old drivers:

  • scap will map PPME_SOCKET_LISTEN_E -> PPM_FILLER_sys_listen_e [PPME_SOCKET_LISTEN_E] = {FILLER_REF(sys_listen_e)}, and will populate the ebpf map with this info
  • then scap will try to map the id PPM_FILLER_sys_listen_e to a bpf program that it should find in the .o but the probe won't have the program sys_listen_e so it won't populate the tail table with any entry for PPM_FILLER_sys_listen_e. The driver will try to call into the tail table but this will return a failure and we won't send the PPME_SOCKET_LISTEN_E to userspace.

So the 2 alternatives are:

  1. merging this fix + don't remove the old logic in the drivers + bump the minor version of SCAP_MINIMUM_DRIVER_SCHEMA_VERSION
  2. bump the major of SCHEMA_VERSION

Since we don't add new fillers every day I would go for the 2, it is less risky IMO. WDYT @falcosecurity/libs-maintainers? If we chose number 2 we can just close this issue and remember to bump the major when we add a new filler

@Andreagit97 Andreagit97 self-assigned this Nov 29, 2024
@Andreagit97 Andreagit97 modified the milestones: TBD, 0.20.0 Nov 29, 2024
@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2024

to bump the major when we add a new filler

This means that for each new implemented syscall we will have to bump the major :/ while i agree that we won't add new syscalls every day, this somewhat is still a limitation. The bug only happens for the old bpf probe, right? The modern one is always shipped with libscap therefore no problem on that one, while the kmod is not affected.

Since the old bpf probe is going to get killed sooner or later (no ETA of course but this is something we all have in mind), i agree that 2 is the best solution for now.

@Andreagit97
Copy link
Member Author

yep, every time we add a new syscall or change a filler we should bump a major.

The bug only happens for the old bpf probe, right?

Unfortunately yes. We can always decide to fallback to solution 1 if we need to do too many major bumps

@leogr
Copy link
Member

leogr commented Dec 10, 2024

I might miss some critical points, but I'm afraid I have to disagree with the solution of bumping a major when something is being added. Doing that would immediately make the versioning numbering useless.

So, I'm for option 1, and I want to keep the correct pattern of bumping the minor for additions.

@FedeDP
Copy link
Contributor

FedeDP commented Jan 8, 2025

Is this still planned for 0.20?

@leogr
Copy link
Member

leogr commented Jan 8, 2025

Is this still planned for 0.20?

I guess no, unfortunately. @Andreagit97 wdyt?

@FedeDP
Copy link
Contributor

FedeDP commented Jan 8, 2025

/milestone 0.21.0

@poiana poiana modified the milestones: 0.20.0, 0.21.0 Jan 8, 2025
@Andreagit97
Copy link
Member Author

I tried applying this patch, #1283 (comment), which I had proposed, but it’s not sufficient. In fact, I discovered something else...

Case: Running legacy eBPF with new filler code against a scap without that filler code (#2187)

When we add a filler code, the probe becomes completely incompatible with old scap versions because they see two enums with different codes. Specifically, the old scap sees PPM_FILLER_terminate_filler as x, while the probe sees it as x+1. When it attempts to make a tail call to x+1, it finds no program associated with it. → As of today, we need to bump the MAJOR version.

Case: Running legacy eBPF with a new event format against a scap with the old format (#2206)

When we add parameters to an event, we encounter another issue. Take PR #2206 as an example, which increases the PPME_SOCKET_SOCKET_X event parameters from 1 to 4. The probe will attempt to populate 4 parameters for the socket event, but when it checks that this number matches the one in the event table, the check fails. This is because the event table is a map populated by scap at load time. A scap version older than the probe still sees the socket event as having only 1 parameter. → This also leads to incompatibility, and a MAJOR bump would also be needed here.

So, with how our code works today, we don’t just need a major version bump when creating a new filler; we also need one when adding new parameters to an event. So the SCHEMA_VERSION for legacy eBPF does not comply with semver. Consequently, I suggest enforcing a manual check to ensure that the MAJOR and MINOR versions must match between scap and the driver.
This doesn’t make the legacy eBPF semver-compliant, but it at least avoids confusing loading errors.

@Andreagit97
Copy link
Member Author

Andreagit97 commented Jan 8, 2025

Consequently, I suggest enforcing a manual check to ensure that the MAJOR and MINOR versions must match between scap and the driver.
This doesn’t make the legacy eBPF semver-compliant, but it at least avoids confusing loading errors.

I proposed an implementation here #2228

@Andreagit97 Andreagit97 changed the title [BUG] scap bpf engine requires API VERSION major bumps when it shouldn't [BUG] legacy ebpf SCHEMA VERSION doesn't follow our semver rules Jan 8, 2025
@Andreagit97
Copy link
Member Author

Since #2228 is merged, we can close it or we can keep it open to track this inconsistency. Probably I would keep it open just to remember it but no strong opinions

@Andreagit97 Andreagit97 removed their assignment Jan 13, 2025
@FedeDP
Copy link
Contributor

FedeDP commented Jan 13, 2025

Agree, let's keep this open. I'll pin this one too!

@FedeDP
Copy link
Contributor

FedeDP commented Jan 13, 2025

/milestone TBD

@poiana poiana modified the milestones: 0.21.0, TBD Jan 13, 2025
@FedeDP FedeDP pinned this issue Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants