Warn users that kernel headers are missing during the Pixie install process #2051

Open
ddelnano opened this issue Dec 2, 2024 · 1 comment
Labels
area/deployment (Issues related to deployments), kind/feature (New feature or request)

ddelnano commented Dec 2, 2024

While Pixie has invested in its prepackaged Linux headers and in working without upstream headers, installing the given distro's kernel header package is highly recommended. Upstream distros patch and backport many changes, which makes the prepackaged option susceptible to issues that are hard to anticipate and work around. Examples of these inconsistencies can be seen in #1863, #252, and the recent openSUSE (#2037) and Amazon Linux 2023 (#1986) issues.

Some of these issues get reported, but my suspicion is that this poor experience causes people to abandon their evaluation of Pixie, since their first impression is that the socket tracer isn't functional (the most common way this problem manifests). For example, the openSUSE case mentioned above was only diagnosed through my outreach, by which point the end user had already moved on from evaluating Pixie.

If Pixie could detect when kernel headers aren't installed, it could warn the user that installing them is recommended and link to common problems caused by missing headers. This would give the end user quick feedback on an area that is currently arcane to debug and, hopefully, prevent a poor experience in these cases.

ddelnano added the kind/feature and area/deployment labels Dec 2, 2024
ddelnano added a commit that referenced this issue Dec 11, 2024
… to `GetAgentStatus` (#2052)

Summary: Add a UDTF that detects Linux kernel header installation and add a
`kernel_headers_installed` column to `GetAgentStatus`

This is a prerequisite to accomplish #2051. The `px deploy` command uses
the GetAgentStatus UDTF in its final [healthcheck
step](https://github.com/pixie-io/pixie/blob/854062111cf4b91a40649a2e2647c88c0a68b0db/src/pixie_cli/pkg/cmd/deploy.go#L607-L613).
With this kernel header detection in place, the `px` cli can use the
results from the `px/agent_status` script to print a warning message if
kernel headers aren't detected.

The Helm install flow needs to be covered as well. My hope is that this
UDTF can serve that use case too, but I need to investigate the details
further.

Relevant Issues: #2051

Type of change: /kind feature

Test Plan: Skaffolded to an Ubuntu GKE cluster and tested the following
- [x] Kelvin always reports `false` as it doesn't bind mount `/` to
`/host`
- [x] PEM running on host without `linux-headers-$(uname -r)` package
reports `false`
- [x] PEM running on host with `linux-headers-$(uname -r)` package
reports `true`
```
$ gcloud compute ssh gke-dev-cluster-ddelnano-default-pool-a27c1ac2-x5k2 --internal-ip -- 'ls -alh /lib/modules/$(uname -r)/build'

lrwxrwxrwx 1 root root 38 Aug  9 15:25 /lib/modules/5.15.0-1065-gke/build -> /usr/src/linux-headers-5.15.0-1065-gke

$ gcloud compute ssh gke-dev-cluster-ddelnano-default-pool-a27c1ac2-j6pg --internal-ip -- 'ls -alh /lib/modules/$(uname -r)/build'

ls: cannot access '/lib/modules/5.15.0-1065-gke/build': No such file or directory

```
![Screen Shot 2024-12-02 at 9 30 29
AM](https://github.com/user-attachments/assets/9fa862f8-5a6c-46d6-8899-bfaf2bdf3371)


Changelog Message: Add `GetLinuxHeadersStatus` UDTF and add
`kernel_headers_installed` column to `GetAgentStatus`

---------

Signed-off-by: Dom Del Nano <[email protected]>
aimichelle pushed a commit that referenced this issue Dec 16, 2024
…ing (#2061)

Summary: Update `GetAgentStatus` and kernel header UDTF to allow kelvin
filtering

In order to leverage `GetAgentStatus`'s `kernel_headers_installed`
column for #2051, it would be convenient for the UDTF to provide the
ability to filter Kelvins out, since they don't have access to kernel
headers (they don't have the host filesystem volume mounted). This
change introduces an `include_kelvin` init argument to the UDTFs with a
default of `true` to preserve the existing behavior.
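
The filtering semantics described above can be sketched as follows. This is only an illustration of the behavior, not the UDTF code: the `agent` struct, its fields, and `filterAgents` are hypothetical names. Go has no default arguments, so a caller passes `true` explicitly to get the behavior that existed before this change.

```go
package main

import "fmt"

// agent is a hypothetical row. IsKelvin marks Kelvin agents, which never
// report headers because they don't mount the host filesystem.
type agent struct {
	Name                   string
	IsKelvin               bool
	KernelHeadersInstalled bool
}

// filterAgents mirrors the described include_kelvin semantics: true keeps
// every row (the pre-existing behavior), false drops Kelvin rows.
func filterAgents(rows []agent, includeKelvin bool) []agent {
	out := make([]agent, 0, len(rows))
	for _, r := range rows {
		if r.IsKelvin && !includeKelvin {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []agent{
		{"pem-1", false, true},
		{"pem-2", false, false},
		{"kelvin-1", true, false},
	}
	for _, a := range filterAgents(rows, false) {
		fmt.Println(a.Name, a.KernelHeadersInstalled)
	}
}
```

With Kelvins filtered out, any remaining row reporting `false` points at a node that is actually missing headers, not at an agent that cannot see the host filesystem.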

This change also fixes a bug with UDTF's init arg default values, which
didn't work prior to this change. Please review commit by commit to see
the default arg bug fix followed by the UDTF changes.

Relevant Issues: #2051

Type of change: /kind bug

Test Plan: New logical planner test no longer fails with the following
error
```
$ bazel test -c opt src/carnot/planner:logical_planner_test --test_output=all

[ RUN      ] LogicalPlannerTest.one_pems_one_kelvin
src/carnot/planner/logical_planner_test.cc:64: Failure
Value of: IsOK(::px::StatusAdapter(__status_or_value__64))
  Actual: false (Invalid Argument : DATA_TYPE_UNKNOWN not handled as a default value)
Expected: true
```
@ddelnano

#1986 is another great example of the need for this warning/tooling. That bug lasted from August until now and occurred because Pixie's prepackaged headers resulted in broken Go TLS tracing, so the Amazon Linux kernel headers needed to be installed.
