
tetragon: Add docs for tetragon performance stats
Adding details on 'tetra debug progs' and overhead metrics.

Signed-off-by: Jiri Olsa <[email protected]>
olsajiri committed Oct 31, 2024
1 parent 4aace87 commit 16602dd
Showing 1 changed file with 172 additions and 0 deletions: docs/content/en/docs/concepts/performance-stats.md

---
title: "BPF program statistics"
weight: 5
description: "Monitor BPF program statistics"
---

This page shows you how to monitor BPF program statistics.

## Concept

The BPF subsystem provides performance data for each loaded program. Tetragon
exports this data as metrics and can also display it in the terminal with a
`top`-like tool.

## In terminal

The `tetra` command can display the loaded BPF programs in the terminal:

```
# tetra debug progs
```

By default, the output shows only Tetragon programs and looks like this:

```
2024-10-31 11:12:45.94715546 +0000 UTC m=+8.038098448
Ovh(%)  Id     Cnt  Time  Name                         Pin
0.00    22201  0    0     event_execve                 /sys/fs/bpf/tetragon/__base__/event_execve/prog
0.00    22198  0    0     event_exit_acct_process      /sys/fs/bpf/tetragon/__base__/event_exit/prog
0.00    22200  0    0     event_wake_up_new_task       /sys/fs/bpf/tetragon/__base__/kprobe_pid_clear/prog
0.00    22207  0    0     tg_cgroup_rmdir              /sys/fs/bpf/tetragon/__base__/tg_cgroup_rmdir/prog
0.00    22206  0    0     tg_kp_bprm_committing_creds  /sys/fs/bpf/tetragon/__base__/tg_kp_bprm_committing_creds/prog
0.00    22221  0    0     generic_kprobe_event         /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/__x64_sys_close/prog
0.00    22225  0    0     generic_kprobe_event         /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/__x64_sys_write/prog
0.00    22211  0    0     generic_kprobe_event         /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/fd_install/prog
```

The fields have the following meaning:

- `Ovh(%)` is the system-wide overhead of the BPF program, in percent
- `Id` is the global BPF ID of the program (as shown by `bpftool prog`)
- `Cnt` is the number of BPF program executions
- `Time` is the total time of all BPF program executions
- `Name` is the BPF program name
- `Pin` is the pin path of the BPF program in bpffs
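The table can also be post-processed in a script, for example to compute the average cost of one execution as `Time / Cnt` (in the same unit as the `Time` column). The following is a minimal Python sketch, assuming the whitespace-separated layout shown on this page, where the `Name` column may be empty for unnamed programs; `ProgStat` and `parse_progs` are hypothetical helpers, not part of `tetra`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProgStat:
    ovh: float    # Ovh(%) column
    prog_id: int  # Id column (global BPF program ID)
    cnt: int      # Cnt column (number of executions)
    time: int     # Time column (total time of all executions)
    name: str     # Name column (empty for unnamed programs)
    pin: str      # Pin column ("-" when no pin path is shown)

    def avg_time(self) -> Optional[float]:
        """Average time per execution, or None if the program never ran."""
        return self.time / self.cnt if self.cnt else None


def parse_progs(output: str) -> List[ProgStat]:
    """Parse the table printed by `tetra debug progs` (hypothetical helper)."""
    stats = []
    for line in output.splitlines():
        fields = line.split()
        # A data row has 6 fields, or 5 when the Name column is empty.
        if len(fields) not in (5, 6):
            continue
        try:
            ovh = float(fields[0])
            prog_id, cnt, time = int(fields[1]), int(fields[2]), int(fields[3])
        except ValueError:
            # Not a data row (e.g. the timestamp line or the header line).
            continue
        name = fields[4] if len(fields) == 6 else ""
        # The last field is always the Pin column.
        stats.append(ProgStat(ovh, prog_id, cnt, time, name, fields[-1]))
    return stats
```

Splitting on arbitrary whitespace keeps the parser independent of the exact column widths, while the field count distinguishes rows with and without a program name.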

To display all BPF programs on the host, use `--all`:

```
# tetra debug progs --all
```

This produces output like the following:

```
2024-10-31 11:19:37.720137195 +0000 UTC m=+7.165535117
Ovh(%)  Id   Cnt  Time   Name             Pin
0.00    159  2    82620  event_execve     -
0.00    171  68   18564  iter             -
0.00    158  2    10170  event_wake_up_n  -
0.00    164  2    4254   tg_kp_bprm_comm  -
0.00    157  2    3868   event_exit_acct  -
0.00    97   2    1680                    -
0.00    35   2    1442                    -
0.00    83   0    0      sd_devices       -
0.00    9    0    0                       -
0.00    7    0    0                       -
0.00    8    0    0                       -
0.00    87   0    0      sd_devices       -
...
```

The commands above work as-is when run from the Tetragon source tree.
To run them under Kubernetes, you currently need to pass extra
directory flags:

```
# kubectl exec -ti -n kube-system tetragon-66rk4 -c tetragon -- tetra debug progs --bpf-dir /run/cilium/bpffs/tetragon/ --all --bpf-lib /var/lib/tetragon/
```

Other options are available to customize the behaviour:

```
# tetra debug progs --help
Retrieve information about BPF programs on the host.

Examples:
- tetragon BPF programs top style
# tetra debug progs
- all BPF programs top style
# tetra debug progs --all
- one shot mode (displays one interval data)
# tetra debug progs --once
- change interval to 10 seconds
# tetra debug progs --timeout 10
- change interval to 10 seconds in one shot mode
# tetra debug progs --once --timeout 10

Usage:
  tetra debug progs [flags]

Aliases:
  progs, top

Flags:
      --all              Get all programs
      --bpf-dir string   Location of bpffs tetragon directory (default "/sys/fs/bpf/tetragon")
      --bpf-lib string   Location of Tetragon libs (btf and bpf files) (default "bpf/objs/")
  -h, --help             help for progs
      --no-clear         Do not clear screen between rounds
      --once             Run in one shot mode
      --timeout int      Interval in seconds (delay in one shot mode) (default 1)
```

## Metrics

The BPF subsystem provides performance data for each loaded program,
and Tetragon exports it as metrics.

For each loaded BPF program, the kernel provides:
- `run count`, which counts how many times the BPF program was executed
- `run time`, which sums the time the BPF program spent in all its executions


Hence, for each loaded BPF program, Tetragon exports 2 related metrics:

- `tetragon_overhead_program_runs_total[policy_namespace,policy,sensor,attach]`
- `tetragon_overhead_program_seconds_total[policy_namespace,policy,sensor,attach]`


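Each loaded program is identified by the following labels:

- `policy_namespace` is the Kubernetes namespace of the policy
- `policy` is the policy name
- `sensor` is the sensor name
- `attach` is the name of the program's attachment point (for example, the kernel function a kprobe is attached to)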


For example, if a `generic_kprobe` sensor is attached to the `__x64_sys_close` kernel function
by the `syswritefollowfdpsswd` policy, the related metrics look like this:

```
tetragon_overhead_program_runs_total{attach="__x64_sys_close",policy="syswritefollowfdpsswd",policy_namespace="",sensor="generic_kprobe"} 15894
tetragon_overhead_program_seconds_total{attach="__x64_sys_close",policy="syswritefollowfdpsswd",policy_namespace="",sensor="generic_kprobe"} 1.03908217e+08
```
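Because both metrics are cumulative counters sharing the same label set, dividing the time counter by the runs counter gives the average cost of one execution; using the difference between two scrapes restricts it to that interval. The following is a minimal Python sketch, assuming the Prometheus text format shown above; `parse_overhead` and `avg_time_per_run` are hypothetical helpers, the label parser does not handle escaped quotes inside label values, and the result is in whatever time unit the `seconds_total` counter actually carries.

```python
import re
from typing import Dict, FrozenSet, Optional, Tuple

RUNS = "tetragon_overhead_program_runs_total"
TIME = "tetragon_overhead_program_seconds_total"

# metric_name{label="value",...} value [optional timestamp]
_LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)(?:\s+\d+)?$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')

Labels = FrozenSet[Tuple[str, str]]


def parse_overhead(text: str) -> Dict[Labels, Dict[str, float]]:
    """Group the runs/time counters of each program by their label set."""
    out: Dict[Labels, Dict[str, float]] = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m or m.group(1) not in (RUNS, TIME):
            continue  # skip comments (# HELP, # TYPE) and unrelated metrics
        labels = frozenset(_LABEL.findall(m.group(2)))
        out.setdefault(labels, {})[m.group(1)] = float(m.group(3))
    return out


def avg_time_per_run(before: Dict[str, float],
                     after: Dict[str, float]) -> Optional[float]:
    """Average time per run between two scrapes (pass {} as `before` for the
    cumulative average since the stats were enabled)."""
    runs = after.get(RUNS, 0.0) - before.get(RUNS, 0.0)
    time = after.get(TIME, 0.0) - before.get(TIME, 0.0)
    return time / runs if runs > 0 else None
```

Keying on the full label set keeps programs from different policies or attachment points apart, even when they share the same sensor.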


## Limitations

Note that BPF program statistics are not enabled by default, because collecting them introduces extra overhead,
so you need to enable them manually.

- Either with `sysctl`:

```
# sysctl kernel.bpf_stats_enabled=1
```

and make sure you disable the stats when they are no longer needed:

```
# sysctl kernel.bpf_stats_enabled=0
```

- Or with the following `tetra` command:

```
# tetra debug enable-stats
^C
```

The stats stay enabled only while the command is running; it simply sleeps until you interrupt it with `^C`.
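When scripting a measurement, the sysctl can be switched on around the workload and restored afterwards. The following is a minimal Python sketch of that idea, assuming root privileges and the standard procfs file behind `kernel.bpf_stats_enabled` (`/proc/sys/kernel/bpf_stats_enabled`); `bpf_stats_enabled` is a hypothetical helper, and its `path` parameter exists only so the sketch can be exercised against a regular file. It restores the previous value rather than always writing `0`, so it does not switch off stats that were already enabled.

```python
import contextlib
from pathlib import Path
from typing import Iterator

SYSCTL_PATH = "/proc/sys/kernel/bpf_stats_enabled"


@contextlib.contextmanager
def bpf_stats_enabled(path: str = SYSCTL_PATH) -> Iterator[None]:
    """Enable BPF run-time stats for the duration of the `with` block."""
    p = Path(path)
    previous = p.read_text().strip()  # current setting, "0" or "1"
    p.write_text("1\n")
    try:
        yield
    finally:
        # Restore the original setting, even if the body raised.
        p.write_text(previous + "\n")
```

For example, `with bpf_stats_enabled(): run_workload()` keeps the stats on only while `run_workload` (a placeholder for your own code) executes.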
