docs: fixes on new persistent enforcement and cgroup rate pages
I was too slow to review #2748 properly and some things could have been
enhanced on the docs side. Here are some changes.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy committed Aug 5, 2024
1 parent d42569d commit b85e132
Showing 4 changed files with 197 additions and 171 deletions.
177 changes: 96 additions & 81 deletions docs/content/en/docs/concepts/cgroup-rate.md
@@ -1,6 +1,6 @@
---
title: "Cgroup rate throtling"
weight: 2
title: "Event throttling"
weight: 5
description: "Monitor and throttle cgroup events rate"
---

@@ -21,14 +21,17 @@ The throttle action generates following events:
- `THROTTLE` start event is sent when the group rate limit is crossed
- `THROTTLE` stop event is sent when the cgroup rate is again below the limit stable for 5 seconds

**NOTE** The threshold for given cgroup is monitored *per CPU*.
{{< note >}}
The threshold for given cgroup is monitored *per CPU*.
When the events are spread around on multiple CPUs we will throttle
them per CPU only if they cross the threshold on that CPU.
{{< /note >}}

**NOTE** At the moment we monitor and limit base sensor events:
{{< note >}}
At the moment we monitor and limit base sensor events:
- `PROCESS_EXEC`
- `PROCESS_EXIT`

{{< /note >}}
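
To illustrate the per-CPU note above, here is a toy model in Python. It is only a sketch under stated assumptions: the class `PerCpuRateModel` and its fixed-window counting are hypothetical and are not how Tetragon implements the check in BPF. It only shows why events spread across CPUs may stay under the limit while the same number of events on one CPU crosses it.

```python
from collections import defaultdict


class PerCpuRateModel:
    """Hypothetical fixed-window counter keyed by (cgroup, cpu).

    This is an illustration only, not Tetragon's actual algorithm.
    """

    def __init__(self, limit, interval):
        self.limit = limit        # allowed events per interval, per CPU
        self.interval = interval  # window length in seconds
        # (cgroup, cpu) -> [window_start, count]
        self.windows = defaultdict(lambda: [0.0, 0])

    def record(self, cgroup, cpu, now):
        """Count one event and return True if this CPU is over the limit."""
        window = self.windows[(cgroup, cpu)]
        if now - window[0] >= self.interval:
            # The previous window expired: start a new one at `now`.
            window[0], window[1] = now, 0
        window[1] += 1
        return window[1] > self.limit
```

With a limit of 10 events per second, 16 events split evenly over two CPUs never trip this model, while 11 events on a single CPU do.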

## Setup

@@ -63,95 +66,107 @@ The throttle events contains fields as follows.

- `THROTTLE_START`

```json
{
"process_throttle": {
"type": "THROTTLE_START",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:43.178407128Z"
}
```
```json
{
"process_throttle": {
"type": "THROTTLE_START",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:43.178407128Z"
}
```

- `THROTTLE_STOP`

```json
"process_throttle": {
"type": "THROTTLE_STOP",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:55.501718877Z"
```
```json
{
"process_throttle": {
"type": "THROTTLE_STOP",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:55.501718877Z"
}
```
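
The two events above can be paired to measure how long a cgroup stayed throttled. The following Python sketch assumes the events arrive as one JSON object per line with exactly the fields shown above; the helper names `parse_time` and `throttle_durations` are hypothetical and not part of Tetragon.

```python
import json
from datetime import datetime, timezone


def parse_time(ts):
    """Parse a timestamp such as 2024-07-26T13:07:43.178407128Z.

    The nanosecond fraction is truncated to microseconds, which is the
    finest resolution datetime supports.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def throttle_durations(lines):
    """Return (cgroup, seconds) for each THROTTLE_START/THROTTLE_STOP pair."""
    started = {}
    durations = []
    for line in lines:
        event = json.loads(line)
        throttle = event.get("process_throttle")
        if throttle is None:
            continue  # not a throttle event
        key = (event.get("node_name"), throttle["cgroup"])
        when = parse_time(event["time"])
        if throttle["type"] == "THROTTLE_START":
            started[key] = when
        elif throttle["type"] == "THROTTLE_STOP" and key in started:
            elapsed = (when - started.pop(key)).total_seconds()
            durations.append((key[1], elapsed))
    return durations
```

Fed the two sample events from this page, the sketch reports a single throttled period of roughly 12.3 seconds for `session-429.scope`.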


## Example

This example shows how to generate throttle events when cgroup rate monitoring is enabled.


- Start tetragon with cgroup rate monitoring 10 events per second, the successfull configuration will show in tetragon log

```
# tetragon --bpf-lib ./bpf/objs/ --cgroup-rate=10,1s
...
time="2024-07-26T13:33:19Z" level=info msg="Cgroup rate started (10/1s)"
...
```

- Spawn more than 10 events per second

```
$ while :; do sleep 0.001s; done
```

- Monitor events shows throttling


```
$ tetra getevents -o compact
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle START session-429.scope
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle STOP session-429.scope
```

When you stop the while loop from thr other terminal you will get above `throttle STOP` event after 5 seconds.
1. Start tetragon with cgroup rate monitoring 10 events per second.

```shell
tetragon --bpf-lib ./bpf/objs/ --cgroup-rate=10,1s
```

The successful configuration will show in tetragon log.

```
...
time="2024-07-26T13:33:19Z" level=info msg="Cgroup rate started (10/1s)"
...
```

1. Spawn more than 10 events per second.

```shell
while :; do sleep 0.001s; done
```

1. Monitor the events to see the throttling.


```shell
tetra getevents -o compact
```

The output should be similar to:

```
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle START session-429.scope
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle STOP session-429.scope
```

When you stop the while loop from the other terminal you will get the above
`throttle STOP` event after 5 seconds.


## Limitations

- The cgroup rate is monitored per CPU

- At the moment we monitor and limit base sensor and kprobe events:
- At the moment we only monitor and limit base sensor and kprobe events:
- `PROCESS_EXEC`
- `PROCESS_EXIT`

100 changes: 100 additions & 0 deletions docs/content/en/docs/concepts/enforcement/persistent-enforcement.md
@@ -0,0 +1,100 @@
---
title: "Persistent enforcement"
weight: 1
description: "How to configure persistent enforcement"
---

This page shows you how to configure persistent enforcement.

## Concept

The idea of persistent enforcement is to allow the enforcement policy to continue
running even when its tetragon process is gone.

This is configured with the `--keep-sensors-on-exit` option.

When the tetragon process exits, the policy stays active because it's pinned
in sysfs bpf tree under `/sys/fs/bpf/tetragon` directory.

When a new tetragon process is started, it performs the following actions:

- checks if there's existing `/sys/fs/bpf/tetragon` and moves it to
`/sys/fs/bpf/tetragon_old` directory;
- sets up configured policy;
- removes `/sys/fs/bpf/tetragon_old` directory.
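
The startup sequence above follows a move-aside, set-up, clean-up pattern. The Python sketch below simulates that pattern on an ordinary temporary directory. It is an assumption-laden illustration: `rotate_and_replace`, `setup_policy` and `demo` are hypothetical names, and real Tetragon works on pinned BPF objects in the bpf filesystem, not on regular files.

```python
import shutil
import tempfile
from pathlib import Path


def rotate_and_replace(root, setup_policy):
    """Simulate the startup steps under `root` (a stand-in for /sys/fs/bpf)."""
    current = root / "tetragon"
    old = root / "tetragon_old"
    if current.exists():
        # Step 1: move the existing tree aside so it stays in place
        # while the new one is being prepared.
        current.rename(old)
    current.mkdir()
    # Step 2: set up the configured policy in a fresh tree.
    setup_policy(current)
    if old.exists():
        # Step 3: remove the previous tree.
        shutil.rmtree(old)


def demo():
    """Run the simulation in a temporary directory and list what remains."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tetragon").mkdir()
        (root / "tetragon" / "stale_pin").write_text("old")
        rotate_and_replace(root, lambda d: (d / "new_pin").write_text("new"))
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

After `demo()` runs, only the fresh `tetragon` tree with its new entry remains. In this simulation the old tree is removed only after the new one is set up.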

## Example

This example shows how the persistent enforcement works on simple tracing policy.

1. Consider the following enforcement tracing policy that kills any process that touches `/tmp/tetragon` file.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "enforcement"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: int
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"
      matchActions:
      - action: Sigkill
```
1. Spawn tetragon with the above policy and `--keep-sensors-on-exit` option.

```shell
tetragon --bpf-lib bpf/objs/ --keep-sensors-on-exit --tracing-policy enforcement.yaml
```

1. Verify that the enforcement policy is in place.

```shell
cat /tmp/tetragon
```

The output should be similar to

```
Killed
```
1. Kill tetragon with <kbd>CTRL+C</kbd>.
```
time="2024-07-26T14:47:45Z" level=info msg="Perf ring buffer size (bytes)" percpu=68K total=272K
time="2024-07-26T14:47:45Z" level=info msg="Perf ring buffer events queue size (events)" size=63K
time="2024-07-26T14:47:45Z" level=info msg="Listening for events..."
^C
time="2024-07-26T14:50:50Z" level=info msg="Received signal interrupt, shutting down..."
time="2024-07-26T14:50:50Z" level=info msg="Listening for events completed." error="context canceled"
```
1. Verify that the enforcement policy is **STILL** in place.
```shell
cat /tmp/tetragon
```

The output should be still similar to

```
Killed
```
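
The manual check in the steps above can also be scripted. This Python sketch assumes a POSIX system, where `subprocess` reports a process terminated by a signal as a negative return code; the helper name `was_sigkilled` is hypothetical.

```python
import signal
import subprocess


def was_sigkilled(command):
    """Run `command` and report whether it was terminated by SIGKILL."""
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child died from signal N.
    return result.returncode == -signal.SIGKILL
```

Calling `was_sigkilled(["cat", "/tmp/tetragon"])` before and after stopping tetragon should return `True` both times if the policy is still enforced. It returns `False` when `cat` exits on its own, for instance with an error.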

## Limitations

At the moment we are not able to receive any events during the tetragon down time,
only the enforcement is in place.