Creating a TracingPolicyNamespaced with the same name for a different namespace does not get applied. #2299

Closed
Tracked by #1125
joshuajorel opened this issue Apr 5, 2024 · 14 comments · Fixed by #2337
Assignees
Labels
kind/bug Something isn't working

Comments

@joshuajorel
Contributor

What happened?

  1. In our K8s environment, we deployed the same policy to two different namespaces. However, only the first policy gets applied. We confirmed this by running the tetra tp list command on the Tetragon pods. We tested this behavior with the fd-install TracingPolicyNamespaced config in two different namespaces (default and test):
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "fd-install"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"
      matchActions:
      - action: Sigkill

The following is the output of tetra tp list:

[5] fd-install enabled:true filterID:5 namespace:default sensors:gkp-sensor-5

Only one policy is applied. The policy for the test namespace is not applied, even though the k8s resource exists.

Tetragon Version

CLI version: v1.0.1
Server version: v1.0.2 (installed via Helm)

Kernel Version

Linux ubuntu-noble 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Bugtool

time="2024-04-03T23:20:06Z" level=info msg="saving init info"
time="2024-04-03T23:20:06Z" level=info msg="retrieving lib directory" libDir=/var/lib/tetragon/
time="2024-04-03T23:20:06Z" level=warning msg="not an object file, ignoring" path=/var/lib/tetragon/
time="2024-04-03T23:20:10Z" level=info msg="skipping metadata directory" path=/var/lib/tetragon/metadata
time="2024-04-03T23:20:10Z" level=warning msg="no btf filename in tetragon config, attempting to fall back to /sys/kernel/btf/vmlinux"
time="2024-04-03T23:20:11Z" level=info msg="btf file added" btfFname=/sys/kernel/btf/vmlinux
time="2024-04-03T23:20:11Z" level=info msg="tetragon log file added" exportFname=/var/run/cilium/tetragon/tetragon.log
time="2024-04-03T23:20:11Z" level=info msg="contacting metrics server" metricsAddr="http://localhost:2112/metrics"
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd=/bin/dmesg dstFname=dmesg.out ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo ingress" dstFname=tc-info.lo.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo egress" dstFname=tc-info.lo.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 ingress" dstFname=tc-info.eth0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 egress" dstFname=tc-info.eth0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 ingress" dstFname=tc-info.vethf85e1c33.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 egress" dstFname=tc-info.vethf85e1c33.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 ingress" dstFname=tc-info.vethd2bc04e0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 egress" dstFname=tc-info.vethd2bc04e0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 ingress" dstFname=tc-info.vethd92333b8.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 egress" dstFname=tc-info.vethd92333b8.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea ingress" dstFname=tc-info.vethc9f1bfea.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea egress" dstFname=tc-info.vethc9f1bfea.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 ingress" dstFname=tc-info.vethdc8843f6.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 egress" dstFname=tc-info.vethdc8843f6.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 ingress" dstFname=tc-info.vethdd056db0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 egress" dstFname=tc-info.vethdd056db0.egress ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool map show -j" dstFname=bpftool-maps.json ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool prog show -j" dstFname=bpftool-progs.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/bpftool cgroup tree -j" dstFname=bpftool-cgroups.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stack localhost:8118" dstFname=gops.stack ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stats localhost:8118" dstFname=gpos.stats ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops memstats localhost:8118" dstFname=gops.memstats ret=0
time="2024-04-03T23:20:13Z" level=info msg="dumped tracing policies in tracing-policies.json"

Relevant log output

No response

Anything else?

No response

@joshuajorel joshuajorel added the kind/bug Something isn't working label Apr 5, 2024
@kkourt
Contributor

kkourt commented Apr 5, 2024

Indeed, this is currently the case, i.e., the policy name should be unique across all other policies. I believe this also includes non-namespaced policies. This can be fixed, but it requires a significant amount of changes. My suggestion would be to use different policy names.

@joshuajorel
Contributor Author

Is that the intended behavior? I can understand that policy names need to be unique for non-namespaced policies. However, it doesn't seem intuitive from a k8s perspective to not allow this behavior.

@kkourt
Contributor

kkourt commented Apr 8, 2024

It's not the intended behavior, and I agree that it's counterintuitive.

Originally, Tetragon did not support namespaced policies so we used the policy name as a key, to uniquely identify a policy. When we introduced namespaced policies, this was not changed and we were left with the above limitation.

Internally, we maintain a mapping from a string (the policy name) to a collection:

collections map[string]collection

Which is the internal state we keep for each policy:

// collection is a collection of sensors
// This can either be created from a tracing policy, or by loading sensors independently for sensors
// that are not loaded via a tracing policy (e.g., base sensor) and testing.
type collection struct {

Changing the code so that we use something like the following for the key:

type collection_key struct {
    name, namespace string
}

should allow us to have the same policy name in different namespaces.
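
To illustrate the idea above, here is a minimal, self-contained sketch of a map keyed by a (name, namespace) struct. It is not Tetragon's actual code: the `registry` type, its `add` method, the `sensors` field, and the convention that an empty namespace means a cluster-wide policy are all assumptions made for the example.

```go
package main

import "fmt"

// collectionKey identifies a policy by name and namespace.
// Assumption: an empty namespace denotes a cluster-wide (non-namespaced) policy.
type collectionKey struct {
	name, namespace string
}

// collection is a placeholder for the per-policy state (hypothetical field).
type collection struct {
	sensors []string
}

// registry is a hypothetical stand-in for whatever owns the collections map.
type registry struct {
	collections map[collectionKey]collection
}

func newRegistry() *registry {
	return &registry{collections: make(map[collectionKey]collection)}
}

// add registers a policy and fails only if the same (name, namespace) pair
// is already present. With a name-only string key, the second namespace
// would collide with the first.
func (r *registry) add(name, namespace string, c collection) error {
	key := collectionKey{name: name, namespace: namespace}
	if _, ok := r.collections[key]; ok {
		return fmt.Errorf("policy %q already exists in namespace %q", name, namespace)
	}
	r.collections[key] = c
	return nil
}

func main() {
	r := newRegistry()
	// Same policy name in two namespaces: both are accepted.
	fmt.Println(r.add("fd-install", "default", collection{sensors: []string{"gkp-sensor-1"}}))
	fmt.Println(r.add("fd-install", "test", collection{sensors: []string{"gkp-sensor-2"}}))
	// Same name and same namespace: rejected.
	fmt.Println(r.add("fd-install", "test", collection{}))
}
```

A struct whose fields are all strings is comparable in Go, so it can be used directly as a map key without any custom hashing.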

@mtardy mtardy added the release-blocker This PR or issue is blocking the next release. label Apr 8, 2024
@joshuajorel
Contributor Author

Would this be something the community would be interested in? I can contribute the change if it's not already being worked on.

@kkourt
Contributor

kkourt commented Apr 9, 2024

Would this be something the community would be interested in? I can contribute the change if it's not already being worked on.

We discussed this in the community call yesterday (https://docs.google.com/document/d/1BFMJLdtisiCSLfMct0GHof_ioL-5QVNLEaeMSlk_7Eo/edit), and the consensus was that this is something the community would definitely be interested in.

I'm not aware of anyone working on it, and we would gladly take this contribution. Happy to also guide along the way.

Thanks!

@joshuajorel
Contributor Author

@kkourt I created a draft PR here: #2337

The namespace policy does get separated:

[kind-tetragon-dev|kube-system] (base) ➜  ~ kubectl exec  ds/tetragon -c tetragon -- tetra tp list

ID   NAME                       STATE     FILTERID   NAMESPACE   SENSORS
2    file-monitoring-filtered   enabled   2          test        gkp-sensor-2
3    file-monitoring-filtered   enabled   3          test2       gkp-sensor-3

However, the policy doesn't seem to capture the events. Any clue as to where I should look?

@kkourt
Contributor

kkourt commented Apr 16, 2024

@kkourt I created a draft PR here: #2337

The namespace policy does get separated:

[kind-tetragon-dev|kube-system] (base) ➜  ~ kubectl exec  ds/tetragon -c tetragon -- tetra tp list

ID   NAME                       STATE     FILTERID   NAMESPACE   SENSORS
2    file-monitoring-filtered   enabled   2          test        gkp-sensor-2
3    file-monitoring-filtered   enabled   3          test2       gkp-sensor-3

Cool, thanks!

However, the policy doesn't seem to capture the events. Any clue as to where I should look?

Does everything work as expected if the policies have different names?

@joshuajorel
Contributor Author

@kkourt the policies do not seem to be enforced. I also don't see process_exit events as you normally should. Any suggestions where to look next?

@kkourt
Contributor

kkourt commented Apr 16, 2024

@kkourt the policies do not seem to be enforced. I also don't see process_exit events as you normally should. Any suggestions where to look next?

So you mean that even if the policy names are not the same, the policies do not take effect?
Can you please open a separate issue for this? Please include a sysdump, the policies themselves, and the expected and actual results of the policies.

@joshuajorel
Contributor Author

@kkourt - Just did a sanity check: I rebuilt the codebase using the main branch without these code changes, and indeed the sample policies are not taking effect. Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

@kkourt
Contributor

kkourt commented Apr 16, 2024

@kkourt - Just did a sanity check: I rebuilt the codebase using the main branch without these code changes, and indeed the sample policies are not taking effect. Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

I'm not sure about WSL2, but I wouldn't be surprised if there was an issue with it. Can you please create another issue with it? Should be possible to figure out what's wrong with a sysdump.

@joshuajorel
Contributor Author

@kkourt created a separate issue here

@kkourt
Contributor

kkourt commented Apr 17, 2024

@kkourt created a separate issue #2338

thanks!

Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

It seems that WSL2 is not working properly. We would need to investigate further to figure out how to address the issue. In the meantime, would it be possible to use another environment (e.g., a normal Linux VM) for testing? Thanks!

@kkourt kkourt self-assigned this Apr 17, 2024
@joshuajorel
Contributor Author

@kkourt will just be reinstalling my tools in a VM and continue testing there.

@mtardy mtardy removed the release-blocker This PR or issue is blocking the next release. label Apr 26, 2024
joshuajorel added a commit to joshuajorel/tetragon that referenced this issue May 3, 2024
TracingPolicy and TracingPolicyNamespaced encounter conflicts whenever a policy is applied with the same name. A policy with the same name, even if it is in a different namespace, does not get applied. This commit adds a collectionKey to differentiate policies with the same name but in different namespaces.

Fixes: cilium#2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
joshuajorel added a commit to joshuajorel/tetragon that referenced this issue May 15, 2024
TracingPolicy and TracingPolicyNamespaced encounter conflicts whenever a policy is applied with the same name. A policy with the same name, even if it is in a different namespace, does not get applied. This commit adds a collectionKey to differentiate policies with the same name but in different namespaces.

Fixes: cilium#2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
joshuajorel added a commit to joshuajorel/tetragon that referenced this issue May 15, 2024
By adding a collectionKey to differentiate namespaces, the CLI needs to be updated to support this update. This change adds a namespace argument to the CLI for the delete, enable, and disable commands. If the namespace argument is unset, it will reference the global TracingPolicy.

Fixes: cilium#2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
@kkourt kkourt mentioned this issue May 17, 2024
3 tasks
joshuajorel added a commit to joshuajorel/tetragon that referenced this issue May 20, 2024
By adding a collectionKey to differentiate namespaces, the CLI needs to be updated to support this update. This change adds a namespace flag to the CLI for the delete, enable, and disable commands. If the namespace flag is unset, it will reference the global TracingPolicy.

Fixes: cilium#2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
kkourt pushed a commit that referenced this issue May 22, 2024
TracingPolicy and TracingPolicyNamespaced encounter conflicts whenever a policy is applied with the same name. A policy with the same name, even if it is in a different namespace, does not get applied. This commit adds a collectionKey to differentiate policies with the same name but in different namespaces.

Fixes: #2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
kkourt pushed a commit that referenced this issue May 22, 2024
By adding a collectionKey to differentiate namespaces, the CLI needs to be updated to support this update. This change adds a namespace flag to the CLI for the delete, enable, and disable commands. If the namespace flag is unset, it will reference the global TracingPolicy.

Fixes: #2299

Signed-off-by: Joshua Jorel Lee <[email protected]>
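
The namespace-flag behavior described in the commit message above (an unset namespace refers to the global TracingPolicy) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `-namespace` flag name, the `policyKey` type, the `deletePolicy` helper, and the use of Go's standard `flag` package are all hypothetical and are not taken from the tetra CLI.

```go
package main

import (
	"flag"
	"fmt"
)

// policyKey mirrors the (name, namespace) key idea.
// Assumption: an empty namespace denotes the cluster-wide TracingPolicy.
type policyKey struct {
	name, namespace string
}

// deletePolicy removes the policy matching name and namespace and reports
// whether such a policy existed. Policies with the same name in other
// namespaces are left untouched.
func deletePolicy(policies map[policyKey]bool, name, namespace string) bool {
	key := policyKey{name: name, namespace: namespace}
	if _, ok := policies[key]; !ok {
		return false
	}
	delete(policies, key)
	return true
}

func main() {
	// Hypothetical flag name; the real CLI may spell it differently.
	namespace := flag.String("namespace", "", "policy namespace (empty means cluster-wide)")
	flag.Parse()

	// Three policies sharing one name: one cluster-wide, two namespaced.
	policies := map[policyKey]bool{
		policyKey{name: "fd-install"}:                       true,
		policyKey{name: "fd-install", namespace: "default"}: true,
		policyKey{name: "fd-install", namespace: "test"}:    true,
	}

	// The policy name is taken from the first positional argument, if any.
	name := "fd-install"
	if flag.NArg() > 0 {
		name = flag.Arg(0)
	}

	found := deletePolicy(policies, name, *namespace)
	fmt.Println("deleted:", found, "remaining:", len(policies))
}
```

Running the sketch without the flag targets only the cluster-wide entry, while passing `-namespace test` targets only the policy in the test namespace.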