nftables with conntrack can break kube-router #1777

Open
lspgn opened this issue Dec 6, 2024 · 3 comments
@lspgn

lspgn commented Dec 6, 2024

What happened?

When a certain conntrack rule is added via the nft command, it breaks the netpol controller when the controller tries to set up the firewall under k3s.
I created k3s-io/k3s#11415, but they indicated that the error was coming from this library.

Thank you for your help

The logs indicate the following:

Dec 05 08:05:15 abc k3s[10542]: panic: F1205 08:05:15.257193   10542 network_policy_controller.go:413] failed to list rules in filter table INPUT chain due to running [/usr/sbin/iptables -t filter -S INPUT --wait]: exit status 1: iptables v1.8.10 (nf_tables): chain `INPUT' in table `filter' is incompatible, use 'nft' tool.
Dec 05 08:05:15 abc k3s[10542]: goroutine 58811 [running]:
Dec 05 08:05:15 abc k3s[10542]: k8s.io/klog/v2.(*loggingT).output(0xaab94c0, 0x3, 0xc000a50de0, 0xc0017e9880, 0x1, {0x86a37a6?, 0x2?}, 0xc00fc63250?, 0x0)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/klog/[email protected]/klog.go:965 +0x73d
Dec 05 08:05:15 abc k3s[10542]: k8s.io/klog/v2.(*loggingT).printfDepth(0xaab94c0, 0x3, 0xc000a50de0, {0x0, 0x0}, 0x1, {0x6439f02, 0x37}, {0xc00fc2bde0, 0x2, ...})
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/klog/[email protected]/klog.go:767 +0x1f0
Dec 05 08:05:15 abc k3s[10542]: k8s.io/klog/v2.(*loggingT).printf(...)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/klog/[email protected]/klog.go:744
Dec 05 08:05:15 abc k3s[10542]: k8s.io/klog/v2.Fatalf(...)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/klog/[email protected]/klog.go:1655
Dec 05 08:05:15 abc k3s[10542]: github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).ensureTopLevelChains.func2({0x7298940, 0xc0101be730}, {0x62b5350, 0x5}, {0xc0101f4c00, 0x6, 0x6}, {0xc0101b4b00, 0x10}, 0x1)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/kube-router/[email protected]/pkg/controllers/netpol/network_policy_controller.go:413 +0x316
Dec 05 08:05:15 abc k3s[10542]: github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).ensureTopLevelChains(0xc008b53320)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/kube-router/[email protected]/pkg/controllers/netpol/network_policy_controller.go:467 +0x1be9
Dec 05 08:05:15 abc k3s[10542]: github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).Run(0xc008b53320, 0xc0065e1320, 0xc001a78a20, 0xc004e1a040)
Dec 05 08:05:15 abc k3s[10542]:         /go/pkg/mod/github.com/k3s-io/kube-router/[email protected]/pkg/controllers/netpol/network_policy_controller.go:168 +0x171
Dec 05 08:05:15 abc k3s[10542]: created by github.com/k3s-io/k3s/pkg/agent/netpol.Run in goroutine 1
Dec 05 08:05:15 abc k3s[10542]:         /go/src/github.com/k3s-io/k3s/pkg/agent/netpol/netpol.go:184 +0xe34
Dec 05 08:05:15 abc systemd[1]: k3s.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 05 08:05:15 abc systemd[1]: k3s.service: Failed with result 'exit-code'.

What did you expect to happen?

I expected the following call to return the rules instead of hitting the fatal branch:

rules, err := iptablesCmdHandler.List("filter", chain)
if err != nil {
	klog.Fatalf("failed to list rules in filter table %s chain due to %s", chain, err.Error())
}

How can we reproduce the behavior you experienced?

When the rule is added with nft:

sudo nft add rule filter INPUT ct state related,established accept

the command that the library runs fails:

$ sudo /usr/sbin/iptables -t filter -S INPUT --wait
iptables v1.8.10 (nf_tables): chain `INPUT' in table `filter' is incompatible, use 'nft' tool.

Whereas if the rule has been added with:

sudo iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

The command returns successfully and its output can be parsed:

$ sudo /usr/sbin/iptables -t filter -S INPUT --wait
-P INPUT DROP
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
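
To illustrate what "can be parsed" means here, this is a minimal Go sketch that splits `iptables -S` output for one chain into its policy and its appended rules. The type chainListing and the function parseListing are hypothetical names for illustration; this is not how kube-router or its iptables wrapper actually processes the output.

```go
package main

import (
	"fmt"
	"strings"
)

// chainListing is a hypothetical container for one chain's `iptables -S` output.
type chainListing struct {
	Policy string   // target from a "-P <chain> <target>" line; empty if none was seen
	Rules  []string // rule specs from "-A <chain> ..." lines, with "-A <chain>" stripped
}

// parseListing extracts the policy and appended rules for the given chain.
// Simplification: it splits on whitespace, so quoted arguments containing
// spaces (for example comments) are not preserved exactly.
func parseListing(chain, output string) chainListing {
	var l chainListing
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[1] != chain {
			continue // blank line, or a line that belongs to another chain
		}
		switch fields[0] {
		case "-P":
			if len(fields) >= 3 {
				l.Policy = fields[2]
			}
		case "-A":
			l.Rules = append(l.Rules, strings.Join(fields[2:], " "))
		}
	}
	return l
}

func main() {
	// Output copied from the working case in this issue.
	out := "-P INPUT DROP\n-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT\n"
	l := parseListing("INPUT", out)
	fmt.Printf("policy=%s rules=%q\n", l.Policy, l.Rules)
}
```

In the failing case there is nothing to parse at all, because iptables exits with status 1 before printing any rules.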

The problem might be in iptables-nft itself, but I wanted to raise it here since this library parses the command's output.

$ sudo nft -a -n list table ip filter
# Warning: table ip filter is managed by iptables-nft, do not touch!
table ip filter { # handle 32
	chain INPUT { # handle 1
		type filter hook input priority 0; policy drop;
		ct state related,established counter packets 0 bytes 0 accept # handle 164 --> this is with iptables command
		ct state 0x2,0x4 accept # handle 163 --> this is with the nft command
	}
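
For readers comparing the two rules above: the nft-added rule is shown with numeric ct state values (the listing was taken with -n). The Go sketch below decodes such a list using the symbolic constants that `nft describe ct state` reports. The function name decodeCtStates is hypothetical, and the sketch only handles single-bit values separated by commas, as in the listing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ctStateNames maps nftables ct state bit values to their symbolic names,
// as listed by `nft describe ct state`.
var ctStateNames = map[uint64]string{
	0x1:  "invalid",
	0x2:  "established",
	0x4:  "related",
	0x8:  "new",
	0x40: "untracked",
}

// decodeCtStates is a hypothetical helper that turns a comma-separated list
// such as "0x2,0x4" into state names. Combined bitmasks (e.g. 0x6) are not
// handled and produce an error.
func decodeCtStates(list string) ([]string, error) {
	var names []string
	for _, tok := range strings.Split(list, ",") {
		// Base 0 lets ParseUint accept the "0x" prefix.
		v, err := strconv.ParseUint(strings.TrimSpace(tok), 0, 32)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", tok, err)
		}
		name, ok := ctStateNames[v]
		if !ok {
			return nil, fmt.Errorf("unknown ct state value %#x", v)
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := decodeCtStates("0x2,0x4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(names, ","))
}
```

So 0x2,0x4 names the same two states, established and related, that the iptables-added rule matches; the difference lies in how each rule is stored and rendered, not in which states it matches.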

Screenshots / Architecture Diagrams / Network Topologies

Just a single node

System Information (please complete the following information)

  • Kube-Router Version (kube-router --version): v2.2.1
  • Kube-Router Parameters: unknown
  • Kubernetes Version (kubectl version) : v1.30.6
  • Cloud Type: on-premise
  • Kubernetes Deployment Type: k3s
  • Kube-Router Deployment Type: none
  • Cluster Size: 1

This is running on Ubuntu 24.04

$ uname -a
Linux abc 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov  4 02:06:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Logs, other output, metrics

NA

Additional context

NA

lspgn added the bug label Dec 6, 2024
@aauren
Collaborator

aauren commented Dec 6, 2024

That's good to know. I'm not totally sure that there's anything that the kube-router project can do about it. From what I can tell, it's likely an upstream netfilter incompatibility.

Essentially kube-router uses the old iptables legacy binaries from the netfilter project to generate nft rules. We've never done the conversion to using nft natively because:

  • The iptables legacy binaries have always seemed good enough
  • The iptables legacy binaries write to the same nft backend structures, so the data path results stay the same; it's only the loading that changes
  • Converting from iptables -> nft calls is a pretty big lift for kube-router, and it would probably take several releases to stabilize and find bugs, so it hasn't seemed worth it

In the past, the user-space tooling from the netfilter project has been pretty stable, but recently (in the last year and a half or so) it has become a lot less stable for some reason. We've found API incompatibilities between bug-fix releases of the tooling several times over the last few releases, and a current break in functionality has kept kube-router pinned to an older version of Alpine for the last 2 or 3 minor releases.

To be fair, I think that the upstream project has struggled to keep pace with the container ecosystem. Having a containerized version of the user-space tooling writing one set of rules to the kernel structures and having an OS host version of the user-space tooling writing to the same set of kernel structures is probably a use-case that they didn't architect for when starting out with netfilter.

It looks like this time, they've broken compatibility between the legacy binary and the rules that the nft binary writes. If this is a blocker for your workflow, or something that you want to save other people from stumbling across, I'd recommend reporting it upstream. If they fix it, then let us know what version it is fixed in and we'll try to upgrade our iptables userspace tooling in the container image.

@lspgn
Author

lspgn commented Dec 8, 2024

Thank you @aauren for this very in-depth response.

> That's good to know. I'm not totally sure that there's anything that the kube-router project can do about it. From what I can tell, it's likely an upstream netfilter incompatibility.

I totally understand. I mostly wanted to report a new way a breakage could happen and was curious about the nft roadmap (great insights too).
I'll keep using iptables-nft and look into reporting it upstream.

Feel free to close this issue. I can always re-open if this gets fixed upstream.

@aauren
Collaborator

aauren commented Dec 26, 2024

It looks like iptables-1.8.11 fixes the issue with checking iptables rules that was introduced in iptables-1.8.10 and that kept us on a legacy iptables userspace (1.8.9) in the kube-router container.

My hope is that once Alpine releases a version that includes the iptables-1.8.11 userspace, we can adopt it. That should make kube-router more resilient to differences in host userspace tooling versions, since 1.8.11 should be more compatible with newer userspace versions.

While we wait for Alpine to release, I've built my own iptables 1.8.11 packages for Alpine and included them in a PR build here: #1790

If the machine that you're testing doesn't have access to sensitive data, and you're willing to trust custom iptables binaries made by the project, and your cluster is using AMD64 architecture machines, you can give cloudnativelabs/kube-router-git:PR-1790 a try: https://hub.docker.com/layers/cloudnativelabs/kube-router-git/PR-1790/images/sha256-cbf606605c9e5d9111c952e96d337074e83a60e495f30cac74c5d85fd27252f2
