
Flannel Dualstack crash on 1.30.3 #10726

Open
ungarscool1 opened this issue Aug 18, 2024 · 8 comments
ungarscool1 commented Aug 18, 2024

Environmental Info:
K3s Version: v1.30.3+k3s1 (f646604)

Node(s) CPU architecture, OS, and Version:

  1. Linux REDACTED-server 6.8.0-38-generic #38-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 7 15:25:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  2. Linux kube-1 6.8.0-38-generic #38-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 7 15:25:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  3. Linux kube-2 6.8.0-38-generic #38-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 7 15:25:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 server, 2 agents. Flannel runs over WireGuard (this works fine on IPv4).

Describe the bug:

Aug 17 23:51:10 REDACTED-server k3s[1060103]: I0817 23:51:10.876617 1060103 kube.go:636] List of node(REDACTED-server) annotations: map[string]string{"alpha.kubernetes.io/provided-node-ip":"10.6.99.1,fdfe:a65f:20d::1", "csi.volume.kubernetes.io/nodeid":"{\"driver.longhorn.io\":\"REDACTED-server\"}", "etcd.k3s.cattle.io/local-snapshots-timestamp":"2024-08-17T23:16:35Z", "etcd.k3s.cattle.io/node-address":"10.6.99.1", "etcd.k3s.cattle.io/node-name":"REDACTED-server-89decbb8", "flannel.alpha.coreos.com/backend-data":"{\"VNI\":1,\"VtepMAC\":\"f2:4b:ca:0b:ef:e2\"}", "flannel.alpha.coreos.com/backend-type":"vxlan", "flannel.alpha.coreos.com/backend-v6-data":"{\"VNI\":1,\"VtepMAC\":\"2a:24:27:a4:a2:1b\"}", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"10.6.99.1", "flannel.alpha.coreos.com/public-ipv6":"fdfe:a65f:20d::1", "k3s.io/external-ip":"REDACTED-public-IPv4,REDACTED-public-IPv6", "k3s.io/hostname":"REDACTED-server", "k3s.io/internal-ip":"10.6.99.1,fdfe:a65f:20d::1", "k3s.io/node-args":"[\"server\",\"--write-kubeconfig-mode\",\"644\",\"--tls-san\",\"REDACTED-public-IPv4,REDACTED-public-IPv6\",\"--flannel-iface\",\"wg0\",\"--node-ip\",\"10.6.99.1,fdfe:a65f:20d::1\",\"--node-external-ip\",\"REDACTED-public-IPv4,REDACTED-public-IPv6\",\"--advertise-address\",\"10.6.99.1\",\"--cluster-cidr\",\"10.42.0.0/16,2001:cafe:42::/56\",\"--service-cidr\",\"10.43.0.0/16,2001:cafe:43::/112\",\"--flannel-ipv6-masq\",\"--cluster-init\"]", "k3s.io/node-config-hash":"REDACTED", "k3s.io/node-env":"{}", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
Aug 17 23:51:10 REDACTED-server k3s[1060103]: I0817 23:51:10.876871 1060103 vxlan.go:155] Interface flannel.1 mac address set to: f2:4b:ca:0b:ef:e2
Aug 17 23:51:10 REDACTED-server k3s[1060103]: I0817 23:51:10.878692 1060103 vxlan.go:183] Interface flannel-v6.1 mac address set to: 2a:24:27:a4:a2:1b
Aug 17 23:51:10 REDACTED-server k3s[1060103]: panic: runtime error: invalid memory address or nil pointer dereference
Aug 17 23:51:10 REDACTED-server k3s[1060103]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x598197]
Aug 17 23:51:10 REDACTED-server k3s[1060103]: goroutine 27215 [running]:
Aug 17 23:51:10 REDACTED-server k3s[1060103]: math/big.(*Int).Bytes(0x0)
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /usr/local/go/src/math/big/int.go:527 +0x17
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/flannel-io/flannel/pkg/ip.(*IP6).ToIP(0x0)
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/pkg/mod/github.com/flannel-io/[email protected]/pkg/ip/ip6net.go:82 +0x1c
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/flannel-io/flannel/pkg/ip.IP6Net.ToIPNet({0x0?, 0x0?})
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/pkg/mod/github.com/flannel-io/[email protected]/pkg/ip/ip6net.go:175 +0x25
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/flannel-io/flannel/pkg/ip.EnsureV6AddressOnLink({0x0?, 0xc0168920d8?}, {0xc01d400700?, 0x30?}, {0x71c2e30, 0xc015f86c40})
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/pkg/mod/github.com/flannel-io/[email protected]/pkg/ip/iface.go:298 +0x52
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/flannel-io/flannel/pkg/backend/vxlan.(*vxlanDevice).ConfigureIPv6(0xc0073391f0, {0x0?, 0xc001358550?}, {0xc01d400700?, 0x10?})
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/pkg/mod/github.com/flannel-io/[email protected]/pkg/backend/vxlan/device.go:153 +0x50
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/flannel-io/flannel/pkg/backend/vxlan.(*VXLANBackend).RegisterNetwork(0xc00f644a68, {0x71f93c0, 0xc001358550}, 0xc001358550?, 0xc01cf00b00)
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/pkg/mod/github.com/flannel-io/[email protected]/pkg/backend/vxlan/vxlan.go:228 +0xd25
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/k3s-io/k3s/pkg/agent/flannel.flannel({0x71f93c0, 0xc001358550}, 0xc023979fd0?, {0xc007415a40, 0x34}, {0xc007a48cf0, 0x2d}, 0x1, 0xb)
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/flannel.go:82 +0x222
Aug 17 23:51:10 REDACTED-server k3s[1060103]: github.com/k3s-io/k3s/pkg/agent/flannel.Run.func1()
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/setup.go:78 +0x46
Aug 17 23:51:10 REDACTED-server k3s[1060103]: created by github.com/k3s-io/k3s/pkg/agent/flannel.Run in goroutine 1
Aug 17 23:51:10 REDACTED-server k3s[1060103]:         /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/setup.go:77 +0x152
Aug 17 23:51:11 REDACTED-server systemd[1]: k3s.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

Steps To Reproduce:
Have this configuration:

/usr/local/bin/k3s \
    server \
        '--write-kubeconfig-mode' \
        '644' \
        '--tls-san' \
        'REDACTED-public-IPv4,REDACTED-public-IPv6' \
        '--flannel-iface' \
        'wg0' \
        '--node-ip' \
        '10.6.99.1,fdfe:a65f:20d::1' \
        '--node-external-ip' \
        'REDACTED-public-IPv4,REDACTED-public-IPv6' \
        '--advertise-address' \
        '10.6.99.1' \
        '--cluster-cidr' \
        '10.42.0.0/16,2001:cafe:42::/56' \
        '--service-cidr' \
        '10.43.0.0/16,2001:cafe:43::/112' \
        '--flannel-ipv6-masq' \
        '--cluster-init' \
  • Installed K3s: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.2+k3s1 sh -s - server --write-kubeconfig-mode 644 --tls-san REDACTED-public-IPv4 --flannel-iface wg0 --node-ip 10.6.99.1 --node-external-ip REDACTED-public-IPv4 --advertise-address 10.6.99.1. After that I modified the systemd service to add etcd, and now IPv6.

Wireguard configuration:

interface: wg0
  public key: REDACTED
  private key: (hidden)
  listening port: 51821

peer: REDACTED
  preshared key: (hidden)
  endpoint: REDACTED:20318
  allowed ips: 10.6.99.2/32, fdfe:a65f:20d::2/128
  latest handshake: 23 seconds ago
  transfer: 673.30 MiB received, 1.56 GiB sent

peer: REDACTED
  preshared key: (hidden)
  endpoint: REDACTED:37999
  allowed ips: 10.6.99.3/32, fdfe:a65f:20d::3/128
  latest handshake: 1 minute, 48 seconds ago
  transfer: 1.63 GiB received, 1.14 GiB sent
@brandond
Contributor

Just to be clear, this happened when trying to add IPv6 and etcd to a cluster that was started with sqlite and only IPv4?

@ungarscool1
Author

ungarscool1 commented Aug 18, 2024

My cluster was started on SQLite and IPv4; I switched from SQLite to etcd about two months ago.
Now I am trying to add IPv6 so that Traefik can serve on IPv4/IPv6. However, I just read in the documentation that I can't do dual-stack because I started my cluster IPv4-only.
So, do I really need to destroy my cluster?

@hofq

hofq commented Sep 4, 2024

Same issue here - tried switching from IPv4 to dual-stack. Running a single node.

@brandond
Contributor

brandond commented Sep 4, 2024

You can try deleting the node via kubectl delete node before restarting it as dual-stack.

The issue is that Kubernetes only assigns pod CIDRs to nodes when the node resource is created. If you try to switch from single-stack to dual-stack after the nodes have already joined the cluster, it won't add an IPv6 pod CIDR.
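The workaround described above can be sketched as follows. The node name and service name here are assumptions based on this issue (an agent node named kube-1 running the k3s-agent service); adjust for your own cluster, and note that deleting a node evicts its pods:

```shell
# Check whether the node already has an IPv6 pod CIDR assigned;
# a single (IPv4-only) entry means the node was created single-stack:
kubectl get node kube-1 -o jsonpath='{.spec.podCIDRs}'

# Delete the node object so Kubernetes re-creates it on the next join
# and assigns both an IPv4 and an IPv6 pod CIDR:
kubectl delete node kube-1

# On that node, restart the agent (already configured for dual-stack)
# so it re-registers with the cluster:
systemctl restart k3s-agent
```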

@hofq

hofq commented Sep 4, 2024

Looking good! Thank you very much. Maybe we can add error handling for this?

@brandond
Contributor

brandond commented Sep 4, 2024

We don't technically support changing CIDRs or other core bits of CNI config after the cluster is up, and we don't want to be in the business of deleting nodes for people... but yes Flannel could probably be fixed to not crash.

@ungarscool1
Author

Hi @brandond

You can try deleting the node via kubectl delete node before restarting it as dual-stack.

Even agent nodes, or just the server?

@brandond
Contributor

brandond commented Sep 5, 2024

Even agent nodes, or just the server?

All the nodes that you want to be dual-stack. As I said, they need to be deleted and recreated to get new CIDRs assigned.
