
[BUG] Invalid subnet CIDR block causes kube-ovn-cni pods to panic and enter CrashLoopBackOff #4699

Open
isabelleatkins opened this issue Nov 5, 2024 · 1 comment
Labels
bug Something isn't working subnet

Comments

@isabelleatkins

Kube-OVN Version

v1.11.5

Kubernetes Version

v1.28.6

Operation-system/Kernel Version

"Ubuntu 22.04.5 LTS"

5.15.0-124-generic

Description

A subnet created with an invalid CIDR block causes the kube-ovn-cni pods to panic and enter CrashLoopBackOff.

Instead, the API should validate the CIDR block: the kubectl command that creates a subnet with an incorrect CIDR should fail with a clear message explaining that the CIDR block has the wrong format.

In addition, the kube-ovn code itself should validate CIDR blocks: if it cannot parse one, it should handle the error rather than panic.

YAML of the subnet that sends the kube-ovn-cni pods into CrashLoopBackOff:

kubectl get subnet belle-test-3 -oyaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubeovn.io/v1","kind":"Subnet","metadata":{"annotations":{},"name":"belle-test-3"},"spec":{"cidrBlock":"101/64"}}
  creationTimestamp: "2024-10-30T17:22:51Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-10-31T13:45:03Z"
  finalizers:
  - kube-ovn-controller
  generation: 4
  name: belle-test-3
  resourceVersion: "157906490"
  uid: c2d755f2-f93a-469b-ad46-435ba74a5d52
spec:
  cidrBlock: 101/64
  default: false
  excludeIps:
  - 2002:0:0:1234::1
  gateway: 2002:0:0:1234::1
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv6
  provider: ovn
  vpc: ovn-cluster
status:
  activateGateway: ""
  conditions:
  - lastTransitionTime: "2024-10-31T13:28:19Z"
    lastUpdateTime: "2024-10-31T13:28:19Z"
    message: subnet belle-test-3 cidr 2002:0:0:1234::/64 is conflict with subnet belle-test-2
      cidr 2002:0:0:1234::/64
    reason: ValidateLogicalSwitchFailed
    status: "False"
    type: Validated
  - lastTransitionTime: "2024-10-30T17:22:54Z"
    lastUpdateTime: "2024-10-30T17:22:54Z"
    message: subnet belle-test-3 cidr 2002:0:0:1234::/64 is conflict with subnet belle-test-2
      cidr 2002:0:0:1234::/64
    reason: ValidateLogicalSwitchFailed
    status: "True"
    type: Error
  - lastTransitionTime: "2024-10-30T17:22:54Z"
    lastUpdateTime: "2024-10-30T17:22:54Z"
    message: subnet belle-test-3 cidr 2002:0:0:1234::/64 is conflict with subnet belle-test-2
      cidr 2002:0:0:1234::/64
    reason: ValidateLogicalSwitchFailed
    status: "False"
    type: Ready
  dhcpV4OptionsUUID: ""
  dhcpV6OptionsUUID: ""
  u2oInterconnectionIP: ""
  v4availableIPs: 0
  v4usingIPs: 0
  v6availableIPs: 1.8446744073709552e+19
  v6usingIPs: 0

Logs from the kube-ovn-cni pod:

E1101 16:04:53.669762 3360998 runtime.go:79] Observed a panic: &logrus.Entry{Logger:(*logrus.Logger)(0xc000264980), Data:logrus.Fields{"cidr":"101/64", "error":(*net.ParseError)(0xc000804440)}, Time:time.Date(2024, time.November, 1, 16, 4, 53, 669293613, time.Local), Level:0x0, Caller:(*runtime.Frame)(nil), Message:"Failed to parse CIDR", Buffer:(*bytes.Buffer)(nil), Context:context.Context(nil), err:""} (&{0xc000264980 map[cidr:101/64 error:invalid CIDR address: 101/64] 2024-11-01 16:04:53.669293613 +0000 UTC m=+0.516971737 panic <nil> Failed to parse CIDR <nil> <nil> })
goroutine 94 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x56a199a28240?, 0xc0003b5180})
        /home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        /home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
panic({0x56a199a28240, 0xc0003b5180})
        /opt/hostedtoolcache/go/1.19.9/x64/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).log(0xc0003b50a0, 0x0, {0xc000654018, 0x14})
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:260 +0x4a7
github.com/sirupsen/logrus.(*Entry).Log(0xc0003b50a0, 0x0, {0xc000b7d6e8?, 0x4?, 0xc000804440?})
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x4f
github.com/sirupsen/logrus.(*Entry).Panic(...)
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:342
github.com/alauda/felix/ip.MustParseCIDROrIP({0xc000a9e2a0, 0x6})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ip/ip_addr.go:214 +0x177
github.com/alauda/felix/ipsets.IPSetType.CanonicaliseMember({0x56a1988d1aff?, 0xc000b01bc8?}, {0xc000a9e2a0, 0x6})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipset_defs.go:172 +0x159
github.com/alauda/felix/ipsets.(*IPSets).filterAndCanonicaliseMembers(0xc000292480, {0x56a1988d1aff, 0x8}, {0xc0009426e0, 0x8, 0xc0000fc210?})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipsets.go:238 +0xed
github.com/alauda/felix/ipsets.(*IPSets).AddOrReplaceIPSet(0xc000292480, {{0x56a1988d0646, 0x7}, {0x56a1988d1aff, 0x8}, 0x100000}, {0xc0009426e0, 0x8, 0xb})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipsets.go:120 +0x22e
github.com/kubeovn/kube-ovn/pkg/daemon.(*Controller).setIPSet(0xc000a21600)
        /home/runner/work/kube-ovn/kube-ovn/pkg/daemon/gateway_linux.go:94 +0x31a
github.com/kubeovn/kube-ovn/pkg/daemon.(*Controller).Run(0xc000a21600, 0xc00051d1a0)
        /home/runner/work/kube-ovn/kube-ovn/pkg/daemon/controller.go:582 +0x3f7
created by github.com/kubeovn/kube-ovn/cmd/daemon.CmdMain
        /home/runner/work/kube-ovn/kube-ovn/cmd/daemon/cniserver.go:83 +0x5ca
panic: (*logrus.Entry) 0xc0003b5180 [recovered]
        panic: (*logrus.Entry) 0xc0003b5180

goroutine 94 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        /home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x56a199a28240, 0xc0003b5180})
        /opt/hostedtoolcache/go/1.19.9/x64/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).log(0xc0003b50a0, 0x0, {0xc000654018, 0x14})
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:260 +0x4a7
github.com/sirupsen/logrus.(*Entry).Log(0xc0003b50a0, 0x0, {0xc000b7d6e8?, 0x4?, 0xc000804440?})
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x4f
github.com/sirupsen/logrus.(*Entry).Panic(...)
        /home/runner/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:342
github.com/alauda/felix/ip.MustParseCIDROrIP({0xc000a9e2a0, 0x6})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ip/ip_addr.go:214 +0x177
github.com/alauda/felix/ipsets.IPSetType.CanonicaliseMember({0x56a1988d1aff?, 0xc000b01bc8?}, {0xc000a9e2a0, 0x6})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipset_defs.go:172 +0x159
github.com/alauda/felix/ipsets.(*IPSets).filterAndCanonicaliseMembers(0xc000292480, {0x56a1988d1aff, 0x8}, {0xc0009426e0, 0x8, 0xc0000fc210?})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipsets.go:238 +0xed
github.com/alauda/felix/ipsets.(*IPSets).AddOrReplaceIPSet(0xc000292480, {{0x56a1988d0646, 0x7}, {0x56a1988d1aff, 0x8}, 0x100000}, {0xc0009426e0, 0x8, 0xb})
        /home/runner/go/pkg/mod/github.com/kubeovn/[email protected]/ipsets/ipsets.go:120 +0x22e
github.com/kubeovn/kube-ovn/pkg/daemon.(*Controller).setIPSet(0xc000a21600)
        /home/runner/work/kube-ovn/kube-ovn/pkg/daemon/gateway_linux.go:94 +0x31a
github.com/kubeovn/kube-ovn/pkg/daemon.(*Controller).Run(0xc000a21600, 0xc00051d1a0)
        /home/runner/work/kube-ovn/kube-ovn/pkg/daemon/controller.go:582 +0x3f7
created by github.com/kubeovn/kube-ovn/cmd/daemon.CmdMain
        /home/runner/work/kube-ovn/kube-ovn/cmd/daemon/cniserver.go:83 +0x5ca

Steps To Reproduce

  1. Create file "subnet.yaml" with contents:
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: broken-cidr-subnet
spec:
  cidrBlock: 104/64
  2. Run kubectl apply -f subnet.yaml
  3. Run kubectl get pods -n kube-system and observe the kube-ovn-cni pods in CrashLoopBackOff

Current Behavior

The kube-ovn-cni pods panic and enter CrashLoopBackOff.

Expected Behavior

The kubectl command that applies the subnet with the incorrect CIDR fails with a clear message explaining why, e.g. "Invalid CIDR Block: should be of form ..."
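A sketch of the kind of check that could produce such a message. `validateCIDRBlock` is a hypothetical function; it assumes `spec.cidrBlock` holds either a single CIDR or, for dual stack, one IPv4 and one IPv6 CIDR separated by a comma. The exact error wording is only a suggestion.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateCIDRBlock is a hypothetical admission-time check for spec.cidrBlock.
// It returns an error a user can act on instead of letting a malformed value
// reach the CNI daemon.
func validateCIDRBlock(cidrBlock string) error {
	if strings.TrimSpace(cidrBlock) == "" {
		return fmt.Errorf("spec.cidrBlock must not be empty")
	}
	var v4, v6 int
	for _, part := range strings.Split(cidrBlock, ",") {
		part = strings.TrimSpace(part)
		ip, _, err := net.ParseCIDR(part)
		if err != nil {
			return fmt.Errorf("invalid CIDR block %q in spec.cidrBlock: should be of the form <address>/<prefix length>, e.g. 10.16.0.0/16 or fd00:10:16::/64", part)
		}
		if ip.To4() != nil {
			v4++
		} else {
			v6++
		}
	}
	// Assumed dual-stack rule: at most one CIDR per address family.
	if v4 > 1 || v6 > 1 {
		return fmt.Errorf("spec.cidrBlock %q may contain at most one IPv4 and one IPv6 CIDR", cidrBlock)
	}
	return nil
}

func main() {
	for _, c := range []string{"104/64", "10.16.0.0/16", "10.16.0.0/16,fd00:10:16::/64"} {
		if err := validateCIDRBlock(c); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", c)
		}
	}
}
```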

@isabelleatkins isabelleatkins added the bug Something isn't working label Nov 5, 2024
@dosubot dosubot bot added the subnet label Nov 5, 2024

dosubot bot commented Nov 5, 2024

@isabelleatkins Hello there! I'm Dosu, a bot here to help you with any bugs, questions, or contributions. Let's tackle this issue together!

To address the issue of invalid CIDR blocks causing the kube-ovn-cni pods to panic and enter CrashLoopBackOff, you can implement validation in the Kube-OVN API and code. Here are some steps and existing mechanisms that can help:

  1. Validation in the API: Implement validation checks in the Kube-OVN API to ensure that invalid CIDR blocks are not accepted. The ValidateSubnet function in pkg/util/validator.go already includes several validation checks for CIDR blocks, such as ensuring the CIDR is a valid format and that the gateway is within the CIDR block [1].

  2. Error Handling in CIDR Parsing: The AddOrUpdateSubnet function in pkg/ipam/ipam.go handles CIDR parsing and logs errors if the CIDR format is invalid. This function should be enhanced to ensure that any invalid CIDR blocks are caught and handled gracefully, preventing panics [2].

  3. Webhook Configurations: Use existing webhook configurations in pkg/webhook/subnet.go to validate subnet configurations during creation and updates. These webhooks can prevent invalid configurations from being applied [3].

  4. Error Handling in kube-ovn-cni: The kube-ovn-cni pod should handle errors related to CIDR parsing by logging warnings and continuing without adding the problematic CIDR to the list of addresses. This is done in the AddressSetUpdateAddress function, which logs a warning if a CIDR cannot be parsed [4].

By implementing these validation and error-handling mechanisms, you can prevent invalid CIDR blocks from causing the kube-ovn-cni pods to panic and enter CrashLoopBackOff.
