Node mac address not updated in SNAT info tables after VM image update #1204

robinvalk · 2024-10-21T10:58:52Z

Hi all,

We've encountered what we assume is a bug in the ACI CNI. (Please let us know if this is not the correct place to report issues that aren't provisioning-related.)

We run a native Kubernetes cluster on vSphere VMs. During one of our VM image upgrade procedures we noticed that the ACI CNI doesn't update the SNAT info tables correctly: neither the global table nor the node tables reflect the node changes after the template upgrade.

The upgrade procedure that triggers the bug looks as follows:

1. We taint the node we want to upgrade to remove any workloads from it.
2. We delete the node's VM.
3. Kubernetes recognises that the node is offline and marks it as not-ready.
4. We create a new VM instance using a new VM template image version.
5. The node VM comes back online and announces itself to the cluster. As it has the same name etc., the Kubernetes cluster recognises that this node is back online and marks it as ready again.
6. The ACI CNI detects that the node is back online and checks what needs to be updated.
7. It recognises that the same node is back online and, because of an apparent optimisation, it doesn't update any record in the SNAT info tables. As a result, the MAC address in the info table is never updated!
8. We untaint the node so workloads are scheduled to it again.

The result is that SNAT response packets are no longer redirected to pods running on this node. The other nodes assume that the upgraded node still has the old MAC address, because the ACI CNI's config update optimisation skipped the change. The ACI CNI recognises that the node has been updated but chooses not to act on it. This issue could be fixed by simply always acting on node updates.
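To illustrate what we think is happening, here is a minimal sketch of the two behaviours. It is only a model of the symptom described above, not the ACI CNI's actual code: the `NodeEvent` type, both handler functions, the node name and the MAC addresses are all hypothetical.

```python
# Hypothetical model of the suspected behaviour; NOT the ACI CNI's real code.
from dataclasses import dataclass


@dataclass
class NodeEvent:
    name: str  # Kubernetes node name, unchanged across the VM rebuild
    mac: str   # MAC address of the VM, which changes with the new VM


def handle_skip_known(table: dict[str, str], event: NodeEvent) -> None:
    """Suspected behaviour: a node that already has an entry is left untouched."""
    if event.name in table:
        return  # the "optimisation": same node name, so nothing is rewritten
    table[event.name] = event.mac


def handle_always_update(table: dict[str, str], event: NodeEvent) -> None:
    """Proposed behaviour: always act on the node update."""
    table[event.name] = event.mac


# Same node name before and after the rebuild, but a different MAC address.
old = NodeEvent("worker-1", "00:50:56:aa:aa:aa")
new = NodeEvent("worker-1", "00:50:56:bb:bb:bb")

stale_table: dict[str, str] = {}
handle_skip_known(stale_table, old)
handle_skip_known(stale_table, new)  # entry keeps the old MAC

fresh_table: dict[str, str] = {}
handle_always_update(fresh_table, old)
handle_always_update(fresh_table, new)  # entry now holds the new MAC

print("skip-known:   ", stale_table["worker-1"])
print("always-update:", fresh_table["worker-1"])
```

A cheaper variant would be to compare the recorded MAC with the one on the node and rewrite the entry only when they differ, which would keep most of the optimisation.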

Luckily we've found a workaround that ensures the SNAT table gets the correct node MAC address. Instead of only tainting the node to remove the workloads, we completely delete the node object from the cluster. This causes the ACI CNI to remove the node's entry from the SNAT tables. Once the node comes back online and registers itself, it's simply seen as a new node and a new entry (with the correct MAC address) is added to the SNAT table.
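For anyone wanting to script the workaround, the sequence could look roughly like the sketch below. It only builds and prints the `kubectl` commands rather than running them. The taint key/value (`maintenance=true`) and the node name are placeholders, not what we actually use, and the VM deletion and re-creation steps are left out because they depend on your vSphere tooling.

```python
# Sketch of the workaround as a dry run: builds the kubectl commands, runs nothing.
# The taint "maintenance=true:NoExecute" and the node name are placeholders.
import shlex


def workaround_commands(node: str, taint: str = "maintenance=true:NoExecute") -> list[list[str]]:
    """Return the kubectl invocations for the workaround, in order."""
    return [
        # Evict workloads from the node, as in the normal upgrade procedure.
        ["kubectl", "taint", "nodes", node, taint],
        # The extra step: delete the Node object so its SNAT entry is removed.
        ["kubectl", "delete", "node", node],
    ]


commands = workaround_commands("worker-1")
for argv in commands:
    print(shlex.join(argv))
```

After these two steps the VM is replaced as usual, and the node registers itself again as a new node.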

SNAT global info table CRD: aci.snat/v1 snatglobalinfos
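To spot stale entries, something like the following sketch could be run against the JSON of a `snatglobalinfos` object (for example, the output of `kubectl get snatglobalinfos -o json` for a single object). The field names `spec.globalInfos` and `macAddress` are my assumption of the CRD layout and should be checked against your cluster; the sample data, node names and MAC addresses are made up.

```python
# Sketch: list nodes whose recorded MAC differs from the node's real MAC.
# ASSUMPTION: the object has spec.globalInfos as a map of node name -> list of
# entries, each with a "macAddress" field. Verify this against your cluster.


def stale_mac_entries(snat_global_info: dict, actual_macs: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (node, recorded_mac, actual_mac) for every mismatching entry."""
    stale = []
    global_infos = snat_global_info.get("spec", {}).get("globalInfos", {})
    for node, entries in global_infos.items():
        actual = actual_macs.get(node)
        if actual is None:
            continue  # no known MAC for this node, nothing to compare
        for entry in entries:
            recorded = entry.get("macAddress", "")
            if recorded.lower() != actual.lower():  # MACs are case-insensitive
                stale.append((node, recorded, actual))
    return stale


# Made-up sample: worker-1 was rebuilt and has a new MAC, worker-2 was not.
sample = {
    "spec": {
        "globalInfos": {
            "worker-1": [{"macAddress": "00:50:56:AA:AA:AA"}],
            "worker-2": [{"macAddress": "00:50:56:cc:cc:cc"}],
        }
    }
}
actual = {"worker-1": "00:50:56:bb:bb:bb", "worker-2": "00:50:56:CC:CC:CC"}

result = stale_mac_entries(sample, actual)
for node, recorded, real in result:
    print(f"{node}: table has {recorded}, node has {real}")
```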

fwardzic · 2024-11-04T20:07:33Z

Hi Robin,
Thanks for submitting this issue in such detail. We are looking into it.

fwardzic self-assigned this Nov 4, 2024
