Getting "watch of *v1alpha1.PolicyEndpoint ended with: an error on the server" after upgrading VPC CNI to v1.17.1+ with aws-network-policy-agent v1.1.0 #257

ArtemProskochylo · 2024-04-24T15:15:41Z

What happened:
After upgrading the vpc-cni plugin to v1.17.1 and v1.18.0, I see many errors from the aws-network-policy-agent container (v1.1.0). The issue occurs even on fresh EKS installations where we are not using Network Policies.

Attach logs
W0424 08:27:34.397257 1 reflector.go:462] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

What you expected to happen:
No error messages.

How to reproduce it (as minimally and precisely as possible):

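Deploy an EKS v1.29 cluster.
Install the VPC CNI add-on, version v1.17.1-eksbuild.1 or v1.18.0-eksbuild.1.
Run kubectl -n kube-system logs <aws-node pod> -c aws-network-policy-agent (the network policy agent runs as its own container in each aws-node pod).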
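For anyone trying to reproduce this without Terraform, the steps above could be approximated with an eksctl config like the one below. This is a hypothetical sketch: the reporter used Terraform, and the cluster name, region, node group name and instance type are placeholders chosen for illustration. Only the Kubernetes version, the add-on version and the Bottlerocket AMI family come from the report.

```yaml
# Hypothetical eksctl config approximating the reproduction steps.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cni-257-repro        # placeholder cluster name
  region: eu-west-1          # placeholder region
  version: "1.29"            # EKS version from the report
addons:
  - name: vpc-cni
    version: v1.17.1-eksbuild.1   # or v1.18.0-eksbuild.1
managedNodeGroups:
  - name: bottlerocket-ng    # placeholder node group name
    amiFamily: Bottlerocket  # the report used Bottlerocket OS 1.19.2
    instanceType: m5.large   # placeholder instance type
    desiredCapacity: 2
```

Create the cluster with `eksctl create cluster -f cluster.yaml` (file name assumed), then check the agent logs as in the last step.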

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version):
Client Version: v1.29.1
Server Version: v1.29.1-eks-b9c9ed7
CNI Version: v1.17.1 and v1.18.0
Network Policy Agent Version: v1.1.0
OS (e.g: cat /etc/os-release): Bottlerocket OS 1.19.2 (aws-k8s-1.29)
Kernel (e.g. uname -a): 6.1.77


achevuru · 2024-06-03T21:01:05Z

@ArtemProskochylo How did you upgrade the VPC CNI version? It appears that you're missing the required permissions for the aws-node pod. Did you apply the corresponding version-specific manifest?

danielap-ma · 2024-06-19T08:04:41Z

I'm facing the same issue after upgrading to EKS 1.29 with CNI 1.18.0.
@achevuru I upgraded the add-on directly through AWS using Terraform. I checked the ClusterRole configuration, and it has the permissions you referred to:

```yaml
- apiGroups:
    - networking.k8s.aws
  resources:
    - policyendpoints
  verbs:
    - get
    - list
    - watch
```

Seems like a bug.

achevuru · 2024-06-19T16:46:23Z

@danielap-ma If you're seeing the same error as above, then either the permissions are missing (please check that the CNI pods have the correct service account in place) or there are connectivity issues with your API server. I quickly tried it and don't see any such issues on my end.
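As a rough guide to what "correct SA" means here: in the upstream VPC CNI manifests, the aws-node DaemonSet pods run as the kube-system/aws-node service account (`spec.template.spec.serviceAccountName: aws-node`), and that account is bound to the aws-node ClusterRole that carries the policyendpoints rules quoted above. The binding below sketches that shape, assuming the default upstream names; an EKS-managed add-on may differ, so use it for comparison rather than applying it blindly.

```yaml
# Sketch of the expected binding, assuming default upstream names.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aws-node             # ClusterRole holding the policyendpoints rules
subjects:
  - kind: ServiceAccount
    name: aws-node           # must match the DaemonSet's serviceAccountName
    namespace: kube-system
```

To test the effective permission directly, impersonate the service account: `kubectl auth can-i watch policyendpoints.networking.k8s.aws -A --as=system:serviceaccount:kube-system:aws-node`. A `yes` suggests RBAC is not the cause, which points back at API server connectivity.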

ArtemProskochylo · 2024-06-19T20:06:00Z

> @ArtemProskochylo How did you upgrade the VPC CNI version? It appears that you're missing the required permissions for the aws-node pod. Did you apply the corresponding version-specific manifest?

Hi @achevuru
Sorry for the late response. It was also updated through Terraform, but in my case only the add-on version was set through Terraform; the ConfigMaps, DaemonSet, and other resources are managed by AWS. I have checked the RBAC rules for vpc-cni v1.17.1, and the required permissions are present:
```yaml
- apiGroups:
    - networking.k8s.aws
  resources:
    - policyendpoints
  verbs:
    - get
    - list
    - watch
- apiGroups:
    - networking.k8s.aws
  resources:
    - policyendpoints/status
  verbs:
    - get
```

But I still see the following error in logs for v1.17.1:
W0509 03:34:41.481449 1 reflector.go:462] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

In another cluster running the updated version v1.18.1, I do not see those errors. I suppose it is a version-specific issue.

I hope the provided info is useful.

Thanks

omfurman-ma · 2024-07-24T11:27:33Z

> In another cluster running the updated version v1.18.1, I do not see those errors. I suppose it is a version-specific issue.

Hey @achevuru,
I'm working with @danielap-ma on this issue. We still see these errors even though the CNI pods have the right SA, as Daniel wrote above. Is there anything we can do to resolve them?

maiconrocha · 2024-08-06T02:48:13Z

Hi @omfurman-ma @danielap-ma, can you please make sure the eks:addon-cluster-admin ClusterRoleBinding is deployed in your cluster? If not, please follow the solution provided at https://repost.aws/questions/QUEAwOTFmCTLG-SzJQOhkx3w/accessdenied-when-create-ebs-csi-driver
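For reference, a ClusterRoleBinding with that name usually binds the built-in cluster-admin ClusterRole to the eks:addon-manager user, which EKS uses to manage add-on resources. The manifest below is a sketch based on that understanding, not copied from the linked answer; check it against the re:Post page before applying it.

```yaml
# Sketch of the eks:addon-cluster-admin binding; verify against the linked answer.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eks:addon-cluster-admin
subjects:
  - kind: User
    name: eks:addon-manager    # identity EKS uses to reconcile managed add-ons
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin          # built-in Kubernetes ClusterRole
  apiGroup: rbac.authorization.k8s.io
```

Whether the binding already exists can be checked with `kubectl get clusterrolebinding eks:addon-cluster-admin`.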

ArtemProskochylo added the "bug" (Something isn't working) label on Apr 24, 2024.
