Getting Failed to watch of *v1alpha1.PolicyEndpoint ended with: an error on the server after upgrading VPC CNI to v1.17.1+ version with aws-network-policy-agent v1.1.0 #257

Open
ArtemProskochylo opened this issue Apr 24, 2024 · 6 comments
Labels
bug Something isn't working


What happened:
After upgrading the vpc-cni plugin to v1.17.1 or v1.18.0, I see a lot of errors from the aws-network-policy-agent container (v1.1.0). The issue occurs even on fresh EKS installations where we are not using Network Policies.

Attach logs
W0424 08:27:34.397257 1 reflector.go:462] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
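
When many of these warnings pile up, it can help to group them by the underlying cause quoted inside the parentheses (here `context canceled`). The following is a minimal Python sketch, not part of the agent or of client-go. The function name `summarize_watch_errors` and the regular expression are assumptions derived only from the log line above, and the sample line abbreviates the module path with `...`.

```python
import re
from collections import Counter

# Assumed pattern, built from the single warning quoted in this issue:
# it captures the watched kind and the cause inside the quoted message.
WATCH_ERROR = re.compile(
    r'watch of (?P<kind>\S+) ended with: an error on the server '
    r'\("unable to decode an event from the watch stream: (?P<cause>[^"]+)"\)'
)


def summarize_watch_errors(lines):
    """Count watch failures per (resource kind, underlying cause)."""
    counts = Counter()
    for line in lines:
        match = WATCH_ERROR.search(line)
        if match:
            counts[(match.group("kind"), match.group("cause"))] += 1
    return counts


# Sample input: the warning from this issue (path shortened) plus an unrelated line.
sample = [
    'W0424 08:27:34.397257 1 reflector.go:462] ... watch of '
    '*v1alpha1.PolicyEndpoint ended with: an error on the server '
    '("unable to decode an event from the watch stream: context canceled") '
    'has prevented the request from succeeding',
    'some unrelated log line',
]

for (kind, cause), n in summarize_watch_errors(sample).most_common():
    print(n, kind, cause)
```

In practice the `lines` argument could be any iterable of log lines, for example a file object holding saved container logs.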

What you expected to happen:
No error messages.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a v1.29 EKS cluster.
  2. Deploy the VPC CNI add-on, version v1.17.1-eksbuild.1 or v1.18.0-eksbuild.1.
  3. Run kubectl -n kube-system logs aws-node-*

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: v1.29.1
    Server Version: v1.29.1-eks-b9c9ed7
  • CNI Version: v1.17.1 and v1.18.0
  • Network Policy Agent Version: v1.1.0
  • OS (e.g: cat /etc/os-release): Bottlerocket OS 1.19.2 (aws-k8s-1.29)
  • Kernel (e.g. uname -a): 6.1.77
ArtemProskochylo added the bug label Apr 24, 2024
@achevuru
Contributor

achevuru commented Jun 3, 2024

@ArtemProskochylo How did you upgrade the VPC CNI version? It appears that you're missing the required permissions for the aws-node pod. Did you apply the corresponding version specific manifest?

@danielap-ma

danielap-ma commented Jun 19, 2024

Facing the same issue after upgrading to EKS 1.29 with CNI 1.18.0.
@achevuru I upgraded the addon directly from AWS using Terraform. I checked the ClusterRole configuration and it has the permissions you referred to:

    - apiGroups:
      - networking.k8s.aws
      resources:
      - policyendpoints
      verbs:
      - get
      - list
      - watch

Seems like a bug.
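
A check like the one described in this comment can also be done mechanically. Below is a rough Python sketch, assuming the ClusterRole has been exported as JSON (for example with `kubectl get clusterrole <name> -o json`) and loaded into a dict. The helper name `grants_policyendpoint_watch` is hypothetical, and the check looks at one rule at a time, so verbs split across several rules would be reported as missing.

```python
import json

# Verbs a watch-based informer needs on the resource.
REQUIRED_VERBS = {"get", "list", "watch"}


def grants_policyendpoint_watch(cluster_role: dict) -> bool:
    """Return True if a single rule grants get/list/watch on policyendpoints."""
    for rule in cluster_role.get("rules", []):
        groups = set(rule.get("apiGroups") or [])
        resources = set(rule.get("resources") or [])
        verbs = set(rule.get("verbs") or [])
        group_ok = bool(groups & {"networking.k8s.aws", "*"})
        resource_ok = bool(resources & {"policyendpoints", "*"})
        verbs_ok = "*" in verbs or REQUIRED_VERBS <= verbs
        if group_ok and resource_ok and verbs_ok:
            return True
    return False


# Illustrative input mirroring the rule quoted in this comment.
role = json.loads("""
{
  "rules": [
    {
      "apiGroups": ["networking.k8s.aws"],
      "resources": ["policyendpoints"],
      "verbs": ["get", "list", "watch"]
    }
  ]
}
""")

print(grants_policyendpoint_watch(role))
```

A passing check only confirms that the rule exists in the role. It says nothing about whether the role is bound to the ServiceAccount the pods actually run with, or about connectivity to the API server.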

@achevuru
Contributor

@danielap-ma If you're seeing the same error as above, then either the permissions are missing (please check that the CNI pods have the correct ServiceAccount in place) or there are connectivity issues with your API server. I quickly tried it and I don't see any such issues on my end.

@ArtemProskochylo
Author

> @ArtemProskochylo How did you upgrade the VPC CNI version? It appears that you're missing the required permissions for the aws-node pod. Did you apply the corresponding version specific manifest?

Hi @achevuru
Sorry for the late response. It was also upgraded through Terraform. In my case, only the add-on version is set through Terraform; the ConfigMaps, the DaemonSet, and other resources are managed by AWS. I checked the RBAC rules for vpc-cni v1.17.1, and the required permissions are present:
    - apiGroups:
      - networking.k8s.aws
      resources:
      - policyendpoints
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - networking.k8s.aws
      resources:
      - policyendpoints/status
      verbs:
      - get

But I still see the following error in logs for v1.17.1:
W0509 03:34:41.481449 1 reflector.go:462] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

In another cluster running the updated version v1.18.1, I do not see those errors. I suppose it is a version-specific issue.

I hope the information provided is useful.

Thanks

@omfurman-ma

omfurman-ma commented Jul 24, 2024

> In another cluster running the updated version v1.18.1, I do not see those errors. I suppose it is a version-specific issue.

Hey @achevuru,
I'm working with @danielap-ma on this issue. We still see these errors even though the CNI pods have the right SA, as Daniel wrote in the comment above. Is there anything we can do to overcome these errors?

@maiconrocha

Hi @omfurman-ma @danielap-ma, can you please ensure you have the eks:addon-cluster-admin ClusterRoleBinding deployed in your cluster? If not, please follow the solution provided at https://repost.aws/questions/QUEAwOTFmCTLG-SzJQOhkx3w/accessdenied-when-create-ebs-csi-driver
