can not create pod on specific node any more #9218

Open
hchyue opened this issue Sep 11, 2024 · 6 comments

@hchyue

hchyue commented Sep 11, 2024

I have an RKE2 cluster with 3 master nodes and 2 worker nodes.
image
node1 has 134 running pods.
node2 has 273 running pods.
Calico info:
image
No more pods can run on node2. The events for a new pod on node2 are:
image

image

I cannot find the root cause. Any help is appreciated.

@hchyue
Author

hchyue commented Sep 24, 2024

I found that no corresponding ipamhandle resource was created:
image

@hchyue
Author

hchyue commented Sep 24, 2024

The calico-kube-controllers pod is on node2. Its logs contain "healthz check failed", although etcd is in fact running.
image
image

@hchyue
Author

hchyue commented Sep 24, 2024

time in apiserver logs
image

@coutinhop
Contributor

> node1 has 134 running pod
> node2 has 273 running pod

@hchyue you seem to be pushing past the recommended limits of Kubernetes itself: https://kubernetes.io/docs/setup/best-practices/cluster-large/

Furthermore, a /32 block size means you have only one IP address per block, which will surely have performance implications at this scale. Any specific reason for using that?
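
To make the block-size arithmetic concrete, here is a minimal Python sketch. It assumes IPv4 and uses the pod counts reported in this issue; the /26 comparison is Calico's default block size, and the helper names `addresses_per_block` and `blocks_needed` are made up for illustration. The block counts are a lower bound from simple division, not a statement of how Calico's IPAM actually allocates.

```python
import math


def addresses_per_block(block_size: int) -> int:
    """Number of IPv4 addresses in one IPAM block with the given prefix length."""
    if not 0 <= block_size <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - block_size)


def blocks_needed(pod_count: int, block_size: int) -> int:
    """Minimum number of blocks required to give every pod its own address."""
    return math.ceil(pod_count / addresses_per_block(block_size))


# Pod counts taken from the issue description.
pods_per_node = {"node1": 134, "node2": 273}

for node, pods in pods_per_node.items():
    for size in (32, 26):  # /32 as configured here, /26 for comparison
        print(f"{node}: {pods} pods at /{size} -> at least {blocks_needed(pods, size)} blocks")
```

With a /32 block size every pod needs its own block, so the number of blocks to track grows one-for-one with the pod count.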

@hchyue
Author

hchyue commented Sep 25, 2024

We are conducting stress tests.

We use BGP mode. When the block size is not 32 and a pod with a persistent IP address restarts on another node, a blackhole route on the original node makes the pod unreachable from that node.

@caseydavenport
Member

> persistent IP addresses restart on other nodes,there is a blackhole route causing network unreachable to the pod from original node.

This isn't expected behavior - Calico should advertise a /32 route that takes precedence over the blackhole route.
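
The precedence described here is ordinary longest-prefix matching: among all routes that contain the destination, the one with the longest prefix wins. The following Python sketch illustrates the idea only. The addresses, the /26 block, and the `lookup` helper are hypothetical, and this is a simplified model rather than how the kernel routing table or Calico is implemented.

```python
import ipaddress


def lookup(routes, destination):
    """Return the (network, action) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, action in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, action))
    if not matches:
        return None
    # Longest-prefix match: the most specific route takes precedence.
    return max(matches, key=lambda match: match[0].prefixlen)


# Hypothetical routing table on the original node.
routes = [
    ("10.42.1.0/26", "blackhole"),         # the node's own IPAM block
    ("10.42.1.5/32", "via another node"),  # pod that moved, learned over BGP
]

print(lookup(routes, "10.42.1.5"))  # matched by the /32 route
print(lookup(routes, "10.42.1.6"))  # only the blackhole route matches
```

In this model, traffic to the moved pod reaches the blackhole only if the /32 route is missing from the table, which would be the thing to check on the original node.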
