[GKE Standard/Terraform] Cluster nodes aren't upgrading with the control plane #3339

Closed
markmandel opened this issue Aug 22, 2023 · 3 comments · Fixed by #3612
Labels: area/operations, good first issue, help wanted, kind/bug

Comments

markmandel (Member) commented Aug 22, 2023

What happened:

When running terraform apply on the GKE Terraform modules repeatedly across multiple Agones versions, I noticed that while my Kubernetes control plane version is updating, the nodes aren't being updated.

I noticed this in the Global Scale Game demo (googleforgames/global-multiplayer-demo#190): my control plane was on 1.27.3-gke.100, but my nodes, created a while ago, remained on 1.24.9-gke.3200.

This is the source: https://github.com/googleforgames/global-multiplayer-demo/blob/main/infrastructure/agones-gke.tf

This doesn't seem to be an issue with Autopilot.

What you expected to happen:

Nodes to be upgraded with the Control Plane.

How to reproduce it (as minimally and precisely as possible):

Create a cluster with the Terraform module with an older Kubernetes version set, then change the Kubernetes version and apply again. The node version won't change.
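
For illustration, here is a minimal sketch of the kind of configuration that could reproduce this. It is not the actual Agones module source: the resource name, cluster name, location, machine type, and the `kubernetes_version` variable are all hypothetical. It assumes the node pool is declared inline in the cluster resource.

```hcl
# Hypothetical reproduction sketch; names and values are illustrative only.
variable "kubernetes_version" {
  type    = string
  default = "1.24" # later changed to "1.27" and applied again
}

resource "google_container_cluster" "example" {
  name     = "agones-example"
  location = "us-west1-c"

  # Changing this value asks for a newer control plane on the next apply.
  min_master_version = var.kubernetes_version

  # Node pool declared inline, inside the cluster resource.
  node_pool {
    name       = "default"
    node_count = 4

    node_config {
      machine_type = "e2-standard-4"
    }
  }
}
```

With a layout like this, the second apply would be expected to move the control plane while the existing nodes keep their original version.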

Anything else we need to know?:

Environment:

  • Agones version: 1.33.0
  • Kubernetes version (use kubectl version): 1.27.x
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s): N/A
  • Others: N/A
markmandel added the kind/bug, area/operations, and help wanted labels Aug 22, 2023
markmandel (Member, Author) commented:

Digging in, I'm not 100% sure that TF will actually upgrade the nodes that exist (it's possible that google_container_cluster is immutable?)

Looking at hashicorp/terraform-provider-google#10895 (and https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle), it gives me the idea that we should put the Kubernetes version in the node pool name, so that on upgrade a new node pool is created with the new version and the old one is then deleted, migrating the pods across.

Not quite as nice as a rolling update within a node pool, but it's a thought at least.
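
A rough sketch of that idea, assuming a standalone `google_container_node_pool` resource. The variable names, the pool name prefix, and the node count are hypothetical and not taken from the Agones module. Dots are replaced because GKE node pool names only allow lowercase letters, digits, and hyphens.

```hcl
# Hypothetical sketch: embed the Kubernetes version in the node pool name so
# that a version change replaces the pool instead of updating it in place.
variable "kubernetes_version" {
  type = string # e.g. "1.27.3-gke.100"
}

variable "cluster_name" {
  type = string
}

variable "cluster_location" {
  type = string
}

resource "google_container_node_pool" "versioned" {
  # "1.27.3-gke.100" becomes "pool-1-27-3-gke-100".
  name     = "pool-${replace(var.kubernetes_version, ".", "-")}"
  cluster  = var.cluster_name
  location = var.cluster_location
  version  = var.kubernetes_version

  node_count = 4

  lifecycle {
    # Create the replacement pool before the old one is destroyed.
    create_before_destroy = true
  }
}
```

Because the name changes with the version, Terraform plans a replacement, and `create_before_destroy` orders the new pool ahead of the deletion of the old one.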

markmandel (Member, Author) commented:

Transcribing interesting comment from chat:

I checked the node_pool block of the Terraform resource google_container_cluster:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#node_pool
It says that…
Warning: node pools defined inside a cluster can't be changed (or added/removed) after cluster creation without deleting and recreating the entire cluster.
Unless you absolutely need the ability to say "these are the only node pools associated with this cluster", use the google_container_node_pool resource instead of this property.
I think this is why the node pool didn't change.
It seems the Terraform resource google_container_node_pool would be a better fit.
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool

markmandel (Member, Author) commented:

Reading the above docs (which have a great example), yep - looks like the GKE Terraform should be changed to use google_container_node_pool.
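
As a sketch of what that change could look like, loosely following the pattern in the provider docs: the cluster drops its default node pool and the real pool is managed as a separate resource. The resource names, location, machine type, node count, and the `kubernetes_version` variable are assumptions for illustration, not the actual Agones module.

```hcl
# Hypothetical sketch: node pool managed separately from the cluster.
variable "kubernetes_version" {
  type = string
}

resource "google_container_cluster" "primary" {
  name     = "agones-example"
  location = "us-west1-c"

  min_master_version = var.kubernetes_version

  # A cluster cannot be created without a node pool, so create the smallest
  # possible default pool and remove it immediately.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "default" {
  name     = "default"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  # Node version can now be changed independently of cluster recreation.
  version    = var.kubernetes_version
  node_count = 4

  node_config {
    machine_type = "e2-standard-4"
  }
}
```

One caveat worth checking against the provider docs: they discourage setting both `version` and `auto_upgrade` on the same node pool, since the two can disagree about what the node version should be.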

markmandel added the good first issue label Aug 24, 2023
junekhan added a commit to junekhan/agones that referenced this issue Jan 25, 2024
markmandel added a commit to markmandel/agones that referenced this issue Jan 25, 2024
Per the docs:
"...node pools defined inside a cluster can't be changed (or
added/removed) after cluster creation without deleting and recreating
the entire cluster."

Which is not great - since you can end up with out-of-sync K8s versions
between the control plane and nodes, an inability to change nodepool
sizes and just a general lack of flexibility.

Moving the node pool definitions out of the cluster definition solves
this issue!

Closes googleforgames#3339
gongmax added a commit that referenced this issue Jan 26, 2024

Closes #3339

Co-authored-by: Mengye (Max) Gong