
Unable to create GCP cluster by following the steps in the quick start guide #625

Closed
mmlk09 opened this issue Jun 13, 2022 · 11 comments

Labels: kind/bug (Categorizes issue or PR as related to a bug.), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)

Comments


mmlk09 commented Jun 13, 2022

What steps did you take and what happened:
I followed the GCP instructions in the quick start guide here:
https://cluster-api.sigs.k8s.io/user/quick-start.html
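
Roughly, the flow from the guide is the following (a sketch; the variable values below are placeholders, and the guide also requires additional exports such as base64-encoded credentials, image and machine types):

```shell
# Sketch of the quick-start flow for the GCP provider; values are placeholders.
export GCP_PROJECT="my-project"        # placeholder project ID
export GCP_REGION="us-west1"           # placeholder region
export GCP_NETWORK_NAME="default"
export KUBERNETES_VERSION="v1.23.0"

# Install the core, bootstrap, control plane and GCP infrastructure providers
# into the kind management cluster.
clusterctl init --infrastructure gcp

# Render the workload cluster manifests and apply them to the management cluster.
clusterctl generate cluster gke-capi \
  --kubernetes-version "${KUBERNETES_VERSION}" \
  --control-plane-machine-count=1 \
  --worker-machine-count=1 > gke-capi.yaml
kubectl apply -f gke-capi.yaml
```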

What did you expect to happen:
A GCP cluster was expected to be created in the specified region and project.

Anything else you would like to add:
The control plane VM is created and visible in the GCP console, but the steps after this do not seem to proceed, so the cluster creation process never completes.

The following errors are seen in the capg-controller-manager logs:

E0613 12:27:18.384087 1 gcpmachine_controller.go:231] controller/gcpmachine "msg"="Error reconciling instance resources" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"
E0613 12:27:18.385544 1 controller.go:317] controller/gcpmachine "msg"="Reconciler error" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"

capi-kubeadm-control-plane-controller-manager Logs:

I0613 12:43:01.304022 1 controller.go:251] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.499751 1 controller.go:188] controller/kubeadmcontrolplane "msg"="Failed to update KubeadmControlPlane Status" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.500754 1 controller.go:317] controller/kubeadmcontrolplane "msg"="Reconciler error" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
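
The state of the CAPI objects on the management cluster can be inspected like this (a sketch; names are taken from the logs above):

```shell
# Summarize the cluster and its conditions to see which step is stuck.
clusterctl describe cluster gke-capi --show-conditions all

# Check whether the control plane and machines ever received bootstrap data.
kubectl get kubeadmcontrolplane,machines,gcpmachines -A -o wide
kubectl describe gcpmachine -n default gke-capi-md-0-97jwk   # name from the log above
```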

Environment:

- Cluster-api version:
clusterctl version: &version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.4", GitCommit:"1c3a1526f101d4b07d2eec757fe75e8701cf6212", GitTreeState:"clean", BuildDate:"2022-06-03T17:11:09Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

- Minikube/KIND version:
kind v0.12.0 go1.17.8 linux/amd64

- Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-03-06T21:32:53Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

@k8s-ci-robot added the kind/bug label on Jun 13, 2022

mmlk09 commented Jun 15, 2022

New errors in the capi-kubeadm-control-plane-controller-manager logs:

I0615 15:32:47.847933 1 kubeadmconfig_controller.go:236] controller/kubeadmconfig "msg"="Cluster infrastructure is not ready, waiting" "kind"="Machine" "name"="gke-capi-md-0-7fbbd576bd-j56dm" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1377"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:46929: EOF
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:7688: EOF

I0615 15:34:32.121838 1 control_plane_init_mutex.go:99] init-locker "msg"="Attempting to acquire the lock" "cluster-name"="gke-capi" "configmap-name"="gke-capi-lock" "machine-name"="gke-capi-control-plane-xvrld" "namespace"="default"
I0615 15:34:32.125356 1 kubeadmconfig_controller.go:380] controller/kubeadmconfig "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "name"="gke-capi-control-plane-xvrld" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1856"
I0615 15:34:32.125793 1 kubeadmconfig_controller.go:872] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ControlPlaneEndpoint"="34.149.221.102:443"
I0615 15:34:32.125835 1 kubeadmconfig_controller.go:878] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ClusterName"="gke-capi"
I0615 15:34:32.125851 1 kubeadmconfig_controller.go:897] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "PodSubnet"="192.168.0.0/16"
I0615 15:34:32.125866 1 kubeadmconfig_controller.go:904] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "KubernetesVersion"="v1.23.0"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:54614: EOF


itspngu commented Jun 18, 2022

The nodes will not be provisioned before the control plane is ready, and the control plane will not announce itself as ready before a CNI plugin has been installed. If you did deploy a CNI and the KubeadmControlPlane still refuses to enter the Ready state, another good place to look for control plane bootstrap problems is the serial console output of the control plane VM on GCP; kubelet will typically report more problems there than you can see in the cluster-api controller logs.
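
For example, deploying a CNI into the workload cluster and pulling the serial console output looks roughly like this (a sketch; the Calico manifest URL, instance name and zone are placeholders, use whatever the quick start currently points at):

```shell
# Fetch the workload cluster kubeconfig and apply a CNI (the quick start uses Calico).
clusterctl get kubeconfig gke-capi > gke-capi.kubeconfig
kubectl --kubeconfig=./gke-capi.kubeconfig apply \
  -f https://raw.githubusercontent.com/projectcalico/calico/v3.23.1/manifests/calico.yaml

# Read the control plane VM's serial console output from GCP.
gcloud compute instances get-serial-port-output gke-capi-control-plane-xvrld \
  --zone us-west1-a --project my-project
```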


mmlk09 commented Jun 19, 2022

The following error is showing on the GCP VM serial console. How do I fix this?

gke-capi-control-plane-bzjmd login: Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 74.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.

Jun 19 04:46:52 gke-capi-control-plane-bzjmd kubelet[1831]: E0619 04:46:52.221960 1831 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"

Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 75.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd kubelet[1838]: E0619 04:47:02.471678 1838 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 76.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd kubelet[1845]: E0619 04:47:12.720046 1845 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.


itspngu commented Jun 19, 2022

kubelet.service: Scheduled restart job, restart counter is at 75.

You will likely find the reason it fails to start earlier in the logs. I remember seeing kubelet complain about a missing /var/lib/kubelet/config.yaml, and it ended up being due to CNI problems.
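
For example, on the VM itself the cloud-init output usually shows whether the kubeadm bootstrap ever ran; /var/lib/kubelet/config.yaml is only written by kubeadm init, so the kubelet restart loop is a symptom rather than the cause. A sketch, assuming SSH or serial console access to the instance:

```shell
# cloud-init's final stage runs the kubeadm bootstrap commands; its logs usually
# contain the real failure (image pulls, preflight errors, etc.).
sudo tail -n 100 /var/log/cloud-init-output.log
sudo journalctl -u cloud-final --no-pager | tail -n 100

# The earliest kubelet messages, before the restart counter starts climbing.
sudo journalctl -u kubelet --no-pager | head -n 100
```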

PS: If you post code or log messages on GitHub, it's a lot easier for everyone to read them if you format them as code: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#fenced-code-blocks


zkl94 commented Jun 23, 2022

I'm having exactly the same issue: the /var/lib/kubelet/config.yaml: no such file or directory error from the first control plane VM.


harveyxia commented Aug 30, 2022

I'm having the same issue. I'm also unable to access the kube-apiserver via the capg-managed LB because the health check (targeting port 6443) is failing, which in turn is because the kube-apiserver is not running on the VM. I'm not sure whether the kube-apiserver should be up at this stage of bootstrapping.

For context, my team is trying to implement support for MachinePools via MIGs (issue here), but we can't start development until we have the current state of master working. Could we get some assistance?
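
For what it's worth, the load balancer health can also be queried directly from gcloud (a sketch; the backend service name below is a placeholder for whatever capg created for this cluster):

```shell
# List the backend services capg created and ask GCP for their current health.
gcloud compute backend-services list --project my-project
gcloud compute backend-services get-health gke-capi-apiserver --global \
  --project my-project    # use --region instead if the backend service is regional

# Check whether the control plane endpoint answers on 6443 at all.
nc -vz 34.149.221.102 6443   # endpoint IP taken from the logs above
```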

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 28, 2022

stg-0 commented Dec 13, 2022

Just in case this helps someone else: I just ran into this issue, and after debugging I found out that CAPG needs a Cloud NAT in the project (I haven't had time to track the cause further yet). Once I created it manually, the control-plane node started successfully, and after that the other control-plane nodes and the workers were instantiated.
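
For reference, creating the Cloud Router and Cloud NAT manually looks roughly like this (a sketch; the router and NAT names are arbitrary, and the project/region/network must match the ones the cluster uses). If the instances have no external IPs, the NAT is what gives them outbound access (e.g., to pull images), which would explain why bootstrap stalled without it.

```shell
# Create a Cloud Router on the cluster's network, then attach a Cloud NAT to it.
gcloud compute routers create gke-capi-router \
  --project my-project --region us-west1 --network default

gcloud compute routers nats create gke-capi-nat \
  --project my-project --region us-west1 --router gke-capi-router \
  --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
```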

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 27, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned on Feb 26, 2023