
Unable to create GCP cluster by following the steps in the quick start guide #625

Closed
mmlk09 opened this issue Jun 13, 2022 · 11 comments

Labels: kind/bug (Categorizes issue or PR as related to a bug.), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)

Comments


mmlk09 commented Jun 13, 2022

What steps did you take and what happened:
I followed the GCP instructions in the quick start guide here:
https://cluster-api.sigs.k8s.io/user/quick-start.html
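
Roughly, the flow from the guide is the following (a sketch; the variable values below are placeholders, and the guide also requires additional exports such as base64-encoded credentials, image and machine types):

```shell
# Sketch of the quick-start flow for the GCP provider; values are placeholders.
export GCP_PROJECT="my-project"        # placeholder project ID
export GCP_REGION="us-west1"           # placeholder region
export GCP_NETWORK_NAME="default"
export KUBERNETES_VERSION="v1.23.0"

# Install the core, bootstrap, control plane and GCP infrastructure providers
# into the kind management cluster.
clusterctl init --infrastructure gcp

# Render the workload cluster manifests and apply them to the management cluster.
clusterctl generate cluster gke-capi \
  --kubernetes-version "${KUBERNETES_VERSION}" \
  --control-plane-machine-count=1 \
  --worker-machine-count=1 > gke-capi.yaml
kubectl apply -f gke-capi.yaml
```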

What did you expect to happen:
A GCP cluster was expected to be created in the specified region and project.

Anything else you would like to add:
The control plane VM is created and visible in the GCP console, but the steps after this do not seem to proceed, so the cluster creation process never completes.

The following errors are seen in the capg-controller-manager logs:

E0613 12:27:18.384087 1 gcpmachine_controller.go:231] controller/gcpmachine "msg"="Error reconciling instance resources" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"
E0613 12:27:18.385544 1 controller.go:317] controller/gcpmachine "msg"="Reconciler error" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"

capi-kubeadm-control-plane-controller-manager Logs:

I0613 12:43:01.304022 1 controller.go:251] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.499751 1 controller.go:188] controller/kubeadmcontrolplane "msg"="Failed to update KubeadmControlPlane Status" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.500754 1 controller.go:317] controller/kubeadmcontrolplane "msg"="Reconciler error" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
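
The state of the CAPI objects on the management cluster can be inspected like this (a sketch; names are taken from the logs above):

```shell
# Summarize the cluster and its conditions to see which step is stuck.
clusterctl describe cluster gke-capi --show-conditions all

# Check whether the control plane and machines ever received bootstrap data.
kubectl get kubeadmcontrolplane,machines,gcpmachines -A -o wide
kubectl describe gcpmachine -n default gke-capi-md-0-97jwk   # name from the log above
```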

Environment:

- Cluster-api version:
clusterctl version: &version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.4", GitCommit:"1c3a1526f101d4b07d2eec757fe75e8701cf6212", GitTreeState:"clean", BuildDate:"2022-06-03T17:11:09Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

- Minikube/KIND version:
kind v0.12.0 go1.17.8 linux/amd64

- Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-03-06T21:32:53Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

@k8s-ci-robot added the kind/bug label on Jun 13, 2022

mmlk09 commented Jun 15, 2022

New errors in the capi-kubeadm-control-plane-controller-manager logs:

I0615 15:32:47.847933 1 kubeadmconfig_controller.go:236] controller/kubeadmconfig "msg"="Cluster infrastructure is not ready, waiting" "kind"="Machine" "name"="gke-capi-md-0-7fbbd576bd-j56dm" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1377"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:46929: EOF
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:7688: EOF

I0615 15:34:32.121838 1 control_plane_init_mutex.go:99] init-locker "msg"="Attempting to acquire the lock" "cluster-name"="gke-capi" "configmap-name"="gke-capi-lock" "machine-name"="gke-capi-control-plane-xvrld" "namespace"="default"
I0615 15:34:32.125356 1 kubeadmconfig_controller.go:380] controller/kubeadmconfig "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "name"="gke-capi-control-plane-xvrld" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1856"
I0615 15:34:32.125793 1 kubeadmconfig_controller.go:872] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ControlPlaneEndpoint"="34.149.221.102:443"
I0615 15:34:32.125835 1 kubeadmconfig_controller.go:878] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ClusterName"="gke-capi"
I0615 15:34:32.125851 1 kubeadmconfig_controller.go:897] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "PodSubnet"="192.168.0.0/16"
I0615 15:34:32.125866 1 kubeadmconfig_controller.go:904] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "KubernetesVersion"="v1.23.0"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:54614: EOF


itspngu commented Jun 18, 2022

The nodes will not be provisioned before the control plane is ready, and the control plane will not announce itself as ready before a CNI plugin has been installed. If you did deploy a CNI and the KubeadmControlPlane still refuses to enter the Ready state, another good place to look for control plane bootstrap problems is the serial console output of the control plane VM on GCP; kubelet will typically report more problems there than you can see in the cluster-api controller logs.
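
For example, deploying a CNI into the workload cluster and pulling the serial console output looks roughly like this (a sketch; the Calico manifest URL, instance name and zone are placeholders, use whatever the quick start currently points at):

```shell
# Fetch the workload cluster kubeconfig and apply a CNI (the quick start uses Calico).
clusterctl get kubeconfig gke-capi > gke-capi.kubeconfig
kubectl --kubeconfig=./gke-capi.kubeconfig apply \
  -f https://raw.githubusercontent.com/projectcalico/calico/v3.23.1/manifests/calico.yaml

# Read the control plane VM's serial console output from GCP.
gcloud compute instances get-serial-port-output gke-capi-control-plane-xvrld \
  --zone us-west1-a --project my-project
```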


mmlk09 commented Jun 19, 2022

The following error is showing on the GCP VM serial console. How do I fix this?

gke-capi-control-plane-bzjmd login: Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 74.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.

Jun 19 04:46:52 gke-capi-control-plane-bzjmd kubelet[1831]: E0619 04:46:52.221960 1831 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"

Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 75.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd kubelet[1838]: E0619 04:47:02.471678 1838 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 76.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd kubelet[1845]: E0619 04:47:12.720046 1845 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.


itspngu commented Jun 19, 2022

kubelet.service: Scheduled restart job, restart counter is at 75.

You will likely find the reason it fails to start earlier in the logs. I remember seeing kubelet complain about a missing /var/lib/kubelet/config.yaml, and it ended up being due to CNI problems.
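
For example, on the VM itself the cloud-init output usually shows whether the kubeadm bootstrap ever ran; /var/lib/kubelet/config.yaml is only written by kubeadm init, so the kubelet restart loop is a symptom rather than the cause. A sketch, assuming SSH or serial console access to the instance:

```shell
# cloud-init's final stage runs the kubeadm bootstrap commands; its logs usually
# contain the real failure (image pulls, preflight errors, etc.).
sudo tail -n 100 /var/log/cloud-init-output.log
sudo journalctl -u cloud-final --no-pager | tail -n 100

# The earliest kubelet messages, before the restart counter starts climbing.
sudo journalctl -u kubelet --no-pager | head -n 100
```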

PS: If you post code or log messages on GitHub, it's a lot easier for everyone to read them if you format them as code: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#fenced-code-blocks


zkl94 commented Jun 23, 2022

I'm having exactly the same issue: the /var/lib/kubelet/config.yaml: no such file or directory error from the first control plane VM.


harveyxia commented Aug 30, 2022

I'm having the same issue. I'm also unable to access the kube-apiserver via the capg-managed LB because the health check (targeting port 6443) is failing, which in turn is because the kube-apiserver is not running on the VM. I'm not sure whether the kube-apiserver should be up at this stage of bootstrapping.

For context, my team is trying to implement support for MachinePools via MIGs (issue here), but we can't start development until we have the current state of master working. Could we get some assistance?
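
For what it's worth, the load balancer health can also be queried directly from gcloud (a sketch; the backend service name below is a placeholder for whatever capg created for this cluster):

```shell
# List the backend services capg created and ask GCP for their current health.
gcloud compute backend-services list --project my-project
gcloud compute backend-services get-health gke-capi-apiserver --global \
  --project my-project    # use --region instead if the backend service is regional

# Check whether the control plane endpoint answers on 6443 at all.
nc -vz 34.149.221.102 6443   # endpoint IP taken from the logs above
```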

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 28, 2022

stg-0 commented Dec 13, 2022

Just in case this helps someone else: I just ran into this issue, and after debugging I found out that CAPG needs a Cloud NAT in the project (I haven't had time to track the cause further yet). Once I created it manually, the control-plane node started successfully, and after that the other control-plane nodes and the workers were instantiated.
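
For reference, creating the Cloud Router and Cloud NAT manually looks roughly like this (a sketch; the router and NAT names are arbitrary, and the project/region/network must match the ones the cluster uses). If the instances have no external IPs, the NAT is what gives them outbound access (e.g., to pull images), which would explain why bootstrap stalled without it.

```shell
# Create a Cloud Router on the cluster's network, then attach a Cloud NAT to it.
gcloud compute routers create gke-capi-router \
  --project my-project --region us-west1 --network default

gcloud compute routers nats create gke-capi-nat \
  --project my-project --region us-west1 --router gke-capi-router \
  --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
```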

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 27, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned on Feb 26, 2023