Kubernetes Integration tests fails on Buildkite with kind v0.24.0 #41257

Closed
mauri870 opened this issue Oct 16, 2024 · 5 comments · Fixed by #41309
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@mauri870
Member

While working on #41081 I discovered that the Go integration tests for the kubernetes module in metricbeat fail silently when running on kind v0.24.0. After adding a debug log, I can see that the failure happens when creating the kind cluster:

Error: failed to run integration tests for module kubernetes:
1 error: kind setup failed: running "kind create cluster --name metricbeat-9-0-0-921fd909f7-snapshot --kubeconfig /opt/buildkite-agent/builds/bk-agent-prod-gcp-1729021037068466664/elastic/beats-metricbeat/metricbeat/build/kind/metricbeat-9-0-0-921fd909f7-snapshot/kubecfg --wait 300s --image kindest/node:v1.31.0" failed with exit code 126

https://buildkite.com/elastic/beats-metricbeat/builds/10481#019291a8-1c9d-4180-b768-80e832625d12/191-2135

To avoid blocking that PR, I decided to raise this issue so we can investigate it separately. Since kind v0.20.0 is more than a year old, it would be good to run a more recent version of kind.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 16, 2024
@mauri870 mauri870 added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 16, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@gizas
Contributor

gizas commented Oct 17, 2024

@mauri870 I have built a separate VM to troubleshoot.

So far I have managed to partially reproduce the issue
(you can set the MODULE variable to skip the rest of the tests):

MODULE=kubernetes  mage goIntegTest
Generated fields.yml for metricbeat to /home/andreasgkizas/beats/metricbeat/fields.yml
Generated fields.yml for metricbeat to /home/andreasgkizas/beats/metricbeat/fields.yml
Error: failed to run integration tests for module kubernetes:
1 error: kind setup failed: running "kind create cluster --name metricbeat-9-0-0-a706c7912e-snapshot --kubeconfig /home/andreasgkizas/beats/metricbeat/build/kind/metricbeat-9-0-0-a706c7912e-snapshot/kubecfg --wait 300s" failed with exit code 1
Error: failed modules: kubernetes

The reason for the failure seems to be that the kubecfg file is not created:

ls -lrt /home/andreasgkizas/beats/metricbeat/build/kind/metricbeat-9-0-0-a706c7912e-snapshot/
total 0

Still troubleshooting...

One minor point: my exit code is 1 and yours is 126. I wonder if there are permission issues in your case.
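
For context on the two codes: by convention, shells return 126 when a command is found but cannot be executed (for example, a missing execute bit) and 127 when it is not found at all, while 1 is a generic failure reported by the program itself. A wrapper script can also choose to exit with 126 on its own. A minimal sketch that reproduces 126 with a throwaway script (the temp file is purely illustrative and has nothing to do with kind):

```shell
#!/bin/sh
# Create a script without the execute bit, then try to run it.
tmp=$(mktemp)
printf '#!/bin/sh\nexit 0\n' > "$tmp"
chmod 644 "$tmp"

# The shell finds the file but is not allowed to execute it.
rc=0
"$tmp" 2>/dev/null || rc=$?

echo "exit code: $rc"
rm -f "$tmp"
```

If the same check against the real kind binary or shim shows it is executable, the 126 is more likely coming from a wrapper than from file permissions.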

@gizas
Contributor

gizas commented Oct 17, 2024

It seems that in my case I had to provide sudo access to /var/run/docker.sock.
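
A quick way to check whether the current user can reach the Docker socket before running the tests. This is only a sketch: `can_use_docker_sock` is a hypothetical helper name, not part of the beats tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given path is a socket
# that the current user can both read and write.
can_use_docker_sock() {
    sock=${1:-/var/run/docker.sock}
    [ -S "$sock" ] && [ -r "$sock" ] && [ -w "$sock" ]
}

if can_use_docker_sock; then
    echo "docker socket is accessible"
else
    echo "no access to docker socket; check its permissions or your group membership"
fi
```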

Repeating tests:

`MODULE=kubernetes KIND_SKIP_DELETE=1 mage goIntegTest`
MODULE=kubernetes KIND_SKIP_DELETE=1 mage goIntegTest -vvv
Generated fields.yml for metricbeat to /home/andreasgkizas/beats/metricbeat/fields.yml
Generated fields.yml for metricbeat to /home/andreasgkizas/beats/metricbeat/fields.yml
Error: failed to run integration tests for module kubernetes:
1 error: execute pod container never started: timed out waiting for the condition
Error: failed modules: kubernetes

Setting KIND_SKIP_DELETE=1 keeps the cluster from being deleted after the run, so you can inspect it:

watch kubectl get pods --kubeconfig /home/andreasgkizas/beats/metricbeat/build/kind/metricbeat-9-0-0-a706c7912e-snapshot/kubecfg -A

kube-system          coredns-6f6b679f8f-rnf4n                                                     1/1     Running   0          10m
kube-system          coredns-6f6b679f8f-z6qw9                                                     1/1     Running   0          10m
kube-system          etcd-metricbeat-9-0-0-a706c7912e-snapshot-control-plane                      1/1     Running   0          10m
kube-system          kindnet-7s4h6                                                                1/1     Running   0          10m
kube-system          kube-apiserver-metricbeat-9-0-0-a706c7912e-snapshot-control-plane            1/1     Running   0          10m
kube-system          kube-controller-manager-metricbeat-9-0-0-a706c7912e-snapshot-control-plane   1/1     Running   0          10m
kube-system          kube-proxy-f6tmx                                                             1/1     Running   0          10m
kube-system          kube-scheduler-metricbeat-9-0-0-a706c7912e-snapshot-control-plane            1/1     Running   0          10m
local-path-storage   local-path-provisioner-57c5987fd4-kfrzd                                      1/1     Running   0          10m

In my case metricbeat was failing to initialise because of a build error, but that seems to be a different error, one that occurs after the kind create cluster command you used:

kubectl logs --kubeconfig /home/andreasgkizas/beats/metricbeat/build/kind/metricbeat-9-0-0-a706c7912e-snapshot/kubecfg metricbeat-9-0-0-a706c7912e-snapshot
Defaulted container "exec" out of: exec, sync-init (init)
Error from server (BadRequest): container "exec" in pod "metricbeat-9-0-0-a706c7912e-snapshot" is waiting to start: PodInitializing
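
Since the "exec" container is still waiting on the "sync-init" init container, the next things to look at would be the pod events and the init container logs. A sketch using the names from the output above; the `run` wrapper is a hypothetical convenience that only prints the commands unless RUN=1 is set:

```shell
#!/bin/sh
# Path and pod name taken from the output above; adjust to your own run.
KUBECONFIG_PATH=/home/andreasgkizas/beats/metricbeat/build/kind/metricbeat-9-0-0-a706c7912e-snapshot/kubecfg
POD=metricbeat-9-0-0-a706c7912e-snapshot

# Print the command by default; set RUN=1 to execute it against the cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Events and init container status for the pod.
run kubectl describe pod "$POD" --kubeconfig "$KUBECONFIG_PATH"
# Logs of the init container that has to finish before "exec" can start.
run kubectl logs "$POD" -c sync-init --kubeconfig "$KUBECONFIG_PATH"
```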

Please let me know if the above commands help you in your troubleshooting.

@mauri870
Member Author

mauri870 commented Oct 18, 2024

Thanks for looking into this! Unfortunately the Buildkite builder fails with exit code 126 when I bump kind to v0.24.0, and I was unable to reproduce this particular failure locally.

I tested again on Buildkite, this time with verbose logs, and the output is more descriptive of the issue:

Kubeconfig:  /opt/buildkite-agent/builds/bk-agent-prod-gcp-1729259693124301033/elastic/beats-metricbeat/metricbeat/build/kind/metricbeat-9-0-0-110483b08e-snapshot/kubecfg
exec: kind "create" "cluster" "--name" "metricbeat-9-0-0-110483b08e-snapshot" "--kubeconfig" "/opt/buildkite-agent/builds/bk-agent-prod-gcp-1729259693124301033/elastic/beats-metricbeat/metricbeat/build/kind/metricbeat-9-0-0-110483b08e-snapshot/kubecfg" "--wait" "300s" "--image" "kindest/node:v1.31.0"
No preset version installed for command kind
Please install a version by running one of the following:
asdf install kind 0.24.0
or add one of the following versions in your config file at
kind 0.20.0
Teardown mage...
Error: failed to run integration tests for module kubernetes:
1 error: kind setup failed: running "kind create cluster --name metricbeat-9-0-0-110483b08e-snapshot --kubeconfig /opt/buildkite-agent/builds/bk-agent-prod-gcp-1729259693124301033/elastic/beats-metricbeat/metricbeat/build/kind/metricbeat-9-0-0-110483b08e-snapshot/kubecfg --wait 300s --image kindest/node:v1.31.0" failed with exit code 126
Error: failed modules: kubernetes

https://buildkite.com/elastic/beats-metricbeat/builds/10643#01929fe7-f3af-4c41-89fe-7c0bb936d396
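
Reading the log, it looks like the agent has kind 0.20.0 installed through asdf while the job requests 0.24.0, so the asdf shim refuses to run, which would explain the exit code 126. A minimal sketch of that comparison, under stated assumptions: `is_installed` is a hypothetical helper and the installed list is hard-coded for illustration (on a real agent it could be derived from `asdf list kind`).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 appears in the
# whitespace-separated list of installed versions $2.
is_installed() {
    for v in $2; do
        [ "$v" = "$1" ] && return 0
    done
    return 1
}

requested=0.24.0     # version the CI job asks for, per the log
installed="0.20.0"   # illustrative; mirrors the version listed in the log

if is_installed "$requested" "$installed"; then
    echo "kind $requested is available"
else
    echo "kind $requested is not installed; available: $installed"
fi
```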

@mauri870
Member Author

A little update: it turns out the preinstalled kind version comes from the ci-agent-images repo. I have opened a PR to bump the version there.

3 participants