PVC Fail "existing disk format of " #581

Open
mattiashem opened this issue Mar 15, 2024 · 9 comments
Assignees
Labels
bug Something isn't working Stale

Comments

@mattiashem

TL;DR

Warning FailedMount 38s (x3536 over 4d23h) kubelet MountVolume.SetUp failed for volume "pvc-a1e2b216-bd1f-4e3f-b54f-ebc8ce7760bd" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions

Expected behavior

I have 3 clusters; in the other clusters it works fine and the volumes mount.

Observed behavior

Warning FailedMount 38s (x3536 over 4d23h) kubelet MountVolume.SetUp failed for volume "pvc-a1e2b216-bd1f-4e3f-b54f-ebc8ce7760bd" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions

Minimal working example

I used the install instructions from the guide, then applied this PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus-server1
  namespace: metrics
spec:
  storageClassName: hcloud-volumes
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50G

Log output

[core@bastion storage]$ kubectl logs -f hcloud-csi-controller-597f65fc8f-9mcbz -n kube-system -c csi-attacher
I0309 18:42:01.900716       1 main.go:94] Version: v4.1.0
I0309 18:42:06.245083       1 common.go:111] Probing CSI driver for readiness
I0309 18:42:06.250370       1 controller.go:130] Starting CSI attacher



^C
[core@bastion storage]$ kubectl logs -f hcloud-csi-controller-597f65fc8f-9mcbz -n kube-system -c csi-provisioner
W0309 18:42:02.709835       1 feature_gate.go:241] Setting GA feature gate Topology=true. It will be removed in a future release.
I0309 18:42:02.709895       1 csi-provisioner.go:154] Version: v3.4.0
I0309 18:42:02.709902       1 csi-provisioner.go:177] Building kube configs for running in cluster...
I0309 18:42:05.661408       1 common.go:111] Probing CSI driver for readiness
I0309 18:42:05.666448       1 csi-provisioner.go:299] CSI driver supports PUBLISH_UNPUBLISH_VOLUME, watching VolumeAttachments
I0309 18:42:05.768951       1 controller.go:811] Starting provisioner controller csi.hetzner.cloud_hcloud-csi-controller-597f65fc8f-9mcbz_2534d57e-be30-4ed7-a386-6614373244c0!
I0309 18:42:05.769030       1 volume_store.go:97] Starting save volume queue
I0309 18:42:05.870436       1 controller.go:860] Started provisioner controller csi.hetzner.cloud_hcloud-csi-controller-597f65fc8f-9mcbz_2534d57e-be30-4ed7-a386-6614373244c0!
I0309 19:19:14.406580       1 controller.go:1337] provision "default/data-mysql-operator-0" class "hcloud-volumes": started
I0309 19:19:14.406772       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-mysql-operator-0", UID:"625aafa5-0e6f-4826-9cef-5698ed2bd148", APIVersion:"v1", ResourceVersion:"118556527", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-mysql-operator-0"
I0309 19:19:18.419986       1 controller.go:1442] provision "default/data-mysql-operator-0" class "hcloud-volumes": volume "pvc-625aafa5-0e6f-4826-9cef-5698ed2bd148" provisioned
I0309 19:19:18.420010       1 controller.go:1455] provision "default/data-mysql-operator-0" class "hcloud-volumes": succeeded
I0309 19:19:18.448288       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-mysql-operator-0", UID:"625aafa5-0e6f-4826-9cef-5698ed2bd148", APIVersion:"v1", ResourceVersion:"118556527", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-625aafa5-0e6f-4826-9cef-5698ed2bd148

Additional information

It looks like provisioning works: in the cloud console I can see the disk, and it looks as if it has no filesystem.

mattiashem added the bug label Mar 15, 2024
@mattiashem
Author

[core@bastion storage]$ kubectl logs -f hcloud-csi-node-jcsdp -n kube-system -c hcloud-csi-driver
level=info ts=2024-03-15T13:33:11.303902051Z msg="Fetched data from metadata service" id=41964065 location=nbg1
^C
[core@bastion storage]$ kubectl logs -f hcloud-csi-node-p6jj2 -n kube-system -c hcloud-csi-driver
level=info ts=2024-03-15T13:33:09.161029164Z msg="Fetched data from metadata service" id=41963957 location=nbg1
level=error ts=2024-03-15T13:34:49.615777847Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions"
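
For anyone collecting the same data from several nodes, here is a minimal Python sketch that pulls the affected volume IDs out of node driver logs like the ones above. It assumes the log lines are logfmt-style `key=value` pairs, as in this output, and that the number after `scsi-0HC_Volume_` is the Hetzner Cloud volume ID. The function names are made up for this example and are not part of the csi-driver.

```python
import re
import shlex

# Device paths as they appear in the error message in this issue.
DEVICE_RE = re.compile(r"/dev/disk/by-id/scsi-0HC_Volume_(\d+)")


def parse_logfmt(line):
    """Split one logfmt-style line into a dict of key/value pairs.

    Lines that are not logfmt (shell prompts, ^C) simply yield few or no keys.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable and skip it.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failing_volume_ids(log_text):
    """Return the sorted, de-duplicated volume IDs named in error-level lines."""
    ids = set()
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            ids.update(DEVICE_RE.findall(fields.get("err", "")))
    return sorted(ids)
```

Feeding it the saved output of `kubectl logs` for each `hcloud-csi-node-*` pod gives a list of volume IDs to compare against the cloud console. Messages containing stray quote characters are skipped rather than parsed.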

@fallenby-klar

I'm also running into this issue.

My manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

The error:

MountVolume.SetUp failed for volume "pvc-15b2696c-eaf0-4e79-97de-71903a597ebb" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100612315: disk /dev/disk/by-id/scsi-0HC_Volume_100612315 propably contains partitions

@apricote
Member

I am unable to reproduce this with our dev setup and the Getting Started guide.

Some questions that might help to pinpoint the issue:

  • What version of csi-driver are you using?
  • How often does this issue happen?
  • What OS are you using on the nodes where this happens?
  • What Kubernetes distribution and what version are you using?
  • Did you make any modifications to the StorageClass?

@mattiashem
Author

  • Latest from helm
  • Every time
  • Talos cluster
  • v1.27.4 vanilla
  • No

apricote self-assigned this Apr 29, 2024
@apricote
Member

I am still unable to reproduce this. What I have done:

  1. Set up a cluster with Talos 1.7.0 using their docs.
     I used Packer to create the snapshots. To generate the Talos config I used this command instead, to get below the 32kb limit on userdata:
     talosctl gen config talos-k8s-hcloud-tutorial https://$LOAD_BALANCER_IP:6443 --kubernetes-version 1.27.4 --with-docs=false --with-examples=false
  2. Followed the steps from our Getting Started on Kubernetes guide, installing the Helm chart and applying my-csi-app.

my-csi-app started successfully and the volume was mounted.

Do you have some steps for me to reproduce this?

@mattiashem
Author

It looks like I get the error from the API, or it binds the volumes incorrectly. I only have the problem with an older cluster. If I create a new cluster (even on an old Talos like 1.6), it works.

Is there something in the API client, or in how the volumes are attached, that is different?
I now have 2 clusters in the same project; it works on one but not on the other.

I have given up and moved on.

@kosh30

kosh30 commented Jun 7, 2024

Same issue

  • cluster setup with: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner (3 controller, 3 nodes)
  • installed csi_driver by helm (latest)
  • any hcloud volume that I try to create fails with: failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_xxxxxxxx: disk /dev/disk/by-id/scsi-0HC_Volume_xxxxxxxx propably contains partitions

@apricote
Member

Some more questions:

  • Do you use LUKS with the csi-driver?
  • @kosh30 did you install the csi-driver with kube-hetzner & manually, or did you disable it in kube-hetzner and only install it manually?
  • Could you post the output of the following commands?
uname -a
blkid -p -o export /dev/disk/by-id/scsi-0HC_Volume_xxxxxxxx

On Talos without a regular shell you should be able to exec into the csi-driver Node Pod to execute these commands.
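
To help read the `blkid -p -o export` output requested above, here is a rough Python sketch that classifies it. It assumes the usual blkid convention that `TYPE` names a filesystem (or a container such as LUKS) and `PTTYPE` names a partition table. The guess that the "propably contains partitions" error corresponds to a `PTTYPE` without a `TYPE` is an assumption based on the error text, not something confirmed in this thread. The function names are hypothetical.

```python
def parse_blkid_export(output):
    """Turn the KEY=value lines printed by `blkid -o export` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify_device(output):
    """Summarise what blkid found on the device.

    Returns one of:
      "filesystem:<type>"       a filesystem or container signature was found
      "partition-table:<type>"  only a partition table was found (assumed to be
                                the case behind the error in this issue)
      "empty"                   blkid reported nothing recognisable
    """
    fields = parse_blkid_export(output)
    if fields.get("TYPE"):
        return "filesystem:" + fields["TYPE"]
    if fields.get("PTTYPE"):
        return "partition-table:" + fields["PTTYPE"]
    return "empty"
```

If a freshly provisioned volume comes back as `partition-table:...`, something wrote a partition table to the disk before the driver tried to format it, which would be worth including in a reply here together with the `uname -a` output.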


This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions bot added the Stale label Nov 10, 2024