PodDisruptionBudgetLimit - open-cluster-management-observability/observability-thanos-receive-default #197

Closed
rbo opened this issue Aug 29, 2024 · 3 comments
Labels: bug (Something isn't working), cluster/isar (BareMetal COE Cluster)

rbo commented Aug 29, 2024

~ % oc describe pdb/observability-thanos-receive-default
Name:             observability-thanos-receive-default
Namespace:        open-cluster-management-observability
Max unavailable:  1
Selector:         app.kubernetes.io/component=database-write-hashring,app.kubernetes.io/instance=observability,app.kubernetes.io/name=thanos-receive,app.kubernetes.io/part-of=observatorium,controller.receive.thanos.io/hashring=default
Status:
    Allowed disruptions:  0
    Current:              1
    Desired:              2
    Total:                3
Events:                   <none>
~ % oc get pods -l app.kubernetes.io/component=database-write-hashring,app.kubernetes.io/instance=observability,app.kubernetes.io/name=thanos-receive,app.kubernetes.io/part-of=observatorium,controller.receive.thanos.io/hashring=default
NAME                                     READY   STATUS              RESTARTS   AGE
observability-thanos-receive-default-0   1/1     Running             1          23d
observability-thanos-receive-default-1   0/1     ContainerCreating   0          2m40s
~ %
~ % oc describe pod observability-thanos-receive-default-1
Name:             observability-thanos-receive-default-1
Namespace:        open-cluster-management-observability
Priority:         0
Service Account:  observability-thanos-receive
Node:             ucs56/10.32.96.56
Start Time:       Thu, 29 Aug 2024 15:23:18 +0200
Labels:           app.kubernetes.io/component=database-write-hashring
                  app.kubernetes.io/instance=observability
                  app.kubernetes.io/name=thanos-receive
                  app.kubernetes.io/part-of=observatorium
                  app.kubernetes.io/version=v0.24.0
                  apps.kubernetes.io/pod-index=1
                  controller-revision-hash=observability-thanos-receive-default-7c8877fc54
                  controller.receive.thanos.io/hashring=default
                  statefulset.kubernetes.io/pod-name=observability-thanos-receive-default-1
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.129.13.209/21"],"mac_address":"0a:58:0a:81:0d:d1","gateway_ips":["10.129.8.1"],"routes":[{"dest":"10.128.0...
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:
IPs:              <none>
Controlled By:    StatefulSet/observability-thanos-receive-default
Containers:
  thanos-receive:
    Container ID:
    Image:         registry.redhat.io/rhacm2/thanos-rhel9@sha256:9f85d747ef8c11a0e5c6612110adc7e8a180750057a33ef41554ae6f1de175b0
    Image ID:
    Ports:         10901/TCP, 10902/TCP, 19291/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      receive
      --log.level=info
      --log.format=logfmt
      --grpc-address=0.0.0.0:10901
      --http-address=0.0.0.0:10902
      --remote-write.address=0.0.0.0:19291
      --receive.replication-factor=3
      --tsdb.path=/var/thanos/receive
      --tsdb.retention=48h
      --label=replica="$(NAME)"
      --label=receive="true"
      --objstore.config=$(OBJSTORE_CONFIG)
      --receive.local-endpoint=$(NAME).observability-thanos-receive-default.$(NAMESPACE).svc.cluster.local:10901
      --receive.hashrings-file=/var/lib/thanos-receive/hashrings.json
      --tsdb.too-far-in-future.time-window=5m
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     300m
      memory:  2Gi
    Requests:
      cpu:      300m
      memory:   512Mi
    Liveness:   http-get http://:10902/-/healthy delay=0s timeout=1s period=30s #success=1 #failure=8
    Readiness:  http-get http://:10902/-/ready delay=0s timeout=1s period=5s #success=1 #failure=20
    Environment:
      NAME:             observability-thanos-receive-default-1 (v1:metadata.name)
      NAMESPACE:        open-cluster-management-observability (v1:metadata.namespace)
      HOST_IP_ADDRESS:   (v1:status.hostIP)
      OBJSTORE_CONFIG:  <set to the key 'thanos.yaml' in secret 'thanos-object-storage'>  Optional: false
    Mounts:
      /var/lib/thanos-receive from hashring-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6nvrw (ro)
      /var/thanos/receive from data (rw)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-observability-thanos-receive-default-1
    ReadOnly:   false
  hashring-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      observability-thanos-receive-controller-tenants-generated
    Optional:  false
  kube-api-access-6nvrw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Normal   Scheduled           2m49s  default-scheduler        Successfully assigned open-cluster-management-observability/observability-thanos-receive-default-1 to ucs56
  Warning  FailedAttachVolume  2m49s  attachdetach-controller  Multi-Attach error for volume "pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5" Volume is already exclusively attached to one node and can't be attached to another
~ %
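
For reference, the `Allowed disruptions: 0` in the PDB status follows from the other numbers in that output. A minimal sketch of the arithmetic, assuming the usual rule for a `maxUnavailable` budget (desired healthy = total minus max unavailable; allowed = current healthy minus desired healthy, floored at 0). The variable names are only illustrative:

```shell
# Values copied from the `oc describe pdb` output above.
total=3             # "Total"
max_unavailable=1   # "Max unavailable"
current_healthy=1   # "Current"

# Desired healthy pods for a maxUnavailable budget (matches "Desired: 2").
desired_healthy=$(( total - max_unavailable ))

# Allowed disruptions: healthy pods beyond the desired minimum, never below 0.
allowed=$(( current_healthy - desired_healthy ))
if [ "$allowed" -lt 0 ]; then
  allowed=0
fi

echo "desired healthy: $desired_healthy"
echo "allowed disruptions: $allowed"
```

With only one of the three expected pods healthy, the budget is already exceeded, so the alert stays active until `observability-thanos-receive-default-1` can mount its volume.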
rbo added the bug and cluster/isar labels Aug 29, 2024
rbo self-assigned this Aug 29, 2024

rbo commented Aug 29, 2024

~ % oc describe pv pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5
Name:            pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi.trident.netapp.io
                 volume.kubernetes.io/provisioner-deletion-secret-name:
                 volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:      [kubernetes.io/pv-protection external-attacher/csi-trident-netapp-io]
StorageClass:    coe-netapp-san
Status:          Bound
Claim:           open-cluster-management-observability/data-observability-thanos-receive-default-1
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi.trident.netapp.io
    FSType:            ext4
    VolumeHandle:      pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5
    ReadOnly:          false
    VolumeAttributes:      backendUUID=cd7b267e-7ff1-42ff-a2b5-617216ba06ea
                           internalName=isar_pvc_23ed4021_866d_41c0_9eb6_1c0dc08610a5
                           name=pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5
                           protocol=block
                           storage.kubernetes.io/csiProvisionerIdentity=1716842609293-2854-csi.trident.netapp.io
Events:                <none>
~ %

~ % ssh -l admin netapp-mgmt.coe.muc.redhat.com lun mapping show  | grep isar_pvc_23ed4021_866d_41c0_9eb6_1c0dc08610a5
Warning: Permanently added 'netapp-mgmt.coe.muc.redhat.com' (ED25519) to the list of known hosts.
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_23ed4021_866d_41c0_9eb6_1c0dc08610a5  ucs-blade-server-3-0f228f1d-6034-47c8-b456-0e13c65e964c  7  iscsi
~ %

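On the NetApp side, the LUN behind this PV is still mapped to ucs-blade-server-3, which would explain the Multi-Attach error for the pod scheduled on ucs56.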
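
The same question can probably be cross-checked from the Kubernetes side via the VolumeAttachment objects. A small sketch, not something that was run in this issue: `pv_attached_nodes` is a made-up helper name, and it assumes the default column layout of `oc get volumeattachment` (NAME, ATTACHER, PV, NODE, ATTACHED, AGE).

```shell
# Hypothetical helper: print the node(s) holding a VolumeAttachment for a PV.
# Reads `oc get volumeattachment --no-headers` rows on stdin and assumes the
# default columns: NAME ATTACHER PV NODE ATTACHED AGE.
pv_attached_nodes() {
  awk -v pv="$1" '$3 == pv { print $4 }'
}

# Usage against a live cluster (not executed here):
#   oc get volumeattachment --no-headers \
#     | pv_attached_nodes pvc-23ed4021-866d-41c0-9eb6-1c0dc08610a5
```

If this prints a node other than the one the pod is scheduled on, the stale attachment is what blocks the new pod.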


rbo commented Aug 29, 2024

Let's try draining and rebooting ucs-blade-server-3.
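
Roughly, the drain and reboot could look like the sketch below. The exact commands used are not recorded in this issue, so treat this as an assumption: `drain_and_reboot` is a made-up wrapper, the node name has to be whatever `oc get nodes` shows for that blade, and `DRY_RUN=1` only prints the commands instead of running them.

```shell
# Hypothetical wrapper around the usual OpenShift drain/reboot commands.
# With DRY_RUN set to a non-empty value the commands are echoed, not executed.
drain_and_reboot() {
  node="$1"
  run="${DRY_RUN:+echo}"

  # Evict workloads; drain also cordons the node.
  $run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data

  # Reboot the host through a debug pod on the node.
  $run oc debug "node/$node" -- chroot /host systemctl reboot
}

# Example (prints the commands only; <node-name> is a placeholder):
#   DRY_RUN=1 drain_and_reboot <node-name>
# Once the node is Ready again:
#   oc adm uncordon <node-name>
```

Note that drain evictions respect PodDisruptionBudgets, so with zero allowed disruptions a drain may wait on any thanos-receive pod running on that node.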


rbo commented Aug 29, 2024

Fixed, all three thanos-receive pods are Running again:

~ % oc get pods -l app.kubernetes.io/component=database-write-hashring,app.kubernetes.io/instance=observability,app.kubernetes.io/name=thanos-receive,app.kubernetes.io/part-of=observatorium,controller.receive.thanos.io/hashring=default -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
observability-thanos-receive-default-0   1/1     Running   0          46s   10.130.12.152   ucs57    <none>           <none>
observability-thanos-receive-default-1   1/1     Running   0          11m   10.129.13.209   ucs56    <none>           <none>
observability-thanos-receive-default-2   1/1     Running   0          97s   10.128.24.51    ceph12   <none>           <none>
~ %

rbo closed this as completed Aug 29, 2024