
[tempo] Fix tempo permissions after update to v1.9.0 #3161

Merged: 3 commits into grafana:main, Aug 12, 2024

Conversation

@StefanLobbenmeierObjego (Contributor)

Tempo v2.5.0-rc.0 now runs as non-root, see grafana/tempo#2265. This changes the default values.yaml so that not every user has to apply the fix to their own configuration.
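
For context, the change amounts to something like the following in the chart's default values.yaml (a minimal sketch; the 10001 UID/GID matches the values used elsewhere in this thread, but the exact field placement in this chart is an assumption):

securityContext:
  # Tempo >= 2.5.0 runs as a non-root user, so the data volume must be
  # writable by that user; fsGroup asks Kubernetes to set group ownership
  # on supported volume types at mount time.
  runAsUser: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  fsGroup: 10001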

@CLAassistant commented Jun 8, 2024

CLA assistant check
All committers have signed the CLA.

@nrekretep

Any news on the ETA of this fix?

@StefanLobbenmeierObjego (Contributor, Author)

> Any news on the ETA of this fix?

If it is blocking you, you can also just apply it to your own values.yaml. Not sure when someone will take a look at the PR, or if there is something I can do to speed this up.

@nrekretep commented Jun 12, 2024

> Any news on the ETA of this fix?
>
> If it is blocking you, you can also just apply it to your own values.yaml. Not sure when someone will take a look at the PR, or if there is something I can do to speed this up.

Unfortunately tempo is still producing "permission denied" errors after applying this fix to values.yaml.

tempo level=warn ts=2024-06-12T09:55:20.851743807Z caller=wal.go:126 msg="failed to replay block. removing." file=ae00daae-b69e-xxxx-yyyy-13cde344d947+single-tenant+vParquet3 err="error reading wal meta json: /var/tempo/wal/ae00daae-b69e-xxxx-yyyy-13cde344d947+single-tenant+vParquet3/meta.json: open /var/tempo/wal/ae00daae-b69e-xxxx-yyyy-13cde344d947+single-tenant+vParquet3/meta.json: permission denied"    
tempo level=error ts=2024-06-12T09:55:20.852017328Z caller=app.go:229 msg="module failed" module=ingester err="starting module ingester: invalid service state: Failed, expected: Running, failure: failed to replay wal: fatal error replaying wal: unlinkat /var/tempo/wal/ae00daae-b69e-xxxx-yyyy-13cde344d947+single-tenant+vParquet3/0000000022: permission denied"

Is there anything else I have to change?

Supporting the necessary `chown -R` on /var/tempo via an initContainer in the Helm chart would be really helpful.
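
(For illustration, such an init container might look roughly like the sketch below; this is hypothetical, assuming the data volume is named data, as in the StatefulSet shown later in this thread, and UID/GID 10001:)

initContainers:
  - name: fix-tempo-permissions   # hypothetical name
    image: busybox:1.36
    # chown needs root, and runs before the non-root tempo container starts
    securityContext:
      runAsUser: 0
    command: ["chown", "-R", "10001:10001", "/var/tempo"]
    volumeMounts:
      - name: data                # assumed volume name
        mountPath: /var/tempo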

@StefanLobbenmeierObjego (Contributor, Author)

Hmm, in my case there was no issue with permissions after that, but my setup is as simple as it gets. Would you mind pasting your configuration here as well?

> Supporting the necessary `chown -R` on /var/tempo via an initContainer in the Helm chart would be really helpful.

If that fixes the issue for you, good idea. Feel free to make a PR with that, or post the diff here and I will add it to this PR.

@zalegrala (Contributor)

The change looks good to me. We need to update the readme and bump the chart version to get the lint to pass.

Additionally, I'd like to see some output, @nrekretep. You may be running into the issue that made me question whether this change was enough to resolve the problem. Please give the output from within the container of the following: `id` and also `ls -ld /var/tempo`. For the read warning you are seeing, what does the ownership of that file look like?
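
(For anyone collecting that output, something along these lines should work; the pod name is a placeholder:)

kubectl exec <tempo-pod> -- id
kubectl exec <tempo-pod> -- ls -ld /var/tempo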

@StefanLobbenmeierObjego (Contributor, Author)

> We need to update the readme and bump the chart version to get the lint to pass.

I bumped the versions; is there anything else to do in the readme?

@zalegrala changed the title from "Fix tempo permissions after update to v1.9.0" to "[tempo] Fix tempo permissions after update to v1.9.0" on Jun 21, 2024
@zalegrala (Contributor)

For the readme, I usually do `docker run --rm --volume "$(pwd):/helm-docs" -u "$(id -u)" jnorwood/helm-docs:v1.8.1`, which is the same thing that CI runs, then commit the result. Looks like whitespace changes are failing the CI.

@zalegrala (Contributor)

The CI is failing to install the chart because of a field incompatibility, but I don't have time to dig into this. Could you take a look and see if it can be reproduced locally?

@StefanLobbenmeierObjego (Contributor, Author)

Hmm, I expected that adding the values there would be equivalent to adding them locally in my values.yaml; it's weird that this fails in CI.

I found other people running into this error here: minio/minio#16521

@zalegrala (Contributor)

It's possible the version of k8s that CI is running is too old, but that's a guess.

@nrekretep

> It's possible the version of k8s that CI is running is too old, but that's a guess.

I think the cause of the failing CI is not the k8s version used by kind.

This PR suggests setting the fsGroup via the `tempo.securityContext` object.

That modifies the securityContext of the tempo container inside the pod, but the container-level securityContext does not support the fsGroup field.

If you use the top-level securityContext object in values.yaml instead, the fsGroup field is supported and it should work.
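
To make the distinction concrete, a sketch of the two placements (assuming, as described above, that tempo.securityContext maps to the container and the top-level securityContext maps to the pod):

# Container-level: invalid, fsGroup is not part of the container
# SecurityContext API (the likely cause of the CI install failure)
tempo:
  securityContext:
    fsGroup: 10001

# Pod-level (top level of values.yaml): valid, fsGroup is supported
securityContext:
  fsGroup: 10001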

@StefanLobbenmeierObjego (Contributor, Author)

I see, my bad. The PR now changes the top-level securityContext; please trigger CI again.

@pavolloffay commented Jun 28, 2024

We are running into the same issue in the tempo-operator.

Setting

      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001

on the ingester StatefulSet pod spec (not on the container, keeping fsGroup on the pod) didn't work:

k get statefulsets.apps tempo-simplest-ingester -o yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: ingester
      app.kubernetes.io/instance: simplest
      app.kubernetes.io/managed-by: tempo-operator
      app.kubernetes.io/name: tempo
  serviceName: ""
  template:
    metadata:
      annotations:
        tempo.grafana.com/config.hash: 59596682c5d95e8c7130e68830c2e6df94891fb73cc2d7a22b61c59e7e6ea495
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: ingester
        app.kubernetes.io/instance: simplest
        app.kubernetes.io/managed-by: tempo-operator
        app.kubernetes.io/name: tempo
        tempo-gossip-member: "true"
    spec:
      containers:
      - args:
        - -target=ingester
        - -config.file=/conf/tempo.yaml
        - -log.level=info
        - --storage.trace.s3.secret_key=$(S3_SECRET_KEY)
        - --storage.trace.s3.access_key=$(S3_ACCESS_KEY)
        env:
        - name: S3_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: access_key_secret
              name: minio-test
        - name: S3_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: access_key_id
              name: minio-test
        image: docker.io/grafana/tempo:2.5.0
        imagePullPolicy: IfNotPresent
        name: tempo
        ports:
        - containerPort: 7946
          name: http-memberlist
          protocol: TCP
        - containerPort: 3200
          name: http
          protocol: TCP
        - containerPort: 9095
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 3101
            scheme: HTTPS
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 760m
            memory: 1Gi
          requests:
            cpu: 228m
            memory: "322122560"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: tempo-conf
          readOnly: true
        - mountPath: /var/tempo
          name: data
        - mountPath: /var/run/ca
          name: tempo-simplest-ca-bundle
        - mountPath: /var/run/tls/server
          name: tempo-simplest-ingester-mtls
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: tempo-simplest
      serviceAccountName: tempo-simplest
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: tempo-simplest
        name: tempo-conf
      - configMap:
          defaultMode: 420
          name: tempo-simplest-ca-bundle
        name: tempo-simplest-ca-bundle
      - name: tempo-simplest-ingester-mtls
        secret:
          defaultMode: 420
          secretName: tempo-simplest-ingester-mtls
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      volumeMode: Filesystem
    status:
      phase: Pending

Logs:

k logs tempo-simplest-ingester-0
level=warn ts=2024-06-28T15:40:43.492260734Z caller=main.go:130 msg="-- CONFIGURATION WARNINGS --"
level=warn ts=2024-06-28T15:40:43.492297072Z caller=main.go:136 msg="c.StorageConfig.Trace.Cache is deprecated and will be removed in a future release." explain="Please migrate to the top level cache settings config."
level=info ts=2024-06-28T15:40:43.492307621Z caller=main.go:225 msg="initialising OpenTracing tracer"
level=info ts=2024-06-28T15:40:43.493344752Z caller=main.go:118 msg="Starting Tempo" version="(version=2.5.0, branch=HEAD, revision=46dad3411)"
level=info msg="server listening on addresses" http=[::]:3101 grpc=[::]:33157
level=info ts=2024-06-28T15:40:43.495089336Z caller=server.go:240 msg="server listening on addresses" http=[::]:3200 grpc=[::]:9095
level=info ts=2024-06-28T15:40:43.498308979Z caller=cache.go:55 msg="caches available to storage backend" footer=false bloom=false offset_idx=false column_idx=false trace_id_idx=false page=false
level=info ts=2024-06-28T15:40:43.527404961Z caller=memberlist_client.go:435 msg="Using memberlist cluster label and node name" cluster_label= node=tempo-simplest-ingester-0-81d8842a
level=info ts=2024-06-28T15:40:43.527526809Z caller=module_service.go:82 msg=starting module=cache-provider
level=info ts=2024-06-28T15:40:43.527615424Z caller=module_service.go:82 msg=starting module=store
level=info ts=2024-06-28T15:40:43.527720471Z caller=module_service.go:82 msg=starting module=internal-server
level=info ts=2024-06-28T15:40:43.527871333Z caller=module_service.go:82 msg=starting module=server
level=info ts=2024-06-28T15:40:43.52792299Z caller=module_service.go:82 msg=starting module=overrides
level=info ts=2024-06-28T15:40:43.527950361Z caller=module_service.go:82 msg=starting module=memberlist-kv
level=info ts=2024-06-28T15:40:43.528630814Z caller=module_service.go:82 msg=starting module=ingester
level=info ts=2024-06-28T15:40:43.528664948Z caller=ingester.go:353 msg="beginning wal replay"
level=info ts=2024-06-28T15:40:43.528792456Z caller=wal.go:120 msg="beginning replay" file=30f4c59c-1376-40eb-98f7-03776eb2ec79+single-tenant+vParquet3 size=58
level=warn ts=2024-06-28T15:40:43.528861726Z caller=wal.go:126 msg="failed to replay block. removing." file=30f4c59c-1376-40eb-98f7-03776eb2ec79+single-tenant+vParquet3 err="error reading wal meta json: /var/tempo/wal/30f4c59c-1376-40eb-98f7-03776eb2ec79+single-tenant+vParquet3/meta.json: open /var/tempo/wal/30f4c59c-1376-40eb-98f7-03776eb2ec79+single-tenant+vParquet3/meta.json: permission denied"
level=error ts=2024-06-28T15:40:43.529001768Z caller=app.go:229 msg="module failed" module=ingester err="starting module ingester: invalid service state: Failed, expected: Running, failure: failed to replay wal: fatal error replaying wal: unlinkat /var/tempo/wal/30f4c59c-1376-40eb-98f7-03776eb2ec79+single-tenant+vParquet3/0000000001: permission denied"
level=info ts=2024-06-28T15:40:43.529044137Z caller=module_service.go:120 msg="module stopped" module=overrides
level=info ts=2024-06-28T15:40:43.529090794Z caller=module_service.go:120 msg="module stopped" module=store
level=info ts=2024-06-28T15:40:43.529267014Z caller=module_service.go:120 msg="module stopped" module=cache-provider
level=info ts=2024-06-28T15:40:43.529772329Z caller=memberlist_client.go:541 msg="memberlist fast-join starting" nodes_found=1 to_join=2
level=warn ts=2024-06-28T15:40:43.52978864Z caller=memberlist_client.go:561 msg="memberlist fast-join finished" joined_nodes=0 elapsed_time=18.745µs
level=info ts=2024-06-28T15:40:43.52979929Z caller=memberlist_client.go:720 msg="leaving memberlist cluster"
level=info ts=2024-06-28T15:40:43.529840547Z caller=module_service.go:120 msg="module stopped" module=memberlist-kv
level=info ts=2024-06-28T15:40:43.529982122Z caller=server_service.go:164 msg="server stopped"
level=info ts=2024-06-28T15:40:43.529995777Z caller=module_service.go:120 msg="module stopped" module=server
level=info ts=2024-06-28T15:40:43.530052914Z caller=server_service.go:164 msg="server stopped"
level=info ts=2024-06-28T15:40:43.530060028Z caller=module_service.go:120 msg="module stopped" module=internal-server
level=info ts=2024-06-28T15:40:43.530067061Z caller=app.go:215 msg="Tempo stopped"

ls output from the pod filesystem:

/ $ ls -al /var/tempo/ 
total 4
drwxrwxrwx    1 root     root            28 Jun 28 15:00 .
drwxr-xr-x    1 root     root            10 May 31 15:10 ..
-rw-r--r--    1 root     root          1387 Jun 28 15:00 tokens.json
drwxr-xr-x    1 root     root           132 Jun 28 15:01 wal
/ $ ls -al /tempo 
-rwxr-xr-x    1 root     root     102880905 May 31 15:10 /tempo

@zalegrala (Contributor)

I think it would be good to have an option to add an init container, so that folks can specify a chown if necessary. This would help accommodate the wide range of scenarios. Perhaps that could be added to this PR, but what do folks think?
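
(As a sketch of what that option might look like, following the extraInitContainers passthrough pattern common in other charts; the key name here is hypothetical:)

extraInitContainers:
  # would accept arbitrary init containers, e.g. the chown sketch
  # from earlier in this thread
  - name: chown-data
    image: busybox:1.36
    command: ["chown", "-R", "10001:10001", "/var/tempo"]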

@Sheikh-Abubaker (Collaborator) left a comment


@StefanLobbenmeierObjego Could you please bump the chart version to fix the CI lint.

@StefanLobbenmeierObjego (Contributor, Author)

Bumped the chart version, and also squashed all those formatting commits to avoid having that noise on main.

@zalegrala (Contributor) left a comment


I'm somewhat inclined to only want the fsGroup default change here, but I think what is here is still an improvement to the situation when persistence is enabled.

@Sheikh-Abubaker merged commit 0519738 into grafana:main on Aug 12, 2024
6 checks passed
lumiere-bot[bot] added a commit to coolguy1771/home-ops that referenced this pull request on Aug 14, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [tempo](https://grafana.net) ([source](https://togithub.com/grafana/helm-charts)) | patch | `1.10.2` -> `1.10.3` |

---

### Release Notes

<details>
<summary>grafana/helm-charts (tempo)</summary>

###
[`v1.10.3`](https://togithub.com/grafana/helm-charts/releases/tag/tempo-1.10.3)

[Compare
Source](https://togithub.com/grafana/helm-charts/compare/tempo-1.10.2...tempo-1.10.3)

Grafana Tempo Single Binary Mode

#### What's Changed

- \[tempo] Fix tempo permissions after update to v1.9.0 by [@StefanLobbenmeierObjego](https://togithub.com/StefanLobbenmeierObjego) in grafana/helm-charts#3161

#### New Contributors

- [@StefanLobbenmeierObjego](https://togithub.com/StefanLobbenmeierObjego) made their first contribution in grafana/helm-charts#3161

**Full Changelog**: grafana/helm-charts@loki-distributed-0.79.3...tempo-1.10.3

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).


Co-authored-by: lumiere-bot[bot] <98047013+lumiere-bot[bot]@users.noreply.github.com>