Update tempo to 2.5.0 #958

Merged: 8 commits into grafana:main on Jul 4, 2024

Conversation

@pavolloffay (Collaborator) commented Jun 28, 2024

The upstream Tempo 2.5.0 docker image has a breaking change: https://github.com/grafana/tempo/releases/tag/v2.5.0

The container user changed from root to tempo (10001:10001), which reads and writes /var/tempo.

Notable changes:

  • an upgrade procedure for 0.11.0 that deploys a Job to chown -R /var/tempo on every ingester PV (see the Go sketch below).
  • the upgrade procedure does not run on OCP. The Red Hat product uses a different Tempo image that never ran as root, and OpenShift sets securityContext.fsGroup by default, so even the community image works there.
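
For illustration, the per-PVC chown Job could be constructed roughly like this (a Go sketch with assumed names, not the verbatim PR code; the real implementation lives in internal/upgrade/v0_11_0.go, in upgrade0_11_0_pvcs quoted later in the review):

package upgrade

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// chownJob sketches the per-PVC upgrade Job: it mounts an ingester PVC and
// fixes the ownership so the new non-root tempo user (10001) can write to
// /var/tempo. Object and volume names are illustrative.
func chownJob(namespace, pvcName, image string, ttl int32) *batchv1.Job {
	root := int64(0)
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "upgrade-chown-" + pvcName,
			Namespace: namespace,
		},
		Spec: batchv1.JobSpec{
			// clean up the finished Job automatically (the PR keeps it for 1 day)
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "chown",
						Image:   image,
						Command: []string{"chown", "-R", "10001:10001", "/var/tempo"},
						// must run as root to change ownership of root-owned files
						SecurityContext: &corev1.SecurityContext{RunAsUser: &root},
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "data",
							MountPath: "/var/tempo",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "data",
						VolumeSource: corev1.VolumeSource{
							PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
								ClaimName: pvcName,
							},
						},
					}},
				},
			},
		},
	}
}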

Upgrade test

kubectl apply -f https://github.com/grafana/tempo-operator/releases/download/v0.10.0/tempo-operator.yaml

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest4
spec:
  managementState: Managed
  storage:
    secret:
      name: minio-test
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
    ingester:
      replicas: 2
EOF

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoMonolithic
metadata:
  name: sample
spec:
  storage:
    traces:
      backend: pv
      size: 2Gi
EOF

Report some trace data, then build and deploy the new operator version:

IMG_PREFIX=docker.io/pavolloffay OPERATOR_VERSION=0.11.0 TEMPO_VERSION=2.5.0 make docker-build docker-push deploy

@codecov-commenter commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 14.91228% with 97 lines in your changes missing coverage. Please review.

Project coverage is 73.35%. Comparing base (fc459a8) to head (4554d6a).
Report is 3 commits behind head on main.

Files Patch % Lines
internal/upgrade/v0_11_0.go 14.91% 93 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #958      +/-   ##
==========================================
- Coverage   74.37%   73.35%   -1.02%     
==========================================
  Files         104      105       +1     
  Lines        6372     6486     +114     
==========================================
+ Hits         4739     4758      +19     
- Misses       1346     1438      +92     
- Partials      287      290       +3     
Flag Coverage Δ
unittests 73.35% <14.91%> (-1.02%) ⬇️

Flags with carried forward coverage won't be shown.

@pavolloffay changed the title from "Update tempo to 2.4.2" to "Update tempo to 2.5.0" on Jun 28, 2024
@andreasgerstmayr (Collaborator)

The ownership of /var/tempo changed, see the breaking changes of the 2.5.0 release: https://github.com/grafana/tempo/releases/tag/v2.5.0

Ownership of /var/tempo is changing. Historically this has been owned by root:root, and with this change it will now be owned by tempo:tempo with the UID/GID of 10001. The ingester and metrics-generator statefulsets may need to be chown'd in order to start properly. A jsonnet example of an init container is included with the PR. This impacts all users of the grafana/tempo Docker image.

I think we'll need an init container to adjust existing installations.

Signed-off-by: Pavol Loffay <[email protected]>
@rubenvp8510 (Collaborator)

The ownership of /var/tempo changed, see the breaking changes of the 2.5.0 release: https://github.com/grafana/tempo/releases/tag/v2.5.0

Ownership of /var/tempo is changing. Historically this has been owned by root:root, and with this change it will now be owned by tempo:tempo with the UID/GID of 10001. The ingester and metrics-generator statefulsets may need to be chown'd in order to start properly. A jsonnet example of an init container is included with the PR. This impacts all users of the grafana/tempo Docker image.

I think we'll need an init container to adjust existing installations.

Is this not something solved by the securityContext?

@pavolloffay (Collaborator Author)

Good that we have the upgrade tests.

An instance created with 2.4.2 and upgraded to 2.5.0 throws:

level=error ts=2024-06-28T14:45:59.767604772Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unknown desc = mkdir /var/tempo/wal/d3ff7b03-3e2e-4380-aa94-d8930e2f97c2+single-tenant+vParquet3: permission denied"
level=error ts=2024-06-28T14:46:00.761519343Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unknown desc = mkdir /var/tempo/wal/82e00424-ff58-4a66-b83b-ecd8f4cdb8c1+single-tenant+vParquet3: permission denied"
level=error ts=2024-06-28T14:46:01.763387853Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unknown desc = mkdir /var/tempo/wal/2bf4ee9e-9701-43b7-858e-f469229ec921+single-tenant+vParquet3: permission denied"
level=error ts=2024-06-28T14:46:02.76407992Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = Unknown desc = mkdir /var/tempo/wal/a9b65782-2bc0-482c-a185-84eb36cd7301+single-tenant+vParquet3: permission denied"

@pavolloffay (Collaborator Author)

PR in the tempo helm chart grafana/helm-charts#3161 (comment)

@pavolloffay (Collaborator Author) commented Jul 1, 2024

The issue does not seem to occur on OCP. OCP seems to set this securityContext by default:

  securityContext:
    fsGroup: 1001020000
    seLinuxOptions:
      level: s0:c32,c14
    seccompProfile:
      type: RuntimeDefault
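
For reference, setting an equivalent fsGroup in operator Go code would look roughly like this (illustrative only, not something this PR adds; with fsGroup set, Kubernetes adjusts the group ownership of the volume when the pod mounts it, so the tempo user can write even to files still owned by root):

package main

import (
	corev1 "k8s.io/api/core/v1"
)

// fsGroupPodSpec is an illustrative pod spec: fsGroup makes the kubelet
// recursively chgrp the mounted volume and mark it group-writable, which is
// why the community image works on OpenShift without any chown step.
func fsGroupPodSpec() corev1.PodSpec {
	fsGroup := int64(10001)
	return corev1.PodSpec{
		SecurityContext: &corev1.PodSecurityContext{
			FSGroup: &fsGroup,
		},
	}
}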

Tested with

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  managementState: Managed
  images:
    tempo: docker.io/grafana/tempo:2.4.1
  storage:
    secret:
      name: minio-test
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
EOF

Reported some data and then changed the image to docker.io/grafana/tempo:2.5.0:

~ $ ls -al /var/tempo/
total 28
drwxrwsr-x    4 root     10010200      4096 Jul  1 16:57 .
drwxr-xr-x    1 root     root            19 May 31 15:10 ..
drwxrws---    2 root     10010200     16384 Jul  1 16:54 lost+found
-rw-r--r--    1 10010200 10010200      1388 Jul  1 16:57 tokens.json
drwxrwsr-x    4 10010200 10010200      4096 Jul  1 17:07 wal

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay requested review from andreasgerstmayr and rubenvp8510 and removed the request for andreasgerstmayr on July 2, 2024 17:18
Signed-off-by: Pavol Loffay <[email protected]>

func upgrade0_11_0_pvcs(ctx context.Context, u Upgrade, tempo metav1.ObjectMeta, nodeSelector map[string]string, image string, pvcs *corev1.PersistentVolumeClaimList) error {
// keep the jobs around for 1 day
ttl := int32(60 * 60 * 24)
Collaborator

Is this TTL necessary?

Collaborator Author

otherwise the job would hang there forever

@rubenvp8510 (Collaborator) left a comment

Small comment but other than that LGTM

// so the issue does not happen on OpenShift.
func upgrade0_11_0(ctx context.Context, u Upgrade, tempo *v1alpha1.TempoStack) error {
// do nothing on OpenShift
if u.CtrlConfig.Gates.OpenShift.OpenShiftRoute {
Collaborator

funny way of detecting OpenShift :D

Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
@@ -18,6 +18,7 @@ COPY . .

 # Build
 ARG OPERATOR_VERSION
+ARG TEMPO_VERSION
Collaborator Author

@rubenvp8510 do you know how OPERATOR_VERSION is passed in GHA when the image is published? We might need to update the workflow to include the tempo version

@@ -159,7 +159,7 @@ run: manifests generate ## Run a controller from your host.

 .PHONY: docker-build
 docker-build: ## Build docker image with the manager.
-	docker buildx build --load --platform linux/${ARCH} --build-arg OPERATOR_VERSION -t ${IMG} .
+	docker buildx build --load --platform linux/${ARCH} --build-arg OPERATOR_VERSION --build-arg TEMPO_VERSION -t ${IMG} .
Collaborator

afaics we don't need to pass OPERATOR_VERSION or TEMPO_VERSION here, because in the Dockerfile we run make build, and the Makefile contains the versions (hardcoded).

Collaborator Author

That does not work for e.g.

IMG_PREFIX=docker.io/pavolloffay OPERATOR_VERSION=0.11.0 TEMPO_VERSION=2.5.0 make docker-build docker-push deploy

Collaborator

Ah, yes, but this is only required for testing, right? Anyway, still useful 👍

@andreasgerstmayr (Collaborator) left a comment

I'm wondering about race conditions, this is the timeline on upgrade:

  1. old tempo version runs
  2. operator restarts with new operator version
  3. operator reconciles all managed instances
    3.1. at the start of reconcile loop, operator runs upgrades
    3.2. upgrade procedure creates job to chown PV contents (while old tempo pods are still running)
    3.3. reconcile creates the manifests, using the new tempo image in all Deployment and StatefulSet manifests
    3.4. reconcile detects changes and applies the new manifests to the cluster, triggering a restart of the tempo pods (the upgrade job may or may not be completed at this time)

I guess this works in most cases, but not all (busy cluster, the chown Job takes a long time to schedule/execute, the ingester keeps writing during the chown, so we end up with some root-owned and some tempo-owned files in the PV, etc.).

Ideally we would stop the running ingesters, perform the chown, wait until it is done, and only then start the new version.

I think adding an initContainer would solve the timing issues. init containers always start and finish before the main container, and afaics (correct me if I'm wrong) while the init container of the new statefulset runs, the old statefulset is already terminated (otherwise two pods access the same PV, I think this is prevented).
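
A minimal sketch of such an init container, as it might be added to the ingester pod spec (illustrative; the names are assumptions, and the PR ultimately took a different approach, see below):

package main

import (
	corev1 "k8s.io/api/core/v1"
)

// chownInitContainer sketches an init container that would fix the ownership
// of /var/tempo before the main tempo container starts. It has to run as
// root in order to chown files still owned by root:root.
func chownInitContainer(image string) corev1.Container {
	root := int64(0)
	return corev1.Container{
		Name:    "chown-var-tempo",
		Image:   image,
		Command: []string{"chown", "-R", "10001:10001", "/var/tempo"},
		SecurityContext: &corev1.SecurityContext{
			RunAsUser: &root,
		},
		VolumeMounts: []corev1.VolumeMount{{
			Name:      "data", // assumes the ingester data volume is named "data"
			MountPath: "/var/tempo",
		}},
	}
}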

@pavolloffay
Copy link
Collaborator Author

I think adding an initContainer would solve the timing issues. init containers always start and finish before the main container, and afaics (correct me if I'm wrong) while the init container of the new statefulset runs, the old statefulset is already terminated (otherwise two pods access the same PV, I think this is prevented).

There is an issue with the init-container approach. The init container will run only for ingesters (and PVs) that are scheduled to run. The ingesters might be scaled down (e.g. from 5 instances to 2), so there would be PVs that won't be upgraded.

@andreasgerstmayr (Collaborator)

I think adding an initContainer would solve the timing issues. init containers always start and finish before the main container, and afaics (correct me if I'm wrong) while the init container of the new statefulset runs, the old statefulset is already terminated (otherwise two pods access the same PV, I think this is prevented).

There is an issue with the init-container approach. The init container will run only for ingesters (and PVs) that are scheduled to run. The ingesters might be scaled down (e.g. from 5 instances to 2), so there would be PVs that won't be upgraded.

We'll have to keep the initContainer indefinitely, so when the ingesters are scaled down it doesn't matter if the PV is in an outdated state, but when they get scaled up again they will be upgraded.

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay (Collaborator Author)

We'll have to keep the initContainer indefinitely

This is certainly doable, but it would clutter non-upgrade-related packages. I would prefer to keep the upgrade code in an isolated standalone package.

I have changed the upgrade procedure to scale the ingesters down to zero before upgrading, then wait until a single Job that mounts all the volumes finishes.
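
Roughly, the wait step could look like this (a sketch assuming a controller-runtime client and a hypothetical jobName, not the verbatim PR code):

package upgrade

import (
	"context"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForChownJob polls until the chown Job reports a successful pod.
// The jobName parameter and the 10-minute timeout are illustrative.
func waitForChownJob(ctx context.Context, c client.Client, namespace, jobName string) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true,
		func(ctx context.Context) (done bool, err error) {
			job := batchv1.Job{}
			if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: jobName}, &job); err != nil {
				return false, err
			}
			return job.Status.Succeeded > 0, nil
		})
}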

Signed-off-by: Pavol Loffay <[email protected]>
@andreasgerstmayr (Collaborator) left a comment

lgtm, nice job (pun intended)


return false, nil
})

Collaborator

We could remove the job here instead of leaving it there until the 24h TTL, but it's fine either way.

Collaborator Author

I left it there intentionally for awareness :)

@pavolloffay pavolloffay merged commit 53d479a into grafana:main Jul 4, 2024
11 checks passed