
nextcloud-nginx container crashlooping after securityContext update; /var/www/html/config always owned by root #335

Open
jessebot opened this issue Jan 23, 2023 · 18 comments · May be fixed by #379
Labels
Persistence: anything to do with external storage or persistence. This is also where we triage things like NFS.
securityContext: issues related to security contexts.

Comments

@jessebot
Collaborator

jessebot commented Jan 23, 2023

Description

I've edited this description for full context of how we got here, as this issue is getting kind of long; it needed to be tested in a lot of different ways, which led me in several directions.

This issue is a continuation of the conversation started after #269 was merged. I was originally trying to change podSecurityContext.runAsUser and podSecurityContext.runAsGroup to 33 because I was trying to diagnose why the /var/www/html/config directory was always owned by root. I am deploying the nextcloud helm chart using persistent volumes on k3s with the default local path provisioner.

I learned that podSecurityContext.fsGroup was always being set to 82 any time you enabled nginx.enabled and didn't set podSecurityContext.fsGroup explicitly, so I submitted a draft PR to fix it so that it checks image.flavor for alpine: #379
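As a sketch of the workaround in the meantime (an assumption about the chart's behavior until #379 lands): overriding fsGroup explicitly in values.yaml should keep the chart from defaulting it to 82. UID/GID 33 is www-data in the Debian-based nextcloud images; the alpine flavors use 82.

```yaml
# Hypothetical values.yaml override: set fsGroup explicitly so the chart
# does not fall back to its nginx-oriented default of 82.
nextcloud:
  podSecurityContext:
    runAsUser: 33   # www-data in the Debian-based images (82 for alpine)
    runAsGroup: 33
    fsGroup: 33
```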

Through the comments here you can see other things I'm currently testing, because I'm still not sure if it's just the local path provisioner on k3s, or k3s itself, or something else; the best answer I can give right now is 🤷. I'll update this issue description with more clarity as it comes.

Original Issue that was opened on Jan 23

The nginx container in the nextcloud pod won't start and complains of a readonly file system, even if I try to only set the nextcloud.securityContext.

I have created a new cluster and deployed nextcloud with the securityContext parameters from the values.yaml of this repo, including the nginx security context. My entire values.yaml is here, but the parts that matter are:

securityContext parameters in my old `values.yaml`
nextcloud:
  # securityContext parameters. For example you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false

  # securityContext parameters. For example you may need to define runAsNonRoot directive
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false

...

  nginx:
    ## You need to set an fpm version of the image for nextcloud if you want to use nginx!
    enabled: true
    image:
      repository: nginx
      tag: alpine
      pullPolicy: Always
    
    # this is copied almost directly from the values.yaml, but I changed readOnlyRootFilesystem to false while testing
    securityContext:
      runAsUser: 82
      runAsGroup: 33
      runAsNonRoot: true
      readOnlyRootFilesystem: false

The nextcloud pod is in a CrashLoopBackOff state; the offending container is nginx, and these are its logs:

2023-01-23T15:44:59.428798413+01:00 /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-01-23T15:44:59.428820874+01:00 /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
2023-01-23T15:44:59.429173908+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
2023-01-23T15:44:59.429979412+01:00 10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
2023-01-23T15:44:59.430071429+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
2023-01-23T15:44:59.431167356+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2023-01-23T15:44:59.431715519+01:00 /docker-entrypoint.sh: Configuration complete; ready for start up
2023-01-23T15:44:59.433513935+01:00 2023/01/23 14:44:59 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T15:44:59.433519229+01:00 nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T14:45:24.296336176Z Stream closed EOF for nextcloud/nextcloud-web-app-66fc5dfcb7-kxlnp (nextcloud-nginx)
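For context on those two errors: when the stock nginx image runs as non-root with a read-only root filesystem, it generally needs writable scratch volumes for the paths its entrypoint and master process write to. This is a generic pod-spec sketch, not a chart option; the mount paths are inferred from the log above plus nginx's default pid-file location, so treat them as assumptions:

```yaml
# Sketch (raw pod spec fragment, not nextcloud chart values): emptyDir
# mounts for the directories nginx needs to write at startup.
containers:
  - name: nextcloud-nginx
    image: nginx:alpine
    volumeMounts:
      - name: nginx-cache
        mountPath: /var/cache/nginx   # the failing mkdir of client_temp happens here
      - name: nginx-run
        mountPath: /var/run           # nginx.pid is written here by default
volumes:
  - name: nginx-cache
    emptyDir: {}
  - name: nginx-run
    emptyDir: {}
```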

This is the resulting deployment.yaml when I run kubectl get deployment -n nextcloud nextcloud-web-app -o yaml > deployment.yaml:

Click me for the nextcloud deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-01-23T14:43:58Z"
  generation: 52
  labels:
    app.kubernetes.io/component: app
    app.kubernetes.io/instance: nextcloud-web-app
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nextcloud
    argocd.argoproj.io/instance: nextcloud-web-app
    helm.sh/chart: nextcloud-3.4.1
  name: nextcloud-web-app
  namespace: nextcloud
  resourceVersion: "3340"
  uid: cde1dd07-103a-4c04-931d-071ab3c5b448
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: app
      app.kubernetes.io/instance: nextcloud-web-app
      app.kubernetes.io/name: nextcloud
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        nextcloud-config-hash: d1d9ac6f86f643b460f8e4e8e886b65382ad49aede8762f8ea74ccd86b7e3f28
        nginx-config-hash: 16c61772d9e74de7322870fd3a045598ea01f6e16be155d116423e6a246dcddc
        php-config-hash: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: app
        app.kubernetes.io/instance: nextcloud-web-app
        app.kubernetes.io/name: nextcloud
    spec:
      containers:
      - env:
        - name: POSTGRES_HOST
          value: nextcloud-web-app-postgresql
        - name: POSTGRES_DB
          value: nextcloud
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: nextcloudPassword
              name: nextcloud-pgsql-credentials
        - name: NEXTCLOUD_ADMIN_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_TRUSTED_DOMAINS
          value: nextcloud.vleermuis.tech
        - name: NEXTCLOUD_DATA_DIR
          value: /var/www/html/data
        image: nextcloud:25.0.3-fpm
        imagePullPolicy: Always
        name: nextcloud
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /var/www/html/config/logging.config.php
          name: nextcloud-config
          subPath: logging.config.php
        - mountPath: /var/www/html/config/proxy.config.php
          name: nextcloud-config
          subPath: proxy.config.php
        - mountPath: /var/www/html/config/.htaccess
          name: nextcloud-config
          subPath: .htaccess
        - mountPath: /var/www/html/config/apache-pretty-urls.config.php
          name: nextcloud-config
          subPath: apache-pretty-urls.config.php
        - mountPath: /var/www/html/config/apcu.config.php
          name: nextcloud-config
          subPath: apcu.config.php
        - mountPath: /var/www/html/config/apps.config.php
          name: nextcloud-config
          subPath: apps.config.php
        - mountPath: /var/www/html/config/autoconfig.php
          name: nextcloud-config
          subPath: autoconfig.php
        - mountPath: /var/www/html/config/redis.config.php
          name: nextcloud-config
          subPath: redis.config.php
        - mountPath: /var/www/html/config/smtp.config.php
          name: nextcloud-config
          subPath: smtp.config.php
      - image: nginx:alpine
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        name: nextcloud-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        securityContext:
          readOnlyRootFilesystem: false
          runAsGroup: 33
          runAsNonRoot: true
          runAsUser: 82
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /etc/nginx/nginx.conf
          name: nextcloud-nginx-config
          subPath: nginx.conf
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - until pg_isready -h nextcloud-web-app-postgresql -U ${POSTGRES_USER} ; do
          sleep 2 ; done
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        image: bitnami/postgresql:14.4.0-debian-11-r23
        imagePullPolicy: IfNotPresent
        name: postgresql-isready
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 82
        runAsGroup: 33
        runAsNonRoot: true
        runAsUser: 33
      serviceAccount: nextcloud-serviceaccount
      serviceAccountName: nextcloud-serviceaccount
      terminationGracePeriodSeconds: 30
      volumes:
      - name: nextcloud-main
        persistentVolumeClaim:
          claimName: nextcloud-files
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-config
        name: nextcloud-config
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-nginxconfig
        name: nextcloud-nginx-config
status:
  conditions:
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: ReplicaSet "nextcloud-web-app-66fc5dfcb7" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 52
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

Where does the UID 82 come from? (Edit: it comes from the alpine nextcloud and nginx images - that's www-data)

I set that to 33 (nextcloud's www-data user) to test, but it didn't seem to make a difference. To be clear: without editing any of the security contexts, everything works. But I would like the security context to work, because otherwise my restores from backups fail: the /var/www/html/config directory is always created with root ownership, so if the restores run as www-data, they can't restore that particular directory, which is important. I'm hoping the security context fixes that, so that nothing has to run as root in this stack.

I'm deploying the 3.4.1 nextcloud helm chart via Argo CD onto k3s on Ubuntu 22.04.
Update: problem still present in 3.5.7 helm chart.

@jessebot added the security (Security issues) label on Jan 27, 2023
@FrankelJb

Adding my experience:
I just moved my directory from a hostPath to NFS, then started encountering permission issues. I ran chown -R 33:33 on the whole directory and set the security context. This is my error now:

Configuring Redis as session handler
/entrypoint.sh: 78: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
Initializing nextcloud 25.0.3.2 ...
touch: cannot touch '/var/www/html/nextcloud-init-sync.lock': Permission denied

@jessebot
Collaborator Author

@FrankelJb are you also using nginx? Which security contexts are you setting? There are a few that you can set. If we could get the security context settings from your values.yaml, that would help in comparing states. Thank you for sharing!

@FrankelJb

@jessebot I'm not using Nginx. I'm almost ready to give up on NC in kubernetes (I can't upgrade now). I've managed to solve this issue. I was trying to use a single redis cluster for all my services. However, I had to give up on that dream because NC refused to connect without a password. I'm not sure if that's caused by a config in the helm chart or a configuration error on my end. Thanks for being so responsive :)

@jessebot
Collaborator Author

I'm sorry you're having a bad time with this. I also had a bad time with this at first and then became sort of obsessed with trying to fix it for others too 😅

If you can post your values.yaml (after removing sensitive info) I can help troubleshoot it for you :)

@Jeroen0494

UID 82 comes from the Nextcloud fpm alpine image. If you use another image instead of alpine, I believe the user is 33. The NGINX container you use is an alpine-based image, so you have to make sure the group and fsGroup match for both containers.

For example my (abbreviated) deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud
  namespace: nextcloud
  labels:
    app: nextcloud
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nextcloud
  template:
    metadata:
      annotations:
        container.apparmor.security.beta.kubernetes.io/nextcloud: localhost/container-nextcloud
        container.apparmor.security.beta.kubernetes.io/nginx: localhost/container-nginx
      labels:
        app: nextcloud
    spec:
      automountServiceAccountToken: false

      containers:
      - name: nextcloud
        image: "nextcloud:24.0.9-fpm-alpine"

        securityContext:
          runAsUser: 82
          allowPrivilegeEscalation: false
          privileged: false
          runAsNonRoot: true
          capabilities:
            drop:
              - ALL
          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

      - name: nginx
        image: cgr.dev/chainguard/nginx:1.23.3
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          capabilities:
            add:
              - NET_BIND_SERVICE
            drop:
              - ALL
          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nginx-seccomp-profile.json

      # Will mount configuration files as www-data (id: 82) for nextcloud
      securityContext:
        fsGroup: 82
      serviceAccountName: nextcloud-serviceaccount

You can see I use a distroless NGINX container image, but the principle is the same.

@FrankelJb

@jessebot here is a link to my values.yaml. I've just tried to recreate it with Flux, moving from Argo CD, and it just waits on "Initializing nextcloud 25.0.4.1 ..." for minutes. It was working with the same yaml; the deployment took 45 minutes last time.

@jessebot
Collaborator Author

jessebot commented Apr 15, 2023

@FrankelJb , for Argo CD, I detailed some of my trials in #336 (comment) if that's at all helpful.

For this owned-by-root issue, also discussed in #114, I finally got around to testing it (after battling argo 😅), and I've noted that all of the securityContext parameters I've tested (nextcloud, nginx, and the nextcloud pod) seem to mostly work, but the following directories are always owned by root on the nextcloud container:

Screenshot showing root ownership and GID 82 group ownership on everything in /var/www/html/config on the nextcloud pod except config.php which is owned by www-data

Screenshot showing everything in /var/www/html owned by www-data and group ownership on all of it set to GID 82, EXCEPT for config, custom_apps, data, and themes

I don't know why, though. At first, I thought it was a persistence thing, but then I disabled persistence and it's still an issue. You can kind of see me live testing with the 3.5.7 nextcloud chart here, but each thing I test leads me further toward believing there's something going on with our volume mounts. I've been using the 26.0.0-fpm image, but I haven't tested the regular image or the alpine image like @Jeroen0494 suggested, yet.

Note: This /var/www/html/config directory owned by root doesn't happen when using the nextcloud docker container directly and setting it to run as nonroot. This only happens with the helm chart.

@provokateurin or @tvories have you been able to get this to work? I can get every other directory to be created as any other user, but the directories from the screenshot seem to always be owned by root. You can see my values.yaml here, but I don't know what else we need to set here 🤔 Are there security contexts for persistent volumes? Or perhaps mount options we need to set for the configmap when it gets mounted? It's been months, albeit in my off hours, but I'm still so confused.

@jessebot
Collaborator Author

jessebot commented Apr 15, 2023

@Jeroen0494 , I switched to the 26.0.0-fpm-alpine tag and also added most of the options you'd added and /var/www/html/config is still owned by root when deploying with this helm chart. You can see the full values.yaml I tried here, but the important parts are this:

image:
  repository: nextcloud
  flavor: fpm-alpine
  pullPolicy: Always

nextcloud:
  # Set securityContext parameters. For example, you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 82
    runAsGroup: 82
    runAsNonRoot: true
    readOnlyRootFilesystem: false
    allowPrivilegeEscalation: false
    privileged: false
    capabilities:
      drop:
        - ALL

  podSecurityContext:
    fsGroup: 82
...
# this is deprecated, but I figured why not, anything to change that one config directory from root (but it didn't work)
securityContext:
  fsGroup: 82

I can't figure out what else it would be. Maybe a script in the container itself? 🤔 Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:

          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

I see it described here in the k8s api docs, but it doesn't link further for what goes in localhostProfile.

@jessebot jessebot changed the title nextcloud-nginx container crashlooping after securityContext update nextcloud-nginx container crashlooping after securityContext update; /var/www/html/config always owned by root Apr 15, 2023
@tomasodehnal

@jessebot Not sure if it is the same issue, but maybe it will help. I'm using 25-alpine with a hostPath PV, and even though I set the securityContext in the pod and used the same id for the ownership of the path on the host, the mapped subdirectories of the PV were owned by root:root and the container was stuck on:

/entrypoint.sh: 104: cannot create /var/www/html/nextcloud-init-sync.lock: Permission denied

I resolved it by manually changing the ownership of the subdirs on the host to the same uid.
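The fix pattern described above can be sketched like this; the host path is hypothetical, and the UID depends on the image flavor:

```shell
# Pick the UID/GID that matches the image flavor in use:
NEXTCLOUD_UID=33   # www-data in the Debian-based images; use 82 for the alpine flavors
# Then, on the host that backs the hostPath PV (example path, needs root):
#   sudo chown -R "${NEXTCLOUD_UID}:${NEXTCLOUD_UID}" /data/nextcloud
echo "chown target uid: ${NEXTCLOUD_UID}"
```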

@jessebot
Collaborator Author

jessebot commented Apr 16, 2023

@tomasodehnal, thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests or the section of your values.yaml with that info?

The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud? It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.

Here's one of my PVCs which is using the local path provisioner, since I'm using k3s:

# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
    volumeType: local
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Still looking if there's anything that can be done here, but from my research, this might just be something that needs to be solved in an init container, which I might have to make a PR for :(
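A minimal sketch of what such an init container could look like (this is not an existing chart feature; the image, UID, and mounts here are assumptions matching the deployment above):

```yaml
# Hypothetical init container that fixes ownership on the persistent
# volume before the non-root app containers start; it must run as root.
initContainers:
  - name: fix-volume-permissions
    image: busybox:1.36
    command: ["sh", "-c", "chown -R 33:33 /var/www/html"]  # 82:82 for the alpine flavors
    securityContext:
      runAsUser: 0
      runAsNonRoot: false
    volumeMounts:
      - name: nextcloud-main
        mountPath: /var/www/html
        subPath: html
```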

Update: tested without any sort of values.yaml, using all default settings on k3s with chart version 3.5.8 and only nextcloud-init-sync.lock is owned by root like this:

-rw-r--r--  1 root     www-data    0 Apr 16 09:01 nextcloud-init-sync.lock

but that's without any persistence or configurations enabled 🤔


Re: nextcloud-init-sync.lock

That file is actually owned by root by default in all the nextcloud docker containers, but only that one file (it occurs in both the docker container directly and in the helm chart).

Example Default Permissions on nextcloud:fpm-alpine Docker Container
$ docker run -d nextcloud:fpm-alpine
Unable to find image 'nextcloud:fpm-alpine' locally
fpm-alpine: Pulling from library/nextcloud
f56be85fc22e: Pull complete
ace8de9a4ff5: Pull complete
ac818333da4c: Pull complete
10f4138fad9a: Pull complete
04049f99cb8d: Pull complete
93231f0bdcb6: Pull complete
ab266ad8891c: Pull complete
552295b4d6d8: Pull complete
cffafb46943d: Pull complete
4964abd498c6: Pull complete
a05442d246e3: Pull complete
42633b5b39c2: Pull complete
6f8014cbce5e: Pull complete
18729ba22f88: Pull complete
9eedd0061e2b: Pull complete
97d1b1593a77: Pull complete
Digest: sha256:9a08c42558cda7d48de2cc3da9f5150eeed81e7595aa4c2c5ace6612c3923240
Status: Downloaded newer image for nextcloud:fpm-alpine
688a243c0388ca26541b0d39cc5ebe3c83ad41df617aa601e28e08a258319dfa

$ docker exec -it frosty_mendel /bin/sh
/var/www/html # ls -hal
total 180K
drwxrwxrwt   15 www-data www-data    4.0K Apr 16 08:42 .
drwxrwxr-x    1 www-data root        4.0K Apr 14 20:46 ..
-rw-r--r--    1 www-data www-data    3.2K Apr 16 08:42 .htaccess
-rw-r--r--    1 www-data www-data     101 Apr 16 08:42 .user.ini
drwxr-xr-x   45 www-data www-data    4.0K Apr 16 08:42 3rdparty
-rw-r--r--    1 www-data www-data   18.9K Apr 16 08:42 AUTHORS
-rw-r--r--    1 www-data www-data   33.7K Apr 16 08:42 COPYING
drwxr-xr-x   50 www-data www-data    4.0K Apr 16 08:42 apps
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 config
-rw-r--r--    1 www-data www-data    4.0K Apr 16 08:42 console.php
drwxr-xr-x   24 www-data www-data    4.0K Apr 16 08:42 core
-rw-r--r--    1 www-data www-data    6.2K Apr 16 08:42 cron.php
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 custom_apps
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 data
drwxr-xr-x    2 www-data www-data   12.0K Apr 16 08:42 dist
-rw-r--r--    1 www-data www-data     156 Apr 16 08:42 index.html
-rw-r--r--    1 www-data www-data    3.4K Apr 16 08:42 index.php
drwxr-xr-x    6 www-data www-data    4.0K Apr 16 08:42 lib
-rw-r--r--    1 root     root           0 Apr 16 08:42 nextcloud-init-sync.lock
-rwxr-xr-x    1 www-data www-data     283 Apr 16 08:42 occ
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 ocm-provider
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 ocs
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:42 ocs-provider
-rw-r--r--    1 www-data www-data    3.1K Apr 16 08:42 public.php
-rw-r--r--    1 www-data www-data    5.4K Apr 16 08:42 remote.php
drwxr-xr-x    4 www-data www-data    4.0K Apr 16 08:42 resources
-rw-r--r--    1 www-data www-data      26 Apr 16 08:42 robots.txt
-rw-r--r--    1 www-data www-data    2.4K Apr 16 08:42 status.php
drwxr-xr-x    3 www-data www-data    4.0K Apr 16 08:42 themes
-rw-r--r--    1 www-data www-data     384 Apr 16 08:42 version.php

Running docker with --user 82:82 fixes the issue on the alpine image (you'd use 33 for the non-alpine images) as you can see here (but that's not helpful for k8s itself 😞 since this was using docker directly):

Example Fixed Permissions on nextcloud:fpm-alpine Docker Container
$ docker run -d --user 82:82 nextcloud:fpm-alpine
9761e3ff869b3ad026ef5bf10b333d5c52c2ec0ad6b5dd212016d083c8888dd3

$ docker exec -it eager_buck /bin/sh
/var/www/html $ ls -hal
total 180K
drwxrwxrwt   15 www-data root        4.0K Apr 16 08:48 .
drwxrwxr-x    1 www-data root        4.0K Apr 14 20:46 ..
-rw-r--r--    1 www-data www-data    3.2K Apr 16 08:48 .htaccess
-rw-r--r--    1 www-data www-data     101 Apr 16 08:48 .user.ini
drwxr-xr-x   45 www-data www-data    4.0K Apr 16 08:48 3rdparty
-rw-r--r--    1 www-data www-data   18.9K Apr 16 08:48 AUTHORS
-rw-r--r--    1 www-data www-data   33.7K Apr 16 08:48 COPYING
drwxr-xr-x   50 www-data www-data    4.0K Apr 16 08:48 apps
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 config
-rw-r--r--    1 www-data www-data    4.0K Apr 16 08:48 console.php
drwxr-xr-x   24 www-data www-data    4.0K Apr 16 08:48 core
-rw-r--r--    1 www-data www-data    6.2K Apr 16 08:48 cron.php
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 custom_apps
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 data
drwxr-xr-x    2 www-data www-data   12.0K Apr 16 08:48 dist
-rw-r--r--    1 www-data www-data     156 Apr 16 08:48 index.html
-rw-r--r--    1 www-data www-data    3.4K Apr 16 08:48 index.php
drwxr-xr-x    6 www-data www-data    4.0K Apr 16 08:48 lib
-rw-r--r--    1 www-data www-data       0 Apr 16 08:48 nextcloud-init-sync.lock
-rwxr-xr-x    1 www-data www-data     283 Apr 16 08:48 occ
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 ocm-provider
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 ocs
drwxr-xr-x    2 www-data www-data    4.0K Apr 16 08:48 ocs-provider
-rw-r--r--    1 www-data www-data    3.1K Apr 16 08:48 public.php
-rw-r--r--    1 www-data www-data    5.4K Apr 16 08:48 remote.php
drwxr-xr-x    4 www-data www-data    4.0K Apr 16 08:48 resources
-rw-r--r--    1 www-data www-data      26 Apr 16 08:48 robots.txt
-rw-r--r--    1 www-data www-data    2.4K Apr 16 08:48 status.php
drwxr-xr-x    3 www-data www-data    4.0K Apr 16 08:48 themes
-rw-r--r--    1 www-data www-data     384 Apr 16 08:48 version.php

@jessebot added the Persistence label on Apr 16, 2023
@Jeroen0494

@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?

When using existing storage and the owner of the files is root, when switching to a non-root container it wouldn't be able to change the owner. You'd have to change the owner on the storage medium itself with a chown.

Does the issue exist when using no attached storage? And when using emptyDir? And when using a PVC template with local-path-provisioner?

I can't figure out what else it would be. Maybe a script in the container itself? 🤔 Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:

          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

I see it described here in the k8s api docs, but it doesn't link further for what goes in localhostProfile.

I'm using the security profiles operator and have written my own seccomp profile. You may ignore this line, or switch type to RuntimeDefault.

Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAML's are based on the Helm chart.

@jessebot
Collaborator Author

Thanks for getting back to me, @Jeroen0494 🙏

Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAML's are based on the Helm chart.

Commented on that PR and will take another look after conflicts are resolved :) Will still probably ping Kate in though, as the PR is large.

@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?

Let me try with emptyDir actually. 🤔 I've been doing this on a fresh k3s cluster each time. I completely destroy the cluster and its storage before testing a new cluster. I checked /var/lib/rancher after removing k3s and there isn't anything in that directory; the directory itself is owned by root, but the directories created within it should not be. I use smol-k8s-lab for deploying and destroying local k3s clusters. Let me spin up a new cluster and check the ownership of the directory after that.

Does the issue exist when using no attached storage?

No, the issue doesn't exist when I don't use any persistence. Well, except for the nextcloud-init-sync.lock file, which is always owned by root, but that's not what I'm after right now. I'm after the /var/www/html/config dir. Detailed more info on that lock file here: #335 (comment)

@Jeroen0494

Could you also try with a local mount, instead of using the local path provisioner?

For example, my PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-data
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-data
    namespace: nextcloud
  local:
    path: /data/crypt/nextcloud/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - mediaserver.fritz.box
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

@jessebot
Collaborator Author

jessebot commented Apr 16, 2023

Here's what else I tried recently:

I do not know how to set an emptyDir with the current values.yaml 🤔

Creating a Persistent Volume with spec.hostPath.path

I was previously using a dynamic PVC, but here's the new setup I tried, using the 26.0.0-fpm tag again. This time I only changed the securityContext for the nextcloud container and didn't set nextcloud.podSecurityContext, since nginx isn't what I'm troubleshooting. Here's the PV and existing PVC for nextcloud files:

PV and PVC yaml
---
kind: PersistentVolume
apiVersion: v1
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  storageClassName: local-path
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: '/data/nextcloud'

---
# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

The above still failed, so I'm beginning to think this is k3s related, because I also created the directory I specified as user 33:33, which is www-data on the host machine as well.

Screenshot 2023-04-16 at 15 50 34

I found this k3s issue, #3704, and whatever the fix was just didn't seem to work. There's another PR open here, #7217, which may fix it, but 🤷

Creating a Persistent Volume with spec.local.path

Next I tried the second thing you suggested, @Jeroen0494, with a PV that has spec.local.path like this, making sure that /data/nextcloud was cleaned between runs and owned by www-data, which is UID 33 both in the securityContext for the nextcloud container and on the host node:

PV and PVC yaml
---
# using local path instead of local-path provisioner directly
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-files
    namespace: nextcloud
  local:
    path: /data/nextcloud/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - compufam
---
# persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  # tried with AND *without* storageClassName set in the PVC
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

This also fails. What's weird is that I'm not using the alpine container for nextcloud, but it still changed the group ownership to GID 82 and left the user as root for all the same directories as previously 🤷 :

Screenshot 2023-04-16 at 16 29 06

Edit: I just realized I left spec.storageClassName: local-path in the persistent volume claim, so I tried again without it and got the same result with GID 82 as above. I think we need to fix that, because it's coming from the deployment.yaml here, which always sets the fsGroup for the nextcloud pod to 82 whenever nginx is enabled. But using the nginx alpine container doesn't mean a user is using an alpine nextcloud container, so setting fsGroup to 82 there doesn't make sense:

securityContext:
{{- if .Values.nextcloud.podSecurityContext }}
{{- with .Values.nextcloud.podSecurityContext }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- else }}
{{- if .Values.nginx.enabled }}
# Will mount configuration files as www-data (id: 82) for nextcloud
fsGroup: 82
{{- else }}
# Will mount configuration files as www-data (id: 33) for nextcloud
fsGroup: 33
{{- end }}
{{- end }}

Submitted PR here: #379 (but that would only fix the group ownership, not the user ownership)
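The idea in that PR is roughly along these lines (a sketch of the direction, not the exact diff; it assumes the alpine image flavors have "alpine" in image.flavor):

```yaml
securityContext:
{{- if .Values.nextcloud.podSecurityContext }}
{{- with .Values.nextcloud.podSecurityContext }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- else }}
{{- if contains "alpine" .Values.image.flavor }}
# alpine-based nextcloud images use www-data with UID/GID 82
fsGroup: 82
{{- else }}
# debian-based nextcloud images use www-data with UID/GID 33
fsGroup: 33
{{- end }}
{{- end }}
```

That way the default fsGroup follows the nextcloud image actually in use, rather than whether the nginx sidecar is enabled.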

Current thoughts...

Since Bitnami's postgres chart provides an init container to get around this, perhaps we should just provide one as well, given how popular k3s and Rancher are. It's not pretty, but I don't really see a way around this so far. (There is a beta rootless mode for k3s, but I haven't dived into that yet.)
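If we did go that route, I'd imagine something like this minimal sketch (the container name, image, and volume mount name are made up for illustration; the chown target is the debian www-data UID/GID):

```yaml
# hypothetical init container to fix volume ownership before nextcloud starts;
# it has to run as root to be allowed to chown the mounted volume
initContainers:
  - name: volume-permissions
    image: busybox:1.36
    command: ["sh", "-c", "chown -R 33:33 /var/www/html"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: nextcloud-main
        mountPath: /var/www/html
```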

@tomasodehnal

tomasodehnal commented Apr 19, 2023

@tomasodehnal , thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: Are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests or the section of your values.yaml with that info?

The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud. It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.

@jessebot It's K3s on a Ubuntu VM on ESXi.

This is the manifest I use for the persistence.nextcloud volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud
  labels:
    type: local
spec:
  storageClassName: nextcloud
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/nextcloud"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  storageClassName: nextcloud
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

And the respective excerpt from the values.yaml:

nextcloud:
  podSecurityContext:
    runAsUser: 1003
    runAsGroup: 1003
    runAsNonRoot: true
    fsGroup: 1003
persistence:
  enabled: true
  existingClaim: nextcloud

I was testing with a fresh install without existing claims, and I would say it works as expected:

  • volume is created using the local-path provider (as the default StorageClass was used)
  • nextcloud works and setup proceeds even when the owner is still root:root, because the host paths have a 777 bitmask (based on your k3s issue links I'm not sure if this is the expected current behavior of the provisioner, but that's how it worked here)

Looking into your manifest, there is one thing I noticed. You say you use the local-path provisioner, but I believe that might not be the case. The reason is that you are creating the PV on your own, so the name local-path is only used by the claim to refer to the existing PV; it is not the actual storage class. You can easily find out from the annotations of the PVC: do you see volume.kubernetes.io/storage-provisioner: rancher.io/local-path?
The PV should be created by the provisioner, so you might remove the PV definition from the manifest and keep only the PVC.
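For example, keeping only something like this (a sketch based on your manifests above) should make the provisioner create and bind the PV itself:

```yaml
# PVC only; the rancher.io/local-path provisioner creates the PV dynamically
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-files
  namespace: nextcloud
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```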

I think the issue lies in the storage provider one uses and is not related to nextcloud:

  • when using a 'manual' PVC with hostPath, we are on our own with the permissions, as there is no one else to take care of them
  • when using a provisioner that will create the PV for you based on the PVC settings, it might work, if the provisioner supports the permissions handling

If you want to resolve it regardless of the storage used, I would say an init container is the safe bet, but it will need privileged permissions.

One other observation: fsGroup was not respected in my test, as I'm on 1.25.3. Looks like it might be supported only since 1.25.4: k3s-io/k3s#6401.

@jessebot
Collaborator Author

jessebot commented Apr 23, 2023

Popping in very quickly to say I tested this on GKE with the kubernetes.io/gce-pd provisioner, and the same issue happens :( :

root@nextcloud-web-app-68f6bb8fb6-nblkq:/var/www/html# ls -hal
total 196K
drwxrwsr-x 15 www-data www-data 4.0K Apr 23 14:52 .
drwxrwsr-x  4 root           82 4.0K Apr 23 14:52 ..
-rw-r--r--  1 www-data www-data 3.2K Apr 23 14:52 .htaccess
-rw-r--r--  1 www-data www-data  101 Apr 23 14:52 .user.ini
drwxr-sr-x 45 www-data www-data 4.0K Apr 23 14:52 3rdparty
-rw-r--r--  1 www-data www-data  19K Apr 23 14:52 AUTHORS
-rw-r--r--  1 www-data www-data  34K Apr 23 14:52 COPYING
drwxr-sr-x 50 www-data www-data 4.0K Apr 23 14:52 apps
drwxrwsr-x  2 root           82 4.0K Apr 23 14:52 config
-rw-r--r--  1 www-data www-data 4.0K Apr 23 14:52 console.php
drwxr-sr-x 24 www-data www-data 4.0K Apr 23 14:52 core
-rw-r--r--  1 www-data www-data 6.2K Apr 23 14:52 cron.php
drwxrwsr-x  2 www-data www-data 4.0K Apr 23 14:52 custom_apps
drwxrwsr-x  2 www-data www-data 4.0K Apr 23 14:52 data
drwxr-sr-x  2 www-data www-data  12K Apr 23 14:52 dist
-rw-r--r--  1 www-data www-data  156 Apr 23 14:52 index.html
-rw-r--r--  1 www-data www-data 3.4K Apr 23 14:52 index.php
drwxr-sr-x  6 www-data www-data 4.0K Apr 23 14:52 lib
-rw-r--r--  1 root           82    0 Apr 23 14:52 nextcloud-init-sync.lock
-rw-r-----  1 www-data www-data  14K Apr 23 14:54 nextcloud.log
-rwxr-xr-x  1 www-data www-data  283 Apr 23 14:52 occ
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocm-provider
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocs
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocs-provider
-rw-r--r--  1 www-data www-data 3.1K Apr 23 14:52 public.php
-rw-r--r--  1 www-data www-data 5.5K Apr 23 14:52 remote.php
drwxr-sr-x  4 www-data www-data 4.0K Apr 23 14:52 resources
-rw-r--r--  1 www-data www-data   26 Apr 23 14:52 robots.txt
-rw-r--r--  1 www-data www-data 2.4K Apr 23 14:52 status.php
drwxrwsr-x  3 www-data www-data 4.0K Apr 23 14:52 themes
-rw-r--r--  1 www-data www-data  384 Apr 23 14:52 version.php

I don't think this is specific to k3s anymore 🤔

@oliverhu

oliverhu commented Jan 1, 2024

@jessebot I think I ran into the same issue, #504, and I saw your perseverance tackling this... The config folder is owned by root:root and thus the folder is empty. Were you able to find a fix for this issue?

@MrFishFinger

MrFishFinger commented Aug 29, 2024

I also stumbled onto this situation, where mounting a rancher.io/local-path PVC into k3s results in the directory being owned by root. Setting securityContext.fsGroup does change the directory group - just not the owner.

I also observed the same behaviour with the kubernetes.io/aws-ebs provisioner on EKS. I am not sure if this is actually a bug or if it is just working as expected? At least from these discussions, it seems like this is known behaviour:

...

Anyway, at least for my use case, I was able to get a non-root nextcloud container running by setting the PHP config option check_data_directory_permissions to false. I also got non-root nginx running by using the image nginxinc/nginx-unprivileged:alpine.

Below is a partial extract from my values.yaml file. Maybe this can help someone in the same boat?

image:
  flavor: fpm

persistence:
  enabled: true
  existingClaim: nextcloud-pvc

nextcloud:
  ...
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
  configs:
    custom.config.php: |
      <?php
        $CONFIG = array(
          'check_data_directory_permissions' => false, # https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/
        );

nginx:
  enabled: true
  image:
    repository: nginxinc/nginx-unprivileged
    tag: alpine
    pullPolicy: IfNotPresent
  securityContext:
    runAsUser: 101
    runAsGroup: 101
    runAsNonRoot: true
    readOnlyRootFilesystem: false
...

PVC definition:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-pvc
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: local-path
  volumeMode: Filesystem
