Celery lms/cms-worker consumes too much RAM #1126

igobranco · 2024-09-27T16:59:14Z

Bug description

The normal execution of the celery workers, lms-worker and cms-worker services, consumes too much memory by default.
Tested on my tutor local, but I'm assuming that with tutor dev is the same.
By default the celery start 1 process by each CPU. If the deployed laptop or server has multiple CPU architecture, then the celery will launch multiple OS processes.
Each celery OS process consumes memory.
This makes more CPUs you have, more memory RAM the celery worker will consume.
The problem that I see is that on a deployed server, the operator probably don't want this magic, and want to control how much celery OS process each container/pod will start and consequently how much memory each container/pods will consume.

How to reproduce

Run a tutor local environment and see how much RAM your celery workers is consuming.
On 16 CPU server/laptop an idle environment each container lms/cms-worker consumes >2GB RAM per container.

If you add --concurrency=1 to the lms/cms-worker command an idle tutor local env uses <300MB per container.

Environment

tutor, version 14.2.3
But this applies also to newer version.

Solution (my opinion)

Add a way to parameterize the concurrency on each worker, ex: --concurrency={{ LMS_WORKER_CELERY_CONCURRENCY }}
Each celery worker by default should always consume the minimum memory, so it should defaults to just 1.
In my opinion there should be an easy way to customize both celery commands, lms-worker and cms-worker. Ex. adding the --prefetch-multiplier=1 --without-gossip --without-mingle

The text was updated successfully, but these errors were encountered:

regisb · 2024-09-30T09:14:08Z

Related PR: #1010

For more information regarding additional options, see: edx/configuration#68

regisb · 2024-09-30T13:10:55Z

Hi Ivo, I agree that we need to provide a way to override the number of concurrent workers. And we also need to provide good defaults for that value.

In that context, I'm curious to have your take on this comment by @dkaliberda, where he makes the case that we should default to concurrency=1 and instead scale the number of replicas.

I'm not sure I agree with that comment, because in my experience there is quite a bit of overhead incurred by scaling celery workers horizontally (by increasing the number of replicas) as opposed to vertically (by increasing the number of workers). For instance, here are the figures for memory usage on my laptop, in idle mode:

1 replica x 1 worker: 319 MB
1 replica x 4 workers: 746 MB
4 replicas x 1 worker: 1280 MB

In your current use case, what would be the ideal number of replicas/workers? (both for LMS and CMS)

EDIT: I just learned about process autoscaling in Celery and I'm very tempted to use that as the default. Of course, we would still need to provide a mechanism to override that.

igobranco · 2024-10-01T11:34:07Z

Hi @regisb, I think a default concurrency=1 is a good value. Nevertheless, I prefer that it should be parameterized.
Currently, for our Kubernetes PROD deployment I have min 6 replicas, each with concurrency=1, but an horizontal auto scaler for 20. I have dedicated hardware so having 6 replicas always up is not a problem for us. Only sporadic is that it upscale up.

About the autoscaling in Celery, I also just found out it! I think I won't change my current setup, because it's just working. But if had found out before, I think I would be tempted to just use it. Even for our case just fixed 2 replicas with a vertical autoscale on Celery would be good option. The good news is that it would benefit everyone, docker compose or K8s installations.
--concurrency {{ OPENEDX_CMS_CELERY_WORKERS }} --autoscale={{ OPENEDX_CMS_CELERY_MAX_WORKERS }},{{ OPENEDX_CMS_CELERY_MIN_WORKERS }}

3 configurations with:

OPENEDX_CMS_CELERY_WORKERS default 1
OPENEDX_CMS_CELERY_MAX_WORKERS default 1
OPENEDX_CMS_CELERY_MIN_WORKERS default 1

An upgrade note on the docs could be added to configure a proper value of OPENEDX_CMS_CELERY_MAX_WORKERS, like number of CPUs

For example this is a snippet of my custom tutor plugin to override the workers:

apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cms-worker
    spec:
      template:
        spec:
          terminationGracePeriodSeconds: 900
          containers:
            - name: cms-worker
              args:
                - celery
                - --app=cms.celery
                - worker
                - --loglevel=info
                - --concurrency=1
                - --hostname=edx.cms.core.default.%%h
                - --max-tasks-per-child=100
                - --prefetch-multiplier=1
                - --exclude-queues=edx.lms.core.default
                - --without-gossip
                - --without-mingle

The terminationGracePeriodSeconds is to prevent the Instructors export CSVs to be terminated during the export.
Recently, I had to add --without-gossip and --without-mingle to make the celery workers recover from a Redis downtime, celery/celery#7276

regisb · 2024-10-01T12:15:31Z

While I do think that administrators should be able to customise the default celery concurrency, I disagree that --concurrency=1 is the better default -- because it's much more expensive. According to my testing (see my comment here) running multiple replicas with a single process has a bigger overhead in terms of memory usage. (although I'm not sure exactly why).

I think that --autoscale=4,1 would be a good default for most people. We should be able to override this default setting using config patches or filters. Thus, I'm not inclined to merge #1010. I'll propose a different implementation.

I haven't looked yet at the other options that you are suggesting -- I'll investigate once I start working on this.

regisb added the feature request New features will be processed by decreasing priority label Oct 1, 2024

regisb mentioned this issue Oct 1, 2024

feat!: don't run Celery workers in dev mode #1041

Closed

This was referenced Oct 1, 2024

feat: Celery worker concurrency setting #1010

Open

Add support for running multiple Celery queues #1130

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Celery lms/cms-worker consumes too much RAM #1126

Celery lms/cms-worker consumes too much RAM #1126

igobranco commented Sep 27, 2024

regisb commented Sep 30, 2024 •

edited

Loading

regisb commented Sep 30, 2024 •

edited

Loading

igobranco commented Oct 1, 2024

regisb commented Oct 1, 2024 •

edited

Loading

Celery lms/cms-worker consumes too much RAM #1126

Celery lms/cms-worker consumes too much RAM #1126

Comments

igobranco commented Sep 27, 2024

regisb commented Sep 30, 2024 • edited Loading

regisb commented Sep 30, 2024 • edited Loading

igobranco commented Oct 1, 2024

regisb commented Oct 1, 2024 • edited Loading

regisb commented Sep 30, 2024 •

edited

Loading

regisb commented Sep 30, 2024 •

edited

Loading

regisb commented Oct 1, 2024 •

edited

Loading