Add a guide on scaling (#1164)
The new guide is largely based on the "Scaling Oban Applications" talk we gave
at ElixirConf EU 2024. It outlines seven different scaling obstacles and potential
solutions to deal with them.
sorenone authored Oct 23, 2024
1 parent f7c0be3 commit 18d90d4
Showing 2 changed files with 188 additions and 0 deletions.
187 changes: 187 additions & 0 deletions guides/scaling.md
@@ -0,0 +1,187 @@
# Scaling Applications

## Notifications

Oban uses PubSub notifications for communication between nodes, such as announcing job inserts,
pausing and resuming queues, and sharing metrics for Oban Web. The default notifier is
`Oban.Notifiers.Postgres`, which sends all messages through the database. Postgres notifications
add up at scale because each one requires a separate query.

If you're clustered, switch to an alternative notifier like `Oban.Notifiers.PG`. That keeps
notifications out of the database, reduces total queries, and allows larger messages. As long as
you have a functional Distributed Erlang cluster, it's a single-line change to your Oban config.

```diff
 config :my_app, Oban,
+  notifier: Oban.Notifiers.PG,
```

If you're not clustered, consider using [`Oban.Notifiers.Phoenix`][onp] to send notifications
through an alternative service like Redis.

[onp]: https://github.com/sorentwo/oban_notifiers_phoenix
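A minimal sketch of the Phoenix-based setup, assuming your application already starts a
`Phoenix.PubSub` instance named `MyApp.PubSub` (for example, one configured with a Redis adapter)
and that the notifier accepts a `pubsub` option as described in the package README. Verify the
options against the version you install:

```elixir
config :my_app, Oban,
  # Assumes MyApp.PubSub is a Phoenix.PubSub started in your supervision tree
  notifier: {Oban.Notifiers.Phoenix, pubsub: MyApp.PubSub},
  repo: MyApp.Repo
```

Routing notifications through PubSub keeps them out of Postgres, the same benefit `Oban.Notifiers.PG`
provides for clustered nodes.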

## Triggers

Inserting jobs emits a trigger notification to let queues know there are jobs to process
immediately, without waiting up to 1s for the next polling interval. Triggers may create many
notifications for active queues.

Evaluate whether you need sub-second job dispatch. Without triggers, jobs may wait up to 1s before
running, but that's not a concern for busy queues since they're constantly fetching and dispatching.

Disable triggers in your Oban configuration:

```diff
 config :my_app, Oban,
+  insert_trigger: false,
```

## Uniqueness

Frequently, people set uniqueness for jobs that don't really need it. Not you, of course.
Before setting uniqueness, run through this checklist:

1. Evaluate whether it's necessary for your workload.
2. Always set a `keys` option so that uniqueness isn't based on the full `args` or `meta`.
3. Avoid a time-based `period` if possible; use `period: :infinity` instead.

If you're still committed to setting uniqueness for your jobs, consider tweaking your
configuration as follows:

```diff
 use Oban.Worker, unique: [
-  period: {1, :hour},
+  period: :infinity,
+  keys: [:some_key]
 ]
```

> #### 🌟 Pro Uniqueness {: .tip}
>
> Oban Pro uses an [alternative mechanism for unique jobs][uniq] that works for bulk inserts, and
> is designed for speed, correctness, scalability, and simplicity. It enforces uniqueness safely
> across processes and nodes, without the load added by multiple queries.

[uniq]: https://oban.pro/docs/pro/1.5.0-rc.4/Oban.Pro.Engines.Smart.html#module-enhanced-unique

## Reindexing

To stop `oban_jobs` indexes from taking up so much space on disk, use the
`Oban.Plugins.Reindexer` plugin to rebuild indexes periodically. The Postgres transactional (MVCC)
model applies to indexes as well as tables, so inserting, updating, and deleting jobs leaves
behind index bloat that autovacuum won't always fix.

The reindexer rebuilds key indexes concurrently on a fixed schedule. Concurrent rebuilds are
low-impact: they don't lock the table, and they free up space while optimizing indexes.

The `Oban.Plugins.Reindexer` plugin is part of OSS Oban. It runs every day at midnight by
default, but it accepts a cron-style schedule, so you can tweak it to run less frequently.

```diff
 config :my_app, Oban,
   plugins: [
+    {Oban.Plugins.Reindexer, schedule: "@weekly"},
   ]
```

## Pruning

Ensure you're using the `Pruner` plugin, and that you prune _aggressively_. Pruning
periodically deletes `completed`, `cancelled`, and `discarded` jobs. Your application
and database will benefit from keeping the jobs table small. Aim to retain only as many jobs
as you need for uniqueness and historical introspection.

For example, to limit historic jobs to 1 day:

```diff
 config :my_app, Oban,
   plugins: [
+    # max_age is in seconds: 60 * 60 * 24 is one day
+    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24}
   ]
```

The default autovacuum settings are conservative and may fall behind on active tables. Dead
tuples accumulate until an autovacuum worker processes the table and marks their space as
reusable.

As with indexes, the MVCC system doesn't delete rows immediately; it flags them as dead, and
autovacuum cleans them up later. You can tune autovacuum for the `oban_jobs` table alone.

The exact scale factor tuning will vary based on total rows, table size, and database load.

Below is an example scale factor and threshold that trigger a vacuum after a fixed number of dead
rows, regardless of table size:

```sql
ALTER TABLE oban_jobs SET (
  autovacuum_vacuum_scale_factor = 0,
  autovacuum_vacuum_threshold = 100
);
```
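To see why a scale factor of 0 matters, recall how Postgres decides when to vacuum a table: once
its dead tuples exceed the base threshold plus the scale factor times the estimated row count
(`reltuples`). The hypothetical `AutovacuumMath` module below only illustrates that formula and is
not part of Oban. It compares Postgres' stock defaults (`autovacuum_vacuum_threshold = 50`,
`autovacuum_vacuum_scale_factor = 0.2`) with the example values, for an assumed table of
10 million rows:

```elixir
defmodule AutovacuumMath do
  # Hypothetical helper for illustration; not part of Oban or Postgrex.
  # Postgres vacuums a table once its dead tuples exceed:
  #   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
  def dead_tuple_limit(reltuples, threshold, scale_factor) do
    threshold + scale_factor * reltuples
  end
end

# Assumed table size for the comparison
rows = 10_000_000

# Postgres defaults: threshold 50, scale factor 0.2
default_limit = AutovacuumMath.dead_tuple_limit(rows, 50, 0.2)

# Values from the ALTER TABLE example: threshold 100, scale factor 0
tuned_limit = AutovacuumMath.dead_tuple_limit(rows, 100, 0)

IO.puts("default: #{round(default_limit)} dead tuples, tuned: #{round(tuned_limit)}")
```

With the defaults, roughly two million dead rows pile up before autovacuum steps in. With a scale
factor of 0, the trigger point stays at 100 dead rows no matter how large the table grows.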

> #### 🌟 Partitioning {: .tip}
>
> For _extreme_ load (tens of millions of jobs a day), Oban Pro's [DynamicPartitioner][dynp] may
> help. It manages partitioned tables so that older jobs are removed by dropping whole partitions.
> Dropping tables is instantaneous and leaves zero bloat. Autovacuuming each partition is faster
> as well.

[dynp]: https://oban.pro/docs/pro/1.5.0-rc.4/Oban.Pro.Plugins.DynamicPartitioner.html

## Pooling

Oban uses connections from your application Repo’s pool to talk to the database. When that pool
is busy, it can starve Oban of connections and you’ll see timeout errors. Likewise, if Oban is
extremely busy (as it should be), it can starve your application of connections. A good solution
for this is to set up another pool that’s exclusively for Oban’s internal use. The dedicated
pool isolates Oban’s queries from the rest of the application.

Start by defining a new `ObanRepo`:

```elixir
defmodule MyApp.ObanRepo do
  use Ecto.Repo,
    adapter: Ecto.Adapters.Postgres,
    otp_app: :my_app
end
```

Then switch the configured `repo`, and use `get_dynamic_repo` so that calls made inside a
`MyApp.Repo` transaction use that same repo and participate in the transaction:

```diff
 config :my_app, Oban,
-  repo: MyApp.Repo,
+  repo: MyApp.ObanRepo,
+  get_dynamic_repo: fn ->
+    if MyApp.Repo.in_transaction?(), do: MyApp.Repo, else: MyApp.ObanRepo
+  end,
   ...
```
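The new repo also needs its own configuration, and it has to be running before Oban starts. A
sketch under assumed values: reading the URL from a `DATABASE_URL` environment variable and a pool
size of 10 are placeholders, and the `children` list stands in for the one in your application
module:

```elixir
# config/runtime.exs: point ObanRepo at the same database with its own pool.
# The URL source and pool size are placeholder values.
config :my_app, MyApp.ObanRepo,
  url: System.get_env("DATABASE_URL"),
  pool_size: 10

# lib/my_app/application.ex: start ObanRepo before Oban so that Oban can
# check out connections from it as soon as it boots.
children = [
  MyApp.Repo,
  MyApp.ObanRepo,
  {Oban, Application.fetch_env!(:my_app, Oban)}
]
```

Because both repos connect to the same database, the total connection count is the sum of both
pool sizes, so check it against your database's connection limit.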

## High Concurrency

In a busy system with high concurrency, all of the record keeping after jobs run causes pool
contention, even though the individual queries are very quick. Fetching jobs uses a single query
per queue. However, acking a finished job checks out a connection for each job.

Improve the ratio between executing jobs and available connections by scaling up your Ecto
`pool_size` and reducing the total concurrency across all queues.

```diff
 config :my_app, MyApp.Repo,
-  pool_size: 10,
+  pool_size: 50,

 config :my_app, Oban,
   queues: [
-    events: 200,
+    events: 50,
-    emails: 100,
+    emails: 25,
   ]
```
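The before and after values in that diff can be compared with a rough calculation: sum the queue
limits and divide by the pool size to estimate how many running jobs may compete for each
connection on a node. `ConcurrencyRatio` is a hypothetical helper for illustration, not part of
Oban, and it assumes every running job needs a connection to ack, which makes it an upper bound:

```elixir
defmodule ConcurrencyRatio do
  # Hypothetical helper for illustration; not part of Oban.
  # Estimates how many concurrently executing jobs compete for each
  # connection in a single node's Repo pool.
  def jobs_per_connection(queues, pool_size)
      when is_integer(pool_size) and pool_size > 0 do
    total =
      queues
      |> Enum.map(fn {_queue, opts} -> limit(opts) end)
      |> Enum.sum()

    total / pool_size
  end

  # Queues may be configured as `events: 50` or `events: [limit: 50]`.
  defp limit(n) when is_integer(n), do: n
  defp limit(opts) when is_list(opts), do: Keyword.fetch!(opts, :limit)
end

before_tuning = ConcurrencyRatio.jobs_per_connection([events: 200, emails: 100], 10)
after_tuning = ConcurrencyRatio.jobs_per_connection([events: 50, emails: 25], 50)

IO.puts("before: #{before_tuning} jobs per connection, after: #{after_tuning}")
```

The original config allows 300 concurrent jobs for 10 connections (30 per connection), while the
tuned config allows 75 jobs for 50 connections (1.5 per connection).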

Using a dedicated pool with a known number of constant connections can also help the ratio. It’s
not necessary for most applications, but a dedicated database can help maintain predictable
performance.
1 change: 1 addition & 0 deletions mix.exs
@@ -66,6 +66,7 @@ defmodule Oban.MixProject do
# Guides
"guides/installation.md",
"guides/preparing_for_production.md",
"guides/scaling.md",
"guides/troubleshooting.md",
"guides/release_configuration.md",
"guides/writing_plugins.md",