Skip to content

Named Queues Proposal

betamatt edited this page Apr 12, 2011 · 20 revisions

Table of Contents

Purpose

Add Resque-style named queues while retaining DJ-style priority. Named queues provide a convenient system for grouping tasks to be worked by separate pools of workers, which may be scaled and controlled individually.

Named queues should be:

  • arbitrarily named and created on-the-fly (just put a job in a queue and that queue exists)
  • ignored by default (Workers should pull from all queues unless told not to for backwards compatibility)
  • efficient (Backends must properly index so that queues are efficient even when large)
  • ignorant of the priority and delaying system. Priority and run_at should continue to function as before when jobs are segmented by queue.

API Changes

Old ways still work:

  object.delay.method
  Delayed::Job.enqueue job

Assigning to a queue:

  object.delay(:queue => 'tracking').method
  Delayed::Job.enqueue job, :queue => 'tracking'

Commandline Changes

If no queue option is specified, workers will work all queues. More than one queue may be specified to a worker. Both singular and plural queue option names are provided for convenience and are interchangeable.

rake:

  QUEUE=tracking rake jobs:run
  QUEUES=one,two rake jobs:run

daemon:

  delayed_job --queue=tracking start
  delayed_job --queues=tracking,two start

Schema Change

New column: queue (string, nullable) Change index: add queue to existing priority,run_at index

TODO:

  • determine if queue field should come at the start or end of the index
  • determine if it's more efficient to make queue not-null and use a default queue name

Target Release

Since there is both a schema change for the AR backend and implementation changes for all backends to coordinate, this feature should be targeted for the next minor-version release of DJ (2.2). The upgrade process from 2.0 and 2.1 should be documented and a generator provided for the AR schema migration.

Comments/Questions

(John H.) What is the goal you're trying to accomplish here? I'm guessing to separate long-running and short-running jobs so that long-running (or error-prone) classes of jobs don't bottleneck short-running jobs that you might want to be more responsive. What about being able to specify a min-priority on workers? Would that let you accomplish what you're after? To illustrate:

  • create 2 workers
  • set one workers min priority to 50
  • set short-running/responsive jobs to have higher priority than slow or long-running jobs
  • now, one worker will never pick up long-running tasks, and you'll always have one working on just high-priority jobs. You could scale this and manage the priority of jobs to ensure you always have workers waiting for high-priority tasks, and then just have one digging through your low-priority backlog.
(Matt G.) While priority grouping can accomplish many of the same things as named queues, it's inexpressive and difficult to manage. Named queues allow clear management of jobs by queue in ways that aren't possible without segmenting jobs into small, meaningless, priority ranges. For example:
  • temporarily disable processing of a class of jobs (due to load, debugging, etc)
  • only run certain jobs in bursts when there is excess capacity (batch)
  • allow a normally low priority job to have a higher-priority instance (to "catchup" a missed job, etc)
(Zach M.) My opinion is that when I get to this granular of control, I need to move to Resque. I love that DJ is a simple "I want to add background processing to my app". When I exceed that and need to manage workers and queues, I feel like I need to use Resque instead. I'm not saying DJ is unsuitable for named queues, but it seems like when you start using named queues other existing projects will be a better fit.

(Matt G.) I get the argument for not complicating DJ. The "just add backgrounding" use case should always be dirt simple and I don't feel this proposal impacts that case. I worry that "just use Resque" is going to be used as an argument any time we steal a convenient feature. Resque is awesome but it's misleading to assume that anyone with queue needs that are more complicated than average wants to add Redis to their infrastructure.

(Josh G.) Does this allow prioritizing of one queue over another? (Matt G.) No. Specifying queues for a worker filters jobs, setting priority controls the order they are processed.