GoodJob

GoodJob is a multithreaded, Postgres-based, ActiveJob backend for Ruby on Rails.


Compatibility

  • Ruby on Rails: 6.0+
  • Ruby: MRI 2.6+; JRuby 9.3+
  • Postgres: 10.0+

Configuration

Command-line options

There are several top-level commands available through the good_job command-line tool.

Configuration options for each command are available via good_job help.
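For example (a sketch; the flags shown are illustrative — check good_job help for the options available in your installed version):

```shell
# Show available commands and their options
bundle exec good_job help
bundle exec good_job help start

# Start a job executor process
bundle exec good_job start --max-threads=8 --queues=default
```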

Configuration options

Active Job configuration depends on where the code is placed:

  • config.active_job.queue_adapter = :good_job within config/application.rb or config/environments/*.rb.
  • ActiveJob::Base.queue_adapter = :good_job within an initializer (e.g. config/initializers/active_job.rb).

GoodJob configuration can be placed within Rails config directory for all environments (config/application.rb), within a particular environment (e.g. config/environments/development.rb), or within an initializer (e.g. config/initializers/good_job.rb).

Configuration examples:

# config/environments/development.rb
config.active_job.queue_adapter = :good_job
config.good_job.execution_mode = :async

# config/environments/test.rb
config.active_job.queue_adapter = :good_job
config.good_job.execution_mode = :inline

# config/environments/production.rb
config.active_job.queue_adapter = :good_job
config.good_job.execution_mode = :external

Global options

GoodJob’s general behavior can also be configured via attributes directly on the GoodJob module:

  • GoodJob.configure_active_record { ... } Inject Active Record configuration into GoodJob’s base model, for example, when using multiple databases with Active Record or when other custom configuration is necessary for the Active Record model to connect to the Postgres database. Example:

    # config/initializers/good_job.rb
    GoodJob.configure_active_record do
      connects_to database: :special_database
      self.table_name_prefix = "special_application_"
    end
    
  • GoodJob.active_record_parent_class (string) Alternatively, modify the Active Record parent class inherited by GoodJob’s Active Record model GoodJob::Job (defaults to "ActiveRecord::Base"). The value must be a String to avoid prematurely initializing Active Record.
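For example, to have GoodJob’s model inherit from your application’s base record class:

```ruby
# config/initializers/good_job.rb
# Note the String value: a class constant here would load Active Record too early
GoodJob.active_record_parent_class = "ApplicationRecord"
```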

You’ll generally want to configure these in config/initializers/good_job.rb.

Dashboard

GoodJob includes a Dashboard, packaged as a Rails Engine; ensure the engine is required in config/application.rb:

# config/application.rb
require_relative 'boot'
require 'rails/all'
require 'good_job/engine' # <= Add this line
# ...

API-only Rails applications

API-only Rails applications may not have all of the required Rack middleware for the GoodJob Dashboard to function. To re-add the middleware:

# config/application.rb
module MyApp
  class Application < Rails::Application
    #...
    config.middleware.use Rack::MethodOverride
    config.middleware.use ActionDispatch::Flash
    config.middleware.use ActionDispatch::Cookies
    config.middleware.use ActionDispatch::Session::CookieStore
  end
end

Live polling

The Dashboard can be set to refresh automatically by checking “Live Poll” in the Dashboard header, or by appending ?poll=10 to the URL, where the value is the polling interval in seconds (default: 30).

Extending dashboard views

GoodJob exposes some views that are intended to be overridden by placing equivalently named views in your application.

Warning: these partials expose classes (such as GoodJob::Job) that are considered internal implementation details of GoodJob. You should always test your custom partials after upgrading GoodJob.

For example, if your app deals with widgets and you want to show a link to the widget a job acted on, you can add the following to app/views/good_job/_custom_job_details.html.erb:

<%# file: app/views/good_job/_custom_job_details.html.erb %>
<% arguments = job.active_job.arguments rescue [] %>
<% widgets = arguments.select { |arg| arg.is_a?(Widget) } %>
<% if widgets.any? %>
  <div class="my-4">
    <h5>Widgets</h5>
    <ul>
      <% widgets.each do |widget| %>
        <li><%= link_to widget.name, main_app.widget_url(widget) %></li>
      <% end %>
    </ul>
  </div>
<% end %>

As a second example, you may wish to show a link to a log aggregator next to each job execution. You can do this by adding the following to app/views/good_job/_custom_execution_details.html.erb:

<%# file: app/views/good_job/_custom_execution_details.html.erb %>
<div class="py-3">
  <%= link_to "Logs", main_app.logs_url(filter: { job_id: job.id }, start_time: execution.performed_at, end_time: execution.finished_at + 1.minute) %>
</div>

Job priority

Higher priority numbers run first in all versions of GoodJob v3.x and below. GoodJob v4.x will change job priority to give smaller numbers higher priority (default: 0), in accordance with Active Job’s definition of priority (see #524). To opt-in to this behavior now, set config.good_job.smaller_number_is_higher_priority = true in your GoodJob initializer or application.rb.
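The difference between the two conventions can be illustrated in plain Ruby (a sketch; the job names and priority values are made up):

```ruby
jobs = [
  { name: "a", priority: 5 },
  { name: "b", priority: -10 },
  { name: "c", priority: 0 },
]

# GoodJob v3.x and below: higher numbers run first
v3_order = jobs.sort_by { |job| -job[:priority] }.map { |job| job[:name] }
# => ["a", "c", "b"]

# GoodJob v4.x / Active Job convention: smaller numbers run first
v4_order = jobs.sort_by { |job| job[:priority] }.map { |job| job[:name] }
# => ["b", "c", "a"]
```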

Labelled jobs

Labels are the recommended way to add context or metadata to specific jobs. For example, all jobs that have a dependency on an email service could be labeled email. Using labels requires adding the Active Job extension GoodJob::ActiveJobExtensions::Labels to your job class.

class ApplicationJob < ActiveJob::Base
  include GoodJob::ActiveJobExtensions::Labels
end

# Add a default label to every job within the class
class WelcomeJob < ApplicationJob
  self.good_job_labels = ["email"]

  def perform
    # Labels can be inspected from within the job
    puts good_job_labels # => ["email"]
  end
end

# Or add to individual jobs when enqueued
WelcomeJob.set(good_job_labels: ["email"]).perform_later

Labels can be used to search jobs in the Dashboard. For example, to find all jobs labeled email, search for email.

Concurrency controls

GoodJob can extend Active Job to provide limits on concurrently running jobs, either at time of enqueue or at perform. Limiting concurrency can help prevent duplicate, double or unnecessary jobs from being enqueued, or race conditions when performing, for example when interacting with 3rd-party APIs.

class MyJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  good_job_control_concurrency_with(
    # Maximum number of unfinished jobs to allow with the concurrency key
    # Can be an Integer or Lambda/Proc that is invoked in the context of the job
    total_limit: 1,

    # Or, if more control is needed:
    # Maximum number of jobs with the concurrency key to be
    # concurrently enqueued (excludes performing jobs)
    # Can be an Integer or Lambda/Proc that is invoked in the context of the job
    enqueue_limit: 2,

    # Maximum number of jobs with the concurrency key to be
    # concurrently performed (excludes enqueued jobs)
    # Can be an Integer or Lambda/Proc that is invoked in the context of the job
    perform_limit: 1,

    # Maximum number of jobs with the concurrency key to be enqueued within
    # the time period, looking backwards from the current time. Must be an array
    # with two elements: the number of jobs and the time period.
    enqueue_throttle: [10, 1.minute],

    # Maximum number of jobs with the concurrency key to be performed within
    # the time period, looking backwards from the current time. Must be an array
    # with two elements: the number of jobs and the time period.
    perform_throttle: [100, 1.hour],

    # Note: Under heavy load, the total number of jobs may exceed the
    # sum of `enqueue_limit` and `perform_limit` because of race conditions
    # caused by imperfectly disjunctive states. If you need to constrain
    # the total number of jobs, use `total_limit` instead. See #378.

    # A unique key to be globally locked against.
    # Can be String or Lambda/Proc that is invoked in the context of the job.
    #
    # If a key is not provided GoodJob will use the job class name.
    #
    # To disable concurrency control, for example in a subclass, set the
    # key explicitly to nil (e.g. `key: nil` or `key: -> { nil }`)
    #
    # If you provide a custom concurrency key (for example, if concurrency is supposed
    # to be controlled by the first job argument) make sure that it is sufficiently unique across
    # jobs and queues by adding the job class or queue to the key yourself, if needed.
    #
    # Note: When using a model instance as part of your custom concurrency key, make sure
    # to explicitly use its `id` or `to_global_id` because otherwise it will not stringify as expected.
    #
    # Note: Arguments passed to #perform_later can be accessed through Active Job's `arguments` method
    # which is an array containing positional arguments and, optionally, a kwarg hash.
    key: -> { "#{self.class.name}-#{queue_name}-#{arguments.first}-#{arguments.last[:version]}" } #  MyJob.perform_later("Alice", version: 'v2') => "MyJob-default-Alice-v2"
  )

  def perform(first_name, version:)
    # do work
  end
end

When testing, the resulting concurrency key value can be inspected:

job = MyJob.perform_later("Alice", version: 'v1')
job.good_job_concurrency_key #=> "MyJob-default-Alice-v1"

How concurrency controls work

GoodJob’s concurrency control strategy for perform_limit is “optimistic retry with an incremental backoff”; the implementation is short enough to read directly in GoodJob’s source.

  • “Optimistic” meaning that the implementation’s performance trade-off assumes that collisions are atypical (e.g. two users enqueue the same job at the same time) rather than regular (e.g. the system enqueues thousands of colliding jobs at the same time). Depending on your concurrency requirements, you may also want to manage concurrency through the number of GoodJob threads and processes that are performing a given queue.
  • “Retry with an incremental backoff” means that when perform_limit is exceeded, the job will raise a GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError which is caught by a retry_on handler which re-schedules the job to execute in the near future with an incremental backoff.
  • First-in-first-out job execution order is not preserved when a job is retried with incremental back-off.
  • For pessimistic use cases where collisions are expected, control concurrency with the number of threads/processes (e.g., good_job --queues "serial:1;-serial:5"), and use perform_limit as a backstop.
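The backoff applied when perform_limit is exceeded can be tuned by declaring your own retry_on for the error class named above (a sketch; the wait value and job class are illustrative):

```ruby
class MyThrottledJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  good_job_control_concurrency_with(perform_limit: 1)

  # Replace the extension's default incremental backoff with a fixed wait
  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.seconds, attempts: Float::INFINITY

  def perform
    # do work
  end
end
```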

Cron-style repeating/recurring jobs

GoodJob can enqueue jobs on a recurring, cron-like schedule, configured via Rails configuration:

```ruby
# config/environments/application.rb or a specific environment e.g. production.rb

# Enable cron in this process, e.g., only run on the first Heroku worker process
config.good_job.enable_cron = ENV['DYNO'] == 'worker.1' # or `true` or via $GOOD_JOB_ENABLE_CRON

# Configure cron with a hash that has a unique key for each recurring job
config.good_job.cron = {
  # Every 15 minutes, enqueue `ExampleJob.set(priority: -10).perform_later(42, "life", name: "Alice")`
  frequent_task: { # each recurring job must have a unique key
    cron: "*/15 * * * *", # cron-style scheduling format by fugit gem
    class: "ExampleJob", # name of the job class as a String; must reference an Active Job job class
    args: [42, "life"], # positional arguments to pass to the job; can also be a proc e.g. `-> { [Time.now] }`
    kwargs: { name: "Alice" }, # keyword arguments to pass to the job; can also be a proc e.g. `-> { { name: NAMES.sample } }`
    set: { priority: -10 }, # additional Active Job properties; can also be a lambda/proc e.g. `-> { { priority: [1, 2].sample } }`
    description: "Something helpful", # optional description that appears in Dashboard
  },
  production_task: {
    cron: "0 0,12 * * *",
    class: "ProductionJob",
    enabled_by_default: -> { Rails.env.production? }, # Only enable in production; otherwise can be enabled manually through the Dashboard
  },
  complex_schedule: {
    class: "ComplexScheduleJob",
    cron: ->(last_ran) { (last_ran.blank? ? Time.now : last_ran + 14.hours).at_beginning_of_minute },
  },
  # etc.
}
```



Bulk enqueue

GoodJob’s bulk-enqueue functionality can buffer and enqueue multiple jobs at once, using a single INSERT statement. This can be more performant when enqueuing a large number of jobs.

```ruby
# Capture jobs using `.perform_later`:
active_jobs = GoodJob::Bulk.enqueue do
  MyJob.perform_later
  AnotherJob.perform_later
  # If an exception is raised within this block, no jobs will be inserted.
end

# All Active Job instances are returned from GoodJob::Bulk.enqueue.
# Jobs that have been successfully enqueued have a `provider_job_id` set.
active_jobs.all?(&:provider_job_id)

# Bulk enqueue Active Job instances directly without using `.perform_later`:
GoodJob::Bulk.enqueue([MyJob.new, AnotherJob.new])
```

Batches

Batches track a set of jobs, and enqueue an optional callback job when all of the jobs have finished (succeeded or discarded).

  • A simple example that enqueues your MyBatchCallbackJob after the two jobs have finished, and passes along the current user as a batch property:

    GoodJob::Batch.enqueue(on_finish: MyBatchCallbackJob, user: current_user) do
      MyJob.perform_later
      OtherJob.perform_later
    end
    
    # When these jobs have finished, it will enqueue your `MyBatchCallbackJob.perform_later(batch, params)`
    class MyBatchCallbackJob < ApplicationJob
      # Callback jobs must accept a `batch` and `params` argument
      def perform(batch, params)
        # The batch object will contain the Batch's properties, which are mutable
        batch.properties[:user] # => <User id: 1, ...>
    
    
        # Params is a hash containing additional context (more may be added in the future)
        params[:event] # => :finish, :success, :discard
      end
    end
    
  • Jobs can be added to an existing batch. Jobs in a batch are enqueued and performed immediately/asynchronously. The final callback job will not be enqueued until GoodJob::Batch#enqueue is called.

    batch = GoodJob::Batch.new
    batch.add do
      10.times { MyJob.perform_later }
    end
    
    
    batch.add do
      10.times { OtherJob.perform_later }
    end
    batch.enqueue(on_finish: MyBatchCallbackJob, age: 42)
    
  • If you need to access the batch within a job that is part of the batch, include GoodJob::ActiveJobExtensions::Batches in your job class:

    class MyJob < ApplicationJob
      include GoodJob::ActiveJobExtensions::Batches

      def perform
        self.batch # => <GoodJob::Batch id: 1, ...>
      end
    end

  • GoodJob::Batch (app/models/good_job/batch.rb) has a number of assignable attributes and methods:

```ruby
batch = GoodJob::Batch.new
batch.description = "My batch"
batch.on_finish = "MyBatchCallbackJob" # Callback job when all jobs have finished
batch.on_success = "MyBatchCallbackJob" # Callback job when/if all jobs have succeeded
batch.on_discard = "MyBatchCallbackJob" # Callback job when the first job in the batch is discarded
batch.callback_queue_name = "special_queue" # Optional queue for callback jobs, otherwise will defer to job class
batch.callback_priority = 10 # Optional priority for callback jobs, otherwise will defer to job class
batch.properties = { age: 42 } # Custom data and state to attach to the batch
batch.add do
  MyJob.perform_later
end
batch.enqueue

batch.discarded? # => Boolean
batch.discarded_at # => <DateTime>
batch.finished? # => Boolean
batch.finished_at # => <DateTime>
batch.succeeded? # => Boolean
batch.active_jobs # => Array of ActiveJob::Base-inherited jobs that are part of the batch

batch = GoodJob::Batch.find(batch.id)
batch.description = "Updated batch description"
batch.save
batch.reload
```

Batch callback jobs

Batch callbacks are Active Job jobs that are enqueued at certain events during the execution of jobs within the batch:

  • :finish - Enqueued when all jobs in the batch have finished, after all retries. Jobs will either be discarded or succeeded.
  • :success - Enqueued only when all jobs in the batch have finished and succeeded.
  • :discard - Enqueued immediately the first time a job in the batch is discarded.

Callback jobs must accept a batch and params argument in their perform method:

class MyBatchCallbackJob < ApplicationJob
  def perform(batch, params)
    # The batch object will contain the Batch's properties
    batch.properties[:user] # => <User id: 1, ...>
    # Batches are mutable
    batch.properties[:user] = User.find(2)
    batch.save

    # Params is a hash containing additional context (more may be added in the future)
    params[:event] # => :finish, :success, :discard
  end
end

Complex batches

Consider a multi-stage batch with both parallel and serial job steps:

```mermaid
graph TD
    0{"BatchJob\n{ stage: nil }"}
    0 --> a["WorkJob\n{ step: a }"]
    0 --> b["WorkJob\n{ step: b }"]
    0 --> c["WorkJob\n{ step: c }"]
    a --> 1
    b --> 1
    c --> 1
    1{"BatchJob\n{ stage: 1 }"}
    1 --> d["WorkJob\n{ step: d }"]
    1 --> e["WorkJob\n{ step: e }"]
    e --> f["WorkJob\n{ step: f }"]
    d --> 2
    f --> 2
    2{"BatchJob\n{ stage: 2 }"}
```

This can be implemented with a single, mutable batch job:

class WorkJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Batches

  def perform(step)
    # ...
    if step == 'e'
      batch.add { WorkJob.perform_later('f') }
    end
  end
end

class BatchJob < ApplicationJob
  def perform(batch, params)
    if batch.properties[:stage].nil?
      batch.enqueue(stage: 1) do
        WorkJob.perform_later('a')
        WorkJob.perform_later('b')
        WorkJob.perform_later('c')
      end
    elsif batch.properties[:stage] == 1
      batch.enqueue(stage: 2) do
        WorkJob.perform_later('d')
        WorkJob.perform_later('e')
      end
    elsif batch.properties[:stage] == 2
      # ...
    end
  end
end

GoodJob::Batch.enqueue(on_finish: BatchJob)

Other batch details

  • Whether to enqueue a callback job is evaluated once the batch has been enqueued via GoodJob::Batch.enqueue or batch.enqueue.
  • Callback jobs will be re-triggered if additional jobs are later enqueued onto the batch via batch.enqueue; use batch.add to add jobs to the batch without re-triggering callback jobs.
  • Callback jobs will be enqueued even if the batch contains no jobs.
  • Callback jobs perform asynchronously. It’s possible that :finish and :success or :discard callback jobs perform at the same time. Keep this in mind when updating batch properties.
  • Batch properties are serialized using Active Job serialization. This is flexible, but can lead to deserialization errors if a GlobalID record is directly referenced but is subsequently deleted and thus unloadable.
  • 🚧Batches are a work in progress. Please let us know what would be helpful to improve their functionality and usefulness.
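Because :finish may perform at the same time as :success or :discard, a single callback class can branch on the params[:event] value shown earlier (a sketch):

```ruby
class MyBatchCallbackJob < ApplicationJob
  def perform(batch, params)
    case params[:event]
    when :success
      # every job in the batch succeeded
    when :discard
      # the first job in the batch was discarded
    when :finish
      # all jobs finished, whether succeeded or discarded
    end
  end
end
```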

Updating

GoodJob follows semantic versioning, though updates may be encouraged through deprecation warnings in minor versions.

Upgrading minor versions

Upgrading between minor versions (e.g. v1.4 to v1.5) should not introduce breaking changes, but can introduce new deprecation warnings and database migration warnings.

Database migrations introduced in minor releases are not required to be applied until the next major release. If you would like to apply newly introduced migrations immediately, assert GoodJob.migrated? in your application’s test suite.
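The migration check can be asserted in your test suite, for example (a sketch, shown with RSpec):

```ruby
# spec/good_job_migrations_spec.rb (RSpec sketch)
RSpec.describe "GoodJob migrations" do
  it "are all applied" do
    expect(GoodJob.migrated?).to be(true)
  end
end
```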

To perform upgrades to the GoodJob database tables:

  1. Generate new database migration files:

    bin/rails g good_job:update
    

Optional: If using Rails’ multiple databases with the migrations_paths configuration option, use the --database option:

```bash
bin/rails g good_job:update --database animals
```
  2. Run the database migrations locally:

    bin/rails db:migrate

  3. Commit the migration files and resulting db/schema.rb changes.

  4. Deploy the code, run the migrations against the production database, and restart server/worker processes.

Upgrading v1 to v2

GoodJob v2 introduces a new Advisory Lock key format that is operationally different than the v1 advisory lock key format; it’s therefore necessary to perform a simple, but staged, production upgrade. If you are already using v1.12 or later, no other changes are necessary.

  1. Upgrade your production environment to v1.99.x following the minor version upgrade process, including database migrations. v1.99 is a transitional release that is safely compatible with both v1.x and v2.0.0 because it uses both v1- and v2-formatted advisory locks.
  2. Address any deprecation warnings generated by v1.99.
  3. Upgrade your production environment from v1.99.x to v2.0.x again following the minor upgrade process.

Notable changes:

  • Renames :async_server execution mode to :async; renames prior :async execution mode to :async_all.
  • Sets default Development environment’s execution mode to :async with disabled polling.
  • Excludes performing jobs from enqueue_limit’s count in GoodJob::ActiveJobExtensions::Concurrency.
  • Triggers GoodJob.on_thread_error for unhandled Active Job exceptions.
  • Renames GoodJob.reperform_jobs_on_standard_error accessor to GoodJob.retry_on_unhandled_error.
  • Renames GoodJob::Adapter.shutdown(wait:) argument to GoodJob::Adapter.shutdown(timeout:).
  • Changes Advisory Lock key format from good_jobs[ROW_ID] to good_jobs-[ACTIVE_JOB_ID].
  • Expects presence of columns good_jobs.active_job_id, good_jobs.concurrency_key, good_jobs.concurrency_created_at, and good_jobs.retried_good_job_id.

Go deeper

Exceptions

Active Job provides tools for rescuing and retrying exceptions, including retry_on, discard_on, and rescue_from, which will rescue exceptions before they reach GoodJob.

If errors do reach GoodJob, you can assign a callable to GoodJob.on_thread_error to be notified. For example, to log errors to an exception monitoring service like Sentry (or Bugsnag, Airbrake, Honeybadger, etc.):

# config/initializers/good_job.rb
GoodJob.on_thread_error = -> (exception) { Rails.error.report(exception) }

Retries

By default, GoodJob relies on Active Job’s retry functionality.

Active Job can be configured to retry an infinite number of times, with a polynomial backoff. Using Active Job’s retry_on prevents exceptions from reaching GoodJob:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY
  # ...
end

When using retry_on with a limited number of retries, the final exception will not be rescued and will raise to GoodJob’s error handler. To avoid this, pass a block to retry_on to handle the final exception instead of raising it to GoodJob:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, attempts: 5 do |_job, _exception|
    # Log error, do nothing, etc.
  end
  # ...
end

When using retry_on with an infinite number of retries, exceptions will never be raised to GoodJob, which means GoodJob.on_thread_error will never be called. To log or report exceptions to an exception monitoring service (e.g. Sentry, Bugsnag, Airbrake, Honeybadger, etc.), rescue and re-raise them explicitly. For example:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

  retry_on SpecialError, attempts: 5 do |_job, exception|
    Rails.error.report(exception)
  end

  around_perform do |_job, block|
    block.call
  rescue StandardError => e
    Rails.error.report(e)
    raise
  end
  # ...
end

By default, jobs will not be retried unless retry_on is configured. This can be overridden by setting GoodJob.retry_on_unhandled_error to true; GoodJob will then retry the failing job immediately and infinitely, potentially causing high load.
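For completeness, that override looks like the following (not generally recommended, for the load reasons just described):

```ruby
# config/initializers/good_job.rb
# Retries failing jobs immediately and infinitely; use with caution
GoodJob.retry_on_unhandled_error = true
```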

Action Mailer retries

Any configuration in ApplicationJob will have to be duplicated on ActionMailer::MailDeliveryJob because ActionMailer uses that custom class which inherits from ActiveJob::Base, rather than your application’s ApplicationJob.

You can use an initializer to configure ActionMailer::MailDeliveryJob, for example:

# config/initializers/good_job.rb
ActionMailer::MailDeliveryJob.retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

# With Sentry (or Bugsnag, Airbrake, Honeybadger, etc.)
ActionMailer::MailDeliveryJob.around_perform do |_job, block|
  block.call
rescue StandardError => e
  Rails.error.report(e)
  raise
end

Note that ActionMailer::MailDeliveryJob is the default since Rails 6.0. Be sure your app is using that class, as it might instead be configured to use the now-deprecated ActionMailer::DeliveryJob.

Interrupts, graceful shutdown, and SIGKILL

When GoodJob receives an interrupt (SIGINT, SIGTERM) or is shut down explicitly with GoodJob.shutdown, it will attempt to shut down gracefully, waiting for jobs to finish before exiting, up to the shutdown_timeout configuration.

To detect the start of a graceful shutdown from within a performing job, for example while looping/iterating over multiple items, you can call GoodJob.current_thread_shutting_down? or GoodJob.current_thread_running? from within the job. For example:

```ruby
def perform(lots_of_records)
  lots_of_records.each do |record|
    break if GoodJob.current_thread_shutting_down? # or `unless GoodJob.current_thread_running?`
    # process record ...
  end
end
```

Note that when running jobs in `:inline` execution mode, `GoodJob.current_thread_running?` will always be truthy and `GoodJob.current_thread_shutting_down?` will always be falsey.

Jobs will be retried automatically if the process is interrupted while performing a job and the job cannot finish before the shutdown timeout, or if the process dies abruptly (e.g. as the result of a `SIGKILL` or power failure).

If you need more control over interrupt-caused retries, include the `GoodJob::ActiveJobExtensions::InterruptErrors` extension in your job class. When an interrupted job is retried, the extension will raise a `GoodJob::InterruptError` exception within the job, which allows you to use Active Job's `retry_on` and `discard_on` to control the behavior of the job.

```ruby
class MyJob < ApplicationJob
  # The extension must be included before other extensions
  include GoodJob::ActiveJobExtensions::InterruptErrors
  # Discard the job if it is interrupted
  discard_on GoodJob::InterruptError
  # Retry the job if it is interrupted
  retry_on GoodJob::InterruptError, wait: 0, attempts: Float::INFINITY
end
```

Timeouts

Job timeouts can be configured with an around_perform:

class ApplicationJob < ActiveJob::Base
  JobTimeoutError = Class.new(StandardError)

  around_perform do |_job, block|
    # Timeout jobs after 10 minutes
    Timeout.timeout(10.minutes, JobTimeoutError) do
      block.call
    end
  end
end

Database connections

GoodJob job executor processes require the following database connections:

  • 1 connection per execution pool thread. E.g., --queues=mice:2;elephants:1 is 3 threads and thus 3 connections. Pool size defaults to --max-threads.
  • 2 additional connections that GoodJob uses for utility functionality (e.g. LISTEN/NOTIFY, cron, etc.)
  • 1 connection per subthread, if your application makes multithreaded database queries (e.g. load_async) within a job.

The executor process will not crash if the connection pool is exhausted; instead it will report an exception (e.g. ActiveRecord::ConnectionTimeoutError).

When GoodJob runs in :inline mode (in Rails’ test environment, by default), the default database pool configuration works.

# config/database.yml

pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

When GoodJob runs in :async mode (in Rails’s development environment, by default), the following database pool configuration works, where:

  • ENV.fetch("RAILS_MAX_THREADS", 5) is the number of threads used by the web server
  • 1 is the number of connections used by the job listener
  • 2 is the number of connections used by the cron scheduler and executor
  • ENV.fetch("GOOD_JOB_MAX_THREADS", 5) is the number of threads used to perform jobs

# config/database.yml

pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5).to_i + 1 + 2 + ENV.fetch("GOOD_JOB_MAX_THREADS", 5).to_i %>

When GoodJob runs in :external mode (in Rails’ production environment, by default), the following database pool configurations work for web servers and worker processes, respectively.

# config/database.yml

pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
# config/database.yml

pool: <%= 1 + 2 + ENV.fetch("GOOD_JOB_MAX_THREADS", 5).to_i %>

Queue performance with Queue Select Limit

GoodJob’s advisory locking strategy uses a materialized CTE (Common Table Expression). This strategy can be non-performant when querying a very large queue of executable jobs (100,000+) because the database query must materialize all executable jobs before acquiring an advisory lock.

GoodJob offers an optional optimization to limit the number of jobs that are queried: Queue Select Limit.

# CLI option
--queue-select-limit=1000

# Rails configuration
config.good_job.queue_select_limit = 1000

# Environment Variable
GOOD_JOB_QUEUE_SELECT_LIMIT=1000

The Queue Select Limit value should be set to a rough upper-bound that exceeds all GoodJob execution threads / database connections. 1000 is a number that likely exceeds the available database connections on most PaaS offerings, but still offers a performance boost for GoodJob when executing very large queues.

To explain where this value is used, here is the pseudo-query that GoodJob uses to find executable jobs:

  SELECT *
  FROM good_jobs
  WHERE id IN (
    WITH rows AS MATERIALIZED (
      SELECT id, active_job_id
      FROM good_jobs
      WHERE (scheduled_at <= NOW() OR scheduled_at IS NULL) AND finished_at IS NULL
      ORDER BY priority DESC NULLS LAST, created_at ASC
      [LIMIT 1000] -- <= introduced when queue_select_limit is set
    )
    SELECT id
    FROM rows
    WHERE pg_try_advisory_lock(('x' || substr(md5('good_jobs' || '-' || active_job_id::text), 1, 16))::bit(64)::bigint)
    LIMIT 1
  )
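The key derivation inside the pg_try_advisory_lock call above can be mirrored in plain Ruby (a sketch of the same md5-to-signed-bigint transformation, not a GoodJob API):

```ruby
require "digest"

# Mirror of: ('x' || substr(md5('good_jobs' || '-' || active_job_id::text), 1, 16))::bit(64)::bigint
def advisory_lock_key(active_job_id)
  hex = Digest::MD5.hexdigest("good_jobs-#{active_job_id}")[0, 16]
  value = hex.to_i(16)
  value -= 2**64 if value >= 2**63 # Postgres advisory locks take a signed 64-bit integer
  value
end

advisory_lock_key("b7f40dcd-8e45-4b3a-9c2a-6f0d0a6c1234") # a deterministic signed 64-bit Integer
```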

Execute jobs async / in-process

GoodJob can execute jobs “async” in the same process as the web server (e.g. bin/rails s). GoodJob’s async execution mode offers benefits of economy by not requiring a separate job worker process, but with the tradeoff of increased complexity. Async mode can be configured in two ways:

  • Via Rails configuration:

    # config/environments/production.rb
    config.active_job.queue_adapter = :good_job
    
    # To change the execution mode
    config.good_job.execution_mode = :async
    
    # Or with more configuration
    config.good_job = {
      execution_mode: :async,
      max_threads: 4,
      poll_interval: 30
    }
    
  • Or, with environment variables:

    GOOD_JOB_EXECUTION_MODE=async GOOD_JOB_MAX_THREADS=4 GOOD_JOB_POLL_INTERVAL=30 bin/rails server
    

Depending on your application configuration, you may need to take additional steps:

  • Ensure that you have enough database connections for both web and job execution threads:

    # config/database.yml
    pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5).to_i + ENV.fetch("GOOD_JOB_MAX_THREADS", 4).to_i %>
    
  • When running Puma with workers (WEB_CONCURRENCY > 0) or another process-forking web server, GoodJob’s threadpool schedulers should be stopped before forking, restarted after fork, and cleanly shut down on exit. Stopping GoodJob’s scheduler pre-fork is recommended to ensure that GoodJob does not continue executing jobs in the parent/controller process. For example, with Puma:

    # config/puma.rb
    
    
    before_fork do
      GoodJob.shutdown
    end
    
    
    on_worker_boot do
      GoodJob.restart
    end
    
    
    on_worker_shutdown do
      GoodJob.shutdown
    end
    
    
    MAIN_PID = Process.pid
    at_exit do
      GoodJob.shutdown if Process.pid == MAIN_PID
    end
    

GoodJob is compatible with Puma’s preload_app! method.

For Passenger:

```ruby
if defined? PhusionPassenger
  PhusionPassenger.on_event :starting_worker_process do |forked|
    # If `forked` is true, we're in smart spawning mode.
    # https://www.phusionpassenger.com/docs/advanced_guides/in_depth/ruby/spawn_methods.html#smart-spawning-hooks
    if forked
      GoodJob.logger.info { 'Starting Passenger worker process.' }
      GoodJob.restart
    end
  end

  PhusionPassenger.on_event :stopping_worker_process do
    GoodJob.logger.info { 'Stopping Passenger worker process.' }
    GoodJob.shutdown
  end
end

# GoodJob also starts in the Passenger preloader process. This one does not
# trigger the above events, thus we catch it with `Kernel#at_exit`.
PRELOADER_PID = Process.pid
at_exit do
  if Process.pid == PRELOADER_PID
    GoodJob.logger.info { 'Passenger AppPreloader shutting down.' }
    GoodJob.shutdown
  end
end
```

If you are using cron-style jobs, you might also want to look at your Passenger configuration, especially at passenger_pool_idle_time and passenger_min_instances to make sure there’s always at least one process running that can execute cron-style scheduled jobs. See also Passenger’s optimization guide for more information.

Migrate to GoodJob from a different Active Job backend

If your application is already using an Active Job backend, you will need to install GoodJob, enqueue newly created jobs on GoodJob, and finish performing pre-existing jobs on the previous backend.

  1. Enqueue newly created jobs on GoodJob either entirely by setting ActiveJob::Base.queue_adapter = :good_job or progressively via individual job classes:

    # jobs/specific_job.rb
    class SpecificJob < ApplicationJob
      self.queue_adapter = :good_job
      # ...
    end
    
  2. Continue running executors for both backends. For example, on Heroku it’s possible to run two processes within the same dyno:

    # Procfile
    # ...
    worker: bundle exec que ./config/environment.rb & bundle exec good_job & wait -n

  3. Once you are confident that no unperformed jobs remain in the previous Active Job backend, code and configuration for that backend can be completely removed.

GoodJob preserves job records in the database after jobs have finished. This behavior is configurable:

# config/initializers/good_job.rb
config.good_job.preserve_job_records = true # defaults to true; can also be `false` or `:on_unhandled_error`

GoodJob will automatically delete preserved job records after 14 days. The retention period, as well as the frequency with which GoodJob checks for deletable records, can be configured:

config.good_job.cleanup_preserved_jobs_before_seconds_ago = 14.days
config.good_job.cleanup_interval_jobs = 1_000 # Number of executed jobs between deletion sweeps.
config.good_job.cleanup_interval_seconds = 10.minutes # Interval between deletion sweeps.

It is also possible to manually trigger a cleanup of preserved job records:

  • For example, in a Rake task:

    GoodJob.cleanup_preserved_jobs # Will use default retention period
    GoodJob.cleanup_preserved_jobs(older_than: 7.days) # custom retention period
    
  • For example, using the good_job command-line utility:

    bundle exec good_job cleanup_preserved_jobs --before-seconds-ago=86400
    

Write tests

By default, GoodJob uses its inline adapter in the test environment, which is what the inline adapter is designed for. When a job is enqueued with GoodJob’s inline adapter, it is executed immediately on the current thread, and unhandled exceptions are raised.

In GoodJob 2.0, the inline adapter executes future scheduled jobs immediately. In the next major release, GoodJob 3.0, the inline adapter will not execute future scheduled jobs and will instead enqueue them in the database.

To opt into this behavior immediately, set:

config.good_job.inline_execution_respects_schedule = true

To perform jobs inline at any time, use GoodJob.perform_inline. For example, using time helpers within an integration test:

MyJob.set(wait: 10.minutes).perform_later
travel_to(15.minutes.from_now) { GoodJob.perform_inline }

Note: Rails travel/travel_to time helpers do not have millisecond precision, so you must leave at least 1 second between the schedule and time traveling for the job to be executed. This behavior may change in Rails 7.1.
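
The precision issue can be illustrated in plain Ruby (a sketch of the truncation behavior, not Rails or GoodJob code):

```ruby
# A job scheduled half a second into the future:
scheduled_at = Time.utc(2024, 1, 1, 12, 0, 0) + 0.5

# travel_to (before Rails 7.1) truncates the destination time to whole
# seconds, so traveling to the exact scheduled instant lands *before* it:
traveled_to = Time.at((Time.utc(2024, 1, 1, 12, 0, 0) + 0.5).to_i).utc
traveled_to >= scheduled_at  # => false: the job would not be considered due

# Leaving at least a full second of margin avoids the problem:
safe_travel = Time.at((scheduled_at + 1).to_i).utc
safe_travel >= scheduled_at  # => true
```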

CLI HTTP health check probes

Default configuration

GoodJob’s CLI offers an HTTP health check probe to better manage process lifecycle in containerized environments like Kubernetes:

# Run the CLI with a health check on port 7001
good_job start --probe-port=7001

# or via an environment variable
GOOD_JOB_PROBE_PORT=7001 good_job start

# Probe the status
curl localhost:7001/status
curl localhost:7001/status/started
curl localhost:7001/status/connected

Multiple health checks are available at different paths:

  • / or /status: the CLI process is running
  • /status/started: the multithreaded job executor is running
  • /status/connected: the database connection is established

This can be configured, for example with Kubernetes:

spec:
  containers:
    - name: good-job
      image: my_app:latest
      env:
        - name: RAILS_ENV
          value: production
        - name: GOOD_JOB_PROBE_PORT
          value: "7001"
      command:
        - good_job
        - start
      ports:
        - name: probe-port
          containerPort: 7001
      startupProbe:
        httpGet:
          path: "/status/started"
          port: probe-port
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: "/status/connected"
          port: probe-port
        failureThreshold: 1
        periodSeconds: 10
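
The probe endpoints can also be polled from scripts outside of Kubernetes, e.g. a deploy hook. A hypothetical sketch (the helper name is an assumption, and the port matches the example above; it is not part of GoodJob):

```ruby
require "net/http"

# Hypothetical deploy-hook helper: polls GoodJob's /status/started probe
# endpoint until the executor reports started, or gives up after `timeout`
# seconds.
def wait_until_started(host: "localhost", port: 7001, timeout: 60)
  deadline = Time.now + timeout
  until Time.now > deadline
    begin
      response = Net::HTTP.get_response(host, "/status/started", port)
      return true if response.is_a?(Net::HTTPSuccess)
    rescue SystemCallError
      # Probe server is not accepting connections yet; keep waiting.
    end
    sleep 1
  end
  false
end
```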

Custom configuration

The CLI health check probe server can be customized to serve additional information. Two things to note when customizing the probe server:

  • By default, the probe server uses a homespun, single-threaded, blocking server, so any custom app should be very simple and lightly used; a heavyweight app could affect job performance.
  • The default probe server is not fully Rack compliant. Rack specifies various mandatory fields, and some Rack apps assume those fields exist. If you need to use a Rack app that depends on full Rack compliance, you can configure GoodJob to use WEBrick as the server.

To customize the probe server, set config.good_job.probe_app to a Rack app or a Rack builder:

# config/initializers/good_job.rb OR config/application.rb OR config/environments/{RAILS_ENV}.rb

Rails.application.configure do
  config.good_job.probe_app = Rack::Builder.new do
    # Add your custom middleware
    use Custom::AuthorizationMiddleware
    use Custom::PrometheusExporter

    # This is the default middleware
    use GoodJob::ProbeServer::HealthcheckMiddleware
    run GoodJob::ProbeServer::NotFoundApp # will return 404 for all other requests
  end
end

Using WEBrick

If your custom app requires a fully Rack compliant server, you can configure GoodJob to use WEBrick as the server:

# config/initializers/good_job.rb OR config/application.rb OR config/environments/{RAILS_ENV}.rb

Rails.application.configure do
  config.good_job.probe_handler = :webrick
end

You can also enable WEBrick through the command line:

good_job start --probe-handler=webrick

or via an environment variable:

GOOD_JOB_PROBE_HANDLER=webrick good_job start

Note that GoodJob doesn’t include WEBrick as a dependency, so you’ll need to add it to your Gemfile:

# Gemfile
gem 'webrick'

If WEBrick is configured but the dependency is not found, GoodJob will log a warning and fall back to the default probe server.

Doing your best job with GoodJob

This section explains how to use GoodJob the most efficiently and performantly, according to its maintainers. GoodJob is very flexible and you don’t necessarily have to use it this way, but the concepts explained here are part of GoodJob’s design intent.

Background jobs are hard. There are two extremes:

  • Throw resources (compute, servers, money) at it by creating dedicated processes (or servers) for each type of job or queue and scaling them independently to achieve the lowest latency and highest throughput.
  • Do the best you can on a small budget by creating dedicated thread pools within a process for each type of job or queue, producing quality-of-service while compromising on maximum latency (or tail latency) because of shared resources and thread contention. You can even run them in the web process if you’re really cheap.

This section will largely focus on optimizing within the latter small-budget scenario, but the concepts and explanations should help you optimize the big-budget scenario too.

Let’s start with anti-patterns, and then the rest of this section will explain an alternative:

  • Don’t use functional names for your queues like mailers or sms or turbo or batch. Instead, name them after the total latency target (the total duration within queue and executing until finished) you expect for that job, e.g. latency_30s or latency_5m or literally_whenever.
  • Priority can’t fix a lack of capacity. Priority rules (i.e. weighing or ordering which jobs or queues execute first) only work when there is capacity available to execute the next job. When all capacity is in use, priority cannot preempt a job that is already executing (“head-of-line blocking”).
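
Head-of-line blocking can be demonstrated with a toy single-threaded simulation (plain Ruby, not GoodJob code; the durations are made up):

```ruby
# One worker thread processes jobs in arrival order. A tiny "mouse" job
# enqueued just after a long "elephant" job starts cannot be helped by
# priority: it must wait for the elephant to finish.
jobs = [
  { name: "elephant", enqueued_at: 0, duration: 600 }, # 10-minute batch job
  { name: "mouse",    enqueued_at: 1, duration: 1 },   # 1-second email
]

clock = 0
completed = jobs.sort_by { |job| job[:enqueued_at] }.map do |job|
  clock = [clock, job[:enqueued_at]].max + job[:duration]
  { name: job[:name], finished_at: clock, total_latency: clock - job[:enqueued_at] }
end
# The elephant finishes at t=600. The mouse, despite needing only 1 second
# of work, finishes at t=601 with a total latency of 600 seconds.
```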

The following will explain methods to create homogenous workloads (based on latency) and increase execution capacity when queuing latency causes the jobs to exceed their total latency target.

Sizing jobs: mice and elephants

Queuing theory will refer to fast/small/low-latency tasks as Mice (e.g. a password reset email, an MFA token via SMS) and slow/big/high-latency tasks as Elephants (e.g. sending an email newsletter to 10k recipients, a batched update that touches every record in the database).

Explicitly group your jobs by their latency: how quickly you expect them to finish to achieve your expected quality of service. This should be their total latency (or duration) which is the sum of: queuing latency which is how long the job waits in queue until execution capacity becomes available (which ideally should be zero, because you have idle capacity and can start executing a job immediately as soon as it is enqueued or upon its scheduled time) and execution latency which is how long the job’s execution takes (e.g. the email being sent). Example: I expect this Password Reset Email Job to have a total latency of 30 seconds or less.
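
As a concrete, made-up example of that arithmetic:

```ruby
# Hypothetical timestamps for a single job, illustrating the definition
# above: total latency = queuing latency + execution latency.
enqueued_at = Time.utc(2024, 1, 1, 12, 0, 0)
started_at  = Time.utc(2024, 1, 1, 12, 0, 12)  # waited 12s for a free thread
finished_at = Time.utc(2024, 1, 1, 12, 0, 15)  # the work itself took 3s

queuing_latency   = started_at - enqueued_at             # => 12.0 (ideally ~0)
execution_latency = finished_at - started_at             # => 3.0
total_latency     = queuing_latency + execution_latency  # => 15.0, within a 30s target
```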

In a working application, you likely will have more gradations than just small and big or slow and fast (analogously: badgers, wildebeests; maybe even tardigrades or blue whales for tiny and huge, respectively), but there will regardless be a relatively small and countable number of discrete latency buckets to organize your jobs into.

Isolating by total latency

The most efficient workloads are homogenous (similar) workloads. If you know every job to be executed will take about the same amount of time, you can estimate the maximum delay for a new job at the back of the queue and have that drive decisions about capacity. Alternatively, if those jobs are heterogenous (mixed) it’s possible that a very slow/long-duration job could hold everything back for much longer than anticipated and it’s sorta random. That’s bad!

A fun visual image here for a single-file queue is a doorway: If you only have 1 doorway, it must be big enough to fit an elephant. But if an elephant is going through the door (and it will go through slowly!) no mice can fit through the door until the elephant is fully clear. Your mice will be delayed!

Priority will not help when an elephant is in the doorway. Yes, you could say mice have a higher priority than elephants and always allow any mouse to go before any elephant in queue will start. But once an elephant has started going through the door, any subsequent mouse who arrives must wait for the elephant to egress regardless of their priority. In Active Job and Ruby, it’s really hard to stop or cancel or preempt a running job (unless you’ve already architected that into your jobs, like with the job-iteration library).

The best solution is to have a 2nd door, but only sized for mice, so an elephant can’t ever block it. With a mouse-sized doorway and an elephant-sized doorway, mice can still go through the big elephant door when an elephant isn’t using it. Each door has a maximum size (or “latency”) we want it to accept, and smaller is ok, just not larger.

Configuring your queues

If we wanted to capture the previous 2-door scenario in GoodJob, we’d configure the queues like this:

config.good_job.queues = "mice:1; elephants,mice:1"

This configuration creates two isolated thread pools (separated by a semicolon), each with 1 thread (the number after the colon). The 2nd thread pool recognizes that both elephants and mice can use it; if there is an influx of mice, they can use the elephants’ thread pool when an elephant isn’t already in progress.

So what if we add an intermediately-sized badgers queue? In that case, we can make 3 distinct queues:

config.good_job.queues = "mice:1; badgers,mice:1; elephants,badgers,mice:1"

In this case, we make a mouse-sized queue, a badger-sized queue, and an elephant-sized queue. We can simplify this even further:

config.good_job.queues = "mice:1; badgers,mice:1; *:1"

Using the wildcard * for any queue also helps ensure that if a job is enqueued to a newly declared queue (maybe via a dependency or just inadvertently) it will still get executed until you notice and decide on its appropriate latency target.

In these examples, the order doesn’t matter; it’s just perhaps more readable to go from the lowest-latency pool to the largest-latency pool (the semicolon groups), and then within a pool to list the largest allowable latency first (the commas). Nothing here is about “job priority” or “queue priority”; this is wholly about grouping.

In your application, not the zoo, you’ll want to enqueue your PasswordResetJob on the mice queue, your CreateComplicatedObjectJob on the badgers queue, and your AuditEveryAccountEverJob on the elephants queue. But you want to name your queues by latency, so that ends up being:

config.good_job.queues = "latency_30s:1; latency_2m,latency_30s:1; *:1"

And you likely want to have more than one thread (though more than 3-5 threads per process will cause thread contention and slow everything down a bit):

config.good_job.queues = "latency_30s:2; latency_2m,latency_30s:2; *:2"
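
A simplified sketch (not GoodJob’s actual parser) of how such a queue string breaks down into pools:

```ruby
# Splits a GoodJob-style queue string into isolated thread pools:
# semicolons separate pools, commas list the queues a pool will work,
# and the number after the colon is the pool's thread count.
def parse_queue_string(queue_string)
  queue_string.split(";").map(&:strip).map do |pool|
    queues, threads = pool.split(":")
    { queues: queues.split(",").map(&:strip), max_threads: Integer(threads) }
  end
end

parse_queue_string("latency_30s:2; latency_2m,latency_30s:2; *:2")
# => [{queues: ["latency_30s"], max_threads: 2},
#     {queues: ["latency_2m", "latency_30s"], max_threads: 2},
#     {queues: ["*"], max_threads: 2}]
```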

Additional observations

  • Unlike GoodJob, other Active Job backends may treat a “queue” and an “isolated execution pool” as one and the same. GoodJob allows composing multiple Active Job queues into the same pool for flexibility and to make it easier to migrate from functionally-named queues to latency-based ones.
  • You don’t have to name your queues explicitly like latency_30s but it makes it easier to identify outliers and communicate your operational targets. Many people push back on this; that’s ok. An option to capture functional details is to use GoodJob’s Labels feature instead of encoding them in the queue name.
  • The downside of organizing your jobs like this is that you may have jobs with the same latency target but wildly different operational parameters, like being coupled to another system that has limited throughput or questionable reliability. GoodJob offers Concurrency and Throttling Controls, but isolation is always the most performant and reliable option, though it requires dedicated resources and costs more.
  • Observe, monitor, and adapt your job queues over time. You likely have incomplete information about the execution latency of your jobs inclusive of all dependencies across all scenarios. You should expect to adjust your queues and grouping over time as you observe their behavior.
  • If you find you have unreliable external dependencies that introduce latency, you may also want to further isolate your jobs based on those dependencies, for example, isolating latency_10s_email_service to its own execution pool.
  • Scale on queue latency. Per the previous point, you do not have complete control over execution latency, but you do have control over queue latency. If queue latency is causing your jobs to miss their total latency target, you must add more capacity (e.g. processes or servers).
  • This is all largely about latency-based queue design. It’s possible to go further and organize by latency and parallelism. For that I recommend Nate Berkopec’s Complete Guide to Rails Performance which covers things like Amdahl’s Law.

Gem development

Development setup

# Clone the repository locally
git clone git@github.com:bensheldon/good_job.git

# Set up the gem development environment
bin/setup

Rails development harness

A Rails application exists within demo that is used for development, test, and GoodJob Demo environments.

# Run a local development webserver
bin/rails s

# Disable job execution and cron for cleaner console output
GOOD_JOB_ENABLE_CRON=0 GOOD_JOB_EXECUTION_MODE=external bin/rails s

# Open the Rails console
bin/rails c

For developing locally within another Ruby on Rails project:

# Within Ruby on Rails project directory
# Ensure that the Gemfile is set to git with a branch e.g.
# gem "good_job", git: "https://github.com/bensheldon/good_job.git", branch: "main"
# Then, override the Bundle config to point to the local filesystem's good_job repository
bundle config local.good_job /path/to/local/good_job/repository

# Confirm that the local copy is used
bundle install

# => Using good_job 0.1.0 from https://github.com/bensheldon/good_job.git (at /Users/You/Projects/good_job@dc57fb0)

Running tests

Tests can be run against the primary development environment:

# Set up the gem development environment
bin/setup

# Run the tests
bin/rspec

Environment variables that may help with debugging:

  • LOUD=1: display all stdout/stderr output from all sources. This is helpful because GoodJob wraps some tests with quiet { } for cleaner test output, but it can hinder debugging.
  • SHOW_BROWSER=1: Run system tests headfully with Chrome/Chromedriver. Use binding.irb in the system tests to pause.

Appraisal can be used to run a test matrix of multiple versions of Rails:

# Install Appraisal matrix of gemfiles
bin/appraisal

# Run tests against matrix
bin/appraisal bin/rspec

Release

Package maintainers can release this gem by running:

# Sign into rubygems
$ gem signin

# Add a .env file with the following:
# CHANGELOG_GITHUB_TOKEN= # Github Personal Access Token

# Update version number, changelog, and create git commit:
$ bundle exec rake release_good_job[minor] # major,minor,patch

# ..and follow subsequent directions.

Articles

  • coming soon...