Features
- Cluster-friendly. Guarantees execution by single scheduler instance.
- Persistent tasks. Requires a single database-table for persistence.
- Embeddable. Built to be embedded in existing applications.
- High throughput. Tested to handle 2k - 10k executions / second. Link.
- Simple.
- Minimal dependencies. (slf4j)
Who uses db-scheduler?
List of organizations known to be running db-scheduler in production:
Company | Description |
---|---|
Digipost | Provider of digital mailboxes in Norway |
Vy Group | One of the largest transport groups in the Nordic countries. |
Wise | A cheap, fast way to send money abroad. |
Becker Professional Education | |
Monitoria | Website monitoring service. |
Loadster | Load testing for web applications. |
Statens vegvesen | The Norwegian Public Roads Administration |
Lightyear | A simple and approachable way to invest your money globally. |
NAV | The Norwegian Labour and Welfare Administration |
ModernLoop | Scale with your company’s hiring needs by using ModernLoop to increase efficiency in interview scheduling, communication, and coordination. |
Diffia | Norwegian eHealth company |
Swan | Swan helps developers to embed banking services easily into their product. |
Feel free to open a PR to add your organization to the list.
Examples
See also runnable examples.
Recurring task (static)
Define a recurring task and schedule the task’s first execution on start-up using the startTasks builder-method. Upon completion, the task will be re-scheduled according to the defined schedule (see pre-defined schedule-types).
    RecurringTask<Void> hourlyTask = Tasks.recurring("my-hourly-task", FixedDelay.ofHours(1))
        .execute((inst, ctx) -> {
            System.out.println("Executed!");
        });

    final Scheduler scheduler = Scheduler
        .create(dataSource)
        .startTasks(hourlyTask)
        .registerShutdownHook()
        .build();

    // hourlyTask is automatically scheduled on startup if not already started (i.e. exists in the db)
    scheduler.start();
For recurring tasks with multiple instances and schedules, see example RecurringTaskWithPersistentScheduleMain.java.
More examples
Plain Java
- EnableImmediateExecutionMain.java
- MaxRetriesMain.java
- ExponentialBackoffMain.java
- ExponentialBackoffWithMaxRetriesMain.java
- TrackingProgressRecurringTaskMain.java
- SpawningOtherTasksMain.java
- SchedulerClientMain.java
- RecurringTaskWithPersistentScheduleMain.java
- StatefulRecurringTaskWithPersistentScheduleMain.java
- JsonSerializerMain.java
- JobChainingUsingTaskDataMain.java
- JobChainingUsingSeparateTasksMain.java
Spring Boot
Example | Description |
---|---|
BasicExamples | A basic one-time task and recurring task |
TransactionallyStagedJob | Example of transactionally staging a job, i.e. making sure the background job runs iff the transaction commits (along with other db-modifications). |
LongRunningJob | Long-running jobs need to survive application restarts and avoid restarting from the beginning. This example demonstrates how to persist progress on shutdown, and additionally a technique for limiting the job to run nightly. |
RecurringStateTracking | A recurring task with state that can be modified after each run. |
ParallellJobSpawner | Demonstrates how to use a recurring job to spawn one-time jobs, e.g. for parallelization. |
JobChaining | A one-time job with multiple steps. The next step is scheduled after the previous one completes. |
MultiInstanceRecurring | Demonstrates how to achieve multiple recurring jobs of the same type, but potentially differing schedules and data. |
Configuration
Scheduler configuration
The scheduler is created using the Scheduler.create(...) builder. The builder has sensible defaults, but the following options are configurable.
Less commonly tuned
- :gear: .heartbeatInterval(Duration)
  How often to update the heartbeat timestamp for running executions. Default 5m.
- :gear: .missedHeartbeatsLimit(int)
  How many heartbeats may be missed before the execution is considered dead. Default 6.
- :gear: .schedulerName(SchedulerName)
  […] Default <hostname>.
- :gear: .tableName(String)
  […] the table. Default scheduled_tasks.
- :gear: .serializer(Serializer)
  […] See also additional documentation under Serializers.
- :gear: .executorService(ExecutorService)
  […] will use should still be supplied (for scheduler polling optimizations). Default null.
- :gear: .deleteUnresolvedAfter(Duration)
  […] error (missing known-tasks) and problems during rolling upgrades. Default 14d.
- :gear: .jdbcCustomization(JdbcCustomization)
  […] method is an escape-hatch to allow for setting JdbcCustomizations explicitly. Default auto-detect.
- :gear: .commitWhenAutocommitDisabled(boolean)
  […] behavior and have the Scheduler always issue commits. Default false.
- :gear: .failureLogging(Level, boolean)
  Configures how to log task failures, i.e. Throwables thrown from a task execution handler. Use log level OFF to disable this kind of logging completely. Default WARN, true.
Task configuration
Tasks are created using one of the builder-classes in Tasks. The builders have sensible defaults, but the following options can be overridden.
Option | Default | Description |
---|---|---|
.onFailure(FailureHandler) | see desc. | What to do when an ExecutionHandler throws an exception. By default, recurring tasks are rescheduled according to their Schedule, and one-time tasks are retried again in 5m. |
.onDeadExecution(DeadExecutionHandler) | ReviveDeadExecution | What to do when a dead execution is detected, i.e. an execution with a stale heartbeat timestamp. By default, dead executions are rescheduled to now(). |
.initialData(T initialData) | null | The data to use the first time a recurring task is scheduled. |
Schedules
The library contains a number of Schedule-implementations for recurring tasks. See class Schedules.
Schedule | Description |
---|---|
.daily(LocalTime ...) | Runs every day at specified times. Optionally a time zone can be specified. |
.fixedDelay(Duration) | Next execution-time is Duration after last completed execution. Note: This Schedule schedules the initial execution to Instant.now() when used in startTasks(...) |
.cron(String) | Spring-style cron-expression (v5.3+). The pattern - is interpreted as a disabled schedule. |
Another option to configure schedules is reading string patterns with Schedules.parse(String).
The currently available patterns are:
Pattern | Description |
---|---|
FIXED_DELAY\|Ns | Same as .fixedDelay(Duration) with duration set to N seconds. |
DAILY\|12:30,15:30...(\|time_zone) | Same as .daily(LocalTime) with optional time zone (e.g. Europe/Rome, UTC) |
- | Disabled schedule |
More details on the time zone formats can be found here.
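To make the pattern syntax concrete, the following is a minimal sketch of how the documented strings break down into java.time values. It does not use or reproduce the library's own Schedules.parse(String); the class SchedulePatternSketch and its methods are hypothetical names introduced only for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of db-scheduler: splits schedule patterns into java.time values. */
final class SchedulePatternSketch {

    /** Reads "FIXED_DELAY|Ns" and returns the delay of N seconds. */
    static Duration fixedDelay(String pattern) {
        String[] parts = pattern.split("\\|");
        if (parts.length != 2 || !parts[0].equals("FIXED_DELAY") || !parts[1].endsWith("s")) {
            throw new IllegalArgumentException("Not a FIXED_DELAY pattern: " + pattern);
        }
        String seconds = parts[1].substring(0, parts[1].length() - 1);
        return Duration.ofSeconds(Long.parseLong(seconds));
    }

    /** Reads the comma-separated times of "DAILY|12:30,15:30" (an optional zone part is ignored here). */
    static List<LocalTime> dailyTimes(String pattern) {
        String[] parts = dailyParts(pattern);
        List<LocalTime> times = new ArrayList<>();
        for (String time : parts[1].split(",")) {
            times.add(LocalTime.parse(time));
        }
        return times;
    }

    /** Reads the optional third part as a time zone, or returns the given fallback when it is absent. */
    static ZoneId dailyZone(String pattern, ZoneId fallback) {
        String[] parts = dailyParts(pattern);
        return parts.length == 3 ? ZoneId.of(parts[2]) : fallback;
    }

    /** The pattern "-" denotes a disabled schedule. */
    static boolean isDisabled(String pattern) {
        return "-".equals(pattern.trim());
    }

    private static String[] dailyParts(String pattern) {
        String[] parts = pattern.split("\\|");
        if (parts.length < 2 || parts.length > 3 || !parts[0].equals("DAILY")) {
            throw new IllegalArgumentException("Not a DAILY pattern: " + pattern);
        }
        return parts;
    }
}
```

With this sketch, FIXED_DELAY|120s yields a two-minute Duration, and DAILY|12:30,15:30|Europe/Rome yields the two times together with the Europe/Rome zone.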
Disabled schedules
A Schedule can be marked as disabled. The scheduler will not schedule the initial executions for tasks with a disabled schedule, and it will remove any existing executions for that task.
Serializers
A task-instance may have some associated data in the field task_data. The scheduler uses a Serializer to read and write this data to the database. By default, standard Java serialization is used, but a number of options are provided:
- GsonSerializer
- JacksonSerializer
- KotlinSerializer
For Java serialization it is recommended to specify a serialVersionUID to be able to evolve the class representing the data. If it is not specified and the class changes, deserialization will likely fail with an InvalidClassException. Should this happen, find the current auto-generated serialVersionUID and set it explicitly. It will then be possible to make non-breaking changes to the class.
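The following sketch illustrates the advice using only the JDK. EmailTaskData and JavaSerializationSketch are hypothetical names, not classes from db-scheduler. ObjectStreamClass.lookup(...) is the standard JDK way to read the serialVersionUID that Java serialization uses for a class, including the auto-generated one when none is declared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical task-data class; the name and fields are illustrative only. */
final class EmailTaskData implements Serializable {
    // Declared explicitly so that compatible changes (e.g. adding a field) keep stored data readable.
    private static final long serialVersionUID = 1L;

    final String recipient;
    final int attempt;

    EmailTaskData(String recipient, int attempt) {
        this.recipient = recipient;
        this.attempt = attempt;
    }
}

/** Hypothetical helper showing a plain Java-serialization round trip. */
final class JavaSerializationSketch {

    static byte[] serialize(Object data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T deserialize(Class<T> type, byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the serialVersionUID in effect for the class, declared or auto-generated. */
    static long currentSerialVersionUid(Class<? extends Serializable> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

For a class that does not yet declare the field, calling currentSerialVersionUid on the old version of the class gives the value to pin.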
If you need to migrate from Java serialization to a GsonSerializer, configure the scheduler to use a SerializerWithFallbackDeserializers:
.serializer(new SerializerWithFallbackDeserializers(new GsonSerializer(), new JavaSerializer()))
Third-party extensions
- bekk/db-scheduler-ui is an admin UI for the scheduler. It shows scheduled executions and supplies simple admin-operations such as “rerun failed execution now” and “delete execution”.
- rocketbase-io/db-scheduler-log is an extension providing a history of executions, including failures and exceptions.
- piemjean/db-scheduler-mongo is an extension for running db-scheduler with a MongoDB database.
Prerequisites
- An existing Spring Boot application
- A working DataSource with schema initialized. (In the example HSQLDB is used and schema is automatically applied.)
Getting started
- Add the following Maven dependency:
      <dependency>
          <groupId>com.github.kagkarlsson</groupId>
          <artifactId>db-scheduler-spring-boot-starter</artifactId>
          <version>14.0.1</version>
      </dependency>
  NOTE: This includes the db-scheduler dependency itself.
- In your configuration, expose your Tasks as Spring beans. If they are recurring, they will automatically be picked up and started.
- If you want to expose Scheduler state in the actuator health information, you need to enable the db-scheduler health indicator. See Spring Health Information.
- Run the app.
Configuration options
Configuration is mainly done via application.properties. Configuration of scheduler-name, serializer and executor-service is done by adding a bean of type DbSchedulerCustomizer to your Spring context.
# application.properties example showing default values
db-scheduler.enabled=true
db-scheduler.heartbeat-interval=5m
db-scheduler.polling-interval=10s
db-scheduler.polling-limit=
db-scheduler.table-name=scheduled_tasks
db-scheduler.immediate-execution-enabled=false
db-scheduler.scheduler-name=
db-scheduler.threads=10
# Ignored if a custom DbSchedulerStarter bean is defined
db-scheduler.delay-startup-until-context-ready=false
db-scheduler.polling-strategy=fetch
db-scheduler.polling-strategy-lower-limit-fraction-of-threads=0.5
db-scheduler.polling-strategy-upper-limit-fraction-of-threads=3.0
db-scheduler.shutdown-max-wait=30m
Interacting with scheduled executions using the SchedulerClient
It is possible to use the Scheduler to interact with the persisted future executions. For situations where a full Scheduler-instance is not needed, a simpler SchedulerClient can be created using its builder:
SchedulerClient.Builder.create(dataSource, taskDefinitions).build()
It will allow for operations such as:
- List scheduled executions
- Reschedule a specific execution
- Remove old executions that have been retrying for too long
- …
How it works
A single database table is used to track future task-executions. When a task-execution is due, db-scheduler picks it and executes it. When the execution is done, the Task is consulted to see what should be done. For example, a RecurringTask is typically rescheduled in the future based on its Schedule.
The scheduler uses optimistic locking or select-for-update (depending on polling strategy) to guarantee that one and only one scheduler-instance gets to pick and run a task-execution.
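The optimistic-locking idea can be sketched in memory as follows. This is an illustration of the general technique, not db-scheduler's code: the class OptimisticPickSketch, its Row type and its field names are assumptions, and the synchronized method stands in for a single conditional SQL UPDATE that the database applies atomically.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an executions table guarded by a version column. */
final class OptimisticPickSketch {

    /** One row of the illustrative table. */
    static final class Row {
        boolean picked;
        String pickedBy;
        long version;
    }

    private final Map<String, Row> table = new HashMap<>();

    synchronized void insert(String executionId) {
        table.put(executionId, new Row());
    }

    /** The version a scheduler observed when it selected the due execution. */
    synchronized long readVersion(String executionId) {
        return table.get(executionId).version;
    }

    /**
     * Mimics a conditional update of the form:
     *   UPDATE ... SET picked = true, picked_by = ?, version = version + 1
     *   WHERE id = ? AND picked = false AND version = ?
     * Returns true only if the row was still unpicked at the observed version.
     */
    synchronized boolean tryPick(String executionId, String schedulerName, long seenVersion) {
        Row row = table.get(executionId);
        if (row == null || row.picked || row.version != seenVersion) {
            return false; // another scheduler-instance got there first
        }
        row.picked = true;
        row.pickedBy = schedulerName;
        row.version++;
        return true;
    }

    synchronized String pickedBy(String executionId) {
        return table.get(executionId).pickedBy;
    }
}
```

If two scheduler-instances both read version 0 for the same execution, only the first tryPick succeeds. The second sees a changed row and treats it as a miss.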
Recurring tasks
The term recurring task is used for tasks that should be run regularly, according to some schedule.
When the execution of a recurring task has finished, a Schedule is consulted to determine what the next time for execution should be, and a future task-execution is created for that time (i.e. it is rescheduled). The time chosen will be the nearest time according to the Schedule, but still in the future.
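As an illustration of "nearest time, but still in the future", here is a minimal sketch for a daily schedule with fixed times. It is not the library's Schedule implementation. NextDailyExecutionSketch and its next method are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;

/** Hypothetical helper: picks the first daily time strictly after the completion time. */
final class NextDailyExecutionSketch {

    static Instant next(List<LocalTime> times, ZoneId zone, Instant completed) {
        if (times.isEmpty()) {
            throw new IllegalArgumentException("At least one time of day is required");
        }
        ZonedDateTime done = completed.atZone(zone);
        ZonedDateTime best = null;
        // Candidates on the completion day and the day after are enough for a daily schedule.
        for (int dayOffset = 0; dayOffset <= 1; dayOffset++) {
            LocalDate day = done.toLocalDate().plusDays(dayOffset);
            for (LocalTime time : times) {
                ZonedDateTime candidate = ZonedDateTime.of(day, time, zone);
                if (candidate.isAfter(done) && (best == null || candidate.isBefore(best))) {
                    best = candidate;
                }
            }
        }
        if (best == null) {
            throw new IllegalStateException("No future candidate found");
        }
        return best.toInstant();
    }
}
```

With times 09:00 and 17:00 in UTC, an execution that completes at 10:00 is rescheduled to 17:00 the same day. One that completes at 17:30 is rescheduled to 09:00 the next day, so the 17:00 instant is skipped rather than caught up.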
There are two types of recurring tasks: the regular static recurring task, where the Schedule is defined statically in the code, and the dynamic recurring task, where the Schedule is defined at runtime and persisted in the database (still requiring only a single table).
Static recurring task
The static recurring task is the most common one and suitable for regular background jobs since the scheduler automatically schedules an instance of the task if it is not present and also updates the next execution-time if the Schedule is updated.
To create the initial execution for a static recurring task, the scheduler has a method startTasks(...) that takes a list of tasks that should be “started” if they do not already have an existing execution. The initial execution-time is determined by the Schedule.
If the task already has a future execution (i.e. has been started at least once before), but an updated Schedule now indicates another execution-time, the existing execution will be rescheduled to the new execution-time (with the exception of non-deterministic schedules such as FixedDelay where new execution-time is further into the future).
Create using Tasks.recurring(..).
One-time tasks
The term one-time task is used for tasks that have a single execution-time.
In addition to encoding data into the instanceId of a task-execution, it is possible to store arbitrary binary data in a separate field for use at execution-time. By default, Java serialization is used to marshal/unmarshal the data.
Create using Tasks.oneTime(..).
Custom tasks
For tasks not fitting the above categories, it is possible to fully customize the behavior of the tasks using Tasks.custom(..).
Use-cases might be:
- Tasks that should be either rescheduled or removed based on output from the actual execution
- ..
Dead executions
During execution, the scheduler regularly updates a heartbeat-time for the task-execution. If an execution is marked as executing, but is not receiving updates to the heartbeat-time, it will be considered a dead execution after time X. That may for example happen if the JVM running the scheduler suddenly exits.
When a dead execution is found, the Task is consulted to see what should be done. A dead RecurringTask is typically rescheduled to now().
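The text above does not say how "time X" is derived. The sketch below assumes X is the heartbeat interval multiplied by the missed-heartbeats limit, using the defaults listed under Scheduler configuration (5m and 6). That rule is an assumption made for illustration and the library's exact check may differ. DeadExecutionSketch is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper illustrating heartbeat-based dead-execution detection. */
final class DeadExecutionSketch {

    /**
     * Assumed rule: an execution counts as dead once its last heartbeat is older
     * than heartbeatInterval * missedHeartbeatsLimit.
     */
    static boolean isDead(Instant lastHeartbeat, Instant now,
                          Duration heartbeatInterval, int missedHeartbeatsLimit) {
        Duration maxSilence = heartbeatInterval.multipliedBy(missedHeartbeatsLimit);
        Duration silence = Duration.between(lastHeartbeat, now);
        return silence.compareTo(maxSilence) > 0;
    }
}
```

Under this assumption and the default settings, an execution whose last heartbeat is 31 minutes old would be treated as dead, while one with a 10-minute-old heartbeat would not.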
Performance
While db-scheduler initially was targeted at low-to-medium throughput use-cases, it handles high-throughput use-cases (1000+ executions/second) quite well due to the fact that its data-model is very simple, consisting of a single table of executions. To understand how it will perform, it is useful to consider the SQL statements it runs per batch of executions.
Polling strategy fetch-and-lock-on-execute
The original and default polling strategy, fetch-and-lock-on-execute, will do the following:
- select a batch of due executions
- For every execution, on execute, try to update the execution to picked=true for this scheduler-instance. May miss due to competing schedulers.
- If execution was picked, when execution is done, update or delete the record according to handlers.
In sum per batch: 1 select, 2 * batch-size updates (excluding misses)
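The per-batch sum can be turned into a rough statement count. The sketch below only applies the formula stated above (1 select plus 2 updates per execution, misses excluded). FetchAndLockOnExecuteCostSketch is a hypothetical name, and the batch sizes in the usage note are arbitrary examples, not library defaults.

```java
/** Hypothetical calculator for the statement count of fetch-and-lock-on-execute. */
final class FetchAndLockOnExecuteCostSketch {

    /** One select plus two updates (pick, then complete) per execution in the batch. */
    static long statementsPerBatch(int batchSize) {
        return 1 + 2L * batchSize;
    }

    /** Total statements for a number of executions processed in full or partial batches. */
    static long statementsFor(long executions, int batchSize) {
        long batches = (executions + batchSize - 1) / batchSize; // ceiling division
        return batches + 2L * executions;
    }
}
```

For example, a batch of 20 executions costs 41 statements, and 1000 executions in batches of 20 cost 50 selects plus 2000 updates, 2050 statements in total.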
User testimonial
There are a number of users that are using db-scheduler for high throughput use-cases. See for example:
- https://github.com/kagkarlsson/db-scheduler/issues/209#issuecomment-1026699872
- https://github.com/kagkarlsson/db-scheduler/issues/190#issuecomment-805867950
Things to note / gotchas
- There are no guarantees that all instants in a schedule for a RecurringTask will be executed. The Schedule is consulted after the previous task-execution finishes, and the closest time in the future will be selected for next execution-time. A new type of task may be added in the future to provide such functionality.
- The methods on SchedulerClient (schedule, cancel, reschedule) will run using a new Connection from the DataSource provided. To have the action be a part of a transaction, it must be taken care of by the DataSource provided, for example using something like Spring’s TransactionAwareDataSourceProxy.
- Currently, the precision of db-scheduler depends on the pollingInterval (default 10s), which specifies how often to look in the table for due executions. If you know what you are doing, the scheduler may be instructed at runtime to “look early” via scheduler.triggerCheckForDueExecutions(). (See also enableImmediateExecution() on the Builder.)
Versions / upgrading
See releases for release-notes.
Upgrading to 8.x
- Custom Schedules must implement a method boolean isDeterministic() to indicate whether they will always produce the same instants or not.
Upgrading to 4.x
- Add column consecutive_failures to the database schema. See table definitions for postgresql, oracle or mysql. null is handled as 0, so there is no need to update existing records.
Upgrading to 3.x
- No schema changes
- Task creation is preferably done through the builders in the Tasks class
Upgrading to 2.x
- Add column task_data to the database schema. See table definitions for postgresql, oracle or mysql.
Building the source
Prerequisites
- Java 8+
- Maven
Follow these steps:
- Clone the repository.
      git clone https://github.com/kagkarlsson/db-scheduler
      cd db-scheduler
- Build using Maven (skip tests by adding -DskipTests=true)
      mvn package
Recommended spec
Some users have experienced intermittent test failures when running on single-core VMs. Therefore, it is recommended to use a minimum of:
- 2 cores
- 2GB RAM
FAQ
Why db-scheduler when there is Quartz?
The goal of db-scheduler is to be non-invasive and simple to use, but still solve the persistence problem and the cluster-coordination problem.
It was originally targeted at applications with modest database schemas, to which adding 11 tables would feel like overkill.
Why use a RDBMS for persistence and coordination?
KISS. It’s the most common type of shared state applications have.
Is anybody using it?
Yes. It is used in production at a number of companies, and has so far run smoothly.