Elastic-Job same-city active/standby and same-city dual-active: a high-availability essential


Author: Schrodinger’s tuyere pig


When using Elastic-Job-Lite for scheduled tasks, I found that many development teams simply deploy a single instance. That may not matter for some offline, non-core business (such as reconciliation or monitoring), but for jobs that must be highly available — compensation jobs, or jobs that periodically modify core data (such as interest updates in financial scenarios) — single-point deployment is very dangerous. In fact, Elastic-Job-Lite supports high availability.

There are few advanced posts about Elastic-Job online. This article tries to explain the principles behind its scheme, combined with some hands-on experience, and extends the discussion to a same-city dual-data-center architecture.

Note: all discussion in this article is based on the open-source Elastic-Job-Lite and does not involve Elastic-Job-Cloud.

A basic Elastic-Job tutorial is recommended here:


From single-point deployment to high availability

As mentioned at the beginning of this article, many systems use the following deployment architecture:

The reason is that developers worry that a scheduled task will be triggered multiple times at once and cause business problems. In fact, this reflects a lack of understanding of the framework's most basic principles. The feature list in the official documentation:


already explains that one of its most basic features is:

Consistent job sharding: the same shard item has only one execution instance in a distributed environment.

Elastic-Job relies on ZooKeeper to elect an instance for each shard item, ensuring that only one instance executes a given shard (if a job is not sharded, i.e. its sharding total count is 1, the whole job runs on only one instance).
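To make this guarantee concrete, here is a self-contained sketch (my own simplified code, not Elastic-Job's actual implementation) of the idea behind the built-in average-allocation strategy: the shard items are partitioned across the instance list, so every shard item is owned by exactly one instance and no shard can run twice concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A simplified sketch of average allocation: split shardingTotalCount items
// into contiguous blocks, one block per instance, earlier instances
// absorbing any remainder. Each item lands on exactly one instance.
public class AverageShardingSketch {

    // Returns instance -> list of shard items it owns.
    static Map<String, List<Integer>> sharding(List<String> instances, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int instanceCount = instances.size();
        int base = shardingTotalCount / instanceCount;
        int remainder = shardingTotalCount % instanceCount;
        int item = 0;
        for (int i = 0; i < instanceCount; i++) {
            int count = base + (i < remainder ? 1 : 0); // earlier instances take the remainder
            List<Integer> items = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                items.add(item++);
            }
            result.put(instances.get(i), items);
        }
        return result;
    }

    public static void main(String[] args) {
        // two instances, four shard items: each item has exactly one owner
        System.out.println(sharding(Arrays.asList("10.0.0.1", "10.0.0.2"), 4));
        // prints {10.0.0.1=[0, 1], 10.0.0.2=[2, 3]}
    }
}
```

Because every instance computes the same deterministic assignment from the same ZooKeeper-provided instance list, they all agree on who owns which shard without further coordination.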

Therefore, the deployment architecture shown in the figure below is perfectly fine: first, each shard is executed by only one instance; second, if one instance goes down, the other instances take over and keep the service running, achieving high availability.

Dual-data-center high availability

As Internet businesses grow, the architecture gradually faces higher availability requirements, and the next step is often same-city dual-data-center deployment. To keep scheduled services highly available across both data centers, our architecture may become something like this:

This way, if all scheduled tasks in data center A become unavailable, data center B can indeed take over and provide the service. And because there is a single cluster, Elastic-Job still guarantees that only one instance of a given shard runs across the two data centers. It looks perfect.

Note: this article does not discuss how to make ZooKeeper itself highly available across two data centers. In fact, by ZooKeeper's quorum principles, merely stretching one large cluster across two data centers cannot by itself achieve dual-data-center high availability.

Priority scheduling?

The architecture above solves the availability of scheduled tasks in both data centers, but in real production, scheduled tasks usually depend on a backing data source. That data source is typically deployed active/standby (a unitized architecture is not considered here): for example, the primary data source sits in data center A and the standby in data center B, with real-time replication between them.

If the scheduled task only reads, this may be fine, as long as each instance is configured to connect to the data source in its own data center. But if it writes, there is a problem: if all tasks happen to be scheduled in data center B, their writes must cross data centers to reach data center A, which greatly increases latency, as shown below.

As the figure shows, if Elastic-Job schedules all tasks to data center B, the traffic is written across data centers all the time, which is bad for performance.

Is there a way to achieve both of the following?

  1. Both data centers are available at all times: if all services in one data center go down, the other can provide equivalent service
  2. Even so, tasks are preferentially assigned to data center A

Elastic-Job sharding strategies

Before answering that, we need to understand Elastic-Job's sharding strategies. According to the official documentation (http://elasticjob.io/docs/elastic-job-lite/02-guide/job-sharding-strategy/), Elastic-Job ships with several built-in strategies: an average-allocation algorithm; one that uses the odd/even hash value of the job name to decide ascending or descending IP order; and one that rotates the server list according to the hash value of the job name. It also supports custom strategies: implement the JobShardingStrategy interface and its sharding method.

public Map<JobInstance, List<Integer>> sharding(List<JobInstance> jobInstances, String jobName, int shardingTotalCount)

Suppose we implement a custom strategy that knows, at sharding time, which instances belong to data center A and which to data center B, and that data center A is preferred. When sharding, we first kick out the data-center-B instances, then reuse the original strategy for allocation. Doesn't that solve our nearest-access problem (staying close to the data source)?

Below is a decorator built with the decorator pattern (an abstract class; subclasses decide which instances are standby). Readers can adapt it to their own business scenarios.


@Slf4j // Lombok generates the log field used below
public abstract class JobShardingStrategyActiveStandbyDecorator implements JobShardingStrategy {

    // The inner allocation strategy defaults to the framework's original average strategy
    private JobShardingStrategy inner = new AverageAllocationJobShardingStrategy();

    /**
     * Judge whether an instance is a standby instance. Before each call to the sharding
     * method, all instances are traversed and this method is called for each one.
     * If active and standby instances both exist in the list, the standby instances
     * are removed before sharding.
     * @param jobInstance the instance to judge
     * @param jobName the job name
     * @return true if the instance is standby
     */
    protected abstract boolean isStandby(JobInstance jobInstance, String jobName);

    @Override
    public Map<JobInstance, List<Integer>> sharding(List<JobInstance> jobInstances, String jobName, int shardingTotalCount) {

        List<JobInstance> jobInstancesCandidates = new ArrayList<>(jobInstances);
        List<JobInstance> removeInstance = new ArrayList<>();

        boolean removeSelf = false;
        for (JobInstance jobInstance : jobInstances) {
            boolean isStandbyInstance = false;
            try {
                isStandbyInstance = isStandby(jobInstance, jobName);
            } catch (Exception e) {
                log.warn("isStandby throws error, consider as not standby", e);
            }

            if (isStandbyInstance) {
                if (IpUtils.getIp().equals(jobInstance.getIp())) {
                    removeSelf = true;
                }
                jobInstancesCandidates.remove(jobInstance);
                removeInstance.add(jobInstance);
            }
        }

        // If nothing is left after removal, do not remove at all: fall back to the original list (all standby)
        if (jobInstancesCandidates.isEmpty()) {
            jobInstancesCandidates = jobInstances;
            log.info("[{}] ATTENTION!! Only backup job instances exist, but do sharding with them anyway {}", jobName, JSON.toJSONString(jobInstancesCandidates));
        }

        if (!jobInstancesCandidates.equals(jobInstances)) {
            log.info("[{}] remove backup before really do sharding, removeSelf :{} , remove instances: {}", jobName, removeSelf, JSON.toJSONString(removeInstance));
            log.info("[{}] after remove backups :{}", jobName, JSON.toJSONString(jobInstancesCandidates));
        } else { // either all instances are active or all are standby
            log.info("[{}] job instances just remain the same {}", jobName, JSON.toJSONString(jobInstancesCandidates));
        }

        // To be safe, sort so that every instance computes from the same list order
        jobInstancesCandidates.sort((o1, o2) -> o1.getJobInstanceId().compareTo(o2.getJobInstanceId()));

        return inner.sharding(jobInstancesCandidates, jobName, shardingTotalCount);
    }
}


Using a custom strategy for priority scheduling across two same-city data centers

Below is a very simple nearest-access example: instances whose IPs are on a whitelist execute preferentially; all others are treated as standby. Let's see how.

1. Extend the decorator strategy and specify which instances are standby

public class ActiveStandbyESJobStrategy extends JobShardingStrategyActiveStandbyDecorator {

    @Override
    protected boolean isStandby(JobInstance jobInstance, String jobName) {
        String activeIps = ",";// (IPs elided in the original) only instances with these IPs execute preferentially; all others are standby
        String[] ss = activeIps.split(",");
        return !Arrays.asList(ss).contains(jobInstance.getIp());// anything not on the active list is standby
    }
}


Very simple! This achieves an effect similar to the figure below.

2. Specify this strategy before the job starts

The following shows how to do it with the Java API:

JobCoreConfiguration simpleCoreConfig = JobCoreConfiguration.newBuilder(jobClass.getName(), cron, shardingTotalCount)
        .shardingItemParameters(shardingItemParameters).build();
SimpleJobConfiguration simpleJobConfiguration = new SimpleJobConfiguration(simpleCoreConfig, jobClass.getCanonicalName());
return LiteJobConfiguration.newBuilder(simpleJobConfiguration)
        .jobShardingStrategyClass("com.xxx.yyy.job.ActiveStandbyESJobStrategy")// use the active/standby allocation strategy (fill in your own implementation class name)
        .build();

That’s it.

Same-city dual-active mode

After the above transformation, two problems are solved for scheduled tasks:

1. Scheduled tasks are highly available across the two data centers

2. Tasks can be preferentially scheduled to a designated data center

In this mode, as far as scheduled tasks are concerned, data center B is really just a standby, because data center A always gets scheduling priority.

In practice, we may not even know whether data center B has latent problems (for example, database permissions that were never granted). Since no traffic ever exercises it, there is no guarantee that data center B can safely take over when a real disaster-recovery event occurs.

Can we go a step further and achieve same-city dual-active, that is, have data center B also carry part of the traffic, say 10%?

Return to the sharding interface of the custom policy:

public Map<JobInstance, List<Integer>> sharding(List<JobInstance> jobInstances, String jobName, int shardingTotalCount)

At allocation time, the strategy sees the full picture of a job: the list of all instances, the current job name, and the total shard count.

Based on this, a few things can be done to route some traffic to data-center-B instances, for example:

  1. Give each task a home data center, and make some tasks (for example, read-only ones, about 10% of all tasks) schedule preferentially to data center B
  2. When allocating shard items, preferentially assign the tail of the range (for example, the last 1/10) to data center B.

Either scheme lets both data centers A and B carry traffic (i.e. have tasks being scheduled), achieving so-called dual-active.
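Scheme 2 can be sketched in isolation (a hypothetical helper; the names are mine, not an Elastic-Job API): given the total shard count and a ratio, mark the tail of the shard-item range as preferentially owned by data center B.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of scheme 2: the last floor(ratio * total) shard items
// are preferentially scheduled in data center B, the rest in data center A.
public class TailShardSplitSketch {

    // Returns shard item -> preferred data center label ("A" or "B").
    static Map<Integer, String> splitTail(int shardingTotalCount, double ratioToB) {
        Map<Integer, String> owner = new LinkedHashMap<>();
        int itemsToB = (int) Math.floor(shardingTotalCount * ratioToB);
        for (int item = 0; item < shardingTotalCount; item++) {
            // the tail of the range belongs to data center B
            owner.put(item, item >= shardingTotalCount - itemsToB ? "B" : "A");
        }
        return owner;
    }

    public static void main(String[] args) {
        // with 10 shard items and a 10% ratio, items 0..8 map to "A" and item 9 to "B"
        System.out.println(splitTail(10, 0.1));
    }
}
```

A real strategy would then feed the "A" items to data-center-A instances and the "B" items to data-center-B instances inside its sharding method, keeping the fallback behavior when one side is empty.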

Below are sketch code and an architecture diagram for scheme 1 above.

Suppose we have two scheduled jobs, TASK_A_FIRST and TASK_B_FIRST, where TASK_B_FIRST is read-only, so we can point it at the standby database in data center B and run it preferentially there, while TASK_A_FIRST is a busier job with write operations, which we run preferentially in data center A. That way both data centers carry traffic.

Note: either data center can still fail over, with its tasks scheduled in the other data center; what is added here is targeted priority scheduling per job to achieve dual-active.

public class ActiveStandbyESJobStrategy extends JobShardingStrategyActiveStandbyDecorator {

    @Override
    protected boolean isStandby(JobInstance jobInstance, String jobName) {
        String activeIps = ",";// (IPs elided in the original) by default only these instances execute preferentially; all others are standby
        if ("TASK_B_FIRST".equals(jobName)) {// single out this job and schedule it to data center B first
            activeIps = ",";// (data-center-B IPs elided in the original)
        }
        String[] ss = activeIps.split(",");
        return !Arrays.asList(ss).contains(jobInstance.getIp());// anything not on the active list is standby
    }
}

