Python scheduled-task framework: APScheduler source code analysis (1)

Time:2020-3-12

Preface

APScheduler is a well-known scheduled-task framework for Python. It runs tasks at a given time or periodically, much like crontab on Linux, but it is more powerful than crontab: tasks can be added and removed at runtime, and several backends are available for persisting tasks.

APScheduler is only weakly distributed. Each task object is stored on the current node, so distribution has to be arranged by hand, for example by coordinating through Redis.

When I first came across APScheduler, it threw a lot of concepts at me. Precisely because there were so many concepts, using crontab directly felt more comfortable. But many of my company’s projects are now built on APScheduler, so I decided to read through its source code.

Basic concepts

Here are the key concepts in APScheduler, in the simplest terms.

  • Job: the task object, i.e. the task you want to run
  • Jobstores: where tasks are stored. The default is memory; Redis, MongoDB, and others are also supported
  • Executors: the components that actually run tasks
  • Trigger: when its condition is met, it triggers the corresponding call logic
  • Scheduler: the component that ties all of the above together

APScheduler provides several schedulers for different scenarios. The one I use most at present is BackgroundScheduler, which suits programs that need the scheduling to run in the background.

The other schedulers are:

  • BlockingScheduler: suitable when the scheduler is the only thing running in the process
  • AsyncIOScheduler: suitable for applications using the asyncio framework
  • GeventScheduler: suitable for applications using the gevent framework
  • TornadoScheduler: suitable for applications using the Tornado framework
  • TwistedScheduler: suitable for applications using the Twisted framework
  • QtScheduler: suitable for Qt applications

This article only analyzes the logic related to BackgroundScheduler. We start with a brief look at the official example and use it as the entry point for a layer-by-layer analysis.

Analyzing BackgroundScheduler

The official example code is as follows.

from datetime import datetime
import time
import os
from apscheduler.schedulers.background import BackgroundScheduler

def tick():
    print('Tick! The time is: %s' % datetime.now())

if __name__ == '__main__':
    scheduler = BackgroundScheduler()
    scheduler.add_job(tick, 'interval', seconds=3)  # Add a job that runs every 3 seconds
    scheduler.start()
    print('Press Ctrl+{0} to exit'.format('Break' if os.name == 'nt' else 'C'))

    try:
        # This is here to simulate application activity (which keeps the main thread alive).
        while True:
            time.sleep(2)
    except (KeyboardInterrupt, SystemExit):
        # Shut down the scheduler
        scheduler.shutdown()

The code above is simple. It first instantiates a scheduler from the BackgroundScheduler class and then calls add_job to add the task to the job store. By default the job store lives in memory; more specifically, the job is saved in a dict. Finally, the scheduler is started with the start method. From then on, the trigger named interval fires every 3 seconds, and each time the scheduler hands the job to the default executor, which runs the logic in the tick function.

When the program finishes, the shutdown method is called to shut down the scheduler.

BackgroundScheduler is thread-based, and threads have the concept of a daemon thread. If daemon mode is enabled, the scheduler does not strictly have to be shut down, because a daemon thread does not keep the process alive.

First, look at the source code of the BackgroundScheduler class.

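# apscheduler/schedulers/background.py

from threading import Thread, Event

from apscheduler.schedulers.base import BaseScheduler
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.util import asbool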

class BackgroundScheduler(BlockingScheduler):

    _thread = None

    def _configure(self, config):
        self._daemon = asbool(config.pop('daemon', True))
        super()._configure(config)

    def start(self, *args, **kwargs):
        # Create an event object.
        # Multiple threads can wait on an event; once it is set, all waiting threads are woken up.
        self._event = Event()
        BaseScheduler.start(self, *args, **kwargs)
        self._thread = Thread(target=self._main_loop, name='APScheduler')
        # Mark the thread as a daemon thread if configured. When the Python main thread
        # finishes, the process exits without waiting for daemon threads.
        # With a non-daemon thread, the main thread waits for it to finish before exiting.
        self._thread.daemon = self._daemon  # whether the thread is a daemon thread
        self._thread.start()  # start the thread

    def shutdown(self, *args, **kwargs):
        super().shutdown(*args, **kwargs)
        self._thread.join()
        del self._thread

The comments in the code above cover the details; here is a brief walkthrough.

The _configure method mainly handles parameter setup. Here it defines self._daemon, which defaults to True.

The start method starts the scheduler, and its logic is simple. It first creates a thread event. A thread event is a thread synchronization mechanism; if you read its source code, you will find that it is built on a condition variable (threading.Condition). It provides three main methods: set(), clear(), and wait().

  • The set() method sets the event flag to true.
  • The clear() method sets the event flag to false.
  • The wait() method blocks the calling thread until the event flag is true.
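The three methods can be tried out with nothing but the standard library. The following is a minimal sketch; the worker function and all variable names are made up for illustration and do not come from APScheduler.

```python
import threading

event = threading.Event()
results = []


def worker():
    # Block until another thread calls event.set()
    event.wait()
    results.append("woken")


t = threading.Thread(target=worker)
t.start()

flag_before = event.is_set()       # nobody has called set() yet
event.set()                        # flag becomes true, the waiting worker is released
t.join()
flag_after_set = event.is_set()

event.clear()                      # flag is false again; later wait() calls block
flag_after_clear = event.is_set()

# wait() also accepts a timeout and returns the flag state when it gives up
timed_out_result = event.wait(timeout=0.01)
```

The timeout form of wait() is what makes an event useful for a scheduler loop: the loop can sleep until the next job is due and still be woken early.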

After the thread event is created, the start() method of BaseScheduler is called. This is the method that really starts the scheduler; we set it aside for now. Next, a thread is created with the Thread class. Its target function is self._main_loop, the scheduler's main loop. As long as the scheduler is not shut down, it keeps executing the logic in the main loop, which is how the various features of APScheduler are realized. It is a very important method, and it too is set aside for now. Once created, the thread is started.

After the thread is created and before it is started, its daemon attribute is set. If daemon is True, the thread is a daemon thread; otherwise it is a non-daemon thread.

In brief: if the thread is a daemon thread, the Python main thread exits as soon as its own logic has finished, regardless of the daemon thread. If it is a non-daemon thread, the main thread exits only after all non-daemon threads have finished.
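The effect of the daemon flag is easiest to see from the outside. The sketch below is my own illustration, not APScheduler code: it launches a child Python process whose only extra thread loops forever. Because that thread is a daemon thread, the child process still exits once its main thread is done.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter; loop_forever is a made-up stand-in
# for a scheduler's main loop.
SCRIPT = textwrap.dedent("""
    import threading
    import time

    def loop_forever():
        while True:
            time.sleep(0.1)

    t = threading.Thread(target=loop_forever)
    t.daemon = True   # with False, the interpreter would wait for this thread forever
    t.start()
    print("main thread done")
""")

result = subprocess.run(
    [sys.executable, "-c", SCRIPT],
    capture_output=True,
    text=True,
    timeout=30,
)
```

Setting t.daemon = False in the script would make the child process hang until it is killed, which is why a non-daemon scheduler has to be shut down explicitly.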

The shutdown method first calls the shutdown method of the parent class, then calls join() on the thread to wait for it to finish, and finally deletes the thread object with del.
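Putting start and shutdown together, the pattern can be sketched with a toy class. TinyBackgroundLoop, its _running flag, and the 0.05 second timeout are assumptions made for this illustration; the real scheduler tracks its state differently and does real work inside the loop.

```python
import threading


class TinyBackgroundLoop:
    """Toy model of the start/shutdown pattern; not APScheduler's real class."""

    _thread = None  # class-level default, like in BackgroundScheduler

    def __init__(self, daemon=True):
        self._daemon = daemon
        self._running = False

    def start(self):
        self._event = threading.Event()
        self._running = True
        self._thread = threading.Thread(target=self._main_loop, name="TinyLoop")
        self._thread.daemon = self._daemon
        self._thread.start()

    def _main_loop(self):
        while self._running:
            # Sleep until woken up or until the (hypothetical) timeout expires
            self._event.wait(0.05)
            self._event.clear()

    def shutdown(self):
        self._running = False
        self._event.set()     # wake the loop so it notices the stop flag
        self._thread.join()   # wait for the loop thread to finish
        del self._thread      # drops the instance attribute; the class default remains


loop = TinyBackgroundLoop()
loop.start()
worker = loop._thread
loop.shutdown()
worker_alive_after_shutdown = worker.is_alive()
```

Because _thread is also defined at class level, deleting the instance attribute in shutdown makes loop._thread read as None again rather than raising AttributeError.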

After reading the code of the BackgroundScheduler class, look back at the example code at the beginning. After the scheduler is instantiated from BackgroundScheduler, the add_job method is called with three arguments: the tick function to be run periodically, the trigger name interval, and the trigger parameter seconds=3.

Can the trigger name be any string? No. APScheduler uses Python's entry point technique here. If you have ever built a Python package and uploaded it to PyPI, you will remember entry points. Entry points are not only used in packaging; they can also serve as the basis of a modular plug-in architecture. There is a lot to say about them, which will be discussed later.

In short, the trigger name interval passed to add_job() corresponds to the apscheduler.triggers.interval.IntervalTrigger class, and the seconds parameter is a parameter of that class.
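The idea of resolving a short name to a class can be sketched without any packaging machinery. In the sketch below, the PLUGINS table, load_plugin, and create_trigger are hypothetical names of mine, and datetime.timedelta merely stands in for a trigger class so that the example runs with the standard library alone. Real entry points are declared in package metadata rather than in a dict.

```python
from importlib import import_module

# Hypothetical alias table in "module:attribute" form, the same notation
# entry points use. datetime:timedelta is only a stand-in target.
PLUGINS = {"interval": "datetime:timedelta"}


def load_plugin(alias):
    """Resolve an alias to the class it names, or fail for unknown aliases."""
    try:
        reference = PLUGINS[alias]
    except KeyError:
        raise LookupError('No plugin by the name "%s" was found' % alias)
    module_name, attribute = reference.split(":")
    return getattr(import_module(module_name), attribute)


def create_trigger(alias, trigger_args):
    """Look up the class behind the alias and instantiate it with the given arguments."""
    trigger_class = load_plugin(alias)
    return trigger_class(**trigger_args)


trigger = create_trigger("interval", {"seconds": 3})
```

Calling create_trigger with a name that is not in the table raises LookupError, which mirrors why an arbitrary string cannot be used as a trigger name.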

Analyzing the add_job method

The source code of the add_job method is as follows.

# apscheduler/schedulers/base.py/BaseScheduler

    def add_job(self, func, trigger=None, args=None, kwargs=None, id=None, name=None,
                misfire_grace_time=undefined, coalesce=undefined, max_instances=undefined,
                next_run_time=undefined, jobstore='default', executor='default',
                replace_existing=False, **trigger_args):
        job_kwargs = {
            'trigger': self._create_trigger(trigger, trigger_args),
            'executor': executor,
            'func': func,
            'args': tuple(args) if args is not None else (),
            'kwargs': dict(kwargs) if kwargs is not None else {},
            'id': id,
            'name': name,
            'misfire_grace_time': misfire_grace_time,
            'coalesce': coalesce,
            'max_instances': max_instances,
            'next_run_time': next_run_time
        }
        # Filter out the arguments that were left undefined
        job_kwargs = dict((key, value) for key, value in six.iteritems(job_kwargs) if
                          value is not undefined)
        # Instantiate the job object
        job = Job(self, **job_kwargs)

        # Don't really add jobs to job stores before the scheduler is up and running
        with self._jobstores_lock:
            if self.state == STATE_STOPPED:
                self._pending_jobs.append((job, jobstore, replace_existing))
                self._logger.info('Adding job tentatively -- it will be properly scheduled when '
                                  'the scheduler starts')
            else:
                self._real_add_job(job, jobstore, replace_existing)

        return job

The add_job method does not contain much code. It starts by building a job_kwargs dictionary that holds the trigger, the executor, and so on. It is simple and sensible.

  • trigger is created by the self._create_trigger() method, which takes two arguments. In the example, trigger is the string interval, and trigger_args holds the corresponding parameters.
  • executor is the alias of the executor to use, currently default. It will be discussed later.
  • func is the callback, i.e. the logic we actually want to run. The trigger prompts the scheduler, and the scheduler calls the executor to run that logic.
  • misfire_grace_time: the docstring describes it as the number of seconds after the designated run time that the job is still allowed to run. This only becomes clear after reading the documentation. For example, a job was due at 12:00:00 but for some reason was not scheduled then, and it is now 12:00:30. During scheduling, the difference between the current time and the planned run time is checked. If misfire_grace_time is 20, the missed run is not executed; if misfire_grace_time is 60, it is.
  • coalesce: applies when a job fails to run for some reason and its runs pile up. For example, if 10 runs have piled up and coalesce is True, the job is run only once, for the most recent run time. If coalesce is False, the scheduler tries to run it 10 times in a row.
  • max_instances: the maximum number of instances of the job that may run at the same time
  • next_run_time: the next run time of the job
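The interplay of misfire_grace_time and coalesce can be modelled in a few lines. The function select_run_times below is a simplified model written for this article, not APScheduler's code, and the timestamps are made up: a job due at 12:00:00 that is only looked at 30 seconds later.

```python
from datetime import datetime, timedelta


def select_run_times(due_run_times, now, misfire_grace_time, coalesce):
    """Return the run times that would still be executed (simplified model)."""
    # coalesce=True collapses a backlog into its most recent run time
    candidates = due_run_times[-1:] if coalesce else list(due_run_times)
    if misfire_grace_time is None:
        # None means: run no matter how late it is
        return candidates
    grace = timedelta(seconds=misfire_grace_time)
    return [t for t in candidates if now - t <= grace]


scheduled = datetime(2020, 3, 12, 12, 0, 0)
now = datetime(2020, 3, 12, 12, 0, 30)

# 30 seconds late: outside a 20 second grace period, inside a 60 second one
skipped = select_run_times([scheduled], now, misfire_grace_time=20, coalesce=False)
kept = select_run_times([scheduled], now, misfire_grace_time=60, coalesce=False)

# A backlog of 10 due run times, oldest first
backlog = [now - timedelta(seconds=3 * i) for i in range(9, -1, -1)]
merged = select_run_times(backlog, now, misfire_grace_time=None, coalesce=True)
replayed = select_run_times(backlog, now, misfire_grace_time=None, coalesce=False)
```

In this model, skipped is empty, kept contains the single scheduled time, merged holds only the latest run time of the backlog, and replayed holds all ten.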

Next, the arguments that were left as undefined are filtered out, and the remaining ones are passed to the Job class to instantiate the job object.
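Why a special undefined value instead of None? A plausible reading is that None can be a meaningful value, so a separate sentinel is needed to express "the caller did not pass this". The sketch below illustrates the pattern; the sentinel is a plain object() and build_job_kwargs is a hypothetical helper, neither of which is APScheduler's own definition.

```python
# Stand-in sentinel: a unique object that no caller can pass by accident
undefined = object()


def build_job_kwargs(func, name=None, coalesce=undefined, max_instances=undefined):
    """Collect the arguments and drop those the caller did not supply."""
    job_kwargs = {
        'func': func,
        'name': name,
        'coalesce': coalesce,
        'max_instances': max_instances,
    }
    # Identity comparison: only the sentinel itself is filtered out, None is kept
    return {key: value for key, value in job_kwargs.items() if value is not undefined}


job_kwargs = build_job_kwargs(print, coalesce=True)
```

Here name stays in the result with the value None, while max_instances is absent, so a default can be filled in for it later.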

The rest of the logic is relatively simple. First, the code acquires self._jobstores_lock, which is a reentrant lock. In Python, a reentrant lock is built on an ordinary mutex plus a counter (and a record of the owning thread). Each time the owning thread acquires the lock, the counter is incremented by one; each time it releases the lock, the counter is decremented by one. Only when the counter reaches 0 is the underlying mutex actually released.
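A short standard-library sketch shows the difference between a reentrant lock and an ordinary one. The two functions are named after the scheduler methods only for flavor; they are toy functions of mine.

```python
import threading

rlock = threading.RLock()


def add_job():
    with rlock:                  # first acquisition by this thread
        return real_add_job()


def real_add_job():
    with rlock:                  # same thread acquires again: allowed, counter goes up
        return "added"


result = add_job()               # completes without deadlock

plain = threading.Lock()
plain.acquire()
# A plain Lock is not reentrant: a second non-blocking acquire fails
second_try = plain.acquire(blocking=False)
plain.release()
```

With a plain Lock in place of the RLock, the nested with statement in real_add_job would block forever, because the thread would wait for a lock it already holds.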

Once the lock is held, the state of the scheduler is checked. If it is STATE_STOPPED (the stopped state), the job is appended to the _pending_jobs list. Otherwise, the _real_add_job method is called. In both cases the job object is returned at the end.

The _real_add_job method is what actually adds the job object to the specified storage backend.

Once the job object has been added to the storage backend (memory by default), the scheduler fetches it from there for execution.
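The pending-list behaviour can be modelled with a toy scheduler. TinyScheduler, its dict-based store, and the way start() flushes the pending list are a simplified sketch of mine under the assumptions stated in the comments, not the real BaseScheduler.

```python
import threading

STATE_STOPPED, STATE_RUNNING = 0, 1


class TinyScheduler:
    """Toy model: jobs added before start() wait in a pending list."""

    def __init__(self):
        self.state = STATE_STOPPED
        self._lock = threading.RLock()
        self._pending_jobs = []
        self._store = {}  # stand-in for the default in-memory job store

    def add_job(self, job_id, func):
        with self._lock:
            if self.state == STATE_STOPPED:
                # Not running yet: remember the job for later
                self._pending_jobs.append((job_id, func))
            else:
                self._real_add_job(job_id, func)

    def _real_add_job(self, job_id, func):
        self._store[job_id] = func

    def start(self):
        with self._lock:
            # Assumption of this sketch: starting moves pending jobs into the store
            for job_id, func in self._pending_jobs:
                self._real_add_job(job_id, func)
            del self._pending_jobs[:]
            self.state = STATE_RUNNING


scheduler = TinyScheduler()
scheduler.add_job("tick", print)
stored_before_start = dict(scheduler._store)

scheduler.start()
scheduler.add_job("tock", print)
```

In the example at the top of the article, add_job is called before start, so the job takes the pending route first, just like "tick" here.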

Back in the example code, after the scheduler's add_job method has run, the scheduler's start method is called.

Ending

To keep the length manageable, this article stops here. Later articles will continue the analysis of APScheduler.

If this article helped you, click “reading” to show your support. See you in the next article.
