Several common ways to schedule Python crawler tasks (recommended)

Time:2021-9-24

I remember that Windows Task Scheduler used to work fine. I tried it again today and it no longer does: the scheduled task always ends up suspended. So here are several ways to run a Python crawler on a timer.

1. Method 1: while True

The simplest approach is to keep the program alive in a while True loop and sleep until the next run. Without further ado, here is the code:

import time
from datetime import datetime

def One_Plan():
    # Length of one cycle: 24 hours, in seconds
    Second_update_time = 24 * 60 * 60
    # Current time
    now_Time = datetime.now()
    # Task start time: 09:00 today
    plan_Time = now_Time.replace(hour=9, minute=0, second=0, microsecond=0)
    # The difference is a timedelta; once 09:00 has passed it is negative,
    # for example "-1 day, 21:48:53.246576".
    # total_seconds() converts it to a float number of seconds, and the modulo
    # wraps a negative value around to the time remaining until 09:00 tomorrow.
    delta = plan_Time - now_Time
    first_plan_Time = delta.total_seconds() % Second_update_time
    print("Sleeping %d seconds until the next execution" % first_plan_Time)
    return first_plan_Time

# while True block: keeps the program alive, sleeps until the scheduled time,
# then calls the task function
while True:
    s1 = One_Plan()
    time.sleep(s1)
    # exe_file and D_list are your own function and data and are not defined here.
    # To test this code, replace the call with a hello-world function or comment it out.
    exe_file(D_list)
    print("Performing update")

Personally, I think this method is fine for starting a scheduled job if it is a single program that runs once a day. Once you need several tasks a day, or the same task many times a day, its shortcomings show up immediately.

At work there are many more factors to consider. For example, the crawler may need to run four times a day, at 12 p.m., 6 a.m., 9 a.m. and 3 p.m., with four crawlers running at the same time. You also have to consider whether the network is stable, what to do if the program hangs, and so on.
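The several-runs-a-day requirement can still be handled with a plain sleep if the wait is computed against a list of daily run times instead of a single one. The following is only a sketch under that assumption: the helper name seconds_until_next and the RUN_TIMES list are hypothetical and are not part of the script above.

```python
from datetime import datetime, time as dtime, timedelta

# Hypothetical daily run times (06:00, 09:00, 12:00, 15:00); adjust to your own schedule.
RUN_TIMES = [dtime(6, 0), dtime(9, 0), dtime(12, 0), dtime(15, 0)]

def seconds_until_next(now, run_times=RUN_TIMES):
    """Return the number of seconds from `now` to the nearest upcoming run time."""
    candidates = []
    for t in run_times:
        run_at = datetime.combine(now.date(), t)
        if run_at <= now:
            # This slot has already passed today, so use tomorrow's occurrence.
            run_at += timedelta(days=1)
        candidates.append(run_at)
    return (min(candidates) - now).total_seconds()
```

Inside the while True loop, time.sleep(seconds_until_next(datetime.now())) would then replace the call to One_Plan(). Wrapping the crawler call in try/except is one way to keep a single failed run from killing the loop.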

2. Method 2: Timer (threading module)

The previous method is the simplest timed startup, simple and crude. Life is short and Python is elegant, so is there something just as simple that takes only a few lines of code? There is! First, a quick example of why the other factors mentioned at the end of the previous method matter:

Suppose you need to start a Selenium crawler that uses the Firefox driver plus multithreading. The system monitor shows CPU utilization at 20%. Once Selenium starts, it keeps opening browsers across multiple threads, and five minutes later CPU utilization is pushed to 90% and the computer slows to a crawl. The timing program is still running, but it is effectively on standby. Staring at the frozen machine, your first reaction is: this junk computer can't run anything, and I wrote all that code for nothing!

Right? Next, the code. For details on each function, please refer to the relevant documentation:

from datetime import datetime
from threading import Timer
import time

# Timed task
def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def timedTask():
    '''
    Timer arguments:
    First parameter: how long to delay the execution of the task (seconds)
    Second parameter: the function to execute
    Third parameter: the arguments passed to that function (tuple)
    '''
    Timer(5, task, ()).start()

while True:
    timedTask()
    time.sleep(5)

Just a handful of lines. Isn't that elegant? The main attraction is that there is very little code and it takes little effort. The output looks like this:

2020-06-05 14:06:39 
2020-06-05 14:06:44 
2020-06-05 14:06:49 
2020-06-05 14:06:54 
2020-06-05 14:06:59 
2020-06-05 14:07:04 
2020-06-05 14:07:09 
2020-06-05 14:07:14 
2020-06-05 14:07:19 
2020-06-05 14:07:24
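The loop above creates a new Timer every five seconds and paces itself with time.sleep. An alternative is to let the task re-arm its own Timer, which leaves the main thread free for other work. A minimal sketch, assuming a hypothetical helper named run_every (it does not come from the code above):

```python
import threading

def run_every(interval, func, stop_event):
    """Call func() every `interval` seconds on Timer threads until stop_event is set."""
    def wrapper():
        if stop_event.is_set():
            return
        func()
        # A Timer object fires only once, so arm a fresh one for the next run.
        timer = threading.Timer(interval, wrapper)
        timer.daemon = True  # do not keep the interpreter alive just for the timer
        timer.start()

    first = threading.Timer(interval, wrapper)
    first.daemon = True
    first.start()
```

Calling run_every(5, task, threading.Event()) would start the cycle, and setting the event stops it. Because each Timer is armed only after func() returns, the interval is measured from the end of one run to the start of the next, so a slow task shifts later runs.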

3. Method 3: sched module

This time, go straight to a module built for the job: the sched module.

The code is as follows:

from datetime import datetime
import sched
import time

def timedTask():
    # Initialize the scheduler class of the sched module with its two arguments:
    # a time function (time.time) and a delay function (time.sleep)
    scheduler = sched.scheduler(time.time, time.sleep)
    # Add a scheduled task: enter(delay in seconds, priority, function to execute)
    scheduler.enter(5, 1, task)
    # Run the queued tasks; this blocks until the queue is empty
    scheduler.run()

# Timed task
def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

if __name__ == '__main__':
    timedTask()
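To make a sched task recur, the task can put itself back on the scheduler each time it runs. A minimal sketch of that idea, where repeat, run_repeated and print_time are hypothetical helper names introduced only for illustration:

```python
import sched
import time
from datetime import datetime

scheduler = sched.scheduler(time.time, time.sleep)

def repeat(interval, remaining, action):
    """Run action(), then re-enter this function until `remaining` runs are used up."""
    action()
    if remaining > 1:
        scheduler.enter(interval, 1, repeat, argument=(interval, remaining - 1, action))

def print_time():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def run_repeated(interval, count, action):
    """Schedule `count` runs of action, `interval` seconds apart, and block until done."""
    scheduler.enter(interval, 1, repeat, argument=(interval, count, action))
    scheduler.run()
```

For example, run_repeated(5, 3, print_time) prints three timestamps roughly five seconds apart and then returns. Dropping the remaining counter would make the task repeat indefinitely.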

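This module is also easy to use. Note that scheduler.run() returns once the queued task has executed, so this program runs the task once and then exits. To repeat it, you can add a while True loop under the __main__ block or add further scheduling tasks inside timedTask. There is also another way to write it, using the third-party schedule library (a separate package, not the standard-library sched). The code is as follows: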

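import schedule
import time

def hellow():
    print('hellow')

def Timer():
    # Register the task at two fixed times each day
    schedule.every().day.at("09:00").do(hellow)
    schedule.every().day.at("18:00").do(hellow)
    while True:
        # Run any task whose scheduled time has arrived
        schedule.run_pending()
        # Interval between checks, in seconds; 1 is only an example, use the cycle you need
        time.sleep(1)

Timer()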


As you can see, the schedule library lets you work in days, hours and minutes, which is very convenient for timed tasks. Set the sleep interval in the while True loop, and register one line per run time for the function you want to execute.
