File locks in the Linux shell

Time: 2021-06-14

In shell scripts it is often necessary to keep other process instances out, for example in the mail queue script that ships with msmtp, whose mutual exclusion is implemented incorrectly. Here are three correct ways to achieve mutual exclusion through files.

1. flock, from util-linux

This command has two usage modes:

flock LOCKFILE COMMAND
( flock -s 200; COMMAND; ) 200>lockfile

flock needs to keep the lock file open, which is inconvenient in the second mode, and the file descriptor passed to -s may conflict with one the script already uses. The advantage is that no explicit unlock is needed; the lock is guaranteed to be released when the process exits.
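A minimal sketch of both modes (the lock path and commands here are placeholders, not from the original):

#!/bin/sh
# Mode 1: flock opens the lock file, runs the command, then releases the lock.
flock /tmp/demo.lock -c "echo inside the lock"

# Mode 2: lock an already-open file descriptor inside a subshell;
# the lock is released automatically when fd 200 is closed.
(
 flock -s 200    # shared lock on fd 200; plain "flock 200" takes an exclusive one
 echo "inside the lock"
) 200>/tmp/demo.lock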

2. dotlockfile, from liblockfile1

Said to be the most flexible and reliable file lock implementation. The waiting time is tied to the number of retries given with -r: the retry intervals sum as 5, 10, ..., min(5*n, 60), ... seconds. The lock file does not need to be kept open. The drawback is that a trap on EXIT is needed to guarantee the lock file is deleted when the process exits.
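A minimal sketch, assuming liblockfile's dotlockfile is installed (the lock path is a placeholder):

#!/bin/sh
LOCK=/tmp/myjob.lock
# -r 5: give up after 5 retries; the intervals grow as described above
dotlockfile -l -r 5 "$LOCK" || exit 1
# the trap makes sure the lock file is removed even on abnormal exit
trap 'dotlockfile -u "$LOCK"' EXIT INT TERM
echo "inside the lock"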

3. lockfile, from procmail

Similar to dotlockfile, but it can create multiple lock files in a single invocation.
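A minimal sketch, assuming procmail's lockfile is installed; note that several lock files can be taken in one call (the paths are placeholders):

#!/bin/sh
# try at most 3 times (the default wait between tries is 8 seconds);
# either both lock files are created, or the command fails
lockfile -r 3 /tmp/a.lock /tmp/b.lock || exit 1
trap 'rm -f /tmp/a.lock /tmp/b.lock' EXIT INT TERM
echo "inside the lock"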
 
There are two simple ways to implement a file lock in the shell.

One is to use an ordinary file: when the script starts, check whether a specific file exists. If it does, wait for a while and check again, until the file no longer exists; then create the file, and delete it at the end of the script. To make sure the file is still deleted when the script exits abnormally, use trap "CMD" EXIT TERM INT. Such files are usually kept under /var/lock/, which the operating system cleans up at boot.
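A minimal sketch of this pattern (the lock path is a placeholder; note that, as discussed later in this article, the gap between the existence check and the creation is a race window):

#!/bin/sh
LOCK=/var/lock/myscript.lock
while [ -e "$LOCK" ]; do
 sleep 1            # another instance holds the lock; check again later
done
touch "$LOCK"
trap 'rm -f "$LOCK"' EXIT TERM INT
echo "inside the lock"  # real work goes here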
The other way is to use the flock command, used as shown below. Its advantage is that the waiting is done inside the flock command itself, with no extra code.
(
 flock 300
 ...cmd...
 flock -u 300
) 300>/tmp/file.lock
However, flock has a flaw: after fork(), the child process also holds the lock. If a daemon is started inside the flock region, you must make sure the daemon closes all inherited file descriptors at startup, otherwise the file stays locked because the daemon keeps it open.
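A sketch of one way around the caveat (some_daemon is a hypothetical command): close the lock descriptor for the daemon so the lock does not leak into it.

(
 flock -n 200 || exit 1
 # close fd 200 for the daemon so it cannot keep the lock file open;
 # in flock's command form, the -o option achieves the same effect
 some_daemon 200>&-
) 200>/tmp/file.lock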

An example of a Linux shell file lock
Recently I saw a lot of discussion about how to prevent a script from being executed repeatedly. This is really just the file lock concept, so I wrote a small example:
Putting this at the beginning of a script prevents duplicate execution. (I think two identical copies of one script running at the same time should be a rare situation anyway.)


#!/bin/bash
LockFile()
{
 # clean up stale locks: dangling symlinks whose /proc/<pid> target is gone
 find /dev/shm/* -maxdepth 0 -follow -type l -exec unlink {} \;
 # if the lock symlink survived the cleanup, another instance is running
 [ -e /dev/shm/${0##*/} ] && exit
 # take the lock: a symlink named after the script, pointing to our /proc/$$
 ln -s /proc/$$ /dev/shm/${0##*/}
 trap "Exit" 0 1 2 3 15 22 24
}
Exit()
{
 unlink /dev/shm/${0##*/};
 exit 0;
}
LockFile
# main program
# program ......
#Exit

The function of the /var/lock/subsys directory
Many programs need to judge whether an instance is already running. This directory is the flag a program uses for that judgment. Take xinetd: if its file exists there, it means xinetd is already running, otherwise it is not. Of course, the program itself should have its own checks to really determine whether an instance is running.
Usually the directory /var/run stores the PID of the corresponding instance. When writing scripts you will find that combining these two directories makes it very convenient to judge whether many services are running, get information about them, and so on.
In fact, judging whether an instance is running comes down to judging whether the lock file exists: the existence of the file implies the lock. But the presence of a file in this directory does not necessarily mean a live instance, because many services simply touch the lock file in their startup script and rely on the shutdown script to clear it, which by itself is not reliable (after an unexpected failure the lock file may survive). In scripts I usually also use the PID file (if there is one): read the instance's PID from the PID file, then use ps to test whether that PID exists, and thus judge whether the instance is really running. A more reliable method is to use interprocess communication, but in that case it can't be done by a script alone.
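A minimal sketch of that combined check (mydaemon is a hypothetical service name):

#!/bin/sh
svc=mydaemon
if [ -f /var/lock/subsys/$svc ] && [ -f /var/run/$svc.pid ]; then
 pid=`cat /var/run/$svc.pid`
 if ps -p "$pid" > /dev/null 2>&1; then
  echo "$svc is running, pid $pid"
 else
  echo "$svc left a stale lock file"   # lock exists but the process is gone
 fi
else
 echo "$svc is not running"
fi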
 
The flock command belongs to the util-linux-2.13-0.46.fc6 package on my system. If you don't have the command, try updating the util-linux package on your system.
Let me explain why this command is worth covering:
There was a discussion in the forum on serializing scripts, written up by Woodie, which is already quite thorough.
But flock not only combines well with shell scripts, its usage is also similar to the flock function in C / Perl / PHP and other languages. By contrast, Woodie's article takes quite a few shell tricks to understand.
The two formats are as follows:
       flock [-sxon] [-w timeout] lockfile [-c] command…

       flock [-sxun] [-w timeout] fd
Here are the parameters:
-s requests a shared lock. While a shared lock is held on an fd pointing to a file and not yet released, other processes' attempts to place an exclusive lock on an fd pointing to that file fail, while their attempts to place a shared lock succeed.
-x requests an exclusive lock. While an exclusive lock is held on an fd pointing to a file and not yet released, other processes can place neither a shared nor an exclusive lock on fds pointing to that file. This is the default whenever -s is not given.
-u unlocks manually. It is normally unnecessary, since the lock is released automatically when the fd is closed; it is useful when some of the script's commands must run asynchronously and others synchronously.
-n selects non-blocking mode: when the attempt to take the lock fails, return 1 immediately instead of blocking, and continue with the following statements.
-w sets a blocking timeout: when the given number of seconds is exceeded, stop blocking, return 1, and continue with the following statements.
-o can only be used in the first form; it closes the fd holding the lock before the command is executed, so that children of the command do not keep the lock.
-c executes the command that follows.
Let’s take a practical example


#!/bin/bash
{
flock -n 3                             # try to lock fd 3, without blocking
[ $? -eq 1 ] && { echo fail; exit; }   # lock busy: another instance is running
echo $$
sleep 10                               # stands in for the real work
} 3<>mylockfile

This example works so that while one instance of the script is executing, another process that tries to execute the script fails and exits.
The sleep line can be replaced by whatever statements you actually need to execute.
Note that I open mylockfile with <>, because the file descriptor redirection is performed before the command runs. So if the statements you execute need to read and write mylockfile, say to read the PID of the previous script instance and then write this instance's PID into it, opening the file with > would truncate the previously saved content, while opening it with < would fail when the file does not yet exist. These problems can be solved in other ways, of course; I am only pointing out the most common approach.
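For instance, a sketch of that read-then-write use of <> (writing through fd 3 overwrites from offset 0; leftover bytes from a longer previous PID are ignored here for simplicity):

(
 flock -n 3 || exit 1
 prev=`cat mylockfile`       # read the PID saved by the previous instance
 echo "previous holder: $prev"
 echo $$ 1>&3                # record our own PID through fd 3
 sleep 10
) 3<>mylockfile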

Background

Several threads on CU discussed a practical problem: how to make sure that only one instance of a script runs at any time. The old and new moderators and other netizens joined in, but fantblue's post was the most inspiring; much of the background below comes from him. Thanks to fantblue, the moderators and the other friends!
Woodie here summarizes the existing results, which fall into two kinds of ideas:
1. The simple way: use the ps command to count how many instances of the script are running. If the count is greater than or equal to 2 (don't forget to count yourself), exit the current script; if it equals 1, go ahead. This method is simple, but has problems:
First, ps has many pitfalls when counting script processes; for example, sometimes ps cannot even obtain the script file's name;
Even when ps can see the script name, piping its output creates subshells, and on most platforms you get strange results: sometimes count A, sometimes count B, leaving you at a loss;
Even with the counting solved, one problem remains, though it is less serious: if two script instances count at the same moment, each correctly sees a count of 2, so both quit, and at that point in time no script is executing at all;

2. The locking method: at the start of execution the script tries to take a "lock"; if it succeeds it continues, otherwise it exits.
The locking method also has some problems, concentrated in two areas:
First, how to avoid a race condition while taking the lock, that is, how to find an "atomic" operation so that locking completes in a single uninterruptible step. Otherwise the following can happen:
Script 1 detects that no lock is occupied;
Then script 2 also detects that no lock is occupied;
Script 1 locks and starts execution;
Then script 2 locks (wrongly) and starts execution;
See, the two scripts execute at the same time. :(
Some possible locking “atomic” operations are as follows:
1. Create a directory: when one process creates it successfully, the others will fail;
2. Create a symbolic link with ln -s: once one process has created the link, the ln -s commands of the other processes will fail;
3. Compete for the first line of a file: multiple processes append to the file at the same time, but only one process's line becomes the first line, since a file cannot have two first lines ^_^
4. Locking tools from other software packages, usually C binaries; you can also write one yourself.
With these, the locking problem itself can be considered solved; a minimal sketch of the first approach follows below.
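Here is the mkdir-based lock as a minimal sketch (the directory name is a placeholder):

#!/bin/sh
LOCKDIR=/tmp/myscript.lock.d
if mkdir "$LOCKDIR" 2>/dev/null; then
 # we won the race: mkdir either creates the directory or fails, atomically
 trap 'rmdir "$LOCKDIR"' EXIT INT TERM
 echo "inside the lock"
else
 echo "another instance is running" >&2
 exit 1
fi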
Second, how to avoid "deadlock", meaning here that the "lock" is held yet no script is executing. It usually arises when a script exits unexpectedly without the chance to release the lock it holds, for example on receiving certain system signals, or when the machine loses power unexpectedly.
For the former, trap can catch some signals and release the lock before exiting; but some signals cannot be caught.
For the latter, a boot-time script can delete leftover locks after the machine restarts, but that is somewhat troublesome.
So ideally the script itself would detect the dead lock and release it. The difficulty lies in finding an "atomic" operation that detects and deletes the dead lock in one step; otherwise the same kind of race condition appears. For example:
Process 1 detects a stale lock;
Then process 2 also detects the stale lock;
Process 1 removes it;
Process X (or process 1 itself) takes the lock and starts running;
Process 2 (wrongly) removes the lock again, although it is now valid;
At this point the lock is free, so any process can take it and start running.
And so two processes run at the same time. :(
Unfortunately, through all the discussion so far Woodie has not found a suitable "atomic" operation. :( We only found a slightly better method: use the file's inode as the identification when deleting, so a new lock created in the meantime by another process (same file name, but most likely a different inode) is unlikely to be deleted by mistake. This is nearly perfect, but a small chance of wrong deletion remains; it cannot be called 100% safe. Alas, "mountains multiply, streams double back, and there seems no way out". :(
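A sketch of that inode trick as a fragment (the paths are placeholders; as the text says, a small race window remains):

# remember the inode of the lock we have decided is stale
ino=`ls -i /tmp/lockfile | awk '{print $1}'`
# delete only a file that still has that inode; a lock recreated
# meanwhile has the same name but (most likely) a different inode
find /tmp -maxdepth 1 -name lockfile -inum "$ino" -exec rm -f {} \;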

Recently a netizen asked about this again, which prompted me to think it over once more. Developing my earlier idea from a different angle, I suddenly felt enlightened. I dare not keep it to myself, so here it is; please help debug it ^_^

The basic idea: borrow the concept of the critical region from multiprocess programming. If every process must pass through a critical region we set up, and they can only enter one at a time, doesn't that guarantee that only one script runs at any moment? And how do we build such a critical region? The method I came up with uses a pipe: multiple processes may write to the same pipe, but their lines go in one at a time, and the other end likewise reads them out line by line. In this way we "serialize" the entry into the critical region of processes that execute in parallel. It is similar to the append-to-file method fantblue posted.
We let the parallel processes each write one request line into the pipe at the same time, the content being the process ID; at the other end the requests are read out in order, but only the first receives a "token" and is allowed to run. The later requests are ignored, and the corresponding processes exit without a token. This guarantees that at any moment only one process runs (strictly speaking, enters the critical region). Speaking of "tokens", friends familiar with the history of networking may think of IBM's Token Ring architecture: at any moment only the host holding the token may send data, so there is no Ethernet-style "collision". Unfortunately, like Micro Channel, IBM's technology was good but eventually got eliminated. Yes, the token concept here is borrowed from Token Ring ^_^
When a process finishes, it sends a termination message into the pipe, i.e. returns the "token"; when the other end receives it, it picks the next process and issues the "token" to it.
You may ask: what about the deadlock problem? Don't worry. In the earlier discussion I proposed splitting the deadlock detection and handling code out and giving it to a dedicated process. What follows is the concrete practice of that idea: with one dedicated process doing the detection and removal of dead locks, there are no longer multiple concurrent processes operating on the same lock, so the material basis for a race condition simply does not exist ^_^
Can the idea be extended to allow several processes to run at the same time? Certainly: just keep a counter and stop issuing "tokens" once the count reaches the limit.
Below is an implementation of the above idea. It has only been tested briefly under CentOS 4.2 and may well contain errors; please help "remove the bugs" ^_^ And if the idea itself has problems, please point them out.
Script 1, token.sh, is responsible for token management and deadlock detection. Like the next script, it sticks to Bourne shell syntax as much as possible for portability, uses printf instead of echo, and keeps the sed usage generic. A named pipe accepts the requests, and the issued tokens are recorded in a file. If you use ksh, you could perhaps implement this with a coprocess; friends familiar with ksh can give it a try ^_^


#!/bin/sh 
#name: token.sh 
#function: serialized token distribution; at any time, only a certain number of tokens are given out 
#usage: token.sh [number] & 
#number sets how many script instances may run at the same time 
#if no number is given, default value is 1 
if [ -p /tmp/p-aquire ]; then 
 rm -f /tmp/p-aquire 
fi 
if mkfifo /tmp/p-aquire; then 
 printf "pipe file /tmp/p-aquire created\n" >>token.log 
else 
 printf "cannot create pipe file /tmp/p-aquire\n" >>token.log 
 exit 1 
fi 

loop_times_before_check=100 
if [ -n "$1" ];then 
 limit=$1 
else 
 # default concurrence is 1 
 limit=1 
fi 
number_of_running=0 
counter=0 
while :;do 
 #check for stale tokens whose owners died unexpectedly 
 if [ "$counter" -eq "$loop_times_before_check" ]; then 
  counter=0 
  for pid in `cat token_file`;do 
   # the token is stale if its owner PID no longer exists 
   if ! ps -p "$pid" > /dev/null 2>&1; then 
    #remove lock 
      printf "s/ $pid//\nwq\n"|ed -s token_file 
      number_of_running=`expr $number_of_running - 1` 
   fi 
  done 
 fi 
 counter=`expr $counter + 1` 

 # 
 if [ "$number_of_running" -ge "$limit" ];then 
  # all tokens are given out; ignore requests until an instance gives one back 
  pid=`sed -n '/stop/ {s/\([0-9]\+\) \+stop//p;q}' /tmp/p-aquire` 
  if [ -n "$pid" ]; then 
   # get a token returned 
   printf "s/ $pid//\nwq\n"|ed -s token_file 
   number_of_running=`expr $number_of_running - 1` 
   continue 
  fi 
 else 
  # there is still some token to give out. serve another request 
  read pid action < /tmp/p-aquire 
    if [ "$action" = stop ]; then 
     # one token is given back. 
     printf "s/ $pid//\nwq\n"|ed -s token_file 
     number_of_running=`expr $number_of_running - 1` 
    else 
     # it's a request, give off a token to instance identified by $pid 
     printf " $pid" >> token_file 
     number_of_running=`expr $number_of_running + 1` 
    fi 
 fi 
done

——————————————————————————————–
Revision record:
1. Fixed a bug in token.sh: the original sed command for deleting invalid tokens was replaced with ed. Thanks for pointing out the mistake!
——————————————————————————————–

Script 2: the script that is executed concurrently. Insert your own code after the line "your code goes here"; what is there now is just for testing.


#!/bin/sh 
# seconds to wait for the distributor to give off a token 
a_while=1 
if [ ! -p /tmp/p-aquire ]; then 
 printf "cannot find file /tmp/p-aquire\n" >&2 
 exit 1 
fi 
# try to acquire a token 
printf "$$\n" >> /tmp/p-aquire 
sleep $a_while 
# see if we got one 
grep -w "$$" token_file > /dev/null 
if [ $? -ne 0 ]; then 
 # bad luck. :( 
 printf "no token free now, exiting...\n" >&2 
 exit 2 
fi 
# your code goes here 
# when done, return the token so the next instance can run 
printf "$$ stop\n" >> /tmp/p-aquire

This script also performs file locking, but I still have some doubts about it, so I won't discuss it for now; I'll come back to it later.


#!/bin/sh

# filelock - A flexible file locking mechanism.
retries="10"      # default number of retries
action="lock"      # default action
nullcmd="/bin/true"   # null command for lockfile

while getopts "lur:" opt; do
 case $opt in
  l ) action="lock"   ;;
  u ) action="unlock"  ;;
  r ) retries="$OPTARG" ;;
 esac
done
shift $(($OPTIND - 1))

if [ $# -eq 0 ] ; then
 cat << EOF >&2
Usage: $0 [-l|-u] [-r retries] lockfilename
Where -l requests a lock (the default), -u requests an unlock, -r X
specifies a maximum number of retries before it fails (default = $retries).
EOF
 exit 1
fi

# Ascertain whether we have lockf or lockfile system apps

if [ -z "$(which lockfile | grep -v '^no ')" ] ; then
 echo "$0 failed: 'lockfile' utility not found in PATH." >&2
 exit 1
fi

if [ "$action" = "lock" ] ; then
 if ! lockfile -1 -r $retries "$1" 2> /dev/null; then
  echo "$0: Failed: Couldn't create lockfile in time" >&2
  exit 1
 fi
else  # action = unlock
 if [ ! -f "$1" ] ; then
  echo "$0: Warning: lockfile $1 doesn't exist to unlock" >&2
  exit 1
 fi
 rm -f "$1"
fi

exit 0
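
A hypothetical usage example, assuming the script above is saved as ./filelock and procmail's lockfile is in PATH:

#!/bin/sh
if ./filelock -l -r 5 /tmp/demo.lck; then
 echo "inside the lock"        # exclusive work goes here
 ./filelock -u /tmp/demo.lck   # release the lock for the next instance
else
 echo "could not obtain the lock" >&2
fi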