Shell Scripting Best Practices

Time: 2021-09-13

Preface

Due to work needs, I recently started cleaning up shell scripts. Although I use most of the commands regularly myself, my scripts always turn out ugly, and scripts written by others are often hard for me to read as well. After all, shell is not a full-fledged programming language; it is more of a glue tool for stitching different programs together for us to call.

As a result, many people just write whatever comes to mind, ending up with what is essentially one ultra-long main function that is hard to look at. At the same time, for historical reasons there are many versions of the shell, and many commands with the same functionality to choose between, so coding conventions are hard to unify.

Considering the above, I went through some relevant material and found that many people have already thought about these problems and written some good articles, but the advice is still a little scattered. So I have organized those articles a bit here, as the technical reference for my own scripts going forward.

Code style specification

Start with a shebang

The so-called shebang is the comment beginning with #! that appears on the first line of many scripts. It specifies the default interpreter to use when we do not name one explicitly. It generally looks like this:

#!/bin/bash

Of course, there are many kinds of interpreters. In addition to bash, we can use the following command to view the locally supported interpreters:

$ cat /etc/shells
# /etc/shells: valid login shells
/bin/sh
/bin/dash
/bin/bash
/bin/rbash
/usr/bin/screen

When we execute the script directly with ./a.sh, the interpreter specified by the shebang is used; if there is no shebang, the interpreter specified by $SHELL is used by default.

Executing scripts this way is the recommended usage.
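
If bash may be installed in different locations on different systems, a common, more portable variant looks the interpreter up through env:

#!/usr/bin/env bash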

The code has comments

Comments are common sense, of course, but they deserve special emphasis here because they matter even more in shell scripts. Many one-line shell commands are not easy to understand, and without comments, maintenance becomes a real headache.

Comments should not only explain the purpose but also record the caveats, much like a README.

Specifically, for shell scripts, comments generally cover the following parts (a sample header follows the list):

  • shebang
  • Script parameters
  • Purpose of script
  • Script considerations
  • Script creation time, author, copyright, etc.
  • Notes before each function
  • Some complex single line command comments
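
As a sketch, a header comment covering these points might look like the following (the script name, parameters, and field values are purely illustrative):

#!/usr/bin/env bash
# Name:    backup.sh (hypothetical example)
# Usage:   ./backup.sh <src_dir> <dest_dir>
# Purpose: Archive src_dir into dest_dir as a dated tarball.
# Notes:   Requires write permission on dest_dir.
# Author:  ...    Date: ...    Copyright: ...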

Parameters should be standardized

This is very important. When a script accepts parameters, it must first check that they are valid and, if not, echo an appropriate message so users understand how the parameters are meant to be used.

At the very, very least, check the number of parameters:

if [[ $# != 2 ]];then
    echo "Parameter incorrect."
    exit 1
fi
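
Going a step further, wrapping the hint in a small usage function makes the message easy to reuse (a sketch; the parameter names are illustrative):

usage(){
    echo "Usage: $0 <src_dir> <dest_dir>"
    exit 1
}

[[ $# != 2 ]] && usage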

Variables and magic numbers

Generally, we define some important environment variables at the beginning and make sure they exist:

source /etc/profile
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/apps/bin/"

This pattern has a very common use. The most typical case: when several Java versions are installed locally and a script needs a particular one, we redefine the JAVA_HOME and PATH variables at the beginning of the script. At the same time, a good piece of code usually avoids hard-coded "magic numbers". When a constant is really needed, define it as a variable at the top and refer to the variable everywhere it is used, which makes future modification easy.
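
For instance, a minimal sketch of naming such constants (the paths and values here are made up):

# define once at the top
readonly JAVA_HOME="/usr/lib/jvm/java-8-openjdk"   # hypothetical path
readonly MAX_RETRIES=3

# refer only to the variables from here on
export PATH="${JAVA_HOME}/bin:${PATH}"
for ((i = 0; i < MAX_RETRIES; i++)); do
    some_flaky_command && break                    # hypothetical command
done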

Indentation has rules

For shell scripts, indentation is a sore point. Because many constructs that need indentation (such as if and for statements) are short, many people are too lazy to indent them, and many are not in the habit of using functions either, which further weakens the role of indentation.

In fact, correct indentation is very important, especially when writing functions. Otherwise, it is easy to confuse the function body with the directly executed commands when reading.

Common indentation methods include “soft tab” and “hard tab”.

  • The so-called soft tab is indentation with n spaces (n is usually 2 or 4)
  • The so-called hard tab is the literal tab character (\t)
  • Each has its pros and cons; personally, I am used to hard tabs
  • For if and for statements, it is best not to give the keywords then and do a line of their own, which looks ugly…

Naming criteria

The so-called naming convention basically includes the following points:

  • Standardize file names, ending in .sh for easy identification
  • Variable names should be meaningful and not misspelled
  • Keep the naming style uniform; shell code is usually written in lowercase with underscores

Unify the encoding

When writing scripts, try to use UTF-8 encoding, which supports Chinese and other unusual characters. That said, even though Chinese works, try to write comments and log messages in English anyway; many machines still do not support Chinese directly and may display mojibake.

Also note: when writing a UTF-8 shell script under Windows, watch out for the BOM. By default, Windows marks a file as UTF-8 by adding the three bytes EF BB BF at its beginning, whereas under Linux UTF-8 has no BOM by default. So when writing scripts under Windows, be sure to change the encoding to UTF-8 without BOM, which editors such as Notepad++ can do. Otherwise, Linux will choke on those first three bytes and report errors about unrecognized commands.

Another common cross-platform problem is that the newline characters differ: Windows defaults to \r\n, while Unix uses \n. Two small tools solve this easily: dos2unix and unix2dos.
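
For example, assuming dos2unix is installed, fixing line endings is a one-liner, and GNU sed can strip a UTF-8 BOM by hand:

dos2unix script.sh                      # convert \r\n to \n in place
sed -i '1s/^\xEF\xBB\xBF//' script.sh   # remove a leading UTF-8 BOM (GNU sed)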

Remember to add permissions

Although this is a small point, I often forget it myself. Without execute permission, the script cannot be run directly, which is a little annoying…

Log and echo

The importance of logging goes without saying; in larger projects it is essential.

If the script is used by people directly on the command line, it is best to echo the progress in real time during execution, so users can keep track of what is happening.

Sometimes, to improve the user experience, we add special effects to the echo, such as color and blinking. For details, refer to the introduction to ANSI/VT100 control sequences.
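
As a small sketch of both ideas, here is a log helper with a timestamp plus a red error variant (the color codes are standard ANSI; the function names are just an illustration):

log(){
    echo "[$(date '+%F %T')] $*"
}
err(){
    # \033[31m switches to red, \033[0m resets the color
    echo -e "\033[31m[$(date '+%F %T')] ERROR: $*\033[0m" >&2
}

log "backup started"
err "disk is full"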

Remove passwords

Don’t hard code the password in the script, don’t hard code the password in the script, don’t hard code the password in the script.

Say the important things three times, especially when the script is hosted on a platform like GitHub…

Break long lines

When calling some programs, the arguments may be very long. In that case, to keep the command readable, use a backslash to continue onto the next line:

./configure \
    --prefix=/usr \
    --sbin-path=/usr/sbin/nginx \
    --conf-path=/etc/nginx/nginx.conf \

Note the space before each backslash.

Coding detail specification

Code efficiency

When using a command, understand how it actually works. Especially when the amount of data is large, always consider whether the command will hurt efficiency.

For example, the following two sed commands:

sed -n '1p' file
sed -n '1p;1q' file

They have the same function: both print the first line of the file. However, the first command reads the whole file, while the second quits right after reading the first line. When the file is large, this one difference causes a huge gap in efficiency.

Of course, this is just an illustration; the proper tool for this particular job is head -n1 file…

Use double quotation marks frequently

Almost all the experts recommend expanding variables inside double quotes, i.e. writing "$var".

Omitting the double quotes causes real trouble in many cases. Why? For example:

#!/bin/sh
# it is known that the current folder has a file a.sh
var="*.sh"
echo $var
echo "$var"

Its output is as follows:

a.sh
*.sh

Why is this? Because the script effectively executed the following two commands:

echo *.sh
echo "*.sh"

Unquoted, the wildcard is expanded by the shell; quoted, the string is passed literally. Whenever variables are used as parameters, keep this in mind and note the difference carefully. This is only a tiny example; in practice, far too many bugs come down to this detail…

Skillfully using the main function

We know that compiled languages such as Java and C have a single entry function, and this structure makes the code readable: we know what is executed directly and what is a function. Scripts are different. A script is interpreted, executed straight from the first line to the last, and if commands and function definitions are mixed together it is very hard to read.

Friends who use Python know that a standard Python script looks at least like this:

#!/usr/bin/env python
def func1():
    pass

def func2():
    pass

if __name__=='__main__':
    func1()
    func2()

It uses a clever trick to implement the main function we are used to, making the code more readable.

In the shell, we have a similar trick:

#!/usr/bin/env bash
func1(){
    #do sth
}

func2(){
    #do sth
}

main(){
    func1
    func2
}

main "$@"

We can use this writing method to implement a similar main function to make the script more structured.

Consider scope

In the shell, variables are global by default, as the following script shows:

#!/usr/bin/env bash
var=1
func(){
    var=2
}
func
echo $var

Its output is 2 instead of 1, which clearly goes against the habits from other languages and can easily cause problems.

Therefore, instead of using global variables directly, it is better to use local and readonly; variables can also be declared with declare. All of these are better than plain globals.
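
Rewriting the example above with local keeps the assignment inside the function (a minimal sketch):

#!/usr/bin/env bash
var=1
func(){
    local var=2        # visible only inside func
    echo "inside: $var"
}
func
echo "outside: $var"   # prints 1 again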

Function return value

When using functions, note that in the shell a function's return value can only be an integer. Presumably it was designed this way because the return value is meant to represent the function's exit status, which is normally 0 or 1. If you really need to pass back a string, there is a workaround:

func(){
    echo "2333"
}
res=$(func)
echo "This is from $res."

In this way, echo or printf output can be used to pass extra results back to the caller.

Indirect reference value

What is an indirect reference? Consider the following scenario:

VAR1="2323232"VAR2="VAR1"

We have a variable VAR1 and a variable VAR2, and the value of VAR2 is the name of VAR1. Now we want to obtain the value of VAR1 through VAR2. What should we do?

The clumsy way is this:

eval echo \$$VAR2

What does this mean? It first constructs the string echo $VAR1 ($VAR2 expands to VAR1, and the escaped dollar sign survives as a literal $), and then uses eval to force a second round of parsing, thereby obtaining the value indirectly.

This usage does work, but it is unpleasant to read and hard to grasp at a glance. We do not recommend it; in fact, we do not recommend eval in general.

The more comfortable way to write is as follows:

echo ${!VAR2}

Adding a ! before the variable name gives a simple indirect reference.

Note, however, that this method can only read values, not assign them. For assignment, you have to fall back on eval:

VAR1=VAR2
eval $VAR1=233
echo $VAR2

Skillfully using heredocs

The so-called heredocs can be regarded as a multi-line input method, introduced with <<.

Using heredocs, we can easily generate some template files:

cat >> /etc/rsyncd.conf << EOF
log file = /usr/local/logs/rsyncd.log
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
EOF

Learn to check the path

In many cases, we first obtain the path of the current script and then use it as the base for finding other paths. Usually people just call pwd to get the script's path.

But this is not rigorous: pwd returns the working directory of the current shell, not the directory the script lives in.

The correct approach is one of the following two:

script_dir=$(cd $(dirname $0) && pwd)
script_dir=$(dirname $(readlink -f $0))

That is, either cd into the script's directory before calling pwd, or resolve the script's own path directly with readlink.

Keep the code short

Brevity here refers not only to the length of the code but to the number of commands used. In principle, a problem that one command can solve should never take two. This affects both the readability of the code and its execution efficiency.

The most classic examples are as follows:

cat /etc/passwd | grep root
grep root /etc/passwd

This "useless use of cat" is the most despised pattern: it adds nothing, since what one command can do alone gets an extra pipe bolted on…

In fact, shorter code can also improve efficiency to some extent, as in the following example:

#method 1
find . -name '*.txt' |xargs sed -i s/233/666/g
find . -name '*.txt' |xargs sed -i s/235/626/g
find . -name '*.txt' |xargs sed -i s/333/616/g
find . -name '*.txt' |xargs sed -i s/233/664/g

#method 2
find . -name '*.txt' |xargs sed -i "s/233/666/g;s/235/626/g;s/333/616/g;s/233/664/g"

Both methods do the same thing: find all files with the .txt suffix and apply a series of replacements. The former runs find multiple times, while the latter runs find once with a combined sed expression. The first is more intuitive, but as the number of replacements grows, the second becomes much faster, simply because it walks the files once instead of many times. Moreover, by skillfully using xargs, we can also parallelize easily:

find . -name '*.txt' |xargs -P $(nproc) sed -i "s/233/666/g;s/235/626/g;s/333/616/g;s/233/664/g"

Specifying the parallelism through the -P parameter speeds up execution further.

Command parallelization

When execution efficiency really matters, consider running commands in parallel. The simplest parallelization in the shell is done with & and wait:

func(){
    #do sth
}

for((i=0;i<10;i++))
do
    func &
done
wait

Of course, the degree of parallelism here cannot be too high, or the machine will choke. Doing it properly is more involved and will be discussed another time; if you want to save yourself the effort, use the parallel command, or the xargs approach mentioned above (sketched below).
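
As a sketch of bounding the parallelism with xargs, an exported shell function can serve as the worker (the function and task names are illustrative):

process_one(){
    echo "processing $1"
    sleep 1
}
export -f process_one

# run at most 4 workers at a time, one task argument each
printf '%s\n' task1 task2 task3 task4 task5 task6 | \
    xargs -P 4 -I {} bash -c 'process_one "$@"' _ {}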

Full text retrieval

We know that when we want to search for a string (say, 2333) in all the txt files in a folder, we might use a command like this:

find . -name '*.txt' -type f | xargs grep 2333

In many cases this command finds the matching lines we want, but there are two small problems to watch for.

find will match the required file names, but if a file name contains spaces, passing it to grep goes wrong: the name is treated as two arguments. An extra layer of quoting ensures a name containing spaces is not split into two arguments:

find . -type f|xargs -i echo '"{}"'|xargs grep 2333
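
A more robust alternative, supported by GNU find and xargs, separates the names with NUL bytes so that spaces (and even newlines) in file names cannot be misinterpreted:

find . -type f -print0 | xargs -0 grep 2333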

Sometimes the file's character set differs from the terminal's, which makes grep treat the file as binary during the search and report "binary file matches". Either convert the character set with a tool such as iconv, or, if it does not affect the search, add the -a parameter so grep treats all files as text:

find . -type f|xargs grep -a 2333

Prefer the newer syntax

The "newer syntax" here is not about raw power; it is that we may prefer some more recently introduced forms, which is largely a matter of code style. For example:

  • Prefer func(){ } to define functions, rather than func{ }
  • Prefer [[ ]] over [ ]
  • Prefer $() to capture a command's result into a variable, rather than backquotes
  • In complex scenarios, prefer printf over echo for output

In fact, many of these newer forms are more powerful than the old ones; you will see it once you use them.
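
Side by side, the old and new forms look like this (a minimal sketch):

name="demo"
now=`date`                            # old: backquotes, hard to nest
now=$(date)                           # new: $(), nests cleanly
[ "$name" = "demo" ] && echo match    # old: single brackets, quoting essential
[[ $name == demo ]] && echo match     # new: [[ ]], no word splitting
printf '%-10s %s\n' label "$name"     # printf for formatted output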

Other tips

There are many scattered small points which do not merit a section each, so here they are briefly:

  • Use absolute paths as much as possible; they are far less error-prone. If you must use a relative path, prefix it with ./
  • Prefer bash variable substitution to awk/sed where it suffices; it is shorter
  • For simple conditionals, use && and || written on a single line, e.g. [[ $x > 2 ]] && echo $x
  • When exporting variables, prefix them with the sub-script's namespace to ensure they do not conflict
  • Use trap to catch signals and do cleanup when a termination signal is received (see the sketch after this list)
  • Generate temporary files or folders with mktemp
  • Send unwanted output to /dev/null
  • Use a command's return value to judge whether the command succeeded
  • Before using a file, check that it exists; otherwise handle the error
  • Do not post-process the output of ls (such as ls -l | awk '{print $8}'); its format is unreliable and platform-dependent
  • When reading a file, do not use a for loop; use while read
  • When copying a folder with cp -r, note that if the destination folder does not exist it will be created, but if it already exists the source is copied into a subfolder of it
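
A minimal sketch tying several of these tips together (the paths and patterns are illustrative):

#!/usr/bin/env bash
tmpfile=$(mktemp)                  # unique temporary file
trap 'rm -f "$tmpfile"' EXIT       # clean up even on interruption

# use the return value instead of guessing from output
if grep root /etc/passwd > "$tmpfile" 2>/dev/null; then
    while read -r line; do         # while read, not a for loop
        echo "matched: $line"
    done < "$tmpfile"
fi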

Static check tool shellcheck

Introduction

To ensure script quality in a systematic way, the simplest idea is probably to bring in a static checking tool, letting the tooling make up for the knowledge blind spots developers may have.

There are not many static checking tools for shell on the market. The one to look at is shellcheck. It is open source on GitHub with more than 8k stars, which looks very reliable. See its homepage for installation and usage details.
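
Basic usage is simply pointing it at one or more scripts (a sketch; the file names are illustrative):

shellcheck deploy.sh    # report problems in one script
shellcheck ./*.sh       # or check every script in the directory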

Install

The tool supports many platforms: at a minimum it is packaged for the mainstream package managers of Debian, Arch, Gentoo, EPEL, Fedora, OS X, openSUSE and others, so installation is easy. See the installation documentation for details.

Integrate

Since it is a static checking tool, it can be integrated into a CI framework. shellcheck integrates easily into Travis CI for static checking of projects whose main language is shell script.

Sample

The "gallery of bad code" in the documentation also provides a very detailed catalogue of bad code, which has great reference value. It reads very comfortably in spare moments, like a book in the spirit of "Java Puzzlers".

Essence

In fact, though, I think the most outstanding part of the project is none of the above, but its extremely thorough wiki. The wiki documents the rationale behind every check the tool performs: each detected problem has a corresponding number that can be looked up there. It tells us not only "this is bad", but also "why it is bad" and "how to write it instead", which is perfect for anyone who wants to dig deeper.
