Linux text-processing three swordsmen, awk learning notes 09: ARGC, ARGV, and related variables

Time:2021-10-24

Brief introduction

ARGC and ARGV are predefined variables of awk.

ARGC (argument count) stores the number of command-line arguments passed to awk. ARGV (argument values) is an array that stores each of those arguments. Although it is an associative array, its subscripts are integers starting from 0 (internally they are, of course, treated as strings).

# awk -va=1 -F: 'BEGIN{print ARGC;for(i in ARGV){print i,ARGV[i]}}' b=3 a.txt b.txt
4
0 awk
1 b=3
2 a.txt
3 b.txt

This example shows that ARGV contains the awk command itself (the first element, at subscript 0), followed by the variable-assignment arguments (var=val) and the file arguments.

ARGV[0] ==> "awk"
ARGV[1] ==> "b=3"
ARGV[2] ==> "a.txt"
ARGV[3] ==> "b.txt"

Therefore ARGC is 4; in other words, ARGC is also the length of the array, length(ARGV). ARGV does not contain the options or the awk program text itself:

-va=1
-F:
'BEGIN{print ARGC;for(i in ARGV){print i,ARGV[i]}}'
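Because for(i in ARGV) visits the subscripts in an unspecified order, a counting loop is a safer way to list the arguments. A minimal sketch, reusing the arguments from the example above; ARGV[0] is left out because its exact value may depend on the awk implementation and on how it was invoked:

```shell
# Print ARGC, then every argument from subscript 1 upward in a fixed order.
# The files need not exist: a BEGIN-only program never opens its file arguments.
out=$(awk -v a=1 -F: 'BEGIN{print ARGC; for(i=1;i<ARGC;i++) print i, ARGV[i]}' b=3 a.txt b.txt)
echo "$out"
```

The options (-v a=1, -F:) and the program text do not appear in the listing, only b=3, a.txt and b.txt.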

The number of elements in ARGV equals ARGC only when awk starts running. Both predefined variables can be modified while the program runs.

The contents of ARGV and ARGC determine how awk treats its arguments during this run. The pseudo code looks like the following:

for(i=1;i
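The pseudo code above is cut off in this copy. As a rough sketch of the behaviour described here, and not the author's original listing, the loop below walks subscripts 1 to ARGC-1 and labels each argument the way awk is generally documented to treat it. The labels and the variable-assignment pattern are my own assumptions for illustration:

```shell
# Classify each argument: missing or empty slots are skipped, var=val is an
# assignment, anything else is taken as a file name. BEGIN-only, so nothing is opened.
out=$(awk 'BEGIN{
    for (i = 1; i < ARGC; i++) {
        if (!(i in ARGV) || ARGV[i] == "")
            print i, "skipped"
        else if (ARGV[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=/)
            print i, "assignment", ARGV[i]
        else
            print i, "file", ARGV[i]
    }
}' b=3 "" a.txt)
echo "$out"
```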

Decreasing ARGC alone causes arguments that should have been read to be dropped; that is, the files at the end of the list may not be read.

awk '{print}' a.txt b.txt
awk 'BEGIN{ARGC--}{print}' a.txt b.txt
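To see the difference between the two commands, here is a small sketch that runs them against two scratch files. The file contents are hypothetical and created only for this demonstration:

```shell
# Scratch files in a temporary directory.
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"

# Both files are read.
both=$(cd "$dir" && awk '{print}' a.txt b.txt)
# ARGC drops from 3 to 2, so only ARGV[1] (a.txt) is considered.
first_only=$(cd "$dir" && awk 'BEGIN{ARGC--}{print}' a.txt b.txt)

echo "$both"
echo "---"
echo "$first_only"
rm -rf "$dir"
```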

Increasing ARGC alone has no effect on the result.

awk 'BEGIN{ARGC++}{print}' a.txt b.txt

If we add or delete elements of ARGV, we must change ARGC as well to get the result we expect.

awk 'BEGIN{ARGV[ARGC]="c.txt"}{print}' a.txt b.txt
awk 'BEGIN{ARGV[ARGC++]="c.txt"}{print}' a.txt b.txt
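The following sketch runs the three variants side by side (ARGC++ alone, a new ARGV element alone, and both together) against hypothetical scratch files:

```shell
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"
printf 'from c\n' > "$dir/c.txt"

# ARGC grows, but ARGV[3] does not exist, so nothing extra is read.
only_inc=$(cd "$dir" && awk 'BEGIN{ARGC++}{print}' a.txt b.txt)
# ARGV gains an element, but ARGC still says 3, so c.txt is ignored.
without_inc=$(cd "$dir" && awk 'BEGIN{ARGV[ARGC]="c.txt"}{print}' a.txt b.txt)
# ARGV[3] is set and ARGC becomes 4, so c.txt is read as well.
with_inc=$(cd "$dir" && awk 'BEGIN{ARGV[ARGC++]="c.txt"}{print}' a.txt b.txt)

echo "$with_inc"
rm -rf "$dir"
```

Only the last variant reads c.txt.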

Here, with three files, ARGC would originally be 4. We set it to 3 by hand, so only a.txt and b.txt are read.

awk 'BEGIN{ARGC=3}{print}' a.txt b.txt c.txt

If we delete ARGV[1] (i.e. a.txt) or set it to the empty string, awk still considers only two arguments (excluding the awk command itself). The first is now empty and is skipped; the second is b.txt. The later arguments do not move forward to fill the gap, so c.txt is never read. In other words, an empty-string argument still counts as an argument and occupies its position.

awk 'BEGIN{ARGC=3;delete ARGV[1]}{print}' a.txt b.txt c.txt
awk 'BEGIN{ARGC=3;ARGV[1]=""}{print}' a.txt b.txt c.txt
# awk 'BEGIN{print ARGC}' a.txt b.txt c.txt ""
5
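As a quick check of the deleted-slot and empty-slot cases, this sketch runs both commands against hypothetical scratch files; in each case only b.txt should be printed:

```shell
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"
printf 'from c\n' > "$dir/c.txt"

# ARGV[1] is removed; ARGV[2] stays at subscript 2 and ARGC=3 cuts off c.txt.
deleted=$(cd "$dir" && awk 'BEGIN{ARGC=3; delete ARGV[1]}{print}' a.txt b.txt c.txt)
# Same effect when the slot is kept but emptied.
emptied=$(cd "$dir" && awk 'BEGIN{ARGC=3; ARGV[1]=""}{print}' a.txt b.txt c.txt)

echo "$deleted"
echo "$emptied"
rm -rf "$dir"
```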

Let’s also look at a few predefined variables.

FILENAME: as the name suggests, it holds the name of the file currently being processed.

awk 'FNR==1{print FILENAME}' a.txt b.txt c.txt

ARGIND (argument index, a gawk extension): the subscript in ARGV of the file currently being processed. Therefore, whenever the argument awk is processing is a file, "FILENAME==ARGV[ARGIND]" is always true.
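A small sketch of how ARGIND lines up with ARGV when a variable assignment sits in front of the files. ARGIND is only available in gawk, so the sketch checks for gawk first; the scratch files and the assignment x=1 are hypothetical:

```shell
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"

if command -v gawk >/dev/null 2>&1; then
    # x=1 occupies ARGV[1], so the files sit at ARGV[2] and ARGV[3].
    # The last column is 1 when FILENAME equals ARGV[ARGIND].
    out=$(cd "$dir" && gawk 'FNR==1{print ARGIND, FILENAME, (FILENAME==ARGV[ARGIND])}' x=1 a.txt b.txt)
else
    out="gawk not installed"
fi
echo "$out"
rm -rf "$dir"
```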

ENVIRON: an array variable that holds the environment variables awk inherits from the shell.

ENVIRON["shell_env"]    # replace shell_env with the name of an actual environment variable
# echo $SHELL
/bin/bash
# awk 'BEGIN{print ENVIRON["SHELL"]}'
/bin/bash
# echo $HOME
/root
# awk 'BEGIN{print ENVIRON["HOME"]}'
/root
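Note that ENVIRON only sees variables that are actually in awk's environment, not plain shell variables. A minimal sketch, using two hypothetical variable names (GREETING and NOTES09_LOCAL_ONLY):

```shell
# GREETING is placed in awk's environment for this one command only.
exported=$(GREETING=hello awk 'BEGIN{print ENVIRON["GREETING"]}')

# A shell variable that is not exported never reaches awk, so the lookup is empty.
NOTES09_LOCAL_ONLY=secret
unexported_len=$(awk 'BEGIN{print length(ENVIRON["NOTES09_LOCAL_ONLY"])}')

echo "$exported"
echo "$unexported_len"
```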

 

A clever use of ARGC and ARGV

When awk processes files, if a file's permissions are insufficient or the file does not exist, awk reports an error and exits; the files listed after the failing one are not processed.

Note: I run these tests as alongdidi, an ordinary user. The root user has broad privileges and can still read a file even when its permissions are 000.

$ ls -l {a,aa,aaa}.txt
ls: cannot access aaa.txt: No such file or directory
---------- 1 alongdidi alongdidi   0 Jan 26 16:44 aa.txt
-rw-rw-r-- 1 alongdidi alongdidi 566 Jan 26 16:47 a.txt
$ awk '{print}' {a,aa,aaa}.txt
ID  name    gender  age  email          phone
... ...
awk: cmd. line:1: fatal: cannot open file `aa.txt' for reading (Permission denied)
# awk exits here, so aaa.txt is never processed
$ awk '{print}' {a,aaa}.txt
ID  name    gender  age  email          phone
... ...
awk: cmd. line:1: fatal: cannot open file `aaa.txt' for reading (No such file or directory)

We can use ARGC and ARGV to remove the files that cannot be read from the list of files to process.

The idea is as follows:

  • The list of files to process must be changed before the main rules run, so the code belongs in BEGIN.
  • Use the return value of getline to test whether a file is readable. Remember to close every file that getline opened successfully.
  • An argument may be a variable assignment (var=val) or standard input (- and /dev/stdin); these must be skipped.
  • Only delete the ARGV element; do not decrease ARGC, otherwise readable files at the end of the list may not be processed.

 

$ cat ARGCandARGV.awk 
BEGIN{
    for(i=1;i
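The listing above is cut off in this copy. The sketch below follows the four points of the idea; it is my own reconstruction under those assumptions, not the author's original file. The script name filter_args.awk, the variable name line, and the trailing { print } rule are all hypothetical, and the test uses a nonexistent file only, since a file with permissions 000 stays readable for root:

```shell
dir=$(mktemp -d)
printf 'ID name\n' > "$dir/a.txt"

cat > "$dir/filter_args.awk" <<'EOF'
BEGIN {
    for (i = 1; i < ARGC; i++) {
        # Skip variable assignments (var=val) and standard input.
        if (ARGV[i] ~ /^[a-zA-Z_][a-zA-Z0-9_]*=/ || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
            continue
        if ((getline line < ARGV[i]) < 0)
            delete ARGV[i]      # cannot be opened: drop it, leave ARGC alone
        else
            close(ARGV[i])      # readable: close it so the main rule starts at line 1
    }
}
{ print }                       # main rule, added only for this demonstration
EOF

# missing.txt does not exist; stderr is captured too, so any error would show up in out.
out=$(cd "$dir" && awk -f filter_args.awk a=3 a.txt missing.txt 2>&1)
echo "$out"
rm -rf "$dir"
```

getline returns a negative value when the file cannot be opened, which is what the delete branch relies on.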

Run it as the ordinary user alongdidi to see the result.

$ awk -f ARGCandARGV.awk a=3 {a,aa,aaa}.txt - /dev/stdin
1 a=3
2 a.txt
3 
4 
5 -
6 /dev/stdin

Variable assignments and standard input are skipped and stay in ARGV. The readable file a.txt is kept, while the file without read permission (aa.txt) and the nonexistent file (aaa.txt) are removed from ARGV.

As a result, running the main code after this filtering no longer reports an error. You could also put the main code in a file given with -f; here I pass it directly with -e.

$ awk -f ARGCandARGV.awk -e '{print}' a=3 {a,aa,aaa}.txt - /dev/stdin
1 a=3
2 a.txt
3 
4 
5 -
6 /dev/stdin
ID name gender age email phone # because the code calls close() correctly, printing still starts from the first line of the file.
... ...

 
