Tour UNIX – APUE course notes [2]

Time:2021-10-27

Preface

In the last article, we implemented the simplest possible shell, which could only run very basic commands. What should we do if we want to run any command, such as ls?

First, we need to parse the arguments: once the input line has been split into arguments, we can call an exec function to execute the command.

Generally speaking,

int main(int argc, char **argv)

This is the most common way to receive command-line arguments. So the question is: how do we build argv from a string? There are robustness issues to consider, such as stripping extra spaces and picking out the command name. Let's start by reading and parsing the input command.

Parse input command

Here, the strtok function makes it easy to split a char[] string into tokens.
I found many ingenious approaches in answers on Stack Overflow.
I think the following method is the most concise and the easiest to understand.

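#include <stdio.h>
#include <string.h>

enum { kMaxArgs = 64 };
int argc = 0;
char *argv[kMaxArgs];

// Parse the command line into (argc, **argv); returns the argument count
int parse_para(char commandLine[]) {
    char *p2;
    p2 = strtok(commandLine, " ");
    while (p2 && argc < kMaxArgs-1)
    {
        printf("%s\n",p2);      // debug output: show each token
        argv[argc++] = p2;
        p2 = strtok(NULL, " ");
    }
    argv[argc] = NULL;          // exec expects a NULL-terminated array
    return argc;
}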
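
For the robustness issues mentioned earlier (runs of spaces, tabs, a trailing newline), here is a minimal sketch of a variant that keeps no global state. The names split_line and MAX_ARGS are my own assumptions for illustration and are not part of the shell below; the splitting uses strspn and strcspn from <string.h> instead of strtok.

```c
#include <string.h>

enum { MAX_ARGS = 64 };  /* hypothetical limit, mirrors kMaxArgs */

/*
 * Split `line` in place on spaces, tabs and newlines.
 * Stores at most max_args - 1 token pointers in out[] and
 * NULL-terminates the array. Returns the number of tokens.
 */
int split_line(char *line, char *out[], int max_args)
{
    const char *delims = " \t\n";
    int count = 0;
    char *p = line;

    while (count < max_args - 1) {
        p += strspn(p, delims);    /* skip leading delimiters */
        if (*p == '\0')
            break;                 /* nothing left but delimiters */
        out[count++] = p;          /* a token starts here */
        p += strcspn(p, delims);   /* advance to the end of the token */
        if (*p == '\0')
            break;                 /* token ran to the end of the line */
        *p++ = '\0';               /* terminate the token and move on */
    }
    out[count] = NULL;             /* exec expects a NULL-terminated array */
    return count;
}
```

Calling split_line on a writable buffer holding "  ls   -l\t/tmp\n" should fill the array with "ls", "-l", "/tmp" and return 3. Because the function writes into the buffer, it must not be called on a string literal.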

In fact, I prefer C++:

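#include <list>
#include <sstream>
#include <string>
#include <vector>
#include <unistd.h>   // execv

int main()
{
    std::string cmd = "mycommand arg1 arg2";
    std::istringstream ss(cmd);
    std::string arg;
    std::list<std::string> ls;   // a list keeps element addresses stable as it grows
    std::vector<char*> v;
    while (ss >> arg)
    {
        ls.push_back(arg);
        v.push_back(const_cast<char*>(ls.back().c_str()));
    }
    v.push_back(nullptr);  // need terminating null pointer

    execv(v[0], &v[0]);    // execv does not search PATH, so v[0] must be a path
    return 1;              // reached only if execv fails
}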

Either way, each input string can be converted into argc and **argv (global variables here).
Next, we introduce a function: getopt.

Running man 3 getopt shows an example:

   getopt()
   The following trivial example program uses getopt() to handle  two  program  options:  -n,
   with no associated value; and -t val, which expects an associated value.

   #include <unistd.h>
   #include <stdlib.h>
   #include <stdio.h>

   int
   main(int argc, char *argv[])
   {
       int flags, opt;
       int nsecs, tfnd;

       nsecs = 0;
       tfnd = 0;
       flags = 0;
       while ((opt = getopt(argc, argv, "nt:")) != -1) {
           switch (opt) {
           case 'n':
               flags = 1;
               break;
           case 't':
               nsecs = atoi(optarg);
               tfnd = 1;
               break;
           default: /* '?' */
               fprintf(stderr, "Usage: %s [-t nsecs] [-n] name\n",
                       argv[0]);
               exit(EXIT_FAILURE);
           }
       }

       printf("flags=%d; tfnd=%d; nsecs=%d; optind=%d\n",
               flags, tfnd, nsecs, optind);

       if (optind >= argc) {
           fprintf(stderr, "Expected argument after options\n");
           exit(EXIT_FAILURE);
       }

       printf("name argument = %s\n", argv[optind]);

       /* Other code omitted */

       exit(EXIT_SUCCESS);
   }

OK ~ now we can parse the arguments. The next step is to execute the command. Here we have to introduce the UNIX exec family of functions, which Section 8.10 of APUE explains in detail.

Execute command

Section 8.3 mentioned that after creating a new child process with fork, the child often calls one of the exec functions to execute another program. When a process calls an exec function, the program it is running is completely replaced by the new program.
Because calling exec does not create a new process, the process ID does not change. exec merely replaces the text, data, heap, and stack segments of the current process with a new program from disk.
There are seven different exec functions.

#include <unistd.h>
int execl(const char *pathname, const char *arg0, ... /* (char *)0 */);
int execv(const char *pathname, char *const argv[]);
int execle(const char *pathname, const char *arg0, ... /* (char *)0, char *const envp[] */);
int execve(const char *pathname, char *const argv[], char *const envp[]);
int execlp(const char *filename, const char *arg0, ... /* (char *)0 */);
int execvp(const char *filename, char *const argv[]);
int fexecve(int fd, char *const argv[], char *const envp[]);
Return value of all seven functions: -1 on error; no return on success
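
To make the fork-then-exec pattern concrete, here is a minimal sketch using execvp, which accepts a bare filename. The function name run_command is my own, and the exit code 127 for "could not execute" follows the shell in this article; neither comes from APUE's text.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments in a child process and wait for it.
 * Returns the child's exit status, 127 if exec failed, or -1 on
 * fork/waitpid errors or abnormal termination.
 */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {             /* child */
        execvp(argv[0], argv);  /* returns only on failure */
        perror(argv[0]);
        _exit(127);             /* "could not execute" */
    }

    /* parent: the child keeps the same PID before and after exec */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing { "sh", "-c", "exit 3", NULL } should return 3, while a command that does not exist should return 127 after perror prints the reason.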

APUE explains these functions at length, focusing on three main differences:

  1. The first difference is that the first four functions take a pathname as the argument, the next two take a filename, and the last one takes a file descriptor. When a filename is given:

    • If the filename contains /, it is treated as a pathname.

    • Otherwise, the executable file is searched for in the directories specified by the PATH environment variable.
      The PATH variable contains a list of directories (called path prefixes) separated by colons: PATH=/bin:/usr/bin:/usr/local/bin:

    If execlp or execvp finds an executable file using one of the path prefixes, but the file is not a machine executable generated by the link editor, it is assumed to be a shell script, and the function tries to invoke /bin/sh on it.
    The fexecve function takes a file descriptor, which matters for security. The caller can open the file, verify it through the descriptor, and then execute exactly that file without a race condition. With a pathname, a malicious user with suitable privileges could replace the file between the check and the exec. (This is my understanding.) Specifically, this is a TOCTTOU (time-of-check-to-time-of-use) problem.

    Section 3.3
    TOCTTOU:
        Basic idea: if a program makes two file-based function calls, and the second call depends on the result of the first, the program is vulnerable.
        Because the two calls are not atomic, the file can change between them, so the result of the first call may no longer be valid.
        TOCTTOU errors in the file system namespace usually involve attempts to subvert file system permissions by tricking a privileged program into either reducing the permissions on a privileged file or modifying a privileged file to open up a security hole.
  2. The second difference concerns how the argument list is passed (l stands for list, v for vector). I won't elaborate.

  3. The last difference relates to passing the environment list to the new program.
    Usually, a process lets its environment propagate to its child processes, but sometimes a process wants to specify a particular environment for a child. For example, when initializing a newly logged-in shell, the login program usually creates a specific environment that defines only a few variables, and we can add other variables to the environment through the shell start-up files when we log in.
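
As a sketch of the third difference, the following helper runs a small script with /bin/sh in a child whose environment is exactly the list we pass to execve. The name run_with_env and the variable GREETING are made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `script` with /bin/sh in a child whose environment is exactly envp
 * (a NULL-terminated array of "NAME=value" strings).
 * Returns the script's exit status, 127 if exec failed, -1 on other errors.
 */
int run_with_env(const char *script, char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {  /* child */
        char *const args[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", args, envp);  /* envp replaces the inherited environment */
        _exit(127);                     /* reached only if execve fails */
    }

    /* parent */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With envp set to { "GREETING=hello", NULL }, the script test "$GREETING" = hello should exit with 0. With an empty list { NULL } it should exit with 1, because the child no longer sees the variable, whatever the parent's own environment contains.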

The book analyzes all this in more detail, but I won't cover much more here: our goal is the stars and the sea, and we shouldn't lose sight of the big picture over small details. I have always thought that the way to study a big book like this is to pick a direction first. For example, I want to implement Jas-shell (my own name :) and keep improving it with what I learn from this book. Along the way I can't be comprehensive and meticulous; I would rather push boldly ahead. By the time the shell works, I will have read the book roughly once. Then I will go back, chew through the details slowly, and update my work.

Just careless rambling, ha ha. A half-full bucket sloshes the loudest, so please laugh it off ~

Well, I'll post an example below, adapted from the basic shell implemented in Chapter 1. As for the imitate_ls implementation it uses, I'll talk about it in the next chapter ~

Here, /home/jasperyang/CLionProjects/Jas-shell/imitate_ls is my own implementation. Its code is not posted here. Please wait patiently for the next chapter ~ or implement it yourself.

//
// Created by jasperyang on 17-6-6.
//
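#include "apue.h"
#include <sys/wait.h>
#include <errno.h>      /* errno, used when execv fails */
#include <string.h>     /* strtok, strcmp, strlen, strerror */
#include "myerr.h"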

static void sig_int(int); /* our signal-catching function */
static int parse_para(char commandLine[]);

enum { kMaxArgs = 64 };
int argc=0; // Number of command line parameters
char *argv[kMaxArgs]; // Command line parameters

int main(void) {
    char buf[MAXLINE];  /* from apue.h */
    pid_t pid;
    int status;

    if(signal(SIGINT,sig_int)==SIG_ERR)
        err_sys("signal error");

    printf("%% ");  /* print prompt (printf requires %% to print %) */
    while(fgets(buf,MAXLINE,stdin) != NULL) {
        if(buf[strlen(buf) -1] == '\n'){
            buf[strlen(buf)-1]=0;   /* replace newline with null */
        }
        if((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0){   /* child */
            argc = 0;
            parse_para(buf);
            if (argc == 0) {    /* empty line: argv[0] is NULL, nothing to run */
                exit(0);
            }
            printf("%s\n",argv[0]);
            if(!strcmp(argv[0],"ls")) {
                if (execv("/home/jasperyang/CLionProjects/Jas-shell/imitate_ls", argv) < 0) {
                    printf("execv error: %s\n", strerror(errno));
                    exit(-1);
                }
            }
            else {
                err_ret("couldn't execute: %s", buf);
            }
            exit(127);
        }

        /* parent */
        if((pid = waitpid(pid,&status,0)) < 0)
            err_sys("waitpid error");
        printf("%% ");
    }
    exit(0);
}

//Interrupt signal
void sig_int(int signo) {
    printf("interrupt\n%% ");
}

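// Parse the command line into (argc, **argv); returns the argument count
int parse_para(char commandLine[]) {

    char *p2;
    p2 = strtok(commandLine, " ");
    while (p2 && argc < kMaxArgs-1)
    {
        printf("%s\n",p2);      // debug output: show each token
        argv[argc++] = p2;
        p2 = strtok(NULL, " ");
    }
    argv[argc] = NULL;          // exec expects a NULL-terminated array
    return argc;
}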

Take a break. See you in the next chapter