Use Go to view the system calls of a process

Time:2021-9-17

system call

In day-to-day coding we mostly write user-space code. The kernel seems transparent to us and rarely gets any attention, yet our programs deal with it all the time. Reading or writing a file, for example, goes through the kernel: a user program does not talk to the disk or other hardware directly, so it cannot operate on files by itself and needs the kernel as an intermediate layer. Whenever a user program needs a service from the kernel, it has to make a system call.

When a system call is executed, the CPU switches to kernel mode and runs the corresponding system call function.

Because the kernel implements many system call functions, it assigns each one an identifier, the system call number, that indicates which kernel function to invoke. The numbers differ between architectures. (Exceptions and interrupts also switch the CPU to kernel mode; they are not covered here.)

In general, a system call executes as follows:

  • The user program calls the C library, or issues the system call directly with its own assembly instructions. The arguments and the system call number are stored in CPU registers.
  • The process enters kernel mode. The kernel uses the system call number stored in the register to identify the system call function and executes it.
  • When the system call finishes, the return value is placed in a register, from which the user program reads the result.

Early on, for example on 32-bit x86, system calls were triggered by a software interrupt. The interrupt number for system calls is 128, so the int 0x80 instruction raises the software interrupt and enters kernel mode. The kernel then reads the value stored in the register, looks up the corresponding entry in the system call table and executes it. Because triggering system calls through a software interrupt is expensive, this method gradually fell out of use and was replaced by the dedicated sysenter and syscall instructions. Compared with the software interrupt, they skip steps such as the interrupt vector table lookup and therefore perform better.

We can view the system calls of a process with the strace command. Common usage:

$ strace -p <PID>     # view the system calls of a running process
$ strace <command>    # run a command and view its system calls

For example, write a very simple program that prints its PID in a loop (it will serve as the traced program later):

#include <unistd.h>
#include <stdio.h>
int main(){
    for(;;){
        printf("pid=%d\n", getpid());
        sleep(2);
    }
    return 0;
}

$ gcc -o print print.c

Running it under strace shows its system calls:

$ strace ./print
execve("./print", ["./print"], [/* 51 vars */]) = 0
.......
getpid() = 23419
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL) = 0x55e3343e2000
brk(0x55e334403000) = 0x55e334403000
write(1, "pid=23419\n", 10pid=23419
) = 10
nanosleep({tv_sec=2, tv_nsec=0}, 0x7ffd2a6d37f0) = 0
getpid() = 23419
write(1, "pid=23419\n", 10pid=23419
) = 10
nanosleep({tv_sec=2, tv_nsec=0}, ^Cstrace: Process 23419 detached
<detached ...>

The strace command is written in C and built on the ptrace system call. Because the system call mechanism varies across systems and architectures, the strace source code is full of preprocessor code, which makes reading it hard and thankless.

Since Go ships a package that wraps system calls and issues them directly through assembly, we can use Go to implement a simple ptrace-based tool that monitors the system calls of a process. We focus on x86_64 Linux system calls here.

ptrace

To implement a ptrace-based tool, we first need to understand ptrace. Here is its prototype in the C library:

long ptrace(int request, pid_t pid, void *addr, void *data);

ptrace takes four parameters:

  • pid: the PID of the target process, that is, the process to be traced.
  • addr and data: a memory address and additional data. Their meaning depends on the request; they are typically used to pass in arguments or to read back results after the call.
  • request: selects the operation. The kernel decides which kernel function to run based on this value. The requests we will use are introduced next.

Possible values of request

  • PTRACE_ATTACH: attaches to a process and starts tracing it. Conversely, PTRACE_DETACH detaches from the process and ends the trace. After attaching, the tracee is sent a stop signal, and the tracer uses waitpid to wait for it to stop before tracing subsequent system calls.
  • PTRACE_SYSCALL: resumes the tracee with system call tracing enabled. The tracee then stops at the next entry to or exit from a system call, and the tracer is notified through waitpid, at which point it can inspect the tracee's address space and the information about the system call.
  • PTRACE_GETREGS and PTRACE_SETREGS: read and set the CPU registers. On x86_64 Linux, the system call number is stored in orig_rax and the arguments are in rdi, rsi, rdx and other registers. On return, the return value is in rax.
  • PTRACE_TRACEME: allows this process to be traced by its parent (this is the strace <command> form).
  • … There are many other requests. Interested readers can refer to section 13.3.3, on tracing system calls, of Professional Linux Kernel Architecture.

Implementation in Go

Go provides the syscall package, which issues system calls directly through assembly code. This example is based on the syscall package of Go 1.13.5.

Tracing involves two processes: the tracee, and the tracer, which prints the system calls that occur in the tracee. We implement the tracer in Go and use the C program above as the tracee.

Approach
  • Start a process to act as the tracee.
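
This article starts the tracee by hand and attaches to it by PID. As an alternative, the PTRACE_TRACEME form can be sketched from Go by setting Ptrace in syscall.SysProcAttr, which asks the child to request tracing before it executes the program. traceeCommand is a hypothetical helper, and the sketch only prepares the command without starting it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// traceeCommand (hypothetical helper) prepares, but does not start, a command
// whose child process will ask to be traced by its parent (PTRACE_TRACEME).
func traceeCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true}
	return cmd
}

func main() {
	cmd := traceeCommand("./print") // the C program built earlier
	fmt.Println("ptrace requested:", cmd.SysProcAttr.Ptrace)
	// A tracer would lock its OS thread, call cmd.Start(), and then wait
	// for the child's initial stop before issuing ptrace requests.
}
```

With this form the tracer does not need PTRACE_ATTACH, because the child is already traced when it stops for the first time.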

How the tracer works

  • First use PTRACE_ATTACH to attach to the tracee, then use the wait system call to wait for the tracee to stop. At this point the kernel has established the tracing relationship between the tracer and the tracee.
// The corresponding library functions in Go are as follows
func PtraceAttach(pid int) (err error) {...}
func Wait4(pid int, wstatus *WaitStatus, options int, rusage *Rusage) (wpid int, err error) {...}
  • Next, the tracer reads the tracee's system calls in an infinite loop.

The read loop

  • First issue PTRACE_SYSCALL to let the tracee run until it enters a system call, and use wait to block until the tracee reaches that state. At this point the tracee has not yet executed the system call; it is paused at the system call entry.
// The corresponding library functions in Go are as follows
func PtraceSyscall(pid int, signal int) (err error) {...}
func Wait4(pid int, wstatus *WaitStatus, options int, rusage *Rusage) (wpid int, err error) {...}
  • Next, use PTRACE_GETREGS to read the registers, which hold the system call number and the arguments.
func PtraceGetRegs(pid int, regsout *PtraceRegs) (err error) {...}
  • Next, issue another PTRACE_SYSCALL and wait, this time waiting for the system call to return. The tracee enters kernel mode and executes the system call; once it returns, the tracer can obtain the result.
  • Use PTRACE_GETREGS again and read the return value from the registers.
  • Start the next iteration.
  • If an error occurs, use PTRACE_DETACH to stop tracing.
Implementation
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

type syscallTask struct {
	ID   uint64
	Name string
}

// System call names indexed by system call number on x86_64
var sTask = []syscallTask{
	{0, "read"},
	{1, "write"},
	{2, "open"},
	{3, "close"},
	{4, "stat"},
	// ... too many to list, omitted
}

// syscallName looks up the name for a system call number and falls back to
// the raw number when the table has no entry for it
func syscallName(id uint64) string {
	if id < uint64(len(sTask)) {
		return sTask[id].Name
	}
	return fmt.Sprintf("syscall_%d", id)
}

func main() {
	// All ptrace requests must come from the thread that attached
	runtime.LockOSThread()
	// Register state
	var regs syscall.PtraceRegs
	// Status filled in by wait
	var wsstatus syscall.WaitStatus
	// PID of the traced process
	pid := 13070
	fmt.Println(pid)
	var err error
	// Wrapper for PTRACE_ATTACH: attach to the process and start tracing it
	err = syscall.PtraceAttach(pid)
	if err != nil {
		fmt.Println(err)
		return
	}
	syscall.Wait4(pid, &wsstatus, 0, nil)
	// On any exit path, detach from the tracee
	defer func() {
		// Wrapper for PTRACE_DETACH: disconnect from the tracee
		err = syscall.PtraceDetach(pid)
		if err != nil {
			fmt.Println("PtraceDetach err :", err)
			return
		}
		syscall.Wait4(pid, &wsstatus, 0, nil)
	}()
	// Tracing loop
	for {
		fmt.Println("")
		// Let the tracee run until it enters a system call
		syscall.PtraceSyscall(pid, 0)
		// Wait for the tracee to stop, passing a pointer for the wait status
		_, err = syscall.Wait4(pid, &wsstatus, 0, nil)
		if err != nil {
			fmt.Println("Wait4 (syscall entry) err :", err)
			return
		}
		// If the tracee has exited, print its exit code
		if wsstatus.Exited() {
			fmt.Println("------exit status", wsstatus.ExitStatus())
			return
		}
		// Check whether the tracee was stopped by an interrupt signal, such as Ctrl+C
		// If so, forward the signal to the tracee
		if wsstatus.StopSignal() == syscall.SIGINT {
			syscall.PtraceSyscall(pid, int(wsstatus.StopSignal()))
			fmt.Println("send interrupt sig to pid ")
			// The tracee has not been reaped at this point, so this prints -1
			fmt.Println("------exit status", wsstatus.ExitStatus())
			return
		}
		// Wrapper for PTRACE_GETREGS: read the registers into regs
		err = syscall.PtraceGetRegs(pid, &regs)
		if err != nil {
			fmt.Println("PtraceGetRegs err :", err.Error())
			return
		}
		// Print the system call name
		fmt.Println("in syscall :", syscallName(regs.Orig_rax))
		// Second PTRACE_SYSCALL and wait: let the system call run and
		// stop again when it returns, so the return value can be read
		syscall.PtraceSyscall(pid, 0)
		_, err = syscall.Wait4(pid, &wsstatus, 0, nil)
		if err != nil {
			fmt.Println("Wait4 (syscall exit) err :", err)
			return
		}
		// If the tracee has exited, print its exit code
		if wsstatus.Exited() {
			fmt.Println("------exit status", wsstatus.ExitStatus())
			return
		}
		// As above, check whether the tracee was interrupted by a signal
		if wsstatus.StopSignal() == syscall.SIGINT {
			syscall.PtraceSyscall(pid, int(wsstatus.StopSignal()))
			fmt.Println("send interrupt sig to pid ")
			fmt.Println("------exit status", wsstatus.ExitStatus())
			return
		}
		// Read the registers again after the system call has returned
		err = syscall.PtraceGetRegs(pid, &regs)
		if err != nil {
			fmt.Println("PtraceGetRegs err :", err.Error())
			return
		}
		// Print the return value stored in rax
		fmt.Println("syscall return:", regs.Rax)
	}
}

Test the demo with the print program above:

$ ./print
$ go build -o gostrace main.go
$ sudo ./gostrace

Output results:

$ sudo ./gostrace
20533
in syscall : restart_syscall
syscall return: 0
in syscall : getpid
syscall return: 20533
in syscall : write
syscall return: 10
in syscall : nanosleep
syscall return: 0
in syscall : getpid
syscall return: 20533  # Ctrl+C interrupts the print process here
send interrupt sig to pid 
------exit status -1
PtraceDetach err : no such process

Compare with the system calls obtained through strace:

$ sudo strace -p 27579
strace: Process 27579 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
getpid()                                = 27579
write(1, "pid=27579\n", 10)             = 10
nanosleep({tv_sec=2, tv_nsec=0}, 0x7ffeda284d00) = 0
getpid()                                = 27579
write(1, "pid=27579\n", 10)             = 10
nanosleep({tv_sec=2, tv_nsec=0}, {tv_sec=1, tv_nsec=173442353}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ killed by SIGINT +++

As you can see, the simplest version of strace works as expected. Compared with strace, the only thing missing is the system call arguments. Printing them requires reading the argument registers according to each specific system call. Interested readers can try implementing it themselves.

References

  • Professional Linux Kernel Architecture, Chapter 13: System Calls
  • Understanding the Linux Kernel, Chapter 10: System Calls