Compiler implementation journey, Chapter 1: overview of the compiler

Time: 2022-5-20

The compiler is close at hand and yet far away. Whenever we write code in anything other than machine language, we rely on a compiler to turn that code into a form the computer can run. Yet we know little about this heavily used program. What is a compiler? What does it do to our code, and how? If you have the same questions and want to dig into how compilers work, follow me on this journey of compiler implementation.

1. What is a compiler

In a broad sense, a compiler is a program that reads code in language A and outputs code in language B, as shown in the figure below:

                    +----------+
A language code --> | compiler | --> B language code
                    +----------+

By this definition alone, A and B could even be the same language, in which case the compiler would do nothing more than “copy and paste”. Such a compiler is obviously pointless. In practice, the input of a compiler is generally high-level language code, such as C or Python, while the output is generally low-level language code, such as assembly language or some form of bytecode. Assembly code is then translated by an assembler into machine language that the computer can execute; bytecode is executed by a virtual machine built for that bytecode. This completes the path from writing a program to running it.
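Python itself offers a convenient way to watch this path from source code to bytecode to virtual machine. The sketch below uses only the built-in `compile` function and the standard `dis` module; the expression and the variable name `x` are illustrative choices, and the exact bytecode instructions differ between Python versions.

```python
import dis

source = "x * 2 + 1"  # high-level language code (illustrative expression)

# Compile the source text into a code object that holds bytecode.
code = compile(source, "<example>", "eval")

# Collect the names of the bytecode instructions the compiler produced.
# The exact list depends on the Python version in use.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# The Python virtual machine executes the bytecode with x bound to 3.
result = eval(code, {"x": 3})
print(result)  # 7
```

The source text is never run directly here: `eval` receives the compiled code object, which mirrors the split between the compiler and the virtual machine described above.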

2. Structure of compiler

A compiler is not a single monolithic piece; internally, several components work together to carry out compilation. These components fall into two groups, the compiler front end and the compiler back end, as shown in the figure below:

                    +--------------------+                           +-------------------+
A language code --> | compiler front end | --> intermediate code --> | compiler back end | --> B language code
                    +--------------------+                           +-------------------+

Because the high-level language code we write is not in a form the compiler can conveniently work with, the front end reads, checks and reorganizes the source code into an equivalent form that the compiler prefers, namely the intermediate code. Syntax errors are generally also detected by the front end. The back end then takes the intermediate code, checks and optimizes it further, and finally generates the target code.
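To make the split concrete, here is a minimal sketch of a front end and a back end for integer arithmetic expressions. It is not the CMM compiler built in this series: the function names, the tuple-based tree used as intermediate code, and the instruction names `PUSH`, `ADD`, `SUB`, `MUL` and `DIV` are all assumptions made for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Front end, step 1: split the source text into tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Front end, step 2: check the syntax and build a tree of tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

# Hypothetical instruction names for a stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(tree):
    """Back end: walk the tree and emit stack-machine instructions."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    op, left, right = tree
    return generate(left) + generate(right) + [OPCODES[op]]

def compile_expr(text):
    """Run the front end, then the back end."""
    return generate(parse(tokenize(text)))

print(compile_expr("1 + 2 * 3"))
# ['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']
```

Syntax errors such as `"1 +"` are rejected inside `parse`, before `generate` ever runs, which matches the division of labor described above. This sketch skips the checking and optimization a real back end would perform.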

In fact, the front end and the back end can each be further subdivided into several components, which we will cover one by one later in the journey.

3. What are we going to achieve

By the end of this journey, we will have implemented a compiler for a language called CMM (C minus minus). The output of this compiler is an instruction file made up of instructions from our own instruction set, so we will also implement a virtual machine to run the instruction files the compiler produces.
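As a rough preview of what running an instruction file can look like, the sketch below implements a tiny stack-based virtual machine. The instruction set shown (`PUSH`, `LOAD`, `STORE`, `ADD`, `LT`, `JZ`, `JMP`, `HALT`) is invented for this example and is not the instruction set designed later in the series; the sample program and its variable names are likewise assumptions.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs and return the memory."""
    stack, memory, pc = [], {}, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(memory[arg])
        elif op == "STORE":
            memory[arg] = stack.pop()
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "LT":
            right, left = stack.pop(), stack.pop()
            stack.append(1 if left < right else 0)
        elif op == "JZ":
            # Jump to the target if the value on top of the stack is zero.
            if stack.pop() == 0:
                pc = arg
        elif op == "JMP":
            pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return memory

# Hypothetical instruction sequence for the loop:
#   i = 0; total = 0;
#   while (i < 5) { i = i + 1; total = total + i; }
PROGRAM = [
    ("PUSH", 0), ("STORE", "i"),          # 0-1:   i = 0
    ("PUSH", 0), ("STORE", "total"),      # 2-3:   total = 0
    ("LOAD", "i"), ("PUSH", 5),           # 4-5:   loop condition i < 5
    ("LT", None), ("JZ", 17),             # 6-7:   leave the loop when false
    ("LOAD", "i"), ("PUSH", 1),           # 8-9:   i + 1
    ("ADD", None), ("STORE", "i"),        # 10-11: i = i + 1
    ("LOAD", "total"), ("LOAD", "i"),     # 12-13: total + i
    ("ADD", None), ("STORE", "total"),    # 14-15: total = total + i
    ("JMP", 4),                           # 16:    back to the condition
    ("HALT", None),                       # 17:    stop
]

memory = run(PROGRAM)
print(memory["total"])  # 15
```

The pair of `JZ` and `JMP` is enough to express a while loop, which hints at how statements in a high-level language can be lowered to a flat list of instructions.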

CMM is a language obtained by cutting down the syntax of C. Its main features are as follows:

  • There is only one type: int
  • Supports assignment, the four arithmetic operations and comparison operations
  • Supports if and while statements
  • Supports functions
  • Supports arrays
  • Distinguishes between global scope and local scope

Next, let’s take a closer look at the front end of the compiler. See the next chapter: overview of the compiler front end.
