In the previous chapters, we implemented the entire compiler front end, whose goal is to transform source code into an abstract syntax tree for the back end to use. Starting with this chapter, we explore the back end. Let’s first look at which components make up the compiler back end and what each of them does.
1. Structure of the compiler back end
Unlike the compiler front end, the compiler back end is not a strict pipeline. I think it is better described as “A serves B”. The back end of the CMM compiler consists of two main components: the semantic analyzer and the code generator. Because the virtual machine is closely tied to the back end, it is shown here as well. See the following figure:
                         +-------------------+
Abstract syntax tree --> | Semantic analyzer | --> Symbol table
         |               +-------------------+          |
         |                                              |
         |                                              v
         |                                      +----------------+                                +-----------------+
         +------------------------------------->| Code generator | --> Low-level instructions --> | Virtual machine |
                                                +----------------+                                +-----------------+
2. What is a semantic analyzer?
To answer this question, we first need to ask: what is semantics? Put simply, semantics is “the meaning of a sentence”. In daily life, we often hear something like “I understand every word, but I have no idea what this is saying”. That is a semantic problem. The same happens in programming languages: some code conforms to the syntax completely and is still wrong. For example, adding two values whose types cannot be added is a semantic error. From the parser’s point of view, any code written as “a + b”, rather than “+ a b”, “a b +”, or some other malformed form, conforms to the syntax. In other words, the parser’s checks cannot cover every requirement of the programming language, and the semantic analyzer has to make up the missing part.
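The following is a minimal sketch of such a check, not the CMM implementation. The node classes `Literal` and `Add`, the type names "int" and "string", and the rule that only two "int" operands may be added are all assumptions made for illustration.

```python
from dataclasses import dataclass


class SemanticError(Exception):
    """Raised when code is syntactically valid but has no valid meaning."""


@dataclass
class Literal:
    type_name: str  # hypothetical type names such as "int" or "string"
    value: object


@dataclass
class Add:
    left: object   # any expression node
    right: object  # any expression node


def check_type(node):
    """Return the type name of an expression, or raise SemanticError."""
    if isinstance(node, Literal):
        return node.type_name
    if isinstance(node, Add):
        left_type = check_type(node.left)
        right_type = check_type(node.right)
        # Assumed rule for this sketch: only int + int is allowed.
        if (left_type, right_type) != ("int", "int"):
            raise SemanticError(f"cannot add {left_type} and {right_type}")
        return "int"
    raise SemanticError(f"unknown node: {node!r}")
```

Both `Add(Literal("int", 1), Literal("int", 2))` and `Add(Literal("int", 1), Literal("string", "x"))` have the shape “a + b”, so a parser would accept both. Only the first passes `check_type`; the second raises `SemanticError`.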
The semantic analyzer is not only an error-checking component; it is also responsible for another very important job: generating the symbol table. What is a symbol table? It is a table that records whatever information from the abstract syntax tree you want to keep, mainly variable names, function names, array sizes, and so on, for the code generator to use.
We will discuss the semantic analyzer and the symbol table in more detail in the chapters on the semantic analyzer.
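To make the idea concrete, here is a small sketch of building a symbol table. The input format (a flat list of `(kind, name, info)` tuples standing in for declarations found in the abstract syntax tree) and the function name `build_symbol_table` are assumptions for illustration; the actual CMM symbol table may be organized differently.

```python
def build_symbol_table(declarations):
    """Collect declared names into a dictionary keyed by name.

    `declarations` is a hypothetical list of (kind, name, info) tuples,
    standing in for the declarations found while walking the syntax tree.
    """
    table = {}
    for kind, name, info in declarations:
        # Declaring the same name twice is itself a semantic error.
        if name in table:
            raise ValueError(f"redefinition of {name!r}")
        table[name] = {"kind": kind, **info}
    return table


# Example declarations: a variable, an array with its size, and a function.
declarations = [
    ("variable", "count", {}),
    ("array", "buffer", {"size": 10}),
    ("function", "main", {"params": []}),
]
symbol_table = build_symbol_table(declarations)
```

After this runs, `symbol_table["buffer"]` holds the array’s kind and size, which is the sort of information a code generator needs when it reserves storage.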
3. What is a code generator?
The code generator is, as the name suggests, the component that generates code. It holds the abstract syntax tree produced by the front end in one hand and the symbol table produced by the semantic analyzer in the other, carries the syntax definition and the instruction set definition in its backpack, and finally generates low-level instructions that the virtual machine can execute. The code generator is the most complex part of the whole compiler and has the most to take into account. The chapters on the code generator will be a long journey of their own.
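As a rough sketch of this idea, the function below turns a tiny expression tree into a list of instructions, using a symbol table to find where each variable lives. The tuple-based tree, the instruction names `PUSH_CONST`, `LOAD`, and `ADD`, and the mapping of names to memory slots are all hypothetical; they are not the CMM instruction set.

```python
def generate(node, symbol_table):
    """Emit instructions for an expression tree in post-order.

    `node` is a hypothetical tuple: ("num", value), ("var", name),
    or ("add", left, right). `symbol_table` maps variable names to
    memory slot numbers (an assumption made for this sketch).
    """
    kind = node[0]
    if kind == "num":
        return [("PUSH_CONST", node[1])]
    if kind == "var":
        # The symbol table tells the generator where the variable is stored.
        return [("LOAD", symbol_table[node[1]])]
    if kind == "add":
        # Operands first, then the operator.
        left_code = generate(node[1], symbol_table)
        right_code = generate(node[2], symbol_table)
        return left_code + right_code + [("ADD", None)]
    raise ValueError(f"unknown node kind: {kind}")


# x + 2, with x assumed to live in memory slot 0.
code = generate(("add", ("var", "x"), ("num", 2)), {"x": 0})
```

Here `code` is `[("LOAD", 0), ("PUSH_CONST", 2), ("ADD", None)]`. Even in this toy version, the generator needs both inputs: the tree gives the structure of the computation, and the symbol table gives the storage locations.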
4. What is a virtual machine?
Code generators produce different kinds of output. In some compiler implementations, the code generator emits assembly language directly. In others, including the CMM compiler, it emits low-level instructions designed by the compiler author, which need a dedicated virtual machine to execute them. A computer executes machine instructions, while a virtual machine executes a different set of instructions that resemble machine instructions, hence the name “virtual machine”. A virtual machine imitates the hardware structure of a computer, and it owns and manages its own “physical devices” such as registers and memory. In real-world language implementations, a virtual machine is a highly complex component, but for the CMM compiler we designed an extremely simplified instruction set and a very simple virtual machine.
We will discuss the instruction set and the virtual machine in more detail in the chapters on the virtual machine.
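The loop below is a minimal sketch of what “executing low-level instructions” can look like. It is not the CMM virtual machine: the stack-based design and the instruction names `PUSH_CONST`, `LOAD`, `STORE`, and `ADD` are assumptions chosen to keep the example short.

```python
def run(program, memory):
    """Execute a list of (opcode, argument) pairs and return the memory.

    The machine state in this sketch is an instruction pointer `ip`
    (playing the role of a register), an operand stack, and `memory`.
    """
    stack = []
    ip = 0
    while ip < len(program):
        op, arg = program[ip]
        if op == "PUSH_CONST":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(memory[arg])
        elif op == "STORE":
            memory[arg] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {op}")
        ip += 1  # fetch the next instruction
    return memory


# memory[1] = memory[0] + 2
program = [("LOAD", 0), ("PUSH_CONST", 2), ("ADD", None), ("STORE", 1)]
result = run(program, [5, 0])
```

With the initial memory `[5, 0]`, the program loads 5, pushes 2, adds them, and stores the sum in slot 1, so `result` is `[5, 7]`. A virtual machine repeats the same fetch, decode, and execute cycle as this loop, with a richer instruction set and more machine state.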
5. Reading suggestions
Unlike the front end, the components of the compiler back end are interrelated and work together toward a common goal. The back-end chapters therefore do not have to be read in strict order; it is more like reading them in parallel. If you cannot see why something is implemented in a certain way, or why something is needed at all, look at the chapters on the other components. Because the code generator is the core of the back end, it is covered last for the sake of logical order.
Next, let’s look at how the semantic analyzer is implemented and what the symbol table looks like. Please see the next chapter: Implementing the semantic analyzer.