# The secret of how the CPU executes a program is hidden in these 15 pictures

Time：2020-10-26

## Preface

You have written so much code, but do you know how a line such as `a = 1 + 2` is executed by the CPU?

You have used so much software, but do you know the difference between 32-bit and 64-bit software? Can a 32-bit operating system run on a 64-bit computer? Can a 64-bit operating system run on a 32-bit computer? If not, why not?

You have seen so many CPUs, and we all know they are usually divided into 32-bit and 64-bit CPUs. But do you know what advantages a 64-bit CPU has over a 32-bit one? Is the computing performance of a 64-bit CPU necessarily higher than that of a 32-bit CPU?

Don’t panic if you don’t know. Let’s work through these questions step by step, layer by layer.

## Main text

### How Turing machine works

To understand how a program is executed, we can start with the Turing machine. Turing’s basic idea was to use a machine to simulate the process of a person doing mathematical operations with pen and paper. It also defines what parts a computer consists of and how a program is executed.

What does a Turing machine look like? The following figure shows its actual appearance:

The basic components of a Turing machine are as follows:

• There is a “paper tape”. The tape is made up of a continuous row of cells, and a character can be written in each cell. The tape is like memory, and the characters in its cells are like the data or programs in memory;
• There is a “read-write head”. The read-write head can read the character in any cell on the tape and can also write a character into a cell;
• There are some components on the read-write head, such as the storage unit, the control unit and the operation unit:
1. The storage unit is used to store data;
2. The control unit is used to identify whether a character is data or an instruction, and to control the flow of the program;
3. The operation unit is used to execute operation instructions;

Now that we know what a Turing machine is made of, let’s use the simple operation `1 + 2` as an example and see how it executes this line of code.

• First, the read-write head writes the three characters “1”, “2” and “+” into three cells on the tape, and then stops at the cell that holds the character 1;

• Then, the read-write head reads 1 into the storage unit. The content stored there is called the state of the Turing machine;

• Next, the read-write head moves one cell to the right and reads 2 into the state in the same way. The state now holds two consecutive numbers, 1 and 2;

• The read-write head moves one more cell to the right and encounters the + sign. It reads the + sign and passes it to the “control unit”. The control unit finds that this is a + sign rather than a number, so it is not stored in the state: the `+` sign is an operator instruction that means “add the values in the current state”. The control unit therefore notifies the “operation unit” to start working. When the operation unit is notified to add the values in the state, it reads in the 1 and 2 from the state, adds them, and stores the result 3 in the state;

• Finally, the operation unit returns the result to the control unit, the control unit passes it to the read-write head, and the read-write head moves one cell to the right and writes the result 3 into that cell of the tape;

From the way the Turing machine computes `1 + 2` above, we can see that its main job is to read the contents of the cells on the tape and hand them to the control unit, which identifies whether each character is a number or an operator instruction. A number is stored in the Turing machine state. An operator makes the control unit notify the operation unit to read the values in the state and calculate. The result is finally returned to the read-write head, which writes it into a cell of the tape.
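The walk-through above can be sketched in a few lines of Python. This is only a toy model of the simplified machine described in this article, not a formal Turing machine, and the names `run_tape`, `state` and `head` are made up for illustration.

```python
def run_tape(tape):
    """Toy model: scan right, keep numbers in the state, add them when '+' is read."""
    state = []   # the "Turing machine state" held by the storage unit
    head = 0     # position of the read-write head on the tape
    while head < len(tape):
        symbol = tape[head]
        if symbol == "+":
            # Control unit: '+' is an instruction, so the operation unit adds the state.
            result = sum(state)
            state = [result]
            # The head moves one cell to the right and writes the result there.
            head += 1
            if head == len(tape):
                tape.append(None)  # extend the tape with an empty cell
            tape[head] = str(result)
            return tape
        # Control unit: a number is data, so it is stored in the state.
        state.append(int(symbol))
        head += 1
    return tape


print(run_tape(["1", "2", "+"]))  # ['1', '2', '+', '3']
```

The `if symbol == "+"` branch plays the role of the control unit, and `sum(state)` stands in for the operation unit.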

In fact, although the way a Turing machine works looks very simple, it is basically the same as the way today’s computers work. Next, let’s look at what today’s computers are made of and how they work.

### von Neumann model

In 1945, von Neumann and other computer scientists published a report describing a concrete implementation of the computer. It followed the design of the Turing machine, proposed building the computer from electronic components, and settled on binary for calculation and storage. The basic structure of the computer was defined as five parts: the CPU, memory, input devices, output devices and the bus.

Together, these five parts are known as the von Neumann model. Let’s look at what each of them does.

#### Memory

Our programs and data are stored in memory, and the storage area is linear.

The basic unit of data is the binary bit (bit), which is either 0 or 1. The smallest unit of storage is the byte (byte), and 1 byte equals 8 bits.

Memory addresses are numbered from 0 and increase one by one, so the last address is the total number of bytes of memory minus 1. This structure is like an array in our programs, which is why reading or writing any piece of data in memory takes the same time.
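The array comparison can be made concrete with a tiny sketch. It assumes a made-up 16-byte memory, modeled with Python’s built-in `bytearray`; a real memory is of course much larger.

```python
memory = bytearray(16)           # a tiny 16-byte "memory", all zeros

first_address = 0
last_address = len(memory) - 1   # total number of bytes minus 1

memory[first_address] = 0x2A     # write one byte at address 0
memory[last_address] = 0xFF      # write one byte at the last address

print(last_address)              # 15
print(memory[0], memory[15])     # 42 255
```

Each index of the array plays the role of a memory address, and each element holds exactly one byte.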

#### Central processing unit

The central processing unit is what we usually call the CPU. The main difference between a 32-bit CPU and a 64-bit CPU is how many bytes of data they can calculate at a time:

• A 32-bit CPU can calculate 4 bytes at a time;
• A 64-bit CPU can calculate 8 bytes at a time;

The 32 bits and 64 bits here are what is commonly called the bit width of the CPU.

The CPU is designed this way so that it can calculate larger values. An 8-bit CPU can only calculate 1 byte at a time, that is, values in the range `0~255`, so it cannot calculate `10000 * 500` in one step. To calculate large values in a single step, the CPU needs to support operating on several bytes together. The larger the bit width of the CPU, the larger the values it can calculate: for example, the largest unsigned integer a 32-bit CPU can calculate in one step is `4294967295`.
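The numbers in this paragraph are easy to check. The helper `max_unsigned` below is a hypothetical name used only for this illustration, and it assumes unsigned integers.

```python
def max_unsigned(bits):
    """Largest unsigned integer that fits in `bits` binary digits."""
    return 2 ** bits - 1


print(max_unsigned(8))    # 255
print(max_unsigned(32))   # 4294967295

product = 10000 * 500
print(product)                      # 5000000
print(product <= max_unsigned(8))   # False: too large for 8 bits
print(product <= max_unsigned(32))  # True: fits in 32 bits
```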

The CPU also contains components such as registers, the control unit and the logical operation unit. The control unit is responsible for controlling the work of the CPU, the logical operation unit is responsible for calculation, and the registers come in many types, each with a different function.

The main function of the registers is to hold data during calculation. You may wonder why registers are needed when there is already memory. The reason is simple: memory is too far away from the CPU, whereas the registers sit inside the CPU, right next to the control unit and the logical operation unit, so calculating with them is naturally much faster.

Common register types:

• General-purpose registers, which store the data to be calculated, such as the two values that are about to be added.
• The program counter, which stores the “memory address” of the next instruction the CPU will execute. Note that it does not store the instruction itself. At this point the instruction is still in memory, and the program counter only holds its address.
• The instruction register, which stores the instruction pointed to by the program counter, that is, the instruction itself. The instruction stays here until it has finished executing.

#### Bus

The bus is used for communication between the CPU, memory and other devices. There are three kinds:

• The address bus, which specifies the memory address the CPU is going to operate on;
• The data bus, which is used to read and write the data in memory;
• The control bus, which is used to send and receive signals such as interrupts and device resets. When the CPU receives a signal it naturally has to respond, and that also goes through the control bus;

When the CPU wants to read or write data in memory, it generally goes through two buses:

• First, the memory address is specified through the “address bus”;
• Then the data is transferred through the “data bus”;

#### Input and output devices

An input device feeds data into the computer, and after calculation the computer sends the data to an output device. If the input device is a keyboard, pressing a key requires interaction with the CPU, and this is where the control bus comes in.

### Line bit width and CPU bit width

How is data transmitted over a line? It is done with voltage: a low voltage means 0 and a high voltage means 1.

If a high-low-high signal is sent, it is the binary data 101, which is 5 in decimal. With only one line, only 1 bit (0 or 1) can be transmitted at a time, so it takes three transfers to send 101, which is very inefficient.

Transmitting bit by bit like this is called serial transmission: the next bit has to wait until the previous bit has been sent. To transfer more data at a time, you can add more lines so that the data is transmitted in parallel.
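As a rough sketch of the difference, the number of transfers is just the number of bits divided by the number of lines, rounded up. The function name `transfers_needed` is made up for this example, and it assumes every line carries exactly one bit per transfer.

```python
import math


def transfers_needed(num_bits, num_lines):
    """Rounds needed to send num_bits when num_lines bits move per round."""
    return math.ceil(num_bits / num_lines)


print(int("101", 2))            # 5, the decimal value of binary 101
print(transfers_needed(3, 1))   # 3: serial, one line sends 101 bit by bit
print(transfers_needed(3, 3))   # 1: parallel, three lines send 101 at once
```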

To avoid inefficient serial transmission, the bit width of the lines should ideally be large enough to address all of memory in one go. The CPU uses the address bus to operate on memory addresses. If the address bus has only 1 line, it can only express “0 or 1” each time, so the CPU can only address two memory addresses. If the CPU wants to address 4G of memory, it needs 32 address lines, because `2 ^ 32 = 4G`.
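The relationship between the number of address lines and the amount of addressable memory can be checked directly. This sketch assumes byte-addressable memory, and `addressable_bytes` is a hypothetical helper name.

```python
def addressable_bytes(address_lines):
    """Number of distinct byte addresses that `address_lines` lines can select."""
    return 2 ** address_lines


print(addressable_bytes(1))               # 2
print(addressable_bytes(32))              # 4294967296
print(addressable_bytes(32) // 2 ** 30)   # 4, i.e. 4G
```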

Now that we know what line bit width means, let’s look at CPU bit width.

The bit width of the CPU should preferably not be smaller than the bit width of the lines. For example, a 32-bit CPU driving a 40-bit-wide address bus and data bus would make the work complicated and troublesome. A 32-bit CPU is therefore best paired with 32-bit-wide lines, because a 32-bit CPU can only operate on 32 bits of the address bus and data bus at a time.

If you use a 32-bit CPU to add two 64-bit numbers, each 64-bit number has to be split into a low-order 32-bit half and a high-order 32-bit half. First the two low-order halves are added and the carry is worked out, then the two high-order halves are added, and finally the carry is added in. So a 32-bit CPU cannot add two 64-bit numbers in a single step.
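The split-and-carry procedure just described looks roughly like this in Python. It is a sketch for unsigned numbers only; `MASK32` and `add64_on_32bit` are names invented for the example, and masking with `MASK32` imitates a register that can only hold 32 bits.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits of a value


def add64_on_32bit(a, b):
    """Add two 64-bit unsigned numbers using only 32-bit-wide additions."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32       # 1 if the low halves overflowed 32 bits, else 0
    lo = lo_sum & MASK32

    hi = (a_hi + b_hi + carry) & MASK32   # an overflow out of 64 bits is dropped

    return (hi << 32) | lo


x = 0x00000001FFFFFFFF
y = 0x0000000000000001
print(hex(add64_on_32bit(x, y)))   # 0x200000000
```

Adding the low halves here overflows 32 bits, so the carry of 1 is what turns the high half from 1 into 2.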

A 64-bit CPU can calculate the sum of two 64-bit numbers in one step, because it can read a 64-bit number at a time and its logical operation unit also supports calculating with 64-bit numbers.

However, this does not mean that a 64-bit CPU performs far better than a 32-bit CPU, because few applications need to calculate numbers larger than 32 bits. If the numbers being calculated do not exceed 32 bits, there is no real difference between a 32-bit CPU and a 64-bit CPU. The advantage of 64 bits only shows when the numbers exceed 32 bits.

In addition, a 32-bit CPU can address at most 4 GB of memory, so even if you install an 8 GB memory module, the extra memory cannot be used. A 64-bit CPU has a much larger addressing range: the theoretical maximum address space is `2^64`.

### The basic process of program execution

Now that we know how a program runs on a Turing machine, let’s look at how a program is executed on the von Neumann model.

A program is really a sequence of instructions, so running a program means executing its instructions one by one. The CPU is responsible for executing them.

The CPU executes a program as follows:

• Step 1: the CPU reads the value of the “program counter”, which is the memory address of the instruction. The CPU’s “control unit” then drives the “address bus” to specify the memory address to be accessed and notifies the memory device to prepare the data. Once the data is ready, it is sent to the CPU over the “data bus”. After the CPU receives the data from memory, it stores this instruction data in the “instruction register”;
• Step 2: the CPU analyzes the instruction in the instruction register to determine its type and parameters. If it is a calculation instruction, it is handed to the “logical operation unit” to compute; if it is a storage instruction, it is handed to the “control unit” to execute;
• Step 3: after the CPU finishes executing the instruction, the value of the “program counter” is incremented so that it points to the next instruction. The size of the increment is the length of the instruction. In this article’s example of a 32-bit CPU, an instruction is 4 bytes long and occupies 4 memory addresses, so the value of the “program counter” increases by 4;

To sum up, when a program is executed, the CPU uses the memory address in the program counter to read the instruction from memory into the instruction register and executes it, and then advances the counter by the instruction length to read the next instruction in sequence.

The CPU reads the instruction at the address in the program counter, executes it, and moves on to the next instruction. This process repeats until the program finishes, and this continuously repeating process is called the instruction cycle of the CPU.

### How `a = 1 + 2` is executed

Now that we know the basic process of program execution, let’s use `a = 1 + 2` as an example to analyze further how a program is executed in the von Neumann model.

The CPU does not understand the string `a = 1 + 2`. Such strings only exist to make things easy for us programmers. To run the program, the whole program first has to be translated into assembly language, a process called compiling into assembly code.

The assembly code then has to be translated into machine code by an assembler. Machine code is machine language made up of 0s and 1s, and each piece of machine code is a computer instruction. This is what the CPU can really understand.

Let’s look at how `a = 1 + 2` is executed on a 32-bit CPU.

During compilation, the compiler analyzes the code and finds that 1 and 2 are data. When the program runs, a dedicated area of memory holds this data, and this area is called the “data segment”. As shown in the figure below, data 1 and 2 are located as follows:

• Data 1 is stored at address 0x100;
• Data 2 is stored at address 0x104;

Note that data and instructions are stored in separate areas. The area where the instructions are stored is called the “text segment”.

The compiler translates `a = 1 + 2` into 4 instructions and stores them in the text segment. As shown in the figure, these 4 instructions are stored in the region 0x200 ~ 0x20c:

• 0x200 holds a `load` instruction that loads data 1 from address 0x100 into register `R0`;
• 0x204 holds a `load` instruction that loads data 2 from address 0x104 into register `R1`;
• 0x208 holds an `add` instruction that adds the data in registers `R0` and `R1` and stores the result in register `R2`;
• 0x20c holds a `store` instruction that saves the data in register `R2` back to address 0x108 in the data segment, which is the address of the variable `a` in memory;

After compilation, when the program is executed, the program counter is set to address 0x200 and the four instructions are executed in turn.

In the example above, each instruction occupies 32 bits on this 32-bit CPU, which is why the instructions are 4 bytes apart.

The size of a piece of data depends on the type of the variable you declare in the program: for example, `int` data takes up 4 bytes and `char` data takes up 1 byte.
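To tie the pieces together, here is a minimal simulation of the four-instruction program above. It is only a sketch under simplifying assumptions: memory is modeled as a Python dictionary, instructions are stored as readable tuples instead of binary machine code, and names such as `memory`, `registers` and `INSTRUCTION_SIZE` are made up for the example.

```python
# Toy memory: each address maps to either a data word or an instruction tuple.
memory = {
    0x100: 1,                          # data segment: data 1
    0x104: 2,                          # data segment: data 2
    0x108: 0,                          # data segment: variable a
    0x200: ("load", "R0", 0x100),      # text segment: the 4 instructions
    0x204: ("load", "R1", 0x104),
    0x208: ("add", "R2", "R0", "R1"),
    0x20C: ("store", "R2", 0x108),
}

registers = {"R0": 0, "R1": 0, "R2": 0}
pc = 0x200                 # program counter, set to the first instruction
INSTRUCTION_SIZE = 4       # bytes per instruction in this example

while pc in memory:
    instruction = memory[pc]           # fetch into the "instruction register"
    op = instruction[0]                # decode the instruction type
    if op == "load":                   # execute
        _, reg, addr = instruction
        registers[reg] = memory[addr]
    elif op == "add":
        _, dst, src1, src2 = instruction
        registers[dst] = registers[src1] + registers[src2]
    elif op == "store":
        _, reg, addr = instruction
        memory[addr] = registers[reg]
    pc += INSTRUCTION_SIZE             # point to the next instruction

print(registers)       # {'R0': 1, 'R1': 2, 'R2': 3}
print(memory[0x108])   # 3, the value of a
```

The loop stops when the program counter reaches 0x210, where no instruction is stored. A real CPU has no such convenient stopping rule; it is used here only to keep the sketch short.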

#### Instructions

In the example above, the contents of the instructions in the figure are written as simple assembly code to make them easier to understand. In reality, an instruction’s content is binary machine code, and each instruction has its own machine code. The CPU works out what an instruction says by parsing its machine code.

Different CPUs have different instruction sets, which means different assembly languages and different machine codes. Next, let’s pick the simplest one, the MIPS instruction set, and see how machine code is generated, so that we can understand what binary machine code actually means.

A MIPS instruction is a 32-bit integer. The upper 6 bits are the opcode, which indicates what kind of instruction it is. The meaning of the remaining 26 bits differs by instruction type, and there are three main types: R, I and J.

Let’s look at what these three types mean:

• R instructions are used for arithmetic and logic operations. They contain the addresses of the registers to read from and write to. For a logical shift operation, there is also a “shift amount”. The final “function code” extends the opcode to identify the specific instruction when the opcode alone is not enough;
• I instructions are used for data transfer, conditional branches and so on. This type has no shift amount or function code, and no third register. Instead, these three parts are merged into a single address value or constant;
• J instructions are used for jumps. The 26 bits after the upper 6 bits are all the target address of the jump;

Next, let’s take the instruction from the earlier example, “the `add` instruction adds the data in registers `R0` and `R1` and puts the result into `R2`”, and translate it into machine code.

• The opcode of the MIPS instruction corresponding to add is `000000`, and the function code at the end is `100000`. These values are fixed and can be found in the MIPS instruction manual;
• rs is the number of the first register, R0, i.e. `00000`;
• rt is the number of the second register, R1, i.e. `00001`;
• rd is the number of the destination register, R2, i.e. `00010`;
• Because this is not a shift operation, the shift amount is `00000`;

Put together, these fields form a 32-bit MIPS add instruction. Written in hexadecimal, the machine code is `0x00011020`.
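Putting the fields together is just a matter of shifting each one into its position. The sketch below assumes the R-type layout described above (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code) and uses this article’s simplified register numbers. `encode_r_type` is a hypothetical helper, not part of any MIPS toolchain.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6+5+5+5+5+6 bits) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_r_type(opcode=0b000000, rs=0b00000, rt=0b00001,
                     rd=0b00010, shamt=0b00000, funct=0b100000)

print(f"{word:032b}")   # 00000000000000010001000000100000
print(f"0x{word:08x}")  # 0x00011020
```

Decoding in the CPU is the reverse process: it shifts and masks the same bit ranges to recover the opcode, register numbers and function code.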

When a program is compiled, the compiler constructs the instructions, which is called instruction encoding. When the CPU executes the program, it parses the instructions, which is called instruction decoding.

Most modern CPUs use a pipeline to execute instructions. Pipelining means splitting one task into several smaller tasks. An instruction is therefore usually divided into 4 stages, which is called a 4-stage pipeline, as shown in the following figure:

The four stages mean the following:

1. The CPU reads the instruction at the memory address held in the program counter. This part is called Fetch (instruction fetch);
2. The CPU decodes the instruction. This part is called Decode (instruction decode);
3. The CPU executes the instruction. This part is called Execution (instruction execution);
4. The CPU writes the calculation result back to a register or stores the register’s value into memory. This part is called Store (data write-back);

These four stages together are called the instruction cycle (Instruction Cycle). The CPU works cycle after cycle, over and over.
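To get a feel for why the four stages are worth overlapping, here is a back-of-the-envelope sketch. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and nothing ever stalls, which real CPUs do not guarantee. Both function names are made up for the illustration.

```python
def cycles_without_pipeline(num_instructions, num_stages=4):
    """Each instruction goes through all stages before the next one starts."""
    return num_instructions * num_stages


def cycles_with_pipeline(num_instructions, num_stages=4):
    """Ideal pipeline: once it is full, one instruction finishes every cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


print(cycles_without_pipeline(4))  # 16
print(cycles_with_pipeline(4))     # 7
```

With the pipeline, the second instruction is fetched while the first is being decoded, so the stages of neighboring instructions overlap.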

In fact, the different stages are carried out by different components of the computer:

• In the fetch stage, the instructions are stored in memory, and the process of fetching an instruction through the program counter and the instruction register is operated by the controller;
• The decoding of the instruction is also done by the controller;
• The execution of the instruction, whether it is an arithmetic operation, a logic operation, a data transfer or a conditional branch, is operated by the arithmetic logic unit, that is, handled by the arithmetic unit. A simple unconditional jump, however, is completed directly inside the controller and does not need the arithmetic unit.

#### Types of instructions

By function, instructions can be divided into five categories:

• Data transfer instructions, such as `store/load`, which transfer data between registers and memory, and `mov`, which moves data from one location to another;
• Operation instructions, such as addition, subtraction, multiplication and division, bit operations and comparisons. They can process the data in at most two registers;
• Jump instructions, which change the flow of execution by modifying the value of the program counter, as in the `if-else` and `switch-case` statements and function calls that are common in programming;
• Signal instructions, such as the `trap` instruction, which triggers an interrupt;
• Idle instructions, such as `nop`. After it is executed, the CPU idles for one cycle;

#### Speed of instruction execution

CPU hardware specifications include a `GHz` figure. For example, a 1 GHz CPU has a clock frequency of 1G, which means it generates 1G (10^9) pulse signals per second. Each switch of the pulse signal between high and low level is one cycle, called a clock cycle.

In one clock cycle the CPU can only complete the most basic action. The higher the clock frequency, the shorter the clock cycle and the faster the CPU works.

Can an instruction be executed in one clock cycle? Not necessarily. Most instructions cannot be completed in a single clock cycle and usually need several. Different instructions need different numbers of clock cycles: addition and multiplication each correspond to one CPU instruction, but multiplication takes more clock cycles than addition.

How can we make a program run faster?

The less CPU time a program consumes when it runs, the faster it is. The CPU execution time of a program can be broken down into the product of the number of CPU clock cycles (CPU Cycles) and the clock cycle time (Clock Cycle Time).

The clock cycle time is the reciprocal of the CPU’s main frequency mentioned earlier: the higher the main frequency, the shorter the clock cycle and the faster the CPU works. For example, the CPU of my computer is a 2.4 GHz quad-core Intel Core i5. Here 2.4 GHz is the main frequency, and the clock cycle time is 1 / 2.4G seconds.
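The clock cycle time follows directly from the frequency. A quick check, using a made-up helper `clock_cycle_time_ns` that takes the frequency in GHz and returns the cycle time in nanoseconds:

```python
def clock_cycle_time_ns(frequency_ghz):
    """Clock cycle time in nanoseconds for a clock frequency given in GHz."""
    return 1.0 / frequency_ghz


print(clock_cycle_time_ns(1.0))            # 1.0
print(round(clock_cycle_time_ns(2.4), 4))  # 0.4167
```

So one clock cycle of a 2.4 GHz CPU lasts a little over 0.4 nanoseconds.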

To make the CPU run faster, the natural approach is to shorten the clock cycle time, that is, to raise the CPU’s main frequency. But times have changed: Moore’s law has long since stopped holding, and it is now very hard to double a CPU’s main frequency.

Besides, switching to a better CPU is not something we software engineers can control. We should focus on the other factor in the product, the number of CPU clock cycles. If we can reduce the number of clock cycles the program needs, we also improve its performance.

The number of CPU clock cycles can be further broken down into the number of instructions × the average number of clock cycles per instruction (Cycles Per Instruction, abbreviated as `CPI`). The formula for the CPU execution time of a program then becomes:

CPU execution time = number of instructions × CPI × clock cycle time

So to make a program run faster, you can optimize these three factors:

• The number of instructions is how many instructions it takes to execute the program, and which ones. This level is basically optimized by the compiler, since different compilers turn the same code into different computer instructions.
• The average number of clock cycles per instruction, CPI, is how many clock cycles one instruction needs. Most modern CPUs use pipelining to reduce the number of clock cycles each instruction needs as far as possible;
• The clock cycle time is determined by the main frequency of the computer and depends on the hardware. Some CPUs support overclocking. Turning on overclocking means speeding up the CPU’s internal clock, so the CPU works faster, but there is a cost: the faster the CPU runs, the greater the heat-dissipation pressure, and the more likely the CPU is to crash.
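The effect of each of the three factors can be seen by plugging numbers into the product of instruction count, CPI and clock cycle time. All figures below are invented purely for illustration, and `cpu_time_seconds` is a hypothetical helper.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU execution time = instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * clock_cycle_time


# Invented baseline: 1 billion instructions, CPI of 2, 2 GHz clock.
baseline = cpu_time_seconds(1_000_000_000, cpi=2.0, clock_hz=2.0e9)

# Improve one factor at a time.
fewer_instructions = cpu_time_seconds(500_000_000, cpi=2.0, clock_hz=2.0e9)
lower_cpi = cpu_time_seconds(1_000_000_000, cpi=1.0, clock_hz=2.0e9)
faster_clock = cpu_time_seconds(1_000_000_000, cpi=2.0, clock_hz=4.0e9)

print(baseline, fewer_instructions, lower_cpi, faster_clock)
```

Halving any one factor halves the execution time, which is why all three are worth optimizing.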

Many manufacturers that chase benchmark scores for their own sake basically work on these three aspects, especially overclocking.

### Summary

Finally, let’s answer the questions from the beginning.

What advantages does a 64-bit CPU have over a 32-bit CPU? Is the computing performance of a 64-bit CPU necessarily higher than that of a 32-bit CPU?

Compared with a 32-bit CPU, a 64-bit CPU has two advantages:

• A 64-bit CPU can calculate numbers larger than 32 bits in one step, whereas a 32-bit CPU handles such numbers less efficiently. However, most applications rarely calculate numbers that large, so the advantage of a 64-bit CPU only shows when calculating large numbers. Otherwise, the computing performance of a 64-bit CPU is about the same as that of a 32-bit CPU;
• A 64-bit CPU can address a larger memory space. The maximum address space of a 32-bit CPU is 4G, so even with 8G of memory installed, it can only address 4G. The maximum address space of a 64-bit CPU is `2^64`, far larger than the `2^32` of a 32-bit CPU.

Do you know the difference between 32-bit and 64-bit software? Can a 32-bit operating system run on a 64-bit computer? Can a 64-bit operating system run on a 32-bit computer? If not, why not?

Whether software is 64-bit or 32-bit actually refers to whether its instructions are 64-bit or 32-bit:

• Running 32-bit instructions on a 64-bit machine requires a compatibility mechanism, and with one they can run. Running 64-bit instructions on a 32-bit machine is much harder, because 32-bit registers cannot hold 64-bit instructions;
• An operating system is also a program. Operating systems are likewise divided into 32-bit and 64-bit, and this refers to the bit width of the instructions in the operating system. A 64-bit operating system consists of 64-bit instructions, so it cannot be installed on a 32-bit machine.

In short, for hardware, 64-bit and 32-bit refer to the bit width of the CPU, while for software, 64-bit and 32-bit refer to the bit width of the instructions.

## A few more words

Hello everyone, I’m Xiaolin, your “tool person” for illustrated explanations. If you found this article helpful, please share it with your friends. It means a lot to Xiaolin. Thank you, and see you next time!