The Turing machine: the origin of everything
A Turing machine consists mainly of a storage unit, a control unit, an arithmetic unit, and a read-write head for reading and writing external data.
A Turing machine needs a paper tape. The tape is divided into a grid of cells, each of which can hold a character, and characters fall into two categories: data and instructions. As the tape passes through the machine it moves forward continuously, and the read-write head reads the characters on the grid one by one, with the control unit deciding whether each character is data or an instruction. When a data character is read, it is stored in the storage unit; when an instruction character is read, the arithmetic unit fetches the data from the storage unit, performs the corresponding operation, and writes the result into the next cell of the tape through the read-write head.
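The read-execute loop above can be sketched in code. This is a minimal, hypothetical model (the tape as a dictionary of cell index to symbol, and a transition table mapping state and symbol to an action), not the author's exact formalism:

```python
# Minimal Turing-machine sketch: tape is {cell index: symbol}, rules map
# (state, symbol) -> (symbol to write, head move, next state).

def run_turing_machine(tape, rules, state="start", head=0, max_steps=100):
    tape = dict(tape)  # copy so the caller's tape is untouched
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")             # "_" marks a blank cell
        new_symbol, move, state = rules[(state, symbol)]
        tape[head] = new_symbol                  # write through the "head"
        head += 1 if move == "R" else -1         # move along the paper tape
    return tape

# Example rules: flip every bit until the first blank, then halt.
rules = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
result = run_turing_machine({0: "1", 1: "0", 2: "1"}, rules)
print("".join(result[i] for i in range(3)))  # -> 010
```

The loop mirrors the description: read a cell, let the rules (the "control unit") decide what to do, write the result back through the head, and advance along the tape.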
The basic working mode of a Turing machine is the same as that of today's computers: data and instructions are stored in memory (the tape and storage unit), and the processor reads them and computes the results. In a computer, the CPU performs instruction computation while memory stores the data. We often hear about disks, SSDs and RAM, but the CPU does not read and execute instructions directly from these storage devices; instead, it adopts a hierarchical caching strategy.
We are all familiar with disks: data survives a power failure and capacity is large, typically measured in terabytes, but reads are very slow. Memory is nearly 100 times faster to read than disk, yet it still moves at a "turtle's pace" compared with the CPU's execution speed. In addition, memory sits on the motherboard, and data must travel to the CPU over the board's circuitry; relative to the CPU's execution speed, that transfer time cannot be ignored.
The smaller a storage device is, the more limited its capacity; the faster its read-write speed, the higher its energy consumption and cost. Furthermore, the farther a device is from the CPU, the longer data transfer takes. As a result, no single kind of memory can keep data flowing fast enough to match the CPU's processing speed.
The solution computers adopt is to tier the storage: data the CPU uses frequently is kept in fast storage close to the CPU (the cache), while infrequently used data goes to slower, more distant, higher-capacity and cheaper storage. When the CPU reads data, it reads directly from the cache first; anything not found in the cache is then fetched from the more distant memory.
The hierarchical caching scheme is feasible because of the principle of locality. Think about the code we usually write: most of the work happens in for loops that compute over, read, and write a handful of variables. So while the CPU executes a program, a few data regions are read and written far more often than the rest. Caching these "hot spots" makes the next read much faster. Statistically, the memory-cache hit rate can reach 95%, meaning only 5% of accesses fall through to main memory.
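A toy simulation can show why locality makes caching pay off. The cache model below is a simplified direct-mapped cache with invented sizes (16-word blocks, 64 lines); the point is only that a sequential scan, like a for loop over an array, touches each block many times but misses only on the first touch:

```python
# Toy direct-mapped cache: a sequential scan over an array hits the same
# block repeatedly, so the miss rate is roughly 1 / (words per block).

BLOCK_WORDS = 16   # words per cache block (assumed)
NUM_LINES = 64     # number of cache lines (assumed)

def hit_rate(addresses):
    lines = {}                               # line index -> cached block number
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS          # which memory block holds addr
        line = block % NUM_LINES             # direct-mapped line for the block
        if lines.get(line) == block:
            hits += 1                        # block already cached: a hit
        else:
            lines[line] = block              # miss: load the block into cache
    return hits / len(addresses)

sequential = list(range(4096))               # e.g. a for loop over an array
print(f"sequential scan hit rate: {hit_rate(sequential):.2%}")  # 93.75%
```

With 16 words per block, one miss loads a block that serves the next 15 accesses, so even this crude model reaches a hit rate in the same ballpark as the 95% figure quoted above.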
Memory classification strategy
Generally, memory is divided into the following levels:
- Disk / SSD
- Memory
- CPU cache (L1 / L2 / L3)
- Registers
Disk / SSD
SSD / disk is the storage farthest from the CPU and the slowest to read. Its advantages are low cost and data that survives power loss. SSD is what we commonly call a solid-state drive; its structure resembles memory, and it reads and writes 10-1000 times slower than memory. Disks are slower still, roughly a million times slower than memory, and with the growing popularity of SSDs they are gradually being replaced.
Memory
Memory plugs into the motherboard at some distance from the CPU, which reads its data over the motherboard bus. It costs somewhat more than disk but reads faster, at roughly 200-300 CPU cycles per access. In capacity, a personal computer typically has 8-16 GB of memory, while servers can reach several TB.
CPU cycle: an instruction can be divided into several stages, such as fetching and executing. The time required to complete one stage is called a CPU cycle.
CPU cache
The CPU cache lives inside the CPU and is divided into L1 (level 1 cache), L2 (level 2 cache) and L3 (level 3 cache). Each CPU core has its own L1 and L2 caches, while all cores of the same CPU share one L3 cache.
- Distance from CPU: L1 < L2 < L3
- Capacity: L1 (tens to hundreds of KB) < L2 (hundreds of KB to several MB) < L3 (several MB to several tens of MB)
- Read / write speed: L1 (2-4 CPU cycles) > L2 (10-20 CPU cycles) > L3 (20-60 CPU cycles)
The L1 cache is further divided into an instruction area and a data area, which will be explained below.
Note that the smallest unit the CPU cache holds is a memory block, not a single variable. There are many ways to map memory onto the cache; a simple one is: cache line number = memory block number mod total cache lines. Given a memory address, the CPU first works out which memory block it belongs to, then derives the cache line number from the mapping. If the block is in the cache, the data can be fetched directly; if not, it is fetched from memory.
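The mapping described above is easy to compute. The block size and line count below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Direct-mapped lookup: line number = memory block number mod total lines.
# 64-byte blocks and 512 cache lines are assumed, illustrative values.

BLOCK_SIZE = 64
TOTAL_LINES = 512

def cache_line_for(address):
    block_number = address // BLOCK_SIZE     # memory block holding the address
    return block_number % TOTAL_LINES        # cache line the block maps to

print(cache_line_for(0x1234))      # address 0x1234 -> block 72 -> line 72
print(cache_line_for(0x1234 + 4))  # a nearby variable lands in the same line
```

Note how two addresses in the same 64-byte block map to the same cache line: this is exactly why caching blocks rather than single variables exploits spatial locality.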
Register
Registers are where the CPU actually reads and writes during execution. They are the storage closest to the CPU and the fastest to access, completing a read or write in half a CPU cycle. A CPU has between tens and hundreds of registers, each with a tiny capacity of only a few bytes (4-8).
Most registers in a 32-bit CPU hold 4 bytes; most registers in a 64-bit CPU hold 8 bytes.
Registers fall into several categories by purpose. To prepare for the instruction-execution walkthrough later, let's first look at the following kinds:
- General-purpose registers: store a program's parameter data.
- Instruction register: every instruction the CPU executes is read from memory into the instruction register, from which the CPU then reads and executes it.
- Instruction pointer register: stores the memory address of the next instruction to execute. The CPU uses this address to read the instruction into the instruction register. The instruction pointer register is also called the IP register.
- Segment registers: to access a larger physical space, the CPU locates a physical memory address as base address + offset, and a segment register stores the base address. CS is the segment register that holds the instruction base address; together with the IP register it locates an instruction's address in memory.
Suppose a register can hold at most 4 bytes: 4 bytes = 4 * 8 = 32 bits, which represents values from 0 to 2^32 - 1, i.e. a 4 GB range. A single register can therefore address at most 0-4 GB, yet memory, as noted earlier, can reach several TB, so one register alone cannot cover the whole address range. The addressing mode "base address + offset address = physical address" greatly extends addressing capability: for example, a 32-bit base address shifted 32 bits to the left, plus a 32-bit offset address, can represent a 64-bit (16 EiB) memory address. Note, however, that the computer's final addressing range is determined by the address bus, described below.
Bus – the bridge between CPU and the outside world
Following the memory hierarchy above, data is first loaded from disk into memory, then read into the CPU's cache and registers, and finally processed by the CPU from the registers. Reads and writes between the CPU and its cache happen inside the CPU, while the CPU reads and writes memory over the motherboard bus.
A bus can be regarded as a bundle of wires that transmits data by controlling voltage levels: high voltage represents 1, low voltage represents 0.
By the kind of information transmitted, buses are divided into the address bus, the data bus and the control bus.
Imagine the instruction "read the data at memory location 3". It contains several pieces of information:
- The memory location of the operation is 3 (address information)
- The operation command is read command (control information)
- The data to be transferred (data information)
The three buses each carry the corresponding information: the CPU sends the memory address to operate on over the address bus; it sends the read command over the control bus; and the memory returns the data to the CPU over the data bus.
Before discussing the address bus, let's look at how memory addresses are laid out. Memory is divided into storage units numbered from zero; these numbers serve as the units' addresses within memory.
Each storage unit holds 8 bits, i.e. one byte of data; a memory with 128 storage units can therefore store 128 bytes.
The CPU selects a storage unit via the address bus, and the number of address lines determines the memory's addressing range. For example, a CPU with 16 address lines can address up to 2^16 = 65,536 storage units (64 KB).
Now suppose a 16-bit CPU has 20 address lines. How can a 16-bit CPU emit a 20-bit address in one go?
In fact, the answer was given earlier: the CPU synthesizes a 20-bit address via "base address" + "offset address".
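A worked example makes this concrete. In the style of the 16-bit 8086, the 16-bit base (segment) address is shifted left 4 bits and added to a 16-bit offset, producing a 20-bit physical address that matches the 20 address lines:

```python
# 8086-style real-mode addressing: physical = (segment << 4) + offset,
# turning two 16-bit values into one 20-bit address.

def physical_address(base, offset):
    return (base << 4) + offset

addr = physical_address(0x1234, 0x0010)
print(hex(addr))          # 0x12350
assert addr < 2 ** 20     # fits within the 20-bit addressing range
```

Neither 16-bit value alone can exceed 64 KB, but shifted-and-added together they cover the full 1 MB (2^20 bytes) that 20 address lines can reach.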
The CPU exchanges data with memory and other devices over the data bus. The width of the data bus (its number of lines) determines how fast the CPU can transfer data with the outside world: 8 data lines move one byte (8 bits) per transfer, while 16 data lines move two bytes (16 bits).
The CPU controls external devices through the control bus. The number of control lines determines how many kinds of control signals the CPU can issue, so the width of the control bus determines the CPU's ability to control external devices.
Having covered the kinds of memory and the buses, let's look at how a program is loaded from disk into memory and then executed by the CPU.
The programs we write must first be translated by the compiler into instructions the CPU can recognize. When the program starts, its instructions and data are stored in two separate memory segments, and the PC pointer (the IP register combined with the CS register) is set to the starting address of the instruction segment, telling the CPU to read and execute instructions in memory from that address.
An instruction is first read into the instruction register; before the CPU can execute it, the instruction must be parsed.
As we know, memory stores everything in binary (the instructions above are written in hexadecimal). After reading an instruction, the CPU first parses its binary form. Take "0x8c400104" above as an example and expand it into binary:
The instruction splits into three parts: opcode, register number and memory address:
- The leftmost 6 bits are the opcode; "100011" denotes a load instruction.
- The middle 4 bits specify the register number, “0001” indicates R1 register.
- The last 22 bits represent the memory address to operate on.
So this instruction loads the contents of the specified memory address into register R1.
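The split can be reproduced with shifts and masks. Using the layout described above (6-bit opcode, 4-bit register number, 22-bit address) on the example word 0x8C400104:

```python
# Decode 0x8C400104 under the assumed layout:
# [ 6-bit opcode | 4-bit register number | 22-bit memory address ]

INSTRUCTION = 0x8C400104

opcode = INSTRUCTION >> 26             # top 6 bits
register = (INSTRUCTION >> 22) & 0xF   # next 4 bits
address = INSTRUCTION & 0x3FFFFF       # low 22 bits

print(f"opcode   = {opcode:06b}")   # 100011 -> load
print(f"register = R{register}")    # R1
print(f"address  = {address:#x}")   # 0x104
```

Shifting right by 26 discards the register and address fields, leaving the opcode; the masks then isolate the other two fields the same way the CPU's decoder does in hardware.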
To sum up, when the program is executed:
- The program's instructions and data are stored in the memory's instruction segment and data segment respectively, and the PC pointer points to the starting address of the instruction segment.
- The CPU reads the instruction pointed to by the PC pointer and stores it in the instruction register.
a. The CPU specifies the memory address to be accessed through the address bus; Send “read command” through the control bus.
b. The memory transmits the data to the CPU through the data bus, and the CPU stores the data in the instruction register.
- The CPU parses the instruction contents in the instruction register.
- The CPU executes instructions through an arithmetic unit and a control unit.
- The PC pointer is automatically incremented to the memory address of the next instruction.
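The steps above can be sketched as a toy fetch-decode-execute loop. The three-instruction program and its tiny instruction set (load / add) are invented for illustration only:

```python
# Toy fetch-decode-execute loop: the PC selects an instruction, the CPU
# decodes it, executes it, and the PC auto-increments to the next one.

memory_data = {0x100: 7, 0x104: 5}          # the "data segment"
program = [                                  # the "instruction segment"
    ("load", "R1", 0x100),
    ("load", "R2", 0x104),
    ("add", "R1", "R2"),
]

registers = {"R1": 0, "R2": 0}
pc = 0                                       # PC pointer at segment start
while pc < len(program):
    instruction = program[pc]                # fetch into "instruction register"
    op, dst, src = instruction               # decode
    if op == "load":                         # execute
        registers[dst] = memory_data[src]
    elif op == "add":
        registers[dst] += registers[src]
    pc += 1                                  # PC auto-increments

print(registers["R1"])  # 12
```

Each pass through the loop is one instruction cycle: fetch, decode, execute, then advance the PC, exactly in the strict order the summary lists.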
Fetching, decoding and executing thus make up the instruction execution cycle, and all instructions are executed strictly in order.
Instruction prefetching
The CPU executes instructions very fast, but memory reads and writes are slow, so fetching instructions from memory one at a time and then executing them would make the instruction cycle very slow.
As we saw earlier, the CPU has a three-level cache, so multiple instructions can be read from memory into the fast L1 cache at once, letting instruction fetches keep up with the CPU's execution speed.
At the same time, to prevent cached data from overwriting cached instructions and disrupting execution, the L1 cache is split into an instruction area and a data area.
Do L2 and L3 need to be split into instruction and data areas as well? In fact, no, because L2 and L3 do not need to assist with instruction prefetching directly.
How to execute instructions faster
To execute instructions faster, we can use the CPU's instruction pipelining.
In the flow just described, the arithmetic unit sits idle during fetching and decoding. To raise instruction throughput, the arithmetic unit should be kept busy at all times: as soon as the first instruction finishes fetching and starts decoding, the second instruction begins fetching, and so on. That way, by the time the previous instruction finishes executing, the next one has already been decoded and is ready to execute.
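The speedup is easy to quantify with a simple model: assume a 3-stage pipeline (fetch, decode, execute) where each stage takes one cycle:

```python
# Total cycles for n instructions, with and without a 3-stage pipeline,
# assuming one cycle per stage and no stalls (an idealized model).

STAGES = 3

def cycles_sequential(n):
    return n * STAGES            # each instruction runs all stages alone

def cycles_pipelined(n):
    return STAGES + (n - 1)      # after the first instruction fills the
                                 # pipeline, one completes every cycle

print(cycles_sequential(10))  # 30
print(cycles_pipelined(10))   # 12
```

For long instruction streams the pipelined cost approaches one cycle per instruction, a speedup approaching the number of stages; real pipelines fall short of this ideal because of branches and data hazards.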
One sentence summary
The program is stored in memory, and the CPU reads instructions and performs calculations.
Because the CPU executes instructions so fast, no single memory can simultaneously offer fast reads and writes, low heat, low energy consumption and large capacity, so a hierarchical memory strategy with multiple cache levels is used to match the CPU's execution speed.
Data moves between the CPU and memory over the motherboard buses: the address bus carries the memory address to operate on; the control bus carries the type of command; and the data bus carries the data from memory to the CPU.
Registers are the storage from which the CPU directly reads instructions and operand data. They fall into several categories by purpose. For data, values are first read into general-purpose registers, from which the CPU reads and writes. For instructions, the CPU fetches the instruction at the memory address identified jointly by the CS segment register and the instruction pointer register, stores it in the instruction register, and then reads and executes it from there.
Instruction execution consists of fetching, decoding and executing. To avoid fetching every instruction from memory individually, instructions can be prefetched into the CPU's L1 cache; and to keep the CPU's arithmetic unit computing at all times, pipelining can be used.
Write at the end
If you enjoyed this article, feel free to follow the official account "play code", which focuses on sharing practical technology in plain language.