15 minutes! One article to help beginners understand operating system memory

Time: 2021-8-16

Preface

Operating systems are a difficult subject to learn, yet operating system knowledge is very important for developers. I suspect that when you study operating systems, the many abstract terms and hard-to-grasp concepts put you off straight away. Even if you start with enthusiasm, you feel sleepy in less than 3 minutes.

So I want to write a series of articles on operating systems in the form of illustrations plus plain language, so that beginners can follow along and get started quickly.

This chapter starts with memory. Memory plays an important role in the operating system, and understanding it gives you a first outline of how the whole operating system works.



Main text

What is memory

A little story

What steps do we need to go through to set up a street stall (that is, to get a program's process ready to run)? Take a guess.

First, go to the urban management office to apply for a pitch (apply for memory). The urban management officers (the operating system) allot a pitch (memory) of the right size according to how much stall space is left and how big your stall is, and then you can happily set up your stall (run the process) and make money.

The urban management officers also make inspections from time to time (cleaning up memory fragmentation) to check that the stalls are orderly and do not block the sidewalk.

In short, a program (process) running on a computer needs a corresponding amount of physical memory.


Virtual memory

In fact, a running process does not use physical memory addresses directly. Instead, the memory addresses used by the process are isolated from the actual physical memory addresses: the operating system allocates an independent set of "virtual addresses" to each process.

Each process works with its own addresses and does not interfere with the others. How a virtual address maps to a physical address is transparent to the process; the operating system takes care of it.

The operating system provides a mechanism that maps the virtual addresses of different processes to different physical memory addresses, as shown in the following figure.

[Figure: virtual addresses of different processes mapped to different physical memory addresses]

From this, we draw two concepts:

  • The memory address used inside a process is called a virtual address
  • The address that actually exists in the computer's memory hardware is called a physical address

In short, the operating system introduces a virtual address space. The virtual address held by a process is converted into a physical address by the mapping in the memory management unit (MMU) inside the CPU chip, and physical memory is then accessed through that physical address.
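To make the isolation concrete, here is a minimal Python sketch. The process names, the table contents and the `translate` helper are all made up for illustration; a real MMU does this in hardware and not with a dictionary.

```python
# Hypothetical per-process mappings: virtual address -> physical address.
# The numbers are invented purely for illustration.
PROCESS_MAPPINGS = {
    "process_a": {0x1000: 0x5000, 0x2000: 0x9000},
    "process_b": {0x1000: 0x7000, 0x2000: 0x3000},
}


def translate(process: str, virtual_address: int) -> int:
    """Play the role of the MMU: look up the physical address for a process."""
    return PROCESS_MAPPINGS[process][virtual_address]


# The same virtual address lands in a different physical location per process.
print(hex(translate("process_a", 0x1000)))  # 0x5000
print(hex(translate("process_b", 0x1000)))  # 0x7000
```

Both processes use the virtual address 0x1000, yet neither can see or disturb the other's physical memory.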

How does the operating system manage the relationship between virtual addresses and physical memory addresses?

There are three main ways, namely segmentation, paging, and segmented paging. Let's take a look at these three memory management methods.


Memory segmentation

A program consists of several logical segments, such as the code segment, data segment, stack segment and heap segment. Each segment has different properties, so memory is managed by separating them into segments.

How do virtual addresses and physical addresses map in memory segmentation?

Under segmentation, a virtual address consists of two parts: a segment number and an offset within the segment.

[Figure: segment number and offset mapped through the segment table to a physical address]

  1. The segment number selects an entry in the segment table
  2. The segment base address is read from that entry
  3. Segment base address + offset within the segment = physical memory address

As shown above, the segment number selects an entry in the segment table, and the segment base address in that entry plus the offset gives the physical memory address. In practice, segmentation divides the program's virtual address space into four segments. Each segment has an entry in the segment table, where the segment base address is found and the offset is added to compute the physical memory address.
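The lookup can be sketched in a few lines of Python. The segment table contents, the `SegmentEntry` type and the `segment_translate` name are assumptions for illustration, and the `limit` bounds check is an addition that the text does not describe.

```python
from dataclasses import dataclass


@dataclass
class SegmentEntry:
    base: int   # physical address where the segment starts
    limit: int  # segment length in bytes (bounds check added for this sketch)


# Hypothetical segment table; the list index is the segment number.
SEGMENT_TABLE = [
    SegmentEntry(base=0x1000, limit=0x0400),  # 0: code
    SegmentEntry(base=0x3000, limit=0x0800),  # 1: data
    SegmentEntry(base=0x6000, limit=0x0200),  # 2: heap
    SegmentEntry(base=0x8000, limit=0x0100),  # 3: stack
]


def segment_translate(segment_number: int, offset: int) -> int:
    entry = SEGMENT_TABLE[segment_number]   # step 1: segment number -> entry
    if offset >= entry.limit:               # reject offsets outside the segment
        raise ValueError("offset outside segment")
    return entry.base + offset              # steps 2-3: base address + offset


print(hex(segment_translate(1, 0x10)))  # 0x3010
```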

Segmentation means the program itself no longer has to care about specific physical memory addresses, but it still has shortcomings:

  • Memory fragmentation
  • Inefficient memory swapping

Let's analyze these two problems in turn.

How does segmentation produce memory fragmentation?

Before going further, let's be clear about what memory fragmentation is. Eight people go out for dinner. It is mealtime, so the restaurant is crowded and only small four-person tables are left. These small four-person tables are what we call memory fragments. Someone will say the problem is easy to solve by pushing two four-person tables together. We call that approach memory defragmentation (which involves memory swapping).

Back to the point, let’s take an example. Suppose that the physical memory is only 1GB (1024MB), and multiple programs are running on the user’s computer:

  • The browser occupies 128MB
  • Music software occupies 256MB
  • The game occupies 512MB

Now we close the browser, and the remaining physical memory is 1024MB - (256MB + 512MB) = 256MB. However, this 256MB is not contiguous. It is split into two separate 128MB blocks, so there is no room to open another 200MB program, as shown in the figure below.

[Figure: two non-contiguous 128MB free blocks left after the browser is closed]
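The example can be checked with a small Python sketch. It assumes the programs sit in memory in the order game, browser, music; the text does not state the layout, and this order is chosen only because it leaves two separate 128MB holes once the browser is closed. The helper names are hypothetical.

```python
def free_holes(total_mb, allocations):
    """Return the (start, size) free gaps, given (name, start, size) allocations."""
    holes, cursor = [], 0
    for _name, start, size in sorted(allocations, key=lambda a: a[1]):
        if start > cursor:
            holes.append((cursor, start - cursor))
        cursor = start + size
    if cursor < total_mb:
        holes.append((cursor, total_mb - cursor))
    return holes


def can_load(holes, size_mb):
    """A segment needs a single contiguous hole that is large enough."""
    return any(hole_size >= size_mb for _start, hole_size in holes)


# Assumed layout in MB after the browser (formerly at 512..640) is closed.
after_closing_browser = [("game", 0, 512), ("music", 640, 256)]
holes = free_holes(1024, after_closing_browser)
print(holes)                                 # [(512, 128), (896, 128)]
print(sum(size for _start, size in holes))   # 256
print(can_load(holes, 200))                  # False
```

There is 256MB free in total, but no single hole can hold the 200MB program.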

Memory fragmentation here comes in two forms:

  • External fragmentation: free physical memory is split into several small non-contiguous blocks, so a new program cannot be loaded
  • Internal fragmentation: all of a program's memory is loaded into physical memory, but some of it may be rarely used, which wastes memory

External fragmentation is solved by memory defragmentation.

Memory defragmentation is done through memory swapping. We can write the 256MB occupied by the music software out to the hard disk and then read it back in, but not to its original position: it is placed immediately after the 512MB occupied by the game. The two free 128MB blocks then merge into one contiguous 256MB block, so the new 200MB program can be loaded.
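A rough Python sketch of the result of this defragmentation follows. It only models where each program ends up, not the round trip through the hard disk, and it assumes the layout game, (closed browser), music; the `compact` helper is hypothetical.

```python
def compact(allocations):
    """Slide every allocation down so they sit back to back from address 0."""
    compacted, cursor = [], 0
    for name, _start, size in sorted(allocations, key=lambda a: a[1]):
        compacted.append((name, cursor, size))
        cursor += size
    return compacted, cursor  # cursor is where the single free region starts


# Assumed layout in MB after the browser is closed: a 128MB hole at 512.
before = [("game", 0, 512), ("music", 640, 256)]
after, free_start = compact(before)
print(after)              # [('game', 0, 512), ('music', 512, 256)]
print(1024 - free_start)  # 256
```

After the move, the free memory is one contiguous 256MB block, which is enough for the 200MB program.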

The space used for memory swapping is the swap space we often see on Linux systems. It is set aside on the hard disk for exchanging data between memory and disk.

Why is memory swapping inefficient?

First, segmentation easily causes memory fragmentation, so memory has to be swapped frequently. Second, hard disk access is much slower than memory access, and each swap writes a large contiguous block of memory to the hard disk and then reads it back. If the program being swapped occupies a lot of memory, the whole machine appears to freeze. That is why swapping under segmentation is inefficient.

Memory paging appeared in order to solve the fragmentation and inefficient swapping caused by segmentation.


Memory paging

The advantage of segmentation is that each segment occupies contiguous memory, but it produces a lot of memory fragmentation and swaps inefficiently.

First think about how these two problems could be solved. External fragmentation comes from many small non-contiguous blocks of free physical memory. If those blocks could be used together, would the problem go away? Likewise, if each swap moved only a small amount of data, would swapping become more efficient?

This method is memory paging. Paging cuts the whole virtual and physical address space into chunks of a fixed size. Each such contiguous, fixed-size chunk is called a page. On Linux, each page is typically 4KB. (The virtual address space is the range that holds a process's set of virtual addresses.)

Virtual addresses are mapped to physical addresses through the page table. Pages that are contiguous in the virtual address space do not have to be contiguous in physical memory, so several non-contiguous blocks of physical memory can be combined behind contiguous virtual addresses.

[Figure: virtual pages mapped through the page table to physical pages]

When the virtual address accessed by a process cannot be found in the page table, the system raises a page fault, enters kernel space, allocates physical memory, updates the process's page table, and finally returns to user space to resume the process.
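The page fault path can be sketched as follows. This is a toy model and not how any real kernel is written: the `Process`, `Kernel` and `access` names are hypothetical, and the sketch ignores the case where no free frame is left.

```python
class Process:
    def __init__(self):
        self.page_table = {}   # virtual page number -> physical frame number
        self.page_faults = 0


class Kernel:
    def __init__(self, frame_count):
        self.free_frames = list(range(frame_count))

    def handle_page_fault(self, process, page_number):
        frame = self.free_frames.pop(0)          # allocate physical memory
        process.page_table[page_number] = frame  # update the process page table
        process.page_faults += 1
        return frame


def access(kernel, process, page_number):
    """Return the frame backing a page, faulting it in on first use."""
    if page_number not in process.page_table:    # not in page table: page fault
        kernel.handle_page_fault(process, page_number)
    return process.page_table[page_number]       # resume the original access


kernel = Kernel(frame_count=4)
proc = Process()
first = access(kernel, proc, 7)
second = access(kernel, proc, 7)
print(first, second, proc.page_faults)  # 0 0 1
```

Only the first access to page 7 triggers a page fault; the second one finds the mapping already in the page table.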

How does paging solve memory fragmentation and inefficient swapping?

Solving memory fragmentation:

Because memory is managed in fixed-size pages, each program's virtual address space is a run of contiguous pages (virtual addresses) that the page table maps to physical pages. The physical pages may be non-contiguous, but the virtual space is contiguous, so they can be used together. This only solves external fragmentation. Internal fragmentation remains because each page has a fixed size: a program may use only part of a page, and the rest is still wasted.

Solving inefficient swapping:

As mentioned earlier, swapping less data at a time makes swapping more efficient, and that is how paging solves the problem. If memory is insufficient, the operating system releases "recently unused" pages of other running processes by writing them to the hard disk, which is called swapping out. When they are needed again they are loaded back, which is called swapping in. Only one or a few pages are written to the hard disk at a time, so swapping is naturally more efficient.
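Here is a minimal sketch of page-granular swapping. It assumes a strict least-recently-used policy, which is one possible reading of "recently unused"; real kernels use cheaper approximations. The `PagedMemory` class is hypothetical.

```python
from collections import OrderedDict


class PagedMemory:
    def __init__(self, frame_count):
        self.frame_count = frame_count
        self.resident = OrderedDict()  # page -> data, least recently used first
        self.swap = {}                 # pages currently swapped out to "disk"

    def touch(self, page):
        """Access a page, swapping pages out and in as needed."""
        if page in self.resident:
            self.resident.move_to_end(page)   # mark as recently used
            return
        if len(self.resident) >= self.frame_count:
            victim, data = self.resident.popitem(last=False)
            self.swap[victim] = data          # swap out: one page only
        # Swap in if the page was on disk, otherwise start with empty data.
        self.resident[page] = self.swap.pop(page, None)


memory = PagedMemory(frame_count=2)
for page in ["a", "b", "a", "c"]:
    memory.touch(page)
print(list(memory.resident))  # ['a', 'c']
print(list(memory.swap))      # ['b']
```

Loading page "c" pushes out only page "b", the one that was used least recently, and not the whole program.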

Paging also means a program does not have to be loaded into physical memory all at once. After the mapping between virtual pages and physical pages is set up, the pages are not actually loaded. They are loaded into physical memory only when the instructions and data in the corresponding virtual page are needed at run time (in plain words: physical memory is used only when you need it).

How are virtual addresses mapped to physical addresses under paging?

Under paging, each process is allocated a page table, and a virtual address is split into two parts: a page number and an offset within the page. The page number is the index into the page table, whose entries hold the base address of the physical page where each virtual page resides. That base address plus the offset gives the physical memory address, as shown in the figure below.

[Figure: page number and offset mapped through the page table to a physical address]

The steps are as follows:

  1. Use the page number to find the entry in the page table
  2. Read the base address of the physical page from that entry
  3. Add the offset to the physical page base address to get the physical memory address
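The three steps can be sketched in Python. The page table contents and the `page_translate` name are made up for illustration; the 4KB page size is the one used in the text.

```python
PAGE_SIZE = 4096    # 4KB pages
OFFSET_BITS = 12    # 2 ** 12 == 4096

# Hypothetical page table: virtual page number -> physical frame number.
PAGE_TABLE = {0: 5, 1: 2, 2: 9}


def page_translate(virtual_address: int) -> int:
    page_number = virtual_address >> OFFSET_BITS   # upper bits of the address
    offset = virtual_address & (PAGE_SIZE - 1)     # lower 12 bits
    frame_number = PAGE_TABLE[page_number]         # steps 1-2: look up the entry
    return frame_number * PAGE_SIZE + offset       # step 3: base address + offset


print(hex(page_translate(0x1234)))  # 0x2234
```

Virtual address 0x1234 is page 1 with offset 0x234. Page 1 lives in frame 2, whose base address is 0x2000, so the physical address is 0x2234.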

Simple, isn't it? But would this paging scheme cause problems if an operating system used it as is? It would. Remember that each process is allocated its own page table? Let's resolve that bit of foreshadowing.

What is the problem with allocating a page table to each process under paging?

Giving each process its own page table costs space, because many processes can run on the operating system, which means there are a great many page tables.

  1. 1B (Byte) = 8 bits
  2. 1KB (Kilobyte) = 1024B
  3. 1MB (Megabyte) = 1024KB
  4. 1GB (Gigabyte) = 1024MB

Take a 32-bit environment as an example. The virtual address space is 4GB in total. With a page size of 4KB (2^12), about 1 million (2^20) pages are needed, and each page table entry takes 4 bytes. Mapping the entire 4GB space therefore needs 4MB of memory for the page table.

4MB seems small, but it adds up. With 100 processes, 400MB of memory is needed just to store page tables, which is a lot, let alone in a 64-bit environment.
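The arithmetic can be verified with a few lines of Python. The function name and its parameters are hypothetical; the default values are the ones from the text.

```python
def single_level_table_bytes(address_bits=32, page_size=4096, entry_bytes=4):
    """Size of a page table that covers the whole virtual address space."""
    pages = 2 ** address_bits // page_size   # 2^32 / 2^12 = 2^20 pages
    return pages * entry_bytes


MB = 1024 * 1024
per_process = single_level_table_bytes()
print(per_process // MB)        # 4
print(100 * per_process // MB)  # 400
```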

To solve this space problem, multi-level page tables were built on top of paging.

Multi-level page table

As noted earlier, in a 32-bit environment with 4KB pages there are about 1 million pages. Each page table entry takes 4 bytes and a page table contains 1 million entries, so each process's page table occupies 4MB. How do multi-level page tables solve this?

On top of the page table, apply paging a second time: split the 1 million page table entries into a first-level page table with 1024 entries, where each first-level entry points to a second-level page table with 1024 entries. The 1024 entries of the first-level page table cover the mapping of the whole 4GB range, and second-level page tables are loaded on demand, which greatly reduces the space occupied by page tables.
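A two-level lookup can be sketched as below. The 10 + 10 + 12 bit split follows from 1024 × 1024 entries and 4KB pages; the `TwoLevelPageTable` class and its method names are hypothetical, and translating an unmapped address simply fails in this sketch.

```python
PAGE_SIZE = 4096


class TwoLevelPageTable:
    def __init__(self):
        # 1024 first-level slots; None means "no second-level table yet".
        self.level1 = [None] * 1024

    def map(self, virtual_address, frame_number):
        index1 = (virtual_address >> 22) & 0x3FF   # top 10 bits
        index2 = (virtual_address >> 12) & 0x3FF   # next 10 bits
        if self.level1[index1] is None:            # create second level on demand
            self.level1[index1] = [None] * 1024
        self.level1[index1][index2] = frame_number

    def translate(self, virtual_address):
        index1 = (virtual_address >> 22) & 0x3FF
        index2 = (virtual_address >> 12) & 0x3FF
        offset = virtual_address & (PAGE_SIZE - 1)
        frame_number = self.level1[index1][index2]
        return frame_number * PAGE_SIZE + offset

    def second_level_tables(self):
        """Count how many second-level tables actually exist."""
        return sum(table is not None for table in self.level1)


table = TwoLevelPageTable()
table.map(0x00401000, frame_number=7)
print(hex(table.translate(0x00401ABC)))  # 0x7abc
print(table.second_level_tables())       # 1
```

Only one of the 1024 possible second-level tables has been created, which is where the space saving comes from.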

For a simple calculation, if only 20% of the first-level page table entries are used, the page tables occupy only 4KB (first-level page table) + 20% × 4MB (second-level page tables) ≈ 0.804MB. That is a huge saving compared with the 4MB of a single-level page table.
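The same calculation in Python, as a quick check. The constant and function names are hypothetical; the figures are the ones from the text.

```python
KB = 1024
MB = 1024 * KB
ENTRY_BYTES = 4
ENTRIES_PER_TABLE = 1024


def two_level_table_bytes(used_fraction):
    """Page table memory when only part of the first-level entries are used."""
    level1 = ENTRIES_PER_TABLE * ENTRY_BYTES                           # 4KB
    level2_full = ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * ENTRY_BYTES  # 4MB
    return level1 + used_fraction * level2_full


print(round(two_level_table_bytes(0.2) / MB, 3))  # 0.804
```

Note that with every first-level entry in use the total is 4MB + 4KB, slightly more than a single-level table, so the saving depends on most of the address space being unused.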

[Figure: first-level page table entries pointing to second-level page tables]

Now consider whether we can keep adding levels beyond two. If two levels work, three and four levels work too. A 64-bit operating system typically uses four-level paging, with four kinds of entries:

  1. Global page directory entry
  2. Upper page directory entry
  3. Middle page directory entry
  4. Page table entry

[Figure: four-level page table lookup]
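As an illustration, the sketch below splits a virtual address into the four indexes. It assumes 48-bit virtual addresses with four 9-bit indexes and a 12-bit page offset, which is the common layout on x86-64 with 4KB pages; the text itself does not give the bit widths, and the function and key names are hypothetical.

```python
def split_four_level(virtual_address):
    """Split a 48-bit virtual address into four 9-bit indexes and an offset."""
    return {
        "global": (virtual_address >> 39) & 0x1FF,  # global page directory index
        "upper": (virtual_address >> 30) & 0x1FF,   # upper page directory index
        "middle": (virtual_address >> 21) & 0x1FF,  # middle page directory index
        "table": (virtual_address >> 12) & 0x1FF,   # page table index
        "offset": virtual_address & 0xFFF,          # offset within the 4KB page
    }


address = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_four_level(address))
# {'global': 1, 'upper': 2, 'middle': 3, 'table': 4, 'offset': 5}
```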

TLB

Although multi-level page tables solve the space problem, finding the mapped physical memory address now takes several lookups, one per level, and these extra lookups cost time.

Programs exhibit locality: over a period of time, execution is confined to one part of the program, and the memory it accesses is correspondingly confined to a certain region.

This property is exploited by keeping the most frequently used page table entries in the TLB (Translation Lookaside Buffer), a cache inside the CPU. When translating an address, the CPU checks the TLB first and falls back to the regular page table only on a miss. In practice the TLB hit rate is very high, because a program accesses only a few pages most frequently.
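The effect of locality on the hit rate can be seen in a toy model. A real TLB is hardware inside the CPU; the `TLB` class below, its capacity and its least-recently-used eviction are assumptions made for this sketch.

```python
from collections import OrderedDict


class TLB:
    """A tiny translation cache with least-recently-used eviction."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # fallback: page number -> frame number
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, page_number):
        if page_number in self.cache:  # TLB hit: no page table walk needed
            self.hits += 1
            self.cache.move_to_end(page_number)
            return self.cache[page_number]
        self.misses += 1               # TLB miss: consult the page table
        frame = self.page_table[page_number]
        self.cache[page_number] = frame
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return frame


tlb = TLB(capacity=2, page_table={0: 5, 1: 2, 2: 9})
for page in [0, 1, 0, 0, 1, 0]:
    tlb.lookup(page)
print(tlb.hits, tlb.misses)  # 4 2
```

Because the access pattern keeps returning to the same two pages, only the first access to each page misses.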

Segmented paging

Segmentation and paging are not mutually exclusive. They can be combined, with paging applied on top of segments:

  1. First, the program is divided into several logically meaningful segments, using the segmentation mechanism described earlier
  2. Then, each segment is divided into multiple pages, that is, the contiguous space of each segment is cut into fixed-size pages

The virtual address consists of a segment number, a page number within the segment, and an offset within the page.

[Figure: segment number, page number within the segment, and page offset mapped to a physical address]

The steps are as follows:

  1. Use the segment number to find the entry in the segment table
  2. Get the page table address from the segment entry
  3. Find the segment's page table at that address
  4. Use the page number within the segment to find the entry in that page table
  5. Get the physical page base address from the page table entry
  6. Physical page base address + offset = physical memory address
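The six steps can be condensed into a short Python sketch. The table contents and the `segment_page_translate` name are made up for illustration, and a nested dictionary stands in for "segment entry, then page table address, then page table".

```python
PAGE_SIZE = 4096

# Hypothetical segment table: segment number -> that segment's page table.
# Each page table maps a page number within the segment to a physical frame.
SEGMENT_PAGE_TABLES = {
    0: {0: 8, 1: 3},   # for example, the code segment
    1: {0: 6},         # for example, the data segment
}


def segment_page_translate(segment_number, page_number, offset):
    page_table = SEGMENT_PAGE_TABLES[segment_number]  # steps 1-3: find page table
    frame_number = page_table[page_number]            # steps 4-5: find the frame
    return frame_number * PAGE_SIZE + offset          # step 6: base + offset


print(hex(segment_page_translate(0, 1, 0x10)))  # 0x3010
```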

Summary

Processes do not use physical memory directly; they go through virtual address mapping. The operating system allocates a virtual address space (a set of addresses) to each process, so that the physical memory used by each process is isolated and processes do not affect one another.

When many processes are running and memory runs short, the operating system uses memory swapping to write infrequently used memory to the hard disk (swap out) and load it back from the hard disk when it is needed (swap in).

The operating system can manage memory in three ways: segmentation, paging, and segmented paging. The advantage of segmentation is that each segment occupies contiguous physical memory, but the drawbacks are obvious: it easily causes memory fragmentation, and swapping is inefficient. Paging addresses these defects. It solves external fragmentation through contiguous virtual addresses, and each swap moves recently unused memory in page units, which keeps the amount of data small and improves swapping efficiency. The cost is the space taken by page tables. To address that, multi-level page tables and the TLB build on paging to reduce space usage and lookup time. The last approach, segmented paging, combines segmentation and paging.

On reflection, multi-level paging tackles space and time costs with a tree structure, lazy loading, and a cache. Virtual addresses decouple a process from physical memory addresses, so multiple processes can use physical memory without conflict and remain independent and isolated from each other.

About me

Official account: “program ape an star”. It focuses on the principles and source code behind technology and explains them with illustrations. High-quality original articles on operating systems, computer networks, Java, distributed systems, databases and more are shared here. Your follow is welcome.
