[operating system virtualization] memory paging



There is an inherent problem with segmented memory management: dividing space into variable-length segments fragments the free space, and allocating memory becomes more difficult over time.

It is therefore worth considering another approach: dividing space into fixed-size pieces. In virtual memory, we call this idea paging. Instead of splitting a process's address space into logical segments of different lengths, paging divides it into fixed-size units, each of which is called a page. Correspondingly, we view physical memory as an array of fixed-size slots called page frames; each page frame can hold one virtual-memory page.


The figure below shows a small 64-byte address space with four 16-byte pages (virtual pages 0, 1, 2, 3).

[Figure: a 64-byte address space divided into four 16-byte virtual pages]

Physical memory, as the figure shows, is also composed of a set of fixed-size slots. In this case there are eight page frames (a 128-byte physical memory, also tiny). As can be seen from the figure, the pages of the virtual address space are scattered across different locations in physical memory.

[Figure: a 128-byte physical memory with eight page frames, holding the virtual pages in scattered locations]

Paging has many advantages over our previous approaches. Perhaps the biggest improvement is flexibility: with a fully developed paging method, the operating system can efficiently support the abstraction of an address space, regardless of how a process uses it.

Another advantage is the simplicity of free-space management that paging provides. For example, to place our 64-byte address space into an 8-page physical memory, the operating system only needs to find 4 free pages. Perhaps it keeps a free list of all free pages and simply grabs the first four pages off that list.

To record where each virtual page of the address space is placed in physical memory, the operating system usually keeps a per-process data structure called the page table. The main job of the page table is to store the address translation for each virtual page of the address space, so that we know which location in physical memory each page occupies.

It’s important to remember that this page table is a data structure that every process has.

To translate a virtual address generated by the process, we must first split it into two components: the virtual page number (VPN) and the offset within the page. For this example, because the process's virtual address space is 64 bytes, a virtual address needs 6 bits in total. Since we know the page size (16 bytes), we can divide the virtual address as follows:

[Figure: a 6-bit virtual address split into a 2-bit VPN (top bits) and a 4-bit offset (low bits)]
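The split above can be sketched in a few lines of Python; the function name and constants are illustrative, chosen to match this example's geometry:

```python
# Splitting a virtual address for this example's geometry:
# a 64-byte address space (6-bit addresses) with 16-byte pages.
PAGE_SIZE = 16                      # bytes per page
OFFSET_BITS = 4                     # log2(16)
OFFSET_MASK = PAGE_SIZE - 1         # 0b1111

def split_virtual_address(va):
    """Return (VPN, offset) for a 6-bit virtual address."""
    vpn = va >> OFFSET_BITS         # top 2 bits select the virtual page
    offset = va & OFFSET_MASK       # low 4 bits select the byte within it
    return vpn, offset

# Virtual address 21 = 0b010101: VPN 1, offset 5.
print(split_virtual_address(21))    # (1, 5)
```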

When the process generates a virtual address, the operating system and hardware must combine to translate it into a meaningful physical address. Using the virtual page number, we index the page table and find the physical frame where the virtual page resides. We then translate the virtual address by replacing the VPN with the PFN; the offset stays the same, since it only tells us which byte within the page we want. Finally, the resulting physical address is issued to physical memory.

[Figure: address translation — the VPN is replaced by the PFN while the offset is unchanged]
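The full translation step can be sketched as follows; the page-table contents here (which frame each virtual page maps to) are made up purely for illustration:

```python
# A minimal sketch of the VPN -> PFN translation described above.
PAGE_SIZE = 16
OFFSET_BITS = 4

page_table = {0: 3, 1: 7, 2: 5, 3: 2}    # VPN -> PFN (hypothetical mapping)

def translate(va):
    vpn = va >> OFFSET_BITS
    offset = va & (PAGE_SIZE - 1)
    pfn = page_table[vpn]                 # look up the frame for this page
    return (pfn << OFFSET_BITS) | offset  # replace VPN with PFN; keep offset

# Virtual address 21 (VPN 1, offset 5) -> frame 7, physical address 117.
print(translate(21))   # 117
```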

The structure of the page table

A page table is just a data structure that maps virtual addresses (really, virtual page numbers) to physical addresses (physical frame numbers), so almost any data structure could work. The simplest form is called a linear page table, which is just an array: the operating system indexes the array by the virtual page number (VPN) and looks up the page-table entry (PTE) at that index to find the desired physical frame number (PFN). For now, let's assume this simple linear structure.

As for the contents of each PTE, it holds a number of different bits worth understanding. A valid bit is usually used to indicate whether a particular translation is valid. For example, when a program starts running, its code and heap sit at one end of the address space and its stack at the other; all the unused space in between is marked invalid. If the process tries to access such memory, it traps into the operating system, which will likely terminate it. The valid bit is therefore crucial for supporting a sparse address space: by simply marking all unused pages invalid, we no longer need to allocate physical frames for them, saving a great deal of memory.

A PTE may also have protection bits indicating whether the page can be read, written, or executed. Similarly, accessing a page in a way these bits do not allow traps into the operating system.

A present bit indicates whether the page is in physical memory or on disk (i.e., has been swapped out). Swapping lets the operating system free physical memory by moving rarely used pages to disk. A dirty bit is also common, indicating whether the page has been modified since it was brought into memory.

A reference bit (also known as an accessed bit) is sometimes used to track whether a page has been accessed; it helps determine which pages are popular and should therefore be kept in memory. Such knowledge is critical during page replacement.
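As a sketch, the flag bits described above might be packed into a single integer like this; the exact bit positions are hypothetical, since real hardware defines its own PTE layout:

```python
# Hypothetical PTE layout: flag bits in the low bits, PFN above them.
VALID   = 1 << 0   # translation is valid
PROT_W  = 1 << 1   # page is writable
PRESENT = 1 << 2   # page is in physical memory (not swapped out)
DIRTY   = 1 << 3   # page modified since brought into memory
REF     = 1 << 4   # page accessed recently (for replacement decisions)
PFN_SHIFT = 5      # frame number stored above the flag bits

def make_pte(pfn, flags):
    return (pfn << PFN_SHIFT) | flags

pte = make_pte(7, VALID | PROT_W | PRESENT)
print(pte >> PFN_SHIFT)      # 7  (the PFN)
print(bool(pte & VALID))     # True
print(bool(pte & DIRTY))     # False
```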

There are other important parts, but we won’t discuss them too much now.


Page tables live in memory, and we already know they can be too big. As it turns out, they can also slow things down.

To perform address translation, the hardware must know where the page table of the running process is located. Let us assume for now that a single page-table base register contains the physical address of the start of the page table. We then use the VPN to index into the array of PTEs pointed to by that register, fetch the PTE from memory, extract the PFN, and concatenate it with the offset from the virtual address to form the desired physical address.
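The PTE lookup described above is just arithmetic; here is a minimal sketch, with the base-register value and entry size chosen only for illustration:

```python
# Computing the address of a PTE from the page-table base register.
PTE_SIZE = 4          # bytes per page-table entry (illustrative)
OFFSET_BITS = 4       # 16-byte pages, as in the running example

def pte_address(ptbr, va):
    vpn = va >> OFFSET_BITS
    return ptbr + vpn * PTE_SIZE   # index into the array of PTEs

# VPN 1 with the table based at 0x1000 -> 0x1000 + 1*4 = 0x1004.
print(hex(pte_address(0x1000, 21)))   # 0x1004
```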

For every memory reference (whether an instruction fetch or an explicit load or store), paging requires one extra memory reference to fetch the translation from the page table first. Extra memory references are costly, and in this case may slow the process down by a factor of two or more.

Fast address translation (TLB)

Using paging as the core mechanism for virtual memory can bring high performance overhead. Paging chops the address space into a large number of fixed-size units (pages), and the mapping information for those units must be recorded. Because this mapping information is usually stored in physical memory, translating a virtual address logically requires an extra memory access. Going to memory one additional time for every instruction fetch, explicit load, or store is unacceptably slow.

Therefore, we add what is called a translation-lookaside buffer (TLB): a hardware cache of frequently used virtual-to-physical address translations. On each memory access, the hardware first checks the TLB for the desired translation; if it is there, translation completes without consulting the page table.

The basic algorithm of TLB

The algorithm by which hardware handles virtual-address translation works as follows. First, extract the virtual page number from the virtual address and check whether the TLB holds a translation for that VPN. If it does, we have a TLB hit, meaning the TLB holds the mapping for this page. We can then extract the page frame number from the matching TLB entry, combine it with the offset from the original virtual address to form the desired physical address, and access memory (assuming protection checks do not fail).

If the CPU does not find the translation in the TLB (a TLB miss), the hardware walks the page table to find the translation. Assuming the virtual address is valid and we have the relevant access rights, the TLB is updated with this translation. These steps are costly, mainly because of the extra memory reference needed to access the page table. Finally, once the TLB has been updated, the hardware retries the instruction; this time the translation is found in the TLB, so the memory reference is processed quickly.
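The hit/miss control flow above can be sketched as follows, with the TLB and page table simplified to dictionaries and the mappings invented for illustration:

```python
# A toy model of the TLB algorithm: check the TLB first; on a miss,
# walk the page table, refill the TLB, and proceed.
PAGE_SIZE = 16
OFFSET_BITS = 4

page_table = {0: 3, 1: 7, 2: 5, 3: 2}    # VPN -> PFN (hypothetical)
tlb = {}                                  # VPN -> PFN cache
stats = {"hits": 0, "misses": 0}

def access(va):
    vpn = va >> OFFSET_BITS
    offset = va & (PAGE_SIZE - 1)
    if vpn in tlb:                        # TLB hit: no page-table access
        stats["hits"] += 1
        pfn = tlb[vpn]
    else:                                 # TLB miss: extra memory reference
        stats["misses"] += 1
        pfn = page_table[vpn]             # walk the page table
        tlb[vpn] = pfn                    # refill the TLB, then "retry"
    return (pfn << OFFSET_BITS) | offset

access(21); access(22); access(37)        # VPNs 1, 1, 2
print(stats)                              # {'hits': 1, 'misses': 2}
```

The second access to virtual page 1 hits in the TLB, which is exactly the locality effect the next paragraph describes.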

Due to the principle of locality, the access hit rate of TLB is generally high.

Content of TLB

A typical TLB has 32, 64, or 128 entries and is fully associative, which means a given translation can sit anywhere in the TLB, and the hardware searches all entries in parallel to find the desired one. A TLB entry might look like this:

VPN | PFN | other bits

Among the other bits, the TLB usually has a valid bit indicating whether the entry holds a valid translation, and protection bits determining how the page may be accessed. For example, code pages might be marked readable and executable, while heap pages are marked readable and writable. There may be other fields too, including an address-space identifier, a dirty bit, and so on.

TLB processing in context switching

With a TLB, process switching raises some new problems. Specifically, the virtual-to-physical translations held in the TLB are only valid for the currently running process and are meaningless for other processes. Therefore, on a process switch, the hardware or operating system must take care that the about-to-run process does not accidentally use translations left behind by the previous one.

There are several possible solutions to this problem. One is simply to flush the TLB on each context switch, emptying it before the new process runs. On a software-managed TLB, this can be done with an explicit (privileged) instruction at context-switch time. On a hardware-managed TLB, the flush can be triggered whenever the page-table base register changes (which the operating system must do on a context switch anyway). In either case, the flush sets all valid bits to 0, essentially clearing the TLB.

Flushing the TLB on each context switch is a workable solution: processes never read the wrong address mappings. However, there is a cost: each time a process runs, it incurs TLB misses as it touches its data and code pages. If the operating system switches processes frequently, this cost is high.

To reduce this overhead, some systems add hardware support for sharing the TLB across context switches. For example, some add an address-space identifier (ASID) field to each TLB entry. You can think of the ASID as a process identifier (PID), but it usually has fewer bits (a PID is typically 32 bits, an ASID typically 8). Of course, the hardware also needs to know which process is currently running in order to translate addresses, so on a context switch the operating system must set a privileged register to the ASID of the current process.

Smaller page tables

Now let’s tackle the second problem paging introduces: page tables are too big and thus consume too much memory. Start with a linear page table and assume a 32-bit address space, 4KB pages, and 4-byte page-table entries. An address space then has roughly one million virtual pages (2^32 / 2^12 = 2^20); multiplying by the size of a page-table entry gives a 4MB page table. With 100 active processes, the operating system would need hundreds of megabytes of memory just for page tables! We therefore need some techniques to lighten this heavy load.
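The arithmetic from this paragraph, spelled out:

```python
# Size of one linear page table: 32-bit address space, 4KB pages,
# 4-byte page-table entries.
address_bits = 32
page_size = 4 * 1024
pte_size = 4

num_pages = 2**address_bits // page_size       # 2^20, about one million
table_bytes = num_pages * pte_size
print(table_bytes // (1024 * 1024))            # 4   (MB per page table)
print(100 * table_bytes // (1024 * 1024))      # 400 (MB for 100 processes)
```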

Simple solution: bigger pages

There is a simple way to shrink the page table: use bigger pages. With 16KB pages, for example, the page table has a quarter as many entries, and its total size shrinks to a quarter, 1MB.

The main problem with this approach, however, is that big pages lead to waste within each page, known as internal fragmentation. Applications end up allocating pages but using only a small portion of each, and memory quickly fills with these mostly unused, oversized pages. Hence, most systems use relatively small page sizes in the common case.

Multi level page table

The basic idea behind a multi-level page table is simple. First, chop the page table into page-sized units. Then, if an entire page of page-table entries is invalid, don't allocate that page of the page table at all. To track whether a page of the page table is valid (and, if so, where it is in memory), use a new structure called the page directory. The page directory thus either tells you where a page of the page table is, or that that entire page of the page table contains no valid entries.

The figure below shows an example. On the left is a classic linear page table: even though most of the middle of the address space is invalid, we must still allocate page-table space for those regions. On the right is a multi-level page table: the page directory marks only two pages of the page table as valid, so only those two pages reside in memory.

[Figure: a linear page table (left) versus a multi-level page table with a page directory (right)]

In a simple two-level page table, the page directory contains one entry per page of the page table. It consists of multiple page directory entries (PDEs). A PDE has a valid bit and a page frame number (PFN), much like a PTE, but its valid bit means something slightly different: if a PDE is valid, then at least one PTE on the page it points to (via its PFN) is valid, i.e., at least one PTE on that page has its valid bit set to 1. If a PDE is invalid, the rest of the PDE is undefined.

Compared with the approaches we have seen so far, multi-level page tables have some obvious advantages. First, the page-table space a multi-level table allocates is proportional to the amount of address space you are actually using; it is therefore generally compact and supports sparse address spaces.

Second, if carefully constructed, each part of the page table can be neatly placed on a page, making it easier to manage memory. The operating system can simply get the next free page when it needs to allocate or grow the page table. With the multi-level structure, we add a level of indirection and use the page directory, which points to each part of the page table. This indirect way allows us to put the page table page anywhere in the physical memory.

However, there is a cost to multi-level page tables. In the case of TLB miss, it needs to load twice from memory to get the correct address translation information from the page table (one for the page directory and the other for the PTE itself), while the linear page table only needs to load once. Therefore, the multi-level page table is a typical example of time space trade-off.

Another obvious drawback is complexity. Whether it is hardware or operating system to deal with page table lookup (in the case of TLB miss), this is undoubtedly more complex than simple linear page table lookup. Usually we are willing to add complexity to improve performance or reduce overhead.

When translating a virtual address with a multi-level page table, the VPN is split into a page-directory index (PDIndex) and a page-table index (PTIndex). Once the PDIndex is extracted from the VPN, the address of the page-directory entry follows from simple arithmetic: PDEAddr = PageDirBase + (PDIndex * sizeof(PDE)). If the PDE is marked invalid, we know the access is invalid and an exception is raised. If the PDE is valid, we must fetch the page-table entry from the page of the page table the PDE points to; to find that PTE, we use the remaining bits of the VPN (the PTIndex) to index into that page. Once the PTE is found, its PFN is combined with the offset to form the final physical address.

[Figure: address translation with a two-level page table — the VPN is split into PDIndex and PTIndex]
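The two-level walk described above can be sketched like this; the memory layout, index widths, and addresses are all invented for illustration (here the VPN is 8 bits: the top 4 index the page directory, the low 4 the page table):

```python
# A toy two-level page-table walk. "memory" maps an address to a
# (value, valid) pair standing in for a PDE or PTE.
PDE_SIZE = PTE_SIZE = 4
OFFSET_BITS = 4
PT_INDEX_BITS = 4

memory = {
    0x1000 + 2 * PDE_SIZE: (0x2000, True),   # PDE 2 -> page table at 0x2000
    0x2000 + 5 * PTE_SIZE: (9, True),        # PTE 5 -> PFN 9
}
page_dir_base = 0x1000

def walk(va):
    vpn = va >> OFFSET_BITS
    pd_index = vpn >> PT_INDEX_BITS              # top bits of the VPN
    pt_index = vpn & ((1 << PT_INDEX_BITS) - 1)  # remaining bits
    pde_addr = page_dir_base + pd_index * PDE_SIZE
    pt_base, pde_valid = memory[pde_addr]        # first extra memory access
    assert pde_valid, "invalid PDE: fault"
    pte_addr = pt_base + pt_index * PTE_SIZE
    pfn, pte_valid = memory[pte_addr]            # second extra memory access
    assert pte_valid, "invalid PTE: fault"
    return (pfn << OFFSET_BITS) | (va & ((1 << OFFSET_BITS) - 1))

# VPN 0x25 (PDIndex 2, PTIndex 5), offset 3 -> PFN 9, physical 0x93.
print(hex(walk(0x253)))   # 0x93
```

Note the two loads from memory, matching the cost discussed above: one for the PDE, one for the PTE.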

In our example, assume that a multi-level page table has only two levels: a page directory and several page tables. In some cases, deeper trees are possible (and indeed needed).

Also remember that before any complex multi-level page-table access occurs, the hardware checks the TLB first. On a hit, the physical address is formed directly without accessing the page table at all. Only on a TLB miss does the hardware perform the full multi-level lookup; in that case, it needs two additional memory accesses to find a valid translation.