UCORE operating system experiment notes – lab1

Time:2020-1-30

Recently, I have been following the operating system course of Tsinghua University. The biggest feature of this course is a series of hands-on operating system experiments, 8 in total. Here I record some experience and a summary of each experiment.

Task1

The main purpose of this task is to become familiar with the Makefile and with how the image file of the operating system is generated. The Makefile can be used without a deep understanding of it.

Task2

The main purpose of this task is to be familiar with GDB and the startup process of the operating system. Here are some procedures for debugging BIOS.

First, modify gdbinit to:

set architecture i8086
target remote :1234
define hook-stop
x/i $pc
end

Then input

make debug

By entering

p/x $cs
p/x $eip

we can get the current values of $cs and $eip:

$cs = 0xf000
$eip = 0xfff0

In real mode, this corresponds to the address

($cs << 4) + $eip = 0xffff0

We can also look at the instruction at this address:

x/2i 0xffff0

The result is

0xffff0:     ljmp   $0xf000,$0xe05b

That is to say, the starting address of the BIOS code should be

($cs << 4) + 0xe05b = 0xfe05b

At this point, we set a breakpoint at 0x7c00 (note that for an absolute address, the address must be prefixed with *):

b *0x7c00

Then, when the program continues, it stops at address 0x7c00. The bootloader is stored here.

Task3

This task is the most important one in this lab. Through it, we can learn how to turn on A20, how to switch the CPU from real mode to protected mode, and how to initialize and use the GDT.

How to turn A20 on / off

Memory access in real mode

Before turning on A20, let’s talk about how the CPU accesses memory space in the i8086.

In the i8086 era, the CPU's data bus was 16 bits wide, the address bus was 20 bits wide, and the registers were 16 bits wide, so the CPU could only address 1MB of memory. Because a 16-bit register cannot hold a 20-bit address, the CPU combines two 16-bit values: it shifts a segment value left by 4 bits and adds an offset (so each segment spans 64KB). The result is the address used to access memory in real mode:

address = (segment << 4) + offset

A 20-bit address can reach 1MB of memory (0x00000 to 2^20 - 1 = 0xfffff). The formula above, however, can produce addresses from 0x00000 up to 0xffff0 + 0xffff = 0x10ffef, which is slightly more than 1MB. On the i8086, which has only 20 address lines, the carry into bit 20 is lost, so any address past 0xfffff wraps around to 0x00000.
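The address calculation and the wrap-around can be checked with a few lines of C. This is only a sketch for experimenting on a normal host; the function names `real_mode_linear` and `i8086_physical` are made up for illustration and do not come from UCORE.

```c
#include <stdint.h>

/* Address formed from a real-mode segment:offset pair.
 * The sum can need 21 bits (it goes up to 0x10ffef). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* What a CPU with only 20 address lines puts on the bus:
 * bit 20 is dropped, so addresses past 0xfffff wrap to low memory. */
uint32_t i8086_physical(uint16_t segment, uint16_t offset) {
    return real_mode_linear(segment, offset) & 0xfffffu;
}
```

For example, `real_mode_linear(0xf000, 0xfff0)` gives 0xffff0, the address seen at reset in Task2, while `i8086_physical(0xffff, 0xffff)` gives 0x0ffef instead of 0x10ffef.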

This wrap-around causes no problem on the i8086 (it can only address 1MB anyway). From the i80286 / i80386 on, however, the CPU has a wider address bus, data bus and registers, so in real mode the addresses above 0xfffff no longer wrap around and really reach memory beyond 1MB. Old programs that relied on the wrap-around would break, so in real mode we still want accesses to stay within 1MB. To solve this problem, a gate that controls the A20 address line was added. Through this gate, address line A20 (bit 20 of the address) is forced to 0 in real mode, so that the CPU cannot access more than 1MB of space. After entering protected mode, we use the same gate to release the A20 address line, so that memory above 1MB becomes accessible.
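The effect of the A20 gate on an address can be modelled as a single bit mask. The sketch below is a host-side model only, not the real hardware interface; `apply_a20_gate` is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/* Address seen by memory after the A20 gate.
 * With the gate disabled, address line 20 is forced to 0. */
uint32_t apply_a20_gate(uint32_t address, bool a20_enabled) {
    const uint32_t A20_BIT = 1u << 20;
    return a20_enabled ? address : (address & ~A20_BIT);
}
```

With the gate disabled, 0x10ffef becomes 0x00ffef, which reproduces the old wrap-around. It also shows why A20 must be enabled for protected mode: with the gate disabled, 0x300000 would land on 0x200000, so every odd megabyte would be unreachable.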

A20 on / off process

Current machines control the A20 address line through the 8042 keyboard controller. By default, the A20 address line is off (address bit 20 is forced to 0), so before entering protected mode (which requires access to more than 1MB of memory), we need to turn on the A20 address line (so that address bit 20 can be 0 or 1). Please refer to the bootasm.S file for the code that turns on A20.

How to change CPU from real mode to protected mode

This is very simple: we turn on the A20 address line and set the PE bit (bit 0) of CR0 (control register 0) to 1. Please refer to the bootasm.S file for the specific code.

How to initialize and use GDT table

GDT detailed explanation

Before using the GDT, we need to know what it is. GDT stands for Global Descriptor Table. In protected mode, we divide the memory space into segments (which can overlap) by setting up the GDT, so that different programs can access different memory spaces.
This is different from addressing in real mode, where the only way to form an address is

address = (segment << 4) + offset

This is also a segment plus an offset, but real mode does not actually partition memory: any program can access the whole 1MB space. In protected mode, segmentation can prevent a program from accessing the whole memory space. Here is a quotation from the UCORE experiment report:

[Supplement] In protected mode, there are two segment tables: the GDT (Global Descriptor Table) and the LDT (Local Descriptor Table). Each segment table can contain 8192 (2^13) descriptors [1], so there can be at most 2 * 2^13 = 2^14 segments at the same time. Although protected mode allows this many segments, and the logical address space therefore looks very large, segments do not extend the physical address space; to a large extent, the address spaces of the segments overlap. The so-called 64 TB (2^(14+32) = 2^46 bytes) logical address space is a theoretical value with no practical significance. In 32-bit protected mode, the real physical space is still only 2^32 bytes. Note: only the GDT is used in the UCORE lab, not the LDT.

Reference: [1] 3.5.1 Segment Descriptor Tables, Intel® 64 and IA-32 Architectures Software Developer’s Manual

In addition to the GDT, we need to know two other terms: segment descriptor and segment selector. A segment descriptor is an entry in the GDT; a segment selector is the index used to access the GDT.

Segment selector

In protected mode, a logical address consists of a segment selector and an offset. The segment selector is 16 bits wide and the offset is 32 bits wide. The segment selector contains the following fields:

  1. In the segment selector, index [15:3] is the index of GDT.

  2. TI [2:2] is used to select the type of table. 1 is LDT and 0 is GDT.

  3. RPL [1:0] is used to select the privilege level of the requester, with 00 being the highest and 11 being the lowest.
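The three fields can be pulled out of a selector with shifts and masks. The following is a small sketch; the struct and function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

struct selector_fields {
    uint16_t index; /* bits 15:3, index into the descriptor table */
    uint8_t  ti;    /* bit 2, 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1:0, requested privilege level */
};

/* Split a 16-bit segment selector into its three fields. */
struct selector_fields decode_selector(uint16_t selector) {
    struct selector_fields f;
    f.index = selector >> 3;
    f.ti    = (selector >> 2) & 0x1;
    f.rpl   = selector & 0x3;
    return f;
}
```

Decoding 0x8 gives index 1, TI 0 and RPL 0, and 0x10 gives index 2. These are the two selector values that the bootloader code later in this article loads into the segment registers.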

Segment descriptor

The format of a segment descriptor is quite complex (in order to stay compatible with older CPUs), so please refer to the manual for the full layout. The most important fields used here are the segment base and the segment limit.
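To see where the base and the limit end up inside the 8 bytes, the sketch below packs a descriptor in the same byte order as the SEG_ASM macro shown later in this article and then reads the two fields back. It is an illustration, not UCORE code: the function names are made up, and the type value 0xA used in the example assumes that STA_X is 0x8 and STA_R is 0x2.

```c
#include <stdint.h>

/* Pack a 4KB-granular, 32-bit segment the way SEG_ASM does.
 * limit is given in bytes; the descriptor stores limit >> 12. */
uint64_t make_segment_descriptor(uint8_t type, uint32_t base, uint32_t limit) {
    uint64_t d = 0;
    d |= (uint64_t)((limit >> 12) & 0xffff);               /* limit 15:0 */
    d |= (uint64_t)(base & 0xffff) << 16;                  /* base 15:0 */
    d |= (uint64_t)((base >> 16) & 0xff) << 32;            /* base 23:16 */
    d |= (uint64_t)(0x90 | type) << 40;                    /* present, code/data, type */
    d |= (uint64_t)(0xC0 | ((limit >> 28) & 0xf)) << 48;   /* 4KB units, 32-bit, limit 19:16 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;            /* base 31:24 */
    return d;
}

/* Reassemble the 32-bit base from its three pieces. */
uint32_t descriptor_base(uint64_t d) {
    return (uint32_t)((d >> 16) & 0xffffff) | (uint32_t)((d >> 56) & 0xff) << 24;
}

/* Reassemble the 20-bit limit field (still in 4KB units here). */
uint32_t descriptor_limit_field(uint64_t d) {
    return (uint32_t)(d & 0xffff) | (uint32_t)((d >> 48) & 0xf) << 16;
}
```

Under these assumptions, a flat code segment (`type` 0xA, base 0, limit 0xffffffff) packs to 0x00CF9A000000FFFF, and its limit field reads back as 0xfffff pages, that is, the whole 4GB space.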

GDT access

With the above knowledge, we can see how to get the address to be accessed through the GDT:

  1. Split the logical address given by the CPU into a segment selector and an offset.

  2. Use the segment selector to select a segment descriptor.

  3. Add the base address in the segment descriptor to the offset of the logical address to get the linear address. This is the address we need.
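The three steps above can be sketched in C as follows. This is a simplified model under stated assumptions: the descriptor is reduced to base and limit, the limit is taken to be the highest valid offset, the TI bit is ignored because only the GDT is used in this lab, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified descriptor: only the two fields the text singles out. */
struct segment {
    uint32_t base;
    uint32_t limit; /* highest valid offset within the segment */
};

/* Translate selector:offset to a linear address using a GDT.
 * Returns false for the null selector, an out-of-range index,
 * or an offset beyond the segment limit. */
bool logical_to_linear(const struct segment *gdt, unsigned gdt_entries,
                       uint16_t selector, uint32_t offset, uint32_t *linear) {
    unsigned index = selector >> 3;              /* steps 1 and 2: pick the descriptor */
    if (index == 0 || index >= gdt_entries) return false;
    if (offset > gdt[index].limit) return false;
    *linear = gdt[index].base + offset;          /* step 3: base + offset */
    return true;
}
```

With a flat segment (base 0, limit 0xffffffff) at index 1, selector 0x08 and offset 0x7c00 translate to linear address 0x7c00, so the linear address simply equals the offset. This is the setup lab1 uses.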

Initialization and use of GDT

Because protected mode uses segmented memory, we need to initialize the GDT before entering protected mode. The code below shows how to initialize and use the GDT.

Here is the GDT initialization code:

#define SEG_NULLASM                                             \
    .word 0, 0;                                                 \
    .byte 0, 0, 0, 0

#define SEG_ASM(type,base,lim)                                  \
    .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);          \
    .byte (((base) >> 16) & 0xff), (0x90 | (type)),             \
        (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

gdt:
    /* There is a special selector called the null selector, whose index = 0 and TI = 0;
       its RPL field can be any value. The null selector has a specific purpose: using it
       for a memory access causes an exception. Because the null selector is specially
       defined and does not correspond to descriptor 0 in the GDT, descriptor 0 is never
       accessed by the processor, so it is set to zero. */
    SEG_NULLASM                                     # null seg
    
    /*In lab1, both code segment and data segment can access the whole memory space*/
    SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)           # code seg for bootloader and kernel
    SEG_ASM(STA_W, 0x0, 0xffffffff)                 # data seg for bootloader and kernel

gdtdesc:
    /* lgdt expects the size of the GDT first, followed by the address of the GDT */
    .word 0x17                                      # sizeof(gdt) - 1
    .long gdt                                       # address gdt

In theory, the GDT can live anywhere in memory, but here we initialize it in real mode, so it has to sit in the lowest 1MB of memory. The lgdt instruction gives the CPU the size and address of the GDT, after which we can use it.

.set PROT_MODE_CSEG,        0x8   
.set PROT_MODE_DSEG,        0x10

/*Load GDT*/
lgdt gdtdesc

/*Switch from real mode to protected mode*/
movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0

# ljmp <imm1>, <imm2>
# %cs ← imm1
# %ip ← imm2
/* Set the value of %cs (code segment) to 0x8 */
ljmp $PROT_MODE_CSEG, $protcseg

...

protcseg:
    # Set up the protected-mode data segment registers
    /*Set the value of data segment*/
    movw $PROT_MODE_DSEG, %ax                       # Our data segment selector
    movw %ax, %ds                                   # -> DS: Data Segment
    movw %ax, %es                                   # -> ES: Extra Segment
    movw %ax, %fs                                   # -> FS
    movw %ax, %gs                                   # -> GS
    movw %ax, %ss                                   # -> SS: Stack Segment

Task4

Through this task, we can understand how the OS loads an ELF image file. I did not study the ELF file format or how it is used.

Task5

This task is to let us understand the relationship between function calls and the stack. I have covered the details of function calls in a previous article; please refer to "c function call process principle and function stack frame analysis". Here I mainly analyze the code, which is in the kern/debug/kdebug.c file.

/*
 * Stack layout (the stack grows toward low addresses):
 *
 * high address (stack bottom)
 *   ...
 *   ...
 *   Parameter 3
 *   Parameter 2
 *   Parameter 1
 *   Return address
 *   Previous [EBP]   <------- [ESP / current EBP]
 *   Local variables
 * low address
 */
void
print_stackframe(void) {
    uint32_t cur_ebp, cur_eip; 
    uint32_t args[4]; 
    cur_ebp = read_ebp();
    cur_eip = read_eip();
    
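    /* Suppose there are at most 20 layers of function calls; stop early once the
       saved ebp is 0 (this assumes the boot code clears ebp before the first call) */
    for (int stack_level = 0; cur_ebp != 0 && stack_level < STACKFRAME_DEPTH; stack_level++) {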
        cprintf("ebp: 0x%08x eip: 0x%08x ", cur_ebp, cur_eip);
        
        /*Suppose the function has at most four parameters*/
        for (int arg_num = 0; arg_num < 4; arg_num++)
            args[arg_num] = *((uint32_t *)cur_ebp + (2 + arg_num));
        cprintf("args:0x%08x 0x%08x 0x%08x 0x%08x\n", args[0], args[1], args[2], args[3]);
        print_debuginfo(cur_eip);
        
        /* Get the return address and the saved $ebp of the calling function */
        cur_eip = *((uint32_t *)cur_ebp + 1); 
        cur_ebp = *((uint32_t *)cur_ebp);  
    }
}
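The same walk can be tried on an ordinary host by simulating the stack with an array. In the sketch below, an "ebp" is an index into the array rather than a real address, which keeps the example portable; it also assumes that the outermost frame stores 0 as its saved ebp. The name `walk_frames` is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Walk a chain of saved-EBP links in a simulated stack.
 * stack[] is indexed in 32-bit words; an "ebp" value is an index into it.
 * Layout per frame: stack[ebp] = caller's ebp, stack[ebp + 1] = return address.
 * Writes up to max_depth return addresses into out and returns how many. */
size_t walk_frames(const uint32_t *stack, uint32_t ebp,
                   uint32_t *out, size_t max_depth) {
    size_t depth = 0;
    while (ebp != 0 && depth < max_depth) {
        out[depth++] = stack[ebp + 1]; /* return address into the caller */
        ebp = stack[ebp];              /* follow the link to the caller's frame */
    }
    return depth;
}
```

Each iteration does what the loop body of print_stackframe does with `cur_ebp + 1` and `cur_ebp`: read the return address first, then move to the caller's frame.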

Task6

The main purpose of this task is to familiarize us with interrupts in protected mode. In the x86 architecture, there are three types of interrupts:

  1. Those unrelated to what the CPU is executing, such as requests from peripherals, are called interrupts.

  2. Those caused by what the CPU is executing, such as division by 0 or a page fault, are called exceptions.

  3. System calls, which are called traps.

Interrupt mechanism

When the CPU receives an interrupt (delivered through the 8259A) or an exception, it pauses the current program or task, jumps through a certain mechanism to the routine responsible for handling this event, and, after the event has been handled, jumps back to the program or task that was interrupted.

Interrupt vector and interrupt service routine

In the x86 architecture, the system supports up to 256 different interrupts, each identified by an interrupt vector. Each interrupt vector has a corresponding interrupt service routine, which handles that interrupt.

IDT

The IDT (Interrupt Descriptor Table) links interrupt vectors to interrupt service routines: given an interrupt vector, we can find and run the corresponding interrupt service routine. The IDT is similar to the GDT, and each descriptor is 8 bytes, but unlike the GDT, the first entry of the IDT can hold a valid descriptor. Interrupt descriptors in the IDT can be divided into three types:

  1. Task Gate

  2. Interrupt Gate

  3. Trap Gate

In this lab, we use the latter two kinds of interrupt descriptor.


Interrupt gate is similar to trap gate, but there are some slight differences. I directly quote the teacher’s instructions:

[Supplement] The so-called "automatic disabling" means that when the CPU jumps to the address in an interrupt gate, it saves EFLAGS on the stack and then clears the IF bit in EFLAGS, to avoid triggering the interrupt repeatedly. Inside the interrupt handling routine, the operating system can set IF in EFLAGS again to allow nested interrupts, but before doing so it must make the necessary preparations for handling nested interrupts, such as saving the necessary registers. In UCORE, the purpose of the trap gate is to implement system calls. A user process cannot disable interrupts during normal execution. When it makes a system call, it passes through the trap gate from the user state (ring 3) into the OS kernel in the kernel state (ring 0). Clearing the IF bit in EFLAGS on arrival in the OS kernel would, first, be meaningless (because system calls do not nest) and, second, keep some interrupts from being handled in time. Therefore, the CPU does not disable interrupts when entering through a trap gate. In short, there is no priority between interrupt gates and trap gates; they are only different ways for the CPU to handle an interrupt, and the operating system chooses between them as needed.

According to the actual needs, we set up the corresponding IDT. After that, we need to tell the CPU where the IDT is. To achieve this, we use the special instruction lidt, which loads the starting address and size of the IDT into the IDTR register. The CPU then locates the IDT through this register.
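The operand that lidt reads from memory is 6 bytes long in 32-bit mode: a 16-bit limit (the size of the table in bytes minus 1) followed by a 32-bit base address. The sketch below builds these 6 bytes by hand so the layout is visible; `make_idtr_image` is a hypothetical helper, and the base address in the example is arbitrary.

```c
#include <stdint.h>

/* Build the 6-byte operand of lidt: 16-bit limit, then 32-bit base,
 * both stored little-endian. Each gate descriptor is 8 bytes. */
void make_idtr_image(uint32_t idt_base, unsigned num_gates, uint8_t image[6]) {
    uint16_t limit = (uint16_t)(num_gates * 8 - 1); /* size in bytes minus 1 */
    image[0] = (uint8_t)(limit & 0xff);
    image[1] = (uint8_t)(limit >> 8);
    image[2] = (uint8_t)(idt_base & 0xff);
    image[3] = (uint8_t)((idt_base >> 8) & 0xff);
    image[4] = (uint8_t)((idt_base >> 16) & 0xff);
    image[5] = (uint8_t)(idt_base >> 24);
}
```

For 256 gates the limit is 256 * 8 - 1 = 0x7ff. The operand of lgdt has the same shape, which is where the 0x17 (3 * 8 - 1) in gdtdesc comes from.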

Interrupt instance

Here, I use the code of the task to show how to establish the IDT and how to access the corresponding interrupt service routine through the interrupt vector.

Set up interrupt vector table

In this lab, the interrupt vector table is "__vectors"; each of its entries stores the address of the entry code of one interrupt vector. The interrupt service routine is called in __alltraps. In addition to calling interrupt service routines, __alltraps also does work such as saving the interrupted context.

# kern/trap/vectors.S
.globl vector0
vector0:
  pushl $0
  pushl $0
  jmp __alltraps
  ...
.globl vector255
vector255:
  pushl $0
  pushl $255
  jmp __alltraps

# vector table
.data
.globl __vectors
__vectors:
  .long vector0
  .long vector1
  .long vector2
  .long vector3
  ...
  .long vector255
  
# kern/trap/trapentry.S
.globl __alltraps
__alltraps:
    ...
    # push %esp to pass a pointer to the trapframe as an argument to trap()
    # I would like to add that before "jmp __alltraps", %esp points to the last value pushed,
    # that is, the interrupt number (such as pushl $255). The elided code above then saves
    # the interrupted context on top of it, so "pushl %esp" here pushes the address of
    # everything saved so far (the trapframe) as the parameter of trap()
    pushl %esp

    # call trap(tf), where tf=%esp
    call trap

Building IDT

In this lab, the first 32 interrupt vectors and T_SYSCALL use trap gates; the remaining interrupt vectors use interrupt gates.

void
idt_init(void) {
    extern uintptr_t __vectors[]; 
    
    for (int i = 0; i < 256; i++) {
        if (i < IRQ_OFFSET) { 
            SETGATE(idt[i], 1, GD_KTEXT, __vectors[i], DPL_KERNEL); 
        } else if (i == T_SYSCALL) { 
            SETGATE(idt[i], 1, GD_KTEXT, __vectors[i], DPL_USER);
        } else { 
            SETGATE(idt[i], 0, GD_KTEXT, __vectors[i], DPL_KERNEL);
        }
    }

    lidt(&idt_pd);
}
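To make the 8 bytes written by each SETGATE call more concrete, here is a hedged sketch that packs a gate descriptor by hand following the 32-bit gate layout in the Intel manual. It is not the UCORE macro: `make_gate` and `gate_offset` are hypothetical names, and the selector 0x08 in the example assumes that GD_KTEXT refers to GDT entry 1.

```c
#include <stdint.h>

#define GATE_TYPE_INTERRUPT 0xE /* 32-bit interrupt gate */
#define GATE_TYPE_TRAP      0xF /* 32-bit trap gate */

/* Pack a present 32-bit interrupt or trap gate into 8 bytes. */
uint64_t make_gate(int is_trap, uint16_t selector, uint32_t offset, unsigned dpl) {
    uint64_t g = 0;
    g |= (uint64_t)(offset & 0xffff);              /* offset 15:0 */
    g |= (uint64_t)selector << 16;                 /* code segment selector */
    g |= (uint64_t)(is_trap ? GATE_TYPE_TRAP : GATE_TYPE_INTERRUPT) << 40;
    g |= (uint64_t)(dpl & 0x3) << 45;              /* descriptor privilege level */
    g |= (uint64_t)1 << 47;                        /* present bit */
    g |= (uint64_t)(offset >> 16) << 48;           /* offset 31:16 */
    return g;
}

/* Reassemble the 32-bit handler offset from its two halves. */
uint32_t gate_offset(uint64_t g) {
    return (uint32_t)(g & 0xffff) | (uint32_t)(g >> 48) << 16;
}
```

Under these assumptions, an interrupt gate with DPL 0 for a handler at offset 0x00101234 packs to 0x00108E0000081234, while a trap gate with DPL 3 differs only in the access byte (0xEF instead of 0x8E). The only difference between the two gate kinds in the descriptor itself is one type bit.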

Interrupt processing flow

The interrupt processing flow, in simplified form, is as follows:

  1. When the system receives the interrupt, it will generate an interrupt vector according to the interrupt type.

  2. Use this interrupt vector as an index to find the corresponding interrupt descriptor in the IDT.

  3. Use the segment selector in the interrupt descriptor to find the corresponding segment in GDT.

  4. Add the base address of the segment found in step 3 to the offset in the interrupt descriptor (that is, the address stored for this vector in the interrupt vector table) to get the address of the interrupt service routine.

  5. Call this interrupt service routine.
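Steps 2 to 4 of this flow can be sketched as a lookup through two tables. This is a simplified model, not the hardware algorithm: privilege checks and stack switching are left out, the descriptors are reduced to the fields the steps mention, the IDT is assumed to have 256 entries, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

struct gate    { uint16_t selector; uint32_t offset; bool present; };
struct segment { uint32_t base; };

/* Resolve an interrupt vector to the address of its service routine.
 * idt must have 256 entries. Returns false if the gate is not present
 * or its selector does not name a usable GDT entry. */
bool resolve_handler(const struct gate *idt, const struct segment *gdt,
                     unsigned gdt_entries, uint8_t vector, uint32_t *handler) {
    const struct gate *g = &idt[vector];       /* step 2: index the IDT */
    unsigned index = g->selector >> 3;         /* step 3: selector -> GDT index */
    if (!g->present || index == 0 || index >= gdt_entries) return false;
    *handler = gdt[index].base + g->offset;    /* step 4: base + offset */
    return true;
}
```

Because the kernel code segment in lab1 has base 0, the resolved address is simply the offset stored in the gate, that is, the address taken from the interrupt vector table.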

For the detailed interrupt handling process, please refer to interrupt and exception and the implementation of interrupt handling in lab1.
