Write OS kernel from scratch – simple file system


Preparation

In the previous articles, we established the framework for processes and system calls and implemented the first system call, fork. So far, all processes and their threads have been created manually in the kernel, and each thread runs a fixed function prepared in advance purely for testing. A real OS, of course, needs to be able to load user-provided programs and run them in a process, which is the job of the second system call we want to implement, exec.

Before that, however, there is some preparatory work to do. To load a user program, we have to load it from disk. At present, our kernel cannot interact with the disk at all, and it has no file system of any kind. This article implements a very simple one.

File system

The term file system is often ambiguous and means different things in different contexts, which tends to confuse beginners. For example, we often hear about the FAT and NTFS file systems on Windows and the EXT file systems on Linux, and sometimes about the Virtual File System (VFS) of Linux.

There is a saying in the computer world that any technical problem can be solved by adding an intermediate layer, and the Linux file system architecture embodies this philosophy perfectly. All the terms above simply belong to different layers under the broad concept of a file system.

[Figure: the three layers, from top to bottom: virtual file system, storage file system, hardware driver layer]

Let’s look at the specific responsibilities of these three layers.

Virtual File System

Working from the top down, the topmost layer is the Virtual File System, an abstract file system built by the Linux kernel. It corresponds roughly to the files and directories we usually see on the system:

bash> ls -l /
drwxr-xr-x   2 root root  4096 Jan 13  2019 bin
drwxr-xr-x   4 root root  4096 Jan 11  2019 boot
drwxr-xr-x   3 root root  4096 Feb  3  2020 data

This layer is closest to the user's mental model of a file system, but it is actually an abstraction: you do not know which device or storage format lies beneath these files, and as a user you do not need to care. VFS hides these underlying details, which is why this layer is called a virtual file system. Logically, VFS is a tree with the root directory / at the top, and each node is either a directory (gray) or a plain file (green).

[Figure: VFS as a tree rooted at /, with directories in gray and plain files in green]

Storage file system

Each file or directory node in VFS is abstract and must correspond to an actual file entity on a specific storage device (such as a disk), which is managed by the layer beneath VFS. EXT2, NTFS and the like, which we often hear about, are also called file systems, but what they describe is how files are stored and organized on the hardware, so it is more accurate to call them storage file systems. As with memory, the data on a disk is not laid out at random. It must be organized in a defined structure so that the upper layer can parse it and locate the desired data according to the format's specification.

For example, EXT2 file system format:

[Figure: ext2 on-disk layout, divided into block groups, with inodes pointing to data blocks (blue)]

The entire storage space of ext2 is divided into several block groups, and each group organizes the storage of its own files. A group contains various kinds of metadata, the most important of which is the inode. There is one inode per file. It stores the file's basic metadata along with pointers to the file's data blocks (the blue part).

The storage file system also handles the directory hierarchy. For example, some inodes are ordinary files and some are directories, and a directory leads you to the inodes beneath it. The whole on-disk file system is like the index of a book: it tells you where to find the data of a given file.

A storage file system is generally built on what we usually call a disk partition, for example the C: and D: drives in Windows, or /dev/hda1 and /dev/sda1 in Linux. What we usually call formatting a disk means initializing a partition according to the format of some storage file system, which is like laying a logical grid over the partition.

There are many types of storage file systems, and EXT2 is just one of them. We can even design one ourselves. In this project, we will implement about the simplest file system possible and use it to create the user disk image.

Hardware driver layer

The next layer is the hardware I/O layer, that is, the hardware driver, which interacts with the hardware directly. There is no notion of data organization or storage logic here, only raw I/O. For example, you tell it to read the data from position X to position Y on the hard disk, or to write some data to positions W through Z on the hard disk.

Access a file

How is a storage file system, or disk partition, placed into the tree that VFS maintains? In Linux this is called mounting. At the very beginning, the VFS tree is empty and has only the root node /. We usually have a system partition, for example /dev/sda1, which is the partition you normally install Linux on. Suppose this partition uses the EXT2 file system. It is mounted onto the VFS root directory /, so that VFS can start looking up the directories and files under /.

For example, suppose the user needs to read the file hello.txt under /home.

The system resolves this path from front to back, level by level:

  • / is the root directory, where the partition /dev/sda1 is currently mounted. This partition is in ext2 format, so the system looks for a node named home at the top level of the partition, following the ext2 format. Note that VFS is a tree and ext2 is also a tree, so both can be searched from the top down;
  • The home node is found at the top level of ext2 and is confirmed to be a directory, so all is well. The system then looks for the file hello.txt in the home directory and, if it is found, reads it;

Throughout, the search proceeds level by level on the /dev/sda1 partition according to the ext2 format. Although a path in VFS is an abstract concept, when a file is actually accessed the path is mapped onto the file system of the disk partition mounted there, and the lookup happens in that file system.
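The level-by-level lookup described above can be sketched as a walk over a toy in-memory tree. This is only an illustration: node_t, find_child and resolve_path are hypothetical names, not part of this project or of Linux, and a real file system would read directory entries from disk instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical in-memory directory tree node (illustration only).
typedef struct node {
  const char* name;
  int is_dir;
  struct node* children;  // array of child nodes
  size_t child_count;
} node_t;

// Look up one path component among the children of a directory.
static node_t* find_child(node_t* dir, const char* name, size_t len) {
  for (size_t i = 0; i < dir->child_count; i++) {
    node_t* child = &dir->children[i];
    if (strlen(child->name) == len && strncmp(child->name, name, len) == 0) {
      return child;
    }
  }
  return NULL;
}

// Resolve an absolute path such as "/home/hello.txt" one component at a time.
// Returns NULL if a component is missing or a plain file appears mid-path.
node_t* resolve_path(node_t* root, const char* path) {
  assert(root != NULL && path != NULL);
  node_t* cur = root;
  const char* p = path;
  while (*p != '\0') {
    while (*p == '/') p++;  // skip separators
    if (*p == '\0') break;
    const char* end = p;
    while (*end != '\0' && *end != '/') end++;
    if (!cur->is_dir) return NULL;  // cannot descend into a plain file
    cur = find_child(cur, p, (size_t)(end - p));
    if (cur == NULL) return NULL;
    p = end;
  }
  return cur;
}
```

Each loop iteration handles one level of the path, which mirrors the two bullet points above: first home is found under the root, then hello.txt is found under home.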

The example above mounts only a single disk partition. In fact, under Linux you can pick any directory node in VFS and mount a new disk partition on it. The partition does not even have to be in EXT format, as long as the kernel can parse its format. For example, suppose we have a disk partition /dev/hda2 in NTFS format (say, the D: drive of Windows on a dual-boot machine). We mount it on the /mnt node of VFS:

[Figure: the NTFS partition /dev/hda2 mounted at /mnt in the VFS tree]

After the new partition is mounted, VFS can access everything below /mnt using the NTFS file system format, for example when reading a file under /mnt.


When VFS reaches the mnt node, it finds that this is a mount point and that the mounted partition uses the NTFS file system. It then parses the rest of the path in NTFS format: it tries to find and read the path /bar on this disk partition.
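One simple way to model the mount point decision is a table of mounts searched by longest matching prefix. The sketch below is an assumption made for illustration. mount_t, path_is_under and find_mount are hypothetical names, the path /mnt/bar is only an example, and the real Linux VFS detects mount points while walking the tree, not by comparing strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical mount table entry (illustration only).
typedef struct mount {
  const char* mount_point;  // e.g. "/" or "/mnt"
  const char* fs_type;      // e.g. "ext2" or "ntfs"
  const char* device;       // e.g. "/dev/sda1"
} mount_t;

// Returns non-zero if `path` lies at or below `mount_point`.
static int path_is_under(const char* path, const char* mount_point) {
  size_t len = strlen(mount_point);
  if (strncmp(path, mount_point, len) != 0) return 0;
  if (len == 1) return 1;  // mount point is "/": every absolute path is under it
  return path[len] == '\0' || path[len] == '/';
}

// Pick the mount with the longest matching mount point and report the
// remainder of the path, relative to that partition, through `rest`.
const mount_t* find_mount(const mount_t* mounts, size_t count,
                          const char* path, const char** rest) {
  assert(path != NULL && path[0] == '/');
  const mount_t* best = NULL;
  size_t best_len = 0;
  for (size_t i = 0; i < count; i++) {
    size_t len = strlen(mounts[i].mount_point);
    if (path_is_under(path, mounts[i].mount_point) && len > best_len) {
      best = &mounts[i];
      best_len = len;
    }
  }
  if (best != NULL && rest != NULL) {
    // For "/" keep the whole path; otherwise strip the mount point prefix.
    *rest = (best_len == 1) ? path : path + best_len;
    if (**rest == '\0') *rest = "/";
  }
  return best;
}
```

With the two mounts from the example, looking up /mnt/bar selects the NTFS entry and leaves /bar as the path to resolve inside that partition, while any path outside /mnt falls through to the ext2 partition mounted at /.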

File system interface

As described above, when accessing files at different nodes, VFS keeps track of which disk partition each node belongs to and which storage file system that partition uses (such as ext or NTFS), and then reads the partition's data using the corresponding format. To support multiple file systems, VFS first defines a unified set of file operation interfaces, and each concrete file system implements them. This is a typical object-oriented pattern, for example:

class FileSystem {
 public:
    // Pure virtual functions: every concrete file system must implement them.
    virtual int32 read_file(const char* filename,
                            char* buffer,
                            uint32 start,
                            uint32 length) = 0;
    virtual int32 write_file(const char* filename,
                             const char* buffer,
                             uint32 start,
                             uint32 length) = 0;
    virtual int32 stat_file(const char* filename,
                            file_stat_t* stat) = 0;
    // ...
};

The above uses C++ for illustration (the kernel itself is written in C, and this is only meant to show the object-oriented pattern). It defines an abstract class FileSystem whose file operation interfaces are pure virtual functions. Each concrete file system only needs to inherit from it and implement these interfaces, for example:

class Ext2FileSystem : public FileSystem {
 public:
    int32 read_file(const char* filename,
                    char* buffer,
                    uint32 start,
                    uint32 length);
    // ...
};

Again, the above is only for demonstration. The interfaces and implementation in the real Linux VFS are not this simple, but the structure is similar.

Code implementation

This project will not use a complex file system like ext, nor will it implement full VFS functionality. It only builds the basic framework and plugs in a very simple storage file system of our own design.

First, define the file system interface, similar to the abstract class above, in src/fs/vfs.h:

struct file_system {
  enum fs_type type;
  disk_partition_t partition;

  // functions
  stat_file_func stat_file;
  list_dir_func list_dir;
  read_data_func read_data;
  write_data_func write_data;
};

typedef struct file_system fs_t;

You can see that the function pointers of various file operations are defined as interfaces. Their prototypes are:

typedef int32 (*stat_file_func)(const char* filename,
                                file_stat_t* stat);

typedef int32 (*list_dir_func)(char* dir);

typedef int32 (*read_data_func)(const char* filename,
                                char* buffer,
                                uint32 start,
                                uint32 length);

typedef int32 (*write_data_func)(const char* filename,
                                 const char* buffer,
                                 uint32 start,
                                 uint32 length);

naive_fs implementation

We do not need to implement a complex storage file system like ext. In this project, we implement only a very simple file system with very limited functionality:

  • The disk image is written in advance and is read-only;
  • There is only a single root directory, with no subdirectories;

This custom file system exists both for demonstration and for use in the project. We need it to store the user programs to be loaded and run, so it only has to support reading and needs no complex directory structure. One level is enough, with all files placed in it. Crude as it is, it is still a file system. We might as well call it naive_fs, because it really is naive and simple.

The storage structure of naive_fs is as follows:

[Figure: naive_fs layout: file count (green), per-file metadata (gray), file data (blue)]

  • The green part at the head is an integer that records the total number of files, which is fixed;
  • The gray part that follows holds the metadata of each file;
  • Finally, the blue part is the actual file data. Using each file's metadata (file offset and file size), you can locate where its data is stored;

You will notice that this resembles the heap we implemented earlier: a very simple and straightforward meta + data structure.
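To make the layout concrete, here is a minimal sketch that looks up a file in an in-memory copy of such an image. The record layout is an assumption: the 64-byte filename field, the sketch_meta_t type and the image_find function are invented for illustration, and offsets are taken to be relative to the start of the image. The project's actual naive_file_meta_t may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NAME_LEN 64  // assumed width of the filename field

// Assumed on-disk metadata record (not the project's real definition).
typedef struct {
  char filename[SKETCH_NAME_LEN];
  uint32_t offset;  // byte offset of the file data from the start of the image
  uint32_t size;    // file size in bytes
} sketch_meta_t;

// Find a file in an in-memory copy of the image.
// Layout: [uint32 file_num][file_num * sketch_meta_t][file data ...]
// Returns a pointer to the file data and stores its size, or NULL if absent.
const uint8_t* image_find(const uint8_t* image, const char* name,
                          uint32_t* size) {
  assert(image != NULL && name != NULL && size != NULL);
  uint32_t file_num;
  memcpy(&file_num, image, sizeof(file_num));  // header: total number of files
  const uint8_t* metas = image + sizeof(file_num);
  for (uint32_t i = 0; i < file_num; i++) {
    sketch_meta_t meta;
    // Copy the record out so we never dereference a misaligned pointer.
    memcpy(&meta, metas + i * sizeof(meta), sizeof(meta));
    if (strncmp(meta.filename, name, SKETCH_NAME_LEN) == 0) {
      *size = meta.size;
      return image + meta.offset;
    }
  }
  return NULL;
}
```

The lookup is a linear scan over the metadata table, which is acceptable here because the number of files is small and fixed when the image is built.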

I wrote a tool, user/disk_image_writer.c, that reads all files in the user/progs directory (this directory does not exist yet; in the next article we will compile and link the user programs and put them there), writes them into the disk image file user_disk_image in the naive_fs format described above, and then writes that image file into our kernel disk image scroll.img:

dd if=user/user_disk_image of=scroll.img bs=512 count=2048 seek=2057 conv=notrunc

Writing starts at sector 2057 of the disk, because the boot loader and kernel image occupy the sectors before it.
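As a quick check of the numbers in the dd command: with bs=512, seek=2057 places the user image at byte 2057 × 512 = 1,053,184 of the disk image, and count=2048 copies at most 2048 × 512 = 1,048,576 bytes (1 MiB). The helper below is a hypothetical name used only to spell out this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

// Byte offset at which a given sector starts.
uint32_t sector_to_byte_offset(uint32_t sector) {
  assert(sector <= UINT32_MAX / SECTOR_SIZE);  // guard against overflow
  return sector * SECTOR_SIZE;
}
```

A partition start offset expressed in bytes for this image would therefore be sector_to_byte_offset(2057), assuming the kernel addresses the disk by byte offset.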

Next comes the naive_fs code itself, which is essentially the implementation of the function pointers above. It lives in src/fs/naive_fs.c:

static fs_t naive_fs;

void init_naive_fs() {
  naive_fs.type = NAIVE;

  naive_fs.stat_file = naive_fs_stat_file;
  naive_fs.read_data = naive_fs_read_file;
  naive_fs.write_data = naive_fs_write_file;
  naive_fs.list_dir = naive_fs_list_dir;
  // load file metas to memory.
  // ...
}

The init_naive_fs function reads the metadata of all files and keeps it in memory, much like a file list. Functions such as read, write and stat then operate on files according to this metadata, which is very simple.

For example, when reading a file, first find the meta according to the file name, get the offset and size of the file on the disk, and then call the underlying driver to read the data:

static int32 naive_fs_read_file(const char* filename,
                                char* buffer,
                                uint32 start,
                                uint32 length) {
  // Find file meta by name.
  naive_file_meta_t* file_meta = nullptr;
  for (int i = 0; i < file_num; i++) {
    naive_file_meta_t* meta = file_metas + i;
    if (strcmp(meta->filename, filename) == 0) {
      file_meta = meta;
      break;
    }
  }
  if (file_meta == nullptr) {
    return -1;
  }

  uint32 offset = file_meta->offset;
  uint32 size = file_meta->size;
  // Clamp the request so it never reads past the end of the file.
  if (start >= size) {
    return 0;
  }
  if (length > size - start) {
    length = size - start;
  }

  // Read file data from disk.
  read_hard_disk((char*)buffer, naive_fs.partition.offset + offset + start, length);
  return length;
}

Disk driver

We also need to implement the lowest layer, the disk I/O driver. The upper layer, naive_fs, needs only one function from it, read_hard_disk, because we only need to read the disk. For simplicity, the underlying I/O reuses read_disk, the disk-reading function from the boot loader. It works by operating the ports of the disk controller and is synchronous. A real operating system must handle disk I/O asynchronously, because the disk is very slow and the system cannot afford to block waiting for it. Instead, it goes on with other work after issuing the read or write command, and the disk controller later notifies the system via an interrupt that the I/O has completed and the data is ready.
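A disk is read in whole sectors, while the file system asks for arbitrary byte ranges, so a function like read_hard_disk has to translate between the two. The following is a sketch under stated assumptions, not the project's driver: read_sector is a hypothetical primitive backed here by an in-memory array, whereas the real read_disk talks to I/O ports and its signature is not shown in this article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define FAKE_DISK_SECTORS 8u

// Stand-in for the real disk: an in-memory array (assumption for this sketch).
static uint8_t fake_disk[FAKE_DISK_SECTORS * SECTOR_SIZE];

// Hypothetical sector-level primitive; a real driver would use I/O ports.
static void read_sector(uint32_t lba, uint8_t* out) {
  assert(lba < FAKE_DISK_SECTORS);
  memcpy(out, fake_disk + lba * SECTOR_SIZE, SECTOR_SIZE);
}

// Read `length` bytes starting at byte `offset`; neither needs to be
// aligned to a sector boundary.
void read_bytes(uint8_t* buffer, uint32_t offset, uint32_t length) {
  uint8_t sector[SECTOR_SIZE];
  while (length > 0) {
    uint32_t lba = offset / SECTOR_SIZE;   // sector containing `offset`
    uint32_t skip = offset % SECTOR_SIZE;  // bytes to skip inside that sector
    uint32_t chunk = SECTOR_SIZE - skip;   // bytes usable from this sector
    if (chunk > length) chunk = length;
    read_sector(lba, sector);
    memcpy(buffer, sector + skip, chunk);
    buffer += chunk;
    offset += chunk;
    length -= chunk;
  }
}
```

Reading through a bounce buffer one sector at a time is slow but easy to reason about, which suits a synchronous driver like the one described above.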


With a simple VFS and the naive_fs file system in place, let’s see how the kernel uses them to read a file, for example:

char* buffer = (char*)kmalloc(1024);
read_file("hello.txt", buffer, 0, 100);

This calls the top-level VFS interface in vfs.c:

int32 read_file(char* filename, char* buffer, uint32 start, uint32 length) {
  fs_t* fs = get_fs(filename);
  return fs->read_data(filename, buffer, start, length);
}

VFS uses the given file path filename to determine which file system it belongs to and which disk partition it corresponds to. Here we mount only a single partition, whose file system type is naive_fs, so get_fs simply returns the naive_fs instance:

fs_t* get_fs(char* path) {
  return get_naive_fs();
}

The next step is to read the file through the file system's read_data function pointer.
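The whole call chain can be tried out in user space with a toy stand-in. Everything prefixed with toy_ below is hypothetical and exists only for this sketch. The file content is hard-coded instead of being read from disk, and stdint types replace the kernel's int32 and uint32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t (*toy_read_fn)(const char* filename, char* buffer,
                               uint32_t start, uint32_t length);

// Reduced version of the interface struct: only the read operation.
typedef struct {
  toy_read_fn read_data;
} toy_fs_t;

// A toy "file system" holding one hard-coded file (assumption for the demo).
static const char toy_content[] = "hello, world";

static int32_t toy_read(const char* filename, char* buffer,
                        uint32_t start, uint32_t length) {
  if (strcmp(filename, "hello.txt") != 0) return -1;
  uint32_t size = (uint32_t)(sizeof(toy_content) - 1);
  if (start >= size) return 0;
  if (length > size - start) length = size - start;  // clamp to end of file
  memcpy(buffer, toy_content + start, length);
  return (int32_t)length;
}

static toy_fs_t toy_fs = { toy_read };

// With a single mounted file system, lookup always returns the same object.
static toy_fs_t* toy_get_fs(const char* path) {
  (void)path;
  return &toy_fs;
}

// VFS entry point: pick the file system, then call through its pointer.
int32_t toy_read_file(const char* filename, char* buffer,
                      uint32_t start, uint32_t length) {
  assert(filename != NULL && buffer != NULL);
  toy_fs_t* fs = toy_get_fs(filename);
  return fs->read_data(filename, buffer, start, length);
}
```

The caller never names the concrete file system. Supporting another format would mean filling in a second toy_fs_t and making toy_get_fs choose between them based on the path.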

This article has broken down the layered architecture of the file system and walked through a sample implementation. It is very simple and elementary and is meant only for demonstration. I hope it helps you form an overall picture of how an operating system manages files and the underlying storage.
