Detailed description of Linux root file system mounting process

Time:2019-9-20


1: Preface

Some time ago, after compiling a new kernel, I found that rootfs could not be mounted, even though the same root= option worked with the old image. To get to the bottom of the problem, I studied how rootfs is mounted. The findings are summarized here in the hope that they help readers who find this topic confusing.

2: Types of rootfs

Broadly, rootfs falls into two categories: virtual rootfs and real rootfs. The trend in kernel development is to move more functionality into user space in order to keep the kernel lean, and a virtual rootfs is the approach adopted by most Linux distributions: part of the initialization is done in the virtual rootfs, and the system then switches to the real file system.

The virtual rootfs has taken the following forms over time:

initramfs:

initramfs was introduced in kernel 2.5. The idea is to attach a cpio archive containing a small file system to the kernel image. At boot, the kernel unpacks the archive and extracts that file system into rootfs. Part of the kernel's initialization code is moved into this file system and runs as a user-space process. The obvious benefit is that the kernel's own initialization code becomes simpler and the boot process becomes easier to customize. A rootfs of this kind is contained in the kernel image itself.

cpio-initrd: a rootfs image in cpio format

image-initrd: a rootfs image in the traditional format

How to build these two kinds of image is covered in other materials.

3: Mounting of the rootfs file system

The rootfs described here is different from the rootfs images discussed above. It refers to the root node that exists at system initialization, namely the "/" node, which is the rootfs file system held in memory. This part was analyzed in earlier articles on the file system; it is repeated here for continuity.

start_kernel() -> mnt_init():

void __init mnt_init(void)
{
        /* ... */

        init_rootfs();
        init_mount_tree();
}

The code for init_rootfs() is as follows:

int __init init_rootfs(void)
{
        int err;

        err = bdi_init(&ramfs_backing_dev_info);
        if (err)
                return err;

        err = register_filesystem(&rootfs_fs_type);
        if (err)
                bdi_destroy(&ramfs_backing_dev_info);

        return err;
}

This function is simple: it registers the rootfs file system type.

The init_mount_tree() code is as follows:

static void __init init_mount_tree(void)
{
        struct vfsmount *mnt;
        struct mnt_namespace *ns;
        struct path root;

        mnt = do_kern_mount("rootfs", 0, "rootfs", NULL);
        if (IS_ERR(mnt))
                panic("Can't create rootfs");

        ns = kmalloc(sizeof(*ns), GFP_KERNEL);
        if (!ns)
                panic("Can't allocate initial namespace");

        atomic_set(&ns->count, 1);
        INIT_LIST_HEAD(&ns->list);
        init_waitqueue_head(&ns->poll);
        ns->event = 0;
        list_add(&mnt->mnt_list, &ns->list);
        ns->root = mnt;
        mnt->mnt_ns = ns;

        init_task.nsproxy->mnt_ns = ns;
        get_mnt_ns(ns);

        root.mnt = ns->root;
        root.dentry = ns->root->mnt_root;

        set_fs_pwd(current->fs, &root);
        set_fs_root(current->fs, &root);
}

Here the rootfs file system is mounted; its mount point is "/" by default. Finally, the root directory and the current directory of the process are both switched to "/". This is where the root directory comes from. However, this is only initialization: once a specific file system has been mounted, the root directory is usually switched to that file system. That is why, after the system has booted, the mount command does not show any mount information for rootfs.

4: Mounting the virtual file systems

Now that the root directory has been mounted, a specific file system can be mounted.

start_kernel() -> rest_init() -> kernel_init():

static int __init kernel_init(void * unused)
{
        /* ... */

        do_basic_setup();

        if (!ramdisk_execute_command)
                ramdisk_execute_command = "/init";

        if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
                ramdisk_execute_command = NULL;
                prepare_namespace();
        }

        /*
         * Ok, we have completed the initial bootup, and
         * we're essentially up and running. Get rid of the
         * initmem segments and start the user-mode stuff..
         */
        init_post();
        return 0;
}

do_basic_setup() is a key function: it initializes all the modules that are compiled directly into the kernel. The code is as follows:

static void __init do_basic_setup(void)
{
        /* drivers will send hotplug events */
        init_workqueues();
        usermodehelper_init();
        driver_init();
        init_irq_proc();
        do_initcalls();
}

do_initcalls() calls every function whose pointer lies between __initcall_start and __initcall_end. Modules compiled statically into the kernel place their entry points in that section.
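The mechanism can be imitated in user space. The following is only a sketch, assuming GCC or Clang on an ELF platform such as Linux, where the linker provides __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. The names demo_initcall, demo_initcalls and run_demo_initcalls are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical helper: store a pointer to fn in the ELF section
 * "demo_initcalls", much as the kernel's *_initcall() macros store
 * pointers in its initcall sections. */
#define demo_initcall(fn) \
        static initcall_t demo_initcall_##fn \
        __attribute__((used, section("demo_initcalls"))) = fn

/* Section boundary symbols generated by the linker (assumption:
 * GNU ld or lld on an ELF target). */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

int ran_a, ran_b;

static int init_a(void) { ran_a = 1; return 0; }
static int init_b(void) { ran_b = 1; return 0; }

demo_initcall(init_a);
demo_initcall(init_b);

/* Walk the section and call every registered function.
 * Returns the number of functions called. */
int run_demo_initcalls(void)
{
        int count = 0;
        initcall_t *fn;

        for (fn = __start_demo_initcalls; fn < __stop_demo_initcalls; fn++) {
                assert(*fn != NULL);
                (*fn)();
                count++;
        }
        return count;
}
```

Calling run_demo_initcalls() once runs both init_a() and init_b(). The order in which entries of the same section are called is decided by the compiler and linker, so the sketch makes no promise about it; the kernel gets its ordering from separate sections per initcall level.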

Initialization functions related to the root file system are registered with rootfs_initcall(). Note the following registration:

rootfs_initcall(populate_rootfs);

That is, populate_rootfs() is called at this stage of system initialization. Its code is as follows:

static int __init populate_rootfs(void)
{
        char *err = unpack_to_rootfs(__initramfs_start,
                         __initramfs_end - __initramfs_start, 0);
        if (err)
                panic(err);

        if (initrd_start) {
#ifdef CONFIG_BLK_DEV_RAM
                int fd;

                printk(KERN_INFO "checking if image is initramfs...");
                err = unpack_to_rootfs((char *)initrd_start,
                        initrd_end - initrd_start, 1);
                if (!err) {
                        printk(" it is\n");
                        unpack_to_rootfs((char *)initrd_start,
                                initrd_end - initrd_start, 0);
                        free_initrd();
                        return 0;
                }

                printk("it isn't (%s); looks like an initrd\n", err);
                fd = sys_open("/initrd.image", O_WRONLY|O_CREAT, 0700);
                if (fd >= 0) {
                        sys_write(fd, (char *)initrd_start,
                                        initrd_end - initrd_start);
                        sys_close(fd);
                        free_initrd();
                }
#else
                printk(KERN_INFO "Unpacking initramfs...");
                err = unpack_to_rootfs((char *)initrd_start,
                        initrd_end - initrd_start, 0);
                if (err)
                        panic(err);
                printk(" done\n");
                free_initrd();
#endif
        }

        return 0;
}

unpack_to_rootfs(): as the name implies, it unpacks an archive and extracts it into rootfs. It actually has two modes: one extracts the archive, the other only checks whether the data is a valid cpio archive. The mode is selected by the last parameter.
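The check mode can be pictured as a test on the first bytes of the image. The sketch below is not the kernel's unpacker (which is a state machine that also handles decompression); it only illustrates the magic numbers involved. The function and enum names are made up for this example. The magic values are the standard ones: "070701" for cpio newc, "070707" for the older portable ASCII cpio format, and 0x1f 0x8b for gzip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum image_kind {
        IMAGE_CPIO_NEWC,   /* cpio "newc" archive, usable as initramfs   */
        IMAGE_CPIO_OLD,    /* old portable ASCII cpio, wrong format here */
        IMAGE_GZIP,        /* gzip stream, must be decompressed first    */
        IMAGE_OTHER        /* anything else, e.g. a file system image    */
};

/* Classify a boot image by looking only at its leading magic bytes. */
enum image_kind classify_image(const unsigned char *buf, size_t len)
{
        assert(buf != NULL);

        if (len >= 6 && memcmp(buf, "070701", 6) == 0)
                return IMAGE_CPIO_NEWC;
        if (len >= 6 && memcmp(buf, "070707", 6) == 0)
                return IMAGE_CPIO_OLD;
        if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
                return IMAGE_GZIP;
        return IMAGE_OTHER;
}
```

In terms of the article, IMAGE_CPIO_NEWC corresponds to a cpio-initrd, while IMAGE_OTHER is what an image-initrd would typically look like to such a check.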

populate_rootfs() covers the three kinds of virtual root file system discussed earlier. The first is initramfs, which is built into the kernel: when the kernel is compiled, the linker script places it in the region from __initramfs_start to __initramfs_end. In this case unpack_to_rootfs() is called directly to extract it into the root directory. If no initramfs was built in, that is, __initramfs_start and __initramfs_end are equal and the length is zero, nothing is extracted and the call simply returns.

The other two cases are handled next. As the code shows, CONFIG_BLK_DEV_RAM must be enabled to support image-initrd; otherwise the image is always treated as a cpio-initrd.

A cpio-initrd is extracted directly into the root directory. An image-initrd is written to /initrd.image. In both cases the memory occupied by the initrd is finally returned to the buddy system, so that the operating system can use it for other purposes.

How does the kernel go on to handle each of these cases? Read on.

Back to the kernel_init() function:

static int __init kernel_init(void * unused)
{
        /* ... */

        do_basic_setup();

        /*
         * check if there is an early userspace init.  If yes, let it do all
         * the work
         */
        if (!ramdisk_execute_command)
                ramdisk_execute_command = "/init";

        if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
                ramdisk_execute_command = NULL;
                prepare_namespace();
        }

        /*
         * Ok, we have completed the initial bootup, and
         * we're essentially up and running. Get rid of the
         * initmem segments and start the user-mode stuff..
         */
        init_post();
        return 0;
}

ramdisk_execute_command is set while the kernel parses its boot parameters: if the user specifies the path of the early user-space init with "rdinit=", the value is stored here. (The "init=" option is stored separately in execute_command, which init_post() uses later.)

If no path is specified, the default is /init.

From the previous analysis we know that for initramfs and cpio-initrd the virtual root file system has been extracted into the root directory. If that file system contains /init, the kernel goes straight to init_post(); otherwise prepare_namespace() is called first.
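The decision can be restated as a small user-space sketch. It is an illustration only: choose_boot_path and the enum are made-up names, and access() with F_OK stands in for the kernel's sys_access(path, 0) call.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

enum boot_path {
        RUN_EARLY_INIT,    /* early user-space init exists: run it      */
        MOUNT_REAL_ROOT    /* no early init: mount the real root first  */
};

/*
 * rdinit is the path given on the command line, or NULL if none was
 * given. On return, *chosen is the path to execute, or NULL when the
 * real root has to be prepared first.
 */
enum boot_path choose_boot_path(const char *rdinit, const char **chosen)
{
        const char *cmd = rdinit ? rdinit : "/init";   /* default path */

        assert(chosen != NULL);

        if (access(cmd, F_OK) == 0) {
                *chosen = cmd;
                return RUN_EARLY_INIT;
        }
        *chosen = NULL;
        return MOUNT_REAL_ROOT;
}
```

The existence test only asks whether the file is there, not whether it can be executed, which matches the mode argument of 0 in the kernel code.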

The init_post() code is as follows:

static int noinline init_post(void)
{
        free_initmem();
        unlock_kernel();
        mark_rodata_ro();
        system_state = SYSTEM_RUNNING;
        numa_default_policy();

        if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
                printk(KERN_WARNING "Warning: unable to open an initial console.\n");

        (void) sys_dup(0);
        (void) sys_dup(0);

        if (ramdisk_execute_command) {
                run_init_process(ramdisk_execute_command);
                printk(KERN_WARNING "Failed to execute %s\n",
                                ramdisk_execute_command);
        }

        /*
         * We try each of these until one succeeds.
         *
         * The Bourne shell can be used instead of init if we are
         * trying to recover a really broken machine.
         */
        if (execute_command) {
                run_init_process(execute_command);
                printk(KERN_WARNING "Failed to execute %s.  Attempting "
                                        "defaults...\n", execute_command);
        }
        run_init_process("/sbin/init");
        run_init_process("/etc/init");
        run_init_process("/bin/init");
        run_init_process("/bin/sh");

        panic("No init found.  Try passing init= option to kernel.");
}

As the code shows, the specified init programs are tried first, and if they fail the kernel tries /sbin/init, /etc/init, /bin/init and /bin/sh in turn.

Note that run_init_process() uses kernel_execve() to run the program, which means that the new program replaces the current process. As soon as any one of these calls succeeds, the function never returns. If none of the files can be executed, the kernel panics with a message saying that no init was found.
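The "try each until one succeeds" pattern relies on the fact that a successful exec never returns. A minimal user-space sketch, using execv() in place of kernel_execve() and a made-up function name:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try to execute each candidate in order. If an execv() call
 * succeeds, the calling process is replaced and this function never
 * returns. It returns -1 only when every candidate has failed.
 */
int exec_first_available(const char *const *candidates, size_t n)
{
        size_t i;

        assert(candidates != NULL);

        for (i = 0; i < n; i++) {
                char *const argv[] = { (char *)candidates[i], NULL };

                execv(candidates[i], argv);
                /* Reaching this point means the exec failed;
                 * fall through to the next candidate. */
        }
        return -1;
}
```

A caller would pass a list such as "/sbin/init", "/etc/init", "/bin/init", "/bin/sh" and treat a return value of -1 as the fatal case, which is where the kernel calls panic().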

For an image-initrd, or for a virtual file system that does not contain /init, prepare_namespace() is called. The code is as follows:

void __init prepare_namespace(void)
{
        int is_floppy;

        if (root_delay) {
                printk(KERN_INFO "Waiting %dsec before mounting root device...\n",
                       root_delay);
                ssleep(root_delay);
        }

        /* wait for the known devices to complete their probing */
        while (driver_probe_done() != 0)
                msleep(100);

        /* set up MD (software RAID) devices */
        md_run_setup();

        if (saved_root_name[0]) {
                root_device_name = saved_root_name;
                if (!strncmp(root_device_name, "mtd", 3)) {
                        mount_block_root(root_device_name, root_mountflags);
                        goto out;
                }
                ROOT_DEV = name_to_dev_t(root_device_name);
                if (strncmp(root_device_name, "/dev/", 5) == 0)
                        root_device_name += 5;
        }

        if (initrd_load())
                goto out;

        /* wait for any asynchronous scanning to complete */
        if ((ROOT_DEV == 0) && root_wait) {
                printk(KERN_INFO "Waiting for root device %s...\n",
                        saved_root_name);
                while (driver_probe_done() != 0 ||
                        (ROOT_DEV = name_to_dev_t(saved_root_name)) == 0)
                        msleep(100);
        }

        is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR;

        if (is_floppy && rd_doload && rd_load_disk(0))
                ROOT_DEV = Root_RAM0;

        mount_root();
out:
        sys_mount(".", "/", NULL, MS_MOVE, NULL);
        sys_chroot(".");
}

There are several interesting points here. First, the user can specify the root file system with "root="; the value is stored in saved_root_name. If the string starts with "mtd", it is mounted directly: such a name refers to an MTD (flash) device.

Otherwise the device name is converted into ROOT_DEV, that is, the device number, and a leading "/dev/" is stripped from the name.
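The two string tests applied to the root= value are easy to reproduce. The following sketch mirrors the strncmp() calls in prepare_namespace(); the function names are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if the root= value names an MTD device, e.g. "mtd2". */
int root_is_mtd(const char *name)
{
        assert(name != NULL);
        return strncmp(name, "mtd", 3) == 0;
}

/* Return the name without a leading "/dev/", as prepare_namespace()
 * does by advancing root_device_name by 5 characters. */
const char *strip_dev_prefix(const char *name)
{
        assert(name != NULL);
        if (strncmp(name, "/dev/", 5) == 0)
                return name + 5;
        return name;
}
```

Note that the "mtd" test is applied to the raw value, so a value such as "/dev/mtdblock2" does not take the MTD shortcut; it goes through the normal device-number path.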

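Next, initrd_load() is called to do the initrd preprocessing. If it returns nonzero, the root has already been set up and the function jumps to out; otherwise mount_root() mounts the specific root file system.

Note the end of this function: sys_mount() is called with MS_MOVE to move the mount at the current directory onto "/", and the root directory is then switched to the current directory. (mount_block_root() mounts the device on /root in rootfs and changes into that directory, which is why "." refers to the new root here.) In this way the mount point of the root file system becomes the "/" that we see in user space.

In the remaining cases initrd_load() is tried first: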

int __init initrd_load(void)
{
        if (mount_initrd) {
                create_dev("/dev/ram", Root_RAM0);
                /*
                 * Load the initrd data into /dev/ram0. Execute it as initrd
                 * unless /dev/ram0 is supposed to be our actual root device,
                 * in that case the ram disk is just set up here, and gets
                 * mounted in the normal path.
                 */
                if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
                        sys_unlink("/initrd.image");
                        handle_initrd();
                        return 1;
                }
        }
        sys_unlink("/initrd.image");
        return 0;
}

This creates a device node for Root_RAM0 and loads /initrd.image into it. The content of /initrd.image is the image-initrd discussed earlier.

If the root device number is not Root_RAM0 (that is, the root file system specified by the user is not /dev/ram0), handle_initrd() is called.

If the root file system is /dev/ram0, initrd_load() returns 0 and the RAM disk is mounted directly through the normal mount_root() path.

The handle_initrd() code is as follows:

static void __init handle_initrd(void)
{
        int error;
        int pid;

        real_root_dev = new_encode_dev(ROOT_DEV);
        create_dev("/dev/root.old", Root_RAM0);
        /* mount initrd on rootfs' /root */
        mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY);
        sys_mkdir("/old", 0700);
        root_fd = sys_open("/", 0, 0);
        old_fd = sys_open("/old", 0, 0);
        /* move initrd over / and chdir/chroot in initrd root */
        sys_chdir("/root");
        sys_mount(".", "/", NULL, MS_MOVE, NULL);
        sys_chroot(".");

        /*
         * In case that a resume from disk is carried out by linuxrc or one of
         * its children, we need to tell the freezer not to wait for us.
         */
        current->flags |= PF_FREEZER_SKIP;

        pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
        if (pid > 0)
                while (pid != sys_wait4(-1, NULL, 0, NULL))
                        yield();

        current->flags &= ~PF_FREEZER_SKIP;

        /* move initrd to rootfs' /old */
        sys_fchdir(old_fd);
        sys_mount("/", ".", NULL, MS_MOVE, NULL);
        /* switch root and cwd back to / of rootfs */
        sys_fchdir(root_fd);
        sys_chroot(".");
        sys_close(old_fd);
        sys_close(root_fd);

        if (new_decode_dev(real_root_dev) == Root_RAM0) {
                sys_chdir("/old");
                return;
        }

        ROOT_DEV = new_decode_dev(real_root_dev);
        mount_root();

        printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
        error = sys_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
        if (!error)
                printk("okay\n");
        else {
                int fd = sys_open("/dev/root.old", O_RDWR, 0);
                if (error == -ENOENT)
                        printk("/initrd does not exist. Ignored.\n");
                else
                        printk("failed\n");
                printk(KERN_NOTICE "Unmounting old root\n");
                sys_umount("/old", MNT_DETACH);
                printk(KERN_NOTICE "Trying to free ramdisk memory ... ");
                if (fd < 0) {
                        error = fd;
                } else {
                        error = sys_ioctl(fd, BLKFLSBUF, 0);
                        sys_close(fd);
                }
                printk(!error ? "okay\n" : "failed\n");
        }
}

The function first mounts /dev/ram0 (the initrd), then runs /linuxrc and waits for it to finish. After that it switches the root directory back to rootfs and mounts the specific root file system. The old initrd is moved to /initrd under the new root if that directory exists; otherwise it is unmounted and the RAM disk memory is freed.

This completes the analysis of how the root file system is mounted.

5: Summary

This article analyzed the process of mounting the root file system and looked in detail at the different kinds of virtual root file system. Understanding this part is very helpful when building embedded Linux systems.