Linux EXT Series File System Format

Time:2019-8-13

Linux Filesystem

[Figure: a hard disk divided into tracks and sectors]
The figure above shows a typical hard disk. Each disk is divided into tracks, and each track into sectors of 512 bytes, the smallest storage unit of the disk. At the operating system level, several sectors are grouped into a block, which is the smallest unit of storage the operating system works with. Typically eight sectors form one 4K-byte block.
A Linux file system has to address the following points:

  • The file system needs a strict organization so that files can be stored in blocks.
  • The file system needs an index area to record where the blocks of each file are located.
  • Files that are read and written frequently need a cache layer.
  • Files should be organized into directories for easy management and lookup.
  • The Linux kernel maintains a set of data structures in memory that record which files are open and which processes are using them.

Everything in Linux is a file, and there are several file types. The first character of each line in the output of ls -l identifies the type:

  • - denotes a regular file
  • d denotes a directory
  • c denotes a character device file
  • b denotes a block device file
  • s denotes a socket file
  • l denotes a symbolic (soft) link

Inode and Block Storage

Let's take the EXT family of formats as an example to see how a file is stored on disk. A file is first split into blocks, which may be scattered across the disk. We need an index structure to locate these blocks and to record the file's metadata. This structure is the inode, where the "i" stands for index. The inode data structure is as follows:

struct ext4_inode {
        __le16  i_mode;         /* File mode */
        __le16  i_uid;          /* Low 16 bits of Owner Uid */
        __le32  i_size_lo;      /* Size in bytes */
        __le32  i_atime;        /* Access time */
        __le32  i_ctime;        /* Inode Change time */
        __le32  i_mtime;        /* Modification time */
        __le32  i_dtime;        /* Deletion Time */
        __le16  i_gid;          /* Low 16 bits of Group Id */
        __le16  i_links_count;  /* Links count */
        __le32  i_blocks_lo;    /* Blocks count */
        __le32  i_flags;        /* File flags */
        union {
                struct {
                        __le32  l_i_version;
                } linux1;
                struct {
                        __u32  h_i_translator;
                } hurd1;
                struct {
                        __u32  m_i_reserved1;
                } masix1;
        } osd1;                         /* OS dependent 1 */
        __le32  i_block[EXT4_N_BLOCKS];/* Pointers to blocks */
        __le32  i_generation;   /* File version (for NFS) */
        __le32  i_file_acl_lo;  /* File ACL */
        __le32  i_size_high;
        __le32  i_obso_faddr;   /* Obsoleted fragment address */
        union {
                struct {
                        __le16  l_i_blocks_high; /* were l_i_reserved1 */
                        __le16  l_i_file_acl_high;
                        __le16  l_i_uid_high;   /* these 2 fields */
                        __le16  l_i_gid_high;   /* were reserved2[0] */
                        __le16  l_i_checksum_lo;/* crc32c(uuid+inum+inode) LE */
                        __le16  l_i_reserved;
                } linux2;
                struct {
                        __le16  h_i_reserved1;  /* Obsoleted fragment number/size which are removed in ext4 */
                        __u16   h_i_mode_high;
                        __u16   h_i_uid_high;
                        __u16   h_i_gid_high;
                        __u32   h_i_author;
                } hurd2;
                struct {
                        __le16  h_i_reserved1;  /* Obsoleted fragment number/size which are removed in ext4 */
                        __le16  m_i_file_acl_high;
                        __u32   m_i_reserved2[2];
                } masix2;
        } osd2;                         /* OS dependent 2 */
        __le16  i_extra_isize;
        __le16  i_checksum_hi;  /* crc32c(uuid+inum+inode) BE */
        __le32  i_ctime_extra;  /* extra Change time      (nsec << 2 | epoch) */
        __le32  i_mtime_extra;  /* extra Modification time(nsec << 2 | epoch) */
        __le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
        __le32  i_crtime;       /* File Creation time */
        __le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
        __le32  i_version_hi;   /* high 32 bits for 64-bit version */
        __le32  i_projid;       /* Project ID */
};

Here, __le32 i_block[EXT4_N_BLOCKS] stores the references to the data blocks. EXT4_N_BLOCKS is defined as follows:

#define    EXT4_NDIR_BLOCKS    12
#define    EXT4_IND_BLOCK    EXT4_NDIR_BLOCKS
#define    EXT4_DIND_BLOCK    (EXT4_IND_BLOCK    + 1)
#define    EXT4_TIND_BLOCK    (EXT4_DIND_BLOCK + 1)
#define    EXT4_N_BLOCKS    (EXT4_TIND_BLOCK + 1)

In ext2 and ext3, the first 12 entries of i_block store direct references to data blocks. The 13th entry stores a reference to an indirect block, which in turn stores the locations of data blocks. By analogy, the 14th entry stores the location of a double indirect block and the 15th entry stores the location of a triple indirect block, as shown in the following figure:

[Figure: direct, indirect, double indirect, and triple indirect block pointers in i_block]
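
The pointer layout can be sketched in a few lines of C. This is an illustration only, not kernel code: it assumes 32-bit block numbers (4 bytes per pointer, so a 4K block holds 1024 pointers), and the function names indirection_level and i_block_slot are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_NDIR_BLOCKS 12

/* How many levels of indirection are needed to reach logical block n
 * of a file: 0 = direct, 1 = indirect, 2 = double, 3 = triple.
 * Returns -1 if n is beyond what triple indirection can address. */
int indirection_level(uint64_t n, uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;   /* block numbers per indirect block */

    assert(ptrs > 0);
    if (n < EXT4_NDIR_BLOCKS)
        return 0;
    n -= EXT4_NDIR_BLOCKS;
    if (n < ptrs)
        return 1;
    n -= ptrs;
    if (n < ptrs * ptrs)
        return 2;
    n -= ptrs * ptrs;
    if (n < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}

/* Index into i_block[] where the lookup for logical block n starts:
 * 0..11 for direct blocks, 12, 13, 14 for the indirect levels. */
int i_block_slot(uint64_t n, uint32_t block_size)
{
    int level = indirection_level(n, block_size);

    if (level < 0)
        return -1;
    return level == 0 ? (int)n : EXT4_NDIR_BLOCKS + level - 1;
}
```

With 4K blocks, logical block 11 is still a direct block, block 12 is the first one that needs the indirect block, and block 12 + 1024 is the first one that needs the double indirect block. Each extra level costs one more disk read before the data block itself can be read.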

For large files, then, locating a block can take several disk reads. ext4 introduces the extent tree to solve this problem. The core idea is to describe a run of contiguous blocks by its starting block and its length, instead of recording the position of every block one by one, which saves storage space. The 4 × 15 = 60 bytes of i_block are reused to hold one extent header (ext4_extent_header) followed by four extent entries (ext4_extent), since ext4_extent_header and ext4_extent each take 12 bytes. The highest bit of ee_len indicates whether the extent is initialized, so an extent can cover at most 32K blocks, that is, 32K × 4K = 128M of data. If a file is larger than 4 × 128M = 512M, or is scattered across more than four discontiguous runs of blocks, the structure held in i_block has to grow into a tree: its entries change from ext4_extent to ext4_extent_idx. Each ext4_extent_idx points to a 4K-byte block that, after its 12-byte header, can hold 340 ext4_extent entries and can therefore map up to 340 × 128M = 42.5G. This index structure is very efficient when files are stored in contiguous blocks.
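
The capacity figures in this paragraph can be checked with a short calculation. The sketch below assumes 12-byte headers and entries and a 4K block size, as in the text; the names entries_after_header and max_extent_bytes are hypothetical helpers for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_RECORD_SIZE   12u      /* header, extent, and index entries are all 12 bytes */
#define EXTENT_MAX_INIT_LEN  32768u   /* 32K blocks: the top bit of ee_len is reserved */

/* Number of 12-byte entries that fit in an area after its 12-byte header. */
uint32_t entries_after_header(uint32_t area_bytes)
{
    assert(area_bytes >= EXTENT_RECORD_SIZE);
    return (area_bytes - EXTENT_RECORD_SIZE) / EXTENT_RECORD_SIZE;
}

/* Largest amount of data, in bytes, that a single extent can describe. */
uint64_t max_extent_bytes(uint32_t block_size)
{
    return (uint64_t)EXTENT_MAX_INIT_LEN * block_size;
}
```

entries_after_header(60) gives the 4 entries that fit in i_block, entries_after_header(4096) gives the 340 entries of a leaf block, and multiplying either by max_extent_bytes(4096) reproduces the 512M and 42.5G limits.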

struct ext4_extent_header {
        __le16  eh_magic;       /* ext4 extents magic number: 0xF30A */
        __le16  eh_entries;     /* Number of valid entries in this node */
        __le16  eh_max;         /* Maximum number of entries this node can hold */
        __le16  eh_depth;       /* Depth of this node in the tree: 0 is a leaf (data) node, > 0 is an index node */
        __le32  eh_generation;  /* Generation of the tree */
};

struct ext4_extent {            /* extent_body format in leaf (data) nodes */
        __le32  ee_block;       /* First logical block the extent covers */
        __le16  ee_len;         /* Number of blocks the extent covers */
        __le16  ee_start_hi;    /* High 16 bits of the starting physical block */
        __le32  ee_start_lo;    /* Low 32 bits of the starting physical block */
};

struct ext4_extent_idx {        /* extent_body format in index nodes */
        __le32  ei_block;       /* First logical block of the file range covered by this index */
        __le32  ei_leaf_lo;     /* Low 32 bits of the physical block holding the next level of the tree */
        __le16  ei_leaf_hi;     /* High 16 bits of the physical block holding the next level of the tree */
        __u16   ei_unused;
};
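
To make the role of the extent fields concrete, here is a minimal sketch of how a logical block number could be translated to a physical one using the entries of a single leaf node. It is not the kernel's implementation: struct extent is a hypothetical host-endian copy of ext4_extent (the on-disk fields are little-endian and would need converting first), and the lookup is a plain linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_INIT_MAX_LEN 32768u   /* lengths above this mark an uninitialized extent */

/* Hypothetical host-endian copy of the on-disk ext4_extent. */
struct extent {
    uint32_t ee_block;      /* first logical block covered */
    uint16_t ee_len;        /* number of blocks covered */
    uint16_t ee_start_hi;   /* high 16 bits of the physical start */
    uint32_t ee_start_lo;   /* low 32 bits of the physical start */
};

/* Combine the two halves of the 48-bit physical start block. */
uint64_t extent_start(const struct extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}

/* Real length in blocks, with the "uninitialized" marker removed. */
uint32_t extent_length(const struct extent *ex)
{
    return ex->ee_len <= EXT_INIT_MAX_LEN ? ex->ee_len
                                          : ex->ee_len - EXT_INIT_MAX_LEN;
}

/* Look up logical block lblock in the entries of one leaf node.
 * Returns 1 and stores the physical block in *pblock if it is mapped,
 * or 0 if no extent covers it (a hole in the file). */
int extent_map_block(const struct extent *exts, size_t count,
                     uint32_t lblock, uint64_t *pblock)
{
    assert(exts != NULL && pblock != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t len = extent_length(&exts[i]);

        if (lblock >= exts[i].ee_block && lblock - exts[i].ee_block < len) {
            *pblock = extent_start(&exts[i]) + (lblock - exts[i].ee_block);
            return 1;
        }
    }
    return 0;
}
```

One extent entry answers the lookup for every block in its run, which is why a contiguous file needs far fewer index reads than the indirect-block scheme.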

An example for the file /var/log/messages is shown in the following figure:

[Figure: extent tree of /var/log/messages]

Inode bitmap and block bitmap

The disk has areas dedicated to data blocks and areas dedicated to inodes. When creating a new file, the file system needs to know which inodes and which blocks are free. For this it keeps one block holding the inode bitmap and one block holding the block bitmap, where each bit is 1 if the corresponding inode or block is in use and 0 if it is free. A 4K block holds only 4K × 8 = 32K bits, so one block bitmap can track at most 32K blocks. These blocks, together with their bitmaps, therefore form a block group, and a larger file system is built from many block groups.
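
A bitmap of this kind can be sketched as follows. This is an illustration, not ext4's allocator: the function names are hypothetical, bit 0 is assumed to be the least significant bit of byte 0, and the search is a simple linear scan for the first free entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the given bit is set (entry in use), 0 if it is free. */
int bitmap_test(const uint8_t *bitmap, size_t bit)
{
    assert(bitmap != NULL);
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/* Mark an entry as in use. */
void bitmap_set(uint8_t *bitmap, size_t bit)
{
    assert(bitmap != NULL);
    bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Index of the first free (0) bit among the first nbits bits,
 * or -1 if every entry is in use. */
long bitmap_find_free(const uint8_t *bitmap, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (!bitmap_test(bitmap, i))
            return (long)i;
    }
    return -1;
}
```

A 4096-byte array used this way holds 4096 × 8 = 32768 bits, matching the 32K blocks that one bitmap block can track.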

Hard Link and Soft Link

A hard link shares its inode with the original file. An inode number is only meaningful within one file system, so a hard link cannot cross file systems.

[Figure: a hard link sharing the inode of the original file]

A soft link has its own inode, but opening it redirects to another file by path. It can therefore cross file systems, and the link itself still exists after the original file is deleted, although it no longer resolves to anything.

[Figure: a soft link with its own inode pointing to the original file]
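
The difference between the two link types can be observed from user space with the POSIX calls link(), symlink(), stat(), and lstat(). The sketch below is only an illustration: struct link_report and compare_links are hypothetical names, the three paths are supplied by the caller and are removed before and after the check, and the paths should be absolute and writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result record for this demonstration. */
struct link_report {
    int hard_shares_inode;      /* hard link has the same inode number */
    int soft_has_own_inode;     /* soft link has a different inode number */
    unsigned long link_count;   /* st_nlink of the file after link() */
    int hard_survives;          /* hard link still usable after unlink(target) */
    int soft_entry_survives;    /* soft link itself still exists */
    int soft_resolves;          /* soft link still leads to a file */
};

/* Create target, a hard link to it, and a soft link to it, then delete
 * target and record what is left. Returns 0 on success, -1 on error. */
int compare_links(const char *target, const char *hard, const char *soft,
                  struct link_report *out)
{
    struct stat st_target, st_hard, st_soft;
    int fd;

    assert(target && hard && soft && out);

    /* Start clean; failures here just mean the names did not exist. */
    unlink(target);
    unlink(hard);
    unlink(soft);

    fd = open(target, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    if (link(target, hard) != 0 || symlink(target, soft) != 0)
        return -1;
    if (stat(target, &st_target) != 0 || stat(hard, &st_hard) != 0 ||
        lstat(soft, &st_soft) != 0)     /* lstat: inspect the link itself */
        return -1;

    out->hard_shares_inode = st_hard.st_ino == st_target.st_ino;
    out->soft_has_own_inode = st_soft.st_ino != st_target.st_ino;
    out->link_count = (unsigned long)st_target.st_nlink;

    /* Remove the original name and see which links still work. */
    if (unlink(target) != 0)
        return -1;
    out->hard_survives = stat(hard, &st_hard) == 0;
    out->soft_entry_survives = lstat(soft, &st_soft) == 0;
    out->soft_resolves = stat(soft, &st_target) == 0;   /* follows the link */

    unlink(hard);
    unlink(soft);
    return 0;
}
```

After the original name is removed, the hard link still reaches the data because the inode's link count has only dropped from 2 to 1, while the soft link remains on disk as a dangling path.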