How does git store data

Time:2021-3-2

Git is one of the tools we use most every day. It is a version control tool created by Linus Torvalds, the original author of the Linux kernel, and it is as simple and easy to use as Linux. This article briefly explains how Git stores data.

Prerequisites

  • Basic familiarity with Git
  • What hash algorithms (MD5, SHA-1) do
  • Some simple Linux commands

What you will learn

  • The .git directory structure
  • How Git stores files

Tools used in the demo

  • VS Code: a general-purpose editor
  • iTerm: a terminal application for macOS
  • watch: a Unix-like command line tool that runs a given command repeatedly and displays its output
  • tree: a Unix-like command line tool that prints the directory tree of a given directory

Local configuration

1. Install git locally

Download it from https://git-scm.com/ and install it

2. Create an empty folder at any location

I created a folder named git_demo on my desktop

3. Switch to the directory in the command line

4. Monitor directory changes

To make the demo easier to follow, I split the terminal window into two panes and navigate to the same directory in both

In the right pane, enter the following command to monitor directory changes

watch -d -n 1 tree --charset=ascii -aF

The command combines two tools, watch and tree. watch re-runs the tree command at a fixed interval (every 1 second, as set by -n 1) and highlights what changed (-d), so we can see directory changes in real time
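If watch or tree is not available, the same before/after comparison can be approximated with a short script. This is only a sketch under stated assumptions: the helper names snapshot and added_since are hypothetical, and the listing is a plain set of relative paths rather than tree's drawing.

```python
import os


def snapshot(root):
    """Return the set of paths under root, relative to root.

    Directory entries end with "/" so they are easy to tell apart from files.
    """
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else rel.replace(os.sep, "/") + "/"
        for name in dirnames:
            entries.add(prefix + name + "/")
        for name in filenames:
            entries.add(prefix + name)
    return entries


def added_since(before, after):
    """Return the sorted list of paths present in after but not in before."""
    return sorted(after - before)
```

Take one snapshot before running a Git command and one after, then pass both to added_since to list what the command created.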

Start demo

1. Git init

Run the familiar git init command to create a local Git repository in the current directory. After it runs, the directory looks like this

In the right pane, we can see a new hidden directory named .git that contains several files and folders. Let’s look at the first level

`-- .git/
    |-- HEAD
    |-- config
    |-- description
    |-- hooks/
    |-- info/
    |-- objects/
    `-- refs/

There are several files and folders here

  • HEAD contains

    ref: refs/heads/master
    

    HEAD is a reference to refs/heads/master, which is the branch we are currently working on

  • Config is actually the local git config

    [core]
      repositoryformatversion = 0
      filemode = true
      bare = false
      logallrefupdates = true
      ignorecase = true
      precomposeunicode = true
    

    The file uses an INI-like format

  • Description is a description, usually not used

    Unnamed repository; edit this file 'description' to name the repository.
    
  • hooks holds the local Git hooks

    A hook is an executable script (often a shell script) that Git runs when you execute certain Git commands, such as git commit, which makes hooks very powerful. For example, to run tests before every commit, edit the pre-commit.sample file in the hooks folder and remove the .sample suffix from the file name. The next time you run git commit, Git runs the script first and continues with the commit only if the script exits with code 0.

  • info stores some default configuration files

    By default it contains a file named exclude, which works like a .gitignore file that is local to this repository

  • objects stores the backup of our code, which will be explained in detail later
  • refs is also a very important folder, which stores all our branch information

In the .git folder, the most important items are the HEAD file, the objects folder, and the refs folder. Git first reads the HEAD file to find the branch we are working on, then looks up that branch’s file in the refs folder. That file points to an object in the objects folder, and through that object Git can reach all the source code of the current version of the current branch!

As we saw above, the HEAD file has only one line, which points to a file under the refs directory
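The lookup chain from HEAD to a branch file can be followed with plain file reads. The sketch below assumes refs are stored as individual files under .git (Git can also pack refs into a single file, which is not handled here), and the function name resolve_head is hypothetical.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Follow HEAD to a commit hash by reading plain files.

    Returns (ref_name, commit_hash). commit_hash is None when the branch
    file does not exist yet, as is the case right after git init.
    Assumption: refs are loose files; packed refs are not handled.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit hash directly.
        return None, head
    ref_name = head[len("ref: "):]
    ref_file = git_dir / ref_name
    if not ref_file.exists():
        return ref_name, None
    return ref_name, ref_file.read_text().strip()
```

Calling resolve_head(".git") in a freshly initialized repository should return the branch name with None as the hash, because no commit exists yet.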

Let’s find out what the refs and objects folders actually do

2. Create a new file

To keep the tree output short, we first delete all the files in the hooks folder except pre-commit.sample

Now we create a new file named foo.txt in git_demo, enter the following text, and save it (make sure there are no trailing spaces; the 12-byte length used later counts the 11 visible characters plus the single newline that ends the line)

this is foo

Let’s look at our directory structure

|-- .git/
|   |-- HEAD
|   |-- config
|   |-- description
|   |-- hooks/
|   |   `-- pre-commit.sample*
|   |-- info/
|   |   `-- exclude
|   |-- objects/
|   |   |-- info/
|   |   `-- pack/
|   `-- refs/
|       |-- heads/
|       `-- tags/
`-- foo.txt

As expected, there is one more file, foo.txt

3. Add the file to the staging area

Next comes the familiar git add command. We run git add foo.txt in the left pane, and the tree now looks like this

|-- .git/
|   |-- HEAD
|   |-- config
|   |-- description
|   |-- hooks/
|   |   `-- pre-commit.sample*
|   |-- index
|   |-- info/
|   |   `-- exclude
|   |-- objects/
|   |   |-- 9b/
|   |   |   `-- 3a97dafadb12faf10cf1a1f3a32f63eaa7220a
|   |   |-- info/
|   |   `-- pack/
|   `-- refs/
|       |-- heads/
|       `-- tags/
`-- foo.txt

In the objects folder, there is a new folder named 9b, and inside it a file named 3a97dafadb12faf10cf1a1f3a32f63eaa7220a. Let’s see what this file is

It looks like a binary file. In fact, it is our file content, just compressed

Git applies deflate compression (zlib) to every file added to the staging area and then stores the result
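To see that the stored file really is just compressed data, it can be inflated with zlib. The sketch below assumes the layout the article describes next, a header of the form "type size" followed by a NUL byte and then the content; the function name read_loose_object is hypothetical.

```python
import zlib


def read_loose_object(path):
    """Decompress one file from .git/objects and split it into its parts.

    Returns (object_type, size, content). Raises ValueError when the size
    recorded in the header does not match the actual content length.
    """
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The header and the content are separated by the first NUL byte.
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, int(size), content
```

For the demo repository, the path to pass would be .git/objects/9b/3a97dafadb12faf10cf1a1f3a32f63eaa7220a.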

Let’s run the following command. It uses OpenSSL to compute the SHA-1 digest of the string blob 12\0this is foo followed by a newline. printf is used so that \0 becomes a NUL byte regardless of the shell

printf 'blob 12\0this is foo\n' | openssl sha1

What do we find? The resulting digest (9b3a97dafadb12faf10cf1a1f3a32f63eaa7220a) is exactly our directory name (9b) followed by the file name (3a97dafadb12faf10cf1a1f3a32f63eaa7220a)

Git computes the SHA-1 of the string blob <file length in bytes>\0<file content> and stores our file under the objects directory at the path derived from that hash. Here the length is 12: the 11 characters of this is foo plus the trailing newline
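The same calculation can be written in a few lines of Python. This is a minimal sketch of the formula described above, not Git's own implementation, and the names git_object_hash and object_path are hypothetical.

```python
import hashlib


def git_object_hash(obj_type, content):
    """Hash content the way the article describes: "<type> <size>\\0<content>".

    content must be bytes, and the size is its length in bytes.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(hex_hash):
    """Split a hash into the directory (first 2 chars) and file name (the rest)."""
    return f"objects/{hex_hash[:2]}/{hex_hash[2:]}"
```

Calling git_object_hash("blob", b"this is foo\n") reproduces the command-line calculation for the demo file; note that the trailing newline is part of the hashed content.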

Now we basically know how Git stores our files. Next, let’s make a commit and see what changes

4. Git’s built-in object viewer

We can also use Git’s built-in cat-file command to inspect a stored object

#View the type of object
> git cat-file -t 9b3a97dafadb12faf10cf1a1f3a32f63eaa7220a
blob
#View the type of the object; the hash can be abbreviated to its leading characters, as long as the prefix matches exactly one object
> git cat-file -t 9b3a
blob
#View the content of the object
> git cat-file -p 9b3a
this is foo
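Abbreviated hashes work because Git only needs the prefix to match a single object. A rough sketch of that lookup over the objects folder is shown below. It only considers loose object files, the name resolve_prefix is hypothetical, and the four-character minimum is my understanding of Git's default behavior rather than something stated in this article.

```python
import os


def resolve_prefix(objects_dir, prefix):
    """Expand an abbreviated hash to the full hash of the one matching object.

    Raises LookupError when no object or more than one object matches.
    Assumption: only loose objects are searched; pack files are ignored.
    """
    if len(prefix) < 4:
        raise ValueError("prefix must have at least 4 hex characters")
    subdir = os.path.join(objects_dir, prefix[:2])
    if not os.path.isdir(subdir):
        raise LookupError(f"no object matches {prefix}")
    rest = prefix[2:]
    matches = [prefix[:2] + name for name in os.listdir(subdir) if name.startswith(rest)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} objects match {prefix}")
    return matches[0]
```

With the demo repository, resolve_prefix(".git/objects", "9b3a") would expand to the full blob hash as long as no other object starts with the same four characters.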

5. Make a commit

This is routine: we run git commit -m "add foo.txt" to make the commit. The result is as follows

|-- .git/
|   |-- COMMIT_EDITMSG
|   |-- HEAD
|   |-- config
|   |-- description
|   |-- hooks/
|   |   `-- pre-commit.sample*
|   |-- index
|   |-- info/
|   |   `-- exclude
|   |-- logs/
|   |   |-- HEAD
|   |   `-- refs/
|   |       `-- heads/
|   |           `-- master
|   |-- objects/
|   |   |-- 82/
|   |   |   `-- be13e5bf9fafe4db5ad38c76a5c0116e156953
|   |   |-- 9b/
|   |   |   `-- 3a97dafadb12faf10cf1a1f3a32f63eaa7220a
|   |   |-- f8/
|   |   |   `-- aa85021bfe8c10d9517e22feda4fc67d0a4095
|   |   |-- info/
|   |   `-- pack/
|   `-- refs/
|       |-- heads/
|       |   `-- master
|       `-- tags/
`-- foo.txt

Several new files have appeared under .git, and the objects directory has two new folders, 82 and f8, each containing one file. The refs/heads folder has a new file named master, which is our current branch

Let’s check what is in the refs/heads/master file

> cat .git/refs/heads/master
82be13e5bf9fafe4db5ad38c76a5c0116e156953

It is a hash. With this hash we can find the corresponding file in objects, so let’s inspect it with git cat-file

> git cat-file -t 82be13e5bf9fafe4db5ad38c76a5c0116e156953
commit
> git cat-file -p 82be13e5bf9fafe4db5ad38c76a5c0116e156953
tree f8aa85021bfe8c10d9517e22feda4fc67d0a4095
author Aiello <[email protected]> 1577369635 +0800
committer Aiello <[email protected]> 1577369635 +0800

add foo.txt

This tells us the object is of type commit, and its content refers to an object of type tree. Let’s look at what information the tree object contains

> git cat-file -t f8aa85021bfe8c10d9517e22feda4fc67d0a4095
tree
> git cat-file -p f8aa85021bfe8c10d9517e22feda4fc67d0a4095
100644 blob 9b3a97dafadb12faf10cf1a1f3a32f63eaa7220a foo.txt
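The -p output above is a pretty-printed view. In the raw tree content, each entry is stored as the mode, a space, the file name, a NUL byte, and then the hash as 20 raw bytes, which is why one entry for foo.txt takes 35 bytes. A small parser for that layout might look like the following sketch; the name parse_tree is hypothetical.

```python
def parse_tree(raw):
    """Parse raw tree content into a list of (mode, name, hex_hash) tuples.

    Each entry is "<mode> <name>\\0" followed by 20 raw bytes of SHA-1.
    """
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = raw[pos:space].decode("ascii")
        name = raw[space + 1:nul].decode("utf-8")
        # The 20 bytes after the NUL are the binary form of the hash.
        hex_hash = raw[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_hash))
        pos = nul + 21
    return entries
```

The raw bytes to feed it are what git cat-file tree prints for a tree hash, which explains why that output looks garbled in a terminal.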

As you can see, this tree object points to our source file. Let’s apply the same method and compute its hash by hand

#Get the type of the object
> git cat-file -t f8aa85021bfe8c10d9517e22feda4fc67d0a4095
tree
#Get the raw content of the object (the bytes after the file name are the blob hash in binary, so they are unprintable)
> git cat-file tree f8aa85021bfe8c10d9517e22feda4fc67d0a4095
100644 foo.txt<NUL><20 raw bytes of the blob hash>
#Count the bytes of the content
> git cat-file tree f8aa85021bfe8c10d9517e22feda4fc67d0a4095 | wc -c
35
#Put it all together
#The hash is computed over <type> <content length in bytes>\0<content>
> (printf "tree %s\0" $(git cat-file tree f8aa85021bfe8c10d9517e22feda4fc67d0a4095 | wc -c); git cat-file tree f8aa85021bfe8c10d9517e22feda4fc67d0a4095) | openssl sha1
f8aa85021bfe8c10d9517e22feda4fc67d0a4095

You will find that it matches the tree hash we passed in! Let’s use the same method to compute the hash of the commit by hand

> (printf "commit %s\0" $(git cat-file commit 82be13e5bf9fafe4db5ad38c76a5c0116e156953 | wc -c); git cat-file commit 82be13e5bf9fafe4db5ad38c76a5c0116e156953) | openssl sha1
82be13e5bf9fafe4db5ad38c76a5c0116e156953

As expected, the output is the commit hash we saw in the directory tree. So Git always computes an object’s hash over the same layout

<type> <content length in bytes>\0<content>
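Because a commit body is plain text, the link from a commit to its tree can also be followed programmatically. The sketch below parses the text printed by git cat-file -p for a commit. It assumes simple one-line headers (multi-line headers such as signatures are not handled), and the name parse_commit is hypothetical.

```python
def parse_commit(text):
    """Split commit text into (headers, message).

    headers maps each key (tree, parent, author, committer) to a list of
    values, since a key such as parent can appear more than once.
    Assumption: every header fits on one line.
    """
    header_part, _, message = text.partition("\n\n")
    headers = {}
    for line in header_part.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message
```

For the first commit in the demo, headers["tree"][0] would be the f8aa... tree hash, and there is no parent key because it is the first commit in the repository.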

6. Modify the file and commit it

Let’s modify the file and commit it

> cat foo.txt
this is foo
and new line

After running git add, our directory tree looks like this

|-- .git/
|   |-- COMMIT_EDITMSG
|   |-- HEAD
|   |-- config
|   |-- description
|   |-- hooks/
|   |   `-- pre-commit.sample*
|   |-- index
|   |-- info/
|   |   `-- exclude
|   |-- logs/
|   |   |-- HEAD
|   |   `-- refs/
|   |       `-- heads/
|   |           `-- master
|   |-- objects/
|   |   |-- 82/
|   |   |   `-- be13e5bf9fafe4db5ad38c76a5c0116e156953
|   |   |-- 9b/
|   |   |   `-- 3a97dafadb12faf10cf1a1f3a32f63eaa7220a
|   |   |-- c9/
|   |   |   `-- da32f4e76824497d02312e46ac0a40e28bef91
|   |   |-- f8/
|   |   |   `-- aa85021bfe8c10d9517e22feda4fc67d0a4095
|   |   |-- info/
|   |   `-- pack/
|   `-- refs/
|       |-- heads/
|       |   `-- master
|       `-- tags/
`-- foo.txt

There is a new folder named c9. Using git cat-file, we can see that the object inside it holds our current file

> git cat-file -p c9da32f4e768
this is foo
and new line

This shows that Git created a new compressed object from our latest file. This object contains the full foo.txt, not a diff. Even if you deleted the previous object, 9b3a97dafadb12faf10cf1a1f3a32f63eaa7220a, the current version would be unaffected and we could still get the complete latest file! However, with the earlier object gone, rolling back to the earlier version would fail.

Then we commit the change

> git commit -m "update foo.txt"
[master 432a5d9] update foo.txt
 1 file changed, 1 insertion(+)

Then look at our directory tree

|-- .git/
|   |-- COMMIT_EDITMSG
|   |-- HEAD
|   |-- config
|   |-- description
|   |-- hooks/
|   |   `-- pre-commit.sample*
|   |-- index
|   |-- info/
|   |   `-- exclude
|   |-- logs/
|   |   |-- HEAD
|   |   `-- refs/
|   |       `-- heads/
|   |           `-- master
|   |-- objects/
|   |   |-- 03/
|   |   |   `-- 7b10984589cbfe8c6a9c5b84d61592f84fd97a
|   |   |-- 43/
|   |   |   `-- 2a5d918414362c78da50274a1fa0c57b5dc380
|   |   |-- 82/
|   |   |   `-- be13e5bf9fafe4db5ad38c76a5c0116e156953
|   |   |-- 9b/
|   |   |   `-- 3a97dafadb12faf10cf1a1f3a32f63eaa7220a
|   |   |-- c9/
|   |   |   `-- da32f4e76824497d02312e46ac0a40e28bef91
|   |   |-- f8/
|   |   |   `-- aa85021bfe8c10d9517e22feda4fc67d0a4095
|   |   |-- info/
|   |   `-- pack/
|   `-- refs/
|       |-- heads/
|       |   `-- master
|       `-- tags/
`-- foo.txt

As expected, folders 43 and 03 were created, holding this commit object and its tree object respectively

> git cat-file -t 432a
commit
> git cat-file -t 037b
tree

7. Interim summary

After the steps above, we have a clear picture of Git’s storage mechanism, and we can sketch the following model diagram

[Figure: Git storage model diagram]

Note: I did not write out the hash of each object in the figure, but keep in mind that every object in every layer uses its hash as its file name, so the store never holds two objects with the same content. (Two commits can carry the same message and still have different hashes, because the trees and blobs they point to differ.)

The layers in Git are cleanly separated, each with its own job, and they are linked together by hashes!

From the figure, we can see that HEAD points to the master branch, the master branch points to a specific commit (add bar.txt in the figure), the commit points to the directory tree as it was at commit time, and the tree in turn points to the files as they were at that time!

This storage method is clear, reliable, and extensible. At the file layer, identical content is stored only once, because a blob’s hash depends only on the file’s content. When a file is modified and committed, a new blob is added. If the file is then changed back and committed again, no new blob is created, because the content is identical to a blob that already exists. Even files with different names share a single stored copy when their contents are the same.
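The deduplication falls out of using the content hash as the storage key. The toy in-memory store below illustrates the idea. It is a sketch with a hypothetical class name, BlobStore, and it keeps content in a dictionary instead of writing compressed files.

```python
import hashlib


class BlobStore:
    """A toy content-addressed store: the key is derived from the content only."""

    def __init__(self):
        self.objects = {}

    def put(self, content):
        """Store content (bytes) and return its key; identical content is stored once."""
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        # setdefault leaves an existing entry untouched, so nothing is duplicated.
        self.objects.setdefault(key, content)
        return key
```

File names never enter the key, so two files named differently but holding the same bytes map to one entry, and a file changed back to an earlier state maps to the entry that already exists.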

8. Branch switching

As the figure suggests, creating a new branch only requires creating a new branch file that points to the commit we specify, and switching branches comes down to pointing HEAD at a different branch file (plus updating the working tree). It is that simple and reliable.
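At the file level, the branch operations described above amount to two small writes. The sketch below shows only those writes. Real Git also validates branch names, records reflog entries, and updates the index and working tree when switching, none of which is modeled here; the function names are hypothetical.

```python
from pathlib import Path


def create_branch(git_dir, name, commit_hash):
    """Create refs/heads/<name> containing the given commit hash."""
    ref = Path(git_dir) / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_hash + "\n")
    return ref


def point_head_at(git_dir, name):
    """Rewrite HEAD so that it refers to refs/heads/<name>."""
    (Path(git_dir) / "HEAD").write_text(f"ref: refs/heads/{name}\n")
```

This is best tried on a scratch directory rather than a real repository, since writing into .git by hand bypasses the safety checks the git command performs.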

Git operations such as rollback, rebase, and merge should now seem much clearer

9. Using hooks

As mentioned at the beginning of the article, hooks can help us do a lot of things. We kept the pre-commit.sample hook, so let’s use it

(The sample file contains some examples. If you would rather keep them, copy the file and drop the .sample suffix from the copy.)

> cp .git/hooks/pre-commit.sample .git/hooks/pre-commit

Edit the .git/hooks/pre-commit file, empty it, and enter the following code

#!/bin/sh
echo "this is pre-commit hooks"

The next time you commit, the console prints the text above. You can run lint checks in this hook and allow the commit only if lint passes. Similarly, in the pre-push hook you can run local tests and push to the server only if the tests pass. When we built component libraries, we also used hooks to bump the patch version automatically. Haha, it was very convenient, and the patch version never got out of control, because we reset it manually whenever we bumped the minor version
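A hook does not have to be a shell script; any executable file works, and Git only looks at its exit code. As a sketch of the lint-then-commit idea, a hook written in Python could run a list of checks and stop at the first failure. The check list here is a hypothetical placeholder, not a recommendation for specific tools.

```python
#!/usr/bin/env python3
import subprocess
import sys


def run_checks(commands):
    """Run each command in order.

    Returns 0 if all succeed, otherwise the exit code of the first failure.
    A non-zero exit code from a pre-commit hook makes Git abort the commit.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


def main():
    # Hypothetical check list: replace with your own lint or test commands.
    checks = [[sys.executable, "-c", "print('this is pre-commit hooks')"]]
    return run_checks(checks)

# In a real hook file, finish with: sys.exit(main())
```

To use something like this, the file would need to be saved as .git/hooks/pre-commit and marked executable.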

There are now many libraries that help you manage hooks, such as Husky

🎉