Go inside Chrome and learn how the V8 engine works

Time: 2022-1-18

As a front-end programmer, the first thing I do at work every day is turn on the computer and, almost involuntarily, open the Chrome browser, then either slack off for a while or get straight to work. From then on, the browser window accompanies you through the day. Normally that means until seven or eight in the evening; on a late day it is nine or ten, and sometimes it stays with you past midnight, watching over your work the whole time. It is a loyal partner, so ask yourself: have you ever seriously tried to understand how it works? Have you ever walked into its inner world?

If you have ever been curious, read on. This installment goes inside Chrome to learn how the V8 engine works.

What is V8

Before you know something deeply, you must first know what it is.

V8 is Google's open-source, high-performance JavaScript and WebAssembly engine, written in C++. It is used in Chrome, Node.js, and other projects. It implements ECMAScript and WebAssembly, and runs on Windows 7 and later, macOS 10.12+, and Linux systems that use x64, IA-32, ARM, or MIPS processors. V8 can run standalone or be embedded into any C++ application.

V8 origin

Next, let's look at how it was born and where its name comes from.

V8 was originally developed by Lars Bak's team. It is named after the V8 car engine (a V-shaped engine with eight cylinders), which signals that it is meant to be a high-performance JavaScript engine. It was released as open source together with Chrome on September 2, 2008.

Why V8

The JavaScript code we write ultimately has to be executed by a machine, but the machine cannot understand a high-level language directly. The code has to go through a series of processing steps that convert the high-level language into instructions the machine can recognize, that is, binary code, which is then handed to the machine for execution. That conversion in the middle is exactly what V8 does.

Next, let’s look at it in detail.

V8 composition

First, let's look at V8's internal composition. V8 contains many modules; the four most important are:

  • Parser: the parser, responsible for parsing source code into an AST
  • Ignition: the interpreter, responsible for converting the AST to bytecode, executing it, and marking hot code
  • TurboFan: the compiler, responsible for compiling hot code into machine code and executing it
  • Orinoco: the garbage collector, responsible for reclaiming memory

V8 workflow

Here is a flow chart of how the most important modules in V8 work together. We will analyze them one by one.

[Figure: V8 workflow]

Parser

The parser is responsible for transforming source code into an abstract syntax tree (AST). The conversion has two important stages: lexical analysis and syntax analysis.

Lexical analysis

Also known as tokenization, this is the process of converting the source string into a sequence of tokens. A token is a string that forms the smallest unit of source code, similar to a word in English. Lexical analysis can be thought of as combining letters into words. It does not care about the relationships between words. For example, a parenthesis is marked as a token, but nothing checks whether the parentheses match.

Tokens in JavaScript mainly fall into the following kinds:

  • Keywords: var, let, const, etc.
  • Identifiers: a continuous run of characters not enclosed in quotation marks; it may be a variable name, a keyword such as if or else, or a built-in constant such as true or false
  • Operators: +, -, *, /, etc.
  • Numbers: hexadecimal, decimal, octal, and scientific notation
  • Strings: string literals, such as the value assigned to a variable
  • Whitespace: consecutive spaces, line breaks, indentation, etc.
  • Comments: line comments or block comments; each is treated as a single, indivisible syntax unit
  • Punctuation: braces, parentheses, semicolons, colons, etc.
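
To make the token categories above concrete, here is a minimal toy tokenizer. It is only a sketch: the rule table, the type names, and the tokenize function are assumptions made for illustration, not V8's scanner or any library's real implementation, and it covers only a tiny subset of JavaScript.

```javascript
// Toy tokenizer: an illustrative sketch, not V8's scanner.
const KEYWORDS = new Set(["var", "let", "const", "if", "else"]);

// Each rule pairs a token type with a sticky regex that is tried at the current position.
const RULES = [
  ["Whitespace", /\s+/y],
  ["Numeric", /\d+(?:\.\d+)?/y],
  ["String", /'[^'\n]*'|"[^"\n]*"/y],
  ["Identifier", /[A-Za-z_$][\w$]*/y],
  ["Punctuator", /[=+\-*\/(){};:,]/y],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [type, regex] of RULES) {
      regex.lastIndex = pos;
      const match = regex.exec(source);
      if (match === null) continue;
      pos = regex.lastIndex;
      matched = true;
      if (type === "Whitespace") break; // dropped here: it only separates tokens
      const value = match[0];
      // Keywords look like identifiers, so they are reclassified after matching.
      const finalType = type === "Identifier" && KEYWORDS.has(value) ? "Keyword" : type;
      tokens.push({ type: finalType, value });
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at ${pos}: ${source[pos]}`);
    }
  }
  return tokens;
}

// The unmatched "(" is tokenized without complaint; pairing is left to syntax analysis.
console.log(tokenize("let total = (1 + 2"));
```

Note that the input with an unclosed parenthesis produces a token list rather than an error, which is the point made above: lexical analysis only splits the text into words.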

Here are the tokens that esprima generates from const a = 'hello world' after lexical analysis:

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'hello world'"
    }
]

Syntax analysis

Syntax analysis is the process of converting the tokens produced by lexical analysis into an AST according to a given formal grammar. In other words, it is the process of combining words into sentences. The syntax is verified during the conversion; if anything is invalid, a syntax error is thrown.

The AST generated by parsing const a = 'hello world' from above looks like this:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
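
As a rough illustration of how tokens become a tree like the one above, here is a toy parser for a single declaration of the form kind name = string. The function names parseProgram and expect are hypothetical, the token list is hard-coded, and a real parser handles the whole grammar; this sketch only shows the idea of consuming tokens in grammar order and throwing on a mismatch.

```javascript
// Toy parser for one declaration: <var|let|const> <name> = <string literal>
// An illustrative sketch, not how V8's parser is implemented.
function parseProgram(tokens) {
  let index = 0;

  // Consume the next token, or throw if it is not what the grammar requires.
  function expect(type, value) {
    const token = tokens[index];
    const ok = token !== undefined && token.type === type &&
      (value === undefined || token.value === value);
    if (!ok) {
      const found = token === undefined ? "end of input" : token.value;
      throw new SyntaxError(`Expected ${value ?? type} but found ${found}`);
    }
    index += 1;
    return token;
  }

  const kind = expect("Keyword").value;
  if (!["var", "let", "const"].includes(kind)) {
    throw new SyntaxError(`Unexpected keyword ${kind}`);
  }
  const id = expect("Identifier");
  expect("Punctuator", "=");
  const init = expect("String");
  if (index < tokens.length) {
    throw new SyntaxError(`Unexpected token ${tokens[index].value}`);
  }

  return {
    type: "Program",
    body: [{
      type: "VariableDeclaration",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: id.value },
        // Strip the surrounding quotes to get the literal's value.
        init: { type: "Literal", value: init.value.slice(1, -1), raw: init.value },
      }],
      kind,
    }],
    sourceType: "script",
  };
}

const sampleTokens = [
  { type: "Keyword", value: "const" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "String", value: "'hello world'" },
];
console.log(JSON.stringify(parseProgram(sampleTokens), null, 2));
```

Dropping the "=" token from the input makes expect throw a SyntaxError, which mirrors the syntax check described above.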

The AST generated by the parser is then handed over to the Ignition interpreter for processing.

Ignition interpreter

The Ignition interpreter is responsible for converting the AST to bytecode and executing it. Bytecode is a form of code that sits between the AST and machine code. It is independent of any particular type of machine code, and it can be executed only after the interpreter has converted it into machine code.

At this point you may well be wondering: since the bytecode still has to be converted into machine code before it can run, why not convert the AST straight into machine code at the start and run that? Going directly to machine code must be faster, so why add an intermediate step?

In fact, before version 5.9, V8 had no bytecode. JS code was compiled directly into machine code and kept in memory, which took up a lot of memory. Early mobile phones had little memory, and excessive memory use caused a significant drop in the phone's performance. In addition, compiling directly into machine code meant long compilation times and slow startup. Finally, converting JS code directly into machine code requires writing code for the distinct instruction set of each CPU architecture, which is highly complex.

Bytecode was introduced in version 5.9 and addresses these problems of high memory usage, long startup time, and high code complexity.

Next, let's look at how Ignition converts the AST to bytecode.

The following figure is the workflow of the Ignition interpreter. The AST passes through the bytecode generator and then a series of optimizations before the final bytecode is produced.

[Figure: Ignition interpreter workflow]

The optimization includes:

  • Register Optimizer: mainly avoids unnecessary register loads and stores
  • Peephole Optimizer: finds bytecode sequences that can be combined and merges them
  • Dead-code Elimination: deletes unreachable code and reduces the size of the bytecode
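
The following toy passes sketch two of the ideas in the list above on a made-up bytecode format. The opcode names are borrowed from Ignition (LdaSmi, Star, Ldar, Add, Return), but the instruction format, the simplified semantics, and the helper names removeRedundantLoads and removeDeadCode are assumptions for illustration; Ignition's real optimizers are considerably more involved.

```javascript
// Toy bytecode: each instruction is [opcode, operand]. Semantics are simplified assumptions.
const bytecode = [
  ["LdaSmi", 1],   // accumulator = 1
  ["Star", "r0"],  // r0 = accumulator
  ["Ldar", "r0"],  // accumulator = r0 (redundant: the accumulator already holds r0's value)
  ["Add", "r0"],   // accumulator = accumulator + r0
  ["Return"],      // return accumulator
  ["LdaSmi", 99],  // unreachable in this straight-line code
  ["Return"],
];

// Drop a load that immediately follows a store to the same register.
function removeRedundantLoads(code) {
  return code.filter(([op, arg], i) => {
    const prev = code[i - 1];
    return !(op === "Ldar" && prev !== undefined && prev[0] === "Star" && prev[1] === arg);
  });
}

// This toy format has no jumps, so nothing after the first Return can ever run.
function removeDeadCode(code) {
  const end = code.findIndex(([op]) => op === "Return");
  return end === -1 ? code : code.slice(0, end + 1);
}

const optimizedBytecode = removeDeadCode(removeRedundantLoads(bytecode));
console.log(optimizedBytecode);
```

The seven original instructions shrink to four (LdaSmi, Star, Add, Return) with the same result, which is the kind of saving these passes aim for.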

Once the code has been converted into bytecode, the interpreter can execute it. While executing, Ignition monitors the code and records execution information, such as how many times a function has run and the arguments passed on each call.

When the same piece of code is executed many times, it is marked as hot code. Hot code is handed over to the TurboFan compiler for processing.
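
A minimal sketch of this bookkeeping, assuming a toy interpreter with made-up opcodes (LdaArg, AddArg, Return) and an arbitrary hot threshold of three calls. The ToyInterpreter class and HOT_THRESHOLD are hypothetical; V8's actual heuristics for deciding what is hot are more sophisticated.

```javascript
// Assumed threshold for this sketch only.
const HOT_THRESHOLD = 3;

class ToyInterpreter {
  constructor() {
    // function name -> { calls, argTypes, hot }
    this.profiles = new Map();
  }

  // Execute a toy function; `code` is a list of [opcode, operand] pairs.
  run(name, code, args) {
    const profile = this.profiles.get(name) ?? { calls: 0, argTypes: new Set(), hot: false };
    this.profiles.set(name, profile);

    // Record execution information while running.
    profile.calls += 1;
    profile.argTypes.add(args.map((a) => typeof a).join(","));
    if (profile.calls >= HOT_THRESHOLD) profile.hot = true;

    let acc;
    for (const [op, operand] of code) {
      switch (op) {
        case "LdaArg": acc = args[operand]; break;        // accumulator = argument
        case "AddArg": acc = acc + args[operand]; break;  // accumulator += argument
        case "Return": return acc;
        default: throw new Error(`Unknown opcode ${op}`);
      }
    }
    return acc;
  }
}

const addCode = [["LdaArg", 0], ["AddArg", 1], ["Return"]];
const interpreter = new ToyInterpreter();
for (let i = 0; i < 3; i++) interpreter.run("add", addCode, [i, 1]);
console.log(interpreter.profiles.get("add"));
```

After the third call the profile for add is flagged as hot, and the recorded argument types are what a compiler could later use to specialize the code.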

TurboFan compiler

Once TurboFan receives the hot code marked by Ignition, it first optimizes it and then compiles the optimized bytecode into more efficient machine code, which is stored. The next time the same code runs, the corresponding machine code is executed directly, which greatly improves execution efficiency.

When the assumptions behind the optimized code no longer hold, TurboFan deoptimizes: it discards the optimized machine code, falls back to the bytecode, and returns execution of the code to Ignition.

Now let's look at how this works in practice.

Take sum += arr[i] as an example. Because JS is a dynamically typed language, sum and arr[i] may have different types each time. When executing this code, Ignition checks the data types of sum and arr[i] every time. When it finds that the same code has been executed many times, it marks it as hot code and hands it over to TurboFan.

For TurboFan, checking the data types of sum and arr[i] on every execution would be a waste of time. So during optimization it infers the data types of sum and arr[i] from the previous executions and compiles the code into machine code for those types. On the next execution, the type check is skipped.

However, if the data type of arr[i] changes during later execution, the previously generated machine code no longer fits. TurboFan discards that machine code and hands execution back to Ignition, which completes the deoptimization.
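
The optimize-then-deoptimize cycle described above can be imitated in plain JavaScript. This is only an analogy under stated assumptions: sumGeneric stands in for Ignition's generic execution, sumNumbersOnly stands in for type-specialized machine code, and makeSpeculativeSum with its threshold of three calls is a hypothetical wrapper. None of these names exist in V8.

```javascript
// Generic path: works for any element types (stand-in for interpreted execution).
function sumGeneric(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}

// "Optimized" path: assumes every element is a number and bails out otherwise.
const DEOPT = Symbol("deopt");
function sumNumbersOnly(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) {
    if (typeof arr[i] !== "number") return DEOPT; // type guard failed
    sum += arr[i];
  }
  return sum;
}

function makeSpeculativeSum(hotThreshold = 3) {
  let calls = 0;
  let onlyNumbersSeen = true; // type feedback gathered on the generic path
  let optimized = null;
  const stats = { deopts: 0 };

  function sum(arr) {
    if (optimized !== null) {
      const result = optimized(arr);
      if (result !== DEOPT) return result;
      // The assumption broke: discard the specialized version and fall back.
      optimized = null;
      onlyNumbersSeen = false;
      stats.deopts += 1;
      return sumGeneric(arr);
    }
    calls += 1;
    onlyNumbersSeen = onlyNumbersSeen && arr.every((x) => typeof x === "number");
    // Hot and type-stable so far: switch to the specialized version for later calls.
    if (calls >= hotThreshold && onlyNumbersSeen) optimized = sumNumbersOnly;
    return sumGeneric(arr);
  }

  return { sum, stats, isOptimized: () => optimized !== null };
}

const speculative = makeSpeculativeSum();
speculative.sum([1, 2, 3]);
speculative.sum([4, 5]);
speculative.sum([6]);
console.log(speculative.isOptimized());   // specialized after three numeric calls
console.log(speculative.sum([1, "2"]));   // a string element triggers the fallback
console.log(speculative.stats.deopts);
```

The specialized function is only correct while its type assumption holds, so it carries a guard; when the guard fails, the result is recomputed on the generic path, which is the same trade-off the text describes for TurboFan and Ignition.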

Hot code:
[Figure: hot code]

Before optimization:
[Figure: code before optimization]

After optimization:
[Figure: code after optimization]

Summary

Now let's summarize V8's execution process:

  1. The parser turns the source code into an AST through lexical analysis and syntax analysis
  2. The Ignition interpreter generates bytecode from the AST and executes it
  3. During execution, if hot code is found, it is handed over to the TurboFan compiler, which generates machine code and executes it
  4. If the hot code no longer meets the assumptions it was optimized under, it is deoptimized

This technique of combining bytecode with an interpreter and a compiler is what we usually call just-in-time (JIT) compilation.

This article does not cover the garbage collector, Orinoco. V8's garbage collection mechanism deserves a separate, detailed article. See you next time.
