Understanding JavaScript Design with V8

Time:2021-6-10

How JavaScript code runs

Whether in the Chrome browser or in Node, our JavaScript code is executed by V8. But how does V8 actually execute it? What does V8 do when we type const foo = {foo: 'foo'} and press Enter? With that question in mind, let’s read on.

JavaScript storage

Running code needs somewhere to store state: the stack and the heap. Primitive values are stored on the stack and are reclaimed automatically; composite values are stored on the heap, and their space is released by the garbage collector (GC). This process is invisible to users, so users have to write code that follows the JavaScript specification. Code that does not follow it prevents the GC from reclaiming space correctly, which causes memory leaks and, in more serious cases, out-of-memory (OOM) errors.

To show more intuitively how each type is stored in memory, the author creates a primitive variable Foo, a composite value Bar, and a declaration John, and diagrams their state on the stack and in the heap.

About GC

As the analysis above shows, the GC reclaims dead objects and frees their space. For users, declaration and release are automatic for both primitive and composite types. In fact, though, heap reclamation is manual; it is just that V8 has implemented it for us, and the process is not entirely free (it needs a write barrier). Still, this automation lets most developers ignore its existence completely. JavaScript is clearly designed this way on purpose.

The write barrier notifies the GC of every change to the object graph that happens while the tri-color marking algorithm is running concurrently with the program, so that marking stays accurate even though it is asynchronous.

// Called after `object.field = value`.
write_barrier(object, field_offset, value) {
  if (color(object) == black && color(value) == white) {
    set_color(value, grey);
    marking_worklist.push(value);
  }
}
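
To make the barrier concrete, here is a minimal sketch of tri-color marking in JavaScript. It is a toy model, not V8’s collector: ToyHeap, shade, step, write, and demo are hypothetical names invented for this illustration, and the "concurrent" mutation is simulated by hand between two marking steps.

```javascript
// Toy tri-color marker. white = not yet seen, grey = seen but fields not
// scanned yet, black = fully scanned.
const WHITE = 'white';
const GREY = 'grey';
const BLACK = 'black';

class ToyHeap {
  constructor() {
    this.color = new Map(); // object -> color
    this.worklist = [];     // grey objects waiting to be scanned
  }

  alloc(fields = {}) {
    const obj = { fields };
    this.color.set(obj, WHITE);
    return obj;
  }

  // Turn a white object grey and queue it for scanning.
  shade(obj) {
    if (this.color.get(obj) === WHITE) {
      this.color.set(obj, GREY);
      this.worklist.push(obj);
    }
  }

  // One incremental marking step: scan a single grey object.
  step() {
    const obj = this.worklist.pop();
    if (!obj) return false;
    for (const child of Object.values(obj.fields)) this.shade(child);
    this.color.set(obj, BLACK);
    return true;
  }

  // Same check as the pseudocode above, run after the field is written.
  write(object, name, value, useBarrier) {
    object.fields[name] = value;
    if (useBarrier &&
        this.color.get(object) === BLACK &&
        this.color.get(value) === WHITE) {
      this.shade(value);
    }
  }
}

function demo(useBarrier) {
  const heap = new ToyHeap();
  const root = heap.alloc();
  const late = heap.alloc();
  heap.shade(root);
  heap.step();                                // root is now black
  heap.write(root, 'late', late, useBarrier); // mutation during marking
  while (heap.step()) {}                      // finish marking
  return heap.color.get(late);
}

console.log(demo(true));  // black: the barrier re-queued the new reference
console.log(demo(false)); // white: a live object would be wrongly collected
```

Without the barrier, an object that is still reachable ends up white, which is exactly the error the barrier exists to prevent.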

Positioning of JavaScript

Anyone who has used C / C++ will have a deep appreciation of manual memory allocation and release, and Go and D also have the concept of pointers. Generally speaking, a language positioned at the “system level” can manipulate memory directly. Besides the languages above, Rust is also a system-level language, and parts of Firefox’s JavaScript engine, SpiderMonkey, are written in it. It is worth mentioning that SpiderMonkey, which later gained the TraceMonkey tracing JIT, was the world’s first JavaScript engine. Of course, “direct” manipulation of memory here still goes through a mapping onto the hardware: high-level languages sit on top of the OS, so it is the OS that gives the program the illusion of operating on memory directly.

Back to JavaScript: it is obviously not a language positioned at the “system level” but a higher, application-level language, so its design and its use cases tend to hide low-level concepts. Beyond this positioning, JavaScript is a dynamic language, which means a lot of information exists only at run time, such as the global execution context, the global scope, and prototype-chain inheritance. Because these features can only be resolved at run time, they are another reason V8 is needed, and they also explain the role of the interpreter in V8.

About CPU

Before introducing the interpreter, let’s look at the CPU. Modern CPUs are very complex, so let’s simplify: assume a CPU with a small instruction set, an ALU, and registers. The way it executes code is actually very simple: a long chain of if ... else ... tests that identify the current opcode and decode the instruction. In other words, the basic job of the CPU is to compute and jump according to the opcode. It does not check whether the program is correct; as long as the opcode matches, it executes, and it does not care what data is on the stack. The C fragment below comes from RISC-V material (it appears to be a test program from the xv6 teaching OS rather than processor code itself), but it shows the same pattern: a value is tested against one case after another, and the matching operation is performed.

  while(1){
    iters++;
    if((iters % 500) == 0)
      write(1, which_child?"B":"A", 1);
    int what = rand() % 23;
    if(what == 1){
      close(open("grindir/../a", O_CREATE|O_RDWR));
    } else if(what == 2){
      close(open("grindir/../grindir/../b", O_CREATE|O_RDWR));
    } else if(what == 3){
      unlink("grindir/../a");
    } else if(what == 4){
      if(chdir("grindir") != 0){
        printf("grind: chdir grindir failed\n");
        exit(1);
      }
      unlink("../b");
      chdir("/");
    } else if(what == 5){
      close(fd);
      fd = open("/grindir/../a", O_CREATE|O_RDWR);
    } else if(what == 6){
      close(fd);
      fd = open("/./grindir/./../b", O_CREATE|O_RDWR);
    } else if(what == 7){
      write(fd, buf, sizeof(buf));
    } else if(what == 8){
      read(fd, buf, sizeof(buf));
    } else if(what == 9){
      mkdir("grindir/../a");
      close(open("a/../a/./a", O_CREATE|O_RDWR));
      unlink("a/a");
    } else if(what == 10){
      mkdir("/../b");
      close(open("grindir/../b/b", O_CREATE|O_RDWR));
      unlink("b/b");
    } else if(what == 11){
      unlink("b");
      link("../grindir/./../a", "../b");
    } else if(what == 12){
      unlink("../grindir/../a");
      link(".././b", "/grindir/../a");
    } else if(what == 13){
      int pid = fork();
      if(pid == 0){
        exit(0);
      } else if(pid < 0){
        printf("grind: fork failed\n");
        exit(1);
      }
      wait(0);
    } else if(what == 14){
      int pid = fork();
      if(pid == 0){
        fork();
        fork();
        exit(0);
      } else if(pid < 0){
        printf("grind: fork failed\n");
        // ... (the fragment is cut off here in the original)
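
The "long chain of if ... else ..." idea can be sketched directly as a fetch, decode, execute loop. This is a hedged illustration only: the three-operand encoding and the opcodes LOAD_IMM, ADD, JUMP_IF_ZERO, and HALT are made up for this example and do not correspond to RISC-V or any real instruction set.

```javascript
// Hypothetical opcodes for a toy machine with four registers.
const OP = { LOAD_IMM: 0, ADD: 1, JUMP_IF_ZERO: 2, HALT: 3 };

function run(program) {
  const reg = [0, 0, 0, 0]; // general-purpose registers
  let pc = 0;               // program counter

  while (true) {
    const [op, a, b] = program[pc++]; // fetch
    // Decode and execute: the machine only looks at the opcode. It never
    // asks whether the program as a whole makes sense.
    if (op === OP.LOAD_IMM) {
      reg[a] = b;
    } else if (op === OP.ADD) {
      reg[a] = reg[a] + reg[b];
    } else if (op === OP.JUMP_IF_ZERO) {
      if (reg[a] === 0) pc = b;
    } else if (op === OP.HALT) {
      return reg;
    } else {
      throw new Error(`unknown opcode ${op}`);
    }
  }
}

const result = run([
  [OP.LOAD_IMM, 0, 2],
  [OP.LOAD_IMM, 1, 40],
  [OP.ADD, 0, 1],
  [OP.HALT, 0, 0],
]);
console.log(result[0]); // 42
```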

Back to V8: one job of the V8 interpreter is to record the runtime state of the program. It can track memory and monitor variable types to keep code execution safe. In languages with manual memory management such as C / C++, a small out-of-bounds access will not necessarily crash the program, but the result will almost certainly be wrong, and tracking down that kind of bug is time-consuming.

Now that the V8 interpreter has come up, let’s expand on it. Because JavaScript is a dynamic language, it needs an interpreter to process the code, which is why early JavaScript engines ran code very slowly. The defining trait of an interpreter is fast startup and slow execution. To improve on this, V8 was among the first JavaScript engines to adopt just-in-time (JIT) compilation; other engines followed, and most popular JavaScript engines now have it. JIT is a trade-off that uses an interpreter and a compiler together, the compiler having the opposite trait: slow startup and fast execution. They cooperate as follows: after the source is converted into an AST, it is first handed to the interpreter. If the interpreter observes that some piece of JavaScript runs many times with a stable structure, that piece is marked as hot code and handed to the compiler, which compiles it into optimized binary machine code. The optimized machine code is given to the CPU, and execution becomes much faster.

This also reveals another reason V8 is needed: different CPUs have different instruction sets, so cross-platform execution requires a layer of abstraction, and V8 is the layer that frees the code from dependence on a particular machine.

By now you should understand why we need V8 and, roughly, how V8 executes a piece of JavaScript underneath. The discussion above focused on why V8 is needed, so it skipped many details of V8’s compilation process. In short, JavaScript is an application-oriented language; to get security, cross-platform support, and control over runtime state, we put another layer on top of the real machine, and that layer can also be called a VM (virtual machine).

V8 compilation process

Now let’s look in detail at how V8 executes JavaScript code. As described above, to improve execution efficiency V8 mixes interpretation and compilation, which is what we call just-in-time compilation. Many language implementations take this approach today, such as the JVM for Java and LuaJIT for Lua.

Suppose we write:

foo({foo: 1});

function foo(obj) {
  const bar = obj.foo + 1
  
  return bar + '1'
}

We find that foo can be called before the line that defines it, which JavaScript calls hoisting. But look at it from another angle and note the word used above: code. The program text we write is only for humans; to the machine it is just meaningless characters, which is why it is called a high-level language. What finally executes is not the same thing as the text we wrote, so we cannot understand execution purely by reading our code from top to bottom.

So how does the machine handle our code? A string of source text is awkward for a machine to work with, so it is first converted into an AST (abstract syntax tree). With this tree structure the code can be manipulated clearly and efficiently, and eventually compiled into machine language that the machine understands.

How, then, does V8 handle hoisting? Clearly, before V8 starts executing JavaScript code, it needs to know which declarations exist and put them into scope.
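
The effect of collecting declarations before execution can be observed from plain JavaScript. The sketch below uses hypothetical names (demoHoisting, hoisted, viaVar, viaLet) and only shows the behavior the language guarantees, not how V8 implements it internally.

```javascript
function demoHoisting() {
  const seen = {};

  // All three names are already in scope here, before their declarations
  // have been executed, but they are in different states.
  seen.fn = typeof hoisted; // function declarations are hoisted with their body
  seen.viaVar = viaVar;     // var is hoisted and initialized to undefined
  try {
    viaLet;                 // let/const are in scope but not yet initialized
  } catch (err) {
    seen.viaLet = err.name;
  }

  function hoisted() {}
  var viaVar = 1;
  let viaLet = 2;

  return seen;
}

console.log(demoHoisting());
// returns { fn: 'function', viaVar: undefined, viaLet: 'ReferenceError' }
```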

From this we can infer that when V8 starts, it first has to initialize the execution environment. The main initialization steps in V8 are:

  • Initialize heap space and stack space
  • Initialize the global context environment, including the global information and variables in the execution process
  • Initialize global scope. Function scopes and other child scopes exist at execution time
  • Initialize the event loop system

After initialization, V8 uses the parser to turn the source code into an AST. Let’s see what the AST generated by V8 looks like for the example above:

[generating bytecode for function: foo]
--- AST ---
FUNC at 28
. KIND 0
. LITERAL ID 1
. SUSPEND COUNT 0
. NAME "foo"
. PARAMS
. . VAR (0x7fe5318086d8) (mode = VAR, assigned = false) "obj"
. DECLS
. . VARIABLE (0x7fe5318086d8) (mode = VAR, assigned = false) "obj"
. . VARIABLE (0x7fe531808780) (mode = CONST, assigned = false) "bar"
. BLOCK NOCOMPLETIONS at -1
. . EXPRESSION STATEMENT at 50
. . . INIT at 50
. . . . VAR PROXY local[0] (0x7fe531808780) (mode = CONST, assigned = false) "bar"
. . . . ADD at 58
. . . . . PROPERTY at 54
. . . . . . VAR PROXY parameter[0] (0x7fe5318086d8) (mode = VAR, assigned = false) "obj"
. . . . . . NAME foo
. . . . . LITERAL 1
. RETURN at 67
. . ADD at 78
. . . VAR PROXY local[0] (0x7fe531808780) (mode = CONST, assigned = false) "bar"
. . . LITERAL "1"

The above is the AST in V8’s output format. It is not very intuitive to read, but it is essentially the same tree that JavaScript parsers such as babel / acorn produce in the ESTree format. Converted to that more familiar format, it looks like this:

{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "foo"
      },
      "params": [
        {
          "type": "Identifier",
          "name": "obj"
        }
      ],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "VariableDeclaration",
            "declarations": [
              {
                "type": "VariableDeclarator",
                "id": {
                  "type": "Identifier",
                  "name": "bar"
                },
                "init": {
                  "type": "BinaryExpression",
                  "left": {
                    "type": "MemberExpression",
                    "object": {
                      "type": "Identifier",
                      "name": "obj"
                    },
                    "property": {
                      "type": "Identifier",
                      "name": "foo"
                    }
                  },
                  "operator": "+",
                  "right": {
                    "type": "Literal",
                    "value": 1,
                    "raw": "1"
                  }
                }
              }
            ],
            "kind": "const"
          },
          {
            "type": "ReturnStatement",
            "start": 51,
            "end": 67,
            "argument": {
              "type": "BinaryExpression",
              "left": {
                "type": "Identifier",
                "name": "bar"
              },
              "operator": "+",
              "right": {
                "type": "Literal",
                "value": "1",
                "raw": "'1'"
              }
            }
          }
        ]
      }
    }
  ]
}

Once the source has been converted to an AST, the JavaScript code has a structured representation that the compiler can work with. When the AST is generated, the corresponding scopes are generated too. For example, the code above produces the following scope information:

Global scope:
global { // (0x7f91fb010a48) (0, 51)
  // will be compiled
  // 1 stack slots
  // temporary vars:
  TEMPORARY .result;  // (0x7f91fb010ef8) local[0]
  // local vars:
  VAR foo;  // (0x7f91fb010e68)

  function foo () { // (0x7f91fb010ca8) (20, 51)
    // lazily parsed
    // 2 heap slots
  }
}
Global scope:
function foo () { // (0x7f91fb010c60) (20, 51)
  // will be compiled
}

The first block above is the global scope, and we can see that the foo variable has been added to it.

Bytecode

After the steps above, the interpreter Ignition generates the corresponding bytecode from the AST.

Because JavaScript bytecode is not standardized the way JVM bytecode or ESTree is, its format is closely tied to the V8 version.

Understanding bytecode

Bytecode is an abstraction of machine code. If bytecode is designed around the same computational model as the physical CPU, compiling it to machine code is easier. This is why interpreters are usually either register machines or stack machines. Ignition is a register machine with an accumulator.

V8’s header file bytecodes.h defines every kind of bytecode. Combined, these bytecodes can describe any JavaScript function.

Many bytecodes match the pattern /^(Lda|Sta).+$/, where the a stands for accumulator. They describe either loading a value into the accumulator register (Lda) or taking the current value out of the accumulator and storing it somewhere else, such as a register (Sta). This is why the interpreter can be thought of as a register machine with an accumulator.

The JavaScript bytecode output by the above example code through the V8 interpreter is as follows:

[generated bytecode for function: foo (0x3a50082d25cd <SharedFunctionInfo foo>)]
Bytecode length: 14
Parameter count 2
Register count 1
Frame size 8
OSR nesting level: 0
Bytecode Age: 0
         0x3a50082d278e @    0 : 28 03 00 01       LdaNamedProperty a0, [0], [1]
         0x3a50082d2792 @    4 : 41 01 00          AddSmi [1], [0]
         0x3a50082d2795 @    7 : c6                Star0
         0x3a50082d2796 @    8 : 12 01             LdaConstant [1]
         0x3a50082d2798 @   10 : 35 fa 03          Add r0, [3]
         0x3a50082d279b @   13 : ab                Return
Constant pool (size = 2)
0x3a50082d275d: [FixedArray] in OldSpace
 - map: 0x3a5008042205 <Map>
 - length: 2
           0: 0x3a50082d2535 <String[3]: #foo>
           1: 0x3a500804494d <String[1]: #1>
Handler Table (size = 0)
Source Position Table (size = 0)

Let’s go through the bytecode of foo one instruction at a time. LdaNamedProperty a0, [0], [1] loads a named property of a0 into the accumulator. In ai, the i identifies a function parameter, so a0 is the first parameter, obj. The [0] that follows is an index into the constant pool, 0: 0x3a50082d2535 <String[3]: #foo>, so the property being read is a0.foo. The final [1] is an index into the feedback vector, which holds runtime information used for performance optimization. In short, this instruction puts obj.foo into the accumulator.

Next, AddSmi [1], [0] adds the small integer 1 to the value in the accumulator. Because the operand is the number 1 itself, it does not appear in the constant pool. The accumulator now holds 2. The final [0] is a feedback vector index.

Since we declared a variable to hold the result, the bytecode contains a matching store: Star0 takes the value of the accumulator and stores it in register r0.

LdaConstant [1] loads the element at index 1 of the constant pool, that is, 1: 0x3a500804494d <String[1]: #1>, into the accumulator.

Add r0, [3] adds the value of register r0 (2) and the value of the accumulator ('1') and leaves the result in the accumulator. The final [3] is a feedback vector index.

Finally, Return returns the value currently in the accumulator, '21'. It marks the end of the function foo(); the caller of foo can then read the result from the accumulator and continue processing it.
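
The walkthrough can be replayed by hand in JavaScript. The function below is a toy model of an accumulator machine, written only to mirror the six instructions one line each; runFoo, acc, r0, and constantPool are illustrative names, feedback vector slots are ignored, and nothing here reflects how Ignition is actually implemented.

```javascript
// Replays the bytecode of foo with one JavaScript statement per instruction.
function runFoo(a0) {
  const constantPool = ['foo', '1']; // mirrors the constant pool printed above
  let acc; // the accumulator
  let r0;  // register r0

  acc = a0[constantPool[0]]; // LdaNamedProperty a0, [0], [1]
  acc = acc + 1;             // AddSmi [1], [0]
  r0 = acc;                  // Star0
  acc = constantPool[1];     // LdaConstant [1]
  acc = r0 + acc;            // Add r0, [3]
  return acc;                // Return
}

console.log(runFoo({ foo: 1 })); // prints 21 (the string '21')
```

Every instruction either reads or writes acc, which is why so few explicit operands are needed in the bytecode.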

Application of bytecode

Because bytecode is an abstraction of machine code, it is friendlier to the runtime than source code handed directly to V8: if bytecode is fed to V8 directly, the step of parsing the source into an AST can be skipped. This can improve startup performance noticeably and also offers some protection for the source. Because bytecode has been through a full compilation pass that erases the extra semantic information carried by the source, reverse-engineering it is about as hard as for a traditionally compiled language.

Bytenode, available on NPM, is a bytecode compiler for Node.js. It compiles JavaScript into real V8 bytecode in order to protect the source code. The author has seen detailed write-ups of this use; see “The principle of using bytecode to protect Node.js source code” in the references at the end of this article.

Interpretation and compilation in just-in-time compilation

After the bytecode is generated, V8’s pipeline has two paths. Ordinary code is executed directly from the bytecode by the interpreter. The author does not know the details of how the interpreter handles bytecode; for now, think of it as something like gcc, in that the bytecode ultimately ends up being carried out as machine code.

When V8’s monitor finds code that is executed repeatedly, it marks it as hot code and submits it to the compiler TurboFan. TurboFan compiles the bytecode into optimized machine code, which executes much more efficiently.

But JavaScript is a dynamic language with a lot of runtime state, so data structures can be modified while the program runs, whereas compiler-optimized machine code can only handle a fixed structure. Once the structure that the optimized code relies on is changed dynamically, the machine code becomes invalid and the compiler has to deoptimize: the optimized machine code is discarded and execution falls back to the bytecode.
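
A small script can be used to watch this happen. The sketch below is an experiment rather than a guarantee: getX and main are hypothetical names, and whether V8 actually optimizes and then deoptimizes getX depends on the V8 version and its heuristics. Running it with node --trace-opt --trace-deopt (both are V8 flags that Node passes through) prints the engine’s decisions, if any.

```javascript
// A tiny function that is called many times with objects of one fixed
// structure, which makes it a candidate for hot-code optimization.
function getX(point) {
  return point.x;
}

function main() {
  let sum = 0;

  // Phase 1: every argument has the same structure ({ x, y }).
  for (let i = 0; i < 100000; i++) {
    sum += getX({ x: 1, y: 2 });
  }

  // Phase 2: same property name, different structure. Optimized code that
  // assumed the first structure may have to be thrown away here.
  sum += getX({ y: 2, x: 1, z: 3 });

  return sum;
}

console.log(main()); // 100001
```

The result is the same either way; deoptimization affects only speed, never correctness.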

JavaScript Object

JavaScript is an object-based language. Apart from null and undefined, most things in JavaScript are made of objects; we can even say JavaScript is a language built on objects.

Strictly speaking, though, JavaScript is not an object-oriented language, because an object-oriented language needs to support encapsulation, inheritance, and polymorphism. JavaScript does not support polymorphism directly; it can still be implemented, but doing so is more cumbersome.

The structure of a JavaScript object is very simple: it is a set of properties, each with a key and a value. A value can be one of three kinds:

  • Primitive type: the primitive types are null, undefined, boolean, number, string, bigint, and symbol. They are stored in a stack-like, last-in-first-out structure and are immutable. For example, when we modify a string, V8 gives us back a brand-new string.
  • Object type: JavaScript is built on objects, so the value of a property can itself be another object.
  • Function type: when a function is a property of an object, we usually call it a method.
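
The difference between immutable primitives and mutable objects is easy to see in a few lines. This is only an illustration of language-level behavior; demoImmutability and its variable names are made up for the example.

```javascript
function demoImmutability() {
  // "Modifying" a string produces a new string; the original is untouched.
  const original = 'v8';
  const upper = original.toUpperCase();

  // An object, by contrast, is shared by reference and changed in place.
  const engine = { name: 'v8' };
  const alias = engine;
  alias.name = 'V8';

  return { original, upper, engineName: engine.name };
}

console.log(demoImmutability());
// returns { original: 'v8', upper: 'V8', engineName: 'V8' }
```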

Function

Functions are first-class citizens in JavaScript and can be used very flexibly. The underlying reason is that functions in JavaScript are special objects. Because functions are first-class, JavaScript can easily support closures, functional programming, and similar techniques.

A function is called by adding parentheses after its name:

function foo(obj) {
  const bar = obj.foo + 1
  return bar + '1'
}

foo({foo: 1});

Functions can also be anonymous and can be invoked as an IIFE (immediately invoked function expression). The parentheses of an IIFE only accept an expression, so although function foo in the example below looks like a declaration statement, V8 treats it as a function expression named foo and runs it.

Before ES6 introduced module scope, JavaScript had no concept of a private module-level scope. When several people worked on one project, a singleton created with an IIFE was therefore often used as a namespace to reduce naming conflicts among global variables. The biggest advantage of an IIFE is that running it does not pollute the surrounding environment: the function and the variables inside it cannot be accessed by other code, and the outside can only get the value the IIFE returns.

(function foo(obj) {
  const bar = obj.foo + 1
  
  return bar + '1'
})({foo: 1})
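
The namespace use of an IIFE described above might look like the following sketch. The name counter and its methods are hypothetical; the point is only that count is reachable through the returned object and nowhere else.

```javascript
// A singleton "namespace" built with an IIFE. Only the returned object
// escapes; `count` stays private to the function body.
const counter = (function () {
  let count = 0;

  return {
    increment() {
      count += 1;
      return count;
    },
    current() {
      return count;
    },
  };
})();

counter.increment();
counter.increment();

console.log(counter.current()); // 2
console.log(counter.count);     // undefined: not a property of the namespace
console.log(typeof count);      // undefined: not visible in the outer scope
```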

Since a function is essentially an object, how does it get the callable behavior that other objects lack?

To make functions callable, V8 adds hidden properties to every function.

The hidden properties are the function’s name property and code property.

  • The name property had long been widely supported by browsers but was only written into the standard in ES6. The function name can be read through name because V8 exposes the corresponding interface. For a function instance returned by the Function constructor, the value of name is anonymous:
(new Function).name // "anonymous"
  • The code property holds the function’s code, stored in memory as a string. When a function call is executed, V8 takes the value of code out of the function object and then interprets and executes it. V8 does not expose the code property, so it cannot be printed directly.

About JavaScript

JavaScript can create objects with the new keyword, but many details are hidden behind it, which raises the cost of understanding. This choice was driven by the market: Java was very popular when JavaScript was born, and JavaScript needed to resemble Java without actually being Java. So JavaScript not only borrowed Java’s popularity in its name but also added new, and object construction became what we see today. The design looks unreasonable, but it did help JavaScript become popular.

ES6 also added class. Historically, real classes were attempted around ES4, but those attempts failed, and in the end it was decided not to add true classes. The class we use today is therefore syntactic sugar provided by the JS VM. It is still essentially different from class code that Babel converts into functions before it runs in our projects: V8 handles the class keyword itself when compiling.

Object Storage

JavaScript is object-based, so the kinds of values an object can hold are very rich. This flexibility means a linear data structure alone can no longer satisfy object storage, and a non-linear structure (a dictionary) has to be used. That in turn makes property access inefficient, so to speed up storage and lookup, V8 adopts a more elaborate storage strategy.

First, we create an object foo from the constructor Foo and print its properties. The code is as follows:

function Foo() {
  this["bar3"] = 'bar-3'
  this[10] = 'foo-10'
  this[1] = 'foo-1'
  this["bar1"] = 'bar-1'
  this[10000] = 'foo-10000'
  this[3] = 'foo-3'
  this[0] = 'foo-0'
  this["bar2"] = 'bar-2'
}

const foo = new Foo()

for (const key in foo) {
  console.log(`key: ${key} value:${foo[key]}`)
}

The result of the code output is as follows

key: 0 value:foo-0
key: 1 value:foo-1
key: 3 value:foo-3
key: 10 value:foo-10
key: 10000 value:foo-10000
key: bar3 value:bar-3
key: bar1 value:bar-1
key: bar2 value:bar-2

Looking closely, we can see that V8 implicitly orders the object’s properties:

  • Properties whose key is a number are printed first, in ascending order.
  • Properties whose key is a string are printed in the order in which they were defined.

The reason is that the ECMAScript specification requires numeric properties to be ordered by ascending index and string properties to be ordered by creation time. As an implementation of ECMAScript, V8 has to comply.

To optimize property access, V8 divides an object’s properties into two categories according to the key.

  • Properties whose key is a number are called elements (sorted properties). They trade space for time: values are accessed directly by index, which speeds up access. When the indices are very sparse, the storage is converted into a hash table.
  • Properties whose key is a string are called properties (named properties). Keys and values are stored in a linear structure or in a dictionary, improving on the original all-dictionary storage. By default properties uses the linear structure, which is fast to search when there is little data; once the amount of data passes a certain threshold, it is converted into a hash table.

Once storage is split this way, property access looks in the structure that matches the kind of key. When all properties are enumerated, V8 first reads elements in ascending index order and then reads the remaining properties from properties.

Note that V8 implements this part of ECMAScript lazily: it does not keep the entries of elements sorted in ascending order in memory.

In-object properties

Splitting properties into two categories simplifies lookup, but it adds a step. To read foo.bar3, for example, V8 first has to reach the object foo and then follow it to properties to get the value of bar3. To shorten this path, V8 stores some named properties directly inside the object itself as in-object properties, 10 of them by default.

When there are fewer than 10 named properties, all of them can be in-object properties. When there are more than 10, the ones beyond the first 10 overflow into properties and are stored there. With in-object properties, looking up a property takes one step fewer.

The number of in-object properties is predetermined by the initial size of the object, although it can be extended dynamically. The author, however, has not seen a case where it was dynamically extended beyond 10.

With this analysis in hand, think about which everyday operations work against these rules. delete is one example: it is generally not recommended for removing object properties, because deleting an element can force many elements to move, and properties may also have to rearrange the in-object properties, all of which costs extra performance. Where it does not hurt the semantic clarity of the code, reset the property to undefined instead, or use a Map, whose Map.delete method is better optimized.
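
The two alternatives to delete behave differently, which matters when choosing between them. The sketch below only demonstrates the observable semantics; clearToken, session, and cache are hypothetical names, and it makes no claim about how much faster either option is in a given V8 version.

```javascript
function clearToken() {
  // Option 1: reset the value. The key stays on the object, so the
  // object's structure does not change.
  const session = { id: 1, token: 'abc' };
  session.token = undefined;

  // Option 2: use a Map, where removing an entry is a normal operation.
  const cache = new Map([['id', 1], ['token', 'abc']]);
  cache.delete('token');

  return {
    keyStillPresent: 'token' in session, // the key survives the reset
    value: session.token,
    mapHasToken: cache.has('token'),
    mapSize: cache.size,
  };
}

console.log(clearToken());
// returns { keyStillPresent: true, value: undefined, mapHasToken: false, mapSize: 1 }
```

Because the key is still present after a reset to undefined, code that enumerates keys or uses the in operator will still see it; that is the semantic cost the paragraph above warns about.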

In-object properties do not suit every scenario. When an object has too many properties, or its properties change frequently, V8 stops using in-object properties and falls back to non-linear dictionary storage. This slows lookup but speeds up modification of the object’s properties. For example:

function Foo(_elementsNum, _propertiesNum) {
  // set elements
  for (let i = 0; i < _elementsNum; i++) {
    this[i] = `element${i}`;
  }
  // set property
  for (let i = 0; i < _propertiesNum; i++) {
    let ppt = `property${i}`;
    this[ppt] = ppt + 'value';
  }
}
const foos = new Foo(100, 100);

After instantiating foos, inspecting properties in memory shows that all of the property${i} properties live in properties: because there are so many of them, V8 has downgraded the object to dictionary storage.

Compiler optimization

Taking the code above as an example, let’s create a larger object instance

const foos = new Foo(10000, 10000);

Because the constructor of the objects we create has a fixed structure, in theory it should trigger the monitor to mark it as hot code and hand it to the compiler for optimization. Let’s look at V8’s output:

[marking 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> for optimized recompilation, reason: small function]
[compiling method 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> (target TURBOFAN) - took 1.135, 3.040, 0.287 ms]
[marking 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> for optimized recompilation, reason: small function]
[compiling method 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x2ca4082d26e5 <JSFunction Foo (sfi = 0x2ca4082d25f5)> (target TURBOFAN) - took 0.596, 1.681, 0.050 ms]

The log shows that optimization was recorded, but the author has not studied it in more depth. If you know more about the details of compiler optimization, feel free to add them in the comments.

About __proto__

JavaScript inheritance is distinctive: it is implemented with the prototype chain, and __proto__ serves as the link between objects. However, manipulating an object's inheritance directly through __proto__ is not recommended, because it interferes with V8's hidden classes: it destroys the hidden class optimization and the class transitions that V8 has already set up when the object instances were created.
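As a small illustration of the advice above, the sketch below contrasts fixing the prototype when an object is created with re-pointing it afterwards. The names Animal, Dog, animalProto, cat and bird are hypothetical. The code only shows the two styles and does not measure their performance.

```javascript
// Preferred: the prototype is fixed when the object is created.
class Animal {
  speak() {
    return `${this.name} makes a sound`;
  }
}

class Dog extends Animal {
  constructor(name) {
    super();
    this.name = name;
  }
}

const dog = new Dog('Rex');

// Object.create also sets the prototype at creation time.
const animalProto = {
  speak() {
    return `${this.name} makes a sound`;
  },
};
const cat = Object.create(animalProto);
cat.name = 'Tom';

// Discouraged: changing the prototype of an object that already exists.
const bird = { name: 'Tweety' };
bird.__proto__ = animalProto;

console.log(dog.speak());  // Rex makes a sound
console.log(cat.speak());  // Tom makes a sound
console.log(bird.speak()); // Tweety makes a sound
```

All three objects behave the same from the caller's point of view; the difference lies only in how much work the engine has to redo for the last one.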

JavaScript type system

The type system is basic JavaScript knowledge, yet it is also the most widely used, flexible, complex, and error-prone part of the language. The main reason is that its type conversion rules are intricate and easy for engineers to overlook.

A CPU processes data with operations such as shift, add, and multiply and has no concept of types, because all it sees is binary data. In a high-level language, however, the compiler or interpreter must decide whether adding values of different types is meaningful.

Take Python as an example. Like JavaScript it is dynamically typed, but its typing is stricter. Enter the following code, 1+'1':

In[2]: 1+'1'

Traceback (most recent call last):
  File "..", line 1, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-0cdad81f9201>", line 1, in <module>
    1+'1'
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Python raises a TypeError, but the same code does not report an error in JavaScript, because JavaScript considers it meaningful code.

console.log(1+'1')
// 11 (the string "11", not a number)

The underlying cause of this difference is the type system. The more powerful the type system is, the more errors the compiler can detect. It governs not only how types are defined, but also how types are checked and how operations between different types are defined.

Wikipedia defines the type system as follows: in computer science, a type system defines how values and expressions in a programming language are classified into different types, how those types can be operated on, and how they interact. A type indicates that a value or a group of values has a specific meaning and purpose (although some types, such as abstract types and function types, may not be represented as values while the program runs). Type systems differ greatly between languages. Perhaps the most important differences lie in the syntax at compile time and the implementation of operations at run time.

Basic conversion of type system

ECMAScript defines the specific operation rules used in JavaScript. The algorithm for the addition operator reads:

1. Let lref be the result of evaluating AdditiveExpression.
2. Let lval be GetValue(lref).
3. ReturnIfAbrupt(lval).
4. Let rref be the result of evaluating MultiplicativeExpression.
5. Let rval be GetValue(rref).
6. ReturnIfAbrupt(rval).
7. Let lprim be ToPrimitive(lval).
8. ReturnIfAbrupt(lprim).
9. Let rprim be ToPrimitive(rval).
10. ReturnIfAbrupt(rprim).
11. If Type(lprim) is String or Type(rprim) is String, then
    a. Let lstr be ToString(lprim).
    b. ReturnIfAbrupt(lstr).
    c. Let rstr be ToString(rprim).
    d. ReturnIfAbrupt(rstr).
    e. Return the String that is the result of concatenating lstr and rstr.
12. Let lnum be ToNumber(lprim).
13. ReturnIfAbrupt(lnum).
14. Let rnum be ToNumber(rprim).
15. ReturnIfAbrupt(rnum).
16. Return the result of applying the addition operation to lnum and rnum. See the Note below.
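Steps 11 to 16 of the algorithm above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the function name addPrimitives is hypothetical, both operands are assumed to be primitive values already (steps 1 to 10 are not modelled), and BigInt and Symbol operands are out of scope because the quoted algorithm does not cover them.

```javascript
// Steps 11 to 16 for operands that are already primitive values.
// Assumption: BigInt and Symbol operands are not handled by this sketch.
function addPrimitives(lprim, rprim) {
  if (typeof lprim === 'string' || typeof rprim === 'string') {
    // Step 11: ToString on both operands, then concatenate.
    return String(lprim) + String(rprim);
  }
  // Steps 12 to 16: ToNumber on both operands, then numeric addition.
  return Number(lprim) + Number(rprim);
}

console.log(addPrimitives(1, '1'));         // the string "11"
console.log(addPrimitives(1, null));        // 1
console.log(addPrimitives(true, 1));        // 2
console.log(addPrimitives('a', undefined)); // the string "aundefined"
console.log(addPrimitives(undefined, 1));   // NaN
```

The single string check in step 11 is what decides between concatenation and numeric addition; everything else is conversion.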

The rules are complicated, so let's go through them step by step. Taking addition as an example, first consider the primitive types. When a number and a string are added, V8 converts the other value to a string as soon as a string appears as an operand. The expression is evaluated from left to right, for example:

const foo = 1 + '1' + null + undefined + 1n

// V8 evaluates the expression as if it had been written:
const fooExplicit = Number(1).toString() + '1' + String(null) + String(undefined) + BigInt(1n).toString()

// "11nullundefined1"

If the operands are not primitive types, V8 applies the ToPrimitive operation defined in the ECMAScript specification, whose job is to convert a compound type into the corresponding primitive type.
ToPrimitive has two sets of rules, depending on whether the conversion is object-to-string or object-to-number.

type NumberOrString = number | string

type PrototypeFunction<T> = (input: Record<string, any>, flag:T) => T

type ToPrimitive = PrototypeFunction<NumberOrString>

From the TypeScript types above we can see that, although every object goes through ToPrimitive, the final processing differs according to the second parameter.
The ToPrimitive processing flow for each value of that parameter is given below.

For ToPrimitive(object, number), the processing steps are as follows:

  • If the input is already a primitive value, it is returned directly.
  • Otherwise, the valueOf method is called, and if it returns a primitive value, JavaScript returns it.
  • Otherwise, the toString method is called, and if it returns a primitive value, JavaScript returns it.
  • Otherwise, JavaScript throws a TypeError.

For ToPrimitive(object, string), the processing steps are as follows:

  • If the input is already a primitive value, it is returned directly.
  • Otherwise, the toString method is called, and if it returns a primitive value, JavaScript returns it.
  • Otherwise, the valueOf method is called, and if it returns a primitive value, JavaScript returns it.
  • Otherwise, JavaScript throws a TypeError.

The second parameter of ToPrimitive is optional. By default the conversion behaves as number; Date is the exception, and its default is string.
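The two step lists and the default rule above can be sketched as a single function. This is a simplified model, not V8's implementation: the names toPrimitive and isPrimitive are hypothetical, the hint is passed as the string 'number' or 'string', and Symbol.toPrimitive overrides are deliberately ignored.

```javascript
function isPrimitive(value) {
  return value === null || (typeof value !== 'object' && typeof value !== 'function');
}

// hint is 'number' or 'string'. When it is omitted, Date defaults to
// 'string' and every other object to 'number'.
// Assumption: Symbol.toPrimitive overrides are ignored in this sketch.
function toPrimitive(input, hint) {
  if (isPrimitive(input)) return input;

  const effectiveHint =
    hint !== undefined ? hint : input instanceof Date ? 'string' : 'number';
  const order =
    effectiveHint === 'string' ? ['toString', 'valueOf'] : ['valueOf', 'toString'];

  for (const name of order) {
    const method = input[name];
    if (typeof method === 'function') {
      const result = method.call(input);
      if (isPrimitive(result)) return result;
    }
  }
  throw new TypeError('Cannot convert object to primitive value');
}

const sample = {
  valueOf() { return 1; },
  toString() { return 'one'; },
};

console.log(toPrimitive(sample, 'number'));   // 1
console.log(toPrimitive(sample, 'string'));   // one
console.log(toPrimitive(sample));             // 1
console.log(typeof toPrimitive(new Date(0))); // string
```

Only the order of the two method names changes between the hints, which is why the same object can convert to different primitives.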

Let’s take a look at a few examples to verify:

/*
Example 1
*/
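// Assigning the result keeps the leading { from being parsed as a block statement
const result1 = { foo: 'foo' } + { bar: 'bar' };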
// "[object Object][object Object]"

/*
Example 2
*/
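const result2 = {
  foo: 'foo',
  valueOf() {
    return 'foo'; // a primitive, so the default (number) path stops here
  },
  toString() {
    return 'bar';
  },
} +
{
  bar: 'bar',
  toString() {
    return 'bar'; // used because the inherited valueOf returns the object itself
  },
};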
// "foobar"

/*
Example 3
*/
const result3 = {
  foo: 'foo',
  valueOf() {
    return Object.create(null); // not a primitive
  },
  toString() {
    return Object.create(null); // not a primitive either
  },
} +
{
  bar: 'bar',
};
// Uncaught TypeError: Cannot convert object to primitive value

/*
Example 4
*/
const date = new Date();
date.valueOf = () => '123'; // not reached: Date tries toString first by default
date.toString = () => '456';
date + 1;
// "4561"

Example 3 throws an error because ToPrimitive cannot convert the object to a primitive type: both valueOf and toString return objects.

Summary

"Using V8 to deeply understand JavaScript" may sound like an overambitious title, but through this study the author really did come to understand the working mechanisms of JavaScript and of other languages better, and thought more deeply about the concepts of the front end and the technology stack.

This article introduced some V8 and computer science concepts by looking at how everyday, simple code is stored. It deduced the reasons for the current design from JavaScript's positioning and gave a high-level picture in combination with the V8 workflow. It then showed, step by step, the product of each stage of the V8 compilation pipeline, derived the storage rules by analyzing JavaScript objects, and finally introduced, through the type system, the rules by which different types of data interact in V8.

Given how large and complex V8's execution structure is, this article only scratches the surface. Many of its topics could be extended into further knowledge worth studying. I hope readers gain something from this article and are prompted to think further. If you find errors in it, you are welcome to point them out in the comments.
