Getting to Know WebAssembly

Time:2022-4-29

Origin

WebAssembly grew out of a side project by a Mozilla employee. In 2010, Alon Zakai, who was working on Firefox for Android at Mozilla, developed a compiler toolchain called Emscripten, which compiles C++ code into JavaScript by way of LLVM IR.

By the end of 2011, Emscripten could already compile large C++ projects such as Python and Doom. Mozilla saw promise in the project, formed a team around it, and invited Alon to work on it full-time. In 2013, Alon and other team members proposed the asm.js specification. asm.js is a strict subset of JavaScript, designed to be a better compilation target than ordinary JavaScript.

asm.js works mainly with two numeric types: 32-bit signed integers and 64-bit floating-point numbers. It provides no other data types, such as strings, booleans, or objects. All of these are represented as numeric values stored in memory and accessed through TypedArray views. Type declarations also follow fixed patterns: variable | 0 declares an integer, and +variable declares a floating-point number. For example:

function MyAsmModule() {
    "use asm";  // Tells the engine this is an asm.js module
    function add(x, y) {
        x = x | 0;  // "x | 0" declares x as a 32-bit integer
        y = y | 0;
        return (x + y) | 0;  // The result is also an integer
    }
    return { add: add };
}

An engine that supports asm.js can recognize these types ahead of time and apply aggressive JIT (just-in-time) optimization, or even AOT (ahead-of-time) compilation, which greatly improves performance. An engine that does not support asm.js simply runs the code as ordinary JavaScript, and the results are the same.
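To see what the coercions mean at run time, here is a small sketch that extends the module above with a floating-point function (the name half is made up for illustration). It assumes globalThis can stand in for the stdlib object, which holds in Node.js and browsers. Because | 0 and unary + are ordinary JavaScript operators, the results are the same whether or not the engine validates the module as asm.js: | 0 truncates to a 32-bit signed integer, so addition wraps around on overflow, while + produces a double.

```javascript
// A minimal asm.js module; "half" is a hypothetical example function.
function MyAsmModule(stdlib) {
  "use asm";
  function add(x, y) {
    x = x | 0;            // x is a 32-bit signed integer
    y = y | 0;
    return (x + y) | 0;   // result truncated back to int32
  }
  function half(x) {
    x = +x;               // x is a 64-bit double
    return +(x / 2.0);    // result is a double
  }
  return { add: add, half: half };
}

// globalThis serves as the stdlib object in Node.js and browsers.
const asmModule = MyAsmModule(globalThis);
console.log(asmModule.add(2, 3));          // 5
console.log(asmModule.add(2147483647, 1)); // -2147483648 (int32 overflow wraps)
console.log(asmModule.half(5));            // 2.5
```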

The drawback of asm.js is equally obvious: it does not go "low-level" far enough. The code is still text, it is still constrained by JavaScript syntax, and the browser still has to go through a series of steps such as parsing the script, interpreting it, collecting profiling data, and JIT compiling it. A binary format like Java class files would not only shrink the files and cut network transfer and parsing time, but would also allow a bytecode closer to the machine, making AOT/JIT compilers easier to implement and more effective.

Meanwhile, Google's Chrome team was also tackling JavaScript performance, but from a different direction. Chrome's solutions were NaCl (Google Native Client) and PNaCl (Portable NaCl), which let Chrome execute native code directly in a sandbox.

asm.js and NaCl/PNaCl each had strengths and weaknesses, and each could learn from the other. Mozilla and Google recognized this, and from 2013 onward the two teams communicated and cooperated frequently. They later decided to combine the strengths of both projects and jointly develop a bytecode-based technology. In 2015, the name "WebAssembly" was officially announced, and the W3C established a WebAssembly Community Group (with participants from Chrome, Edge, Firefox, and WebKit) to drive its development.

  • 2016: Rust 1.14 was released with experimental WebAssembly support.
  • 2017: Google announced the deprecation of PNaCl, and updated versions of the four major browsers (Chrome, Edge, Safari, and Firefox) began supporting wasm.
  • 2018: Go 1.11 was released with WebAssembly support.
  • 2019: Emscripten switched to the upstream LLVM backend to compile wasm by default and deprecated its asm.js-based backend. WebAssembly became a W3C Recommendation, joining HTML, CSS, and JavaScript as the fourth language of the web.

Introduction

Official definition: WebAssembly (abbreviated wasm) is a binary instruction format for a stack-based virtual machine. It is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

WebAssembly has the following features:

  • It is a low-level, assembly-like language that runs at near-native speed on all modern desktop browsers and many mobile browsers.
  • Its files are compact, so they can be transmitted and downloaded quickly. They are also designed to be parsed and initialized quickly.
  • It is designed as a compilation target, so code written in C/C++, Rust, and other languages can now run on the web.

In other words, WebAssembly lets code written in many languages run in the browser at near-native speed.

WebAssembly is also designed to coexist and work alongside JavaScript. Compared with JavaScript (including asm.js), it addresses the following problems:

  • Better performance. Because WebAssembly is a low-level assembly language with static types, browsers can compile it directly to machine code, which greatly improves performance. And because it is a compact bytecode, its files are small and travel quickly over the network. Browser vendors have even introduced streaming compilation, which compiles a file while it is still downloading so that it can be instantiated as soon as the download finishes.
  • Integration of different languages. Previously, running another language on the web meant translating it into JavaScript, which was difficult and significantly degraded performance. WebAssembly was designed from the start as a compilation target, so other languages can be compiled to it easily. Performance is far less of a concern (although some overhead remains), and code reuse becomes simple.
  • Stronger code protection. JavaScript code is usually protected only by obfuscation, which reduces readability but can still be undone with tools and enough time. Compiled wasm is far harder to read: even after decompiling it with tools such as wasm2c, it is much more difficult to analyze than JavaScript. This does not guarantee complete code security, but raising the cost of reverse engineering greatly reduces the risk.

WebAssembly is not purely a browser technology, however. Just as JavaScript has Node.js, wasm now has its own standalone runtimes, and it is used in many system-level fields such as cloud native, blockchain, and security.

Compilation

C/C++ is compiled with Emscripten:

emcc hello.c -o hello.wasm

Rust is compiled with Cargo:

cargo build --target wasm32-unknown-unknown --release

You can further reduce the file size:

wasm-gc target/wasm32-unknown-unknown/release/hello.wasm

Go has built-in support:

GOARCH=wasm GOOS=js go build -o hello.wasm main.go

Running

Running in JavaScript

To run WebAssembly from JavaScript, you first need to load the module into memory before compiling and instantiating it, for example by fetching it with XMLHttpRequest or fetch into an ArrayBuffer or typed array.

An example using fetch:

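const importObject = {}; // Values the module imports; empty if it imports nothing

fetch('module.wasm').then(response =>
  response.arrayBuffer()
).then(bytes =>
  WebAssembly.instantiate(bytes, importObject)
).then(result => {
  // result.module is the compiled module; result.instance holds its exports
  console.log(result.instance.exports);
});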

This compiles and instantiates the module from its raw bytes with WebAssembly.instantiate().

You can also use WebAssembly.instantiateStreaming(), which fetches, compiles, and instantiates the module directly from the response stream, without first converting it to an ArrayBuffer (the server must serve the file with the application/wasm MIME type):

WebAssembly.instantiateStreaming(fetch('simple.wasm'), importObject)
.then(result => {
  console.log(result.instance.exports);
});

In the future, WebAssembly modules are expected to be loadable directly through <script type='module'> and ES module import statements.
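Because fetch needs a server, the sketch below inlines the bytes of a tiny hand-assembled module, equivalent to the WAT (module (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)), and passes them straight to WebAssembly.instantiate(). It should run as-is in Node.js or a browser console; the names addModuleBytes and addReady are illustrative only.

```javascript
// Raw bytes of a minimal module exporting add(i32, i32) -> i32.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header: "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: 1 function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code section
]);

// The module imports nothing, so an empty import object is enough.
const addReady = WebAssembly.instantiate(addModuleBytes, {}).then(({ module, instance }) => {
  // WebAssembly.Module.exports() describes what the compiled module exposes,
  // here a single entry: { name: "add", kind: "function" }.
  console.log(WebAssembly.Module.exports(module));
  console.log(instance.exports.add(2, 3)); // 5
  return instance;
});
```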

Running outside the browser

The wasm community provides many runtimes that execute wasm outside the browser, in a sandboxed environment.

Currently popular runtimes:

  • wasmtime: can be used as a CLI or embedded into other applications, such as IoT or cloud systems
  • WebAssembly Micro Runtime (WAMR): as its name suggests, a very small virtual machine aimed at chip-level scenarios, with startup time as low as 100 microseconds and memory usage as low as 100 KB
  • wasmer: supports running wasm instances from many more programming languages and has its own package management platform, WAPM
  • WasmEdge: formerly known as SSVM, optimized for cloud-native, edge, and decentralized applications

Underlying concepts

Module

The basic unit of a WebAssembly program is called a module. The term refers both to the binary form of the code and to its compiled form in the browser.

A large WebAssembly application often consists of multiple submodules. Each module owns its own data, so one submodule cannot tamper with another's data. In addition, the capabilities each module may use are granted by the top-level caller, so a third-party submodule cannot exceed its authority without the knowledge of the module above it. This permission model resembles Android development, where an app must declare every permission it needs in advance.
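The capability model is visible from the JavaScript API: a module can call only what the host places in its import object. In the sketch below, a hand-assembled module imports one function, env.double (a hypothetical host function chosen for illustration), and exports quad, which calls it twice. Instantiating with the import granted works; leaving it out fails with a WebAssembly.LinkError before any module code runs.

```javascript
// Equivalent WAT:
// (module
//   (import "env" "double" (func $double (param i32) (result i32)))
//   (func (export "quad") (param i32) (result i32)
//     local.get 0  call $double  call $double))
const importModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,             // type 0: (i32) -> i32
  0x02, 0x0e, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import section: module "env"
  0x06, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x00, 0x00,       //   field "double", func of type 0
  0x03, 0x02, 0x01, 0x00,                                     // function section: 1 func, type 0
  0x07, 0x08, 0x01, 0x04, 0x71, 0x75, 0x61, 0x64, 0x00, 0x01, // export "quad" -> func index 1
  0x0a, 0x0a, 0x01, 0x08, 0x00, 0x20, 0x00, 0x10, 0x00, 0x10, 0x00, 0x0b, // code section
]);

const importModule = new WebAssembly.Module(importModuleBytes);

// Granted: the host supplies env.double, so quad can use it.
const granted = new WebAssembly.Instance(importModule, {
  env: { double: (x) => x * 2 },
});
console.log(granted.exports.quad(5)); // 20

// Not granted: the "env" namespace exists but "double" is missing.
let denied = null;
try {
  new WebAssembly.Instance(importModule, { env: {} });
} catch (err) {
  denied = err;
}
console.log(denied instanceof WebAssembly.LinkError); // true
```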

When a high-level language is compiled to WebAssembly, the result is a binary module file with the .wasm extension. The file begins with an 8-byte module header:

0000000: 0061 736d              ; WASM_BINARY_MAGIC
0000004: 0100 0000              ; WASM_BINARY_VERSION

The first four bytes are the "magic number", corresponding to the string \0asm, which identifies the file as a wasm module. The last four bytes are the version of the wasm binary format used by the module.
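As a rough sketch, the header can be checked from JavaScript with a DataView (readWasmHeader is a hypothetical helper name). The magic number is compared byte for byte, while the version is a little-endian 32-bit integer. The header alone, with version 1, is already a valid empty module; the version 2 bytes below are made up to show that an unsupported version is rejected.

```javascript
// Reads the 8-byte header at the start of a .wasm file (hypothetical helper).
function readWasmHeader(bytes) {
  if (bytes.length < 8) throw new Error("too short to be a wasm module");
  const view = new DataView(bytes.buffer, bytes.byteOffset, 8);
  const magic = view.getUint32(0, false);  // big-endian read of 00 61 73 6d
  const version = view.getUint32(4, true); // version is a little-endian uint32
  return { isWasm: magic === 0x0061736d, version };
}

// The smallest valid module is the header alone: "\0asm" followed by version 1.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(readWasmHeader(emptyModule));       // { isWasm: true, version: 1 }
console.log(WebAssembly.validate(emptyModule)); // true

// Same magic number, but an unsupported version is rejected by the engine.
const wrongVersion = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x02, 0x00, 0x00, 0x00]);
console.log(WebAssembly.validate(wrongVersion)); // false
```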

Sections

The content of a wasm module is divided into sections. Each section holds a specific kind of information and serves a different purpose.

A section may contain multiple items. The wasm 1.0 specification defines 12 sections and assigns each an ID. Except for custom sections, each section may appear at most once, and sections must appear in increasing order of ID.

Each section is described below:

ID  Section    Description
0   Custom     Debugging information (such as names) and other auxiliary data
1   Type       Function signatures (parameter and result types) of imported and internal functions
2   Import     Imported items such as functions, with their module and field names and type indices
3   Function   Type index of each function defined in the module
4   Table      Object references; function pointers are implemented through tables (the call_indirect instruction). A table can be imported from or exported to the host environment
5   Memory     Linear memory holding the program's runtime data; it can be imported from or exported to the host environment
6   Global     Global variables
7   Export     Exported items, with their names and indices
8   Start      Index of the function to run when the module is initialized
9   Element    Function indices used to initialize tables, since the table section does not initialize its contents itself
10  Code       Instruction code (and local variables) of each function
11  Data       Static data used to initialize linear memory
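Every section has the same outer layout: a one-byte ID, a LEB128-encoded size, and then that many bytes of content. The sketch below assumes a hand-assembled module that exports add(i32, i32) -> i32, walks its sections with a hypothetical listSectionIds helper, and uses WebAssembly.validate to show that the engine rejects the same sections when they are out of ID order.

```javascript
// Sections of a minimal module exporting add(i32, i32) -> i32.
// Each section: 1-byte ID, LEB128 size, then `size` bytes of content.
const wasmHeader = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
const typeSection = [0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f];
const functionSection = [0x03, 0x02, 0x01, 0x00];
const exportSection = [0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00];
const codeSection = [0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b];

// Walks the module after the 8-byte header and returns the section IDs in order.
function listSectionIds(bytes) {
  const ids = [];
  let pos = 8;
  while (pos < bytes.length) {
    const id = bytes[pos++];
    let size = 0;
    let shift = 0;
    let byte;
    do {
      byte = bytes[pos++];
      size += (byte & 0x7f) * 2 ** shift; // accumulate 7 bits at a time
      shift += 7;
    } while (byte & 0x80);                // high bit set: more size bytes follow
    ids.push(id);
    pos += size;                           // skip over the section contents
  }
  return ids;
}

const ordered = Uint8Array.from([...wasmHeader, ...typeSection, ...functionSection, ...exportSection, ...codeSection]);
const shuffled = Uint8Array.from([...wasmHeader, ...typeSection, ...exportSection, ...functionSection, ...codeSection]);

console.log(listSectionIds(ordered));        // [ 1, 3, 7, 10 ]
console.log(WebAssembly.validate(ordered));  // true
console.log(listSectionIds(shuffled));       // [ 1, 7, 3, 10 ]
console.log(WebAssembly.validate(shuffled)); // false: sections out of ID order
```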

Data types

In the binary encoding, wasm uses the following data types:

  • Unsigned integers. Three fixed-width non-negative integer types are supported: uint8, uint16, and uint32. The number indicates how many bits each occupies.
  • Variable-length unsigned integers. Three types are supported: varuint1, varuint7, and varuint32. Variable-length means the number of bytes used depends on the magnitude of the value (wasm uses LEB128 encoding), and the number indicates the maximum number of bits the value can occupy.
  • Variable-length signed integers. The same as above, except that negative values are allowed; varint7, varint32, and varint64 are supported.
  • Floating-point numbers. As in JavaScript, the IEEE 754 standard is used, with 32-bit single precision and 64-bit double precision.
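A minimal sketch of LEB128, the encoding behind the variable-length types: each byte carries 7 bits of the value, low bits first, and the high bit is set on every byte except the last. The function names are hypothetical. The unsigned version uses arithmetic rather than bitwise operators so that values up to 2^32 - 1 stay exact; the signed version assumes 32-bit inputs and stops once the remaining bits are just sign extension.

```javascript
// Unsigned LEB128 (varuint7 / varuint32 style).
function encodeULEB128(value) {
  const out = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80; // more bytes follow
    out.push(byte);
  } while (value !== 0);
  return out;
}

function decodeULEB128(bytes) {
  let result = 0;
  let shift = 0;
  for (const byte of bytes) {
    result += (byte & 0x7f) * 2 ** shift;
    shift += 7;
    if ((byte & 0x80) === 0) break; // last byte reached
  }
  return result;
}

// Signed LEB128 (varint7 / varint32 style) for 32-bit integers.
function encodeSLEB128(value) {
  const out = [];
  value |= 0; // treat the input as int32
  while (true) {
    const byte = value & 0x7f;
    value >>= 7; // arithmetic shift keeps the sign
    const signBitSet = (byte & 0x40) !== 0;
    if ((value === 0 && !signBitSet) || (value === -1 && signBitSet)) {
      out.push(byte);
      return out;
    }
    out.push(byte | 0x80);
  }
}

console.log(encodeULEB128(3));                  // [ 3 ] (small values take one byte)
console.log(encodeULEB128(624485));             // [ 229, 142, 38 ] (0xe5 0x8e 0x26)
console.log(decodeULEB128([0xe5, 0x8e, 0x26])); // 624485
console.log(encodeSLEB128(-123456));            // [ 192, 187, 120 ] (0xc0 0xbb 0x78)
```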

The language itself provides the following value types:

  • i32: 32-bit integer
  • i64: 64-bit integer
  • f32: 32-bit floating point
  • f64: 64-bit floating point

Every parameter and local variable must have one of these four value types. A function signature consists of a sequence of zero or more parameter types and a sequence of zero or more result types (in the MVP, a function can return at most one value). Note that i32 and i64 are not inherently signed or unsigned; how their bits are interpreted depends on the operator applied to them.
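To make the signedness point concrete, here is a sketch with a hand-assembled module that exports the same division twice, once as i32.div_s and once as i32.div_u (the export names div_s and div_u are chosen for illustration). Passing -8 gives both functions the same 32 bits, 0xfffffff8; div_s treats them as -8, while div_u treats them as 4294967288.

```javascript
// Equivalent WAT:
// (module
//   (func (export "div_s") (param i32 i32) (result i32) local.get 0 local.get 1 i32.div_s)
//   (func (export "div_u") (param i32 i32) (result i32) local.get 0 local.get 1 i32.div_u))
const divModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x03, 0x02, 0x00, 0x00,                         // two functions of type 0
  0x07, 0x11, 0x02,                                     // export section, 2 entries
  0x05, 0x64, 0x69, 0x76, 0x5f, 0x73, 0x00, 0x00,       //   "div_s" -> func 0
  0x05, 0x64, 0x69, 0x76, 0x5f, 0x75, 0x00, 0x01,       //   "div_u" -> func 1
  0x0a, 0x11, 0x02,                                     // code section, 2 bodies
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6d, 0x0b,       //   body 0: i32.div_s
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6e, 0x0b,       //   body 1: i32.div_u
]);

const { div_s, div_u } = new WebAssembly.Instance(new WebAssembly.Module(divModuleBytes)).exports;

// The same 32 bits (0xfffffff8) read as signed (-8) or unsigned (4294967288).
console.log(div_s(-8, 2)); // -4
console.log(div_u(-8, 2)); // 2147483644 (4294967288 / 2)
```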

Booleans are represented as i32 values: 0 is false and any non-zero value is true. Values of all other data types are stored in linear memory.

WAT

WAT (WebAssembly Text Format) is the human-readable text form of wasm. It is equivalent to the binary format and expresses the same module structure using S-expressions.


The developer tools of some browsers can display wasm as WAT, which is convenient for debugging. The community provides mature tools such as wasm2wat and wat2wasm for converting between the two, available in the WABT (WebAssembly Binary Toolkit), so you can also write WAT directly and convert it to wasm.

WASI

Although WebAssembly was born for the web, it is not meant to run only in browsers. Developers want to take it beyond the browser, which requires a set of interfaces for interacting with the operating system.

Because WebAssembly is an assembly language for a conceptual machine rather than a physical one, it offers a fast, scalable, and safe way to run the same code on any computer. Likewise, to run on all the different operating systems, WebAssembly needs a system interface for that conceptual machine rather than for any single operating system. Developers therefore defined a unified standard for communicating with different operating systems, called WASI (WebAssembly System Interface). It is an engine-independent API standard designed specifically for wasm and aimed at non-web systems.

The design of WASI follows two principles:

  • Portability. Code can be compiled into portable binaries that run on different machines once compiled, which makes distribution easier. For example, if Node's native modules were written in WebAssembly, users would not need to run node-gyp when installing applications that use native modules, and developers would not need to configure and distribute dozens of binaries.
  • Security. When code asks the operating system to perform input or output, the operating system must decide whether the requested operation is safe. WebAssembly uses a sandbox, so the code cannot interact with the operating system directly. Instead, the host (a browser or a wasm runtime) places into the sandbox the functions the code may use, and it can restrict what each program is allowed to do on a case-by-case basis. A sandbox alone does not make a system secure (the host could still put all of its capabilities into the sandbox), but it at least lets the host choose to build a more secure system.

Based on these two principles, WASI is designed as a modular set of standard interfaces. The most basic core module is wasi-core; other subsets, such as sensors, crypto, processes, and multimedia, are organized as separate modules.


wasi-core contains the basic interfaces every program needs. It covers roughly the same ground as POSIX, including WASI abstractions for system calls related to files, network connections, clocks, and random numbers.

WASI adds a system-call abstraction layer between wasm bytecode and the virtual machine. For example, when C/C++ source code that calls fopen is compiled against wasi-libc, a C standard library implemented specifically for WASI, the fopen call is carried out indirectly through a function named __wasi_path_open, which is an abstraction of the actual system call.

The main work of WASI is to define a standard for import interfaces and to provide concrete implementations of these general imports on different systems (much as libc is implemented on different operating systems). Following the same design idea, higher-level WADSI (WebAssembly Domain Specific Interfaces) can be provided for particular domains, exposing domain-general interfaces as imports that developers can use directly.

Security

One source of WebAssembly's security is that it is the first language to share the JavaScript VM, which is sandboxed at runtime and has undergone years of scrutiny and security testing. A WebAssembly module can access nothing beyond what JavaScript can access, and it follows the same security rules, including enforcement of policies such as the same-origin policy.

Unlike desktop applications, a WebAssembly module has no direct access to device memory. Instead, the runtime environment passes an ArrayBuffer to the module at initialization. The module uses this ArrayBuffer as its linear memory, and the WebAssembly runtime checks that the code never accesses anything outside the array.
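A small sketch of this bounds checking: the host creates a WebAssembly.Memory of one 64 KiB page and hands it to a hand-assembled module through a hypothetical env.mem import. The module's exported load function (also a name chosen for illustration) reads a 32-bit value at any address it is given, but a read that extends past the end of the buffer traps with a WebAssembly.RuntimeError instead of touching memory outside it.

```javascript
// Equivalent WAT:
// (module
//   (import "env" "mem" (memory 1))
//   (func (export "load") (param i32) (result i32) local.get 0 i32.load))
const loadModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,             // type 0: (i32) -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import from "env"
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, 0x01,                   //   "mem": memory, min 1 page
  0x03, 0x02, 0x01, 0x00,                                     // one function of type 0
  0x07, 0x08, 0x01, 0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x00, // export "load" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x28, 0x02, 0x00, 0x0b, // i32.load align=4 offset=0
]);

// The host creates the linear memory: 1 page = 64 KiB, backed by an ArrayBuffer.
const memory = new WebAssembly.Memory({ initial: 1 });
new DataView(memory.buffer).setInt32(0, 42, true); // wasm memory is little-endian

const { load } = new WebAssembly.Instance(new WebAssembly.Module(loadModuleBytes), {
  env: { mem: memory },
}).exports;

console.log(memory.buffer.byteLength); // 65536
console.log(load(0));                  // 42
console.log(load(65532));              // 0 (the last 4 bytes of the buffer)

// Reading past the end traps instead of touching whatever lies beyond the buffer.
try {
  load(65536);
} catch (err) {
  console.log(err instanceof WebAssembly.RuntimeError); // true
}
```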

Items stored in a table, such as function pointers, cannot be accessed directly by a WebAssembly module. Instead, the code uses an index to ask the WebAssembly runtime for an item, and the runtime then looks it up and executes it on the code's behalf.
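A sketch of what this looks like in practice, assuming a hand-assembled module with a two-entry function table filled by an element section. Its exported dispatch function (a hypothetical name) picks an operation by table index through call_indirect, so the module only ever handles an index, never a raw function pointer. An index outside the table traps rather than jumping to arbitrary code.

```javascript
// Equivalent WAT:
// (module
//   (type $binop (func (param i32 i32) (result i32)))
//   (table 2 funcref)
//   (func $add (type $binop) local.get 0 local.get 1 i32.add)
//   (func $sub (type $binop) local.get 0 local.get 1 i32.sub)
//   (elem (i32.const 0) $add $sub)
//   (func (export "dispatch") (param i32 i32 i32) (result i32)
//     local.get 1 local.get 2 local.get 0 call_indirect (type $binop)))
const tableModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x0e, 0x02,                                           // type section, 2 types
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,                         //   type 0: (i32, i32) -> i32
  0x60, 0x03, 0x7f, 0x7f, 0x7f, 0x01, 0x7f,                   //   type 1: (i32, i32, i32) -> i32
  0x03, 0x04, 0x03, 0x00, 0x00, 0x01,                         // functions: add, sub, dispatch
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02,                         // table: funcref, min 2 entries
  0x07, 0x0c, 0x01, 0x08,                                     // export "dispatch" -> func 2
  0x64, 0x69, 0x73, 0x70, 0x61, 0x74, 0x63, 0x68, 0x00, 0x02,
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // elem: table[0..1] = add, sub
  0x0a, 0x1d, 0x03,                                           // code section, 3 bodies
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,             //   add: i32.add
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,             //   sub: i32.sub
  0x0b, 0x00, 0x20, 0x01, 0x20, 0x02, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b, // dispatch
]);

const { dispatch } = new WebAssembly.Instance(new WebAssembly.Module(tableModuleBytes)).exports;

// The module never holds a raw function pointer, only a table index.
console.log(dispatch(0, 7, 3)); // 10 (table[0] is add)
console.log(dispatch(1, 7, 3)); // 4  (table[1] is sub)

// An index outside the table traps instead of jumping to arbitrary code.
try {
  dispatch(2, 7, 3);
} catch (err) {
  console.log(err instanceof WebAssembly.RuntimeError); // true
}
```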

In C++, the execution stack lives in memory alongside the program's data. C++ code should not modify the execution stack, but it can do so through pointers. In WebAssembly, the execution stack is separate from linear memory and cannot be accessed by the code.

Application cases

Google Earth
Google Earth 9.0, released in 2017, was built with NaCl, so at the time it could run only in Chrome. In 2020, Google rebuilt the C++ codebase with WebAssembly, and since then it has also run in Firefox and Edge.


AutoCAD
AutoCAD is well-known desktop design software with a history of nearly 40 years, widely used in civil engineering, interior decoration, industrial drafting, and other fields. AutoCAD released a web version in 2014, built with Google Web Toolkit (a Google toolkit for developing web applications in Java), which translated the Java code of the Android version into JavaScript. However, the generated JavaScript was so large that it ran very inefficiently in the browser. In 2015, the team used asm.js to compile the core functionality of the original C++ code directly and port it to the web, greatly improving performance. In March 2018, a wasm-based AutoCAD web app was released.


Figma
Figma is a browser-based collaborative UI design tool. Its core interface is hosted in a canvas, and the interactions on that canvas are driven by wasm. Being browser-based makes it easy to run across platforms, and WebAssembly's performance lets it outpace comparable native applications even on the web platform.


Conclusion

WebAssembly is therefore not meant to replace JavaScript entirely, but to complement web technology by making up for JavaScript's limitations in performance and code reuse. As the wasm slogan puts it, "anything that can be implemented in WebAssembly will eventually be implemented in WebAssembly". Its ultimate goal is to let code written in any language compile to it and run efficiently on any platform. Most importantly, it is backed by major players such as Google, Mozilla, and Microsoft, so I believe it will continue to develop.
