Build Your Own Virtual Machine Using a Real CPU Instruction Set That Fits Current System Architectures: The CKB Virtual Machine

Time: 2019-9-10

CKB-VM, the virtual machine of Nervos's underlying public chain CKB, is a blockchain virtual machine based on RISC-V. In the first two installments, we introduced the design philosophy of CKB-VM and the reasoning behind choosing the RISC-V instruction set. Going one step further: why build CKB-VM on a real CPU instruction set at all? In this article, the designers of CKB-VM continue the discussion, covering the inspiration, the design, and the additional advantages of building CKB-VM on a real CPU instruction set.

Cryptape Blockchain Lesson 24


Inspiration and Design

Before designing CKB-VM, we noticed that many blockchain projects do not build their virtual machines on a real CPU instruction set. eWASM, EOS, and Dfinity, the next-generation virtual machines we are familiar with, all chose WASM (WebAssembly, a binary instruction format) as the basis of their virtual machines. We could likewise design a VM with high-level language features, for example one that supports static verification, directly supports high-level data structures, or has built-in support for various cryptographic algorithms.

However, we found that although a VM with high-level language features offers more convenience, such as support for programming languages with different syntaxes, it also brings problems. Any complex VM with high-level language features, no matter how flexible, inevitably introduces some semantic constraints at the design level: for performance reasons, the different languages it hosts must share almost the same underlying semantics. Such a VM also has to bind cryptographic primitives into the VM itself, so if an existing primitive is broken in the future, or a set of primitives needs to be replaced, the change can only be made through a fork. The flexibility of the VM itself is therefore limited, which is inconsistent with the vision of CKB as the underlying infrastructure of the crypto-economy.

At the same time, a VM with high-level language features usually embeds some high-level data structures and algorithms. Any such data structure or algorithm may suit one kind of application but not others, and we cannot anticipate every possible use. Over time, the data structures and algorithms embedded in the VM serve no purpose other than compatibility and can even become a burden.

In addition, we observed that all blockchain projects run on CPUs with a von Neumann architecture (x86, x86_64, ARM, and so on), so every advanced VM feature must ultimately be mapped to the assembly instructions of a modern CPU architecture.

For example, although the V8 engine (the open source JavaScript engine developed by Google for Google Chrome and Chromium) appears to offer infinite memory, internally it relies on a very complex garbage collection algorithm to simulate infinite memory within a limited memory space.

Similarly, Haskell (a standardized, general-purpose, purely functional programming language) and Idris (a general-purpose, purely functional programming language with dependent types) have advanced static type checking that can, to some extent, prove the correctness of a program. After type checking completes, however, a translation layer is still needed to convert the statically verified code into unverified native x86_64 assembly instructions.

The key point is that no matter how we design a VM, we cannot deviate too far from current architectures. In other words, at the bottom of any VM, operations must be converted into native assembly instructions before they can execute.

So we asked ourselves: why not build our virtual machine directly on a real CPU instruction set that fits current system architectures?

This way, we lose none of the options of adding static verification, advanced data structures, or cryptographic algorithms, and whatever data structures or algorithms we later provide on the VM, its flexibility is preserved. In addition, a real CPU instruction set gives developers the greatest freedom to write any contract that meets their requirements.

Additional Advantages

Besides flexibility, a VM based on a real CPU instruction set has further advantages:

  • Stability

Once a CPU instruction set designed for hardware is finalized and used in chips, it is difficult to modify. Compared with VM instruction sets that are usually implemented only in software, a hardware instruction set is therefore very stable. This property fits the needs of a Layer 1 blockchain VM well, because a stable instruction set means fewer hard forks without sacrificing flexibility.

  • Operational transparency

At run time, a physical CPU relies only on registers and a region of memory. The stack normally occupies a designated area of that memory. During program execution, we can therefore determine stack usage from the stack pointer in the VM, which maximizes the visibility of the runtime state.

CKB-VM can adjust the stack pointer, change how areas of memory are allocated, and even grow or shrink the stack area as needed, which improves the flexibility of the VM. Current CPU instruction sets can also provide a count of elapsed cycles, which makes it possible to query the VM's running cost.
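The idea of reading stack usage straight from the stack pointer can be sketched as follows. This is a minimal illustration, not CKB-VM's actual implementation: the `ToyMachine` class, its memory and stack sizes, and the `push_frame`/`pop_frame` helpers are all hypothetical. The only borrowed fact is that register x2 serves as the stack pointer in the standard RISC-V calling convention.

```python
from dataclasses import dataclass, field

SP = 2  # x2 is the stack pointer in the standard RISC-V calling convention


@dataclass
class ToyMachine:
    """Hypothetical machine state: 32 registers and one flat memory region."""
    memory_size: int = 4096   # assumed total memory, in bytes
    stack_size: int = 1024    # assumed size of the area reserved for the stack
    registers: list = field(default_factory=lambda: [0] * 32)

    def __post_init__(self):
        # The stack starts at the top of memory and grows downward.
        self.registers[SP] = self.memory_size

    def push_frame(self, nbytes: int) -> None:
        """Mimic `addi sp, sp, -nbytes`, as a function prologue would."""
        new_sp = self.registers[SP] - nbytes
        if new_sp < self.memory_size - self.stack_size:
            raise MemoryError("stack area exhausted")
        self.registers[SP] = new_sp

    def pop_frame(self, nbytes: int) -> None:
        """Mimic `addi sp, sp, nbytes`, as a function epilogue would."""
        self.registers[SP] += nbytes

    def stack_used(self) -> int:
        """Stack usage is fully determined by the stack pointer."""
        return self.memory_size - self.registers[SP]


vm = ToyMachine()
vm.push_frame(64)
vm.push_frame(32)
print(vm.stack_used())  # bytes of stack currently in use
```

Because the whole runtime state is just registers plus memory, an observer needs no knowledge of the running program to answer "how much stack is in use right now".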

  • Runtime overhead

A VM built on a real CPU instruction set can manage runtime overhead easily, because the number of cycles each instruction needs to execute (ignoring pipelining) is fixed. We can design CKB-VM's runtime cost calculation around this property of a real CPU instruction set, so that when a new algorithm is deployed, its cost can still be calculated accurately.
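A fixed cost per instruction makes metering a matter of simple addition. The sketch below illustrates that idea only: the cost table, the instruction names, and the `run_metered` function are assumptions made for this example and do not reflect CKB-VM's real cost model.

```python
# Hypothetical per-instruction cycle costs; CKB-VM's real table may differ.
CYCLE_COST = {
    "add": 1,
    "load": 2,
    "store": 2,
    "mul": 5,
}


class CycleLimitExceeded(Exception):
    """Raised when a program would consume more cycles than allowed."""


def run_metered(program, max_cycles):
    """Charge a fixed cost for each instruction and enforce a cycle budget.

    `program` is a list of instruction names. A real VM would also execute
    each instruction; this sketch only does the accounting.
    """
    cycles = 0
    for op in program:
        cycles += CYCLE_COST[op]  # the cost depends only on the opcode
        if cycles > max_cycles:
            raise CycleLimitExceeded(f"budget of {max_cycles} cycles exceeded")
    return cycles


print(run_metered(["load", "load", "mul", "add", "store"], max_cycles=100))
```

Since the cost depends only on the instruction stream, a new algorithm shipped as ordinary code is metered by the same table, with no new VM opcode or pricing rule required.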

Compared with a VM that implements cryptographic algorithms as opcodes or native VM instructions, however, using a real CPU instruction set has one key disadvantage: performance.

That said, our research and test results show that, with appropriate optimization and a JIT compiler implementation, cryptographic algorithms running on a VM based on a real CPU instruction set can be made fast enough to meet the needs of CKB applications. Our JIT works from the underlying instruction set rather than from a high-level language such as JavaScript, which gives the VM a lighter workload and better performance.

Why Not WASM?

One might ask: given the blockchain community's strong and widespread interest in WebAssembly, why doesn't CKB use WebAssembly directly?

WebAssembly is a great project, and we very much hope that it succeeds. A widely supported sandbox environment is a dream for the whole blockchain industry, and even for the entire software industry. In the long run, WebAssembly has the potential to implement most of the features CKB needs, but it does not provide all the benefits that RISC-V brings to CKB-VM, as described in “The Birth of CKB-VM (I)”. For example, current WebAssembly implementations are all JIT-based and lack a reasonable model for calculating runtime overhead.

In addition, the design of RISC-V began in 2010, the first version of its specification was released in 2011, and hardware based on RISC-V was built in 2012. WebAssembly appeared in 2015 and released its MVP in 2017. RISC-V is the more mature of the two, so at least at this stage we feel that WebAssembly is not the best choice for CKB-VM.

Of course, we are not abandoning WebAssembly altogether. Both WebAssembly and RISC-V are low-level VM implementations, and many of their designs and instructions are very similar, so it is entirely possible to provide a binary converter from WebAssembly to RISC-V. This would ensure that CKB can also take advantage of WebAssembly-based innovation in the blockchain space. In addition, CKB could then support languages that compile only to WebAssembly (such as Forest: https://github.com/forest-lan…).
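To give a feel for why such a conversion is plausible, here is a toy translator from a few WebAssembly stack operations to RISC-V assembly text. It is only a sketch under strong assumptions: it works on textual instructions rather than binaries, handles just `i32.const`, `i32.add`, `i32.sub` and `i32.mul`, assumes 32-bit registers (RV32 with the M extension for `mul`), and maps the WebAssembly operand stack onto the temporary registers t0 to t6. The `translate` function is hypothetical and unrelated to any existing converter.

```python
# RISC-V temporary registers, used here as slots of the operand stack.
TEMP_REGS = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]

# WebAssembly binary operators and the RISC-V mnemonics they map to.
BINARY_OPS = {"i32.add": "add", "i32.sub": "sub", "i32.mul": "mul"}


def translate(wasm_ops):
    """Translate a list of WebAssembly ops into RISC-V assembly lines.

    Each op is a tuple such as ("i32.const", 2) or ("i32.add",).
    `depth` tracks how many operand-stack slots are currently live.
    """
    out = []
    depth = 0
    for op in wasm_ops:
        name = op[0]
        if name == "i32.const":
            if depth >= len(TEMP_REGS):
                raise ValueError("operand stack too deep for this sketch")
            # Push: load the constant into the next free register.
            out.append(f"li {TEMP_REGS[depth]}, {op[1]}")
            depth += 1
        elif name in BINARY_OPS:
            if depth < 2:
                raise ValueError(f"{name} needs two operands")
            lhs, rhs = TEMP_REGS[depth - 2], TEMP_REGS[depth - 1]
            # Pop two operands, push the result into the lower slot.
            out.append(f"{BINARY_OPS[name]} {lhs}, {lhs}, {rhs}")
            depth -= 1
        else:
            raise ValueError(f"unsupported op: {name}")
    return out


# (2 + 3) * 4 expressed as stack operations
for line in translate([("i32.const", 2), ("i32.const", 3), ("i32.add",),
                       ("i32.const", 4), ("i32.mul",)]):
    print(line)
```

A real converter would also have to handle control flow, memory access, function calls, and register spilling, none of which this sketch attempts.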

Community-driven CKB-VM

Through the design of CKB-VM, our goal is to build a community around CKB that can freely develop and adapt to new technological progress while minimizing manual intervention (such as hard forks). We believe that CKB-VM can achieve this vision.

Note: Like CKB, CKB-VM is an open source project and is still under development. Although most of CKB-VM's design has been finalized, some parts may still change and improve in the future thanks to your contributions. This article is meant to help our community understand CKB-VM better, so that everyone can get more involved and contribute!

CKB-VM: https://github.com/nervosnetw…