Pursue the ultimate design concept! Design of CKB Virtual Machine from scratch with RISC-V

Time: 2019-9-11

CKB-VM, the virtual machine of Nervos’s underlying public chain CKB, is a blockchain virtual machine based on the RISC-V instruction set. In the previous session, we briefly introduced blockchain virtual machines and what our ideal blockchain virtual machine looks like. In this article, the designers of CKB-VM explain in detail the design philosophy behind the CKB virtual machine and the reasoning behind choosing the RISC-V instruction set.

Cryptape Blockchain Classroom No. 23


Design Concept of CKB-VM

CKB is the base layer of the Nervos Network. Its goal is to provide sufficient security and decentralization for the applications built on top of it. While researching and selecting a design for CKB-VM, we repeatedly asked ourselves: what characteristics should CKB-VM have? For any virtual machine used in a blockchain, two key properties must hold in every case:

  1. Determinism: For a fixed program and fixed inputs, the virtual machine must always return the same output, regardless of external conditions such as time or the runtime environment.
  2. Security: Executing a program in the virtual machine must not affect the operation of the host platform itself.

But these are only the mandatory requirements. We want to design a virtual machine that better serves the goals of CKB. After careful consideration, we believe such a virtual machine should have the following characteristics:

  • Flexibility

Our goal is to design a virtual machine flexible enough to stay in service for a long time, so that CKB can keep pace with the development of cryptography. The history of cryptography is an eternal battle between “holding swords” and “breaking walls”: across thousands of years, encryption and decryption have been an endless contest of wits, and they will remain so. Some of today’s cryptographic algorithms, such as secp256k1, may be retired in the future, while more valuable new algorithms and techniques (such as Schnorr or post-quantum signatures) will emerge. Programs running on a blockchain virtual machine should be able to adopt new algorithms freely and conveniently, while outdated algorithms are phased out naturally.

For ease of understanding, we use Bitcoin as an example. At present, Bitcoin uses SIGHASH to sign transactions, and the SHA-256 hash algorithm is used in its consensus protocol. Can we be sure that Bitcoin’s SIGHASH approach will still be the best option in a few years? Will SHA-256 remain a suitable, stable hash algorithm as computing power grows? In every blockchain protocol we have studied, upgrading a cryptographic algorithm inevitably requires a hard fork. When designing CKB, we wanted to explore how VM design can reduce the likelihood of hard forks.

We asked ourselves whether a virtual machine could allow cryptographic algorithms to be upgraded, or new transaction validation logic to be added to the VM. For example, if secp256k1 is still in use, can a more efficient signature verification implementation be deployed without a fork when there is an economic incentive or a need to update the algorithm? And if someone finds a way to implement a better algorithm on CKB, or needs to introduce a new cryptographic algorithm, can we make sure they are free to do so?

We want CKB-VM to leave more room for implementation, maximize flexibility, and let users adopt new cryptographic algorithms without waiting for a hard fork.

  • Operational transparency

After studying the current generation of blockchain VMs, we noticed a problem. Take Bitcoin again as an example: Bitcoin’s VM layer provides only a stack, and a program running on it cannot know how much data the stack can hold or how deep the stack can grow. All other stack-based VMs have the same problem. Although the consensus layer can define the stack depth, or bound it indirectly (through instruction length or gas limits), programmers on such a VM have to guess the runtime state of their program. As a result, programs cannot exploit the full potential of the VM.

To address this problem, we believe a VM should first define all resource constraints that apply during execution, including gas limits and stack size, and allow programs running on the VM to query their resource usage. Programs can then choose different algorithms according to the resources available and make full use of the VM. The following scenarios show how this makes the VM more flexible:

  1. A smart contract that stores data can choose different strategies based on the Cell Capacity available to the user on CKB. When Cell Capacity is sufficient, the program can store the data directly to reduce the number of CPU cycles used (the steps a CPU takes to execute machine instructions); when Cell Capacity is limited, the program can compress the data to fit the smaller capacity at the cost of more CPU cycles.
  2. A smart contract can choose different processing mechanisms based on the total amount of data stored by the user (Cell Data) and the amount of free memory. When there is little Cell Data or plenty of free memory, all Cell Data can be read into memory for processing. When there is a lot of Cell Data or little free memory, each operation can read only part of the data into memory, similar to how swap memory works.
  3. Common contracts, such as hash algorithms, can choose different processing methods according to the number of CPU cycles the user provides. For example, SHA3-256 is secure enough for most scenarios, but a contract can use SHA3-512 to meet higher security requirements at the cost of more CPU cycles.
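
The first scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name choose_storage and the free_capacity parameter are hypothetical and not part of any CKB API, and zlib stands in for whatever compression a real contract might use.

```python
import zlib
from typing import Tuple


def choose_storage(data: bytes, free_capacity: int) -> Tuple[str, bytes]:
    """Pick a storage strategy from the capacity the program can query.

    `free_capacity` is a hypothetical stand-in for the Cell Capacity
    (in bytes) that the VM would report to the running program.
    """
    if len(data) <= free_capacity:
        # Enough capacity: store as-is and spend no cycles on compression.
        return "raw", data
    packed = zlib.compress(data, 9)
    if len(packed) <= free_capacity:
        # Tight capacity: trade extra CPU cycles for a smaller footprint.
        return "compressed", packed
    raise ValueError("data does not fit even after compression")


payload = b"nervos" * 100                 # 600 bytes of repetitive data
print(choose_storage(payload, 1024)[0])   # raw
print(choose_storage(payload, 128)[0])    # compressed
```

The decision is only possible because the program can ask how much capacity is available; on a VM that hides this, the contract would have to guess and always pay for the worst case.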

  • Runtime overhead

The gas mechanism in the EVM is an ingenious design. It elegantly solves the halting problem in the blockchain setting: because the EVM is Turing-complete, it allows loops, and an infinite loop would never terminate on its own. By capping the maximum computation in a block, the gas mechanism avoids this problem and allows programs to run on a fully decentralized virtual machine. However, we find that it is very difficult to design a reasonable gas cost for each opcode in the EVM, and almost every EVM version update has required adjusting the gas costs. The EVM’s level of abstraction is relatively high: a single EVM instruction may correspond to several underlying hardware instructions, so the amount of data processed and the computational complexity at execution time can only be priced by estimation. As a result, the EVM has to keep adjusting its gas pricing.
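
To make the metering idea concrete, here is a minimal sketch of an interpreter that charges a cost per instruction and aborts once a budget is exhausted. The three-instruction machine, the COST table, and the OutOfCycles exception are all hypothetical; they are not the EVM’s or CKB-VM’s actual instruction set or prices.

```python
class OutOfCycles(Exception):
    """Raised when a program exceeds its cycle budget."""


# Hypothetical per-instruction costs; a real VM has to tune such numbers.
COST = {"add": 1, "jmp": 1, "halt": 0}


def run(program, max_cycles):
    """Run a list of (op, arg) pairs under a cycle budget.

    Returns (accumulator, cycles_used) when the program halts.
    """
    pc, acc, used = 0, 0, 0
    while True:
        op, arg = program[pc]
        used += COST[op]
        if used > max_cycles:
            raise OutOfCycles(f"budget of {max_cycles} cycles exhausted")
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            return acc, used


finite = [("add", 2), ("add", 3), ("halt", 0)]
forever = [("add", 1), ("jmp", 0)]      # never halts on its own

print(run(finite, max_cycles=100))      # (5, 2)
try:
    run(forever, max_cycles=100)
except OutOfCycles as err:
    print("stopped:", err)
```

The budget guarantees termination regardless of what the program does. The hard part, as described above, is choosing the numbers in the cost table so that they track the real work each instruction performs.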

So we asked: can we design a VM in which the resources a program consumes at runtime are measured more reasonably and accurately?

We hoped to find an existing VM design that provides all of the above, but found no ready-made solution that matched our vision for CKB. We therefore decided to design a new VM that satisfies all of the above characteristics.

Solution: RISC-V

RISC-V is an open-source RISC instruction set architecture (ISA) that originated at the University of California, Berkeley, in 2010. Its goal is to provide a general-purpose CPU instruction set architecture that can support the next generation of system architectures for decades to come, without the burden of legacy design problems.

RISC-V can meet requirements ranging from low-power microprocessors to high-performance data center processors. Compared with other CPU instruction sets, the RISC-V instruction set has the following advantages:

  • Openness

The core design and implementation of RISC-V are released under the BSD license, one of the most widely used licenses in free software. Any company or institution can use the RISC-V instruction set and build new software or hardware on it without restriction.

  • Simplicity

RISC-V’s 32-bit integer core instruction set has only 41 instructions; even with 64-bit integer support, there are only about 50. For the same functionality, the RISC-V instruction set is easier to implement and less prone to bugs than the x86 instruction set with its thousands of instructions (the x86 instruction set manual runs to more than 2,000 pages and keeps growing, while the RISC-V instruction set manual is only a little over 100 pages).
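
One way to see why a small, regular instruction set is easy to implement: every register-to-register instruction in the RISC-V base integer set uses the same 32-bit R-type layout, so decoding is a handful of shifts and masks. The sketch below decodes and executes a single ADD in Python. The helper names are hypothetical and this is not CKB-VM’s code, which handles the full instruction set.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-type instruction word into its fixed fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


def execute_add(word: int, regs: list) -> None:
    """Execute ADD rd, rs1, rs2 on a 32-entry register file (32-bit wrap)."""
    f = decode_r_type(word)
    if (f["opcode"], f["funct3"], f["funct7"]) != (0x33, 0, 0):
        raise ValueError("not an ADD instruction")
    if f["rd"] != 0:                     # register x0 is hard-wired to zero
        regs[f["rd"]] = (regs[f["rs1"]] + regs[f["rs2"]]) & 0xFFFFFFFF


regs = [0] * 32
regs[1], regs[2] = 40, 2
execute_add(0x002081B3, regs)            # encoding of: add x3, x1, x2
print(regs[3])                           # 42
```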

  • Modularity

RISC-V combines a minimal core with a modular mechanism for adding instruction set extensions. For example, CKB may choose to implement the V extension defined by RISC-V to support vector computing, or add an instruction set extension for 256-bit integer arithmetic, opening the door to high-performance cryptographic algorithms.

  • Broad support

Compilers such as GCC and LLVM support the RISC-V instruction set, and a Go back end for RISC-V is also under development. CKB-VM uses the widely adopted ELF format, so any language that can be compiled to the RISC-V instruction set can be used directly to develop smart contracts for CKB.
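
Because ELF is a standard container, a loader can tell whether a binary targets RISC-V by reading a few header fields. The following sketch checks the ELF magic bytes and the e_machine field, whose value for RISC-V is 243. The function is_riscv_elf is a hypothetical helper written for illustration, not CKB-VM’s actual loader, and the headers below are synthetic rather than taken from real contracts.

```python
import struct

EM_RISCV = 243   # e_machine value assigned to RISC-V
EM_X86_64 = 62   # e_machine value for x86-64, used here for contrast


def is_riscv_elf(header: bytes) -> bool:
    """Return True if the bytes begin with an ELF header targeting RISC-V."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    data_encoding = header[5]            # EI_DATA: 1 = little, 2 = big endian
    if data_encoding == 1:
        fmt = "<HH"
    elif data_encoding == 2:
        fmt = ">HH"
    else:
        return False
    # e_type and e_machine follow the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(fmt, header, 16)
    return e_machine == EM_RISCV


def fake_header(machine: int) -> bytes:
    """Build a minimal 64-bit little-endian ELF header prefix for testing."""
    e_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)   # 16 bytes in total
    return e_ident + struct.pack("<HH", 2, machine)      # e_type = ET_EXEC


print(is_riscv_elf(fake_header(EM_RISCV)))    # True
print(is_riscv_elf(fake_header(EM_X86_64)))   # False
```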

  • Maturity

The RISC-V core instruction set has been finalized and frozen, and all future RISC-V implementations must remain backward compatible. CKB therefore will not need a hard fork when VM instructions are updated. In addition, the RISC-V instruction set has been implemented in hardware and validated in real-world applications, so it does not carry the potential risks of less widely supported instruction sets.

Although other instruction sets may have some of these characteristics, in our assessment RISC-V is the only instruction set that has all of them. We therefore chose the RISC-V instruction set to implement CKB-VM. Smart contracts will use the ELF format to ensure broad language support.

In addition, we will add dynamic linking to CKB-VM to enable Cell sharing. Although CKB itself provides implementations of the most popular cryptographic algorithms, we encourage the community to contribute more optimized implementations that reduce runtime overhead (CPU cycles).