How did CITA reach 15000 TPS?


In the previous two issues, Miao Small Classroom shared the thinking behind the design of CITA, a high-performance blockchain kernel. In this issue, we look at how CITA optimizes performance to reach a transaction throughput of 15000 TPS.

Cryptape Blockchain Classroom, Lesson 6

Blockchain design has a well-known “impossible triangle”: of security, decentralization and performance, a design can have only two. Nervos addresses the impossible triangle with a layered design. At Layer 1, CKB chooses security and decentralization; Layer 2 chooses performance. Layer 2 pursues maximum performance, while decentralization and security are guaranteed by CKB.

As a blockchain framework that supports smart contracts, CITA performs very well: transaction processing can reach 15000 TPS [1], which makes it well suited as a high-performance Layer 2 blockchain solution. This article briefly discusses how Cryptape optimizes CITA's performance.


Microservice Architecture

Traditional public blockchains usually adopt a monolithic architecture. Because they must remain decentralized, nodes have to run on commodity hardware, so the architecture cannot be designed around performance. CITA is a high-performance permissioned chain designed for enterprise users (a permissioned chain can be a consortium chain or a public permissioned chain). With its microservice architecture, a node can run on a server cluster instead of a single machine and take full advantage of the hardware: a node is no longer a physical concept but a logical one.



Traditional PBFT-style algorithms commonly use a three-phase protocol. Taking Tendermint as an example, the phases are Prevote, Precommit and Commit.


The Commit phase mainly exists so that the Proposer can broadcast one more round, carrying the BlockProof, to the other nodes so that all nodes hold the same set of votes. At the end of the Precommit phase every node has collected enough votes, but the vote sets may differ. For example, with four nodes A, B, C and D, A may have received votes from A, B, C and D while B has only received votes from B, C and D. Because the votes are part of the Block, they also need consensus, so the Proposer broadcasts one more round to make the vote sets identical.

In CITA-BFT we optimized the Commit phase away. Blocks in a blockchain are agreed on continuously, one after another, so we can put the Proof of the current block into the Proposal of the next block. The Proof of the previous block is then unified during the Prevote phase of the next block, and no separate broadcast of the agreed block is needed.


This has two advantages:

It removes one round of message broadcasting, which shortens consensus time and reduces the load on the network.
In the traditional PBFT consensus algorithm, if the Proposer goes offline (or fails in a similar way) during the Commit phase, an additional round of consensus is needed; this cannot happen in CITA-BFT.
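
The idea of carrying the previous block's Proof inside the next Proposal can be sketched as follows. This is only an illustration under assumptions: the `Proof` and `Proposal` layouts and the `accept_proposal` check are hypothetical and are not taken from the CITA-BFT source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proof:
    """Precommit votes that justify one block (hypothetical layout)."""
    height: int
    block_hash: str
    voters: frozenset


@dataclass(frozen=True)
class Proposal:
    """Proposal for `height`; it carries the Proof of the block at height - 1."""
    height: int
    block_hash: str
    prev_proof: Proof


def has_quorum(proof: Proof, validators: frozenset) -> bool:
    # BFT quorum: strictly more than two thirds of the validators voted.
    valid_votes = proof.voters & validators
    return 3 * len(valid_votes) > 2 * len(validators)


def accept_proposal(p: Proposal, prev_hash: str, validators: frozenset) -> bool:
    """Prevote-time check: the embedded Proof must justify the previous block."""
    proof = p.prev_proof
    return (
        proof.height == p.height - 1
        and proof.block_hash == prev_hash
        and has_quorum(proof, validators)
    )
```

In the A, B, C, D example, the Proposer of the next height might embed the vote set {B, C, D}. Every node that accepts the Proposal then records exactly that set, so the differing local vote sets no longer matter.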

Proposal preprocessing

In a traditional PBFT-based blockchain, consensus and transaction processing run serially. Executor is idle while a block is being agreed. Once consensus completes, the new block is sent to Executor for processing, and Consensus, which needs Executor's result before starting the next height, is idle in turn. Consensus starts the next height only after Executor has processed the Block and sent its latest Status.

In practice, once a node has received a Proposal and verified it in the Prevote phase, that Proposal is very likely to become the final committed Block: under normal network conditions, a height usually completes in a single round of consensus. If the transactions in the Proposal are executed ahead of time, the rest of the consensus process and transaction execution run in parallel. When Executor has finished and is waiting for Consensus to send the confirmed Block, it only needs to check whether the Proposal it executed is the agreed Block. If so, it finalizes the results directly and notifies Consensus to start the next height; if not, it executes the agreed Block from scratch, exactly as it would without pre-execution.


In most cases, blocks can be processed ahead of time, which moves transaction processing earlier. When consensus takes multiple rounds, Executor can compare Proposals by timestamp, interrupt the preprocessing in progress, and execute the newer Proposal. In the worst case, where no Proposal or a wrong Proposal was received, Executor behaves as in the original flow: it processes the transactions after receiving the CommitedBlock, with no performance loss.
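
The pre-execution flow can be sketched as below. It is a simplified, single-threaded illustration: the class and method names, the hashing of the transaction list, and the fake `execute` function are all assumptions, and a newer Proposal simply overwrites the pending result instead of interrupting a running job.

```python
import hashlib


def block_hash(txs):
    # Identify a proposal or block by a hash of its transaction list.
    return hashlib.sha256("|".join(txs).encode()).hexdigest()


def execute(txs):
    # Stand-in for real transaction execution: one fake receipt per tx.
    return [f"receipt({tx})" for tx in txs]


class Executor:
    def __init__(self):
        self.pending = None   # (proposal_hash, receipts) from pre-execution
        self.executions = 0   # how many times a block body was executed

    def on_proposal(self, txs):
        """Called when a verified Proposal arrives during Prevote."""
        self.pending = (block_hash(txs), self._run(txs))

    def on_committed_block(self, txs):
        """Called when Consensus delivers the committed Block."""
        h = block_hash(txs)
        if self.pending is not None and self.pending[0] == h:
            receipts = self.pending[1]   # reuse the pre-executed result
        else:
            receipts = self._run(txs)    # fall back to normal execution
        self.pending = None
        return receipts

    def _run(self, txs):
        self.executions += 1
        return execute(txs)
```

When the committed Block matches the pre-executed Proposal, `on_committed_block` returns immediately without executing anything. Otherwise the cost is the same as the flow without pre-execution, plus the discarded speculative work.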


Caching is a common performance optimization, and CITA uses caches extensively.

Signature verification cache. Verifying transaction signatures is generally time-consuming. Once a transaction has been verified, the result is cached under the transaction's Hash. If the node receives the same transaction again (resent by the user or forwarded from other nodes), it hits the cache and skips signature verification.
Block information cache. Transaction processing and user queries often need to look up Block or Transaction information. Caching this information greatly improves query efficiency.
State cache. Transaction processing needs to read the previous State from the database, but an MPT lookup follows a long path and takes many DB queries, which is very time-consuming. CITA therefore caches state at Account granularity.
Together, these caches greatly reduce transaction verification and processing time.
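
A minimal sketch of the signature verification cache, assuming the expensive check is passed in as a callable. The class name and the unbounded dictionary are illustrative only; a production cache would limit its size (for example with LRU eviction).

```python
import hashlib


class SignatureCache:
    def __init__(self, verify):
        self._verify = verify   # the expensive check, supplied by the caller
        self._results = {}      # transaction hash -> verification result
        self.misses = 0         # number of times the expensive check ran

    def check(self, raw_tx: bytes) -> bool:
        key = hashlib.sha256(raw_tx).digest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._verify(raw_tx)
        return self._results[key]
```

A transaction that arrives twice, once from the user and once forwarded by a peer, triggers the expensive check only the first time.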

Message communication

In a microservice architecture, splitting the system into services makes communication between them frequent, so the message middleware can easily become a bottleneck. Because of the microservice architecture, however, the message middleware can itself be deployed independently: it can be scaled vertically with better hardware or horizontally by clustering.

In addition, we optimize the messages exchanged between microservices to make communication more efficient.

Message compression. Under heavy load a Block often contains tens of thousands of transactions, so messages become very large. We therefore compress messages that exceed a certain size, which relieves the message middleware, reduces the amount of data transferred, and speeds up transmission.
Reduce unnecessary information. For example, when the Consensus service receives a Proposal, its validity must be verified. Because a Proposal may contain a large number of transactions, forwarding it whole would transfer a lot of data. Instead, Consensus checks that the transaction Hashes are correct and sends only the transaction Hashes and the remaining Proposal information to the Auth module, rather than the full transactions.
Batch sending. Batching messages is another common optimization. For example, the RPC module must send transactions to the Auth module for verification. Under heavy load, sending one message per transaction produces a very large number of messages. RPC therefore packs transactions together before sending them to Auth, which greatly reduces the message count, lowers the load on the message middleware, and speeds up message delivery.
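
Size-triggered compression and batching can be sketched as follows. The 1024-byte threshold, the one-byte flag prefix and the use of zlib are assumptions chosen for illustration; the article does not state which threshold or compression algorithm CITA uses.

```python
import zlib

COMPRESS_THRESHOLD = 1024  # bytes; hypothetical cut-off


def encode(payload: bytes) -> bytes:
    """Prefix one flag byte: 1 = zlib-compressed body, 0 = raw body."""
    if len(payload) > COMPRESS_THRESHOLD:
        return b"\x01" + zlib.compress(payload)
    return b"\x00" + payload


def decode(message: bytes) -> bytes:
    """Inverse of encode: decompress only if the flag byte says so."""
    flag, body = message[0], message[1:]
    return zlib.decompress(body) if flag == 1 else body


def batch(items, max_batch):
    """Group items so that one message carries up to max_batch of them."""
    return [items[i:i + max_batch] for i in range(0, len(items), max_batch)]
```

Small messages skip compression entirely, so they pay only the one-byte flag. Batching trades a little latency for far fewer messages through the middleware.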

Static Merkle Tree

Bitcoin introduced the Merkle Tree to solve transaction verification for light nodes. However, every node of a Merkle Tree requires a Hash computation, which is time-consuming.


Note that when a level of the tree has an odd number of nodes, the last node (Hc in a tree of three transactions) is paired with a direct copy of itself. This is because transactions in Bitcoin and Ethereum are added to the Merkle Tree one after another, so the Merkle Root can be constructed incrementally. For example, suppose the current pending Block holds transactions TxA, TxB, TxC and TxD, and the current Merkle Root is H(ABCD). When a new transaction TxE arrives, H(EE) is computed and hashing continues upward, so the existing H(ABCD) subtree does not need to be recomputed.


When transaction TxF arrives, it replaces the duplicated rightmost TxE and the root is recomputed along that path only. Again, the existing H(ABCD) part does not need to be recalculated.

In CITA, transactions are first verified by Auth and placed in the transaction pool. Consensus selects and packages a batch of transactions at once and reaches consensus on them; Executor then processes them and stores the resulting Receipt Root in the block header. Because the transaction content is agreed first and the results are computed afterwards, the order of the Receipts in a Block is fully determined and will not change. We can therefore compute the Receipt Root over all Receipts in one pass, without supporting incremental construction, and this lets us optimize the Receipt Root calculation.


Compared with the Merkle Tree used in Bitcoin and Ethereum, the Hash computation at node E is saved because odd nodes are not duplicated. We call this tree the Static Merkle Tree.
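
To make the saving concrete, the sketch below counts inner-node hashes for both padding rules. It is an illustration only: it uses single SHA-256 rather than Bitcoin's double SHA-256, both functions assume at least one leaf, and the "move the unpaired node up unchanged" rule is an assumed stand-in for CITA's actual tree layout, which the article does not spell out.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_duplicating(leaves):
    """Bitcoin-style padding: an unpaired node is hashed with a copy of itself."""
    level, hashes = list(leaves), 0
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        hashes += len(level)
    return level[0], hashes


def root_promoting(leaves):
    """Static variant (assumed rule): an unpaired node moves up unchanged."""
    level, hashes = list(leaves), 0
    while len(level) > 1:
        pairs = len(level) // 2
        nxt = [h(level[2 * i] + level[2 * i + 1]) for i in range(pairs)]
        hashes += pairs
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0], hashes
```

For five leaves, the duplicating rule performs 6 inner hashes and the promoting rule 4. For a power-of-two leaf count, the two functions produce the same root with the same number of hashes.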

In addition, each transaction in Ethereum generates a new State Root, and calculating the State Root is time-consuming. From the start, CITA was therefore designed so that executing a transaction only updates the Account Model and does not write the change through to the State MPT. Only after the whole Block has been executed are all results committed to the State MPT and the State Root calculated, which greatly reduces MPT operations and computation. Ethereum's latest design uses the same approach.
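
The once-per-block State Root can be sketched as below. The `state_root` function is a deliberately simple stand-in (one hash over the sorted balances) and not a real MPT; the transaction format and function names are likewise hypothetical.

```python
import hashlib


def state_root(accounts):
    """Stand-in for an MPT root: one hash over the sorted account balances."""
    data = ",".join(f"{k}={v}" for k, v in sorted(accounts.items()))
    return hashlib.sha256(data.encode()).hexdigest()


def apply_tx(accounts, tx):
    # A transaction here is just (sender, receiver, amount).
    sender, receiver, amount = tx
    accounts[sender] = accounts.get(sender, 0) - amount
    accounts[receiver] = accounts.get(receiver, 0) + amount


def execute_block(accounts, txs):
    """Apply every transaction in memory, then compute a single root."""
    working = dict(accounts)   # in-memory account view for this block
    for tx in txs:
        apply_tx(working, tx)  # no root computation per transaction
    return working, state_root(working)
```

However many transactions the block holds, `state_root` runs once, which is the point of deferring the MPT update to the end of the block.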

Signature verification

Bitcoin and Ethereum currently use the secp256k1 signature algorithm. Signature verification is CPU-intensive and time-consuming during transaction verification. CITA supports several signature algorithms and uses secp256k1 by default. On the author's computer (Thinkpad 470p i7-7820HQ), simple transfer transactions verify at about 3000 signatures per second. The Auth module verifies transactions in parallel, which makes full use of the hardware and raises overall verification throughput.

CITA also implements Ed25519 signatures, which outperform secp256k1 in both performance and security. Users can choose the signature algorithm that fits their needs. Our 15000 TPS performance figure was measured with secp256k1.
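
As a rough back-of-the-envelope check on why parallel verification matters, the helper below estimates how many verifiers must run in parallel. It assumes the 3000 per second figure is the rate of a single verifier and that throughput scales linearly with the number of verifiers; neither assumption is stated in the measurements above.

```python
import math


def verifiers_needed(target_tps: int, per_verifier_rate: int) -> int:
    """Lower bound on parallel verifiers, assuming linear scaling."""
    return math.ceil(target_tps / per_verifier_rate)
```

Under those assumptions, `verifiers_needed(15000, 3000)` gives 5, so signature verification alone would need at least five verifiers working in parallel to keep up with 15000 TPS.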

Asynchronous processing

In software design, asynchronous processing is often an effective performance optimization. Besides the transaction preprocessing described above, other CITA modules use similar designs. In Executor, for example, the latest State must be saved to the DB after a Block is executed, which is slow when many State entries change. Executor therefore sends its Status to the other modules first and writes to the DB afterwards. Storage can of course fail, so Consensus keeps the latest blocks in order to recover from storage failures in other microservices.
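
The "announce first, persist afterwards" pattern can be sketched with a background writer thread. The class name, the in-process queues standing in for the message bus, and the dictionary standing in for the database are all assumptions made for illustration; failure recovery is not modeled.

```python
import queue
import threading


class AsyncExecutor:
    def __init__(self, db: dict):
        self.db = db
        self.status_out = queue.Queue()    # stands in for the message bus
        self._to_persist = queue.Queue()
        self._writer = threading.Thread(target=self._persist_loop, daemon=True)
        self._writer.start()

    def finish_block(self, height: int, state: dict):
        # Announce the new Status first so Consensus can start the next height,
        self.status_out.put(height)
        # then hand the slow database write to the background thread.
        self._to_persist.put((height, state))

    def _persist_loop(self):
        while True:
            height, state = self._to_persist.get()
            self.db[height] = state        # stand-in for the real DB write
            self._to_persist.task_done()

    def flush(self):
        """Block until every queued write has reached the database."""
        self._to_persist.join()
```

The caller of `finish_block` returns as soon as the Status is queued, so the time spent writing State no longer sits on the critical path between two heights.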

Batch Transactions

CITA also provides an interface for batch transactions. A user can assemble the data of multiple transactions under a single signature. Several transactions thus become one, which reduces transaction storage and signature verification, speeds up processing, and lowers the fees the user pays to send them. For example, suppose user A needs to call N different contracts, which originally takes N transactions. With batch transactions, the call data is assembled in a prescribed format, signed once, and sent to the batch transaction contract, which parses it back into multiple contract calls.

A batch transaction is, however, just one complex composite transaction that goes through the normal transaction flow once, so it counts as a single transaction in performance tests. CITA's performance tests do not use this method.
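
Assembling several contract calls into one payload might look like the sketch below. The wire format (a 20-byte address, a 4-byte big-endian length, then the call data) is hypothetical and is not CITA's actual batch format; signing is omitted because it applies once to the packed payload as a whole.

```python
import struct

ADDR_LEN = 20  # bytes per contract address (assumed)


def pack_calls(calls):
    """calls: list of (address, data) pairs -> one batch payload."""
    out = bytearray()
    for address, data in calls:
        if len(address) != ADDR_LEN:
            raise ValueError("address must be 20 bytes")
        out += address + struct.pack(">I", len(data)) + data
    return bytes(out)


def unpack_calls(payload):
    """Inverse of pack_calls, as a batch contract would parse the payload."""
    calls, pos = [], 0
    while pos < len(payload):
        address = payload[pos:pos + ADDR_LEN]
        (size,) = struct.unpack_from(">I", payload, pos + ADDR_LEN)
        start = pos + ADDR_LEN + 4
        calls.append((address, payload[start:start + size]))
        pos = start + size
    return calls
```

With this layout, N calls cost one signature and one stored transaction instead of N, at the price of 24 bytes of framing per call.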


Most of CITA's code is written in Rust. Rust's minimal runtime gives performance comparable to C, which underpins CITA's performance. As the first team in China to use Rust, we have grown alongside the language since 2016. Other Rust features, such as memory safety guarantees, trait-based generics, pattern matching, type inference and efficient C bindings, have also greatly improved our development efficiency.

In the future, we will do more research on improving the microservice architecture, the network layer, Block/Transaction broadcasting, state storage, hardware acceleration, the VM, parallel computing and more, to take CITA's performance to a higher level.


CITA Technology White Paper:…




Rust versus C gcc fastest programs: https://benchmarksgame-team.p…