5G Revolution: How to Maximize the Performance of “Data”?

Time: 2021-10-27

one

H-Store was first proposed by us at M.I.T. as early as the mid-2000s. VoltDB is the commercial product of H-Store, and it stores data of similar structure together contiguously. In the rest of this article we will use the abbreviation V-H for the pair.
The design of V-H (started in 2004) emphasizes maximizing throughput for large-scale transaction processing (transactions per second, TPS) at very low latency (milliseconds). Achieving that calls for a DBMS designed around main memory: even as faster secondary storage (such as SSDs and NVRAM) improves the performance of disk-based DBMSs, a RAM-based DBMS retains a clear performance advantage over traditional DBMS designs.

two

V-H is built around three key technical focus points:

2.1 Focus on single-partition transactions

A multi-node main-memory DBMS must partition its data across nodes, and multi-node transactions then inevitably require a heavyweight distributed concurrency control protocol. As described in [Harding], distributed concurrency control greatly slows down execution, and throughput drops further whenever transactions spend significant time waiting on one another’s resources. To avoid this overhead, V-H focuses on optimizing so-called single-partition transactions.
For this to pay off, application designers should organize their data so that almost all transactions touch data on only one node. Many applications are naturally “single partition”, such as updating a single user’s balance or checking whether a call is authorized. In other words, if user accounts are partitioned across nodes, each of these transactions spans only one partition. On the other hand, transferring funds from one account to another usually cannot be made single-partition, because the two accounts generally cannot be guaranteed to land in the same partition.
In summary, many applications can be made single-partition, while others cannot. Moreover, several very large applications insist that all their transactions be single-partition: they prohibit multi-partition transactions as an application-architecture best practice to maximize performance.
V-H chooses to optimize “single partition” transactions to achieve high performance.
Although V-H can also execute “multi partition” transactions, their performance is much lower, so VoltDB should be used in scenarios that are primarily single-partition.
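To make the distinction concrete, below is a minimal, hypothetical sketch of hash partitioning by account ID. The partition count, account IDs, and routing function are invented for illustration; this is not VoltDB’s actual routing logic.

```java
// Hypothetical illustration: each account row lives on exactly one of N
// partitions, chosen by hashing the partitioning column (the account ID).
public class PartitionRouting {
    static final int NUM_PARTITIONS = 8;   // invented cluster size

    // The partition that owns a given account.
    static int partitionOf(long accountId) {
        return Math.floorMod(Long.hashCode(accountId), NUM_PARTITIONS);
    }

    public static void main(String[] args) {
        long alice = 12345L, bob = 98765L;

        // Updating one user's balance touches a single partition.
        System.out.println("Balance update for Alice -> partition " + partitionOf(alice));

        // A transfer touches both accounts' partitions; unless the two IDs
        // happen to hash to the same place, it is a multi-partition transaction.
        int pa = partitionOf(alice), pb = partitionOf(bob);
        System.out.println("Transfer Alice -> Bob -> partitions " + pa + " and " + pb
                + (pa == pb ? " (single-partition by luck)" : " (multi-partition)"));
    }
}
```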

2.2 Focus on stored procedures

Most OLTP use cases consist largely of repetitive transactions, so a handful of transaction types account for most of the volume. Executing each transaction over ODBC/JDBC requires multiple round trips between client and server, which is far from the best choice.
The single-record access interfaces common in NoSQL systems, in which each value is shipped back to the client for processing, incur a similar cost: multiple client-server round trips per transaction plus unnecessary pressure on network bandwidth. With a stored-procedure interface, by contrast, the transaction logic (a mixture of Java and SQL) moves into the DBMS and can be executed with a single round-trip message. When Sybase introduced stored procedures in the mid-1980s, they offered roughly a five-fold performance advantage over ODBC/JDBC-style data access.
V-H therefore makes stored procedures its primary interface to obtain higher running performance.
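As a rough sketch of what such a stored procedure can look like, here is an example written in the style of VoltDB’s Java stored-procedure API. The accounts table, its columns, and the business logic are invented for illustration, and the exact class and method names should be checked against the current VoltDB documentation.

```java
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// The whole transaction runs inside the DBMS: the client sends a single
// message containing only the procedure name and its parameters.
public class DebitAccount extends VoltProcedure {

    // SQL statements are declared with the procedure and precompiled.
    public final SQLStmt getBalance =
        new SQLStmt("SELECT balance FROM accounts WHERE user_id = ?;");
    public final SQLStmt applyDebit =
        new SQLStmt("UPDATE accounts SET balance = balance - ? WHERE user_id = ?;");

    public VoltTable[] run(long userId, double amount) {
        voltQueueSQL(getBalance, userId);
        VoltTable[] result = voltExecuteSQL();
        if (result[0].getRowCount() == 0
                || result[0].fetchRow(0).getDouble(0) < amount) {
            // Aborting rolls back the whole transaction on all replicas.
            throw new VoltAbortException("unknown user or insufficient funds");
        }
        voltQueueSQL(applyDebit, amount, userId);
        return voltExecuteSQL(true);   // final batch of the procedure
    }
}
```

If the accounts table and the procedure are both partitioned on user_id, this is also a single-partition transaction in the sense of 2.1.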

2.3 Focus on active-active data replication

Basically all OLTP applications require high availability (HA). This means every object must be replicated several times, and the system must fail over to a backup in the event of a crash. During normal processing, V-H must ensure that either all replicas process a given transaction or none of them do; only then can it fail over without data corruption.
There are two possible strategies for performing replica updates:

2.3.1 Active-active replication

That is, each transaction is executed and committed locally on all replicas.
In this case all replicas are “active”, and every site holding a replica executes the transactions.
For example, AT&T could have East Coast and West Coast customers talk to their nearest cluster, with active replication keeping the two synchronized in the background.

2.3.2 Active-passive replication

That is, one replica is designated the primary, and every transaction is executed there first. Log records are written on that node and then shipped over the network to the backup nodes.
At each backup, the log is rolled forward to keep the secondary database synchronized with the primary.
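The difference between the two strategies can be sketched as follows. This is a deliberately simplified, hypothetical illustration (the Command and Replica types are invented), not VoltDB’s replication code.

```java
import java.util.List;

// Hypothetical illustration of the two replication strategies; not VoltDB code.
public class ReplicationStrategies {

    // A "command" is the name of a stored procedure plus its parameters.
    static class Command {
        final String procedure; final Object[] params;
        Command(String procedure, Object... params) { this.procedure = procedure; this.params = params; }
    }

    static class Replica {
        final String name;
        Replica(String name) { this.name = name; }
        // Active-active: deterministically re-execute the command locally.
        void execute(Command c) { System.out.println(name + " executes " + c.procedure); }
        // Active-passive: roll forward a low-level log record shipped by the primary.
        void applyLogRecord(String record) { System.out.println(name + " applies " + record); }
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(new Replica("east"), new Replica("west"));
        Command debit = new Command("DebitAccount", 12345L, 9.99);

        // Active-active: the SAME ordered command runs on EVERY replica, so
        // execution must be deterministic for all copies to end up identical.
        for (Replica r : replicas) r.execute(debit);

        // Active-passive: only the primary executes; backups replay its log.
        Replica primary = replicas.get(0);
        primary.execute(debit);
        String logRecord = "accounts[12345].balance -= 9.99";
        for (Replica backup : replicas.subList(1, replicas.size())) backup.applyLogRecord(logRecord);
    }
}
```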

three

Given these two possible strategies, which one should be chosen?
A few years ago, [Malviya] implemented single-copy crash recovery in VoltDB and compared two strategies:
1. Write a command log and re-run the commands during recovery.
2. Write a data log and roll the log forward during recovery.
He found that the runtime overhead of command logging is negligible, making it much faster than the data-logging approach. [Yu] extended this code to replication and implemented both active-active and active-passive schemes. He found that active-active is the performance winner, by nearly a factor of two.
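A minimal sketch of command logging (strategy 1 above) is shown below. It is hypothetical and ignores durability details such as group commit; a data log (strategy 2) would instead persist the physical effects of each transaction and roll them forward at recovery time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of command logging; not the actual VoltDB/H-Store code.
public class CommandLogRecovery {

    // One tiny log entry per transaction: procedure name + parameters,
    // appended in commit order. This is why the runtime overhead is small.
    static final List<String> commandLog = new ArrayList<>();

    static long balance = 100;   // stands in for the in-memory database

    static void debit(long amount) { balance -= amount; }

    static void runTransaction(long amount) {
        commandLog.add("debit " + amount);  // persist the command, then execute
        debit(amount);
    }

    public static void main(String[] args) {
        runTransaction(30);
        runTransaction(20);

        // Simulated crash: volatile state is lost, then rebuilt by re-running
        // the logged commands. Re-execution only reproduces the same state
        // because the stored procedures execute deterministically.
        balance = 100;
        for (String cmd : commandLog) {
            debit(Long.parseLong(cmd.split(" ")[1]));
        }
        System.out.println("recovered balance = " + balance); // prints 50
    }
}
```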
Given these results, VoltDB focuses on active-active replication, which requires the deterministic concurrency control strategy that V-H employs. In contrast, most V-H competitors use non-deterministic concurrency control strategies (e.g., dynamic locking, optimistic concurrency control, multi-version concurrency control). Active-active is therefore not an option for those systems, and they forgo the roughly two-fold speed advantage.
Overall, these three decisions allow V-H to process transactions as much as an order of magnitude faster than other main-memory DBMSs. In benchmarks ([Somagani], [Acme]), V-H ran one million transactions per second on a reasonably sized cluster, faster than any customer workload we know of to date. V-H’s competitors can run such workloads only by investing in considerably more hardware.

four

However, with the rise of 5G applications, this competitive landscape is about to change dramatically.
5G is expected to deliver higher bandwidth, higher device density (up to one million devices per square kilometer), and millisecond latency. That density of devices forces new radio access network (RAN) cell technologies to avoid saturating existing networks, which in turn multiplies the database TPS required to:
Update status-change information for each device in the network;

Enforce authentication and authorization policies in real time for every new device communication.

In addition, network slicing is a 5G requirement: a slice of the network is dedicated to each use case, such as industrial IoT, video, or VR. Each slice requires immediate decisions for load balancing and quality-of-service guarantees as the number of users (people plus IoT devices) grows.

To illustrate the need for higher TPS, consider a typical wireless carrier: a medium-sized carrier may support 10 million phones, a large one perhaps 150 million. A typical wireless carrier runs a large number of transactional applications. Here are some examples:
Billing and charging: current networks charge in 6-second increments, i.e., 10 charging events per minute per active call. If the duty cycle of an average phone is 10%, the smaller network generates roughly 10 million billing events per minute, well over 100,000 per second; for the larger network the figure is far higher. Over time the number of IoT devices is expected to at least quadruple. Billing is therefore a very high-TPS application, with millisecond latency and no relaxation of consistency requirements (a rough calculation is sketched after these examples).
New services: IoT devices are expected to keep enabling new services in the 5G world. These include medical-alert applications that connect a person to emergency responders when they fall; geofenced subnets spun up dynamically to absorb connection peaks when subscribers concentrate in a football stadium; smart metering that personalizes the customer experience and communication, supports analysis of grid consumption and energy demand, and satisfies new regulatory requirements; and continuous monitoring and predictive maintenance of IoT sensors in wind and solar farms.
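Returning to the billing example, here is a rough back-of-the-envelope calculation under the stated assumptions (6-second charging increments, a 10% duty cycle, and at-least-4x growth in IoT devices); the numbers are illustrative, not measurements.

```java
// Back-of-the-envelope TPS arithmetic for the billing example above.
public class BillingTps {
    public static void main(String[] args) {
        double chargingEventsPerMinutePerActiveCall = 60.0 / 6.0;  // 6-second increments
        double dutyCycle = 0.10;                                   // fraction of phones on a call

        long[] subscriberCounts = {10_000_000L, 150_000_000L};     // medium and large carrier
        for (long subscribers : subscriberCounts) {
            double activeCalls = subscribers * dutyCycle;
            double eventsPerMinute = activeCalls * chargingEventsPerMinutePerActiveCall;
            double eventsPerSecond = eventsPerMinute / 60.0;
            System.out.printf("%,d phones -> %,.0f billing events/min (~%,.0f/sec)%n",
                    subscribers, eventsPerMinute, eventsPerSecond);
            // An at-least-4x growth in IoT devices pushes this proportionally higher.
            System.out.printf("  with 4x IoT growth: ~%,.0f/sec%n", 4 * eventsPerSecond);
        }
    }
}
```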

five

In short, the characteristics of 5G are driving rapid growth in the transactional demands of many applications.
Moreover, most wireless applications (such as billing) consist of single-partition transactions, which play directly to VoltDB’s architectural strengths. For such applications, even a medium-sized wireless carrier must support millions of transactions per second, a rate most in-memory DBMSs cannot sustain.

VoltDB is an exception, and our benchmark tests also demonstrate its architectural lead. If yours is a KTPS application (thousands of transactions per second), there are many solutions to choose from; but if you expect MTPS (millions of transactions per second), try VoltDB.

By Michael Stonebraker

References
[Harding] http://www.vldb.org/pvldb/vol…
[Malviya] http://hstore.cs.brown.edu/pa…
[Yu] http://www.cs.cmu.edu/~pavlo/…
[Somagani] https://www.voltdb.com/blog/2…
[Acme] https://www.voltdb.com/blog/2…

If you are interested in VoltDB’s low-latency big-data solutions for the industrial Internet of Things and in full-lifecycle real-time data platform management, you are welcome to join our official discussion group.