Young people don't talk about martial arts: TDengine's edge-side data storage solution challenges SQLite

Time:2021-6-20

Last week, at an online meetup, TAOS Data and EMQ jointly released an integrated solution for the industrial Internet. Built on TDengine and EMQ X, it is a lightweight industrial Internet platform for edge computing that covers industrial data collection, aggregation, cleaning, storage, analysis, and visualization. TDengine now fully supports ARM32 and ARM64 processors. So why is TDengine a more efficient storage choice for edge-side data, and where does it do better than SQLite? At the meetup, Hou JiangXuan, co-founder of TAOS Data, shared the technical principles behind it.

From the Internet to the mobile Internet and now to the Internet of Things (IoT), computers, mobile terminals, wearable devices, cars, and even household lights and all kinds of factory equipment have been connected to the network. Broadly, these devices continuously collect real-time status data and send it to a computing platform in the cloud. This is the general idea of IoT cloud computing.

The whole IoT technology chain has four layers: collecting device status data through sensors, sending the data to the cloud through a communication module, storing, querying, and computing on it in the cloud, and finally feeding it to analysis and application systems.

In the cloud computing model, however, data must be transferred to the cloud for centralized storage, archiving, and analysis. An edge-side node may be a gateway or an end-user terminal. If it has no computing power of its own, it must send the collected data to the cloud, rely on cloud computing resources for the complex calculations, and wait for the resulting conclusion to be sent back over the network. The terminal's work in this process clearly depends heavily on the network: if the network is interrupted or fails, the terminal cannot interact with the cloud, and some of its work is severely affected. This cloud-centric approach therefore places very high demands on communication between the edge and the cloud, and in practice it often requires a costly high-speed network. In addition, as the volume of data grows, the storage and computing costs in the cloud keep rising.

A good way to solve this problem is edge computing: part of the storage and computing capacity sinks to the edge side (that is, the device side), so that terminal devices can store, compute on, make decisions with, and apply data independently. The edge side thus becomes more intelligent and less dependent on the cloud, processes data more promptly, and is no longer held back by the network.

The advantages of edge computing are summarized as follows.
[Figure: advantages of edge computing]

So what makes edge computing difficult? Edge-side devices are often small intelligent terminals deployed in large numbers. For cost reasons, their memory, CPU, other hardware resources, and computing power are very limited. The difficulty of edge computing is achieving efficient data storage, analysis, and computation with these limited resources, which makes the choice of database on the edge side particularly important. The data collected by edge-side terminal devices has distinctive characteristics: it is generally a timestamped, structured time-series data stream. The requirements that edge computing places on a database are therefore as follows:

  • Ultra-high read/write performance
  • Low hardware overhead
  • General-purpose interfaces to meet the various computing requirements on the edge side
  • Real-time data caching and stream computing
  • Persistent storage of historical data with efficient compression
  • Historical data backtracking and statistical aggregation by time window
  • Edge-cloud collaboration
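
To make two of these requirements concrete (a real-time cache of the latest value, and streaming computation over a time window), here is a minimal Python sketch. It is an in-memory illustration only and does not reflect how TDengine implements either feature; the class name `EdgeSeries` and its methods are hypothetical, and timestamps are assumed to arrive in order.

```python
from collections import deque


class EdgeSeries:
    """Hypothetical in-memory stand-in for one device's time series."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.points = deque()  # (timestamp, value) pairs inside the window
        self.total = 0.0       # running sum of the values in the window
        self.last = None       # real-time cache: the most recent point

    def append(self, ts, value):
        """Ingest one point; timestamps are assumed to be non-decreasing."""
        self.points.append((ts, value))
        self.total += value
        self.last = (ts, value)
        # Evict points that have fallen out of the sliding window.
        while self.points and self.points[0][0] <= ts - self.window_seconds:
            _, old_value = self.points.popleft()
            self.total -= old_value

    def window_average(self):
        """Average of the values currently inside the window."""
        return self.total / len(self.points) if self.points else None


series = EdgeSeries(window_seconds=10)
for ts, value in [(0, 20.0), (5, 22.0), (12, 24.0)]:
    series.append(ts, value)
```

Reading `series.last` costs nothing beyond an attribute lookup, and the window average is maintained incrementally, which is the kind of behavior a resource-limited edge device needs from its storage layer.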

TDengine: a big data engine better suited to the edge side

A time-series database is the best choice for edge data storage. However, time-series databases such as OpenTSDB (which is built on top of HBase) and InfluxDB are still too heavy for the edge side, and their runtime hardware overhead is too high. TDengine is an extremely lightweight open-source time-series database whose entire installation package is only 2 MB. Its core is a high-performance distributed time-series database; it also comes with message queuing, caching, stream computing, data subscription, and other functions, providing an all-in-one solution for storing structured time-series data.

The TDengine community has released versions that support ARM32 and ARM64 processors, which run smoothly on mainstream edge-side hardware such as the Raspberry Pi. These versions also provide real-time data caching, historical data backtracking, aggregation by time period, and other capabilities. Although a distributed cluster is relatively unlikely to be needed on the edge side, it is entirely possible to build one from Raspberry Pis, boxes, or gateways.

The ARM version of TDengine supports many kinds of interfaces, almost the same as the regular cluster version. It also provides the taos shell client, so that debugging personnel can easily check the running status of TDengine.

TDengine edge-cloud collaboration

Edge-side resources are limited, and so is the total amount of data that can be stored there, so data still needs to be backed up to and coordinated with the cloud. There are many approaches to edge-cloud collaboration. Here are some of ours.

An example makes this easier to understand. A factory has many gateways on the edge side, and we can install an edge-side version of TDengine on each of them. TDengine then becomes the edge-side storage engine, persisting the data collected by the gateway. Depending on the data acquisition frequency, the compression ratio, and the available storage, the edge side can keep a certain span of raw data (for example, one month to half a year). For integer or floating-point data, TDengine can compress it to about 10% of its original size. This depends on the specific data: if the values change very randomly, the compression ratio suffers to some extent, but in practice it is still about 10% overall. Therefore, with a 2 GB or even a 1 GB SD card in the gateway, we can store about 10 GB of raw data. This order of magnitude is sufficient for real-time edge-side analysis.
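
The arithmetic behind this estimate can be sketched in a few lines of Python. The 10% compression ratio and the 1 GB card come from the text; the row size of 16 bytes and the rate of 100 rows per second are illustrative assumptions, and the sketch ignores file system overhead and any space the operating system takes on the same card.

```python
GB = 1024 ** 3


def raw_capacity_bytes(card_bytes, compression_ratio):
    """Raw data volume that fits on the card once it is compressed."""
    return card_bytes / compression_ratio


def retention_days(card_bytes, compression_ratio, bytes_per_row, rows_per_second):
    """How many days of raw data the card can hold at a given ingest rate."""
    raw_bytes_per_day = bytes_per_row * rows_per_second * 86_400
    return raw_capacity_bytes(card_bytes, compression_ratio) / raw_bytes_per_day


# 1 GB card, data compressed to about 10% of its raw size (figures from the text).
capacity = raw_capacity_bytes(1 * GB, 0.10)

# Assumed workload: 16-byte rows arriving at 100 rows per second.
days = retention_days(1 * GB, 0.10, bytes_per_row=16, rows_per_second=100)
```

Under these assumptions the card holds roughly 10 GB of raw data, or between two and three months of history, which falls inside the "one month to half a year" range mentioned above.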

However, if we need to keep longer-term historical data and go on to do big data mining and other analysis, the data has to be synchronized to a cloud data center for storage. The edge version of TDengine can be accessed directly by TDengine clients in the cloud (when the network is available), so synchronizing data from the edge to the cloud becomes very simple. Cloud applications can pull the latest data from the edge gateway in real time through TDengine's subscription module and write the received incremental data to the local TDengine cluster in real time for historical archiving. In terms of technical implementation, a TDengine subscription is essentially a timed query. TDengine therefore lets users add filter conditions and synchronize edge-side data selectively (for example, pulling only the records above a certain threshold and skipping the rest) instead of reporting all historical data to the cloud.
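
The idea of a subscription as a repeated query with a filter can be illustrated with a small Python sketch. This is not TDengine's subscription API: the edge store is a plain list, and `pull_incremental`, the watermark variable, and the threshold of 50.0 are all hypothetical names and values chosen for illustration.

```python
def pull_incremental(edge_rows, last_ts, predicate):
    """Return the rows newer than last_ts that pass predicate, plus the new watermark.

    edge_rows is a list of (timestamp, value) tuples sorted by timestamp.
    """
    new_rows = [row for row in edge_rows if row[0] > last_ts]
    # Advance the watermark over every row seen, not only the ones kept,
    # so filtered-out rows are not rescanned on the next pull.
    watermark = new_rows[-1][0] if new_rows else last_ts
    selected = [row for row in new_rows if predicate(row)]
    return selected, watermark


def above_threshold(row):
    """Filter condition: keep only readings larger than 50.0."""
    return row[1] > 50.0


edge_rows = [(1, 10.0), (2, 55.0), (3, 12.0)]  # data held on the edge gateway
cloud_rows = []                                # archive on the cloud side
watermark = 0

# First timed pull: only the reading above the threshold is synchronized.
batch, watermark = pull_incremental(edge_rows, watermark, above_threshold)
cloud_rows.extend(batch)

# New data arrives on the edge; the next pull picks up only the increment.
edge_rows.append((4, 70.0))
batch, watermark = pull_incremental(edge_rows, watermark, above_threshold)
cloud_rows.extend(batch)
```

In a real deployment the pull would run on a timer and the rows would travel over the network, but the two ingredients are the same: a remembered position in the time series and a filter applied at query time.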

Building on TDengine's storage advantages on the edge side and this overall approach to edge-cloud collaboration, TAOS Data and EMQ have jointly built an edge-side solution. In short, EMQ X Neuron, EMQ X Edge, EMQ X Kuiper, and TDengine are deployed on the edge gateway. Neuron parses the device protocols and converts the streaming data collected from devices into MQTT messages, which are published to Edge (the edge-side MQTT broker) and then stored by Kuiper in the TDengine instance deployed at the edge. Applications running at the edge can then read and process data from TDengine for real-time display and alerting. EMQ's Edge Manager, which also runs at the edge, provides a management console for configuring and managing the other three components. This solution is equivalent to handing the collaboration work over to EMQ.

However, some users may already run a TDengine cluster in the cloud and now want the TDengine cluster client to access the edge-side TDengine on some industrial devices directly. This can also be done through TDengine's data subscription module: the cloud application calls the data subscription module to create a series of subscription tasks and pulls the latest incremental data from the edge-side TDengine in real time. This solution is equivalent to handing the collaboration work over to TDengine, provided, of course, that the network connection is available.

Compiling the edge version of TDengine on Raspberry Pi

Here is a brief introduction to the practical steps of compiling, installing, and running TDengine on a Raspberry Pi.

Environment preparation

1. Burn the operating system

Burn an operating system image to the SD card. TDengine supports mainstream operating systems such as Ubuntu 16.04 and later and CentOS 7.0 and later.

2. Network settings

Configure the network environment of the Raspberry Pi, set a static IP address and a host name for the development board, and connect it to the network.

3. Download and compile TDengine

Clone the TDengine source code from www.github.com/taosdata/TDengine to the Raspberry Pi, then compile and run it.

Compilation process

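# Clone the source code together with its submodules
$ git clone --recurse-submodules https://github.com/taosdata/TDengine.git

# Check out the release tag used in this walkthrough and sync the submodules to it
$ cd TDengine/
$ git checkout ver-2.0.7.0
$ git submodule update --init --recursive

# Configure for a 64-bit ARM CPU, then compile and install (installing needs root)
$ mkdir build && cd build
$ cmake ../ -DCPUTYPE=aarch64 -DVERNUMBER=2.0.7.0 -DVERCOMPATIBLE=2.0.0.0
$ make && sudo make install

# Start the taosd service, then run the bundled taosdemo test program
$ sudo systemctl start taosd
$ taosdemo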

After compilation and installation, the taosdemo program is available for a quick first experience. You can use it to test TDengine's write and query performance.

A simple comparison between TDengine and SQLite

No discussion of data storage on the edge side and in embedded devices is complete without SQLite. SQLite is an ultra-lightweight database that needs no background server process; it is practically plug and play, and it is the most widely deployed database in the world. As the SQLite project itself puts it: “Think of SQLite not as a replacement for Oracle but as a replacement for fopen().” SQLite is a compact library. That said, the APIs SQLite provides are modeled on those of relational databases, and it even supports transactions, so it is often used as an embedded relational database.

By comparison, SQLite's installation package on Linux is 1.9 MB and TDengine's is 2.7 MB; both are extremely lightweight. TDengine is a specialized solution for structured time-series data. It does not support transactions or complex relationships between tables, but it provides time-series indexing, real-time stream computing, columnar storage, a better compression ratio, downsampling and aggregation by time, data retention periods, and so on. From this point of view, TDengine is closer than SQLite to the requirements of processing time-series data in edge-side production environments. The edge-side version of TDengine also connects seamlessly with cloud products: if the network is down, TDengine can cache data automatically and transmit it once the connection is restored, which provides edge-cloud collaboration. Let's summarize the differences between TDengine and SQLite with a diagram.
[Figure: comparison of TDengine and SQLite]
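
To give a feel for what this difference means in practice, here is a minimal sketch of how downsampling by time window and a data retention period might be done by hand in SQLite, using Python's standard `sqlite3` module. The table name `readings`, its columns, and the helper functions are hypothetical. As far as we know, TDengine's SQL offers both as built-in features (an `INTERVAL` clause for windowed aggregation and a `KEEP` option on the database for retention), whereas here the application has to build the time index, the window arithmetic, and the expiry job itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER NOT NULL, value REAL NOT NULL)")
conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")  # manual time index
conn.executemany(
    "INSERT INTO readings (ts, value) VALUES (?, ?)",
    [(0, 1.0), (5, 3.0), (10, 5.0), (19, 7.0), (20, 9.0)],
)
conn.commit()


def downsample(conn, interval_s):
    """Average value per fixed time window, keyed by the window start time."""
    # Integer division truncates ts to the start of its window.
    return conn.execute(
        "SELECT (ts / ?) * ? AS window_start, AVG(value) "
        "FROM readings GROUP BY window_start ORDER BY window_start",
        (interval_s, interval_s),
    ).fetchall()


def expire(conn, now_ts, keep_s):
    """Retention by hand: delete rows older than keep_s seconds."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM readings WHERE ts < ?", (now_ts - keep_s,))


windows = downsample(conn, 10)
expire(conn, now_ts=25, keep_s=15)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

None of this is hard to write, but every edge application that stores time-series data in a general-purpose embedded database ends up writing some version of it.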

As a representative of the emerging time-series databases, TDengine has many advantages that really do challenge SQLite, the grand master of its generation, as the storage choice on the edge side. It really is a bit of a case of "young people don't talk about martial arts". However, we need to recognize that TDengine and SQLite have different focuses. It is not an either-or choice; they can be combined flexibly according to business needs. TDengine can handle time-series data and SQLite can handle relational data, which together better achieve data autonomy on the edge side.

Follow the official account “TDengine” and reply “1117” to get the full slide deck.