What are libraries, tables and super tables? How to use it? Data modeling of tdengine

Time:2021-6-3

The second video tutorial is to quickly sort out the abstract concepts in tdengine and learn to plan the data model in the production scene.

clicklink, get a video tutorial.

Welcome to the data world of the Internet of things

In the typical scene of Internet of things, there are many different types of acquisition devices to collect a variety of different physical quantities. For the same type of acquisition device, there are often many devices distributed in different places. The system needs to summarize, calculate and analyze all kinds of collected data. For the same type of device, the collected data is very regular.

In this paper, we take smart meter (current and voltage acquisition) as an example to discuss how to build database, super meter and meter in tdengine.

Suppose that each smart meter collects two quantities of current and voltage, and the data collected is shown in the figure below.

What are libraries, tables and super tables? How to use it? Data modeling of tdengine

Each record has device ID, time stamp, collected physical quantities (such as current and voltage in the figure above), and static tags related to each device (such as location and groupid in the figure above). Each device is triggered by the outside world, or collects data according to the set cycle. The collected data points are sequential and a data stream.

So how does tdengine Abstract these IOT data?

Here, we need to mention the key innovation of tdengine——One collection point and one table. Collection points of the same type are described by a super table, that is, a table structure schema and a static label schema. As shown in the figure above, the meter ID is used as the sub table name (d1001, d1002, d1003, d1004, etc.), the dynamically collected physical quantities are used as the fields, and the static attributes (location and groupid) are used as the sub table labels. Using the super table as a template to generate a sub table corresponding to each collection point. With the super table, it is greatly convenient for data retrieval, query and aggregation of similar collection points.

This design has several advantages

  • It can ensure that the data of a collection point is continuously stored in blocks on the storage medium. If the data is read in a period of time, it can greatly reduce the random read operation and improve the read and query speed by several orders of magnitude.
  • Because the data generation process of different collection devices is completely independent, the data source of each device is unique, and there is only one writer in a table. In this way, the lock free mode can be used to write, and the writing speed can be greatly improved.
  • For a data acquisition point, the data generated by it is sequential, so the write operation can be implemented by appending, which further greatly improves the speed of data writing.

If we use the traditional way to write the data of multiple devices to a table, due to the uncontrollable network delay, the time sequence of the data of different devices arriving at the server cannot be guaranteed, the write operation must be protected by lock, and the data of a device cannot be stored together continuously. Using one data collection point and one table can ensure the best performance of insertion and query of single data collection point.

Basic methods of data modeling

Tdengine adopts relational data model, and needs to build database and table. Therefore, for a specific application scenario, we need to consider the design of library, super table and ordinary table.

CREATE DATABASE dbname
USE dbname
CREATE TABLE stbname (ts timestamp, other fields…) tags ( tag fields)
Create table TBNAME using stbname Tags
INSERT INTO tbname VALUES(now, values…)

Create Library

Different types of data collection points often have different data characteristics, including the frequency of data collection, the length of data retention time, the number of copies, the size of data blocks and so on. In order to maximize the efficiency of tdengine in various scenarios, it is recommended to create tables with different data characteristics in different databases, because each database can be configured with different storage strategies.

When creating a database, in addition to the SQL standard options, the application can also specify the retention time, the number of copies, the number of memory blocks, the time precision, the maximum and minimum number of records in the file block, whether to compress, the number of days to cover a data file and other parameters. For example, it is suggested to create a library for tables with the same data characteristics, and each library can be configured with different storage policies.

CREATE DATABASE power KEEP 365;

The above will create a library named power, and the data of this library will be retained for 365 days. For more parameters and syntax, see:

https://www.taosdata.com/cn/documentation20/taos-sql/

After creating a library, you need to use the SQL command use to switch the current library, for example:

USE power;

Replace the current operation library with power. You can also use “library name. Table name” to specify the name of the operation’s Library and table.

Introduction of Super Table

A data acquisition point and a table means that 10 million smart meters correspond to 10 million meters. In an Internet of things system, there are often massive data acquisition points of the same type. How to operate so many tables is a huge challenge. To facilitate the operation of multi tables of the same type, tdengine introduces super tables.

When creating a super table, you need to provide: table name, table structure schema, and label schema.

CREATE TABLE meters (ts timestamp, current float, voltage int) TAGS (location binary(64), groupdId int);

The columns of super table are divided into two parts: dynamic part and static part.

The dynamic part is the collected data, the first column is the time stamp (TS), and the other columns are the collected physical quantity (current, voltage).

The static part refers to the static attribute of the collection point, which is generally used as a label. Such as the geographic location of the collection point, device model, device group, administrator ID, etc.

Tags can be added, deleted and modified afterwards.

Tdengine supports the following data types.

What are libraries, tables and super tables? How to use it? Data modeling of tdengine

Deep understanding of hypertables

Collection of the same table at the same time: in a super table, the collected physical quantities must be collected at the same time, that is, the timestamps are the same.

For a type of equipment, there may be multiple groups of physical quantities, and each group of physical quantities is not collected at the same time, so it is necessary to build a super table for each group of physical quantities. Therefore, a type of device may need to establish multiple super tables.

If the system has n different types of devices, it needs to establish at least n super tables.

A system can have multiple DB libraries, and a DB library can have one or more super tables.

Create table / sub table

Tdengine needs to create independent tables for each data acquisition point; It is also called sub table (d1001) because it is created from super tables; When creating, you need to use the super table as the template and specify the specific value of the label; A super table can contain several sub tables. There is no limit to the number of sub tables.

CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);

D1001 is the sub table name, meters is the super table name, followed by the label value “Beijing. Chaoyang” of location and the label value 2 of groupid. When creating a table / sub table, you need to specify the label value, and you can modify it afterwards; It is recommended to use the globally unique ID of the data collection point as the sub table name (such as the device serial number).

Automatic creation of sub table

In some special scenarios, when users write data, they are not sure whether a sub table exists. At this time, you can use the automatic table creation syntax to create a table that does not exist. If the table already exists, no new table will be created.

INSERT INTO d1001 USING meters TAGS ("Beijng.Chaoyang", 2) VALUES (now, 10.2, 219);

The above SQL statement inserts the record (now, 10.2, 219) into the table d1001. If the table d1001 has not been created, the Super Table meters is used to create the template automatically, and the label value “Beijing. Chaoyang”, 2, is marked at the same time.

Multi column Model vs single column model

Tdengine supports both multi column model and single column model.

  • Multi column model: as long as the physical quantities are collected at the same time at the same data acquisition point, they can be placed in a super table as different columns.
  • Single column modelEach physical quantity has its own table. For example, two super meters are built for current and voltage.

We suggest: adopt multi column model as much as possible, because the insertion efficiency and storage efficiency are higher; For some scenes, the type of physical quantity of a collection point often changes, so the single column model can be used.

Examples of new energy vehicles

Scene and modeling analysis

  • A car company plans to track and analyze the new energy vehicles it produces and sells;
  • Each vehicle is equipped with a remote acquisition terminal to collect vehicle status information: location (longitude and latitude), vehicle speed, battery temperature, battery current, ambient temperature and tire pressure;
  • Background statistical analysis needs to be classified and aggregated according to: vehicle type, sales area, salesperson, battery pack capacity and motor power;
  • The first four of the six quantities are collected at the same time, and put them into a super table vehicle_ For the other two measuring points, the frequency of temperature and tire pressure collection is completely different, and two super tables vehicle are created respectively_ temp, vehicle_ tire;
  • Each vehicle has a unique VIN code, and the prefix of the code and super table name is used as the unique table name.

Example of SQL statement

CREATE DATABASE nev KEEP 3650;
USE nev;
CREATE TABLE vehicle_main (ts timestamp, longitude double, latitude double, vspeed int, btemp int, bcurrent int) TAGS (vin binary(30), model binary(20), szone binary(30), sales int, bcapacity float, mpower float);
CREATE TABLE vmTS8392EGV062192009 USING vehicle_main TAGS ("TS8392EGV062192009", "GTS7180", "Beijing.haidian", "10060089", 86.0, 125.5);
CREATE TABLE vehicle_temp (ts timestamp, vtemp int) TAGS (vin binary(30));
CREATE TABLE vtpTS8392EGV062192009 USING vehicle_vtemp TAGS ("TS8392EGV062192009");
CREATE TABLE vehicle_tire (ts timestamp, vpressure int) TAGS (vin binary(30));
CREATE TABLE vtrTS8392EGV062192009 USING vehicle_vtire TAGS ("TS8392EGV062192009");

//Query the running track of the specified vehicle in the last 10 days
SELECT ts, longtitude, latitude FROM vtrTS8392EGV062192009 where ts >now -10d

//Query the average vehicle speed, average power battery temperature and average discharge current by vehicle type
SELECT AVG(vspeed), AVG(btemp), AVG(bcurrent) FROM vehicle_main GROUP BY model

I believe that by now, you have completely sorted out the concepts of library, table and super table in tdengine, and you are ready to operate!

Pay attention to official account number TDengine, reply back to “1203”, get PPT in this tutorial.