Application of TDengine in Haige Intelligent Manufacturing

Date: 2021-09-06

About the author: Da Luo, architect at Haige Intelligent Manufacturing, mainly engaged in cloud-native and big data system development; he took part in building a national demonstration Industrial Internet system.

For Industrial Internet or IoT systems, the most basic requirement is to display data curves, such as power curves, much like the intraday chart of a stock. Usually we take the last power value a device reports within each minute as that minute's power; if the device reports nothing during a given minute, we carry forward the previous minute's value, and so on. For example:

(figure: example of reported power values)

The resulting minute curve:

(figure: the resulting minute curve)
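The carry-forward rule described above can be sketched in plain Java (a hypothetical helper, not code from the production system):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinuteCurve {
    /**
     * Build a per-minute curve from raw reports (epoch millis -> power).
     * Each minute takes the last value reported up to the end of that minute,
     * so a silent minute automatically carries the previous value forward.
     */
    static List<Double> minuteCurve(TreeMap<Long, Double> reports, long startMillis, int minutes) {
        List<Double> curve = new ArrayList<>(minutes);
        double last = 0.0; // default before any report exists
        for (int i = 0; i < minutes; i++) {
            long minuteEnd = startMillis + (i + 1) * 60_000L - 1;
            // last report at or before the end of this minute
            Map.Entry<Long, Double> e = reports.floorEntry(minuteEnd);
            if (e != null) {
                last = e.getValue();
            }
            curve.add(last);
        }
        return curve;
    }
}
```

With reports at 00:05 (10.0), 00:30 (12.0), and 02:05 (15.0), the three-minute curve is [12.0, 12.0, 15.0]: the second minute carries the first minute's last value forward.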

Usually, we first write the data reported by devices to Apache Kafka. In an offline computing scenario, you might write the data to Hive, periodically read it with Spark SQL, and write the results to HBase. In a real-time scenario, Apache Flink consumes the Kafka data and writes the results to HBase; in that case, problems such as out-of-order data and late-arrival triggering also have to be handled.

Moreover, a traditional Hadoop-based big data architecture requires building ZooKeeper and HDFS first, then Hive and HBase on top, so the maintenance cost of the whole system is very high. In addition, HBase stores time-series data as key-value pairs, and repeating the same key for every data point wastes a great deal of space.

These are some of the pain points of computing device-attribute curves for the IoT. Beyond them, data growth, data verification, and disaster recovery all need to be considered.

To provide customers with integrated solutions based on 3D printing technology, our company naturally needs to continuously track the operating status of its equipment and store the operating data. That is when we found TDengine (https://github.com/taosdata/TDengine), an open-source big data platform for the IoT.

Following the SQL examples in the TDengine documentation, when the data is complete, the problem above can be solved with a single SQL statement:

select last(val) a from super_table_xx where ts >= '2021-06-07 18:10:00' and ts <= '2021-06-07 18:20:00' interval(60s) fill(prev);

Why can TDengine execute SQL like this so efficiently?

The answer lies in its super table and sub-table design. TDengine stores the data of a single device contiguously, ordered by time. In practice, when a business system consumes IoT data, whether for real-time queries or offline analysis, it typically reads a continuous time range of a single device's data.

Suppose we want to store a device's temperature and humidity; we can design the super table as follows:

create stable if not exists s_device (ts TIMESTAMP,
  temperature double,
  humidity double
) TAGS (device_sn BINARY(1000));

In actual use, the SQL to insert data for devices 'd1' and 'd2' looks like this:

insert into s_device_d1 (ts, temperature, humidity) USING s_device (device_sn) TAGS ('d1') values (1623157875000, 35.34, 80.24);
insert into s_device_d2 (ts, temperature, humidity) USING s_device (device_sn) TAGS ('d2') values (1623157891000, 29.63, 79.48);

To query the data of device 'd1' within a certain time range:

select * from s_device where device_sn = 'd1' and ts > 1623157871000 and ts < 1623157890000 ;

To compute the average temperature curve of the past 7 days, with one point per hour:

select avg(temperature) temperature from s_device where  device_sn = #{deviceSn} and ts >= #{startTime} and ts < #{endTime}  interval(1h)

TDengine also provides many aggregate functions, such as the last used above with fill to compute the continuous one-minute curve, as well as the usual sum, max, and so on.

For application integration we chose MyBatis, a flexible and easy-to-use ORM framework. Taking the table s_device above as an example, we first define the entity:

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.sql.Timestamp;
/**
 * @author: DaLuo
 * @date: 2021/06/25
 * @description:
 */
@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
@TableName(value = "s_device")
public class TestSuperDeviceEntity {
    private Timestamp ts;
    private Float temperature;
    private Float humidity;
    @TableField(value = "device_sn")
    private String deviceSn;
}

Redefine mapper:

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.hg.device.kafka.tdengine.entity.TestSuperDeviceEntity;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;
import java.sql.Timestamp;
import java.util.List;
/**
 * @author: DaLuo
 * @date: 2021/06/25
 * @description:
 */
@Mapper
public interface TestSuperDeviceMapper extends BaseMapper<TestSuperDeviceEntity> {
    /**
     * Single insert
     * @param entity
     * @return
     */
    @Insert({
            "INSERT INTO 's_device_${entity.deviceSn}' (ts ,temperature, humidity ) ",
            "USING s_device (device_sn) TAGS (#{entity.deviceSn}) ",
            "VALUES (#{entity.ts}, #{entity.temperature}, #{entity.humidity})"
    })
    int insertOne(@Param(value = "entity") TestSuperDeviceEntity entity);
    /**
     * Batch insert
     * @param entities
     * @return
     */
    @Insert({
            "<script>",
            "INSERT INTO ",
            "<foreach collection='list' item='item' separator=' '>",
            "'s_device_${item.deviceSn}' (ts ,temperature, humidity) USING s_device (device_sn) TAGS (#{item.deviceSn}) ",
            "VALUES (#{item.ts}, #{item.temperature}, #{item.humidity})",
            "</foreach>",
            "</script>"
    })
    int batchInsert(@Param("list") List<TestSuperDeviceEntity> entities);
    /**
     * Query the average temperature over a time range, one data point per hour
     * @param deviceSn
     * @param startTime inclusive
     * @param endTime   exclusive
     * @return
     */
    @Select("select avg(temperature) temperature from s_device where  device_sn = #{deviceSn} and ts >= #{startTime} and ts < #{endTime}  interval(1h)")
    List<TempSevenDaysTemperature> selectSevenDaysTemperature(
            @Param(value = "deviceSn") String deviceSn,
            @Param(value = "startTime") long startTime,
            @Param(value = "endTime") long endTime);
    @AllArgsConstructor
    @NoArgsConstructor
    @Data
    @Builder
    class TempSevenDaysTemperature {
        private Timestamp ts;
        private float temperature;
    }
}

TDengine has a very clever design: sub-tables do not need to be created in advance. We can simply use the tag value as part of the sub-table name, so inserting data creates the sub-table at the same time.

Note: with cross-time-zone internationalization in mind, all of our time storage and query interactions use timestamps rather than the "yyyy-MM-dd HH:mm:ss" string format. String-formatted times involve the application time zone, the connection-string time zone, and the TDengine server time zone, so they easily lead to inaccurate storage; a long-integer timestamp avoids such problems entirely.
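The point can be illustrated with a small Java sketch: the same epoch-millisecond value (taken from the insert example earlier) renders as different wall-clock strings depending on the zone, while the long value itself is unambiguous.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimestampDemo {
    public static void main(String[] args) {
        long epochMillis = 1623157875000L; // same instant everywhere
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        // Two zones, two different strings, one underlying instant:
        String shanghai = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of("Asia/Shanghai")).format(fmt);
        String utc = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of("UTC")).format(fmt);
        System.out.println(shanghai); // 2021-06-08 21:11:15
        System.out.println(utc);      // 2021-06-08 13:11:15
    }
}
```

If the application, the JDBC connection, and the TDengine server each interpret a "yyyy-MM-dd HH:mm:ss" string in their own zone, the stored instant silently shifts; a long timestamp has no such ambiguity.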

There are two ways for Java to use the TDengine JDBC driver: JDBC-JNI and JDBC-RESTful. The former has an advantage in write performance, but requires the TDengine client driver to be installed on the server where the application runs.
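The two modes differ mainly in the JDBC URL. A sketch (URL prefixes and default ports as documented for the 2.x taos-jdbcdriver; verify against your driver version):

```java
public class TdengineJdbcUrls {
    // JDBC-JNI: better write performance, but the native TDengine client
    // driver must be installed on every host running the application.
    static String jniUrl(String host, String db) {
        return "jdbc:TAOS://" + host + ":6030/" + db;
    }

    // JDBC-RESTful: goes over HTTP, so no native client is needed,
    // at some cost in write throughput.
    static String restfulUrl(String host, String db) {
        return "jdbc:TAOS-RS://" + host + ":6041/" + db;
    }

    public static void main(String[] args) {
        System.out.println(jniUrl("tdengine-host", "iot"));     // jdbc:TAOS://tdengine-host:6030/iot
        System.out.println(restfulUrl("tdengine-host", "iot")); // jdbc:TAOS-RS://tdengine-host:6041/iot
    }
}
```

Either URL is then passed to `DriverManager.getConnection` as usual; we use JDBC-JNI, which is why our images below bundle the client driver.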

Our application runs in Docker on a Kubernetes cluster, so we built a base image suitable for it. Its Dockerfile is as follows:

FROM openjdk:8-jdk-oraclelinux7

COPY TDengine-client-2.0.16.0-Linux-x64.tar.gz /

RUN tar -xzvf /TDengine-client-2.0.16.0-Linux-x64.tar.gz &&  cd /TDengine-client-2.0.16.0 &&  pwd && ls && ./install_client.sh

Build it:

docker build -t tdengine-openjdk-8-runtime:2.0.16.0 -f Dockerfile .

The application image then builds on it, as shown in its Dockerfile:

FROM tdengine-openjdk-8-runtime:2.0.16.0

ENV JAVA_OPTS="-Duser.timezone=Asia/Shanghai -Djava.security.egd=file:/dev/./urandom"

COPY app.jar /app.jar

ENTRYPOINT ["java","-jar","/app.jar"]

In this way, our application can be scheduled onto any Kubernetes node.

In addition, our program involves automatic task scheduling, which requires frequent MQTT interaction with the device's lower-level controller. For example, the cloud sends instruction 1000 ("start task A") and the controller replies with instruction 2000 ("task A received"). If we treat the controller as a device, and the instruction serial number and content as its attributes, this kind of data is naturally also a good fit for the TDengine time-series database:

*************************** 1.row ***************************
       ts: 2021-06-23 16:10:30.000
      msg: {"task_id":"7b40ed4edc1149f1837179c77d8c3c1f","action":"start"}
device_sn: deviceA
     kind: 1000
*************************** 2.row ***************************
       ts: 2021-06-23 16:10:31.000
      msg: {"task_id":"7b40ed4edc1149f1837179c77d8c3c1f","action":"received"}
device_sn: deviceA
     kind: 2000
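A super table matching the rows above might be defined like this (column names follow the output shown; the binary widths are assumptions):

```sql
create stable if not exists s_instruction (
  ts TIMESTAMP,
  msg BINARY(1024),
  kind INT
) TAGS (device_sn BINARY(1000));
```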

While integrating devices with the cloud, we often need to check whether a message was actually sent, so we had to persist the instructions: the application starts a dedicated thread that subscribes to the instruction topics and writes the messages to the TDengine database in batches.
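The batching writer can be sketched with a BlockingQueue (a hypothetical InstructionBatcher, not the production code): the MQTT callback enqueues each message, and a dedicated thread repeatedly takes a batch and issues one multi-row insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class InstructionBatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Called from the MQTT subscriber callback. */
    public void onMessage(String instruction) {
        queue.offer(instruction);
    }

    /**
     * Drain up to maxBatch messages, waiting at most maxWaitMillis for the
     * first one, so rows go to TDengine in batches instead of one at a time.
     */
    public List<String> nextBatch(int maxBatch, long maxWaitMillis) throws InterruptedException {
        List<String> batch = new ArrayList<>(maxBatch);
        String first = queue.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        if (first == null) {
            return batch; // nothing arrived within the wait window
        }
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);
        return batch;
    }
}
```

The writer thread loops on nextBatch and, for each non-empty batch, builds one multi-table INSERT of the kind shown in batchInsert above; this amortizes the per-statement overhead across many rows.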

Finally, TDengine also has a log.dn super table that keeps memory, CPU, and other usage information, so we can display this data with Grafana and get reliable operational metrics for monitoring!
