Time Attributes in the Flink Table API & SQL: Programming Guide (3)

Time: 2021-01-12

Flink has three time semantics: processing time, event time, and ingestion time. For a detailed explanation of these semantics, see the earlier article "Flink's Time and Watermarks". This article explains how time-based operators in the Flink Table API & SQL define time semantics. From this article you will learn:

  • A brief introduction to time attributes
  • Processing time
  • Event time

Introduction to time attributes

Time-based operations (such as windows) in the Flink Table API & SQL need to specify which time semantics to use. A table can expose a logical time attribute based on the specified timestamp.

The time attribute is part of the table schema. It is defined when a table is created with DDL, when a DataStream is converted to a Table, or when a TableSource is used. Once the time attribute is defined, it can be referenced like a field and used in time-based operations.

A time attribute behaves like a timestamp: it can be accessed and used in calculations. However, once a time attribute takes part in a calculation, it is materialized into a regular timestamp. A regular timestamp no longer cooperates with Flink's time and watermarks and therefore cannot be used in time-based operations.
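For example, the following minimal sketch (assuming a StreamTableEnvironment named tEnv and the user_actions table defined later in this article) shows how an expression over the attribute yields a plain timestamp column that can no longer be used to define a window:

// A minimal sketch: tEnv and the user_actions table are assumed from the examples below.
Table projected = tEnv.sqlQuery(
    "SELECT user_action_time, " +                                  // still a time attribute
    "       CAST(user_action_time AS TIMESTAMP(3)) AS plain_ts " + // materialized into a regular timestamp
    "FROM user_actions");
// Windows can still be defined on user_action_time, but not on plain_ts.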

The time characteristic required by the Flink Table API & SQL can be specified in the DataStream program, as follows:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime); // default

// Alternatively, choose one of:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
// env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

Processing time

Processing time uses the local machine time. It is the simplest time semantics, but it cannot guarantee consistent results. It requires neither timestamp extraction nor watermark generation. There are three ways to define a processing time attribute, as follows.

Defining the processing time in a CREATE TABLE DDL statement

The processing time attribute can be defined as a computed column in a DDL statement using the PROCTIME() function, as follows:

CREATE TABLE user_actions (
  user_name STRING,
  data STRING,
  user_action_time AS PROCTIME() -- declare an additional field as the processing time attribute
) WITH (
  ...
);

SELECT TUMBLE_START(user_action_time, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name)
FROM user_actions
GROUP BY TUMBLE(user_action_time, INTERVAL '10' MINUTE); -- 10-minute tumbling window

Defining the processing time when converting a DataStream to a Table

When converting a DataStream to a Table, the processing time attribute can be specified in the schema definition with the .proctime suffix, and it must be placed after all other schema fields. The details are as follows:

DataStream<Tuple2<String, String>> stream = ...;
//Declare an additional logical field as the processing time attribute
Table table = tEnv.fromDataStream(stream, "user_name, data, user_action_time.proctime");

WindowedTable windowedTable = table.window(Tumble.over("10.minutes").on("user_action_time").as("userActionWindow"));
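As a minimal sketch of how such a WindowedTable is typically consumed (the aggregation below is illustrative and not part of the original example), the window alias is referenced in groupBy and select:

// Illustrative continuation: group by the window alias and a key, then aggregate
// the records that fall into each 10-minute window.
Table result = windowedTable
    .groupBy("userActionWindow, user_name")
    .select("user_name, userActionWindow.start, data.count");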

Using a TableSource

A custom TableSource can implement the DefinedProctimeAttribute interface, as follows:

//Define a table source with processing time attribute
public class UserActionSource implements StreamTableSource<Row>, DefinedProctimeAttribute {

    @Override
    public TypeInformation<Row> getReturnType() {
        String[] names = new String[] {"user_name" , "data"};
        TypeInformation[] types = new TypeInformation[] {Types.STRING(), Types.STRING()};
        return Types.ROW(names, types);
    }

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {
        //Create stream
        DataStream<Row> stream = ...;
        return stream;
    }

    @Override
    public String getProctimeAttribute() {
        //This field is appended to the schema as the third field
        return "user_action_time";
    }
}

//Register table source
tEnv.registerTableSource("user_actions", new UserActionSource());

WindowedTable windowedTable = tEnv
    .from("user_actions")
    .window(Tumble.over("10.minutes").on("user_action_time").as("userActionWindow"));

Event time

Event time is based on the timestamp carried in each record, so results stay consistent even with out-of-order or late data. There are three ways to define an event time attribute, as follows.

Defining the event time in a CREATE TABLE DDL statement

The event time attribute can be defined with a WATERMARK statement, as follows:

CREATE TABLE user_actions (
  user_name STRING,
  data STRING,
  user_action_time TIMESTAMP(3),
  -- Declare user_action_time as the event time attribute and allow 5 seconds of delay
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  ...
);

SELECT TUMBLE_START(user_action_time, INTERVAL '10' MINUTE), COUNT(DISTINCT user_name)
FROM user_actions
GROUP BY TUMBLE(user_action_time, INTERVAL '10' MINUTE);

Defining the event time when converting a DataStream to a Table

When defining the schema, the event time attribute is specified with the .rowtime suffix, and timestamps and watermarks must already have been assigned in the DataStream being converted. For example, if the event time field is event_time, it can be accessed in the table as event_time.rowtime.

Flink currently supports two ways of defining the event time field, as follows:

// Option 1:
// Extract timestamps and assign watermarks
DataStream<Tuple2<String, String>> stream = inputStream.assignTimestampsAndWatermarks(...);

// Declare an additional logical field as the event time attribute.
// Appending user_action_time.rowtime at the end of the table schema defines the event time attribute;
// its value is derived from the record timestamps assigned on the DataStream.
Table table = tEnv.fromDataStream(stream, "user_name, data, user_action_time.rowtime");

// Option 2:

//Extract the timestamp from the first field and assign watermarks
DataStream<Tuple3<Long, String, String>> stream = inputStream.assignTimestampsAndWatermarks(...);

// The first field has already been used to extract the timestamp, so it can be used directly as the event time attribute
Table table = tEnv.fromDataStream(stream, "user_action_time.rowtime, user_name, data");

//Use:

WindowedTable windowedTable = table.window(Tumble.over("10.minutes").on("user_action_time").as("userActionWindow"));
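The assignTimestampsAndWatermarks(...) calls above are elided. As a minimal sketch only, assuming the legacy DataStream API used throughout this article, Option 2's Tuple3 layout with the event timestamp in epoch milliseconds as the first field, and a tolerance of 5 seconds of out-of-orderness, the call could look like this:

// Minimal sketch of the elided watermark assignment for Option 2.
// Imports assumed: org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor,
// org.apache.flink.streaming.api.windowing.time.Time.
DataStream<Tuple3<Long, String, String>> timedStream = inputStream.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<Tuple3<Long, String, String>>(Time.seconds(5)) {
        @Override
        public long extractTimestamp(Tuple3<Long, String, String> element) {
            return element.f0; // event timestamp in milliseconds (assumed layout)
        }
    });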

Using a TableSource

Alternatively, when creating a TableSource, you can implement the DefinedRowtimeAttributes interface to define the event time field. The interface requires the getRowtimeAttributeDescriptors method, which returns descriptors describing the event time attributes.

//Define a table source with the rowTime attribute
public class UserActionSource implements StreamTableSource<Row>, DefinedRowtimeAttributes {

    @Override
    public TypeInformation<Row> getReturnType() {
        String[] names = new String[] {"user_name", "data", "user_action_time"};
        TypeInformation[] types =
            new TypeInformation[] {Types.STRING(), Types.STRING(), Types.LONG()};
        return Types.ROW(names, types);
    }

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {

        // Create the stream and assign watermarks based on the user_action_time field
        DataStream<Row> stream = inputStream.assignTimestampsAndWatermarks(...);
        return stream;
    }

    @Override
    public List<RowtimeAttributeDescriptor> getRowtimeAttributeDescriptors() {
        // Mark the user_action_time field as the event time attribute.
        // Create a descriptor for user_action_time that identifies the time attribute field.
        RowtimeAttributeDescriptor rowtimeAttrDescr = new RowtimeAttributeDescriptor(
            "user_action_time",
            new ExistingField("user_action_time"),
            new AscendingTimestamps());
        List<RowtimeAttributeDescriptor> listRowtimeAttrDescr = Collections.singletonList(rowtimeAttrDescr);
        return listRowtimeAttrDescr;
    }
}

//Register table
tEnv.registerTableSource("user_actions", new UserActionSource());

WindowedTable windowedTable = tEnv
    .from("user_actions")
    .window(Tumble.over("10.minutes").on("user_action_time").as("userActionWindow"));

Summary

This article introduced how to use time semantics in the Flink Table API & SQL. Two kinds of time attributes, processing time and event time, were covered, and the ways to define each of them were explained in detail.
