
Time:2021-12-2

Flink window background

Flink treats batch processing as a special case of stream processing, so its underlying engine is a streaming engine on which both stream processing and batch processing are implemented. The window is the bridge from streaming to batch. Generally speaking, a window is a mechanism that cuts an unbounded stream into finite sets so that operations can run on bounded data. The window delimits a set on the stream, such as "the last 10 minutes" or "the sum of the last 50 elements". A window can be driven by time (a time window, e.g. every 30 s) or by data (a count window, e.g. every 100 elements). The DataStream API provides both time windows and count windows.

The general skeleton structure of a Flink window application is as follows:

// Keyed Window
stream
    .keyBy(...)                     <- group by a key
    .window(...)                    <- assign the elements of the stream to the corresponding windows
    [.trigger(...)]                 <- specify the trigger (optional)
    [.evictor(...)]                 <- specify the evictor (optional)
    .reduce/aggregate/process()     <- window function

// Non-Keyed Window
stream
    .windowAll(...)                 <- without grouping, assign all elements of the stream to the corresponding windows
    [.trigger(...)]                 <- specify the trigger (optional)
    [.evictor(...)]                 <- specify the evictor (optional)
    .reduce/aggregate/process()     <- window function

There are two necessary operations in the skeleton structure of Flink window:

Use a window assigner to assign the elements of the data stream to the corresponding windows.
When the trigger condition of a window is met, process the data in that window with a window function. The commonly used window functions are reduce, aggregate and process.

Tumbling window

Time driven
The data is split into windows of a fixed length. Tumbling windows do not overlap, and every window has the same length. We can use TumblingEventTimeWindows and TumblingProcessingTimeWindows to create a tumbling time window based on event time or processing time. The window length can be set with the seconds, minutes, hours and days methods of org.apache.flink.streaming.api.windowing.time.Time.

// Keyed case. The element type is assumed to be Tuple2<String, Integer>;
// the generic parameters are not given in the original listing.
KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = mapStream.keyBy(0);
// Time driven: a new window every 10 s.
// Note: timeWindow(...) is deprecated in newer Flink releases in favour of
// window(TumblingProcessingTimeWindows.of(...)) or window(TumblingEventTimeWindows.of(...)).
WindowedStream<Tuple2<String, Integer>, Tuple, TimeWindow> timeWindow =
    keyedStream.timeWindow(Time.seconds(10));
// Event driven: a window is evaluated for every three events with the same key.
// WindowedStream<Tuple2<String, Integer>, Tuple, GlobalWindow> countWindow =
//     keyedStream.countWindow(3);
// apply() registers the window function; it is invoked on the data of each window.
timeWindow.apply(new MyTimeWindowFunction()).print();
// countWindow.apply(new MyCountWindowFunction()).print();
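
To make the fixed, non-overlapping boundaries of a time-driven tumbling window concrete, here is a minimal plain-Java sketch (no Flink dependency) that computes which window a timestamp falls into. It assumes windows are aligned to the epoch with no offset; the names TumblingWindowSketch and windowStart are illustrative only and are not part of the Flink API.

```java
/** Plain-Java sketch of tumbling-window assignment; not Flink code. */
final class TumblingWindowSketch {

    /** Start of the tumbling window [start, start + sizeMs) that contains the timestamp. */
    static long windowStart(long timestampMs, long sizeMs) {
        // floorMod keeps the result correct for negative timestamps as well.
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    public static void main(String[] args) {
        long size = 10_000L; // 10 s, matching Time.seconds(10) in the snippet above
        for (long ts : new long[] {0L, 9_999L, 10_000L, 25_300L}) {
            long start = windowStart(ts, size);
            System.out.println(ts + " -> [" + start + ", " + (start + size) + ")");
        }
    }
}
```

With a 10 s window, the timestamp 9999 ms lands in [0, 10000) and 10000 ms lands in [10000, 20000): every timestamp belongs to exactly one window.
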
Event driven
Suppose we want a calculation for every 100 user purchases: each time a window has collected 100 elements with the same key, that window is evaluated. The following is an example implementation.

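import java.text.SimpleDateFormat;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

// The element type is assumed to be Tuple2<String, Integer>;
// the generic parameters are not given in the original listing.
public class MyCountWindowFunction
        implements WindowFunction<Tuple2<String, Integer>, String, Tuple, GlobalWindow> {

    @Override
    public void apply(Tuple tuple, GlobalWindow window, Iterable<Tuple2<String, Integer>> input,
            Collector<String> out) throws Exception {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // Sum the second field of every element in the window.
        int sum = 0;
        for (Tuple2<String, Integer> tuple2 : input) {
            sum += tuple2.f1;
        }

        // The timestamp is meaningless here: for a GlobalWindow it is Long.MAX_VALUE,
        // because a count-based window does not depend on time.
        long maxTimestamp = window.maxTimestamp();
        out.collect("key:" + tuple.getField(0) + " value: " + sum
                + "| maxTimeStamp :" + maxTimestamp + "," + format.format(maxTimestamp));
    }
}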
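
The behaviour of an event-driven (count) tumbling window can also be sketched without Flink. The following plain-Java simulation keeps one buffer per key, and when a buffer reaches the configured size it emits the sum and clears the buffer. The class CountWindowSketch and its add method are hypothetical names used for illustration; this is a sketch of the idea, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of a per-key tumbling count window; not Flink code. */
final class CountWindowSketch {
    private final int size;
    private final Map<String, List<Integer>> buffers = new HashMap<>();

    CountWindowSketch(int size) {
        this.size = size;
    }

    /** Adds a value for a key; returns the window sum when the window is full, otherwise null. */
    Integer add(String key, int value) {
        List<Integer> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        if (buffer.size() < size) {
            return null; // window for this key is not full yet
        }
        int sum = 0;
        for (int v : buffer) {
            sum += v;
        }
        buffer.clear(); // the next element of this key starts a new window
        return sum;
    }

    public static void main(String[] args) {
        CountWindowSketch window = new CountWindowSketch(3);
        String[] keys = {"a", "b", "a", "a", "b", "b"};
        int[] values = {1, 10, 2, 3, 20, 30};
        for (int i = 0; i < keys.length; i++) {
            Integer result = window.add(keys[i], values[i]);
            if (result != null) {
                System.out.println("key:" + keys[i] + " value: " + result);
            }
        }
    }
}
```

Elements with different keys fill different windows, so key "a" fires after its third element regardless of how many "b" elements arrived in between.
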
Sliding time window

A sliding window is a more general form of the tumbling window. It is defined by a fixed window length (size) and a sliding interval (slide). Its characteristics: the window length is fixed, and windows can overlap. The window moves forward continuously by one slide at a time, and both size and slide must be set when using it. The slide determines how often Flink creates a new window: the smaller the slide, the more windows there are. When the slide is smaller than the size, adjacent windows overlap and an event is assigned to several windows; when the slide is larger than the size, there are gaps between windows and events that fall into a gap are discarded.
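
The relationship between size and slide can be checked with a short plain-Java sketch that lists every window containing a given timestamp. It assumes windows [start, start + size) aligned to the epoch with no offset; SlidingWindowSketch and windowStarts are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of sliding-window assignment; not Flink code. */
final class SlidingWindowSketch {

    /** Start times of every window [start, start + sizeMs) that contains the timestamp. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // slide < size: overlapping windows, the event belongs to two of them
        System.out.println(windowStarts(12_000L, 10_000L, 5_000L));
        // slide > size: the event falls into a gap and belongs to no window
        System.out.println(windowStarts(7_000L, 5_000L, 10_000L));
    }
}
```

With size 10 s and slide 5 s, the timestamp 12000 ms belongs to the windows starting at 10000 and 5000. With size 5 s and slide 10 s, the timestamp 7000 ms belongs to no window at all.
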

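Time based sliding window
// Time driven: every 5 s, compute over the data of the last 10 s.
// keyedStream is the stream from the tumbling example; its element type is assumed to be Tuple2<String, Integer>.
WindowedStream<Tuple2<String, Integer>, Tuple, TimeWindow> timeWindow =
    keyedStream.timeWindow(Time.seconds(10), Time.seconds(5));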
Event based sliding window
/**
 * Sliding window: windows can overlap
 * 1. Time driven
 * 2. Event driven
 */
// Size 3, slide 2: evaluated every 2 elements over at most the last 3 elements of the key.
WindowedStream<Tuple3, String, GlobalWindow> countWindow = keybyed.countWindow(3, 2);

SingleOutputStreamOperator<String> applyed = countWindow.apply(new WindowFunction<Tuple3, String, String, GlobalWindow>() {

    @Override
    public void apply(String s, GlobalWindow window, Iterable<Tuple3> input, Collector<String> out) throws Exception {
        // Concatenate the three fields of every element in the window.
        Iterator<Tuple3> iterator = input.iterator();
        StringBuilder sb = new StringBuilder();
        while (iterator.hasNext()) {
            Tuple3 next = iterator.next();
            sb.append(next.f0 + ".." + next.f1 + ".." + next.f2);
        }

        out.collect(sb.toString());
    }

});
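
What countWindow(3, 2) produces is easier to see in a simulation. The plain-Java sketch below assumes the usual reading of a sliding count window for a single key: the window fires after every slide elements and evaluates at most the last size elements. SlidingCountWindowSketch is a hypothetical class written for illustration and does not reproduce Flink's internals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Plain-Java simulation of a sliding count window for one key; not Flink code. */
final class SlidingCountWindowSketch {
    private final int size;
    private final int slide;
    private final Deque<String> buffer = new ArrayDeque<>();
    private int sinceLastFire = 0;

    SlidingCountWindowSketch(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    /** Adds an element; returns the window contents when the window fires, otherwise null. */
    List<String> add(String element) {
        buffer.addLast(element);
        sinceLastFire++;
        if (sinceLastFire < slide) {
            return null; // not enough new elements since the last evaluation
        }
        sinceLastFire = 0;
        // Keep only the most recent `size` elements before evaluating.
        while (buffer.size() > size) {
            buffer.removeFirst();
        }
        return new ArrayList<>(buffer);
    }

    public static void main(String[] args) {
        SlidingCountWindowSketch window = new SlidingCountWindowSketch(3, 2);
        for (int i = 1; i <= 6; i++) {
            List<String> fired = window.add("e" + i);
            if (fired != null) {
                System.out.println("after e" + i + ": " + fired);
            }
        }
    }
}
```

Under these assumptions the first evaluation sees only two elements (e1, e2), and each later evaluation overlaps the previous one by one element: (e2, e3, e4), then (e4, e5, e6).
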
Session time window

A session window groups a series of events that are separated from the next group by a timeout gap of a specified length, similar to the session of a web application. In other words, if no new data is received for the length of the gap, the current window is closed and the next element starts a new window. In this mode the length of a window is variable, and the start and end times of each window are not known in advance. We can set a fixed session gap, or use a SessionWindowTimeGapExtractor to determine the session gap dynamically.

val input: DataStream[T] = ...

// event-time session windows with static gap
input
    .keyBy(...)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>)

// event-time session windows with dynamic gap
input
    .keyBy(...)
    .window(EventTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor[T] {
      override def extract(element: T): Long = {
        // determine and return session gap
      }
    }))
    .<windowed transformation>(<window function>)

// processing-time session windows with static gap
input
    .keyBy(...)
    .window(ProcessingTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>)

// processing-time session windows with dynamic gap
input
    .keyBy(...)
    .window(DynamicProcessingTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor[T] {
      override def extract(element: T): Long = {
        // determine and return session gap
      }
    }))
    .<windowed transformation>(<window function>)
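
The way a static gap splits events of one key into sessions can be sketched in plain Java. The example below sorts the timestamps and starts a new session whenever an event arrives more than the gap after the previous one. The exact boundary handling (strictly greater than the gap) is an assumption of this sketch, and SessionWindowSketch is an illustrative name, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of session grouping with a static gap for one key; not Flink code. */
final class SessionWindowSketch {

    /** Groups timestamps into sessions; a gap larger than gapMs starts a new session. */
    static List<List<Long>> sessions(long[] timestampsMs, long gapMs) {
        long[] sorted = timestampsMs.clone();
        Arrays.sort(sorted);
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = null;
        for (long ts : sorted) {
            // Assumption: only a silence strictly longer than the gap closes the session.
            if (current == null || ts - current.get(current.size() - 1) > gapMs) {
                current = new ArrayList<>();
                result.add(current);
            }
            current.add(ts);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] events = {1_000L, 2_000L, 20_000L, 21_000L, 50_000L};
        System.out.println(sessions(events, 10_000L));
    }
}
```

With a 10 s gap, the five timestamps above form three sessions of different lengths, which is the variable window length described in the text.
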

Window function

After elements have been assigned to windows, the data in each window has to be processed. There are two kinds of window function: incremental calculation, which corresponds to reduce and aggregate, and full calculation, which corresponds to process. In incremental calculation the window stores a single intermediate value. Each time a new element arrives, it is combined with the intermediate value to produce a new intermediate value, which is stored back in the window. In full calculation the window first caches all of its elements and runs the calculation over the complete set once the trigger condition is met.
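
The difference between the two styles can be shown with a small plain-Java sketch. IncrementalWindow keeps only one intermediate value, in the manner of reduce, while FullWindow buffers every element and computes on fire, in the manner of process. All class and method names here are hypothetical and only illustrate the idea; they are not Flink interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Plain-Java contrast of incremental and full window calculation; not Flink code. */
final class WindowFunctionSketch {

    /** Incremental calculation: the window state is a single intermediate value. */
    static final class IncrementalWindow<T> {
        private final BinaryOperator<T> reducer;
        private T accumulator;

        IncrementalWindow(BinaryOperator<T> reducer) {
            this.reducer = reducer;
        }

        void add(T element) {
            // Combine the new element with the stored intermediate value.
            accumulator = (accumulator == null) ? element : reducer.apply(accumulator, element);
        }

        T fire() {
            return accumulator;
        }
    }

    /** Full calculation: the window caches every element and computes when it fires. */
    static final class FullWindow<T, R> {
        private final List<T> elements = new ArrayList<>();
        private final Function<List<T>, R> processor;

        FullWindow(Function<List<T>, R> processor) {
            this.processor = processor;
        }

        void add(T element) {
            elements.add(element);
        }

        R fire() {
            return processor.apply(elements);
        }
    }

    /** A calculation that needs all elements at once: the middle value after sorting. */
    static int median(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        IncrementalWindow<Integer> sum = new IncrementalWindow<>(Integer::sum);
        FullWindow<Integer, Integer> med = new FullWindow<>(WindowFunctionSketch::median);
        for (int v : new int[] {5, 1, 3}) {
            sum.add(v);
            med.add(v);
        }
        System.out.println("sum=" + sum.fire() + " median=" + med.fire());
    }
}
```

A sum fits the incremental style because it needs only the running total, whereas a median needs every element of the window and therefore fits the full style, at the cost of holding all elements in state.
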