Flink DataStream common operators

Time:2020-9-27

map

  • DataStream → DataStream: map can be understood as a mapping; after a transformation is applied to each element, it is mapped to another element.
#Implementing map in Java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class MapDemo {
     public static void main(String[] args) throws Exception {
         //Build execution environment
         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
         //Defining data sources
         DataStream<String> socketTextStream = env.socketTextStream("localhost", 9000, "\n");
         //Lambda expression
         DataStream<String> result3 = socketTextStream.map( value -> value + "love");
         //Print datastream content
         result3.print();
         //Implementation
         env.execute();
     }
}

#Scala implements map
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.scala._
object MapDemoScala {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream("127.0.0.1", 9875)
    text.map(x => x + "love").print()
    env.execute()
  }
}

flatMap

  • DataStream → DataStream: each input element produces 0, 1, or more output elements; mostly used for splitting operations.
  • flatMap is used much like map, but since a Java method can return only one value, flatMap lets us emit multiple results through a Collector (similar to returning several results at once).
#Implementation of flatmap in Java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlatMapDemo {
     public static void main(String[] args) throws Exception {
         //Build execution environment
         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
         //Defining data sources
         DataStream<String> socketTextStream = env.socketTextStream("localhost", 9000, "\n");
         //The Collector's generic type is erased from the lambda, so declare the result type explicitly
         DataStream<String> result = socketTextStream.flatMap((String s, Collector<String> collector) -> {
                    for (String str : s.split(" ")) {
                        collector.collect(str);
                    }
         }).returns(Types.STRING);
         //Print datastream content
         result.print();
         //Implementation
         env.execute();
     }
}

#Scala implements flatmap
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.scala._

object FlatMapDemoScala {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream("127.0.0.1", 9002)
    text.flatMap{str => str.split(" ")}
    //Equivalent shorthand form
    text.flatMap{_.split(" ")}.print()
    env.execute()
  }
}

filter

  • DataStream → DataStream: filter evaluates a predicate on each element of the stream; only elements for which it returns true enter the next stream.
#Java
DataStream<String> res = socketTestStream.filter(new FilterFunction<String>() {
    @Override
     public boolean filter(String s) throws Exception {
            return s.startsWith("S");
     }
});

#Scala
text.filter{_.startsWith("S")}
  .print()
  .setParallelism(1)

keyBy

  • DataStream → KeyedStream
  • The stream is divided into disjoint partitions by key: records with the same key go to the same partition. keyBy() is implemented with hash partitioning.
  • One or more attributes of a POJO class, or elements of a tuple, can be used as the key. However, two kinds of types cannot be used as keys:

    1. A POJO class that does not override hashCode() and falls back to the default Object.hashCode()
    2. Array types
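The "same key, same partition" guarantee can be illustrated without a cluster. This is a simplified sketch: Flink's actual routing murmur-hashes the key into a key group and maps key groups to subtasks, so the plain modulo below only demonstrates the principle, not Flink's real assignment.

```java
public class KeyPartitionSketch {
    //Derive a non-negative partition index from the key's hashCode();
    //any two equal keys necessarily map to the same partition
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("flink", 4);
        int p2 = partitionFor("flink", 4);
        System.out.println(p1 == p2); // prints true
    }
}
```

This is also why a POJO without a stable hashCode() cannot be a key: two equal records could hash to different partitions.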
#Java in POJO
SingleOutputStreamOperator<WordCountPOJO> streamOperator = socketTextStream
        .flatMap((String value, Collector<WordCountPOJO> out) -> {
            Arrays.stream(value.split(" "))
            .forEach(str -> out.collect(WordCountPOJO.of(str, 1)));
        }).returns(WordCountPOJO.class);
        
KeyedStream<WordCountPOJO, Tuple> keyedStream = streamOperator.keyBy("word");
SingleOutputStreamOperator<WordCountPOJO> summed = keyedStream.sum("count");

#Java in Tuple
SingleOutputStreamOperator<Tuple2<String, Integer>> singleOutputStreamOperator = dataStreamSource
            .flatMap((String value, Collector<Tuple2<String, Integer>> out)-> {
                 Arrays.stream(value.split(" "))
                .forEach(str -> out.collect(Tuple2.of(str, 1)));
            }).returns(Types.TUPLE(Types.STRING, Types.INT));
 
KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = singleOutputStreamOperator.keyBy(0);
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = keyedStream.sum(1);
     
#Scala in POJO
text.flatMap{_.split(" ")}
  .map(x => WordCountPOJO(x,1))
  .keyBy("word")
  .timeWindow(Time.seconds(5))
  .sum("count")
  .print()
  .setParallelism(1)

reduce

  • KeyedStream → DataStream: reduce rolls up the records of each partition in the stream; aggregations such as min(), max(), avg, and count can all be implemented with reduce.
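The rolling nature of reduce can be shown without a cluster: each incoming element is merged with the previous result for its key. A minimal sketch using plain Java streams on hypothetical input values for one key:

```java
import java.util.List;

public class RollingReduceSketch {
    //Merges each element with the running result, as Flink's reduce does per key;
    //here the merge function keeps the larger value, i.e. a rolling max
    static int rollingMax(List<Integer> values) {
        return values.stream().reduce((a, b) -> a > b ? a : b).orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(rollingMax(List.of(3, 7, 2, 9))); // prints 9
    }
}
```

Swapping the merge lambda for `(a, b) -> a + b` gives a rolling sum, which is exactly what the Tuple example below does with `t1.f1 + t2.f1`.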
public class StudentPOJO {
     private String name;
     private String gender;
     private String className;
     private double score;
     public StudentPOJO() {

     }
    public StudentPOJO(String name, String gender, String className, double score) {
         this.name = name;
         this.gender = gender;
         this.className = className;
         this.score = score;
    }
    public static StudentPOJO of(String name, String gender, String className, double score) {
        return new StudentPOJO(name,gender, className,score);
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getGender() {
        return gender;
    }
    public void setGender(String gender) {
        this.gender = gender;
    }
    public String getClassName() {
        return className;
    }
    public void setClassName(String className) {
        this.className = className;
    }
    public double getScore() {
        return score;
    }
    public void setScore(double score) {
        this.score = score;
    }
    @Override
    public String toString() {
        return "StudentPOJO{" +
                "name='" + name + ''' +
                ", gender='" + gender + ''' +
                ", className='" + className + ''' +
                ", score=" + score +
                '}';
    }
}

#Java in POJO
SingleOutputStreamOperator<StudentPOJO> flatMapSocketTextStream = socketTextStream
        .flatMap((String value, Collector<StudentPOJO> out) -> {
             String[] values = value.split(" ");
             out.collect(new StudentPOJO(values[0], values[1], values[2], Double.valueOf(values[3])));
        }).returns(StudentPOJO.class);
        
DataStream<StudentPOJO> res = flatMapSocketTextStream
        .keyBy("className")
        .reduce((s1, s2) ->
            s1.getScore() > s2.getScore() ? s1 : s2
        );
        
#Java in Tuple
DataStream<Tuple2<String, Integer>> res1 = socketTextStream
        .map(value -> Tuple2.of(value.trim(), 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(0)
        .timeWindow(Time.seconds(10))
        .reduce((Tuple2<String, Integer> t1, Tuple2<String, Integer> t2) ->
                new Tuple2<>(t1.f0, t1.f1 + t2.f1));
                
DataStream<Tuple2<String, Integer>> res2 = socketTextStream
        .map(value -> Tuple2.of(value.trim(), 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(0)
        .timeWindow(Time.seconds(10))
        .reduce((old, news) -> {
            old.f1 += news.f1;
         return old;
         }).returns(Types.TUPLE(Types.STRING, Types.INT));

fold

  • KeyedStream → DataStream
  • A rolling fold operation on a keyed data stream with an initial value: combines the current element with the result of the previous fold and emits the new value.
  • A fold function that, when applied on the sequence (1,2,3,4,5), emits the sequence “start-1”, “start-1-2”, “start-1-2-3”, …
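The emitted sequence described above can be reproduced with an ordinary fold loop; this is a sketch of the semantics only, not the Flink API:

```java
import java.util.ArrayList;
import java.util.List;

public class FoldSketch {
    //Folds 1..n into the initial value and records every intermediate
    //result, mirroring what a keyed fold emits downstream
    static List<String> foldEmit(String initial, int n) {
        List<String> emitted = new ArrayList<>();
        String acc = initial;
        for (int i = 1; i <= n; i++) {
            acc = acc + "-" + i;
            emitted.add(acc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(foldEmit("start", 3)); // prints [start-1, start-1-2, start-1-2-3]
    }
}
```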
#Note: fold is deprecated and this example is for demonstration only
DataStream<String> res = socketTextStream.map(value -> Tuple2.of(value.trim(), 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(0)
        .fold("result: ", (String current, Tuple2<String, Integer> t2) -> current + t2.f0 + ", ");

union

  • Applying the union operator to a DataStream merges multiple data streams of the same type into a new stream of that same type; that is, several DataStreams can be combined into one new DataStream.
DataStream<String> streamSource01 = env.socketTextStream("localhost", 8888);
DataStream<String> streamSource02 = env.socketTextStream("localhost", 9922);

DataStream<String> mapStreamSource01 = streamSource01.map(value -> "data from port 8888: " + value);
DataStream<String> mapStreamSource02 = streamSource02.map(value -> "data from port 9922: " + value);

DataStream<String> res = mapStreamSource01.union(mapStreamSource02);

join

  • Joins two streams on the specified key within a window; only keys present in both streams produce output.
DataStream<Tuple2<String, String>> mapStreamSource01 = streamSource01
        .map(value -> Tuple2.of(value, "data from port 8888: " + value))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));
        
DataStream<Tuple2<String, String>> mapStreamSource02 = streamSource02
        .map(value -> Tuple2.of(value, "data from port 9922: " + value))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));
        
DataStream<String> res = mapStreamSource01.join(mapStreamSource02)
        .where(t1->t1.getField(0))
        .equalTo(t2->t2.getField(0))
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply((t1,t2) -> t1.getField(1) + "|" + t2.getField(1));

coGroup

  • coGroup associates two streams like join, but elements that find no match on the other side are also retained.
DataStream<Tuple2<String, String>> mapStreamSource01 = streamSource01
        .map(value -> Tuple2.of(value, "8888 port data: " + value))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));
        
DataStream<Tuple2<String, String>> mapStreamSource02 = streamSource02
        .map(value -> Tuple2.of(value, "9922 port data: " + value))
        .returns(Types.TUPLE(Types.STRING, Types.STRING));
        
DataStream<String> res = mapStreamSource01.coGroup(mapStreamSource02)
        .where(t1 -> t1.getField(0))
        .equalTo(t2 -> t2.getField(0))
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply(new CoGroupFunction<Tuple2<String, String>, Tuple2<String, String>, String>() {
            @Override
            public void coGroup(Iterable<Tuple2<String, String>> iterable1, Iterable<Tuple2<String, String>> iterable2, Collector<String> collector) throws Exception {
                 StringBuffer stringBuffer = new StringBuffer();
                 stringBuffer.append("stream from 8888 -- ");
                 for (Tuple2<String, String> item : iterable1) {
                                    stringBuffer.append(item.f1 + " | ");
                 }
                 stringBuffer.append("stream from 9922 -- ");
                 for (Tuple2<String, String> item : iterable2) {
                                    stringBuffer.append(item.f1);
                 }
                collector.collect(stringBuffer.toString());
            }      
        });
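The difference from join can be sketched without Flink: per window, coGroup groups both sides by key and fires once per key, even when one side is empty. A hypothetical in-memory stand-in for a single window's contents:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoGroupSketch {
    //Groups both sides by key; unlike join, keys present on only one
    //side still produce output, with the missing side left empty
    static Map<String, String> coGroup(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new TreeMap<>();
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        for (String k : keys) {
            out.put(k, left.getOrDefault(k, "") + "|" + right.getOrDefault(k, ""));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> l = Map.of("a", "L-a", "b", "L-b");
        Map<String, String> r = Map.of("b", "R-b", "c", "R-c");
        //Key "a" appears with an empty right side and "c" with an empty
        //left side; a join would have emitted only "b"
        System.out.println(coGroup(l, r));
    }
}
```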

split

  • DataStream → SplitStream: splits the given DataStream into multiple streams, represented by a SplitStream.

select

  • SplitStream → DataStream: obtains the desired stream from a SplitStream through the .select() method.
SplitStream<Tuple2<String, Integer>> splitStream = streamSource
        .map(values -> Tuple2.of(values.trim(), 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .split( t -> {
            List<String> list = new ArrayList<>();
             if (isNumeric(t.f0)) {
                list.add("num");
             } else {
                list.add("str");
             }
             return list;
        });
 
DataStream<Tuple2<String, Integer>>  strDataStream1 = splitStream.select("str")
        .map(t -> Tuple2.of("string: " + t.f0, t.f1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(0)
        .sum(1);
        
DataStream<Tuple2<String, Integer>>  strDataStream2 = splitStream.select("num")
        .map(t -> Tuple2.of("number: " + t.f0, t.f1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(0)
        .sum(1);
