A look at the iterate operation of Flink DataStream

Time:2021-9-22

Preface

This article looks at the iterate operation of the Flink DataStream API.

Example

IterativeStream<Long> iteration = initialStream.iterate();
DataStream<Long> iterationBody = iteration.map(/* do something */);
DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value > 0;
    }
});
iteration.closeWith(feedback);
DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value <= 0;
    }
});
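  • This example shows the basic use of IterativeStream. Calling iterate() creates the IterativeStream, and its closeWith method closes the loop by registering the feedback stream. Here, elements greater than 0 are fed back to the iteration head, while elements less than or equal to 0 leave the loop as output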
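
To make the feedback loop in this example concrete, here is a minimal plain-Java sketch that models it without Flink. The class name FeedbackLoopSketch, the run helper, and the "subtract 2" body are assumptions made for illustration only. A real Flink job runs the loop concurrently and does not guarantee the ordering this single-threaded model produces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongUnaryOperator;

// Plain-Java model of the iteration in the example; it does not use Flink.
final class FeedbackLoopSketch {

    // Runs every element through `body`. Results > 0 are fed back to the head,
    // results <= 0 are emitted as output (the same two filters as the example).
    static List<Long> run(List<Long> input, LongUnaryOperator body) {
        Deque<Long> head = new ArrayDeque<>(input);   // iteration head: input plus feedback
        List<Long> output = new ArrayList<>();
        while (!head.isEmpty()) {
            long result = body.applyAsLong(head.poll());
            if (result > 0) {
                head.add(result);      // feedback edge, as registered by closeWith(feedback)
            } else {
                output.add(result);    // leaves the loop
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical iteration body: subtract 2 on every pass.
        System.out.println(run(List.of(3L, 4L, 1L), v -> v - 2));   // prints [-1, -1, 0]
    }
}
```

If the body never produces a value less than or equal to 0, this model loops forever, which is why the feedback filter needs a condition that eventually fails.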

DataStream.iterate

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/DataStream.java

@Public
public class DataStream<T> {
    //......

    @PublicEvolving
    public IterativeStream<T> iterate() {
        return new IterativeStream<>(this, 0);
    }

    @PublicEvolving
    public IterativeStream<T> iterate(long maxWaitTimeMillis) {
        return new IterativeStream<>(this, maxWaitTimeMillis);
    }

    //......
}
  • DataStream provides two iterate methods; both create and return an IterativeStream. The no-argument iterate() passes a maxWaitTimeMillis of 0, while the overload takes maxWaitTimeMillis from the caller

IterativeStream

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/IterativeStream.java

@PublicEvolving
public class IterativeStream<T> extends SingleOutputStreamOperator<T> {

    // We store these so that we can create a co-iteration if we need to
    private DataStream<T> originalInput;
    private long maxWaitTime;

    protected IterativeStream(DataStream<T> dataStream, long maxWaitTime) {
        super(dataStream.getExecutionEnvironment(),
                new FeedbackTransformation<>(dataStream.getTransformation(), maxWaitTime));
        this.originalInput = dataStream;
        this.maxWaitTime = maxWaitTime;
        setBufferTimeout(dataStream.environment.getBufferTimeout());
    }

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public DataStream<T> closeWith(DataStream<T> feedbackStream) {

        Collection<StreamTransformation<?>> predecessors = feedbackStream.getTransformation().getTransitivePredecessors();

        if (!predecessors.contains(this.transformation)) {
            throw new UnsupportedOperationException(
                    "Cannot close an iteration with a feedback DataStream that does not originate from said iteration.");
        }

        ((FeedbackTransformation) getTransformation()).addFeedbackEdge(feedbackStream.getTransformation());

        return feedbackStream;
    }

    public <F> ConnectedIterativeStreams<T, F> withFeedbackType(Class<F> feedbackTypeClass) {
        return withFeedbackType(TypeInformation.of(feedbackTypeClass));
    }

    public <F> ConnectedIterativeStreams<T, F> withFeedbackType(TypeHint<F> feedbackTypeHint) {
        return withFeedbackType(TypeInformation.of(feedbackTypeHint));
    }

    public <F> ConnectedIterativeStreams<T, F> withFeedbackType(TypeInformation<F> feedbackType) {
        return new ConnectedIterativeStreams<>(originalInput, feedbackType, maxWaitTime);
    }

    @Public
    public static class ConnectedIterativeStreams<I, F> extends ConnectedStreams<I, F> {

        private CoFeedbackTransformation<F> coFeedbackTransformation;

        public ConnectedIterativeStreams(DataStream<I> input,
                TypeInformation<F> feedbackType,
                long waitTime) {
            super(input.getExecutionEnvironment(),
                    input,
                    new DataStream<>(input.getExecutionEnvironment(),
                            new CoFeedbackTransformation<>(input.getParallelism(),
                                    feedbackType,
                                    waitTime)));
            this.coFeedbackTransformation = (CoFeedbackTransformation<F>) getSecondInput().getTransformation();
        }

        public DataStream<F> closeWith(DataStream<F> feedbackStream) {

            Collection<StreamTransformation<?>> predecessors = feedbackStream.getTransformation().getTransitivePredecessors();

            if (!predecessors.contains(this.coFeedbackTransformation)) {
                throw new UnsupportedOperationException(
                        "Cannot close an iteration with a feedback DataStream that does not originate from said iteration.");
            }

            coFeedbackTransformation.addFeedbackEdge(feedbackStream.getTransformation());

            return feedbackStream;
        }

        private UnsupportedOperationException groupingException =
                new UnsupportedOperationException("Cannot change the input partitioning of an" +
                        "iteration head directly. Apply the partitioning on the input and" +
                        "feedback streams instead.");

        @Override
        public ConnectedStreams<I, F> keyBy(int[] keyPositions1, int[] keyPositions2) {
            throw groupingException;
        }

        @Override
        public ConnectedStreams<I, F> keyBy(String field1, String field2) {
            throw groupingException;
        }

        @Override
        public ConnectedStreams<I, F> keyBy(String[] fields1, String[] fields2) {
            throw groupingException;
        }

        @Override
        public ConnectedStreams<I, F> keyBy(KeySelector<I, ?> keySelector1, KeySelector<F, ?> keySelector2) {
            throw groupingException;
        }

        @Override
        public <KEY> ConnectedStreams<I, F> keyBy(KeySelector<I, KEY> keySelector1, KeySelector<F, KEY> keySelector2, TypeInformation<KEY> keyType) {
            throw groupingException;
        }
    }
}
  • IterativeStream extends SingleOutputStreamOperator. Its constructor takes two parameters: the input dataStream (stored as originalInput) and maxWaitTime. It builds a FeedbackTransformation from dataStream.getTransformation() and maxWaitTime, and it sets the buffer timeout from dataStream.environment.getBufferTimeout()
  • IterativeStream mainly provides two methods. The closeWith method closes the iteration: it defines the stream that is fed back to the iteration head, and it throws UnsupportedOperationException if the feedback stream does not originate from this iteration. The feedback can be thought of as a backflow, similar to recursion: a filter controls the recursion condition, and the elements that pass the filter re-enter the head of the IterativeStream and go through the subsequent operations again. The withFeedbackType method creates a ConnectedIterativeStreams
  • ConnectedIterativeStreams extends ConnectedStreams. It allows the type of the feedback stream to differ from the type of originalInput. It defines its own closeWith method, and it overrides the keyBy methods of ConnectedStreams so that they throw UnsupportedOperationException
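
The predecessor check that closeWith performs can be modeled in a few lines of plain Java. This is a simplified sketch, not Flink code: the Node class, its single-input chain, and the names used in main are assumptions for illustration, standing in for StreamTransformation and getTransitivePredecessors.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the closeWith predecessor check; it does not use Flink.
final class CloseWithSketch {

    // Simplified stand-in for a stream transformation with at most one input.
    static final class Node {
        final String name;
        final Node input;                               // null for a source
        final List<Node> feedbackEdges = new ArrayList<>();

        Node(String name, Node input) {
            this.name = name;
            this.input = input;
        }

        // This node plus everything upstream of it.
        List<Node> transitivePredecessors() {
            List<Node> result = new ArrayList<>();
            for (Node n = this; n != null; n = n.input) {
                result.add(n);
            }
            return result;
        }

        // Same idea as IterativeStream.closeWith: the feedback node must be
        // downstream of this iteration head, otherwise the call is rejected.
        void closeWith(Node feedback) {
            if (!feedback.transitivePredecessors().contains(this)) {
                throw new UnsupportedOperationException(
                        "feedback '" + feedback.name + "' does not originate from '" + name + "'");
            }
            feedbackEdges.add(feedback);
        }
    }

    public static void main(String[] args) {
        Node source = new Node("source", null);
        Node head = new Node("iterationHead", source);
        Node feedback = new Node("filter", new Node("map", head));
        head.closeWith(feedback);                        // accepted: filter -> map -> head
        System.out.println(head.feedbackEdges.size());   // prints 1

        Node unrelated = new Node("otherMap", source);   // bypasses the iteration head
        try {
            head.closeWith(unrelated);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```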

FeedbackTransformation

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/transformations/FeedbackTransformation.java

@Internal
public class FeedbackTransformation<T> extends StreamTransformation<T> {

    private final StreamTransformation<T> input;

    private final List<StreamTransformation<T>> feedbackEdges;

    private final Long waitTime;

    public FeedbackTransformation(StreamTransformation<T> input, Long waitTime) {
        super("Feedback", input.getOutputType(), input.getParallelism());
        this.input = input;
        this.waitTime = waitTime;
        this.feedbackEdges = Lists.newArrayList();
    }

    public StreamTransformation<T> getInput() {
        return input;
    }

    public void addFeedbackEdge(StreamTransformation<T> transform) {

        if (transform.getParallelism() != this.getParallelism()) {
            throw new UnsupportedOperationException(
                    "Parallelism of the feedback stream must match the parallelism of the original" +
                            " stream. Parallelism of original stream: " + this.getParallelism() +
                            "; parallelism of feedback stream: " + transform.getParallelism() +
                            ". Parallelism can be modified using DataStream#setParallelism() method");
        }

        feedbackEdges.add(transform);
    }

    public List<StreamTransformation<T>> getFeedbackEdges() {
        return feedbackEdges;
    }

    public Long getWaitTime() {
        return waitTime;
    }

    @Override
    public final void setChainingStrategy(ChainingStrategy strategy) {
        throw new UnsupportedOperationException("Cannot set chaining strategy on Split Transformation.");
    }

    @Override
    public Collection<StreamTransformation<?>> getTransitivePredecessors() {
        List<StreamTransformation<?>> result = Lists.newArrayList();
        result.add(this);
        result.addAll(input.getTransitivePredecessors());
        return result;
    }
}
  • FeedbackTransformation extends StreamTransformation and holds properties such as input, feedbackEdges, and waitTime
  • The addFeedbackEdge method adds a feedback edge; it throws UnsupportedOperationException if the parallelism of the feedback transformation differs from that of the FeedbackTransformation. The closeWith method of IterativeStream calls addFeedbackEdge with the feedback stream's StreamTransformation
  • waitTime specifies how long the feedback operator waits for feedback elements. Once the wait time has elapsed, the operation closes and no further feedback elements are accepted
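
The effect of a positive wait time can be sketched with a blocking queue in plain Java. This is only a model under stated assumptions: the class WaitTimeSketch and the drain helper are hypothetical names, and the queue stands in for the feedback channel. According to the Javadoc of DataStream.iterate, an iteration never terminates by default, so the sketch covers only the case where a positive maximum wait time is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-Java model of an iteration head waiting for feedback; it does not use Flink.
final class WaitTimeSketch {

    // Collects feedback elements until none arrives within waitTimeMillis.
    static List<Long> drain(BlockingQueue<Long> feedback, long waitTimeMillis) {
        List<Long> received = new ArrayList<>();
        while (true) {
            Long next;
            try {
                next = feedback.poll(waitTimeMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop
                break;
            }
            if (next == null) {
                break;                                // wait time elapsed: stop accepting feedback
            }
            received.add(next);
        }
        return received;
    }

    public static void main(String[] args) {
        BlockingQueue<Long> feedback = new LinkedBlockingQueue<>(List.of(5L, 3L, 1L));
        // Returns once the queue has stayed empty for 100 ms.
        System.out.println(drain(feedback, 100));     // prints [5, 3, 1]
    }
}
```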

Summary

  • DataStream provides two iterate methods; both create and return an IterativeStream. The no-argument iterate() passes a maxWaitTimeMillis of 0
  • The constructor of IterativeStream takes two parameters: the input dataStream (stored as originalInput) and maxWaitTime. It builds a FeedbackTransformation from dataStream.getTransformation() and maxWaitTime, and it sets the buffer timeout from dataStream.environment.getBufferTimeout(). FeedbackTransformation extends StreamTransformation and holds properties such as feedbackEdges and waitTime. waitTime specifies how long the feedback operator waits for feedback elements; once the wait time has elapsed, the operation closes and no further feedback elements are accepted
  • IterativeStream extends SingleOutputStreamOperator and mainly provides two methods. The closeWith method closes the iteration by defining the stream that is fed back to the iteration head. The withFeedbackType method creates a ConnectedIterativeStreams, which extends ConnectedStreams and allows the type of the feedback stream to differ from the type of originalInput. ConnectedIterativeStreams defines its own closeWith method, and it overrides the keyBy methods of ConnectedStreams so that they throw UnsupportedOperationException

doc