In-depth analysis of Flink's operator chain mechanism

Time: 2021-06-11

By littlemagic

"Why is there only one box in my Flink job's Web UI, and why are the 'records sent' and 'records received' metrics both 0? Is there something wrong with my program?"

A brief introduction to Flink's operator chain

I often see questions like this in the Flink community. The situation is almost never caused by a bug in the program; it is caused by Flink's operator chain mechanism. In the execution plan of the submitted job, the parallel instances of the operators (sub-tasks) satisfy certain conditions and are therefore chained together and executed as a whole, so no data flow between operators is observed.

Of course, the above is a special case. More often, only some of the operators are chained together, as in the figure below, which appears many times in the official documentation. Note how the source and the map() operator are chained.

[Figure: operator chaining example (source chained with map()) from the official Flink documentation]

The advantage of the operator chain mechanism is obvious: all chained sub-tasks are executed in the same thread (i.e., within the same TaskManager slot), which avoids unnecessary data exchange, serialization and context switching, and therefore improves job execution efficiency.
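As a quick illustration (my addition, not part of the original article), the following minimal job chains its source, map() and filter() into a single JobGraph vertex under default settings, which is exactly the "single box in the Web UI" situation described above. The class name and job name are made up for the example.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OperatorChainDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Parallelism 1 everywhere, so the non-parallel fromElements() source has the
        // same parallelism as its downstream operators and can be chained with them.
        env.setParallelism(1);

        // source -> map -> filter -> sink: forward partitioning, equal parallelism,
        // chaining enabled, so everything ends up in a single JobGraph vertex.
        env.fromElements(1, 2, 3, 4, 5)
            .map(i -> i * 2).returns(Types.INT)   // returns() works around lambda type erasure
            .filter(i -> i > 4)
            .print();

        env.execute("operator-chain-demo");
    }
}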


With that preamble out of the way, let's look at the source code to see under which conditions an operator chain is created and how it is implemented in the Flink runtime.

Operator chain in logical plan

Anyone who knows a little about the Flink runtime knows that the execution plan of a Flink job is represented by a three-layer graph structure, namely:

  • StreamGraph — the original logical execution plan
  • JobGraph — the optimized logical execution plan (this is what you see in the Web UI)
  • ExecutionGraph — the physical execution plan

The operator chain is added while the logical plan is optimized, i.e., while the JobGraph is generated from the StreamGraph. So let's open the o.a.f.streaming.api.graph.StreamingJobGraphGenerator class, which is responsible for generating the JobGraph, and look at the source code of its core method, createJobGraph().

private JobGraph createJobGraph() {
    // make sure that all vertices start immediately
    jobGraph.setScheduleMode(streamGraph.getScheduleMode());
    // Generate deterministic hashes for the nodes in order to identify them across
    // submission iff they didn't change.
    Map<Integer, byte[]> hashes = defaultStreamGraphHasher.traverseStreamGraphAndGenerateHashes(streamGraph);
    // Generate legacy version hashes for backwards compatibility
    List<Map<Integer, byte[]>> legacyHashes = new ArrayList<>(legacyStreamGraphHashers.size());
    for (StreamGraphHasher hasher : legacyStreamGraphHashers) {
        legacyHashes.add(hasher.traverseStreamGraphAndGenerateHashes(streamGraph));
    }
    Map<Integer, List<Tuple2<byte[], byte[]>>> chainedOperatorHashes = new HashMap<>();
    setChaining(hashes, legacyHashes, chainedOperatorHashes);

    setPhysicalEdges();
    // the rest of the method is omitted

    return jobGraph;
}

As you can see, the method first computes a hash for every node in the StreamGraph as its unique identifier, creates an empty Map to hold the hashes of the operators that will be chained together, and then calls the setChaining() method shown below.
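As a side note (my addition, not from the original article): these per-node hashes are what operator UIDs feed into, so if you want operator identities to stay stable across job versions — for example when restoring from a savepoint — you can pin them explicitly with uid(). A small sketch, with made-up UID strings and class name:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StableUidDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // uid() pins the hash used to identify the node across submissions,
        // instead of relying on the auto-generated traversal hash.
        env.fromElements("a", "b", "c")
            .map(String::toUpperCase).returns(Types.STRING).uid("to-upper-map")
            .print().uid("stdout-sink");

        env.execute("stable-uid-demo");
    }
}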

private void setChaining(Map<Integer, byte[]> hashes, List<Map<Integer, byte[]>> legacyHashes, Map<Integer, List<Tuple2<byte[], byte[]>>> chainedOperatorHashes) {
    for (Integer sourceNodeId : streamGraph.getSourceIDs()) {
        createChain(sourceNodeId, sourceNodeId, hashes, legacyHashes, 0, chainedOperatorHashes);
    }
}

As you can see, it iterates over the source nodes of the StreamGraph one by one and calls the createChain() method on each of them. createChain() is the core method for building operator chains at the logical-plan level. Its complete source code is shown below; it is a bit long.

private List<StreamEdge> createChain(
        Integer startNodeId,
        Integer currentNodeId,
        Map<Integer, byte[]> hashes,
        List<Map<Integer, byte[]>> legacyHashes,
        int chainIndex,
        Map<Integer, List<Tuple2<byte[], byte[]>>> chainedOperatorHashes) {
    if (!builtVertices.contains(startNodeId)) {
        List<StreamEdge> transitiveOutEdges = new ArrayList<StreamEdge>();
        List<StreamEdge> chainableOutputs = new ArrayList<StreamEdge>();
        List<StreamEdge> nonChainableOutputs = new ArrayList<StreamEdge>();

        StreamNode currentNode = streamGraph.getStreamNode(currentNodeId);
        for (StreamEdge outEdge : currentNode.getOutEdges()) {
            if (isChainable(outEdge, streamGraph)) {
                chainableOutputs.add(outEdge);
            } else {
                nonChainableOutputs.add(outEdge);
            }
        }

        for (StreamEdge chainable : chainableOutputs) {
            transitiveOutEdges.addAll(
                    createChain(startNodeId, chainable.getTargetId(), hashes, legacyHashes, chainIndex + 1, chainedOperatorHashes));
        }

        for (StreamEdge nonChainable : nonChainableOutputs) {
            transitiveOutEdges.add(nonChainable);
            createChain(nonChainable.getTargetId(), nonChainable.getTargetId(), hashes, legacyHashes, 0, chainedOperatorHashes);
        }

        List<Tuple2<byte[], byte[]>> operatorHashes =
            chainedOperatorHashes.computeIfAbsent(startNodeId, k -> new ArrayList<>());

        byte[] primaryHashBytes = hashes.get(currentNodeId);
        OperatorID currentOperatorId = new OperatorID(primaryHashBytes);

        for (Map<Integer, byte[]> legacyHash : legacyHashes) {
            operatorHashes.add(new Tuple2<>(primaryHashBytes, legacyHash.get(currentNodeId)));
        }

        chainedNames.put(currentNodeId, createChainedName(currentNodeId, chainableOutputs));
        chainedMinResources.put(currentNodeId, createChainedMinResources(currentNodeId, chainableOutputs));
        chainedPreferredResources.put(currentNodeId, createChainedPreferredResources(currentNodeId, chainableOutputs));

        if (currentNode.getInputFormat() != null) {
            getOrCreateFormatContainer(startNodeId).addInputFormat(currentOperatorId, currentNode.getInputFormat());
        }
        if (currentNode.getOutputFormat() != null) {
            getOrCreateFormatContainer(startNodeId).addOutputFormat(currentOperatorId, currentNode.getOutputFormat());
        }

        StreamConfig config = currentNodeId.equals(startNodeId)
                ? createJobVertex(startNodeId, hashes, legacyHashes, chainedOperatorHashes)
                : new StreamConfig(new Configuration());

        setVertexConfig(currentNodeId, config, chainableOutputs, nonChainableOutputs);

        if (currentNodeId.equals(startNodeId)) {
            config.setChainStart();
            config.setChainIndex(0);
            config.setOperatorName(streamGraph.getStreamNode(currentNodeId).getOperatorName());
            config.setOutEdgesInOrder(transitiveOutEdges);
            config.setOutEdges(streamGraph.getStreamNode(currentNodeId).getOutEdges());
            for (StreamEdge edge : transitiveOutEdges) {
                connect(startNodeId, edge);
            }
            config.setTransitiveChainedTaskConfigs(chainedConfigs.get(startNodeId));
        } else {
            chainedConfigs.computeIfAbsent(startNodeId, k -> new HashMap<Integer, StreamConfig>());
            config.setChainIndex(chainIndex);
            StreamNode node = streamGraph.getStreamNode(currentNodeId);
            config.setOperatorName(node.getOperatorName());
            chainedConfigs.get(startNodeId).put(currentNodeId, config);
        }

        config.setOperatorID(currentOperatorId);
        if (chainableOutputs.isEmpty()) {
            config.setChainEnd();
        }
        return transitiveOutEdges;
    } else {
        return new ArrayList<>();
    }
}

First, let's explain the three list structures created at the beginning of the method:

  • transitiveOutEdges: the list of outgoing edges of the current operator chain in the JobGraph; this is also the final return value of the createChain() method;
  • chainableOutputs: the list of StreamGraph out-edges that can currently be chained;
  • nonChainableOutputs: the list of StreamGraph out-edges that currently cannot be chained.

Next, starting from the sources, the method iterates over all outgoing edges of the current StreamGraph node and calls isChainable() to determine whether each edge can be chained (this logic is discussed later). Edges that can be chained are placed in the chainableOutputs list, the others in nonChainableOutputs. For the edges in chainableOutputs, createChain() is called recursively on their direct downstream nodes to keep extending the current operator chain. For the edges in nonChainableOutputs, the current operator chain ends there, so createChain() is called recursively again from these "breakpoints" to try to start new operator chains. In other words, the whole process of building operator chains in the logical plan is recursive, and the recursion unwinds starting from the sink side.

Then we check whether the current node is the starting node of an operator chain. If it is, the createJobVertex() method is called to create a JobVertex (i.e., a node in the JobGraph) for the operator chain, which produces the JobGraph we see in the Web UI.

[Figure: JobGraph with chained operators, as displayed in the Web UI]

Finally, the operator chain information of each node is written into its own StreamConfig, and the starting node of the chain additionally stores the transitiveOutEdges. The StreamConfig is used again later, in the physical execution phase.

Conditions for forming an operator chain

Take a look at the code of the isChainable() method.

public static boolean isChainable(StreamEdge edge, StreamGraph streamGraph) {
    StreamNode upStreamVertex = streamGraph.getSourceVertex(edge);
    StreamNode downStreamVertex = streamGraph.getTargetVertex(edge);

    StreamOperatorFactory<?> headOperator = upStreamVertex.getOperatorFactory();
    StreamOperatorFactory<?> outOperator = downStreamVertex.getOperatorFactory();

    return downStreamVertex.getInEdges().size() == 1
            && outOperator != null
            && headOperator != null
            && upStreamVertex.isSameSlotSharingGroup(downStreamVertex)
            && outOperator.getChainingStrategy() == ChainingStrategy.ALWAYS
            && (headOperator.getChainingStrategy() == ChainingStrategy.HEAD ||
                headOperator.getChainingStrategy() == ChainingStrategy.ALWAYS)
            && (edge.getPartitioner() instanceof ForwardPartitioner)
            && edge.getShuffleMode() != ShuffleMode.BATCH
            && upStreamVertex.getParallelism() == downStreamVertex.getParallelism()
            && streamGraph.isChainingEnabled();
}

From this we can conclude that the conditions for chaining upstream and downstream operators together are quite strict (a frequently quoted list); a small example of a broken chain follows the list:

  • The downstream node has exactly one input edge (i.e., a single upstream);
  • The upstream and downstream operator instances are in the same slot sharing group (to be mentioned later);
  • The chaining strategy of the downstream operator is ALWAYS, meaning it can be chained with both its upstream and its downstream. Common operators such as map() and filter() fall into this category;
  • The chaining strategy of the upstream operator is HEAD or ALWAYS. The HEAD strategy means it can only be chained with its downstream, which is normally exclusive to source operators;
  • The physical partitioning between the two operators is ForwardPartitioner; see the earlier article "Talking about the eight physical partitioning schemes of Flink DataStream";
  • The shuffle mode between the two operators is not BATCH;
  • The upstream and downstream operators have the same parallelism;
  • Operator chaining is not disabled.
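To make this concrete, here is a small sketch (my addition, not from the original article) in which the rebalance() edge uses a RebalancePartitioner instead of a ForwardPartitioner, so isChainable() returns false for that edge and the job splits into two chains: [source -> map] and [filter -> sink]. The class and job names are made up.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BrokenChainDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep parallelism at 1 so that the only chaining condition being violated
        // below is the partitioner of the rebalance() edge.
        env.setParallelism(1);

        env.fromElements(1, 2, 3, 4, 5, 6)
            .map(i -> i + 1).returns(Types.INT)
            // Non-forward partitioner: this edge is a chain "breakpoint",
            // so a new chain starts at filter().
            .rebalance()
            .filter(i -> i % 2 == 0)
            .print();

        env.execute("broken-chain-demo");
    }
}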

Disabling the operator chain

Users can call the startNewChain() method on an operator to force it to start a new operator chain, or call the disableChaining() method to exclude it from any operator chain. The code lives in the SingleOutputStreamOperator class and is implemented by changing the operator's chaining strategy.

@PublicEvolving
public SingleOutputStreamOperator<T> disableChaining() {
    return setChainingStrategy(ChainingStrategy.NEVER);
}

@PublicEvolving
public SingleOutputStreamOperator<T> startNewChain() {
    return setChainingStrategy(ChainingStrategy.HEAD);
}

If you want to disable operator chaining for the whole execution environment, call the StreamExecutionEnvironment.disableOperatorChaining() method.
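A quick usage sketch (my addition) combining the three knobs mentioned above; the class name, job name and transformations are placeholders chosen for illustration.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainControlDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining();   // uncomment to disable chaining for the whole job

        env.fromElements("a", "bb", "ccc")
            .map(String::length).returns(Types.INT)
            .filter(len -> len > 1)
            .startNewChain()                 // this filter becomes the head of a new chain
            .map(len -> len * 10).returns(Types.INT)
            .disableChaining()               // this map is never chained with anything
            .print();

        env.execute("chain-control-demo");
    }
}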

Operator chain in physical plan

After the JobGraph is converted into an ExecutionGraph and handed to the TaskManagers for execution, the basic unit of scheduled execution, StreamTask, is created; it is responsible for executing the concrete StreamOperator logic. In the StreamTask.invoke() method, after the state backend, checkpoint storage and timer service are initialized, you can find:

operatorChain = new OperatorChain<>(this, recordWriters);
headOperator = operatorChain.getHeadOperator();

Here an instance of OperatorChain is constructed; this is what the operator chain looks like at actual execution time. Let's explain some of the main fields of OperatorChain.

private final StreamOperator<?>[] allOperators;
private final RecordWriterOutput<?>[] streamOutputs;
private final WatermarkGaugeExposingOutput<StreamRecord<OUT>> chainEntryPoint;
private final OP headOperator;

  • headOperator: the first operator of the operator chain, corresponding to the chain's starting node in the JobGraph;
  • allOperators: all operators in the operator chain, arranged in reverse order, i.e., the headOperator is at the end of the array;
  • streamOutputs: the outputs of the operator chain; there can be more than one;
  • chainEntryPoint: the "entry point" of the operator chain; its meaning is explained later.

As you can see, every StreamTask creates an OperatorChain. If an operator cannot be chained with anything, it forms an operator chain containing only the headOperator. The core code of the OperatorChain constructor is as follows.

for (int i = 0; i < outEdgesInOrder.size(); i++) {
    StreamEdge outEdge = outEdgesInOrder.get(i);
    RecordWriterOutput<?> streamOutput = createStreamOutput(
        recordWriters.get(i),
        outEdge,
        chainedConfigs.get(outEdge.getSourceId()),
        containingTask.getEnvironment());
    this.streamOutputs[i] = streamOutput;
    streamOutputMap.put(outEdge, streamOutput);
}

// we create the chain of operators and grab the collector that leads into the chain
List<StreamOperator<?>> allOps = new ArrayList<>(chainedConfigs.size());
this.chainEntryPoint = createOutputCollector(
    containingTask,
    configuration,
    chainedConfigs,
    userCodeClassloader,
    streamOutputMap,
    allOps);

if (operatorFactory != null) {
    WatermarkGaugeExposingOutput<StreamRecord<OUT>> output = getChainEntryPoint();
    headOperator = operatorFactory.createStreamOperator(containingTask, configuration, output);
    headOperator.getMetricGroup().gauge(MetricNames.IO_CURRENT_OUTPUT_WATERMARK, output.getWatermarkGauge());
} else {
    headOperator = null;
}

// add head operator to end of chain
allOps.add(headOperator);
this.allOperators = allOps.toArray(new StreamOperator<?>[allOps.size()]);

First, it traverses all outgoing edges of the whole operator chain and calls the createStreamOutput() method to create the corresponding downstream outputs (RecordWriterOutput). It then calls the createOutputCollector() method to build the physical operator chain and return the chainEntryPoint. This method is important; part of its code is shown below.

private <T> WatermarkGaugeExposingOutput<StreamRecord<T>> createOutputCollector(
        StreamTask<?, ?> containingTask,
        StreamConfig operatorConfig,
        Map<Integer, StreamConfig> chainedConfigs,
        ClassLoader userCodeClassloader,
        Map<StreamEdge, RecordWriterOutput<?>> streamOutputs,
        List<StreamOperator<?>> allOperators) {
    List<Tuple2<WatermarkGaugeExposingOutput<StreamRecord<T>>, StreamEdge>> allOutputs = new ArrayList<>(4);

    // create collectors for the network outputs
    for (StreamEdge outputEdge : operatorConfig.getNonChainedOutputs(userCodeClassloader)) {
        @SuppressWarnings("unchecked")
        RecordWriterOutput<T> output = (RecordWriterOutput<T>) streamOutputs.get(outputEdge);
        allOutputs.add(new Tuple2<>(output, outputEdge));
    }

    // Create collectors for the chained outputs
    for (StreamEdge outputEdge : operatorConfig.getChainedOutputs(userCodeClassloader)) {
        int outputId = outputEdge.getTargetId();
        StreamConfig chainedOpConfig = chainedConfigs.get(outputId);
        WatermarkGaugeExposingOutput<StreamRecord<T>> output = createChainedOperator(
            containingTask,
            chainedOpConfig,
            chainedConfigs,
            userCodeClassloader,
            streamOutputs,
            allOperators,
            outputEdge.getOutputTag());
        allOutputs.add(new Tuple2<>(output, outputEdge));
    }
    //The following is omitted
}

This method reads the non-chained and chained out-edges recorded in the StreamConfig mentioned in the previous section and creates an output for each of them. For a non-chained edge, the output sends data downstream through the chain's RecordWriterOutput; for a chained edge, the output is produced by the createChainedOperator() method.

private <IN, OUT> WatermarkGaugeExposingOutput<StreamRecord<IN>> createChainedOperator(
        StreamTask<?, ?> containingTask,
        StreamConfig operatorConfig,
        Map<Integer, StreamConfig> chainedConfigs,
        ClassLoader userCodeClassloader,
        Map<StreamEdge, RecordWriterOutput<?>> streamOutputs,
        List<StreamOperator<?>> allOperators,
        OutputTag<IN> outputTag) {
    // create the output that the operator writes to first. this may recursively create more operators
    WatermarkGaugeExposingOutput<StreamRecord<OUT>> chainedOperatorOutput = createOutputCollector(
        containingTask,
        operatorConfig,
        chainedConfigs,
        userCodeClassloader,
        streamOutputs,
        allOperators);

    // now create the operator and give it the output collector to write its output to
    StreamOperatorFactory<OUT> chainedOperatorFactory = operatorConfig.getStreamOperatorFactory(userCodeClassloader);
    OneInputStreamOperator<IN, OUT> chainedOperator = chainedOperatorFactory.createStreamOperator(
            containingTask, operatorConfig, chainedOperatorOutput);

    allOperators.add(chainedOperator);

    WatermarkGaugeExposingOutput<StreamRecord<IN>> currentOperatorOutput;
    if (containingTask.getExecutionConfig().isObjectReuseEnabled()) {
        currentOperatorOutput = new ChainingOutput<>(chainedOperator, this, outputTag);
    }
    else {
        TypeSerializer<IN> inSerializer = operatorConfig.getTypeSerializerIn1(userCodeClassloader);
        currentOperatorOutput = new CopyingChainingOutput<>(chainedOperator, inSerializer, outputTag, this);
    }

    // wrap watermark gauges since registered metrics must be unique
    chainedOperator.getMetricGroup().gauge(MetricNames.IO_CURRENT_INPUT_WATERMARK, currentOperatorOutput.getWatermarkGauge()::getValue);
    chainedOperator.getMetricGroup().gauge(MetricNames.IO_CURRENT_OUTPUT_WATERMARK, chainedOperatorOutput.getWatermarkGauge()::getValue);
    return currentOperatorOutput;
}

We can see at a glance that this method recursively calls the createOutputCollector() method described above. Just like in the logical-planning stage, it keeps extending the outputs to create the chained operators (i.e., the operators in the chain other than the headOperator) and returns them in reverse order. This is also why the operators in the allOperators array are in reverse order.

After the chained operators are created, they are connected through ChainingOutput instances, forming the structure shown in the figure below.

[Figure: operators inside an OperatorChain connected through ChainingOutput]

Picture from: http://wuchong.me/blog/2016/05/09/flink-internals-understanding-execution-resources/

Finally, let's see how the ChainingOutput.collect() method emits the data stream.

@Override
public void collect(StreamRecord<T> record) {
    if (this.outputTag != null) {
        // we are only responsible for emitting to the main input
        return;
    }
    pushToOperator(record);
}

@Override
public <X> void collect(OutputTag<X> outputTag, StreamRecord<X> record) {
    if (this.outputTag == null || !this.outputTag.equals(outputTag)) {
        // we are only responsible for emitting to the side-output specified by our
        // OutputTag.
        return;
    }
    pushToOperator(record);
}

protected <X> void pushToOperator(StreamRecord<X> record) {
    try {
        // we know that the given outputTag matches our OutputTag so the record
        // must be of the type that our operator expects.
        @SuppressWarnings("unchecked")
        StreamRecord<T> castRecord = (StreamRecord<T>) record;
        numRecordsIn.inc();
        operator.setKeyContextElement1(castRecord);
        operator.processElement(castRecord);
    }
    catch (Exception e) {
        throw new ExceptionInChainedOperatorException(e);
    }
}

As you can see, the record is pushed directly to the downstream chained operator by calling its processElement() method. In other words, the operator chain can be regarded as a single composite operator bounded by the headOperator and the streamOutputs; the chained operators and ChainingOutputs inside it are hidden as if in a black box and introduce almost no overhead.

Having gone through the execution-layer logic of the operator chain, the reader should now understand the meaning of chainEntryPoint. Since it is produced at the very end of the recursive return, it is the output at the entry point of the chain's data flow, i.e., the output that the headOperator in the figure above writes into.

This article is reprinted from LittleMagic's post on Jianshu.
Link to the original: https://www.jianshu.com/p/799744e347c7