Source code interpretation of eight partition strategies of Flink

Time: 2021-01-10

Flink provides 8 partition strategies (partitioners), listed below. This article interprets the implementation of each partitioner, one by one, from the perspective of the source code.

  • GlobalPartitioner
  • ShufflePartitioner
  • RebalancePartitioner
  • RescalePartitioner
  • BroadcastPartitioner
  • ForwardPartitioner
  • KeyGroupStreamPartitioner
  • CustomPartitionerWrapper

Inheritance diagram

Interface name: ChannelSelector

Implementation:

public interface ChannelSelector<T extends IOReadableWritable> {

    /**
     *Initializes the number of channels, which can be understood as the number of
     *instances (parallel subtasks) of the downstream operator
     */
    void setup(int numberOfChannels);

    /**
     *Based on the current record and the total number of channels,
     *decides which downstream channel the record should be sent to.
     *Each partition strategy implements this method differently.
     */
    int selectChannel(T record);

    /**
     *Whether to broadcast to all downstream operator instances
     */
    boolean isBroadcast();
}

Abstract class name: StreamPartitioner

Implementation:

public abstract class StreamPartitioner<T> implements
        ChannelSelector<SerializationDelegate<StreamRecord<T>>>, Serializable {
    private static final long serialVersionUID = 1L;

    protected int numberOfChannels;

    @Override
    public void setup(int numberOfChannels) {
        this.numberOfChannels = numberOfChannels;
    }

    @Override
    public boolean isBroadcast() {
        return false;
    }

    public abstract StreamPartitioner<T> copy();
}

Inheritance diagram

(Image: partitioner inheritance diagram; not available in this copy)

GlobalPartitioner

brief introduction

This partitioner sends all records to a single downstream operator instance (the subtask with id = 0)

Source code interpretation

/**
 *Send all data to the first task of the downstream operator (id = 0)
 * @param <T>
 */
@Internal
public class GlobalPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        //Always returns 0, that is, records are only sent to the first task of the downstream operator
        return 0;
    }

    @Override
    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "GLOBAL";
    }
}

graphic

(Image: GlobalPartitioner data flow diagram; not available in this copy)

ShufflePartitioner

brief introduction

Randomly selects a downstream operator instance to send each record to

Source code interpretation

/**
 *Randomly select a channel to send
 * @param <T>
 */
@Internal
public class ShufflePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    private Random random = new Random();

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        //Generate a pseudo-random number in [0, numberOfChannels) to send the record to a random downstream task
        return random.nextInt(numberOfChannels);
    }

    @Override
    public StreamPartitioner<T> copy() {
        return new ShufflePartitioner<T>();
    }

    @Override
    public String toString() {
        return "SHUFFLE";
    }
}

graphic

(Image: ShufflePartitioner data flow diagram; not available in this copy)

BroadcastPartitioner

brief introduction

Sends each record to all downstream operator instances

Source code interpretation

/**
 *Send to all channels
 */
@Internal
public class BroadcastPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;
    /**
     *Broadcast mode sends every record directly to all downstream tasks,
     *so there is no need to select a sending channel with the method below
     */
    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        throw new UnsupportedOperationException("Broadcast partitioner does not support select channels.");
    }

    @Override
    public boolean isBroadcast() {
        return true;
    }

    @Override
    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "BROADCAST";
    }
}

graphic

(Image: BroadcastPartitioner data flow diagram; not available in this copy)

RebalancePartitioner

brief introduction

Distributes records to the downstream tasks in a round-robin fashion

Source code interpretation

/**
 *Distributes records to the downstream tasks in a round-robin fashion
 * @param <T>
 */
@Internal
public class RebalancePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    private int nextChannelToSendTo;

    @Override
    public void setup(int numberOfChannels) {
        super.setup(numberOfChannels);
        //Initialize the channel id with a pseudo-random number in [0, numberOfChannels)
        nextChannelToSendTo = ThreadLocalRandom.current().nextInt(numberOfChannels);
    }

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        //Send records to the downstream tasks in turn. For example, if nextChannelToSendTo was initialized to 0 and numberOfChannels (the number of instances, i.e. the parallelism, of the downstream operator) is 2,
        //then the first record is sent to the task with id = 1, the second to the task with id = 0, the third to the task with id = 1, and so on
        nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
        return nextChannelToSendTo;
    }

    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "REBALANCE";
    }
}
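As a quick sanity check, the round-robin selection can be simulated outside Flink. The following is a minimal standalone sketch, not Flink code; the fixed initial channel is an assumption for illustration, since Flink chooses it at random in setup():

```java
//Minimal standalone sketch (not Flink code) of RebalancePartitioner's round-robin selection.
//The fixed initial channel is an illustrative assumption; Flink picks it at random in setup().
public class RoundRobinDemo {
    private final int numberOfChannels;
    private int nextChannelToSendTo;

    public RoundRobinDemo(int numberOfChannels, int initialChannel) {
        this.numberOfChannels = numberOfChannels;
        this.nextChannelToSendTo = initialChannel;
    }

    public int selectChannel() {
        //Same arithmetic as in RebalancePartitioner.selectChannel
        nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
        return nextChannelToSendTo;
    }

    public static void main(String[] args) {
        RoundRobinDemo p = new RoundRobinDemo(2, 0);
        for (int i = 0; i < 4; i++) {
            //With 2 channels and a start of 0, records go to channels 1, 0, 1, 0
            System.out.println("record " + i + " -> channel " + p.selectChannel());
        }
    }
}
```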

graphic

(Image: RebalancePartitioner data flow diagram; not available in this copy)

RescalePartitioner

brief introduction

Distributes records round-robin over the downstream operator instances, but, based on the parallelism of the upstream and downstream operators, each upstream instance only cycles over a subset of the downstream instances.
For example, if the upstream parallelism is 2 and the downstream parallelism is 4, one upstream instance distributes records round-robin to two of the downstream instances, and the other upstream instance distributes records round-robin to the other two.
Conversely, if the upstream parallelism is 4 and the downstream parallelism is 2, two upstream instances send their records to one downstream instance, and the other two upstream instances send theirs to the other downstream instance.
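The even split in the 2-to-4 example can be sketched with simple integer arithmetic. This is an illustration of the split described above for the case where the downstream parallelism is a multiple of the upstream parallelism; it is not Flink's actual edge-construction code:

```java
//Sketch of an even POINTWISE-style split of downstream channels between upstream subtasks.
//Illustrative only (assumes downstream parallelism is a multiple of upstream parallelism);
//not Flink's actual edge-construction code.
public class RescaleMappingDemo {
    //Returns {firstChannelInclusive, lastChannelExclusive} for one upstream subtask
    public static int[] channelRange(int upstreamIndex, int upstreamParallelism, int downstreamParallelism) {
        int start = upstreamIndex * downstreamParallelism / upstreamParallelism;
        int end = (upstreamIndex + 1) * downstreamParallelism / upstreamParallelism;
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        int upstream = 2, downstream = 4;
        for (int i = 0; i < upstream; i++) {
            int[] range = channelRange(i, upstream, downstream);
            //upstream subtask 0 cycles over channels [0, 2), subtask 1 over [2, 4)
            System.out.println("upstream subtask " + i + " -> downstream channels ["
                    + range[0] + ", " + range[1] + ")");
        }
    }
}
```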

Source code interpretation

@Internal
public class RescalePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    private int nextChannelToSendTo = -1;

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        if (++nextChannelToSendTo >= numberOfChannels) {
            nextChannelToSendTo = 0;
        }
        return nextChannelToSendTo;
    }

    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "RESCALE";
    }
}

graphic

(Image: RescalePartitioner data flow diagram; not available in this copy)

Key point

The execution graph in Flink has four layers: StreamGraph, JobGraph, ExecutionGraph and the physical execution graph.

StreamGraph: the initial graph generated from the user's DataStream API code. It represents the topology of the program.

JobGraph: the optimized graph that the StreamGraph is turned into and that is submitted to the JobManager. The main optimization is chaining multiple qualifying nodes together into one node, which reduces the serialization/deserialization and transmission cost of data flowing between nodes.

ExecutionGraph: the graph the JobManager generates from the JobGraph. The ExecutionGraph is the parallel version of the JobGraph and the core data structure of the scheduling layer.

Physical execution graph: the "graph" that results after the JobManager schedules the job according to the ExecutionGraph and deploys tasks to the TaskManagers. It is not a concrete data structure.

StreamingJobGraphGenerator performs the StreamGraph-to-JobGraph conversion. In this class, ForwardPartitioner and RescalePartitioner are treated as the POINTWISE distribution pattern, and all others as ALL_TO_ALL. The code is as follows:

if (partitioner instanceof ForwardPartitioner || partitioner instanceof RescalePartitioner) {
    jobEdge = downStreamVertex.connectNewDataSetAsInput(
        headVertex,
        //Each instance (subtask) of the upstream (producer) operator connects to one or a few instances (subtasks) of the downstream (consumer) operator
        DistributionPattern.POINTWISE,
        resultPartitionType);
} else {
    jobEdge = downStreamVertex.connectNewDataSetAsInput(
        headVertex,
        //Each subtask of the upstream (producer) operator connects to all subtasks of the downstream (consumer) operator
        DistributionPattern.ALL_TO_ALL,
        resultPartitionType);
}

ForwardPartitioner

brief introduction

Sends each record to the corresponding downstream task. This requires the parallelism of the upstream and downstream operators to be the same, that is, upstream and downstream subtasks are in a 1:1 relationship

Source code interpretation

/**
 *Send each record to the corresponding downstream task
 * @param <T>
 */
@Internal
public class ForwardPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        return 0;
    }

    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "FORWARD";
    }
}

graphic

(Image: ForwardPartitioner data flow diagram; not available in this copy)

Key point

When the upstream and downstream operators do not specify a partitioner, the ForwardPartitioner is used if their parallelism is the same; otherwise the RebalancePartitioner is used. For the ForwardPartitioner, the parallelism of the upstream and downstream operators must match, otherwise an exception is thrown:

//If no partitioner is specified, use ForwardPartitioner when the upstream and downstream parallelism match, otherwise RebalancePartitioner
if (partitioner == null && upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
    partitioner = new ForwardPartitioner<Object>();
} else if (partitioner == null) {
    partitioner = new RebalancePartitioner<Object>();
}

if (partitioner instanceof ForwardPartitioner) {
    //If the upstream and downstream parallelism differ, throw an exception
    if (upstreamNode.getParallelism() != downstreamNode.getParallelism()) {
        throw new UnsupportedOperationException("Forward partitioning does not allow " +
            "change of parallelism. Upstream operation: " + upstreamNode + " parallelism: " + upstreamNode.getParallelism() +
            ", downstream operation: " + downstreamNode + " parallelism: " + downstreamNode.getParallelism() +
            " You must use another partitioning strategy, such as broadcast, rebalance, shuffle or global.");
    }
}

KeyGroupStreamPartitioner

brief introduction

Selects the downstream subtask to send each record to, based on the key group index of the record's key

Source code interpretation

  • org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner
/**
 *Selects the downstream subtask based on the key group index of the key
 * @param <T>
 * @param <K>
 */
@Internal
public class KeyGroupStreamPartitioner<T, K> extends StreamPartitioner<T> implements ConfigurableStreamPartitioner {
...

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        K key;
        try {
            key = keySelector.getKey(record.getInstance().getValue());
        } catch (Exception e) {
            throw new RuntimeException("Could not extract key from " + record.getInstance().getValue(), e);
        }
        //Call the assignKeyToParallelOperator method of the KeyGroupRangeAssignment class; its code is shown below
        return KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, numberOfChannels);
    }
...
}
  • org.apache.flink.runtime.state.KeyGroupRangeAssignment
public final class KeyGroupRangeAssignment {
...

    /**
     *Assigns the index of a parallel operator instance according to the key.
     *This is the routing information deciding which downstream task the key is sent to.
     */
    public static int assignKeyToParallelOperator(Object key, int maxParallelism, int parallelism) {
        Preconditions.checkNotNull(key, "Assigned key must not be null!");
        return computeOperatorIndexForKeyGroup(maxParallelism, parallelism, assignToKeyGroup(key, maxParallelism));
    }

    /**
     *Assigns a key group id (keyGroupId) to the key
     */
    public static int assignToKeyGroup(Object key, int maxParallelism) {
        Preconditions.checkNotNull(key, "Assigned key must not be null!");
        //Get the hashcode of the key
        return computeKeyGroupForKeyHash(key.hashCode(), maxParallelism);
    }
     
    /**
     *Computes the key group id (keyGroupId) from the hash of the key
     */
    public static int computeKeyGroupForKeyHash(int keyHash, int maxParallelism) {

        //Compute the keyGroupId by mapping the murmur hash of the key hash into [0, maxParallelism)
        return MathUtils.murmurHash(keyHash) % maxParallelism;
    }

    //Computes the partition index, i.e. which downstream operator instance the key group should be sent to
    public static int computeOperatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroupId) {
        return keyGroupId * parallelism / maxParallelism;
    }
...
}

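To make the arithmetic concrete, here is a small standalone sketch of the two-step routing (key → key group → operator index). Using Math.floorMod of the raw hashCode as a stand-in for Flink's MathUtils.murmurHash is an illustrative assumption:

```java
//Standalone sketch of the key -> key group -> operator index routing.
//Math.floorMod(hashCode, maxParallelism) stands in for Flink's murmur hash (illustrative assumption).
public class KeyGroupDemo {
    public static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    //Same formula as KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup
    public static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroupId) {
        return keyGroupId * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        //The 128 key groups are split into 4 contiguous ranges of 32 key groups each
        System.out.println(operatorIndexForKeyGroup(maxParallelism, parallelism, 0));   //0
        System.out.println(operatorIndexForKeyGroup(maxParallelism, parallelism, 31));  //0
        System.out.println(operatorIndexForKeyGroup(maxParallelism, parallelism, 32));  //1
        System.out.println(operatorIndexForKeyGroup(maxParallelism, parallelism, 127)); //3
        //Route a concrete key end to end
        int keyGroup = assignToKeyGroup("flink", maxParallelism);
        System.out.println("key 'flink' -> key group " + keyGroup
                + " -> operator " + operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup));
    }
}
```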
graphic

(Image: KeyGroupStreamPartitioner data flow diagram; the original image failed to transfer)

CustomPartitionerWrapper

brief introduction

Outputs each record downstream via the partition method of a user-defined Partitioner instance.

public class CustomPartitionerWrapper<K, T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    Partitioner<K> partitioner;
    KeySelector<T, K> keySelector;

    public CustomPartitionerWrapper(Partitioner<K> partitioner, KeySelector<T, K> keySelector) {
        this.partitioner = partitioner;
        this.keySelector = keySelector;
    }

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
        K key;
        try {
            key = keySelector.getKey(record.getInstance().getValue());
        } catch (Exception e) {
            throw new RuntimeException("Could not extract key from " + record.getInstance(), e);
        }
        //Call the partition method of the user-defined Partitioner implementation
        return partitioner.partition(key, numberOfChannels);
    }

    @Override
    public StreamPartitioner<T> copy() {
        return this;
    }

    @Override
    public String toString() {
        return "CUSTOM";
    }
}

For example:

public class CustomPartitioner implements Partitioner<String> {
    //key: the key to partition by
    //numPartitions: the parallelism of the downstream operator
    @Override
    public int partition(String key, int numPartitions) {
        return key.length() % numPartitions; //define the partition policy here
    }
}
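This length-modulo policy can be exercised directly outside a Flink job; in a real job the partitioner would be attached with DataStream#partitionCustom together with a KeySelector. The demo class below is purely illustrative:

```java
//Exercises the length-modulo partition policy from the example above outside of a Flink job
//(illustrative demo class, not part of Flink).
public class CustomPartitionDemo {
    //Same policy as CustomPartitioner.partition above
    public static int partition(String key, int numPartitions) {
        return key.length() % numPartitions;
    }

    public static void main(String[] args) {
        //"flink".length() = 5, 5 % 3 = 2
        System.out.println(partition("flink", 3)); //2
        //"word".length() = 4, 4 % 3 = 1
        System.out.println(partition("word", 3));  //1
    }
}
```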

Summary

This article analyzed the 8 partition strategies of Flink one by one at the source code level and gave a diagram for each strategy to make the source code easy to understand quickly.
