On the KeySelector of Flink's KeyedStream

Time:2021-10-14

Preface

This article looks at the KeySelector used by Flink's KeyedStream

KeyedStream

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/KeyedStream.java

@Public
public class KeyedStream<T, KEY> extends DataStream<T> {

    /**
     * The key selector that can get the key by which the stream if partitioned from the elements.
     */
    private final KeySelector<T, KEY> keySelector;

    /** The type of the key by which the stream is partitioned. */
    private final TypeInformation<KEY> keyType;

    /**
     * Creates a new {@link KeyedStream} using the given {@link KeySelector}
     * to partition operator state by key.
     *
     * @param dataStream
     *            Base stream of data
     * @param keySelector
     *            Function for determining state partitions
     */
    public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector) {
        this(dataStream, keySelector, TypeExtractor.getKeySelectorTypes(keySelector, dataStream.getType()));
    }

    /**
     * Creates a new {@link KeyedStream} using the given {@link KeySelector}
     * to partition operator state by key.
     *
     * @param dataStream
     *            Base stream of data
     * @param keySelector
     *            Function for determining state partitions
     */
    public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
        this(
            dataStream,
            new PartitionTransformation<>(
                dataStream.getTransformation(),
                new KeyGroupStreamPartitioner<>(keySelector, StreamGraphGenerator.DEFAULT_LOWER_BOUND_MAX_PARALLELISM)),
            keySelector,
            keyType);
    }

    /**
     * Creates a new {@link KeyedStream} using the given {@link KeySelector} and {@link TypeInformation}
     * to partition operator state by key, where the partitioning is defined by a {@link PartitionTransformation}.
     *
     * @param stream
     *            Base stream of data
     * @param partitionTransformation
     *            Function that determines how the keys are distributed to downstream operator(s)
     * @param keySelector
     *            Function to extract keys from the base stream
     * @param keyType
     *            Defines the type of the extracted keys
     */
    @Internal
    KeyedStream(
        DataStream<T> stream,
        PartitionTransformation<T> partitionTransformation,
        KeySelector<T, KEY> keySelector,
        TypeInformation<KEY> keyType) {

        super(stream.getExecutionEnvironment(), partitionTransformation);
        this.keySelector = clean(keySelector);
        this.keyType = validateKeyType(keyType);
    }

    //......
}
  • Every KeyedStream constructor takes a KeySelector parameter; the two public constructors delegate to the @Internal one, which stores the cleaned selector in the keySelector field

KeySelector

flink-core-1.7.0-sources.jar!/org/apache/flink/api/java/functions/KeySelector.java

@Public
@FunctionalInterface
public interface KeySelector<IN, KEY> extends Function, Serializable {

    /**
     * User-defined function that deterministically extracts the key from an object.
     *
     * <p>For example for a class:
     * <pre>
     *     public class Word {
     *         String word;
     *         int count;
     *     }
     * </pre>
     * The key extractor could return the word as
     * a key to group all Word objects by the String they contain.
     *
     * <p>The code would look like this
     * <pre>
     *     public String getKey(Word w) {
     *         return w.word;
     *     }
     * </pre>
     *
     * @param value The object to get the key from.
     * @return The extracted key.
     *
     * @throws Exception Throwing an exception will cause the execution of the respective task to fail,
     *                   and trigger recovery or cancellation of the program.
     */
    KEY getKey(IN value) throws Exception;
}
  • The KeySelector interface extends Function and Serializable and declares a single getKey method, which extracts a key of type KEY from an input value of type IN
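
Because KeySelector is a functional interface, a selector can be written as a lambda. The sketch below reuses the Word example from the Javadoc. To stay runnable without Flink on the classpath it declares its own stand-in KeySelector interface, and the sumCounts helper is a hypothetical toy that only imitates grouping by key; neither is Flink's runtime code.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.apache.flink.api.java.functions.KeySelector (assumption:
// declared locally so the sketch compiles without Flink).
@FunctionalInterface
interface KeySelector<IN, KEY> extends Serializable {
    KEY getKey(IN value) throws Exception;
}

// The example type from the KeySelector Javadoc.
class Word {
    final String word;
    final int count;

    Word(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

class KeySelectorSketch {

    // Hypothetical helper: sums the counts of all words that share a key.
    static Map<String, Integer> sumCounts(List<Word> words, KeySelector<Word, String> selector) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Word w : words) {
            String key;
            try {
                key = selector.getKey(w);
            } catch (Exception e) {
                // In Flink a throwing getKey fails the task; here we rethrow unchecked.
                throw new IllegalStateException("Key extraction failed", e);
            }
            totals.merge(key, w.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(
                new Word("flink", 1), new Word("key", 2), new Word("flink", 3));
        // The selector must be deterministic: the same Word always yields the same key.
        KeySelector<Word, String> byWord = w -> w.word;
        System.out.println(sumCounts(words, byWord)); // prints {flink=4, key=2}
    }
}
```

With the real API the same lambda would be passed to DataStream.keyBy(KeySelector).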

DataStream.keyBy

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/DataStream.java

    /**
     * It creates a new {@link KeyedStream} that uses the provided key for partitioning
     * its operator states.
     *
     * @param key
     *            The KeySelector to be used for extracting the key for partitioning
     * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
     */
    public <K> KeyedStream<T, K> keyBy(KeySelector<T, K> key) {
        Preconditions.checkNotNull(key);
        return new KeyedStream<>(this, clean(key));
    }

    /**
     * It creates a new {@link KeyedStream} that uses the provided key with explicit type information
     * for partitioning its operator states.
     *
     * @param key The KeySelector to be used for extracting the key for partitioning.
     * @param keyType The type information describing the key type.
     * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
     */
    public <K> KeyedStream<T, K> keyBy(KeySelector<T, K> key, TypeInformation<K> keyType) {
        Preconditions.checkNotNull(key);
        Preconditions.checkNotNull(keyType);
        return new KeyedStream<>(this, clean(key), keyType);
    }

    /**
     * Partitions the operator state of a {@link DataStream} by the given key positions.
     *
     * @param fields
     *            The position of the fields on which the {@link DataStream}
     *            will be grouped.
     * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
     */
    public KeyedStream<T, Tuple> keyBy(int... fields) {
        if (getType() instanceof BasicArrayTypeInfo || getType() instanceof PrimitiveArrayTypeInfo) {
            return keyBy(KeySelectorUtil.getSelectorForArray(fields, getType()));
        } else {
            return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
        }
    }

    /**
     * Partitions the operator state of a {@link DataStream} using field expressions.
     * A field expression is either the name of a public field or a getter method with parentheses
     * of the {@link DataStream}'s underlying type. A dot can be used to drill
     * down into objects, as in {@code "field1.getInnerField2()" }.
     *
     * @param fields
     *            One or more field expressions on which the state of the {@link DataStream} operators will be
     *            partitioned.
     * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
     **/
    public KeyedStream<T, Tuple> keyBy(String... fields) {
        return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
    }

    private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
        return new KeyedStream<>(this, clean(KeySelectorUtil.getSelectorForKeys(keys,
                getType(), getExecutionConfig())));
    }
  • DataStream's keyBy method converts a DataStream into a KeyedStream. It has several overloads
  • One takes a variable-length int array (int... fields) and is typically used for simple tuple types. Each int is a zero-based tuple field position, and several positions together define a composite key. For example, keyBy(0, 1) uses the first and second tuple fields as the key
  • One takes a variable-length String array (String... fields) and is typically used for complex tuple types and POJO types. For a POJO, each string names a field, and nested object or tuple properties such as user.zip are supported. For tuple types, f0 denotes the first field of the tuple
  • One takes a KeySelector, which lets you define the key freely in a function, for example by extracting a value from the object and then processing it further
  • keyBy(String... fields), and keyBy(int... fields) for non-array types, call the private keyBy(Keys<T> keys) method; for array types keyBy(int... fields) obtains a selector from KeySelectorUtil.getSelectorForArray instead. Because the KeyedStream constructor requires a KeySelector, the private method converts the Keys into a KeySelector via KeySelectorUtil.getSelectorForKeys
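
To make the nested field expression user.zip concrete, here is a toy resolver that walks a dotted expression over public fields using reflection. It is only an illustration: Flink resolves field expressions through its type information (CompositeType.getFlatFields, shown in the next section), not like this. The Event and User classes and the resolve helper are hypothetical names introduced for the example.

```java
import java.lang.reflect.Field;

// Hypothetical POJOs used only for this illustration.
class User {
    public String zip;

    User(String zip) {
        this.zip = zip;
    }
}

class Event {
    public User user;
    public long amount;

    Event(User user, long amount) {
        this.user = user;
        this.amount = amount;
    }
}

class FieldExpressionSketch {

    // Walks a dotted expression such as "user.zip" one public field at a time.
    static Object resolve(Object root, String expression) {
        Object current = root;
        for (String name : expression.trim().split("\\.")) {
            try {
                Field field = current.getClass().getField(name);
                current = field.get(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                        "Cannot resolve '" + name + "' in '" + expression + "'", e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Event event = new Event(new User("94105"), 42L);
        System.out.println(resolve(event, "user.zip")); // prints 94105
    }
}
```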

Keys.ExpressionKeys

flink-core-1.7.0-sources.jar!/org/apache/flink/api/common/operators/Keys.java

    /**
     * Represents (nested) field access through string and integer-based keys
     */
    public static class ExpressionKeys<T> extends Keys<T> {
        
        public static final String SELECT_ALL_CHAR = "*";
        public static final String SELECT_ALL_CHAR_SCALA = "_";
        private static final Pattern WILD_CARD_REGEX = Pattern.compile("[\\.]?("
                + "\\" + SELECT_ALL_CHAR + "|"
                + "\\" + SELECT_ALL_CHAR_SCALA +")$");

        // Flattened fields representing keys fields
        private List<FlatFieldDescriptor> keyFields;
        private TypeInformation<?>[] originalKeyTypes;

        //......

        /**
         * Create String-based (nested) field expression keys on a composite type.
         */
        public ExpressionKeys(String[] keyExpressions, TypeInformation<T> type) {
            checkNotNull(keyExpressions, "Field expression cannot be null.");

            this.keyFields = new ArrayList<>(keyExpressions.length);

            if (type instanceof CompositeType){
                CompositeType<T> cType = (CompositeType<T>) type;
                this.originalKeyTypes = new TypeInformation<?>[keyExpressions.length];

                // extract the keys on their flat position
                for (int i = 0; i < keyExpressions.length; i++) {
                    String keyExpr = keyExpressions[i];

                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }
                    // strip off whitespace
                    keyExpr = keyExpr.trim();

                    List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr);

                    if (flatFields.size() == 0) {
                        throw new InvalidProgramException("Unable to extract key from expression '" + keyExpr + "' on key " + cType);
                    }
                    // check if all nested fields can be used as keys
                    for (FlatFieldDescriptor field : flatFields) {
                        if (!field.getType().isKeyType()) {
                            throw new InvalidProgramException("This type (" + field.getType() + ") cannot be used as key.");
                        }
                    }
                    // add flat fields to key fields
                    keyFields.addAll(flatFields);

                    String strippedKeyExpr = WILD_CARD_REGEX.matcher(keyExpr).replaceAll("");
                    if (strippedKeyExpr.isEmpty()) {
                        this.originalKeyTypes[i] = type;
                    } else {
                        this.originalKeyTypes[i] = cType.getTypeAt(strippedKeyExpr);
                    }
                }
            }
            else {
                if (!type.isKeyType()) {
                    throw new InvalidProgramException("This type (" + type + ") cannot be used as key.");
                }

                // check that all key expressions are valid
                for (String keyExpr : keyExpressions) {
                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }
                    // strip off whitespace
                    keyExpr = keyExpr.trim();
                    // check that full type is addressed
                    if (!(SELECT_ALL_CHAR.equals(keyExpr) || SELECT_ALL_CHAR_SCALA.equals(keyExpr))) {
                        throw new InvalidProgramException(
                            "Field expression must be equal to '" + SELECT_ALL_CHAR + "' or '" + SELECT_ALL_CHAR_SCALA + "' for non-composite types.");
                    }
                    // add full type as key
                    keyFields.add(new FlatFieldDescriptor(0, type));
                }
                this.originalKeyTypes = new TypeInformation[] {type};
            }
        }

        //......
    }
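  • ExpressionKeys is a static nested class of Keys and extends Keys<T>. keyBy(String... fields), and keyBy(int... fields) for non-array types, wrap their arguments in a new Keys.ExpressionKeys and pass it to the private keyBy(Keys<T> keys) method
  • The String-based constructor shown here resolves each expression to flat fields on a composite type and rejects any field whose type cannot be used as a key; for a non-composite type the only accepted expressions are * and _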
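
The effect of WILD_CARD_REGEX is easy to miss when reading the constructor. The sketch below copies the pattern and the two constants from the listing above into a standalone class and shows what remains of a few expressions after stripping. The class name WildcardSketch and the strip helper are assumptions made for the example.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    static final String SELECT_ALL_CHAR = "*";
    static final String SELECT_ALL_CHAR_SCALA = "_";

    // Same pattern as in Keys.ExpressionKeys: an optional dot followed by
    // a trailing '*' or '_' at the end of the expression.
    static final Pattern WILD_CARD_REGEX = Pattern.compile("[\\.]?("
            + "\\" + SELECT_ALL_CHAR + "|"
            + "\\" + SELECT_ALL_CHAR_SCALA + ")$");

    // Mirrors the trim-then-strip steps of the constructor.
    static String strip(String keyExpr) {
        return WILD_CARD_REGEX.matcher(keyExpr.trim()).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("user.*") + "]");   // prints [user]
        System.out.println("[" + strip("*") + "]");        // prints []
        System.out.println("[" + strip("user.zip") + "]"); // prints [user.zip]
    }
}
```

An empty result means the expression selected the whole type, which is the case where the constructor stores type itself in originalKeyTypes; otherwise it looks up the type at the stripped expression with getTypeAt.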

KeySelectorUtil.getSelectorForKeys

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/util/keys/KeySelectorUtil.java

@Internal
public final class KeySelectorUtil {

    public static <X> KeySelector<X, Tuple> getSelectorForKeys(Keys<X> keys, TypeInformation<X> typeInfo, ExecutionConfig executionConfig) {
        if (!(typeInfo instanceof CompositeType)) {
            throw new InvalidTypesException(
                    "This key operation requires a composite type such as Tuples, POJOs, or Case Classes.");
        }

        CompositeType<X> compositeType = (CompositeType<X>) typeInfo;

        int[] logicalKeyPositions = keys.computeLogicalKeyPositions();
        int numKeyFields = logicalKeyPositions.length;

        TypeInformation<?>[] typeInfos = keys.getKeyFieldTypes();
        // use ascending order here, the code paths for that are usually a slight bit faster
        boolean[] orders = new boolean[numKeyFields];
        for (int i = 0; i < numKeyFields; i++) {
            orders[i] = true;
        }

        TypeComparator<X> comparator = compositeType.createComparator(logicalKeyPositions, orders, 0, executionConfig);
        return new ComparableKeySelector<>(comparator, numKeyFields, new TupleTypeInfo<>(typeInfos));
    }

    //......
}
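  • KeySelectorUtil.getSelectorForKeys converts a Keys object into a KeySelector: it requires a composite type, computes the logical key positions, builds a comparator over them, and returns a ComparableKeySelector whose key type is Tuple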
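
The idea of turning key positions into a selector that emits a composite key can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the ComparableKeySelector implementation: an Object[] stands in for a tuple, a List stands in for the Tuple key, the KeySelector interface is a local stand-in, and forPositions and keyOf are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for Flink's KeySelector (assumption: declared locally so the
// sketch compiles without Flink).
@FunctionalInterface
interface KeySelector<IN, KEY> {
    KEY getKey(IN value) throws Exception;
}

class PositionKeySketch {

    // Builds a selector that copies the fields at the given positions
    // into a composite key, in the order the positions were given.
    static KeySelector<Object[], List<Object>> forPositions(int... positions) {
        final int[] copy = positions.clone();
        return record -> {
            List<Object> key = new ArrayList<>(copy.length);
            for (int position : copy) {
                key.add(record[position]);
            }
            return Collections.unmodifiableList(key);
        };
    }

    // Convenience wrapper that hides the checked exception of getKey.
    static List<Object> keyOf(Object[] record, int... positions) {
        try {
            return forPositions(positions).getKey(record);
        } catch (Exception e) {
            throw new IllegalStateException("Key extraction failed", e);
        }
    }

    public static void main(String[] args) {
        Object[] a = {"alice", "US", 10};
        Object[] b = {"alice", "US", 25};
        Object[] c = {"alice", "DE", 7};
        // Records agreeing on fields 0 and 1 share a key, as with keyBy(0, 1).
        System.out.println(keyOf(a, 0, 1).equals(keyOf(b, 0, 1))); // prints true
        System.out.println(keyOf(a, 0, 1).equals(keyOf(c, 0, 1))); // prints false
    }
}
```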

Summary

  • Every KeyedStream constructor requires a KeySelector parameter
  • DataStream's keyBy method has several overloads: it accepts a variable-length int array, a variable-length String array, or a KeySelector
  • keyBy(String... fields), and keyBy(int... fields) for non-array types, wrap their arguments in a new Keys.ExpressionKeys and call the private keyBy(Keys<T> keys) method, which converts the Keys into a KeySelector by calling KeySelectorUtil.getSelectorForKeys
