Java 8 Stream source code analysis

Time:2021-9-9


Stream

Stream is an interface added to the Java SE 8 API to enhance collections; it lets you process collection data in a declarative way. The collection to be processed acts as the source of a stream: its elements are converted into a stream and transmitted through a pipeline, where they can be processed at each node, e.g. filtered, sorted, or aggregated. The element stream is transformed by intermediate operations in the pipeline, and finally a terminal operation consumes the result of the preceding processing. The inheritance diagram of Stream is shown below; let me unravel it step by step.

ReferencePipeline

Filtering, transformation, aggregation, reduction
Stream.of("one", "two", "three", "four")
       .filter(e -> e.length() > 3)
       .peek(e -> System.out.println("Filtered value: " + e))
       .map(String::toUpperCase)
       .peek(e -> System.out.println("Mapped value: " + e))
       .collect(Collectors.toList());

Before streams existed, most of our processing of collection data was external iteration, after which we did aggregation, sorting, merging and so on by hand; this belongs to the OO style. Java SE 8 introduced FP-style operations, which can improve the productivity of Java programmers: lambda expressions based on type inference let programmers write efficient, clean and concise code and avoid boilerplate. Given a collection, the stream() method creates the initial stream, and methods such as map(), flatMap() and filter() transform and filter the collection data. I won't say more about the API calls here; starting directly from the source code, the core of the figure above is the AbstractPipeline and ReferencePipeline classes and the Sink interface. The AbstractPipeline abstract class is the highly abstracted pipeline of the whole stream: it holds the source stage sourceStage, the upstream previousStage and the downstream nextStage, and defines the terminal evaluate method. ReferencePipeline implements the filtering, transformation, aggregation and reduction operations. The addition of each operation can be pictured as a cabbage: the source is the heart, each added operation wraps a new leaf around it, and once the last operation is attached the whole cabbage has grown. The Sink interface is responsible for concatenating the whole pipeline. When an aggregation or reduction is performed, the evaluate terminal method of the AbstractPipeline abstract class calls different terminal logic depending on whether execution is parallel: if not parallel it executes terminalOp.evaluateSequential, otherwise terminalOp.evaluateParallel. In non-parallel execution mode, the wrapAndCopyInto method of AbstractPipeline calls copyInto, and before that calls wrapSink to peel the cabbage we produced on the assembly line:
it traverses the AbstractPipeline stages from downstream to upstream, packages them into a single sink, and then the corresponding method is executed iteratively inside copyInto until the call completes.
Parallel execution actually builds a ForkJoinTask and executes invoke to submit it to the ForkJoinPool thread pool.
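To make the "wrap from downstream to upstream, then push elements" idea concrete, here is a toy model (my own sketch, not the JDK's actual Sink or pipeline classes) in which each stage wraps its downstream consumer, just as wrapSink does before copyInto runs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MiniPipeline {
    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        // Terminal "sink": collect into a list.
        Consumer<String> terminal = out::add;
        // Wrap from downstream to upstream: the map stage feeds the terminal...
        Consumer<String> mapStage = s -> terminal.accept(s.toUpperCase());
        // ...and the filter stage feeds the map stage.
        Consumer<String> filterStage = s -> { if (s.length() > 3) mapStage.accept(s); };
        // "copyInto": push each source element through the wrapped chain.
        for (String s : List.of("one", "two", "three", "four")) {
            filterStage.accept(s);
        }
        System.out.println(out); // [THREE, FOUR]
    }
}
```

Each element travels source → filter → map → terminal in one pass, which is exactly why a stream traverses the data only once regardless of how many intermediate operations are chained.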

BaseStream


The basic interface of streams, which specifies that a stream can be unordered, sequential, or parallel. Stream implements the BaseStream interface.

  • Iterator<T> iterator();

    External Iterator

  • Spliterator<T> spliterator();

    Used to create an internal iterator

  • boolean isParallel();

    Used to determine whether the stream is parallel

  • S sequential();

    Marks the stream as sequential

  • S parallel();

    Marks the stream as parallel; execution uses the ForkJoinPool

  • S unordered();

    Marks the stream as unordered

  • S onClose(Runnable closeHandler);

    Registers a callback that is executed when the stream is closed.
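A quick demonstration of the BaseStream contract described above (isParallel, parallel/sequential toggling, spliterator, and the onClose callback):

```java
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

public class BaseStreamDemo {
    public static void main(String[] args) {
        Stream<String> s = List.of("a", "b", "c").stream();
        System.out.println(s.isParallel());          // false: streams start sequential
        s = s.parallel();
        System.out.println(s.isParallel());          // true
        s = s.sequential();
        System.out.println(s.isParallel());          // false

        // spliterator() consumes the stream, so use a fresh one.
        Spliterator<String> split = List.of("a", "b", "c").stream().spliterator();
        System.out.println(split.estimateSize());    // 3

        // onClose registers a callback; close() (here via try-with-resources) runs it.
        AtomicBoolean closed = new AtomicBoolean(false);
        try (Stream<String> st = Stream.of("x").onClose(() -> closed.set(true))) {
            st.count();
        }
        System.out.println(closed.get());            // true
    }
}
```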

PipelineHelper


This abstract class mainly defines the core methods for operating the pipeline and can collect all the information in the stream pipeline. Parallel stream operations are performed through TerminalOp#evaluateParallel, and sequential stream operations through TerminalOp#evaluateSequential.

  • abstract StreamShape getSourceShape();

    Defines the shape of the elements in the stream, returning an enumeration value; used by slicing operations such as limit or skip.

    Enumeration value range: {REFERENCE: reference-type elements, INT_VALUE: int elements, LONG_VALUE: long elements, DOUBLE_VALUE: double elements}

  • abstract int getStreamAndOpFlags();

    Used to obtain the combination of the stream's source flags and all operation flags. All flags defining stream types and operations in Stream are contained in the StreamOpFlag enum class. First look at the bit-mask operations:

    Common CRUD operations for bit masks
          a & ~b: clear flag bits b;
          a | b:  add flag bits b;
          a & b:  extract flag bits b;
          a ^ b:  extract the bits where a and b differ;
      The following is the table of flag bits for each stream type.
      /*
       * Characteristics belong to certain types, see the Type enum. Bit masks for
       * the types are constructed as per the following table:
       *
       *                        DISTINCT  SORTED  ORDERED  SIZED  SHORT_CIRCUIT
       *          SPLITERATOR      01       01       01      01        00
       *               STREAM      01       01       01      01        00
       *                   OP      11       11       11      10        01
       *          TERMINAL_OP      00       00       10      00        01
       * UPSTREAM_TERMINAL_OP      00       00       10      00        00
       *
       * 01 = set/inject  SET_BITS = 0b01       set instruction
        * 10 = clear       CLEAR_BITS = 0b10     clear instruction
        * 11 = preserve    PRESERVE_BITS = 0b11  preserve instruction
       */
      Constructor
       private StreamOpFlag(int position, MaskBuilder maskBuilder) {
          this.maskTable = maskBuilder.build();
          // Two bits per flag
          position *= 2;
          this.bitPosition = position;
          this.set = SET_BITS << position;
          this.clear = CLEAR_BITS << position;
          this.preserve = PRESERVE_BITS << position;
      }
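As a sanity check of the mask arithmetic above, here is a small standalone sketch (the values are illustrative, mirroring how StreamOpFlag shifts two bits per flag position; it does not use the JDK's package-private enum):

```java
public class MaskDemo {
    static final int SET_BITS = 0b01, CLEAR_BITS = 0b10, PRESERVE_BITS = 0b11;

    public static void main(String[] args) {
        int position = 3 * 2;                     // SIZED: two bits per flag -> bit position 6
        int set = SET_BITS << position;           // 64  (0x40), matches the SIZED output above
        int clear = CLEAR_BITS << position;       // 128 (0x80)
        int preserve = PRESERVE_BITS << position; // 192

        int flags = 0;
        flags |= set;                             // a | b: add the flag bits
        System.out.println((flags & preserve) == set); // true: the flag reads back as "set"
        flags &= ~preserve;                       // a & ~b: clear the flag bits
        System.out.println((flags & preserve) == 0);   // true: cleared
    }
}
```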
    • StreamOpFlag.DISTINCT

      DISTINCT(0,set(Type.SPLITERATOR).set(Type.STREAM).setAndClear(Type.OP))

      output:StreamOpFlag.DISTINCT: StreamOpFlag(maskTable={SPLITERATOR=1, STREAM=1, OP=3, TERMINAL_OP=0, UPSTREAM_TERMINAL_OP=0}, bitPosition=0, set=1, clear=2, preserve=3)

    OK, so the set offset of StreamOpFlag.DISTINCT is 1, represented in hexadecimal as 0x00000001. When getStreamAndOpFlags returns a value containing IS_DISTINCT, i.e. 0x00000001, it means that for any elements x and y encountered in the stream, !x.equals(y). The corresponding Spliterator.DISTINCT characteristic identifies that the stream is already distinct.

    • StreamOpFlag.SIZED

      SIZED(3, set(Type.SPLITERATOR).set(Type.STREAM).clear(Type.OP))

      output:StreamOpFlag.SIZED: StreamOpFlag(maskTable={SPLITERATOR=1, STREAM=1, OP=2, TERMINAL_OP=0, UPSTREAM_TERMINAL_OP=0}, bitPosition=6, set=64, clear=128, preserve=192)【0x00000040】->[Spliterator.SIZED]

    Indicates that the value returned by estimateSize() before traversal or splitting represents a finite size and, provided the source structure is not modified, the exact number of elements a complete traversal of the stream would encounter. If the stream does not have the SIZED|SUBSIZED characteristics, estimateSize may return Long.MAX_VALUE, indicating that computing the estimate is too complex or the stream is infinite. This worsens performance but does not affect the sorted method. If you want to process a stream in parallel, you can implement a custom Spliterator, overriding the trySplit() and long estimateSize() methods; by splitting, the spliterator's pieces are handed to the fork/join thread pool, achieving parallel processing.
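As the SIZED discussion suggests, parallel-friendly sources come from a Spliterator that overrides trySplit() and estimateSize(). A minimal custom implementation over an int range (my own sketch, not the JDK's) might look like this:

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

public class RangeSpliterator implements Spliterator<Integer> {
    private int from;
    private final int to; // exclusive

    RangeSpliterator(int from, int to) { this.from = from; this.to = to; }

    @Override public boolean tryAdvance(Consumer<? super Integer> action) {
        if (from >= to) return false;
        action.accept(from++);
        return true;
    }

    @Override public Spliterator<Integer> trySplit() {
        int mid = from + (to - from) / 2;
        if (mid <= from) return null;            // too small to split further
        Spliterator<Integer> prefix = new RangeSpliterator(from, mid);
        from = mid;                              // this spliterator keeps the suffix
        return prefix;
    }

    @Override public long estimateSize() { return to - from; } // exact, hence SIZED

    @Override public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | DISTINCT | NONNULL | IMMUTABLE;
    }

    public static void main(String[] args) {
        long sum = StreamSupport.stream(new RangeSpliterator(0, 100), true) // parallel
                                .mapToLong(Integer::longValue)
                                .sum();
        System.out.println(sum); // 4950
    }
}
```

Because estimateSize() is exact and SUBSIZED is reported, the framework can pre-size result arrays and divide the work evenly across fork/join tasks.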

    • StreamOpFlag.SORTED

      SORTED(1, set(Type.SPLITERATOR).set(Type.STREAM).setAndClear(Type.OP))

      output:StreamOpFlag.SORTED: StreamOpFlag(maskTable={SPLITERATOR=1, STREAM=1, OP=3, TERMINAL_OP=0, UPSTREAM_TERMINAL_OP=0}, bitPosition=2, set=4, clear=8, preserve=12) 【0x00000004】->[Spliterator.SORTED]

    Indicates that the encounter order of the stream follows a defined sort order. If this characteristic is present, getComparator() returns the associated comparator or null; if the characteristic is set and getComparator() returns null, the stream is sorted in natural order. If getComparator() returns non-null, the fromCharacteristics method clears the SORTED attribute. If all elements in the stream implement Comparable, the sort order is their natural order; the sorted(...) method can also be passed a lambda (a Comparator), in which case the stream is sorted according to that lambda.

    • StreamOpFlag.ORDERED

      ORDERED(2, set(Type.SPLITERATOR).set(Type.STREAM).setAndClear(Type.OP).clear(Type.TERMINAL_OP) .clear(Type.UPSTREAM_TERMINAL_OP))

      output:StreamOpFlag.ORDERED: StreamOpFlag(maskTable={SPLITERATOR=1, STREAM=1, OP=3, TERMINAL_OP=2, UPSTREAM_TERMINAL_OP=2}, bitPosition=4, set=16, clear=32, preserve=48)【0x00000010】->[Spliterator.ORDERED]

    Indicates that an encounter order is defined for the elements of the stream. The ORDERED characteristic is a mandatory precondition for the spliterator's trySplit to split elements deterministically; the tryAdvance method also advances element by element in the defined order, and the forEachRemaining method performs internal iteration in the defined order. Generally a collection has a defined encounter order, but for hash-based collections such as HashSet no order is guaranteed. Ordering constraints therefore need to be enforced in parallel computations whose operations are not commutative.
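The ORDERED guarantee can be observed directly: even with parallel execution, collect() honors the encounter order (whereas forEach, unlike forEachOrdered, makes no such promise):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderedDemo {
    public static void main(String[] args) {
        List<Integer> result = IntStream.range(0, 100)
                                        .parallel()      // work is split across threads...
                                        .boxed()
                                        .collect(Collectors.toList());
        // ...but the collected result still preserves encounter order.
        System.out.println(result.get(0) + " " + result.get(99)); // 0 99
    }
}
```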

    • StreamOpFlag.SHORT_CIRCUIT

      SHORT_CIRCUIT(12, set(Type.OP).set(Type.TERMINAL_OP))

      output:StreamOpFlag.SHORT_CIRCUIT: StreamOpFlag(maskTable={SPLITERATOR=0, STREAM=0, OP=1, TERMINAL_OP=1, UPSTREAM_TERMINAL_OP=0}, bitPosition=24, set=16777216, clear=33554432, preserve=50331648)【0x01000000】

    Indicates that the operation may short-circuit the stream.
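Short-circuiting in action: limit() lets even an infinite source terminate, pulling only as many elements as needed:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ShortCircuitDemo {
    public static void main(String[] args) {
        AtomicInteger pulls = new AtomicInteger();
        List<Integer> firstThree = Stream.iterate(0, i -> i + 1)   // infinite source
                                         .peek(i -> pulls.incrementAndGet())
                                         .limit(3)                 // SHORT_CIRCUIT op
                                         .collect(Collectors.toList());
        System.out.println(firstThree);   // [0, 1, 2]
        System.out.println(pulls.get());  // 3: only three elements were ever pulled
    }
}
```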

  • abstract long exactOutputSizeIfKnown(Spliterator<P_IN> spliterator);

Returns the exact output size of the pipeline for the provided source Spliterator, if known; otherwise -1.

  • abstract <P_IN, S extends Sink<P_OUT>> S wrapAndCopyInto(S sink, Spliterator<P_IN> spliterator);

Applies the stages of this pipeline to each element obtained from the provided Spliterator and sends the results into the provided receiver Sink.

  • abstract <P_IN> void copyInto(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator);

Pushes elements obtained from the Spliterator into the provided receiver Sink. If the stream pipeline is known to contain a short-circuiting stage (StreamOpFlag#SHORT_CIRCUIT), Sink#cancellationRequested() is checked after each element and execution terminates if it returns true. Implementations of this method must conform to the Sink protocol: Sink#begin → Sink#accept → Sink#end.
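Since java.util.stream.Sink is package-private, here is a hypothetical MiniSink interface of my own that mirrors the begin → accept → end protocol and the cancellationRequested check performed by the copy-with-cancel loop:

```java
import java.util.ArrayList;
import java.util.List;

public class SinkProtocolDemo {
    // Hypothetical stand-in for the JDK's package-private Sink interface.
    interface MiniSink<T> {
        default void begin(long size) {}
        void accept(T t);
        default void end() {}
        default boolean cancellationRequested() { return false; }
    }

    // A "limit(2)"-style sink: requests cancellation once it holds two elements.
    static class LimitSink implements MiniSink<String> {
        final List<String> out = new ArrayList<>();
        @Override public void accept(String s) { out.add(s); }
        @Override public boolean cancellationRequested() { return out.size() >= 2; }
    }

    // copyIntoWithCancel-style loop: honor the protocol and the cancel flag.
    static void copyIntoWithCancel(List<String> source, MiniSink<String> sink) {
        sink.begin(source.size());
        for (String s : source) {
            if (sink.cancellationRequested()) break; // short-circuit
            sink.accept(s);
        }
        sink.end();
    }

    public static void main(String[] args) {
        LimitSink sink = new LimitSink();
        copyIntoWithCancel(List.of("a", "b", "c", "d"), sink);
        System.out.println(sink.out); // [a, b]
    }
}
```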

  • abstract <P_IN> void copyIntoWithCancel(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator);

Pushes elements obtained from the Spliterator into the provided receiver Sink, checking Sink#cancellationRequested() after each element and terminating if it returns true. Implementations of this method must conform to the Sink protocol: Sink#begin → Sink#accept → Sink#end.

  • abstract <P_IN> Sink<P_IN> wrapSink(Sink<P_OUT> sink);

Wraps the terminal sink: traverses the AbstractPipeline stages from downstream to upstream and packages them into a single Sink, which copyInto then drives iteratively.

  • abstract Node.Builder<P_OUT> makeNodeBuilder(long exactSizeIfKnown, IntFunction<P_OUT[]> generator);

Constructs a Node.Builder, used to accumulate output elements and convert them to an array; its element type matches the output type of this PipelineHelper.

  • abstract <P_IN> Node<P_OUT> evaluate(Spliterator<P_IN> spliterator, boolean flatten, IntFunction<P_OUT[]> generator);

Applies the pipeline to all elements from the source spliterator and collects the output into a Node, for array processing. If the pipeline has no intermediate (filter, map) operations and the source is already backed by a Node, that Node is returned directly (internal traversal, then return). This avoids copying for pipelines composed of a stateful operation and a terminal operation that returns an array, for example stream.sorted().toArray(). This method corresponds to the following code inside AbstractPipeline:

@Override
@SuppressWarnings("unchecked")
final <P_IN> Node<E_OUT> evaluate(Spliterator<P_IN> spliterator,
                                  boolean flatten,
                                  IntFunction<E_OUT[]> generator) {
    if (isParallel()) {
        // @@@ Optimize if op of this pipeline stage is a stateful op
        return evaluateToNode(this, spliterator, flatten, generator);
    }
    else {
        Node.Builder<E_OUT> nb = makeNodeBuilder(
                exactOutputSizeIfKnown(spliterator), generator);
        return wrapAndCopyInto(nb, spliterator).build();
    }
}
AbstractPipeline

The abstract base class for "pipeline" classes, which are the core implementations of the Stream interface and its primitive specializations. It represents the initial portion of a stream pipeline, encapsulating the stream source and zero or more intermediate operations. For sequential streams, and parallel streams without stateful intermediate operations, all data processing in the pipeline is done in a single pass that fuses all the operations, i.e. everything is processed at the terminal. For parallel streams with stateful intermediate operations, execution is divided into segments, where each stateful operation marks the end of a segment; each segment is evaluated separately and its result used as the input to the next segment. In all of the above cases, processing of the source data starts only after the terminal operation.
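The last point, that source data is processed only after the terminal operation, is easy to verify: intermediate operations are lazy, and nothing runs until a terminal operation drives the traversal:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessDemo {
    public static void main(String[] args) {
        AtomicInteger touched = new AtomicInteger();
        Stream<Integer> pipeline = List.of(1, 2, 3).stream()
                .filter(i -> { touched.incrementAndGet(); return i > 1; });
        System.out.println(touched.get()); // 0: the filter has not run yet
        long n = pipeline.count();         // terminal op triggers the traversal
        System.out.println(touched.get()); // 3: every element passed through the filter
        System.out.println(n);             // 2
    }
}
```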

  • AbstractPipeline(Supplier<? extends Spliterator<?>> source,
    int sourceFlags, boolean parallel)

Constructor for the source stage; the first parameter is a Supplier (factory pattern) that can only produce Spliterator objects, according to the lambda implementation passed in. (? extends Spliterator<?> — recall the PECS principle of generics.)

  • AbstractPipeline(Spliterator<?> source,
    int sourceFlags, boolean parallel)

Constructor for the source stage; the first parameter supplies the Spliterator directly. Like the construction method above; let's analyze this one directly:

AbstractPipeline(Spliterator<?> source,
                 int sourceFlags, boolean parallel) {
    this.previousStage = null;
    this.sourceSpliterator = source;
    this.sourceStage = this;
    this.sourceOrOpFlags = sourceFlags & StreamOpFlag.STREAM_MASK;
    // The following is an optimization of:
    // StreamOpFlag.combineOpFlags(sourceOrOpFlags, StreamOpFlag.INITIAL_OPS_VALUE);
    this.combinedFlags = (~(sourceOrOpFlags << 1)) & StreamOpFlag.INITIAL_OPS_VALUE;
    this.depth = 0;
    this.parallel = parallel;
}
Copy code

When the stream source stage is created, previousStage is null. this.sourceOrOpFlags = sourceFlags & StreamOpFlag.STREAM_MASK; sets the flag bits of the current stage. this.combinedFlags = (~(sourceOrOpFlags << 1)) & StreamOpFlag.INITIAL_OPS_VALUE; adds the stream's operation flags in the source stage. combinedFlags is the collection of all operation flags in the whole pipeline and is parsed during the final terminal operation.

  • AbstractPipeline(AbstractPipeline<?, E_IN, ?> previousStage, int opFlags)

Creates a downstream stage (Pipeline) from the upstream one.

AbstractPipeline(AbstractPipeline<?, E_IN, ?> previousStage, int opFlags) {
      if (previousStage.linkedOrConsumed)
          throw new IllegalStateException(MSG_STREAM_LINKED);
      previousStage.linkedOrConsumed = true;
      previousStage.nextStage = this;

      this.previousStage = previousStage;
      this.sourceOrOpFlags = opFlags & StreamOpFlag.OP_MASK;
      this.combinedFlags = StreamOpFlag.combineOpFlags(opFlags, previousStage.combinedFlags);
      this.sourceStage = previousStage.sourceStage;
      if (opIsStateful())
          sourceStage.sourceAnyStateful = true;
      this.depth = previousStage.depth + 1;
  }
Copy code

this.sourceStage = previousStage.sourceStage; associates the upstream and downstream stages. this.combinedFlags = StreamOpFlag.combineOpFlags(opFlags, previousStage.combinedFlags); merges the upstream operation flags into this stage's operation flags. depth records the number of intermediate operations in the whole pipeline.

  • final <R> R evaluate(TerminalOp<E_OUT, R> terminalOp)

Performs the terminal aggregation and returns the result, dispatching to different terminal logic depending on whether execution is parallel: a sequential stream executes terminalOp.evaluateSequential, otherwise terminalOp.evaluateParallel.

  • final Node<E_OUT> evaluateToArrayNode(IntFunction<E_OUT[]> generator)

Process stream conversion array.

final Node<E_OUT> evaluateToArrayNode(IntFunction<E_OUT[]> generator) {
    if (linkedOrConsumed)
        throw new IllegalStateException(MSG_STREAM_LINKED);
    linkedOrConsumed = true;
    if (isParallel() && previousStage != null && opIsStateful()) {
        depth = 0;
        return opEvaluateParallel(previousStage, previousStage.sourceSpliterator(0), generator);
    }
    else {
        return evaluate(sourceSpliterator(0), true, generator);
    }
}

When converting to an array, if the stream is parallel, this is not the source stage, and a stateful operation (sorted, limit, skip or distinct) has been invoked, a template method call occurs here: the opEvaluateParallel methods implemented by DistinctOps, SortedOps and SliceOps submit the conversion to the ForkJoin thread pool. During serial execution, evaluate(sourceSpliterator(0), true, generator); is executed directly.

  • evaluate(sourceSpliterator(0), true, generator);

The concrete execution method, used to put the pipeline's output into a Node.

@Override
@SuppressWarnings("unchecked")
final <P_IN> Node<E_OUT> evaluate(Spliterator<P_IN> spliterator,
                                  boolean flatten,
                                  IntFunction<E_OUT[]> generator) {
    if (isParallel()) {
        // @@@ Optimize if op of this pipeline stage is a stateful op
        return evaluateToNode(this, spliterator, flatten, generator);
    }
    else {
        Node.Builder<E_OUT> nb = makeNodeBuilder(
                exactOutputSizeIfKnown(spliterator), generator);
        return wrapAndCopyInto(nb, spliterator).build();
    }
}

@Override
final <P_IN> Node<E_OUT> evaluateToNode(PipelineHelper<E_OUT> helper,
                                        Spliterator<P_IN> spliterator,
                                        boolean flattenTree,
                                        IntFunction<E_OUT[]> generator) {
    return Nodes.collect(helper, spliterator, flattenTree, generator);
}

// Nodes.collect method
public static <P_IN, P_OUT> Node<P_OUT> collect(PipelineHelper<P_OUT> helper,
                                                Spliterator<P_IN> spliterator,
                                                boolean flattenTree,
                                                IntFunction<P_OUT[]> generator) {
    long size = helper.exactOutputSizeIfKnown(spliterator);
    if (size >= 0 && spliterator.hasCharacteristics(Spliterator.SUBSIZED)) {
        if (size >= MAX_ARRAY_SIZE)
            throw new IllegalArgumentException(BAD_SIZE);
        P_OUT[] array = generator.apply((int) size);
        new SizedCollectorTask.OfRef<>(spliterator, helper, array).invoke();
        return node(array);
    } else {
        Node<P_OUT> node = new CollectorTask.OfRef<>(helper, generator, spliterator).invoke();
        return flattenTree ? flatten(node, generator) : node;
    }
}

If the source is a parallel stream, the reference pipeline ReferencePipeline is used; the main implementation is return Nodes.collect(helper, spliterator, flattenTree, generator);. Internally the collect method decides how the node is built based on whether the spliterator has the Spliterator.SUBSIZED characteristic. The main work is to create a task, submit it to the thread pool, and then call invoke to get the result. Sample code Arrays.asList("2","22","222").parallelStream().skip(2).toArray(); proceeds as follows:
(figure: parallel execution flow)
Serial execution sample code Arrays.asList("2","22","222").stream().skip(2).toArray(); proceeds as follows:
(figure: serial execution flow)

  • final Spliterator<E_OUT> sourceStageSpliterator()

Gets the spliterator set on the stream source. If a spliterator was set, it is returned and the source spliterator field is cleared; if a supplier was set, its get method is called to obtain the spliterator and the source supplier field is cleared.

  • public final S sequential()

Sets the stream to sequential, setting the parallel property of the source to false. A final method that cannot be overridden.

  • public final S parallel()

Sets the stream to parallel, setting the parallel property of the source to true. A final method that cannot be overridden.

  • public void close()

Closes the pipeline: the pipeline's consumed flag is set and the spliterator is set to null. If close callback jobs were registered on the source and are not null, they are invoked.

  • public S onClose(Runnable closeHandler)

Registers a close callback job, which is executed when close is called.

  • public Spliterator<E_OUT> spliterator()

Like the sourceStageSpliterator method, but not final; it can be overridden for user-defined extension.

  • public final boolean isParallel()

Used to determine whether the current pipeline is a parallel stream.

  • final int getStreamFlags()

Gets the flags of the stream and of all operations contained in the stream.

  • private Spliterator<?> sourceSpliterator(int terminalFlags)

Gets the source spliterator; serves the same purpose as the sourceStageSpliterator method. When the stream is parallel and a stateful intermediate operation exists, the stream flags and operations are combined while building the spliterator. If the passed terminal flags are not 0, they are added to the spliterator's flags.

  • final StreamShape getSourceShape()

The shape of the stream source's elements (reference, int, long, or double).

  • final long exactOutputSizeIfKnown(Spliterator<?> spliterator)

Gets the expected size: if the spliterator has the SIZED flag, calls the spliterator's getExactSizeIfKnown method; otherwise returns -1.

  • final <P_IN, S extends Sink<E_OUT>> S wrapAndCopyInto(S sink, Spliterator<P_IN> spliterator)

Packages every stage of the whole pipeline into a sink, connecting the stages in series; each wrapped sink holds its downstream.

The execution flow of wrapAndCopyInto is as follows:
(figure: wrapAndCopyInto execution flow)



source:club.perfma.com/article/116…