Delayed execution and immutability: a systematic look at Java Stream data processing

Time:2021-7-28

Recently, while writing business code at work, I suddenly could not remember how to write an accumulation with Stream.

So I fell back on Google-driven programming. It cost me three precious minutes to look it up, and it turned out to be very simple.

Ever since I started using JDK 8, Stream has been the feature I use most, for all kinds of stream-style processing. After this incident, though, I suddenly felt that Stream was really quite unfamiliar to me.

Maybe everyone is the same: the things we use most are also the easiest to overlook. Even when preparing for an interview, you probably would not think to review Stream.

Now that I have noticed it, though, I have to go through it again from the beginning, which also serves to find and fill the gaps in my overall knowledge.

I spent a lot of time writing this article on Stream. I hope it helps both you and me get to know Stream again and understand its API and internal characteristics. Why fear that truth is endless? Every inch of progress brings its own joy.

In this article, I divide the content of Stream into the following parts:

[Figure: outline of the Stream topics covered in this article]

At first glance, the terms intermediate (transformation) operation and terminal operation may be a little confusing. I simply divide all the APIs in Stream into two categories and give each a name (following the Java 8 books mentioned at the end):

  • Intermediate operations: methods such as filter and map, which convert one stream into another; the return value is a Stream.
  • Terminal operations: methods such as count and collect, which summarize a stream into the result we need; the return value is not a Stream.

The intermediate operations are themselves divided into two categories. Detailed examples come later in the article; for now, here are the definitions to give a general impression:

  1. Stateless: the execution of the method does not depend on the result set of the previous method.
  2. Stateful: the execution of the method depends on the result set of the previous method.

Because Stream covers so much, I have split the material into two articles. This is the first; I have tried to make it thorough and accurate, with simple and plentiful examples.

The second article covers only the terminal operations, but their API is relatively complex, so it is just as detailed, again with simple and plentiful examples. The two articles are of similar length. Stay tuned.


Note: my local machine runs JDK 11 and I forgot to switch to JDK 8 while writing, so List.of() appears throughout the examples. List.of() is not available in JDK 8 (it was added in JDK 9); on JDK 8, use Arrays.asList() instead.
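For readers on JDK 8, the examples can be adapted roughly as follows. This is a minimal sketch; the class name Jdk8ListDemo and the method greaterThanTwo are made up for illustration. Keep in mind that the two factories are not identical: List.of() returns an unmodifiable list and rejects null, while Arrays.asList() returns a fixed-size list backed by an array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Jdk8ListDemo {
    // JDK 8 compatible: Arrays.asList stands in for List.of
    static List<Integer> greaterThanTwo() {
        List<Integer> source = Arrays.asList(1, 2, 3);
        return source.stream()
                .filter(i -> i > 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(greaterThanTwo()); // prints [3]
    }
}
```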

1. Why use Stream?

It all started with the release of JDK 8. At a time when functional programming languages were in full swing, Java was criticized as bloated (heavily object-oriented), and the community urgently needed Java to add functional language features to improve the situation. Finally, JDK 8 was released in 2014.

In my opinion, the biggest new features in JDK 8 are functional interfaces and lambda expressions, both borrowed from functional programming.

The addition of functional features helped Java consolidate its standing and made it more elegant.

Stream is a library that JDK 8 built for the collection classes on top of these two features. It lets us process the data in a collection as a pipeline using lambda expressions, and easily perform operations such as filtering, grouping, collecting and reducing. That is why I like to call Stream the best practice of functional interfaces.

1.1 Clearer code structure

Stream gives code a clearer structure. To show how, suppose we have a very simple requirement: find all elements greater than 2 in a collection.

First, the version without Stream:

        List<Integer> list = List.of(1, 2, 3);
        
        List<Integer> filterList = new ArrayList<>();
        
        for (Integer i : list) {
            if (i > 2) {
                filterList.add(i);
            }
        }
        
        System.out.println(filterList);

The code above is easy to understand, so I will not explain it further. It is acceptable here because the requirement is simple. But what if there are more requirements?

Each additional requirement adds another condition to the if. In real development, objects often have many fields, so there may be four or five conditions. In the end the code may look like this:

        List<Integer> list = List.of(1, 2, 3);

        List<Integer> filterList = new ArrayList<>();

        for (Integer i : list) {
            if (i > 2 && i < 10 && (i % 2 == 0)) {
                filterList.add(i);
            }
        }

        System.out.println(filterList);

With many conditions in the if, the code looks messy. Even that is tolerable. The bigger problem is that a project often has many similar requirements that differ in only one condition. You then copy a large block of code, change it, and ship it, and the codebase ends up with a lot of duplicated code.

With Stream, everything becomes clear and easy to understand:

        // assumes: import static java.util.stream.Collectors.toList;
        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

In this code you only need to focus on what matters most: the filter conditions. The method name filter makes it clear that it is a filter condition, and the method name collect shows that it is a collector that gathers the final result into a List.

You may also have noticed that the code above contains no loop. Why not?

Because Stream performs the loop for us implicitly. This is called internal iteration, as opposed to the external iteration we usually write.

So even though you did not write a loop, a loop still runs.
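The duplicated-code problem described above (many similar requirements that differ in a single condition) can be eased by naming each condition once and combining conditions where needed. The following is only a sketch of that idea; the class PredicateReuseDemo, the constants and the select helper are hypothetical names, not part of the original example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateReuseDemo {
    // Each condition is named once and can be reused by any pipeline.
    static final Predicate<Integer> GREATER_THAN_TWO = i -> i > 2;
    static final Predicate<Integer> LESS_THAN_TEN = i -> i < 10;
    static final Predicate<Integer> IS_EVEN = i -> i % 2 == 0;

    // The loop is run by Stream (internal iteration); only the condition varies.
    static List<Integer> select(List<Integer> source, Predicate<Integer> condition) {
        return source.stream()
                .filter(condition)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 8, 11, 12);
        // Predicate.and / negate combine conditions without copying any loop code.
        System.out.println(select(numbers, GREATER_THAN_TWO.and(LESS_THAN_TEN).and(IS_EVEN))); // [4, 8]
        System.out.println(select(numbers, GREATER_THAN_TWO.and(IS_EVEN.negate())));           // [3, 11]
    }
}
```

A similar requirement with one changed condition then becomes a one-line call to select rather than a copied loop.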

1.2 No need to worry about variable state

Stream is designed to be immutable. This immutability has two meanings:

  1. Each stream operation produces a new stream, so a stream is immutable, much like a String.
  2. A stream only holds a reference to the original collection. Operations that change elements, such as map, produce new elements from the original ones, so stream operations do not modify the original collection.

The first meaning is what makes chained calls possible, and in practice we almost always chain calls when using Stream. The second is a major feature of functional programming: no modification of state.

Whatever operations are performed on the stream, the original collection is not affected in the end; the return value is computed from the original collection. (This holds as long as the lambdas you pass in do not themselves mutate the element objects.)

So when using Stream we do not have to worry about side effects on the original collection. Just use it.
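The second meaning can be checked with a small experiment. This is a minimal sketch with made-up names (SourceUnchangedDemo, timesTen). Note the limit of the guarantee: it concerns the collection itself. If a lambda mutates a mutable element object, for example by calling a setter, that change is visible through the original collection too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class SourceUnchangedDemo {
    static List<Integer> timesTen(List<Integer> source) {
        // map builds new elements; it never writes back into the source list
        return source.stream()
                .map(i -> i * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> result = timesTen(original);
        System.out.println(original); // [1, 2, 3]
        System.out.println(result);   // [10, 20, 30]
    }
}
```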

For more on functional programming, see Ruan Yifeng's article on functional programming.

1.3 Delayed execution and optimization

A stream is executed only when it reaches a terminal operation. For example:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println);

This code does nothing when run. The peek method can be thought of as a forEach that passes the elements along; here I use it to print the elements in the stream.

Because filter and peek are both intermediate operations, they do not trigger execution.

If we append a count call, the pipeline executes as expected:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println)
                .count();

count is a terminal operation that computes the number of elements in the stream; its return value is a long.

This property of Stream, that nothing is executed without a terminal operation, is called delayed execution (also known as lazy evaluation).
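One way to observe this behaviour without relying on console output is to record which elements actually pass through the pipeline. The sketch below uses hypothetical names (LazyExecutionDemo, seen, buildPipeline); it deliberately includes a filter step so that count has to traverse the elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyExecutionDemo {
    // Records every element that actually flows through the pipeline.
    static final List<Integer> seen = new ArrayList<>();

    // Only describes the work; no element is processed here.
    static Stream<Integer> buildPipeline() {
        return List.of(1, 2, 3).stream()
                .filter(i -> i > 1)
                .peek(seen::add);
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline();
        System.out.println(seen);    // [] because nothing has run yet

        long count = pipeline.count(); // the terminal operation triggers execution
        System.out.println(count);   // 2
        System.out.println(seen);    // [2, 3]
    }
}
```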

Stream also applies an optimization called loop merging to the stateless methods in the API. See Section 3 for concrete examples.

2. Create a stream

For the sake of completeness, I decided to include a section on creating streams. It introduces the common ways of creating a stream, which fall into two cases:

  1. Create using the Stream interface
  2. Create through the collection class library

We will also cover parallel streams and concatenating streams. Both create a stream as well, but with different characteristics.

2.1 Create through the Stream interface

Stream is an interface, and it defines several static methods that serve as APIs for creating streams:

    public static<T> Stream<T> of(T... values) {
        return Arrays.stream(values);
    }

The first is the of method. It takes a generic varargs parameter and creates a Stream of that generic type. If the arguments are primitive values, autoboxing wraps them in the corresponding wrapper type:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<Double> doubleStream = Stream.of(1.1d, 2.2d, 3.3d);

        Stream<String> stringStream = Stream.of("1", "2", "3");

You can also create an empty stream directly by calling another static method, empty(). Here its type parameter is Object:

        Stream<Object> empty = Stream.empty();

The methods above are easy to understand. There is also a way to create a stream with an unlimited number of elements, generate():

    public static<T> Stream<T> generate(Supplier<? extends T> s) {
        Objects.requireNonNull(s);
        return StreamSupport.stream(
                new StreamSpliterators.InfiniteSupplyingSpliterator.OfRef<>(Long.MAX_VALUE, s), false);
    }

Judging from the method parameter, it accepts a functional interface, Supplier. This functional interface is used to create objects; you can think of it as an object factory. Stream puts the objects produced by this factory into the stream:

        Stream<String> generate = Stream.generate(() -> "Supplier");

        Stream<Integer> generateInteger = Stream.generate(() -> 123);

Here I use a lambda to construct the Supplier directly. You can also pass in a Supplier object; the stream obtains its elements by calling the get() method of the Supplier interface.
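Because a stream created by generate() never runs out of elements, it is normally combined with a short-circuiting step such as limit before a terminal operation. A minimal sketch, assuming a simple counter as the object factory (GenerateDemo, firstThree and nextId are illustrative names):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateDemo {
    static List<Integer> firstThree() {
        AtomicInteger counter = new AtomicInteger();
        // A Supplier passed as an object instead of an inline lambda
        Supplier<Integer> nextId = counter::incrementAndGet;
        return Stream.generate(nextId)
                .limit(3) // without limit, collect would never finish
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThree()); // [1, 2, 3]
    }
}
```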

2.2 Create through the collection class library

Compared with the approach above, this second one is more common. We usually operate on a stream obtained from a collection rather than building a stream by hand:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();
        
        Stream<String> stringStreamList = List.of("1", "2", "3").stream(); 

In Java 8, the top-level collection interface Collection gained a new default method, stream(). Through this method we can easily create a stream from any Collection implementation:

        Stream<Integer> listStream = List.of(1, 2, 3).stream();
        
        Stream<Integer> setStream = Set.of(1, 2, 3).stream();

Looking at the source code, you can see that stream() essentially creates the stream by calling a stream utility class, StreamSupport:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

2.3 Creating parallel streams

All the streams in the examples above are serial (sequential) streams. In some scenarios, to make the most of multi-core CPUs, we can use parallel streams, which run in parallel on the fork/join framework introduced in JDK 7. Parallel streams can be created in the following ways:

        Stream<Integer> integerParallelStream = Stream.of(1, 2, 3).parallel();

        Stream<String> stringParallelStream = Stream.of("1", "2", "3").parallel();

        Stream<Integer> integerParallelStreamList = List.of(1, 2, 3).parallelStream();

        Stream<String> stringParallelStreamList = List.of("1", "2", "3").parallelStream();

Indeed, the static methods of Stream offer no way to create a parallel stream directly. We have to call parallel() after constructing the stream, because calling parallel() does not create a new parallel stream object; it sets a parallel flag on the existing stream object.

The Collection interface, on the other hand, can create a parallel stream directly: just call parallelStream(), the counterpart of stream(). As just mentioned, the two differ only in one argument:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

In general, however, we do not need parallel streams. As a rough rule of thumb, with fewer than 1,000 elements in the stream there is little performance gain, because distributing the elements to different CPUs for computation has its own cost.

The advantage of parallelism is making full use of multi-core CPUs, but in practice the data usually has to be split first and then handed to each CPU. Array-backed data can be split easily; linked-list or hash-based data is clearly less convenient to split.

So only when the stream has 10,000 elements or more is a parallel stream likely to bring a noticeable performance improvement.

Finally, when you have a parallel stream, you can easily convert it back to a serial stream with sequential():

        Stream.of(1, 2, 3).parallel().sequential();
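The parallel flag discussed in this section can be inspected with isParallel(), which is part of the stream API. The following sketch (ParallelFlagDemo and flags are made-up names) shows the flag for each way of creating or converting a stream:

```java
import java.util.List;
import java.util.stream.Stream;

class ParallelFlagDemo {
    static List<Boolean> flags() {
        return List.of(
                List.of(1, 2, 3).stream().isParallel(),                     // serial stream from a collection
                List.of(1, 2, 3).parallelStream().isParallel(),             // parallel stream from a collection
                Stream.of(1, 2, 3).parallel().isParallel(),                 // flag switched on
                Stream.of(1, 2, 3).parallel().sequential().isParallel());   // flag switched off again
    }

    public static void main(String[] args) {
        System.out.println(flags()); // [false, true, true, false]
    }
}
```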

2.4 Concatenating streams

If you construct two streams in two different places and want to use them together, you can use concat():

        Stream<Integer> concat = Stream
                .concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6));

If two streams with different element types are concatenated, type inference finds a common supertype of the two element types:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<String> stringStream = Stream.of("1", "2", "3");

        Stream<? extends Serializable> stream = Stream.concat(integerStream, stringStream);

3. Stateless methods of the intermediate operations

Stateless method: the execution of the method does not depend on the result set produced by the previous method.

There are roughly three commonly used stateless APIs in Stream:

  1. map(): takes a Function object, which lets you apply a custom operation to each element in the collection and keeps the results of that operation.
  2. filter(): takes a Predicate object, whose result is a boolean. As its name suggests, this method keeps only the elements for which the predicate returns true, so we can use it for filtering.
  3. flatMap(): like map(), takes a Function object, but the function must return a Stream. This method merges the elements of multiple streams into one stream.

Let’s take a look at an example of the map () method:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

        Stream<Integer> mapStream = integerStreamList.map(i -> i * 10);

We have a list and want to multiply each element in it by 10, which can be written as above. Here i is the variable name for an element of the list, and the logic after -> is the operation to perform on that element. A piece of code is passed in concisely and clearly to be executed, and the call finally returns a new stream containing the results.

To help you understand better, I drew a diagram:

[Figure: map() turning each element of the stream into a new element]


Next is an example of the filter () method:

        Stream<Integer> integerStreamList = List.of(10, 20, 30).stream();

        Stream<Integer> filterStream = integerStreamList.filter(i -> i >= 20);

In this code, the logic i >= 20 is evaluated for each element, and the elements for which it returns true are saved in a new stream, which is then returned.

Here is a simple diagram for this as well:

[Figure: filter() keeping only the elements that are at least 20]


I described flatMap() above, but the description is a little too abstract. While learning this method I also looked up many examples before I understood it well.

According to the official documentation, this method is used to flatten a one-to-many mapping of elements:

        List<Order> orders = List.of(new Order(), new Order());

        Stream<Item> itemStream = orders.stream()
                .flatMap(order -> order.getItemList().stream());

Here I use orders to illustrate the method. Each order contains a list of items. If I want to combine the item lists of the two orders into one new item list, I need flatMap().

In the code above, each order yields a stream of its item list. We have only two orders in this example, so two item-list streams are produced. flatMap() extracts the elements of these two streams and puts them into one new stream.
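The order example can be made runnable with minimal stand-in classes. The Order and Item classes below are hypothetical (the article does not show their definitions), and the item names are invented purely for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // Hypothetical minimal stand-ins for the Item and Order types used above
    static class Item {
        final String name;
        Item(String name) { this.name = name; }
    }

    static class Order {
        private final List<Item> itemList;
        Order(List<Item> itemList) { this.itemList = itemList; }
        List<Item> getItemList() { return itemList; }
    }

    static List<String> allItemNames(List<Order> orders) {
        return orders.stream()
                .flatMap(order -> order.getItemList().stream()) // one stream per order, flattened into one
                .map(item -> item.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(List.of(new Item("pen"), new Item("ink"))),
                new Order(List.of(new Item("paper"))));
        System.out.println(allItemNames(orders)); // [pen, ink, paper]
    }
}
```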

As usual, here is a simple diagram:

[Figure: flatMap() merging the two item-list streams into one stream]

In the diagram I use cyan to represent a stream. In the final output you can see that flatMap() turns the two streams into a single output stream, which is very useful in scenarios like the order example above.


There is also a less commonly used stateless method, peek():

    Stream<T> peek(Consumer<? super T> action);

peek accepts a Consumer object as its parameter, that is, a function with no return value. We can use peek for things like printing elements:

        Stream<Integer> peekStream = integerStreamList.peek(i -> System.out.println(i));

However, if you are not familiar with it, I do not recommend using it, because in some cases it does not take effect. For example:

        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(System.out::println)
                .count();

The API documentation also notes that this method exists mainly for debugging. In my experience, peek runs only when the stream actually needs to produce the elements.

In the example above, count only needs to return the number of elements, so peek is not executed (since JDK 9, count() may skip the pipeline when the number of elements can be computed directly from the source; on JDK 8 peek still runs here). If you switch to collect, peek is executed.

It is also executed if the pipeline contains a filtering step such as filter, or if the terminal operation is one of the match methods.
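The two cases in which peek does run can be checked by counting its invocations. This is a sketch with invented names (PeekDemo and its two methods). The map-plus-count case is left out on purpose, since its behaviour depends on the JDK version.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class PeekDemo {
    // collect needs every element, so peek sees all three
    static int peekCallsWithCollect() {
        AtomicInteger calls = new AtomicInteger();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(i -> calls.incrementAndGet())
                .collect(Collectors.toList());
        return calls.get();
    }

    // filter makes the element count unknown in advance, so count() has to traverse
    static int peekCallsWithFilterAndCount() {
        AtomicInteger calls = new AtomicInteger();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .filter(i -> i > 10)
                .peek(i -> calls.incrementAndGet())
                .count();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(peekCallsWithCollect());        // 3
        System.out.println(peekCallsWithFilterAndCount()); // 2 (only 20 and 30 pass the filter)
    }
}
```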

3.1 Primitive type streams

The previous section covered the three most commonly used stateless methods of Stream. Among the stateless methods there are also several counterparts of map() and flatMap():

  1. mapToInt
  2. mapToLong
  3. mapToDouble
  4. flatMapToInt
  5. flatMapToLong
  6. flatMapToDouble

As the names suggest, these six methods merely convert the return value of map() or flatMap(). On the face of it there is no need to single them out as separate methods; their key point lies in the return type:

  1. mapToInt returns an IntStream
  2. mapToLong returns a LongStream
  3. mapToDouble returns a DoubleStream
  4. flatMapToInt returns an IntStream
  5. flatMapToLong returns a LongStream
  6. flatMapToDouble returns a DoubleStream

Java provides a wrapper class for each of the eight primitive data types, and since JDK 5 values are boxed and unboxed automatically when you use a primitive type, that is, the conversion methods of the wrapper class are called for you.

For example, earlier I used this:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

I passed primitive values when creating the stream, and they were automatically boxed into Integer. Automatic boxing and unboxing has a cost that we tend to overlook. To avoid this cost when using streams, we can switch to a stream designed for primitive types:

  1. IntStream: for int, short, char and boolean among the primitive types
  2. LongStream: for long
  3. DoubleStream: for double and float

These interfaces also provide an of method for constructing a stream, as in the example above, and no automatic boxing or unboxing takes place.

So the six methods mentioned above convert an ordinary stream into one of these primitive type streams, which can give us higher efficiency when we need it.

The primitive type streams offer largely the same API as Stream, so once you know how to use Stream you also know how to use them.

Note: IntStream, LongStream and DoubleStream are all interfaces, but they do not extend the Stream interface.
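A short sketch of moving between an ordinary stream and a primitive type stream. The class and method names (PrimitiveStreamDemo, sum, squaresBoxed) are illustrative; mapToInt, IntStream.of, sum and boxed are standard API methods.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreamDemo {
    static int sum(List<Integer> source) {
        // mapToInt switches to IntStream, which offers sum() directly
        return source.stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    static List<Integer> squaresBoxed() {
        // IntStream.of works on int values; boxed() converts back to Stream<Integer>
        return IntStream.of(1, 2, 3)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3))); // 6
        System.out.println(squaresBoxed());        // [1, 4, 9]
    }
}
```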

3.2 Loop merging of stateless methods

Now that the stateless methods have been covered, let's look again at an example from earlier in this article:

        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

In this example I call filter three times. Do you think the stream loops over the elements three times to filter them?

If one of the filters is changed to map, how many loops do you think there will be?

        List<Integer> list = List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

Intuitively, we would first process all elements with map and then filter them with the filter methods, which takes three loops.

Recalling the definition of stateless methods, however, you can see that all three steps can be done in a single loop: filter depends only on the result map computed for the current element, not on the complete result set of map. As long as map is applied before filter for each element, they can all be completed in one loop. This optimization is called loop merging.

All stateless methods can be executed in the same loop, and they can also easily be run on multiple CPUs with parallel streams.
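The single loop can be made visible by logging each step. In the sketch below (LoopMergingDemo and trace are made-up names, and the filter condition is chosen only for the demonstration), the log shows map and filter alternating element by element on a sequential stream, rather than all map calls followed by all filter calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LoopMergingDemo {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> { log.add("map " + i); return i * 10; })
                .filter(i -> { log.add("filter " + i); return i < 25; })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Each element goes through map and then filter before the next element starts
        System.out.println(trace());
        // [map 1, filter 10, map 2, filter 20, map 3, filter 30]
    }
}
```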

4. Stateful methods of the intermediate operations

With the stateless methods covered, the stateful methods are relatively simple. Their names alone tell you what they do:

| Method | Result |
| --- | --- |
| distinct() | Removes duplicate elements. |
| sorted() | Sorts the elements. There are two overloads; a Comparator can be passed in when needed. |
| limit(long maxSize) | Keeps only the first maxSize elements. |
| skip(long n) | Skips the first n elements and keeps the rest. |
| takeWhile(Predicate predicate) | Added in JDK 9. Keeps elements while the predicate is true and stops at the first element for which it is false. |
| dropWhile(Predicate predicate) | Added in JDK 9. Drops elements while the predicate is true; from the first element for which it is false, keeps that element and everything after it. |

Those are all the stateful methods. Their execution depends on the result set of the previous method; sorting, for example, needs the result set of the previous method before it can sort.
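The behaviour of each stateful method can be compared on one small input. This sketch requires JDK 9 or later because of takeWhile and dropWhile; StatefulMethodsDemo, SOURCE and apply are illustrative names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatefulMethodsDemo {
    static final List<Integer> SOURCE = List.of(3, 1, 3, 2, 5, 4);

    // Applies one intermediate operation to a fresh stream of SOURCE and collects the result.
    static List<Integer> apply(UnaryOperator<Stream<Integer>> op) {
        return op.apply(SOURCE.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(s -> s.distinct()));                        // [3, 1, 2, 5, 4]
        System.out.println(apply(s -> s.sorted()));                          // [1, 2, 3, 3, 4, 5]
        System.out.println(apply(s -> s.sorted(Comparator.reverseOrder()))); // [5, 4, 3, 3, 2, 1]
        System.out.println(apply(s -> s.limit(2)));                          // [3, 1]
        System.out.println(apply(s -> s.skip(4)));                           // [5, 4]
        System.out.println(apply(s -> s.takeWhile(i -> i != 2)));            // [3, 1, 3]
        System.out.println(apply(s -> s.dropWhile(i -> i != 2)));            // [2, 5, 4]
    }
}
```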

limit and takeWhile are also short-circuiting operations, which means higher efficiency: we may already have the elements we want before the internal loop has run to completion.

So stateful methods cannot all be merged into one loop the way stateless methods are; a method such as sorted() needs its own pass over the complete upstream result. The order in which the calls are written therefore affects both the result and the performance of the program, and I hope readers will pay attention to it during development.
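The separate pass can be observed by logging on both sides of sorted(). In this sketch (SortedBarrierDemo and trace are invented names), every element reaches the step before sorted() before any element leaves it. This is also why placing a filter before sorted() rather than after it means fewer elements have to be sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class SortedBarrierDemo {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(3, 1, 2).stream()
                .peek(i -> log.add("before " + i))
                .sorted()                          // must see all elements before emitting any
                .peek(i -> log.add("after " + i))
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace());
        // [before 3, before 1, before 2, after 1, after 2, after 3]
    }
}
```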

5. Summary

This article gives an overview of Stream and describes two of its characteristics:

  1. Immutable: it does not affect the original collection, and each call returns a new stream.
  2. Delayed execution: a stream does not execute until it reaches a terminal operation.

It also divides the Stream API into intermediate operations and terminal operations and explains all the common intermediate operations. The next article will focus on terminal operations.

While reading the Stream source code, I found something interesting: in the ReferencePipeline class (the implementation class of Stream), the methods are ordered from top to bottom exactly as stateless methods → stateful methods → aggregation methods.

Well, after this article I believe you have a clear understanding of Stream as a whole and of the intermediate operation API; after all, there are not that many methods. Java 8 has many other powerful features, which we will talk about next time.


The following books were also consulted while writing this article:

These three books are all very good. The first is written by the author of Core Java. If you want a full picture of what changed in JDK 8, read this one.

The second is more of a booklet, only a little over 100 pages, very short, and mainly discusses functional ideas.

If you can only read one, I recommend the third. Its Douban score is as high as 9.2, and both content and quality are excellent.