This example-driven tutorial gives an in-depth overview of Java 8 streams. When I first read about the Stream API, I was confused about the name since it sounds similar to InputStream and OutputStream from Java I/O. But Java 8 streams are a completely different thing. Streams are monads, thus playing a big part in bringing functional programming to Java:
In functional programming, a monad is a structure that represents computations defined as sequences of steps. A type with a monad structure defines what it means to chain operations, or nest functions of that type together.
In functional programming, a monad is a structure that represents computations defined as sequences of steps. A type with a monad structure defines what it means to chain operations, or nest functions of that type together.
This guide teaches you how to work with Java 8 streams and how to use the different kinds of available stream operations. You'll learn about the processing order and how the ordering of stream operations affects runtime performance. The more powerful stream operations collect, flatMap and reduce are covered in detail. The tutorial ends with an in-depth look at parallel streams.
If you're not yet familiar with Java 8 lambda expressions, functional interfaces and method references, you probably want to read my Java 8 Tutorial first before starting with this tutorial.
How streams work
A stream represents a sequence of elements and supports different kinds of operations to perform computations upon those elements:
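The original code sample is not reproduced here, but a minimal sketch of such a pipeline could look like this (the list contents are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamIntro {
    static final List<String> myList = Arrays.asList("a1", "a2", "b1", "c2", "c1");

    static List<String> pipeline() {
        return myList.stream()                   // create a sequential stream
            .filter(s -> s.startsWith("c"))      // intermediate operation
            .map(String::toUpperCase)            // intermediate operation
            .sorted()                            // intermediate operation
            .collect(Collectors.toList());       // gather the results
    }

    public static void main(String[] args) {
        pipeline().forEach(System.out::println); // prints C1 then C2
    }
}
```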
Stream operations are either intermediate or terminal. Intermediate operations return a stream, so we can chain multiple intermediate operations without using semicolons. Terminal operations are either void or return a non-stream result. In the above example filter, map and sorted are intermediate operations, whereas forEach is a terminal operation. For a full list of all available stream operations see the Stream Javadoc. Such a chain of stream operations as seen in the example above is also known as an operation pipeline.
Most stream operations accept some kind of lambda expression parameter, a functional interface specifying the exact behavior of the operation. Most of those operations must be both non-interfering and stateless. What does that mean?
A function is non-interfering when it does not modify the underlying data source of the stream, e.g. in the above example no lambda expression modifies myList by adding or removing elements from the collection.
A function is stateless when the execution of the operation is deterministic, e.g. in the above example no lambda expression depends on any mutable variables or states from the outer scope which might change during execution.
Different kinds of streams
Streams can be created from various data sources, especially collections. Lists and Sets support the new methods stream() and parallelStream() to create either a sequential or a parallel stream. Parallel streams are capable of operating on multiple threads and will be covered in a later section of this tutorial. We focus on sequential streams for now.
Calling the method stream() on a list of objects returns a regular object stream. But we don't have to create collections in order to work with streams, as we see in the next code sample:
Just use Stream.of() to create a stream from a bunch of object references.
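A minimal sketch of this, with illustrative element values:

```java
import java.util.Optional;
import java.util.stream.Stream;

public class StreamOf {
    static Optional<String> first() {
        // no collection needed: build a stream from object references directly
        return Stream.of("a1", "a2", "a3")
            .findFirst();                     // terminal operation
    }

    public static void main(String[] args) {
        first().ifPresent(System.out::println); // prints a1
    }
}
```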
Besides regular object streams, Java 8 ships with special kinds of streams for working with the primitive data types int, long and double. As you might have guessed, those are IntStream, LongStream and DoubleStream. IntStreams can replace the regular for-loop utilizing IntStream.range():
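For example, a sketch of an index loop replacement (the bounds are illustrative; the upper bound is exclusive):

```java
import java.util.stream.IntStream;

public class IntStreamRange {
    static int[] values() {
        // IntStream.range(1, 4) yields 1, 2, 3 just like for (int i = 1; i < 4; i++)
        return IntStream.range(1, 4).toArray();
    }

    public static void main(String[] args) {
        IntStream.range(1, 4).forEach(System.out::println); // prints 1, 2, 3
    }
}
```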
All those primitive streams work just like regular object streams, with the following differences: primitive streams use specialized lambda expressions, e.g. IntFunction instead of Function or IntPredicate instead of Predicate. And primitive streams support the additional terminal aggregate operations sum() and average().
Sometimes it's useful to transform a regular object stream to a primitive stream or vice versa. For that purpose, object streams support the special mapping operations mapToInt(), mapToLong() and mapToDouble(). Primitive streams can be transformed to object streams via mapToObj().
Here's a combined example: a stream of doubles is first mapped to an int stream and then mapped to an object stream of strings:
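A sketch of this conversion chain, with illustrative values:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DoubleToIntToString {
    static List<String> convert() {
        return Stream.of(1.0, 2.0, 3.0)
            .mapToInt(Double::intValue)   // Stream<Double> -> IntStream (1, 2, 3)
            .mapToObj(i -> "a" + i)       // IntStream -> Stream<String>
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        convert().forEach(System.out::println); // prints a1, a2, a3
    }
}
```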
Now that we've learned how to create and work with different kinds of streams, let's dive deeper into how stream operations are processed under the hood.
An important characteristic of intermediate operations is laziness. Look at this sample where a terminal operation is missing:
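Such a snippet could look like this; the filter lambda counts its own invocations so we can verify that it never runs:

```java
import java.util.stream.Stream;

public class LazyStream {
    static int filterCalls = 0;

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .filter(s -> {
                filterCalls++;
                System.out.println("filter: " + s);
                return true;
            });
        // no terminal operation: the filter lambda is never invoked,
        // so nothing is printed and filterCalls stays 0
    }
}
```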
When executing this code snippet, nothing is printed to the console. That is because intermediate operations will only be executed when a terminal operation is present.
Let's extend the above example by the terminal operation forEach:
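A sketch of the extended snippet; the log list records the exact invocation order so the vertical processing is visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class VerticalOrder {
    static final List<String> log = new ArrayList<>();

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .filter(s -> {
                log.add("filter: " + s);
                return true;
            })
            .forEach(s -> log.add("forEach: " + s));

        // each element moves down the whole chain before the next one starts:
        // filter: d2, forEach: d2, filter: a2, forEach: a2, ...
        log.forEach(System.out::println);
    }
}
```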
Executing this code snippet results in the desired output on the console. The order of the output might be surprising. A naive approach would be to execute the operations horizontally, one after another, on all elements of the stream. But instead each element moves along the chain vertically. The first string "d2" passes filter, then forEach; only then is the second string "a2" processed.
This behavior can reduce the actual number of operations performed on each element, as we see in the next example:
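A sketch of such a short-circuiting chain; the counter shows how many elements are actually mapped:

```java
import java.util.stream.Stream;

public class AnyMatchShortCircuit {
    static int mapCalls = 0;

    static boolean match() {
        return Stream.of("d2", "a2", "b1", "b3", "c")
            .map(s -> {
                mapCalls++;
                System.out.println("map: " + s);
                return s.toUpperCase();
            })
            .anyMatch(s -> s.startsWith("A")); // stops as soon as one element matches
    }

    public static void main(String[] args) {
        System.out.println(match()); // true; map only ran for "d2" and "a2"
    }
}
```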
The operation anyMatch returns true as soon as the predicate applies to the given input element. This is true for the second element passed, "A2". Due to the vertical execution of the stream chain, map has only to be executed twice in this case. So instead of mapping all elements of the stream, map will be called as few times as possible.
Why order matters
The next example consists of two intermediate operations, map and filter, and the terminal operation forEach. Let's once again inspect how those operations are being executed:
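A sketch of this chain with invocation counters (element values are illustrative):

```java
import java.util.stream.Stream;

public class MapFirst {
    static int mapCalls = 0, filterCalls = 0, forEachCalls = 0;

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .map(s -> { mapCalls++; return s.toUpperCase(); })
            .filter(s -> { filterCalls++; return s.startsWith("A"); })
            .forEach(s -> { forEachCalls++; System.out.println("forEach: " + s); });

        // map and filter each ran five times, forEach only once (for "A2")
        System.out.format("map: %d, filter: %d, forEach: %d%n",
            mapCalls, filterCalls, forEachCalls);
    }
}
```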
As you might have guessed, both map and filter are called five times, once for every string in the underlying collection, whereas forEach is only called once.
We can greatly reduce the actual number of executions if we change the order of the operations, moving filter to the beginning of the chain:
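The reordered chain might look like this:

```java
import java.util.stream.Stream;

public class FilterFirst {
    static int mapCalls = 0, filterCalls = 0;

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .filter(s -> { filterCalls++; return s.startsWith("a"); })
            .map(s -> { mapCalls++; return s.toUpperCase(); })
            .forEach(s -> System.out.println("forEach: " + s));

        // filter still ran five times, but map only once (for "a2")
    }
}
```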
Now, map is only called once, so the operation pipeline performs much faster for larger numbers of input elements. Keep that in mind when composing complex method chains.
Let's extend the above example by an additional intermediate operation, sorted:
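A sketch with a logging comparator so we can watch the sort happen:

```java
import java.util.stream.Stream;

public class SortFirst {
    static int sortCalls = 0;

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .sorted((s1, s2) -> {
                sortCalls++;
                System.out.format("sort: %s; %s%n", s1, s2);
                return s1.compareTo(s2);
            })
            .filter(s -> s.startsWith("a"))
            .map(String::toUpperCase)
            .forEach(s -> System.out.println("forEach: " + s));

        System.out.println("comparator calls: " + sortCalls);
    }
}
```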
Sorting is a special kind of intermediate operation. It's a so-called stateful operation, since in order to sort a collection of elements you have to maintain state during ordering.
Executing this example shows that the sort operation is executed first, on the entire input collection. In other words, sorted is executed horizontally. So in this case sorted is called eight times, for multiple combinations of elements in the input collection.
Once again we can optimize the performance by reordering the chain:
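The reordered chain might look like this; with only one element left after filtering, the comparator never fires:

```java
import java.util.stream.Stream;

public class SortedSkipped {
    static int sortCalls = 0;

    public static void main(String[] args) {
        Stream.of("d2", "a2", "b1", "b3", "c")
            .filter(s -> s.startsWith("a"))        // leaves a single element
            .sorted((s1, s2) -> { sortCalls++; return s1.compareTo(s2); })
            .map(String::toUpperCase)
            .forEach(s -> System.out.println("forEach: " + s));

        // sorting a one-element stream needs no comparisons: sortCalls == 0
    }
}
```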
In this example sorted is never called, because filter reduces the input collection to just one element. So the performance is greatly increased for larger input collections.
Java 8 streams cannot be reused. As soon as you call any terminal operation the stream is closed:
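A sketch demonstrating the problem; the second terminal operation fails:

```java
import java.util.stream.Stream;

public class StreamReuse {
    static boolean reuseFailed = false;

    public static void main(String[] args) {
        Stream<String> stream = Stream.of("d2", "a2", "b1", "b3", "c")
            .filter(s -> s.startsWith("a"));

        stream.anyMatch(s -> true);        // ok, terminal operation closes the stream
        try {
            stream.noneMatch(s -> true);   // throws IllegalStateException
        } catch (IllegalStateException e) {
            reuseFailed = true;
            System.out.println(e.getMessage());
        }
    }
}
```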
Calling noneMatch after anyMatch on the same stream results in an IllegalStateException complaining that the stream has already been operated upon or closed.
To overcome this limitation we have to create a new stream chain for every terminal operation we want to execute, e.g. we could create a stream supplier to construct a new stream with all intermediate operations already set up:
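Such a supplier could look like this:

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

public class StreamSupplier {
    static final Supplier<Stream<String>> streamSupplier =
        () -> Stream.of("d2", "a2", "b1", "b3", "c")
                .filter(s -> s.startsWith("a"));

    public static void main(String[] args) {
        // each get() builds a fresh stream, so no IllegalStateException
        System.out.println(streamSupplier.get().anyMatch(s -> true));  // true
        System.out.println(streamSupplier.get().noneMatch(s -> true)); // false
    }
}
```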
Each call to get() constructs a new stream on which we are safe to call the desired terminal operation.
Streams support plenty of different operations. We've already learned about the most important operations like filter and map. I leave it up to you to discover all other available operations (see the Stream Javadoc). Instead, let's dive deeper into the more complex operations collect, flatMap and reduce.
Most code samples from this section use the following list of persons for demonstration purposes:
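The original list isn't reproduced here; a minimal Person class with illustrative sample data (the ages sum to 76, matching the reduction results discussed later) could look like this:

```java
import java.util.Arrays;
import java.util.List;

public class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name;
    }

    // illustrative sample data for the following examples
    static List<Person> persons() {
        return Arrays.asList(
            new Person("Max", 18),
            new Person("Peter", 23),
            new Person("Pamela", 23),
            new Person("David", 12));
    }
}
```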
Collect is an extremely useful terminal operation to transform the elements of the stream into a different kind of result, e.g. a List, Set or Map. Collect accepts a Collector which consists of four different operations: a supplier, an accumulator, a combiner and a finisher. This sounds super complicated at first, but the good part is that Java 8 supports various built-in collectors via the Collectors class. So for the most common operations you don't have to implement a collector yourself.
Let's start with a very common use case:
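A sketch of collecting filtered elements into a list (using plain name strings for brevity):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectToList {
    static List<String> filtered() {
        List<String> names = Arrays.asList("Max", "Peter", "Pamela", "David");
        return names.stream()
            .filter(name -> name.startsWith("P"))
            .collect(Collectors.toList());       // or Collectors.toSet() for a Set
    }

    public static void main(String[] args) {
        System.out.println(filtered()); // [Peter, Pamela]
    }
}
```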
As you can see, it's very simple to construct a list from the elements of a stream. Need a set instead of a list? Just use Collectors.toSet().
The next example groups all persons by age:
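A sketch of the grouping, with the Person type redefined here to keep the sample self-contained:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByAge {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public String toString() { return name; }
    }

    static Map<Integer, List<Person>> byAge() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        return persons.stream()
            .collect(Collectors.groupingBy(p -> p.age)); // age -> persons of that age
    }

    public static void main(String[] args) {
        byAge().forEach((age, p) -> System.out.format("age %s: %s%n", age, p));
    }
}
```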
Collectors are extremely versatile. You can also create aggregations on the elements of the stream, e.g. determining the average age of all persons:
If you're interested in more comprehensive statistics, the summarizing collectors return a special built-in summary statistics object. So we can simply determine min, max and arithmetic average age of the persons as well as the sum and count.
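A sketch of both aggregations, operating on plain ages for brevity (in the full example the values would be extracted from Person objects):

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class AgeStatistics {
    static final List<Integer> ages = Arrays.asList(18, 23, 23, 12);

    static IntSummaryStatistics stats() {
        return ages.stream().collect(Collectors.summarizingInt(a -> a));
    }

    public static void main(String[] args) {
        Double average = ages.stream().collect(Collectors.averagingInt(a -> a));
        System.out.println(average); // 19.0

        IntSummaryStatistics s = stats();
        System.out.format("min=%d max=%d avg=%s sum=%d count=%d%n",
            s.getMin(), s.getMax(), s.getAverage(), s.getSum(), s.getCount());
    }
}
```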
The next example joins all persons into a single string:
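A sketch of the joining collector; the phrase and names are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinNames {
    static String phrase() {
        List<String> names = Arrays.asList("Max", "Peter", "Pamela");
        return names.stream()
            .collect(Collectors.joining(" and ", "In Germany ", " are of legal age."));
    }

    public static void main(String[] args) {
        System.out.println(phrase());
        // In Germany Max and Peter and Pamela are of legal age.
    }
}
```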
The join collector accepts a delimiter as well as an optional prefix and suffix.
In order to transform the stream elements into a map, we have to specify how both the keys and the values should be mapped. Keep in mind that the mapped keys must be unique, otherwise an
IllegalStateException is thrown. You can optionally pass a merge function as an additional parameter to bypass the exception:
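A sketch mapping ages to names; since Peter and Pamela share an age, the merge function concatenates their names instead of throwing:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapExample {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    static Map<Integer, String> nameByAge() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        return persons.stream()
            .collect(Collectors.toMap(
                p -> p.age,                      // key mapper
                p -> p.name,                     // value mapper
                (n1, n2) -> n1 + ";" + n2));     // merge function for duplicate keys
    }

    public static void main(String[] args) {
        System.out.println(nameByAge()); // e.g. {18=Max, 23=Peter;Pamela, 12=David}
    }
}
```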
Now that we know some of the most powerful built-in collectors, let's try to build our own special collector. We want to transform all persons of the stream into a single string consisting of all names in upper case, separated by the | pipe character. In order to achieve this we create a new collector via Collector.of(). We have to pass the four ingredients of a collector: a supplier, an accumulator, a combiner and a finisher.
Since strings in Java are immutable, we need a helper class like StringJoiner to let the collector construct our string. The supplier initially constructs such a StringJoiner with the appropriate delimiter. The accumulator is used to add each person's upper-cased name to the StringJoiner. The combiner knows how to merge two StringJoiners into one. In the last step the finisher constructs the desired String from the StringJoiner.
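A sketch of such a custom collector, with the four ingredients labeled:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

public class NameCollector {
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static String collectNames() {
        List<Person> persons = Arrays.asList(
            new Person("Max"), new Person("Peter"),
            new Person("Pamela"), new Person("David"));

        Collector<Person, StringJoiner, String> personNameCollector =
            Collector.of(
                () -> new StringJoiner(" | "),          // supplier: new result container
                (j, p) -> j.add(p.name.toUpperCase()),  // accumulator: add upper-cased name
                (j1, j2) -> j1.merge(j2),               // combiner: merge two partial results
                StringJoiner::toString);                // finisher: build the final String

        return persons.stream().collect(personNameCollector);
    }

    public static void main(String[] args) {
        System.out.println(collectNames()); // MAX | PETER | PAMELA | DAVID
    }
}
```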
We've already learned how to transform the objects of a stream into another type of objects by utilizing the map operation. Map is kind of limited because every object can only be mapped to exactly one other object. But what if we want to transform one object into multiple others, or none at all? This is where flatMap comes to the rescue.
FlatMap transforms each element of the stream into a stream of other objects. So each object will be transformed into zero, one or multiple other objects backed by streams. The contents of those streams will then be placed into the returned stream of the flatMap operation.
Before we see
flatMap in action we need an appropriate type hierarchy:
Next, we utilize our knowledge about streams to instantiate a couple of objects:
Now we have a list of three foos each consisting of three bars.
FlatMap accepts a function which has to return a stream of objects. So in order to resolve the bar objects of each foo, we just pass the appropriate function:
As you can see, we've successfully transformed the stream of three foo objects into a stream of nine bar objects.
Finally, the above code example can be simplified into a single pipeline of stream operations:
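A sketch of that single pipeline, with minimal Foo and Bar classes matching the hierarchy described above (names and counts are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FlatMapExample {
    static class Foo {
        final String name;
        final List<Bar> bars = new ArrayList<>();
        Foo(String name) { this.name = name; }
    }
    static class Bar {
        final String name;
        Bar(String name) { this.name = name; }
    }

    static List<String> barNames() {
        return IntStream.range(1, 4)
            .mapToObj(i -> new Foo("Foo" + i))               // three foos
            .peek(f -> IntStream.range(1, 4)
                .mapToObj(i -> new Bar("Bar" + i + " <- " + f.name))
                .forEach(f.bars::add))                        // three bars per foo
            .flatMap(f -> f.bars.stream())                    // Stream<Foo> -> Stream<Bar>
            .map(b -> b.name)
            .collect(Collectors.toList());                    // nine bar names in total
    }

    public static void main(String[] args) {
        barNames().forEach(System.out::println);
    }
}
```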
FlatMap is also available for the Optional class introduced in Java 8. Optional's flatMap operation returns an optional object of another type, so it can be utilized to prevent nasty null checks.
Think of a highly hierarchical structure like this:
In order to resolve the inner string foo of an outer instance, you have to add multiple null checks to prevent possible NullPointerExceptions. The same behavior can be obtained by utilizing Optional's flatMap operation:
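A sketch with minimal Outer, Nested and Inner classes matching the hierarchy described above:

```java
import java.util.Optional;

public class OptionalFlatMap {
    static class Outer  { Nested nested; }
    static class Nested { Inner inner; }
    static class Inner  { String foo; }

    static Optional<String> resolve(Outer outer) {
        // the imperative alternative would be:
        // if (outer != null && outer.nested != null && outer.nested.inner != null) { ... }
        return Optional.ofNullable(outer)
            .flatMap(o -> Optional.ofNullable(o.nested))
            .flatMap(n -> Optional.ofNullable(n.inner))
            .flatMap(i -> Optional.ofNullable(i.foo));
    }

    public static void main(String[] args) {
        // nested is null here, so the chain ends in an empty Optional, no NPE
        System.out.println(resolve(new Outer()));
    }
}
```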
Each call to flatMap returns an Optional wrapping the desired object if present, or an empty Optional if absent.
The reduction operation combines all elements of the stream into a single result. Java 8 supports three different kinds of reduce methods. The first one reduces a stream of elements to exactly one element of the stream. Let's see how we can use this method to determine the oldest person:
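A sketch of this first reduce variant, using the illustrative Person data from earlier (redefined to keep the sample self-contained):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ReduceMaxAge {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    static Optional<Person> oldest() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        return persons.stream()
            .reduce((p1, p2) -> p1.age > p2.age ? p1 : p2); // keep the older of each pair
    }

    public static void main(String[] args) {
        oldest().ifPresent(p -> System.out.println(p.name)); // Pamela
    }
}
```

With ties resolved toward the later element, the result here is Pamela rather than Peter.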
This reduce method accepts a BinaryOperator accumulator function. That's actually a BiFunction where both operands share the same type, in this case Person. BiFunctions are like Functions but accept two arguments. The example function compares both persons' ages in order to return the person with the maximum age.
The second reduce method accepts both an identity value and a BinaryOperator accumulator. This method can be utilized to construct a new Person with the aggregated names and ages from all other persons in the stream:
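A sketch of this second variant; note that mutating the identity object like this is only safe for sequential streams:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceWithIdentity {
    static class Person {
        String name; int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    static Person aggregate() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        // the identity Person("", 0) is the starting point of the reduction
        return persons.stream()
            .reduce(new Person("", 0), (p1, p2) -> {
                p1.age += p2.age;
                p1.name += p2.name;
                return p1;
            });
    }

    public static void main(String[] args) {
        Person result = aggregate();
        System.out.format("name=%s; age=%s%n", result.name, result.age);
        // name=MaxPeterPamelaDavid; age=76
    }
}
```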
The third reduce method accepts three parameters: an identity value, a BiFunction accumulator and a combiner function of type BinaryOperator. Since the identity value's type is not restricted to the Person type, we can utilize this reduction to determine the sum of ages of all persons:
As you can see the result is 76, but what's happening exactly under the hood? Let's extend the above code by some debug output:
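A sketch of the three-argument reduce with debug output and call counters:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDebug {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public String toString() { return name; }
    }

    static int accumulatorCalls = 0, combinerCalls = 0;

    static Integer ageSum() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        return persons.stream()
            .reduce(0,
                (sum, p) -> {
                    accumulatorCalls++;
                    System.out.format("accumulator: sum=%s; person=%s%n", sum, p);
                    return sum + p.age;              // Integer + Person -> Integer
                },
                (sum1, sum2) -> {
                    combinerCalls++;
                    System.out.format("combiner: sum1=%s; sum2=%s%n", sum1, sum2);
                    return sum1 + sum2;              // merges partial sums
                });
    }

    public static void main(String[] args) {
        System.out.println(ageSum()); // 76; the combiner never runs sequentially
    }
}
```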
As you can see, the accumulator function does all the work. It first gets called with the initial identity value 0 and the first person, Max. In the next three steps sum continually increases by the age of the last step's person, up to a total age of 76.
Wait, what? The combiner never gets called? Executing the same stream in parallel will lift the secret:
Executing this stream in parallel results in an entirely different execution behavior. Now the combiner is actually called. Since the accumulator is called in parallel, the combiner is needed to sum up the separate accumulated values.
Let's dive deeper into parallel streams in the next chapter.
Streams can be executed in parallel to increase runtime performance on large amounts of input elements. Parallel streams use a common ForkJoinPool available via the static ForkJoinPool.commonPool() method. The size of the underlying thread-pool depends on the number of available physical CPU cores:
On my machine the common pool is initialized with a parallelism of 3 by default. This value can be decreased or increased by setting the JVM system property -Djava.util.concurrent.ForkJoinPool.common.parallelism.
Collections support the method
parallelStream() to create a parallel stream of elements. Alternatively you can call the intermediate method
parallel() on a given stream to convert a sequential stream to a parallel counterpart.
In order to understand the parallel execution behavior of a parallel stream, the next example prints information about the current thread to System.out:
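A sketch of such a debug pipeline; the printed thread names will differ between runs:

```java
import java.util.Arrays;

public class ParallelDebug {
    public static void main(String[] args) {
        Arrays.asList("a1", "a2", "b1", "c2", "c1")
            .parallelStream()
            .filter(s -> {
                System.out.format("filter: %s [%s]%n", s, Thread.currentThread().getName());
                return true;
            })
            .map(s -> {
                System.out.format("map: %s [%s]%n", s, Thread.currentThread().getName());
                return s.toUpperCase();
            })
            .forEach(s ->
                System.out.format("forEach: %s [%s]%n", s, Thread.currentThread().getName()));
        // thread names are a mix of "main" and "ForkJoinPool.commonPool-worker-…"
    }
}
```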
By investigating the debug output we should get a better understanding which threads are actually used to execute the stream operations:
As you can see, the parallel stream utilizes all available threads from the common
ForkJoinPool for executing the stream operations. The output may differ between consecutive runs because which particular thread is used for a given element is non-deterministic.
Let's extend the example by an additional stream operation, sort:
The result may look strange at first:
It seems that sort is executed sequentially on the main thread only. Actually, sort on a parallel stream uses the new Java 8 method Arrays.parallelSort() under the hood. As stated in the Javadoc, this method decides based on the length of the array whether sorting will be performed sequentially or in parallel:
If the length of the specified array is less than the minimum granularity, then it is sorted using the appropriate Arrays.sort method.
Coming back to the reduce example from the last section: we already found out that the combiner function is only called in parallel, but not in sequential streams. Let's see which threads are actually involved:
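A sketch of the parallel reduction with per-thread debug output; the thread names vary between runs, but the sum is always 76:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelReduce {
    static class Person {
        final String name; final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public String toString() { return name; }
    }

    static Integer ageSum() {
        List<Person> persons = Arrays.asList(
            new Person("Max", 18), new Person("Peter", 23),
            new Person("Pamela", 23), new Person("David", 12));
        return persons.parallelStream()
            .reduce(0,
                (sum, p) -> {
                    System.out.format("accumulator: sum=%s; person=%s [%s]%n",
                        sum, p, Thread.currentThread().getName());
                    return sum + p.age;
                },
                (sum1, sum2) -> {
                    System.out.format("combiner: sum1=%s; sum2=%s [%s]%n",
                        sum1, sum2, Thread.currentThread().getName());
                    return sum1 + sum2;
                });
    }

    public static void main(String[] args) {
        System.out.println(ageSum()); // 76
    }
}
```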
The console output reveals that both the accumulator and the combiner functions are executed in parallel on all available threads:
In summary, it can be stated that parallel streams can bring a nice performance boost to streams with a large amount of input elements. But keep in mind that some parallel stream operations like reduce and collect need additional computations (combine operations) which aren't needed when executed sequentially.
Furthermore we've learned that all parallel stream operations share the same JVM-wide common
ForkJoinPool. So you probably want to avoid implementing slow blocking stream operations since that could potentially slow down other parts of your application which rely heavily on parallel streams.
My programming guide to Java 8 streams ends here. If you're interested in learning more about Java 8 streams, I recommend the Stream Javadoc package documentation. If you want to learn more about the underlying mechanisms, you probably want to read Martin Fowler's article about Collection Pipelines.
Hopefully this tutorial was helpful to you and you've enjoyed reading it. The full source code of the tutorial samples is hosted on GitHub. Feel free to fork the repository or send me your feedback via Twitter.