What is the danger of side effects in Java 8 Streams?

2022-01-22 00:00:00 java java-stream

I'm trying to understand the warnings I found in the documentation on Streams. I've gotten into the habit of using forEach() as a general-purpose iterator, and that has led me to write this type of code:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FooCache {
    private static Map<Integer, Integer> sortOrderCache = new ConcurrentHashMap<>();
    private static Map<Integer, String> codeNameCache = new ConcurrentHashMap<>();

    public static void populateCache() {
        List<Foo> myThings = getThings();

        // Side effects: the lambda writes into two shared static maps.
        myThings.forEach(thing -> {
            sortOrderCache.put(thing.getId(), thing.getSortOrder());
            codeNameCache.put(thing.getId(), thing.getCodeName());
        });
    }
}

This is a trivialized example. I understand that this code violates Oracle's warning against stateful lambdas and side effects. But I don't understand why this warning exists.


When running this code it appears to behave as expected. So how do I break this to demonstrate why it's a bad idea?


In short, I read this:

If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding needed synchronization would cause contention, undermining the benefit of parallelism.


But can anyone add clarity to help me understand the warning?


From the Javadoc:


Note also that attempting to access mutable state from behavioral parameters presents you with a bad choice with respect to safety and performance; if you do not synchronize access to that state, you have a data race and therefore your code is broken, but if you do synchronize access to that state, you risk having contention undermine the parallelism you are seeking to benefit from. The best approach is to avoid stateful behavioral parameters to stream operations entirely; there is usually a way to restructure the stream pipeline to avoid statefulness.


The problem here is that if you access mutable state, you lose on two sides:

  • Safety, because you need synchronization, which the Stream API tries to minimize.
  • Performance, because the required synchronization costs you (in your example, using a ConcurrentHashMap has a cost).
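To make the trade-off concrete, here is a minimal sketch (the class and method names are made up for illustration). It fills a list from a parallel stream in three ways: unsynchronized, synchronized, and with a collector that needs no shared state. The unsafe variant is a data race, so its result varies from run to run; it may lose elements or throw.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffectDemo {
    // Unsafe: worker threads race on a plain ArrayList.
    // Returns the resulting size, or -1 if the race surfaced as an exception.
    static int unsafeSize(int n) {
        List<Integer> out = new ArrayList<>();
        try {
            IntStream.range(0, n).parallel().forEach(i -> out.add(i));
        } catch (RuntimeException e) {
            return -1;
        }
        return out.size();
    }

    // Correct but contended: every add() has to take the same lock.
    static int synchronizedSize(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(i -> out.add(i));
        return out.size();
    }

    // No shared state: each thread fills its own list and the lists are merged.
    static List<Integer> collected(int n) {
        return IntStream.range(0, n).parallel().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("unsafe:       " + unsafeSize(n));
        System.out.println("synchronized: " + synchronizedSize(n));
        System.out.println("collected:    " + collected(n).size());
    }
}
```

On a multi-core machine the first line will usually differ from 100000 (or show -1), while the other two always report the full count.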


Now, regarding your example, a few points:

  • If you want a multi-threaded stream, you need to call parallelStream(), as in myThings.parallelStream(); as it stands, the forEach method provided by java.util.Collection is a plain sequential for-each.
  • A static map that you mutate from a stream must be thread-safe. HashMap is not; you need a ConcurrentHashMap.
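The difference between the two forEach variants can be observed with a small sketch (hypothetical class name). It records which threads executed the lambda. Recording into a shared set is itself a side effect, so a thread-safe set is used here purely for observation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ForEachThreads {
    // Collection.forEach runs the lambda on the calling thread only.
    static Set<String> viaCollectionForEach(List<Integer> items) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        items.forEach(x -> names.add(Thread.currentThread().getName()));
        return names;
    }

    // parallelStream().forEach may spread the work over several threads.
    static Set<String> viaParallelStream(List<Integer> items) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        items.parallelStream().forEach(x -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("forEach:        " + viaCollectionForEach(items));
        System.out.println("parallelStream: " + viaParallelStream(items));
    }
}
```

The first set always contains exactly one name, the caller's thread. The second depends on the machine and the common pool, so it may contain one name or several.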

In the lambda, and in the case of a Stream, you must not mutate the source of your stream:

myThings.stream().forEach(thing -> myThings.remove(thing));

This may appear to work (but I suspect it will throw a ConcurrentModificationException), whereas the parallel version will very likely not work:

myThings.parallelStream().forEach(thing -> myThings.remove(thing));

That's because the ArrayList is not thread safe.
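A small sketch of the sequential case, with made-up class and method names: ArrayList's fail-fast spliterator notices the structural change and throws once the traversal finishes. The parallel variant is deliberately not run here, because its outcome is unpredictable. removeIf is shown as one supported way to remove elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class MutateSourceDemo {
    // Returns true if removing from the source inside forEach was detected.
    static boolean removeInsideForEach(List<String> source) {
        try {
            source.stream().forEach(s -> source.remove(s));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Supported alternative: let the collection perform the removal itself.
    static List<String> removeMatching(List<String> source, String prefix) {
        List<String> copy = new ArrayList<>(source);
        copy.removeIf(s -> s.startsWith(prefix));
        return copy;
    }

    public static void main(String[] args) {
        List<String> things = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println("detected: " + removeInsideForEach(things));
        System.out.println("left in the list: " + things);
        System.out.println(removeMatching(Arrays.asList("ab", "cd", "ae"), "a"));
    }
}
```

Fail-fast detection is best effort, so even the sequential result should be read as a symptom, not a guarantee. Note that the list is left in a half-modified state after the exception.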


If you use a synchronized view (Collections.synchronizedList), then you would take a performance hit because you synchronize on each access.


In your example, you would rather use:

sortOrderCache = myThings.stream()
                         .collect(Collectors.toMap(
                           Foo::getId, Foo::getSortOrder));
codeNameCache = myThings.stream()
                        .collect(Collectors.toMap(
                          Foo::getId, Foo::getCodeName));

The collector (here toMap, which maps each id to a single value; groupingBy would instead produce a list of elements per key) does the work you were doing. It may run sequentially; if the stream is split across several threads, the collector's accumulator may be invoked in each thread on its own partial map, and the partial maps are then merged.
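A self-contained sketch of the collector-based version, using Collectors.toMap. The Foo class below is a hypothetical stand-in for the one in the question, with only the three getters the question uses. toMap throws an IllegalStateException on duplicate keys, so the sketch assumes ids are unique.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FooCacheSketch {
    // Hypothetical stand-in for the question's Foo class.
    static final class Foo {
        private final int id;
        private final int sortOrder;
        private final String codeName;

        Foo(int id, int sortOrder, String codeName) {
            this.id = id;
            this.sortOrder = sortOrder;
            this.codeName = codeName;
        }

        int getId() { return id; }
        int getSortOrder() { return sortOrder; }
        String getCodeName() { return codeName; }
    }

    // The lambda touches no shared state; the collector builds the map.
    static Map<Integer, Integer> sortOrders(List<Foo> things) {
        return things.parallelStream()
                     .collect(Collectors.toMap(Foo::getId, Foo::getSortOrder));
    }

    static Map<Integer, String> codeNames(List<Foo> things) {
        return things.parallelStream()
                     .collect(Collectors.toMap(Foo::getId, Foo::getCodeName));
    }

    public static void main(String[] args) {
        List<Foo> things = Arrays.asList(new Foo(1, 10, "alpha"), new Foo(2, 20, "beta"));
        System.out.println(sortOrders(things));
        System.out.println(codeNames(things));
    }
}
```

The same code works with stream() instead of parallelStream(); the point is that switching between the two cannot break it.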

By the way, you might eventually drop the codeNameCache/sortOrderCache and simply store the id->Thing mapping.
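A sketch of that single-map variant, again with a hypothetical Foo stand-in and assuming unique ids. Function.identity() stores the element itself as the value, and the sort order and code name are then read from it on demand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FooById {
    // Hypothetical stand-in for the question's Foo class.
    static final class Foo {
        private final int id;
        private final int sortOrder;
        private final String codeName;

        Foo(int id, int sortOrder, String codeName) {
            this.id = id;
            this.sortOrder = sortOrder;
            this.codeName = codeName;
        }

        int getId() { return id; }
        int getSortOrder() { return sortOrder; }
        String getCodeName() { return codeName; }
    }

    // One map from id to the whole object replaces the two caches.
    static Map<Integer, Foo> index(List<Foo> things) {
        return things.stream()
                     .collect(Collectors.toMap(Foo::getId, Function.identity()));
    }

    public static void main(String[] args) {
        Map<Integer, Foo> byId = index(Arrays.asList(
            new Foo(1, 10, "alpha"), new Foo(2, 20, "beta")));
        System.out.println(byId.get(2).getSortOrder() + " " + byId.get(2).getCodeName());
    }
}
```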
