Is flatMap guaranteed to be lazy?
Consider the following code:
urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();
Will fetchDataFromInternet be called for the second url when the first one was enough?
I tried a smaller example and it appears to work as expected, i.e. it processes the data one element at a time. Can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?
Stream.of("one", "two", "three")
    .flatMap(num -> {
        System.out.println("Processing " + num);
        // return FetchFromInternetForNum(num).data().stream();
        return Stream.of(num);
    })
    .peek(num -> System.out.println("Peek before filter: " + num))
    .filter(num -> num.length() > 0)
    .peek(num -> System.out.println("Peek after filter: " + num))
    .forEach(num -> {
        System.out.println("Done " + num);
    });
Output:
Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three
Update: I am using the official Oracle JDK 8, in case the implementation matters.
Answer: Based on the comments and the answers below, flatMap is partially lazy, i.e. it reads the first inner stream fully and moves on to the next one only when required. Reading within a single inner stream is eager, but reading across multiple inner streams is lazy.
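The "partially lazy" behavior can be observed with a small probe that counts how often the mapping function runs and how many inner elements are pulled. This is only a sketch: the class name FlatMapLazinessProbe and its fields are made up for illustration, and the number of pulled elements depends on the JDK build. With the eager behavior described here, all three inner elements would be expected to be pulled, while a lazier implementation should pull fewer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical probe class, not part of the original question.
class FlatMapLazinessProbe {
    // Outer elements for which the flatMap function was actually invoked.
    static final List<String> mapped = new ArrayList<>();
    // Number of inner elements that passed through peek().
    static final AtomicInteger pulled = new AtomicInteger();

    static Optional<Integer> run() {
        mapped.clear();
        pulled.set(0);
        return Stream.of("first", "second")
                .flatMap(name -> {
                    mapped.add(name);
                    return Stream.of(1, 2, 3).peek(i -> pulled.incrementAndGet());
                })
                .findFirst();
    }

    public static void main(String[] args) {
        Optional<Integer> result = run();
        System.out.println("result = " + result.get());
        System.out.println("outer elements mapped = " + mapped);
        // JDK dependent: the full inner stream size if flatMap is eager, fewer otherwise.
        System.out.println("inner elements pulled = " + pulled.get());
    }
}
```

In both cases the mapping function should run only for "first", which matches the observation that laziness holds across inner streams even when it does not hold within one.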
If this behavior is intended, the API should let the function return an Iterable instead of a Stream.
Put another way: link
Recommended answer
Under the current implementation, flatMap is eager, behaving much like a stateful intermediate operation (such as sorted and distinct). This is easy to prove:
int result = Stream.of(1)
    .flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
    .findFirst()
    .get();
System.out.println(result);
This never finishes, because flatMap is evaluated eagerly and the inner stream is infinite. For your example:
urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();
It means that for each url, flatMap will block all the operations that come after it, even if you only care about a single element. So suppose that fetchDataFromInternet(url) generates 10_000 lines from a single url: your findFirst will have to wait for all 10_000 to be computed, even if you care about only one.
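On a JDK that still shows this eager behavior, one possible workaround is to keep flatMap out of the short-circuiting part and loop over the sources explicitly, so that each inner stream is searched with its own lazy filter(...).findFirst(). The following is a minimal sketch, not a definitive implementation: findFirstMatch is a hypothetical helper, and fakeFetch is a made-up stand-in for fetchDataFromInternet(url).stream() that returns an endless stream to show that nothing is read past the first match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class FirstMatchAcrossSources {
    // Counts how many sources were actually fetched.
    static final AtomicInteger fetchCalls = new AtomicInteger();

    // Stand-in for fetchDataFromInternet(url).stream(): an endless stream of lines.
    static Stream<String> fakeFetch(String url) {
        fetchCalls.incrementAndGet();
        return Stream.iterate(0, i -> i + 1).map(i -> url + "#line" + i);
    }

    // Searches the sources one by one and stops at the first matching element.
    static <T, R> Optional<R> findFirstMatch(Iterable<T> sources,
                                             Function<T, Stream<R>> mapper,
                                             Predicate<R> predicate) {
        for (T source : sources) {
            // Close each inner stream once it has been searched.
            try (Stream<R> inner = mapper.apply(source)) {
                Optional<R> hit = inner.filter(predicate).findFirst();
                if (hit.isPresent()) {
                    return hit;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("u1", "u2");
        Optional<String> hit = findFirstMatch(urls,
                FirstMatchAcrossSources::fakeFetch,
                line -> line.endsWith("#line3"));
        System.out.println(hit.orElse("no match") + ", fetch calls: " + fetchCalls.get());
    }
}
```

The explicit loop gives up the single fluent pipeline, but it does not depend on how a particular JDK build implements flatMap.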
EDIT
This is fixed in Java 10, where we get our laziness back: see JDK-8075939
EDIT 2
This is fixed in Java 8 too (8u222): JDK-8225328