Stream.toList() 会比 Collectors.toList() 表现更好吗

2022-01-22 00:00:00 performance java java-stream java-16 jmh

JDK 正在引入 API Stream.toList() 与 JDK-8180352.这是我尝试将其性能与现有 Collectors.toList 进行比较的基准代码:

JDK is introducing an API Stream.toList() with JDK-8180352. Here is a benchmarking code that I have attempted to compare its performance with the existing Collectors.toList:

@BenchmarkMode(Mode.All)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, batchSize = 10000)
@Measurement(iterations = 20, time = 1, batchSize = 10000)
public class CollectorsVsStreamToList {

    @Benchmark
    public List<Integer> viaCollectors() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    @Benchmark
    public List<Integer> viaStream() {
        return IntStream.range(1, 1000).boxed().toList();
    }
}

结果汇总如下:

Benchmark                                                       Mode  Cnt   Score    Error  Units
CollectorsVsStreamToList.viaCollectors                         thrpt   20  17.321 ±  0.583  ops/s
CollectorsVsStreamToList.viaStream                             thrpt   20  23.879 ±  1.682  ops/s
CollectorsVsStreamToList.viaCollectors                          avgt   20   0.057 ±  0.002   s/op
CollectorsVsStreamToList.viaStream                              avgt   20   0.040 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors                        sample  380   0.054 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.00    sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.50    sample        0.054            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.90    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.95    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.99    sample        0.062            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.999   sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.9999  sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p1.00    sample        0.068            s/op
CollectorsVsStreamToList.viaStream                            sample  525   0.039 ±  0.001   s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.00            sample        0.037            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.50            sample        0.038            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.90            sample        0.040            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.95            sample        0.042            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.99            sample        0.050            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.999           sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.9999          sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p1.00            sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors                            ss   20   0.060 ±  0.007   s/op
CollectorsVsStreamToList.viaStream                                ss   20   0.043 ±  0.006   s/op

当然,领域专家的第一个问题是基准测试程序是否正确?测试类在 MacOS 上执行.如果需要任何进一步的详细信息,请告诉我.

Of course, the first question to the domain experts would be if the benchmarking procedure is correct or not? The test class was executed on MacOS. Do let me know for any further details required.

跟进,据我所知,Stream.toList 的平均时间、吞吐量和采样时间看起来比 Collectors.toList.这种理解正确吗?

Follow-up, as far as I could infer from the readings the average time, throughput, and sampling time of the Stream.toList looks better than the Collectors.toList. Is that understanding correct?

推荐答案

Stream::toList 构建在 toArray 之上,而不是 collect.toArray 中有许多优化,使其可能比收集更快,尽管这在很大程度上取决于细节.如果流管道(从源到最终中间操作)是 SIZED,则可以预先确定目标数组的大小(而不是像 toList 收集器必须做的那样重新分配.)如果管道进一步SUBSIZED,那么并行执行不仅可以预先确定结果数组的大小,还可以计算每个分片的精确偏移量,因此每个子任务都可以将其结果放在正确的位置,从而无需将中间结果复制到最终结果中.

Stream::toList is built atop toArray, not collect. There are a number of optimizations in toArray that make it potentially faster than collecting, though this depends heavily on the details. If the stream pipeline (from source through final intermediate operation) is SIZED, the target array can be presized (rather than potentially reallocating as the toList collector must do.) If the pipeline is further SUBSIZED, then parallel executions can not only presize the result array, but can compute exact per-shard offsets so each sub-task can drop its results in exactly the right place, eliminating the need for copying intermediate results into the final result.

因此,根据细节,toList 可能比 collect 快得多.

So, depending on the details, toList may well be considerably faster than collect.

相关文章