When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch prediction?

I just read about branch prediction and wanted to see how it works with Java 8 Streams.

However, the performance with Streams always turns out to be worse than with traditional loops.

int totalSize = 32768;
int filterValue = 1280;
int[] array = new int[totalSize];
Random rnd = new Random(0);
int loopCount = 10000;

for (int i = 0; i < totalSize; i++) {
    // array[i] = rnd.nextInt() % 2560; // Unsorted Data
    array[i] = i; // Sorted Data
}

long start = System.nanoTime();
long sum = 0;
for (int j = 0; j < loopCount; j++) {
    for (int c = 0; c < totalSize; ++c) {
        sum += array[c] >= filterValue ? array[c] : 0;
    }
}
long total = System.nanoTime() - start;
System.out.printf("Conditional Operator Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    for (int c = 0; c < totalSize; ++c) {
        if (array[c] >= filterValue) {
            sum += array[c];
        }
    }
}
total = System.nanoTime() - start;
System.out.printf("Branch Statement Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    sum += Arrays.stream(array).filter(value -> value >= filterValue).sum();
}
total = System.nanoTime() - start;
System.out.printf("Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    sum += Arrays.stream(array).parallel().filter(value -> value >= filterValue).sum();
}
total = System.nanoTime() - start;
System.out.printf("Parallel Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

Output:

  1. For the sorted array:

Conditional Operator Time : 294062652 ns, (0.294063 sec) 
Branch Statement Time : 272992442 ns, (0.272992 sec) 
Streams Time : 806579913 ns, (0.806580 sec) 
Parallel Streams Time : 2316150852 ns, (2.316151 sec) 

  2. For the unsorted array:

    Conditional Operator Time : 367304250 ns, (0.367304 sec) 
    Branch Statement Time : 906073542 ns, (0.906074 sec) 
    Streams Time : 1268648265 ns, (1.268648 sec) 
    Parallel Streams Time : 2420482313 ns, (2.420482 sec) 
    

    I tried the same code using a List:
    list.stream() instead of Arrays.stream(array)
    list.get(c) instead of array[c]

    Output:

    1. For the sorted list:

    Conditional Operator Time : 860514446 ns, (0.860514 sec) 
    Branch Statement Time : 663458668 ns, (0.663459 sec) 
    Streams Time : 2085657481 ns, (2.085657 sec) 
    Parallel Streams Time : 5026680680 ns, (5.026681 sec) 
    

    2. For the unsorted list:

    Conditional Operator Time : 704120976 ns, (0.704121 sec) 
    Branch Statement Time : 1327838248 ns, (1.327838 sec) 
    Streams Time : 1857880764 ns, (1.857881 sec) 
    Parallel Streams Time : 2504468688 ns, (2.504469 sec) 
    

    I referred to a few blogs (this & this) that report the same performance issue with streams.

    1. I agree that programming with streams is nice and easier in some scenarios, but when we're losing out on performance, why do we need to use them? Is there something I'm missing?
    2. In which scenario do streams perform as well as loops? Is it only when the function you define takes so long that the loop overhead becomes negligible?
    3. In none of the scenarios could I see streams taking advantage of branch prediction. (I tried with sorted and unordered streams, but to no avail; it more than doubled the performance impact compared to normal streams.)

    Recommended answer

    I agree that programming with streams is nice and easier in some scenarios, but when we're losing out on performance, why do we need to use them?

    Performance is rarely an issue. Typically, around 10% of your streams would need to be rewritten as loops to get the performance you need.

    Is there something I'm missing?

    Going parallel with parallelStream() is much easier than writing the concurrency by hand, and possibly more efficient, because efficient concurrent code is hard to write.
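To make the comparison concrete, here is a minimal sketch of the same filtered sum written twice: once as a parallel stream and once with a hand-rolled thread pool. The class name `ParallelSumSketch`, the method names, and the thread count of 4 are assumptions made for illustration; the data mirrors the question's sorted-array setup. It is not a claim about which version is faster on any given machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class name, for illustration only.
class ParallelSumSketch {

    // Stream version: going parallel is a single extra method call.
    static long streamSum(int[] array, int filterValue) {
        return Arrays.stream(array)
                .parallel()
                .filter(value -> value >= filterValue)
                .asLongStream() // widen to long before summing
                .sum();
    }

    // Hand-written version: chunking, task submission, result collection
    // and pool shutdown all have to be coded (and debugged) explicitly.
    static long manualSum(int[] array, int filterValue, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (array.length + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(array.length, t * chunk);
                final int to = Math.min(array.length, from + chunk);
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        if (array[i] >= filterValue) {
                            partial += array[i];
                        }
                    }
                    return partial;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] array = new int[32768];
        for (int i = 0; i < array.length; i++) {
            array[i] = i; // sorted data, as in the question
        }
        System.out.println("stream: " + streamSum(array, 1280));
        System.out.println("manual: " + manualSum(array, 1280, 4));
    }
}
```

Both methods should return the same total; the difference is how much code is needed to get there and how many places a concurrency bug could hide.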

    In which scenario do streams perform as well as loops? Is it only when the function you define takes so long that the loop overhead becomes negligible?

    Your benchmark is flawed in the sense that the code hasn't been JIT-compiled when it starts. I would run the whole test in a loop, as JMH does, or I would simply use JMH.
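As a rough illustration of running the whole test in a loop, the sketch below repeats the loop and stream measurements for several rounds in the same JVM, so that later rounds are more likely to run JIT-compiled code. The class name `WarmupLoopSketch`, the five rounds, and the reduced repeat count of 1000 (the question uses 10000) are assumptions chosen to keep the run short. It is not a substitute for JMH, which also handles forking, dead-code elimination and statistics.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical class name, for illustration only.
class WarmupLoopSketch {
    static final int TOTAL_SIZE = 32768;   // as in the question
    static final int FILTER_VALUE = 1280;  // as in the question
    static final int LOOP_COUNT = 1000;    // assumption: lower than the question's 10000
    static final int ROUNDS = 5;           // assumption: number of repeated rounds

    // Accumulates every result so the JIT cannot discard the measured work.
    static long sink;

    static long loopSum(int[] array) {
        long sum = 0;
        for (int c = 0; c < array.length; ++c) {
            if (array[c] >= FILTER_VALUE) {
                sum += array[c];
            }
        }
        return sum;
    }

    static long streamSum(int[] array) {
        return Arrays.stream(array)
                .filter(value -> value >= FILTER_VALUE)
                .asLongStream()
                .sum();
    }

    // Runs the given work LOOP_COUNT times and returns the elapsed nanoseconds.
    static long timeNanos(LongSupplier work) {
        long start = System.nanoTime();
        long result = 0;
        for (int j = 0; j < LOOP_COUNT; j++) {
            result += work.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        sink += result;
        return elapsed;
    }

    public static void main(String[] args) {
        int[] array = new int[TOTAL_SIZE];
        for (int i = 0; i < TOTAL_SIZE; i++) {
            array[i] = i; // sorted data
        }
        // Early rounds include interpretation and compilation time;
        // compare the later rounds rather than the first one.
        for (int round = 1; round <= ROUNDS; round++) {
            long loopNs = timeNanos(() -> loopSum(array));
            long streamNs = timeNanos(() -> streamSum(array));
            System.out.printf("Round %d: loop %d ns, stream %d ns%n", round, loopNs, streamNs);
        }
        System.out.println("checksum: " + sink);
    }
}
```

If the numbers in the last rounds differ noticeably from the first, the original single-pass measurement was mostly timing warm-up rather than steady-state performance.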

    In none of the scenarios could I see streams taking advantage of branch prediction

    Branch prediction is a CPU feature, not a JVM or streams feature.
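One way to probe this is to push the same values through the same stream pipeline, once shuffled and once sorted. The sketch below does that; the class name `BranchPredictionSketch`, the repeat count of 2000, and the use of `Random.ints` to generate data are assumptions made for illustration. The totals are identical by construction, so any timing gap comes from how the hardware and the JIT-compiled code handle the comparison, not from the streams API. Whether a gap appears at all depends on the CPU and the JVM.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical class name, for illustration only.
class BranchPredictionSketch {
    // Accumulates results so the JIT cannot drop the measured work.
    static long sink;

    // The filter lambda is an ordinary comparison, just like the if in a loop.
    static long filteredSum(int[] array, int filterValue) {
        return Arrays.stream(array)
                .filter(value -> value >= filterValue)
                .asLongStream()
                .sum();
    }

    static long timeNanos(int[] array, int filterValue, int repeats) {
        long start = System.nanoTime();
        for (int j = 0; j < repeats; j++) {
            sink += filteredSum(array, filterValue);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Same multiset of values in both arrays; only the order differs.
        int[] unsorted = new Random(0).ints(32768, 0, 2560).toArray();
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        // Warm-up passes so the timed passes are more likely to run compiled code.
        timeNanos(unsorted, 1280, 2000);
        timeNanos(sorted, 1280, 2000);

        System.out.printf("Unsorted: %d ns%n", timeNanos(unsorted, 1280, 2000));
        System.out.printf("Sorted:   %d ns%n", timeNanos(sorted, 1280, 2000));
        System.out.println("checksum: " + sink);
    }
}
```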
