按概率排序 if...else if 语句的效果是什么?

2021-12-12 00:00:00 performance optimization if-statement c++ branch-prediction

具体来说，如果我有一系列 if...else if 语句，并且我以某种方式事先知道每个语句将评估为 的相对概率真的，按照概率顺序对它们进行排序在执行时间上有多大差异?例如，我应该更喜欢这个:

Specifically, if I have a series of if...else if statements, and I somehow know beforehand the relative probability that each statement will evaluate to true, how much difference in execution time does it make to sort them in order of probability? For example, should I prefer this:

if (highly_likely) //do something else if (somewhat_likely) //do something else if (unlikely) //do something

到这个?:

if (unlikely) //do something else if (somewhat_likely) //do something else if (highly_likely) //do something

很明显排序版本会更快，但是为了可读性或副作用的存在，我们可能希望对它们进行非最佳排序.在您实际运行代码之前，也很难判断 CPU 在分支预测方面的表现如何.

It seems obvious that the sorted version would be faster, however for readability or the existence of side-effects, we might want to order them non-optimally. It's also hard to tell how well the CPU will do with branch prediction until you actually run the code.

因此，在对此进行试验的过程中，我最终针对特定案例回答了我自己的问题，但我也想听听其他意见/见解.

So, in the course of experimenting with this, I ended up answering my own question for a specific case, however I'd like to hear other opinions/insights as well.

重要提示:这个问题假设 if 语句可以任意重新排序，而不会对程序的行为产生任何其他影响.在我的回答中，三个条件测试是相互排斥的，不会产生副作用.当然，如果必须按照一定的顺序对语句进行评估才能实现某些期望的行为，那么效率问题就没有实际意义了.

Important: this question assumes that the if statements can be arbitrarily reordered without having any other effects on the behavior of the program. In my answer, the three conditional tests are mutually exclusive and produce no side effects. Certainly, if the statements must be evaluated in a certain order to achieve some desired behavior, then the issue of efficiency is moot.

推荐答案

作为一般规则，大多数(如果不是全部)Intel CPU 假定前向分支在第一次看到它们时不会被采用.参见 Godbolt 的工作.

As a general rule, most if not all Intel CPUs assume forward branches are not taken the first time they see them. See Godbolt's work.

之后，分支进入分支预测缓存，过去的行为用于通知未来的分支预测.

After that, the branch goes into a branch prediction cache, and past behavior is used to inform future branch prediction.

因此，在紧密循环中，乱序的影响将相对较小.分支预测器将了解哪一组分支最有可能，如果循环中的工作量很大，那么微小的差异就不会加起来太多.

So in a tight loop, the effect of misordering is going to be relatively small. The branch predictor is going to learn which set of branches is most likely, and if you have non-trivial amount of work in the loop the small differences won't add up much.

在一般代码中，默认情况下，大多数编译器(缺少另一个原因)会大致按照您在代码中的排序方式对生成的机器代码进行排序.因此 if 语句在失败时是前向分支.

In general code, most compilers by default (lacking another reason) will order the produced machine code roughly the way you ordered it in your code. Thus if statements are forward branches when they fail.

因此，您应该按照可能性降低的顺序对分支进行排序，以便从第一次遇到"中获得最佳分支预测.

So you should order your branches in the order of decreasing likelihood to get the best branch prediction from a "first encounter".

在一组条件上紧密循环多次并完成琐碎工作的微基准测试将受到指令计数等的微小影响的支配，而在相对分支预测问题方面几乎没有.因此，在这种情况下，您必须分析，因为经验法则不可靠.

A microbenchmark that loops tightly many times over a set of conditions and does trivial work is going to dominated by tiny effects of instruction count and the like, and little in the way of relative branch prediction issues. So in this case you must profile, as rules of thumb won't be reliable.

最重要的是，矢量化和许多其他优化适用于微小的紧密循环.

On top of that, vectorization and many other optimizations apply to tiny tight loops.

所以在一般代码中，将最可能的代码放在 if 块中，这将导致最少的未缓存分支预测未命中.在紧凑的循环中，遵循一般规则开始，如果您需要了解更多信息，您别无选择，只能进行概要分析.

So in general code, put most likely code within the if block, and that will result in the fewest un-cached branch prediction misses. In tight loops, follow the general rule to start, and if you need to know more you have little choice but to profile.

当然，如果某些测试比其他测试便宜得多，这一切都会消失.

Naturally this all goes out the window if some tests are far cheaper than others.

相关文章