Should we still be optimizing "in the small"?

2021-12-20 00:00:00 optimization c c++

I was changing my for loop to increment using ++i instead of i++ and got to thinking, is this really necessary anymore? Surely today's compilers do this optimization on their own.

In this article from 1997, http://leto.net/docs/C-optimization.php, Michael Lee goes into other optimizations such as inlining, loop unrolling, loop jamming, loop inversion, strength reduction, and many others. Are these still relevant?

Which low-level code optimizations should we be doing, and which ones can we safely ignore?

This has nothing to do with premature optimization. The decision to optimize has already been made. Now the question is what is the most effective way to do it.

Anecdote: I once reviewed a requirements spec that stated: "The programmer shall left shift by one instead of multiplying by 2".

Recommended answer

If there is no cost to the optimization, do it. When writing the code, ++i is just as easy to write as i++, so prefer the former. There is no cost to it.

On the other hand, going back and making this change afterwards takes time, and it most likely won't make a noticeable difference, so you probably shouldn't bother with it.

But yes, it can make a difference. On built-in types, probably not, but for complex classes, the compiler is unlikely to be able to optimize it away. The reason for this is that the increment operation is no longer an intrinsic operation built into the compiler, but a function defined in the class. The compiler may be able to optimize it like any other function, but it cannot, in general, assume that pre-increment can be used instead of post-increment. The two functions may do entirely different things.

So when determining which optimizations can be done by the compiler, consider whether it has enough information to perform them. In this case, the compiler doesn't know that post-increment and pre-increment perform the same modifications to the object, so it cannot assume that one can be replaced with the other. But you have this knowledge, so you can safely perform the optimization.
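To make the difference concrete, here is a minimal sketch. `CountingIter`, `g_copies`, and the two helper functions are hypothetical names invented for this illustration; the class just counts how often it is copied. The exact number of copies made by post-increment depends on whether the compiler elides the copy on return, so treat the count as "at least one per call" rather than an exact figure.

```cpp
#include <cassert>
#include <cstddef>

// Counts copy constructions of CountingIter (illustration only).
static std::size_t g_copies = 0;

// Hypothetical iterator-like class with user-defined increment operators.
struct CountingIter {
    int pos = 0;

    CountingIter() = default;
    CountingIter(const CountingIter& other) : pos(other.pos) { ++g_copies; }

    // Pre-increment: modifies in place and returns a reference. No copy.
    CountingIter& operator++() {
        ++pos;
        return *this;
    }

    // Post-increment: has to hand back the old value, so it copies first.
    CountingIter operator++(int) {
        CountingIter old = *this;
        ++pos;
        return old;
    }
};

// Copies made while advancing n times with pre-increment.
std::size_t copies_with_pre(int n) {
    assert(n >= 0);
    g_copies = 0;
    CountingIter it;
    for (int i = 0; i < n; ++i) ++it;
    return g_copies;
}

// Copies made while advancing n times with post-increment.
std::size_t copies_with_post(int n) {
    assert(n >= 0);
    g_copies = 0;
    CountingIter it;
    for (int i = 0; i < n; ++i) it++;
    return g_copies;
}
```

Because the copy constructor here has a visible side effect, the compiler is not free to quietly turn `it++` into `++it`. For a trivial class whose operators are fully inlined it may well remove the unused copy, which is why the difference usually only shows up with heavier iterator or value types.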

Many of the others you mention can usually be done very efficiently by the compiler. Inlining can be done by the compiler, and it's usually better at it than you. All it needs to know is how large a proportion of the function consists of function call overhead, and how often it is called. A big function that is called often probably shouldn't be inlined, because you end up copying a lot of code, resulting in a larger executable and more instruction cache misses. Inlining is always a tradeoff, and often the compiler is better at weighing all the factors than you.

Loop unrolling is a purely mechanical operation, and the compiler can do that easily. Same goes for strength reduction. Swapping inner and outer loops is trickier, because the compiler has to prove that the changed order of traversal won't affect the result, which is difficult to do automatically. So this is an optimization you should do yourself.
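As a rough sketch of swapping inner and outer loops, consider summing a matrix stored row by row. The `Matrix` type and both function names are hypothetical, made up for this example. Both functions return the same value because integer addition does not care about order (assuming no overflow), which is exactly the kind of knowledge the programmer has up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (r, c) lives at data[r * cols + c].
struct Matrix {
    std::size_t rows, cols;
    std::vector<long> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0L) {}

    long& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
    long at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Column in the outer loop: consecutive accesses are `cols` elements apart,
// so large matrices are walked with a cache-unfriendly stride.
long sum_column_outer(const Matrix& m) {
    long total = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.at(r, c);
    return total;
}

// Loops swapped: the inner loop now walks memory contiguously.
long sum_row_outer(const Matrix& m) {
    long total = 0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.at(r, c);
    return total;
}
```

Some compilers can perform this interchange in simple cases like this one, so it is worth measuring before rewriting. With floating-point sums or loop bodies that have side effects, the reordering may change the result, and the compiler has to leave the loops alone unless it can prove otherwise.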

But even with the simple ones that the compiler is able to do, you sometimes have information your compiler doesn't. If you know that a function is going to be called extremely often, even if it's only called from one place, it may be worth checking whether the compiler automatically inlines it, and doing it manually if not.

Sometimes you may know more about a loop than the compiler as well (for example, that the number of iterations will always be a multiple of 4, so you can safely unroll it 4 times). The compiler may not have this information, so if it were to unroll the loop, it would have to insert an epilogue to ensure that the last few iterations get performed correctly.
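A minimal sketch of that situation, assuming a simple array sum: `sum_simple` and `sum_unrolled4` are hypothetical names, and the `n % 4 == 0` precondition stands in for whatever guarantee you actually have about your data. Because the precondition is known, the unrolled version needs no epilogue loop for leftover elements.

```cpp
#include <cassert>
#include <cstddef>

// General version: correct for any n.
long sum_simple(const int* a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += a[i];
    return total;
}

// Manually unrolled four times.
// Precondition (the programmer's knowledge): n is a multiple of 4.
long sum_unrolled4(const int* a, std::size_t n) {
    assert(n % 4 == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    // No epilogue needed: the precondition rules out leftover elements.
    return s0 + s1 + s2 + s3;
}
```

Whether this beats what the compiler generates on its own is something to measure rather than assume. The point is only that the precondition lets you drop the remainder handling the compiler would otherwise have to emit.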

So such "small-scale" optimizations can still be necessary, if 1) you actually need the performance, and 2) you have information that the compiler doesn't.

You can't outperform the compiler on purely mechanical optimizations. But you may be able to make assumptions that the compiler can't, and that is when you're able to optimize better than the compiler.
