Coding practices which enable the compiler/optimizer to make a faster program
Many years ago, C compilers were not particularly smart. As a workaround, K&R invented the register keyword to hint to the compiler that it might be a good idea to keep a variable in a CPU register. They also made the ternary operator to help generate better code.
As time passed, the compilers matured. They became very smart: their flow analysis allows them to make better decisions about which values to hold in registers than you could possibly make. The register keyword became unimportant.
FORTRAN can be faster than C for some sorts of operations because of aliasing issues. In theory, with careful coding, one can get around this restriction and enable the optimizer to generate faster code.
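The aliasing problem can be made concrete with a small C++ sketch. The functions `sumInto`, `sumIntoRestrict` and `sumIntoLocal` are hypothetical illustrations, and `__restrict` is a non-standard extension (accepted by GCC, Clang and MSVC) standing in for C99's `restrict`, which standard C++ does not have. When `dst` may point into `src`, a store through `dst` can change a later `src[i]`, so the compiler has to preserve that behavior unless you tell it, or show it, that no overlap is possible.

```cpp
#include <cassert>

// Plain pointers: dst may point into src. Each store to *dst can change a
// later src[i], so the compiler generally has to write *dst back to memory
// and reload src[i] on every iteration instead of keeping the sum in a register.
void sumInto(int* dst, const int* src, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        *dst += src[i];
    }
}

// __restrict (GCC/Clang/MSVC extension) promises that dst and src do not
// overlap, so the compiler may keep *dst in a register for the whole loop.
// Calling this with overlapping arguments is undefined behavior.
void sumIntoRestrict(int* __restrict dst, const int* __restrict src, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        *dst += src[i];
    }
}

// Portable alternative: accumulate in a local variable, which cannot overlap
// anything else, and store the result once at the end. This is guaranteed to
// match sumInto only when dst does not point into src.
void sumIntoLocal(int* dst, const int* src, int n) {
    assert(n >= 0);
    int total = *dst;
    for (int i = 0; i < n; ++i) {
        total += src[i];
    }
    *dst = total;
}
```

With `int a[3] = {1, 2, 3};`, calling `sumInto(&a[1], a, 3)` leaves `a[1]` equal to 9, because the second iteration reads the already-updated `a[1]`; `sumIntoLocal` on a fresh copy of the array leaves 8. That difference is exactly what stops the compiler from caching `*dst` in `sumInto` on its own.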
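What coding practices can enable the compiler/optimizer to generate faster code?
- Identifying the platform and compiler you use would be appreciated.
- Why does the technique seem to work?
- Sample code is encouraged.
Here is a related question.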
This question is not about the overall process of profiling and optimizing. Assume that the program has been written correctly, compiled with full optimization, tested and put into production. There may be constructs in your code that prohibit the optimizer from doing the best job that it can. What can you do to refactor that will remove these prohibitions and allow the optimizer to generate even faster code?
Recommended answer
Write to local variables and not output arguments! This can be a huge help for getting around aliasing slowdowns. For example, if your code looks like
void DoSomething(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut)
{
    for (int i=0; i<numFoo; i++)
    {
        // barOut may alias foo1 or foo2[i], so both are re-read after every write
        barOut.munge(foo1, foo2[i]);
    }
}
the compiler doesn't know that foo1 and barOut refer to different objects, and thus has to reload foo1 each time through the loop. It also can't read foo2[i] until the write to barOut is finished. You could start messing around with restricted pointers, but it's just as effective (and much clearer) to do this:
void DoSomethingFaster(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut)
{
    Foo barTemp = barOut;  // local copy: cannot overlap any argument
    for (int i=0; i<numFoo; i++)
    {
        barTemp.munge(foo1, foo2[i]);
    }
    barOut = barTemp;      // single write back to the output argument
}
It sounds silly, but the compiler can be much smarter dealing with the local variable, since it can't possibly overlap in memory with any of the arguments. This can help you avoid the dreaded load-hit-store (mentioned by Francis Boivin in this thread).
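To make the two functions above compile and run, here is a sketch with a hypothetical `Foo` whose `munge` adds the product of two values to an accumulator; the answer never defines `Foo`, so this type is an assumption. It also checks a caveat worth keeping in mind: `DoSomethingFaster` is only a drop-in replacement when the caller never passes `barOut` as the same object as `foo1` or an element of `foo2`.

```cpp
#include <cassert>

// Hypothetical stand-in for the answer's Foo: munge() adds the product of
// two other Foos' values to this object's accumulator.
struct Foo {
    int value;
    void munge(const Foo& a, const Foo& b) { value += a.value * b.value; }
};

// Updates barOut in place on every iteration. Because barOut might be the
// same object as foo1 or one of foo2[i], their values must be re-read after
// each write.
void DoSomething(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut) {
    assert(numFoo == 0 || foo2 != nullptr);
    for (int i = 0; i < numFoo; i++) {
        barOut.munge(foo1, foo2[i]);
    }
}

// Works on a local copy that cannot overlap any argument, then writes the
// result back once, so the compiler is free to keep the accumulator in a register.
void DoSomethingFaster(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut) {
    assert(numFoo == 0 || foo2 != nullptr);
    Foo barTemp = barOut;
    for (int i = 0; i < numFoo; i++) {
        barTemp.munge(foo1, foo2[i]);
    }
    barOut = barTemp;
}
```

With `foo1.value == 2`, `foo2` values `{3, 4}` and `barOut.value == 0`, both versions produce 14. If the caller passes one object as both `foo1` and `barOut` (starting at 1, with `foo2` values `{2, 3}`), `DoSomething` yields 12 while `DoSomethingFaster` yields 6, because the original version sees its own writes through `foo1`.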