When can I confidently compile a program with -O3?
I've seen a lot of people complaining about the -O3 option:

- GCC: program doesn't work with compilation option -O3
- Floating point problem provided by David Hammen
I checked the GCC manual:

> -O3: Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.
And I've also checked the source to confirm that these two flags are the only additional optimizations enabled by -O3:

```c
if (optimize >= 3)
  {
    flag_inline_functions = 1;
    flag_rename_registers = 1;
  }
```
For those two optimizations:

- -finline-functions is useful in some cases (mainly with C++) because it lets us define the size of inlined functions (600 by default) with -finline-limit. The compiler may report an error complaining about a lack of memory when the inline limit is set too high.
- -frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers.
For inline-functions: although inlining can reduce the number of function calls, it may produce a larger binary, so -finline-functions may introduce severe cache penalties and end up even slower than -O2. I think the cache penalty depends on more than the program itself.

For rename-registers: I don't think it will have much positive impact on a CISC architecture like x86, which exposes few architectural registers.
My question has 2.5 parts:

1. Am I right to claim that whether a program runs faster with the -O3 option depends on the underlying platform/architecture? [Answered]

   EDIT: The 1st part has been confirmed as true. David Hammen also claims that we should be very careful about how optimization and floating point operations interact on machines with extended-precision floating point registers, like Intel and AMD.

2. When can I confidently use the -O3 option? I suppose these two optimizations, especially rename-registers, may lead to behavior that differs from -O0/-O2. I saw some programs compiled with -O3 crash during execution; is that deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?

   EDIT: The determinism has nothing to do with the optimization; it is a multithreading problem. For a multithreaded program, however, one error-free run of the executable does not make -O3 safe. David Hammen shows that -O3 optimization of floating point operations may violate the strict weak ordering criterion of a comparison. Is there any other concern we need to take care of when we want to use the -O3 option?

3. If the answer to the 1st question is "yes", then when I change the target platform, or in a distributed system with different machines, I may need to switch between -O3 and -O2. Is there any general way to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc. [Answered]

   EDIT: The 3rd part has been answered by Louen: "the variety of platforms make general reasoning about this problem impossible". When evaluating the performance gain from -O3, we have to try both and benchmark our code to see which is faster.
Solution
> I saw some programs crash when compiled with -O3; is that deterministic?
If the program is single threaded, all algorithms used by the program are deterministic, and the inputs are identical from run to run, then yes. If any of those conditions does not hold, the answer is "not necessarily".
The same applies if you compile without using -O3.
> If I run an executable once without any crash, does it mean it is safe to use -O3?
Of course not. Once again, the same applies if you compile without using -O3. Just because your application runs once does not mean it will run successfully in all cases. That's part of what makes testing a hard problem.
Floating point operations can result in weird behavior on machines in which the floating point registers have greater precision than doubles. For example:

```cpp
void add (double a, double b, double &result) {
    double temp = a + b;
    result = temp;
    if (result != temp) {
        throw FunkyAdditionError (temp);
    }
}
```
Compile a program that uses this add function unoptimized and you will probably never see any FunkyAdditionError exceptions. Compile optimized, and certain inputs will suddenly start producing these exceptions. The problem is that with optimization, the compiler keeps temp in a register, while result, being a reference, won't be compiled away into a register. Add an inline qualifier and those exceptions may disappear when your code is compiled with -O3, because now result can also live in a register. Optimization with regard to floating point operations can be a tricky subject.
Finally, let's look at one of those cases where things did go bump in the night when a program was compiled with -O3: GCC: program doesn't work with compilation option -O3. The problem occurred only with -O3 because the compiler probably inlined the distance function but kept one (but not both) of the results in an extended-precision floating point register. With this optimization, certain points p1 and p2 can result in both p1<p2 and p2<p1 evaluating to true. This violates the strict weak ordering criterion for a comparison function.
You need to be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers (e.g., Intel and AMD).