Is optimisation level -O3 dangerous in g++?

2021-12-20 00:00:00 optimization g++ c++ compiler-flags

I have heard from various sources (though mostly from a colleague of mine), that compiling with an optimisation level of -O3 in g++ is somehow 'dangerous', and should be avoided in general unless proven to be necessary.

Is this true, and if so, why? Should I just be sticking to -O2?

Solution

In the early days of gcc (2.8 etc.), and in the times of egcs and Red Hat's 2.96, -O3 was quite buggy sometimes. But that was over a decade ago, and -O3 is no longer much different from the other optimization levels (in bugginess).

It does, however, tend to reveal cases where people rely on undefined behavior, because the optimizer relies more strictly on the rules of the language(s), and especially on their corner cases.
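
To make that concrete, here is a minimal sketch of the kind of code that tends to break. The function names are made up for illustration. Which optimization level (if any) actually folds the first check away depends on the compiler version, so treat this as an illustration rather than a guaranteed reproduction.

```cpp
#include <cassert>
#include <climits>

// Hypothetical overflow check that relies on undefined behavior: signed
// overflow is undefined in C++, so the optimizer is allowed to assume that
// x + 1 never wraps and may reduce this function to `return false`.
bool next_would_overflow_ub(int x) {
    return x + 1 < x;
}

// Well-defined alternative: compare against the limit before doing the
// arithmetic, so no overflow can ever occur.
bool next_would_overflow_safe(int x) {
    return x == INT_MAX;
}

// Small usage example. The UB variant is only called with a value that
// cannot overflow, so this function itself stays well-defined.
void check_overflow_examples() {
    assert(!next_would_overflow_ub(0));
    assert(!next_would_overflow_safe(0));
    assert(next_would_overflow_safe(INT_MAX));
}
```

Calling the first variant with INT_MAX may return true in an unoptimized build and false in an optimized one; the bug is in the code, and the optimizer merely exposes it.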

As a personal note, I have been running production software in the financial sector with -O3 for many years now, and have not yet encountered a bug that would not have been there had I used -O2.

By popular demand, here is an addition:

-O3, and especially additional flags like -funroll-loops (not enabled by -O3), can sometimes lead to more machine code being generated. Under certain circumstances (e.g. on a CPU with an exceptionally small L1 instruction cache) this can cause a slowdown, because the code of, say, some inner loop no longer fits into L1I. Generally gcc tries quite hard not to generate so much code, but since it usually optimizes for the generic case, this can happen. Options especially prone to this (like loop unrolling) are normally not included in -O3 and are marked accordingly in the manpage. As such, it is generally a good idea to use -O3 for generating fast code, and only fall back to -O2 or -Os (which tries to optimize for code size) when appropriate (e.g. when a profiler indicates L1I misses).
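
If a profiler shows that one specific loop would benefit from unrolling, you do not have to pay the code-size cost everywhere. Here is a minimal sketch, assuming GCC 8 or newer (where `#pragma GCC unroll` is available). The function name and the unroll factor of 4 are arbitrary choices for illustration, and compilers that do not know the pragma will typically just warn and ignore it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hot loop. The pragma asks GCC to unroll only this loop
// (here by a factor of 4) instead of enabling -funroll-loops globally.
long dot_product(const int* a, const int* b, std::size_t n) {
    long acc = 0;
#pragma GCC unroll 4
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<long>(a[i]) * b[i];
    }
    return acc;
}

// Small usage example with hand-checked values.
void check_dot_product() {
    const int a[] = {1, 2, 3};
    const int b[] = {4, 5, 6};
    assert(dot_product(a, b, 3) == 32);  // 1*4 + 2*5 + 3*6
}
```

Whether this actually helps is something only measurement can tell.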

If you want to take optimization to the extreme, you can use --param in gcc to tweak the costs associated with certain optimizations. Additionally, note that gcc now lets you put attributes on functions that control the optimization settings just for those functions. So when you find you have a problem with -O3 in one function (or want to try out special flags for just that function), you don't need to compile the whole file, or even the whole project, with -O2.
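
A minimal sketch of the per-function approach, assuming GCC's `optimize` function attribute. The macro name `OPTIMIZE_AT_O2` and the function are made up for illustration. Note that the GCC manual describes this attribute as intended for debugging purposes rather than production code, so I would treat it as a tool for narrowing down a problem, not as a permanent fix.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the attribute is GCC-specific, so it is hidden behind a
// hypothetical macro that expands to nothing on other compilers.
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE_AT_O2 __attribute__((optimize("O2")))
#else
#define OPTIMIZE_AT_O2
#endif

// Hypothetical function suspected of misbehaving (or bloating) at -O3.
// It is compiled at -O2 even if the rest of the file uses -O3.
OPTIMIZE_AT_O2
long sum_of_squares(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<long>(data[i]) * data[i];
    }
    return total;
}

// Small usage example with hand-checked values.
void check_sum_of_squares() {
    const int data[] = {1, 2, 3};
    assert(sum_of_squares(data, 3) == 14);  // 1 + 4 + 9
}
```

If the problem disappears with the attribute in place, that narrows the search to this one function, which is usually a good moment to look for undefined behavior in it.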

On the other hand, it seems that care must be taken when using -Ofast, whose documentation states:

"-Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs."

which makes me conclude that -O3 is intended to be fully standards compliant.
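
One example of what that means in practice: -Ofast turns on -ffast-math, which includes -ffinite-math-only and so allows the compiler to assume that NaNs and infinities never occur. A minimal sketch (the function name is made up) of code that is fine at -O3 but may silently change behavior at -Ofast:

```cpp
#include <cassert>
#include <limits>

// Classic NaN test: under IEEE 754 rules, NaN is the only value that
// compares unequal to itself. With -ffinite-math-only (implied by -Ofast)
// the compiler may assume x is never NaN and fold this to `return false`.
bool is_nan_by_self_compare(double x) {
    return x != x;
}

// Usage example; the NaN assertion is expected to hold only when the code
// is built without -Ofast or -ffast-math.
void check_nan_detection() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(!is_nan_by_self_compare(1.0));
    assert(is_nan_by_self_compare(nan));
}
```

So if your code depends on NaN or infinity handling, or on the exact order of floating-point operations, -O3 is the safer ceiling.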
