Why does C++ compilation take so long?

2021-12-08 00:00:00 performance compilation c++

Compiling a C++ file takes a very long time when compared to C# and Java. It takes significantly longer to compile a C++ file than it would to run a normal-sized Python script. I'm currently using VC++, but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but that doesn't seem like it should explain why it takes so long.

Recommended answer

Several reasons

Every single compilation unit requires hundreds or even thousands of headers to be (1) loaded and (2) compiled. Every one of them typically has to be recompiled for every compilation unit, because the preprocessor means that the result of compiling a header can vary from one compilation unit to the next. (A macro defined in one compilation unit may change the content of the header.)

This is probably the main reason, as it requires huge amounts of code to be compiled for every compilation unit, and additionally, every header has to be compiled multiple times (once for every compilation unit that includes it).
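
A minimal single-file sketch of the macro point, using the standard `<cassert>` header. That header has no include guard and is specified to redefine `assert` according to whether `NDEBUG` is defined at each point of inclusion. In a real project the two inclusions would sit in two different compilation units; the function names `checked` and `unchecked` are hypothetical, and a C++17 compiler is assumed (for `assert` in constant expressions).

```cpp
#include <cassert>   // first inclusion: NDEBUG is not defined, so assert checks

// Compiled with checking enabled: a false argument would abort at run time
// (or fail to compile inside a constant expression).
constexpr int checked(int x) {
    assert(x > 0);
    return x;
}

#define NDEBUG
#include <cassert>   // same header again, but now assert expands to a no-op

// Same source text as above, different meaning: the check is compiled away.
constexpr int unchecked(int x) {
    assert(x > 0);
    return x;
}
```

Because the compiler cannot assume a header means the same thing everywhere, it has to process it again in every unit that includes it.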

Once compiled, all the object files have to be linked together. This is largely a monolithic process that can't easily be parallelized, and it has to process your entire project.

The syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate. This takes a lot of time.
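
Two small illustrations of that context dependence, as a sketch assuming a C++17 compiler. The names `a`, `b`, `c`, `Timer`, `TimeKeeper` and `tk` are made up for the example. The parser cannot decide what `a * b` means until it knows whether `a` names a type, and the well-known "most vexing parse" shows the grammar choosing a function declaration where a reader expects an object.

```cpp
#include <type_traits>

// The same tokens "a * b" mean different things depending on what "a" names.
namespace a_is_a_type {
    struct a {};
    a * b;                      // a declaration: b is a variable of type a*
}

namespace a_is_a_value {
    constexpr int a = 6;
    constexpr int b = 7;
    constexpr int c = a * b;    // an expression: a multiplication
}

// Most vexing parse: this looks like an object built from a temporary Timer,
// but it declares a function tk taking a pointer to a function returning
// Timer, and returning TimeKeeper. Compilers usually warn about it.
struct Timer {};
struct TimeKeeper {
    explicit TimeKeeper(Timer) {}
};
TimeKeeper tk(Timer());
```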

In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program. In C++, vector<int> is a completely separate type from vector<float>, and each one will have to be compiled separately.
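
A sketch of that difference on the C++ side, assuming a C++17 compiler; the function template `sum` is hypothetical. Each distinct template argument produces a distinct type or function that the compiler has to generate separately. The explicit instantiations at the end force both copies of the body to be compiled in this unit.

```cpp
#include <type_traits>
#include <vector>

// A hypothetical function template, used only for illustration.
template <typename T>
T sum(const std::vector<T>& values) {
    T total{};
    for (const T& v : values) total += v;
    return total;
}

// vector<int> and vector<float> are unrelated types, each with its own
// member functions for the compiler to generate.
static_assert(!std::is_same_v<std::vector<int>, std::vector<float>>);

// Likewise, each new T gives a separate function with its own signature.
static_assert(!std::is_same_v<decltype(&sum<int>), decltype(&sum<float>)>);

// Explicit instantiation definitions: two separate bodies are compiled.
template int sum<int>(const std::vector<int>&);
template float sum<float>(const std::vector<float>&);
```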

Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret, and this can become ridiculously complicated. Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations. Templates may also result in extremely complex types with ridiculously long names, adding a lot of extra work for the linker. (It has to compare a lot of symbol names, and if these names grow to many thousands of characters, that can become fairly expensive.)
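
The classic toy example of that sub-language is a recursive template. The `Fib` template below is purely illustrative. Each `Fib<N>` is a separate class that the compiler must instantiate; compilers do memoize instantiations, so each one is created only once, but all of them are created.

```cpp
// Compile-time recursion: using Fib<N> makes the compiler instantiate
// Fib<N-1>, Fib<N-2>, ..., down to the two explicit specializations below.
template <unsigned N>
struct Fib {
    static constexpr unsigned long long value =
        Fib<N - 1>::value + Fib<N - 2>::value;
};

template <>
struct Fib<0> {
    static constexpr unsigned long long value = 0;
};

template <>
struct Fib<1> {
    static constexpr unsigned long long value = 1;
};

// Evaluating this single constant creates 31 distinct class types,
// Fib<0> through Fib<30>.
static_assert(Fib<30>::value == 832040);
```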

And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers, which means far more code has to be parsed and compiled for every compilation unit. In plain C code, a header typically only contains forward declarations, but very little actual code. In C++, it is not uncommon for almost all the code to reside in header files.
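
A single-file sketch of that contrast, assuming a C++17 compiler. The comments mark what would normally go in a header and what would go in a `.cpp` file; `add_ints` and `add` are hypothetical names.

```cpp
#include <type_traits>

// Typical C-style header content: a declaration only. The body is compiled
// once, in a single .cpp file, and other units simply call it.
int add_ints(int a, int b);

// Typical C++ template header content: the whole definition, because the
// compiler needs the body to instantiate add<T> for whatever T an including
// unit uses. Every unit that includes the header parses this body again.
template <typename T>
T add(T a, T b) {
    return a + b;
}

// In a real project this definition would live in its own .cpp file.
int add_ints(int a, int b) {
    return a + b;
}
```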

C++ allows for some very dramatic optimizations. C# or Java don't allow classes to be completely eliminated (they have to be there for reflection purposes), but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes, all of which are inlined and eliminated again in the optimization phase.
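
A hedged illustration of that point, using a hypothetical `Pow` template. The recursion creates one class per exponent. Whether those classes all vanish from the output depends on the compiler and the optimization level, so the comment about inlining describes typical behavior, not a guarantee.

```cpp
// Each Pow<N> is its own class type, so Pow<8>::apply brings in nine
// classes: Pow<8> through Pow<1> from the primary template, plus Pow<0>.
template <unsigned N>
struct Pow {
    static constexpr double apply(double x) { return x * Pow<N - 1>::apply(x); }
};

template <>
struct Pow<0> {
    static constexpr double apply(double) { return 1.0; }
};

// With optimization enabled, a compiler will typically inline the whole
// chain into a straight sequence of multiplications, leaving no trace of
// the Pow classes in the generated code. The cost of creating them at
// compile time is still paid.
double pow8(double x) {
    return Pow<8>::apply(x);
}
```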

Moreover, a C++ program must be fully optimized by the compiler. A C# program can rely on the JIT compiler to perform additional optimizations at load-time, C++ doesn't get any such "second chances". What the compiler generates is as optimized as it's going to get.

C++ is compiled to machine code, which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86). (This is mentioned for completeness only, because it came up in the comments. In practice, this step is unlikely to take more than a tiny fraction of the total compilation time.)

Most of these factors are shared by C code, which actually compiles fairly efficiently. The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates. They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.
