Why is inlining considered faster than a function call?
Now, I know it's because there's no function-call overhead, but is the overhead of calling a function really that heavy (and worth the bloat of having it inlined)?
From what I can remember, when a function is called, say f(x,y), x and y are pushed onto the stack, the stack pointer jumps to an empty block, and execution begins. I know this is a bit of an oversimplification, but am I missing something? Is a few pushes and a jump to call a function really that much overhead?
Let me know if I'm forgetting something, thanks!
Recommended answer
Aside from the fact that there's no call (and therefore none of the associated expenses, like parameter preparation before the call and cleanup after it), inlining has another significant advantage. When a function body is inlined, it can be re-interpreted in the specific context of the caller. This may immediately allow the compiler to further reduce and optimize the code.
For one simple example, this function
void foo(bool b) {
    if (b) {
        // something
    }
    else {
        // something else
    }
}
will require actual branching if called as a non-inlined function
foo(true);
...
foo(false);
However, if the above calls are inlined, the compiler will immediately be able to eliminate the branching. Essentially, in the above case inlining allows the compiler to treat the function parameter as a compile-time constant (when the argument passed is a compile-time constant), which is generally not possible with non-inlined functions.
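To make this concrete, here is a minimal C++ sketch. The names `foo_value`, `inlined_true`, `inlined_false` and `check_equivalence` are hypothetical, and the branch bodies are placeholder arithmetic standing in for `// something` and `// something else`. The two `inlined_*` functions are written by hand to show roughly what a compiler can reduce `foo(true)` and `foo(false)` to once the call is inlined and `b` is known; whether a particular compiler actually does so depends on its optimization settings.

```cpp
#include <cassert>

// Stand-in for foo(): the branch bodies are placeholder arithmetic.
// Compiled as a standalone function, it must test b at run time.
int foo_value(bool b, int x) {
    if (b) {
        return x + 1;   // "something"
    } else {
        return x * 2;   // "something else"
    }
}

// Roughly what remains after foo_value(true, x) is inlined:
// b is the constant true, so the test and the else-branch disappear.
int inlined_true(int x) { return x + 1; }

// Roughly what remains after foo_value(false, x) is inlined.
int inlined_false(int x) { return x * 2; }

// The hand-reduced versions must behave exactly like the original calls.
void check_equivalence(int x) {
    assert(foo_value(true, x) == inlined_true(x));
    assert(foo_value(false, x) == inlined_false(x));
}
```

With optimization enabled (for example `-O2` on GCC or Clang), inspecting the generated assembly for a caller of `foo_value(true, x)` is one way to check whether the branch was actually removed.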
However, the benefit is not even remotely limited to that. In general, the optimization opportunities enabled by inlining are far more wide-reaching. For another example, when a function body is inlined into a specific caller's context, the compiler will in the general case be able to propagate the aliasing relationships known in the calling code into the inlined function's code, making it possible to optimize that code better.
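A small C++ sketch of the aliasing point, under the assumption that the compiler sees `add_twice` either as a separate function or inlined into its callers; all function names here are hypothetical. On its own, `add_twice` has to allow for `a` and `b` pointing at the same `int`, so it cannot simply compute `*a += 2 * *b`. In `caller_distinct` the two pointers refer to different locals, so after inlining that shortcut (and even full constant folding) becomes legal; `caller_aliased` shows why the shortcut would be wrong in general.

```cpp
#include <cassert>

// Compiled on its own, this function cannot assume that a and b point to
// different objects, so after the first store to *a it must re-read *b.
void add_twice(int* a, const int* b) {
    *a += *b;
    *a += *b;
}

// The simpler form a compiler may use once it can prove a != b,
// e.g. after inlining into a caller that passes two distinct locals.
void add_twice_no_alias(int* a, const int* b) {
    assert(a != b);  // this rewrite is valid only without aliasing
    *a += 2 * *b;
}

// x and y are distinct locals, so after inlining the compiler
// may fold the whole computation to a constant.
int caller_distinct() {
    int x = 1, y = 10;
    add_twice(&x, &y);
    return x;  // 1 + 10 + 10 = 21
}

// Here a and b alias, and the result differs from the 2 * *b shortcut.
int caller_aliased() {
    int x = 1;
    add_twice(&x, &x);
    return x;  // 4, whereas the shortcut would give 3
}
```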
Again, the possible examples are numerous, all of them stemming from the basic fact that an inlined call is immersed in the specific caller's context, which enables various cross-context optimizations that would not be possible with non-inlined calls. With inlining you basically get many individual versions of your original function, each tailored and optimized for its specific caller context. The price of that is, obviously, the potential for code bloat, but used judiciously, inlining can provide noticeable performance benefits.
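Inlining is done by the compiler, but the "many tailored versions" effect can be imitated by hand with a C++17 template whose parameter is a compile-time constant. This is an analogy rather than what inlining literally does: each instantiation of the hypothetical `scale_fixed` below is a separate copy specialized for one value, much as an inlined copy of `scale_generic` would be specialized for a call site that passes a constant factor, and the cost is likewise one copy per distinct use.

```cpp
// One general version: the multiplication by `factor` happens at run time.
constexpr int scale_generic(int x, int factor) {
    return x * factor;
}

// One copy per Factor value, roughly what an inlined copy of
// scale_generic looks like at a call site with a constant factor.
template <int Factor>
constexpr int scale_fixed(int x) {
    if constexpr (Factor == 0) {
        return 0;           // the multiplication disappears entirely
    } else if constexpr (Factor == 1) {
        return x;           // the multiplication disappears entirely
    } else {
        return x * Factor;  // constant multiplier; may become shifts/adds
    }
}

// Every specialized copy must agree with the general version.
static_assert(scale_fixed<0>(5) == scale_generic(5, 0));
static_assert(scale_fixed<1>(5) == scale_generic(5, 1));
static_assert(scale_fixed<8>(5) == scale_generic(5, 8));
```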