Why is this C++ wrapper class not being inlined?
EDIT - Something is wrong with my build system. I'm still figuring out exactly what, but gcc was producing weird results (even though the source is a .cpp file); once I used g++, it worked as expected.
This is a heavily reduced test case for a problem I've been having: using a numerical wrapper class (which I thought would be inlined away) made my program 10x slower.
This is independent of optimisation level (I tried both -O0 and -O3).
Am I missing some detail in my wrapper class?
I have the following program, in which I define a class that wraps a double and provides the + operator:
#include <cstdio>
#include <cstdlib>

// Force inlining on GCC/Clang, regardless of optimisation level.
#define INLINE __attribute__((always_inline)) inline

struct alignas(8) WrappedDouble {
    double value;

    INLINE friend const WrappedDouble operator+(const WrappedDouble& left, const WrappedDouble& right) {
        return {left.value + right.value};
    }
};

#define doubleType WrappedDouble // either "double" or "WrappedDouble"

int main() {
    int N = 100000000;
    // Note: the elements are deliberately left uninitialised in this reduced test case.
    doubleType* arr = (doubleType*)malloc(sizeof(doubleType)*N);
    for (int i = 1; i < N; i++) {
        arr[i] = arr[i - 1] + arr[i];
    }
    free(arr);
    printf("done\n");
    return 0;
}
I thought this would compile to the same thing: it does the same calculations, and everything is inlined.
However, it doesn't. It produces a larger and slower result, regardless of optimisation level.
(This particular result is not significantly slower, but my actual use case includes more arithmetic.)
EDIT - I am aware that this isn't constructing my array elements. I thought this might produce less ASM so I could understand it better, but I can change it if it's a problem.
EDIT - I am also aware that I should be using new[]/delete[]. Unfortunately, gcc refused to compile that, even though it was in a .cpp file. This was a symptom of my build system being screwed up, which is probably my actual problem.
EDIT - If I use g++ instead of gcc, it produces identical output.
EDIT - I posted the wrong version of the ASM (-O0 instead of -O3), so this section isn't helpful.
I'm using Xcode's gcc on a 64-bit Mac. The output is the same for both versions, aside from the body of the for-loop.
Here's what it produces for the body of the loop if doubleType is double:
movq -16(%rbp), %rax
movl -20(%rbp), %ecx
subl $1, %ecx
movslq %ecx, %rdx
movsd (%rax,%rdx,8), %xmm0 ## xmm0 = mem[0],zero
movq -16(%rbp), %rax
movslq -20(%rbp), %rdx
addsd (%rax,%rdx,8), %xmm0
movq -16(%rbp), %rax
movslq -20(%rbp), %rdx
movsd %xmm0, (%rax,%rdx,8)
The WrappedDouble version is much longer:
movq -40(%rbp), %rax
movl -44(%rbp), %ecx
subl $1, %ecx
movslq %ecx, %rdx
shlq $3, %rdx
addq %rdx, %rax
movq -40(%rbp), %rdx
movslq -44(%rbp), %rsi
shlq $3, %rsi
addq %rsi, %rdx
movq %rax, -16(%rbp)
movq %rdx, -24(%rbp)
movq -16(%rbp), %rax
movsd (%rax), %xmm0 ## xmm0 = mem[0],zero
movq -24(%rbp), %rax
addsd (%rax), %xmm0
movsd %xmm0, -8(%rbp)
movsd -8(%rbp), %xmm0 ## xmm0 = mem[0],zero
movsd %xmm0, -56(%rbp)
movq -40(%rbp), %rax
movslq -44(%rbp), %rdx
movq -56(%rbp), %rsi
movq %rsi, (%rax,%rdx,8)
Recommended answer
Both versions result in identical assembly code with g++ and clang++ when you turn on optimizations with -O3.