The most efficient standard-compliant way of reinterpreting int as float
Assume I have guarantees that float is IEEE 754 binary32. Given a bit pattern that corresponds to a valid float, stored in a std::uint32_t, how does one reinterpret it as a float in the most efficient standard-compliant way?
float reinterpret_as_float(std::uint32_t ui) {
    return /* apply sorcery to ui */;
}
I've got a few ways that I know/suspect/assume have some issues:
1. Via reinterpret_cast,
float reinterpret_as_float(std::uint32_t ui) {
    return reinterpret_cast<float&>(ui);
}
or equivalently
float reinterpret_as_float(std::uint32_t ui) {
    return *reinterpret_cast<float*>(&ui);
}
which suffers from aliasing issues.
2. Via a union,
float reinterpret_as_float(std::uint32_t ui) {
    union {
        std::uint32_t ui;
        float f;
    } u = {ui};
    return u.f;
}
which is not actually legal, as one is only allowed to read from the most recently written member. Yet it seems some compilers (gcc) allow this.
3. Via std::memcpy,
float reinterpret_as_float(std::uint32_t ui) {
    float f;
    std::memcpy(&f, &ui, 4);
    return f;
}
which AFAIK is legal, but a function call to copy a single word seems wasteful, though it might get optimized away.
4. Via reinterpret_casting to char* and copying,
float reinterpret_as_float(std::uint32_t ui) {
    char* uip = reinterpret_cast<char*>(&ui);
    float f;
    char* fp = reinterpret_cast<char*>(&f);
    for (int i = 0; i < 4; ++i) {
        fp[i] = uip[i];
    }
    return f;
}
which AFAIK is also legal, as char pointers are exempt from aliasing issues, and the manual byte-copying loop saves a possible function call. The loop will most definitely be unrolled, yet four possibly separate one-byte loads/stores are worrisome; I have no idea whether this can be optimized to a single four-byte load/store.
Approach 4 (the char* byte copy) is the best I've been able to come up with.
Am I correct so far? Is there a better way to do this, particularly one that will guarantee a single load/store?
Recommended answer
AFAIK, there are only two approaches that are compliant with strict aliasing rules: memcpy() and casting to char* with copying. All others read a float from memory that belongs to a uint32_t, and the compiler is allowed to perform the read before the write to that memory location. It might even optimize away the write altogether, as it can prove that the stored value will never be used according to strict aliasing rules, resulting in a garbage return value.
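To make the memcpy() approach concrete, here is a minimal sketch of how it could be wrapped so that size mismatches are caught at compile time. The helper name bit_copy and the example_usage function are hypothetical and not part of the original question or answer; the checked values assume float is IEEE 754 binary32, as the question states.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original post): copies the object
// representation of `from` into a fresh `To` object via std::memcpy,
// which does not violate the strict aliasing rules.
template <typename To, typename From>
To bit_copy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

float reinterpret_as_float(std::uint32_t ui) {
    return bit_copy<float>(ui);
}

// Small usage check, assuming float is IEEE 754 binary32.
void example_usage() {
    assert(reinterpret_as_float(0x3F800000u) == 1.0f);   // bit pattern of 1.0f
    assert(reinterpret_as_float(0xC0000000u) == -2.0f);  // bit pattern of -2.0f
}
```

Using sizeof(To) instead of a literal 4 keeps the copy length tied to the types, so the static_assert is the only place where the size assumption lives.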
It really depends on the compiler/optimizer whether memcpy() or the char* copy is faster. In both cases, an intelligent compiler might be able to figure out that it can just load and copy a uint32_t, but I would not trust any compiler to do so before I have seen it in the resulting assembly code.
After some testing with gcc 4.8.1, I can say that the memcpy() approach is the best for this particular compiler; see below for details.
Compiling
#include <stdint.h>
float foo(uint32_t a) {
    float b;
    char* aPointer = (char*)&a, *bPointer = (char*)&b;
    for( int i = sizeof(a); i--; ) bPointer[i] = aPointer[i];
    return b;
}
with gcc -S -std=gnu11 -O3 foo.c yields this assembly code:
movl %edi, %ecx
movl %edi, %edx
movl %edi, %eax
shrl $24, %ecx
shrl $16, %edx
shrw $8, %ax
movb %cl, -1(%rsp)
movb %dl, -2(%rsp)
movb %al, -3(%rsp)
movb %dil, -4(%rsp)
movss -4(%rsp), %xmm0
ret
This is not optimal.
While
#include <stdint.h>
#include <string.h>
float foo(uint32_t a) {
    float b;
    char* aPointer = (char*)&a, *bPointer = (char*)&b;
    memcpy(bPointer, aPointer, sizeof(a));
    return b;
}
yields (with all optimization levels except -O0):
movl %edi, -4(%rsp)
movss -4(%rsp), %xmm0
ret
This is optimal.
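Since the two compliant variants differ only in the code the compiler generates, it can be reassuring to confirm that they produce identical results before picking one on performance grounds. The following is a rough sketch of such a check; the function names via_memcpy, via_char_loop, bits_of, and check_patterns are made up for illustration, and the test patterns assume float is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Variant using std::memcpy (hypothetical name).
float via_memcpy(std::uint32_t ui) {
    float f;
    std::memcpy(&f, &ui, sizeof f);
    return f;
}

// Variant using a byte-wise copy through char* (hypothetical name).
float via_char_loop(std::uint32_t ui) {
    float f;
    const char* src = reinterpret_cast<const char*>(&ui);
    char* dst = reinterpret_cast<char*>(&f);
    for (std::size_t i = 0; i < sizeof f; ++i) {
        dst[i] = src[i];
    }
    return f;
}

// Inverse direction, used to compare results bit for bit.
std::uint32_t bits_of(float f) {
    std::uint32_t ui;
    std::memcpy(&ui, &f, sizeof ui);
    return ui;
}

// Round-trips a few non-NaN patterns through both variants.
void check_patterns() {
    const std::uint32_t patterns[] = {
        0x00000000u,  // +0.0
        0x80000000u,  // -0.0
        0x3F800000u,  // 1.0
        0x7F800000u,  // +infinity
        0x00000001u   // smallest positive subnormal
    };
    for (std::uint32_t p : patterns) {
        assert(bits_of(via_memcpy(p)) == p);
        assert(bits_of(via_char_loop(p)) == p);
    }
}
```

The comparison is done on the bit patterns rather than on the float values so that +0.0 and -0.0 are distinguished; NaN patterns are left out because whether their payload survives being passed around as a float can depend on the platform.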