Opinions on type-punning in C++?
I'm curious about conventions for type-punning pointers/arrays in C++. Here's the use case I have at the moment:
Compute a simple 32-bit checksum over a binary blob of data by treating it as an array of 32-bit integers (we know its total length is a multiple of 4), and then summing up all values and ignoring overflow.
I would expect such a function to look like this:
#include <cstddef>
#include <cstdint>

uint32_t compute_checksum(const char *data, size_t size)
{
    const uint32_t *udata = /* ??? */;
    uint32_t checksum = 0;
    for (size_t i = 0; i != size / 4; ++i)
        checksum += udata[i];  // overflow is ignored: unsigned addition wraps
    return checksum;
}
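Setting the casting question aside, the checksum itself is simple to state in code. In this minimal sketch (checksum_words is a hypothetical name) the input is already a genuine array of uint32_t, so no type-punning is involved; it only illustrates that storing the sum back into a uint32_t reduces it modulo 2^32, which is exactly the "ignore overflow" behaviour the question asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference checksum over data that really is an array of uint32_t.
// The running sum is kept in a uint32_t, so every addition is reduced
// modulo 2^32; no explicit overflow handling is needed.
std::uint32_t checksum_words(const std::uint32_t* words, std::size_t count) {
    assert(words != nullptr || count == 0);
    std::uint32_t checksum = 0;
    for (std::size_t i = 0; i != count; ++i)
        checksum += words[i];
    return checksum;
}
```

For example, summing {0xFFFFFFFF, 2} wraps around to 1.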
Now the question I have is: what do you consider the "best" way to convert data to udata?
C-style cast?
udata = (const uint32_t *)data
C++ cast that assumes all pointers are convertible?
udata = reinterpret_cast<const uint32_t *>(data)
C++ cast between arbitrary pointer types using an intermediate void*?
udata = static_cast<const uint32_t *>(static_cast<const void *>(data))
Cast through a union?
union {
    const uint32_t *udata;
    const char *cdata;
};
cdata = data;
// now use udata
I fully realize that this will not be a 100% portable solution, but I only expect to use it on a small set of platforms where I know it works (specifically with respect to unaligned memory accesses and the compiler's assumptions about pointer aliasing). What would you recommend?
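The unaligned-access caveat can at least be checked at run time before relying on it. This is a minimal sketch, assuming a platform where converting a pointer to std::uintptr_t yields its plain address (true on mainstream flat-memory targets, but implementation-defined in the standard); is_aligned_for_u32 is a hypothetical helper name. It addresses alignment only and does nothing about the aliasing problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p satisfies uint32_t's alignment requirement.
// Assumes the pointer-to-integer mapping is the raw address (implementation-defined).
bool is_aligned_for_u32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}
```

A caller could use this to pick a cast-based fast path only for aligned buffers and fall back to a byte-wise loop otherwise.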
Solution
As far as the C++ standard is concerned, litb's answer is completely correct and the most portable. Casting const char *data to a const uint32_t * (whether via a C-style cast, static_cast, or reinterpret_cast) and then reading through the resulting pointer breaks the strict aliasing rules (see Understanding Strict Aliasing). If you compile with full optimization, there's a good chance that the code will not do the right thing.
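A commonly used alternative, which neither the question nor the answer mentions, is to copy each 4-byte chunk into a genuine uint32_t with std::memcpy. Because every read goes into a real uint32_t object, there is no aliasing violation and no alignment requirement on data. The sketch below (compute_checksum_memcpy is a hypothetical name) reads words in native byte order, just as the pointer cast would. GCC and Clang typically compile the fixed-size memcpy down to a single load when optimizing, but that is optimizer behaviour, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Strict-aliasing-safe checksum: each 4-byte chunk is copied into a real
// uint32_t object instead of being read through a uint32_t pointer.
std::uint32_t compute_checksum_memcpy(const char* data, std::size_t size) {
    assert(size % 4 == 0);  // the question guarantees a multiple of 4
    std::uint32_t checksum = 0;
    for (std::size_t i = 0; i != size / 4; ++i) {
        std::uint32_t word;
        std::memcpy(&word, data + i * 4, sizeof word);  // native byte order
        checksum += word;  // reduced modulo 2^32
    }
    return checksum;
}
```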
Casting through a union (such as litb's my_reint) is probably the best solution, although it technically violates the rule that writing to a union through one member and reading it through another results in undefined behavior. However, practically all compilers support this and produce the expected result. If you absolutely want to conform to the standard 100%, go with the bit-shifting method. Otherwise, I'd recommend casting through a union, which is likely to give you better performance.
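The two approaches recommended above can be sketched side by side. This is not litb's code (his my_reint is not reproduced here); compute_checksum_shift, compute_checksum_union and word_view are hypothetical names. The shift version fixes little-endian byte order explicitly (an assumption, chosen to match what a cast reads on x86), so on a big-endian machine it gives different results from the union version, which reads words in native byte order. The union version reads a member other than the one last written, which is the formally undefined operation described above; GCC's documentation for -fstrict-aliasing describes this form of type-punning as allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-shifting method: build each word from individual bytes using only
// well-defined operations. Byte order is little-endian by assumption.
std::uint32_t compute_checksum_shift(const char* data, std::size_t size) {
    assert(size % 4 == 0);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(data);
    std::uint32_t checksum = 0;
    for (std::size_t i = 0; i != size; i += 4) {
        std::uint32_t word = static_cast<std::uint32_t>(bytes[i])
                           | static_cast<std::uint32_t>(bytes[i + 1]) << 8
                           | static_cast<std::uint32_t>(bytes[i + 2]) << 16
                           | static_cast<std::uint32_t>(bytes[i + 3]) << 24;
        checksum += word;
    }
    return checksum;
}

// Union method: write the bytes through one member, read the other.
// Formally undefined in C++; relies on compiler support for type-punning.
union word_view {
    unsigned char bytes[4];
    std::uint32_t word;
};

std::uint32_t compute_checksum_union(const char* data, std::size_t size) {
    assert(size % 4 == 0);
    std::uint32_t checksum = 0;
    for (std::size_t i = 0; i != size; i += 4) {
        word_view v;
        for (std::size_t j = 0; j != 4; ++j)
            v.bytes[j] = static_cast<unsigned char>(data[i + j]);
        checksum += v.word;  // native byte order
    }
    return checksum;
}
```

On data whose bytes are identical within each word, byte order does not matter and both functions agree on every platform.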