Struct reordering by the compiler

2021-12-23 00:00:00 c struct c++ memory-alignment

Suppose I have a struct like this:

#include <cstdint>  // uint8_t, uint32_t

struct MyStruct
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

This is probably going to waste some space (not a ton, but some), because the uint32_t variable has to be aligned.

In practice, once the compiler has inserted padding so that the uint32_t member is properly aligned, it might look something like this:

struct MyStruct
{
  uint8_t var0;
  uint8_t unused[3];  //3 bytes of wasted space
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

A more efficient struct would be:

struct MyStruct
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};
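
The difference between the two layouts can be measured rather than guessed. The following is a minimal sketch: the names Original, Reordered, payload, padding_original and padding_reordered are made up for this illustration, and the actual sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>

// The layout from the question: var1 sits between the one-byte members.
struct Original {
  std::uint8_t  var0;
  std::uint32_t var1;
  std::uint8_t  var2;
  std::uint8_t  var3;
  std::uint8_t  var4;
};

// The hand-reordered layout: one-byte members first, then var1.
struct Reordered {
  std::uint8_t  var0;
  std::uint8_t  var2;
  std::uint8_t  var3;
  std::uint8_t  var4;
  std::uint32_t var1;
};

// Bytes actually occupied by members (the same for both structs).
constexpr std::size_t payload =
    4 * sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Padding = total size minus the bytes occupied by members.
constexpr std::size_t padding_original()  { return sizeof(Original)  - payload; }
constexpr std::size_t padding_reordered() { return sizeof(Reordered) - payload; }

// Members with the same access control are laid out in declaration order,
// so var1 always comes after var0 in Original.
static_assert(offsetof(Original, var0) < offsetof(Original, var1),
              "declaration order is preserved");
```

On a typical ABI where uint32_t is 4-byte aligned, sizeof(Original) would usually be 12 (3 bytes of padding before var1 plus 1 byte of tail padding) and sizeof(Reordered) would usually be 8, but neither figure is guaranteed by the standard.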

Now, the question is:

Why is the compiler forbidden (by the standard) from reordering the struct?

I don't see any way you could shoot yourself in the foot if the struct was reordered.

Solution

Why is the compiler forbidden (by the standard) from reordering the struct?

The basic reason is: for compatibility with C.

Remember that C is, originally, a high-level assembly language. It is quite common in C to view memory (network packets, ...) by reinterpreting the bytes as a specific struct.
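
As a rough illustration of that idiom, here is a sketch that views the first bytes of a buffer as a header struct. PacketHeader and read_header are hypothetical names invented for this example, not part of any real protocol, and the sketch uses memcpy rather than a pointer cast to stay clear of aliasing and alignment problems. It only works because the member order in memory matches the declaration order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>      // std::memcpy
#include <type_traits>

// Hypothetical wire header, for illustration only.
struct PacketHeader {
  std::uint8_t  version;
  std::uint8_t  flags;
  std::uint16_t length;  // byte order is whatever the sender used
};

static_assert(std::is_trivially_copyable<PacketHeader>::value,
              "memcpy requires a trivially copyable type");
static_assert(sizeof(PacketHeader) == 4,
              "assumption: no padding in this header on this platform");

// Interpret the first sizeof(PacketHeader) bytes of buf as a header.
// The caller must guarantee that buf holds at least that many bytes.
inline PacketHeader read_header(const unsigned char* buf) {
  PacketHeader h;
  std::memcpy(&h, buf, sizeof h);
  return h;
}
```

If the compiler were free to move length in front of version, the same bytes would decode differently from one compiler (or one build) to the next.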

This has led to multiple features relying on this property:

  • C guaranteed that the address of a struct and the address of its first data member are one and the same, so C++ does too (in the absence of virtual inheritance/methods).

  • C guaranteed that if you have two structs A and B that both start with a data member char followed by a data member int (and whatever comes after), then when you put them in a union you can write the B member and read the char and int through the A member, so C++ does too: this is the common initial sequence rule for standard-layout types.

The latter is extremely broad, and completely prevents any re-ordering of data members for most structs (or classes).
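
Both guarantees can be sketched in a few lines. The types A, B and AorB and the two helper functions are illustrative names only; the sketch assumes both structs are standard-layout, which the static_asserts check.

```cpp
#include <cassert>
#include <type_traits>

// Two structs sharing a common initial sequence: char, then int.
struct A { char tag; int value; };
struct B { char tag; int value; double extra; };

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

union AorB {
  A a;
  B b;
};

// First guarantee: a standard-layout struct and its first data member
// live at the same address.
inline bool first_member_shares_address(const A& a) {
  return reinterpret_cast<const char*>(&a) == &a.tag;
}

// Second guarantee: after writing the B member of the union, the common
// initial sequence (tag, value) may be read through the A member.
inline A read_prefix_through_a(const B& b) {
  AorB u;
  u.b = b;                       // b is now the active member
  return A{u.a.tag, u.a.value};  // inspect the shared prefix via a
}
```

Neither function could be relied on if the compiler were allowed to place tag and value at different offsets in A and in B.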


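Note that the Standard did allow some re-ordering: since C did not have the concept of access control, C++ (up to and including C++20) left the relative order of two data members with different access control specifiers unspecified. C++23 removed this latitude and requires all non-static data members to be laid out in declaration order.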

As far as I know, no compiler attempts to take advantage of it; but they could in theory.
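
One visible consequence of the access-control rule is that a class mixing access specifiers for its data members is not standard-layout, so the C-compatibility guarantees above are not promised for it. A small sketch, with SameAccess and MixedAccess as made-up example types:

```cpp
#include <cassert>
#include <type_traits>

// All data members share the same access control: standard-layout.
struct SameAccess {
  int  a;
  char b;
};

// Data members with different access control: not standard-layout.
class MixedAccess {
public:
  int a;
  char get_b() const { return b; }
private:
  char b = 0;
};

constexpr bool same_is_standard_layout =
    std::is_standard_layout<SameAccess>::value;
constexpr bool mixed_is_standard_layout =
    std::is_standard_layout<MixedAccess>::value;

static_assert(same_is_standard_layout, "uniform access keeps standard layout");
static_assert(!mixed_is_standard_layout, "mixed access loses standard layout");
```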

Outside of C++, languages such as Rust allow compilers to re-order fields and the main Rust compiler (rustc) does so by default. Only historical decisions and a strong desire for backward compatibility prevent C++ from doing so.
