What are Aggregates and PODs and how/why are they special?

2022-01-30 00:00:00 aggregate c++ c++11 c++17 standard-layout

This FAQ is about Aggregates and PODs and covers the following material:

  • What are aggregates?
  • What are PODs (Plain Old Data)?
  • How are they related?
  • How and why are they special?
  • What changes for C++11?

Recommended answer

How to read:

This article is rather long. If you want to know about both aggregates and PODs (Plain Old Data) take time and read it. If you are interested just in aggregates, read only the first part. If you are interested only in PODs then you must first read the definition, implications, and examples of aggregates and then you may jump to PODs but I would still recommend reading the first part in its entirety. The notion of aggregates is essential for defining PODs. If you find any errors (even minor, including grammar, stylistics, formatting, syntax, etc.) please leave a comment, I'll edit.

This answer applies to C++03. For other C++ standards see:

  • C++11 changes
  • C++14 changes
  • C++17 changes
  • C++20 changes

Formal definition from the C++ standard (C++03 8.5.1 §1):

An aggregate is an array or a class (clause 9) with no user-declared constructors (12.1), no private or protected non-static data members (clause 11), no base classes (clause 10), and no virtual functions (10.3).

So, OK, let's parse this definition. First of all, any array is an aggregate. A class can also be an aggregate if… wait! nothing is said about structs or unions, can't they be aggregates? Yes, they can. In C++, the term class refers to all classes, structs, and unions. So, a class (or struct, or union) is an aggregate if and only if it satisfies the criteria from the above definitions. What do these criteria imply?

  • Having no user-declared constructors does not mean an aggregate class cannot have constructors. In fact, it can have a default constructor and/or a copy constructor as long as they are implicitly declared by the compiler, and not explicitly by the user.

  • No private or protected non-static data members. You can have as many private and protected member functions (but not constructors), and as many private or protected static data members and member functions, as you like without violating the rules for aggregate classes.

  • An aggregate class can have a user-declared/user-defined copy-assignment operator and/or destructor.

  • An array is an aggregate even if it is an array of non-aggregate class type.

Now let's look at some examples:

class NotAggregate1
{
  virtual void f() {} //remember? no virtual functions
};

class NotAggregate2
{
  int x; //x is private by default and non-static 
};

class NotAggregate3
{
public:
  NotAggregate3(int) {} //oops, user-defined constructor
};

class Aggregate1
{
public:
  NotAggregate1 member1;   //ok, public member
  Aggregate1& operator=(Aggregate1 const & rhs) { /* ... */ return *this; } //ok, user-defined copy-assignment
private:
  void f() {} // ok, just a private function
};

You get the idea. Now let's see how aggregates are special. They, unlike non-aggregate classes, can be initialized with curly braces {}. This initialization syntax is commonly known for arrays, and we just learnt that these are aggregates. So, let's start with them.

Type array_name[n] = {a1, a2, …, am};

if (m == n)
  the ith element of the array is initialized with ai
else if (m < n)
  the first m elements of the array are initialized with a1, a2, …, am and the other n - m elements are, if possible, value-initialized (see below for the explanation of the term)
else if (m > n)
  the compiler will issue an error
else (this is the case when n isn't specified at all, as in int a[] = {1, 2, 3};)
  the size of the array (n) is assumed to be equal to m, so int a[] = {1, 2, 3}; is equivalent to int a[3] = {1, 2, 3};

When an object of scalar type (bool, int, char, double, pointers, etc.) is value-initialized it means it is initialized with 0 for that type (false for bool, 0.0 for double, etc.). When an object of class type with a user-declared default constructor is value-initialized its default constructor is called. If the default constructor is implicitly defined then all nonstatic members are recursively value-initialized. This definition is imprecise and a bit incorrect but it should give you the basic idea. A reference cannot be value-initialized. Value-initialization for a non-aggregate class can fail if, for example, the class has no appropriate default constructor.
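
To make the idea of value-initialization a bit more concrete, here is a minimal sketch. The types Plain and WithCtor and the three function names are made up for this illustration; they are not part of the standard or of the examples elsewhere in this answer.

```cpp
// Hypothetical aggregate used only for this illustration.
struct Plain {
    int i;
    double d;
    const char* p;
};

// Hypothetical class with a user-declared default constructor.
struct WithCtor {
    int i;
    WithCtor() : i(42) {}
};

// Scalars: T() value-initializes, which means "zero for that type".
bool scalars_are_zeroed() {
    typedef int* IntPtr;
    int i = int();
    double d = double();
    bool b = bool();
    IntPtr p = IntPtr();
    return i == 0 && d == 0.0 && b == false && p == 0;
}

// Implicitly defined default constructor: each non-static member
// is value-initialized in turn, so all of them end up zero.
bool members_are_zeroed() {
    Plain s = Plain();
    return s.i == 0 && s.d == 0.0 && s.p == 0;
}

// User-declared default constructor: value-initialization calls it.
int ctor_is_called() {
    WithCtor w = WithCtor();
    return w.i;
}
```

Calling scalars_are_zeroed() and members_are_zeroed() should both yield true, and ctor_is_called() should yield the value set by the constructor.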

Examples of array initialization:

class A
{
public:
  A(int) {} //no default constructor
};
class B
{
public:
  B() {} //default constructor available
};
int main()
{
  A a1[3] = {A(2), A(1), A(14)}; //OK n == m
  A a2[3] = {A(2)}; //ERROR A has no default constructor. Unable to value-initialize a2[1] and a2[2]
  B b1[3] = {B()}; //OK b1[1] and b1[2] are value initialized, in this case with the default-ctor
  int Array1[1000] = {0}; //All elements are initialized with 0;
  int Array2[1000] = {1}; //Attention: only the first element is 1, the rest are 0;
  bool Array3[1000] = {}; //the braces can be empty too. All elements initialized with false
  int Array4[1000]; //no initializer. This is different from an empty {} initializer in that
  //the elements in this case are not value-initialized, but have indeterminate values 
  //(unless, of course, Array4 is a global array)
  int array[2] = {1, 2, 3, 4}; //ERROR, too many initializers
}
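
The array rules above can also be checked at run time. The following is a small sketch with made-up helper names; it only covers the cases that compile (m < n, n omitted, and empty braces).

```cpp
#include <cstddef>

// m < n: the first two elements take the listed values,
// the remaining three are value-initialized to 0.
int sum_partially_initialized() {
    int a[5] = {1, 2};
    int sum = 0;
    for (std::size_t i = 0; i < 5; ++i) {
        sum += a[i];
    }
    return sum;
}

// n omitted: the array size is taken from the number of initializers.
std::size_t deduced_array_size() {
    int a[] = {1, 2, 3};
    return sizeof(a) / sizeof(a[0]);
}

// Empty braces: every element is value-initialized (false for bool).
bool all_false_after_empty_braces() {
    bool flags[8] = {};
    for (std::size_t i = 0; i < 8; ++i) {
        if (flags[i]) {
            return false;
        }
    }
    return true;
}
```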

Now let's see how aggregate classes can be initialized with braces. Pretty much the same way. Instead of the array elements we will initialize the non-static data members in the order of their appearance in the class definition (they are all public by definition). If there are fewer initializers than members, the rest are value-initialized. If it is impossible to value-initialize one of the members which were not explicitly initialized, we get a compile-time error. If there are more initializers than necessary, we get a compile-time error as well.

struct X
{
  int i1;
  int i2;
};
struct Y
{
  char c;
  X x;
  int i[2];
  float f; 
protected:
  static double d;
private:
  void g(){}      
}; 

Y y = {'a', {10, 20}, {20, 30}};

In the above example y.c is initialized with 'a', y.x.i1 with 10, y.x.i2 with 20, y.i[0] with 20, y.i[1] with 30 and y.f is value-initialized, that is, initialized with 0.0. The protected static member d is not initialized at all, because it is static.
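
If you want to convince yourself of the values described above, a sketch along these lines should do. It repeats X and Y (without the static member and the private function, which play no role in initialization); the two function names are invented for the illustration.

```cpp
struct X {
    int i1;
    int i2;
};

struct Y {
    char c;
    X x;
    int i[2];
    float f;
};

// Fewer initializers than members: y.f gets no initializer
// and is therefore value-initialized to 0.0f.
bool trailing_member_is_zeroed() {
    Y y = {'a', {10, 20}, {20, 30}};
    return y.c == 'a' && y.x.i1 == 10 && y.x.i2 == 20 &&
           y.i[0] == 20 && y.i[1] == 30 && y.f == 0.0f;
}

// Empty braces: every member, including the nested ones, is zero.
bool empty_braces_zero_everything() {
    Y y = {};
    return y.c == '\0' && y.x.i1 == 0 && y.x.i2 == 0 &&
           y.i[0] == 0 && y.i[1] == 0 && y.f == 0.0f;
}
```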

Aggregate unions are different in that you may initialize only their first member with braces. I think that if you are advanced enough in C++ to even consider using unions (their use may be very dangerous and must be thought of carefully), you could look up the rules for unions in the standard yourself :).
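
Just to show the one rule mentioned here, a tiny sketch (the union IntOrFloat and the function name are hypothetical):

```cpp
// Hypothetical union: under the rules described in this answer,
// braces can only initialize the first member, i.
union IntOrFloat {
    int i;
    float f;
};

int first_member_after_brace_init() {
    IntOrFloat u = {42};  // initializes u.i, not u.f
    return u.i;
}
```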

Now that we know what's special about aggregates, let's try to understand the restrictions on classes; that is, why they are there. We should understand that memberwise initialization with braces implies that the class is nothing more than the sum of its members. If a user-defined constructor is present, it means that the user needs to do some extra work to initialize the members, so brace initialization would be incorrect. If virtual functions are present, it means that the objects of this class have (on most implementations) a pointer to the so-called vtable of the class, which is set in the constructor, so brace-initialization would be insufficient. You could figure out the rest of the restrictions in a similar manner as an exercise :).

So enough about aggregates. Now we can define a stricter set of types, to wit, PODs.

Formal definition from the C++ standard (C++03 9 §4):

A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. A POD class is a class that is either a POD-struct or a POD-union.

Wow, this one's tougher to parse, isn't it? :) Let's leave unions out (on the same grounds as above) and rephrase in a bit clearer way:

An aggregate class is called a POD if it has no user-defined copy-assignment operator and destructor and none of its nonstatic members is a non-POD class, array of non-POD, or a reference.

What does this definition imply? (Did I mention POD stands for Plain Old Data?)

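  • All POD classes are aggregates, or, to put it the other way around, if a class is not an aggregate then it is certainly not a POD
  • Classes, just like structs, can be PODs even though the standard term is POD-struct for both cases
  • Just like in the case of aggregates, it doesn't matter what static members the class has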

Examples:

#include <vector>

struct POD
{
  int x;
  char y;
  void f() {} //no harm if there's a function
  static std::vector<char> v; //static members do not matter
};

struct AggregateButNotPOD1
{
  int x;
  ~AggregateButNotPOD1() {} //user-defined destructor
};

struct AggregateButNotPOD2
{
  AggregateButNotPOD1 arrOfNonPod[3]; //array of non-POD class
};
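
On a C++11 or later compiler you can let the compiler confirm these classifications. This is a sketch under an assumption: it uses std::is_pod from <type_traits>, which implements the C++11 definition of POD (and is deprecated as of C++20), not the C++03 wording quoted above. The two definitions happen to agree on these three types. The types are renamed slightly so the block stands on its own.

```cpp
#include <type_traits>
#include <vector>

struct Pod {
    int x;
    char y;
    void f() {}                  // member functions do not matter
    static std::vector<char> v;  // static members do not matter
};

struct AggregateButNotPod1 {
    int x;
    ~AggregateButNotPod1() {}    // user-defined destructor
};

struct AggregateButNotPod2 {
    AggregateButNotPod1 arr_of_non_pod[3];  // array of non-POD class
};

static_assert(std::is_pod<Pod>::value, "Pod should be a POD");
static_assert(!std::is_pod<AggregateButNotPod1>::value,
              "a user-defined destructor rules out POD");
static_assert(!std::is_pod<AggregateButNotPod2>::value,
              "an array of non-POD members rules out POD");

// Scalars and arrays of PODs are POD types as well.
static_assert(std::is_pod<int>::value, "scalar types are PODs");
static_assert(std::is_pod<Pod[4]>::value, "arrays of PODs are PODs");
```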

POD-classes, POD-unions, scalar types, and arrays of such types are collectively called POD-types.
PODs are special in many ways. I'll provide just some examples.

  • POD-classes are the closest to C structs. Unlike C structs, PODs can have member functions and arbitrary static members, but neither of these changes the memory layout of the object. So if you want to write a more or less portable dynamic library that can be used from C and even .NET, you should try to make all your exported functions take and return only parameters of POD-types.

  • The lifetime of objects of non-POD class type begins when the constructor has finished and ends when the destructor has finished. For POD classes, the lifetime begins when storage for the object is occupied and finishes when that storage is released or reused.

  • For objects of POD types it is guaranteed by the standard that when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value. Do note that there is no such guarantee for objects of non-POD types. Also, you can safely copy POD objects with memcpy. The following example assumes T is a POD-type:

 #define N sizeof(T)
 char buf[N];
 T obj; // obj initialized to its original value
 memcpy(buf, &obj, N); // between these two calls to memcpy,
 // obj might be modified
 memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
 // holds its original value
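
A compilable variant of the fragment above might look like the following sketch. Sample is a hypothetical POD type standing in for T, and the function names are invented for the illustration.

```cpp
#include <cstring>

// Hypothetical POD type standing in for T.
struct Sample {
    int id;
    double value;
};

// Saves obj into a byte buffer, overwrites obj, then copies the
// bytes back. For a POD type the original value is restored.
bool memcpy_round_trip_restores_value() {
    Sample obj = {7, 2.5};
    unsigned char buf[sizeof(Sample)];

    std::memcpy(buf, &obj, sizeof(Sample));  // save the bytes
    obj.id = -1;                             // modify obj in between
    obj.value = 0.0;
    std::memcpy(&obj, buf, sizeof(Sample));  // restore the bytes

    return obj.id == 7 && obj.value == 2.5;
}

// Copying one POD object into another with memcpy is fine as well.
bool memcpy_copies_between_objects() {
    Sample src = {3, 1.5};
    Sample dst = {0, 0.0};
    std::memcpy(&dst, &src, sizeof(Sample));
    return dst.id == 3 && dst.value == 1.5;
}
```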

  • goto statement. As you may know, it is illegal (the compiler should issue an error) to make a jump via goto from a point where some variable was not yet in scope to a point where it is already in scope. This restriction applies only if the variable is of non-POD type. In the following example f() is ill-formed whereas g() is well-formed. Note that Microsoft's compiler is too liberal with this rule: it just issues a warning in both cases.

     int f()
     {
       struct NonPOD {NonPOD() {}};
       goto label;
       NonPOD x;
     label:
       return 0;
     }
    
     int g()
     {
       struct POD {int i; char c;};
       goto label;
       POD x;
     label:
       return 0;
     }
    

  • It is guaranteed that there will be no padding in the beginning of a POD object. In other words, if a POD-class A's first member is of type T, you can safely reinterpret_cast from A* to T* and get the pointer to the first member and vice versa.
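
    The no-leading-padding guarantee can be sketched like this. Header is a hypothetical POD type and the function names are made up for the illustration.

```cpp
#include <cstddef>

// Hypothetical POD type for illustration.
struct Header {
    int tag;    // first member
    char flag;
};

// No padding before the first member, so its offset is 0.
bool first_member_is_at_offset_zero() {
    return offsetof(Header, tag) == 0;
}

// Header* -> int* yields a pointer to the first member,
// and casting back yields the original object pointer.
bool casts_round_trip() {
    Header h = {99, 'x'};
    int* first = reinterpret_cast<int*>(&h);
    Header* back = reinterpret_cast<Header*>(first);
    return *first == 99 && first == &h.tag && back == &h;
}
```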

    The list goes on and on…

    It is important to understand what exactly a POD is because many language features, as you see, behave differently for them.
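
    As a rough illustration of the first point in the list (exported functions that take and return only POD types), an interface could be sketched as follows. All names here (Point2D, point_add, point_dot) are hypothetical, and a real library would also need platform-specific export annotations and a matching C header, which are left out.

```cpp
// Hypothetical names throughout; a sketch of a C-callable interface.
extern "C" {

// A POD struct: only public scalar members, no constructors,
// no virtual functions, so it can be declared identically in C.
struct Point2D {
    double x;
    double y;
};

// Takes and returns only POD types.
Point2D point_add(Point2D a, Point2D b) {
    Point2D r = {a.x + b.x, a.y + b.y};
    return r;
}

double point_dot(Point2D a, Point2D b) {
    return a.x * b.x + a.y * b.y;
}

}  // extern "C"
```

    The design choice is simply that nothing crossing the library boundary depends on C++-only machinery such as constructors or vtables.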
