实现没有未定义行为的 std::vector 之类的容器

这可能会让一些编码人员感到惊讶,尽管令人惊讶的是,如果没有编译器的非标准支持,就不可能实现 std::vector.问题本质上在于在原始存储区域上执行指针运算的能力.论文p0593:为低级对象操作隐式创建对象,出现在@ShafikYaghmour 回答中,清楚地揭示了针对存在的问题,提出对标准进行修改,以便于实现vector like container等法律级别的编程技术.

It may surprise some coders and, as surprising as it can be, it is not possible to implement std::vector without non-standard support of the compilers. The problem essentially resides on the ability to perform pointer arithmetic on a raw storage region. The paper, p0593: Implicit creation of objects for low-level object manipulation, that appears in @ShafikYaghmour answer, exposes clearly the problematic and proposes modification of the standard in order to make implementation of vector like container and other law level programming technique easier.

尽管如此,我想知道是否没有办法仅使用语言提供的内容而不使用任何标准库来实现与 std::vector 等效的类型.

Nevertheless I was wondering if there were no work around to implement a type equivalent to std::vector only using what is provided by the language without any use of the standard library.

目标是在原始存储区域中一个一个地构造向量元素,并能够使用迭代器访问这些元素.这相当于 std::vector 上的 push_back 序列.

The objective is to construct vector elements, one by one, in a raw storage region, and be able to access those elements using an iterator. This would be equivalent to a sequence of push_back on a std::vector.

为了理解这个问题,下面是对 libc++ 或 libstdc++ 中 std::vector 实现的操作的简化:

To get an idea of the problem, bellow a simplification of the operations that are performed on the implementation of std::vector in the libc++ or libstdc++:

void access_value(std::string x);

std::string s1, s2, s3;
//allocation
auto p=static_cast<std::string*>(::operator new(10*sizeof(std::string)));

//push_back s1
new(p) std::string(s1);
access_value(*p);//undefined behavior, p is not a pointer to object

//push_back s2
new(p+1) std::string(s2);//undefined behavior
        //, pointer arithmetic but no array (neither implicit array of size 1)
access_value(*(p+1));//undefined behavior, p+1 is not a pointer to object

//push_back s2
new(p+2) std::string(s3);//undefined behavior
        //, pointer arithmetic but no array
access_value(*(p+2));//undefined behavior, p+2 is not a pointer to object

我的想法是使用一个从不初始化其成员的联合.

My idea is to use a union that never initialize its member.

//almost trivialy default constructible
template<class T>
union atdc{
  char _c;
  T value;
  atdc ()noexcept{ }
  ~atdc(){}
};

原始存储将使用此联合类型的数组进行初始化,并且始终在此数组上执行指针运算.然后在每次 push_back 时在联合的非活动成员上构造元素.

The raw storage will be initialized with an array of this union type, and pointer arithmetic is always performed on this array. Then elements are constructed on the inactive member of the union at each push_back.

std::string s1, s2, s3;
auto p=::operator new(10*sizeof(std::string));
auto arr = new(p) atdc<std::string>[10];
//pointer arithmetic on arr is allowed

//push_back s1
new(&arr[0].value) std::string(s1); //union member activation
access_value(arr[0].value);

//push_back s2
new(&arr[1].value) std::string(s2);
access_value(arr[1].value);

//push_back s2
new(&arr[2].value) std::string(s2);
access_value(arr[2].value);

上面这段代码中是否有任何未定义的行为?

Is there any undefined behavior in this code above?

推荐答案

这是正在积极讨论的话题,我们可以在提案中看到这一点 p0593:为低级对象操作隐式创建对象.这是对这些问题的非常扎实的讨论,以及为什么不进行更改就无法修复的原因.如果您对所考虑的方法有不同的方法或强烈的看法,您可能需要联系提案作者.

This is topic that is under active discussion, we can see this in the proposal p0593: Implicit creation of objects for low-level object manipulation. This is a pretty solid discussion of the issues and why they are not fixable without changes. If you have different approaches or strong views on the approaches being considered you may want to reach out to the proposals authors.

它包括这个讨论:

2.3.数组的动态构建

2.3. Dynamic construction of arrays

考虑这个试图实现像 std::vector 这样的类型的程序(为简洁起见省略了许多细节):

Consider this program that attempts to implement a type like std::vector (with many details omitted for brevity):

....

在实践中,此代码适用于一系列现有的实现,但根据 C++ 对象模型,未定义行为发生在点 #a、#b、#c、#d 和 #e,因为它们试图对分配的存储区域执行指针运算不包含数组对象.

In practice, this code works across a range of existing implementations, but according to the C++ object model, undefined behavior occurs at points #a, #b, #c, #d, and #e, because they attempt to perform pointer arithmetic on a region of allocated storage that does not contain an array object.

在位置#b、#c 和#d 处,对char* 执行算术运算,在位置#a、#e 和#f 处,对T* 执行算术运算.理想情况下,此问题的解决方案将使两种计算都具有定义的行为.

At locations #b, #c, and #d, the arithmetic is performed on a char*, and at locations #a, #e, and #f, the arithmetic is performed on a T*. Ideally, a solution to this problem would imbue both calculations with defined behavior.

  1. 方法

上面的代码片段有一个共同的主题:它们试图使用它们从未创建过的对象.实际上,程序员假定他们不需要显式创建对象的一系列类型.我们建议识别这些类型,并仔细制定规则,通过隐式创建来消除显式创建此类对象的需要.

The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.

使用 adc 联合的方法存在一个问题,即我们希望能够通过指针 T* 访问包含的数据,即通过 std::vector::data.将联合作为 T* 访问将违反 严格的别名规则,因此是未定义的行为.

The approach using the adc union has the problem that we expect to be able to access the contained data via a pointer T* i.e. via std::vector::data. Accessing the union as a T* would violate the strict aliasing rules and therefore be undefined behavior.

相关文章