究竟什么数据结构是 C++ 中的双端队列?

2021-12-22 00:00:00 data-structures c++

是否存在 C++ STL 中的双端队列应该实现的特定数据结构,或者双端队列只是这种模糊的数组概念,即可以从正面和背面增长的数组,无论实现选择如何实现?

Is there a specific data structure that a deque in the C++ STL is supposed to implement, or is a deque just this vague notion of an array growable from both the front and the back, to be implemented however the implementation chooses?

我过去一直认为双端队列是一个循环缓冲区,但我最近正在阅读C++ 参考 here,听起来 deque 是某种数组数组.它似乎不是一个普通的旧循环缓冲区.那么它是 gap buffer 还是 可增长数组的其他一些变体,还是仅仅依赖于实现?

I used to always assume a deque was a circular buffer, but I was recently reading a C++ reference here, and it sounds like a deque is some kind of array of arrays. It doesn't seem like it's a plain old circular buffer. Is it a gap buffer, then, or some other variant of growable array, or is it just implementation-dependent?

答案的更新和总结:

似乎普遍的共识是双端队列是这样一种数据结构:

It seems the general consensus is that a deque is a data structure such that:

  • 插入或删除元素的时间在列表的开头或结尾应该是恒定的,而在其他地方最多是线性的.如果我们将其解释为真正的恒定时间而不是摊销的恒定时间,正如有人评论的那样,这似乎具有挑战性.有些人认为我们不应将其解释为非摊销不变时间.
  • 双端队列要求任何插入都应保持对成员元素的任何引用有效.迭代器无效是可以的,但成员本身必须保留在内存中的同一位置."正如有人评论的那样:这很容易,只需将成员复制到堆上的某个位置并将 T* 存储在引擎盖下的数据结构中即可.
  • 在双端队列的开头或结尾插入单个元素总是需要恒定的时间并导致对 T 的构造函数的单次调用."如果数据结构在幕后存储 T*,也将实现 T 的单个构造函数.
  • 数据结构必须具有随机访问权限.

如果我们将第一个条件设为非摊销常数时间",似乎没有人知道如何获得第一个和第四个条件的组合.链表实现 1) 但不是 4),而典型的循环缓冲区实现 4) 但不是 1).我想我有一个满足以下两个要求的实现.评论?

It seems no one knows how to get a combination of the 1st and 4th conditions if we take the first condition to be "non-amortized constant time". A linked list achieves 1) but not 4), whereas a typical circular buffer achieves 4) but not 1). I think I have an implementation that fulfills both below. Comments?

我们从别人建议的一个实现开始:我们分配一个数组并从中间开始放置元素,前后都留有空间.在这个实现中,我们跟踪前后两个方向从中心有多少个元素,称这些值 F 和 B.然后,让我们用一个辅助数组来扩充这个数据结构,它的大小是原始数组的两倍数组(所以现在我们浪费了大量空间,但渐近复杂性没有变化).我们还将从它的中间填充这个辅助数组,并给它类似的值 F' 和 B'.策略是这样的:每次我们在给定方向上向主数组添加一个元素时,如果 F > F' 或 B > B'(取决于方向),则最多复制两个值从主阵列到辅助阵列,直到 F' 赶上 F(或 B' 赶上 B).因此,插入操作涉及将 1 个元素放入主数组并将最多 2 个元素从主数组复制到辅助数组,但它仍然是 O(1).当主阵列变满时,我们释放主阵列,使辅助阵列成为主阵列,并创建另一个 2 倍大的辅助阵列.这个新的辅助数组以 F' = B' = 0 开始,并且没有复制任何内容(因此,如果堆分配的复杂度为 O(1),则调整大小操作为 O(1)).由于辅助节点为添加到主节点的每个元素复制 2 个元素,并且主节点最多半满开始,因此辅助节点不可能在主节点再次耗尽空间时赶上主节点.删除同样只需要从主要元素中删除 1 个元素,从辅助元素中删除 0 或 1.因此,假设堆分配为 O(1),则此实现满足条件 1).我们使数组为 T* 并在每次插入时使用 new 来满足条件 2) 和 3).最后,4) 得到满足,因为我们使用的是数组结构,可以轻松实现 O(1) 访问.

We start with an implementation someone else suggested: we allocate an array and start placing elements from the middle, leaving space in both the front and back. In this implementation, we keep track of how many elements there are from the center in both the front and back directions, call those values F and B. Then, let's augment this data structure with an auxiliary array that is twice the size of the original array (so now we're wasting a ton of space, but no change in asymptotic complexity). We will also fill this auxiliary array from its middle and give it similar values F' and B'. The strategy is this: every time we add one element to the primary array in a given direction, if F > F' or B > B' (depending on the direction), up to two values are copied from the primary array to the auxiliary array until F' catches up with F (or B' with B). So an insert operation involves putting 1 element into the primary array and copying up to 2 from the primary to the auxiliary, but it's still O(1). When the primary array becomes full, we free the primary array, make the auxiliary array the primary array, and make another auxiliary array that's yet 2 times bigger. This new auxiliary array starts out with F' = B' = 0 and having nothing copied to it (so the resize op is O(1) if a heap allocation is O(1) complexity). Since the auxiliary copies 2 elements for every element added to the primary and the primary starts out at most half-full, it is impossible for the auxiliary to not have caught up with the primary by the time the primary runs out of space again. Deletions likewise just need to remove 1 element from the primary and either 0 or 1 from the auxiliary. So, assuming heap allocations are O(1), this implementation fulfills condition 1). We make the array be of T* and use new whenever inserting to fulfill conditions 2) and 3). Finally, 4) is fulfilled because we are using an array structure and can easily implement O(1) access.

推荐答案

(将此答案设为社区维基.请坚持下去.)

(Making this answer a community-wiki. Please get stuck in.)

第一件事:deque 要求在前面或后面的任何插入都应保持对成员元素的任何引用有效.迭代器失效是可以的,但成员本身必须留在内存中的同一位置.这很容易,只需将成员复制到堆上的某个位置并将 T* 存储在引擎盖下的数据结构中即可.请参阅其他 StackOverflow 问题关于 deque 的额外间接访问"

First things first: A deque requires that any insertion to the front or back shall keep any reference to a member element valid. It's OK for iterators to be invalidated, but the members themselves must stay in the same place in memory. This is easy enough by just copying the members to somewhere on the heap and storing T* in the data structure under the hood. See this other StackOverflow question " About deque<T>'s extra indirection "

(vector 不保证保留迭代器或引用,而 list 保留两者.

(vector doesn't guarantee to preserve either iterators or references, whereas list preserves both).

所以让我们把这个间接"视为理所当然,然后看看剩下的问题.有趣的一点是从列表的开头或结尾插入或删除的时间.乍一看,deque 似乎可以用 vector 轻松实现,也许可以将其解释为 循环缓冲区.

So let's just take this 'indirection' for granted and look at the rest of the problem. The interesting bit is the time to insert or remove from the beginning or end of the list. At first, it looks like a deque could trivially be implemented with a vector, perhaps by interpreting it as a circular buffer.

但是双端队列必须满足在一个元素的开头或结尾插入一个元素deque 总是花费常数时间并导致对 T 的构造函数的单次调用."

多亏了我们已经提到的间接性,很容易确保只有一个构造函数调用,但挑战是保证时间恒定.如果我们可以使用恒定的摊销时间会很容易,这将允许简单的vector实现,但它必须是恒定的(非摊销)时间.

Thanks to the indirection we've already mentioned, it's easy to ensure there is just one constructor call, but the challenge is to guarantee constant time. It would be easy if we could just use constant amortized time, which would allow the simple vector implementation, but it must be constant (non-amortized) time.

相关文章