C++ 中的高效链表?

这个

假设我们要删除一个节点.这是您标准的双向链表删除,除了我们使用索引而不是指针,并且您还将节点推送到空闲列表(仅涉及操作整数):

去除链接调整:

将删除的节点推送到空闲列表:

现在假设您插入到此列表中.在这种情况下,您可以弹出空闲头并覆盖该位置的节点.

插入后:

在恒定时间内插入中间也应该很容易理解.基本上,如果空闲堆栈为空,您只需插入到空闲头或 push_back 到向量中.然后你做你的标准双链表插入.空闲列表的逻辑(虽然我为别人制作了这张图,它涉及到一个 SLL,但你应该明白这个想法):

确保在插入/删除时使用新的放置和手动调用 dtor 正确构造和销毁元素.如果你真的想概括它,你还需要考虑异常安全,我们还需要一个只读的 const 迭代器.

优点和缺点

这种结构的好处是它确实允许从列表中的任何地方非常快速地插入/删除(即使对于一个巨大的列表),插入顺序被保留以进行遍历,并且它永远不会使迭代器无效t 直接删除(尽管它会使指向它们的指针无效;如果您不希望指针无效,请使用 deque).就我个人而言,我发现它比 std::list(我几乎从未使用过)更有用.

对于足够大的列表(比方说,大于整个 L3 缓存,因为您绝对应该期待一个巨大的优势),这应该大大优于 std::vector 在删除和插入到/从中间和前面.对于小的元素,从向量中移除元素可能会非常快,但是尝试从向量中移除一百万个元素,从前面开始向后面工作.那里的东西会开始爬行,而这个会在眨眼间完成.当人们开始使用其 erase 方法从跨越 10k 个元素或更多元素的向量中间删除元素时,std::vector 被 IMO 稍微夸大了,尽管我假设这仍然比人们天真地在任何地方使用链表更可取,其中每个节点都针对通用分配器单独分配,同时导致大量缓存未命中.

缺点是它只支持顺序访问,每个元素需要两个整数的开销,而且正如你在上图中看到的,如果你不断地偶尔删除一些东西,它的空间局部性会降低.

空间局部性退化

当您开始从/向中间删除和插入大量内容时,空间局部性的损失将导致锯齿形的内存访问模式,可能会从缓存行中逐出数据,只是为了在单个顺序期间返回并重新加载它环形.对于任何允许在恒定时间内从中间删除同时允许在保留插入顺序的同时回收该空间的数据结构,这通常是不可避免的.但是,您可以通过提供某种方法来恢复空间局部性,或者您可以复制/交换列表.复制构造函数可以通过遍历源列表并插入所有元素的方式复制列表,从而返回一个完美连续、缓存友好且无孔的向量(尽管这样做会使迭代器失效).

替代方案:空闲列表分配器

满足您要求的替代方法是实现符合 std::allocator 的空闲列表,并将其与 std::list 一起使用.我从不喜欢触及数据结构并使用自定义分配器,并且通过使用指针而不是 32 位索引,这会使 64 位链接的内存使用量加倍,所以我更喜欢上述解决方案个人使用 std::vector 基本上是您的类比内存分配器和索引而不是指针(如果我们使用 std::vector,它们既减少了大小,又成为一个要求,因为指针将在以下情况下无效vector 预留了一个新的容量).

索引链表

我称这种东西为索引链表",因为链表并不是真正的容器,而是一种将已存储在数组中的事物链接在一起的方式.我发现这些索引链表更有用,因为您不必深入内存池以避免每个节点的堆分配/解除分配,并且仍然可以保持合理的引用位置(如果您能负担得起,那就太好了处处处理事物以恢复空间局部性).

如果您向节点迭代器再添加一个整数以存储前一个节点索引(假设 int 和 64 位用于指针).但是,您随后就无法添加反向迭代器并使所有迭代器成为双向的.

基准

因为您似乎对它们感兴趣,所以我快速编写了上述版本:发布版本、MSVC 2012、没有检查的迭代器或类似的东西:

--------------------------------------------- test_vector_linked-----------------------------------------------插入 200000 个元素...插入"的时间:{0.000015 秒}删除一半的列表...擦除"所用时间:{0.000021 秒}迭代"所用的时间:{0.000002 秒}复制"所用时间:{0.000003 秒}结果(最多显示 10 个元素):[ 11 13 15 17 19 21 23 25 27 29 ]完成 test_vector_linked:{0.062000 秒}------------------------------------------------ 测试向量-----------------------------------------------插入 200000 个元素...插入"所用的时间:{0.000012 秒}擦除一半的向量...擦除"所用时间:{5.320000 秒}迭代"所用的时间:{0.000000 秒}复制"所用时间:{0.000000 秒}结果(最多显示 10 个元素):[ 11 13 15 17 19 21 23 25 27 29 ]完成测试向量:{5.320000 秒}

懒得使用高精度计时器,但希望这能说明为什么不应该在关键路径中使用 vector's 线性时间 erase 方法上面带有 vector 的非平凡输入大小需要大约 86 倍的时间(输入大小越大,情况越糟糕――我最初尝试了 200 万个元素,但在等待近 10 分钟后放弃了)以及为什么我认为 vector 在这种用途上被过度夸大了.也就是说,我们可以将中间的移除变成一个非常快速的恒定时间操作,而不会改变元素的顺序,不会使存储它们的索引和迭代器失效,并且仍然使用 vector... 所有我们要做的只是让它存储一个带有 prev/next 索引的链接节点,以允许跳过已删除的元素.

为了删除,我使用了偶数索引的随机混洗源向量来确定要删除的元素和顺序.这在某种程度上模拟了一个真实世界的用例,您通过以前获得的索引/迭代器从这些容器的中间删除,例如在用户按下删除按钮后使用选取框工具删除用户以前选择的元素(再次,您真的不应该使用标量 vector::erase 来使用非平凡的大小;最好建立一组索引来删除和使用 remove_if -- 仍然比一次调用一个迭代器的 vector::erase 好).

请注意,链接节点的迭代确实变慢了,这与迭代逻辑无关,因为添加链接后向量中的每个条目都更大(更多内存来顺序处理等于更多的缓存未命中和页面错误).尽管如此,如果您正在执行诸如从非常大的输入中删除元素之类的操作,那么对于大型容器而言,线性时间删除和恒定时间删除之间的性能偏差非常大,因此这往往是值得交换的.

This document says std::list is inefficient:

std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types.

Comment: that is to my surprise. std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in this quoted paragraph.

My question: Say I need a sequential container for small-sized homogeneous elements, and this container should support element insert/delete in O(1) complexity and does not need random access (though supporting random access is nice, it is not a must here). I also don't want the high constant factor introduced by heap allocation for each element's construction, at least when the number of element is small. Lastly, iterators should be invalidated only when the corresponding element is deleted. Apparently I need a custom container class, which might (or might not) be a variant of doubly linked list. How should I design this container?

If the aforementioned specification cannot be achieved, then perhaps I should have a custom memory allocator, say, bump pointer allocator? I know std::list takes an allocator as its second template argument.

Edit: I know I shouldn't be too concerned with this issue, from an engineering standpoint - fast enough is good enough. It is just a hypothetical question so I don't have a more detailed use case. Feel free to relax some of the requirements!

Edit2: I understand two algorithms of O(1) complexity can have entirely different performance due to the difference in their constant factors.

解决方案

The simplest way I see to fulfill all your requirements:

  1. Constant-time insertion/removal (hope amortized constant-time is okay for insertion).
  2. No heap allocation/deallocation per element.
  3. No iterator invalidation on removal.

... would be something like this, just making use of std::vector:

template <class T>
struct Node
{
    // Stores the memory for an instance of 'T'.
    // Use placement new to construct the object and
    // manually invoke its dtor as necessary.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type element;

    // Points to the next element or the next free
    // element if this node has been removed.
    int next;

    // Points to the previous element.
    int prev;
};

template <class T>
class NodeIterator
{
public:
    ...
private:
    std::vector<Node<T>>* nodes;
    int index;
};

template <class T>
class Nodes
{
public:
    ...
private:
    // Stores all the nodes.
    std::vector<Node> nodes;

    // Points to the first free node or -1 if the free list
    // is empty. Initially this starts out as -1.
    int free_head;
};

... and hopefully with a better name than Nodes (I'm slightly tipsy and not so good at coming up with names at the moment). I'll leave the implementation up to you but that's the general idea. When you remove an element, just do a doubly-linked list removal using the indices and push it to the free head. The iterator doesn't invalidate since it stores an index to a vector. When you insert, check if the free head is -1. If not, overwrite the node at that position and pop. Otherwise push_back to the vector.

Illustration

Diagram (nodes are stored contiguously inside std::vector, we simply use index links to allow skipping over elements in a branchless way along with constant-time removals and insertions anywhere):

Let's say we want to remove a node. This is your standard doubly-linked list removal, except we use indices instead of pointers and you also push the node to the free list (which just involves manipulating integers):

Removal adjustment of links:

Pushing removed node to free list:

Now let's say you insert to this list. In that case, you pop off the free head and overwrite the node at that position.

After insertion:

Insertion to the middle in constant-time should likewise be easy to figure out. Basically you just insert to the free head or push_back to the vector if the free stack is empty. Then you do your standard double-linked list insertion. Logic for the free list (though I made this diagram for someone else and it involves an SLL, but you should get the idea):

Make sure you properly construct and destroy the elements using placement new and manual calls to the dtor on insertion/removal. If you really want to generalize it, you'll also need to think about exception-safety and we also need a read-only const iterator.

Pros and Cons

The benefit of such a structure is that it does allow very rapid insertions/removals from anywhere in the list (even for a gigantic list), insertion order is preserved for traversal, and it never invalidates the iterators to element which aren't directly removed (though it will invalidate pointers to them; use deque if you don't want pointers to be invalidated). Personally I'd find more use for it than std::list (which I practically never use).

For large enough lists (say, larger than your entire L3 cache as a case where you should definitely expect a huge edge), this should vastly outperform std::vector for removals and insertions to/from the middle and front. Removing elements from vector can be quite fast for small ones, but try removing a million elements from a vector starting from the front and working towards the back. There things will start to crawl while this one will finish in the blink of an eye. std::vector is ever-so-slightly overhyped IMO when people start using its erase method to remove elements from the middle of a vector spanning 10k elements or more, though I suppose this is still preferable over people naively using linked lists everywhere in a way where each node is individually allocated against a general-purpose allocator while causing cache misses galore.

The downside is that it only supports sequential access, requires the overhead of two integers per element, and as you can see in the above diagram, its spatial locality degrades if you constantly remove things sporadically.

Spatial Locality Degradation

The loss of spatial locality as you start removing and inserting a lot from/to the middle will lead to zig-zagging memory access patterns, potentially evicting data from a cache line only to go back and reload it during a single sequential loop. This is generally inevitable with any data structure that allows removals from the middle in constant-time while likewise allowing that space to be reclaimed while preserving the order of insertion. However, you can restore spatial locality by offering some method or you can copy/swap the list. The copy constructor can copy the list in a way that iterates through the source list and inserts all the elements which gives you back a perfectly contiguous, cache-friendly vector with no holes (though doing this will invalidate iterators).

Alternative: Free List Allocator

An alternative that meets your requirements is implement a free list conforming to std::allocator and use it with std::list. I never liked reaching around data structures and messing around with custom allocators though and that one would double the memory use of the links on 64-bit by using pointers instead of 32-bit indices, so I'd prefer the above solution personally using std::vector as basically your analogical memory allocator and indices instead of pointers (which both reduce size and become a requirement if we use std::vector since pointers would be invalidated when vector reserves a new capacity).

Indexed Linked Lists

I call this kind of thing an "indexed linked list" as the linked list isn't really a container so much as a way of linking together things already stored in an array. And I find these indexed linked lists exponentially more useful since you don't have to get knee-deep in memory pools to avoid heap allocations/deallocations per node and can still maintain reasonable locality of reference (great LOR if you can afford to post-process things here and there to restore spatial locality).

You can also make this singly-linked if you add one more integer to the node iterator to store the previous node index (comes free of memory charge on 64-bit assuming 32-bit alignment requirements for int and 64-bit for pointers). However, you then lose the ability to add a reverse iterator and make all iterators bidirectional.

Benchmark

I whipped up a quick version of the above since you seem interested in 'em: release build, MSVC 2012, no checked iterators or anything like that:

--------------------------------------------
- test_vector_linked
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000015 secs}

Erasing half the list...
time passed for 'erasing': {0.000021 secs}
time passed for 'iterating': {0.000002 secs}
time passed for 'copying': {0.000003 secs}

Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]

finished test_vector_linked: {0.062000 secs}
--------------------------------------------
- test_vector
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000012 secs}

Erasing half the vector...
time passed for 'erasing': {5.320000 secs}
time passed for 'iterating': {0.000000 secs}   
time passed for 'copying': {0.000000 secs}

Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]

finished test_vector: {5.320000 secs}

Was too lazy to use a high-precision timer but hopefully that gives an idea of why one shouldn't use vector's linear-time erase method in critical paths for non-trivial input sizes with vector above there taking ~86 times longer (and exponentially worse the larger the input size -- I tried with 2 million elements originally but gave up after waiting almost 10 minutes) and why I think vector is ever-so-slightly-overhyped for this kind of use. That said, we can turn removal from the middle into a very fast constant-time operation without shuffling the order of the elements, without invalidating indices and iterators storing them, and while still using vector... All we have to do is simply make it store a linked node with prev/next indices to allow skipping over removed elements.

For removal I used a randomly shuffled source vector of even-numbered indices to determine what elements to remove and in what order. That somewhat mimics a real world use case where you're removing from the middle of these containers through indices/iterators you formerly obtained, like removing the elements the user formerly selected with a marquee tool after he his the delete button (and again, you really shouldn't use scalar vector::erase for this with non-trivial sizes; it'd even be better to build a set of indices to remove and use remove_if -- still better than vector::erase called for one iterator at a time).

Note that iteration does become slightly slower with the linked nodes, and that doesn't have to do with iteration logic so much as the fact that each entry in the vector is larger with the links added (more memory to sequentially process equates to more cache misses and page faults). Nevertheless, if you're doing things like removing elements from very large inputs, the performance skew is so epic for large containers between linear-time and constant-time removal that this tends to be a worthwhile exchange.

相关文章