使用向量类实现堆栈的链表与动态数组

2022-01-01 00:00:00 linked-list arrays stack dynamic-arrays c++

我正在阅读实现堆栈的两种不同方式:链表和动态数组.链表相对于动态数组的主要优点是链表不必调整大小,而如果插入的元素过多,则必须调整动态数组的大小,从而浪费大量时间和内存.

I was reading up on the two different ways of implementing a stack: linked list and dynamic arrays. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized while a dynamic array had to be resized if too many elements were inserted hence wasting alot of time and memory.

这让我想知道这是否适用于 C++(因为有一个向量类,它会在插入新元素时自动调整大小)?

That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?

推荐答案

这两者很难比较,因为它们的内存使用模式大不相同.

It's difficult to compare the two, because the patterns of their memory usage are quite different.

矢量调整大小

矢量会根据需要动态调整自身大小.它通过分配新的内存块,将数据从旧块移动(或复制)到新块,释放旧块来实现.在典型的情况下,新块的大小是旧块的 1.5 倍(与流行的看法相反,2 倍在实践中似乎很不寻常).这意味着在重新分配的短时间内,它需要的内存大约是您实际存储的数据的 2.5 倍.其余时间,正在使用的块"最少为 2/3rds 满,最多为完全满.如果所有尺寸的可能性均等,我们可以预期它平均大约 5/6ths 已满.从另一个方向看,我们可以预期在任何给定时间大约有 1/6th 或大约 17% 的空间被浪费".

A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, the releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17% of the space to be "wasted" at any given time.

当我们按照这样的常数因子调整大小时(而不是,例如,总是添加特定大小的块,例如以 4Kb 的增量增长),我们得到了所谓的摊销常数时间添加.换句话说,随着数组的增长,调整大小的频率呈指数下降.数组中项目被复制的平均次数趋于恒定(通常约为 3,但取决于您使用的增长因子).

When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).

链表分配

使用链表,情况就大不一样了.我们从未看到调整大小,因此我们看不到某些插入的额外时间或内存使用.同时,我们确实看到额外的时间和内存基本上所有使用.特别是,链表中的每个节点都需要包含一个指向下一个节点的指针.根据节点中数据的大小与指针的大小相比,这可能会导致显着的开销.例如,假设您需要一堆 int .在 int 与指针大小相同的典型情况下,这意味着 50% 的开销――一直都是.指针大比int越来越常见;两倍的大小相当常见(64 位指针,32 位整数).在这种情况下,您有大约 67% 的开销――也就是说,很明显,每个节点为指针分配的空间是所存储数据的两倍.

Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.

不幸的是,这通常只是冰山一角.在典型的链表中,每个节点都是单独动态分配的.至少,如果您正在存储小数据项(例如 int),则为节点分配的内存可能(通常会)甚至大于您实际请求的数量.所以――你要求 12 字节的内存来保存一个 int 和一个指针――但是你得到的内存块可能会被四舍五入到 16 或 32 字节.现在您看到的开销至少为 75%,很可能约为 88%.

Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.

就速度而言,情况相当相似:动态分配和释放内存通常很慢.堆管理器通常具有空闲内存块,并且必须花时间搜索它们以找到最适合您要求的大小的块.然后它(通常)必须将该块分成两部分,一个用于满足您的分配,另一个可用于满足其他分配.同样,当您释放内存时,它通常会返回到相同的空闲块列表,并检查是否有相邻的内存块已经空闲,以便将两者重新连接在一起.

As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.

分配和管理大量内存块的成本很高.

Allocating and managing lots of blocks of memory is expensive.

缓存使用

最后,对于最近的处理器,我们遇到了另一个重要因素:缓存使用.在向量的情况下,我们拥有彼此相邻的所有数据.然后,在使用的向量部分结束后,我们有一些空内存.这导致了出色的缓存使用――我们使用的数据被缓存;我们没有使用的数据对缓存几乎没有影响.

Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.

对于链表,指针(以及每个节点中可能的开销)分布在整个链表中.即,我们关心的每条数据旁边都有指针的开销,以及分配给我们未使用的节点的空白空间.简而言之,缓存的有效大小减少了与列表中每个节点的总体开销大致相同的因素――即,我们可能很容易看到只有 1/8th 缓存存储我们关心的日期,7/8ths 专门用于存储指针和/或纯垃圾.

With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the date we care about, and 7/8ths devoted to storing pointers and/or pure garbage.

总结

当您的节点数量相对较少时,链表可以很好地工作,每个节点都非常大.如果(对于堆栈来说更典型)您正在处理相对大量的项目,每个项目都非常小,那么您看到节省的时间或内存使用情况.恰恰相反,对于这种情况,链表更有可能基本上浪费大量时间和内存.

A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.

相关文章