Bjarne Stroustrup 说我们必须避免链表

2021-12-21 00:00:00 linked-list vector c++

我在 YouTube 上看到了这个视频:https://www.youtube.com/watch?v=YQs6IC-vgmo其中 Bjarne 说最好使用向量而不是链表.我无法理解整个事情,所以有人可以用外行的话解释他在说什么吗?

I saw this video on YouTube: https://www.youtube.com/watch?v=YQs6IC-vgmo in which Bjarne says it is better to use vectors, rather than linked lists. I am unable to grasp the entire thing, so could anyone explain what he is saying in layman's terms?

P.S:我是一名高中生,可以轻松处理链表,但我很难自己学习向量.你能推荐任何学习向量的来源吗?

P.S: I am an high school student and can easily handle linked lists, but I am struggling to learn vectors on my own. Could you suggest any sources to learn vectors?

推荐答案

向量 vs. 链表的优势

向量与链表的主要优点是内存局部性.

Advantages of vector vs. linked list

The main advantage of vector versus linked lists is memory locality.

通常,每个元素在链表中单独分配.因此,这些元素在内存中可能并不相邻.(内存中元素之间的间隙.)

Usually, each element is allocated seperately in a linked list. As a consequence those elements probably are not next to each other in memory. (Gaps between the elements in memory.)

一个向量保证连续存储所有包含的元素.(项目彼此相邻,没有间隙;)

A vector is guaranteed to store all contained elements contiguously. (Items next to each other, no gaps;)

注意:可能会出现过度简化...;)

Imo,关于连续存储的数据存储模式与非连续存储的卓越性能的简化关键点是

Imo, the simplified key points about the superior performance of a contiguously stored data storage pattern versus non-contiguous storage are

现代 CPU 不会从内存中获取单个字节,而是从稍大的块中获取.因此,如果您的数据对象大小小于这些块的大小,并且如果存储是连续的,则一次可以获取多个元素,因为多个元素可能位于单个块中.

Modern CPUs do not fetch single bytes from memory but slightly larger chunks. Thus, if your data objects size is less than the size of those chunks and if storage is contiguous, you can get more than one element at a time because multiple elements may be in a single chunk.

一个 64 字节的块(通常的缓存行大小)一次可容纳 16 个 32 位整数.因此,缓存未命中(数据尚未在缓存中 -> 需要从主内存加载)最早发生在从第一个元素被带入缓存的那一刻起处理 16 个元素之后.如果使用链表,第一个元素很可能是 64 字节块中的唯一元素.理论上,列表中的每个元素都可能发生缓存未命中.

A 64byte block (usual cache line size) fits sixteen 32bit integers at a time. Therefore, a cache miss (data not in cache yet -> load from main memory required) occurs at the earliest after processing 16 elements from the moment first one has been brought to cache. If a linked list is used, the first element might well be the only one within a 64byte block. It might in theory happen that a cache miss occurs for every single element of the list.

std::vector<std::uint32_t> v;
// somehow fill 64 values into v
std::uint32_t s{};
for(std::size_t i{0}; i<v.size(); ++i)
{
  s += v[i];
}

想象一下 v 的内容还没有被缓存.

Imagine the contents of v not being cached yet.

在for循环中处理数据的过程中会发生什么?

1)检查元素v[0]是否在缓存中.--> 没有

1)Check whether element v[0] is in cache. --> Nope

2) 从主存中取从 v[0] 的地址开始的 64 个字节到缓存行中

2)Fetch 64 bytes starting at the address of v[0] from main memory into a cache line

3) 从缓存中加载 v[0] 并通过将其值添加到 s 进行处理

3)Load v[0] from cache and process by adding its value to s

4)是元素 v1 在缓存中?--> 是加载了先前的提取,因为相邻的 v[0]

4)Is element v1 in cache? --> Yes loaded with previous fetch because neighbouring v[0]

5)加载 v1 从缓存和进程中通过将其值添加到 s

5)Load v1 from cache and process by adding its value to s

6) 元素 v[2] 是否在缓存中?--> 是的...

6)Is element v[2] in cache? --> Yes ...

7) 从缓存中加载 v[2] 并通过将其值添加到 s 进行处理

7) Load v[2] from cache and process by adding its value to s

...等等...

34) 元素 v[16] 是否在缓存中?--> 没有

34)Is element v[16] in cache? --> Nope

35) 从主存中取从 v[16] 的地址开始的 64 个字节到缓存行中

35) Fetch 64 bytes starting at the address of v[16] from main memory into a cache line

36) 从缓存中加载 v[16] 并通过将其值添加到 s 进行处理

36)Load v[16] from cache and process by adding its value to s

37) 元素 v[17] 是否在缓存中?--> 是加载了先前的提取,因为相邻的 v[16]

37)Is element v[17] in cache? --> Yes loaded with previous fetch because neighbouring v[16]

等等...

将数据从主内存读取到缓存中比将数据从缓存加载到处理器寄存器并执行简单操作花费的时间要多得多.因此,多个值可能驻留在单个缓存行上的事实可以显着提高性能.

Fetching data from main memory into the cache takes much more time than loading data from cache into processor registers and perform simple operations. Therefore, the fact that multiple values may reside on a single cache line can boost performance significantly.

链表不提供连续存储保证,您不能指望获得这种性能提升.这也是为什么连续容器的随机迭代(随机访问元素)比前向迭代(按顺序访问元素)表现更差的原因.

Linked lists do not provide a contiguous storage guarantee and you cannot hope to get this performance boost. This is also the reason why random iteration (accessing elements randomly) performs worse than forward iteration (accessing elements in order) for contiguous containers.

上述效果被称为预取器"的 CPU 功能放大.

The above effect is amplified by a CPU feature called "prefetcher".

如果一个块已经从主内存加载,预取器准备加载下一个块/已经将它放入缓存,显着减少从主内存的那部分加载内容的惩罚.

If a chunk has been loaded from main memory, the prefetcher prepares loading the next chunk / already puts it into cache, significantly reducing the penality of loading stuff from that part of the main memory.

当且仅当您确实需要来自下一个准备好的块的数据时,这当然是有效的.

This is of course effective if and only if you in fact need data from the next prepared chunk.

参见:c++向量,每当它在堆栈上扩展/重新分配时会发生什么?

相关文章