Why is std::unordered_map slow, and can I use it more effectively to alleviate that?

2021-12-20 00:00:00 performance caching c++ c++11 unordered-map

I recently found an odd thing. Calculating Collatz sequence lengths with no caching at all appears to be over 2 times faster than using std::unordered_map to cache all elements.

Note that I did take hints from the question "Is gcc std::unordered_map implementation slow? If so - why?" and tried to use that knowledge to make std::unordered_map perform as well as I could: I used g++ 4.6, which did perform better than recent versions of g++, and I specified a sound initial bucket count, exactly equal to the maximum number of elements the map must hold.

In comparison, using std::vector to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map.
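The std::vector version is not shown in the question. As a rough sketch of what such a cache might look like, the code below memoizes only start values below a fixed limit and indexes the vector directly by value. The function name `collatzLengthVectorCache` and the constant `kCacheLimit` are hypothetical, and the limit of one million is an assumption chosen to match the benchmark loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical limit: only values below it are memoized (assumption, not from the question).
constexpr std::uint_fast64_t kCacheLimit = 1000000;

std::uint_fast16_t collatzLengthVectorCache(std::uint_fast64_t val) {
    assert(val >= 1);
    // Index = sequence value, stored 0 = "not computed yet".
    static std::vector<std::uint_fast16_t> cache(kCacheLimit, 0);

    if (val == 1) return 1;
    if (val < kCacheLimit && cache[val] != 0) return cache[val];

    const std::uint_fast64_t next = (val % 2 == 0) ? val / 2 : 3 * val + 1;
    const auto length =
        static_cast<std::uint_fast16_t>(collatzLengthVectorCache(next) + 1);

    // Values at or above the limit are recomputed each time instead of stored.
    if (val < kCacheLimit) cache[val] = length;
    return length;
}
```

A lookup here is a bounds comparison plus one array access, with no hashing and no node traversal, which is one plausible reason for the gap reported above.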

Am I doing something wrong, or is this container really that slow, and if so, why? Can it be made to perform faster? Or are hash maps inherently inefficient, so that they should be avoided whenever possible in high-performance code?

The benchmark in question is:

    #include <iostream>
    #include <unordered_map>
    #include <cstdint>
    #include <ctime>

    std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
        // Memoization table, seeded with length(1) = 1 and 2168611 initial buckets.
        static std::unordered_map<std::uint_fast64_t, std::uint_fast16_t> cache({{1, 1}}, 2168611);

        if (cache.count(val) == 0) {
            if (val % 2 == 0)
                cache[val] = getCollatzLength(val / 2) + 1;
            else
                cache[val] = getCollatzLength(3 * val + 1) + 1;
        }
        return cache[val];
    }

    int main() {
        std::clock_t tStart = std::clock();

        std::uint_fast16_t largest = 0;
        for (int i = 1; i <= 999999; ++i) {
            auto cmax = getCollatzLength(i);
            if (cmax > largest)
                largest = cmax;
        }
        std::cout << largest << '\n';

        // std::clock measures CPU time; dividing by CLOCKS_PER_SEC gives seconds.
        std::cout << "Time taken: " << (double)(std::clock() - tStart) / CLOCKS_PER_SEC << '\n';
    }

Output: Time taken: 0.761717

Whereas a benchmark with no caching at all:

    #include <iostream>
    #include <cstdint>
    #include <ctime>

    std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
        // Walk the sequence directly; the length counts the terminating 1.
        std::uint_fast16_t length = 1;
        while (val != 1) {
            if (val % 2 == 0)
                val /= 2;
            else
                val = 3 * val + 1;
            ++length;
        }
        return length;
    }

    int main() {
        std::clock_t tStart = std::clock();

        std::uint_fast16_t largest = 0;
        for (int i = 1; i <= 999999; ++i) {
            auto cmax = getCollatzLength(i);
            if (cmax > largest)
                largest = cmax;
        }
        std::cout << largest << '\n';

        // std::clock measures CPU time; dividing by CLOCKS_PER_SEC gives seconds.
        std::cout << "Time taken: " << (double)(std::clock() - tStart) / CLOCKS_PER_SEC << '\n';
    }

Output: Time taken: 0.324586
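A side note on the cached benchmark: it hashes the key up to three times per call on a miss (`count`, then `operator[]` for the assignment, then `operator[]` for the return) and twice on a hit. The sketch below is one possible way to cut that to one lookup on a hit and two on a miss, using `find` and `emplace`. The function name `collatzLengthFewerLookups` is hypothetical, and this variant has not been timed here, so whether it changes the outcome is an open question.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

std::uint_fast16_t collatzLengthFewerLookups(std::uint_fast64_t val) {
    assert(val >= 1);
    // Same seeding and initial bucket count as the benchmark in the question.
    static std::unordered_map<std::uint_fast64_t, std::uint_fast16_t>
        cache({{1, 1}}, 2168611);

    // Cache hit: a single lookup.
    const auto hit = cache.find(val);
    if (hit != cache.end()) return hit->second;

    // The recursive call may insert and trigger a rehash, which invalidates
    // iterators, so no iterator is held across it.
    const std::uint_fast64_t next = (val % 2 == 0) ? val / 2 : 3 * val + 1;
    const auto length =
        static_cast<std::uint_fast16_t>(collatzLengthFewerLookups(next) + 1);

    // Cache miss: one additional lookup to store the result.
    cache.emplace(val, length);
    return length;
}
```

This only trims redundant hashing. It does not change the node-based layout of the container.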

Recommended answer

The standard library's maps are, indeed, inherently slow (std::map especially, but std::unordered_map as well). Google's Chandler Carruth explains this in his CppCon 2014 talk. In a nutshell: std::unordered_map is cache-unfriendly because it uses linked lists as buckets, so walking a bucket means following pointers to separately allocated nodes.

This SO question mentions some efficient hash map implementations; use one of those instead.
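To illustrate the kind of layout such implementations tend to use, here is a minimal sketch of an open-addressing table with linear probing, where all entries live in one contiguous array instead of linked nodes. It is not taken from any of the libraries the linked question lists. The class `FlatCollatzCache` is hypothetical, and it assumes a fixed power-of-two capacity that never grows, keys that are never 0 (0 marks an empty slot), no deletion, and values that fit in 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity map from nonzero 64-bit keys to 16-bit values.
class FlatCollatzCache {
public:
    // The table holds 2^capacityLog2 slots and never grows (assumption).
    explicit FlatCollatzCache(unsigned capacityLog2)
        : mask_((std::size_t{1} << capacityLog2) - 1),
          slots_(std::size_t{1} << capacityLog2) {}

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::uint16_t* find(std::uint64_t key) const {
        assert(key != 0);
        // Linear probing: scan neighbouring slots until the key or a gap is found.
        for (std::size_t i = hash(key) & mask_;; i = (i + 1) & mask_) {
            if (slots_[i].key == key) return &slots_[i].value;
            if (slots_[i].key == 0) return nullptr;
        }
    }

    // Inserts a new entry or overwrites an existing one.
    // Precondition: at least one slot stays empty so that probing terminates.
    void insert(std::uint64_t key, std::uint16_t value) {
        assert(key != 0 && size_ < mask_);
        for (std::size_t i = hash(key) & mask_;; i = (i + 1) & mask_) {
            if (slots_[i].key == key) {
                slots_[i].value = value;
                return;
            }
            if (slots_[i].key == 0) {
                slots_[i].key = key;
                slots_[i].value = value;
                ++size_;
                return;
            }
        }
    }

    std::size_t size() const { return size_; }

private:
    struct Slot {
        std::uint64_t key = 0;  // 0 means "empty"
        std::uint16_t value = 0;
    };

    // Multiplicative hash that spreads consecutive keys across the table.
    static std::size_t hash(std::uint64_t key) {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ull) >> 32);
    }

    std::size_t mask_;
    std::vector<Slot> slots_;
    std::size_t size_ = 0;
};
```

For the benchmark in the question, which stores about 2168611 entries, `capacityLog2` would have to be at least 22. Production-quality implementations also handle growth, deletion, and arbitrary key types, which this sketch leaves out.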
