C++ std::unordered_map 中使用的默认哈希函数是什么?

2021-12-27 00:00:00 hash c++ c++11 stl unordered-map

我正在使用

unordered_map<string, int>

和

unordered_map<int, int>

每种情况下使用什么哈希函数，每种情况下发生碰撞的几率是多少?我将分别在每种情况下插入唯一字符串和唯一 int 作为键.

What hash function is used in each case and what is chance of collision in each case? I will be inserting unique string and unique int as keys in each case respectively.

我有兴趣了解字符串和 int 键的哈希函数算法及其碰撞统计.

I am interested in knowing the algorithm of hash function in case of string and int keys and their collision stats.

推荐答案

函数对象 std::使用哈希<>.

所有内置类型和一些其他标准库类型都存在标准特化例如 std::string 和 std::thread.查看完整列表的链接.

Standard specializations exist for all built-in types, and some other standard library types such as std::string and std::thread. See the link for the full list.

对于要在 std::unordered_map 中使用的其他类型，您必须专门化 std::hash<> 或创建您自己的函数对象.

For other types to be used in a std::unordered_map, you will have to specialize std::hash<> or create your own function object.

冲突的可能性完全取决于实现，但考虑到整数限制在定义的范围内这一事实，而字符串理论上是无限长的，我认为与字符串发生冲突的可能性要大得多.

The chance of collision is completely implementation-dependent, but considering the fact that integers are limited between a defined range, while strings are theoretically infinitely long, I'd say there is a much better chance for collision with strings.

至于在 GCC 中的实现，内置类型的特化只返回位模式.以下是它们在 bits/functional_hash.h 中的定义:

As for the implementation in GCC, the specialization for builtin-types just returns the bit pattern. Here's how they are defined in bits/functional_hash.h:

/// Partial specializations for pointer types. template<typename _Tp> struct hash<_Tp*> : public __hash_base<size_t, _Tp*> { size_t operator()(_Tp* __p) const noexcept { return reinterpret_cast<size_t>(__p); } }; // Explicit specializations for integer types. #define _Cxx_hashtable_define_trivial_hash(_Tp) template<> struct hash<_Tp> : public __hash_base<size_t, _Tp> { size_t operator()(_Tp __val) const noexcept { return static_cast<size_t>(__val); } }; /// Explicit specialization for bool. _Cxx_hashtable_define_trivial_hash(bool) /// Explicit specialization for char. _Cxx_hashtable_define_trivial_hash(char) /// ...

<小时>
std::string 的特化定义为:

#ifndef _GLIBCXX_COMPATIBILITY_CXX0X /// std::hash specialization for string. template<> struct hash<string> : public __hash_base<size_t, string> { size_t operator()(const string& __s) const noexcept { return std::_Hash_impl::hash(__s.data(), __s.length()); } };

一些进一步的搜索导致我们:

Some further search leads us to:

struct _Hash_impl { static size_t hash(const void* __ptr, size_t __clength, size_t __seed = static_cast<size_t>(0xc70f6907UL)) { return _Hash_bytes(__ptr, __clength, __seed); } ... }; ... // Hash function implementation for the nontrivial specialization. // All of them are based on a primitive that hashes a pointer to a // byte array. The actual hash algorithm is not guaranteed to stay // the same from release to release -- it may be updated or tuned to // improve hash quality or speed. size_t _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

_Hash_bytes 是 libstdc++ 的外部函数.更多的搜索让我找到这个文件，其中说明:

_Hash_bytes is an external function from libstdc++. A bit more searching led me to this file, which states:

// This file defines Hash_bytes, a primitive used for defining hash // functions. Based on public domain MurmurHashUnaligned2, by Austin // Appleby. http://murmurhash.googlepages.com/

所以 GCC 对字符串使用的默认散列算法是 MurmurHashUnaligned2.

So the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.

相关文章