所有 STL 容器的通用哈希函数

2021-12-27 00:00:00 hash c++ c++11 map stl

我在我的实现中使用了 std::unordered_map.我将使用任何 STL 容器作为密钥.我想知道是否可以为正在使用的任何容器创建通用哈希函数.

I'm using an std::unordered_map<key,value> in my implementation. i will be using any of the STL containers as the key. I was wondering if it is possible to create a generic hash function for any container being used.

SO 中的这个问题为所有 STL 容器提供通用打印功能.虽然你可以拥有它,但为什么你不能拥有定义一切的哈希函数之类的东西?是的,一个大问题还在于它需要快速高效.

This question in SO offers generic print function for all STL containers. While you can have that, why cant you have something like a Hash function that defines everything ? And yeah, a big concern is also that it needs to fast and efficient.

我正在考虑做一个简单的哈希函数,将键的值转换为 size_t 并做一个简单的函数,如 这个.

I was considering doing a simple hash function that converts the values of the key to a size_t and do a simple function like this.

这能做到吗?

PS:请不要使用 boost 库.谢谢.

PS : Please don't use boost libraries. Thanks.

推荐答案

我们可以通过模仿 Boost 和组合哈希来得到答案.

We can get an answer by mimicking Boost and combining hashes.

警告: 组合散列,即从事物的许多散列计算许多事物的散列,通常不是一个好主意,因为产生的散列函数在统计意义上并不好".许多事物的正确散列应该从所有成分的整个原始数据中构建,而不是从中间散列中构建.但目前还没有一个很好的标准方法来做到这一点.

Warning: Combining hashes, i.e. computing a hash of many things from many hashes of the things, is not a good idea generally, since the resulting hash function is not "good" in the statistical sense. A proper hash of many things should be build from the entire raw data of all the constituents, not from intermediate hashes. But there currently isn't a good standard way of doing this.

无论如何:

首先,我们需要 hash_combine 函数.由于我无法理解的原因,它没有包含在标准库中,但它是其他一切的核心:

First off, we need the hash_combine function. For reasons beyond my understanding it's not been included in the standard library, but it's the centrepiece for everything else:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

使用它,我们可以散列由可散列元素组成的所有内容,特别是对和元组(读者练习).

Using this, we can hash everything that's made up from hashable elements, in particular pairs and tuples (exercise for the reader).

但是,我们也可以使用它通过散列元素来散列容器.这正是 Boost 的范围散列"所做的,但使用 combine 函数自己制作它是直接的.

However, we can also use this to hash containers by hashing their elements. This is precisely what Boost's "range hash" does, but it's straight-forward to make that yourself by using the combine function.

完成范围哈希器的编写后,只需专门化 std::hash 即可:

Once you're done writing your range hasher, just specialize std::hash and you're good to go:

namespace std
{
  template <typename T, class Comp, class Alloc>
  struct hash<std::set<T, Comp, Alloc>>
  {
    inline std::size_t operator()(const std::set<T, Comp, Alloc> & s) const
    {
      return my_range_hash(s.begin(), s.end());
    }
  };

  /* ... ditto for other containers */
}

如果你想模仿漂亮的打印机,你甚至可以做一些更极端的事情,并为所有容器专门化 std::hash,但我可能会更加小心并做出明确的容器的哈希对象:

If you want to mimic the pretty printer, you could even do something more extreme and specialize std::hash for all containers, but I'd probably be more careful with that and make an explicit hash object for containers:

template <typename C> struct ContainerHasher
{
  typedef typename C::value_type value_type;
  inline size_t operator()(const C & c) const
  {
    size_t seed = 0;
    for (typename C::const_iterator it = c.begin(), end = c.end(); it != end; ++it)
    {
      hash_combine<value_type>(seed, *it);
    }
    return seed;
  }
};

用法:

std::unordered_map<std::set<int>, std::string, ContainerHasher<std::set<int>>> x;

相关文章