你什么时候使用 std::unordered_map::emplace_hint ?

2022-01-07 00:00:00 c++ c++11 stl unordered-map

我知道如何使用 std::unordered_map::emplace,但我如何使用 emplace_hint?cplusplus 和 cppreference 提供了一组示例来说明我们如何知道将元素放在哪里.

谁能提供一些关于此的信息或提供一些示例/插图,说明我们何时可能知道放置的元素应该去哪里?

解决方案

unordered_map 可以用提示做什么?好吧,如果迭代器使用与 emplace_hint 被要求插入的元素相同的键来寻址一个元素,那么它可能会很快失败――只是一个键比较,没有任何散列或摸索任何散列列表- 在那个桶上碰撞元素.但是,如果密钥不匹配,则该提示在其他情况下是无用的,因为任何其他密钥――无论如何关闭"都无法使用.价值 - 应该(概率上)在一个完全不相关的桶(考虑到通常被认为是好的"哈希函数),所以时间会浪费在关键比较上,只是不得不重新开始,好像它是一个正常的<代码>就位.

当您插入按键预先排序的元素时,这可能很有用,旨在删除过程中的大量重复项,但键是如此之大,因此将迭代器保留到刚刚插入的元素上比将迭代器更容易密钥的副本,或者哈希函数可能特别慢.

unordered_map::emplace_hint 的另一个好处是与 map::emplace_hint 更好的 API 兼容性,因此代码可以切换容器类型并具有 emplace_hint 不会破坏编译,尽管它们最终可能比将代码切换到 emplace() 作为帮助 map<的关闭但不同的键提示要慢/code> 对于 unordered_map 可能没用.


只需使用 GCC 10.2 g++ -E 输出,看看它是否执行上述操作.emplace_hint 向下调用 _M_insert_multi_node(...) 其中有这一行:

__node_base* __prev = __builtin_expect(__hint != nullptr, false)&&this->_M_equals(__k, __code, __hint)?__暗示: _M_find_before_node(__bkt, __k, __code);

上面,__k是可以插入的key,__code是哈希码,__hint是提示迭代器/指针;_M_equals(...) 返回:

return _Equal_hash_code<__node_type>::_S_equals(__c, *__n) &&_M_eq()(__k, this->_M_extract()(__n->_M_v()));

因此,在使用之前,它会检查散列码是否相等(如果您已经计算了散列,则进行快速而肮脏的检查)和键是否相等(可能较慢的操作,例如长字符串的质量散列)提示迭代器.这是它使用提示的唯一情况.想象一下,存储桶在逻辑上有一些碰撞元素与键 K1、K2、K3、K4 链接在一起,并且您的提示迭代器指向 K4,但您正尝试使用 K2 插入一个副本:因为迭代器仅向前em>,您必须使用 _M_find_before_node(...) 在链中比您的提示指向的更早到达碰撞元素.在 _M_find_before_node(...) 之后,您可以从 K1 向前扫描以查看要插入的键 - K2 - 是否已经存在于在桶中发生碰撞的元素中.

(当已知键比较便宜时,可以通过跳过散列比较来改进实现,但是使用类型特征正确地获得该条件会有点痛苦 - 你怎么知道哪些键相等函数是便宜的? 对于小型、标准布局、可简单复制的类型或类似类型,至少当使用默认的 std::equals<> 比较实例化无序容器时,可以假设如此.....>

I know how to use std::unordered_map::emplace, but how do I use emplace_hint? Neither cplusplus nor cppreference provide a set of examples that illustrate how we might know where to put the element.

Can anyone provide some information on this or give some examples / illustrations of when we might know where the emplaced element should go?

解决方案

What could an unordered_map potentially do with the hint? Well, if the iterator addresses an element with the same key as the element that emplace_hint has been asked to insert, then it can fail quickly - just a key comparison without any hashing or groping through any list of hash-colliding elements at that bucket. But if the key doesn't match, then the hint is otherwise useless because any other key - no matter how "close" in value - should (probabilistically) be at a completely unrelated bucket (given what's normally considered a "good" hash function), so time would have been wasted on a key comparison only to have to start over as if it were a normal emplace.

This might be useful when you're inserting pre-sorted-by-key elements, aiming to remove lots of duplicates in the process, but the key is so huge it's easier to keep an iterator to the just-inserted element than a copy of the key, or perhaps the hash function is particularly slow.

Another benefit of unordered_map::emplace_hint is better API compatibility with map::emplace_hint, so code can switch the container type and have the emplace_hints not break the compile, though they might end up slower than if the code were switched to emplace() as the close-but-different-key hints that help with a map may be useless with an unordered_map.


Just taking GCC 10.2 g++ -E output to see if it does what's described above. emplace_hint calls down into _M_insert_multi_node(...) wherein there's this line:

__node_base* __prev = __builtin_expect(__hint != nullptr, false)
                      && this->_M_equals(__k, __code, __hint)
                      ? __hint
                      : _M_find_before_node(__bkt, __k, __code);

Above, __k is the key that may be inserted, __code is the hash code, __hint is the hint iterator/pointer; _M_equals(...) returns:

return _Equal_hash_code<__node_type>::_S_equals(__c, *__n) &&
       _M_eq()(__k, this->_M_extract()(__n->_M_v()));

So, it's checking that the hash codes are equal (a quick and dirty check if you've already calculated the hashes) and the keys are equal (a potentially slower operation for e.g. a quality hash of long strings) before using the hint iterator. That's the only case in which it uses the hint. Imagine the bucket logically has some colliding elements chained off it with keys K1, K2, K3, K4 and your hint iterator is to K4 but you're trying to insert a duplicate with K2: as the iterators are forward only, you have to use _M_find_before_node(...) to reach colliding elements earlier in the chain than your hint points to. After the _M_find_before_node(...) you can scan from K1 forwards to see if the key to insert - K2 - is already present in the elements that collided at the bucket.

(The implementation could be improved by skipping the hash comparison when the key comparison is known to be cheap, but getting that condition right with type traits would be a bit of a pain - how do you know which key equality functions are cheap? Could assume so for small, standard layout, trivially copyable types or similar, at least when the unordered container is instantiated with the default std::equals<> comparison....).

相关文章