使用 C++ 的最近最少使用缓存

2021-12-22 00:00:00 algorithm data-structures lru c++

我正在尝试使用 C++ 实现 LRU 缓存.我想知道实现它们的最佳设计是什么.我知道 LRU 应该提供 find(),添加一个元素并删除一个元素.remove 应该删除 LRU 元素.什么是最好的 ADT 来实现这个例如:如果我使用一个元素作为值和时间计数器作为键的映射,我可以在 O(logn) 时间内搜索,插入是 O(n),删除是 O(logn).

I am trying to implement LRU Cache using C++ . I would like to know what is the best design for implementing them. I know LRU should provide find(), add an element and remove an element. The remove should remove the LRU element. what is the best ADTs to implement this For ex: If I use a map with element as value and time counter as key I can search in O(logn) time, Inserting is O(n), deleting is O(logn).

推荐答案

LRU 缓存的一个主要问题是很少有const"操作,大多数会改变底层表示(如果仅仅是因为它们碰撞了访问的元素).

One major issue with LRU caches is that there is little "const" operations, most will change the underlying representation (if only because they bump the element accessed).

这当然很不方便,因为这意味着它不是传统的 STL 容器,因此任何展示迭代器的想法都非常复杂:当迭代器被取消引用时,这是一个访问,它应该修改我们正在迭代的列表......哦,天哪.

This is of course very inconvenient, because it means it's not a traditional STL container, and therefore any idea of exhibiting iterators is quite complicated: when the iterator is dereferenced this is an access, which should modify the list we are iterating on... oh my.

还有性能方面的考虑,包括速度和内存消耗.

And there are the performances consideration, both in term of speed and memory consumption.

不幸的是,您需要某种方式在队列 (LRU) 中组织数据(可以从中间删除元素),这意味着您的元素必须彼此独立.当然,std::list 很适合,但它超出了您的需要.单向链表在这里就足够了,因为您不需要向后迭代列表(毕竟您只需要一个队列).

It is unfortunate, but you'll need some way to organize your data in a queue (LRU) (with the possibility to remove elements from the middle) and this means your elements will have to be independant from one another. A std::list fits, of course, but it's more than you need. A singly-linked list is sufficient here, since you don't need to iterate the list backward (you just want a queue, after all).

然而,它们的一个主要缺点是它们的参考位置较差,如果您需要更高的速度,则需要为节点提供自己的自定义(池?)分配器,以便它们尽可能靠近.这也将在一定程度上缓解堆碎片.

However one major drawback of those is their poor locality of reference, if you need more speed you'll need to provide your own custom (pool ?) allocator for the nodes, so that they are kept as close together as possible. This will also alleviate heap fragmentation somewhat.

接下来,您显然需要一个索引结构(用于缓存位).最自然的就是转向哈希图.std::tr1::unordered_mapstd::unordered_mapboost::unordered_map 通常是高质量的实现,有些应该可供您使用.他们还为哈希冲突处理分配了额外的节点,您可能更喜欢其他类型的哈希映射,请查看 维基百科的文章 关于这个主题并阅读各种实现技术的特点.

Next, you obviously need an index structure (for the cache bit). The most natural is to turn toward a hash map. std::tr1::unordered_map, std::unordered_map or boost::unordered_map are normally good quality implementation, some should be available to you. They also allocate extra nodes for hash collision handling, you might prefer other kinds of hash maps, check out Wikipedia's article on the subject and read about the characteristics of the various implementation technics.

继续,有(明显的)线程支持.如果您不需要线程支持,那很好,但是如果您需要,那就有点复杂了:

Continuing, there is the (obvious) threading support. If you don't need thread support, then it's fine, if you do however, it's a bit more complicated:

  • 正如我所说,在这样的结构上几乎没有const 操作,因此你真的不需要区分读/写访问
  • 内部锁定很好,但您可能会发现它不适合您的使用.内部锁定的问题在于它不支持事务"的概念,因为它在每次调用之间放弃锁定.如果是这种情况,请将您的对象转换为互斥锁并提供 std::unique_ptr;lock() 方法(在调试中,您可以断言在每个方法的入口点获取锁)
  • (在锁定策略中)存在重入问题,即能够从同一线程内重新锁定"互斥锁,请查看 Boost.Thread 以获取有关各种可用锁和互斥锁的更多信息
  • As I said, there is little const operation on such a structure, thus you don't really need to differentiate Read/Write accesses
  • Internal locking is fine, but you might find that it doesn't play nice with your uses. The issue with internal locking is that it doesn't support the concept of "transaction" since it relinquish the lock between each call. If this is your case, transform your object into a mutex and provide a std::unique_ptr<Lock> lock() method (in debug, you can assert than the lock is taken at the entry point of each method)
  • There is (in locking strategies) the issue of reentrance, ie the ability to "relock" the mutex from within the same thread, check Boost.Thread for more information about the various locks and mutexes available

最后是错误报告的问题.由于预计缓存可能无法检索您放入的数据,因此我会考虑使用异常不良品味".考虑指针 (Value*) 或 Boost.Optional(boost::optional).我更喜欢 Boost.Optional 因为它的语义很明确.

Finally, there is the issue of error reporting. Since it is expected that a cache may not be able to retrieve the data you put in, I would consider using an exception "poor taste". Consider either pointers (Value*) or Boost.Optional (boost::optional<Value&>). I would prefer Boost.Optional because its semantic is clear.

相关文章