什么容器来存储唯一值?

2022-01-24 00:00:00 containers memory-management c++

我遇到了以下问题.我有一个平均每秒运行 60 帧的游戏.每一帧我都需要将值存储在一个容器中,并且不能有重复项.

I've got the following problem. I have a game which runs on average 60 frames per second. Each frame I need to store values in a container and there must be no duplicates.

它可能每帧必须存储少于 100 个项目,但插入调用的数量会更多(并且由于它必须是唯一的而许多被拒绝).只有在帧的末尾我才需要遍历容器.因此,每帧大约有 60 次容器迭代,但插入次数更多.

It probably has to store less than 100 items per frame, but the number of insert-calls will be alot more (and many rejected due to it has to be unique). Only at the end of the frame do I need to traverse the container. So about 60 iterations of the container per frame, but alot more insertions.

记住要存储的项目是简单的整数.

Keep in mind the items to store are simple integer.

我可以使用很多容器,但我无法决定选择什么.性能是其中的关键问题.

There are a bunch of containers I can use for this but I cannot make up my mind what to pick. Performance is the key issue for this.

我收集的一些优点/缺点:

Some pros/cons that I've gathered:

矢量

  • (PRO):连续内存,一个巨大的因素.
  • (PRO):可以先保留内存,之后很少分配/释放
  • (CON):除了遍历容器(std::find)每个 insert() 来查找唯一键之外,别无选择?比较很简单(整数),整个容器可能适合缓存

设置

  • (PRO):简单,明确的目的
  • (CON):插入时间不固定
  • (CON):每帧有很多分配/解除分配
  • (CON):非连续内存.遍历一组数百个对象意味着在内存中跳来跳去.

unordered_set

  • (PRO):简单,明确的目的
  • (PRO):平均大小写常数时间插入
  • (CON):当我存储整数时,哈希操作可能比其他任何操作都贵很多
  • (CON):每帧有很多分配/解除分配
  • (CON):非连续内存.遍历一组数百个对象意味着在内存中跳来跳去.

由于内存访问模式,我倾向于使用向量路由,即使 set 显然是针对这个问题的.我不清楚的一个大问题是遍历每个插入的向量是否比分配/解除分配(特别是考虑到必须执行的频率)和 set 的内存查找成本更高.

I'm leaning on going the vector-route because of memory access patterns, even though set is clearly meant for this issue. The big issue that is unclear to me is whether traversing the vector for each insert is more costly than the allocations/deallocations (especially considering how often this must be done) and the memory lookups of set.

我知道归根结底这一切都归结为对每个案例进行分析,但如果只是作为先发制人或只是理论上,那么在这种情况下什么可能是最好的?是否还有我可能遗漏的优点/缺点?

I know ultimately it all comes down to profiling each case, but if nothing else than as a headstart or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might've missed aswell?

正如我没有提到的,容器在每一帧结束时被清除()

As I didnt mention, the container is cleared() at the end of each frame

推荐答案

这里我要顶一下块了,建议向量路由在大小为 100 且存储的对象为整数值.这样做的简单原因是 set 和 unordered_set 为每个插入分配内存,而向量不需要超过一次.

I'm going to put my neck on the block here and suggest that the vector route is probably most efficient when the size is 100 and the objects being stored are integral values. The simple reason for this is that set and unordered_set allocate memory for each insert whereas the vector needn't more than once.

您可以通过保持向量有序来显着提高搜索性能,因为这样所有搜索都可以是二分搜索,因此在 log2N 时间内完成.

You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in log2N time.

不利的一面是,由于内存移动,插入将花费一小部分时间,但听起来好像搜索比插入多得多,并且移动(平均)50 个连续的内存字几乎是瞬时操作.

The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (average) 50 contiguous memory words is an almost instantaneous operation.

最后一句话:现在写出正确的逻辑.当用户抱怨时担心性能.

Final word: Write the correct logic now. Worry about performance when the users are complaining.

因为我忍不住,这里有一个相当完整的实现:

Because I couldn't help myself, here's a reasonably complete implementation:

template<typename T>
struct vector_set
{
    using vec_type = std::vector<T>;
    using const_iterator = typename vec_type::const_iterator;
    using iterator = typename vec_type::iterator;

    vector_set(size_t max_size)
    : _max_size { max_size }
    {
        _v.reserve(_max_size);
    }

    /// @returns: pair of iterator, bool
    /// If the value has been inserted, the bool will be true
    /// the iterator will point to the value, or end if it wasn't
    /// inserted due to space exhaustion
    auto insert(const T& elem)
    -> std::pair<iterator, bool>
    {
        if (_v.size() < _max_size) {
            auto it = std::lower_bound(_v.begin(), _v.end(), elem);
            if (_v.end() == it || *it != elem) {
                return make_pair(_v.insert(it, elem), true);
            }
            return make_pair(it, false);
        }
        else {
            return make_pair(_v.end(), false);
        }
    }

    auto find(const T& elem) const
    -> const_iterator
    {
        auto vend = _v.end();
        auto it = std::lower_bound(_v.begin(), vend, elem);
        if (it != vend && *it != elem)
            it = vend;
        return it;
    }

    bool contains(const T& elem) const {
        return find(elem) != _v.end();
    }

    const_iterator begin() const {
        return _v.begin();
    }

    const_iterator end() const {
        return _v.end();
    }


private:
    vec_type _v;
    size_t _max_size;
};

using namespace std;


BOOST_AUTO_TEST_CASE(play_unique_vector)
{
    vector_set<int> v(100);

    for (size_t i = 0 ; i < 1000000 ; ++i) {
        v.insert(int(random() % 200));
    }

    cout << "unique integers:" << endl;
    copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
    cout << endl;

    cout << "contains 100: " << v.contains(100) << endl;
    cout << "contains 101: " << v.contains(101) << endl;
    cout << "contains 102: " << v.contains(102) << endl;
    cout << "contains 103: " << v.contains(103) << endl;
}

相关文章