实现“最后[秒/分钟/小时]中的点击次数";数据结构

2021-12-22 00:00:00 algorithm data-structures c++

我认为这是一个相当普遍的问题,但我似乎无法通过谷歌搜索找到答案(也许我不知道的问题有更准确的名称?)

I think this is a fairly common question but I can't seem to find answer by googling around (maybe there's a more precise name for the problem I don't know?)

您需要使用用于报告命中的hit()"方法和 hitsInLastSecond|Minute|Hour 方法来实现结构.您有一个精度为纳秒的计时器.您如何有效地实现这一点?

You need to implement a structure with a "hit()" method used to report a hit and hitsInLastSecond|Minute|Hour methods. You have a timer with say nanosecond accuracy. How do you implement this efficiently?

我的想法是这样的(在伪 C++ 中)

My thought was something like this (in psuedo-C++)

class HitCounter {
  void hit() {
    hits_at[now()] = ++last_count;
  }

  int hitsInLastSecond() {
    auto before_count = hits_at.lower_bound(now() - 1 * second)
    if (before_count == hits_at.end()) { return last_count; }
    return last_count - before_count->second;
  }

  // etc for Minute, Hour

  map<time_point, int> hits_at;
  int last_count = 0;
};

这行得通吗?好吗?有什么更好的吗?

Does this work? Is it good? Is something better?

更新:根据评论添加修剪并切换到双端队列:

Update: Added pruning and switched to a deque as per comments:

class HitCounter {
  void hit() {
    hits.push_back(make_pair(now(), ++last_count));
  }

  int hitsInLastSecond() {
    auto before = lower_bound(hits.begin(), hits.end(), make_pair(now() - 1 * second, -1));
    if (before == hits.end()) { return last_count; }
    return last_count - before_count->second;
  }

  // etc for Minute, Hour

  void prune() {
    auto old = upper_bound(hits.begin(). hits.end(), make_pair(now - 1 * hour, -1));
    if (old != hits.end()) {
      hits.erase(hits.begin(), old)
    }
  }

  deqeue<pair<time_point, int>> hits;
  int last_count = 0;
};

推荐答案

您所描述的称为直方图.

What you are describing is called a histogram.

使用散列,如果你想要纳秒级的精度,会消耗掉你的大部分 CPU.您可能需要一个环形缓冲区来存储数据.

Using a hash, if you intend nanosecond accuracy, will eat up much of your cpu. You probably want a ring buffer for storing the data.

使用 std::chrono 来实现您需要的计时精度,但坦率地说,每秒点击次数似乎是您需要的最高粒度,如果您着眼于整体大局,它似乎并不重要精度是.

Use std::chrono to achieve the timing precision you require, but frankly hits per second seems like the highest granularity you need and if you are looking at the overall big picture, it doesn't seem like it will matter terribly what the precision is.

这是一个关于如何进行的部分介绍性示例:

This is a partial, introductory sample of how you might go about it:

#include <array>
#include <algorithm>

template<size_t RingSize>
class Histogram
{
    std::array<size_t, RingSize> m_ringBuffer;
    size_t m_total;
    size_t m_position;
public:
    Histogram() : m_total(0)
    {
        std::fill_n(m_ringBuffer.begin(), RingSize, 0);
    }

    void addHit()
    {
        ++m_ringBuffer[m_position];
        ++m_total;
    }

    void incrementPosition()
    {
        if (++m_position >= RingSize)
            m_position = 0;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    double runningAverage() const
    {
        return (double)m_total / (double)RingSize;
    }

    size_t runningTotal() const { return m_total; }
};

Histogram<60> secondsHisto;
Histogram<60> minutesHisto;
Histogram<24> hoursHisto;
Histogram<7> weeksHisto;

这是一个简单的实现,它假设您将每秒调用一次并增加位置,并且将在每个 RingSize 中将 runningTotal 从一个直方图转移到下一个直方图(因此每 60 秒,将 secondsHisto.runningTotal 添加到 minutesHisto).

This is a naive implementation which assumes you will call it every second and increment the position, and will transpose runningTotal from one histogram to the next every RingSize (so every 60s, add secondsHisto.runningTotal to minutesHisto).

希望这是一个有用的介绍性的地方.

Hopefully it will be a useful introductory place to start from.

如果你想跟踪更长的每秒点击次数直方图,你可以用这个模型来做到这一点,通过增加环大小,添加第二个总数来跟踪最后 N 个环缓冲区条目,这样 m_subTotal = sum(m_ringBuffer[m_position - N .. m_position]),类似于 m_total 的工作方式.

If you want to track a longer histogram of hits per second, you can do that with this model, by increasing the ring size, add a second total to track the last N ring buffer entries, so that m_subTotal = sum(m_ringBuffer[m_position - N .. m_position]), similar to the way m_total works.

size_t m_10sTotal;

...

void addHit()
{
    ++m_ringBuffer[m_position];
    ++m_total;
    ++m_10sTotal;
}

void incrementPosition()
{
    // subtract data from >10 sample intervals ago.
    m_10sTotal -= m_ringBuffer[(m_position + RingBufferSize - 10) % RingBufferSize];
    // for the naive total, do the subtraction after we
    // advance position, since it will coincide with the
    // location of the value RingBufferSize ago.
    if (++m_position >= RingBufferSize)
        m_position = 0;
    m_total -= m_ringBuffer[m_position];
}

您不必将组织图制作成这些大小,这只是一个简单的抓取模型.有多种选择,例如同时增加每个直方图:

You don't have to make the histo grams these sizes, this is simply a naive scraping model. There are various alternatives, such as incrementing each histogram at the same time:

secondsHisto.addHit();
minutesHisto.addHit();
hoursHisto.addHit();
weeksHisto.addHit();

每个都独立滚动,所以都有当前值.根据您希望该粒度的数据返回来调整每个组的大小.

Each rolls over independently, so all have current values. Size each histo as far as you want data at that granularity to go back.

相关文章