Amortized analysis of insertion in std::vector
How do we do the analysis of insertion at the back (push_back) in a std::vector? Its amortized time is O(1) per insertion. In particular, in a video on Channel 9 by Stephan T. Lavavej, and in this one (from 17:42 onwards), he says that for optimal performance Microsoft's implementation of this method increases the capacity of the vector by around 1.5.
How is this constant determined?
Recommended Answer
Assuming you mean push_back and not insertion, I believe that the important part is multiplying by some constant (as opposed to grabbing N more elements each time), and as long as you do this you'll get amortized constant time. Changing the factor changes the average-case and worst-case performance.
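For illustration, here is a minimal sketch of the idea; the class name GrowVec and the ~1.5 growth step are just placeholders for this answer, and a real implementation would also have to deal with allocators, exception safety, and non-trivially-copyable types:

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <utility>

// Toy dynamic array that grows by a constant factor (~1.5) on overflow.
// Hypothetical illustration only -- not any standard library's actual code.
template <typename T>
class GrowVec {
    std::unique_ptr<T[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;

public:
    void push_back(T value) {
        if (size_ == capacity_) {
            // Multiply the capacity instead of adding a fixed amount:
            // this is what makes push_back amortized O(1).
            std::size_t new_cap =
                capacity_ < 2 ? capacity_ + 1 : capacity_ + capacity_ / 2;
            std::unique_ptr<T[]> new_data(new T[new_cap]);
            for (std::size_t i = 0; i < size_; ++i)   // the expensive part:
                new_data[i] = std::move(data_[i]);    // move every existing element
            data_ = std::move(new_data);
            capacity_ = new_cap;
        }
        data_[size_++] = std::move(value);            // the cheap, common case
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
};

int main() {
    GrowVec<int> v;
    for (int i = 0; i < 1000; ++i) v.push_back(i);
    std::printf("size = %zu, capacity = %zu\n", v.size(), v.capacity());
}
```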
Concretely: if your constant factor is too large, you'll have good average-case performance but bad worst-case performance, especially as the arrays get big. For instance, imagine doubling (2x) a 10000-element vector just because you pushed the 10001st element. As Michael Burr indirectly pointed out, the real cost here is probably that you'll grow your memory much larger than you need it to be. I would add that there are cache issues that affect speed if your factor is too large. Suffice it to say that there are real costs (memory and computation) if you grow much larger than you need.
However, if your constant factor is too small, say 1.1x, then you're going to have good worst-case performance but bad average performance, because you're going to incur the cost of reallocating too many times.
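The exact factor is implementation-defined, so if you're curious what your own standard library does, a rough experiment is to watch capacity() change as you push elements (MSVC's implementation is reported to grow by roughly 1.5, while libstdc++ and libc++ typically double):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_cap = v.capacity();
    for (int i = 0; i < 10000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_cap) {
            // A capacity change means a reallocation happened; print the ratio
            // between the new capacity and the previous one.
            std::printf("size = %5zu  capacity = %5zu  growth ratio = %.2f\n",
                        v.size(), v.capacity(),
                        last_cap ? static_cast<double>(v.capacity()) / last_cap : 0.0);
            last_cap = v.capacity();
        }
    }
}
```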
Also, see Jon Skeet's answer to a similar question asked previously. (Thanks @Bo Persson)
A little more about the analysis: say you have n items you are pushing back and a multiplication factor of M. Then the number of reallocations will be roughly log base M of n, i.e. log_M(n). And the i-th reallocation will cost proportional to M^i (M to the i-th power). Then the total time of all the push_backs will be M^1 + M^2 + ... + M^(log_M(n)). The number of push_backs is n, and thus you get this series (which is a geometric series, and reduces to roughly (nM)/(M-1) in the limit) divided by n. This is roughly a constant, M/(M-1).
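If you want to check this numerically, here is a small simulation (written just for this answer) that counts the element copies a grow-by-M buffer performs over n calls to push_back and divides by n. The measured average fluctuates a bit depending on where n lands between reallocations, but it stays at or below M/(M-1) no matter how large n gets:

```cpp
#include <cstddef>
#include <cstdio>

// Count how many element copies a grow-by-factor-M buffer performs
// while push_back is called n times. (Simulation for this answer only.)
static double copies_per_push(double M, std::size_t n) {
    std::size_t capacity = 1;
    std::size_t size = 0;
    double copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;  // a reallocation copies every existing element
            std::size_t next = static_cast<std::size_t>(capacity * M);
            capacity = next > capacity ? next : capacity + 1;
        }
        ++size;
    }
    return copies / static_cast<double>(n);
}

int main() {
    const double factors[] = {1.1, 1.5, 2.0};
    for (double M : factors)
        std::printf("M = %.1f  copies per push ~ %.2f  (bound M/(M-1) = %.2f)\n",
                    M, copies_per_push(M, 10'000'000), M / (M - 1.0));
}
```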
For large values of M you will overshoot a lot and allocate much more than you need reasonably often (which I mentioned above). For small values of M (close to 1) this constant M/(M-1) becomes large. This factor directly affects the average time.