list::size() 真的是 O(n) 吗?

2021-12-06 00:00:00 list complexity-theory big-o c++ stl

最近，我注意到有人提到 std::list::size() 具有线性复杂度.
根据 some sources，这实际上取决于实现，因为标准没有说明复杂性是什么.
在此博客条目中的评论说:

Recently, I noticed some people mentioning that std::list::size() has a linear complexity.
According to some sources, this is in fact implementation dependent as the standard doesn't say what the complexity has to be.
The comment in this blog entry says:

其实要看你是哪个STL正在使用.微软 Visual Studio V6将 size() 实现为 {return (_Size);而gcc(至少在版本中3.3.2 和 4.1.0) 这样做 { return std::distance(begin(), end());} 这第一个有恒速，第二个有 o(N) 速度

Actually, it depends on which STL you are using. Microsoft Visual Studio V6 implements size() as {return (_Size); } whereas gcc (at least in versions 3.3.2 and 4.1.0) do it as { return std::distance(begin(), end()); } The first has constant speed, the second has o(N) speed

所以我的猜测是对于 VC++ 人群来说 size() 与 Dinkumware 一样具有恒定的复杂性自 VC6 以来可能不会改变这一事实.我在吗?
它目前在 gcc 中是什么样子的?如果它真的是 O(n)，为什么开发者选择这样做?

So my guess is that for the VC++ crowd size() has constant complexity as Dinkumware probably won't have changed that fact since VC6. Am I right there?

What does it look like currently in gcc? If it is really O(n), why did the developers choose to do so?

推荐答案

Pre-C++11 答案
您是正确的，标准没有说明 list::size() 的复杂性必须是什么 - 但是，它确实建议它应该具有恒定的复杂性"(表中的注释 A65).

Pre-C++11 answer

You are correct that the standard does not state what the complexity of list::size() must be - however, it does recommend that it "should have constant complexity" (Note A in Table 65).

这是 Howard Hinnant 的一篇有趣的文章，解释了为什么有些人认为 list::size() 应该有 O(N) 复杂度(主要是因为他们相信 O(1) list::size() 使得 list::splice() 具有 O(N) 复杂度)以及为什么 O(1) list::size() 是一个好主意(在作者看来):

Here's an interesting article by Howard Hinnant that explains why some people think list::size() should have O(N) complexity (basically because they believe that O(1) list::size() makes list::splice() have O(N) complexity) and why an O(1) list::size() is be a good idea (in the author's opinion):

http://howardhinnant.github.io/On_list_size.html

我认为论文的主要观点是:

I think the main points in the paper are:

很少有维护内部计数的情况，所以 list::size() 可以是 O(1) 导致拼接操作变成线性
可能还有更多情况，某人可能不知道可能发生的负面影响，因为他们调用了 O(N) size()(例如他的一个示例，其中 list::size() 在持有锁时被调用).
不是允许 size() 是 O(N)，为了最少惊喜"，标准应该要求任何实现 size() 的容器以 O(1) 的方式实现它.如果容器不能做到这一点，它根本不应该实现 size().在这种情况下，容器的用户将知道 size() 不可用，如果他们仍然想要或需要获取容器中元素的数量，他们仍然可以使用 container::distance( begin(), end()) 来获取该值 - 但他们会完全意识到这是一个 O(N) 操作.

there are few situations where maintaining an internal count so list::size() can be O(1) causes the splice operation to become linear

there are probably many more situations where someone might be unaware of the negative effects that might happen because they call an O(N) size() (such as his one example where list::size() is called while holding a lock).

that instead of permitting size() be O(N), in the interest of 'least surprise', the standard should require any container that implements size() to implement it in an O(1) fashion. If a container cannot do this, it should not implement size() at all. In this case, the user of the container will be made aware that size() is unavailable, and if they still want or need to get the number of elements in the container they can still use container::distance( begin(), end()) to get that value - but they will be completely aware that it's an O(N) operation.

我想我倾向于同意他的大部分推理.但是，我不喜欢他提议的对 splice() 重载的添加.必须传入必须等于 distance( first, last) 的 n 才能获得正确的行为，这似乎是难以诊断错误的秘诀.

I think I tend to agree with most of his reasoning. However, I do not like his proposed addition to the splice() overloads. Having to pass in an n that must be equal to distance( first, last) to get correct behavior seems like a recipe for hard to diagnose bugs.

我不确定应该或可以做什么，因为任何更改都会对现有代码产生重大影响.但就目前而言，我认为现有代码已经受到影响 - 对于本应明确定义的某些内容，从一种实现到另一种实现的行为可能会有相当大的不同.也许一个人关于将大小缓存"并标记为已知/未知的评论可能会很好用 - 你会得到 O(1) 行为的摊销 - 你唯一一次得到 O(N) 行为是当列表被一些 splice() 操作修改时.这样做的好处是它可以由今天的实现者完成，而无需更改标准(除非我遗漏了一些东西).

I'm not sure what should or could be done moving forward, as any change would have a significant impact on existing code. But as it stands, I think that existing code is already impacted - behavior might be rather significantly different from one implementation to another for something that should have been well-defined. Maybe onebyone's comment about having the size 'cached' and marked known/unknown might work well - you get amortized O(1) behavior - the only time you get O(N) behavior is when the list is modified by some splice() operations. The nice thing about this is that it can be done by implementors today without a change to the standard (unless I'm missing something).

~~据我所知，C++0x 在这方面没有任何改变.~~

相关文章