Is an end+1 iterator allowed for std::string?
Is it valid to create an iterator to `end(str)+1` for `std::string`?
And if it isn't, why isn't it?
This question is restricted to C++11 and later: before C++11, the data was already stored in a contiguous block in all but rare proof-of-concept toy implementations, but it didn't have to be stored that way.
And I think that might make all the difference.
The significant difference between `std::string` and any other standard container, I speculate, is that it always contains one element more than its `size`: the zero terminator, needed to fulfill the requirements of `.c_str()`.
> 21.4.7.1 basic_string accessors [string.accessors]
>
> `const charT* c_str() const noexcept;`
> `const charT* data() const noexcept;`
>
> 1 Returns: A pointer `p` such that `p + i == &operator[](i)` for each `i` in `[0,size()]`.
> 2 Complexity: Constant time.
> 3 Requires: The program shall not alter any of the values stored in the character array.
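To make the quoted identity concrete, here is a minimal sketch that checks it at run time. The helper name `c_str_matches_indexing` is made up for illustration; the point is only that the range of `i` includes `size()` itself, where the terminator lives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks that c_str() + i == &s[i] for every i in
// [0, size()], and that the element at index size() is the terminator.
bool c_str_matches_indexing(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {  // note <=: size() is included
        if (p + i != &s[i]) return false;
    }
    return p[s.size()] == '\0';
}
```

Calling it on an empty string still performs one comparison, for `i == 0 == size()`.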
Still, even though IMHO the standard should guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only paragraph I found casts doubt on that:
> 21.4.1 basic_string general requirements [string.require]
>
> 4 The char-like objects in a `basic_string` object shall be stored contiguously. That is, for any `basic_string` object `s`, the identity `&*(s.begin() + n) == &*s.begin() + n` shall hold for all values of `n` such that `0 <= n < s.size()`.
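For contrast with the `c_str()` wording, the contiguity identity can be sketched the same way. Note that the loop stops before `size()`, exactly as the quoted paragraph does, so it says nothing about `s.begin() + s.size()` being dereferenceable. `iterators_are_contiguous` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks &*(s.begin() + n) == &*s.begin() + n
// for all n with 0 <= n < s.size(), and for no other n.
bool iterators_are_contiguous(const std::string& s) {
    if (s.empty()) return true;  // no valid n, and *s.begin() must not be evaluated
    const char* first = &*s.begin();
    for (std::size_t n = 0; n < s.size(); ++n) {  // strictly less than size()
        auto it = s.begin() + static_cast<std::string::difference_type>(n);
        if (&*it != first + n) return false;
    }
    return true;
}
```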
(All quotes are from C++14 final draft (n3936).)
Related: Legal to overwrite std::string's null terminator?
Solution

TL;DR: `s.end() + 1` is undefined behavior.
`std::string` is a strange beast, mainly for historical reasons:
- It attempts to bring C compatibility, where it is known that an additional `\0` character exists beyond the length reported by `strlen`.
- It was designed with an index-based interface.
- As an afterthought, when merged into the Standard library with the rest of the STL code, an iterator-based interface was added.
This led `std::string`, in C++03, to number 103 member functions, and a few have been added since.
Therefore, discrepancies between the different methods should be expected.
Discrepancies already appear in the index-based interface:
> §21.4.5 [string.access]
>
> `const_reference operator[](size_type pos) const;`
> `reference operator[](size_type pos);`
>
> 1/ Requires: `pos <= size()`
>
> `const_reference at(size_type pos) const;`
> `reference at(size_type pos);`
>
> 5/ Throws: `out_of_range` if `pos >= size()`
Yes, you read this right: `s[s.size()]` returns a reference to a NUL character while `s.at(s.size())` throws an `out_of_range` exception. If anyone tells you to replace all uses of `operator[]` with `at` because it is safer, beware the `string` trap...
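A small sketch of that discrepancy, assuming a C++11 or later library. The helper names `char_at_size` and `at_throws` are invented for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads s[s.size()], which is allowed and yields NUL.
char char_at_size(const std::string& s) {
    return s[s.size()];
}

// Hypothetical helper: reports whether s.at(pos) throws std::out_of_range.
bool at_throws(const std::string& s, std::string::size_type pos) {
    try {
        (void)s.at(pos);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For `std::string s = "abc";`, `char_at_size(s)` gives `'\0'`, while `at_throws(s, s.size())` is `true`.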
So, what about iterators?
> §21.4.3 [string.iterators]
>
> `iterator end() noexcept;`
> `const_iterator end() const noexcept;`
> `const_iterator cend() const noexcept;`
>
> 2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A hint is offered by

> §21.4 [basic.string]
>
> 3/ The iterators supported by `basic_string` are random access iterators (24.2.7).

while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we came this far, let's go all the way).
This leads us to:
> 24.2.1 [iterator.requirements.general]
>
> 5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator `i` for which the expression `*i` is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, `*s.end()` is undefined behavior: nothing guarantees that the past-the-end iterator is dereferenceable.
In 24.2.3 [input.iterators], 2/ Table 107 -- Input iterator requirements (in addition to Iterator) lists as a precondition of `++r` and `r++` that `r` be dereferenceable.
Neither forward iterators, bidirectional iterators, nor random access iterators lift this restriction (and each category states that it inherits the restrictions of its predecessor).
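In practice, this means any advance has to be checked against the remaining distance first. The following is one possible sketch, not a standard facility: `bounded_advance` is a hypothetical helper that assumes `it` lies in `[s.begin(), s.end()]` and that `n` is non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: advances `it` by at most `n` steps and never moves
// past s.end(). Assumes `it` is in [s.begin(), s.end()] and n >= 0.
std::string::const_iterator bounded_advance(const std::string& s,
                                            std::string::const_iterator it,
                                            std::string::difference_type n) {
    const std::string::difference_type remaining = std::distance(it, s.end());
    return it + std::min(n, remaining);
}
```

With this, asking for one step from `s.cend()` simply returns `s.cend()` again instead of forming `s.end() + 1`.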
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
- `r += n` is equivalent to [inc|dec]rementing `r` `n` times
- `a + n` and `n + a` are equivalent to copying `a` and then applying `+= n` to the copy

and similarly for `-= n` and `- n`.
Thus `s.end() + 1`, which is defined in terms of incrementing a non-dereferenceable iterator, is undefined behavior.
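If the goal is a range that includes the terminator, one alternative is to use raw pointers rather than string iterators. This is a sketch under the assumption, taken from the [string.accessors] wording quoted in the question, that `c_str()` points into an array of `size() + 1` characters, so a pointer one past the terminator is an ordinary past-the-end array pointer. `with_terminator` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies the characters of s including the terminator,
// using the pointer range [c_str(), c_str() + size() + 1).
std::vector<char> with_terminator(const std::string& s) {
    const char* first = s.c_str();
    const char* last = first + s.size() + 1;  // one past the terminator
    return std::vector<char>(first, last);
}
```

The rules for pointers into an array apply here, not the iterator requirements discussed above.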