GNU STL string: is copy-on-write involved here?
(Disclaimer: I don't know what the C++ standard says about this. I know, I'm horrible.)
While operating on very large strings, I noticed that std::string is using copy-on-write. I managed to write the smallest loop that reproduces the observed behaviour; the following one, for instance, runs suspiciously fast:
#include <string>
using std::string;

int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
    }
}
When I add a write such as a_copy[1] = 'B'; to the loop body, an actual copy apparently takes place, and the program runs in 0.3s instead of a few milliseconds. 100 writes slowed it down by about 100 times.
But then it got weird. Some of my strings weren't written to, only read from, and this was not reflected in the execution time, which was almost exactly proportional to the number of operations on the strings. With some digging, I found that simply reading from a string still gave me that performance hit, which led me to assume GNU STL strings are using copy-on-read (?).
#include <string>
using std::string;

int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
        a_copy[99]; // this also ran in 0.3s!
    }
}
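Besides timing the loop, one way to check directly whether a copy still shares storage with its source is to compare the buffer addresses. The following is only a sketch: `shares_buffer` is a hypothetical helper, not part of any library, and it assumes that comparing `data()` pointers is a fair proxy for sharing. Both strings are taken by const reference so that only const member functions are called on them.

```cpp
#include <string>

// Hypothetical helper (not a library function): reports whether two strings
// currently point at the same character buffer. The parameters are const
// references, so only the const overload of data() can be selected.
inline bool shares_buffer(const std::string& a, const std::string& b) {
    return a.data() == b.data();
}
```

Calling `shares_buffer(a_copy, basestr)` right after the copy, and again after `a_copy[99]`, would show whether the read triggered an unshare. The result depends on the standard library implementation, so no particular output is claimed here.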
After revelling in my discovery for a while, I found out that reading (with operator[]) from the base string also takes 0.3s for the entire toy program. I'm not 100% comfortable with this. Are STL strings indeed copy-on-read, or are they allowed to be copy-on-write at all? I'm led to think that operator[] has some safeguard against someone who keeps the reference it returns and later writes to it; is this really the case? If not, what is really happening? If someone can point to a relevant section of the C++ standard, that would also be appreciated.
For reference, I'm using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3 and the GNU STL.
Recommended answer
C++ doesn't distinguish between operator[] for reading and operator[] for writing, only between operator[] for const objects and operator[] for mutable (non-const) objects. Since a_copy is mutable, the mutable operator[] is chosen, which forces the copy because that operator returns a (mutable) reference.
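To make the overload rule concrete, here is a minimal sketch of a copy-on-write string. `CowString` and `shares_buffer_with` are hypothetical names invented for illustration; this is not how libstdc++ actually implements std::string, only the same idea in miniature (and it uses C++11's std::shared_ptr for brevity).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy class, not the real libstdc++ implementation.
class CowString {
public:
    CowString(std::size_t n, char c)
        : buf_(std::make_shared<std::vector<char>>(n, c)) {}

    // Const overload: the caller cannot write through the result,
    // so the shared buffer can be used as-is.
    const char& operator[](std::size_t i) const {
        assert(i < buf_->size());
        return (*buf_)[i];
    }

    // Non-const overload: it hands out a writable reference, so it has to
    // unshare first, whether or not the caller ever writes through it.
    char& operator[](std::size_t i) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::vector<char>>(*buf_);  // deep copy
        }
        return (*buf_)[i];
    }

    // True while both objects still point at the same buffer.
    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::vector<char>> buf_;  // copied objects share this
};
```

With this class, copying a `CowString` only copies the pointer. Reading through a `const CowString&` leaves the buffer shared, while `copy[3]` on a non-const object performs the deep copy even if the result is only read.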
If efficiency is a concern, you can cast a_copy to a const string reference to force the const version of operator[] to be used, which won't make a copy of the internal buffer. Casting to a reference rather than to a const string value also avoids constructing a temporary string.
char f = static_cast<const string&>(a_copy)[99];
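An alternative to casting at every call site is to do the read through a const reference, which selects the const operator[] by ordinary overload resolution. A minimal sketch, where `read_at` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: because s is a const reference, s[i] can only
// resolve to the const overload of operator[].
inline char read_at(const std::string& s, std::size_t i) {
    assert(i < s.size());
    return s[i];
}
```

In the loop from the question, `char f = read_at(a_copy, 99);` would replace `a_copy[99];`. Binding a local `const string& view = a_copy;` and indexing `view` has the same effect.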