What are the mechanics of short string optimization in libc++?

2021-12-05 00:00:00 string optimization c++ c++-standard-library libc++

This answer gives a nice high-level overview of short string optimization (SSO). However, I would like to know in more detail how it works in practice, specifically in the libc++ implementation:

How short does the string have to be in order to qualify for SSO? Does this depend on the target architecture?

How does the implementation distinguish between short and long strings when accessing the string data? Is it as simple as m_size <= 16, or is it a flag that is part of some other member variable? (I imagine that m_size or part of it might also be used to store string data.)

I asked this question specifically for libc++ because I know that it uses SSO; this is even mentioned on the libc++ home page.

Here are some observations after looking at the source:

libc++ can be compiled with two slightly different memory layouts for the string class; this is governed by the _LIBCPP_ALTERNATE_STRING_LAYOUT flag. Both layouts also distinguish between little-endian and big-endian machines, which leaves us with a total of 4 different variants. I will assume the "normal" layout and little-endian in what follows.

Assuming further that size_type is 4 bytes and that value_type is 1 byte, this is what the first 4 bytes of a string would look like in memory:

    // short string: (s)ize and 3 bytes of char (d)ata
    sssssss0;dddddddd;dddddddd;dddddddd
           ^- is_long = 0

    // long string: (c)apacity
    ccccccc1;cccccccc;cccccccc;cccccccc
           ^- is_long = 1

Since the size of the short string is in the upper 7 bits, it needs to be shifted when accessing it:

size_type __get_short_size() const { return __r_.first().__s.__size_ >> 1; }

Similarly, the getter and setter for the capacity of a long string use __long_mask to work around the is_long bit.
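The shift and the mask can be illustrated with a minimal sketch. It assumes the "normal" little-endian layout and the 4-byte size_type from the diagram above, and it assumes the stored capacity is always even so that its lowest bit is free for the flag. All names below are hypothetical stand-ins, not the actual libc++ members.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the libc++ internals discussed above.
using size_type = std::uint32_t;

// Assumption: the is_long flag is the lowest bit of the first word.
constexpr size_type long_mask = 0x1u;

// Short form: the first byte stores (size << 1), leaving the flag bit at 0.
constexpr std::uint8_t encode_short_size(std::uint8_t size) {
    return static_cast<std::uint8_t>(size << 1);
}
constexpr std::uint8_t decode_short_size(std::uint8_t first_byte) {
    return static_cast<std::uint8_t>(first_byte >> 1);
}

// Long form: the capacity shares its word with the flag bit.
// Round-trips only for even capacities, since the lowest bit is overwritten.
constexpr size_type encode_long_cap(size_type cap) { return cap | long_mask; }
constexpr size_type decode_long_cap(size_type word) { return word & ~long_mask; }

// Either form can be classified by testing the lowest bit of the first word.
constexpr bool is_long(size_type first_word) {
    return (first_word & long_mask) != 0;
}
```

With these helpers, a short string of size 10 stores 20 in its first byte, and a long string with capacity 32 stores 33 in its first word; in both cases the decode function recovers the original value.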

I am still looking for an answer to my first question, i.e. what value would __min_cap, the capacity of short strings, take for different architectures?

Other standard library implementations

This answer gives a nice overview of std::string memory layouts in other standard library implementations.

Accepted answer

The libc++ basic_string is designed to have a sizeof of 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

What value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

1 bit goes to the long/short flag.
7 bits go to the size.
Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity() without an allocation).

On a 32-bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64-bit machine, 22 chars will fit in the short string. sizeof(string) is 24.
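The arithmetic behind these two numbers can be written out as a small sketch. The function name short_capacity and the constants are hypothetical and exist only for illustration; the calculation assumes char as the character type.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the budget described above.
constexpr std::size_t words_in_string = 3;     // sizeof(string) is 3 words
constexpr std::size_t flag_and_size_bytes = 1; // 1 flag bit + 7 size bits
constexpr std::size_t trailing_null_bytes = 1; // always stored behind the data

// In-place capacity of the short form for a given word size in bytes.
constexpr std::size_t short_capacity(std::size_t word_bytes) {
    return words_in_string * word_bytes
           - flag_and_size_bytes
           - trailing_null_bytes;
}

static_assert(short_capacity(4) == 10, "32-bit: 12 - 2 = 10 chars");
static_assert(short_capacity(8) == 22, "64-bit: 24 - 2 = 22 chars");
```

On a libc++ build, the capacity() of a default-constructed std::string would be expected to equal short_capacity(sizeof(void*)); other standard libraries use different layouts and report different values.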

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed up move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.
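One way to picture how the short form fits into the same 3 words is a union of the two representations. This is a simplified sketch with hypothetical names, assuming char, the default member order, and a platform where std::size_t and pointers have the same size; the real libc++ definitions are more involved.

```cpp
#include <cstddef>

// Simplified model of the long form: three word-sized members.
struct long_rep {
    std::size_t cap;   // lowest bit doubles as the is_long flag
    std::size_t size;
    char*       data;
};

// Simplified model of the short form: one byte for flag + size,
// the remaining bytes for characters and the trailing null.
struct short_rep {
    unsigned char size;                   // (size << 1), flag bit cleared
    char          data[sizeof(long_rep) - 1];
};

// Both forms overlay the same storage.
union rep {
    long_rep  l;
    short_rep s;
};

static_assert(sizeof(rep) == 3 * sizeof(void*),
              "the union occupies exactly three words");
static_assert(sizeof(short_rep::data) - 1 == 3 * sizeof(void*) - 2,
              "usable short capacity is 3 words minus 2 bytes");
```

The second assertion restates the capacity budget: the in-place buffer loses one byte to the size field and one to the trailing null.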

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

    struct __long
    {
        size_type __cap_;
        size_type __size_;
        pointer   __data_;
    };

to:

    struct __long
    {
        pointer   __data_;
        size_type __size_;
        size_type __cap_;
    };
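The effect of the reordering on member offsets can be checked with a short sketch. The struct names long_default and long_alternate are hypothetical, and the sketch assumes char* for pointer and std::size_t for size_type on a platform where both are one word wide.

```cpp
#include <cstddef>

// Hypothetical model of the default member order.
struct long_default {
    std::size_t cap;
    std::size_t size;
    char*       data;
};

// Hypothetical model of the alternate member order.
struct long_alternate {
    char*       data;
    std::size_t size;
    std::size_t cap;
};

// Same total size, but the data pointer and capacity swap positions.
static_assert(sizeof(long_default) == sizeof(long_alternate),
              "reordering does not change sizeof");
static_assert(offsetof(long_default, data) == 2 * sizeof(std::size_t),
              "default layout: data pointer is the last word");
static_assert(offsetof(long_alternate, data) == 0,
              "alternate layout: data pointer is the first word");
```

Because the two orders place the pointer at different offsets, code built against one layout would read the wrong word if handed a string built against the other.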

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, but the difference was difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It is a different ABI, and accidentally mixing it with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT will create run-time errors.

I recommend this flag only be changed by a vendor of libc++.
