Why are string literals const?

2022-01-23 00:00:00 string literals constants c++

It is well known that in C++ string literals are immutable, and that modifying a string literal results in undefined behavior. For example:

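char * str = "Hello!";  // deprecated in C++03, ill-formed since C++11 (the literal is an array of const char)
str[1] = 'a';           // undefined behavior: writes into a string literal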

This results in undefined behavior.

Besides that, string literals are placed in static memory, so they exist for the whole lifetime of the program. I would like to know why string literals have these properties.
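The static-storage point can be illustrated with a minimal sketch. The function names `greeting` and `local_copy_length` are hypothetical, chosen only for this example. Because a literal has static storage duration, a pointer to it remains valid after the function returns, whereas an array initialized from the literal is an ordinary local copy.

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer to a string literal.
// The literal has static storage duration, so the pointer is still
// valid after this function returns.
const char* greeting() {
    return "Hello!";
}

// Contrast: an array initialized from a literal is a separate copy with
// automatic storage duration. Returning its address would dangle, so
// this sketch only reports its length.
std::size_t local_copy_length() {
    char buf[] = "Hello!";  // modifiable copy that lives only in this call
    return std::strlen(buf);
}
```

A caller can keep the result of `greeting()` for as long as the program runs, which would not be safe for a pointer to `buf`.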

Recommended answer

There are a few different reasons.

One is to allow storing string literals in read-only memory (as others have already mentioned).

Another is to allow merging of string literals. If a program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory instead of each literal occupying a separate chunk of memory. This can also apply when two string literals aren't identical but do have the same ending:

char *foo = "long string";
char *bar = "string";

In a case like this, it's possible for bar to be foo+5 (if I've counted correctly).
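Whether merging actually happens can be observed by comparing pointers, as in the rough sketch below. The function names are hypothetical. The results of the two comparison functions are unspecified: they depend on the compiler, its options, and the linker, so neither `true` nor `false` should be relied on.

```cpp
#include <cstring>

// May return true or false: the implementation is allowed, but not
// required, to store identical literals at the same address.
bool identical_literals_share_storage() {
    const char* a = "string";
    const char* b = "string";
    return a == b;
}

// May return true or false: tail merging would make "string" point at
// the last seven bytes (including the terminator) of "long string".
bool tail_is_shared() {
    const char* foo = "long string";
    const char* bar = "string";
    return bar == foo + 5;
}

// Always well-defined: skipping the 5 characters of "long " leaves a
// pointer to the characters "string", whether or not storage is shared.
const char* tail_of_long_string() {
    const char* foo = "long string";
    return foo + 5;
}
```

If writes to literals were permitted, a `true` result from either comparison would mean that modifying one literal silently changes the other.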

In either of these cases, if modifying a string literal were allowed, the change could also affect the other string literal that happens to share the same contents. At the same time, there's honestly not much point in mandating merging either: it's pretty uncommon to have enough overlapping string literals that most people would want the compiler to run slower just to save (maybe) a few dozen bytes of memory.

By the time the first standard was written, there were already compilers that used all three of these techniques (read-only storage, merging identical literals, and merging literals that share an ending), and probably a few others besides. Since there was no way to describe a single behavior you'd get from modifying a string literal, and apparently nobody thought it was an important capability to support, they did the obvious thing: they said that even attempting to do so leads to undefined behavior.
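When a modifiable string is actually needed, the usual approach is to work on a copy rather than on the literal itself. The sketch below shows two common options; the function names `with_array` and `with_std_string` are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Option 1: a char array initialized from the literal. The array is a
// private copy, so writing to it is well-defined.
std::string with_array() {
    char str[] = "Hello!";
    str[1] = 'a';
    return str;  // "Hallo!"
}

// Option 2: std::string copies the literal's characters into storage it
// owns, so element assignment is likewise well-defined.
std::string with_std_string() {
    std::string s = "Hello!";
    s[1] = 'a';
    return s;  // "Hallo!"
}
```

In both cases the literal itself is never written to, so it does not matter whether the compiler placed it in read-only memory or merged it with another literal.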
