Inconsistency between std::string and string literals

2021-12-26 | Tags: string, foreach, c++, c++11, string-literals

I have discovered a disturbing inconsistency between std::string and string literals in C++0x:

#include <iostream>
#include <string>

int main()
{
    int i = 0;
    for (auto e : "hello")
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    i = 0;
    for (auto e : std::string("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    return 0;
}

The output is:

Number of elements: 6
Number of elements: 5

I understand the mechanics of why this is happening: the string literal is really an array of characters that includes the null character, and when the range-based for loop calls std::end() on the character array, it gets a pointer past the end of the array; since the null character is part of the array, it thus gets a pointer past the null character.

However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?

Is there a way to resolve this inconsistency? For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?

EDIT: To justify my indignation a bit more to those who have said that I'm just suffering the consequences of using C-style strings which are a "legacy feature", consider code like the following:

template <typename Range>
void f(Range&& r)
{
    for (auto e : r)
    {
        // ...
    }
}

Would you expect f("hello") and f(std::string("hello")) to do something different?

Recommended answer

The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. Using an appropriately-defined user-defined literal:

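#include <cstddef>
#include <string>

// Suffixes without a leading underscore are reserved for the standard library.
std::string operator""_s(const char* p, std::size_t n)
{
    return std::string(p, n);
}

We would then be able to write:

int i = 0;
for (auto e : "hello"_s)
    ++i;
std::cout << "Number of elements: " << i << '\n';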

This now outputs the expected number:

Number of elements: 5

With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.
