What is the point of STL Character Traits?

2022-01-07 00:00:00 string c++ stl stdstring char-traits

I notice that in my copy of the SGI STL reference there is a page about Character Traits, but I can't see how they are used. Do they replace the string.h functions? They don't seem to be used by std::string, e.g. the length() method on std::string doesn't make use of the Character Traits length() method. Why do Character Traits exist, and are they ever used in practice?

Recommended answer

Character traits are an extremely important component of the streams and strings libraries because they allow the stream/string classes to separate out the logic of what characters are being stored from the logic of what manipulations should be performed on those characters.

To begin with, the default character traits class, char_traits<T>, is used extensively in the C++ standard. For example, there is no class called std::string. Rather, there's a class template std::basic_string that looks like this:

template <typename charT,
          typename traits = char_traits<charT>,
          typename Allocator = allocator<charT> >
    class basic_string;

Then, std::string is defined as

typedef basic_string<char> string;

Similarly, the standard streams are defined as

template <typename charT, typename traits = char_traits<charT> >
    class basic_istream;

typedef basic_istream<char> istream;
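If you want to check these definitions against your own compiler, a few static_asserts will do it. This is just a small sketch, assuming a C++11 or later compiler; it also spells out the allocator template parameter that std::basic_string takes.

```cpp
#include <istream>
#include <memory>
#include <string>
#include <type_traits>

// std::string is basic_string<char> with the default traits and allocator.
static_assert(std::is_same<std::string,
                           std::basic_string<char, std::char_traits<char>,
                                             std::allocator<char> > >::value,
              "std::string uses std::char_traits<char> by default");

// std::istream is basic_istream<char> with the same default traits.
static_assert(std::is_same<std::istream,
                           std::basic_istream<char, std::char_traits<char> > >::value,
              "std::istream uses std::char_traits<char> by default");

// The traits class is also exposed as the member typedef traits_type.
static_assert(std::is_same<std::string::traits_type,
                           std::char_traits<char> >::value,
              "traits_type names the traits class");
```

If any of these fail to compile, the library in question is not following the standard definitions.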

So why are these classes structured as they are? Why should we be using a weird traits class as a template argument?

The reason is that in some cases we might want to have a string just like std::string, but with some slightly different properties. One classic example of this is if you want to store strings in a way that ignores case. For example, I might want to make a string called CaseInsensitiveString such that I can have

CaseInsensitiveString c1 = "HI!", c2 = "hi!";
if (c1 == c2) {  // Always true
    cout << "Strings are equal." << endl;
}

That is, I can have a string type where two strings that differ only in case compare equal.

Now, suppose that the standard library authors designed strings without using traits. This would mean that I'd have in the standard library an immensely powerful string class that was entirely useless in my situation. I couldn't reuse much of the code for this string class, since comparisons would always work against how I wanted them to work. But by using traits, it's actually possible to reuse the code that drives std::string to get a case-insensitive string.

If you pull up a copy of the C++ ISO standard and look at the definition of how the string's comparison operators work, you'll see that they're all defined in terms of the compare function. This function is in turn defined by calling

traits::compare(this->data(), str.data(), rlen)

where str is the string you're comparing to and rlen is the smaller of the two string lengths. This is actually quite interesting, because it means that the definition of compare directly uses the compare function exported by the traits type specified as a template parameter! Consequently, if we define a new traits class, then define compare so that it compares characters case-insensitively, we can build a string class that behaves just like std::string, but treats things case-insensitively!
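To make that concrete, here is roughly what the comparison boils down to. This is a sketch of the logic the standard describes, not actual library code; compare_via_traits and compare_via_traits_demo are names I made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free function mirroring how basic_string::compare is specified.
template <typename CharT, typename Traits, typename Alloc>
int compare_via_traits(const std::basic_string<CharT, Traits, Alloc>& lhs,
                       const std::basic_string<CharT, Traits, Alloc>& rhs) {
    // rlen is the smaller of the two lengths.
    const std::size_t rlen = std::min(lhs.size(), rhs.size());
    // All character-level work is delegated to the traits class.
    const int result = Traits::compare(lhs.data(), rhs.data(), rlen);
    if (result != 0) return result;
    // The common prefix is equal, so the shorter string orders first.
    if (lhs.size() < rhs.size()) return -1;
    if (lhs.size() > rhs.size()) return +1;
    return 0;
}

// Hypothetical usage with the default traits.
inline void compare_via_traits_demo() {
    assert(compare_via_traits(std::string("abc"), std::string("abd")) < 0);
    assert(compare_via_traits(std::string("abc"), std::string("abc")) == 0);
    assert(compare_via_traits(std::string("abcd"), std::string("abc")) > 0);
}
```

Nothing in this function knows what "less than" means for a character; swapping the Traits parameter changes the ordering without touching the string code.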

Here's an example. We inherit from std::char_traits<char> to get the default behavior for all the functions we don't write:

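#include <cctype>   // std::tolower
#include <cstddef>  // std::size_t
#include <string>   // std::char_traits, std::basic_string

class CaseInsensitiveTraits: public std::char_traits<char> {
    // std::tolower needs a value representable as unsigned char,
    // so convert before calling it.
    static int lower (char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }

public:
    // Character-level "less than", ignoring case.
    static bool lt (char one, char two) {
        return lower(one) < lower(two);
    }

    // Character-level equality, ignoring case.
    static bool eq (char one, char two) {
        return lower(one) == lower(two);
    }

    // Three-way comparison of the first `length` characters.
    static int compare (const char* one, const char* two, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) {
            if (lt(one[i], two[i])) return -1;
            if (lt(two[i], one[i])) return +1;
        }
        return 0;
    }
};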

(Notice I've also defined eq and lt here, which compare characters for equality and less-than, respectively, and then defined compare in terms of lt.)

Now that we have this traits class, we can define CaseInsensitiveString trivially as

typedef std::basic_string<char, CaseInsensitiveTraits> CaseInsensitiveString;

And voila! We now have a string that treats everything case-insensitively!
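Here is a small usage sketch. The traits class is repeated in compact form so the snippet compiles on its own, and case_insensitive_demo is a hypothetical name. One quirk worth knowing: the stream insertion operator for std::ostream expects a string with std::char_traits<char>, so printing goes through c_str() here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Compact version of the case-insensitive traits idea described above.
struct CaseInsensitiveTraits : std::char_traits<char> {
    static int lower(char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }
    static bool eq(char a, char b) { return lower(a) == lower(b); }
    static bool lt(char a, char b) { return lower(a) < lower(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return +1;
        }
        return 0;
    }
};

typedef std::basic_string<char, CaseInsensitiveTraits> CaseInsensitiveString;

// Hypothetical demo function, for illustration only.
inline void case_insensitive_demo() {
    CaseInsensitiveString c1 = "HI!", c2 = "hi!";
    assert(c1 == c2);                          // equal when case is ignored
    assert(c1 < CaseInsensitiveString("hj"));  // ordering ignores case too
    // `std::cout << c1` does not compile because the traits types differ,
    // so print the underlying characters instead.
    std::cout << c1.c_str() << '\n';
}
```

Note that this only overrides eq, lt, and compare; operations such as find go through other traits members and would need their own overrides to ignore case.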

Of course, there are other reasons besides this for using traits. For example, if you want to define a string that uses some underlying character type of a fixed size, then you can specialize char_traits on that type and then make strings from that type. In the Windows API, for example, there's a type TCHAR that is either a narrow or a wide character depending on what macros you set during preprocessing. You can then make strings out of TCHARs by writing

typedef basic_string<TCHAR> tstring;

And now you have a string of TCHARs.
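The same pattern can be sketched portably, without the Windows headers. Everything named demo_* or DEMO_* below is a hypothetical stand-in for TCHAR and its companion macros, not part of any real API.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch playing the role of the Windows UNICODE macro.
#if defined(DEMO_USE_WIDE_CHARS)
typedef wchar_t demo_char;
#define DEMO_TEXT(s) L##s
#else
typedef char demo_char;
#define DEMO_TEXT(s) s
#endif

// One string type whose character type follows the macro; the default
// traits, std::char_traits<demo_char>, are picked up automatically.
typedef std::basic_string<demo_char> demo_string;

// Hypothetical demo function, for illustration only.
inline void demo_string_usage() {
    demo_string greeting = DEMO_TEXT("hello");
    assert(greeting.size() == 5);
    assert(greeting == DEMO_TEXT("hello"));
}
```

Because the standard library already provides char_traits for both char and wchar_t, no custom traits class is needed in either configuration.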

In all of these examples, notice that we just defined some traits class (or used one that already existed) as a parameter to some template type in order to get a string for that type. The whole point of this is that the basic_string author just needs to specify how to use the traits and we magically can make them use our traits rather than the default to get strings that have some nuance or quirk not part of the default string type.

Hope this helps!

EDIT: As @phooji pointed out, this notion of traits is not just used by the STL, nor is it specific to C++. As a completely shameless self-promotion, a while back I wrote an implementation of a ternary search tree (a type of radix tree described here) that uses traits to store strings of any character type, compared with whatever comparison the client chooses. It might be an interesting read if you want to see an example of where this is used in practice.

EDIT: In response to your claim that std::string doesn't use traits::length, it turns out that it does in a few places. Most notably, when you construct a std::string out of a char* C-style string, the new length of the string is derived by calling traits::length on that string. It seems that traits::length is used mostly to deal with C-style sequences of characters, which are the "least common denominator" of strings in C++, while std::string is used to work with strings of arbitrary contents.
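One way to see this for yourself is to plug in a traits class that counts calls to length. This is a sketch: CountingTraits, CountingString, and length_demo are hypothetical names, and I'm assuming your library implements the C-string constructor by calling traits::length, which is how the common implementations I know of behave.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical traits class that counts calls to length(); everything
// else is inherited from std::char_traits<char>.
struct CountingTraits : std::char_traits<char> {
    static int& length_calls() {
        static int calls = 0;
        return calls;
    }
    static std::size_t length(const char* s) {
        ++length_calls();
        return std::char_traits<char>::length(s);
    }
};

typedef std::basic_string<char, CountingTraits> CountingString;

// Hypothetical demo function, for illustration only.
inline void length_demo() {
    const int before = CountingTraits::length_calls();
    CountingString s("hello");  // built from a C-style string
    // The constructor had to measure the C string through the traits.
    assert(CountingTraits::length_calls() > before);

    const int after_construction = CountingTraits::length_calls();
    // The member length() just reports the stored size; no traits call.
    assert(s.length() == 5);
    assert(CountingTraits::length_calls() == after_construction);
}
```

This matches what the question observed: the length() member never needs traits::length, because the string already stores its size once it has been constructed.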
