Why do C and C++ hate signed char so much?

2022-01-12 00:00:00 c char c++

Why does C allow accessing an object through any "character type":

6.5 Expressions (C)

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a character type.

But C++ only allows char and unsigned char?

3.10 Lvalues and rvalues (C++)

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • a char or unsigned char type.
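
The kind of access both rules permit can be sketched as follows. This is a minimal illustration, not code from the question: `read_bytes` and `check_read_bytes` are hypothetical names, and the check assumes 8-bit bytes and that `uint16_t` is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the bytes of any object into out[],
   reading them through an unsigned char lvalue. Both the C and the
   C++ wording quoted above allow this kind of access. */
static void read_bytes(const void *obj, size_t n, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
}

/* Self-check, assuming CHAR_BIT == 8 so that uint16_t occupies two bytes. */
static void check_read_bytes(void)
{
    uint16_t v = 0x80ff;
    unsigned char bytes[sizeof v];

    read_bytes(&v, sizeof v, bytes);

    /* Byte order is implementation-defined, so accept either order. */
    assert((bytes[0] == 0x80 && bytes[1] == 0xff) ||
           (bytes[0] == 0xff && bytes[1] == 0x80));
}
```

Calling `check_read_bytes()` from `main` runs the check. Reading the same bytes through `signed char` is allowed by the C wording but is not in the C++ list quoted above.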

Another portion of signed char hatred (quote from the C++ standard):

3.9 Types (C++)

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

And from the C standard:

6.2.6 Representations of types (C)

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
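
The round trip that both quotes describe can be sketched in C as below. The `struct sample` type and the function name `check_round_trip` are made up for this illustration; only `memcpy` and the `unsigned char [n]` buffer come from the quoted text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example type; any object type without bit-field
   trickery would do. */
struct sample {
    int id;
    double weight;
};

/* Copies an object into unsigned char[n] and back, then checks that
   the restored object holds the original value. */
static void check_round_trip(void)
{
    struct sample original = { 42, 1.5 };
    struct sample restored;
    unsigned char repr[sizeof original];      /* unsigned char [n] */

    memcpy(repr, &original, sizeof original); /* object -> bytes */
    memcpy(&restored, repr, sizeof restored); /* bytes -> object */

    assert(restored.id == original.id);
    assert(restored.weight == original.weight);
}
```

The array `repr` holds what the C standard calls the object representation, including any padding bytes inside the struct.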

I see many people on Stack Overflow saying this is because unsigned char is the only character type guaranteed not to have padding bits, but C99 section 6.2.6.2 (Integer types) says:

signed char shall not have any padding bits

So what is the real reason behind this?

Recommended answer

Here is my take on the motivation:

On a non-two's-complement system, signed char is not suitable for accessing the representation of an object. This is because either there are two signed char representations that have the same value (+0 and -0), or there is one representation that has no value (a trap representation). In either case, this prevents you from doing most of the meaningful things you might do with the representation of an object. For example, if you have a 16-bit unsigned integer 0x80ff, one or the other of its bytes, read as a signed char, is going to either trap or compare equal to 0.
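
The 0x80ff example can be simulated on an ordinary machine by decoding each byte pattern by hand. The sketch below is only a model, assuming 8-bit bytes: the `decode_*` functions are hypothetical helpers that compute the value an 8-bit pattern would have under each signed representation, and they treat negative zero as an ordinary 0 rather than as a trap.

```c
#include <assert.h>

/* Value of an 8-bit pattern read as two's complement. */
static int decode_twos_complement(unsigned char b)
{
    return (b & 0x80) ? (int)b - 256 : (int)b;
}

/* Value of an 8-bit pattern read as ones' complement:
   a set sign bit means "negate the inverted bits". */
static int decode_ones_complement(unsigned char b)
{
    return (b & 0x80) ? -(int)(0xff - b) : (int)b;
}

/* Value of an 8-bit pattern read as sign-magnitude:
   the top bit is the sign, the low seven bits the magnitude. */
static int decode_sign_magnitude(unsigned char b)
{
    return (b & 0x80) ? -(int)(b & 0x7f) : (int)b;
}

/* The two bytes of 0x80ff: each non-two's-complement scheme turns
   one of them into negative zero, which compares equal to 0. */
static void check_negative_zero(void)
{
    assert(decode_sign_magnitude(0x80) == 0);     /* -0 */
    assert(decode_sign_magnitude(0xff) == -127);
    assert(decode_ones_complement(0xff) == 0);    /* -0 */
    assert(decode_ones_complement(0x80) == -127);
    assert(decode_twos_complement(0x80) == -128);
    assert(decode_twos_complement(0xff) == -1);
}
```

In this model, the byte 0x80 under sign-magnitude and the byte 0xff under ones' complement are indistinguishable from the byte 0x00, so a byte-wise comparison through signed char could not tell 0x80ff from a value with a zero byte in that position.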

Note that on such an implementation (non-two's-complement), plain char needs to be defined as an unsigned type for access to the representations of objects via char to work correctly. While there is no explicit requirement, I see this as a requirement derived from other requirements in the standard.
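
Whether plain char is signed or unsigned on a given implementation can be checked with `CHAR_MIN` from `<limits.h>`. The helper name `plain_char_is_unsigned` below is invented for this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 when plain char behaves as an
   unsigned type on this implementation, 0 when it is signed.
   CHAR_MIN is 0 exactly when plain char is unsigned. */
static int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Cross-check against a conversion: -1 converted to an unsigned
   plain char becomes a positive value; a signed one stays negative. */
static void check_plain_char(void)
{
    assert(plain_char_is_unsigned() == ((char)-1 > 0));
}
```

The result differs between platforms and compiler options, so the sketch reports the property rather than asserting a fixed answer.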
