Setting extra bits in a bool makes it true and false at the same time

2022-01-19 00:00:00 boolean c++ evaluation undefined-behavior abi

If I take a bool variable and set its second bit to 1, the variable evaluates to true and false at the same time. Compile the following code with gcc 6.3 and the -g option (gcc-v6.3.0/Linux/RHEL6.0-2016-x86_64/bin/g++ -g main.cpp -o mytest_d) and run the executable. You get the following.

How can T be equal to true and false at the same time?

     value   bits
     -----   ----
     T:   1  0001
    after bit change
     T:   3  0011
    T is true
    T is false

This can happen when you call a function written in a different language (say Fortran) whose definition of true and false differs from that of C++. In Fortran, if any bit is not 0 the value is true, and if all bits are zero the value is false.

    #include <iostream>
    #include <bitset>

    using namespace std;

    void set_bits_to_1(void* val){
      char *x = static_cast<char *>(val);

      for (int i = 0; i<2; i++ ){
        *x |= (1UL << i);
      }
    }

    int main(int argc,char *argv[])
    {
      bool T = 3;

      cout <<" value   bits " <<endl;
      cout <<" -----   ---- " <<endl;
      cout <<" T:   "<< T <<"  "<< bitset<4>(T)<<endl;

      set_bits_to_1(&T);

      bitset<4> bit_T = bitset<4>(T);
      cout <<"after bit change"<<endl;
      cout <<" T:   "<< T <<"  "<< bit_T<<endl;

      if (T ){
        cout <<"T is true" <<endl;
      }

      if ( T == false){
        cout <<"T is false" <<endl;
      }
    }

Fortran function that is not compatible with C++ when compiled with ifort:

    logical*1 function return_true()
      implicit none
      return_true = 1;
    end function return_true

Recommended answer

In C++ the bit representation (and even the size) of a bool is implementation defined; generally it's implemented as a char-sized type taking 1 or 0 as possible values.

If you set its value to anything different from the allowed ones (in this specific case by aliasing a bool through a char and modifying its bit representation), you are breaking the rules of the language, so anything can happen. In particular, it's explicitly specified in the standard that a "broken" bool may behave as both true and false (or neither true nor false) at the same time:

Using a bool value in ways described by this International Standard as "undefined," such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false

(C++11, [basic.fundamental], note 47)

In this particular case, you can see how it ended up in this bizarre situation: the first if gets compiled to

    movzx   eax, BYTE PTR [rbp-33]
    test    al, al
    je      .L22

which loads T in eax (with zero extension), and skips the print if it's all zero; the next if instead is

    movzx   eax, BYTE PTR [rbp-33]
    xor     eax, 1
    test    al, al
    je      .L23

The test if(T == false) is transformed to if(T^1), which flips just the low bit. This would be ok for a valid bool, but for your "broken" one it doesn't cut it.
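To make the two checks concrete without invoking undefined behavior, here is a minimal sketch that replays them on a plain byte instead of a real bool. The function names are made up for illustration; they only mirror the two instruction sequences shown above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `test al, al / je`: the if-body runs when the byte is nonzero.
constexpr bool if_t_taken(std::uint8_t raw) {
    return raw != 0;
}

// Mirrors `xor eax, 1 / test al, al / je`: the if-body runs when
// (raw ^ 1) is nonzero, i.e. only the low bit is flipped before the test.
constexpr bool if_t_equals_false_taken(std::uint8_t raw) {
    return static_cast<std::uint8_t>(raw ^ 1u) != 0;
}

// The valid bytes 0 and 1 take exactly one branch; the byte 3 takes both.
inline void check_branch_emulation() {
    assert(!if_t_taken(0) && if_t_equals_false_taken(0));
    assert(if_t_taken(1) && !if_t_equals_false_taken(1));
    assert(if_t_taken(3) && if_t_equals_false_taken(3));
}
```

With the byte 0011, the first test sees a nonzero value and the second sees 0010 after the xor, which is also nonzero, so both messages are printed.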

Notice that this bizarre sequence is only generated at low optimization levels; at higher levels this is generally going to boil down to a zero/nonzero check, and a sequence like yours is likely to become a single test/conditional branch. You will get bizarre behavior anyway in other contexts, e.g. when adding bool values to other integers:

    int foo(bool b, int i) {
        return i + b;
    }

becomes

    foo(bool, int):
            movzx   edi, dil
            lea     eax, [rdi+rsi]
            ret

where dil is "trusted" to be 0/1.
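As a rough illustration of what that "trust" means, the sketch below performs the same arithmetic on an explicit byte. The name foo_as_compiled is hypothetical; it stands in for the generated code, not for anything in the original program.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `movzx edi, dil / lea eax, [rdi+rsi]`: the stored byte is
// zero-extended and added as-is, with no normalization to 0 or 1.
constexpr int foo_as_compiled(std::uint8_t raw_bool_byte, int i) {
    return i + static_cast<int>(raw_bool_byte);
}

inline void check_sum_emulation() {
    assert(foo_as_compiled(0, 10) == 10);  // valid false
    assert(foo_as_compiled(1, 10) == 11);  // valid true
    assert(foo_as_compiled(3, 10) == 13);  // corrupted byte leaks into the sum
}
```

A bool whose storage holds 3 would therefore add 3 rather than 1, which no valid bool can do.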

If your program is all C++, then the solution is simple: don't break bool values this way, avoid messing with their bit representation and everything will go well; in particular, even if you assign from an integer to a bool the compiler will emit the necessary code to make sure that the resulting value is a valid bool, so your bool T = 3 is indeed safe, and T will end up with a true in its guts.
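A small sketch of that guarantee: converting an integer to bool goes through the language's conversion rule (nonzero becomes true, zero becomes false), so the result always converts back to exactly 0 or 1. The helper to_bool is only an illustrative name.

```cpp
#include <cassert>

// Integer-to-bool conversion: the same rule that applies to `bool T = 3;`.
constexpr bool to_bool(int value) {
    return static_cast<bool>(value);
}

inline void check_conversion() {
    bool t = 3;                          // converted, not bit-copied
    assert(t == true);
    assert(static_cast<int>(t) == 1);    // converts back to exactly 1
    assert(10 + t == 11);
    assert(static_cast<int>(to_bool(-7)) == 1);
    assert(static_cast<int>(to_bool(0)) == 0);
}
```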

If instead you need to interoperate with code written in other languages that may not share the same idea of what a bool is, just avoid bool for "boundary" code, and marshal it as an appropriately-sized integer. It will work in conditionals & co. just as fine.
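One way this could look in practice is sketched below. Everything here is an assumption for illustration: fake_foreign_return_true is a stand-in defined in C++ so the example is self-contained (a real build would declare the external Fortran symbol instead), and the "nonzero means true" rule is assumed and must be matched to whatever the other compiler actually uses.

```cpp
#include <cassert>

// Hypothetical stand-in for a foreign routine such as the question's
// return_true(); it returns a byte that is not a valid C++ bool pattern.
inline signed char fake_foreign_return_true() {
    return 3;
}

// Convert once, at the boundary. ASSUMPTION: the foreign side means
// "nonzero is true"; change this test if the other side uses another rule.
constexpr bool logical_from_raw(signed char raw) {
    return raw != 0;
}

// The rest of the C++ code only ever sees a well-formed bool.
inline bool call_return_true() {
    return logical_from_raw(fake_foreign_return_true());
}

inline void check_boundary() {
    assert(call_return_true());
    assert(static_cast<int>(call_return_true()) == 1);
    assert(!logical_from_raw(0));
}
```

The point of the wrapper is that the raw byte never gets stored in a bool object; the comparison produces a fresh, valid bool.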

Disclaimer: all I know of Fortran is what I read this morning in standard documents, and that I have some punched cards with Fortran listings that I use as bookmarks, so go easy on me.

First of all, this kind of language interoperability stuff isn't part of the language standards, but of the platform ABI. As we are talking about Linux x86-64, the relevant document is the System V x86-64 ABI.

Nowhere is it specified that the C _Bool type (which is defined to be the same as C++ bool at 3.1.2 note ?) has any kind of compatibility with Fortran LOGICAL; in particular, table 9.2 in 9.2.2 specifies that "plain" LOGICAL is mapped to signed int. About TYPE*N types it says that

The "TYPE*N" notation specifies that variables or aggregate members of type TYPE shall occupy N bytes of storage.

(ibid.)

There's no equivalent type explicitly specified for LOGICAL*1, and it's understandable: it's not even standard; indeed if you try to compile a Fortran program containing a LOGICAL*1 in Fortran 95 compliant mode you get warnings about it, both by ifort

    ./example.f90(2): warning #6916: Fortran 95 does not allow this length specification.   [1]
        logical*1, intent(in) :: x
    ------------^

and by gfort

    ./example.f90:2:13:
         logical*1, intent(in) :: x
                 1
    Error: GNU Extension: Nonstandard type declaration LOGICAL*1 at (1)

so the waters are already muddled; so, combining the two rules above, I'd go for signed char to be safe.

However, the ABI also specifies that:

The values for type LOGICAL are .TRUE. implemented as 1 and .FALSE. implemented as 0.

So, if you have a program that stores anything besides 1 and 0 in a LOGICAL value, you are already out of spec on the Fortran side! You say:

A fortran logical*1 has same representation as bool, but in fortran if bits are 00000011 it is true, in C++ it is undefined.

This last statement is not true: the Fortran standard is representation-agnostic, and the ABI explicitly says the contrary. Indeed you can see this in action easily by checking the output of gfort for a LOGICAL comparison:

    integer function logical_compare(x, y)
        logical, intent(in) :: x
        logical, intent(in) :: y
        if (x .eqv. y) then
            logical_compare = 12
        else
            logical_compare = 24
        end if
    end function logical_compare

becomes

    logical_compare_:
            mov     eax, DWORD PTR [rsi]
            mov     edx, 24
            cmp     DWORD PTR [rdi], eax
            mov     eax, 12
            cmovne  eax, edx
            ret

You'll notice that there's a straight cmp between the two values, without normalizing them first (unlike ifort, that is more conservative in this regard).
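The difference between a straight compare and a normalizing compare can be sketched in C++ on explicit 32-bit integers. Both function names are hypothetical, and the "nonzero means true" rule in the second one is an assumption made only for the comparison.

```cpp
#include <cassert>
#include <cstdint>

// What a straight `cmp` does: compare the stored 32-bit patterns.
constexpr bool eqv_raw(std::int32_t x, std::int32_t y) {
    return x == y;
}

// Normalizing first. ASSUMPTION for illustration: nonzero means true.
constexpr bool eqv_normalized(std::int32_t x, std::int32_t y) {
    return (x != 0) == (y != 0);
}

inline void check_eqv() {
    assert(eqv_raw(1, 1) && eqv_normalized(1, 1));
    assert(eqv_raw(0, 0) && eqv_normalized(0, 0));
    // 1 and 3 only compare equal if the values are normalized first.
    assert(!eqv_raw(1, 3) && eqv_normalized(1, 3));
}
```

Code that compares the raw patterns is only correct if every true value is stored as the same bit pattern, which is exactly what the ABI text quoted above requires.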

Even more interesting: regardless of what the ABI says, ifort by default uses a nonstandard representation for LOGICAL; this is explained in the -fpscomp logicals switch documentation, which also specifies some interesting details about LOGICAL and cross-language compatibility:

Specifies that integers with a non-zero value are treated as true, integers with a zero value are treated as false. The literal constant .TRUE. has an integer value of 1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Intel Fortran releases before Version 8.0 and by Fortran PowerStation.

The default is fpscomp nologicals, which specifies that odd integer values (low bit one) are treated as true and even integer values (low bit zero) are treated as false.

The literal constant .TRUE. has an integer value of -1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Compaq Visual Fortran. The internal representation of LOGICAL values is not specified by the Fortran standard. Programs which use integer values in LOGICAL contexts, or which pass LOGICAL values to procedures written in other languages, are non-portable and may not execute correctly. Intel recommends that you avoid coding practices that depend on the internal representation of LOGICAL values.

(emphasis added)
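The two rules described in the quoted documentation disagree on some bit patterns, which a short sketch makes visible. The function names are invented here, and the code only models the truth tests as described in the quote; it does not call or reproduce anything from ifort.

```cpp
#include <cassert>
#include <cstdint>

// Rule described for `fpscomp logicals`: nonzero is true, zero is false.
constexpr bool is_true_nonzero(std::int32_t raw) {
    return raw != 0;
}

// Rule described for `fpscomp nologicals`: odd (low bit set) is true.
constexpr bool is_true_low_bit(std::int32_t raw) {
    return (raw & 1) != 0;
}

inline void check_conventions() {
    // 0, 1 and -1 mean the same thing under both rules.
    assert(!is_true_nonzero(0) && !is_true_low_bit(0));
    assert(is_true_nonzero(1) && is_true_low_bit(1));
    assert(is_true_nonzero(-1) && is_true_low_bit(-1));
    // 2 is true under one rule and false under the other.
    assert(is_true_nonzero(2) && !is_true_low_bit(2));
}
```

Values such as 2 are where code built with different settings, or in different languages, can silently disagree about the same byte.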

Now, the internal representation of a LOGICAL normally shouldn't be a problem, as, from what I gather, if you play "by the rules" and don't cross language boundaries you aren't going to notice. For a standard-compliant program there's no "straight conversion" between INTEGER and LOGICAL; the only ways I see to shove an INTEGER into a LOGICAL seem to be TRANSFER, which is intrinsically non-portable and gives no real guarantees, or the non-standard INTEGER <-> LOGICAL conversion on assignment.

The latter one is documented by gfort to always result in nonzero -> .TRUE., zero -> .FALSE., and you can see that in all cases code is generated to make this happen (even though it's convoluted code in case of ifort with the legacy representation), so you cannot seem to shove an arbitrary integer into a LOGICAL in this way.

    logical*1 function integer_to_logical(x)
        integer, intent(in) :: x
        integer_to_logical = x
        return
    end function integer_to_logical

    integer_to_logical_:
            mov     eax, DWORD PTR [rdi]
            test    eax, eax
            setne   al
            ret

The reverse conversion for a LOGICAL*1 is a straight integer zero-extension (gfort), so, to be honoring the contract in the documentation linked above, it's clearly expecting the LOGICAL value to be 0 or 1.

But in general, the situation for these conversions is a bit of a mess, so I'd just stay away from them.

So, long story short: avoid putting INTEGER data into LOGICAL values, as it is bad even in Fortran, and make sure to use the correct compiler flag to get the ABI-compliant representation for booleans, and interoperability with C/C++ should be fine. But to be extra safe, I'd just use plain char on the C++ side.

Finally, from what I gather from the documentation, in ifort there is some builtin support for interoperability with C, including booleans; you may try to leverage it.
