Usefulness of signaling NaN?

2021-12-22 00:00:00 floating-point visual-c++ ieee-754 c++ x87

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating point exception in the cases where I don't want to proceed with "missing values." Conversely, I would use quiet NaN to allow the "missing value" to propagate through a computation. However, signaling NaNs don't work as I thought they would based on the (very limited) documentation that exists on them.

Here is a summary of what I know (all of this using x87 and VC++):

  • _EM_INVALID (the IEEE "invalid" exception) controls the behavior of the x87 when encountering NaNs
  • If _EM_INVALID is masked (the exception is disabled), no exception is generated and operations can return quiet NaN. An operation involving signaling NaN will not cause an exception to be thrown, but will be converted to quiet NaN.
  • If _EM_INVALID is unmasked (exception enabled), an invalid operation (e.g., sqrt(-1)) causes an invalid exception to be thrown.
  • The x87 never generates signaling NaN.
  • If _EM_INVALID is unmasked, any use of a signaling NaN (even initializing a variable with it) causes an invalid exception to be thrown.
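The masked case in the list above can also be observed through the standard `<cfenv>` status flags rather than the VC++-specific control word. The following is only a minimal sketch, assuming the default environment in which floating-point exceptions are masked; the helper name `sqrt_of_negative_sets_invalid_flag` is made up for illustration, and `volatile` is used so the compiler does not fold the computation at compile time.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: reports whether sqrt(-1) sets the IEEE "invalid"
// status flag and yields a NaN.
// Assumption: exceptions are masked (the usual default), so the operation
// only sets a flag and returns a quiet NaN instead of trapping.
bool sqrt_of_negative_sets_invalid_flag() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double input = -1.0;               // volatile: force a run-time sqrt
    volatile double result = std::sqrt(input);  // stored before the flags are read
    const bool flagged = std::fetestexcept(FE_INVALID) != 0;
    const double r = result;
    return flagged && std::isnan(r);
}
```

With the exception masked, the only trace of the invalid operation is the sticky flag and the NaN result. Unmasking it (for example through the VC++ control word) is what turns the same operation into a trap.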

The Standard Library provides a way to access the NaN values:

std::numeric_limits<double>::signaling_NaN();

and

std::numeric_limits<double>::quiet_NaN();

The problem is that I see no use whatsoever for the signaling NaN. If _EM_INVALID is masked it behaves exactly the same as quiet NaN. Since no NaN is comparable to any other NaN, there is no logical difference.
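To make the "no logical difference" point concrete, here is a small sketch that inspects a NaN without doing arithmetic on it. It assumes an IEEE 754 binary64 `double` and the x86 convention that bit 51 of the significand is the "quiet" bit; the helper names (`bits_of`, `is_quiet_pattern`, `nan_equals_itself`) are invented for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Copy the raw bits of a double into an integer using a byte copy.
std::uint64_t bits_of(const double& value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Assumption (x86 convention): bit 51 set means "quiet", clear means "signaling".
constexpr std::uint64_t kQuietBit = 1ULL << 51;

bool is_quiet_pattern(std::uint64_t bits) {
    return (bits & kQuietBit) != 0;
}

// A NaN never compares equal to anything, including itself.
bool nan_equals_itself() {
    volatile double q = std::numeric_limits<double>::quiet_NaN();
    const double a = q;
    return a == a;
}
```

Comparison cannot tell the two kinds of NaN apart, so the only distinguishing feature is the quiet bit in the stored pattern. That bit is exactly what gets set when a signaling NaN is loaded through an x87 register with the exception masked.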

If _EM_INVALID is not masked (exception is enabled), then one cannot even initialize a variable with a signaling NaN: double dVal = std::numeric_limits<double>::signaling_NaN(); because this throws an exception (the signaling NaN value is loaded into an x87 register to store it to the memory address).

You may think the following as I did:

  1. Mask _EM_INVALID.
  2. Initialize the variable with signaling NaN.
  3. Unmask _EM_INVALID.

However, step 2 causes the signaling NaN to be converted to a quiet NaN, so subsequent uses of it will not cause exceptions to be thrown! So WTF?!

Is there any utility or purpose whatsoever to a signaling NaN? I understand one of the original intents was to initialize memory with it so that use of an uninitialized floating point value could be caught.

Can someone tell me if I am missing something here?


EDIT:

To further illustrate what I had hoped to do, here is an example:

Consider performing mathematical operations on a vector of data (doubles). For some operations, I want to allow the vector to contain a "missing value" (pretend this corresponds to a spreadsheet column, for example, in which some of the cells do not have a value, but their existence is significant). For some operations, I do not want to allow the vector to contain a "missing value." Perhaps I want to take a different course of action if a "missing value" is present in the set -- perhaps performing a different operation (thus this is not an invalid state to be in).

This original code would look something like this:

#include <cmath>
#include <vector>

const double MISSING_VALUE = 1.3579246e123;
using std::vector;

vector<double> missingAllowed(1000000, MISSING_VALUE);
vector<double> missingNotAllowed(1000000, MISSING_VALUE);

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it); // sqrt() could be any operation
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it);
    else *it = 0;
}

Note that the check for the "missing value" must be performed every loop iteration. While I understand in most cases, the sqrt function (or any other mathematical operation) will likely overshadow this check, there are cases where the operation is minimal (perhaps just an addition) and the check is costly. Not to mention the fact that the "missing value" takes a legal input value out of play and could cause bugs if a calculation legitimately arrives at that value (unlikely though it may be). Also to be technically correct, the user input data should be checked against that value and an appropriate course of action should be taken. I find this solution inelegant and less-than-optimal performance-wise. This is performance-critical code, and we definitely do not have the luxury of parallel data structures or data element objects of some sort.

The NaN version would look like this:

#include <cmath>
#include <limits>
#include <vector>

using std::vector;

vector<double> missingAllowed(1000000, std::numeric_limits<double>::quiet_NaN());
vector<double> missingNotAllowed(1000000, std::numeric_limits<double>::signaling_NaN());

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    *it = sqrt(*it); // if *it == QNaN then sqrt(*it) == QNaN
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    try {
        *it = sqrt(*it);
    } catch (FPInvalidException&) { // hypothetical type, assuming a translator installed via _set_se_translator
        *it = 0;
    }
}

Now the explicit check is eliminated and performance should be improved. I think this would all work if I could initialize the vector without touching the FPU registers...

Furthermore, I would imagine any self-respecting sqrt implementation checks for NaN and returns NaN immediately.
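The quiet-NaN half of the plan can be checked with standard C++ alone. The sketch below assumes the default environment with exceptions masked; `PropagationResult` and `sqrt_of_quiet_nan` are names invented for this example. It feeds a quiet NaN through `sqrt` and an addition and records whether the result is still NaN and whether the "invalid" flag was raised along the way.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical result type for this illustration.
struct PropagationResult {
    bool result_is_nan;     // did the NaN survive the computation?
    bool invalid_flag_set;  // did any step raise the IEEE "invalid" flag?
};

PropagationResult sqrt_of_quiet_nan() {
    std::feclearexcept(FE_ALL_EXCEPT);
    // volatile: keep the compiler from evaluating this at compile time
    volatile double input = std::numeric_limits<double>::quiet_NaN();
    volatile double result = std::sqrt(input) + 1.0;
    const bool flagged = std::fetestexcept(FE_INVALID) != 0;
    const double r = result;
    return {std::isnan(r), flagged};
}
```

On an IEEE 754 conforming implementation, a quiet NaN operand passes through these operations without raising "invalid", which is the behavior the `missingAllowed` loop relies on.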

Solution

As I understand it, the purpose of signaling NaN is to initialize data structures. But of course, runtime initialization in C runs the risk of having the NaN loaded into a floating-point register as part of the initialization, thereby triggering the signal, because the compiler isn't aware that this float value needs to be copied using an integer register.

I would hope that you could initialize a static value with a signaling NaN, but even that would require some special handling by the compiler to avoid having it converted to a quiet NaN. You could perhaps use a bit of casting magic to avoid having it treated as a float value during initialization.

If you were writing in ASM, this would not be an issue. But in C, and especially in C++, I think you will have to subvert the type system in order to initialize a variable with a signaling NaN. I suggest using memcpy.
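A rough sketch of the memcpy idea follows. It assumes an IEEE 754 binary64 `double`, and the bit pattern `0x7FF0000000000001` is one assumed signaling-NaN encoding (exponent all ones, quiet bit clear, non-zero payload). The names `kSignalingNaNBits`, `fill_with_bits` and `bits_at` are invented for illustration. Whether the compiler really avoids floating-point registers for these copies is not guaranteed by the language, so the generated code would need to be inspected on the target compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Assumed binary64 signaling-NaN pattern: exponent all ones, quiet bit (51)
// clear, non-zero payload.
constexpr std::uint64_t kSignalingNaNBits = 0x7FF0000000000001ULL;

// Overwrite every element with the given bit pattern using byte copies, so
// the source code never handles the pattern as a floating-point value.
void fill_with_bits(std::vector<double>& values, std::uint64_t bits) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::memcpy(&values[i], &bits, sizeof bits);
    }
}

// Read an element's raw bits back, again without floating-point arithmetic.
std::uint64_t bits_at(const std::vector<double>& values, std::size_t index) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &values[index], sizeof bits);
    return bits;
}
```

For the question's example, this would mean constructing `missingNotAllowed` with ordinary zeros and then calling `fill_with_bits(missingNotAllowed, kSignalingNaNBits)`, instead of passing `signaling_NaN()` to the vector constructor.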
