Are the bit patterns of NaNs really hardware-dependent?
I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format:
seee eeee emmm mmmm mmmm mmmm mmmm mmmm
s is the sign bit, e are the exponent bits, and m are the mantissa bits. A NaN value is encoded as an exponent of all 1s with mantissa bits that are not all 0 (an all-zero mantissa would be +/- infinity). This means that there are lots of different possible NaN values (having different s and m bit values).
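The layout above can be sketched in Java as follows. This is only an illustration: the class name NanFields and its helper methods are hypothetical, and the shifts and masks simply follow the seee eeee emmm ... diagram (1 sign bit, 8 exponent bits, 23 mantissa bits).

```java
class NanFields {
    // Sign: the top bit of the 32-bit pattern.
    static int sign(float f) {
        return Float.floatToRawIntBits(f) >>> 31;
    }

    // Exponent: the 8 bits below the sign bit.
    static int exponent(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xff;
    }

    // Mantissa: the low 23 bits.
    static int mantissa(float f) {
        return Float.floatToRawIntBits(f) & 0x7fffff;
    }

    // NaN by encoding: exponent all 1s and mantissa not all 0.
    static boolean isNaNByBits(float f) {
        return exponent(f) == 0xff && mantissa(f) != 0;
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, Float.NEGATIVE_INFINITY, Float.NaN};
        for (float f : samples) {
            System.out.printf("%s: s=%d e=0x%02x m=0x%06x nan=%b%n",
                    f, sign(f), exponent(f), mantissa(f), isNaNByBits(f));
        }
    }
}
```

Infinity shares the all-1s exponent with NaN, so the mantissa check is what separates the two.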
On this, JLS §4.2.3 says:
IEEE 754 allows multiple distinct NaN values for each of its single and double floating-point formats. While each hardware architecture returns a particular bit pattern for NaN when a new NaN is generated, a programmer can also create NaNs with different bit patterns to encode, for example, retrospective diagnostic information.
The text in the JLS seems to imply that the result of, for example, 0.0/0.0 has a hardware-dependent bit pattern. Depending on whether that expression was computed as a compile-time constant, the hardware it depends on might be the hardware the Java program was compiled on or the hardware the program was run on. This all seems very flaky if true.
I ran the following test:
System.out.println(Integer.toHexString(Float.floatToRawIntBits(0.0f/0.0f)));
System.out.println(Integer.toHexString(Float.floatToRawIntBits(Float.NaN)));
System.out.println(Long.toHexString(Double.doubleToRawLongBits(0.0d/0.0d)));
System.out.println(Long.toHexString(Double.doubleToRawLongBits(Double.NaN)));
The output on my machine is:
7fc00000
7fc00000
7ff8000000000000
7ff8000000000000
The output shows nothing out of the ordinary. The exponent bits are all 1. The upper bit of the mantissa is also 1, which for NaNs apparently indicates a "quiet NaN" as opposed to a "signalling NaN" (https://en.wikipedia.org/wiki/NaN#Floating_point). The sign bit and the rest of the mantissa bits are 0. The output also shows that there was no difference between the NaNs generated on my machine and the constant NaNs from the Float and Double classes.
My question is: is that output guaranteed in Java, regardless of the CPU of the compiler or VM, or is it all genuinely unpredictable? The JLS is mysterious about this.
If that output is guaranteed for 0.0/0.0, are there any arithmetic ways of producing NaNs that do have other (possibly hardware-dependent?) bit patterns? (I know intBitsToFloat/longBitsToDouble can encode other NaNs, but I'd like to know if other values can occur from normal arithmetic.)
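One way to probe this on a given machine is to print the raw bits of NaNs produced by ordinary arithmetic. The sketch below is an experiment, not an answer: the class NanProbe and its methods are hypothetical names, and the operands are passed as method parameters on the assumption that this keeps javac from folding the expressions into compile-time constants. No particular output is claimed, since that is exactly what may vary.

```java
class NanProbe {
    // Parameters (rather than literals) so the operations are not constant expressions.
    static float divide(float a, float b) {
        return a / b;
    }

    static float subtract(float a, float b) {
        return a - b;
    }

    static float multiply(float a, float b) {
        return a * b;
    }

    // Raw bit pattern as hex, without collapsing NaNs to a canonical value.
    static String rawHex(float f) {
        return Integer.toHexString(Float.floatToRawIntBits(f));
    }

    public static void main(String[] args) {
        float inf = Float.POSITIVE_INFINITY;
        // A quiet NaN with an arbitrary, illustrative payload in the low mantissa bits.
        float payloadNaN = Float.intBitsToFloat(0x7fc00123);

        System.out.println("0/0        : " + rawHex(divide(0.0f, 0.0f)));
        System.out.println("inf - inf  : " + rawHex(subtract(inf, inf)));
        System.out.println("0 * inf    : " + rawHex(multiply(0.0f, inf)));
        System.out.println("payload - 1: " + rawHex(subtract(payloadNaN, 1.0f)));
        System.out.println("Float.NaN  : " + rawHex(Float.NaN));
    }
}
```

Comparing the first three lines with the last one shows whether runtime-generated NaNs on that machine match the Float.NaN constant, and the fourth line shows whether a payload survives an arithmetic operation there.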
A follow-up point: I've noticed that Float.NaN and Double.NaN specify their exact bit pattern, but in the source (Float, Double) they are generated by 0.0/0.0. If the result of that division really depends on the hardware of the compiler, there seems to be a flaw in either the spec or the implementation.
Recommended answer
This is what §2.3.2 of the JVM 7 spec has to say about it:
The elements of the double value set are exactly the values that can be represented using the double floating-point format defined in the IEEE 754 standard, except that there is only one NaN value (IEEE 754 specifies 2^53-2 distinct NaN values).
And §2.8.1:
The Java Virtual Machine has no signaling NaN value.
So technically there is only one NaN. But §4.2.3 of the JLS also says (right after your quote):
For the most part, the Java SE platform treats NaN values of a given type as though collapsed into a single canonical value, and hence this specification normally refers to an arbitrary NaN as though to a canonical value.
However, version 1.3 of the Java SE platform introduced methods enabling the programmer to distinguish between NaN values: the Float.floatToRawIntBits and Double.doubleToRawLongBits methods. The interested reader is referred to the specifications for the Float and Double classes for more information.
Which I take to mean exactly what you and CandiedOrange propose: the bit pattern depends on the underlying processor, but Java treats all NaNs the same.
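A small sketch of what "treats them all the same" looks like from code. Float.floatToIntBits is documented to collapse every NaN to 0x7fc00000, while Float.floatToRawIntBits is the method that lets you tell NaNs apart. The class name NanCanonical, the helper canonicalBits, and the payload value 0x7fc00123 are illustrative assumptions, not anything from the spec.

```java
class NanCanonical {
    // floatToIntBits maps any NaN bit pattern to the canonical 0x7fc00000.
    static int canonicalBits(int rawPattern) {
        return Float.floatToIntBits(Float.intBitsToFloat(rawPattern));
    }

    public static void main(String[] args) {
        // A quiet NaN with an arbitrary payload in the low mantissa bits.
        float f = Float.intBitsToFloat(0x7fc00123);

        System.out.println("isNaN:        " + Float.isNaN(f));
        System.out.println("f != f:       " + (f != f));
        System.out.println("canonical:    " + Integer.toHexString(Float.floatToIntBits(f)));
        // The raw bits may or may not keep the payload, depending on the platform.
        System.out.println("raw:          " + Integer.toHexString(Float.floatToRawIntBits(f)));
        // Float.equals compares floatToIntBits results, so all NaNs are equal to each other.
        System.out.println("equals NaN:   " + Float.valueOf(f).equals(Float.NaN));
    }
}
```

Only the "raw" line can differ between machines; everything else goes through the canonicalizing path.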
But it gets better: apparently, it is entirely possible that your NaN values are silently converted to different NaNs, as described in Double.longBitsToDouble():
Note that this method may not be able to return a double NaN with exactly same bit pattern as the long argument. IEEE 754 distinguishes between two kinds of NaNs, quiet NaNs and signaling NaNs. The differences between the two kinds of NaN are generally not visible in Java. Arithmetic operations on signaling NaNs turn them into quiet NaNs with a different, but often similar, bit pattern. However, on some processors merely copying a signaling NaN also performs that conversion. In particular, copying a signaling NaN to return it to the calling method may perform this conversion. So longBitsToDouble may not be able to return a double with a signaling NaN bit pattern. Consequently, for some long values, doubleToRawLongBits(longBitsToDouble(start)) may not equal start. Moreover, which particular bit patterns represent signaling NaNs is platform dependent; although all NaN bit patterns, quiet or signaling, must be in the NaN range identified above.
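The round trip mentioned in that Javadoc can be tried directly. The following is a minimal sketch, with the hypothetical class NanRoundTrip and hand-picked start values; whether a given pattern counts as signaling depends on the platform, so the third value is only signaling under the convention where the top mantissa bit marks a quiet NaN. The only thing the sketch relies on is that the result stays within the NaN range.

```java
class NanRoundTrip {
    // True if the 64-bit pattern has an all-1s exponent and a non-zero fraction.
    static boolean inNaNRange(long bits) {
        long exponent = (bits >>> 52) & 0x7ffL;
        long fraction = bits & 0x000fffffffffffffL;
        return exponent == 0x7ffL && fraction != 0;
    }

    // The expression from the Javadoc: may or may not return its argument.
    static long roundTrip(long start) {
        return Double.doubleToRawLongBits(Double.longBitsToDouble(start));
    }

    public static void main(String[] args) {
        long[] starts = {
            0x7ff8000000000000L, // the pattern of Double.NaN
            0x7ff8000000000abcL, // quiet NaN with an arbitrary payload
            0x7ff0000000000001L  // top mantissa bit clear: signaling on some platforms
        };
        for (long start : starts) {
            long result = roundTrip(start);
            System.out.println(Long.toHexString(start) + " -> " + Long.toHexString(result)
                    + " preserved=" + (start == result)
                    + " stillNaN=" + inNaNRange(result));
        }
    }
}
```

If preserved=false shows up for the third value, that is the silent conversion the Javadoc warns about.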
For reference, there is a table of the hardware-dependent NaNs here. In summary:
- x86:
quiet: Sign=0 Exp=0x7ff Frac=0x80000
signalling: Sign=0 Exp=0x7ff Frac=0x40000
- PA-RISC:
quiet: Sign=0 Exp=0x7ff Frac=0x40000
signalling: Sign=0 Exp=0x7ff Frac=0x80000
- Power:
quiet: Sign=0 Exp=0x7ff Frac=0x80000
signalling: Sign=0 Exp=0x7ff Frac=0x5555555500055555
- Alpha:
quiet: Sign=0 Exp=0 Frac=0xfff8000000000000
signalling: Sign=1 Exp=0x2aa Frac=0x7ff5555500055555
So, to verify this you would really need one of these processors to try it out on. Also, any insights on how to interpret the longer values for the Power and Alpha architectures are welcome.