How are floating-point conversions actually done in C++? (double to float or float to double)
I've searched for information on this topic and found nothing really relevant.
I tried looking at the assembly generated for this simple code:
#include <cstdlib>  // std::system

int main(int argc, char *argv[])
{
    double d = 1.0;
    float f = static_cast<float>(d);
    std::system("PAUSE");
    return 0;
}
which compiles to (with Visual Studio 2012):
15: double d = 1.0;
000000013FD7C16D movsd xmm0,mmword ptr [__real@3ff0000000000000 (013FD91AB0h)]
000000013FD7C175 movsd mmword ptr [d],xmm0
16: float f = static_cast<float>(d);
000000013FD7C17B cvtsd2ss xmm0,mmword ptr [d]
000000013FD7C181 movss dword ptr [f],xmm0
I'm not that comfortable with assembly, but I tried to analyze it anyway. The first two lines seem to move the double-precision value 3ff0000000000000 into a register, and then move the contents of that register to the memory address of d.
After that, I don't know exactly what the next lines do. cvtsd2ss is apparently an instruction that converts a double-precision floating-point value to a single-precision one, but I couldn't find out what it actually does. (The converted value is then moved to the memory of f.)
So my question is: how is this conversion actually performed by this instruction? I know that the C++ cast yields the closest value in the other type, but apart from that I have no idea which operations are actually performed.
Recommended answer
The cvtsd2ss instruction performs the conversion using the current rounding mode (for SSE instructions such as this one, the rounding-control field of the MXCSR register). The default rounding mode is round-to-nearest-even.
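To see the rounding mode at work, here is a minimal sketch using the standard <cfenv> functions. The helper name narrow_with_mode is hypothetical. The sketch assumes that the compiler performs the conversion at run time and honors the dynamic rounding mode; strictly speaking the standard asks for #pragma STDC FENV_ACCESS ON for that, which not every compiler implements, so the volatile variables are used here as a practical workaround.

```cpp
#include <cfenv>

// Hypothetical helper: converts d to float while the given rounding mode
// (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD or FE_TOWARDZERO) is active,
// then restores the mode that was in effect before the call.
// Assumption: the volatile accesses keep the compiler from folding the
// conversion at compile time or moving it across the fesetround calls.
float narrow_with_mode(double d, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double in = d;
    volatile float out = static_cast<float>(in);  // conversion happens here
    std::fesetround(saved);
    return out;
}
```

For a double slightly above 1.0 that is not representable as a float, such as 1.0 + 2.98023223876953125e-08 (that is, 1 + 2^-25), FE_TONEAREST and FE_DOWNWARD should both give 1.0f, while FE_UPWARD should give the next float above 1.0f.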
To follow the algorithm, it helps to keep in mind the information on the IEEE 754-1985 Wikipedia page, especially the diagrams showing the bit layout.
First, the exponent of the target float is computed. The double type has a wider range than float, so the result may be 0.0f (or a denormal) for a very small double, or an infinity for a very large double.
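The range effect can be observed directly. The following is a small sketch, assuming float and double are the IEEE 754 binary32 and binary64 formats and that denormals are not flushed to zero; the function name narrowed_class is made up for illustration.

```cpp
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 &&
              std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 float and double");

// Hypothetical helper: converts d to float and reports the category of the
// result as one of FP_INFINITE, FP_ZERO, FP_SUBNORMAL, FP_NORMAL or FP_NAN.
int narrowed_class(double d) {
    const float f = static_cast<float>(d);
    return std::fpclassify(f);
}
```

Under those assumptions, narrowed_class(1e300) is FP_INFINITE, narrowed_class(1e-300) is FP_ZERO, narrowed_class(1e-40) is FP_SUBNORMAL (a denormal), and narrowed_class(1.0) is FP_NORMAL.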
In the usual case of a normal double being converted to a normal float (roughly, when the unbiased exponent of the double fits in the 8 exponent bits of the single-precision format), the 23 bits of the destination significand start out as the 23 most significant bits of the original number's 52-bit significand.
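As an illustration of this truncation, the sketch below extracts the relevant bit fields with memcpy. It assumes IEEE 754 binary32/binary64 layouts; the helper names top23_of_double and significand_of_float are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: the 23 most significant bits of a double's
// 52-bit stored significand (assumes the IEEE 754 binary64 layout).
std::uint32_t top23_of_double(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    const std::uint64_t frac = bits & ((std::uint64_t{1} << 52) - 1);
    return static_cast<std::uint32_t>(frac >> 29);  // drop the low 29 bits
}

// Hypothetical helper: the 23-bit stored significand of a float
// (assumes the IEEE 754 binary32 layout).
std::uint32_t significand_of_float(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits & 0x7FFFFFu;
}
```

For 1.5, which needs no rounding, both functions return 0x400000. For 0.1, top23_of_double gives 0x4CCCCC while the converted float has 0x4CCCCD: the 29 discarded bits still have to be accounted for by rounding.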
Then comes the question of rounding, which depends on the 29 left-over bits:
If the left-over bits are below 10..0, the target significand is left as-is.
If the left-over bits are above 10..0, the target significand is incremented. If incrementing makes it overflow (because it is already 1..1), the carry propagates into the exponent bits. This produces the correct result because of the careful way the IEEE 754 layout was designed.
If the left-over bits are exactly 10..0, the double is exactly midway between two floats. Of these two candidates, the one whose last bit is 0 (the "even" one) is chosen.
After this step, the target significand corresponds to the float nearest to the original double.
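The normal-to-normal case described above can be sketched in software as follows. This is not how the hardware is implemented, only an illustration of the same steps; it assumes IEEE 754 binary32/binary64 layouts and the default round-to-nearest-even mode, and the names narrow_bits_rne and hardware_bits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the bit pattern of the float nearest to d,
// computed by hand with round-to-nearest-even. Covers only doubles whose
// unbiased exponent is in [-126, 127] (normal double to normal float).
std::uint32_t narrow_bits_rne(double d) {
    std::uint64_t in;
    std::memcpy(&in, &d, sizeof in);

    const std::uint32_t sign = static_cast<std::uint32_t>(in >> 63);
    const int exp = static_cast<int>((in >> 52) & 0x7FF) - 1023;  // unbiased
    const std::uint64_t frac = in & ((std::uint64_t{1} << 52) - 1);
    assert(exp >= -126 && exp <= 127);  // outside the scope of this sketch

    // Re-bias the exponent and keep the top 23 significand bits.
    std::uint32_t out = (static_cast<std::uint32_t>(exp + 127) << 23)
                      | static_cast<std::uint32_t>(frac >> 29);

    const std::uint64_t rest = frac & ((std::uint64_t{1} << 29) - 1);
    const std::uint64_t half = std::uint64_t{1} << 28;  // the pattern 10..0

    // Above half: round up. Exactly half: round up only if that makes the
    // last kept bit 0. The increment may carry into the exponent field.
    if (rest > half || (rest == half && (out & 1u))) {
        ++out;
    }
    return (sign << 31) | out;
}

// Bit pattern produced by the compiler's own conversion, for comparison.
std::uint32_t hardware_bits(double d) {
    const float f = static_cast<float>(d);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With the default rounding mode, narrow_bits_rne(x) and hardware_bits(x) should agree for any x in the supported range. For example, both give 0x3DCCCCCD for 0.1, and the tie 1.0 + 5.9604644775390625e-08 (that is, 1 + 2^-24) rounds to 1.0f, bit pattern 0x3F800000.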
The directed rounding modes are simpler still. The case where the target float is a denormal is slightly more complicated (one must be careful to avoid "double rounding").