C floating point precision
Possible Duplicate:
Floating point comparison
I have a question about the accuracy of float in C/C++. When I execute the program below:
#include <stdio.h>

int main (void) {
    float a = 101.1;
    double b = 101.1;
    printf ("a: %f\n", a);
    printf ("b: %lf\n", b);
    return 0;
}
Result:
a: 101.099998
b: 101.100000
I believe a float has 32 bits, which should be enough to store 101.1. Why is the value off?
Solution
You can only represent numbers exactly in IEEE754 (at least for the single and double precision binary formats) if they can be constructed by adding together inverted powers of two (i.e., 2^-n, like 1, 1/2, 1/4, 1/65536 and so on), subject to the number of bits available for precision.
There is no combination of inverted powers of two that will get you exactly to 101.1, within the scaling provided by floats (23 bits of precision) or doubles (52 bits of precision).
If you want a quick tutorial on how this inverted-power-of-two stuff works, see this answer.
Applying the knowledge from that answer to your 101.1 number (as a single precision float):
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm  1/n
0 10000101 10010100011001100110011
           |  | |   ||  ||  ||  |+- 8388608
           |  | |   ||  ||  ||  +-- 4194304
           |  | |   ||  ||  |+----- 524288
           |  | |   ||  ||  +------ 262144
           |  | |   ||  |+--------- 32768
           |  | |   ||  +---------- 16384
           |  | |   |+------------- 2048
           |  | |   +-------------- 1024
           |  | +------------------ 64
           |  +-------------------- 16
           +----------------------- 2
The mantissa of 101.1 actually continues forever:
mmmmmmmmm mmmm mmmm mmmm mm
100101000 1100 1100 1100 11|00 1100 (and so on).
Hence it's not a matter of how much precision you have: no finite number of bits will represent that number exactly in an IEEE754 binary format.
Using the bits to calculate the actual number (closest approximation), the sign is positive. The exponent is 128 + 4 + 1 = 133; subtracting the bias of 127 gives 6, so the multiplier is 2^6, or 64.
The mantissa consists of 1 (the implicit base) plus, for each set bit (each being worth 1/(2^n), where n starts at 1 and increases to the right), {1/2, 1/16, 1/64, 1/1024, 1/2048, 1/16384, 1/32768, 1/262144, 1/524288, 1/4194304, 1/8388608}.
When you add all these up, you get 1.57968747615814208984375.
When you multiply that by the multiplier previously calculated, 64, you get 101.09999847412109375.
All numbers were calculated with bc using a scale of 100 decimal digits, resulting in a lot of trailing zeros, so the numbers should be very accurate. Doubly so, since I checked the result with:
#include <stdio.h>

int main (void) {
    float f = 101.1f;
    printf ("%.50f\n", f);
    return 0;
}
which also gave me 101.09999847412109375000...