Is using double faster than float?
Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats?
That is, are double operations just as fast or faster than float operations for +, -, *, and /?
Does the answer change for 64-bit architectures?
Accepted Answer
There isn't a single "intel CPU", especially in terms of which operations are optimized with respect to others; but most of them, at the CPU level (specifically within the FPU), are such that the answer to your question:
are double operations just as fast or faster than float operations for +, -, *, and /?
is "yes" -- within the CPU, except for division and sqrt, which are somewhat slower for double than for float. (This assumes your compiler uses SSE2 for scalar FP math, like all x86-64 compilers do, and some 32-bit compilers do depending on options. Legacy x87 doesn't have different widths in registers, only in memory (it converts on load/store), so historically even sqrt and division were just as slow for double.)
For example, Haswell has a divsd throughput of one per 8 to 14 cycles (data-dependent), but a divss (scalar single) throughput of one per 7 cycles. x87 fdiv has a throughput of one per 8 to 18 cycles. (Numbers from https://agner.org/optimize/. Latency correlates with throughput for division, but is higher than the throughput numbers.)
The float versions of many library functions, like logf(float) and sinf(float), will also be faster than log(double) and sin(double), because they have many fewer bits of precision to get right. They can use polynomial approximations with fewer terms to get full precision for float vs. double.
However, taking up twice the memory for each number clearly implies heavier load on the cache(s) and more memory bandwidth to fill and spill those cache lines from/to RAM; the time you care about performance of a floating-point operation is when you're doing a lot of such operations, so the memory and cache considerations are crucial.
@Richard's answer points out that there are also other ways to perform FP operations (the SSE / SSE2 instructions; good old MMX was integers-only), especially suitable for simple ops on lots of data ("SIMD", single instruction / multiple data), where each vector register can pack 4 single-precision floats or only 2 double-precision ones, so this effect will be even more marked.
In the end, you do have to benchmark, but my prediction is that for reasonable (i.e., large;-) benchmarks, you'll find advantage to sticking with single precision (assuming of course that you don't need the extra bits of precision!-).