numpy float: 10x slower than builtin in arithmetic operations?
Problem description
I am getting really weird timings for the following code:
import numpy as np
s = 0
for i in range(10000000):
    s += np.float64(1)  # replace with np.float32 and built-in float
- built-in float: 4.9 s
- float64: 10.5 s
- float32: 45.0 s
Why is float64 twice as slow as float? And why is float32 5 times slower than float64?
Is there any way to avoid the penalty of using np.float64, and have numpy functions return built-in float instead of float64?
I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine).
numpy.float32 on my 32-bit machine. Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations are performed at 32-bit precision).
Is there any way to set a single variable somewhere in the program or on the command line, and make all numpy functions return float32 instead of float64?
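For reference, the per-call conversions described above can be sketched as follows. This is only an illustration, not a global switch: it assumes a reasonably recent numpy (np.random.default_rng and the dtype argument of Generator.random are not available in numpy 1.5.1), and the variable names are made up for the example.

```python
import numpy as np

# Per-call conversion: the function returns float64, and we downcast afterwards.
samples64 = np.random.uniform(0.0, 1.0, size=4)
samples32 = samples64.astype(np.float32)

# Some newer APIs accept a dtype directly (assumes a recent numpy).
rng = np.random.default_rng(0)
direct32 = rng.random(4, dtype=np.float32)

# Turning a numpy scalar into a built-in float.
x = np.float64(1.5)
as_builtin = x.item()    # .item() returns a plain Python float
also_builtin = float(x)  # float() does the same

print(samples32.dtype, direct32.dtype, type(as_builtin).__name__)
```

Note that np.float64 is a subclass of the built-in float, so isinstance(x, float) is true even before conversion; checking type(x) is float distinguishes the two.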
EDIT #1:
numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?
I want to emphasize that my timings are not due to any of the following:
- the function calls
- the conversion between numpy and Python float
- the creation of objects
I updated my code to make it clearer where the problem lies. With the new code, I seem to get a ten-fold performance hit from using numpy data types:
from datetime import datetime
import numpy as np
START_TIME = datetime.now()
# one of the following lines is uncommented before execution
#s = np.float64(1)
#s = np.float32(1)
#s = 1.0
for i in range(10000000):
    s = (s + 8) * s % 2399232
print(s)
print('Runtime:', datetime.now() - START_TIME)
The timings are:
- float64: 34.56 s
- float32: 35.11 s
- float: 3.53 s
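The measurement above can also be reproduced without editing the script between runs. The following is a minimal sketch using the standard timeit module; the helper names run_loop and time_start_values are hypothetical, and the iteration count is reduced so that it finishes quickly. Absolute numbers will differ by machine and numpy version.

```python
import timeit

import numpy as np


def run_loop(start, iterations):
    """Apply the recurrence from the question `iterations` times, starting from `start`."""
    s = start
    for _ in range(iterations):
        s = (s + 8) * s % 2399232
    return s


def time_start_values(iterations=100_000, repeat=3):
    """Return the best-of-`repeat` wall time in seconds for each starting type."""
    starts = {
        "float": 1.0,
        "float64": np.float64(1),
        "float32": np.float32(1),
    }
    return {
        name: min(timeit.repeat(lambda: run_loop(start, iterations), number=1, repeat=repeat))
        for name, start in starts.items()
    }


if __name__ == "__main__":
    for name, seconds in time_start_values().items():
        print(f"{name}: {seconds:.3f} s")
```

Taking the minimum over several repeats reduces noise from other processes, which a single datetime.now() difference cannot do.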
For the fun of it, I also tried:
from datetime import datetime
import numpy as np
START_TIME = datetime.now()
s = np.float64(1)
for i in range(10000000):
    s = float(s)
    s = (s + 8) * s % 2399232
    s = np.float64(s)
print(s)
print('Runtime:', datetime.now() - START_TIME)
The execution time is 13.28 s; it is actually about 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it is more than 3 times slower than the pure-Python float.
My machine is:
- Intel Core 2 Duo T9300 (2.5GHz)
- WinXP Professional (32-bit)
- ActiveState Python 3.1.3.5
- Numpy 1.5.1
EDIT #2:
Thank you for the answers; they help me understand how to deal with this problem.
But I would still like to know the precise reason (based on the source code, perhaps) why the code below runs 10 times slower with float64 than with float.
EDIT #3:
I reran the code under Windows 7 x64 (Intel Core i7 930 @ 3.8GHz).
Again, the code is:
from datetime import datetime
import numpy as np
START_TIME = datetime.now()
# one of the following lines is uncommented before execution
#s = np.float64(1)
#s = np.float32(1)
#s = 1.0
for i in range(10000000):
    s = (s + 8) * s % 2399232
print(s)
print('Runtime:', datetime.now() - START_TIME)
The timings are:
- float64: 16.1 s
- float32: 16.1 s
- float: 3.2 s
Now both np floats (either 64 or 32) are 5 times slower than the built-in float. Still a significant difference. I'm trying to figure out where it comes from.
END OF EDITS
Solution
Summary
If an arithmetic expression contains both numpy and built-in numbers, Python arithmetic runs slower. Avoiding this conversion removes almost all of the performance degradation I reported.
Details
Note that in my original code:
s = np.float64(1)
for i in range(10000000):
    s = (s + 8) * s % 2399232
built-in Python numbers (the literals 8 and 2399232) and numpy.float64 are mixed in one expression. Perhaps Python had to convert them all to one type? To test this, I made the conversions explicit:
s = np.float64(1)
for i in range(10000000):
    s = (s + np.float64(8)) * s % np.float64(2399232)
If the runtime were unchanged (rather than increased), it would suggest that this is indeed what Python was doing under the hood, which would explain the performance drag.
Actually, the runtime fell by a factor of 1.5! How is that possible? Weren't these two conversions the worst thing Python could possibly have to do?
I don't really know. Perhaps Python had to check dynamically what needs to be converted into what, which takes time, and being told exactly which conversions to perform makes it faster. Or perhaps some entirely different mechanism is used for the arithmetic (one that doesn't involve conversions at all), and it happens to be very slow on mismatched types. Reading the numpy source code might help, but that is beyond my skill.
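Without going into the numpy source, one thing that can be checked directly is which types the mixed expression produces. The sketch below does not explain the timings; it only shows that every intermediate result of the original loop body is a numpy.float64, so numpy's scalar arithmetic handles each step even when one operand is a plain Python number. The variable names are illustrative.

```python
import numpy as np

s = np.float64(1)

mixed_sum = s + 8                     # numpy scalar + Python int
reversed_sum = 8 + s                  # Python int + numpy scalar
mixed_mod = mixed_sum * s % 2399232   # the full loop body from the question

for value in (mixed_sum, reversed_sum, mixed_mod):
    print(type(value).__name__, value)
```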
Anyway, now we can obviously speed things up further by moving the conversions out of the loop:
s = np.float64(1)
q = np.float64(8)
r = np.float64(2399232)
for i in range(10000000):
    s = (s + q) * s % r
As expected, the runtime is reduced substantially: by another factor of 2.3.
To be fair, we now need to change the float version slightly as well, by moving the literal constants out of the loop. This results in a tiny (10%) slowdown.
Accounting for all these changes, the np.float64 version of the code is now only 30% slower than the equivalent float version; the ridiculous 5-fold performance hit is largely gone.
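The four variants discussed in this answer can be compared side by side with a small harness. This is a sketch under assumptions: the function names are hypothetical, the iteration count is far below the 10,000,000 used above, and the ratios you observe may differ from the ones reported here depending on the numpy and Python versions.

```python
import timeit

import numpy as np


def mixed(iterations):
    """numpy.float64 state combined with Python literals, as in the original code."""
    s = np.float64(1)
    for _ in range(iterations):
        s = (s + 8) * s % 2399232
    return s


def explicit(iterations):
    """Literals wrapped in np.float64 inside the loop."""
    s = np.float64(1)
    for _ in range(iterations):
        s = (s + np.float64(8)) * s % np.float64(2399232)
    return s


def hoisted(iterations):
    """np.float64 constants created once, outside the loop."""
    s = np.float64(1)
    q = np.float64(8)
    r = np.float64(2399232)
    for _ in range(iterations):
        s = (s + q) * s % r
    return s


def builtin(iterations):
    """Built-in float state, with the constants also moved out of the loop."""
    s = 1.0
    q = 8
    r = 2399232
    for _ in range(iterations):
        s = (s + q) * s % r
    return s


def compare(iterations=100_000, repeat=3):
    """Return the best-of-`repeat` wall time in seconds for each variant."""
    variants = {"mixed": mixed, "explicit": explicit, "hoisted": hoisted, "builtin": builtin}
    return {
        name: min(timeit.repeat(lambda: fn(iterations), number=1, repeat=repeat))
        for name, fn in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.3f} s")
```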
Why do we still see the 30% delay? numpy.float64 numbers take the same amount of space as float, so that can't be the reason. Perhaps the resolution of the arithmetic operators takes longer for user-defined types. It is certainly not a major concern.
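The claim about space can be checked in a rough way. The sketch below compares the numeric payload of a numpy.float64 with the size of a C double, which is what backs the built-in float in CPython. Whole-object sizes as reported by sys.getsizeof include per-object overhead and may differ between the two types and between builds, so they are printed without any claim about their values.

```python
import struct
import sys

import numpy as np

x = np.float64(1)

payload_bytes = x.itemsize              # bytes of numeric data in the numpy scalar
c_double_bytes = struct.calcsize("d")   # size of a native C double
is_float_subclass = issubclass(np.float64, float)

print("payload:", payload_bytes, "C double:", c_double_bytes)
print("np.float64 subclasses float:", is_float_subclass)

# Whole-object sizes, including object headers; values depend on the build.
print("getsizeof float:", sys.getsizeof(1.0), "getsizeof np.float64:", sys.getsizeof(x))
```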