Is numpy automatically optimized for the Raspberry Pi?
Question
The Raspberry Pi (armv7l architecture) supports NEON and VFPv4, which can be used for optimization.
Does the standard version of numpy include these optimizations when it is installed with pip3 install numpy or apt-get install python3-numpy?
I am not talking about BLAS and LAPACK, only native numpy.
Solution
As Mark Setchell noted, numpy does not appear to contain code that specifically targets NEON intrinsics. However, that is not the full story. Modern compilers can frequently take serially written code and transform it to use SIMD instructions (auto-vectorization). For instance, GCC can partially unroll a loop and use NEON's SIMD instructions to perform several iterations of the loop at the same time.
The next thing to note is that pip install and apt-get install do different things. apt-get fetches a prebuilt binary from the Raspbian or Debian repository (depending on which you are using), whereas pip can only fetch the numpy source on ARM architectures. This is because the Python Package Index (PyPI) does not store binaries for ARM architectures.
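To see which of the two installs is actually being imported, it can help to print where numpy lives and how it was built. The following is a minimal sketch, not part of the original answer: `describe_numpy_install` and `classify_location` are hypothetical helper names, and the path check is only a heuristic that assumes the usual Debian convention of apt placing packages under /usr/lib/python3/dist-packages.

```python
import importlib.util
import os
import platform


def classify_location(path):
    """Guess the installer from the install path (heuristic, Debian layout assumed)."""
    norm = path.replace("\\", "/")
    # Assumption: apt-installed Python 3 packages live in this directory.
    if norm.startswith("/usr/lib/python3/dist-packages"):
        return "apt (Debian/Raspbian package)"
    return "pip or other"


def describe_numpy_install():
    """Collect basic facts about the numpy that Python would import."""
    info = {"machine": platform.machine(), "numpy_found": False}
    if importlib.util.find_spec("numpy") is None:
        return info
    import numpy

    location = os.path.dirname(numpy.__file__)
    info.update(
        numpy_found=True,
        version=numpy.__version__,
        location=location,
        probable_source=classify_location(location),
    )
    return info


if __name__ == "__main__":
    details = describe_numpy_install()
    for key, value in details.items():
        print(f"{key}: {value}")
    if details["numpy_found"]:
        import numpy

        # Prints the build configuration that numpy recorded at compile time.
        numpy.show_config()
```

On a Raspberry Pi running a 32-bit OS, the machine field would be expected to read armv7l; the output of numpy.show_config() varies between numpy versions.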
Debian and Raspbian both appear to have armhf versions of python3-numpy in their repositories. The hf stands for "hard float": floating point computations done in hardware as opposed to software. This Debian page also appears to suggest that armhf packages have been compiled to take advantage of NEON instructions, but results have been limited. That is, GCC does emit NEON code, but it is not (yet) as finely tuned as it is when targeting SSE/SSE2.
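None of this matters unless the CPU actually advertises NEON and VFPv4. A minimal sketch of checking that from Python, assuming a Linux kernel that exposes a "Features" line in /proc/cpuinfo (as 32-bit ARM kernels typically do); `parse_cpu_features` and `has_neon_vfpv4` are hypothetical helper names introduced for illustration.

```python
def parse_cpu_features(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # ARM kernels label the line "Features"; x86 kernels use "flags".
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_neon_vfpv4(features):
    """True if both the neon and vfpv4 flags are present."""
    return {"neon", "vfpv4"} <= set(features)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            feats = parse_cpu_features(fh.read())
    except OSError:
        # Not Linux, or /proc is unavailable: report nothing found.
        feats = set()
    print("NEON + VFPv4 advertised:", has_neon_vfpv4(feats))
```

This only reports what the hardware offers; it says nothing about whether a given numpy binary was compiled to use those instructions.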
pip would be the worse option in this case, as GCC appears to be a bit cautious when it comes to targeting ARM floating point instructions. That is, pip will download the numpy source and compile it on your Raspberry Pi, but by default it might not optimise the code as much as it could. You will probably need to tell pip to use a few compiler options through the --global-option argument, e.g. --global-option="-mfloat-abi=hard". You can find a comprehensive set of options to pass here.
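Another way to hand compiler options to a source build is the CFLAGS environment variable. This is a different technique from the --global-option argument mentioned above, and the sketch below rests on an assumption not stated in the original answer: that the numpy build in use honours CFLAGS. `build_numpy_source_install` is a hypothetical helper; it only assembles the command and environment and does not run pip. The flags shown are real GCC options for ARM, chosen here purely for illustration.

```python
import os
import sys


def build_numpy_source_install(extra_cflags, base_env=None):
    """Return (command, environment) for a from-source numpy install via pip."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CFLAGS", "")
    # Append the extra flags to whatever CFLAGS is already set.
    env["CFLAGS"] = " ".join(part for part in [existing, *extra_cflags] if part)
    # --no-binary numpy asks pip to build numpy from source rather than use a wheel.
    cmd = [sys.executable, "-m", "pip", "install", "--no-binary", "numpy", "numpy"]
    return cmd, env


if __name__ == "__main__":
    # Illustrative flags: optimise, target NEON/VFPv4, use the hard-float ABI.
    flags = ["-O3", "-mfpu=neon-vfpv4", "-mfloat-abi=hard"]
    cmd, env = build_numpy_source_install(flags)
    print("CFLAGS=" + env["CFLAGS"])
    print(" ".join(cmd))
    # To actually run the build: subprocess.run(cmd, env=env, check=True)
```

Compiling numpy on a Raspberry Pi can take a long time, so it is worth printing and reviewing the command before running it.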