Why are numpy calculations not affected by the global interpreter lock?
Question
I'm trying to decide whether I should use multiprocessing or threading, and I've learned some interesting bits about the Global Interpreter Lock (GIL). According to this nice blog post, multithreading isn't suitable for CPU-bound ("busy") tasks. However, I also learned that some functionality, such as I/O or numpy, is unaffected by the GIL.
Can anyone explain why, and how I can find out whether my (probably quite numpy-heavy) code is going to be suitable for multithreading?
Answer
Many numpy calculations are unaffected by the GIL, but not all.
In code that does not require the Python interpreter (e.g. C libraries), it is possible to explicitly release the GIL, which allows other code that does depend on the interpreter to continue running. In the NumPy C codebase, the macros NPY_BEGIN_THREADS and NPY_END_THREADS are used to delimit blocks of code that permit GIL release. You can see these in this search of the numpy source.
The NumPy C API documentation has more information on threading support. Note the additional macros NPY_BEGIN_THREADS_DESCR, NPY_END_THREADS_DESCR and NPY_BEGIN_THREADS_THRESHOLDED, which handle conditional GIL release depending on the array dtypes and the size of the loops.
Most core functions release the GIL. For example, Universal Functions (ufuncs) do so, as the documentation describes:
"As long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions."
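As a rough illustration of what this means in practice, here is a minimal sketch that splits an array into chunks and runs ufunc-based work on each chunk in a thread pool. The helper names (sum_of_squares, threaded_sum_of_squares) and the chunk count are made up for this example. Whether the threads actually overlap depends on the dtype, the array size and your NumPy build, so treat this as a pattern to experiment with rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def sum_of_squares(chunk):
    # np.square and np.sum run in compiled loops. For non-object dtypes
    # the ufunc machinery can release the GIL while those loops run.
    return float(np.sum(np.square(chunk)))


def threaded_sum_of_squares(data, n_threads=4):
    # Hypothetical helper: one chunk per thread, one large numpy call each.
    chunks = np.array_split(data, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum_of_squares, chunks))


data = np.arange(1_000_000, dtype=np.float64)
threaded = threaded_sum_of_squares(data)
serial = sum_of_squares(data)
print(np.isclose(threaded, serial))

# The same code also accepts an object array, but per the quote above the
# GIL is not released for object loops, so the threads would take turns.
obj_result = threaded_sum_of_squares(data[:1000].astype(object))
```

The result is the same either way; only the degree of parallelism differs between the float64 and the object-dtype case.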
With regard to your own code, the source code for NumPy is available. Check the functions you use (and the functions they call) for the above macros. Note also that the performance benefit depends heavily on how long the GIL stays released: if your code is constantly dropping in and out of Python, you won't see much of an improvement.
The other option is to just test it. However, bear in mind that functions using the conditional GIL macros may behave differently with small and large arrays. A test with a small dataset may therefore not accurately represent performance on a larger task.
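A minimal timing sketch along those lines might look like the following. The work function is only a stand-in (swap in the numpy calls you actually use), and the sizes, task count and thread count are arbitrary choices for illustration. Timings vary a lot between machines and NumPy builds, so no expected numbers are given here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def work(arr):
    # Hypothetical stand-in workload; replace with your own numpy code.
    return np.sqrt(arr * arr + 1.0).sum()


def time_serial(arrays):
    start = time.perf_counter()
    for arr in arrays:
        work(arr)
    return time.perf_counter() - start


def time_threaded(arrays, n_threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, arrays))  # force all tasks to finish
    return time.perf_counter() - start


def compare(size, n_tasks=8, n_threads=4):
    # Returns (serial_seconds, threaded_seconds) for arrays of a given size.
    rng = np.random.default_rng(0)
    arrays = [rng.random(size) for _ in range(n_tasks)]
    return time_serial(arrays), time_threaded(arrays, n_threads)


# Run both a small and a large size, since they may behave differently.
for size in (100, 200_000):
    serial, threaded = compare(size)
    print(f"size={size}: serial {serial:.4f}s, threaded {threaded:.4f}s")
```

If the threaded time is not clearly better at the sizes you care about, that suggests the GIL is being held (or the work is too small to amortize thread overhead), and multiprocessing may be the better fit.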
There is some additional information on parallel processing with numpy on the official wiki, and a useful post about the Python GIL in general over on Programmers.SE.