Segfault using numpy's lapack_lite with multiprocessing on OSX, not Linux

2022-01-12 00:00:00 python numpy multiprocessing segmentation-fault

Problem description

The following test code segfaults for me on OSX 10.7.3, but not other machines:

    from __future__ import print_function
    import numpy as np
    import multiprocessing as mp
    import scipy.linalg

    def f(a):
        print("about to call")

        ### these all cause crashes
        sign, x = np.linalg.slogdet(a)
        #x = np.linalg.det(a)
        #x = np.linalg.inv(a).sum()

        ### these are all fine
        #x = scipy.linalg.expm3(a).sum()
        #x = np.dot(a, a.T).sum()

        print("result:", x)
        return x

    def call_proc(a):
        print(" calling with multiprocessing")
        p = mp.Process(target=f, args=(a,))
        p.start()
        p.join()

    if __name__ == '__main__':
        import sys
        n = int(sys.argv[1]) if len(sys.argv) > 1 else 50

        a = np.random.normal(0, 2, (n, n))
        f(a)

        call_proc(a)
        call_proc(a)

Example output for one of the segfaulty ones:

    $ python2.7 test.py
    about to call
    result: -4.96797718087
     calling with multiprocessing
    about to call
     calling with multiprocessing
    about to call

with an OSX "problem report" popping up complaining about a segfault like KERN_INVALID_ADDRESS at 0x0000000000000108; here's a full one.

If I run it with n <= 32, it runs fine; for any n >= 33, it crashes.

If I comment out the f(a) call that's done in the original process, both calls to call_proc are fine. It still segfaults if I call f on a different large array; if I call it on a different small array, or if I call f(large_array) and then pass off f(small_array) to a different process, it works fine. They don't actually need to be the same function; np.linalg.inv(large_array) followed by passing off to np.linalg.slogdet(different_large_array) also segfaults.

All of the commented-out np.linalg things in f cause crashes; np.dot(a, a.T).sum() and scipy.linalg.expm3 work fine. As far as I can tell, the difference is that the former use numpy's lapack_lite and the latter don't.

This happens on my desktop with

python 2.6.7, numpy 1.5.1
python 2.7.1, numpy 1.5.1, scipy 0.10.0
python 3.2.2, numpy 1.6.1, scipy 0.10.1

The 2.6 and 2.7 are, I think, the default system installs; I installed the 3.2 versions manually from the source tarballs. All of those numpys are linked to the system Accelerate framework:

    $ otool -L `python3.2 -c 'from numpy.core import _dotblas; print(_dotblas.__file__)'`
    /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/core/_dotblas.so:
        /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)
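As a cross-check on the otool output, numpy can also report its own build configuration from inside Python. This is only a small sketch: the layout of the printed report differs between numpy versions, so no particular output is assumed here.

```python
import numpy as np

# Report which numpy is being imported, then print the BLAS/LAPACK
# libraries this build was compiled against. On a build linked to
# Accelerate, the report is expected to mention it, but the exact
# format depends on the numpy version.
print("numpy", np.__version__)
np.show_config()
```

Running this with each interpreter (python2.6, python2.7, python3.2) shows whether they all pick up the same BLAS.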

I get the same behavior on another Mac with a similar setup.

But all of the options for f work on other machines running

OSX 10.6.8 with Python 2.6.1 and numpy 1.2.1 linked to Accelerate 4 and vecLib 268 (except that it doesn't have scipy or slogdet)
Debian 6 with Python 3.2.2, numpy 1.6.1, and scipy 0.10.1 linked to the system ATLAS
Ubuntu 11.04 with Python 2.7.1, numpy 1.5.1, and scipy 0.8.0 linked to the system ATLAS

Am I doing something wrong here? What could possibly be causing this? I don't see how running a function on a numpy array that's getting pickled and unpickled can possibly cause it to later segfault in a different process.

Update: when I do a core dump, the backtrace is inside dispatch_group_async_f, the Grand Central Dispatch interface. Presumably this is a bug in the interactions between numpy/GCD and multiprocessing. I've reported this as a numpy bug, but if anyone has any ideas about workarounds or, for that matter, how to solve the bug, it'd be greatly appreciated. :)
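A core dump gives the C-level backtrace. To also see which Python line the child was executing when it died, the standard-library faulthandler module (Python 3.3+) can be enabled inside the child. The sketch below is not part of the original investigation; run_with_faulthandler and call_proc are hypothetical helper names, and print merely stands in for the question's f.

```python
import faulthandler
import multiprocessing as mp


def run_with_faulthandler(func, *args):
    # Hypothetical wrapper: on SIGSEGV (and a few other fatal signals)
    # faulthandler writes the Python traceback to stderr before the
    # process dies.
    faulthandler.enable()
    return func(*args)


def call_proc(func, *args):
    # Run func(*args) in a child process and return its exit code.
    # A negative exit code -N means the child was killed by signal N.
    p = mp.Process(target=run_with_faulthandler, args=(func,) + args)
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    # Stand-in target; in the question this would be f with the array a.
    code = call_proc(print, "hello from child")
    print("child exit code:", code)
```

The exit code lets the parent tell a crash apart from a normal return without waiting for the OSX problem report dialog.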

Solution

It turns out that the Accelerate framework used by default on OSX just doesn't support using BLAS calls on both sides of a fork. No real way to deal with this other than linking to a different BLAS, and it doesn't seem like something they're interested in fixing.
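One thing that may be worth trying on Python 3.4 or later, which the answer above does not mention and which is offered only as a sketch: ask multiprocessing to start children with the "spawn" method instead of fork. A spawned child is a fresh interpreter that loads its own copy of the BLAS library, so the "both sides of a fork" situation should not arise. The function names mirror the test script in the question; whether this avoids the crash on a particular OSX/Accelerate combination is an assumption to verify, not a tested result.

```python
import multiprocessing as mp

import numpy as np


def f(a):
    # Same LAPACK-backed call that crashes in the question.
    print("about to call")
    sign, x = np.linalg.slogdet(a)
    print("result:", x)
    return x


def call_proc(a):
    # Assumption: Python 3.4+, where get_context() is available.
    # "spawn" launches a new interpreter rather than fork()ing the
    # parent, so the child does not inherit the parent's BLAS state.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=f, args=(a,))
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    a = np.random.normal(0, 2, (50, 50))
    f(a)  # BLAS call in the parent first, as in the question
    print("child exit code:", call_proc(a))
```

The cost is that every child pays for a full interpreter start-up and re-import, and the arguments must be picklable, so this trades speed for isolation.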
