multiprocessing.Pool - PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed

2022-01-12 00:00:00 python multiprocessing pickle threadpool

Question

multiprocessing.Pool is driving me crazy...
I want to upgrade many packages, and for every one of them I have to check whether there is a greater version or not. This is done by the check_one function.
The main code is in the Updater.update method: there I create the Pool object and call the map() method.

The code is as follows:

def check_one(args):
    res, total, package, version = args
    i = res.qsize()
    logger.info('[{0:.1%} - {1}, {2} / {3}]',
                i / float(total), package, i, total, addn=False)
    try:
        json = PyPIJson(package).retrieve()
        new_version = Version(json['info']['version'])
    except Exception as e:
        logger.error('Error: Failed to fetch data for {0} ({1})', package, e)
        return
    if new_version > version:
        res.put_nowait((package, version, new_version, json))

class Updater(FileManager):

    # __init__ and other methods...

    def update(self):
        logger.info('Searching for updates')
        packages = Queue.Queue()
        data = ((packages, self.set_len, dist.project_name, Version(dist.version))
                for dist in self.working_set)
        pool = multiprocessing.Pool()
        pool.map(check_one, data)
        pool.close()
        pool.join()
        while True:
            try:
                package, version, new_version, json = packages.get_nowait()
            except Queue.Empty:
                break
            txt = 'A new release is available for {0}: {1!s} (old {2}), update'.format(
                package, new_version, version)
            u = logger.ask(txt, bool=('upgrade version', 'keep working version'),
                           dont_ask=self.yes)
            if u:
                self.upgrade(package, json, new_version)
            else:
                logger.info('{0} has not been upgraded', package)
        self._clean()
        logger.success('Updating finished successfully')

When I run it I get this weird error:

Searching for updates
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed

Answer

multiprocessing passes tasks (which include check_one and data) to the worker processes through an mp.SimpleQueue. Unlike with a Queue.Queue, everything put in the mp.SimpleQueue must be picklable. Queue.Queues are not picklable:

import multiprocessing as mp
import Queue

def foo(queue):
    pass

pool = mp.Pool()
q = Queue.Queue()

pool.map(foo, (q,))

yields this exception:

UnpickleableError: Cannot pickle <type 'thread.lock'> objects

Your data includes packages, which is a Queue.Queue. That might be the source of the problem.
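
One way to confirm which element of a task is the culprit is to try pickling each item yourself. The sketch below assumes Python 3 (where the module is named queue rather than Queue); is_picklable and unpicklable_fields are hypothetical helper names, not part of multiprocessing, and the sample task tuple is made up.

```python
import pickle
import queue  # this module is named Queue in Python 2


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def unpicklable_fields(task):
    """Return the positions in a task tuple that cannot be pickled."""
    return [i for i, item in enumerate(task) if not is_picklable(item)]


if __name__ == '__main__':
    # Hypothetical task shaped like the tuples built in Updater.update
    task = (queue.Queue(), 10, 'project-0', '1.0')
    print(unpicklable_fields(task))
```

Running a check like this on one element of data before handing it to pool.map points at the queue without having to read the traceback from the pool's helper thread.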

Here is a possible workaround: The Queue is being used for two purposes:

1. To find out the approximate size (by calling qsize)
2. To store results for later retrieval.

Instead of calling qsize, we could use an mp.Value to share a counter between multiple processes.

Instead of storing results in a queue, we can (and should) just return values from check_one. pool.map collects the results in a queue of its own making and hands them back as its return value.

For example:

import multiprocessing as mp
import random
import logging

# logger = mp.log_to_stderr(logging.DEBUG)
logger = logging.getLogger(__name__)

# Counter shared by the workers; relies on them inheriting this global,
# as they do when the pool forks its processes
qsize = mp.Value('i', 1)

def check_one(args):
    total, package, version = args
    # Read and increment under the Value's lock; a bare += is not atomic
    with qsize.get_lock():
        i = qsize.value
        qsize.value += 1
    logger.info('[{0:.1%} - {1}, {2} / {3}]'.format(
        i / float(total), package, i, total))
    # Stand-in for the real version lookup
    new_version = random.randrange(0, 100)
    if new_version > version:
        return (package, version, new_version, None)
    else:
        return None

def update():
    logger.info('Searching for updates')
    set_len = 10
    data = ((set_len, 'project-{0}'.format(i), random.randrange(0, 100))
            for i in range(set_len))
    pool = mp.Pool()
    # pool.map returns one entry per task, in input order
    results = pool.map(check_one, data)
    pool.close()
    pool.join()
    for result in results:
        if result is None:
            continue
        package, version, new_version, json = result
        txt = 'A new release is available for {0}: {1!s} (old {2}), update'.format(
            package, new_version, version)
        logger.info(txt)
    logger.info('Updating finished successfully')

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    update()
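
A variation on the same idea, not taken from the answer: if the counter is only needed for progress messages, the parent can count results as they arrive from pool.imap_unordered, so no shared state is required at all. The (package, version, latest) tuples and the find_updates name are made up for this sketch; a real check_one would look the latest version up itself.

```python
import multiprocessing as mp


def check_one(args):
    """Return (package, version, latest) if an update exists, else None."""
    package, version, latest = args
    if latest > version:
        return (package, version, latest)
    return None


def find_updates(data, processes=2):
    """Check every package in a pool and report progress from the parent."""
    data = list(data)
    total = len(data)
    updates = []
    pool = mp.Pool(processes)
    try:
        # Results arrive in completion order, so `done` is a true progress count
        for done, result in enumerate(pool.imap_unordered(check_one, data), 1):
            print('[{0:.1%} - {1} / {2}]'.format(done / float(total), done, total))
            if result is not None:
                updates.append(result)
    finally:
        pool.close()
        pool.join()
    return sorted(updates)


if __name__ == '__main__':
    sample = [('project-0', 1, 2), ('project-1', 3, 3), ('project-2', 5, 4)]
    print(find_updates(sample))
```

The trade-off is that results no longer come back in input order, which is why the sketch sorts them before returning.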
