Maximum recursion depth error when using futures.ProcessPoolExecutor but not futures.ThreadPoolExecutor

Problem description

I am using this code to scrape an API:

from concurrent import futures

submissions = get_submissions(1)
# Thread variant that works: with futures.ThreadPoolExecutor(max_workers=4) as executor:
with futures.ProcessPoolExecutor(max_workers=4) as executor:
    for s in executor.map(map_func, submissions):
        # Append each result to the thread_list of the matching document.
        collection_front.update({"time_recorded": time_recorded}, {"$push": {"thread_list": s}}, upsert=True)

It works great and fast with threads, but when I try to use processes, the queue fills up and I get the following error:

  File "/usr/local/lib/python3.4/dist-packages/praw/objects.py", line 82, in __getattr__
    if not self.has_fetched:
RuntimeError: maximum recursion depth exceeded
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 251, in _queue_management_worker
    shutdown_worker()
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 209, in shutdown_worker
    call_queue.put_nowait(None)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 131, in put_nowait
    return self.put(obj, False)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 82, in put
    raise Full
queue.Full

Traceback (most recent call last):
  File "reddit_proceses.py", line 64, in <module>
    for s in executor.map(map_func, submissions):
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 549, in result_iterator
    yield future.result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 402, in result
    return self.__get_result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

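Note that initially, for small data retrievals, the processes worked great and were very fast, but now they do not work at all. Is this a bug, or do praw objects cause a recursion error with processes but not with threads?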


Solution

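I ran into a similar problem when moving from threads to processes, except that I was using Executor.submit. I think it may be the same problem you are having, but I cannot be sure, because I do not know in what context your code is running.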

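In my case, I was running the code as a script, and I was not using the always-recommended if __name__ == "__main__": guard. It looks like, when an executor starts a new worker process, Python loads the .py file again and then runs the function passed to submit (this happens at least when workers are spawned rather than forked, which is the default on Windows). Because the file is loaded, any code at the top level of the main file (that is, not inside a function or under that if statement) runs again, so each new process starts yet more processes, and the recursion never ends.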

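Threads do not seem to have this problem, since a thread pool starts no new interpreter and never reloads the file.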
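
As a minimal sketch of the guard described above: the function names square and main are hypothetical stand-ins (square plays the role of map_func from the question), and the example assumes the worker function is defined at module top level so that worker processes can import it. It is not the asker's actual script, only an illustration of where the pool creation should live.

```python
from concurrent import futures


def square(n):
    # Hypothetical work function; it must live at module top level
    # so that worker processes can find it by name.
    return n * n


def main():
    # The pool is created inside a function, so merely importing
    # this file has no side effects.
    with futures.ProcessPoolExecutor(max_workers=2) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(square, range(5)))


if __name__ == "__main__":
    # Only the process started as a script reaches this line. A worker
    # that re-imports the file skips it and never creates its own pool.
    print(main())
```

Run as a script, this should print [0, 1, 4, 9, 16]. Where workers are forked instead of spawned, the guard is harmless, so it is reasonable to keep it in any script that uses a process pool.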
