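Using joblib with spaCy raises _pickle.PicklingError: Could not pickle the task to send it to the workers
Question
I have a large list of sentences (about 7 million) and I want to extract the nouns from them.
I use the joblib library to parallelize the extraction, as follows: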
import spacy
from tqdm import tqdm
from joblib import Parallel, delayed

# The model is loaded once, at module level, in the main process.
nlp = spacy.load('en_core_web_sm')

class nouns:

    def get_nouns(self, text):
        # Tag the sentence and keep only the noun tokens.
        doc = nlp(u"{}".format(text))
        return [token.text for token in doc if token.tag_ in ['NN', 'NNP', 'NNS', 'NNPS']]

    def parallelize(self, sentences):
        # Works with n_jobs=1; fails as soon as n_jobs is greater than 1.
        results = Parallel(n_jobs=1)(delayed(self.get_nouns)(sent) for sent in tqdm(sentences))
        return results

if __name__ == '__main__':
    sentences = ['we went to the school yesterday',
                 'The weather is really cold',
                 'Can we catch the dog?',
                 'How old are you John?',
                 'I like diving and swimming',
                 'Can the world become united?']
    obj = nouns()
    print(obj.parallelize(sentences))
When n_jobs in the parallelize function is greater than 1, I get the following long error:
100%|██████████| 6/6 [00:00<00:00, 200.00it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\queues.py", line 150, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 243, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 236, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 267, in dump
    return Pickler.dump(self, obj)
  File "C:\Python35\lib\pickle.py", line 408, in dump
    self.save(obj)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 841, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 770, in save_list
    self._batch_appends(obj)
  File "C:\Python35\lib\pickle.py", line 797, in _batch_appends
    save(tmp[0])
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 725, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 718, in save_instancemethod
    self.save_reduce(types.MethodType, (obj.__func__, obj.__self__), obj=obj)
  File "C:\Python35\lib\pickle.py", line 599, in save_reduce
    save(args)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 725, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 395, in save_function
    self.save_function_tuple(obj)
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 594, in save_function_tuple
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 841, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 599, in save_reduce
    save(args)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 740, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 740, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 495, in save
    rv = reduce(self.proto)
  File "stringsource", line 2, in preshed.maps.PreshMap.__reduce_cython__
TypeError: self.c_map cannot be converted to a Python object for pickling
"""
Exception in thread QueueFeederThread:
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\queues.py", line 150, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 243, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 236, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 267, in dump
    return Pickler.dump(self, obj)
  File "C:\Python35\lib\pickle.py", line 408, in dump
    self.save(obj)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 841, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 770, in save_list
    self._batch_appends(obj)
  File "C:\Python35\lib\pickle.py", line 797, in _batch_appends
    save(tmp[0])
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 725, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 718, in save_instancemethod
    self.save_reduce(types.MethodType, (obj.__func__, obj.__self__), obj=obj)
  File "C:\Python35\lib\pickle.py", line 599, in save_reduce
    save(args)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 725, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 395, in save_function
    self.save_function_tuple(obj)
  File "C:\Python35\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 594, in save_function_tuple
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 841, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 810, in save_dict
    self._batch_setitems(obj.items())
  File "C:\Python35\lib\pickle.py", line 836, in _batch_setitems
    save(v)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 599, in save_reduce
    save(args)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 740, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 520, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python35\lib\pickle.py", line 623, in save_reduce
    save(state)
  File "C:\Python35\lib\pickle.py", line 475, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python35\lib\pickle.py", line 740, in save_tuple
    save(element)
  File "C:\Python35\lib\pickle.py", line 495, in save
    rv = reduce(self.proto)
  File "stringsource", line 2, in preshed.maps.PreshMap.__reduce_cython__
TypeError: self.c_map cannot be converted to a Python object for pickling
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Python35\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Python35\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\backend\queues.py", line 175, in _feed
    onerror(e, obj)
  File "C:\Python35\lib\site-packages\joblib\externals\loky\process_executor.py", line 310, in _on_queue_feeder_error
    self.thread_wakeup.wakeup()
  File "C:\Python35\lib\site-packages\joblib\externals\loky\process_executor.py", line 155, in wakeup
    self._writer.send_bytes(b"")
  File "C:\Python35\lib\multiprocessing\connection.py", line 183, in send_bytes
    self._check_closed()
  File "C:\Python35\lib\multiprocessing\connection.py", line 136, in _check_closed
    raise OSError("handle is closed")
OSError: handle is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File ".../playground.py", line 43, in <module>
    print(obj.Paralize(sentences))
  File ".../playground.py", line 32, in Paralize
    results = Parallel(n_jobs=2)(delayed(self.get_nouns)(sent) for sent in tqdm(sentences))
  File "C:\Python35\lib\site-packages\joblib\parallel.py", line 934, in __call__
    self.retrieve()
  File "C:\Python35\lib\site-packages\joblib\parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Python35\lib\site-packages\joblib\_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Python35\lib\concurrent\futures\_base.py", line 405, in result
    return self.__get_result()
  File "C:\Python35\lib\concurrent\futures\_base.py", line 357, in __get_result
    raise self._exception
_pickle.PicklingError: Could not pickle the task to send it to the workers.
What is wrong with my code?
Answer
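Q: What is wrong with my code?
Well, the problem most likely comes not from the code itself but from the "hidden" processing that takes place once n_jobs instructs joblib (and its internal orchestration) to prepare that many exact copies of the main process, so that they can work independently of each other (which effectively escapes GIL locking and maps the multiple process flows onto physical hardware resources).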
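This step is responsible for copying all the Python objects involved, and it is known to use pickle to do so. The pickle module is well known for its long-standing limitations on what can and cannot be pickled.
The error message confirms this: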
TypeError: self.c_map cannot be converted to a Python object for pickling
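As a small illustration of that limitation, the sketch below uses only the standard library. It is an analogy, not the spaCy case itself: a threading.Lock stands in for the C-level map inside the model, and the helper name can_pickle is made up for this example. It shows that a task is only picklable if everything it carries along is picklable too.

```python
import pickle
import threading

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

# Plain Python data pickles without trouble.
plain_task = {"text": "we went to the school yesterday", "tags": ["NN", "NNS"]}

# One unpicklable member is enough to make the whole container fail.
# The lock is a stand-in (an assumption for this demo) for native state
# such as the c_map named in the error message.
task_with_native_state = {"text": "How old are you John?", "resource": threading.Lock()}

print(can_pickle(plain_task))              # True
print(can_pickle(task_with_native_state))  # False
```

The traceback in the question follows the same pattern: the pickler walks from the task down through nested dicts and tuples until it reaches an object that has no Python-level representation.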
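One may try a trick: supply Mike McKerns' dill module in place of pickle, and test whether your "problematic" Python objects get pickled with this module without raising this error.
dill has the same API signatures, so a plain import dill as pickle may be enough to leave all the other code unchanged.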
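A minimal sketch of that API parity follows, assuming dill is installed (pip install dill). The helper try_roundtrip and the lambda are hypothetical names used only here. It demonstrates that dill accepts the same dumps/loads calls and handles a lambda that the standard pickle refuses; it does not show that dill can serialize the spaCy model, which still has to be tested as suggested above.

```python
import pickle as std_pickle

try:
    import dill  # third-party package, assumed to be installed
except ImportError:
    dill = None

def try_roundtrip(module, obj):
    """Dump and reload obj with a pickle-compatible module; None on failure."""
    try:
        return module.loads(module.dumps(obj))
    except Exception:
        return None

# The standard pickle stores functions by name, so it cannot handle a lambda.
square = lambda x: x * x

print("pickle handled the lambda:", try_roundtrip(std_pickle, square) is not None)

if dill is not None:
    # Same dumps/loads signatures, so only the module name changes.
    restored = try_roundtrip(dill, square)
    print("dill handled the lambda:", restored is not None)
```

Because both modules are passed in as a parameter, the calling code stays identical, which is the point of the import dill as pickle trick.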
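I ran into the same problem, needing to distribute large models across multiple processes, and dill was a workable approach. Performance improved as well.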
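Here is a bonus finding:
dill allows you to save and restore a full Python interpreter state!
A cool side effect of dill: once import dill as pickle has been done, pickle.dump_session( <aFile> ) will save a complete copy of the state of a whole Python interpreter session. This can be restored as needed (recovery after a crash, full save/restore of a trained and optimised ML model state, full save and redistribution of an incrementally learning ML model state, remote restore of user libraries for deployment, etc.).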