To execute tasks in parallel, `multiprocessing` "pickles" the task function. In your case, this "task function" is `lambda file: getThing(file, 2, map)`. Unfortunately, by default, lambda functions cannot be pickled in Python (see also this stackoverflow post). Let me illustrate the problem with a minimal piece of code:
```python
import multiprocessing

l = range(12)

def not_a_lambda(e):
    print(e)

def main():
    with multiprocessing.Pool() as pool:
        pool.map(not_a_lambda, l)        # Case (A): works
        pool.map(lambda e: print(e), l)  # Case (B): raises a PicklingError

if __name__ == "__main__":
    main()
```
In Case (A) we have a proper, free (module-level) function that can be pickled, so the `pool.map` call succeeds. In Case (B) we have a lambda function, and the call crashes.
One possible solution is to use a proper module-scoped function (like `not_a_lambda` above). Another is to rely on a third-party module such as `dill` to extend the pickling functionality; in that case you can use `pathos` as a drop-in replacement for the regular `multiprocessing` module (see the short sketch at the end of this answer). Finally, you can create a `Worker` class that collects your shared state as members. That could look something like this:
```python
import multiprocessing

class Worker:
    def __init__(self, mutex, map):
        self.mutex = mutex
        self.map = map

    def __call__(self, e):
        print("Hello from Worker e=%r" % (e, ))
        with self.mutex:
            k, v = e
            self.map[k] = v
        print("Goodbye from Worker e=%r" % (e, ))

def main():
    manager = multiprocessing.Manager()
    mutex = manager.Lock()
    map = manager.dict()
    # there is only ONE Worker instance which is shared across all processes.
    # Thus, you need to make sure you don't access / modify internal state of
    # the worker instance without locking the mutex.
    worker = Worker(mutex, map)
    # example input: the (key, value) pairs the workers should insert
    # (in your code this would be your own dict)
    items = {"a": 1, "b": 2, "c": 3}.items()
    with multiprocessing.Pool() as pool:
        pool.map(worker, items)

if __name__ == "__main__":
    main()
```
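
For completeness, here is a minimal sketch of the `pathos` route mentioned above. It assumes the third-party `pathos` package is installed (`pip install pathos`); because `pathos` serializes with `dill`, a lambda can be shipped to the worker processes directly.

```python
# Minimal sketch, assuming the third-party "pathos" package is installed.
from pathos.multiprocessing import ProcessingPool

def main():
    pool = ProcessingPool()
    # this lambda would crash with the stdlib multiprocessing.Pool,
    # but works here because pathos pickles it with dill
    print(pool.map(lambda e: e * e, range(12)))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
```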