How to return values from a Process or Thread instance?

Problem Description

So I want to run a function which can either search for information on the web or directly from my own MySQL database. The first process will be time-consuming, the second relatively fast.

With this in mind I create a process which starts this compound search (find_compound_view). If the process finishes relatively fast, it means the compound is present in the database, so I can render the results immediately. Otherwise, I will render "drax_retrieving_data.html".

The stupid solution I came up with was to run the function twice: once to check whether the process takes a long time, and once more to actually get the function's return values. This is pretty much because I don't know how to return the values of my find_compound_view function. I've tried googling, but I can't seem to find out how to return values from the Process class specifically.

p = Process(target=find_compound_view, args=(form,))
p.start()
is_running = p.is_alive()
start_time = time.time()
while is_running:
    time.sleep(0.05)
    is_running = p.is_alive()
    if time.time() - start_time > 10:
        print('Timer exceeded, DRAX is retrieving info!', time.time() - start_time)
        return render(request, 'drax_internal_dbs/drax_retrieving_data.html')
compound = find_compound_view(form, use_email=False)

if compound:
    data=*****
    return render(request, 'drax_internal_dbs/result.html', data)


Solution

You will need a multiprocessing.Pipe or a multiprocessing.Queue to send the results back to your parent process. If you just do I/O, you should use a Thread instead of a Process, since it's more lightweight and most of the time will be spent waiting. I'm showing you how it's done for processes and threads in general.

Process & Queue

The multiprocessing queue is built on top of a pipe, and access is synchronized with locks/semaphores. Queues are thread- and process-safe, meaning you can use one queue for multiple producer/consumer processes and even multiple threads within those processes. Adding the first item to the queue will also start a feeder thread in the calling process. The additional overhead of a multiprocessing.Queue makes a pipe preferable and more performant for single-producer/single-consumer scenarios.

Here's how to send and retrieve a result with a multiprocessing.Queue:

from multiprocessing import Process, Queue

SENTINEL = 'SENTINEL'

def sim_busy(out_queue, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)


if __name__ == '__main__':

    out_queue = Queue()

    p = Process(target=sim_busy, args=(out_queue, 150e6))  # 150e6 == 150000000.0
    p.start()

    for result in iter(out_queue.get, SENTINEL):  # sentinel breaks the loop
        print(result)

The queue is passed as an argument into the function, results are .put() on the queue, and the parent .get()s from the queue. .get() is a blocking call: execution does not resume until there is something to get (specifying a timeout parameter is possible). Note that the work sim_busy does here is CPU-intensive; that's when you would choose processes over threads.
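Since .get() accepts a timeout, the asker's 10-second check can be folded directly into the result retrieval instead of polling is_alive() in a loop. Below is a minimal, self-contained sketch of that pattern; slow_worker and its 15-second sleep are made up for illustration, and on timeout multiprocessing's Queue.get() raises the Empty exception from the standard queue module:

from multiprocessing import Process, Queue
import queue  # provides the Empty exception raised by .get() on timeout
import time

def slow_worker(out_queue):
    # Simulate a search that takes longer than the caller is willing to wait.
    time.sleep(15)
    out_queue.put('result')

if __name__ == '__main__':

    out_queue = Queue()

    p = Process(target=slow_worker, args=(out_queue,))
    p.start()

    try:
        result = out_queue.get(timeout=10)  # block for at most 10 seconds
        print('got result:', result)
    except queue.Empty:
        print('Timer exceeded, still retrieving...')
        # In the asker's Django view, this is the branch where one would render
        # 'drax_retrieving_data.html' and let the worker keep running.

In the asker's view, the worker would .put() the return value of find_compound_view on the queue, so the result reaches the parent without calling the function a second time.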

Process & Pipe

For one-to-one connections a pipe is enough. The setup is nearly identical, just the methods are named differently and a call to Pipe() returns two connection objects. In duplex mode, both objects are read-write ends; with duplex=False (simplex) the first connection object is the read-end of the pipe and the second is the write-end. In this basic scenario we just need a simplex pipe:

from multiprocessing import Process, Pipe

SENTINEL = 'SENTINEL'


def sim_busy(write_conn, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    write_conn.send(result)
    # If all results are sent, send a sentinel-value to let the parent know
    # no more results will come.
    write_conn.send(SENTINEL)


if __name__ == '__main__':

    # duplex=False because we just need one-way communication in this case.
    read_conn, write_conn = Pipe(duplex=False)

    p = Process(target=sim_busy, args=(write_conn, 150e6))  # 150e6 == 150000000.0
    p.start()

    for result in iter(read_conn.recv, SENTINEL):  # sentinel breaks the loop
        print(result)
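The example above uses a simplex pipe. For completeness, here is a minimal sketch of duplex mode, in which both connection objects returned by Pipe() are read-write ends, so parent and child can exchange messages over the same pipe; the echo_worker function is made up for illustration:

from multiprocessing import Process, Pipe

def echo_worker(conn):
    # Receive a request from the parent and send a reply over the same connection.
    request = conn.recv()
    conn.send(request + '_reply')
    conn.close()

if __name__ == '__main__':

    # duplex=True is the default; both ends can send and receive.
    parent_conn, child_conn = Pipe(duplex=True)

    p = Process(target=echo_worker, args=(child_conn,))
    p.start()

    parent_conn.send('ping')   # parent writes on its end ...
    print(parent_conn.recv())  # ... and reads the reply: 'ping_reply'
    p.join()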


Thread & Queue

For use with threading, you want to switch to queue.Queue. queue.Queue is built on top of a collections.deque, adding some locks to make it thread-safe. Unlike multiprocessing's queue and pipe, objects put on a queue.Queue won't get pickled. Since threads share the same memory address space, serialization for memory copying is unnecessary; only pointers are transmitted.

from threading import Thread
from queue import Queue
import time

SENTINEL = 'SENTINEL'


def sim_io(out_queue, query):
    time.sleep(1)
    result = query + '_result'
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)


if __name__ == '__main__':

    out_queue = Queue()

    p = Thread(target=sim_io, args=(out_queue, 'my_query'))
    p.start()

    for result in iter(out_queue.get, SENTINEL):  # sentinel-value breaks the loop
        print(result)


  • Read here why for result in iter(out_queue.get, SENTINEL): should be preferred over a while True...break setup, where possible (a sketch of the latter follows this list).
  • Read here why you should use if __name__ == '__main__': in all your scripts, and especially in multiprocessing.
  • More about .get() usage here.
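For comparison, the while True ... break version of the retrieval loop mentioned in the first bullet would look roughly like the sketch below (assuming the out_queue and SENTINEL from the examples above); the iter() form keeps the exit condition in one place:

# Equivalent to: for result in iter(out_queue.get, SENTINEL): print(result)
while True:
    result = out_queue.get()
    if result == SENTINEL:  # exit condition spelled out by hand
        break
    print(result)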
