How to use gpu::Stream in OpenCV?

2021-12-10 00:00:00 opencv gpgpu c++

OpenCV has a gpu::Stream class that encapsulates a queue of asynchronous calls. Some functions have overloads with an additional gpu::Stream parameter. Aside from the gpu-basics-similarity.cpp sample code, there is very little information in the OpenCV documentation on how and when to use gpu::Stream. For example, it is not very clear (to me) what exactly gpu::Stream::enqueueConvert or gpu::Stream::enqueueCopy do, or how to use gpu::Stream as an additional overload parameter. I'm looking for a tutorial-like overview of gpu::Stream.

Recommended Answer

By default, all gpu module functions are synchronous, i.e. the current CPU thread is blocked until the operation finishes.
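
For example, with the default blocking API, each call below returns control to the CPU thread only after the GPU has finished. This is a minimal sketch (the image size, the choice of gpu::blur, and the variable names are arbitrary), assuming the OpenCV 2.x gpu module:

#include <opencv2/gpu/gpu.hpp>

using namespace cv;
using namespace cv::gpu;

Mat host_src(768, 1024, CV_8UC1, Scalar(0));  // any 8-bit single-channel host image
Mat host_dst;
GpuMat gpu_src, gpu_dst;

gpu_src.upload(host_src);            // blocks until the copy has finished
blur(gpu_src, gpu_dst, Size(5, 5));  // blocks until the kernel has finished
gpu_dst.download(host_dst);          // blocks until the result is back on the CPU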

gpu::Stream is a wrapper around cudaStream_t and allows the use of asynchronous, non-blocking calls. You can also read the "CUDA C Programming Guide" for detailed information about CUDA asynchronous concurrent execution.
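
If you need the raw handle for your own CUDA code, OpenCV 2.x exposes it through gpu::StreamAccessor (declared in opencv2/gpu/stream_accessor.hpp); a small sketch, assuming that header and the CUDA runtime are available in your build:

#include <opencv2/gpu/gpu.hpp>
#include <opencv2/gpu/stream_accessor.hpp>
#include <cuda_runtime.h>

using namespace cv::gpu;

Stream stream;

// extract the cudaStream_t wrapped by gpu::Stream
cudaStream_t raw_stream = StreamAccessor::getStream(stream);

// raw_stream can now be passed to your own kernels or cudaMemcpyAsync calls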

Most gpu module functions have an additional gpu::Stream parameter. If you pass a non-default stream, the function call will be asynchronous, and the call will be added to the stream's command queue.

gpu::Stream also provides methods for asynchronous memory transfers between CPU<->GPU and GPU<->GPU. However, asynchronous CPU<->GPU memory transfers work only with page-locked host memory. There is another class, gpu::CudaMem, that encapsulates such memory.
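
In particular, the enqueue* members mentioned in the question are the asynchronous counterparts of upload/download/copyTo/convertTo: they place the transfer or conversion on the stream's queue and return immediately. A small sketch (buffer sizes and types are arbitrary):

#include <opencv2/gpu/gpu.hpp>

using namespace cv;
using namespace cv::gpu;

CudaMem host_pl(480, 640, CV_8UC1, CudaMem::ALLOC_PAGE_LOCKED);  // page-locked host buffer
GpuMat d_src, d_copy, d_float;
Stream stream;

stream.enqueueUpload(host_pl, d_src);           // async host -> device copy
stream.enqueueCopy(d_src, d_copy);              // async device -> device copy
stream.enqueueConvert(d_src, d_float, CV_32F);  // async type conversion (like Mat::convertTo)
stream.enqueueDownload(d_copy, host_pl);        // async device -> host copy
stream.waitForCompletion();                     // block until everything queued above is done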

Currently, you may face problems if the same operation is enqueued twice with different data to different streams. Some functions use constant or texture GPU memory, and the next call may update that memory before the previous call has finished. But calling different operations asynchronously is safe, because each operation has its own constant buffer. Memory copy/upload/download/set operations on the buffers you hold are also safe.
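
A safe pattern is therefore to run different operations, each with its own input and output buffers, on different streams, for example (a sketch; the two filters chosen here are arbitrary):

#include <opencv2/gpu/gpu.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace cv;
using namespace cv::gpu;

GpuMat src1(768, 1024, CV_8UC1), dst1;  // in real code these would hold your uploaded images
GpuMat src2(768, 1024, CV_8UC1), dst2;
Stream s1, s2;

blur(src1, dst1, Size(5, 5), Point(-1, -1), s1);     // box filter queued on s1
threshold(src2, dst2, 128, 255, THRESH_BINARY, s2);  // a different operation queued on s2

s1.waitForCompletion();
s2.waitForCompletion();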

Here is a small sample:

#include <opencv2/gpu/gpu.hpp>

using namespace cv;
using namespace cv::gpu;

// allocate page-locked memory (required for asynchronous CPU<->GPU transfers)
CudaMem host_src_pl(768, 1024, CV_8UC1, CudaMem::ALLOC_PAGE_LOCKED);
CudaMem host_dst_pl;

// get Mat header for CudaMem (no data copy)
Mat host_src = host_src_pl;

// fill the Mat on the CPU (someCPUFunc is a placeholder for your own code)
someCPUFunc(host_src);

GpuMat gpu_src, gpu_dst;

// create Stream object
Stream stream;

// next calls are non-blocking

// first upload data from host
stream.enqueueUpload(host_src_pl, gpu_src);
// perform blur
blur(gpu_src, gpu_dst, Size(5,5), Point(-1,-1), stream);
// download result back to host
stream.enqueueDownload(gpu_dst, host_dst_pl);

// call another CPU function in parallel with GPU
anotherCPUFunc();

// wait for the GPU to finish
stream.waitForCompletion();

// now you can use GPU results
Mat host_dst = host_dst_pl;
