CUDA: Wrapping device memory allocation in C++
I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C, but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via cudaMalloc).
My plan was to do this myself, using an overloaded operator new with placement new, and RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code seems to work, but I'm still wondering about potential memory leaks.
The usage of the RAII code would be as follows:
CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.
Perhaps a class is overkill in this context (especially since you'd still have to use cudaMemcpy, the class only encapsulating RAII), so the other approach would be placement new:
float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);
Here, cudaDevice merely acts as a tag to trigger the overload. However, since in normal placement new this argument would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.
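On the leak question, it may help to see when a tagged operator delete actually runs. The sketch below is host-only and rests on assumptions: HostTag, hostTag, and the counters are hypothetical names, std::malloc and std::free stand in for cudaMalloc and cudaFree, and the scalar forms are used instead of the array forms for brevity. It illustrates that the language calls a placement deallocation function on its own only when a constructor throws inside the new-expression; in every other case the caller has to invoke it explicitly, since a plain delete-expression never selects the tagged overload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical tag type and tag object, standing in for cudaDevice.
struct HostTag {};
HostTag const hostTag = HostTag();

// Counters that make the allocation and release calls observable.
int g_allocs = 0;
int g_frees = 0;

// Stand-in allocation function: std::malloc replaces cudaMalloc here.
inline void* operator new(std::size_t nbytes, HostTag const&) {
    void* p = std::malloc(nbytes);
    if (!p) throw std::bad_alloc();
    ++g_allocs;
    return p;
}

// Matching placement deallocation function. It is called implicitly
// only when a constructor throws inside `new (hostTag) T(...)`.
inline void operator delete(void* p, HostTag const&) noexcept {
    ++g_frees;
    std::free(p);
}

// A type whose constructor always fails.
struct Throws {
    Throws() { throw std::runtime_error("constructor failed"); }
};

// Returns true if the failed construction released its storage.
inline bool failed_construction_is_cleaned_up() {
    int const before = g_frees;
    try {
        (void)new (hostTag) Throws;
    } catch (std::runtime_error const&) {
        // The storage has already been released at this point.
    }
    return g_frees == before + 1;
}

// Normal lifetime: the owner must call the tagged form explicitly.
inline bool explicit_release_works() {
    int const before = g_frees;
    int* p = new (hostTag) int(42);
    bool const ok = (*p == 42);
    operator delete(p, hostTag);
    return ok && g_frees == before + 1;
}
```

If the matching tagged operator delete were missing, the storage obtained for Throws would simply be leaked when its constructor throws, which is why the overloads are best kept in pairs.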
I'd appreciate criticism of every kind. Does somebody perhaps know whether something in this direction is planned for the next version of CUDA (which, as I've heard, will improve its C++ support, whatever they mean by that)?
So, my question is actually threefold:
- Is my placement new overload semantically correct? Does it leak memory?
- Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
- How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?
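#include <cstddef>         // std::size_t
#include <new>             // std::bad_alloc
#include <cuda_runtime.h>  // cudaMalloc, cudaFree, cudaSuccess

// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    CudaDevice(CudaDevice const&) = delete;             // not copyable
    CudaDevice& operator =(CudaDevice const&) = delete;
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

// Allocates nbytes of device memory; a failed cudaMalloc is reported
// as std::bad_alloc instead of returning an indeterminate pointer.
inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret = 0;
    if (cudaMalloc(&ret, nbytes) != cudaSuccess)
        throw std::bad_alloc();
    return ret;
}

// Called explicitly by the owner, or implicitly if a constructor throws
// inside `new (cudaDevice) T[n]`.
inline void operator delete [](void* p, CudaDevice const&) noexcept {
    cudaFree(p);
}

// RAII owner of a device array; converts implicitly to a raw pointer.
template <typename T>
class CudaArray {
public:
    // Note: the array new-expression runs T's constructors on the host,
    // so this only suits trivially constructible types such as float.
    explicit CudaArray(std::size_t size)
        : size(size), data(new (cudaDevice) T[size]) { }
    operator T* () { return data; }
    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }
private:
    std::size_t const size;
    T* const data;
    CudaArray(CudaArray const&) = delete;               // not copyable
    CudaArray& operator =(CudaArray const&) = delete;
};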
About the singleton employed here: Yes, I'm aware of its drawbacks. However, these aren't relevant in this context. All I needed here was a small type tag that wasn't copyable. Everything else (i.e. multithreading considerations, time of initialization) doesn't apply.
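As a point of comparison for the RAII alternative, the same ownership could be expressed with std::unique_ptr and a custom deleter rather than a hand-written class. This is a different technique from the CudaArray class in the question, and the sketch assumes C++11. All names in it (device_alloc_stub, device_free_stub, StubDeleter, StubArray, make_stub_array) are hypothetical, and std::malloc and std::free stand in for cudaMalloc and cudaFree so that the block runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Number of blocks currently allocated, to make releases observable.
int g_live_blocks = 0;

// Hypothetical stand-in for cudaMalloc: throws instead of returning a code.
inline void* device_alloc_stub(std::size_t nbytes) {
    void* p = std::malloc(nbytes);
    if (!p) throw std::bad_alloc();
    ++g_live_blocks;
    return p;
}

// Hypothetical stand-in for cudaFree.
inline void device_free_stub(void* p) noexcept {
    if (p) {
        --g_live_blocks;
        std::free(p);
    }
}

// Deleter invoked by unique_ptr when the owner goes out of scope.
struct StubDeleter {
    void operator()(void* p) const noexcept { device_free_stub(p); }
};

// Owning handle; get() yields the raw pointer for calls like cudaMemcpy.
template <typename T>
using StubArray = std::unique_ptr<T[], StubDeleter>;

// Allocates raw storage for `count` elements without constructing them.
template <typename T>
StubArray<T> make_stub_array(std::size_t count) {
    return StubArray<T>(static_cast<T*>(device_alloc_stub(count * sizeof(T))));
}

// Reports how many blocks are live while one array is in scope.
inline int live_blocks_inside_scope() {
    StubArray<float> data = make_stub_array<float>(16);
    return g_live_blocks;
}
```

Unlike CudaArray, this handle is movable, and it never runs element constructors on the host because the storage is requested as raw bytes.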
Recommended answer
I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<>, std::map<>, and so forth.
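A minimal sketch of what such an allocator class might look like, assuming C++11 or later: SketchAllocator, raw_allocate, and raw_free are hypothetical names, and the two helpers use std::malloc and std::free so that the example runs on the host. In CUDA code they would presumably wrap cudaMalloc and cudaFree instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins so the sketch runs on the host.
inline void* raw_allocate(std::size_t nbytes) {
    void* p = std::malloc(nbytes);
    if (!p) throw std::bad_alloc();
    return p;
}
inline void raw_free(void* p) noexcept { std::free(p); }

// Minimal C++11 allocator: value_type, allocate, deallocate, a
// converting constructor, and equality comparison.
template <typename T>
struct SketchAllocator {
    typedef T value_type;

    SketchAllocator() noexcept {}
    template <typename U>
    SketchAllocator(SketchAllocator<U> const&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();
        return static_cast<T*>(raw_allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { raw_free(p); }
};

// Stateless allocators always compare equal: any instance can free
// memory obtained from any other.
template <typename T, typename U>
bool operator==(SketchAllocator<T> const&, SketchAllocator<U> const&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(SketchAllocator<T> const&, SketchAllocator<U> const&) noexcept {
    return false;
}

// Usage: the allocator is passed as the second template argument.
inline float sum_of_three() {
    std::vector<float, SketchAllocator<float> > v;
    v.push_back(1.0f);
    v.push_back(2.0f);
    v.push_back(3.0f);
    return v[0] + v[1] + v[2];
}
```

One caveat with real device memory: std::vector constructs and reads its elements through the returned pointer on the host, and a pointer obtained from cudaMalloc cannot be dereferenced in host code. So an allocator like this would only cover the allocation side, not element access.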
Beware: I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to reinvent your containers and algorithms.