CUDA: Wrapping device memory allocation in C++

2022-01-10 00:00:00 cuda c++ placement-new raii

I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via cudaMalloc).

My plan was to do this myself, using overloaded operator new with placement new and RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code seems to work but I'm still wondering about potential memory leaks.

Usage of the RAII code is as follows:

CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.
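
Filling it from the host would still go through the plain CUDA copy call; a minimal sketch of that usage (host_data is an assumed host-side buffer of SIZE floats, not part of the wrapper), relying on the wrapper's implicit conversion to float*:

float host_data[SIZE];  // assumed host-side source buffer
// The wrapper converts implicitly to float*, so it can be passed to cudaMemcpy directly.
cudaMemcpy(device_data, host_data, SIZE * sizeof(float), cudaMemcpyHostToDevice);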

Perhaps a class is overkill in this context (especially since you'd still have to use cudaMemcpy, the class only encapsulating RAII) so the other approach would be placement new:

float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);

Here, cudaDevice merely acts as a tag to trigger the overload. However, since in normal placement new this would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.

I'd appreciate criticism of every kind. Does somebody perhaps know if something in this direction is planned for the next version of CUDA (which, as I've heard, will improve its C++ support, whatever they mean by that)?

So, my question is actually threefold:

  1. Is my placement new overload semantically correct? Does it leak memory?
  2. Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
  3. How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?

---

#include <cstddef>        // std::size_t
#include <cuda_runtime.h> // cudaMalloc, cudaFree

// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    CudaDevice(CudaDevice const&);
    CudaDevice& operator =(CudaDevice const&);
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

// Tag-dispatched array operator new: the CudaDevice argument only selects
// this overload; the actual allocation is forwarded to cudaMalloc.
inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret;
    cudaMalloc(&ret, nbytes);
    return ret;
}

// Matching placement delete: releases the device allocation via cudaFree.
inline void operator delete [](void* p, CudaDevice const&) throw() {
    cudaFree(p);
}

// RAII wrapper: acquires device memory in the constructor, releases it in
// the destructor, and converts implicitly to T* for use with the CUDA API.
template <typename T>
class CudaArray {
public:
    explicit
    CudaArray(std::size_t size) : size(size), data(new (cudaDevice) T[size]) { }

    operator T* () { return data; }

    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }

private:
    std::size_t const size;
    T* const data;

    CudaArray(CudaArray const&);
    CudaArray& operator =(CudaArray const&);
};

About the singleton employed here: Yes, I'm aware of its drawbacks. However, these aren't relevant in this context. All I needed here was a small type tag that wasn't copyable. Everything else (i.e. multithreading considerations, time of initialization) doesn't apply.

Recommended answer

I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter into std::vector<> and std::map<> and so forth.

Beware, I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way. And you do not need to re-invent your containers and algorithms.
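
As a rough illustration, here is a minimal sketch against the classic (C++03-style) std::allocator<> interface. The name cuda_allocator, the mapping of a cudaMalloc failure to std::bad_alloc, and the host-side construct/destroy placeholders are assumptions of this sketch rather than anything CUDA provides; in particular, a std::vector backed by it would still try to touch elements from host code, which is exactly the difficulty mentioned above.

#include <cstddef>
#include <new>
#include <cuda_runtime.h>

// Hypothetical cuda_allocator: allocation goes through cudaMalloc/cudaFree.
template <typename T>
class cuda_allocator {
public:
    typedef T              value_type;
    typedef T*             pointer;
    typedef T const*       const_pointer;
    typedef T&             reference;
    typedef T const&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U>
    struct rebind { typedef cuda_allocator<U> other; };

    cuda_allocator() { }
    template <typename U>
    cuda_allocator(cuda_allocator<U> const&) { }

    pointer allocate(size_type n, void const* = 0) {
        void* p = 0;
        if (cudaMalloc(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
        return static_cast<pointer>(p);
    }

    void deallocate(pointer p, size_type) { cudaFree(p); }

    size_type max_size() const { return static_cast<size_type>(-1) / sizeof(T); }

    // construct/destroy would have to run on the device; these host-side
    // versions are placeholders and must not be called on device pointers.
    void construct(pointer p, T const& value) { new (static_cast<void*>(p)) T(value); }
    void destroy(pointer p) { p->~T(); }
};

// All instances are interchangeable, as expected of a stateless allocator.
template <typename T, typename U>
bool operator ==(cuda_allocator<T> const&, cuda_allocator<U> const&) { return true; }
template <typename T, typename U>
bool operator !=(cuda_allocator<T> const&, cuda_allocator<U> const&) { return false; }

In principle this is what would let you write std::vector<float, cuda_allocator<float> >; whether the resulting container remains usable from host code is the open question.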
