2D char array to CUDA kernel

2022-01-10 00:00:00 gpu c cuda c++

I need help transferring a char[][] to a CUDA kernel. This is my code:

__global__ 
void kernel(char** BiExponent){
  for(int i=0; i<500; i++)
     printf("%c",BiExponent[1][i]); // I want print line 1
}

int main(){
  char (*Bi2dChar)[500] = new char [5000][500];
  char **dev_Bi2dChar;

  ...//HERE I INPUT DATA TO Bi2dChar

  size_t host_orig_pitch = 500 * sizeof(char);
  size_t pitch;
  cudaMallocPitch((void**)&dev_Bi2dChar, &pitch, 500 * sizeof(char), 5000);
  cudaMemcpy2D(dev_Bi2dChar, pitch, Bi2dChar, host_orig_pitch, 500 * sizeof(char), 5000, cudaMemcpyHostToDevice);
  kernel <<< 1, 512 >>> (dev_Bi2dChar);
  free(Bi2dChar); cudaFree(dev_Bi2dChar);
}

I use: nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" --use-local-env --cl-version 2012 -ccbin

Thanks for your help.

Recommended answer

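cudaMemcpy2D doesn't actually handle 2-dimensional (i.e. double-pointer, **) arrays in C. Note that the documentation indicates it expects single pointers (void *dst, const void *src), not double pointers. In your code, cudaMallocPitch gives dev_Bi2dChar the address of one pitched block of characters, so BiExponent[1] in the kernel reinterprets raw character data as a pointer and then dereferences it.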

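Generally speaking, moving an arbitrary double-pointer C array between the host and the device is more complicated than moving a single-pointer array, because the device needs its own copy of every row plus a table of device pointers to those rows.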

If you really want to handle the double-pointer array, search for "CUDA 2D Array" and you'll find various examples of how to do it (for example, the answer given by @talonmies).

Often, an easier approach is simply to "flatten" the array so it can be referenced by a single pointer, i.e. char[] instead of char[][], and then use index arithmetic to simulate 2-dimensional access.

Your flattened code would look something like this (the code you provided is an incomplete snippet that doesn't compile, so mine is too):

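#include <cstdio>

#define XDIM 5000   // number of rows
#define YDIM 500    // characters per row

__global__ 
void kernel(char* BiExponent){
  // Row 1 starts YDIM characters into the flattened array (rows are zero-based)
  for(int i=0; i<YDIM; i++)
     printf("%c",BiExponent[(1*YDIM)+i]); // print row 1
}

int main(){
  char (*Bi2dChar)[YDIM] = new char [XDIM][YDIM];
  char *dev_Bi2dChar;

  ...//HERE I INPUT DATA TO Bi2dChar

  // The host array is one contiguous block, so a single cudaMemcpy moves all of it
  cudaMalloc((void**)&dev_Bi2dChar, XDIM*YDIM * sizeof(char));
  cudaMemcpy(dev_Bi2dChar, &(Bi2dChar[0][0]), XDIM*YDIM * sizeof(char), cudaMemcpyHostToDevice);
  kernel <<< 1, 1 >>> (dev_Bi2dChar);  // one thread: the kernel loops over the row itself
  cudaDeviceSynchronize();             // wait for the kernel and flush its printf output
  delete[] Bi2dChar;                   // allocated with new[], so not free()
  cudaFree(dev_Bi2dChar);
}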

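If you want a pitched array, you can create it similarly (with cudaMallocPitch and cudaMemcpy2D), but it is still a single-pointer array, not a double-pointer array: you pass the pitch to the kernel along with the pointer and use it to locate each row.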
