Creating a static CUDA library to be linked with a C++ program
I am attempting to link a CUDA kernel with a C++ autotools project, but I cannot seem to get past the linking stage.
I have a file GPUFloydWarshall.cu that contains the kernel and a wrapper C function, which I would like to place into a library libgpu.a. This would be consistent with the remainder of the project. Is this at all possible?
Secondly, the library would then need to be linked with around ten other libraries into the main executable, which at the moment is linked using mpicxx.
Currently I am using/generating the commands below to compile and create the libgpu.a library:
nvcc -rdc=true -c -o temp.o GPUFloydWarshall.cu
nvcc -dlink -o GPUFloydWarshall.o temp.o -L/usr/local/cuda/lib64 -lcuda -lcudart
rm -f libgpu.a
ar cru libgpu.a GPUFloydWarshall.o
ranlib libgpu.a
When this is all linked into the main executable, I get the following error:
problem/libproblem.a(libproblem_a-UTRP.o): In function `UTRP::evaluate(Solution&)':
UTRP.cpp:(.text+0x1220): undefined reference to `gpu_fw(double*, int)'
The gpu_fw function is my wrapper function.
Accepted Answer
Is this at all possible?
Yes, it's possible. Creating a (non-CUDA) wrapper function around the kernel makes it even easier. You can make your life easier still if you rely on C++ linkage throughout (you mention a wrapper C function): mpicxx is a C++ compiler/linker alias, and CUDA files (.cu) follow C++ compiler/linker behavior by default. There are very simple examples elsewhere of building CUDA code (encapsulated in a wrapper function) into a static library.
Secondly, the library would then need to be linked with around ten other libraries into the main executable, which at the moment is linked using mpicxx.
Once you have a C/C++ (non-CUDA) wrapper exposed in your library, linking should be no different from ordinary linking of ordinary libraries. You may still need to pass the CUDA runtime library (and any other CUDA libraries you are using) in the link step, but this is conceptually the same as any other library your project depends on.
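As a concrete illustration, the final link line might look like the following. This is a sketch only: the object name, the library list, and the paths are assumptions based on the question, not a prescription.

```shell
# Hypothetical final link step with mpicxx: libgpu.a is just one more
# -l entry alongside the project's other libraries, plus the CUDA
# runtime from the toolkit's lib64 directory.
mpicxx -o main_exe main.o \
    -L. -lgpu \
    -Lproblem -lproblem \
    -L/usr/local/cuda/lib64 -lcudart
```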
It's not clear that you need device linking for what you want to do (it's acceptable, it just complicates things a bit). Either way, now that you have shown the command sequence, your construction of the library is not quite correct. The device-link command produces a device-linkable object that does not include all the necessary host pieces. To get everything in one place, we want to add both GPUFloydWarshall.o (which has the device-linked pieces) AND temp.o (which has the host code pieces) to the library.
Here is a complete example:
$ cat GPUFloydWarshall.cu
#include <stdio.h>
__global__ void mykernel(){
  printf("hello\n");
}

void gpu_fw(){
  mykernel<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ cat main.cpp
#include <stdio.h>
void gpu_fw();
int main(){
  gpu_fw();
}
$ nvcc -rdc=true -c -o temp.o GPUFloydWarshall.cu
$ nvcc -dlink -o GPUFloydWarshall.o temp.o -lcudart
$ rm -f libgpu.a
$ ar cru libgpu.a GPUFloydWarshall.o temp.o
$ ranlib libgpu.a
$ g++ main.cpp -L. -lgpu -o main -L/usr/local/cuda/lib64 -lcudart
$ ./main
hello
$
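For reference, when the kernel and its wrapper live entirely in one translation unit, as in the example above, relocatable device code is not actually required, and the -rdc/-dlink steps can be dropped. A simplified sketch of the library build in that case (assuming no cross-file device calls are ever added):

```shell
# Without -rdc, nvcc embeds fully linked device code in the host object,
# so a single object file is all the archive needs.
nvcc -c -o GPUFloydWarshall.o GPUFloydWarshall.cu
ar cru libgpu.a GPUFloydWarshall.o
ranlib libgpu.a
```

The -rdc/-dlink route shown in the main example only becomes necessary once device code in one .cu file calls device functions defined in another.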