CUDA __device__ unresolved extern function

2022-01-10 00:00:00 c linker-errors cuda c++

I am trying to understand how to decouple CUDA __device__ code into separate header files.

I have three files.

File 1: int2.cuh

    #ifndef INT2_H_
    #define INT2_H_

    #include "cuda.h"
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"

    __global__ void kernel();
    __device__ int k2(int k);
    int launchKernel(int dim);

    #endif /* INT2_H_ */

File 2: int2.cu

    #include "int2.cuh"
    #include "cstdio"

    __global__ void kernel() {
        int tid = threadIdx.x;
        printf("%d ", k2(tid));
    }

    __device__ int k2(int i) {
        return i * i;
    }

    int launchKernel(int dim) {
        kernel<<<1, dim>>>();
        cudaDeviceReset();
        return 0;
    }

File 3: CUDASample.cu

    #include <stdio.h>
    #include <stdlib.h>
    #include "int2.cuh"
    #include "iostream"

    using namespace std;

    static const int WORK_SIZE = 256;

    __global__ void sampleCuda() {
        int tid = threadIdx.x;
        // printf("%d ", k2(tid)); // Cannot call k2
        printf("%d ", tid * tid);
    }

    int main(void) {
        int var;
        var = launchKernel(16);

        kernel<<<1, 16>>>();
        cudaDeviceReset();

        sampleCuda<<<1, 16>>>();
        cudaDeviceReset();

        return 0;
    }

The code works fine. I can call the sampleCuda() kernel (in the same file), call the C function launchKernel() (in the other file), and call kernel() directly (in the other file).

The problem I am facing is calling the __device__ function k2() from the sampleCuda() kernel: the build then fails with the error below. However, the same function can be called from kernel().

    10:58:11 **** Incremental Build of configuration Debug for project CUDASample ****
    make all
    Building file: ../src/CUDASample.cu
    Invoking: NVCC Compiler
    /Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 -gencode arch=compute_20,code=sm_20 -odir "src" -M -o "src/CUDASample.d" "../src/CUDASample.cu"
    /Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 -x cu -o "src/CUDASample.o" "../src/CUDASample.cu"
    ../src/CUDASample.cu(18): warning: variable "var" was set but never used
    ../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
    ../src/CUDASample.cu(18): warning: variable "var" was set but never used
    ../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
    ptxas fatal : Unresolved extern function '_Z2k2i'
    make: *** [src/CUDASample.o] Error 255

    10:58:14 Build Finished (took 2s.388ms)

Recommended answer

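The issue is that the __device__ function k2() is defined in a different compilation unit (int2.cu) from the __global__ function that calls it (sampleCuda() in CUDASample.cu). Your build passes --relocatable-device-code=false, which is also nvcc's default, so every device function a kernel calls must be defined in the same compilation unit as that kernel. That is why kernel() in int2.cu can call k2() but sampleCuda() cannot. You need to either explicitly enable relocatable device code by compiling with the -dc flag, or move the definition into the same unit as the caller.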

From the nvcc documentation:

--device-c|-dc Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile.
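As a rough sketch of how the -dc route could look for the files in the question: the listing below reduces CUDASample.cu to the failing call and assumes int2.cuh and int2.cu are used unchanged. The build commands in the leading comment assume nvcc is invoked by hand rather than through the IDE-generated makefile. The object and binary names are placeholders, and architecture flags are left out.

```cuda
// CUDASample.cu, reduced to the failing case.
//
// Build sketch (object and binary names are placeholders):
//   nvcc -dc int2.cu -o int2.o
//   nvcc -dc CUDASample.cu -o CUDASample.o
//   nvcc int2.o CUDASample.o -o CUDASample
// The last step links through nvcc so that the device code in the two
// objects is linked as well, which resolves the reference to k2.

#include <cstdio>
#include "int2.cuh"  // from the question: declares __device__ int k2(int)

__global__ void sampleCuda() {
    int tid = threadIdx.x;
    // k2 is defined in int2.cu; with -dc this call is resolved at link time.
    printf("%d ", k2(tid));
}

int main() {
    sampleCuda<<<1, 16>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaDeviceSynchronize();
    cudaDeviceReset();
    return 0;
}
```

Both .cu files are compiled with -dc here, since the definition and the caller each need to be emitted as relocatable device code.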

See Separate Compilation and Linking of CUDA C++ Device Code for more information.
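For the other option, moving the definition into the same unit as the caller, one common pattern is to define the small device function inline in a header, so every .cu file that includes it compiles its own copy and no device linking is involved. The sketch below is an illustration under assumptions, not the asker's code: the file name, the include guard and launchSample() are hypothetical, and __host__ is added to k2 only so that it can also be checked on the CPU.

```cuda
// sample_inline.cu (hypothetical file name).
#include <cstdio>
#include <cuda_runtime.h>

// ---- This part could live in a shared header such as int2.cuh ----
#ifndef K2_INLINE_H_   // hypothetical include guard
#define K2_INLINE_H_
// Defined inline, so each including .cu file gets its own copy of k2.
// The question's k2 is __device__ only; __host__ is an addition made here
// so the function can also be called from host code.
__host__ __device__ inline int k2(int i) { return i * i; }
#endif
// ---- End of header part ----

// The kernel and k2 are now in the same compilation unit, so the default
// --relocatable-device-code=false build can resolve the call.
__global__ void sampleCuda() {
    int tid = threadIdx.x;
    printf("%d ", k2(tid));
}

// Hypothetical host wrapper, modeled on launchKernel() from the question.
// Returns 0 on success and 1 if the launch or the kernel failed.
int launchSample(int dim) {
    sampleCuda<<<1, dim>>>();
    // Wait for the kernel so its printf output is flushed before returning.
    cudaError_t err = cudaDeviceSynchronize();
    return err == cudaSuccess ? 0 : 1;
}
```

Calling launchSample(16) from main() would then print the squares of the thread indices 0 to 15 without any change to the compiler flags. The trade-off is that the function body is compiled into every unit that includes the header, which is usually acceptable for small helpers like k2.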
