CUDA __device__ unresolved extern function
I am trying to understand how to decouple CUDA __device__ code into separate header files.
I have three files.
File 1: int2.cuh
#ifndef INT2_H_
#define INT2_H_
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
__global__ void kernel();
__device__ int k2(int k);
int launchKernel(int dim);
#endif /* INT2_H_ */
File 2: int2.cu
#include "int2.cuh"
#include "cstdio"
__global__ void kernel() {
    int tid = threadIdx.x;
    printf("%d\n", k2(tid));
}
__device__ int k2(int i) {
    return i * i;
}
int launchKernel(int dim) {
    kernel<<<1, dim>>>();
    cudaDeviceReset();
    return 0;
}
File 3: CUDASample.cu
#include <stdio.h>
#include <stdlib.h>
#include "int2.cuh"
#include "iostream"
using namespace std;
static const int WORK_SIZE = 256;
__global__ void sampleCuda() {
    int tid = threadIdx.x;
    // printf("%d\n", k2(tid)); // Cannot call k2
    printf("%d\n", tid * tid);
}
int main(void) {
    int var;
    var = launchKernel(16);
    kernel<<<1, 16>>>();
    cudaDeviceReset();
    sampleCuda<<<1, 16>>>();
    cudaDeviceReset();
    return 0;
}
The code works fine. I can call the sampleCuda() kernel (in the same file), call the C function launchKernel() (in the other file), and call kernel() directly (in the other file).
The problem I am facing is calling the __device__ function k2() from the sampleCuda() kernel: the build then fails with the following error. However, the same function is callable from kernel().
10:58:11 **** Incremental Build of configuration Debug for project CUDASample ****
make all
Building file: ../src/CUDASample.cu
Invoking: NVCC Compiler
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 -gencode arch=compute_20,code=sm_20 -odir "src" -M -o "src/CUDASample.d" "../src/CUDASample.cu"
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 -x cu -o "src/CUDASample.o" "../src/CUDASample.cu"
../src/CUDASample.cu(18): warning: variable "var" was set but never used
../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
../src/CUDASample.cu(18): warning: variable "var" was set but never used
../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
ptxas fatal : Unresolved extern function '_Z2k2i'
make: *** [src/CUDASample.o] Error 255
10:58:14 Build Finished (took 2s.388ms)
Recommended answer
The issue is that you defined a __device__ function in a separate compilation unit from the __global__ function that calls it. Your build log shows --relocatable-device-code=false, so the device code in CUDASample.cu cannot be linked against the definition of k2() in int2.cu, and ptxas reports the symbol as unresolved. You need to either explicitly enable relocatable device code mode by adding the -dc flag, or move the definition into the same compilation unit.
From the nvcc documentation:
--device-c|-dc
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile.
See Separate Compilation and Linking of CUDA C++ Device Code for more information.
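As a rough sketch of the second option (moving the definition into the same unit), the body of k2() could be placed where every calling kernel can see it, for example in int2.cuh in place of the bare declaration. The use of inline and the single-file layout below are assumptions made for illustration, not the asker's actual code.

```cuda
// Sketch only: k2 is defined (not just declared) before the kernel that calls it.
// In the question's layout this definition would live in int2.cuh (assumption).
#include <cstdio>
#include <cuda_runtime.h>

// `inline` lets several .cu files include this definition without
// multiple-definition conflicts.
__device__ inline int k2(int i) {
    return i * i;
}

__global__ void sampleCuda() {
    int tid = threadIdx.x;
    // The body of k2 is visible in this compilation unit,
    // so no device linking is required.
    printf("%d\n", k2(tid));
}

int main() {
    sampleCuda<<<1, 16>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush device-side printf
    cudaDeviceReset();
    return 0;
}
```

If k2() should stay in int2.cu instead, the other option is to compile both files as relocatable device code and let nvcc link them, along the lines of `nvcc -dc int2.cu CUDASample.cu` followed by `nvcc int2.o CUDASample.o -o CUDASample` (the object and output file names here are assumed).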