External calls are not supported - CUDA
The objective is to call a device function defined in another file. When I compile the global kernel, it shows the following error: *External calls are not supported (found non-inlined call to _Z6GoldenSectionCUDA)*.
Problematic code (not the full code, only the part where the problem arises):
cat norm.h
# ifndef NORM_H_
# define NORM_H_
# include<stdio.h>
__device__ double invcdf(double prob, double mean, double stddev);
#endif
cat norm.cu
# include <norm.h>
__device__ double invcdf(double prob, double mean, double stddev) {
    return (mean + stddev*normcdfinv(prob));
}
cat test.cu
# include <norm.h>
# include <curand.h>
# include <curand_kernel.h>
__global__ void phase2Kernel(double* out_profit, struct strategyHolder* strategy) {
    curandState seedValue;
    curand_init(threadIdx.x, 0, 0, &seedValue);
    double randomD = invcdf(curand_uniform_double( &seedValue ), 300, 80);
}
nvcc -c norm.cu -o norm.o -I"."
nvcc -c test.cu -o test.o -I"."
Recommended answer
You're trying to do separate compilation, which needs some special command-line options. With a plain `-c`, each .cu file is compiled as a self-contained unit, so the kernel in test.cu cannot call a device function that is defined in norm.cu. See the NVCC manual for details, but here's how to get your example to compile. I've targeted sm_20, but you can target sm_20 or later depending on which GPU you have. Separate compilation is not possible on older devices (sm_1x).
- You don't need to declare the `__device__` function as `extern` in your header file, but if you have any static device variables, they will need to be declared as `extern`.
- Generate relocatable code for the device by compiling as shown below (`-dc` is the device equivalent of `-c`; see the manual for more information):
nvcc -arch=sm_20 -dc norm.cu -o norm.o -I.
nvcc -arch=sm_20 -dc test.cu -o test.o -I.
- Link the device parts of the code by calling nvlink before the final host link:
nvlink -arch=sm_20 norm.o test.o -o final.o
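As an alternative to calling nvlink by hand, nvcc can perform the device link itself when it is given the relocatable objects. The following is a minimal sketch, not a definitive build recipe. It assumes that one of the source files also defines a host `main` (the snippets in the question do not show one), that `app` is a hypothetical output name, and that `ARCH` is set to an architecture your toolkit and GPU both support.

```shell
# Assumption: replace with the architecture of your GPU (sm_20 or later).
ARCH=sm_20

# Compile each translation unit to relocatable device code (-dc).
nvcc -arch="$ARCH" -dc norm.cu -o norm.o -I.
nvcc -arch="$ARCH" -dc test.cu -o test.o -I.

# Passing the objects back to nvcc runs the device link and the host link
# in one step. "app" is a hypothetical name, and this step only succeeds
# if a host main() is defined in one of the objects.
nvcc -arch="$ARCH" norm.o test.o -o app
```

Keeping the same `-arch` value on the compile and link steps avoids mixing device code built for different targets.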