CUDA Hello World printf not working even with -arch=sm_20

2022-01-10 00:00:00 cuda c++

I didn't think I was a complete newbie with CUDA, but apparently I am.

I recently upgraded my CUDA device from one with compute capability 1.3 to one with compute capability 2.1 (GeForce GT 630). I decided to do a full upgrade to CUDA Toolkit 5.0 as well.

I can compile general CUDA kernels, but printf is not working even with -arch=sm_20 set.

Code:

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){

    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    return 0;
}

Compiler output:

Error   2   error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code="sm_20,compute_10" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include"  -G   --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile -arch=sm_20  -g   -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o "Debug\main.cu.obj" "d:\users\tore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu"" exited with code 2.  C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets  592 10  testCuda
Error   1   error : calling a __host__ function("printf") from a __global__ function("test") is not allowed d:\users\tore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu    9   1   testCuda

I'm about done with life because of this problem... done, done, done. Please talk me down from the rooftops with an answer.

Recommended answer

In-kernel printf is only supported on hardware of compute capability 2.0 or higher. Because your project is set to build for both compute capability 1.0 (compute_10 on the command line) and compute capability 2.0 (sm_20), nvcc compiles the code multiple times and builds a multi-architecture fatbinary object. The error is generated during the compute capability 1.0 compilation pass, because the printf call is unsupported for that architecture.

If you remove the compute capability 1.0 build target from your project, the error will disappear.

Alternatively, you could write the kernel like this:

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World");
#endif
}

The __CUDA_ARCH__ symbol will only be >= 200 when building for compute capability 2.0 or higher targets, so this guard lets you compile the code for compute capability 1.x devices without hitting a compilation error.
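To make the behaviour of __CUDA_ARCH__ a little more concrete, here is a small sketch that is not part of the original answer. It relies on the fact that nvcc defines __CUDA_ARCH__ only during device compilation passes and leaves it undefined while compiling host code. The function name compiledFor is made up for illustration.

```cuda
#include <cstdio>
#include <cstring>

// Hypothetical helper: reports which compilation pass produced this copy
// of the function. nvcc compiles it once for the host and once per
// device architecture requested on the command line.
__host__ __device__ inline const char* compiledFor()
{
#if !defined(__CUDA_ARCH__)
    return "host";          // host pass: __CUDA_ARCH__ is not defined
#elif __CUDA_ARCH__ >= 200
    return "device, 2.0+";  // device pass, compute capability 2.0 or higher
#else
    return "device, 1.x";   // device pass, compute capability 1.x
#endif
}

int main()
{
    // Called from host code, so the host-pass copy is the one that runs.
    printf("%s\n", compiledFor());
    return 0;
}
```

The same preprocessor test is what lets the guarded kernel above keep its printf call out of the compute capability 1.x pass.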

If you compile for the correct architecture and still get no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this, add a synchronizing call after the kernel launch in the host code.

For example:

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

[disclaimer: all code written in browser, never compiled, use at own risk]

If you do both things, you should be able to compile, run and see output.
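For reference, the two fixes can be combined into one file along the following lines. This is only a sketch and has not been tested on the toolkit and GPU from the question. The helper name launchAndSync and the error reporting are additions for illustration, and it assumes the file is built for a compute capability 2.0 or higher target (for example with -arch=sm_20 and no compute_10 target left in the project).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Guarded kernel: the printf call is only compiled for compute capability
// 2.0 or higher, so a leftover 1.x build target no longer breaks the build.
__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World\n");
#endif
}

// Hypothetical helper: launch the kernel, then wait for it to finish.
cudaError_t launchAndSync()
{
    test<<<1, 1>>>();

    // Catch launch failures such as an invalid configuration or a binary
    // that contains no code for the installed device.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }

    // Blocks until the kernel has completed; device-side printf output
    // is flushed at this synchronization point.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = launchAndSync();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Checking the return values is optional, but it helps distinguish "the kernel never ran" from "the kernel ran and the output was not flushed", which look identical when nothing is printed.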
