Incomplete output from printf() called on device
To test printf() calls on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. Although the array is copied to the device correctly, printf() does not work as expected: the first several hundred numbers are missing from the output. The array size in the code is 4096. Is this a bug, or am I not using the function properly? Thanks in advance.
My GPU is a GeForce GTX 550 Ti, with compute capability 2.1.
My code:
#include<stdio.h>
#include<stdlib.h>
#define N 4096

__global__ void Printcell(float *d_Array , int n){
    int k = 0;
    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\nTotally %d elements has been printed", k);
}

int main(){
    int i =0;
    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;
    for(i=0;i<N;i++)
        Array[i] = i;
    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    Printcell<<<1,1>>>(d_Array, N); //Print the device array by a kernel
    cudaDeviceSynchronize();
    /* Copy the device array back to host to see if it was correctly copied */
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);
    printf("\n");
    for(i=0;i<N;i++){
        printf("%f ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }
}
Recommended answer
printf from the device has a limited queue. It is intended for small-scale, debug-style output, not large-scale output.
From the programmer's guide:
The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.
Your in-kernel printf output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was dumped into the standard I/O queue.
The linked documentation also indicates that the buffer size can be increased.
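As a rough sketch of what increasing the buffer could look like, the following uses the runtime calls cudaDeviceSetLimit() and cudaDeviceGetLimit() with cudaLimitPrintfFifoSize before the kernel launch. The helper name setPrintfFifoSize and the 8 MB value are assumptions chosen for illustration, not values from the question or the guide; the size actually needed depends on how much output the kernel produces.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4096

// Same idea as the kernel in the question: one thread prints every element.
__global__ void printCells(const float *d_array, int n) {
    for (int k = 0; k < n; k++) {
        printf("%f ", d_array[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
    printf("\nTotal: %d elements printed\n", n);
}

// Hypothetical helper (not part of CUDA): request a device printf buffer of
// `bytes` and return the size the runtime reports afterwards.
size_t setPrintfFifoSize(size_t bytes) {
    size_t actual = 0;
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes);
    cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize);
    return actual;
}

int main() {
    static float h_array[N];
    for (int i = 0; i < N; i++) h_array[i] = (float)i;

    float *d_array = NULL;
    cudaMalloc((void **)&d_array, N * sizeof(float));
    cudaMemcpy(d_array, h_array, N * sizeof(float), cudaMemcpyHostToDevice);

    // The buffer size is fixed at kernel launch, so set it beforehand.
    // 8 MB is an arbitrary illustrative value.
    size_t fifo = setPrintfFifoSize(8 * 1024 * 1024);
    printf("printf buffer size: %zu bytes\n", fifo);

    printCells<<<1, 1>>>(d_array, N);
    cudaDeviceSynchronize();  // in-kernel output is flushed to stdout here

    cudaFree(d_array);
    return 0;
}
```

Even with a larger buffer, in-kernel printf remains a debugging aid; for bulk output, copying the array back to the host and printing it there (as the question already does for verification) is the more robust route.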