printing from cuda kernels - c

I am writing a CUDA program and trying to print from inside a kernel using printf. When I compile the program, I get this error:
error : calling a host function("printf") from a __device__/__global__ function("agent_movement_top") is not allowed
error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -Xcompiler "/EHsc /nologo /Od /Zi /MDd " -o "Debug\" "C:\Users\umdutta\Desktop\SANKHA_ALL_MATERIALS\PROGRAMMING_FOLDER\ABM_MODELLING_2D_3D\TRY_NUM_2\test_proj_test\test_proj\test_proj\"" exited with code 2.
I am using a GTX 560 Ti, which has a compute capability above 2.0. When I searched for how to print from CUDA kernels, I saw that I need to change the compile target from sm_10 to sm_20 to take full advantage of the card. Some also suggested cuPrintf for this purpose. I am a bit confused about what to do and what the simplest and quickest way is to get printouts on my console. If I need to change the nvcc target from 1.0 to 2.0, how do I do that? I should also mention that I am using Windows 7 and programming in Visual Studio 2010. Thanks for all your help.

To enable plain printf() on devices of compute capability >= 2.0, compile for at least CC 2.0 and disable the default, which includes a build for CC 1.0.
Right-click the .cu file in your project, select Properties, select Configuration Properties | CUDA C/C++ | Device. Click on the Code Generation line, click the triangle, select Edit. In the Code Generation dialog box, uncheck Inherit from parent or project defaults, type compute_20,sm_20 in the top window, click OK.

You can write this code to print whatever you want from inside the CUDA kernel:
#if __CUDA_ARCH__ >= 200
    printf("%d \n", tid);
#endif
and include <stdio.h>.

One way of solving this problem is to use the cuPrintf function, which can print from kernels. Copy the files and cuPrintf.cuh from the folder
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\src\simplePrintf
to the project folder. Then add the header file cuPrintf.cuh to your project and add
#include ""
to your code. Then write your code in the following form:
#include ""
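__global__ void testKernel(int val)
{
    cuPrintf("Value is: %d\n", val);
}
int main()
{
    cudaPrintfInit();                 // allocate the device-side print buffer
    testKernel<<< 2, 3 >>>(10);
    cudaPrintfDisplay(stdout, true);  // flush buffered output to stdout, with thread IDs
    cudaPrintfEnd();                  // release the print buffer
    return 0;
}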
By following the above procedure, one can print to the console window from a device function.
Though I solved my problem this way, I still don't have a solution for using printf from a device function. If it is truly necessary to change my nvcc target from sm_10 to sm_21 to enable printf, it would be very helpful if someone could show me how. Thanks for all your cooperation.

I am using a GTX 1650 and a GTX 1050 with C++11. For recent users, this is my suggestion:
In host function:
using namespace std;
cout<< .....(anything you want) << endl;
In kernel:
if (threadIdx.x == 0)
    printf("ss=%4.2f \n", ss);
Note that this "if" is quite important, and I notice nobody mentioned it: you may be running a lot of threads, and you definitely do not want every one of them to print. Also, %4.2f means a minimum field width of 4 characters with 2 digits after the decimal point, which avoids printing long runs of zeros. And do not forget the \n to end the line.
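To make the %4.2f behaviour concrete, here is a small host-side sketch in plain C. It assumes only that the format specifier behaves the same in device printf as in the C library for this case. format_value is a hypothetical helper introduced for illustration, not something from the answer above.

```c
#include <stdio.h>

/* Hypothetical helper: format one value with the same "%4.2f" specifier
 * used in the kernel printf. Returns the number of characters written
 * (as snprintf does), not counting the terminating '\0'. */
int format_value(char *buf, size_t len, double v)
{
    return snprintf(buf, len, "%4.2f", v);
}

/* Examples of what ends up in buf:
 *   3.14159  -> "3.14"    (exactly 4 characters wide)
 *   0.5      -> "0.50"    (always 2 digits after the point)
 *   123.456  -> "123.46"  (4 is a minimum width, not a limit)
 */
```

The width only pads short values; it never truncates, so large values still print in full.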
You can also consider this to print shared memory values:
for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 8; j++)
        printf("%4.2f ", ashare[i*8+j]);
    printf("\n");  // end each row of 8 values
}
This prints the shared memory as a neat 64 x 8 grid. Note that this also needs to be restricted to a single thread, for example with threadIdx.x==0.


Debugging C code that uses libtiff

I've written a lot of code over the years, but I haven't done much with C in the context of Linux. Nor am I as familiar as I feel I should be with some of the tools and utilities. Thanks in advance for your indulgence.
I'm trying to write some C code that uses libtiff. I need to be able to debug it line by line, including stepping through the libtiff source as appropriate. I'm using the Code::Blocks IDE and have that configured and working for basic "hello world" code, as well as a rudimentary calling of libtiff for proof-of-concept purposes. This is all working.
Here's my code:
#include "tiffio.h"

int main(void)
{
    TIFF* tif = TIFFOpen("test0.tiff", "r");
    if (tif) {
        uint32 imagelength;
        tdata_t buf;
        uint32 row;
        TIFFGetField(tif, TIFFTAG_IMAGELENGTH, &imagelength);
        buf = _TIFFmalloc(TIFFScanlineSize(tif));
        for (row = 0; row < imagelength; row++)
            TIFFReadScanline(tif, buf, row, 0);
        _TIFFfree(buf);   /* release the scanline buffer */
        TIFFClose(tif);   /* close the file handle */
    }
    return 0;
}
Stepping through my code above works fine. However, I can't step into any of the libtiff function calls. I'm currently on Ubuntu, using the default libtiff installed via apt-get. I'm assuming based on some reading I've done that the library isn't built with debugging symbols, which may be the source of my problem.
I'm assuming if that's indeed the problem, that I can compile a custom version of libtiff with the options I need, and have Code::Blocks compile/link against it instead of against the system default libraries. I've downloaded a fresh copy of libtiff, and am familiar with the make/make install process, but I'm not sure about the specifics of getting the compile set up properly for what I need. Some direction would be much appreciated.
Problem solved by uninstalling the system libtiff (not strictly necessary but was easiest for me to avoid any ambiguity on what version of libtiff I was using). Then configured Code::Blocks as follows (Project->Build Options):
Produce debugging symbols (-g) is checked
Enable common compiler warnings is checked
Other Compiler Options set to -fPIC
Linker Settings -> Other Linker Options set to -ltiff -L
Search Directories -> Compiler set to
Search Directories -> Linker set to
$LD_LIBRARY_PATH set to /home/depaan/amcdev/libtiff0/lib in Settings -> Environment -> Environment Variables (menu)
I'd previously compiled libtiff locally with the usual configure, make, make install sequence, with CFLAGS set to "-g" before running configure:
export CFLAGS="-g"
./configure --prefix=<desired_libtiff_location>

system command not executing with mpiicc -O

I have Intel Parallel Studio XE Cluster Edition 2015 on my 10-node server, connected with InfiniBand. I wrote my code in C. It runs shell commands through system(), some of them built with sprintf, as below:
printf("started \n");
system("cp metis_input.txt $HOME/metis-4.0/.");
sprintf(filename,"$HOME/metis-4.0/./partdmesh metis_input.txt %d",size-1);
sprintf(filename,"mv metis_input.txt.npart.%d nodes_proc.txt",size-1);
printf("completed \n");
When I compile my code without any optimization flags, it runs smoothly, but when I compile with "mpiicc -O" the lines above don't seem to be executed at all. I think they are being skipped; only the printf calls run. Do I need to add anything to my code (such as extra headers) to get these system commands running with the Intel MPI compiler and -O?
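One thing worth checking in the snippet as posted: the two sprintf calls only write a command string into filename, and nothing shown passes that string to system(), so those two commands would not run at any optimization level. Below is a minimal sketch of the pattern, assuming the intent was to execute each formatted command. run_command is a hypothetical helper, not part of the original code. It uses snprintf to avoid overflowing the buffer and checks the status so a failed command is visible.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): format a shell command
 * that takes one integer argument, run it with system(), and return the
 * status system() reported. Returns -1 without running anything if the
 * formatted command does not fit in the buffer. */
int run_command(const char *fmt, int arg)
{
    char cmd[512];
    int n = snprintf(cmd, sizeof cmd, fmt, arg);
    if (n < 0 || (size_t)n >= sizeof cmd) {
        fprintf(stderr, "command too long or formatting failed\n");
        return -1;
    }

    int status = system(cmd);   /* this is the call that runs the command */
    if (status != 0)
        fprintf(stderr, "command failed (status %d): %s\n", status, cmd);
    return status;
}
```

With this helper, the second command from the question would be written as run_command("mv metis_input.txt.npart.%d nodes_proc.txt", size-1).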

YouCompleteMe suggests only "local" used code

I'm trying YCM for the first time, so to get it working I decided to give YCM-Generator a try, which generates the file automatically based on the makefile.
So far my program is just a simple hello world.
#include <stdio.h>
int main()
{
    printf("Hello World!");
    return 0;
}
I'm using the CMakeLists.txt trick to generate the makefile.
file(GLOB sources *.h *.c)
add_executable(Foo ${sources})
then after executing the YCM-Generator script, I get this output
Running cmake in '/tmp/tmp_YknVy'... $ cmake
Running make... $ make -i -j4
Cleaning up...
Build completed in 1.5 sec
Collected 2 relevant entries for C compilation (0 discarded).
Collected 0 relevant entries for C++ compilation (0 discarded).
Created YCM config file with 0 C flags
The YCM plugin does find the file, but the auto-completion doesn't work right. For example, if I type "floa", it doesn't suggest "float"; it only suggests things that I used before, like "int" or "printf".
Am I missing something, or is this working as intended?
So I fixed it.
For c it does require a , while a friend of mine could make it work without one in c++.
The auto-complete only suggests functions automatically if they were previously used; if you don't remember a function name, you have to press <Ctrl-Space>.
YCM-Generator didn't do the job, so I modified the example file myself following the comments.
If you are used to Visual Assist, the auto-complete works, but it's really weak compared to VA, which is a shame... I really hope someone ports that plugin to Linux.

Getting OpenMP running in Code::Blocks

I am trying to teach myself OpenMP using Windows 7, but I am having a hard time getting Code::Blocks to compile a basic hello world program:
#include <omp.h>
#include <stdio.h>
int main()
{
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
I have made some progress, but there is one remaining persistent error that I can't get rid of.
I have -fopenmp in my compiler "Compiler->Compiler Settings->Other Options"
I have -gomp and -pthreads in "Compiler->Linker Settings->Other linker options"
I have C:\Program File (x86)\Codeblocks\MinGW\gcc\mingw32\bin in "Compiler->Toolchain executables->Additional Paths"
When I compile, I get the error: "ld.exe: cannot find -lpthread"
Can someone suggest what I might have set up wrong?
The linker complains about a missing library. pthreads is the library that implements the threading interface that your OpenMP implementation uses to do all the threading stuff.
The library is called "libpthread.a" (static version) and "" (dynamic version) on the disk. Try to find these two on the file system under your MinGW directory. They likely reside in a directory called "lib" or "lib64". If either one is missing, then you might need to install an additional package.

Linux C: Shell-like environment - for individual execution - of C commands? (C interpreter)

Sorry if the question is worded wrong - I don't know the right word for what I'm asking for! :)
Say, you have some simple C program like:
#include <stdio.h>
int main()
{
    int a=2;
    printf("Hello World %d\n", a);
    return 0;
}
Typically, this would have to be saved in a file (say, hello.c); then we run gcc on the source file and obtain executable file - and if we compiled in debug information, then we can use gdb on the executable, to step through lines of code, and inspect variables.
What I would like to have, is basically some sort of a "C" shell - similar to the Python shell; in the sense that I can have a sequence of Python commands in a file (a script) - or I can just paste the same commands in the shell, and they will execute the same. In respect to the simple program above, this is what I'd like to be able to do (where C> represents the imagined prompt):
C> #include <stdio.h>
(stdio.h included)
C> int a=2;
C> printf("Hello World %d\n", a);
Hello World 2
In other words, I'd like to be able to execute individual C commands interactively (I'm guessing this would represent on-the-fly compilation of sorts?). Initially I was misled by the name of the C shell (csh) - but I don't think it will be able to execute C commands on the fly.
So, first and foremost, I'd like to know if it is possible somehow to persuade, say, gdb to perform in this manner? If not, is there anything else that would allow me to do something similar (some special shell, maybe)?
As for the context - I have some code where I have problems troubleshooting pointers between structs and such; here the way gdb can printout structs works very well - however, to isolate the problem, I have to make new source files, paste in data, compile and debug all over again. In this case, I'd much rather have the possibility to paste several structs (and their initialization commands) in some sort of a shell - and then, inspect using printf (or even better, something akin to gdb's print) typed directly on the shell.
Just for the record - I'm not really persuaded something like this really exists; but I thought I'd ask anyways :)
Thanks in advance for any answers,
EDIT: I was a bit busy, so haven't had time to review all answers yet for accept (sorry :) ); just wanted to add a little comment re: "interpreted vs. machine code"; or as mentioned by @doron:
The problem with running C /C++ source interactively is that
the compiler is not able to perform line by line interpretation of the code.
I am fully aware of this - but let's imagine a command line application (could even be an interpreted one), that gives you a prompt with a command line interface. At start, let's assume this application generates this simple "text file" in memory:
##HEADER##
int main()
{
##MAIN##
return 0;
}
Then, the application simply waits for a text to be entered at the prompt, and ENTER to be pressed; and upon a new line:
The application checks:
if the line starts with #define or #include, then it is added below the ##HEADER## - but above the int main() line - in the temp file
anything else, goes below ##MAIN## line - but above return 0; line - in the temp file
the temp file is stripped of ##HEADER## and ##MAIN## lines, and saved to disk as temp.c
gcc is called to compile temp.c and generate temp.out executable
if fail, notify user, exit
gdb is called to run the temp.out executable, with a breakpoint set at the return 0; line
if fail, notify user, exit
execution is returned to the prompt; the next commands the user enters, are in fact passed to gdb (so the user can use commands like p variable to inspect) - until the user presses, say, Ctrl+1 to exit gdb
Ctrl+1 - gdb exits, control is returned to our application - which waits for the next code line all over again.. etc
(subsequent code line entries are kept in the temp file - placed below the last entry from the same category)
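The line-routing part of the steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: struct session and the three session_* functions are hypothetical names, the buffers have a fixed size, and the gcc and gdb invocations are left out entirely. It keeps two growing sections instead of literal ##HEADER## and ##MAIN## marker lines, which gives the same temp.c text once the markers are stripped.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TEXT 4096

/* Hypothetical session state: lines collected for the header section
 * (above int main()) and for the body of main() (above return 0;). */
struct session {
    char header[MAX_TEXT];
    char body[MAX_TEXT];
};

/* Start with both sections empty. */
void session_init(struct session *s)
{
    s->header[0] = '\0';
    s->body[0] = '\0';
}

/* Append one line plus a newline to dst. Returns 0 on success,
 * -1 if the line would not fit in a buffer of cap bytes. */
static int append_line(char *dst, size_t cap, const char *line)
{
    size_t used = strlen(dst);
    size_t len = strlen(line);
    if (used + len + 2 > cap)       /* line + '\n' + '\0' */
        return -1;
    memcpy(dst + used, line, len);
    dst[used + len] = '\n';
    dst[used + len + 1] = '\0';
    return 0;
}

/* Route a line as described in the steps: #define and #include go to the
 * header section, anything else goes into the body of main(). New lines
 * land below earlier ones from the same category. */
int session_add(struct session *s, const char *line)
{
    if (strncmp(line, "#define", 7) == 0 || strncmp(line, "#include", 8) == 0)
        return append_line(s->header, sizeof s->header, line);
    return append_line(s->body, sizeof s->body, line);
}

/* Produce the text that would be saved as temp.c. Returns the number of
 * characters the full text needs, as snprintf does. */
int session_render(const struct session *s, char *out, size_t cap)
{
    return snprintf(out, cap, "%sint main()\n{\n%sreturn 0;\n}\n",
                    s->header, s->body);
}
```

A driver loop would call session_add for each line typed at the prompt, write the session_render output to temp.c, and then hand over to gcc and gdb as in the remaining steps.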
Obviously, I wouldn't expect to be able to paste the entire linux kernel code into an application like this, and expect it to work :) However, I would expect to be able to paste in a couple of structs, and to inspect the results of statements like, say:
char dat = (char) (*(int16_t*)(my->structure->pdata) >> 32 & 0xFF) ^ 0x88;
... so I'm sure in what is the proper syntax to use (which is usually what I mess up with) - without the overhead of rebuilding and debugging the entire software, just to figure out whether I should have moved a right parenthesis before or after the asterisk sign (in the cases when such an action doesn't raise a compilation error, of course).
Now, I'm not sure of the entire scope of problems that can arise from a simplistic application architecture as above. But, it's an example, that simply points that something like a "C shell" (for relatively simple sessions/programs) would be conceptually doable, by also using gcc and gdb - without any serious clashes with the, otherwise, strict distinction between 'machine code' and 'interpreted' languages.
There are C interpreters.
Look for Ch or CINT.
Edit: found a new (untested) thing that appears to be what the OP wants
Or just use it [...] like driving a Ferarri on city streets.
Tiny C Compiler
[... many features, including]
C script supported : just add '#!/usr/local/bin/tcc -run' at the first line of your C source, and execute it directly from the command line.
When your CPU runs a computer program, it runs something called machine code. This is a series of binary instructions that are specific to the CPU that you are using. Since machine code is quite hard to write by hand, people invented higher-level languages like C and C++. Unfortunately, the CPU only understands machine code, so we run a compiler that converts the high-level source language into machine code. Languages in this class, such as C and C++, are called compiled languages. They are said to run natively, since the generated machine code is run by the CPU without any further interpretation.
Certain other languages, like Python, Bash and Perl, do not need to be compiled beforehand and are interpreted instead. This means that the interpreter reads the source file line by line and performs the task each line describes. This gives you the ability to run things in an interactive shell, as we see in Python.
The problem with running C /C++ source interactively is that the compiler is not able to perform line by line interpretation of the code. It is designed solely to generate corresponding machine code and therefore cannot run your C / C++ source interactively.
@buddhabrot and @pmg - thank you for your answers!
For the benefit of n00bery, here is a summary of the answers (as I couldn't immediately grasp what is going on): what I needed (in OP) is handled by what is called a "C Interpreter" (not a 'C shell'), of which the following were suggested:
CINT | ROOT - Ubuntu: install as sudo apt-get install root-system-bin (5.18.00-2.3ubuntu4 + 115MB of additional disk space)
c-repl (c-repl README)- Ubuntu: install as sudo apt-get install c-repl (c-repl_0.0.20071223-1_i386.deb + 106kB of additional disk space)
Ch standard edition - standard edition is freeware for windows/Unix
For c-repl - there is a quick tutorial on c-repl homepage as an example session; but here is how the same commands behave on my Ubuntu Lucid system, with the repository version (edit: see Where can I find c-repl documentation? for a better example):
$ c-repl
> int x = 3
> ++x
> .p x
unknown command: p
> printf("%d %p\n", x, &x)
4 0xbbd014
> .t fprintf
repl is ok
> #include <unistd.h>
<stdin>:1:22: warning: extra tokens at end of #include directive
> getp
p getp
No symbol "getp" in current context.
> printf("%d\n", getpid())
> [Ctrl+C]
/usr/bin/c-repl:185:in `readline': Interrupt
from /usr/bin/c-repl:185:in `input_loop'
from /usr/bin/c-repl:184:in `loop'
from /usr/bin/c-repl:184:in `input_loop'
from /usr/bin/c-repl:203
Apparently, it would be best to build c-repl from latest source.
For cint it was a bit difficult to find something related to it directly (the webpage refers to ROOT Tutorials instead), but then I found "Le Huy: Using CINT - C/C++ Interpreter - Basic Commands"; and here is an example session from my system:
(Note: if cint is not available on your distribution's package root-system-bin, try root instead.)
$ cint
cint : C/C++ interpreter (mailing list '')
Copyright(c) : 1995~2005 Masaharu Goto (
revision : 5.16.29, Jan 08, 2008 by M.Goto
No main() function found in given source file. Interactive interface started.
'?':help, '.q':quit, 'statement','{statements;}' or '.p [expr]' to evaluate
cint> L iostream
Error: Symbol Liostream is not defined in current scope (tmpfile):1:
*** Interpreter error recovered ***
cint> {#include <iostream>}
cint> files
Error: Symbol files is not defined in current scope (tmpfile):1:
*** Interpreter error recovered ***
cint> {int x=3;}
cint> {++x}
Syntax Error: ++x Maybe missing ';' (tmpfile):2:
*** Interpreter error recovered ***
cint> {++x;}
cint> .p x
cint> printf("%d %p\n", x, &x)
4 0x8d57720
(const int)12
cint> printf("%d\n", getpid())
Error: Function getpid() is not defined in current scope (tmpfile):1:
*** Interpreter error recovered ***
cint> {#include <unistd.h>}
cint> printf("%d\n", getpid())
(const int)6
cint> .q
Bye... (try 'qqq' if still running)
In any case, that is exactly what I needed: ability to load headers, add variables, and inspect the memory they will take! Thanks again, everyone - Cheers!
Python and C are different kinds of language. Python is interpreted line by line as it runs, whereas C has to be compiled and linked into machine code before it can run.
