PyCuda - use *.cubin - named symbol not found - nvcc

I am trying to use a compiled *.cubin file with PyCUDA, but I get this error:
func = mod.get_function("doublify")
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found
Content of doublify.cu:
__global__ void doublify(float *a)
{
int idx = threadIdx.x + threadIdx.y * 4;
a[idx] *= 2;
}
I compiled it with the following command:
nvcc --cubin -arch sm_75 doublify.cu
This is my python script:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
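a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)  # copy the input array to the device
mod = pycuda.driver.module_from_file("doublify.cubin")
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)  # host buffer for the result
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)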
Do I need to pass additional flags to the nvcc compiler? If I use SourceModule from PyCUDA instead, everything works as expected. It also fails when I compile to a *.fatbin.

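I figured it out myself after debugging PyCUDA itself. If anyone else stumbles upon the same problem, this is the solution:
I was missing the extern "C" statement at the beginning of the *.cu file. Without it, nvcc compiles the kernel as C++ and emits a mangled symbol name (such as _Z8doublifyPf), so looking up the plain name "doublify" fails. SourceModule wraps the source in extern "C" by default, which is why that route worked.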
extern "C"
__global__ void doublify(float *a)
{
int idx = threadIdx.x + threadIdx.y * 4;
a[idx] *= 2;
}

Related

On Mac OS X 10.11 nvcc doesn't like complex if used with .cu extension

I have a simple program that compiles differently depending on whether it is saved with a .c or a .cu extension:
#include <stdio.h>
#include <complex.h>
int main()
{
float num;
float eps_i, eps_s, tau_d, sigma;
float pi, wave_freq, eps_0;
eps_i = 43.0;
eps_s = 2.4;
tau_d = 0.3;
sigma = 4.75;
pi = 3.14;
wave_freq = 0.015;
eps_0 = 40.234;
float complex c1 = 0.0 + 2.0 * pi * wave_freq * tau_d * I;
float complex c2 = 0.0 + sigma / (2.0 * pi * wave_freq * eps_0) * I;
num = creal(eps_i + (eps_s - eps_i) / (1.0 + (0.0 + 2.0 * pi * wave_freq * tau_d * I)) -
(0.0 + sigma / (2.0 * pi * wave_freq * eps_0) * I));
printf("%g\n", num);
}
If I compile it using nvcc test.c, it works exactly as I expect. However, if I run nvcc test.cu, I get:
test.cu(18): error: expected a ";"
test.cu(19): error: "complex" has already been declared in the current scope
test.cu(19): error: expected a ";"
test.cu(21): error: identifier "I" is undefined
test.cu(21): error: identifier "creal" is undefined
test.cu(18): warning: variable "complex" was declared but never referenced
test.cu(19): warning: variable "complex" was declared but never referenced
5 errors detected in the compilation of "/var/folders/3z/8bl4b3yx0c3_5tgf35dr_z180000gp/T//tmpxft_00015a75_00000000-9_test.cpp1.ii".
I understand that .cu is treated as code containing CUDA code and .c is just host code, but I would expect them to behave the same in this instance. Notice that it doesn't complain about #include <complex.h> at all. What am I missing?
According to the comments, this problem appears to be specific to CUDA 7.5 on OS X El Capitan with the latest Xcode. The provided repro case seems to work on every other platform it was tested on. It has been recommended that a bug report be filed with NVIDIA.
[This answer has been added as a community wiki entry to get this question off the unanswered list]

How to compile and link C and ASM together on Windows for my OS

I have a problem with my 32-bit protected mode OS project Sinatra. I can compile sources to object files, but I don't know how to link these together. I use NASM and TDM-GCC on Windows. I have fixed problems with my code so it compiles. I have removed the comments for brevity.
My file boot.asm:
[BITS 32]
[global start]
[extern _JlMain]
start:
cli
call _JlMain
hlt
My file JSinatra.h:
#ifndef __SINATRA_H__
#define __SINATRA_H__
#define JWhiteText 0x07
void JlMain();
void JlClearScreen();
unsigned int JlPrintF(char * message, unsigned int line);
#endif
My file JSinatra.c:
#include "JSinatra.h"
void JlClearScreen() // clear entire screen
{
char * vidmem = (char * ) 0xb8000;
unsigned int i = 0;
while (i < (80 * 25 * 2)) {
vidmem[i] = ' ';
i += 1;
vidmem[i] = JWhiteText;
i += 1;
}
}
unsigned int JlPrintF(char * message, unsigned int line) {
char * vidmem = (char * ) 0xb8000;
unsigned int i = 0;
i = line * 80 * 2;
while ( * message != 0) {
if ( * message == '\n') {
line += 1;
i = (line * 80 * 2);
message += 1; // advance the pointer, not the character it points to
} else {
vidmem[i] = * message;
message += 1; // move on to the next character
i += 1;
vidmem[i] = JWhiteText;
i += 1;
}
}
return (1);
}
void JlMain() {
JlClearScreen();
JlPrintF("Sinatra v0 Virgin/Kernel Mode\n", 0);
}
I need to load my OS starting at absolute address 0x100000. How can I properly compile and link my code to create a binary image?
First of all, if you're compiling to ELF, then you mustn't add an initial underscore before functions in assembly.
To link different source files together, you first have to bring them to a common form, which in this case is object code.
So, what you'll do is:
Assemble the assembly source files to object code.
Compile but not link C source files to object code. In gcc: gcc -c file.c -o file.o
Link those together. In gcc: gcc cfile.o asfile.o -o app
Using GCC-TDM and NASM on Windows
Because you are targeting an OS that is loaded at an absolute address without a C runtime, you need to compile as freestanding code, make sure your asm and C files target the same object format (win32/PECOFF), and, as the last step, convert the PECOFF file to a binary image.
To compile C files you would use something like:
gcc -m32 -ffreestanding -c JSinatra.c -o JSinatra.o
To assemble the asm files you would use something like:
nasm -f win32 boot.asm -o boot.o
To link them together you have to do it in two steps:
ld -m i386pe -T NUL -o sinatra.tmp -Ttext 0x100000 boot.o JSinatra.o
The ld command above will create a temporary file sinatra.tmp that is a 32-bit PECOFF executable. You then need to convert sinatra.tmp to a binary image with a command like:
objcopy -O binary sinatra.tmp sinatra.img
You should then have a binary image in the file sinatra.img

How to get a pointer to a binary section in Mac OS X?

I'm writing some code which stores some data structures in a special named binary section. These are all instances of the same struct which are scattered across many C files and are not within scope of each other. By placing them all in the named section I can iterate over all of them.
This works perfectly with GCC and GNU ld, but fails on Mac OS X due to missing __start___mysection and __stop___mysection symbols. I guess llvm ld is not smart enough to provide them automatically.
In GCC and GNU ld, I use __attribute__((section(...))) plus some specially named extern pointers which are magically filled in by the linker. Here's a trivial example:
#include <stdio.h>
extern int __start___mysection[];
extern int __stop___mysection[];
static int x __attribute__((section("__mysection"))) = 4;
static int y __attribute__((section("__mysection"))) = 10;
static int z __attribute__((section("__mysection"))) = 22;
#define SECTION_SIZE(sect) \
((size_t)((__stop_##sect - __start_##sect)))
int main(void)
{
size_t sz = SECTION_SIZE(__mysection);
size_t i;
printf("Section size is %zu\n", sz);
for (i=0; i < sz; i++) {
printf("%d\n", __start___mysection[i]);
}
return 0;
}
What is the general way to get a pointer to the beginning/end of a section with the FreeBSD linker? Does anyone have any ideas?
For reference, the linker is:
#(#)PROGRAM:ld PROJECT:ld64-127.2
llvm version 3.0svn, from Apple Clang 3.0 (build 211.12)
A similar question was asked about MSVC here: How to get a pointer to a binary section in MSVC?
You can get the Darwin linker to do this for you.
#include <stdio.h>
extern int start_mysection __asm("section$start$__DATA$__mysection");
extern int stop_mysection __asm("section$end$__DATA$__mysection");
// If you don't reference x, y and z explicitly, they'll be dead-stripped.
// Prevent that with the "used" attribute.
static int x __attribute__((used,section("__DATA,__mysection"))) = 4;
static int y __attribute__((used,section("__DATA,__mysection"))) = 10;
static int z __attribute__((used,section("__DATA,__mysection"))) = 22;
int main(void)
{
long sz = &stop_mysection - &start_mysection;
long i;
printf("Section size is %ld\n", sz);
for (i=0; i < sz; ++i) {
printf("%d\n", (&start_mysection)[i]);
}
return 0;
}
Using Mach-O information:
#include <mach-o/getsect.h>
char *secstart;
unsigned long secsize;
secstart = getsectdata("__SEGMENT", "__section", &secsize);
The above gives information about a section declared as:
int x __attribute__((section("__SEGMENT,__section"))) = 123;
More information: https://developer.apple.com/library/mac/documentation/developertools/conceptual/machoruntime/Reference/reference.html

C: Dereferencing pointer to incomplete type error

The project I am trying to compile on OS X is: https://github.com/Ramblurr/PietCreator
I am unfortunately unable to fix the problems with the following lines:
width = info_ptr->width;
height = info_ptr->height;
ncol = 2 << (info_ptr->bit_depth - 1);
Which produce the errors:
file.c: In function ‘read_png’:
file.c:1117: error: dereferencing pointer to incomplete type
file.c:1118: error: dereferencing pointer to incomplete type
file.c:1119: error: dereferencing pointer to incomplete type
Full code of the read_png function below:
#include <png.h>
#include <math.h>
png_byte bit_depth;
png_structp png_ptr;
png_infop info_ptr;
int number_of_passes;
png_bytep * row_pointers;
int
read_png (char *fname)
{
char header [8];
FILE *in;
int i, j, ncol, rc;
if (! strcmp (fname, "-")) {
/* read from stdin: */
vprintf ("info: not trying to read png from stdin\n");
return -1;
}
if (! (in = fopen (fname, "rb"))) {
fprintf (stderr, "cannot open `%s'; reason: %s\n", fname,
strerror (errno));
return -1;
}
if (! in || (rc = fread (header, 1, 8, in)) != 8
|| png_sig_cmp ((unsigned char *) header, 0, 8) != 0) {
return -1;
}
if (! (png_ptr = png_create_read_struct (PNG_LIBPNG_VER_STRING, 0, 0, 0))
|| ! (info_ptr = png_create_info_struct (png_ptr))) {
return -1;
}
png_init_io (png_ptr, in);
png_set_sig_bytes (png_ptr, 8);
png_read_png (png_ptr, info_ptr,
PNG_TRANSFORM_STRIP_16 | PNG_TRANSFORM_STRIP_ALPHA
| PNG_TRANSFORM_EXPAND, NULL);
/** | PNG_TRANSFORM_PACKING | PNG_TRANSFORM_SHIFT **/
row_pointers = png_get_rows (png_ptr, info_ptr);
width = info_ptr->width;
height = info_ptr->height;
ncol = 2 << (info_ptr->bit_depth - 1);
vprintf ("info: got %d x %d pixel with %d cols\n", width, height, ncol);
alloc_cells (width, height);
for (j = 0; j < height; j++) {
png_byte *row = row_pointers [j];
for (i = 0; i < width; i++) {
png_byte *ptr = & row [i * 3];
/* ncol always 256 ? */
int r = (ptr [0] * 256) / ncol;
int g = (ptr [1] * 256) / ncol;
int b = (ptr [2] * 256) / ncol;
int col = ((r * 256 + g) * 256) + b;
int col_idx = get_color_idx (col);
if (col_idx < 0) {
if (unknown_color == -1) {
fprintf (stderr, "cannot read from `%s'; reason: invalid color found\n",
fname);
return -1;
} else {
/* set to black or white: */
col_idx = (unknown_color == 0 ? c_black : c_white);
}
}
set_cell (i, j, col_idx);
}
}
return 0;
}
I think this is by design on the part of the author of png.h.
png_infop is presumably declared in "png.h" as a pointer to a struct, while the actual struct definition lives in "png.c".
The author does not want to expose the internals of the struct, so the struct is defined only in "png.c".
This means you cannot access any member of the struct (e.g. info_ptr->width, info_ptr->height, info_ptr->bit_depth).
The struct members are not meant to be accessed by the user.
I bet there are functions to access those members if the author thinks you will need the width, height, or bit_depth information (e.g. getWidth(info_ptr), getHeight(info_ptr), ...).
You need to look in png.h (or its documentation), find out what the type png_infop is a pointer to, and then find out how you're supposed to access its fields. Assuming that this pointer is really the right thing to get that data from, then either you need to include the definition of that type (so that the compiler knows about its data members width etc) from some other header, or else there are getter functions you're supposed to call that take a png_infop parameter and return the info you're after.
[Edit: looks as if you're supposed to use png_get_IHDR, or png_get_image_width etc.]
I compiled the project on Mac OS X 10.6.8 successfully.
git clone https://github.com/Ramblurr/PietCreator.git
cd PietCreator
mkdir build
cd build
cmake ../
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Looking for Q_WS_X11
-- Looking for Q_WS_X11 - not found.
-- Looking for Q_WS_WIN
-- Looking for Q_WS_WIN - not found.
-- Looking for Q_WS_QWS
-- Looking for Q_WS_QWS - not found.
-- Looking for Q_WS_MAC
-- Looking for Q_WS_MAC - found
-- Looking for QT_MAC_USE_COCOA
-- Looking for QT_MAC_USE_COCOA - found
-- Found Qt-Version 4.7.4 (using /usr/local/bin/qmake)
-- Looking for gdImagePng in /usr/local/lib/libgd.dylib
-- Looking for gdImagePng in /usr/local/lib/libgd.dylib - found
-- Found ZLIB: /usr/include (found version "1.2.3")
-- Found PNG: /usr/X11R6/lib/libpng.dylib
-- Looking for gdImageJpeg in /usr/local/lib/libgd.dylib
-- Looking for gdImageJpeg in /usr/local/lib/libgd.dylib - found
-- Found JPEG: /usr/local/lib/libjpeg.dylib
-- Looking for gdImageGif in /usr/local/lib/libgd.dylib
-- Looking for gdImageGif in /usr/local/lib/libgd.dylib - found
-- Found GD: /usr/local/lib/libgd.dylib
-- Found GIF: /usr/local/lib/libgif.dylib
-- Looking for include files HAVE_GD_H
-- Looking for include files HAVE_GD_H - found
-- Looking for include files HAVE_PNG_H
-- Looking for include files HAVE_PNG_H - not found.
-- Looking for include files HAVE_GIF_LIB_H
-- Looking for include files HAVE_GIF_LIB_H - found
-- Configuring done
-- Generating done
-- Build files have been written to: /Developer/workspace/png/PietCreator/build
After running make the application was compiled successfully:
Linking CXX executable pietcreator
[ 95%] Built target pietcreator
[ 97%] Generating NPietTest.moc
Scanning dependencies of target npiettest
[100%] Building CXX object npiet/CMakeFiles/npiettest.dir/test/NPietTest.cpp.o
Linking CXX executable npiettest
[100%] Built target npiettest
The only problems I ran into were all related to missing dependencies when executing cmake ../. I had to download/compile/install Qt 4.7.4 and Qt-mobility 1.2.0. After that, I also needed libgd and giflib but then I used brew for the job.
I suggest you try another git clone and try to compile it again from scratch.
If you would like to know, brew installed gd 2.0.36RC1 and giflib 4.1.6.
You usually get this error when you have a forward declaration in the header and you don't include the file for the corresponding class in the source file:
//header.h
class B;
class A{
A();
B* b;
};
//source.cpp
#include "header.h"
//#include "B.h" // include the header where B is defined to prevent the error
A::A()
{
b->foo(); //error, B is only a forward declaration
}
Therefore, you need to include the appropriate header; a forward declaration is not enough.
Incomplete-type errors typically occur when you have a construct like the following:
struct foo;
int bar(void) {
struct foo *p;
p->a = 0;
}
Here the foo struct is declared, but its actual content is not known, so the code yields a "dereferencing pointer to incomplete type" error.
The philosophy behind this is to force users to manipulate data structures through a formal API. This ensures that a future version of the API can change the structure much more easily without affecting legacy programs.
So typically an API header will do something like this:
/*
* foo.h part of foo API
*/
struct foo;
extern void foo_set_a(struct foo *p, int value);
extern int foo_get_a(struct foo *p);
It will internally implement the foo API functions ... for instance:
/*
* foo.c ... implements foo API
*/
struct foo {
int a;
};
void foo_set_a(struct foo *p, int value) {
p->a = value;
}
int foo_get_a(struct foo *p) {
return p->a;
}
and then the user of the foo API can do:
void use_foo(struct foo *my_foo) {
/* my_foo must come from the API; user code cannot allocate a struct foo itself */
foo_set_a(my_foo, 1);
}

Some issue with Atomic add in CUDA kernel operation

I'm having an issue with my kernel.cu file.
When I call nvcc -v kernel.cu -o kernel.o, I get this error:
kernel.cu(17): error: identifier "atomicAdd" is undefined
My code:
#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd
__global__ void dot (int *a, int *b, int *c){
__shared__ int temp[THREADS_PER_BLOCK];
int index = threadIdx.x + blockIdx.x * blockDim.x;
temp[threadIdx.x] = a[index] * b[index];
__syncthreads();
if( 0 == threadIdx.x ){
int sum = 0;
for( int i = 0; i<THREADS_PER_BLOCK; i++)
sum += temp[i];
atomicAdd(c, sum);
}
}
Any suggestions?
You need to specify an architecture to nvcc which supports atomic memory operations (the default architecture is 1.0 which does not support atomics). Try:
nvcc -arch=sm_11 -v kernel.cu -o kernel.o
and see what happens.
EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
Today, with the latest CUDA SDK and toolkit, this solution will not work.
People also say that adding:
compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.
You have to specify this for the .cu file itself in its own properties (under the C++/CUDA->Device->Code Generation tab), such as:
compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
