cudaMalloc gives a NULL pointer [duplicate] - c

I'm new to CUDA/C and new to stack overflow. This is my first question.
I'm trying to allocate memory dynamically in a kernel function, but the results are unexpected.
I read using malloc() in a kernel can lower performance a lot, but I need it anyway so I first tried with a simple int ** array just to test the possibility, then I'll actually need to allocate more complex structs.
In my main I used cudaMalloc() to allocate the space for the array of int *, and then I used malloc() for every thread in the kernel function to allocate the array for every index of the outer array. I then used another thread to check the result, but it doesn't always work.
Here's main code:
#define N_CELLE 1024*2
#define L_CELLE 512
extern "C" {
int main(int argc, char **argv) {
int *result = (int *)malloc(sizeof(int));
int *d_result;
int size_numbers = N_CELLE * sizeof(int *);
int **d_numbers;
cudaMalloc((void **)&d_numbers, size_numbers);
cudaMalloc((void **)&d_result, sizeof(int *));
kernel_one<<<2, 1024>>>(d_numbers);
cudaDeviceSynchronize();
kernel_two<<<1, 1>>>(d_numbers, d_result);
cudaMemcpy(result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
printf("%d\n", *result);
cudaFree(d_numbers);
cudaFree(d_result);
free(result);
}
}
I used extern "C" because I couldn't compile while importing my header, which is not used in this example code. I pasted it since I don't know whether it is relevant.
This is kernel_one code:
__global__ void kernel_one(int **d_numbers) {
int i = threadIdx.x + blockIdx.x * blockDim.x;
d_numbers[i] = (int *)malloc(L_CELLE*sizeof(int));
for(int j=0; j<L_CELLE;j++)
d_numbers[i][j] = 1;
}
And this is kernel_two code:
__global__ void kernel_two(int **d_numbers, int *d_result) {
int temp = 0;
for(int i=0; i<N_CELLE; i++) {
for(int j=0; j<L_CELLE;j++)
temp += d_numbers[i][j];
}
*d_result = temp;
}
Everything works fine (i.e. the count is correct) as long as I allocate no more than 1024*2*512 ints in total in device memory. For example, if I #define N_CELLE 1024*4 the program starts giving "random" results, such as negative numbers.
Any idea of what the problem could be?
Thanks anyone!

In-kernel memory allocation draws memory from a statically allocated runtime heap. At larger sizes, you are exceeding the size of that heap, so the in-kernel malloc() calls start returning NULL, and your two kernels then attempt to read and write through invalid pointers. This produces a runtime error on the device and renders the results invalid. You would already know this if you had either added correct API error checking on the host side, or run your code with the cuda-memcheck utility.
The solution is to ensure that the heap size is set to something appropriate before trying to run a kernel. Adding something like this:
size_t heapsize = sizeof(int) * size_t(N_CELLE) * size_t(2*L_CELLE);
cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapsize);
to your host code before any other API calls, should solve the problem.
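The arithmetic behind "exceeding the size of that heap" can be sketched in plain C. This is only an illustration: it assumes the default device malloc heap of 8 MB described in the CUDA documentation, and the helper names heap_payload_bytes and payload_fits_default_heap are hypothetical, not part of any CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the documented default device malloc heap size of 8 MB. */
#define DEFAULT_DEVICE_HEAP_BYTES ((size_t)8 * 1024 * 1024)

/* Payload bytes requested when n_cells threads each malloc cell_len ints. */
size_t heap_payload_bytes(size_t n_cells, size_t cell_len)
{
    assert(n_cells > 0 && cell_len > 0);
    return n_cells * cell_len * sizeof(int);
}

/* Strictly less-than: every allocation also carries bookkeeping overhead,
   so a payload equal to the heap size cannot fit. */
int payload_fits_default_heap(size_t n_cells, size_t cell_len)
{
    return heap_payload_bytes(n_cells, cell_len) < DEFAULT_DEVICE_HEAP_BYTES;
}
```

With 4-byte ints, N_CELLE = 1024*2 and L_CELLE = 512 request 4 MiB, which fits. N_CELLE = 1024*4 requests 8 MiB of payload alone, which already fills the assumed default heap before any allocator overhead. That is presumably why the snippet above reserves twice the payload size.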

I don't know anything about CUDA but these are severe bugs:
You cannot convert from int** to void**. They are not compatible types. Casting doesn't solve the problem, but hides it.
&d_numbers gives the address of a pointer to pointer which is wrong. It is of type int***.
Both of the above bugs result in undefined behavior. If your program somehow seems to work under some conditions, that's just pure (bad) luck.

Related

How to initialize array size in a library in C?

I'm creating a C library with .h and .c files for a ring buffer. Ideally, you would initialize this ring buffer library in the main project with something like ringbuff_init(int buff_size); and the size that is passed in would be the size of the buffer. How can I do this when arrays in C need to be sized statically?
I have already tried dynamically allocating the array, but I did not get it to work. Surely this task is possible somehow?
What I would like to do is something like this:
int buffSize[];
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(int buff_size)
{
buffSize[buff_size];
}
This obviously doesn't compile because the array should have been initialized at the declaration. So my question is really, when you make a library for something like a buffer, how can you initialize it in the main program (so that in the .h/.c files of the buffer library) the buffer size is set to the wanted size?
You want to use dynamic memory allocation. A direct translation of your initial attempt would look like this:
size_t buffSize;
int * buffer;
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(size_t buff_size)
{
buffSize = buff_size;
buffer = malloc(buff_size * sizeof(int));
}
This solution here is however extremely bad. Let me list the problems here:
There is no check of the result of malloc. It could return NULL if the allocation fails.
Buffer size needs to be stored along with the buffer, otherwise there's no way to know its size from your library code. It isn't exactly clean to keep these global variables around.
Speaking of which, these global variables are absolutely not thread-safe. If several threads call functions of your library, the results are unpredictable. You might want to store your buffer and its size in a struct that would be returned from your init function.
Nothing keeps you from calling the init function several times in a row, meaning that the buffer pointer will be overwritten each time, causing memory leaks.
Allocated memory must be eventually freed using the free function.
In conclusion, you need to think very carefully about the API you expose in your library, and the implementation while not extremely complicated, will not be trivial.
Something more correct would look like:
#include <stdlib.h>
typedef struct {
size_t buffSize;
int * buffer;
} RingBuffer;
int ringbuffer_init(size_t buff_size, RingBuffer * buf)
{
if (buf == NULL)
return 0;
buf->buffSize = buff_size; /* buf is a pointer, so use -> rather than . */
buf->buffer = malloc(buff_size * sizeof(int));
return buf->buffer != NULL;
}
void ringbuffer_free(RingBuffer * buf)
{
free(buf->buffer);
}
int main(void)
{
RingBuffer buf;
int ok = ringbuffer_init(100, &buf); // initialize buffer size to 100
// ...
ringbuffer_free(&buf);
}
Even this is not without problems, as there is still a potential memory leak if the init function is called several times for the same buffer, and the client of your library must not forget to call the free function.
Static/global arrays can't have dynamic sizes.
If you must have a global dynamic array, declare a global pointer instead and initialize it with a malloc/calloc/realloc call.
You might want to also store its size in an accompanying integer variable as sizeof applied to a pointer won't give you the size of the block the pointer might be pointing to.
#include <assert.h>
#include <stdlib.h>
int *buffer;
int buffer_nelems;
int *ringbuffer_init(int buff_size) /* returns the buffer, or NULL on failure */
{
assert(buff_size > 0);
if ( (buffer = malloc(buff_size*sizeof(*buffer)) ) )
buffer_nelems = buff_size;
return buffer;
}
You should use the malloc function for dynamic memory allocation.
It is used to dynamically allocate a single large block of memory with the specified size. It returns a pointer of type void *, which can be converted to a pointer of any object type.
Example:
// Dynamically allocate memory using malloc()
buffSize= (int*)malloc(n * sizeof(int));
// Initialize the elements of the array
for (i = 0; i < n; ++i) {
buffSize[i] = i + 1;
}
// Print the elements of the array
for (i = 0; i < n; ++i) {
printf("%d, ", buffSize[i]);
}
I know I'm three years late to the party, but I feel I have an acceptable solution without using dynamic allocation.
If you need to do this without dynamic allocation for whatever reason (I have a similar issue in an embedded environment, and would like to avoid it), you can do the following:
Library:
int * buffSize;
int buffSizeLength;
void ringbuffer_init(int buff_size, int * bufferAddress)
{
buffSize = bufferAddress;
buffSizeLength = buff_size;
}
Main :
#define BUFFER_SIZE 100
int LibraryBuffer[BUFFER_SIZE];
int main(void)
{
ringbuffer_init(BUFFER_SIZE, LibraryBuffer); // initialize buffer size to 100
}
I have been using this trick for a while now, and it's greatly simplified some parts of working with a library.
One drawback: you can technically mess with the variable in your own code, breaking the library. I don't have a solution to that yet. If anyone has a solution to that I would love to hear it. Basically, good discipline is required for now.
You can also combine this with @SirDarius's typedef for the ring buffer above. I would in fact recommend it.
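A minimal sketch of that combination: a struct-based ring buffer whose storage is supplied by the caller, so the library itself never calls malloc. The field names and the push/pop helpers are illustrative assumptions and do not come from any of the answers above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring buffer whose storage is owned by the caller. */
typedef struct {
    int   *data;      /* caller-provided storage */
    size_t capacity;  /* number of ints in data */
    size_t head;      /* index of the oldest element */
    size_t count;     /* number of stored elements */
} RingBuffer;

void ringbuffer_init(RingBuffer *rb, int *storage, size_t capacity)
{
    assert(rb != NULL && storage != NULL && capacity > 0);
    rb->data = storage;
    rb->capacity = capacity;
    rb->head = 0;
    rb->count = 0;
}

/* Returns 1 on success, 0 if the buffer is full. */
int ringbuffer_push(RingBuffer *rb, int value)
{
    if (rb->count == rb->capacity)
        return 0;
    rb->data[(rb->head + rb->count) % rb->capacity] = value;
    rb->count++;
    return 1;
}

/* Returns 1 and stores the oldest element in *out, or 0 if empty. */
int ringbuffer_pop(RingBuffer *rb, int *out)
{
    if (rb->count == 0)
        return 0;
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % rb->capacity;
    rb->count--;
    return 1;
}
```

A caller would declare something like int storage[100]; RingBuffer rb; and then call ringbuffer_init(&rb, storage, 100). Because all state lives in the struct, several independent buffers can coexist without global variables.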

What is the correct way to temporarily cast void* for arithmetic?

I am a C novice but have been a programmer for some years, so I am trying to learn C by following along with Stanford's course from 2008 and doing Assignment 3 on Vectors in C.
It's just a generic array basically, so the data is held inside a struct as a void *. The compiler flag -Wpointer-arith is turned on so I can't do arithmetic (and I understand the reasons why).
The struct around the data must not know what type the data is, so that it is generic for the caller.
To simplify things I am trying out the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
void *data;
int aindex;
int elemSize;
} trial;
void init(trial *vector, int elemSize)
{
vector->aindex = 0;
vector->elemSize = elemSize;
vector->data = malloc(10 * elemSize);
}
void add(trial *vector, const void *elemAddr)
{
if (vector->aindex != 0)
vector->data = (char *)vector->data + vector->elemSize;
vector->aindex++;
memcpy(vector->data, elemAddr, sizeof(int));
}
int main()
{
trial vector;
init(&vector, sizeof(int));
for (int i = 0; i < 8; i++)
{add(&vector, &i);}
vector.data = (char *)vector.data - ( 5 * vector.elemSize);
printf("%d\n", *(int *)vector.data);
printf("%s\n", "done..");
free(vector.data);
return 0;
}
However I get an error at free with free(): invalid pointer. So I ran valgrind on it and received the following:
==21006== Address 0x51f0048 is 8 bytes inside a block of size 40 alloc'd
==21006== at 0x4C2CEDF: malloc (vg_replace_malloc.c:299)
==21006== by 0x1087AA: init (pointer_arithm.c:13)
==21006== by 0x108826: main (pointer_arithm.c:29)
At this point my guess is I am either not doing the char* correctly, or maybe using memcpy incorrectly
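This happens because you add eight elements to the vector, which advances the pointer seven times (the first add does not move it), and then "roll back" the pointer by only five steps before attempting a free. That leaves it two ints (8 bytes) past the start of the block, which is exactly what valgrind reports. You can easily fix that by using vector->aindex to decide how far the pointer is to be rolled back.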
The root cause of the problem, however, is that you modify vector->data. You should avoid modifying it in the first place, relying on a temporary pointer inside of your add function instead:
void add(trial *vector, const void *elemAddr, size_t sz) {
char *base = vector->data;
memcpy(base + vector->aindex*sz, elemAddr, sz);
vector->aindex++;
}
Note the use of sz, you need to pass sizeof(int) to it.
Another problem in your code is when you print by casting vector.data to int*. This would probably work, but a better approach would be to write a similar read function to extract the data.
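A sketch of such an add/read pair, assuming a slightly extended struct that also records a capacity. The names vec, vec_add and vec_get are hypothetical. The stored pointer is never modified, so it can be passed to free unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *data;      /* always the pointer returned by malloc */
    size_t count;     /* elements stored so far */
    size_t capacity;  /* elements that fit in data */
    size_t elemSize;  /* size of one element in bytes */
} vec;

/* Returns 1 on success, 0 if the allocation failed. */
int vec_init(vec *v, size_t elemSize, size_t capacity)
{
    assert(v != NULL && elemSize > 0 && capacity > 0);
    v->data = malloc(capacity * elemSize);
    v->count = 0;
    v->capacity = capacity;
    v->elemSize = elemSize;
    return v->data != NULL;
}

/* Copies one element to the end; returns 0 when the vector is full. */
int vec_add(vec *v, const void *elem)
{
    if (v->count == v->capacity)
        return 0;
    char *base = v->data;  /* temporary pointer; v->data is never modified */
    memcpy(base + v->count * v->elemSize, elem, v->elemSize);
    v->count++;
    return 1;
}

/* Copies element i into *out; returns 0 when i is out of range. */
int vec_get(const vec *v, size_t i, void *out)
{
    if (i >= v->count)
        return 0;
    const char *base = v->data;
    memcpy(out, base + i * v->elemSize, v->elemSize);
    return 1;
}

void vec_free(vec *v)
{
    free(v->data);  /* same pointer malloc returned */
    v->data = NULL;
}
```

Reading through vec_get with an int out parameter replaces the *(int *)vector.data cast in the question, and the bounds check catches indices past the last stored element.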
If you don't know the array's data type beforehand, you must assume a certain amount of memory when you first initialize it, for example 32 bytes or 100 bytes. Then, if you run out of memory, you can expand it using realloc, which carries your previous data over to the new block. Common C++ std::vector implementations grow by a factor of about 1.5 or 2 on each reallocation.
Next up is your free. There's a big thing you must know here. What if the user were to store objects that own memory of their own, for example a char* that they allocated previously? Simply freeing the data member of your vector won't be enough. You need to ask the caller for a function pointer that knows how to release an element, in case the data type requires special attention.
Lastly, you are making a big mistake at this line here:
if (vector->aindex != 0)
vector->data = (char *)vector->data + vector->elemSize;
You are modifying your pointer address! Your initial address is lost here! You must never do this. Use a temporary char* to hold your initial data address and manipulate it instead.
Your code is somewhat confusing, there's probably a mis-understanding or two hiding in there.
A few observations:
You can't change a pointer returned by malloc() and then pass the new value to free(). Every value passed to free() must be the exact same value returned by one of the allocation functions.
As you've guessed, the copying is best done by memcpy() and you have to cast to char * for the arithmetic.
The function to append a value could be:
void add(trial *vector, const void *element)
{
memcpy((char *) vector->data + vector->aindex * vector->elemSize, element, vector->elemSize);
++vector->aindex;
}
Of course this doesn't handle overflowing the vector, since the length is not stored (I didn't want to assume it was hard-coded at 10).
Changing the data value in vector for each object is very odd, and makes things more confusing. Just add the required offset when you need to access the element; that's super-cheap and very straightforward.

How to assign some part of dynamic array to whole static array

The size of the dynamic array is twice the size of the static array. I want to assign the values from index (N/2)-1 to N-1 of the dynamic array to the whole static array.
Is copying the values with a loop the only way?
My code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char *argv[])
{
int N=100, pSize=4, lSize, i;
double *A;
lSize=N/sqrt(pSize);
/* memory allocation */
A=(double*)malloc(sizeof(double)*N);
double B[lSize];
/* memory allocation has been done */
/* initilize arrays */
for(i=0; i<lSize; i++){
B[i]=rand()% 10;
}
A=B;
for (i=0; i<lSize; i++){
fprintf(stdout,"%f\n", A[i]);
}
return 0;
}
You can use the memcpy function to copy the data. For your example you want to copy the last half of A to B, so you could do something like:
memcpy(&B[0], &A[N - lSize], lSize * sizeof(double));
Note: On the MinGW compiler I was using, it was requiring that I declare the destination as &B[0], I thought I could get away with just B. It may be due to configuration I have (I don't use the C compiler all that much, normally just use g++ for quick C++ test cases).
You can use memcpy to copy contiguous chunks of memory around.
Your program leaks your allocation, which is probably bad - is that A=B intended to be where you would put the code that copies the array?
It may be possible, depending on your architecture, to do a copy without a CPU loop (via a call to a DMA engine or something). In standard C, you have no choice but to loop. You can either do it yourself or you can call memcpy(3), memmove(3), or bcopy(3) if you prefer to use the library's implementations.
As said, you need to use memcpy:
#define N 100
int staticarray[N];
int *pointer = (int*) malloc( sizeof(int)*N*2 );
memcpy( staticarray, (pointer + ((N/2) - 1)), sizeof(int)*N );
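As a hedged illustration of the same idea, here is a small helper that copies the last dst_len elements of a larger array into a smaller one after checking the sizes. The name copy_tail and its parameter order are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the last dst_len elements of src (which holds src_len elements)
   into dst. Returns 1 on success, 0 if src is shorter than dst. */
int copy_tail(double *dst, size_t dst_len, const double *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len < dst_len)
        return 0;
    memcpy(dst, src + (src_len - dst_len), dst_len * sizeof *dst);
    return 1;
}
```

With the arrays from the question this would be called as copy_tail(B, lSize, A, N). The source offset is counted in elements, while the byte count passed to memcpy is scaled by the element size.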

Exception on malloc for a structure in C

I have a structure defined like so:
typedef struct {
int n;
int *n_p;
void **list_pp;
size_t rec_size;
int n_buffs;
size_t buff_size;
} fl_hdr_type;
and in my code I have an initialization function that contains the following:
fl_hdr_type *fl_hdr;
fl_hdr = malloc(sizeof(fl_hdr_type) + (buff_size_n * rec_size_n));
where those buffer sizes are passed in to the function to allow space for the buffers as well.
The size is pretty small typically..100*50 or something like that..plenty of memory on this system to allocate it.
I can't actually post the stack trace because this code is on another network, but some information pulled from dbx on the core file:
buff_size_n = 32, rec_size_n = 186
and the stack..line numbers from malloc.c
t_splay:861
t_delete:796
realfree: 531
cleanfree:945
_malloc:230
_malloc:186
Any ideas why this fails?
Try running your program through valgrind, see what it reports. It's possible in some other part of the program you have corrupted free lists or something else malloc looks at.
What you need to do is simply do this.
fl_hdr = malloc(sizeof(fl_hdr_type));
The list_pp is a dynamic array of void* and you need to allocate that to the size you need with another malloc.
list_pp is simply a pointer to something else that is allocated on the heap.
If you want to allocate in place with one malloc, then you will need to define it as an array of the actual types you want. The compiler needs to know the types to be able to perform the allocation.
If what you are looking for is dynamic arrays in C, then look at this.
You need to explicitly assign n_p and list_pp to the appropriate offsets.
fl_hdr_type *fl_hdr;
fl_hdr = malloc(sizeof(fl_hdr_type) + (buff_size_n * rec_size_n));
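/* cast to char * so the offset is counted in bytes, not in structs */
fl_hdr->n_p = (int *)((char *)fl_hdr + sizeof(fl_hdr_type));
/* n_p is an int *, so adding num_n already advances by num_n ints */
fl_hdr->list_pp = (void **)(fl_hdr->n_p + num_n);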
If you're going to do this, I'd recommend putting the pointers at the end of the struct, instead of the middle. I'm with Romain, though, and recommend you use separate calls to malloc() instead of grabbing everything with one call.
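For readers who do want the single-allocation layout, here is a sketch using a simplified, hypothetical header (hdr is not the asker's struct). It places the pointer array directly after the header and the int array after that, and sizes the block to hold both.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int    n;        /* number of entries in each trailing array */
    void **list_pp;  /* points into the same allocation */
    int   *n_p;      /* points into the same allocation */
} hdr;

/* One malloc holding: hdr | n pointers | n ints. Free with free(h). */
hdr *hdr_create(size_t n)
{
    assert(n > 0);
    hdr *h = malloc(sizeof *h + n * sizeof(void *) + n * sizeof(int));
    if (h == NULL)
        return NULL;
    h->n = (int)n;
    /* h + 1 is the first byte after the header; pointer arithmetic on
       hdr * already scales by sizeof(hdr). */
    h->list_pp = (void **)(h + 1);
    /* The ints follow the n pointers; placing the pointers first keeps
       both arrays suitably aligned on typical platforms. */
    h->n_p = (int *)(h->list_pp + n);
    return h;
}
```

Since everything lives in one block, a single free(h) releases the header and both arrays. The trade-off is that neither array can be resized independently, which is one reason separate malloc calls are often simpler.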
I made your example into a program, and have absolutely no issues running it. If you can compile and run this simple code (and it works), you have corrupted the heap somewhere else in your program. Please run it through Valgrind (edit as User275455 suggested, I did not notice the reply) and update your question with the output that it gives you.
Edit
Additionally, please update your question to indicate exactly what you are doing with **list_pp and *n_p after allocating the structure. If you don't have access to valgrind, at least paste the entire trace that glibc printed when the program crashed.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
int n;
int *n_p;
void **list_pp;
size_t rec_size;
int n_buffs;
size_t buff_size;
} fl_hdr_type;
static size_t buff_size_n = 50;
static size_t rec_size_n = 100;
static fl_hdr_type *my_init(void)
{
fl_hdr_type *fl_hdr = NULL;
fl_hdr = malloc(sizeof(fl_hdr_type) + (buff_size_n * rec_size_n));
return fl_hdr;
}
int main(void)
{
fl_hdr_type *t = NULL;
t = my_init();
printf("Malloc %s\n", t == NULL ? "Failed" : "Worked");
if (t != NULL)
free(t);
return 0;
}

Can you define the size of an array at runtime in C

New to C, thanks a lot for help.
Is it possible to define an array in C without either specifying its size or initializing it?
For example, can I prompt a user to enter numbers and store them in an int array ? I won't know how many numbers they will enter beforehand.
The only way I can think of now is to define a max size, which is not an ideal solution...
Well, you can dynamically allocate the size:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int *array;
int cnt;
int i;
/* In the real world, you should do a lot more error checking than this */
printf("enter the amount\n");
scanf("%d", &cnt);
array = malloc(cnt * sizeof(int));
/* do stuff with it */
for(i=0; i < cnt; i++)
array[i] = 10*i;
for(i=0; i < cnt; i++)
printf("array[%d] = %d\n", i, array[i]);
free(array);
return 0;
}
Perhaps something like this:
#include <stdio.h>
#include <stdlib.h>
/* An arbitrary starting size.
Should be close to what you expect to use, but not really that important */
#define INIT_ARRAY_SIZE 8
int array_size = INIT_ARRAY_SIZE;
int array_index = 0;
int *array = NULL; /* allocated in main; a function call is not allowed at file scope */
void array_push(int value) {
array[array_index] = value;
array_index++;
if(array_index >= array_size) {
array_size *= 2;
array = realloc(array, array_size * sizeof(int));
}
}
int main(int argc, char *argv[]) {
int shouldBreak = 0;
int val;
array = malloc(array_size * sizeof(int));
while (!shouldBreak) {
if (scanf("%d", &val) != 1) break; /* stop on EOF or invalid input */
shouldBreak = (val == 0);
array_push(val);
}
free(array);
return 0;
}
This will prompt for numbers and store them in an array, as you asked. It will terminate when given a 0.
You create an accessor function array_push for adding to your array, and you call realloc from within this function when you run out of space. You double the amount of allocated space each time. At worst you'll allocate double the memory you need, and you will call realloc about log n times, where n is the final intended array size.
You may also want to check for failure after calling malloc and realloc. I have not done this above.
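One way such a check could look, as a sketch rather than a drop-in replacement for the code above: the state is kept in a hypothetical int_array struct instead of globals, and the result of realloc goes into a temporary so the old block is not lost on failure.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   *items;
    size_t count;
    size_t capacity;
} int_array;

/* Appends value, doubling the capacity when full.
   Returns 1 on success, 0 if memory could not be obtained. */
int int_array_push(int_array *a, int value)
{
    assert(a != NULL);
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 8;
        /* Keep the old pointer until realloc is known to have succeeded;
           on failure the original block is still valid. */
        int *tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 0;
        a->items = tmp;
        a->capacity = new_cap;
    }
    a->items[a->count++] = value;
    return 1;
}
```

Starting from int_array a = {NULL, 0, 0}; works because realloc with a NULL pointer behaves like malloc. The caller releases the storage with free(a.items) when done.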
Yes, absolutely. C99 introduced the VLA or Variable Length Array.
Some simple code would be like such:
#include <stdio.h>
int main (void) {
int arraysize;
printf("How big do you want your array to be?\n");
scanf("%d",&arraysize);
int ar[arraysize];
return 0;
}
Arrays, by definition, are fixed-size memory structures. You want a vector. Since Standard C doesn't define vectors, you could try looking for a library, or hand-rolling your own.
You need to do dynamic allocation: You want a pointer to a memory address of yet-unknown size. Read up on malloc and realloc.
If all you need is a data structure where in you can change its size dynamically then the best option you can go for is a linked list. You can add data to the list dynamically allocating memory for it and this would be much easier!!
If you're a beginner, maybe you don't want to deal with malloc and free yet. So if you're using GCC, you can allocate variable size arrays on the stack, just specifying the size as an expression.
For example:
#include <stdio.h>
void dyn_array(const unsigned int n) {
int array[n];
int i;
for(i=0; i<n;i++) {
array[i]=i*i;
}
for(i=0; i<n;i++) {
printf("%d\n",array[i]);
}
}
int main(int argc, char **argv) {
dyn_array(argc);
return 0;
}
But keep in mind that variable length arrays are only standard from C99 onwards (and optional since C11); in C89 or C++ they are a compiler extension, so you shouldn't count on them if portability matters.
You can use malloc to allocate memory dynamically (i.e. the size is not known until runtime).
C is a low level language: you have to manually free up the memory after it's used; if you don't, your program will suffer from memory leaks.
UPDATE
Just read your comment on another answer.
You're asking for an array with a dynamically-changing-size.
Well, C has no language/syntactic facilities to do that; you either have to implement this yourself or use a library that has already implemented it.
See this question: Is there an auto-resizing array/dynamic array implementation for C that comes with glibc?
For something like this, you might want to look into data structures such as:
Linked Lists (Ideal for this situation)
Various Trees (Binary Trees, Heaps, etc)
Stacks & Queues
But as for instantiating a variable sized array, this isn't really possible.
The closest to a dynamic array is by using malloc and its associated functions (free, realloc, etc).
But in this situation, using commands like malloc may result in the need to expand the array, an expensive operation where you initialize another array and then copy the old array into that. Lists, and other datatypes, are generally much better at resizing.
If you're looking for array facilities and don't want to roll your own, try the following:
Glib
Apache APR
NSPR
The answers above are correct, but there is one addition: the function malloc() reserves a block of memory of the specified size and returns a pointer of type void*, which can be cast to a pointer of any type.
Syntax: ptr = (cast-type*) malloc(byte-size)
#include <stdio.h>
#include <stdlib.h>
int main(int argc,char* argv[]){
int *arraySize,length;
scanf("%d",&length);
arraySize = (int*)malloc(length*sizeof(int));
for(int i=0;i<length;i++)
arraySize[i] = i*2;
for(int i=0;i<length;i++)
printf("arrayAt[%d]=%d\n",i,arraySize[i]);
free(arraySize);
}

Resources