I'm new to CUDA/C and new to stack overflow. This is my first question.
I'm trying to allocate memory dynamically in a kernel function, but the results are unexpected.
I read using malloc() in a kernel can lower performance a lot, but I need it anyway so I first tried with a simple int ** array just to test the possibility, then I'll actually need to allocate more complex structs.
In my main I used cudaMalloc() to allocate the space for the array of int *, and then I used malloc() for every thread in the kernel function to allocate the array for every index of the outer array. I then used another thread to check the result, but it doesn't always work.
Here's main code:
#define N_CELLE 1024*2
#define L_CELLE 512
extern "C" {
int main(int argc, char **argv) {
int *result = (int *)malloc(sizeof(int));
int *d_result;
int size_numbers = N_CELLE * sizeof(int *);
int **d_numbers;
cudaMalloc((void **)&d_numbers, size_numbers);
cudaMalloc((void **)&d_result, sizeof(int *));
kernel_one<<<2, 1024>>>(d_numbers);
cudaDeviceSynchronize();
kernel_two<<<1, 1>>>(d_numbers, d_result);
cudaMemcpy(result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
printf("%d\n", *result);
cudaFree(d_numbers);
cudaFree(d_result);
free(result);
}
}
I used extern "C"because I could't compile while importing my header, which is not used in this example code. I pasted it since I don't know if this may be relevant or not.
This is kernel_one code:
__global__ void kernel_one(int **d_numbers) {
int i = threadIdx.x + blockIdx.x * blockDim.x;
d_numbers[i] = (int *)malloc(L_CELLE*sizeof(int));
for(int j=0; j<L_CELLE;j++)
d_numbers[i][j] = 1;
}
And this is kernel_two code:
__global__ void kernel_two(int **d_numbers, int *d_result) {
int temp = 0;
for(int i=0; i<N_CELLE; i++) {
for(int j=0; j<L_CELLE;j++)
temp += d_numbers[i][j];
}
*d_result = temp;
}
Everything works fine (aka the count is correct) until I use less than 1024*2*512 total blocks in device memory. For example, if I #define N_CELLE 1024*4 the program starts giving "random" results, such as negative numbers.
Any idea of what the problem could be?
Thanks anyone!
In-kernel memory allocation draws memory from a statically allocated runtime heap. At larger sizes, you are exceeding the size of that heap and then your two kernels are attempting to read and write from uninitialised memory. This produces a runtime error on the device and renders the results invalid. You would already know this if you either added correct API error checking on the host side, or ran your code with the cuda-memcheck utility.
The solution is to ensure that the heap size is set to something appropriate before trying to run a kernel. Adding something like this:
size_t heapsize = sizeof(int) * size_t(N_CELLE) * size_t(2*L_CELLE);
cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapsize);
to your host code before any other API calls, should solve the problem.
I don't know anything about CUDA but these are severe bugs:
You cannot convert from int** to void**. They are not compatible types. Casting doesn't solve the problem, but hides it.
&d_numbers gives the address of a pointer to pointer which is wrong. It is of type int***.
Both of the above bugs result in undefined behavior. If your program somehow seems to works in some condition, that's just by pure (bad) luck only.
I have to make a program that would allocate an array of int with mmap() and then count how much memory was actually allocated to it.
So far I have the code that allocates the array, but I don't know how to count the allocated memory. This is what I tried so far:
the handler:
int i;
void segfault_sigaction(int signal, siginfo_t *si, void *arg)
{
printf("ujel segafult, velikost pomnilnika je: %d bajtov\n", sizeof(int) * i);
exit(0);
}
main:
int main(int argc, char* argv[])
{
int *addr;
int n = atoi(argv[1]);
struct sigaction sa;
memset(&sa,0,sizeof(sigaction));
sa.sa_handler = segfault_sigaction;
//sigemptyset(&sa.sa_mask);
//sa.sa_sigaction = segfault_sigaction;
//sa.sa_flags = 0;
sigaction(SIGSEGV,&sa, NULL);
addr = mmap(NULL, n * sizeof(int), PROT_WRITE | PROT_READ ,MAP_ANONYMOUS | MAP_PRIVATE,-1,0);
i = n;
while(1){
addr[i+1] = 5;
i++;
}
return 0;
}
EDIT: after reading your answers and the task description again I changed the code, I put in a handler that catches SEGFAULT, but it is not really reliable(sometimes it works, sometimes the default segmentation fault error occurs).
The amount of memory mapped by mmap() is just what you requested, perhaps rounded up a bit. If you're asking, "How can I see how much it rounded up," the answer is you can't do so in a general way, but some specific platforms may have APIs that let you figure it out. But there's no reason why you'd need to.
The code you've written seems to be "searching" for the end of the allocation by looking for a nonzero part. This results in undefined behavior because you will always run off the end of the allocation. You absolutely must not do this. Just use the size that you passed to mmap() as the size. That's the only size that matters.
I've allocated an "array" of mystruct of size n like this:
if (NULL == (p = calloc(sizeof(struct mystruct) * n,1))) {
/* handle error */
}
Later on, I only have access to p, and no longer have n. Is there a way to determine the length of the array given just the pointer p?
I figure it must be possible, since free(p) does just that. I know malloc() keeps track of how much memory it has allocated, and that's why it knows the length; perhaps there is a way to query for this information? Something like...
int length = askMallocLibraryHowMuchMemoryWasAlloced(p) / sizeof(mystruct)
I know I should just rework the code so that I know n, but I'd rather not if possible. Any ideas?
No, there is no way to get this information without depending strongly on the implementation details of malloc. In particular, malloc may allocate more bytes than you request (e.g. for efficiency in a particular memory architecture). It would be much better to redesign your code so that you keep track of n explicitly. The alternative is at least as much redesign and a much more dangerous approach (given that it's non-standard, abuses the semantics of pointers, and will be a maintenance nightmare for those that come after you): store the lengthn at the malloc'd address, followed by the array. Allocation would then be:
void *p = calloc(sizeof(struct mystruct) * n + sizeof(unsigned long int),1));
*((unsigned long int*)p) = n;
n is now stored at *((unsigned long int*)p) and the start of your array is now
void *arr = p+sizeof(unsigned long int);
Edit: Just to play devil's advocate... I know that these "solutions" all require redesigns, but let's play it out.
Of course, the solution presented above is just a hacky implementation of a (well-packed) struct. You might as well define:
typedef struct {
unsigned int n;
void *arr;
} arrInfo;
and pass around arrInfos rather than raw pointers.
Now we're cooking. But as long as you're redesigning, why stop here? What you really want is an abstract data type (ADT). Any introductory text for an algorithms and data structures class would do it. An ADT defines the public interface of a data type but hides the implementation of that data type. Thus, publicly an ADT for an array might look like
typedef void* arrayInfo;
(arrayInfo)newArrayInfo(unsignd int n, unsigned int itemSize);
(void)deleteArrayInfo(arrayInfo);
(unsigned int)arrayLength(arrayInfo);
(void*)arrayPtr(arrayInfo);
...
In other words, an ADT is a form of data and behavior encapsulation... in other words, it's about as close as you can get to Object-Oriented Programming using straight C. Unless you're stuck on a platform that doesn't have a C++ compiler, you might as well go whole hog and just use an STL std::vector.
There, we've taken a simple question about C and ended up at C++. God help us all.
keep track of the array size yourself; free uses the malloc chain to free the block that was allocated, which does not necessarily have the same size as the array you requested
Just to confirm the previous answers: There is no way to know, just by studying a pointer, how much memory was allocated by a malloc which returned this pointer.
What if it worked?
One example of why this is not possible. Let's imagine the code with an hypothetic function called get_size(void *) which returns the memory allocated for a pointer:
typedef struct MyStructTag
{ /* etc. */ } MyStruct ;
void doSomething(MyStruct * p)
{
/* well... extract the memory allocated? */
size_t i = get_size(p) ;
initializeMyStructArray(p, i) ;
}
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
doSomething(s) ;
}
Why even if it worked, it would not work anyway?
But the problem of this approach is that, in C, you can play with pointer arithmetics. Let's rewrite doSomethingElse():
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
MyStruct * s2 = s + 5 ; /* s2 points to the 5th item */
doSomething(s2) ; /* Oops */
}
How get_size is supposed to work, as you sent the function a valid pointer, but not the one returned by malloc. And even if get_size went through all the trouble to find the size (i.e. in an inefficient way), it would return, in this case, a value that would be wrong in your context.
Conclusion
There are always ways to avoid this problem, and in C, you can always write your own allocator, but again, it is perhaps too much trouble when all you need is to remember how much memory was allocated.
Some compilers provide msize() or similar functions (_msize() etc), that let you do exactly that
May I recommend a terrible way to do it?
Allocate all your arrays as follows:
void *blockOfMem = malloc(sizeof(mystruct)*n + sizeof(int));
((int *)blockofMem)[0] = n;
mystruct *structs = (mystruct *)(((int *)blockOfMem) + 1);
Then you can always cast your arrays to int * and access the -1st element.
Be sure to free that pointer, and not the array pointer itself!
Also, this will likely cause terrible bugs that will leave you tearing your hair out. Maybe you can wrap the alloc funcs in API calls or something.
malloc will return a block of memory at least as big as you requested, but possibly bigger. So even if you could query the block size, this would not reliably give you your array size. So you'll just have to modify your code to keep track of it yourself.
For an array of pointers you can use a NULL-terminated array. The length can then determinate like it is done with strings. In your example you can maybe use an structure attribute to mark then end. Of course that depends if there is a member that cannot be NULL. So lets say you have an attribute name, that needs to be set for every struct in your array you can then query the size by:
int size;
struct mystruct *cur;
for (cur = myarray; cur->name != NULL; cur++)
;
size = cur - myarray;
Btw it should be calloc(n, sizeof(struct mystruct)) in your example.
Other have discussed the limits of plain c pointers and the stdlib.h implementations of malloc(). Some implementations provide extensions which return the allocated block size which may be larger than the requested size.
If you must have this behavior you can use or write a specialized memory allocator. This simplest thing to do would be implementing a wrapper around the stdlib.h functions. Some thing like:
void* my_malloc(size_t s); /* Calls malloc(s), and if successful stores
(p,s) in a list of handled blocks */
void my_free(void* p); /* Removes list entry and calls free(p) */
size_t my_block_size(void* p); /* Looks up p, and returns the stored size */
...
really your question is - "can I find out the size of a malloc'd (or calloc'd) data block". And as others have said: no, not in a standard way.
However there are custom malloc implementations that do it - for example http://dmalloc.com/
I'm not aware of a way, but I would imagine it would deal with mucking around in malloc's internals which is generally a very, very bad idea.
Why is it that you can't store the size of memory you allocated?
EDIT: If you know that you should rework the code so you know n, well, do it. Yes it might be quick and easy to try to poll malloc but knowing n for sure would minimize confusion and strengthen the design.
One of the reasons that you can't ask the malloc library how big a block is, is that the allocator will usually round up the size of your request to meet some minimum granularity requirement (for example, 16 bytes). So if you ask for 5 bytes, you'll get a block of size 16 back. If you were to take 16 and divide by 5, you would get three elements when you really only allocated one. It would take extra space for the malloc library to keep track of how many bytes you asked for in the first place, so it's best for you to keep track of that yourself.
This is a test of my sort routine. It sets up 7 variables to hold float values, then assigns them to an array, which is used to find the max value.
The magic is in the call to myMax:
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
And that was magical, wasn't it?
myMax expects a float array pointer (float *) so I use &arr to get the address of the array, and cast it as a float pointer.
myMax also expects the number of elements in the array as an int. I get that value by using sizeof() to give me byte sizes of the array and the first element of the array, then divide the total bytes by the number of bytes in each element. (we should not guess or hard code the size of an int because it's 2 bytes on some system and 4 on some like my OS X Mac, and could be something else on others).
NOTE:All this is important when your data may have a varying number of samples.
Here's the test code:
#include <stdio.h>
float a, b, c, d, e, f, g;
float myMax(float *apa,int soa){
int i;
float max = apa[0];
for(i=0; i< soa; i++){
if (apa[i]>max){max=apa[i];}
printf("on i=%d val is %0.2f max is %0.2f, soa=%d\n",i,apa[i],max,soa);
}
return max;
}
int main(void)
{
a = 2.0;
b = 1.0;
c = 4.0;
d = 3.0;
e = 7.0;
f = 9.0;
g = 5.0;
float arr[] = {a,b,c,d,e,f,g};
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
printf("mmax = %0.2f\n",mmax);
return 0;
}
In uClibc, there is a MALLOC_SIZE macro in malloc.h:
/* The size of a malloc allocation is stored in a size_t word
MALLOC_HEADER_SIZE bytes prior to the start address of the allocation:
+--------+---------+-------------------+
| SIZE |(unused) | allocation ... |
+--------+---------+-------------------+
^ BASE ^ ADDR
^ ADDR - MALLOC_HEADER_SIZE
*/
/* The amount of extra space used by the malloc header. */
#define MALLOC_HEADER_SIZE \
(MALLOC_ALIGNMENT < sizeof (size_t) \
? sizeof (size_t) \
: MALLOC_ALIGNMENT)
/* Set up the malloc header, and return the user address of a malloc block. */
#define MALLOC_SETUP(base, size) \
(MALLOC_SET_SIZE (base, size), (void *)((char *)base + MALLOC_HEADER_SIZE))
/* Set the size of a malloc allocation, given the base address. */
#define MALLOC_SET_SIZE(base, size) (*(size_t *)(base) = (size))
/* Return base-address of a malloc allocation, given the user address. */
#define MALLOC_BASE(addr) ((void *)((char *)addr - MALLOC_HEADER_SIZE))
/* Return the size of a malloc allocation, given the user address. */
#define MALLOC_SIZE(addr) (*(size_t *)MALLOC_BASE(addr))
malloc() stores metadata regarding space allocation before 8 bytes from space actually allocated. This could be used to determine space of buffer. And on my x86-64 this always return multiple of 16. So if allocated space is multiple of 16 (which is in most cases) then this could be used:
Code
#include <stdio.h>
#include <malloc.h>
int size_of_buff(void *buff) {
return ( *( ( int * ) buff - 2 ) - 17 ); // 32 bit system: ( *( ( int * ) buff - 1 ) - 17 )
}
void main() {
char *buff = malloc(1024);
printf("Size of Buffer: %d\n", size_of_buff(buff));
}
Output
Size of Buffer: 1024
This is my approach:
#include <stdio.h>
#include <stdlib.h>
typedef struct _int_array
{
int *number;
int size;
} int_array;
int int_array_append(int_array *a, int n)
{
static char c = 0;
if(!c)
{
a->number = NULL;
a->size = 0;
c++;
}
int *more_numbers = NULL;
a->size++;
more_numbers = (int *)realloc(a->number, a->size * sizeof(int));
if(more_numbers != NULL)
{
a->number = more_numbers;
a->number[a->size - 1] = n;
}
else
{
free(a->number);
printf("Error (re)allocating memory.\n");
return 1;
}
return 0;
}
int main()
{
int_array a;
int_array_append(&a, 10);
int_array_append(&a, 20);
int_array_append(&a, 30);
int_array_append(&a, 40);
int i;
for(i = 0; i < a.size; i++)
printf("%d\n", a.number[i]);
printf("\nLen: %d\nSize: %d\n", a.size, a.size * sizeof(int));
free(a.number);
return 0;
}
Output:
10
20
30
40
Len: 4
Size: 16
If your compiler supports VLA (variable length array), you can embed the array length into the pointer type.
int n = 10;
int (*p)[n] = malloc(n * sizeof(int));
n = 3;
printf("%d\n", sizeof(*p)/sizeof(**p));
The output is 10.
You could also choose to embed the information into the allocated memory yourself with a structure including a flexible array member.
struct myarray {
int n;
struct mystruct a[];
};
struct myarray *ma =
malloc(sizeof(*ma) + n * sizeof(struct mystruct));
ma->n = n;
struct mystruct *p = ma->a;
Then to recover the size, you would subtract the offset of the flexible member.
int get_size (struct mystruct *p) {
struct myarray *ma;
char *x = (char *)p;
ma = (void *)(x - offsetof(struct myarray, a));
return ma->n;
}
The problem with trying to peek into heap structures is that the layout might change from platform to platform or from release to release, and so the information may not be reliably obtainable.
Even if you knew exactly how to peek into the meta information maintained by your allocator, the information stored there may have nothing to do with the size of the array. The allocator simply returned memory that could be used to fit the requested size, but the actual size of the memory may be larger (perhaps even much larger) than the requested amount.
The only reliable way to know the information is to find a way to track it yourself.
It is more than a funny question. :-)
I wish to initialize an array in C, but instead of zeroing out the array with calloc. I want to set all element to one. Is there a single function that does just that?
I have used my question above to search in google, no answer. Hope you can help me out! FYI, I am first year CS student just starting to program in C.
There isn't a standard C memory allocation function that allows you to specify a value other than 0 that the allocated memory is initialized to.
You could easily enough write a cover function to do the job:
void *set_alloc(size_t nbytes, char value)
{
void *space = malloc(nbytes);
if (space != 0)
memset(space, value, nbytes);
return space;
}
Note that this assumes you want to set each byte to the same value. If you have a more complex initialization requirement, you'll need a more complex function. For example:
void *set_alloc2(size_t nelems, size_t elemsize, void *initializer)
{
void *space = malloc(nelems * elemsize);
if (space != 0)
{
for (size_t i = 0; i < nelems; i++)
memmove((char *)space + i * elemsize, initializer, elemsize);
}
return space;
}
Example usage:
struct Anonymous
{
double d;
int i;
short s;
char t[2];
};
struct Anonymous a = { 3.14159, 23, -19, "A" };
struct Anonymous *b = set_alloc2(20, sizeof(struct Anonymous), &a);
memset is there for you:
memset(array, value, length);
There is no such function. You can implement it yourself with a combination of malloc() and either memset() (for character data) or a for loop (for other integer data).
The impetus for the calloc() function's existence (vs. malloc() + memset()) is that it can be a nice performance optimization in some cases. If you're allocating a lot of data, the OS might be able to give you a range of virtual addresses that are already initialized to zero, which saves you the extra cost of manually writing out 0's into that memory range. This can be a large performance gain because you don't need to page all of those pages in until you actually use them.
Under the hood, calloc() might look something like this:
void *calloc(size_t count, size_t size)
{
// Error checking omitted for expository purposes
size_t total_size = count * size;
if (total_size < SOME_THRESHOLD) // e.g. the OS's page size (typically 4 KB)
{
// For small allocations, allocate from normal malloc pool
void *mem = malloc(total_size);
memset(mem, 0, total_size);
return mem;
}
else
{
// For large allocations, allocate directory from the OS, already zeroed (!)
return mmap(NULL, total_size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
// Or on Windows, use VirtualAlloc()
}
}
Sorry if the title isn't as descriptive as it should be, the problem is hard to put in a few words. I am trying to find out how much mem i have available by malloc'ing and if that worked, writing to that segment. On certain systems (all linux on x86_64) i see segfaults when writing to the 2049th mib. The code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
int main (int argc, char **argv) {
void *addr;
int megasize = 2100
/// allocate the memory via mmap same result
//addr = mmap ((void *) 0, (size_t) megasize << 20, PROT_READ | PROT_WRITE,
// MAP_PRIVATE | MAP_ANONYMOUS, (int) -1, (off_t) 0);
addr = malloc(megasize << 20);
if (addr == MAP_FAILED) {
fprintf (stderr, "alloc of %d megabytes failed %s\n", megasize,
strerror (errno));
exit (1);
};
printf ("got %d megabytes at %p\n", megasize, addr);
{
int i;
char *p = addr;
printf("touching the %d Mb memory:\n", megasize);
for (i = 0; i < megasize; i++) {
p[i << 20] = 0;
putchar('.');
if (i%64==63) // commenting out this line i see that it really is the 2049th mb
printf(" #%d\n", i);
fflush(stdout);
};
putchar('\n');
};
/// free the memory
munmap (addr, (size_t) megasize << 20);
return 0;
}
It segfaults reliably on some systems, whereas on others it works fine. Reading the logs for the systems where it fails tells me it's not the oom killer. There are values for megasize that i can choose which will cause malloc to fail but those are larger.
The segfault occurs reliably for any size bigger than 2gib and smaller than the limit where malloc returns -1 for those systems.
I believe there is a limit i am hitting that isn't observed by malloc and i can't figure out what it is. I tried reading out a few of the limits via getrlimit that seemed relevant like RLIMIT_AS and RLIMIT_DATA but those were way bigger.
This is the relevant part of my valgrindlog
==29126== Warning: set address range perms: large range [0x39436000, 0xbc836000) (defined)
==29126== Invalid write of size 1
==29126== at 0x400AAD: main (in /home/max/source/scratch/memorytest)
==29126== Address 0xffffffffb9436000 is not stack'd, malloc'd or (recently) free'd
Can anybody please give me a clue as to what the problem is?
You'll be getting an overflow when counting via int i, as int is 4 bytes wide here:
p[i << 20] = ...
Change
int i;
to be
size_t i;
size_t is the preferred type when addressing memory.
An 32-bit int cannot store the value 2049 mb. You're invoking undefined behavior via signed integer overflow, and happen to be getting a negative number. On most 32-bit machines, when added to a pointer that wraps back around and ends up giving you the address you wanted, by accident. On 64-bit machines, that gives you an address roughly 2047 mb below the start of your block of memory (or wrapped around to the top of the 64-bit memory space).
Use the proper types. Here, i should have type size_t.