I have to make a program that would allocate an array of int with mmap() and then count how much memory was actually allocated to it.
So far I have the code that allocates the array, but I don't know how to count the allocated memory. This is what I tried so far:
the handler:
int i;
void segfault_sigaction(int signal, siginfo_t *si, void *arg)
{
printf("caught segfault, memory size is: %zu bytes\n", sizeof(int) * i);
exit(0);
}
main:
int main(int argc, char* argv[])
{
int *addr;
int n = atoi(argv[1]);
struct sigaction sa;
memset(&sa,0,sizeof(sigaction));
sa.sa_handler = segfault_sigaction;
//sigemptyset(&sa.sa_mask);
//sa.sa_sigaction = segfault_sigaction;
//sa.sa_flags = 0;
sigaction(SIGSEGV,&sa, NULL);
addr = mmap(NULL, n * sizeof(int), PROT_WRITE | PROT_READ ,MAP_ANONYMOUS | MAP_PRIVATE,-1,0);
i = n;
while(1){
addr[i+1] = 5;
i++;
}
return 0;
}
EDIT: After reading your answers and the task description again, I changed the code. I put in a handler that catches SIGSEGV, but it is not really reliable (sometimes it works, sometimes the default segmentation fault error occurs).
The amount of memory mapped by mmap() is just what you requested, perhaps rounded up a bit. If you're asking, "How can I see how much it rounded up," the answer is you can't do so in a general way, but some specific platforms may have APIs that let you figure it out. But there's no reason why you'd need to.
The code you've written seems to be "searching" for the end of the allocation by looking for a nonzero part. This results in undefined behavior because you will always run off the end of the allocation. You absolutely must not do this. Just use the size that you passed to mmap() as the size. That's the only size that matters.
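To make the "rounded up a bit" remark concrete: on typical systems a mapping is backed in whole pages, so the request is effectively rounded up to a multiple of the page size. The following is only a sketch under that assumption; round_up_to_page and mapped_size_estimate are hypothetical helper names, not part of any API, and the page size is taken from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round a byte count up to a whole number of pages. */
size_t round_up_to_page(size_t nbytes, size_t page_size)
{
    assert(page_size != 0);
    size_t rem = nbytes % page_size;
    return rem == 0 ? nbytes : nbytes + (page_size - rem);
}

/* Hypothetical helper: estimate the size backing a request of nbytes,
   assuming the mapping is rounded up to the system page size. */
size_t mapped_size_estimate(size_t nbytes)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return round_up_to_page(nbytes, (size_t)page);
}
```

Even if this estimate is larger than the requested size, only the size passed to mmap() should be treated as usable by the program.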
I'm new to CUDA/C and new to stack overflow. This is my first question.
I'm trying to allocate memory dynamically in a kernel function, but the results are unexpected.
I read using malloc() in a kernel can lower performance a lot, but I need it anyway so I first tried with a simple int ** array just to test the possibility, then I'll actually need to allocate more complex structs.
In my main I used cudaMalloc() to allocate the space for the array of int *, and then I used malloc() for every thread in the kernel function to allocate the array for every index of the outer array. I then used another thread to check the result, but it doesn't always work.
Here's main code:
#define N_CELLE 1024*2
#define L_CELLE 512
extern "C" {
int main(int argc, char **argv) {
int *result = (int *)malloc(sizeof(int));
int *d_result;
int size_numbers = N_CELLE * sizeof(int *);
int **d_numbers;
cudaMalloc((void **)&d_numbers, size_numbers);
cudaMalloc((void **)&d_result, sizeof(int *));
kernel_one<<<2, 1024>>>(d_numbers);
cudaDeviceSynchronize();
kernel_two<<<1, 1>>>(d_numbers, d_result);
cudaMemcpy(result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
printf("%d\n", *result);
cudaFree(d_numbers);
cudaFree(d_result);
free(result);
}
}
I used extern "C" because I couldn't compile while importing my header, which is not used in this example code. I pasted it since I don't know whether it may be relevant.
This is kernel_one code:
__global__ void kernel_one(int **d_numbers) {
int i = threadIdx.x + blockIdx.x * blockDim.x;
d_numbers[i] = (int *)malloc(L_CELLE*sizeof(int));
for(int j=0; j<L_CELLE;j++)
d_numbers[i][j] = 1;
}
And this is kernel_two code:
__global__ void kernel_two(int **d_numbers, int *d_result) {
int temp = 0;
for(int i=0; i<N_CELLE; i++) {
for(int j=0; j<L_CELLE;j++)
temp += d_numbers[i][j];
}
*d_result = temp;
}
Everything works fine (that is, the count is correct) as long as I allocate no more than 1024*2*512 ints in total in device memory. For example, if I #define N_CELLE 1024*4 the program starts giving "random" results, such as negative numbers.
Any idea of what the problem could be?
Thanks anyone!
In-kernel memory allocation draws memory from a statically allocated runtime heap. At larger sizes, you are exceeding the size of that heap, so malloc() returns NULL and your two kernels then attempt to write and read through invalid pointers. This produces a runtime error on the device and renders the results invalid. You would already know this if you had either added proper API error checking on the host side or run your code with the cuda-memcheck utility.
The solution is to ensure that the heap size is set to something appropriate before trying to run a kernel. Adding something like this:
size_t heapsize = sizeof(int) * size_t(N_CELLE) * size_t(2*L_CELLE);
cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapsize);
to your host code before any other API calls, should solve the problem.
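The arithmetic behind this can be checked in plain host-side C. The sketch below assumes the default device malloc heap is 8 MiB (the value given in the CUDA documentation; treat it as an assumption for your toolkit version) and a 4-byte int. payload_bytes and fits_in_heap are hypothetical helper names, and the comparison is strict because each allocation also carries some bookkeeping overhead.

```c
#include <stddef.h>

/* Assumption: default size of the device malloc heap is 8 MiB. */
#define ASSUMED_DEFAULT_HEAP_BYTES ((size_t)8 * 1024 * 1024)

/* Payload bytes requested by kernel_one: one array of l_celle ints
   for each of the n_celle threads. */
size_t payload_bytes(size_t n_celle, size_t l_celle)
{
    return n_celle * l_celle * sizeof(int);
}

/* Returns 1 if the payload alone is strictly smaller than the heap,
   leaving at least some room for allocator overhead; 0 otherwise. */
int fits_in_heap(size_t payload, size_t heap_bytes)
{
    return payload < heap_bytes;
}
```

With N_CELLE = 1024*2 the payload is 4 MiB and fits; with N_CELLE = 1024*4 it is 8 MiB, which already fills the assumed default heap before any overhead is counted.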
I don't know anything about CUDA but these are severe bugs:
You cannot convert from int** to void**. They are not compatible types. Casting doesn't solve the problem, but hides it.
&d_numbers gives the address of a pointer to pointer which is wrong. It is of type int***.
Both of the above bugs result in undefined behavior. If your program seems to work under some conditions, that is purely by (bad) luck.
I know that on your hard drive, if you delete a file, the data is not (instantly) gone. The data is still there until it is overwritten. I was wondering if a similar concept existed in memory. Say I allocate 256 bytes for a string, is that string still floating in memory somewhere after I free() it until it is overwritten?
Your analogy is correct. The data in memory doesn't disappear or anything like that; the values may indeed still be there after a free(), though attempting to read from freed memory is undefined behaviour.
Generally, it does stay around, unless you explicitly overwrite the string before freeing it (like people sometimes do with passwords). Some library implementations automatically overwrite deallocated memory to catch accesses to it, but that is not done in release mode.
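Overwriting a sensitive buffer before freeing it could look roughly like the sketch below. wipe_bytes and wipe_and_free are hypothetical names. Writing through a volatile pointer is a common way to discourage the compiler from optimizing the stores away, but it is not a guarantee on every implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite len bytes at p with zeros through a volatile pointer, so the
   compiler is less likely to drop the stores as dead writes. */
void wipe_bytes(void *p, size_t len)
{
    volatile unsigned char *vp = p;
    for (size_t i = 0; i < len; i++)
        vp[i] = 0;
}

/* Hypothetical helper: wipe a heap buffer of known length, then free it. */
void wipe_and_free(void *p, size_t len)
{
    if (p != NULL) {
        wipe_bytes(p, len);
        free(p);
    }
}
```

The caller has to pass the length, because free() offers no portable way to ask how large the allocation was.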
The answer depends highly on the implementation. On a good implementation, it's likely that at least the beginning (or the end?) of the memory will be overwritten with bookkeeping information for tracking free chunks of memory that could later be reused. However the details will vary. If your program has any level of concurrency/threads (even in the library implementation you might not see), then such memory could be clobbered asynchronously, perhaps even in such a way that even reading it is dangerous. And of course the implementation of free might completely unmap the address range from the program's virtual address space, in which case attempting to do anything with it will crash your program.
From a standpoint of an application author, you should simply treat free according to the specification and never access freed memory. But from the standpoint of a systems implementor or integrator, it might be useful to know (or design) the implementation, in which case your question is then interesting.
If you want to verify the behaviour for your implementation, the simple program below will do that for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* The number of memory bytes to test */
#define MEM_TEST_SIZE 256
void outputMem(unsigned char *mem, int length)
{
int i;
for (i = 0; i < length; i++) {
printf("[%02d]", mem[i] );
}
}
int bytesChanged(unsigned char *mem, int length)
{
int i;
int count = 0;
for (i = 0; i < length; i++) {
if (mem[i] != i % 256)
count++;
}
return count;
}
int main(void)
{
int i;
unsigned char *mem = (unsigned char *)malloc(MEM_TEST_SIZE);
/* Fill memory with bytes */
for (i = 0; i < MEM_TEST_SIZE; i++) {
mem[i] = i % 256;
}
printf("After malloc and copy to new mem location\n");
printf("mem = %p\n", (void *)mem );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
free(mem);
printf("\n\nAfter free()\n");
printf("mem = %p\n", (void *)mem );
printf("Bytes changed in memory = %d\n", bytesChanged(mem, MEM_TEST_SIZE) );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
}
It is more than a funny question. :-)
I wish to initialize an array in C, but instead of zeroing out the array with calloc, I want to set all elements to one. Is there a single function that does just that?
I searched Google using the question above but found no answer. I hope you can help me out! FYI, I am a first-year CS student just starting to program in C.
There isn't a standard C memory allocation function that allows you to specify a value other than 0 that the allocated memory is initialized to.
You could easily enough write a cover function to do the job:
void *set_alloc(size_t nbytes, char value)
{
void *space = malloc(nbytes);
if (space != 0)
memset(space, value, nbytes);
return space;
}
Note that this assumes you want to set each byte to the same value. If you have a more complex initialization requirement, you'll need a more complex function. For example:
void *set_alloc2(size_t nelems, size_t elemsize, void *initializer)
{
void *space = malloc(nelems * elemsize);
if (space != 0)
{
for (size_t i = 0; i < nelems; i++)
memmove((char *)space + i * elemsize, initializer, elemsize);
}
return space;
}
Example usage:
struct Anonymous
{
double d;
int i;
short s;
char t[2];
};
struct Anonymous a = { 3.14159, 23, -19, "A" };
struct Anonymous *b = set_alloc2(20, sizeof(struct Anonymous), &a);
memset is there for you:
memset(array, value, length);
There is no such function. You can implement it yourself with a combination of malloc() and either memset() (for character data) or a for loop (for other integer data).
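For an int array, the malloc() plus for-loop combination might be sketched as follows. alloc_int_filled is a hypothetical name. A loop is used because memset() fills individual bytes, so it cannot set each int to 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate count ints and set every element to value.
   Returns NULL if the size would overflow or the allocation fails. */
int *alloc_int_filled(size_t count, int value)
{
    if (count > SIZE_MAX / sizeof(int))
        return NULL;

    int *arr = malloc(count * sizeof(int));
    if (arr != NULL) {
        for (size_t i = 0; i < count; i++)
            arr[i] = value;
    }
    return arr;
}
```

A call such as alloc_int_filled(100, 1) would give an array of 100 ones, which the caller later releases with free().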
The impetus for the calloc() function's existence (vs. malloc() + memset()) is that it can be a nice performance optimization in some cases. If you're allocating a lot of data, the OS might be able to give you a range of virtual addresses that are already initialized to zero, which saves you the extra cost of manually writing out 0's into that memory range. This can be a large performance gain because you don't need to page all of those pages in until you actually use them.
Under the hood, calloc() might look something like this:
void *calloc(size_t count, size_t size)
{
// Error checking omitted for expository purposes
size_t total_size = count * size;
if (total_size < SOME_THRESHOLD) // e.g. the OS's page size (typically 4 KB)
{
// For small allocations, allocate from normal malloc pool
void *mem = malloc(total_size);
memset(mem, 0, total_size);
return mem;
}
else
{
// For large allocations, allocate directly from the OS, already zeroed (!)
return mmap(NULL, total_size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
// Or on Windows, use VirtualAlloc()
}
}
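One piece of the error checking omitted above is the multiplication count * size, which can wrap around for large arguments. A minimal sketch of an overflow check, using a hypothetical helper named checked_total_size (not taken from any real calloc() implementation):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store count * size in *total and return 1,
   or return 0 without touching *total if the product would overflow. */
int checked_total_size(size_t count, size_t size, size_t *total)
{
    if (size != 0 && count > SIZE_MAX / size)
        return 0;
    *total = count * size;
    return 1;
}
```

A calloc()-style function would typically return NULL when such a check fails, rather than allocating a block that is too small.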
[I've solved this problem--please see my last comment below.]
In my application, I need to use my own special malloc, based on Doug Lea's dlmalloc: I map an anonymous file (using mmap), create an mspace out of part of the mapped file, and pass the mspace to mspace_malloc. I am finding that some of the addresses that mspace_malloc returns are not within the bounds of the mapped file--even though, as far as I can tell, the process can write to and read from the malloc'ed memory just fine. Why am I encountering this behavior, and what can I do to force mspace_malloc to return an address within the range of the mspace?
/* Inside dl_malloc.c */
#define ONLY_MSPACES 1
#define MSPACES 1
void * heap;
off_t heap_length;
mspace ms;
void init(size_t size) {
heap = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (heap == MAP_FAILED) {
perror("init: mmap");
exit(EXIT_FAILURE);
}
heap_length = size;
ms = create_mspace_with_base(heap, size, 0);
mspace_track_large_chunks(ms, 1);
}
void * my_malloc(size_t bytes) {
return mspace_malloc(ms, bytes);
}
/************************/
/* In application */
#include <stdio.h>
#include <stdlib.h>
#define HEAP_SIZE 0x10000 // 64 KiB: 16 pages of 4 KiB
#define ROWS 2
#define COLS 4096
extern void init(size_t size);
extern void * my_malloc(size_t bytes);
extern void * heap;
extern off_t heap_length;
int main(void) {
init(HEAP_SIZE);
int ** matrix = (int **)my_malloc(sizeof(int *) * ROWS);
int i;
for (i = 0; i < ROWS; ++i)
matrix[i] = (int *)my_malloc(sizeof(int) * COLS);
printf("Heap bounds: %lx to %lx\n",
(off_t)heap, (off_t)heap + heap_length);
printf("Matrix: %p ", (void *)matrix);
for (i = 0; i < ROWS; ++i)
printf("Matrix[%d]: %p ", i, (void *)matrix[i]);
printf("\n");
return EXIT_SUCCESS;
}
When I run this program (well, the above is a simplification, but not by much), I see that the address assigned to matrix is within the bounds printed for the heap, but that the two addresses for the two rows are very far below the lower bound--more than 0x100000000 below it! And yet I seem to be able to read and write to the matrix. This last point I find puzzling and would like to understand, but the more urgent issue is that I need to do something to make sure that all the addresses that my_malloc returns are within the heap bounds, because other parts of my application need this.
BTW, note that I do not need to lock my call to create_mspace, since I'm only using one thread in this program. Anyway, I tried setting this argument to 1, but I saw no difference in the results.
Thanks!
Eureka (hahaha)! The above, simplified example, with the given constants, will work correctly, and the returned addresses will be within range. However, if you call my_malloc on sizes that are too large relative to the original mspace, malloc will call mmap (unless you explicitly disable this). The addresses I was seeing were merely those returned as a result of those calls to mmap. So the solution to this mystery turned out to be rather simple. I'm leaving this posted in case others happen to run into this issue and forget about malloc's calls to mmap.
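A quick way to detect this situation is to test each returned pointer against the bounds of the mapped region. The sketch below uses a hypothetical helper named in_region. It compares addresses as uintptr_t values, which assumes a flat address space, as on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p lies within [base, base + length),
   0 otherwise. Addresses are compared as integers. */
int in_region(const void *base, size_t length, const void *p)
{
    uintptr_t lo = (uintptr_t)base;
    uintptr_t addr = (uintptr_t)p;
    return addr >= lo && addr - lo < length;
}
```

In the program above, a check such as in_region(heap, (size_t)heap_length, matrix[i]) after each my_malloc call would flag the rows that were served by a separate mmap.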