Memory allocation threshold (mmap vs malloc) - c

I'd like to point out that I'm new to this, so I'm trying to understand and explain it as best I can.
I am basically trying to figure out whether it's possible to keep memory allocation under a threshold, because my project has a memory limit.
Here is how memory is currently allocated by the third-party library libsodium:
alloc_region(escrypt_region_t *region, size_t size)
{
uint8_t *base, *aligned;
#if defined(MAP_ANON) && defined(HAVE_MMAP)
if ((base = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE,
#ifdef MAP_NOCORE
MAP_ANON | MAP_PRIVATE | MAP_NOCORE,
#else
MAP_ANON | MAP_PRIVATE,
#endif
-1, 0)) == MAP_FAILED)
base = NULL; /* LCOV_EXCL_LINE */
aligned = base;
#elif defined(HAVE_POSIX_MEMALIGN)
if ((errno = posix_memalign((void **) &base, 64, size)) != 0) {
base = NULL;
}
aligned = base;
#else
base = aligned = NULL;
if (size + 63 < size)
errno = ENOMEM;
else if ((base = (uint8_t *) malloc(size + 63)) != NULL) {
aligned = base + 63;
aligned -= (uintptr_t) aligned & 63;
}
#endif
region->base = base;
region->aligned = aligned;
region->size = base ? size : 0;
return aligned;
}
So, for example, this currently calls posix_memalign to allocate (e.g.) 32 MB of memory.
32 MB exceeds the memory cap I have been given (it does not trigger memory warnings, because the actual capacity is far greater; it's just what I'm allowed to use).
From some googling, I'm under the impression that I can use mmap and virtual memory instead.
I can see that the function above already has an mmap branch, but it is never taken.
Is it possible to convert the above code so that I never exceed my 30 MB memory limit?
From my understanding, if this allocation would exceed my free memory, it would automatically be allocated in virtual memory? So can I force this to happen and pretend that my free space is lower than what is actually available?
Any help is appreciated.
UPDATE
/* Allocate memory. */
B_size = (size_t) 128 * r * p;
V_size = (size_t) 128 * r * N;
need = B_size + V_size;
if (need < V_size) {
errno = ENOMEM;
return -1;
}
XY_size = (size_t) 256 * r + 64;
need += XY_size;
if (need < XY_size) {
errno = ENOMEM;
return -1;
}
if (local->size < need) {
if (free_region(local)) {
return -1;
}
if (!alloc_region(local, need)) {
return -1;
}
}
B = (uint8_t *) local->aligned;
V = (uint32_t *) ((uint8_t *) B + B_size);
XY = (uint32_t *) ((uint8_t *) V + V_size);

I am basically trying to figure out whether it's possible to keep memory allocation under a threshold, because my project has a memory limit.
On Linux or POSIX systems, you might consider using setrlimit(2) with RLIMIT_AS:
This is the maximum size of the process's virtual memory
(address space) in bytes. This limit affects calls to brk(2),
mmap(2), and mremap(2), which fail with the error ENOMEM upon
exceeding this limit.
Above this limit, the mmap would fail, and so would fail for instance the call to malloc(3) that triggered that particular use of mmap.
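As a minimal sketch of that suggestion (Linux/POSIX assumed; the helper names cap_address_space and can_allocate are made up for illustration), the soft RLIMIT_AS limit can be lowered from inside the process. Keep in mind that the limit counts the whole virtual address space (code, stacks, shared libraries), not only the heap, so a value as low as 30 MB may be too tight for a real program.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Lower the soft address-space limit to at most `bytes`.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int cap_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Returns 1 if a block of `bytes` can be allocated right now, 0 otherwise. */
int can_allocate(size_t bytes)
{
    volatile unsigned char *p = malloc(bytes);

    if (p == NULL)
        return 0;
    p[0] = 1;            /* touch the block so the allocation is not optimized away */
    free((void *)p);
    return 1;
}
```

After cap_address_space() succeeds, an allocation that would push the process past the limit fails with ENOMEM, so alloc_region() in the question would return NULL instead of silently exceeding the budget.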
I'm under the impression i can either use mmap
Notice that malloc(3) will call mmap(2) (or sometimes sbrk(2)...) to retrieve (virtual) memory from the kernel, thus growing your virtual address space. However, malloc often prefers to reuse previously free-d memory (when available). And free usually won't call munmap(2) to release memory chunks but prefers to keep them for future malloc-s. Actually most C standard libraries distinguish between "small" and "large" allocations (in practice a malloc for a gigabyte will use mmap, and the corresponding free would munmap immediately).
See also mallopt(3) and madvise(2). In case you need to lock some pages (obtained by mmap) into physical RAM, consider mlock(2).
Look also into this answer (explaining that the notion of RAM used by a particular process is not that easy).
For malloc related bugs (including memory leaks) use valgrind.

Related

Get maximum available heap memory

I'm currently trying to figure out the maximum amount of memory that can be allocated through malloc() in C.
Until now I've tried a simple algorithm that increments a size counter and tries to allocate that many bytes. If malloc returns NULL, I know that there is not enough memory available.
ULONG ulMaxSize = 0;
for (ULONG ulSize = /*0x40036FF0*/ 0x40A00000; ulSize <= 0xffffffff; ulSize++)
{
void* pBuffer = malloc(ulSize);
if (pBuffer == NULL)
{
ulMaxSize = ulSize - 1;
break;
}
free(pBuffer);
}
void* pMaxBuffer = malloc(ulMaxSize);
However, this algorithm takes a very long time to run, since malloc() has turned out to be a time-consuming call.
My question now is whether there is a more efficient algorithm to find the maximum amount of memory that can be allocated.
The maximum memory that can be allocated depends mostly on a few factors:
Address space limits on the process (max memory, virtual memory and friends).
Virtual space available
Physical space available
Fragmentation, which will limit the size of continuous memory blocks.
... Other limits ...
From your description (extreme slowness), it looks like the process starts using swap, which is VERY slow compared to real memory.
Consider the following alternative
For address space limit, look at ulimit -a (or use getrlimit to access the same data from C program) - look for 'max memory size', and 'virtual memory'
For swap space, physical memory - top
ulimit -a (filtered)
data seg size (kbytes, -d) unlimited
max memory size (kbytes, -m) 2048
stack size (kbytes, -s) 8192
virtual memory (kbytes, -v) unlimited
From a practical point, given that a program does not have control over system resources, you should be focused on 'max memory size'.
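For reading the same numbers from a C program, here is a small sketch built on getrlimit(2). The helper names soft_limit and print_limit are invented for this example, and the values are reported in bytes, whereas ulimit prints kbytes.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Store the soft limit for `resource` in *out.
 * Returns 0 on success, -1 on error. */
int soft_limit(int resource, rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Print one limit, roughly in the style of `ulimit -a`. */
void print_limit(const char *name, int resource)
{
    rlim_t v;

    if (soft_limit(resource, &v) != 0)
        printf("%-16s (error)\n", name);
    else if (v == RLIM_INFINITY)
        printf("%-16s unlimited\n", name);
    else
        printf("%-16s %llu bytes\n", name, (unsigned long long)v);
}
```

Calling print_limit("data seg size", RLIMIT_DATA), print_limit("stack size", RLIMIT_STACK) and print_limit("virtual memory", RLIMIT_AS) covers the -d, -s and -v rows shown above. As far as I know, the -m row maps to RLIMIT_RSS, which the getrlimit(2) man page describes as having no effect on current Linux kernels, so treat that one with care.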
Other than using an OS-specific API to get such a number:
sysinfo on Linux (or reading it from /proc/meminfo)
GlobalMemoryStatusEx for Win32
You can also do a binary search, not recommended, as the state of the system might be in flux and the result could vary over time:
ULONG getMax() {
ULONG min = 0x0;
ULONG max = 0xffffffff;
void* t = malloc(max);
if(t!=NULL) {
free(t);
return max;
}
while(max-min > 1) {
ULONG mid = min + (max - min) / 2;
t = malloc(mid);
if(t == NULL) {
max = mid;
continue;
}
free(t);
min = mid;
}
return min;
}
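For the OS-specific route mentioned above, a minimal Linux-only sketch using sysinfo(2) could look like this. The function names are made up for the example, and the numbers describe the whole machine, not what a single malloc call is guaranteed to get.

```c
#include <sys/sysinfo.h>

/* Total physical RAM in bytes, or 0 if sysinfo() fails. */
unsigned long long total_ram_bytes(void)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return 0;
    /* totalram is expressed in units of mem_unit bytes. */
    return (unsigned long long)si.totalram * si.mem_unit;
}

/* Currently free physical RAM in bytes, or 0 if sysinfo() fails. */
unsigned long long free_ram_bytes(void)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return 0;
    return (unsigned long long)si.freeram * si.mem_unit;
}
```

Unlike the probing loop, this is a single system call, but the result is only a snapshot and can change right after it is read.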

Why do very large stack allocations fail despite unlimited ulimit?

The following static allocation gives segmentation fault
double U[100][2048][2048];
But the following dynamic allocation goes fine
double ***U = (double ***)malloc(100 * sizeof(double **));
for(i=0;i<100;i++)
{
U[i] = (double **)malloc(2048 * sizeof(double *));
for(j=0;j<2048;j++)
{
U[i][j] = (double *)malloc(2048*sizeof(double));
}
}
The ulimit is set to unlimited in linux.
Can anyone give me a hint about what's happening?
When you say the ulimit is set to unlimited, are you using the -s option? As otherwise this doesn't change the stack limit, only the file size limit.
There appear to be stack limits regardless, though. I can allocate:
double *u = malloc(200*2048*2048*(sizeof(double))); // 6gb contiguous memory
And running the binary I get:
VmData: 6553660 kB
However, if I allocate on the stack, it's:
double u[200][2048][2048];
VmStk: 2359308 kB
Which is clearly not correct (suggesting overflow). With the original allocations, the two give the same results:
Array: VmStk: 3276820 kB
malloc: VmData: 3276860 kB
However, running the stack version, I cannot generate a segfault no matter what the size of the array -- even if it's more than the total memory actually on the system, if -s unlimited is set.
EDIT:
I did a test with malloc in a loop until it failed:
VmData: 137435723384 kB // my system doesn't quite have 131068gb RAM
Stack usage never gets above 4gb, however.
Assuming your machine actually has enough free memory to allocate 3.125 GiB of data, the difference most likely lies in the fact that the static allocation needs all of this memory to be contiguous (it's actually a 3-dimensional array), while the dynamic allocation only needs contiguous blocks of about 2048*8 = 16 KiB (it's an array of pointers to arrays of pointers to quite small actual arrays).
It is also possible that your operating system uses swap files for heap memory when it runs out, but not for stack memory.
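If one contiguous block is acceptable, a third option sits between the two in the question: a single heap allocation indexed by hand. This is only a sketch under that assumption; idx3 and alloc3 are hypothetical helper names, not something from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Flat index of element (i, j, k) in an nx * ny * nz array stored row-major. */
size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* Allocate nx * ny * nz zeroed doubles as ONE contiguous heap block.
 * Returns NULL on failure or if the element count overflows size_t. */
double *alloc3(size_t nx, size_t ny, size_t nz)
{
    size_t n;

    if (ny != 0 && nx > SIZE_MAX / ny)
        return NULL;
    n = nx * ny;
    if (nz != 0 && n > SIZE_MAX / nz)
        return NULL;
    /* calloc itself checks the multiplication by sizeof(double). */
    return calloc(n * nz, sizeof(double));
}
```

With double *U = alloc3(100, 2048, 2048), the element U[i][j][k] from the question becomes U[idx3(i, j, k, 2048, 2048)], and one free(U) releases everything.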
There is a very good discussion of Linux memory management - and specifically the stack - here: 9.7 Stack overflow, it is worth the read.
You can use this command to find out what your current stack soft limit is
ulimit -s
On Mac OS X the hard limit is 64MB, see How to change the stack size using ulimit or per process on Mac OS X for a C or Ruby program?
You can modify the stack limit at run-time from your program, see Change stack size for a C++ application in Linux during compilation with GNU compiler
I combined your code with the sample there, here's a working program
#include <stdio.h>
#include <sys/resource.h>
unsigned myrand() {
static unsigned x = 1;
return (x = x * 1664525 + 1013904223);
}
void increase_stack( rlim_t stack_size )
{
rlim_t MIN_STACK = 1024 * 1024;
stack_size += MIN_STACK;
struct rlimit rl;
int result;
result = getrlimit(RLIMIT_STACK, &rl);
if (result == 0)
{
if (rl.rlim_cur < stack_size)
{
rl.rlim_cur = stack_size;
result = setrlimit(RLIMIT_STACK, &rl);
if (result != 0)
{
fprintf(stderr, "setrlimit returned result = %d\n", result);
}
}
}
}
void my_func() {
double U[100][2048][2048];
int i,j,k;
for(i=0;i<100;++i)
for(j=0;j<2048;++j)
for(k=0;k<2048;++k)
U[i][j][k] = myrand();
double sum = 0;
int n;
for(n=0;n<1000;++n)
sum += U[myrand()%100][myrand()%2048][myrand()%2048];
printf("sum=%g\n",sum);
}
int main() {
increase_stack( sizeof(double) * 100 * 2048 * 2048 );
my_func();
return 0;
}
You are hitting a limit of the stack. By default on Windows, the stack is 1M but can grow more if there is enough memory.
On many *nix systems default stack size is 512K.
You are trying to allocate 2048 * 2048 * 100 * 8 bytes, which is over 2^31 bytes (more than 2 GiB on the stack). If you have a lot of virtual memory available and still want to allocate this on the stack, use a different stack limit when linking the application.
Linux:
How to increase the gcc executable stack size?
Change stack size for a C++ application in Linux during compilation with GNU compiler
Windows:
http://msdn.microsoft.com/en-us/library/tdkhxaks%28v=vs.110%29.aspx
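To make the size arithmetic above concrete, here is a tiny sketch that computes the requirement in 64-bit arithmetic and compares it with a stack limit. The function names are hypothetical, and the 8 MiB figure in the note below is just a common Linux default, not something taken from the question.

```c
/* Bytes needed for double U[nx][ny][nz], computed in 64-bit arithmetic
 * so the product cannot wrap on a 32-bit int. */
unsigned long long array_bytes(unsigned long long nx,
                               unsigned long long ny,
                               unsigned long long nz)
{
    return nx * ny * nz * sizeof(double);
}

/* 1 if `need` bytes fit within a stack limit of `limit_bytes`, else 0. */
int fits_in_stack(unsigned long long need, unsigned long long limit_bytes)
{
    return need <= limit_bytes;
}
```

With 8-byte doubles, array_bytes(100, 2048, 2048) is 3,355,443,200 bytes (3.125 GiB), so fits_in_stack() reports 0 for an 8 MiB limit.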

Optimal Memory Utilization in realloc (splitting?)

I'm having difficulty with coding my realloc function.
I have it working through standard memcpy procedure, but I can't get it optimized. I know there are two other cases I need to accommodate for: expanding the current block forward, and checking if the current sized block is large enough (and if too large, split it to free memory).
However, I can't seem to get it right. I always get errors. To clarify, these are not compile errors; they are heap integrity checks that fail when run through a trace driver. If I do it without splitting, I run out of memory, and if I try to split, it says it "failed to preserve the original block/data."
Below is my normal memcpy code. The commented section in the middle is my attempt to expand, but I think I need to split because it's causing a ton of fragmentation. This is leading to me running out of memory and erroring out during (one) of the realloc tests. If I do it without the comment block, it works fine, but there is zero optimization.
My attempts to split always fail; commented code at the bottom is my attempt. What am I doing wrong here?
I would very much appreciate any assistance, thank you. :)
#define PACK(size, alloc) ((size) | (alloc))
#define GET_SIZE(p) (GET(p) & ~0x7)
#define GET_ALLOC(p) (GET(p) & 0x1)
#define HDRP(bp) ((char *)(bp) - WSIZE)
#define FTRP(bp) ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
#define NEXT_BLKP(bp) ((char *)(bp) + GET_SIZE(((char *)(bp) - WSIZE)))
void *mm_realloc(void *oldptr, size_t size)
{
void *newptr;
size_t copySize;
copySize = GET_SIZE(HDRP(oldptr));
size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(oldptr)));
// if (copySize > size) return oldptr;
/*if (!next_alloc) {
if ((GET_SIZE(HDRP(oldptr)) + GET_SIZE(HDRP(NEXT_BLKP(oldptr))))>size) {
copySize += GET_SIZE(HDRP(NEXT_BLKP(oldptr)));
PUT(HDRP(oldptr), PACK(copySize,1));
PUT(FTRP(oldptr), PACK(copySize,1));
return oldptr;
}
}*/
newptr = mm_malloc(size);
if (newptr == NULL)
return NULL;
if (size < copySize)
copySize = size;
memcpy(newptr, oldptr, copySize);
PUT(newptr,GET(oldptr));
mm_free(oldptr);
return newptr;
}
// int total_avail = (GET_SIZE(HDRP(oldptr)) + GET_SIZE(HDRP(NEXT_BLKP(oldptr))));
// copySize -= (total_avail - size);
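For reference, here is one way the "shrink in place and split" case could be sketched. It assumes the usual implicit-free-list layout that the macros above suggest (4-byte header and footer, block sizes that are multiples of 8, payload pointer just after the header). GET, PUT, WSIZE and DSIZE are not shown in the question, so the helpers below are stand-ins written for this example, not the original definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WSIZE 4                 /* assumed header/footer size in bytes */
#define DSIZE 8                 /* assumed alignment */
#define MIN_BLOCK (2 * DSIZE)   /* smallest block worth splitting off */

static uint32_t get_word(const void *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static void put_word(void *p, uint32_t v) { memcpy(p, &v, sizeof v); }

static size_t blk_size(const char *bp) { return get_word(bp - WSIZE) & ~(uint32_t)0x7; }
static int blk_alloc(const char *bp) { return (int)(get_word(bp - WSIZE) & 0x1); }

/* Write matching header and footer for a block of `size` bytes at payload bp. */
static void set_block(char *bp, size_t size, int alloc)
{
    put_word(bp - WSIZE, (uint32_t)(size | (size_t)alloc));
    put_word(bp + size - DSIZE, (uint32_t)(size | (size_t)alloc));
}

/* Payload request -> block size including header and footer, rounded to DSIZE. */
static size_t adjust(size_t size)
{
    if (size <= DSIZE)
        return 2 * DSIZE;
    return DSIZE * ((size + DSIZE + (DSIZE - 1)) / DSIZE);
}

/* Shrink the allocated block at bp to asize bytes (an adjust()-ed size).
 * Returns 0 if the block is too small, 1 if bp can be returned unchanged. */
static int shrink_in_place(char *bp, size_t asize)
{
    size_t old = blk_size(bp);

    if (asize > old)
        return 0;                         /* caller must grow or move instead */
    if (old - asize < MIN_BLOCK)
        return 1;                         /* remainder too small: keep whole block */
    set_block(bp, asize, 1);              /* front part stays allocated */
    set_block(bp + asize, old - asize, 0);/* tail becomes a free block */
    return 1;
}
```

Note that the payload is never touched, only the boundary tags, and that the new header and footer are written before the tail block's tags. In a real allocator the free tail would still have to be coalesced with a free successor or added to the free list; that part depends on code not shown in the question.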

C - shared memory - dynamic array inside shared struct

I'm trying to share a struct like this example:
typedef struct {
int* a;
int b;
int c;
} ex;
between processes. The problem is that when I initialize 'a' with malloc, it becomes private to the heap of the process that does this (or at least I think this is what happens). Is there any way to create a shared memory segment (with shmget, shmat) for this struct that works?
EDIT: I'm working on Linux.
EDIT: I have a process that initializes the buffer like this:
key_t key = ftok("gr", 'p');
int mid = shmget(key, sizeof(ex), IPC_CREAT | 0666);
ex* e = NULL;
status b_status = init(&e, 8); //init gives initial values to b c and allocate space for 'a' with a malloc
e = (ex*)shmat(mid, NULL, 0);
The other process attaches itself to the shared memory like this:
key_t key = ftok("gr", 'p');
int shmid = shmget(key, sizeof(ex), 0);
ex* e;
e = (ex*)shmat(shmid, NULL, 0);
and later gets an element from a, in this case the one at position 1:
int i = get_el(e, 1);
First of all, to share the content pointed by your int *a field, you will need to copy the whole memory related to it. Thus, you will need a shared memory that can hold at least size_t shm_size = sizeof(struct ex) + get_the_length_of_your_ex();.
From now on, since you mentioned shmget and shmat, I will assume you run a Linux system.
The first step is creating the shared memory segment. It helps if you can determine an upper bound for the size of the int *a content, because then you do not have to create and delete the segment over and over again. If you do so, however, you need a little extra space to record how long the actual data is. I will assume that a simple size_t will do the trick for this purpose.
Then, after you created your segment, you must set the data correctly to make it hold what you want. Notice that while the physical address of the memory segment is always the same, when calling shmat you will get virtual pointers, which are only usable in the process that called shmat. The example code below should give you some tricks to do so.
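#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h> /* shmget, shmat, shmdt */
#include <stdlib.h> /* malloc */
#include <string.h> /* memcpy */
/* Assume a cannot point towards an area larger than 4096 bytes. */
#define A_MAX_SIZE (size_t)4096
struct ex {
int *a;
int b;
int c;
};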
int shm_create(void)
{
/*
* If you need to share other structures,
* You'll need to pass the key_t as an argument
*/
key_t k = ftok("/a/path/of/yours", 'p'); /* ftok() also takes a project id; 'p' is an arbitrary choice */
int shm_id = 0;
if (0 > (shm_id = shmget(
k, sizeof(struct ex) + A_MAX_SIZE + sizeof(size_t), IPC_CREAT|IPC_EXCL|0666))) {
/* An error occurred, add desired error handling. */
}
return shm_id;
}
/*
* Fill the desired shared memory segment with the structure
*/
int shm_fill(int shmid, struct ex *p_ex)
{
void *p = shmat(shmid, NULL, 0);
char *tmp = p; /* char * so that += below advances in bytes in standard C */
size_t data_len = get_my_ex_struct_data_len(p_ex);
if ((void*)(-1) == p) {
/* Add desired error handling */
return -1;
}
memcpy(tmp, p_ex, sizeof(struct ex));
tmp += sizeof(struct ex);
memcpy(tmp, &data_len, sizeof(size_t));
tmp += sizeof(size_t); /* skip the length field; it is not always 4 bytes */
memcpy(tmp, p_ex->a, data_len);
shmdt(p);
/*
* If you want to keep the reference so that
* When modifying p_ex anywhere, you update the shm content at the same time :
* - Don't call shmdt()
* - Make p_ex->a point towards the good area :
* p_ex->a = p + sizeof(struct ex) + sizeof(size_t);
* Never ever modify a without detaching the shm ...
*/
return 0;
}
/* Get the ex structure from a shm segment */
int shm_get_ex(int shmid, struct ex *p_dst)
{
void *p = shmat(shmid, NULL, SHM_RDONLY);
void *tmp;
size_t data_len = 0;
if ((void*)(-1) == p) {
/* Error ... */
return -1;
}
data_len = *(size_t*)((char *)p + sizeof(struct ex));
if (NULL == (tmp = malloc(data_len))) {
/* No memory ... */
shmdt(p);
return -1;
}
memcpy(p_dst, p, sizeof(struct ex));
memcpy(tmp, ((char *)p + sizeof(struct ex) + sizeof(size_t)), data_len);
p_dst->a = tmp;
shmdt(p); /* everything was copied out, so the segment can be detached */
/*
* If you want to modify "globally" the structure,
* - Change SHM_RDONLY to 0 in the shmat() call
* - Make p_dst->a point to the good offset :
* p_dst->a = p + sizeof(struct ex) + sizeof(size_t);
* - Remove from the code above all the things made with tmp (malloc ...)
*/
return 0;
}
/*
* Detach the given p_ex structure from a shm segment.
* This function is useful only if you use the shm segment
* in the way I described in comment in the other functions.
*/
void shm_detach_struct(struct ex *p_ex)
{
/*
* Here you could :
* - alloc a local pointer
* - copy the shm data into it
* - detach the segment using the current p_ex->a pointer
* - assign your local pointer to p_ex->a
* This would save locally the data stored in the shm at the call
* Or if you're lazy (like me), just detach the pointer and make p_ex->a = NULL;
*/
/* Cast to char * first: arithmetic on an int * would step in units of sizeof(int). */
shmdt((char *)p_ex->a - sizeof(struct ex) - sizeof(size_t));
p_ex->a = NULL;
}
Excuse my laziness, it would be space-optimized to not copy at all the value of the int *a pointer of the struct ex since it is completely unused in the shared memory, but I spared myself extra-code to handle this (and some pointer checkings like the p_ex arguments integrity).
But when you are done, you must find a way to share the shm ID between your processes. This could be done using sockets, pipes ... Or using ftok with the same input.
The memory you allocate to a pointer using malloc() is private to that process. So, when you try to access the pointer in another process(other than the process which malloced it) you are likely going to access an invalid memory page or a memory page mapped in another process address space. So, you are likely to get a segfault.
If you are using the shared memory, you must make sure all the data you want to expose to other processes is "in" the shared memory segment and not private memory segments of the process.
You could try, leaving the data at a specified offset in the memory segment, which can be concretely defined at compile time or placed in a field at some known location in the shared memory segment.
Eg:
If you are doing this
char *mem = shmat(shmid2, (void*)0, 0);
// So, the mystruct type is at offset 0.
mystruct *structptr = (mystruct*)mem;
// Now we have a structptr, use an offset to get some other_type.
other_type *other = (other_type*)(mem + structptr->offset_of_other_type);
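A slightly fuller sketch of the offset idea follows. The struct and function names are invented for this example, and two malloc'd buffers stand in for two processes that have the same segment attached at different addresses. Because the header stores an offset instead of a pointer, the same bytes stay valid wherever they are mapped.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header stored at offset 0 of the segment; it holds an offset, never a pointer. */
typedef struct {
    size_t a_offset;   /* byte offset of the int array from the segment start */
    size_t a_count;    /* number of ints in the array */
    int b;
    int c;
} shared_hdr;

/* Bytes needed for a segment holding the header plus n ints. */
size_t segment_bytes(size_t n) { return sizeof(shared_hdr) + n * sizeof(int); }

/* Lay out an empty header and array in the (already attached) segment at base. */
void segment_init(void *base, size_t n)
{
    shared_hdr *h = base;

    h->a_offset = sizeof(shared_hdr);
    h->a_count = n;
    h->b = h->c = 0;
}

/* Resolve the array for THIS process's mapping of the segment. */
int *segment_array(void *base)
{
    shared_hdr *h = base;

    return (int *)((char *)base + h->a_offset);
}

/* Self-check: copy the bytes to another address and read the array back. */
int offsets_survive_remap(void)
{
    size_t i, n = 4, bytes = segment_bytes(n);
    void *m1 = malloc(bytes), *m2 = malloc(bytes);
    int ok = 0;

    if (m1 && m2) {
        segment_init(m1, n);
        for (i = 0; i < n; i++)
            segment_array(m1)[i] = (int)(10 * i);
        memcpy(m2, m1, bytes);            /* same bytes, different address */
        ok = segment_array(m2)[3] == 30;
    }
    free(m1);
    free(m2);
    return ok;
}
```

In real use, base would be the pointer returned by shmat() in each process, and segment_bytes(n) the size passed to shmget().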
Other way would be to have a fixed size buffer to pass the information using the shared memory approach, instead of using the dynamically allocated pointer.
Hope this helps.
Are you working in Windows or Linux?
In any case what you need is a memory mapped file. Documentation with code examples here,
http://msdn.microsoft.com/en-us/library/aa366551%28VS.85%29.aspx
http://menehune.opt.wfu.edu/Kokua/More_SGI/007-2478-008/sgi_html/ch03.html
You need to use shared memory/memory mapped files/whatever your OS gives you.
In general, IPC and sharing memory between processes is quite OS dependent, especially in low-level languages like C (higher-level languages usually have libraries for that - for example, even C++ has support for it using boost).
If you are on Linux, I usually use shmat for small amount, and mmap (http://en.wikipedia.org/wiki/Mmap) for larger amounts.
On Win32, there are many approaches; the one I prefer is usually using page-file backed memory mapped files (http://msdn.microsoft.com/en-us/library/ms810613.aspx)
Also, you need to pay attention to where you are using these mechanism inside your data structures: as mentioned in the comments, without using precautions the pointer you have in your "source" process is invalid in the "target" process, and needs to be replaced/adjusted (IIRC, pointers coming from mmap are already OK(mapped); at least, under windows pointers you get out of MapViewOfFile are OK).
EDIT: from your edited example:
What you do here:
e = (ex*)shmat(mid, NULL, 0);
(other process)
int shmid = shmget(key, sizeof(ex), 0);
ex* e = (ex*)shmat(shmid, NULL, 0);
is correct, but you need to do it for each pointer you have, not only for the "main" pointer to the struct. E.g. you need to do:
e->a = (int*)shmat(shmget(another_key, dim_of_a, IPC_CREAT | 0666), NULL, 0);
instead of creating the array with malloc.
Then, on the other process, you also need to do shmget/shmat for the pointer.
This is why, in the comments, I said that I usually prefer to pack the structs: that way I do not need to go through the hassle of doing these operations for every pointer.
Convert the struct:
typedef struct {
int b;
int c;
int a[];
} ex;
and then, in the parent process (which creates the segment, hence IPC_CREAT):
int mid = shmget(key, sizeof(ex) + arraysize*sizeof(int), IPC_CREAT | 0666);
and it should work.
In general, it is difficult to work with dynamic arrays inside structs in c, but in this way you are able to allocate the proper memory (this will also work in malloc: How to include a dynamic array INSIDE a struct in C?)
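Here is a small end-to-end sketch of the packed-struct approach, assuming Linux. To keep it short it uses IPC_PRIVATE and fork() instead of ftok() and two unrelated programs, which is my simplification and not what the question does. The names ex_bytes and shared_sum_demo are made up.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    int b;
    int c;
    int a[];   /* flexible array member: the data lives inside the segment */
} ex;

/* Bytes needed for an ex holding n ints. */
size_t ex_bytes(size_t n) { return sizeof(ex) + n * sizeof(int); }

/* The child fills a[0..n-1] with 1..n; the parent returns the sum it reads
 * back from the shared segment, or -1 if any step fails. */
long shared_sum_demo(size_t n)
{
    int id, status, i;
    long sum = -1;
    pid_t pid;
    ex *e;

    id = shmget(IPC_PRIVATE, ex_bytes(n), IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    e = shmat(id, NULL, 0);
    /* Mark for removal now; on Linux the segment lives until the last detach. */
    shmctl(id, IPC_RMID, NULL);
    if (e == (void *)-1)
        return -1;

    pid = fork();
    if (pid < 0) {
        shmdt(e);
        return -1;
    }
    if (pid == 0) {                       /* child: write through its own mapping */
        size_t k;
        for (k = 0; k < n; k++)
            e->a[k] = (int)(k + 1);
        e->b = (int)n;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        sum = 0;
        for (i = 0; i < e->b; i++)        /* parent: read what the child wrote */
            sum += e->a[i];
    }
    shmdt(e);
    return sum;
}
```

On a system where System V shared memory is available, shared_sum_demo(8) should come back as 36 (1 + 2 + ... + 8), which shows that the array contents really are visible across processes once they live inside the segment.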

Simple C implementation to track memory malloc/free?

programming language: C
platform: ARM
Compiler: ADS 1.2
I need to keep track of simple malloc/free calls in my project. I just need to get a very basic idea of how much heap memory is required when the program has allocated all its resources. Therefore, I have provided a wrapper for the malloc/free calls. In these wrappers I need to increment a current memory count when malloc is called and decrement it when free is called. The malloc case is straightforward, as I have the size to allocate from the caller. I am wondering how to deal with the free case, as I need to store the pointer/size mapping somewhere. This being C, I do not have a standard map to implement this easily.
I am trying to avoid linking in any libraries so would prefer *.c/h implementation.
So I am wondering if there already is a simple implementation one may lead me to. If not, this is motivation to go ahead and implement one.
EDIT: Purely for debugging and this code is not shipped with the product.
EDIT: Initial implementation based on answer from Makis. I would appreciate feedback on this.
EDIT: Reworked implementation
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
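#include <limits.h>
#include <stdbool.h> /* bool, true; on a pre-C99 compiler use "typedef int bool;" and "#define true 1" instead */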
static size_t gnCurrentMemory = 0;
static size_t gnPeakMemory = 0;
void *MemAlloc (size_t nSize)
{
void *pMem = malloc(sizeof(size_t) + nSize);
if (pMem)
{
size_t *pSize = (size_t *)pMem;
memcpy(pSize, &nSize, sizeof(nSize));
gnCurrentMemory += nSize;
if (gnCurrentMemory > gnPeakMemory)
{
gnPeakMemory = gnCurrentMemory;
}
printf("PMemAlloc (%p) - Size (%lu), Current (%lu), Peak (%lu)\n",
(void *)(pSize + 1), (unsigned long)nSize, (unsigned long)gnCurrentMemory, (unsigned long)gnPeakMemory);
return(pSize + 1);
}
return NULL;
}
void MemFree (void *pMem)
{
if(pMem)
{
size_t *pSize = (size_t *)pMem;
// Get the size
--pSize;
assert(gnCurrentMemory >= *pSize);
printf("PMemFree (%p) - Size (%lu), Current (%lu), Peak (%lu)\n",
pMem, (unsigned long)*pSize, (unsigned long)gnCurrentMemory, (unsigned long)gnPeakMemory);
gnCurrentMemory -= *pSize;
free(pSize);
}
}
#define BUFFERSIZE (1024*1024)
typedef struct
{
bool flag;
int buffer[BUFFERSIZE];
bool bools[BUFFERSIZE];
} sample_buffer;
typedef struct
{
unsigned int whichbuffer;
char ch;
} buffer_info;
int main(void)
{
unsigned int i;
buffer_info *bufferinfo;
sample_buffer *mybuffer;
char *pCh;
printf("Testing MemAlloc - MemFree\n");
mybuffer = (sample_buffer *) MemAlloc(sizeof(sample_buffer));
if (mybuffer == NULL)
{
printf("ERROR ALLOCATING mybuffer\n");
return EXIT_FAILURE;
}
bufferinfo = (buffer_info *) MemAlloc(sizeof(buffer_info));
if (bufferinfo == NULL)
{
printf("ERROR ALLOCATING bufferinfo\n");
MemFree(mybuffer);
return EXIT_FAILURE;
}
pCh = (char *)MemAlloc(sizeof(char));
printf("finished malloc\n");
// fill allocated memory with integers and read back some values
for(i = 0; i < BUFFERSIZE; ++i)
{
mybuffer->buffer[i] = i;
mybuffer->bools[i] = true;
bufferinfo->whichbuffer = (unsigned int)(i/100);
}
MemFree(bufferinfo);
MemFree(mybuffer);
if(pCh)
{
MemFree(pCh);
}
return EXIT_SUCCESS;
}
You could allocate a few extra bytes in your wrapper and put either an id (if you want to be able to couple malloc() and free()) or just the size there. Just malloc() that much more memory, store the information at the beginning of your memory block and move the pointer you return that many bytes forward.
This can, btw, also easily be used for fence pointers/finger-prints and such.
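One detail worth sketching: if the header is a bare size_t, the pointer handed back to the caller may end up less aligned than what malloc() guarantees (for example 4-byte aligned where a double wants 8). The variant below pads the header with a union. It is only a sketch, kept to C90 constructs, and the tracked_* names are made up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header padded to the alignment of the widest basic types, so that the
 * pointer returned to the caller (header + 1) stays suitably aligned. */
typedef union {
    size_t size;
    double d;
    long l;
    void *p;
} mem_header;

static size_t g_current = 0;
static size_t g_peak = 0;

void *tracked_malloc(size_t n)
{
    mem_header *h;

    if (n > (size_t)-1 - sizeof *h)
        return NULL;                  /* n + header would overflow */
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;                      /* remember the size for tracked_free() */
    g_current += n;
    if (g_current > g_peak)
        g_peak = g_current;
    return h + 1;                     /* caller sees the memory after the header */
}

void tracked_free(void *p)
{
    mem_header *h;

    if (p == NULL)
        return;
    h = (mem_header *)p - 1;          /* step back to the header */
    g_current -= h->size;
    free(h);
}

size_t tracked_current(void) { return g_current; }
size_t tracked_peak(void) { return g_peak; }
```

The counters track requested bytes only; the header itself and the allocator's own bookkeeping are not included, so the real heap footprint is somewhat larger.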
Either you can have access to internal tables used by malloc/free (see this question: Where Do malloc() / free() Store Allocated Sizes and Addresses? for some hints), or you have to manage your own tables in your wrappers.
You could always use valgrind instead of rolling your own implementation. If you don't care about the amount of memory you allocate you could use an even simpler implementation: (I did this really quickly so there could be errors and I realize that it is not the most efficient implementation. The pAllocedStorage should be given an initial size and increase by some factor for a resize etc. but you get the idea.)
EDIT: I missed that this was for ARM, to my knowledge valgrind is not available on ARM so that might not be an option.
static size_t indexAllocedStorage = 0;
static size_t *pAllocedStorage = NULL;
static unsigned int free_calls = 0;
static unsigned long long int total_mem_alloced = 0;
void *
my_malloc(size_t size){
size_t *temp;
void *p = malloc(size);
if(p == NULL){
fprintf(stderr,"my_malloc malloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
total_mem_alloced += size;
temp = (size_t *)realloc(pAllocedStorage, (indexAllocedStorage+1) * sizeof(size_t));
if(temp == NULL){
fprintf(stderr,"my_malloc realloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
pAllocedStorage = temp;
pAllocedStorage[indexAllocedStorage++] = (size_t)p;
return p;
}
void
my_free(void *p){
size_t i;
int found = 0;
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] == (size_t)p){
pAllocedStorage[i] = (size_t)NULL;
found = 1;
break;
}
}
if(!found){
printf("Free Called on unknown\n");
}
free_calls++;
free(p);
}
void
free_check(void) {
size_t i;
printf("checking freed memory\n");
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] != (size_t)NULL){
printf( "Memory leak %X\n", (unsigned int)pAllocedStorage[i]);
free((void *)pAllocedStorage[i]);
}
}
free(pAllocedStorage);
pAllocedStorage = NULL;
}
I would use rmalloc. It is a simple library (actually it is only two files) to debug memory usage, but it also has support for statistics. Since you already have wrapper functions, it should be very easy to use rmalloc for it. Keep in mind that you also need to replace strdup, etc.
Your program may also need to intercept realloc(), calloc(), getcwd() (as it may allocate memory when buffer is NULL in some implementations) and maybe strdup() or a similar function, if it is supported by your compiler
If you are running on x86 you could just run your binary under valgrind and it would gather all this information for you, using the standard implementation of malloc and free. Simple.
I've been trying out some of the same techniques mentioned on this page and wound up here from a google search. I know this question is old, but wanted to add for the record...
1) Does your operating system not provide any tools to see how much heap memory is in use in a running process? I see you're talking about ARM, so this may well be the case. In most full-featured OSes, this is just a matter of using a cmd-line tool to see the heap size.
2) If available in your libc, sbrk(0) on most platforms will tell you the end address of your data segment. If you have it, all you need to do is store that address at the start of your program (say, startBrk=sbrk(0)), then at any time your allocated size is sbrk(0) - startBrk.
3) If shared objects can be used, you're dynamically linking to your libc, and your OS's runtime loader has something like an LD_PRELOAD environment variable, you might find it more useful to build your own shared object that defines the actual libc functions with the same symbols (malloc(), not MemAlloc()), then have the loader load your lib first and "interpose" the libc functions. You can further obtain the addresses of the actual libc functions with dlsym() and the RTLD_NEXT flag so you can do what you are doing above without having to recompile all your code to use your malloc/free wrappers. It is then just a runtime decision when you start your program (or any program that fits the description in the first sentence) where you set an environment variable like LD_PRELOAD=mymemdebug.so and then run it. (google for shared object interposition.. it's a great technique and one used by many debuggers/profilers)
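Point 2) above can be sketched in a few lines, assuming a libc that still provides sbrk() (glibc does; the feature-test macro is there so the declaration is visible under strict compiler modes). The names brk_mark and brk_growth are invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to declare sbrk(); other libcs may differ */
#endif
#include <stddef.h>
#include <unistd.h>

static char *g_start_brk = NULL;

/* Remember the current program break; call once near the start of main(). */
void brk_mark(void)
{
    g_start_brk = sbrk(0);
}

/* Bytes by which the data segment has grown since brk_mark(). */
ptrdiff_t brk_growth(void)
{
    return (char *)sbrk(0) - g_start_brk;
}
```

This only sees memory obtained by moving the program break. Allocations that the C library serves with mmap() (typically the large ones) do not show up, so the number is a lower bound on heap usage.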

Resources