C - shared memory - dynamic array inside shared struct - c

i'm trying to share a struct like this
example:
typedef struct {
int* a;
int b;
int c;
} ex;
between processes, the problem is that when I initialize 'a' with a malloc, it becomes private to the heap of the process that do this(or at least i think this is what happens). Is there any way to create a shared memory (with shmget, shmat) with this struct that works?
EDIT: I'm working on Linux.
EDIT: I have a process that initialize the buffer like this:
key_t key = ftok("gr", 'p');
int mid = shmget(key, sizeof(ex), IPC_CREAT | 0666);
ex* e = NULL;
status b_status = init(&e, 8); //init gives initial values to b c and allocate space for 'a' with a malloc
e = (ex*)shmat(mid, NULL, 0);
the other process attaches himself to the shared memory like this:
key_t key = ftok("gr", 'p');
int shmid = shmget(key, sizeof(ex), 0);
ex* e;
e = (ex*)shmat(shmid, NULL, 0);
and later get an element from a, in this case that in position 1
int i = get_el(e, 1);

First of all, to share the content pointed by your int *a field, you will need to copy the whole memory related to it. Thus, you will need a shared memory that can hold at least size_t shm_size = sizeof(struct ex) + get_the_length_of_your_ex();.
From now on, since you mentioned shmget and shmat, I will assume you run a Linux system.
The first step is the shared memory segment creation. It would be a good thing if you can determine an upper bound to the size of the int *a content. This way you would not have to create/delete the shared memory segment over and over again. But if you do so, an extra overhead to state how long is the actual data will be needed. I will assume that a simple size_t will do the trick for this purpose.
Then, after you created your segment, you must set the data correctly to make it hold what you want. Notice that while the physical address of the memory segment is always the same, when calling shmat you will get virtual pointers, which are only usable in the process that called shmat. The example code below should give you some tricks to do so.
#include <sys/types.h>
#include <sys/ipc.h>
/* Assume a cannot point towards an area larger than 4096 bytes. */
#define A_MAX_SIZE (size_t)4096
struct ex {
int *a;
int b;
int c;
}
int shm_create(void)
{
/*
* If you need to share other structures,
* You'll need to pass the key_t as an argument
*/
key_t k = ftok("/a/path/of/yours");
int shm_id = 0;
if (0 > (shm_id = shmget(
k, sizeof(struct ex) + A_MAX_SIZE + sizeof(size_t), IPC_CREAT|IPC_EXCL|0666))) {
/* An error occurred, add desired error handling. */
}
return shm_id;
}
/*
* Fill the desired shared memory segment with the structure
*/
int shm_fill(int shmid, struct ex *p_ex)
{
void *p = shmat(shmid, NULL, 0);
void *tmp = p;
size_t data_len = get_my_ex_struct_data_len(p_ex);
if ((void*)(-1) == p) {
/* Add desired error handling */
return -1;
}
memcpy(tmp, p_ex, sizeof(struct ex));
tmp += sizeof(struct ex);
memcpy(tmp, &data_len, sizeof(size_t);
tmp += 4;
memcpy(tmp, p_ex->a, data_len);
shmdt(p);
/*
* If you want to keep the reference so that
* When modifying p_ex anywhere, you update the shm content at the same time :
* - Don't call shmdt()
* - Make p_ex->a point towards the good area :
* p_ex->a = p + sizeof(struct ex) + sizeof(size_t);
* Never ever modify a without detaching the shm ...
*/
return 0;
}
/* Get the ex structure from a shm segment */
int shm_get_ex(int shmid, struct ex *p_dst)
{
void *p = shmat(shmid, NULL, SHM_RDONLY);
void *tmp;
size_t data_len = 0;
if ((void*)(-1) == p) {
/* Error ... */
return -1;
}
data_len = *(size_t*)(p + sizeof(struct ex))
if (NULL == (tmp = malloc(data_len))) {
/* No memory ... */
shmdt(p);
return -1;
}
memcpy(p_dst, p, sizeof(struct ex));
memcpy(tmp, (p + sizeof(struct ex) + sizeof(size_t)), data_len);
p_dst->a = tmp;
/*
* If you want to modify "globally" the structure,
* - Change SHM_RDONLY to 0 in the shmat() call
* - Make p_dst->a point to the good offset :
* p_dst->a = p + sizeof(struct ex) + sizeof(size_t);
* - Remove from the code above all the things made with tmp (malloc ...)
*/
return 0;
}
/*
* Detach the given p_ex structure from a shm segment.
* This function is useful only if you use the shm segment
* in the way I described in comment in the other functions.
*/
void shm_detach_struct(struct ex *p_ex)
{
/*
* Here you could :
* - alloc a local pointer
* - copy the shm data into it
* - detach the segment using the current p_ex->a pointer
* - assign your local pointer to p_ex->a
* This would save locally the data stored in the shm at the call
* Or if you're lazy (like me), just detach the pointer and make p_ex->a = NULL;
*/
shmdt(p_ex->a - sizeof(struct ex) - sizeof(size_t));
p_ex->a = NULL;
}
Excuse my laziness, it would be space-optimized to not copy at all the value of the int *a pointer of the struct ex since it is completely unused in the shared memory, but I spared myself extra-code to handle this (and some pointer checkings like the p_ex arguments integrity).
But when you are done, you must find a way to share the shm ID between your processes. This could be done using sockets, pipes ... Or using ftok with the same input.

The memory you allocate to a pointer using malloc() is private to that process. So, when you try to access the pointer in another process(other than the process which malloced it) you are likely going to access an invalid memory page or a memory page mapped in another process address space. So, you are likely to get a segfault.
If you are using the shared memory, you must make sure all the data you want to expose to other processes is "in" the shared memory segment and not private memory segments of the process.
You could try, leaving the data at a specified offset in the memory segment, which can be concretely defined at compile time or placed in a field at some known location in the shared memory segment.
Eg:
If you are doing this
char *mem = shmat(shmid2, (void*)0, 0);
// So, the mystruct type is at offset 0.
mystruct *structptr = (mystruct*)mem;
// Now we have a structptr, use an offset to get some other_type.
other_type *other = (other_type*)(mem + structptr->offset_of_other_type);
Other way would be to have a fixed size buffer to pass the information using the shared memory approach, instead of using the dynamically allocated pointer.
Hope this helps.

Are you working in Windows or Linux?
In any case what you need is a memory mapped file. Documentation with code examples here,
http://msdn.microsoft.com/en-us/library/aa366551%28VS.85%29.aspx
http://menehune.opt.wfu.edu/Kokua/More_SGI/007-2478-008/sgi_html/ch03.html

You need to use shared memory/memory mapped files/whatever your OS gives you.
In general, IPC and sharing memory between processes is quite OS dependent, especially in low-level languages like C (higher-level languages usually have libraries for that - for example, even C++ has support for it using boost).
If you are on Linux, I usually use shmat for small amount, and mmap (http://en.wikipedia.org/wiki/Mmap) for larger amounts.
On Win32, there are many approaches; the one I prefer is usually using page-file backed memory mapped files (http://msdn.microsoft.com/en-us/library/ms810613.aspx)
Also, you need to pay attention to where you are using these mechanism inside your data structures: as mentioned in the comments, without using precautions the pointer you have in your "source" process is invalid in the "target" process, and needs to be replaced/adjusted (IIRC, pointers coming from mmap are already OK(mapped); at least, under windows pointers you get out of MapViewOfFile are OK).
EDIT: from your edited example:
What you do here:
e = (ex*)shmat(mid, NULL, 0);
(other process)
int shmid = shmget(key, sizeof(ex), 0);
ex* e = (ex*)shmat(shmid, NULL, 0);
is correcty, but you need to do it for each pointer you have, not only for the "main" pointer to the struct. E.g. you need to do:
e->a = (int*)shmat(shmget(another_key, dim_of_a, IPC_CREAT | 0666), NULL, 0);
instead of creating the array with malloc.
Then, on the other process, you also need to do shmget/shmat for the pointer.
This is why, in the comments, I said that I usually prefer to pack the structs: so I do not need to go through the hassle to to these operations for every pointer.

Convert the struct:
typedef struct {
int b;
int c;
int a[];
} ex;
and then on parent process:
int mid = shmget(key, sizeof(ex) + arraysize*sizeof(int), 0666);
it should work.
In general, it is difficult to work with dynamic arrays inside structs in c, but in this way you are able to allocate the proper memory (this will also work in malloc: How to include a dynamic array INSIDE a struct in C?)

Related

What kinds of data in a device driver can be shared among processes?

In device drivers, how can we tell what data is shared among processes and what is local to a process? The Linux Device Drivers book mentions
Any time that a hardware or software resource is shared beyond a single thread of execution, and the possibility exists that one thread could encounter an inconsistent view of that resource, you must explicitly manage access to that resource.
But what kinds of software resources can be shared among threads and what kinds of data cannot be shared? I know that global variables are generally considered as shared memory but what other kinds of things need to be protected?
For example, is the struct inode and struct file types passed in file operations like open, release, read, write, etc. considered to be shared?
In the open call inside main.c , why is dev (in the line dev = container_of(inode->i_cdev, struct scull_dev, cdev);) not protected with a lock if it points to a struct scull_dev entry in the global array scull_devices?
In scull_write, why isn't the line int quantum = dev->quantum, qset = dev->qset; locked with a semaphore since it's accessing a global variable?
/* In scull.h */
struct scull_qset {
void **data; /* pointer to an array of pointers which each point to a quantum buffer */
struct scull_qset *next;
};
struct scull_dev {
struct scull_qset *data; /* Pointer to first quantum set */
int quantum; /* the current quantum size */
int qset; /* the current array size */
unsigned long size; /* amount of data stored here */
unsigned int access_key; /* used by sculluid and scullpriv */
struct semaphore sem; /* mutual exclusion semaphore */
struct cdev cdev; /* Char device structure */
};
/* In main.c */
struct scull_dev *scull_devices; /* allocated in scull_init_module */
int scull_major = SCULL_MAJOR;
int scull_minor = 0;
int scull_nr_devs = SCULL_NR_DEVS;
int scull_quantum = SCULL_QUANTUM;
int scull_qset = SCULL_QSET;
ssize_t scull_write(struct file *filp, const char __user *buf, size_t count,
loff_t *f_pos)
{
struct scull_dev *dev = filp->private_data; /* flip->private_data assigned in scull_open */
struct scull_qset *dptr;
int quantum = dev->quantum, qset = dev->qset;
int itemsize = quantum * qset;
int item; /* item in linked list */
int s_pos; /* position in qset data array */
int q_pos; /* position in quantum */
int rest;
ssize_t retval = -ENOMEM; /* value used in "goto out" statements */
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
/* find listitem, qset index and offset in the quantum */
item = (long)*f_pos / itemsize;
rest = (long)*f_pos % itemsize;
s_pos = rest / quantum;
q_pos = rest % quantum;
/* follow the list up to the right position */
dptr = scull_follow(dev, item);
if (dptr == NULL)
goto out;
if (!dptr->data) {
dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
if (!dptr->data)
goto out;
memset(dptr->data, 0, qset * sizeof(char *));
}
if (!dptr->data[s_pos]) {
dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
if (!dptr->data[s_pos])
goto out;
}
/* write only up to the end of this quantum */
if (count > quantum - q_pos)
count = quantum - q_pos;
if (copy_from_user(dptr->data[s_pos]+q_pos, buf, count)) {
retval = -EFAULT;
goto out;
}
*f_pos += count;
retval = count;
/* update the size */
if (dev->size < *f_pos)
dev->size = *f_pos;
out:
up(&dev->sem);
return retval;
}
int scull_open(struct inode *inode, struct file *filp)
{
struct scull_dev *dev; /* device information */
/* Question: Why was the lock not placed here? */
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
filp->private_data = dev; /* for other methods */
/* now trim to 0 the length of the device if open was write-only */
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
scull_trim(dev); /* ignore errors */
up(&dev->sem);
}
return 0; /* success */
}
int scull_init_module(void)
{
int result, i;
dev_t dev = 0;
/* assigns major and minor numbers (left out for brevity) */
/*
* allocate the devices -- we can't have them static, as the number
* can be specified at load time
*/
scull_devices = kmalloc(scull_nr_devs * sizeof(struct scull_dev), GFP_KERNEL);
if (!scull_devices) {
result = -ENOMEM;
goto fail; /* isn't this redundant? */
}
memset(scull_devices, 0, scull_nr_devs * sizeof(struct scull_dev));
/* Initialize each device. */
for (i = 0; i < scull_nr_devs; i++) {
scull_devices[i].quantum = scull_quantum;
scull_devices[i].qset = scull_qset;
init_MUTEX(&scull_devices[i].sem);
scull_setup_cdev(&scull_devices[i], i);
}
/* some other stuff (left out for brevity) */
return 0; /* succeed */
fail:
scull_cleanup_module(); /* left out for brevity */
return result;
}
/*
* Set up the char_dev structure for this device.
*/
static void scull_setup_cdev(struct scull_dev *dev, int index)
{
int err, devno = MKDEV(scull_major, scull_minor + index);
cdev_init(&dev->cdev, &scull_fops);
dev->cdev.owner = THIS_MODULE;
dev->cdev.ops = &scull_fops; /* isn't this redundant? */
err = cdev_add (&dev->cdev, devno, 1);
/* Fail gracefully if need be */
if (err)
printk(KERN_NOTICE "Error %d adding scull%d", err, index);
}
All data in memory can be considered a "shared resource" if both threads are able to access it*. The only resource they wouldn't be shared between processors is the data in the registers, which is abstracted away in C.
There are two reasons that you would not practically consider two resources to be shared (even though they do not actually mean that two threads could not theoretically access them, some nightmarish code could sometimes bypass these).
Only one thread can/does access it. Clearly if only one thread accesses a variable then there can be no race conditions. This is the reason local variables and single threaded programs do not need locking mechanisms.
The value is constant. You can't get different results based on order of access if the value can never change.
The program you have shown here is incomplete, so it is hard to say, but each of the variables accessed without locking must meet one of the criteria for this program to be thread safe.
There are some non-obvious ways to meet the criteria, such as if a variable is constant or limited to one thread only in a specific context.
You gave two examples of lines that were not locked. For the first line.
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
This line does not actually access any variables, it just computes where the struct containing cdev would be. There can be no race conditions because nobody else has access to your pointers (though they have access to what they point to), they are only accessible within the function (this is not true of what they point to). This meets criteria (1).
The other example is
int quantum = dev->quantum, qset = dev->qset;
This one is a bit harder to say without context, but my best guess is that it is assumed that dev->quantum and dev->qset will never change during the function call. This seems supported by the fact that they are only called in scull_init_module which should only be called once at the very beginning. I believe this fits criteria (2).
Which brings up another way that you might change a shared variable without locking, if you know that other threads will not try to access it until you are done for some other reason (eg they are not extant yet)
In short, all memory is shared, but sometimes you can get away with acting like its not.
*There are embedded systems where each processor has some amount of RAM that only it could use, but this is not the typical case.

Memory allocation threshold (mmap vs malloc)

Id like to point out I'm new to this so I'm trying to understand / explain it best i can.
I am basically trying to figure out if its possible to keep memory allocation under a threshold due to memory limitation of my project.
Here is how memory is allocated currently using third party libsodium:
alloc_region(escrypt_region_t *region, size_t size)
{
uint8_t *base, *aligned;
#if defined(MAP_ANON) && defined(HAVE_MMAP)
if ((base = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE,
#ifdef MAP_NOCORE
MAP_ANON | MAP_PRIVATE | MAP_NOCORE,
#else
MAP_ANON | MAP_PRIVATE,
#endif
-1, 0)) == MAP_FAILED)
base = NULL; /* LCOV_EXCL_LINE */
aligned = base;
#elif defined(HAVE_POSIX_MEMALIGN)
if ((errno = posix_memalign((void **) &base, 64, size)) != 0) {
base = NULL;
}
aligned = base;
#else
base = aligned = NULL;
if (size + 63 < size)
errno = ENOMEM;
else if ((base = (uint8_t *) malloc(size + 63)) != NULL) {
aligned = base + 63;
aligned -= (uintptr_t) aligned & 63;
}
#endif
region->base = base;
region->aligned = aligned;
region->size = base ? size : 0;
return aligned;
}
So for example, this currently calls posix_memalign to allocate (e.g.) 32mb of memory.
32mb exceeds my 'memory cap' given to me (but does not throw memory warnings as the memory capacity is far greater, its just what I'm 'allowed' to use)
From some googling, I'm under the impression i can either use mmap and virtual memory.
I can see that the function above already has some mmap implemented but is never called.
Is it possible to convert the above code so that i never exceed my 30mb memory limit?
From my understanding, if this allocation would exceed my free memory, it would automatically allocate in virtual memory? So can i force this to happen and pretend that my free space is lower than available?
Any help is appreciated
UPDATE
/* Allocate memory. */
B_size = (size_t) 128 * r * p;
V_size = (size_t) 128 * r * N;
need = B_size + V_size;
if (need < V_size) {
errno = ENOMEM;
return -1;
}
XY_size = (size_t) 256 * r + 64;
need += XY_size;
if (need < XY_size) {
errno = ENOMEM;
return -1;
}
if (local->size < need) {
if (free_region(local)) {
return -1;
}
if (!alloc_region(local, need)) {
return -1;
}
}
B = (uint8_t *) local->aligned;
V = (uint32_t *) ((uint8_t *) B + B_size);
XY = (uint32_t *) ((uint8_t *) V + V_size);
I am basically trying to figure out if its possible to keep memory allocation under a threshold due to memory limitation of my project.
On Linux or POSIX systems, you might consider using setrlimit(2) with RLIMIT_AS:
This is the maximum size of the process's virtual memory
(address space) in bytes. This limit affects calls to brk(2),
mmap(2), and mremap(2), which fail with the error ENOMEM upon
exceeding this limit.
Above this limit, the mmap would fail, and so would fail for instance the call to malloc(3) that triggered that particular use of mmap.
I'm under the impression i can either use mmap
Notice that malloc(3) will call mmap(2) (or sometimes sbrk(2)...) to retrieve (virtual) memory from the kernel, thus growing your virtual address space. However, malloc often prefer to reused previously free-d memory (when available). And free usually won't call munmap(2) to release memory chunks but prefer to keep it for future malloc-s. Actually most C standard libraries segregate between "small" and "large" allocation (in practice a malloc for a gigabyte will use mmap, and the corresponding free would mmap immediately).
See also mallopt(3) and madvise(2). In case you need to lock some pages (obtained by mmap) into physical RAM, consider mlock(2).
Look also into this answer (explaining that the notion of RAM used by a particular process is not that easy).
For malloc related bugs (including memory leaks) use valgrind.

two function addresses subtraction

I'm reading a piece of code about exploit in here. There is a statement going like this:
/*
FreeBSD <= 6.1 suffers from classical check/use race condition on SMP
systems in kevent() syscall, leading to kernel mode NULL pointer
dereference. It can be triggered by spawning two threads:
1st thread looping on open() and close() syscalls, and the 2nd thread
looping on kevent(), trying to add possibly invalid filedescriptor.
*/
static void kernel_code(void) {
struct thread *thread;
gotroot = 1;
asm(
"movl %%fs:0, %0"
: "=r"(thread)
);
thread->td_proc->p_ucred->cr_uid = 0;
#ifdef PRISON_BREAK
thread->td_proc->p_ucred->cr_prison = NULL;
#endif
return;
}
static void code_end(void) {
return;
}
int main() {
....
memcpy(0, &kernel_code, &code_end - &kernel_code);
....
}
I'm curious what's the meaning of this memcpy? What is the result of &code_end - &kernel_code?
This assumes that the function kernel_code() will end where somewhere before function code_end() starts. The memcpy() therefore copies kernel_code() to address 0. One assumes that some other aspect of the exploit results in a return or jump to address 0, thereby running kernel_code().
void * memcpy ( void * destination, const void * source, size_t num );
That memcpy will copy the function kernel_code to address 0 (NULL).
What the code is trying to exploit, is gaining root privilege, UID of 0, by two threads competing for the queue, do_thread/do_thread2.
By mmaping the contents of the address of the code_end function, with the address of kernel_code, copy the result into the buffer, to the address 0, on the condition that the code are adjacent to each other, thereby, as in effective user id of 0 aka root.
This C++ Ref page summarizes what memcpy is about.
void * memcpy ( void * destination, const void * source, size_t num );
Copies the values of num bytes from the location pointed to by source
directly to the memory block pointed to by destination.

Use mbind to move pages in NUMA system

I am trying to get my head around the numa(3) library on Linux.
I have a large array allocated (many GB of memory). Threads running on random NUMA nodes write to it, as such pages are faulted in on random NUMA memory nodes (default NUMA policy).
At the end of threaded calculation I have a single-threaded job to sum up the results. For that I first compress the array removing a lot of elements and then want to move the remainder to the NUMA node of the master thread.
The move_pages syscall is not the right one for the job because it requires an array entry for each page - too much overhead.
The documentation is unclear wether it is possible to force numa_tonode_memory to move faulted memory.
So the only way that I see is to use mbind with MPOL_MF_MOVE, but I can't get my head around creating a proper nodemask argument (or something else is failing). Here is as far as I got:
#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <numa.h>
#include <numaif.h>
int master_node;
nodemask_t master_nodemask;
// initializer
// has to be called from master/main thread
void numa_lock_master_thread() {
int curcpu = sched_getcpu();
if (curcpu >= 0) {
// master_node = numa_node_of_cpu(curcpu);
numa_run_on_node(master_node = numa_node_of_cpu(curcpu));
if (master_node >= 0) {
struct bitmask * bindmask = numa_bitmask_alloc(numa_num_possible_nodes());
numa_bitmask_clearall(bindmask);
numa_bitmask_setbit(bindmask, master_node);
copy_bitmask_to_nodemask(bindmask, &master_nodemask);
numa_bitmask_free(bindmask);
}
} else { // has never failed before
perror("sched_getcpu");
}
}
static inline void numa_migrate_pages_to_master_node(void * addr, unsigned long len) {
if (master_node < 0)
return;
if ( mbind( addr
, len
, MPOL_BIND
, master_nodemask.n
, numa_max_node()
, MPOL_MF_MOVE)) {
perror("mbind");
}
}
from /usr/include/numa.h:
typedef struct {
unsigned long n[NUMA_NUM_NODES/(sizeof(unsigned long)*8)];
} nodemask_t;
Sometimes I get mbind: Bad address, sometimes the call succeeds, but subsequent memory accesses give a SIGSEGV.
addr is always a valid pointer returned by
mmap(NULL, (num_pages) * sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | (flags), -1, 0);
and len is page-aligned.
How can I make this work with as few syscalls as possible and without the large overhead that comes with move_pages?
And what is the proper way to set up the nodemask argument for mbind?

Simple C implementation to track memory malloc/free?

programming language: C
platform: ARM
Compiler: ADS 1.2
I need to keep track of simple melloc/free calls in my project. I just need to get very basic idea of how much heap memory is required when the program has allocated all its resources. Therefore, I have provided a wrapper for the malloc/free calls. In these wrappers I need to increment a current memory count when malloc is called and decrement it when free is called. The malloc case is straight forward as I have the size to allocate from the caller. I am wondering how to deal with the free case as I need to store the pointer/size mapping somewhere. This being C, I do not have a standard map to implement this easily.
I am trying to avoid linking in any libraries so would prefer *.c/h implementation.
So I am wondering if there already is a simple implementation one may lead me to. If not, this is motivation to go ahead and implement one.
EDIT: Purely for debugging and this code is not shipped with the product.
EDIT: Initial implementation based on answer from Makis. I would appreciate feedback on this.
EDIT: Reworked implementation
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <limits.h>
static size_t gnCurrentMemory = 0;
static size_t gnPeakMemory = 0;
void *MemAlloc (size_t nSize)
{
void *pMem = malloc(sizeof(size_t) + nSize);
if (pMem)
{
size_t *pSize = (size_t *)pMem;
memcpy(pSize, &nSize, sizeof(nSize));
gnCurrentMemory += nSize;
if (gnCurrentMemory > gnPeakMemory)
{
gnPeakMemory = gnCurrentMemory;
}
printf("PMemAlloc (%#X) - Size (%d), Current (%d), Peak (%d)\n",
pSize + 1, nSize, gnCurrentMemory, gnPeakMemory);
return(pSize + 1);
}
return NULL;
}
void MemFree (void *pMem)
{
if(pMem)
{
size_t *pSize = (size_t *)pMem;
// Get the size
--pSize;
assert(gnCurrentMemory >= *pSize);
printf("PMemFree (%#X) - Size (%d), Current (%d), Peak (%d)\n",
pMem, *pSize, gnCurrentMemory, gnPeakMemory);
gnCurrentMemory -= *pSize;
free(pSize);
}
}
#define BUFFERSIZE (1024*1024)
typedef struct
{
bool flag;
int buffer[BUFFERSIZE];
bool bools[BUFFERSIZE];
} sample_buffer;
typedef struct
{
unsigned int whichbuffer;
char ch;
} buffer_info;
int main(void)
{
unsigned int i;
buffer_info *bufferinfo;
sample_buffer *mybuffer;
char *pCh;
printf("Tesint MemAlloc - MemFree\n");
mybuffer = (sample_buffer *) MemAlloc(sizeof(sample_buffer));
if (mybuffer == NULL)
{
printf("ERROR ALLOCATING mybuffer\n");
return EXIT_FAILURE;
}
bufferinfo = (buffer_info *) MemAlloc(sizeof(buffer_info));
if (bufferinfo == NULL)
{
printf("ERROR ALLOCATING bufferinfo\n");
MemFree(mybuffer);
return EXIT_FAILURE;
}
pCh = (char *)MemAlloc(sizeof(char));
printf("finished malloc\n");
// fill allocated memory with integers and read back some values
for(i = 0; i < BUFFERSIZE; ++i)
{
mybuffer->buffer[i] = i;
mybuffer->bools[i] = true;
bufferinfo->whichbuffer = (unsigned int)(i/100);
}
MemFree(bufferinfo);
MemFree(mybuffer);
if(pCh)
{
MemFree(pCh);
}
return EXIT_SUCCESS;
}
You could allocate a few extra bytes in your wrapper and put either an id (if you want to be able to couple malloc() and free()) or just the size there. Just malloc() that much more memory, store the information at the beginning of your memory block and and move the pointer you return that many bytes forward.
This can, btw, also easily be used for fence pointers/finger-prints and such.
Either you can have access to internal tables used by malloc/free (see this question: Where Do malloc() / free() Store Allocated Sizes and Addresses? for some hints), or you have to manage your own tables in your wrappers.
You could always use valgrind instead of rolling your own implementation. If you don't care about the amount of memory you allocate you could use an even simpler implementation: (I did this really quickly so there could be errors and I realize that it is not the most efficient implementation. The pAllocedStorage should be given an initial size and increase by some factor for a resize etc. but you get the idea.)
EDIT: I missed that this was for ARM, to my knowledge valgrind is not available on ARM so that might not be an option.
static size_t indexAllocedStorage = 0;
static size_t *pAllocedStorage = NULL;
static unsigned int free_calls = 0;
static unsigned long long int total_mem_alloced = 0;
void *
my_malloc(size_t size){
size_t *temp;
void *p = malloc(size);
if(p == NULL){
fprintf(stderr,"my_malloc malloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
total_mem_alloced += size;
temp = (size_t *)realloc(pAllocedStorage, (indexAllocedStorage+1) * sizeof(size_t));
if(temp == NULL){
fprintf(stderr,"my_malloc realloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
pAllocedStorage = temp;
pAllocedStorage[indexAllocedStorage++] = (size_t)p;
return p;
}
void
my_free(void *p){
size_t i;
int found = 0;
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] == (size_t)p){
pAllocedStorage[i] = (size_t)NULL;
found = 1;
break;
}
}
if(!found){
printf("Free Called on unknown\n");
}
free_calls++;
free(p);
}
void
free_check(void) {
size_t i;
printf("checking freed memeory\n");
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] != (size_t)NULL){
printf( "Memory leak %X\n", (unsigned int)pAllocedStorage[i]);
free((void *)pAllocedStorage[i]);
}
}
free(pAllocedStorage);
pAllocedStorage = NULL;
}
I would use rmalloc. It is a simple library (actually it is only two files) to debug memory usage, but it also has support for statistics. Since you already wrapper functions it should be very easy to use rmalloc for it. Keep in mind that you also need to replace strdup, etc.
Your program may also need to intercept realloc(), calloc(), getcwd() (as it may allocate memory when buffer is NULL in some implementations) and maybe strdup() or a similar function, if it is supported by your compiler
If you are running on x86 you could just run your binary under valgrind and it would gather all this information for you, using the standard implementation of malloc and free. Simple.
I've been trying out some of the same techniques mentioned on this page and wound up here from a google search. I know this question is old, but wanted to add for the record...
1) Does your operating system not provide any tools to see how much heap memory is in use in a running process? I see you're talking about ARM, so this may well be the case. In most full-featured OSes, this is just a matter of using a cmd-line tool to see the heap size.
2) If available in your libc, sbrk(0) on most platforms will tell you the end address of your data segment. If you have it, all you need to do is store that address at the start of your program (say, startBrk=sbrk(0)), then at any time your allocated size is sbrk(0) - startBrk.
3) If shared objects can be used, you're dynamically linking to your libc, and your OS's runtime loader has something like an LD_PRELOAD environment variable, you might find it more useful to build your own shared object that defines the actual libc functions with the same symbols (malloc(), not MemAlloc()), then have the loader load your lib first and "interpose" the libc functions. You can further obtain the addresses of the actual libc functions with dlsym() and the RTLD_NEXT flag so you can do what you are doing above without having to recompile all your code to use your malloc/free wrappers. It is then just a runtime decision when you start your program (or any program that fits the description in the first sentence) where you set an environment variable like LD_PRELOAD=mymemdebug.so and then run it. (google for shared object interposition.. it's a great technique and one used by many debuggers/profilers)

Resources