Reduce malloc calls by slicing one big malloc'd memory - c

First, here is where I got the idea from:
There was once an app I wrote that used lots of little blobs of
memory, each allocated with malloc(). It worked correctly but was
slow. I replaced the many calls to malloc with just one, and then
sliced up that large block within my app. It was much much faster.
I was profiling my application, and I got a unexpectedly nice performance boost when I reduced the number of malloc calls. I am still allocating the same amount of memory, though.
So, I would like to do what this guy did, but I am unsure what's the best way to do it.
My Idea:
// static global variables
static void * memoryForStruct1 = malloc(sizeof(Struct1) * 10000);
int struct1Index = 0;
...
// somewhere, I need memory, fast:
Struct1* data = memoryForStruct1[struct1Index++];
...
// done with data:
--struct1Index;
Gotchas:
I have to make sure I don't exceed 10000
I have to release the memory in the same order I occupied. (Not a major issue in my case, since I am using recursion, but I would like to avoid it if possible).
Inspired from Mihai Maruseac:
First, I create a linked list of int that basically tells me which memory indexes are free. I then added a property to my struct called int memoryIndex which helps me return the memory occupied in any order. And Luckily, I am sure my memory needs would never exceed 5 MB at any given time, so I can safely allocate that much memory. Solved.

The system call which gives you memory is brk. The usual malloc and calloc, realloc functions simply use the space given by brk. When that space is not enough, another brk is made to create new space. Usually, the space is increased in sizes of a virtual memory page.
Thus, if you really want to have a premade pool of objects, then make sure to allocate memory in multiples of pagesize. For example, you can create one pool of 4KB. 8KB, ... space.
Next idea, look at your objects. Some of them have one size, some have other size. It will be a big pain to handle allocations for all of them from the same pool. Create pools for objects of various sizes (powers of 2 is best) and allocate from them. For example, if you'll have an object of size 34B you'd allocate space for it from the 64B pool.
Lastly, the remaining space can be either left unused or it can be moved down to the other pools. In the above example, you have 30B left. You'd split it in 16B, 8B, 4B and 2B chunks and add each chunk to their respective pool.
Thus, you'd use linked lists to manage the preallocated space. Which means that your application will use more memory than it actually needs but if this really helps you, why not?
Basically, what I've described is a mix between buddy allocator and slab allocator from the Linux kernel.
Edit: After reading your comments, it will be pretty easy to allocate a big area with malloc(BIG_SPACE) and use this as a pool for your memory.

If you can, look at using glib which has memory slicing API that supports this. It's very easy to use, and saves you from having to re-implement it.

Related

How to get heap memory usage, FreeRTOS [duplicate]

I'm creating a list of elements inside a task in the following way:
l = (dllist*)pvPortMalloc(sizeof(dllist));
dllist is 32 byte big.
My embedded system has 60kB SRAM so I expected my 200 element list can be handled easily by the system. I found out that after allocating space for 8 elements the system is crashing on the 9th malloc function call (256byte+).
If possible, where can I change the heap size inside freeRTOS?
Can I somehow request the current status of heap size?
I couldn't find this information in the documentation so I hope somebody can provide some insight in this matter.
Thanks in advance!
(Yes - FreeRTOS pvPortMalloc() returns void*.)
If you have 60K of SRAM, and configTOTAL_HEAP_SIZE is large, then it is unlikely you are going to run out of heap after allocating 256 bytes unless you had hardly any heap remaining before hand. Many FreeRTOS demos will just keep creating objects until all the heap is used, so if your application is based on one of those, then you would be low on heap before your code executed. You may have also done something like use up loads of heap space by creating tasks with huge stacks.
heap_4 and heap_5 will combine adjacent blocks, which will minimise fragmentation as far as practical, but I don't think that will be your problem - especially as you don't mention freeing anything anywhere.
Unless you are using heap_3.c (which just makes the standard C library malloc and free thread safe) you can call xPortGetFreeHeapSize() to see how much free heap you have. You may also have xPortGetMinimumEverFreeHeapSize() available to query how close you have ever come to running out of heap. More information: http://www.freertos.org/a00111.html
You could also define a malloc() failed hook (http://www.freertos.org/a00016.html) to get instant notification of pvPortMalloc() returning NULL.
For the standard allocators you will find a config option in FreeRTOSConfig.h .
However:
It is very well possible you run out of memory already, depending on the allocator used. IIRC there is one that does not free() any blocks (free() is just a dummy). So any block returned will be lost. This is still useful if you only allocate memory e.g. at startup, but then work with what you've got.
Other allocators might just not merge adjacent blocks once returned, increasing fragmentation much faster than a full-grown allocator.
Also, you might loose memory to fragmentation. Depending on your alloc/free pattern, you quickly might end up with a heap looking like swiss cheese: Many holes between allocated blocks. So while there is still enough free memory, no single block is big enough for the size required.
If you only allocate blocks that size there, you might be better of using your own allocator or a pool (blocks of fixed size). Thaqt would be statically allocated (e.g. array) and chained as a linked list during startup. Alloc/free would then just be push/pop on a stack (or put/get on a queue). That would also be very fast and have complexity O(1) (interrupt-safe if properly written).
Note that normal malloc()/free() are not interrupt-safe.
Finally: Do not cast void *. (Well, that's actually what standard malloc() returns and I expect that FreeRTOS-variant does the same).

malloc and other associated functions

I have an array named 'ArrayA' and it is full of ints but I want to add another 5 cell to the end of the array every time a condition is met. How would I do this? ( The internet is not being very helpful )
If this is a static array, you will have to create a new one with more space and copy the data yourself. If it was allocated with malloc(), as the title to your question suggests, then you can use realloc() to do this more-or-less automatically. Note that the address of your array will, in general, have changed.
It is precisely because of the need for "dynamic" arrays that grow (and shrink) as needed, that languages like C++ introduced vectors. They do the management under the covers.
You need the realloc function.
Also note that adding 5 cells is not the best performance solution.
It is best to double the size of your arrays every time an array increase is needed.
Use two variables, one for the size (the number of integers used) and one for capacity (the actual memory size of arrays)
In a modern OS it is generally safe to assume that if you allocate a lot of memory that you don't use then it will not actually consume physical RAM, but only exist as virtual mappings. The OS will provide physical RAM as soon as a page (today generally in chunks of 4Kb) is used for the first time.
You can specifically enforce this behavior by using mmap to create a large anonymous mapping (MAP_PRIVATE | MAP_ANONYMOUS) e.g. as much as you intend to hold at maximum. On modern x64 systems virtual mappings can be up to 64Tb large. It is logically memory available to your program, but in practice pages will be added to it as you start using them.
realloc as described by the other posters is the naiive way to resize a malloc mapping, but make sure that realloc was successful. It can fail!
Problems with memory arise when you use memory once, don't deallocate it and stop using it. In contrast allocated, but untouched memory generally does not actually use resources other then VM table entries.

How to find how much memory is actually used up by a malloc call?

If I call:
char *myChar = (char *)malloc(sizeof(char));
I am likely to be using more than 1 byte of memory, because malloc is likely to be using some memory on its own to keep track of free blocks in the heap, and it may effectively cost me some memory by always aligning allocations along certain boundaries.
My question is: Is there a way to find out how much memory is really used up by a particular malloc call, including the effective cost of alignment, and the overhead used by malloc/free?
Just to be clear, I am not asking to find out how much memory a pointer points to after a call to malloc. Rather, I am debugging a program that uses a great deal of memory, and I want to be aware of which parts of the code are allocating how much memory. I'd like to be able to have internal memory accounting that very closely matches the numbers reported by top. Ideally, I'd like to be able to do this programmatically on a per-malloc-call basis, as opposed to getting a summary at a checkpoint.
There isn't a portable solution to this, however there may be operating-system specific solutions for the environments you're interested in.
For example, with glibc on Linux, you can use the mallinfo() function from <malloc.h> which returns a struct mallinfo. The uordblks and hblkhd members of this structure contains the dynamically allocated address space used by the program including book-keeping overhead - if you take the difference of this before and after each malloc() call, you will know the amount of space used by that call. (The overhead is not necessarily constant for every call to malloc()).
Using your example:
char *myChar;
size_t s = sizeof(char);
struct mallinfo before, after;
int mused;
before = mallinfo();
myChar = malloc(s);
after = mallinfo();
mused = (after.uordblks - before.uordblks) + (after.hblkhd - before.hblkhd);
printf("Requested size %zu, used space %d, overhead %zu\n", s, mused, mused - s);
Really though, the overhead is likely to be pretty minor unless you are making a very very high number of very small allocations, which is a bad idea anyway.
It really depends on the implementation. You should really use some memory debugger. On Linux Valgrind's Massif tool can be useful. There are memory debugging libraries like dmalloc, ...
That said, typical overhead:
1 int for storing size + flags of this block.
possibly 1 int for storing size of previous/next block, to assist in coallescing blocks.
2 pointers, but these may only be used in free()'d blocks, being reused for application storage in allocated blocks.
Alignment to an approppriate type, e.g: double.
-1 int (yes, that's a minus) of the next/previous chunk's field containing our size if we are an allocated block, since we cannot be coallesced until we're freed.
So, a minimum size can be 16 to 24 bytes. and minimum overhead can be 4 bytes.
But you could also satisfy every allocation via mapping memory pages (typically 4Kb), which would mean overhead for smaller allocations would be huge. I think OpenBSD does this.
There is nothing defined in the C library to query the total amount of physical memory used by a malloc() call. The amount of memory allocated is controlled by whatever memory manager is hooked up behind the scenes that malloc() calls into. That memory manager can allocate as much extra memory as it deemes necessary for its internal tracking purposes, on top of whatever extra memory the OS itself requires. When you call free(), it accesses the memory manager, which knows how to access that extra memory so it all gets released properly, but there is no way for you to know how much memory that involves. If you need that much fine detail, then you need to write your own memory manager.
If you do use valgrind/Massif, there's an option to show either the malloc value or the top value, which differ a LOT in my experience. Here's an excerpt from the Valgrind manual http://valgrind.org/docs/manual/ms-manual.html :
...However, if you wish to measure all the memory used by your program,
you can use the --pages-as-heap=yes. When this option is enabled,
Massif's normal heap block profiling is replaced by lower-level page
profiling. Every page allocated via mmap and similar system calls is
treated as a distinct block. This means that code, data and BSS
segments are all measured, as they are just memory pages. Even the
stack is measured...

C Memory Management in Embedded Systems

I have to use c/asm to create a memory management system since malloc/free don't yet exist. I need to have malloc/free!
I was thinking of using the memory stack as the space for the memory, but this would fail because when the stack pointer shrinks, ugly things happen with the allocated space.
1) Where would memory be allocated? If I place it randomly in the middle of the Heap/Stack and the Heap/Stack expands, there will be conflicts with allocated space!
12 What Is the simplest/cleanest solution for memory management? These are the only options I've researched:
A memory stack where malloc grows the stack and free(p) shrinks the stack by shifting [p..stack_pointer] (this would invalidate the shifted memory addresses though...).
A linked list (Memory Pool) with a variable-size chunk of memory. However I don't know where to place this in memory... should the linked list be a "global" variable, or "static"?
Thanks!
This article provides a good review of memory management techniques. The resources section at the bottom has links to several open source malloc implementations.
For embedded systems the memory is partitioned at link time into several sections or pools, i.e.:
ro (code + constants)
rw (heap)
zi (zero initialised memory for static variables)
You could add a 4th section in the linker configuration files that would effectively allocate a space in the memory map for dynamic allocations.
However once you have created the raw storage for dynamic memory then you need to understand how many, how large and how frequent the dynamic allocations will occur. From this you can build a picture of how the memory will fragment over time.
Typically an application that is running OS free will not use dynamic memory as you don't want to have to deal with the consequences of malloc failing. If at all possible the better solution is design to avoid it. If this is not at all possible try and simplify the dynamic behaviour using a few large structures that have the data pre-allocated before anything needs to use it.
For example say that you have an application that processes 10bytes of data whilst receiving the next 10 bytes of data to process, you could implement a simple buffering solution. The driver will always be requesting buffers of the same size and there would be a need for 3 buffers. Adding a little meta data to a structure:
{
int inUse;
char data[10];
}
You could take an array of three of theses structures (remembering to initialise inUse to 0 and flick between [0] and [1], with [2] reserved for the situations when a few too many interrupts occur and the next buffer is required buffer one is freed (the need for the 3rd buffer). The alloc algorithm would on need to check for the first buffer !inUse and return a pointer to data. The free would merely need to change inUse back to 0.
Depending on the amount of available RAM and machine (physical / virtual addressing) that you're using there are lots of possible algorithms, but the more complex the algorithm the longer the allocations could take.
Declare a huge static char buffer and use this memory to write your own malloc & free functions.
Algorithms for writing malloc and free could be as complex (and optimized) or as simple as you want.
One simple way could be following...
based on the type of memory allocation needs in your application try to find the most common buffer sizes
declare structures for each size with a char buffer of that length
and a boolean to represent whether buffer is occupied or not.
Then declare static arrays of above structures( decide array sizes
based on the total memory available in the system)
now malloc would simply go the most suitable array based on the
required size and search for a free buffer (use some search algo here
or simply use linear search) and return. Also mark the boolean in the
associated structure to TRUE.
free would simply search for buffer and mark the boolean to FALSE.
hope this helps.
Use the GNU C library. You can use just malloc() and free(), or any other subset of the library. Borrowing the design and/or implementation and not reinventing the wheel is a good way to be productive.
Unless, of course, this is homework where the point of the exercise is to implement malloc and free....

Efficient memory reallocation question

Let's say I have a program(C++, for example) that allocates multiple objects, never bigger than a given size(let's call it MAX_OBJECT_SIZE).
I also have a region(I'll call it a "page") on the heap(allocated with, say, malloc(REGION_SIZE), where REGION_SIZE >= MAX_OBJECT_SIZE).
I keep reserving space in that page until the filled space equals PAGE_SIZE(or at least gets > PAGE_SIZE - MAX_OBJECT_SIZE).
Now, I want to allocate more memory. Obviously my previous "page" won't be enough. So I have at least two options:
Use realloc(page, NEW_SIZE), where NEW_SIZE > PAGE_SIZE;
Allocate a new "page"(page2) and put the new object there.
If I wanted to have a custom allocate function, then:
Using the first method, I'd see how much I had filled, and then put my new object there(and add the size of the object to my filled memory variable).
Using the second method, I'd have a list(vector? array?) of pages, then look for the current page, and then use a method similar to 1 on the selected page.
Eventually, I'd need a method to free memory too, but I can figure out that part.
So my question is: What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
I understand that different operations may perform differently, but I'm looking for an overall metric.
In my experience option 2 is much easier to work with has minimal overhead. Realloc does not guarantee it will increase the size of existing memory. And in practice it almost never does. If you use it you will need to go back and remap all of the old objects. That would require that you remember where every object allocated was... That can be a ton over overhead.
But it's hard to qualify "most efficient" without knowing exactly what metrics you use.
This is the memory manager I always use. It works for the entire application not just one object.
allocs:
for every allocation determine the size of the object allocated.
1 look at a link list of frees for objects of that size to see if anything has been freed if so take the first free
2 look for in a look up table and if not found
2.1 allocate an array of N objects of the size being allocated.
3 return the next free object of the desired size.
3.1 if the array is full add a new page.
N objects can be programmer tunned. If you know you have a million 16 byte objects you might want that N to be slightly higher.
for objects over some size X, do not keep an array simply allocate a new object.
frees:
determine the size of the object, add it to the link list of frees.
if the size of the object allocated is less than the size of a pointer the link list does not need to incur any memory overhead. simply use the already allocated memory to store the nodes.
The problem with this method is memory is never returned to the operating system until the application has exited or the programmer decides to defragment the memory. defragmenting is another post. it can be done.
It is not clear from your question why you need to allocate a big block of memory in advance rather than allocating memory for each object as needed. I'm assuming you are using it as a contiguous array. Otherwise, it would make more sense to malloc the memory of each object as it is needed.
If it is indeed acting as an array,malloc-ing another block gives you another chunk of memory that you have to access via another pointer (in your case page2). Thus it is no longer on contiguous block and you cannot use the two blocks as part of one array.
realloc, on the other hand, allocates one contiguous block of memory. You can use it as a single array and do all sorts of pointer arithmetic not possible if there are separate blocks. realloc is also useful when you actually want to shrink the block you are working with, but that is probably not what you are seeking to do here.
So, if you are using this as an array, realloc is basically the better option. Otherwise, there is nothing wrong with malloc. Actually, you might want to use malloc for each object you create rather than having to keep track of and micro-manage blocks of memory.
You have not given any details on what platform you are experimenting. There are some performance differences for realloc between Linux and Windows, for example.
Depending on the situation, realloc might have to allocate a new memory block if it can't grow the current one and copy the old memory to the new one, which is expensive.
If you don't really need a contiguous block of memory you should avoid using realloc.
My sugestion would be to use the second approach, or use a custom allocator (you could implement a simple buddy allocator [2]).
You could also use more advanced memory allocators, like
APR memory pools
Google's TCMalloc
In the worst case, option 1 could cause a "move" of the original memory, that is an extrawork to be done. If the memory is not moved, anyway the "extra" size is initialized, which is other work too. So realloc would be "defeated" by the malloc method, but to say how much, you should do tests (and I think there's a bias on how the system is when the memory requests are done).
Depending on how many times you expect the realloc/malloc have to be performed, it could be an useful idea or an unuseful one. I would use malloc anyway.
The free strategy depends on the implementation. To free all the pages as whole, it is enough to "traverse" them; instead of an array, I would use linked "pages": add sizeof(void *) to the "page" size, and you can use the extra bytes to store the pointer to the next page.
If you have to free a single object, located anywhere in one of the pages, it becomes a little bit more complex. My idea is to keep a list of non-sequential free "block"/"slot" (suitable to hold any object). When a new "block" is requested, first you pop a value from this list; if it is empty, then you get the next "slot" in the last in use page, and eventually a new page is triggered. Freeing an object, means just to put the empty slot address in a stack/list (whatever you prefer to use).
In linux (and probably other POSIX systems) there is a third possibility, that is to use a memory mapped region with shm_open. Such a region is initialized by zeroes once you access it, but AFAIK pages that you never access come with no cost, if it isn't just the address-range in virtual memory that you reserve. So you could just reserve a large chunk of memory at the beginning (more than you ever would need) of your execution and then fill it incrementally from the start.
What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
Option 1. For it to be efficient, NEW_SIZE has to depend on old size non-linearly. Otherwise you risk running into O(n^2) performance of realloc() due to the redundant copying. I generally do new_size = old_size + old_size/4 (increase by 25% percent) as theoretically best new_size = old_size*2 might in worst case reserve too much unused memory.
Option 2. It should be more optimal as most modern OSs (thanks to C++'s STL) are already well optimized for flood of small memory allocations. And small allocations have lesser chance to cause memory fragmentation.
In the end it all depends how often you allocate the new objects and how do you handle freeing. If you allocate a lot with #1 you would have some redundant copying when expanding but freeing is dead simple since all objects are in the same page. If you would need to free/reuse the objects, with #2 you would be spending some time walking through the list of pages.
From my experience #2 is better, as moving around large memory blocks might increase rate of heap fragmentation. The #2 is also allows to use pointers as objects do not change their location in memory (though for some applications I prefer to use pool_id/index pairs instead of raw pointers). If walking through pages becomes a problem later, it can be too optimized.
In the end you should also consider option #3: libc. I think that libc's malloc() is efficient enough for many many tasks. Please test it before investing more of your time. Unless you are stuck on some backward *NIX, there should be no problem using malloc() for every smallish object. I used custom memory management only when I needed to put objects in exotic places (e.g. shm or mmap). Keep in mind the multi-threading too: malloc()/realloc()/free() generally are already optimized and MT-ready; you would have to reimplement the optimizations anew to avoid threads being constantly colliding on memory management. And if you want to have memory pools or zones, there are already bunch of libraries for that too.

Resources