(C) which heap policies are most often used? - c

I have heard that 'better-fit' is pretty commonly used, but I don't seem to read much about it online. What are the most commonly used / thought to be the most efficient policies used by heap allocators.
(I admit my vocabulary may be flawed; when I say 'policy' i mean things such as 'best fit,' 'first fit,' 'next fit,' etc)
Edit: I am also particularly interested in how the heap policies of 'better fit' and doug lea's strategy (http://gee.cs.oswego.edu/dl/html/malloc.html) compare. Doug uses a type of best fit, but his approach uses index bins, whereas better fit uses a Cartesian tree.

C programming environments use the malloc implementation provided by the standard C library that comes with the operating system The concepts in Doug Lea's memory allocator (called dlmalloc) are most widely used in most of the memory allocators in one form or another on UNIX systems. dlmalloc uses bins of different sizes to accommodate objects - the bin closest to the object size is used to allocate the object.
FreeBSD uses a new multi-threaded memory-allocator called jemalloc designed to be concurrent and thread-safe, which provides good performance characteristics when used in multi-core systems of today. A comparison of the old malloc and the new multithreaded one can be found here. Even though it's multi-threaded it still uses the concepts of chunks of different sizes to accommodate objects according to their size (chunk(s) closest to the size of the object are used to allocate the object).
Inside the UNIX kernels the most popular memory allocator is the slab allocator, which was introduced by Sun Microsystems. The slab allocator uses large chunks of memory called slabs. These slabs are divided among caches of objects (or pools) of different sizes. Each object is allocated from the cache which contains objects closest to its size.
As you'd notice the above bin/chunks/slab caches are just forms of the best fit algorithm. So, you can easily assume that the "best fit" algorithm is one of the most widely used malloc algorithm (although memory allocators differ in other important ways).

Related

memory allocation/deallocation for embedded devices

Currently we use malloc/free Linux commands for memory allocation/de-allocation in our C based embedded application. I heard that this would cause memory fragmentation as the heap size increases/decreases because of memory allocation/de-allocation which would result in performance degradation. Other programming languages with efficient Garbage Collection solves this issue by freeing the memory when not in use.
Are there any alternate approaches which would solve this issue in C based embedded programs ?
You may take a look at a solution called memory pool allocation.
See: Memory pools implementation in C
Yes, there's an easy solution: don't use dynamic memory allocation outside of initialization.
It is common (in my experience) in embedded systems to only allow calls to malloc when a program starts (this is usually done by convention, there's nothing in C to enforce this. Although you can create your own wrapper for malloc to do this). This requires more work to analyze what memory your program could possibly use since you have to allocate it all at once. The benefit you get, however, is a complete understanding of what memory your program uses.
In some cases this is fairly straightforward, in particular if your system has enough memory to allocate everything it could possibly need all at once. In severely memory-limited systems, however, you're left with the managing the memory yourself. I've seen this done by writing "custom allocators" which you allocate and free memory from. I'll provide an example.
Let's say you're implementing some mathematical program that needs lots of big matrices (not terribly huge, but for example 1000x1000 floats). Your system may not have the memory to allocate many of these matrices, but if you can allocate at least one of them, you could create a pool of memory used for matrix objects, and every time you need a matrix you grab memory from that pool, and when you're done with it you return it to the pool. This is easy if you can return them in the same order you got them in, meaning the memory pool works just like a stack. If this isn't the case, perhaps you could just clear the entire pool at the end of each "iteration" (assuming this math system is periodic).
With more detail about what exactly you're trying to implement I could provide more relevant/specific examples.
Edit: See sg7's answer as well: that user provides a link to well-established frameworks which implement what I describe here.

Popular use of Dynamic memory allocation

I have been reading coding standards in C and most of them discourages use of dynamic memory allocation.But In popular use Dynamic memory allocation leads .Any solid reason for this.I am asking the reasons for its use despite the Demerits it posses ?
These are my references
JPL Standards :http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf
Power of 10 :http://spinroot.com/gerard/pdf/P10.pdf
Dynamic memory allocation is generally banned in embedded systems programming, particularly in safety-critical embedded software. All industry standards for safety-critical software bans it: MISRA-C, DO178B, IEC 61508, ISO 26262 and so on.
There are many well-known issues with dynamic memory allocation: slow and possibly indeterministic access time, memory leaks and heap fragmentation.
None of these issues are desired in any kind of program. But in PC/desktop etc programming, they are regarded as a necessary evil, mainly because the mainstream operative systems restrict the amount of static process memory given to each process and if you want to store data beyond that, you have to store it on the heap.
It is also convenient to use dynamic memory when the amount of data isn't known until runtime. However, there exist no computer in the known world with unlimited memory, so "I want to use a completely variable amount of data, I don't know how much" is kind of a nonsense argument. A proper software engineer always designs for the worst case scenario.
Particularly in embedded systems, where the amount of RAM is limited and the consequences of bugs are far more dire than an out-of-memory message box popping up,
your program must have 100% deterministic behavior. You can't design in things like "this program will work until it runs out of RAM, then it will crash and burn". You can't allow a variable "x" number of trains to exist in your railway supervisory system, you must specify the upper limit and design the system after that.
So no matter all the issues with dynamic memory mentioned above, you don't want to use dynamic memory in these kind of systems, simply because it doesn't make any sense.
Recursion is also banned from these systems, for pretty much the same reasons.
Dynamic memory allocation in C sits on the blurry line between abstract mathematics and real-world engineering. Mathematically you say, "put this data in some memory", and indeed malloc() just gives you "some memory", basically pretending that there is an unbounded amount of memory. (And on many real-world systems malloc() does in fact never fail, due to over-comitting.)
Real engineering has to face the boundedness of all resources, and if you approach a problem knowing full well that you have X amound of memory available, then you have to plan where the memory goes. This is more cumbersome and challenging, but it can also lead to better code and better performance, if for no other reason than that it forces you to think carefully about the data flow of your program.
By contrast to common desktop machines on which malloc() never fails, there are also, at the opposite end of the spectrum, embedded machines which don't have sophisticated memory managers and on which malloc() essentially "always fails". If you are able to program without it, then you are able to program for such platforms. On the other hand, if your programming style always assumes the unlimited availability of magic memory, then you will find programming on such platforms very difficult.

Implement a user-defined malloc() function?

How do you create a new malloc() function defined in C language ?
I don't even have an elflike hint as to how to do this , how to map virtual space address with physical space , what algorithm to follow ? I do know that malloc() is not so efficient as it can cause fragmentation. So something efficient can be created. Even if not efficient , how to implement a naive C malloc function?
It was asked recently in an interview.
Wikipedia actually provides a good summary of various malloc implementations, including ones optimized for specific conditions, along with links to learn more about the implementation.
http://en.wikipedia.org/wiki/C_dynamic_memory_allocation
dlmalloc
A general-purpose allocator. The GNU C library (glibc) uses an allocator based on dlmalloc.
jemalloc
In order to avoid lock contention, jemalloc uses separate "arenas" for each CPU. Experiments measuring number of allocations per second in multithreading application have shown that this makes it scale linearly with the number of threads, while for both phkmalloc and dlmalloc performance was inversely proportional to the number of threads.
Hoard memory allocator
Hoard uses mmap exclusively, but manages memory in chunks of 64 kilobytes called superblocks. Hoard's heap is logically divided into a single global heap and a number of per-processor heaps. In addition, there is a thread-local cache that can hold a limited number of superblocks. By allocating only from superblocks on the local per-thread or per-processor heap, and moving mostly-empty superblocks to the global heap so they can be reused by other processors, Hoard keeps fragmentation low while achieving near linear scalability with the number of threads.
tcmalloc
Every thread has local storage for small allocations. For large allocations mmap or sbrk can be used. TCMalloc, a malloc developed by Google, has garbage-collection for local storage of dead threads. The TCMalloc is considered to be more than twice as fast as glibc's ptmalloc for multithreaded programs.
dmalloc (not covered by Wikipedia)
The debug memory allocation or dmalloc library has been designed as a drop in replacement for the system's malloc, realloc, calloc, free and other memory management routines while providing powerful debugging facilities configurable at runtime. These facilities include such things as memory-leak tracking, fence-post write detection, file/line number reporting, and general logging of statistics.
I think it's a nice example of malloc in An example in Chapter 8.7 of C Programming Language book:

Optimization of C program with SLAB-like technologies

I have a programming project with highly intensive use of malloc/free functions.
It has three types of structures with very high dynamics and big numbers. By this way, malloc and free are heavily used, called thousands of times per second. Can replacement of standard memory allocation by user-space version of SLAB solve this problem? Is there any implementation of such algorithms?
P.S.
System is Linux-oriented.
Sizes of structures is less than 100 bytes.
Finally, I'll prefer to use ready implementation because memory management is really hard topic.
If you only have three different then you would greatly gain by using a pool allocator (either custom made or something like boost::pool but for C). Doug Lea's binning based malloc would serve as a very good base for a pool allocator (its used in glibc).
However, you also need to take into account other factors, such as multi-threading and memory reusage (will objects be allocated, freed then realloced or just alloced then freed?). from this angle you can check into tcmalloc (which is designed for extreme allocations, both quantity and memory usage), nedmalloc or hoard. all of these allocators are open source and thus can be easily altered to suite the sizes of the objects you allocate.
Without knowing more it's impossible to give you a good answer, but yes, managing your own memory (often by allocating a large block and then doing your own allocations with in that large block) can avoid the high cost associated with general purpose memory managers. For example, in Windows many small allocations will bring performance to its knees. Existing implementations exist for almost every type of memory manager, but I'm not sure what kind you're asking for exactly...
When programming in Windows I find calling malloc/free is like death for performance -- almost any in-app memory allocation that amortizes memory allocations by batching will save you gobs of processor time when allocating/freeing, so it may not be so important which approach you use, as long as you're not calling the default allocator.
That being said, here's some simplistic multithreading-naive ideas:
This isn't strictly a slab manager, but it seems to achieve a good balance and is commonly used.
I personally find I often end up using a fairly simple-to-implement memory-reusing manager for memory blocks of the same sizes -- it maintains a linked list of unused memory of a fixed size and allocates a new block of memory when it needs to. The trick here is to store the pointers for the linked list in the unused memory blocks -- that way there's a very tiny overhead of four bytes. The entire process is O(1) whenever it's reusing memory. When it has to allocate memory it calls a slab allocator (which itself is trivial.)
For a pure allocate-only slab allocator you just ask the system (nicely) to give you a large chunk of memory and keep track of what space you haven't used yet (just maintain a pointer to the start of the unused area and a pointer to the end). When you don't have enough space to allocate the requested size, allocate a new slab. (For large chunks, just pass through to the system allocator.)
The problem with chaining these approaches? Your application will never free any memory, but performance-critical applications often are either one-shot processing applications or create many objects of the same sizes and then stop using them.
If you're careful, the above approach isn't too hard to make multithread friendly, even with just atomic operations.
I recently implemented my own userspace slab allocator, and it proved to be much more efficient (speedwise and memory-wise) than malloc/free for a large amount of fixed-size allocations. You can find it here.
Allocations and freeing work in O(1) time, and are speeded up because of bitvectors being used to represent empty/full slots. When allocating, the __builtin_ctzll GCC intrinsic is used to locate the first set bit in the bitvector (representing an empty slot), which should translate to a single instruction on modern hardware. When freeing, some clever bitwise arithmetic is performed with the pointer itself, in order to locate the header of the corresponding slab and to mark the corresponding slot as free.

What are alternatives to malloc() in C?

I am writing C for an MPC 555 board and need to figure out how to allocate dynamic memory without using malloc.
Typically malloc() is implemented on Unix using sbrk() or mmap(). (If you use the latter, you want to use the MAP_ANON flag.)
If you're targetting Windows, VirtualAlloc may help. (More or less functionally equivalent to anonymous mmap().)
Update: Didn't realize you weren't running under a full OS, I somehow got the impression instead that this might be a homework assignment running on top of a Unix system or something...
If you are doing embedded work and you don't have a malloc(), I think you should find some memory range that it's OK for you to write on, and write your own malloc(). Or take someone else's.
Pretty much the standard one that everybody borrows from was written by Doug Lea at SUNY Oswego. For example glibc's malloc is based on this. See: malloc.c, malloc.h.
You might want to check out Ralph Hempel's Embedded Memory Manager.
If your runtime doesn't support malloc, you can find an open source malloc and tweak it to manage a chunk of memory yourself.
malloc() is an abstraction that is use to allow C programs to allocate memory without having to understand details about how memory is actually allocated from the operating system. If you can't use malloc, then you have no choice other than to use whatever facilities for memory allocation that are provided by your operating system.
If you have no operating system, then you must have full control over the layout of memory. At that point for simple systems the easiest solution is to just make everything static and/or global, for more complex systems, you will want to reserve some portion of memory for a heap allocator and then write (or borrow) some code that use that memory to implement malloc.
An answer really depends on why you might need to dynamically allocate memory. What is the system doing that it needs to allocate memory yet cannot use a static buffer? The answer to that question will guide your requirements in managing memory. From there, you can determine which data structure you want to use to manage your memory.
For example, a friend of mine wrote a thing like a video game, which rendered video in scan-lines to the screen. That team determined that memory would be allocated for each scan-line, but there was a specific limit to how many bytes that could be for any given scene. After rendering each scan-line, all the temporary objects allocated during that rendering were freed.
To avoid the possibility of memory leaks and for performance reasons (this was in the 90's and computers were slower then), they took the following approach: They pre-allocated a buffer which was large enough to satisfy all the allocations for a scan-line, according to the scene parameters which determined the maximum size needed. At the beginning of each scan-line, a global pointer was set to the beginning of the scan line. As each object was allocated from this buffer, the global pointer value was returned, and the pointer was advanced to the next machine-word-aligned position following the allocated amount of bytes. (This alignment padding was including in the original calculation of buffer size, and in the 90's was four bytes but should now be 16 bytes on some machinery.) At the end of each scan-line, the global pointer was reset to the beginning of the buffer.
In "debug" builds, there were two scan buffers, which were protected using virtual memory protection during alternating scan lines. This method detects stale pointers being used from one scan-line to the next.
The buffer of scan-line memory may be called a "pool" or "arena" depending on whome you ask. The relevant detail is that this is a very simple data structure which manages memory for a certain task. It is not a general memory manager (or, properly, "free store implementation") such as malloc, which might be what you are asking for.
Your application may require a different data structure to keep track of your free storage. What is your application?
You should explain why you can't use malloc(), as there might be different solutions for different reasons, and there are several reasons why it might be forbidden or unavailable on small/embedded systems:
concern over memory fragmentation. In this case a set of routines that allocate fixed size memory blocks for one or more pools of memory might be the solution.
the runtime doesn't provide a malloc() - I think most modern toolsets for embedded systems do provide some way to link in a malloc() implementation, but maybe you're using one that doesn't for whatever reason. In that case, using Doug Lea's public domain malloc might be a good choice, but it might be too large for your system (I'm not familiar with the MPC 555 off the top of my head). If that's the case, a very simple, custom malloc() facility might be in order. It's not too hard to write, but make sure you unit test the hell out of uit because it's also easy to get details wrong. For example, I have a set of very small routines that use a brain dead memory allocation strategy using blocks on a free list (the allocator can be compile-time configured for first, best or last fit). I give it an array of char at initialization, and subsequent allocation calls will split free blocks as necessary. It's nowhere near as sophisticated as Lea's malloc(), but it's pretty dang small so for simple uses it'll do the trick.
many embedded projects forbid the use of dynamic memory allocation - in this case, you have to live with statically allocated structures
Write your own. Since your allocator will probably be specialized to a few types of objects, I recommend the Quick Fit scheme developed by Bill Wulf and Charles Weinstock. (I have not been able to find a free copy of this paper, but many people have access to the ACM digital library.) The paper is short, easy to read, and well suited to your problem.
If you turn out to need a more general allocator, the best guide I have found on the topic of programming on machines with fixed memory is Donald Knuth's book The Art of Computer Programming, Volume 1. If you want examples, you can find good ones in Don's epic book-length treatment of the source code of TeX, TeX: The Program.
Finally, the undergraduate textbook by Bryant and O'Hallaron is rather expensive, but it goes through the implementation of malloc in excruciating detail.
Write your own. Preallocate a big chunk of static RAM, then write some functions to grab and release chunks of it. That's the spirit of what malloc() does, except that it asks the OS to allocate and deallocate memory pages dynamically.
There are a multitude of ways of keeping track of what is allocated and what is not (bitmaps, used/free linked lists, binary trees, etc.). You should be able to find many references with a few choice Google searches.
malloc() and its related functions are the only game in town. You can, of course, roll your own memory management system in whatever way you choose.
If there are issues allocating dynamic memory from the heap, you can try allocating memory from the stack using alloca(). The usual caveats apply:
The memory is gone when you return.
The amount of memory you can allocate is dependent on the maximum size of your stack.
You might be interested in: liballoc
It's a simple, easy-to-implement malloc/free/calloc/realloc replacement which works.
If you know beforehand or can figure out the available memory regions on your device, you can also use their libbmmm to manage these large memory blocks and provide a backing-store for liballoc. They are BSD licensed and free.
FreeRTOS contains 3 examples implementations of memory allocation (including malloc()) to achieve different optimizations and use cases appropriate for small embedded systems (AVR, ARM, etc). See the FreeRTOS manual for more information.
I don't see a port for the MPC555, but it shouldn't be difficult to adapt the code to your needs.
If the library supplied with your compiler does not provide malloc, then it probably has no concept of a heap.
A heap (at least in an OS-less system) is simply an area of memory reserved for dynamic memory allocation. You can reserve such an area simply by creating a suitably sized statically allocated array and then providing an interface to provide contiguous chunks of this array on demand and to manage chunks in use and returned to the heap.
A somewhat neater method is to have the linker allocate the heap from whatever memory remains after stack and static memory allocation. That way the heap is always automatically as large as it possibly can be, allowing you to use all available memory simply. This will require modification of the application's linker script. Linker scripts are specific to the particular toolchain, and invariable somewhat arcane.
K&R included a simple implementation of malloc for example.

Resources