Malloc Allocation Schemes - C

Yes, I am taking a Computer systems course.
I had a few questions about the various allocation schemes to implement malloc.
For explicit lists, if I implement malloc using a LIFO-like stack, what exactly is the purpose of having pointers to previously freed memory? That is, why do you need doubly-linked lists? Wouldn't singly linked lists work just as well?
Malloc lecture.
I found this link online; you can look at slide 7 to see what I'm talking about.
When looking at a segregated list allocation scheme, these lists are uni-directional, right? And also, what exactly is the coalescing mechanism? For example, if 4 words are freed, would you first try to join them with the free space around them before inserting the block back into the respective segregated linked list? Or would you simply insert the 4-word block into the '4 word' section of the respective segregated linked list?
Thank you.

Since a freed block always has room for two pointers, why not doubly-link the list? It simplifies the coalescing code so it doesn't have to maintain a trailing pointer while traversing the list. It also allows traversing the list in either direction in case there is a hint for which end of the list might be closer to begin the search. One obscure system I once looked at kept a pointer in the "middle", where the last activity occurred.
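As a sketch of the point above: with both links stored inside the freed block itself, a block can be unlinked in O(1) when coalescing reaches it through a heap neighbour, with no list walk and no trailing pointer. The names here (free_block, unlink_block, push_block) are illustrative, not from any particular allocator.

```c
#include <stddef.h>

/* Hypothetical free-list node kept inside each freed block.
 * A freed block always has room for the two pointers. */
struct free_block {
    size_t size;
    struct free_block *prev;
    struct free_block *next;
};

/* With a doubly-linked list, removing a block found via its heap
 * neighbour (e.g. during coalescing) needs no list traversal. */
static void unlink_block(struct free_block **head, struct free_block *b) {
    if (b->prev)
        b->prev->next = b->next;
    else
        *head = b->next;          /* b was the head */
    if (b->next)
        b->next->prev = b->prev;
}

/* Push a block on the front of the list (LIFO insertion). */
static void push_block(struct free_block **head, struct free_block *b) {
    b->prev = NULL;
    b->next = *head;
    if (*head)
        (*head)->prev = b;
    *head = b;
}
```

With a singly linked list, unlink_block would instead have to walk from the head to find the predecessor, which is exactly the trailing-pointer bookkeeping the answer mentions.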
When freeing a block, there are only four possible cases:
The freed block immediately follows a free block.
The freed block immediately precedes a free block.
The freed block is between, and adjacent to, free blocks on both sides.
The freed block is not adjacent to any free block.
The purposes of coalescing adjacent free blocks are:
to reduce the length of the linked list
to accurately reflect the size of a free block without burdening the allocator to look ahead to see if two blocks are adjacent
Sorting a free block into a specific-length free list often has benefits, but in most practical implementations coalescing takes priority, so that an alloc() request for a different-size block isn't inappropriately denied when plenty of memory is free but split into blocks of the wrong sizes.
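The four cases above collapse into two independent adjacency checks: the "between two free blocks" case is simply both checks firing, and the "not adjacent" case is neither. Here is a toy model of that logic; struct blk with explicit neighbour pointers is purely illustrative (a real allocator finds physical neighbours via block headers and boundary-tag footers):

```c
#include <stddef.h>

/* Toy model: each block knows its size, a free flag, and its
 * physical neighbours in the heap. */
struct blk {
    size_t size;
    int free;
    struct blk *phys_prev;   /* adjacent block before, or NULL */
    struct blk *phys_next;   /* adjacent block after, or NULL  */
};

/* Coalesce b with whichever neighbours are free; returns the start
 * of the (possibly merged) free block. */
static struct blk *coalesce(struct blk *b) {
    if (b->phys_next && b->phys_next->free) {   /* free block after */
        b->size += b->phys_next->size;
        b->phys_next = b->phys_next->phys_next;
        if (b->phys_next)
            b->phys_next->phys_prev = b;
    }
    if (b->phys_prev && b->phys_prev->free) {   /* free block before */
        struct blk *p = b->phys_prev;
        p->size += b->size;
        p->phys_next = b->phys_next;
        if (p->phys_next)
            p->phys_next->phys_prev = p;
        b = p;
    }
    b->free = 1;
    return b;
}
```

After coalescing, a single correctly-sized block goes back onto the free list, which is exactly what keeps the list short and the recorded sizes accurate.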

Related

Is memory fragmentation possible when malloc and free functions are called in order?

I am doing an embedded system app in C and I was wondering about the following.
If one uses malloc and free functions in order, do you still have to worry about memory fragmentation?
Example:
malloc(a)
malloc(b)
malloc(c)
free(c)
free(b)
free(a)
Thanks for the help.
It is almost certain that the memory allocator will coalesce free blocks, so that adjacent free blocks are merged into a single block. There is no guarantee of that, but it would be conventional, if not required, behaviour.
There is of course no guarantee that the blocks a, b and c in your example are adjacent, but either way, once all three are deallocated the heap will be in the same state as it was before the allocations; in that sense it makes no difference what the order of deallocation is.
It is only the intermediate state that will be fragmented. For example if the blocks are adjacent as follows:
aaaaaaaaaaabbbbbbbbbbbbbbcccccccccccccc
and you deallocate b before a or c, you will have two non-adjacent free blocks until either a or c are deallocated.
aaaaaaaaaaa--------------cccccccccccccc
However, this is academic since, as I said, there is no guarantee of adjacent allocation in the first instance. If you want to be sure of the behaviour, allocate one large block (statically, or from the heap using malloc) and then allocate from that using your own allocator, with an implementation that meets your application's requirements.
In your case you should not see memory fragmentation occurring. You can verify that it is not occurring by doing your malloc/free sequence several times, say 1000. If malloc returns the same values for a, b, and c, then I think you can conclude that no fragmentation is occurring.
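The empirical check suggested above can be written as a small loop. Nothing in the C standard guarantees that addresses repeat; this only observes the behaviour of whatever allocator you happen to be running on (addresses_repeat is an illustrative name):

```c
#include <stdlib.h>

/* Run the malloc/free sequence from the question many times and report
 * whether the allocator hands back the same three addresses each round.
 * Returns 1 if the addresses repeated every round, 0 if they ever
 * differed, and -1 if an allocation failed. */
static int addresses_repeat(size_t sz, int rounds) {
    void *first_a = NULL, *first_b = NULL, *first_c = NULL;
    int repeated = 1;
    for (int i = 0; i < rounds; i++) {
        void *a = malloc(sz);
        void *b = malloc(sz);
        void *c = malloc(sz);
        if (!a || !b || !c) {
            free(a); free(b); free(c);
            return -1;
        }
        if (i == 0) {
            first_a = a; first_b = b; first_c = c;
        } else if (a != first_a || b != first_b || c != first_c) {
            repeated = 0;
        }
        free(c); free(b); free(a);   /* reverse order, as in the question */
    }
    return repeated;
}
```

A result of 1 over many rounds strongly suggests no fragmentation is building up; a result of 0 does not by itself prove fragmentation, only that the allocator reused memory differently.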

Opinions and suggestions regarding my approach to first fit malloc function

I'm writing a malloc function for a college assignment. Here's a basic layout of my idea:
1) Define a node struct with pointers to the previous node and next node, as well as a char for size and vacancy. Each region in the heap will contain a hidden node with this information.
2) Malloc function. Starting with the first node, loop through each node checking for vacancy. If a node is vacant and large enough, return a ptr to the beginning of the region, not including the node. If no space is available, use sbrk to allocate the requested space PLUS space for a node.
3) Free function. Go to (pointer passed as parameter) - sizeof(struct node) and set the vacancy to vacant. Then, starting at the beginning of the list, traverse the list merging adjacent free spaces.
How does this approach sound? My main concern is with actually starting the linked list. For instance, should I create a node with sbrk before I start to do any allocations and store a ptr to it as a global variable? If so how do I initialize a first node before I allow the malloc function to be called by a driver program?
Thanks in advance. I'm not asking for someone to write my code, only to provide some insight and suggestions regarding my ideas.
I would avoid keeping all the bookkeeping information on nodes while they're allocated. I'd have the bare minimum of information (usually just the block size) at the beginning of the block, but nothing more.
I'd track free blocks and allocated blocks separately, so when you're searching for a free block, you don't waste time on blocks that are already in use.
I'd separate the free list into two pieces, and coalesce blocks lazily. In other words, have one free list you're allocating from, and a second that's just a holding area. When the user calls free, just link the block into the holding area, nothing more. When the list you're using for allocations starts to run low, sort the blocks in the holding area by address, then merge with the allocation free list. Then walk the list and merge adjacent blocks.
When you do need to call sbrk (or whatever) to allocate more space from the system, do not just allocate enough space to satisfy the current allocation request. Instead, allocate a fairly large block (e.g., a megabyte), split that to satisfy the request, and add the rest as a block on the free list. If you're running low enough on memory that you have to go to sbrk once, chances are the next few calls will do the same, so you might as well be greedy and grab enough memory immediately to stand a decent chance of satisfying more requests as well.
The basic idea of the third point is to put off coalescing as long as possible. That increases the chances of finding adjacent blocks, so when you do coalesce you'll probably do some real good, and you avoid wasting time trying to coalesce when only a few adjacent blocks are free.
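The deferred-coalescing idea can be sketched as follows. This is a minimal model, not a complete allocator: struct fblk, lazy_free, and replenish are illustrative names, the holding area is a plain singly linked list, and insertion sort stands in for whatever sort you would really use.

```c
#include <stddef.h>

/* Free-list node stored at the start of each free block. */
struct fblk {
    size_t size;             /* total bytes, including this header */
    struct fblk *next;
};

static struct fblk *holding = NULL;   /* freed but not yet merged */

/* free() in this scheme is O(1): just push onto the holding area. */
static void lazy_free(struct fblk *b) {
    b->next = holding;
    holding = b;
}

/* Insert into a list kept sorted by address. */
static struct fblk *insert_sorted(struct fblk *head, struct fblk *b) {
    struct fblk **pp = &head;
    while (*pp && *pp < b)
        pp = &(*pp)->next;
    b->next = *pp;
    *pp = b;
    return head;
}

/* When the allocation list runs low: drain the holding area into the
 * address-ordered free list, then merge physically adjacent blocks. */
static struct fblk *replenish(struct fblk *freelist) {
    while (holding) {
        struct fblk *b = holding;
        holding = b->next;
        freelist = insert_sorted(freelist, b);
    }
    for (struct fblk *p = freelist; p && p->next; ) {
        if ((char *)p + p->size == (char *)p->next) {
            p->size += p->next->size;     /* adjacent: merge the pair */
            p->next = p->next->next;
        } else {
            p = p->next;
        }
    }
    return freelist;
}
```

Because the list is address-ordered after draining, one linear pass finds every adjacent pair, which is what makes the batched coalesce cheap.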

Is it better to use LIFO order or FIFO order when keeping a list of freed blocks in a dynamic memory allocator?

I'm trying to implement malloc() in C for class, and I can't decide whether a block should be added to the end of the free list or the head of the free list. Which would be better, and why? The list I'm using is a doubly linked list and (for now) is unordered.
Without running a benchmark, the most likely choice to give the best performance is LIFO, i.e. put freed blocks at the head of the free list.
This is because LIFO maximizes temporal locality of reference: a just-freed block is more likely to still reside in a CPU cache than a block freed earlier and not used for a longer period of time.
The difference between the two shouldn't be obvious (if there is one): the order in which blocks are allocated and freed depends on the user (the programmer who's using your malloc), so you can consider it random.
At the very least, keep the list ordered by size.
If you really want something fast, take a look at some other techniques. For instance, implement a buddy system.
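One reason the buddy system is fast: a free block's potential merge partner (its "buddy") is found with a single XOR rather than a list search. A minimal sketch, assuming power-of-two block sizes and offsets measured from the start of the arena (buddy_of is an illustrative name):

```c
#include <stddef.h>

/* In a binary buddy system, a block of power-of-two size `size` at
 * arena offset `offset` has its buddy at offset ^ size: the two halves
 * of the parent block differ only in that one address bit. */
static size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}
```

On free, the allocator checks whether the block at buddy_of(offset, size) is also free and of the same size; if so, the pair merges into one block of twice the size, and the check repeats one level up.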

What is the advantage of organizing a free list in address order as opposed to LIFO order?

I'm implementing malloc() in C, and right now my free list is maintained in LIFO order (i.e. newly freed blocks are added to the beginning of the list) and my allocator uses a first fit algorithm to search for free blocks of memory. My textbook mentions that maintaining the list in address order enjoys better memory utilization than a list in LIFO order in this situation, but I don't understand why and it doesn't explain. Eventually I'll implement a buddy system or something similar, but for now I just want to understand this.
Coalescing free blocks ("defragging") is easier if you have your free list sorted by address - joining two chunks into a larger chunk is essentially trivial then.
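To see why it becomes trivial: in an address-ordered list, the only possible merge partners are the insertion point's immediate predecessor and successor, so free can coalesce eagerly in one pass. A minimal sketch, assuming a block's total size is stored in its header (struct node and free_ordered are illustrative names):

```c
#include <stddef.h>

/* Free-list node at the start of each free block; the list is kept
 * sorted by address. */
struct node {
    size_t size;             /* total bytes, including this header */
    struct node *next;
};

/* Insert b into the address-ordered list, merging with its physical
 * neighbours when they are adjacent. */
static void free_ordered(struct node **head, struct node *b) {
    struct node *prev = NULL, *cur = *head;
    while (cur && cur < b) {
        prev = cur;
        cur = cur->next;
    }
    /* merge forward: b runs right up to cur */
    if (cur && (char *)b + b->size == (char *)cur) {
        b->size += cur->size;
        b->next = cur->next;
    } else {
        b->next = cur;
    }
    /* merge backward: prev runs right up to b */
    if (prev && (char *)prev + prev->size == (char *)b) {
        prev->size += b->size;
        prev->next = b->next;
    } else if (prev) {
        prev->next = b;
    } else {
        *head = b;
    }
}
```

With a LIFO list, by contrast, the physical neighbours of a freed block could be anywhere in the list, so finding them requires a search or boundary tags.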

What happens when there is a request for memory block which is not a power of 2?

Suppose we make a malloc request for a memory block of size n, where n != 2^k for any k > 0.
Malloc returns space for that requested memory block, but how is the remaining buffer from the page handled? I read that pages are generally blocks of memory whose sizes are powers of two.
Wiki states the following:
Like any method of memory allocation, the heap will become fragmented; that is, there will be sections of used and unused memory in the allocated space on the heap. A good allocator will attempt to find an unused area of already allocated memory to use before resorting to expanding the heap.
So my question is how is this tracked?
EDIT: How is the unused memory tracked when using malloc?
This really depends on the specific implementation, as Morten Siebuhr pointed out already. In very simple cases, there might be a list of free, fixed-size blocks of memory (possibly all having the same size), so the unused memory is simply wasted. Note that real implementations will never use such simplistic algorithms.
This is an overview over some simple possibilities: http://www.osdcom.info/content/view/31/39/
This Wikipedia entry has several interesting links, including the one above: http://en.wikipedia.org/wiki/Dynamic_memory_allocation#Implementations
As a final remark, googling "malloc implementation" turns up a heap (pun intended) of valuable links.
A standard BSD-style memory allocator basically works like this:
It keeps a linked list of pre-allocated memory blocks for sizes 2^k for k<=12 (for example).
In reality, each list for a given k is composed of memory-blocks from different areas, see below.
A malloc request for n bytes is serviced by calculating n', the smallest 2^k >= n, then looking up the first area in the list for k, and then returning the first free block in the free-list for the given area.
When there is no pre-allocated memory block for size 2^k, an area is allocated, an area being some larger piece of contiguous memory, say a 4kB piece of memory. This piece of memory is then chopped up into pieces that are 2^k bytes. At the beginning of the contiguous memory area there is book-keeping information such as where to find the linked list of free blocks within the area. A bitmap can also be used, but a linked list typically has better cache behavior (you want the next allocated block to return memory that is already in the cache).
The reason for using areas is that free(ptr) can be implemented efficiently. ptr & 0xfffff000 in this example points to the beginning of the area which contains the book-keeping structures and makes it possible to link the memory block back into the area.
The BSD allocator will waste space by always returning a memory block 2^k in size, but it can reuse the memory of the block to keep the free-list, which is a nice property. Also allocation is blazingly fast.
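The two computations the scheme relies on are tiny. A minimal sketch, assuming 4 kB areas as in the example (round_to_bucket and area_base are illustrative names):

```c
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the allocator's bucket size: the smallest
 * power of two >= n (the n' in the description above). */
static size_t round_to_bucket(size_t n) {
    size_t b = 1;
    while (b < n)
        b <<= 1;
    return b;
}

/* With 4 kB areas, the area's book-keeping header is found by masking
 * off the low 12 bits of any pointer inside it; this is what makes
 * free(ptr) cheap in this scheme. (The text's `ptr & 0xfffff000` is
 * the 32-bit form of the same mask.) */
static uintptr_t area_base(uintptr_t ptr) {
    return ptr & ~(uintptr_t)0xFFF;
}
```

Since free() recovers the area header by arithmetic alone, no global lookup structure has to be consulted to return a block to its free list.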
Modifications to the above general idea include:
Using anonymous mmap for large allocations. This shifts the work over to the kernel for handling large mallocs and avoids wasting a lot of memory in these cases.
The GNU version of malloc has special cases for non-power-of-two buckets. There is nothing inherent in the BSD allocator that requires returning 2^k memory blocks, only that there are pre-defined bucket sizes. The GNU allocator has more buckets and thus wastes less space.
Sharing memory between threads is a tricky subject. Lock contention during allocation is an important consideration, so the GNU allocator, for example, will eagerly create extra areas for different threads for a given bucket size if it ever encounters lock contention during allocation.
This varies a lot from implementation to implementation. Some waste the space, some sub-divide pages until they get the requested size (or close to it), etc.
If you are asking out of curiosity, I suggest you read the source code for the implementation in question.
If it's because of performance worries, try to benchmark it and see what happens.