Malloc/free own implementation - c

I'm currently trying to write my own implementation of malloc, and free.
During my research I've found some implementation which request free memoryspace with:
block = sbrk(totalSize);
then there is some other code
and finally they return:
return (block + 1);
But i don't understand why the + 1 is necessary.
Another thing i don't understand why some implementation have a magic number in their struct.
I already searched the web and stackoverflow but didn't find any answers to my question.

So you returned memory from your allocator. All's fine, the user does something with it, and gives your free a pointer. That's it, all you get is an address.
How are you supposed to know from an address alone:
That it was allocated by you to begin with?
That you haven't freed it already?
How big is the memory block it points at?
You must store some meta-data somewhere. The approach illustrated by the examples you described, is to store the meta-data right before the raw memory you give to the caller of malloc. That way, to retrieve it all you have to do is a simple bit of pointer arithmetic with the address you were handed in free.
After that, what meta-data to store is up to you. A magic number is one way to document that the following block was allocated by you. If its bit pattern is sufficiently "distinct" then you'll rarely try to free a block you haven't allocated yourself.

Story Teller has most of the .... story! Pun intended.
Two other points. First an often missed requirement of malloc() is to return aligned memory. malloc() isn't told what it's allocating and so is required to return a block with "the maximum alignment".
If your platform has even alignment (2 byte alignment) where things like int can't start on odd addresses (or just aren't efficient) then +1 might be rounding up. Though it doesn't make much sense to ever return an odd-length block in those circumstances.
Secondly, another smart debug feature is to put some familiar pattern at the end of blocks to check for buffer end overwrites (e.g. out by 1 errors).
I personally think 0xDEADC0D3 is a good block of 4 bytes but that's my sense of humour.

Related

How does free() function know how much bytes to deallocate and how to access that information with in our program? [duplicate]

In C programming, you can pass any kind of pointer you like as an argument to free, how does it know the size of the allocated memory to free? Whenever I pass a pointer to some function, I have to also pass the size (ie an array of 10 elements needs to receive 10 as a parameter to know the size of the array), but I do not have to pass the size to the free function. Why not, and can I use this same technique in my own functions to save me from needing to cart around the extra variable of the array's length?
When you call malloc(), you specify the amount of memory to allocate. The amount of memory actually used is slightly more than this, and includes extra information that records (at least) how big the block is. You can't (reliably) access that other information - and nor should you :-).
When you call free(), it simply looks at the extra information to find out how big the block is.
Most implementations of C memory allocation functions will store accounting information for each block, either in-line or separately.
One typical way (in-line) is to actually allocate both a header and the memory you asked for, padded out to some minimum size. So for example, if you asked for 20 bytes, the system may allocate a 48-byte block:
16-byte header containing size, special marker, checksum, pointers to next/previous block and so on.
32 bytes data area (your 20 bytes padded out to a multiple of 16).
The address then given to you is the address of the data area. Then, when you free the block, free will simply take the address you give it and, assuming you haven't stuffed up that address or the memory around it, check the accounting information immediately before it. Graphically, that would be along the lines of:
____ The allocated block ____
/ \
+--------+--------------------+
| Header | Your data area ... |
+--------+--------------------+
^
|
+-- The address you are given
Keep in mind the size of the header and the padding are totally implementation defined (actually, the entire thing is implementation-defined (a) but the in-line accounting option is a common one).
The checksums and special markers that exist in the accounting information are often the cause of errors like "Memory arena corrupted" or "Double free" if you overwrite them or free them twice.
The padding (to make allocation more efficient) is why you can sometimes write a little bit beyond the end of your requested space without causing problems (still, don't do that, it's undefined behaviour and, just because it works sometimes, doesn't mean it's okay to do it).
(a) I've written implementations of malloc in embedded systems where you got 128 bytes no matter what you asked for (that was the size of the largest structure in the system), assuming you asked for 128 bytes or less (requests for more would be met with a NULL return value). A very simple bit-mask (i.e., not in-line) was used to decide whether a 128-byte chunk was allocated or not.
Others I've developed had different pools for 16-byte chunks, 64-bytes chunks, 256-byte chunks and 1K chunks, again using a bit-mask to decide what blocks were used or available.
Both these options managed to reduce the overhead of the accounting information and to increase the speed of malloc and free (no need to coalesce adjacent blocks when freeing), particularly important in the environment we were working in.
From the comp.lang.c FAQ list: How does free know how many bytes to free?
The malloc/free implementation remembers the size of each block as it is allocated, so it is not necessary to remind it of the size when freeing. (Typically, the size is stored adjacent to the allocated block, which is why things usually break badly if the bounds of the allocated block are even slightly overstepped)
This answer is relocated from How does free() know how much memory to deallocate? where I was abrubtly prevented from answering by an apparent duplicate question. This answer then should be relevant to this duplicate:
For the case of malloc, the heap allocator stores a mapping of the original returned pointer, to relevant details needed for freeing the memory later. This typically involves storing the size of the memory region in whatever form relevant to the allocator in use, for example raw size, or a node in a binary tree used to track allocations, or a count of memory "units" in use.
free will not fail if you "rename" the pointer, or duplicate it in any way. It is not however reference counted, and only the first free will be correct. Additional frees are "double free" errors.
Attempting to free any pointer with a value different to those returned by previous mallocs, and as yet unfreed is an error. It is not possible to partially free memory regions returned from malloc.
On a related note GLib library has memory allocation functions which do not save implicit size - and then you just pass the size parameter to free. This can eliminate part of the overhead.
The heap manager stored the amount of memory belonging to the allocated block somewhere when you called malloc.
I never implemented one myself, but I guess the memory right in front of the allocated block might contain the meta information.
The original technique was to allocate a slightly larger block and store the size at the beginning, then give the application the rest of the blog. The extra space holds a size and possibly links to thread the free blocks together for reuse.
There are certain issues with those tricks, however, such as poor cache and memory management behavior. Using memory right in the block tends to page things in unnecessarily and it also creates dirty pages which complicate sharing and copy-on-write.
So a more advanced technique is to keep a separate directory. Exotic approaches have also been developed where areas of memory use the same power-of-two sizes.
In general, the answer is: a separate data structure is allocated to keep state.
malloc() and free() are system/compiler dependent so it's hard to give a specific answer.
More information on this other question.
To answer the second half of your question: yes, you can, and a fairly common pattern in C is the following:
typedef struct {
size_t numElements
int elements[1]; /* but enough space malloced for numElements at runtime */
} IntArray_t;
#define SIZE 10
IntArray_t* myArray = malloc(sizeof(intArray_t) + SIZE * sizeof(int));
myArray->numElements = SIZE;
to answer the second question, yes you could (kind of) use the same technique as malloc()
by simply assigning the first cell inside every array to the size of the array.
that lets you send the array without sending an additional size argument.
When we call malloc it's simply consume more byte from it's requirement. This more byte consumption contain information like check sum,size and other additional information.
When we call free at that time it directly go to that additional information where it's find the address and also find how much block will be free.

c malloc functionality for custom memory region

Is there any malloc/realloc/free like implementation where i can specify a memory region where to manage the memory allocation?
I mean regular malloc (etc.) functions manages only the heap memory region.
What if I need to allocate some space in a shared memory segment or in a memory mapped file?
Not 100 %, As per your question you want to maintain your own memory region. so you need to go for your own my_malloc, my_realloc and my_free
Implementing your own my_malloc may help you
void* my_malloc(int size)
{
char* ptr = malloc(size+sizeof(int));
memcpy(ptr, &size, sizeof(int));
return ptr+sizeof(int);
}
This is just a small idea, full implementation will take you to the
answer.
Refer this question
use the same method to achieve my_realloc and my_free
I asked myself this question recently too, because I wanted a malloc implementation for my security programs which could safely wipe out a static memory region just before exit (which contains sensitive data like encryption keys, passwords and other such data).
First, I found this. I thought it could be very good for my purpose, but I really could not understand it's code completely. The license status was also unclear, as it is very important for one of my projects too.
I ended up writing my own.
My own implementation supports multiple heaps at same time, operating over them with pool descriptor structure, automatic memory zeroing of freed blocks, undefined behavior and OOM handlers, getting exact usable size of allocated objects and testing that pointer is still allocated, which is very sufficient for me. It's not very fast and it is more educational grade rather than professional one, but I wanted one in a hurry.
Note that it does not (yet) knows about alignment requirements, but at least it returns an address suitable for storing an 32 bit integer.
Iam using Tasking and I can store data in a specific space of memory. For example I can use:
testVar _at(0x200000);
I'm not sure if this is what you are looking for, but for example I'am using it to store data to external RAM. But as far as I know, it's only workin for global variables.
It is not very hard to implement your own my_alloc and my_free and use preferred memory range. It is simple chain of: block size, flag free/in use, and block data plus final-block marker (e.g. block size = 0). In the beginning you have one large free block and know its address. Note that my_alloc returns the address of block data and block size/flag are few bytes before.

Dynamic Memory Allocation

How malloc() stores metadata?
void* p;
void* q;
p = malloc(sizeof(char));
q = malloc(sizeof(int));
I know that the return value p[0] points to the start of allocated block of memory,
than if I iterate and print
p[-1], p[-2].... q[-1], q[-2]....
or p[1], p[2], p[3], p[4]....
or q[1], q[2], q[3], q[4]....
I find some value that help malloc() to store data howether I can t understand
precisely what that metadata means..I only know that some of them are for block size, for the adress of the next free block but i can t find on the web nothing more
Please, Can you give me some detailed explanation of those value?
How this metadata works and is used depends entirely on the memory management in your libc. Here are some useful writeups to get you started:
Malloc - Typically, classic malloc.
DLMalloc - Doug Lee's Malloc.
GC Malloc - Malloc with garbage collection.
TC Malloc - Thread caching malloc.
Each of these has different aims, benefits and possible deficiencies. For example, perhaps you are concerned about possible heap overflow issues and protections. This may lead to one choice. Perhaps you are looking for better fragment management. This might lead to the selection of Doug Lee's malloc. You really need to specify which library you are using or research them all to understand how the metadata is used to maintain bins, coalesce adjustment free regions, etc.
David Hoelzer has an excellent answer, but I wanted to follow it up a touch more:
"Meta Data" for malloc is 100% implementation driven. Here is a quick overview of some implementations I have written or used:
The meta data stores "next" block pointers.
It stores canary values to make sure you haven't written passed the end of a block
There is no meta data at all because all the blocks are the same size and they are placed in a stack to allow O(1) access - A Unit Allocator. In this case, you'd hit memory from an adjacent block.
The data stores next block ptr, prev block ptr, alignment id, block size, and an identifier that tells a memory debugger exactly where the allocation occured. Very useful for leak tracking.
It hits the previous block's info because the data is stored at the tail of the block instead of the head...
Or mix and match all the above. In any case, reading mem[-1] is a bad idea, and in some cases (thinking embedded systems), could indeed cause a segment fault on only a read if said read happen to address beyond the current memory page and into a forbidden area of memory.
Update as per OP
The 4th scheme I described is one that has quite a bit of information per block - 16-bytes of information not being uncommon as that size won't throw off common alignment schemes. It would contain a 4-byte pointer to the next allocated block, a 4-byte pointer to the previous, an identifier for alignment - a single byte can handle this directly for common sizes of 0-256 byte alignement, but that byte could also represent 256 possible alignment enum values instead, 3 bytes of pad or canary values, and a 4-byte unique identifier to identify where in the code the call was made, though it could be made smaller. This can be a lookup value to a debug table that contains __file__, __line__, and whatever other info you wish to save with the alloction.
This would be one of the heavier varieties as it has a lot of information that needs to be updated when values are allocated and freed.
It is illegal to access p[-1], etc in your example. The results will be undefined and maybe cause memory corruption, or segmentation fault. In general you don't get any control over malloc() at all or information about what it is doing.
That said, some malloc() "replacement" libraries will give you finer grained control or information - you link these into your binary "on top" of the system malloc().
The C standard does not specify how malloc stores its metadata (or even if it will have accessible metadata); therefore, the metadata format and location are implementation-dependent. Portable code must therefore never attempt to access or parse the metadata.
Some malloc implementations provide, as an extension, functions like malloc_usable_size or malloc_size which can tell you the size of allocated blocks. While the presence of these functions is also implementation dependent, they are at least reliable and correct ways to get the information you need if they are present.

Is the bookkeeping of allocated memory blocks redundant?

When we use malloc() we provide a size in byte.
When we use free() we provide nothing.
This is because the OS of course knows about it already, it must have stored the information somewhere.
By the way, also our software must remember how many memory blocks it has requested, so that we can (for instance) safely iterates starting from the pointer and going ahead.
So, my question is: isn't this redundant? Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it? And if not, why not?
When we use malloc() we provide a size in byte. When we use free() we
provide nothing. This is because the OS of course knows about it
already, it must have stored the information somewhere.
Even though it gives you memory and it keeps track of what memory range belongs to your process, the OS doesn't concern itself with the internal details of your memory. malloc stores the size of the allocated chunk in its own place, also reserved inside your process (usually, it's a few bytes before the logical address returned by malloc). free simply reads that reserved information and deallocates automatically.
By the way, also our software must remember how many memory blocks it
has requested, so that we can (for instance) safely iterates starting
from the pointer and going ahead.
So, my question is: isn't this redundant? Can't we simply ask the OS
the size of the memory pointed by a given pointer since it knows it?
And if not, why not?
Given the above, it is redundant to store that information, yes. But you pretty much have to store it, because the way malloc does its book-keeping is an implementation detail.
If you know how your particular implementation works and you want to take that risk for your software, you are free (no pun intended) to do it. If you don't want to base your logic on an implementation detail (and you'd be right not to want to), you'll have to do this redundant book-keeping side-by-side with malloc's own book-keeping.
No, it's not redundant. malloc() manages, in cooperation with free() and a few other functions, a zillion tiny, individually addressed blocks within relatively large blocks which are generally obtained with sbrk(). The OS only knows about the large range(s), and has no clue which tiny block within it are in use or not. To add to the differences, sbrk() only lets you move the end of your data segment, not split it into parts to free independently. Though one could allocated memory using sbrk exclusively, you would be unable to free arbitrary chunks for reuse, or coalesce smaller chunks into larger ones, or split chunks without writing a bunch of bookkeeping code for this purpose - which ends up essentially being the same as writing malloc. Additionally, using malloc/free/... allows you to call sbrk only rarely, which is a performance bonus since sbrk is a system call with special overhead.
When we use free() we provide nothing.
Not quite true; we provide the pointer that was returned by malloc.
Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it?
Nope. Pointers are simply addresses; apart from their type, they carry no information about the size of the object they point to. How malloc/calloc/realloc and free keep track of object sizes and allocated vs. free blocks is up to the individual implementation; they may reserve some space immediately before the allocated memory to store the size, they may build an internal map of addresses and sizes, or they may do something else completely.
It would be nice if you could query a pointer for the size of the object it points to; unfortunately, that's simply not a feature of the language.

How does free know how much to free?

In C programming, you can pass any kind of pointer you like as an argument to free, how does it know the size of the allocated memory to free? Whenever I pass a pointer to some function, I have to also pass the size (ie an array of 10 elements needs to receive 10 as a parameter to know the size of the array), but I do not have to pass the size to the free function. Why not, and can I use this same technique in my own functions to save me from needing to cart around the extra variable of the array's length?
When you call malloc(), you specify the amount of memory to allocate. The amount of memory actually used is slightly more than this, and includes extra information that records (at least) how big the block is. You can't (reliably) access that other information - and nor should you :-).
When you call free(), it simply looks at the extra information to find out how big the block is.
Most implementations of C memory allocation functions will store accounting information for each block, either in-line or separately.
One typical way (in-line) is to actually allocate both a header and the memory you asked for, padded out to some minimum size. So for example, if you asked for 20 bytes, the system may allocate a 48-byte block:
16-byte header containing size, special marker, checksum, pointers to next/previous block and so on.
32 bytes data area (your 20 bytes padded out to a multiple of 16).
The address then given to you is the address of the data area. Then, when you free the block, free will simply take the address you give it and, assuming you haven't stuffed up that address or the memory around it, check the accounting information immediately before it. Graphically, that would be along the lines of:
____ The allocated block ____
/ \
+--------+--------------------+
| Header | Your data area ... |
+--------+--------------------+
^
|
+-- The address you are given
Keep in mind the size of the header and the padding are totally implementation defined (actually, the entire thing is implementation-defined (a) but the in-line accounting option is a common one).
The checksums and special markers that exist in the accounting information are often the cause of errors like "Memory arena corrupted" or "Double free" if you overwrite them or free them twice.
The padding (to make allocation more efficient) is why you can sometimes write a little bit beyond the end of your requested space without causing problems (still, don't do that, it's undefined behaviour and, just because it works sometimes, doesn't mean it's okay to do it).
(a) I've written implementations of malloc in embedded systems where you got 128 bytes no matter what you asked for (that was the size of the largest structure in the system), assuming you asked for 128 bytes or less (requests for more would be met with a NULL return value). A very simple bit-mask (i.e., not in-line) was used to decide whether a 128-byte chunk was allocated or not.
Others I've developed had different pools for 16-byte chunks, 64-bytes chunks, 256-byte chunks and 1K chunks, again using a bit-mask to decide what blocks were used or available.
Both these options managed to reduce the overhead of the accounting information and to increase the speed of malloc and free (no need to coalesce adjacent blocks when freeing), particularly important in the environment we were working in.
From the comp.lang.c FAQ list: How does free know how many bytes to free?
The malloc/free implementation remembers the size of each block as it is allocated, so it is not necessary to remind it of the size when freeing. (Typically, the size is stored adjacent to the allocated block, which is why things usually break badly if the bounds of the allocated block are even slightly overstepped)
This answer is relocated from How does free() know how much memory to deallocate? where I was abrubtly prevented from answering by an apparent duplicate question. This answer then should be relevant to this duplicate:
For the case of malloc, the heap allocator stores a mapping of the original returned pointer, to relevant details needed for freeing the memory later. This typically involves storing the size of the memory region in whatever form relevant to the allocator in use, for example raw size, or a node in a binary tree used to track allocations, or a count of memory "units" in use.
free will not fail if you "rename" the pointer, or duplicate it in any way. It is not however reference counted, and only the first free will be correct. Additional frees are "double free" errors.
Attempting to free any pointer with a value different to those returned by previous mallocs, and as yet unfreed is an error. It is not possible to partially free memory regions returned from malloc.
On a related note GLib library has memory allocation functions which do not save implicit size - and then you just pass the size parameter to free. This can eliminate part of the overhead.
The heap manager stored the amount of memory belonging to the allocated block somewhere when you called malloc.
I never implemented one myself, but I guess the memory right in front of the allocated block might contain the meta information.
The original technique was to allocate a slightly larger block and store the size at the beginning, then give the application the rest of the blog. The extra space holds a size and possibly links to thread the free blocks together for reuse.
There are certain issues with those tricks, however, such as poor cache and memory management behavior. Using memory right in the block tends to page things in unnecessarily and it also creates dirty pages which complicate sharing and copy-on-write.
So a more advanced technique is to keep a separate directory. Exotic approaches have also been developed where areas of memory use the same power-of-two sizes.
In general, the answer is: a separate data structure is allocated to keep state.
malloc() and free() are system/compiler dependent so it's hard to give a specific answer.
More information on this other question.
To answer the second half of your question: yes, you can, and a fairly common pattern in C is the following:
typedef struct {
size_t numElements
int elements[1]; /* but enough space malloced for numElements at runtime */
} IntArray_t;
#define SIZE 10
IntArray_t* myArray = malloc(sizeof(intArray_t) + SIZE * sizeof(int));
myArray->numElements = SIZE;
to answer the second question, yes you could (kind of) use the same technique as malloc()
by simply assigning the first cell inside every array to the size of the array.
that lets you send the array without sending an additional size argument.
When we call malloc it's simply consume more byte from it's requirement. This more byte consumption contain information like check sum,size and other additional information.
When we call free at that time it directly go to that additional information where it's find the address and also find how much block will be free.

Resources