malloc memory allocation scheme in C

I was experimenting with malloc in C and I have observed that malloc is wasting some space after some memory has been allocated. Below is the piece of code I used to test malloc:
#include <stdlib.h>
#include <string.h>

int main() {
    char *a;
    char *b;
    a = malloc(2 * sizeof(char));
    b = malloc(2 * sizeof(char));
    memset(a, 9, 2);
    memset(b, 9, 2);
    return 0;
}
In the right-middle of the following picture (open the image in a new tab for clarity) you can see the memory contents; 0x804b008 is the address pointed to by variable 'a' and 0x804b018 is the address pointed to by variable 'b'. What is happening to the memory between 0x804b00a and 0x804b017? The thing is, even if I try to allocate 3*sizeof(char) instead of 2*sizeof(char) bytes of memory, the memory layout is the same! So, is there something I am missing?

malloc() is allowed to waste as much space as it wants to - the standard doesn't specify anything about the implementation. The only guarantee you have is about alignment (§7.20.3 Memory management functions):
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
Your implementation appears to return pointers that are aligned to at least an 8-byte boundary.
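To make that guarantee concrete, here is a minimal sketch (not part of the original question, and it assumes a C11 compiler for _Alignof and max_align_t): whatever tiny size you request, the remainder printed should be 0.
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

int main(void) {
    void *p = malloc(2);   /* a tiny request */
    if (p == NULL)
        return 1;
    /* The remainder should be 0 no matter how small the request was. */
    printf("address %p, misalignment vs max_align_t: %zu\n",
           p, (size_t)((uintptr_t)p % _Alignof(max_align_t)));
    free(p);
    return 0;
}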

Memory Alignment! It's good for performance on x86 and mandatory in some architectures like ARM.
Most CPUs require that objects and variables reside at particular offsets in the system's memory. For example, 32-bit processors require a 4-byte integer to reside at a memory address that is evenly divisible by 4. This requirement is called "memory alignment". Thus, a 4-byte int can be located at memory address 0x2000 or 0x2004, but not at 0x2001. On most Unix systems, an attempt to use misaligned data results in a bus error, which terminates the program altogether. On Intel processors, the use of misaligned data is supported but at a substantial performance penalty. Therefore, most compilers automatically align data variables according to their type and the particular processor being used. This is why the size that structs and classes occupy is often larger than the sum of the sizes of their members.
http://www.devx.com/tips/Tip/13265

The heap is handled by the implementation, not necessarily as you expect. The Standard explicitly doesn't guarantee anything about order or contiguity. There are two main things that cause more heap space to be used than you asked for.
First, allocated memory has to be aligned so that it's suitable for use by any sort of object. Typically, computers expect primitive data objects of N bytes to be allocated at a multiple of N, so the odds are you can't get malloc() to return a value that isn't a multiple of 8.
Second, the heap needs to be managed, so that free() allows reuse of the memory. This means that the heap manager needs to keep track of allocated and unallocated blocks, and their sizes. One practice is to stick some information in memory just before each block, so the manager can know what size block to free and where blocks are that might be reused. If that's what your system does, there will be more memory used between allocated blocks, and given alignment restrictions at 8 bytes it's likely you can't get allocations of less than 16 bytes.
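If you want to observe that spacing on your own system, a sketch like the following can help; note that subtracting addresses from two unrelated allocations is only meaningful as an experiment with this particular allocator, not as portable C, and the distance printed (commonly 16 or 32 bytes) is entirely implementation-dependent.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char *a = malloc(2);
    char *b = malloc(2);
    if (a == NULL || b == NULL)
        return 1;
    /* The distance between the two blocks reflects header overhead plus
       alignment, and will differ between allocators. */
    printf("a = %p\nb = %p\ndistance = %zu bytes\n",
           (void *)a, (void *)b, (size_t)((uintptr_t)b - (uintptr_t)a));
    free(a);
    free(b);
    return 0;
}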

Most modern malloc() implementations allocate in powers of two and have a minimum allocation size, to reduce fragmentation since oddball sizes could generally only be reused when enough contiguous allocations are free()d to make larger blocks. (It also speeds up coalescing contiguous allocations in general, IIRC.) Also keep in mind the block overhead; to get the block size you need to add some amount (8 in GNU malloc(), IIRC) for internal management uses.
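On glibc you can also ask the allocator how much usable space a block actually received; malloc_usable_size() is a glibc extension declared in <malloc.h>, so treat this as a non-portable sketch rather than standard C.
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size() */

int main(void) {
    for (size_t req = 1; req <= 40; req += 7) {
        void *p = malloc(req);
        if (p == NULL)
            return 1;
        /* The usable size is typically rounded up from the request. */
        printf("requested %2zu, usable %2zu\n", req, malloc_usable_size(p));
        free(p);
    }
    return 0;
}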

malloc is only guaranteed to return you a block of memory that's at least as big as the size you give it. However, processors are generally more efficient when they're operating on blocks of memory that start at multiples of, say, 8 bytes in memory. Look up word size for more information on this.

Related

Why does malloc(1) work for storing 4 byte integers?

From what I understand, malloc(x) returns a block of memory x bytes long.
So to store a 4 byte integer, I would do:
int *p = (int *)malloc(4);
*p = 100;
Because sizeof(int) returns 4 for me.
However, if I do:
int *p = (int *)malloc(1);
*p = 100;
It seems to work exactly the same, with no issues storing the value.
Why does the amount of memory requested with malloc() not seem to matter? Shouldn't a 4 byte integer require malloc(4)?
If this works in your case it just works by chance and is not guaranteed to work. It is undefined behavior (compare this SO question) and anything can happen.
What did you expect to happen? Your program to crash?
That might still happen if you call malloc and free a bit more often. malloc often takes a few more bytes than requested and uses the extra space for management (a linked list of all memory blocks, the sizes of the memory blocks). If you write some bytes before or after your allocated block, then chances are high that you mess with these internal management structures and that a subsequent malloc or free will crash.
If malloc internally always allocates a minimum of n bytes, then your program might only crash once you access byte n+1. Also, the operating system normally protects memory only at page granularity. If a page has a size of 512 bytes and your malloc-ed byte is in the middle of a page, then your process might be able to read and write the rest of the page, and will only crash when accessing the next memory page. But remember: even if this works, it is undefined behavior.
malloc, like all memory block allocation functions from the C runtime or OS kernel, is optimized for memory access and object alignment.
Moreover, malloc specifically allocates a hidden control block in front of the allocated space to keep track of the allocation (space required, space allocated, etc.).
malloc must also guarantee that the allocated memory address is suitably aligned for any storage object; this means that the block will start on an 8, 16, 32 or even 64 or 128 byte boundary, depending on the processor and, more generally, on the hardware (i.e. some special MMU). The boundary also depends on access speed: some processors behave differently with different memory access sizes (1, 2, 4, 8, ... bytes) and address boundaries. These constraints drive the malloc code spec and the allocator's partitioning of memory into logical blocks.
On the practical side, let's consider an allocator for an x86 processor: it generally gives back a block aligned on an 8-byte boundary (32-bit code), which is suitable for int, float and even double. To do this, malloc divides the available memory arena into 'blocks' that are the minimal allocation unit. When you allocate even 1 byte the function allocates at least one block. Eventually this block can host an integer, or even a double, but that is implementation dependent, and you can't consider it deterministic, because in future versions of the same function the behavior can change.
Now that, I hope, it is clear why your code seems to work, keep well in mind that this is undefined behavior and you must treat it as such. It can work now and not with the next revision; it can crash on some hardware and not on another processor or machine.
For this we should know how the malloc function works internally. To allocate memory dynamically, each operating system makes use of system calls. We can dynamically allocate memory using these system calls, but these system calls differ from one OS to the other.
So the system calls of one OS might not work on another OS, and moreover, if we used system calls directly to allocate memory dynamically, our program would be platform dependent. To avoid this dependency we use the malloc function. It is then the responsibility of malloc to make the appropriate system calls for the OS to allocate memory dynamically.
So malloc itself invokes system calls, and that would be a very slow process if it had to make a system call every time we asked for dynamic memory. To avoid this, whenever we request dynamic memory it usually allocates extra memory, so that next time the system call can be avoided and the remaining chunk of the previously allocated memory can be used. That's why your program is working: malloc is allocating extra memory.
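A rough way to see that caching in action on Linux is to watch the program break around a small malloc. This is only a sketch under stated assumptions: sbrk() is a POSIX/Linux facility, a given malloc may satisfy the request with mmap instead of moving the break, and the exact numbers will vary.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sbrk() */

int main(void) {
    void *brk_before = sbrk(0);   /* current program break */
    void *p = malloc(16);         /* small request */
    void *brk_after = sbrk(0);
    if (p == NULL)
        return 1;
    /* The break typically jumps by far more than 16 bytes, because malloc
       grabs a large chunk up front and hands out pieces of it later
       without further system calls. */
    printf("break before: %p\nbreak after:  %p\n", brk_before, brk_after);
    free(p);
    return 0;
}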
The C programming language gives you the ability to shoot yourself in the foot.
It intentionally burdens the programmer with the charge that they ought to know what they are doing. Broadly speaking the reasons are to achieve performance, readability, and portability.
The behaviour of your code is undefined. If you ask for 1 byte then expect to get only one usable byte back. The fact that the operating system and C runtime library seems to be giving you back a little more than that is no more than a curious peculiarity.
On other occasions the compiler might simply eat your cat.
Finally, use sizeof in the call to malloc rather than hardcoding the size of the int type: on some systems sizeof(int) is 2, 4 is commonplace, and other values are allowed by the standard. In your case, using either sizeof(int) or sizeof(*p) is possible. Some folk prefer the latter since then you're not hardcoding the type of the variable in the sizeof call, which guards you against a possible change of the variable's type. (Note that sizeof(*p) is evaluated at compile time and uses static type information; hence it can be used before p itself "exists", if you get my meaning.)
It seems to work exactly the same, with no issues storing the value.
You invoke undefined behavior with your code, so you cannot tell that it works. In order to allocate memory for an integer, you should do:
int *p;
p = malloc(sizeof(*p)); /* you can use sizeof(*p) because p is already declared; it gives the size of what p points to, which is the size of an int */
if (p != NULL)
    *p = 100;
Simply, when you are allocating 1 byte for the int, there are 3 bytes following it that are not actually allocated for it, but you can still use them. You are getting lucky that these aren't being changed by something else during your tests, and that it isn't overwriting anything important (or maybe it is). So essentially this will cause an error once those 3 bytes are needed by something else -- always malloc the right size.
Usually malloc is implemented in such a way that it allocates memory whose size is not less than one 'paragraph', which is equal to 16 bytes.
So when you request, for example, 4 bytes of memory, malloc actually allocates 16 bytes. However, this behavior is not described in the C Standard and you may not rely on it. As a result, it means that the program you showed has undefined behavior.
I think this is because of padding: even though you are calling malloc(1), padding bytes come along with the memory.
Please check this link: http://www.delorie.com/gnu/docs/glibc/libc_31.html

Determining size of array allocated via HeapAlloc()

I'm using WinAPI's HeapAlloc() function for allocating memory, and I want to find out the size of it somewhere else in my code. Do I have to keep track of the sizes myself or is there another way?
HeapAlloc rounds up allocations to the nearest alignment. If you ask for 2 bytes, it will give you at least two bytes, but may give you more. As the documentation says:
If the HeapAlloc function succeeds, it allocates at least the amount of memory requested.
The specific alignment that HeapAlloc uses is not documented, but if I remember correctly, all of the Heap Manager APIs use an 8-byte alignment on 32-bit x86 and a 16-byte alignment on 64-bit x86. This old knowledge base article jives with my recollection. Of course, because it is not explicitly documented and subject to change in future versions of Windows and/or on different architectures, you should not rely on hard-coded alignment values.
However, if these functions do allocate more memory than requested, the caller is free to use all of that memory. To determine the actual size of the allocation, you call the HeapSize function. Raymond Chen blogged about this some time ago.
So the behavior you're seeing actually makes sense, even though you are going about making the determination in entirely the wrong way. As has been pointed out in the comments already, sizeof doesn't tell you the size of the allocation. You need HeapSize to do that. All sizeof tells you is the size of the element matches[lastMeal] at compile time, which is also 8 bytes instead of 2 for alignment reasons.
As for your edit: best practice is to track this information yourself. Whenever you pass a pointer, pass the size of the allocation along with it. Note that this should be the expected size of the allocation (the 2 bytes that you asked for), not the actual size of the allocation (the 8 bytes that the Heap Manager returned for internal alignment considerations). When you free the memory by calling HeapFree, it knows how big the actual allocation was and will free it as necessary. There is, however, no way for your client code to determine the size of the allocation that you initially requested, which is why you need to track it yourself.
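A minimal sketch of that pattern might look like the following: request 2 bytes, then ask HeapSize how big the block really is. The exact value printed depends on the Windows version and architecture.
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE heap = GetProcessHeap();
    void *p = HeapAlloc(heap, 0, 2);        /* ask for 2 bytes */
    if (p == NULL)
        return 1;
    SIZE_T actual = HeapSize(heap, 0, p);   /* actual size of the allocation */
    printf("requested 2 bytes, HeapSize reports %llu bytes\n",
           (unsigned long long)actual);
    HeapFree(heap, 0, p);
    return 0;
}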

What happens when blocks are freed from heap by free()?

So I have allocated 256 bytes on the heap:
char* ptr1 = malloc(128);
char* ptr2 = malloc(128);
Now after I free ptr2, which I assume currently lies on top of the heap, the program break (the current end of the heap) does not decrease. However, if I do another malloc, the address returned by malloc is the same as the one that was freed.
So I have the following questions:
When I free a block, why does the program break not decrease?
When I call free, what exactly happens? How does it keep track of the freed memory so that next time I call malloc the address is the same?
This is unspecified behavior. You cannot rely on any single answer unless you only care about one particular platform/OS/compiler/libc combination. You did not specify an OS, and the C standard does not describe or require any particular implementation. From C99 (I don't have the final published version of C11 yet):
7.20.3
The order and contiguity of storage allocated by successive calls to
the calloc, malloc, and realloc functions is unspecified. The pointer
returned if the allocation succeeds is suitably aligned so that it may
be assigned to a pointer to any type of object and then used to access
such an object or an array of such objects in the space allocated
(until the space is explicitly deallocated). The lifetime of an
allocated object extends from the allocation until the deallocation.
Each such allocation shall yield a pointer to an object disjoint from
any other object. The pointer returned points to the start (lowest
byte address) of the allocated space. If the space cannot be
allocated, a null pointer is returned. If the size of the space
requested is zero, the behavior is implementation-defined: either a
null pointer is returned, or the behavior is as if the size were some
nonzero value, except that the returned pointer shall not be used to
access an object.
This manual of GNU libc might be of help.
Here's the gist
Occasionally, free can actually return memory to the operating system
and make the process smaller. Usually, all it can do is allow a later
call to malloc to reuse the space. In the meantime, the space remains
in your program as part of a free-list used internally by malloc.
When I free a block why does not the program break decrease?
I believe it doesn't decrease because that memory has already been given to the program.
When I call free() what exactly happens?
That section of memory is marked as allocatable, and its previous contents can be overwritten.
Consider this example...
[allocatedStatus][sizeOfAllocation][allocatedMemory]
                                   ^-- Returned pointer
Considering this, free() can then set [allocatedStatus] to false, so future allocations on the heap can reuse that memory.
How does it keep track of the free()d memory so that next time I declare
malloc() the address is the same?
I don't think it does. It just scanned for some free memory and found that previous block that had been marked as free.
Here is a rough idea of how memory allocators work:
You have an allocator that has a bunch of "bins" ("free lists"), which are just linked lists of free memory blocks. Each bin has a different block size associated with it (i.e. you can have a list for 8-byte blocks, 16-byte blocks, 32-byte blocks, etc., or even arbitrary sizes like 7- or 10-byte blocks). When your program requests memory (usually through malloc()), the allocator goes to the smallest bin that would fit your data and checks whether there are any free memory blocks in it. If not, it requests some memory from the OS (usually called a page) and cuts the block it gets back into a bunch of smaller blocks to fill the bin with. Then it returns one of these free blocks to your program.
When you call free, the allocator takes that memory address and puts it back into the bin (aka free list) it came from and everybody is happy. :)
The memory is still there to use so you don't have to keep paging memory, but with respect to your program it is free.
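As a toy illustration of the bin idea, here is a deliberately simplified sketch; it is not how any real allocator is implemented, and names like bin_alloc, BLOCK_SIZE and refill_bin are made up. A single bin is just a linked list of fixed-size free blocks that gets refilled from a larger chunk when it runs dry.
#include <stdlib.h>

#define BLOCK_SIZE       32    /* every block in this bin is 32 bytes */
#define BLOCKS_PER_PAGE 128    /* how many blocks we carve from one big chunk */

struct free_node { struct free_node *next; };

static struct free_node *bin_head;   /* the free list ("bin") */

static void refill_bin(void) {
    /* Stand-in for asking the OS for a page; here we just grab one big chunk. */
    char *page = malloc((size_t)BLOCK_SIZE * BLOCKS_PER_PAGE);
    if (page == NULL)
        return;
    for (int i = 0; i < BLOCKS_PER_PAGE; i++) {
        struct free_node *n = (struct free_node *)(page + i * BLOCK_SIZE);
        n->next = bin_head;           /* push each carved block onto the list */
        bin_head = n;
    }
}

static void *bin_alloc(void) {
    if (bin_head == NULL)
        refill_bin();
    if (bin_head == NULL)
        return NULL;
    struct free_node *n = bin_head;   /* pop the first free block */
    bin_head = n->next;
    return n;
}

static void bin_free(void *p) {
    struct free_node *n = p;          /* push the block straight back */
    n->next = bin_head;
    bin_head = n;
}

int main(void) {
    void *a = bin_alloc();
    if (a == NULL)
        return 1;
    bin_free(a);
    void *b = bin_alloc();            /* typically hands back the same block */
    (void)b;
    return 0;
}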
I believe it's entirely up to the operating system once you call free(); it may choose to immediately reclaim that memory, or not care and just mark that memory segment as a possible acquisition for a later time (likely the same thing). To my knowledge that memory (if significant) shows up as available in the task manager right after free() on Windows.
Keep in mind that the memory we are talking about here is virtual. So that means the operating system can tell you anything it wants and is likely not an accurate representation of the physical state of the machine.
Think about how you would manage memory allocation if you were writing an OS, you likely wouldn't want to do anything hasty that may waste resources. We are talking about 128 bytes here, would you want to waste valuable processing time handling it alone? It may be the reason for that behavior or not, at least plausible.
Do it in a loop and then free() in another loop or just allocate big chunks of memory, see what happens, experiment.

How to find how much memory is actually used up by a malloc call?

If I call:
char *myChar = (char *)malloc(sizeof(char));
I am likely to be using more than 1 byte of memory, because malloc is likely to be using some memory on its own to keep track of free blocks in the heap, and it may effectively cost me some memory by always aligning allocations along certain boundaries.
My question is: Is there a way to find out how much memory is really used up by a particular malloc call, including the effective cost of alignment, and the overhead used by malloc/free?
Just to be clear, I am not asking to find out how much memory a pointer points to after a call to malloc. Rather, I am debugging a program that uses a great deal of memory, and I want to be aware of which parts of the code are allocating how much memory. I'd like to be able to have internal memory accounting that very closely matches the numbers reported by top. Ideally, I'd like to be able to do this programmatically on a per-malloc-call basis, as opposed to getting a summary at a checkpoint.
There isn't a portable solution to this, however there may be operating-system specific solutions for the environments you're interested in.
For example, with glibc on Linux, you can use the mallinfo() function from <malloc.h> which returns a struct mallinfo. The uordblks and hblkhd members of this structure contains the dynamically allocated address space used by the program including book-keeping overhead - if you take the difference of this before and after each malloc() call, you will know the amount of space used by that call. (The overhead is not necessarily constant for every call to malloc()).
Using your example:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific header for mallinfo() */

int main(void) {
    char *myChar;
    size_t s = sizeof(char);
    struct mallinfo before, after;
    int mused;
    before = mallinfo();
    myChar = malloc(s);
    after = mallinfo();
    mused = (after.uordblks - before.uordblks) + (after.hblkhd - before.hblkhd);
    printf("Requested size %zu, used space %d, overhead %d\n", s, mused, mused - (int)s);
    free(myChar);
    return 0;
}
Really though, the overhead is likely to be pretty minor unless you are making a very very high number of very small allocations, which is a bad idea anyway.
It really depends on the implementation. You should really use some memory debugger. On Linux Valgrind's Massif tool can be useful. There are memory debugging libraries like dmalloc, ...
That said, typical overhead:
1 int for storing size + flags of this block.
possibly 1 int for storing the size of the previous/next block, to assist in coalescing blocks.
2 pointers, but these may only be used in free()'d blocks, being reused for application storage in allocated blocks.
Alignment to an appropriate type, e.g. double.
-1 int (yes, that's a minus) of the next/previous chunk's field containing our size if we are an allocated block, since we cannot be coalesced until we're freed.
So, a minimum size can be 16 to 24 bytes, and the minimum overhead can be 4 bytes.
But you could also satisfy every allocation by mapping memory pages (typically 4 KB), which would mean the overhead for smaller allocations would be huge. I think OpenBSD does this.
There is nothing defined in the C library to query the total amount of physical memory used by a malloc() call. The amount of memory allocated is controlled by whatever memory manager is hooked up behind the scenes that malloc() calls into. That memory manager can allocate as much extra memory as it deems necessary for its internal tracking purposes, on top of whatever extra memory the OS itself requires. When you call free(), it accesses the memory manager, which knows how to access that extra memory so it all gets released properly, but there is no way for you to know how much memory that involves. If you need that much fine detail, then you need to write your own memory manager.
If you do use valgrind/Massif, there's an option to show either the malloc value or the top value, which differ a LOT in my experience. Here's an excerpt from the Valgrind manual http://valgrind.org/docs/manual/ms-manual.html:
...However, if you wish to measure all the memory used by your program,
you can use the --pages-as-heap=yes option. When this option is enabled,
Massif's normal heap block profiling is replaced by lower-level page
profiling. Every page allocated via mmap and similar system calls is
treated as a distinct block. This means that code, data and BSS
segments are all measured, as they are just memory pages. Even the
stack is measured...

How does free() function collect the info about the no. of bytes to be freed [duplicate]

Possible Duplicate:
C programming : How does free know how much to free?
free() is called to deallocate the memory allocated by a malloc() call. Where does free() find the information about the number of bytes allocated by malloc()? I.e., how is the number of bytes that malloc() allocated confirmed, and where is this information stored?
-Surya
Most implementations of C memory allocation functions will store accounting information for each block, either inline or separately.
One typical way (inline) is to actually allocate both a header and the memory you asked for, padded out to some minimum size. So for example, if you asked for 20 bytes, the system may allocate a 48-byte block:
16-byte header containing size, special marker, checksum, pointers to next/previous block and so on.
32 bytes data area (your 20 bytes padded out to a multiple of 16).
The address then given to you is the address of the data area. Then, when you free the block, free will simply take the address you give it and, assuming you haven't stuffed up that address or the memory around it, check the accounting information immediately before it.
Keep in mind the size of the header and the padding are totally implementation defined (actually, the entire thing is implementation-defined [a], but the inline-accounting-info option is a common one).
The checksums and special markers that exist in the accounting information are often the cause of errors like "Memory arena corrupted" if you overwrite them. The padding (to make allocation more efficient) is why you can sometimes write a little bit beyond the end of your requested space without causing problems (still, don't do that, it's undefined behaviour and, just because it works sometimes, doesn't mean it's okay to do it).
[a] I've written implementations of malloc in embedded systems where you got 128 bytes no matter what you asked for (that was the size of the largest structure in the system), and a simple non-inline bit-mask was used to decide whether a 128-byte chunk was allocated or not.
Others I've developed had different pools for 16-byte chunks, 64-byte chunks, 256-byte chunks and 1K chunks, again using a bitmask to reduce the overhead of the accounting information and to increase the speed of malloc and free (no need to coalesce adjacent free blocks), which was particularly important in the environment we were working in.
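For what it's worth, a sketch of that bitmask-per-pool approach (purely illustrative, with made-up names and sizes) could look like this:
#include <stdint.h>
#include <stddef.h>

#define CHUNK_SIZE  128   /* every allocation gets one 128-byte chunk */
#define CHUNK_COUNT  64   /* one bit per chunk fits in a 64-bit mask */

static unsigned char pool[CHUNK_COUNT][CHUNK_SIZE];
static uint64_t used_mask;                       /* bit i set => chunk i in use */

static void *pool_alloc(void) {
    for (int i = 0; i < CHUNK_COUNT; i++) {
        if (!(used_mask & ((uint64_t)1 << i))) {
            used_mask |= (uint64_t)1 << i;       /* mark chunk i allocated */
            return pool[i];
        }
    }
    return NULL;                                 /* pool exhausted */
}

static void pool_free(void *p) {
    size_t i = (size_t)((unsigned char *)p - &pool[0][0]) / CHUNK_SIZE;
    used_mask &= ~((uint64_t)1 << i);            /* mark chunk i free again */
}

int main(void) {
    void *a = pool_alloc();   /* whatever was asked for, one 128-byte chunk is used */
    pool_free(a);
    return 0;
}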
This is implementation dependent. The heap stores that data in some manner that facilitates accessing it given a pointer returned by malloc() - for example, the block could store the number of bytes at the beginning, and malloc() would return a pointer offset past that count.
When you allocate a block of memory, more bytes than you requested are allocated. How many depends on the implementation but here is an example:
struct MallocHeader {
    struct MallocHeader *prev, *next;
    size_t length;
    /* ... more data, padding, etc. ... */
    char data[0];   /* the user's bytes start here (GNU-style flexible array) */
};
When malloc() allocates the memory from the free list, it will allocate size + sizeof(struct MallocHeader) and return the address of data. In free(), the offset of data within struct MallocHeader is subtracted from the pointer you pass in, and then it knows the size.
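Here is a hypothetical sketch of that mechanism; my_malloc and my_free are made-up wrapper names, the header is reduced to just a length field, and a real allocator would also pad the header so the returned pointer stays fully aligned.
#include <stdio.h>
#include <stdlib.h>

struct SizeHeader {
    size_t length;                    /* the size the caller asked for */
};

static void *my_malloc(size_t size) {
    struct SizeHeader *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->length = size;
    return h + 1;                     /* hand back the bytes after the header */
}

static void my_free(void *p) {
    if (p == NULL)
        return;
    struct SizeHeader *h = (struct SizeHeader *)p - 1;   /* step back to the header */
    printf("freeing a block of %zu bytes\n", h->length);
    free(h);
}

int main(void) {
    char *p = my_malloc(20);
    my_free(p);                       /* prints: freeing a block of 20 bytes */
    return 0;
}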
This is implementation dependent - it depends on the libc implementation and also on the operating system implementation (more on the operating system implementation).
I haven't got a need to know such things but if you really want to you can create your own memory allocator.
By accident I found out that in C++, when allocating with the new[] operator, the number of elements is stored at the beginning of the allocated zone, and the user is returned the zone just after that count (on Visual Studio).
new[NUMBER] ---> [NUMBER (4 bytes)]+[allocated area]
It returns the pointer to the allocated area, and probably, when the delete[] operator is called, it looks 4 bytes before the [allocated area] to see how many elements have to be deleted.
