Does malloc allocate fragmented chunks? - heap-memory

I found out there are kernel drivers for contiguous memory allocation. I thought malloc coalesced memory and returned a best fit, and that if no suitable memory was available it would return 0. If malloc only handed out contiguous memory, what is the need for a contiguous memory allocator like PMEM?
My questions are as follows:
Is it because the virtual memory isn't fragmented but the physical pages are fragmented?
// Assume we have 20 bytes of heap (excluding malloc header information).
p1 = malloc(4)
p2 = malloc(3)
p3 = malloc(3)
p4 = malloc(10) // total 20 bytes allocated
free(p2)
free(p4) // 10 + 3 = 13 bytes freed, but the two chunks are not adjacent
malloc(13) // Would this fail because there is no large enough chunk, or does it fragment?
// If it does allocate, where (in the malloc header or the payload) is the next-chunk information stored?
Thank you.

glibc's malloc rounds up allocation sizes due to ABI and implementation constraints. On 64-bit architectures, all the allocations in your example would end up using the same internal chunk size.
To your question about fragmentation: The original dlmalloc code (on which glibc malloc is ultimately based) did in fact coalesce free blocks in the majority of cases, except in some corner cases involving larger applications. And that was not an algorithmic limitation as such, rather the result of not wanting to implement some form of balanced tree data structure. (Current dlmalloc does not have this particular limitation.)
However, over the years, glibc layered various allocators on top of this low-level coalescing allocator. Nowadays, there are atomic fastbin and tcache allocations. From the perspective of the lower-level allocator, these are still in use, so they cannot be coalesced with neighboring allocations.
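As a rough illustration of the rounding, here is a minimal sketch (assuming glibc on 64-bit Linux, where malloc_usable_size() is declared in <malloc.h>); it requests the sizes from the example above and prints how much usable space each allocation actually gets:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size() */

int main(void) {
    size_t requests[] = {4, 3, 3, 10};
    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++) {
        void *p = malloc(requests[i]);
        /* On 64-bit glibc these typically all report the same usable
           size (the minimum chunk size), illustrating the rounding
           described above. */
        printf("requested %2zu -> usable %zu\n",
               requests[i], malloc_usable_size(p));
        free(p);
    }
    return 0;
}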

Advantages of mmap() over sbrk()?

From my book:
Recall from our first discussion that modern dynamic memory managers
not only use sbrk() but also mmap(). This process helps reduce the
negative effects of memory fragmentation when large blocks of memory
are freed but locked by smaller, more recently allocated blocks lying
between them and the end of the allocated space. In this case, had the
block been allocated with sbrk(), it would have probably remained
unused by the system for some time (or at least most of it).
Can someone kindly explain how using mmap reduces the negative effects of memory fragmentation? The given example didn't make any sense to me and wasn't clear at all.
it would have probably remained unused by the system for some time
Why was this claim made? When we free the block, the system can reuse it later. Maybe the OS keeps a list of freed blocks in the heap so it can reuse them when possible instead of using more heap space.
Please relate to both questions.
Advantages of mmap() over sbrk()?
1. brk/sbrk is LIFO. Let's say you increase the segment size by X bytes to make room for allocation A, then by another X bytes for allocation B, and then free A. You cannot shrink the allocated memory, because B is still allocated. And since the segment is shared across the entire program, if multiple parts of the program use it directly, you have no way of knowing whether a particular part is still in use or not. And if one part of the program (let's say malloc) assumes entire control over the use of brk/sbrk, then calling them elsewhere will break the program. By contrast, mmap'ed memory can be unmapped in any order, and an allocation by one part of the program doesn't conflict with other parts of the program (see the sketch after this list).
2. brk/sbrk are not part of the POSIX standard and thus not portable. By contrast, mmap is standard and portable.
3. mmap can also do things like map files into memory, which is not possible using brk/sbrk.
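As a sketch of point 1 (assuming a POSIX system; MAP_ANONYMOUS is a widespread extension rather than strict POSIX, and error handling is omitted), two independent mappings can be released in any order:
#include <stddef.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1 << 20;   /* 1 MiB each */

    /* Two independent anonymous mappings, standing in for allocations A and B. */
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Unlike sbrk-grown memory, A can be returned to the system even
       though B was mapped after it and is still in use. */
    munmap(a, len);

    /* ... keep using b ... */
    munmap(b, len);
    return 0;
}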
it would have probably remained unused by the system for some time
Why was this claim made?
See 1.
Maybe the OS keeps a list of freed blocks
There are no "blocks". There is one (virtual) block called the data segment. brk/sbrk sets the size of that block.
But doesn't mmap allocate on the heap?
No. "Heap" is at the end of the data segment and heap is what grows using brk/sbrk. mmap does not allocate in the area of memory that has been allocated using brk/sbrk.
mmap creates a new segment elsewhere in the address space.
does malloc actually save the free blocks that were allocated with sbrk for later usage?
If it is allocated using brk/sbrk in the first place, and if malloc hasn't reduced the size of the "heap" (in case that was possible), then malloc may reuse a free "slot" that has been previously freed. It would be a useful thing to do.
"then calling them elsewhere will break the program." can you give an example
malloc(42);
sbrk(42);
malloc(42); // maybe kaboom, who knows?
In conclusion: Just don't use brk/sbrk to set the segment size. Perhaps there's little reason to use (anonymous) mmap either. Use malloc in C.
When sbrk() is used, the heap is just one, large block of memory. If your pattern of allocating and freeing doesn't leave large, contiguous blocks of memory, every large allocation will need to grow the heap. This can result in inefficient memory use, because of all the unused gaps that are left in the heap.
With mmap(), you can have a bunch of independent blocks of mapped memory. So you could use the sbrk() heap for your small allocations, which can be packed neatly, and use mmap() for large allocations. When you're done with one of these large blocks, you can just remove the entire mapping.

How does malloc obtain memory from the heap?

We know that malloc calls mmap internally. But mmap doesn't necessarily map into the heap, since mmap can map objects anywhere in virtual memory; so what does malloc do internally to make sure that the requested memory comes from the heap?
When malloc uses mmap to allocate memory, it doesn't care where the memory comes from — it delegates the allocation to mmap, and relies on that to provide a usable block of memory.
In the GNU C library (and probably in other implementations too), such allocations are tracked separately from the allocations managed using sbrk. All operations involving mmaped allocations are also delegated (reallocation and freeing).
From the kernel's perspective, such allocations are off-heap, i.e. after the program break. From the programmer's perspective, they're all the same; the main practical consequence compared to sbrk-only allocations is that you can't assume that allocated blocks are within the program break, or that the address space between two allocated blocks is accessible, but you shouldn't do that anyway.
See also the POSIX specification for malloc — it doesn’t say anything about the heap.
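A small, non-portable sketch of that distinction (assuming glibc on Linux, where sbrk() is still available): compare the current program break with the addresses returned for a small and a large request.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>   /* sbrk() -- non-standard, but available on Linux */

int main(void) {
    void *small = malloc(16);            /* typically served from the brk heap */
    void *large = malloc(1024 * 1024);   /* typically large enough to be served by mmap */
    uintptr_t brk_end = (uintptr_t)sbrk(0);   /* current program break */

    printf("program break: %p\n", (void *)brk_end);
    printf("small block at %p (%s the break)\n", small,
           (uintptr_t)small < brk_end ? "below" : "not below");
    printf("large block at %p (%s the break)\n", large,
           (uintptr_t)large < brk_end ? "below" : "not below");

    free(large);
    free(small);
    return 0;
}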

Why malloc(1) gives more than one page size?

I have tried on my machine using sbrk(1) and then deliberately writing out of bounds to test the page size, which is 4096 bytes. But when I call malloc(1), I get SEGV only after accessing 135152 bytes, which is way more than one page. I know that malloc is a library function and is implementation dependent, but considering that it eventually calls sbrk, why would it give more than one page? Can anyone tell me about its internal working?
My operating system is Ubuntu 14.04 and my architecture is x86.
Update: Now I am wondering if it's because malloc returns the address to a free list block that is large enough to hold my data. But that address may be in the middle of the heap so that I can keep writing until the upper limit of the heap is reached.
Older malloc() implementations of UNIX used the sbrk()/brk() system calls. But these days, implementations use both mmap() and sbrk(). The malloc() implementation of glibc (that's probably the one you use on your Ubuntu 14.04) uses both sbrk() and mmap(), and the choice of which one to use for a given request typically depends on the size of the allocation, which glibc decides dynamically.
For small allocations, glibc uses sbrk() and for larger allocations it uses mmap(). The macro M_MMAP_THRESHOLD is used to decide this. Currently, its default value is 128 K. This explains why your code managed to access 135152 bytes, which is roughly ~128 K: even though you requested only 1 byte, your implementation set aside about 128 K for efficient memory allocation, so the segfault didn't occur until you crossed that limit.
You can play with M_MMAP_THRESHOLD by using mallopt() to change the default parameters.
M_MMAP_THRESHOLD
For allocations greater than or equal to the limit
specified (in bytes) by M_MMAP_THRESHOLD that can't be satisfied from
the free list, the memory-allocation functions employ mmap(2) instead
of increasing the program break using sbrk(2).
Allocating memory using mmap(2) has the significant advantage that
the allocated memory blocks can always be independently released back
to the system. (By contrast, the heap can be trimmed only if memory
is freed at the top end.) On the other hand, there are some
disadvantages to the use of mmap(2): deallocated space is not placed
on the free list for reuse by later allocations; memory may be wasted
because mmap(2) allocations must be page-aligned; and the kernel must
perform the expensive task of zeroing out memory allocated via
mmap(2). Balancing these factors leads to a default setting of
128*1024 for the M_MMAP_THRESHOLD parameter.
The lower limit for this parameter is 0. The upper limit is
DEFAULT_MMAP_THRESHOLD_MAX: 512*1024 on 32-bit systems or
4*1024*1024*sizeof(long) on 64-bit systems.
Note: Nowadays, glibc uses a dynamic mmap threshold by default.
The initial value of the threshold is 128*1024, but when blocks
larger than the current threshold and less than or equal to
DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted
upward to the size of the freed block. When dynamic mmap
thresholding is in effect, the threshold for trimming the heap is also
dynamically adjusted to be twice the dynamic mmap threshold. Dynamic
adjustment of the mmap threshold is disabled if any of the
M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX
parameters is set.
For example, if you do:
#include<malloc.h>
mallopt(M_MMAP_THRESHOLD, 0);
before calling malloc(), you'll likely see a different limit. Most of these are implementation details, and the C standard says it's undefined behaviour to write into memory that your process doesn't own. So do it at your own risk -- otherwise, demons may fly out of your nose ;-)
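To make the fragment above concrete, a minimal glibc-specific sketch (mallopt() and malloc_usable_size() are declared in <malloc.h>); it lowers the threshold and reports how much usable space backs a 1-byte request under that setting:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void) {
    /* Ask glibc to serve even tiny requests with mmap(). */
    mallopt(M_MMAP_THRESHOLD, 0);

    char *p = malloc(1);
    /* Reports the usable space behind the 1-byte request under this
       setting; compare it with the ~128K figure from the question. */
    printf("usable size of malloc(1): %zu\n", malloc_usable_size(p));

    free(p);
    return 0;
}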
malloc allocates memory in large blocks for performance reasons. Subsequent calls to malloc can give you memory from the large block instead of having to ask the operating system for a lot of small blocks. This cuts down on the number of system calls needed.
From this article:
When a process needs memory, some room is created by moving the upper bound of the heap forward, using the brk() or sbrk() system calls. Because a system call is expensive in terms of CPU usage, a better strategy is to call brk() to grab a large chunk of memory and then split it as needed to get smaller chunks. This is exactly what malloc() does. It aggregates a lot of smaller malloc() requests into fewer large brk() calls. Doing so yields a significant performance improvement.
Note that some modern implementations of malloc use mmap instead of brk/sbrk to allocate memory, but otherwise the above is still true.
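A rough way to observe this batching is the following Linux/glibc-specific sketch, which uses the non-standard sbrk(0) to read the current break and reports only when it moves:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sbrk() */

int main(void) {
    void *prev = sbrk(0);
    /* Many small allocations: the break typically moves only a few
       times, in large steps, rather than once per malloc() call.
       (The allocations are deliberately leaked for the demo.) */
    for (int i = 0; i < 10000; i++) {
        void *p = malloc(32);
        (void)p;
        void *cur = sbrk(0);
        if (cur != prev) {
            printf("break moved after allocation %d: %p -> %p\n",
                   i, prev, cur);
            prev = cur;
        }
    }
    return 0;
}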

malloc memory allocation scheme in C

I was experimenting with malloc in C and I observed that malloc wastes some space after memory has been allocated. Below is the piece of code I used to test malloc:
#include <stdlib.h>
#include <string.h>
int main(){
char* a;
char* b;
a=malloc(2*sizeof(char));
b=malloc(2*sizeof(char));
memset(a,9,2);
memset(b,9,2);
return 0;
}
In the right middle of the attached picture (open the image in a new tab for clarity) you can see the memory contents; 0x804b008 is the address pointed to by variable 'a' and 0x804b018 is the memory pointed to by variable 'b'. What is happening to the memory between 0x804b00a and 0x804b017? The thing is, even if I try to allocate 3*sizeof(char) instead of 2*sizeof(char), the memory layout is the same! So, is there something I am missing?
malloc() is allowed to waste as much space as it wants to - the standard doesn't specify anything about the implementation. The only guarantee you have is about alignment (§7.20.3 Memory management functions):
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
Your implementation appears to return pointers with a minimum alignment of 8 bytes.
Memory alignment! It's good for performance on x86 and mandatory on some architectures like ARM.
Most CPUs require that objects and variables reside at particular offsets in the system's memory. For example, 32-bit processors require a 4-byte integer to reside at a memory address that is evenly divisible by 4. This requirement is called "memory alignment". Thus, a 4-byte int can be located at memory address 0x2000 or 0x2004, but not at 0x2001. On most Unix systems, an attempt to use misaligned data results in a bus error, which terminates the program altogether. On Intel processors, the use of misaligned data is supported but at a substantial performance penalty. Therefore, most compilers automatically align data variables according to their type and the particular processor being used. This is why the size that structs and classes occupy is often larger than the sum of their members' sizes.
http://www.devx.com/tips/Tip/13265
The heap is handled by the implementation, not necessarily as you expect. The Standard explicitly doesn't guarantee anything about order or contiguity. There are two main things that cause more heap space to be used than you asked for.
First, allocated memory has to be aligned so that it's suitable for use by any sort of object. Typically, computers expect primitive data objects of N bytes to be allocated at a multiple of N, so the odds are you can't get malloc() to return a value that isn't a multiple of 8.
Second, the heap needs to be managed, so that free() allows reuse of the memory. This means that the heap manager needs to keep track of allocated and unallocated blocks, and their sizes. One practice is to stick some information in memory just before each block, so the manager can know what size block to free and where blocks are that might be reused. If that's what your system does, there will be more memory used between allocated blocks, and given alignment restrictions at 8 bytes it's likely you can't get allocations of less than 16 bytes.
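A small sketch of that bookkeeping effect: allocate two tiny blocks back to back and look at the spacing between the returned pointers (the exact numbers are implementation-specific, but on the system from the question the spacing was 0x10 bytes):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char *a = malloc(2);
    char *b = malloc(2);

    /* The gap between consecutive tiny allocations reflects the
       allocator's minimum chunk size plus per-block bookkeeping. */
    printf("a = %p, b = %p, spacing = %zu bytes\n",
           (void *)a, (void *)b, (size_t)((uintptr_t)b - (uintptr_t)a));

    free(a);
    free(b);
    return 0;
}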
Most modern malloc() implementations allocate in powers of two and have a minimum allocation size, to reduce fragmentation since oddball sizes could generally only be reused when enough contiguous allocations are free()d to make larger blocks. (It also speeds up coalescing contiguous allocations in general, IIRC.) Also keep in mind the block overhead; to get the block size you need to add some amount (8 in GNU malloc(), IIRC) for internal management uses.
malloc is only guaranteed to return you a block of memory that's at least as big as the size you give it. However, processors are generally more efficient when they're operating on blocks of memory that start at multiples of, say, 8 bytes in memory. Look up word size for more information on this.

What happens when there is a request for memory block which is not a power of 2?

Suppose we make a malloc request for a memory block of size n where n != 2^k for any k > 0.
malloc returns us space for that requested memory block, but how is the remaining buffer from the page handled? I read that pages are generally blocks of memory whose sizes are powers of two.
Wiki states the following:
Like any method of memory allocation, the heap will become fragmented; that is,
there will be sections of used and unused memory in the allocated
space on the heap. A good allocator will attempt to find an unused area
of already allocated memory to use before resorting to expanding the heap.
So my question is how is this tracked?
EDIT: How is the unused memory tracked when using malloc?
This really depends on the specific implementation, as Morten Siebuhr pointed out already. In very simple cases, there might be a list of free, fixed-size blocks of memory (possibly all having the same size), so the unused memory is simply wasted. Note that real implementations will never use such simplistic algorithms.
This is an overview over some simple possibilities: http://www.osdcom.info/content/view/31/39/
This Wikipedia entry has several interesting links, including the one above: http://en.wikipedia.org/wiki/Dynamic_memory_allocation#Implementations
As a final remark, googling "malloc implementation" turns up a heap (pun intended) of valuable links.
A standard BSD-style memory allocator basically works like this:
It keeps a linked list of pre-allocated memory blocks for sizes 2^k for k<=12 (for example).
In reality, each list for a given k is composed of memory-blocks from different areas, see below.
A malloc request for n bytes is serviced by calculating n', the closest 2^k >= n, then looking up the first area in the list for k, and then returning the first free block in the free-list for the given area.
When there is no pre-allocated memory block for size 2^k, an area is allocated, an area being some larger piece of contiguous memory, say a 4 kB piece of memory. This piece of memory is then chopped up into pieces that are 2^k bytes. At the beginning of the contiguous memory area there is book-keeping information such as where to find the linked list of free blocks within the area. A bitmap can also be used, but a linked list typically has better cache behavior (you want the next allocated block to return memory that is already in the cache).
The reason for using areas is that free(ptr) can be implemented efficiently. ptr & 0xfffff000 in this example points to the beginning of the area which contains the book-keeping structures and makes it possible to link the memory block back into the area.
The BSD allocator will waste space by always returning a memory block 2^k in size, but it can reuse the memory of the block to keep the free-list, which is a nice property. Also allocation is blazingly fast.
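A tiny sketch of the two calculations that make this scheme cheap (hypothetical 4 kB areas and power-of-two buckets, not the actual BSD source):
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define AREA_SIZE 4096u   /* hypothetical area size, as in the example above */

/* Round a request up to the next power of two: the bucket size 2^k >= n. */
static size_t round_up_pow2(size_t n) {
    size_t k = 1;
    while (k < n)
        k <<= 1;
    return k;
}

/* Given a pointer into an area, mask off the low bits to find the start
   of the area, where the book-keeping (free list, bucket size) lives. */
static void *area_of(void *ptr) {
    return (void *)((uintptr_t)ptr & ~(uintptr_t)(AREA_SIZE - 1));
}

int main(void) {
    printf("request 13  -> bucket %zu\n", round_up_pow2(13));    /* 16 */
    printf("request 100 -> bucket %zu\n", round_up_pow2(100));   /* 128 */

    /* A made-up address inside an area, just to show the masking. */
    void *p = (void *)(uintptr_t)0x12345a70;
    printf("block %p lives in area %p\n", p, area_of(p));        /* area base 0x12345000 */
    return 0;
}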
Modifications to the above general idea include:
- Using anonymous mmap for large allocations. This shifts the work over to the kernel for handling large mallocs and avoids wasting a lot of memory in these cases.
- The GNU version of malloc has special cases for non-power-of-two buckets. There is nothing inherent in the BSD allocator that requires returning 2^k memory blocks, only that there are pre-defined bucket sizes. The GNU allocator has more buckets and thus wastes less space.
- Sharing memory between threads is a tricky subject. Lock contention during allocation is an important consideration, so the GNU allocator, for example, will eagerly create extra areas for different threads for a given bucket size if it ever encounters lock contention during allocation.
This varies a lot from implementation to implementation. Some waste the space, some sub-divide pages until they get the requested size (or close to it), etc.
If you are asking out of curiosity, I suggest you read the source code for the implementation in question.
If it's because of performance worries, try to benchmark it and see what happens.
