Growing (and shrinking) memory pool - c

Let's say, for the purpose of the question, we have a memory pool that has n blocks allocated initially. However, when the capacity is reached, the pool wants to grow and become twice the size it was (2n).
Now this resize operation can be done with realloc in C, however the function itself may return a pointer to a different memory (with the old data copied in).
This means the pointers returned by the memory pool allocator may no longer be valid (since the memory may have been moved).
What would be a nice way to overcome this problem? Or is it even possible at all?

Allocate out of multiple non-contiguous pools of memory. When one pool is full, allocate a second pool, allowing it to be someplace else in your virtual address space.
Then the problem is one of keeping track of where your pools are. Typically you'd use some of the space in each pool for bookkeeping. For example, you might reserve one pointer's worth of space to keep a simple linear linked list of all the pools. More sophisticated allocators tend to require more bookkeeping overhead.

Instead of using realloc, malloc a new/additional chunk of blocks (assuming there's no reason why the blocks, which are managed and returned and returned by your pool allocator, need to be in a single contiguous chunk of memory).

Related

Why does malloc() call mmap() and brk() interchangeably?

I'm new to C and heap memory, still struggling to understand dynamic memory allocation.
I traced Linux system calls and found that if I use malloc to request a small amount of heap memory, then malloc calls brk internally.
But if I use malloc to request a very large amount of heap memory, then malloc calls mmap internally.
So there must be a big difference between brk and mmap, but theoretically we should be able to use brk to allocate heap memory regardless of the requested size. So why does malloc call mmap when allocating a large amount of memory?
so why malloc calls mmap when it comes to allocate a large size of memory?
The short answer is for improved efficiency on newer implementations of Linux, and the updated memory allocation algorithms that come with them. But keep in mind that this is a very implementation dependent topic, and the whys and wherefores would vary greatly for differing vintages and flavors of the specific Linux OS being discussed.
Here is fairly recent write-up regarding the low-level parts mmap() and brk() play in Linux memory allocation. And, a not so recent, but still relevant Linux Journal article that includes some content that is very on-point for the topic here, including this:
For very large requests, malloc() uses the mmap() system call to find
addressable memory space. This process helps reduce the negative
effects of memory fragmentation when large blocks of memory are freed
but locked by smaller, more recently allocated blocks lying between
them and the end of the allocated space. In this case, in fact, had
the block been allocated with brk(), it would have remained unusable
by the system even if the process freed it.
(emphasis mine)
Regarding brk():
incidentally, "...mmap() didn't exist in the early versions of Unix. brk() was the only way to increase the size of the data segment of the process at that time. The first version of Unix with mmap() was SunOS in the mid 80's, the first open-source version was BSD-Reno in 1990.". Since that time, modern implementation of memory allocation algorithms have been refactored with many improvements, greatly reducing the need for them to include using brk().
mmap (when used with MAP_ANONYMOUS) allocates a chunk of RAM that can be placed anywhere within the process's virtual address space, and that can be deallocated later (with munmap) independently of all other allocations.
brk changes the ending address of a single, contiguous "arena" of virtual address space: if this address is increased it allocates more memory to the arena, and if it is decreased, it deallocates the memory at the end of the arena. Therefore, memory allocated with brk can only be released back to the operating system when a continuous range of addresses at the end of the arena is no longer needed by the process.
Using brk for small allocations, and mmap for big allocations, is a heuristic based on the assumption that small allocations are more likely to all have the same lifespan, whereas big allocations are more likely to have a lifespan that isn't correlated with any other allocations' lifespan. So, big allocations use the system primitive that lets them be deallocated independently from anything else, and small allocations use the primitive that doesn't.
This heuristic is not very reliable. The current generation of malloc implementations, if I remember correctly, has given up altogether on brk and uses mmap for everything. The malloc implementation I suspect you are looking at (the one in the GNU C Library, based on your tags) is very old and mainly continues to be used because nobody is brave enough to take the risk of swapping it out for something newer that will probably but not certainly be better.
brk() is a traditional way of allocating memory in UNIX -- it just expands the data area by a given amount. mmap() allows you to allocate independent regions of memory without being restricted to a single contiguous chunk of virtual address space.
malloc() uses the data space for "small" allocations and mmap() for "big" ones, for a number of reasons, including reducing memory fragmentation. It's just an implementation detail you shouldn't have to worry about.
Please check this question also.
Reducing fragmentation is commonly given as the reason why mmap is used for large allocations; see ryyker’s answer for details. But I think that’s not the real benefit nowadays; in practice there’s still fragmentation even with mmap, just in a larger pool (the virtual address space, rather than the heap).
The big advantage of mmap is discardability.
When allocating memory with sbrk, if the memory is actually used (so that the kernel maps physical memory at some point), and then freed, the kernel itself can’t know about that, unless the allocator also reduces the program break (which it can’t if the freed block isn’t the topmost previously-used block under the program break). The result is that the contents of that physical memory become “precious” as far as the kernel is concerned; if it ever needs to re-purpose that physical memory, it then has to ensure that it doesn’t lose its contents. So it might end up swapping pages out (which is expensive) even though the owning process no longer cares about them.
When allocating memory with mmap, freeing the memory doesn’t just return the block to a pool somewhere; the corresponding virtual memory allocation is returned to the kernel, and that tells the kernel that any corresponding physical memory, dirty or otherwise, is no longer needed. The kernel can then re-purpose that physical memory without worrying about its contents.
the key part of the reason I think, which I copied from the chat said by Peter
free() is a user-space function, not a system call. It either hands them back to the OS with munmap or brk, or keeps them dirty in user-space. If it doesn't make a system call, the OS must preserve the contents of those pages as part of the process state.
So when you use brk to increase your memory adress, when return back, you have to use the brk a negtive value, so brk only can return the most recently memory block you allocated, when you call malloc(huge), malloc(small), free(huge). the huge cannot be returned back to system, you can only maintain a list of fragmentation for this process, so the huge is actually hold by this process. this is the drawback of brk.
but the mmap and munmap can avoid this.
I want to emphasize another view point.
malloc is system function that allocate memory.
You do not really need to debug it, because in some implementations, it might give you memory from static "arena" (e.g. static char array).
In some other implementations it may just return null pointer.
If you want to see what mallow really do, I suggest you look at
http://gee.cs.oswego.edu/dl/html/malloc.html
Linux gcc malloc is based on this.
You can take a look at jemalloc too. It basically uses same brk and mmap, but organizes the data differently and usually is "better".
Happy researching.

Large memory usage of aio [duplicate]

Here's my question: Does calling free or delete ever release memory back to the "system". By system I mean, does it ever reduce the data segment of the process?
Let's consider the memory allocator on Linux, i.e ptmalloc.
From what I know (please correct me if I am wrong), ptmalloc maintains a free list of memory blocks and when a request for memory allocation comes, it tries to allocate a memory block from this free list (I know, the allocator is much more complex than that but I am just putting it in simple words). If, however, it fails, it gets the memory from the system using say sbrk or brk system calls. When a memory is free'd, that block is placed in the free list.
Now consider this scenario, on peak load, a lot of objects have been allocated on heap. Now when the load decreases, the objects are free'd. So my question is: Once the object is free'd will the allocator do some calculations to find whether it should just keep this object in the free list or depending upon the current size of the free list it may decide to give that memory back to the system i.e decrease the data segment of the process using sbrk or brk?
Documentation of glibc tells me that if the allocation request is much larger than page size, it will be allocated using mmap and will be directly released back to the system once free'd. Cool. But let's say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what?
From what I know (correct me please), a memory allocated with malloc will never be released back to the system ever until the process ends i.e. the allocator will simply keep it in the free list if I free it. But the question that is troubling me is then, if I use a tool to see the memory usage of my process (I am using pmap on Linux, what do you guys use?), it should always show the memory used at peak load (as the memory is never given back to the system, except when allocated using mmap)? That is memory used by the process should never ever decrease(except the stack memory)? Is it?
I know I am missing something, so please shed some light on all this.
Experts, please clear my concepts regarding this. I will be grateful. I hope I was able to explain my question.
There isn't much overhead for malloc, so you are unlikely to achieve any run-time savings. There is, however, a good reason to implement an allocator on top of malloc, and that is to be able to trace memory leaks. For example, you can free all memory allocated by the program when it exits, and then check to see if your memory allocator calls balance (i.e. same number of calls to allocate/deallocate).
For your specific implementation, there is no reason to free() since the malloc won't release to system memory and so it will only release memory back to your own allocator.
Another reason for using a custom allocator is that you may be allocating many objects of the same size (i.e you have some data structure that you are allocating a lot). You may want to maintain a separate free list for this type of object, and free/allocate only from this special list. The advantage of this is that it will avoid memory fragmentation.
No.
It's actually a bad strategy for a number of reasons, so it doesn't happen --except-- as you note, there can be an exception for large allocations that can be directly made in pages.
It increases internal fragmentation and therefore can actually waste memory. (You can only return aligned pages to the OS, so pulling aligned pages out of a block will usually create two guaranteed-to-be-small blocks --smaller than a page, anyway-- to either side of the block. If this happens a lot you end up with the same total amount of usefully-allocated memory plus lots of useless small blocks.)
A kernel call is required, and kernel calls are slow, so it would slow down the program. It's much faster to just throw the block back into the heap.
Almost every program will either converge on a steady-state memory footprint or it will have an increasing footprint until exit. (Or, until near-exit.) Therefore, all the extra processing needed by a page-return mechanism would be completely wasted.
It is entirely implementation dependent. On Windows VC++ programs can return memory back to the system if the corresponding memory pages contain only free'd blocks.
I think that you have all the information you need to answer your own question. pmap shows the memory that is currenly being used by the process. So, if you call pmap before the process achieves peak memory, then no it will not show peak memory. if you call pmap just before the process exits, then it will show peak memory for a process that does not use mmap. If the process uses mmap, then if you call pmap at the point where maximum memory is being used, it will show peak memory usage, but this point may not be at the end of the process (it could occur anywhere).
This applies only to your current system (i.e. based on the documentation you have provided for free and mmap and malloc) but as the previous poster has stated, behavior of these is implmentation dependent.
This varies a bit from implementation to implementation.
Think of your memory as a massive long block, when you allocate to it you take a bit out of your memory (labeled '1' below):
111
If I allocate more more memory with malloc it gets some from the system:
1112222
If I now free '1':
___2222
It won't be returned to the system, because two is in front of it (and memory is given as a continous block). However if the end of the memory is freed, then that memory is returned to the system. If I freed '2' instead of '1'. I would get:
111
the bit where '2' was would be returned to the system.
The main benefit of freeing memory is that that bit can then be reallocated, as opposed to getting more memory from the system. e.g:
33_2222
I believe that the memory allocator in glibc can return memory back to the system, but whether it will or not depends on your memory allocation patterns.
Let's say you do something like this:
void *pointers[10000];
for(i = 0; i < 10000; i++)
pointers[i] = malloc(1024);
for(i = 0; i < 9999; i++)
free(pointers[i]);
The only part of the heap that can be safely returned to the system is the "wilderness chunk", which is at the end of the heap. This can be returned to the system using another sbrk system call, and the glibc memory allocator will do that when the size of this last chunk exceeds some threshold.
The above program would make 10000 small allocations, but only free the first 9999 of them. The last one should (assuming nothing else has called malloc, which is unlikely) be sitting right at the end of the heap. This would prevent the allocator from returning any memory to the system at all.
If you were to free the remaining allocation, glibc's malloc implementation should be able to return most of the pages allocated back to the system.
If you're allocating and freeing small chunks of memory, a few of which are long-lived, you could end up in a situation where you have a large chunk of memory allocated from the system, but you're only using a tiny fraction of it.
Here are some "advantages" to never releasing memory back to the system:
Having already used a lot of memory makes it very likely you will do so again, and
when you release memory the OS has to do quite a bit of paperwork
when you need it again, your memory allocator has to re-initialise all its data structures in the region it just received
Freed memory that isn't needed gets paged out to disk where it doesn't actually make that much difference
Often, even if you free 90% of your memory, fragmentation means that very few pages can actually be released, so the effort required to look for empty pages isn't terribly well spent
Many memory managers can perform TRIM operations where they return entirely unused blocks of memory to the OS. However, as several posts here have mentioned, it's entirely implementation dependent.
But lets say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what ?
This depends on your allocation pattern. Do you free ALL of the small allocations? If so and if the memory manager has handling for a small block allocations, then this may be possible. However, if you allocate many small items and then only free all but a few scattered items, you may fragment memory and make it impossible to TRIM blocks since each block will have only a few straggling allocations. In this case, you may want to use a different allocation scheme for the temporary allocations and the persistant ones so you can return the temporary allocations back to the OS.

How does memory allocation take place on using malloc in a loop >

How does memory allocation take place in this case?
I observed that this is not the same as using a malloc on 1000000*10000 directly, which should have lead to 4*10GB (4 bytes per int) being allocated. However, this piece of code uses only 200MB on executing.
for(int i=0;i<1000000;i++)
{
int *A = (int*)malloc(sizeof(int)*10000);
}
As mentioned, there are differences in whether the memory is allocated in chunks or as one. The main reason why you're not seeing the memory being allocated is due to the operating systems lying about memory.
If you allocate a block of memory the allocation is virtual. Since all processes have lots of virtual memory available, it will usually succeed (unless you ask for insane amounts of memory, or the OS otherwise determines it's not going to work). The actual reservation of physical memory may occur after you actually use the memory.
So when you look at memory usage, there is not only one number but several. There is shared memory, there is memory that can't be paged out, there's the virtual allocated memory and then there's the actual memory in use.
If you change the code to actually use the memory, for example just write one byte to the allocated section, you will see completely different result. The OS has to handle the memory allocation and get the memory blocks in physical memory.
Also as mentioned you don't check that malloc succeeds. Maybe it succeeds for a few times and then doesn't allocate anything more.
This system also explains why sometimes a process might get killed due to low memory even though all allocations succeeded in all processes. The OS was just being too optimistic and thought it could give out more memory than actually was possible.
The difference is how the memory is allocated.
When you call 10K times malloc to allocate 10k of memory, 10G of virtual memory is allocated to your process. The resulting 10G of memory is not continuous. That is, you get 10k scattered blocks of memory whose size is 10K. Whereas, when you call malloc requesting 10G, malloc will try to allocate a continuos block of memory whose size is 10G.
According to the malloc manual page, malloc fails when it can't allocate the requested memory. You should check if the malloc is successful in your application, in order to understand if the memory has been correctly allocated.
for(int i=0;i<1000000;i++)
{
int *A = (int*)malloc(sizeof(int)*10000);
}
This is a perfect way of memory leak. You are allocating 1000000 times, each time sizeof(int)*10000 bytes. You're guaranteed to leak all of these allocated memory. As you declared A within the loop, so after the loop you do not have handle to that variable any more, and there's no way to free even the last chunk of memory you allocated.
And of course this is different from allocating 1000000*10000*sizeof(int) in one go. The former allocates 1000000 smaller chunks which are mostly to be scattered in many memory location. The latter tries to allocate one gigantic chunk, which likely to fails.

free() not freeing memory in embedded linux.

I have allocated memory using malloc() in embedded Linux (around 10 MB). And checked the free memory it was 67080 kB but even after freeing it using free() it remains the same. It is only after the application is terminated the memory is available again. Does free() not make the freed memory available to the system, if so how to make it available.
free is a libc library call. it marks heap space as available for reuse. It does not guarantee that the associated virtual mapping will be released. Only after a dirty virtual mapping is released by your OS, then that memory will be system wide free again. This can only happen in chunks of pages.
Also if you allocated memory using malloc and family and didn't use it then it didn't actually consume physical memory until then - so freeing it will do nothing.
Does free() not make the freed memory available to the system.
No, usually not.
malloc() normally requests memory from the OS by the low level sbrk() or mmap() call. Once assigned to the application, free() just returns the memory to a memory pool that belongs to the application. That is, it's not returned back to the OS for use in another process. (Though some heuristics are in-place to do so in certain circumstances).
If swap space is in place, this becomes less of a problem, the OS will swap out the unused memory of applications to make room for additional physical memory that's required.
if so how to make it available.
Exit the application.
Or you would need to write your own memory allocator that could do this.(which in the general case is not an easy task especially if you don't want to sacrifice overhead and speed).
For a relatively big single piece of 10MB, you could simply request anonymous memory with mmap() and the memory will be released back to the OS when you munmap() that piece of memory.
Taken from the malloc 3 man page:
Normally, malloc() allocates memory from the heap, and adjusts the
size of the heap as required, using sbrk(2). When allocating blocks
of memory larger than MMAP_THRESHOLD bytes, the glibc malloc()
implementation allocates the memory as a private anonymous mapping
using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is
adjustable using mallopt(3)
You can try to modify the MMAP_THRESHOLD so that by using malloc you are invoking mmap. If you do so, free guarantees that the memory allocated through mmap will return back to the system as soon as you free it.
Your malloc() calls obtain memory from the system, and maintain a heap data structure for keeping track of used and free memory within the process. Your free() calls return memory to the heap, where they are marked free, but they're still part of the process's memory.
If you want memory deallocation to return pages to the system, you'll have to write your own memory manager, but keep in mind that it'll only be able to completely free memory under the right conditions: It depends on the behavior of your application, whether your allocations and deallocations span page boundaries and cleanly de-fragment, etc. You need to understand the memory allocation behavior of your application to know whether this will be any benefit.

Limit Virtual Memory space for malloc()

I have written my own my_malloc() function that manages its own physical memory. In my application I want to be able use both the libc malloc() as well as my own my_malloc() function. So I somehow need to partition the virtual address space, malloc should always assign a virtual address only if its from its dedicated pool, same thing with my_malloc(). I cannot limit heap size, I just need to guarantee that malloc() and my_malloc() never return the same/overlapping virtual addresses.
thanks!
One possibility would be to have my_malloc() call malloc() at startup to pre-allocate a large pool of memory, then apportion that memory to its callers and manage it accordingly. However, a full implementation would need to handle garbage collection and defragmentation.
Another possibility would be to have my_malloc() call malloc() each time it needs to allocate memory and simply handle whatever "bookkeeping" aspects you're interested in, such as number of blocks allocated, number of blocks freed, largest outstanding block, total allocated memory, etc. This is by far the safer and more efficient mechanism, as you're passing all of the "hard" operations to malloc().
One answer is to make your my_malloc use memory allocated by malloc. Using big enough blocks would mostly acheive that; then within each block your version will maintain its own structures and allocate parts of it to callers.
It gets tricky because you can't rely on extending the entire available space for your version, as you can when you get memory from sbrk or similar. So your version would have to maintain several blocks.
Reserve a large block of virtual address space, and have that be the pool from which my_malloc() allocates. Once you have reserved a large contiguous region of memory from the OS, then subsequent calls to malloc() have to come from elsewhere.
For example, on Windows, you can use VirtualAlloc() to reserve a 256mb block of space. The memory won't actually be allocated until you "commit" it with a subsequent call, but it will reserve an address range (such as 0x4000000-0x5000000) which subsequent malloc() will not use. Then your my_malloc() can commit blocks out of this reserved range as requested, and subdivide them by whatever allocation scheme you've written.
I'm told the equivalent Linux call is mmap(). (edit: I previously said "kmalloc or vmalloc, depending on whether you need the memory to be physically contiguous or not," but those are kernel-level functions.)
We use this mechanism in our app to redirect all allocations of a certain size into our own custom pooled-block allocator for speed and efficiency. Among other things, it lets us reserve virtual pages in certain specific sizes that are more efficient for the CPU to handle.
If you add an mmap(2) call near the start of the program, you can allocate as much memory as you need with whatever addresses you need (see the hint, that's usually left NULL for the OS to determine) immediately; that will prevent malloc(3), or any other memory allocation routines, from getting those particular pages.
Don't worry about the memory usage; since modern systems are quite happy to overcommit, you'll only use a few hundred kilobytes more kernel space to handle page tables. Not too bad.

Resources