Large memory usage of aio [duplicate]

Large memory usage of aio [duplicate] - c

Here's my question: Does calling free or delete ever release memory back to the "system". By system I mean, does it ever reduce the data segment of the process?
Let's consider the memory allocator on Linux, i.e ptmalloc.
From what I know (please correct me if I am wrong), ptmalloc maintains a free list of memory blocks and when a request for memory allocation comes, it tries to allocate a memory block from this free list (I know, the allocator is much more complex than that but I am just putting it in simple words). If, however, it fails, it gets the memory from the system using say sbrk or brk system calls. When a memory is free'd, that block is placed in the free list.
Now consider this scenario, on peak load, a lot of objects have been allocated on heap. Now when the load decreases, the objects are free'd. So my question is: Once the object is free'd will the allocator do some calculations to find whether it should just keep this object in the free list or depending upon the current size of the free list it may decide to give that memory back to the system i.e decrease the data segment of the process using sbrk or brk?
Documentation of glibc tells me that if the allocation request is much larger than page size, it will be allocated using mmap and will be directly released back to the system once free'd. Cool. But let's say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what?
From what I know (correct me please), a memory allocated with malloc will never be released back to the system ever until the process ends i.e. the allocator will simply keep it in the free list if I free it. But the question that is troubling me is then, if I use a tool to see the memory usage of my process (I am using pmap on Linux, what do you guys use?), it should always show the memory used at peak load (as the memory is never given back to the system, except when allocated using mmap)? That is memory used by the process should never ever decrease(except the stack memory)? Is it?
I know I am missing something, so please shed some light on all this.
Experts, please clear my concepts regarding this. I will be grateful. I hope I was able to explain my question.

There isn't much overhead for malloc, so you are unlikely to achieve any run-time savings. There is, however, a good reason to implement an allocator on top of malloc, and that is to be able to trace memory leaks. For example, you can free all memory allocated by the program when it exits, and then check to see if your memory allocator calls balance (i.e. same number of calls to allocate/deallocate).
For your specific implementation, there is no reason to free() since the malloc won't release to system memory and so it will only release memory back to your own allocator.
Another reason for using a custom allocator is that you may be allocating many objects of the same size (i.e you have some data structure that you are allocating a lot). You may want to maintain a separate free list for this type of object, and free/allocate only from this special list. The advantage of this is that it will avoid memory fragmentation.

No.
It's actually a bad strategy for a number of reasons, so it doesn't happen --except-- as you note, there can be an exception for large allocations that can be directly made in pages.
It increases internal fragmentation and therefore can actually waste memory. (You can only return aligned pages to the OS, so pulling aligned pages out of a block will usually create two guaranteed-to-be-small blocks --smaller than a page, anyway-- to either side of the block. If this happens a lot you end up with the same total amount of usefully-allocated memory plus lots of useless small blocks.)
A kernel call is required, and kernel calls are slow, so it would slow down the program. It's much faster to just throw the block back into the heap.
Almost every program will either converge on a steady-state memory footprint or it will have an increasing footprint until exit. (Or, until near-exit.) Therefore, all the extra processing needed by a page-return mechanism would be completely wasted.

It is entirely implementation dependent. On Windows VC++ programs can return memory back to the system if the corresponding memory pages contain only free'd blocks.

I think that you have all the information you need to answer your own question. pmap shows the memory that is currenly being used by the process. So, if you call pmap before the process achieves peak memory, then no it will not show peak memory. if you call pmap just before the process exits, then it will show peak memory for a process that does not use mmap. If the process uses mmap, then if you call pmap at the point where maximum memory is being used, it will show peak memory usage, but this point may not be at the end of the process (it could occur anywhere).
This applies only to your current system (i.e. based on the documentation you have provided for free and mmap and malloc) but as the previous poster has stated, behavior of these is implmentation dependent.

This varies a bit from implementation to implementation.
Think of your memory as a massive long block, when you allocate to it you take a bit out of your memory (labeled '1' below):
111
If I allocate more more memory with malloc it gets some from the system:
1112222
If I now free '1':
___2222
It won't be returned to the system, because two is in front of it (and memory is given as a continous block). However if the end of the memory is freed, then that memory is returned to the system. If I freed '2' instead of '1'. I would get:
111
the bit where '2' was would be returned to the system.
The main benefit of freeing memory is that that bit can then be reallocated, as opposed to getting more memory from the system. e.g:
33_2222

I believe that the memory allocator in glibc can return memory back to the system, but whether it will or not depends on your memory allocation patterns.
Let's say you do something like this:
void *pointers[10000];
for(i = 0; i < 10000; i++)
pointers[i] = malloc(1024);
for(i = 0; i < 9999; i++)
free(pointers[i]);
The only part of the heap that can be safely returned to the system is the "wilderness chunk", which is at the end of the heap. This can be returned to the system using another sbrk system call, and the glibc memory allocator will do that when the size of this last chunk exceeds some threshold.
The above program would make 10000 small allocations, but only free the first 9999 of them. The last one should (assuming nothing else has called malloc, which is unlikely) be sitting right at the end of the heap. This would prevent the allocator from returning any memory to the system at all.
If you were to free the remaining allocation, glibc's malloc implementation should be able to return most of the pages allocated back to the system.
If you're allocating and freeing small chunks of memory, a few of which are long-lived, you could end up in a situation where you have a large chunk of memory allocated from the system, but you're only using a tiny fraction of it.

Here are some "advantages" to never releasing memory back to the system:
Having already used a lot of memory makes it very likely you will do so again, and
when you release memory the OS has to do quite a bit of paperwork
when you need it again, your memory allocator has to re-initialise all its data structures in the region it just received
Freed memory that isn't needed gets paged out to disk where it doesn't actually make that much difference
Often, even if you free 90% of your memory, fragmentation means that very few pages can actually be released, so the effort required to look for empty pages isn't terribly well spent

Many memory managers can perform TRIM operations where they return entirely unused blocks of memory to the OS. However, as several posts here have mentioned, it's entirely implementation dependent.
But lets say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what ?
This depends on your allocation pattern. Do you free ALL of the small allocations? If so and if the memory manager has handling for a small block allocations, then this may be possible. However, if you allocate many small items and then only free all but a few scattered items, you may fragment memory and make it impossible to TRIM blocks since each block will have only a few straggling allocations. In this case, you may want to use a different allocation scheme for the temporary allocations and the persistant ones so you can return the temporary allocations back to the OS.

Related

When should a free() implementation give memory back to the OS?

I'm writing a simple malloc implementation for a college project. One of the tasks is to sometimes give back freed memory to the OS (the example given was of a process using say 1GB malloc-ed memory during a period, and afterwards it only uses 100MB memory until it terminates), however I'm not sure how to implement this. I was thinking of periodically checking the amount of memory the process has allocated and the amount freed and, if possible, give back some of the freed pages to the OS, but I'm not sure if this is an efficient approach.
EDIT: I didn't realize when I first wrote this, but the way I worded this is too vague. By "unused memory" I'm talking specifically about freed one.

Asking the OS for memory or returning it back are (relatively) expensive operation because they require a context switch user/kernel and back. For that reason, in most implementations, the malloc call only asks for large chunks and internally allocates from those chunks, and manages freed memory with a free blocks list. In that case, it only returns memory to the OS when a full chunk is present in the free list.
For a custom implementation, the rule for returning memory to the system is up to the programmer (you...).

Why does malloc() call mmap() and brk() interchangeably?

I'm new to C and heap memory, still struggling to understand dynamic memory allocation.
I traced Linux system calls and found that if I use malloc to request a small amount of heap memory, then malloc calls brk internally.
But if I use malloc to request a very large amount of heap memory, then malloc calls mmap internally.
So there must be a big difference between brk and mmap, but theoretically we should be able to use brk to allocate heap memory regardless of the requested size. So why does malloc call mmap when allocating a large amount of memory?

so why malloc calls mmap when it comes to allocate a large size of memory?
The short answer is for improved efficiency on newer implementations of Linux, and the updated memory allocation algorithms that come with them. But keep in mind that this is a very implementation dependent topic, and the whys and wherefores would vary greatly for differing vintages and flavors of the specific Linux OS being discussed.
Here is fairly recent write-up regarding the low-level parts mmap() and brk() play in Linux memory allocation. And, a not so recent, but still relevant Linux Journal article that includes some content that is very on-point for the topic here, including this:
For very large requests, malloc() uses the mmap() system call to find
addressable memory space. This process helps reduce the negative
effects of memory fragmentation when large blocks of memory are freed
but locked by smaller, more recently allocated blocks lying between
them and the end of the allocated space. In this case, in fact, had
the block been allocated with brk(), it would have remained unusable
by the system even if the process freed it.
(emphasis mine)
Regarding brk():
incidentally, "...mmap() didn't exist in the early versions of Unix. brk() was the only way to increase the size of the data segment of the process at that time. The first version of Unix with mmap() was SunOS in the mid 80's, the first open-source version was BSD-Reno in 1990.". Since that time, modern implementation of memory allocation algorithms have been refactored with many improvements, greatly reducing the need for them to include using brk().

mmap (when used with MAP_ANONYMOUS) allocates a chunk of RAM that can be placed anywhere within the process's virtual address space, and that can be deallocated later (with munmap) independently of all other allocations.
brk changes the ending address of a single, contiguous "arena" of virtual address space: if this address is increased it allocates more memory to the arena, and if it is decreased, it deallocates the memory at the end of the arena. Therefore, memory allocated with brk can only be released back to the operating system when a continuous range of addresses at the end of the arena is no longer needed by the process.
Using brk for small allocations, and mmap for big allocations, is a heuristic based on the assumption that small allocations are more likely to all have the same lifespan, whereas big allocations are more likely to have a lifespan that isn't correlated with any other allocations' lifespan. So, big allocations use the system primitive that lets them be deallocated independently from anything else, and small allocations use the primitive that doesn't.
This heuristic is not very reliable. The current generation of malloc implementations, if I remember correctly, has given up altogether on brk and uses mmap for everything. The malloc implementation I suspect you are looking at (the one in the GNU C Library, based on your tags) is very old and mainly continues to be used because nobody is brave enough to take the risk of swapping it out for something newer that will probably but not certainly be better.

brk() is a traditional way of allocating memory in UNIX -- it just expands the data area by a given amount. mmap() allows you to allocate independent regions of memory without being restricted to a single contiguous chunk of virtual address space.
malloc() uses the data space for "small" allocations and mmap() for "big" ones, for a number of reasons, including reducing memory fragmentation. It's just an implementation detail you shouldn't have to worry about.
Please check this question also.

Reducing fragmentation is commonly given as the reason why mmap is used for large allocations; see ryyker’s answer for details. But I think that’s not the real benefit nowadays; in practice there’s still fragmentation even with mmap, just in a larger pool (the virtual address space, rather than the heap).
The big advantage of mmap is discardability.
When allocating memory with sbrk, if the memory is actually used (so that the kernel maps physical memory at some point), and then freed, the kernel itself can’t know about that, unless the allocator also reduces the program break (which it can’t if the freed block isn’t the topmost previously-used block under the program break). The result is that the contents of that physical memory become “precious” as far as the kernel is concerned; if it ever needs to re-purpose that physical memory, it then has to ensure that it doesn’t lose its contents. So it might end up swapping pages out (which is expensive) even though the owning process no longer cares about them.
When allocating memory with mmap, freeing the memory doesn’t just return the block to a pool somewhere; the corresponding virtual memory allocation is returned to the kernel, and that tells the kernel that any corresponding physical memory, dirty or otherwise, is no longer needed. The kernel can then re-purpose that physical memory without worrying about its contents.

the key part of the reason I think, which I copied from the chat said by Peter
free() is a user-space function, not a system call. It either hands them back to the OS with munmap or brk, or keeps them dirty in user-space. If it doesn't make a system call, the OS must preserve the contents of those pages as part of the process state.
So when you use brk to increase your memory adress, when return back, you have to use the brk a negtive value, so brk only can return the most recently memory block you allocated, when you call malloc(huge), malloc(small), free(huge). the huge cannot be returned back to system, you can only maintain a list of fragmentation for this process, so the huge is actually hold by this process. this is the drawback of brk.
but the mmap and munmap can avoid this.

I want to emphasize another view point.
malloc is system function that allocate memory.
You do not really need to debug it, because in some implementations, it might give you memory from static "arena" (e.g. static char array).
In some other implementations it may just return null pointer.
If you want to see what mallow really do, I suggest you look at
http://gee.cs.oswego.edu/dl/html/malloc.html
Linux gcc malloc is based on this.
You can take a look at jemalloc too. It basically uses same brk and mmap, but organizes the data differently and usually is "better".
Happy researching.

Does malloc without corresponding free always produce a memory leak?

Does malloc without corresponding free always produce a memory leak, or are there situations when it doesn't?

It depends on how you define "memory leak". If you define it as having any outstanding objects with allocated storage duration at the time of program exit, then yes, it's a leak. This is what tools like valgrind report. However, it's not a useful definition at all.
My definition of memory leak is roughly an unbounded increase in total memory consumption of the program over its lifetime despite having a bounded working set. For example, if I always have at most 10 tabs open in my browser, to the same 10 sites, but memory usage keeps increasing unboundedly, that's a memory leak. On the other hand, a program that allocates a buffer to load a whole file into memory, loads the file, prints it in reverse, then exits without freeing the memory does not have a memory leak.
One particular important case where a malloc without free is not only not-a-leak but absolutely necessary (for general code that can't make assumptions about the whole program it's running in) is any use of runtime-allocated constant tables whose generation is controlled by call_once. No matter how late you tried to free such tables, it would be possible for code (in another thread, or an atexit handler, etc.) to attempt to access it after free, and call_once type interfaces intentionally do not provide any way to synchronize any access except first call (this is how they avoid introducing unwanted acquire barriers/synchronization cost at every read).
Note that the concept of "working set" here is somewhat subjective and highly load-bearing. Often memory leaks are a matter of the software still considering something part of its working set when the user no longer considers it so.

A memory leak is a situation where a program allocates memory, does not free it when it is no longer in use, and loses track of its address (the value of the pointer returned by malloc, calloc or realloc).
Since the pointer is lost, the memory can no longer be freed and will stay attached to the program until it exits.
If the program exits, all memory associated with it is reclaimed by the operating system (except for rare circumstances beyond the scope of this question), so the memory leak has no consequences.
If the program executes for a long time, potentially until the system is shut down, unused blocks of memory attached to the program cannot be used for other purposes. If the amount of memory thus wasted is small, again no consequences are expected.
Conversely, if the program keeps allocating more memory and not free it, the system will run out of memory for the program to use and either return NULL for an allocation request or become unstable as it uses virtual memory to honor the requests at the expense of other programs and at the cost of lengthy swapping operations to storage devices or other compression techniques. At some point the system may kill processes at random to try and recover usable memory.
Such memory leaks are problematic and must be avoided. They are especially problematic in library functions that may be used in programs that run for extended sessions, such as web browsers, email readers, file managers, media players, program managers...
Unlike other programming languages, C does not have an embedded garbage collector that could determine which allocated blocks of memory are still in use, so it is the programmer's responsibility to keep track of all allocated blocks and free them as soon as possible. Advanced tools such as valgrind can be used to verify if all allocated blocks have been freed upon program exit. Although it is not necessary to free the memory at exit, it is good programming practice and a good way to determine if all allocated memory blocks have been accounted for.

The answer might depend on the implementation of malloc, but generally there are two cases where malloc is expected to not produce a memory leak:
when you pass it 0 as the size parameter, some implementations will just return NULL and not allocate anything, while others will return an unique pointer and even though this counts as zero bytes allocated you will leak about 64 bytes anyway in bookkeeping records.
When Out Of Memory happens. Check out the global variable errno to have specific values, normally ENOMEM to see if it failed. In such cases mallocreturns NULL as well.

The standard does not require a memory leak at all. So per se, there is no situation that will guarantee a memory leak.
On the other hand the standard does not require a memory leak to not happen in the scenario you mention either.
In most situations, all allocated memory will be freed when the program exits. But there may be exceptions on some systems, especially embedded ones. If this is vital to your program, you should not rely on it.

C, Xcode and memory

Why after executing next C-code Xcode shows 20KB more than it was?
void *p = malloc(sizeof(int)*1000);
free(p);
Do I have to free the memory another way? Or it's just an Xcode mistake?

When you say "Xcode shows 20KB more than it was", I presume you mean that the little bar graph goes up by 20kB.
When you malloc an object, the C library first checks the process's address space to see if there is enough free space to satisfy the request. If there isn't enough memory, it goes to the operating system to ask for more virtual memory to be allocated to the process. The graph in Xcode measures the amount of virtual memory the process has.
When you free an object, the memory is never returned to the operating system, rather, it is "just" placed on the list of free blocks for malloc to reuse. I put the word "just" in scare quotes because the actual algorithm can be quite complex, in order to minimise fragmentation of the heap and the time taken to malloc and free blocks. The reason memory is never returned to the operating system is that it is very expensive to do system calls to the OS to get and free memory.
Thus, you will never see the memory usage of the process go down. If you malloc a Gigabyte of memory and then free it, the process will still appear to be using a Gigabyte of virtual memory.
If you want to see if your program really leaks, you need to use the leaks profile tool. This intercepts malloc and free calls so it knows which blocks are still nominally in use and which have been freed.

Can I 'reserve' memory somehow, so as to ensure malloc doesn't fail for this reason

Is it possible to 'reserve' memory before a malloc() call? In other words, can I do something (perhaps OS-specific) which ensures there is a certain amount of free memory available, so that you know that your next malloc() (or realloc() etc.) call won't return NULL due to lack of memory?
The 'reservation' or 'pre-allocation' can fail just like a malloc, but if it succeeds, I want to be sure my next malloc() succeeds.
Notes:
Yes, I know, I want to allocate memory before allocating memory. That's exactly right. The thing is the later allocations are not really under my control and I want to be able to assume they succeed.
Bonus points for an answer regarding multi-threaded code as well.
My motivation: I was considering adopting the use of glib for my C development, but apparently it abort()s when it fails to allocate memory, and that's not acceptable to me.
Perhaps a solution which dynamically replaces the malloc symbol with something else? Or the symbol for the function wrapping the sbrk system call?

With glibc you can hook the allocation functions:
https://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html
Now that you control memory allocation in your program, you can do what you like, including writing a function to reserve a (possibly thread-local, since you asked about multi-threading) chunk of memory from the system that future calls to your malloc and realloc hooks will use to return memory.
Obviously, you need to somehow know in advance an upper bound how much memory will be required by the series of malloc calls that you need to not fail.

Back in the old Mac Toolbox days it was extremely common to use a chunk of memory called a "rainy-day fund." You'd allocate enough memory such that if you freed it, there'd be enough free memory to throw up a dialog box explaining that the app had run out of memory, save your work, and exit. Then you'd keep that pointer around until malloc() returned null, and at least you'd be guaranteed to be able to deal with it gracefully.
That was on a 100% real-memory system, though, and things these days are very different. Still, if we're talking about those small and simple real-memory systems that still exist, then a similar strategy still makes sense.

I realize the following does not directly answer your question with respect to malloc(). It is instead an attempt to offer up another avenue that might be applicable to your situation.
For a few years I was dealing with certified embedded systems. Two of the constraints were that 1) we were forbidden to free memory and 2) we were forbidden from allocating memory beyond a certain point during the initialization process. This was because fragmentation that could result from dynamic memory allocations and deallocations made it too costly to certify (and guarantee that allocations would succeed).
Our solution was to allocate pools of memory during the early initialization process. The blocks of memory handled by a given pool would all be the same size, thereby avoiding the fragmentation issue. Different pools would handle differently sized memory blocks for a different purpose. This meant that we had to allocate enough memory up front for our worst case memory consumption scenario as well as manage those pools ourselves.
Hope this helps.

Obviously there's no magic way for your program to ensure your system has an arbitrary amount of memory, but you can get the memory as soon as your process starts, so that it won't fail unexpectedly part way through the work/day when it'll be a right pain.
On some OSes, simply doing a big malloc then freeing the memory immediately will still have called sbrk or similar OS function to grow your process memory, but even that's not a great solution because getting virtual address space is still a ways short of getting physical memory to back it when needed, so you'd want to write through that memory with some noise values, then even if it's swapped out to disk while unused you can expect that the virtual memory of the system is committed to your memory needs and will instead deny other programs (or smaller new/malloc requests you make later ;-P) memory should the system be running short.
Another approach is to seek an OS specific function to insist on locking memory pages in physical memory, such as mlock(2) on Linux.
(These kind of "I'm the most important thing on the server" assumptions tend to make for a fragile system once a few running programs have all taken that attitude....)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight