Improve heap memory usage reporting in a Windows application (C)

Using the following code, I have observed that increases in memory usage are registered fairly quickly in the member variables of PROCESS_MEMORY_COUNTERS, but as memory is freed in a process that continues to run, the decrease in memory usage does not seem to register:
#include <windows.h>
#include <psapi.h>

SIZE_T GetMemUsage(void)
{
    HANDLE hProcess;
    PROCESS_MEMORY_COUNTERS pmc;
    DWORD processID = GetCurrentProcessId();

    hProcess = OpenProcess(PROCESS_QUERY_INFORMATION |
                           PROCESS_VM_READ,
                           FALSE, processID);
    GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc));
    CloseHandle(hProcess);
    return pmc.PeakWorkingSetSize;   /* note: this is the peak, not the current, working set */
}
Using Task Manager, I can see a drop in physical memory size almost immediately after terminating a process, but memory freed while a process keeps running does not register (even after a substantial delay before calling the above function).
I am not sure if my observations are due to the way free() works (i.e. maybe it does not notify the OS that the memory has been freed), or if Windows is just slow to reflect it in the PROCESS_MEMORY_COUNTERS struct.
In any case, is anyone aware of a better technique to obtain more timely and accurate reports of actual memory usage within (or for) a running process?
I have also tried looking at pmc.WorkingSetSize and pmc.PagefileUsage, with the same results.

There are several things going on here.
Memory managers tend to keep memory around after you've freed it. The theory is that you're likely to need more memory soon, and giving you back memory you've used recently is cheaper than getting more memory from the operating system. Thus, from the operating system's perspective, your program is still using memory you've freed.
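If you are building against the Microsoft C runtime, one thing you can try (a hedged sketch; _heapmin() is only a hint, and the wrapper name here is just illustrative) is asking the CRT allocator to hand unused heap blocks back to the OS so that frees become visible sooner:
#include <malloc.h>

void TryReturnFreedHeapToOS(void)
{
    /* _heapmin() asks the CRT to release unused heap memory back to the OS;
       it returns 0 on success */
    _heapmin();
}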
Working set size is an indirect measure of the memory your program is using. The operating system determines the working set size based on memory demand from your program and based on memory pressure from everything else running on the system. As your program starts to use more memory, the operating system will (typically) quickly grow the process's working set to accommodate it. If your program releases a bunch of memory back to the operating system, that doesn't necessarily mean the OS will trim the process's working set size. In fact, if there's plenty of RAM available, it probably won't bother. But if lots of other processes need memory, and you start to reach the capacity of your RAM, then the OS will start trimming the working set of processes that have more than they currently need.
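As a hedged sketch of that trimming mechanism: you can explicitly ask Windows to trim your own process's working set, after which WorkingSetSize is more likely to reflect memory you have actually freed (the function name is just illustrative):
#include <windows.h>

void TrimOwnWorkingSet(void)
{
    /* (SIZE_T)-1 for both limits tells the OS to remove as many pages as
       possible from the working set; the pseudo-handle returned by
       GetCurrentProcess() does not need CloseHandle */
    SetProcessWorkingSetSize(GetCurrentProcess(), (SIZE_T)-1, (SIZE_T)-1);
}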
It looks like your code sample is reporting peak working set size rather than the current working set size. The peak tells you the largest working set size the process reached during its lifetime, even if the current working set is much smaller.
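A minimal sketch of reporting current rather than peak figures, using PROCESS_MEMORY_COUNTERS_EX so the process's commit charge (PrivateUsage) is available alongside WorkingSetSize; link against psapi (psapi.lib, or -lpsapi with MinGW). The function name is just illustrative:
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

void PrintCurrentMemUsage(void)
{
    PROCESS_MEMORY_COUNTERS_EX pmc;

    /* GetCurrentProcess() returns a pseudo-handle, so no OpenProcess/CloseHandle needed */
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             (PROCESS_MEMORY_COUNTERS *)&pmc, sizeof(pmc)))
    {
        printf("WorkingSetSize: %llu bytes\n", (unsigned long long)pmc.WorkingSetSize);
        printf("PrivateUsage:   %llu bytes\n", (unsigned long long)pmc.PrivateUsage);
    }
}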

Related

Freeing memory after the program return EXIT_FAILURE and terminates [duplicate]

Let's say I have the following C code:
#include <stdlib.h>

int main(void) {
    int *p = malloc(10 * sizeof *p);
    *p = 42;
    return 0; // Exiting without freeing the allocated memory
}
When I compile and execute that C program, i.e. after allocating some space in memory, will the memory I allocated still be allocated (i.e. basically taking up space) after I exit the application and the process terminates?
It depends on the operating system. The majority of modern (and all major) operating systems will free memory not freed by the program when it ends.
Relying on this is bad practice and it is better to free it explicitly. The issue isn't just that your code looks bad. You may decide you want to integrate your small program into a larger, long running one. Then a while later you have to spend hours tracking down memory leaks.
Relying on a feature of an operating system also makes the code less portable.
In general, modern general-purpose operating systems do clean up after terminated processes. This is necessary because the alternative is for the system to lose resources over time and require rebooting due to programs which are poorly written or simply have rarely-occurring bugs that leak resources.
Having your program explicitly free its resources anyway can be good practice for various reasons, such as:
If you have additional resources that are not cleaned up by the OS on exit, such as temporary files or any kind of change to the state of an external resource, then you will need code to deal with all of those things on exit, and this is often elegantly combined with freeing memory.
If your program starts having a longer lifetime, then you will not want the only way to free memory to be to exit. For example, you might want to convert your program into a server (daemon) which keeps running while handling many requests for individual units of work, or your program might become a small part of a larger program.
However, here is a reason to skip freeing memory: efficient shutdown. For example, suppose your application contains a large cache in memory. If when it exits it goes through the entire cache structure and frees it one piece at a time, that serves no useful purpose and wastes resources. Especially, consider the case where the memory pages containing your cache have been swapped to disk by the operating system; by walking the structure and freeing it you're bringing all of those pages back into memory all at once, wasting significant time and energy for no actual benefit, and possibly even causing other programs on the system to get swapped out!
As a related example, there are high-performance servers that work by creating a process for each request, then having it exit when done; by this means they don't even have to track memory allocation, and never do any freeing or garbage collection at all, since everything just vanishes back into the operating system's free memory at the end of the process. (The same kind of thing can be done within a process using a custom memory allocator, but requires very careful programming; essentially making one's own notion of “lightweight processes” within the OS process.)
My apologies for posting so long after the last post to this thread.
One additional point. Not all programs make it to graceful exits. Crashes and ctrl-C's, etc. will cause a program to exit in uncontrolled ways. If your OS did not free your heap, clean up your stack, delete static variables, etc, you would eventually crash your system from memory leaks or worse.
An interesting aside to this: crashes/breaks in Ubuntu, and I suspect all other modern OSes, do have problems with "handled" resources. Sockets, files, devices, etc. can remain "open" when a program ends/crashes. It is also good practice to close anything with a "handle" or "descriptor" as part of your cleanup prior to a graceful exit.
I am currently developing a program that uses sockets heavily. When I get stuck in a hang I have to ctrl-C out of it, thus stranding my sockets. I added a std::vector to collect a list of all opened sockets and a sigaction handler that catches SIGINT and SIGTERM. The handler walks the list and closes the sockets. I plan on making a similar cleanup routine for use before throws that will lead to premature termination.
Anyone care to comment on this design?
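For what it's worth, a minimal C sketch of the kind of handler described above (the fixed-size table and all names here are hypothetical, not the poster's actual code; only async-signal-safe calls such as close() and _exit() are used inside the handler):
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_TRACKED 64
static int tracked_fds[MAX_TRACKED];
static int tracked_count = 0;

/* call after every successful socket()/accept() */
static void track_fd(int fd)
{
    if (tracked_count < MAX_TRACKED)
        tracked_fds[tracked_count++] = fd;
}

static void cleanup_handler(int sig)
{
    (void)sig;
    for (int i = 0; i < tracked_count; i++)
        close(tracked_fds[i]);     /* close() is async-signal-safe */
    _exit(EXIT_FAILURE);           /* so is _exit() */
}

static void install_cleanup_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cleanup_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
}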
What's happening here (in a modern OS), is that your program runs inside its own "process." This is an operating system entity that is endowed with its own address space, file descriptors, etc. Your malloc calls are allocating memory from the "heap", or unallocated memory pages that are assigned to your process.
When your program ends, as in this example, all of the resources assigned to your process are simply recycled/torn down by the operating system. In the case of memory, all of the memory pages that are assigned to you are simply marked as "free" and recycled for the use of other processes. Pages are a lower-level concept than what malloc handles-- as a result, the specifics of malloc/free are all simply washed away as the whole thing gets cleaned up.
It's the moral equivalent of, when you're done using your laptop and want to give it to a friend, you don't bother to individually delete each file. You just format the hard drive.
All this said, as all other answerers are noting, relying on this is not good practice:
You should always be programming to take care of resources, and in C that means memory as well. You might end up embedding your code in a library, or it might end up running much longer than you expect.
Some OSs (older ones and maybe some modern embedded ones) may not maintain such hard process boundaries, and your allocations might affect others' address spaces.
Yes. The OS cleans up resources. Well ... old versions of NetWare didn't.
Edit: As San Jacinto pointed out, there are certainly systems (aside from NetWare) that do not do that. Even in throw-away programs, I try to make a habit of freeing all resources just to keep up the habit.
Yes, the operating system releases all memory when the process ends.
It depends; operating systems will usually clean it up for you, but if you're working on, for instance, embedded software, then it might not be released.
Just make sure you free it; it can save you a lot of time later when you might want to integrate it into a larger project.
That really depends on the operating system, but for all operating systems you'll ever encounter, the memory allocation will disappear when the process exits.
I think direct freeing is best. Undefined behaviour is the worst thing, so if you have access while it's still defined in your process, do it; there are lots of good reasons people have given for it.
As to where (or whether), I found that in Windows 98 the real question was 'when' (I didn't see a post emphasising this). A small template program (for MIDI SysEx input, using various malloc'd spaces) would free memory in the WM_DESTROY handler of the WndProc, but when I transplanted this into a larger program it crashed on exit. I assumed this meant I was trying to free what the OS had already freed during a larger cleanup. If I did it on WM_CLOSE, then called DestroyWindow(), it all worked fine: instant clean exit.
While this isn't exactly the same as MIDI buffers, there is similarity in that it is best to keep the process intact, clean up fully, then exit. With modest memory chunks this is very fast. I found that many small buffers worked faster in operation and cleanup than fewer large ones.
Exceptions may exist, as someone noted, such as avoiding hauling large memory chunks back out of a swap file on disk; but even that may be minimised by keeping more, and smaller, allocated spaces.
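A rough Win32 sketch of the ordering described above: free your own buffers when handling WM_CLOSE, while the process and window state are still intact, then let DestroyWindow()/WM_DESTROY handle the rest (g_buffers and g_bufferCount are hypothetical names):
#include <windows.h>
#include <stdlib.h>

extern void *g_buffers[];     /* hypothetical malloc'd buffers */
extern int   g_bufferCount;

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_CLOSE:
        for (int i = 0; i < g_bufferCount; i++)
            free(g_buffers[i]);   /* clean up before the window is destroyed */
        DestroyWindow(hwnd);
        return 0;
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}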


heap overflow affecting other programs

I was trying to create the condition for malloc to return a NULL pointer. In the program below, although I can see malloc returning NULL, once the program is forcibly terminated I see that all other programs become slow, and finally I had to reboot the system. So my question is whether the memory for the heap is shared with other programs? If not, other programs should not have been affected. Is the OS not allocating a certain amount of memory at the time of execution? I am using Windows 10 and MinGW.
#include <stdio.h>
#include <stdlib.h>

void mallocInFunction(void)
{
    int *ptr = malloc(500);
    if (ptr == NULL)
    {
        printf("Memory Could not be allocated\n");
    }
    else
    {
        printf("Allocated memory successfully\n");
    }
}

int main(void)
{
    while (1)
    {
        mallocInFunction();   /* never frees, so allocations pile up until malloc fails */
    }
    return 0;
}
So my question is whether the memory for the heap is shared with other programs?
Physical memory (RAM) is a resource that is shared by all processes. The operating system makes decisions about how much RAM to allocate to each process and adjusts that over time.
If not, other programs should not have been affected. Is the OS not allocating a certain amount of memory at the time of execution?
At the time the program starts executing, the operating system has no idea how much memory the program will want or need. Instead, it deals with allocations as they happen. Unless configured otherwise, it will typically do everything it possibly can to allow the program's allocation to succeed because presumably there's a reason the program is doing what it's doing and the operating system won't try to second guess it.
... whether the memory for the heap is shared with other programs?
Well, the C standard doesn't exactly require a heap, but in the context of a task-switching, multi-user and multi-threaded OS, of course memory is shared between processes! The C standard doesn't require any of this, but this is all pretty common stuff:
CPU cache memory tends to be preferred for code that's executed often, though this might get swapped around quite a bit; that may or may not be swapped to a heap.
Task switching causes registers to be swapped to other forms of memory; that may or may not be swapped to a heap.
Entire pages are swapped to and from disk so that other programs can make use of them when your OS switches execution away from your program and to the other programs, and when it's your program's turn to execute again, among other reasons. This may or may not involve manipulating the heap.
FWIW, you're referring to memory that has allocated storage duration. It's best to avoid using terms like heap and stack, as they're virtually meaningless. The memory you're referring to is on a silicon chip, regardless of whether it uses a heap or a stack.
... Is the OS not allocating a certain amount of memory at the time of execution?
Speaking of silicon chips and execution, your OS likely only has control of one processor (a silicon chip which contains some logic circuits and memory, among other things I'm sure) with which to execute many programs! To summarise this post, yes, your program is most likely sharing those silicon chips with other programs!
On a tangential note, I don't think heap overflow means what you think it means.
Your question cannot be answered in the context of C, the language. For C, there's no such thing as a heap, a process, ...
But it can be answered in the context of operating systems. Even a bit generically because many modern multitasking OSes do similar things.
Given a modern multitasking OS, it will use virtual address spaces for each process. The OS manages a fixed size of physical RAM and divides this into pages, when a process needs memory, such pages are mapped into the process' virtual address space (typically using a different virtual address than the physical one). So when all memory pages are claimed by the OS itself and by the processes running, the OS will typically save some of these pages that are not in active use to disk, in a swap area, in order to serve this page as a fresh page to the next process requesting one. But when the original page is touched (and this is typically the case with free(), see below), it must first be loaded from disk again, but to have a free page for this, another page must be saved to swap space.
This is, like all disk I/O, slow, and it's probably what you see happening here.
Now to fully understand this: what does malloc() do? It typically requests that the operating system grow the memory of its own process (and if necessary, the OS does this by mapping another page), writes some information into that new memory about the block being requested (so free() can work correctly later), and ultimately returns a pointer to a block that is free for the program to use. free() uses the information written by malloc() and modifies it to indicate that this block is free again; it typically can't give any memory back to the OS, because there are other malloc()'d blocks in the same page. It will give memory back when possible, but that's the exception in a typical scenario where dynamic allocations are heavily used.
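A tiny hedged illustration of that bookkeeping: the freed block stays with the allocator, so a following malloc() of the same size often (though it is in no way guaranteed) hands back the very same address:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *a = malloc(64);
    printf("first:  %p\n", a);
    free(a);                 /* returned to the allocator's free list, not to the OS */

    void *b = malloc(64);    /* frequently reuses the block just freed */
    printf("second: %p\n", b);
    free(b);
    return 0;
}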
So, the answer to your question is: Yes, the RAM is shared because there is only one set of physical RAM. The OS does the best it can to hide that fact and virtualize RAM, but if a process consumes all that is there, this will have visible effects.
malloc() is not a system call but a C library function. When a program asks for memory via malloc(), the library in turn uses system calls such as brk()/sbrk() or mmap() to obtain pages from the OS.
Please keep in mind that the memory you get is all virtual in nature; that means that even if you have 3 GB of physical RAM, you can allocate far more than that. This works through a concept called 'paging', where the system moves data between secondary storage (HDD/SSD) and main memory (RAM).
With this scheme, actually running out of memory is usually quite rare, but a program like the one above, which deliberately pushes against the system's limits, can make it happen.
Now, why do the other programs hang or become slow? Because they all share the same operating system, and the system is starved for resources. At some point the system may even crash and need to be rebooted.
Hope this helps.

Yet another Memory Leak Issue (memory is still gone when program terminates)- C program on SLES

I run my C program on SUSE Linux Enterprise; it compresses several thousand large files (between 10 MB and 100 MB in size), and it gets slower and slower as it runs (it's running multi-threaded with 32 threads on an Intel Sandy Bridge board). When the program completes and is run again, it's still very slow.
When I watch the program running, I see that the memory is being depleted while the program runs, which you would think is just a classic memory leak problem. But, with a normal malloc()/free() mismatch, I would expect all the memory to return when the program terminates. But, most of the memory doesn't get reclaimed when the program completes. The free or top command shows Mem: 63996M total, 63724M used, 272M free when the program is slowed down to a halt, but, after the termination, the free memory only grows back to about 3660M. When the program is rerun, the free memory is quickly used up.
The top program only shows that the program, while running, is using at most 4% or so of the memory.
I thought that it might be a memory fragmentation problem, but, I built a small test program that simulates all the memory allocation activity in the program (many randomized aspects were built in - size/quantity), and it always returns all the memory upon completion. So, I don't think that's it.
Questions:
Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but, I don't think I have that problem.
Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
If the program only shows a 4% memory usage, will something like valgrind find this problem?
Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
No, malloc and free, and even mmap are harmless in this respect, and when the process terminates the OS (SUSE Linux in this case) claims all their memory back (unless it's shared with some other process that's still running).
What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but, I don't think I have that problem.
Like malloc/free and mmap, files opened by the process are automatically closed by the OS.
There are a few things which cause permanent memory leaks, like big pages, but you would certainly know about it if you were using them. Apart from that, no.
However, if you define memory loss as memory not marked 'free' immediately, then a couple of things can happen.
Writes to disk or mmap may be cached for a while in RAM. The OS must keep the pages around until it synchs them back to disk.
Files READ by the process may remain in memory if the OS has nothing else to use that RAM for right now - on the reasonable assumption that it might need them soon and it's quicker to read the copy that's already in RAM. Again, if the OS or another process needs some of that RAM, it can be discarded instantly.
Note that as someone who paid for all my RAM, I would rather the OS used ALL of it ALL the time, if it helps in even the smallest way. Free RAM is wasted RAM.
The main problem with having little free RAM is when it is overcommitted, which is to say there are more processes (and the OS) asking for or using RAM right now than is available on the system. It sounds like you are using about 4 GB of RAM in your processes, which might be a problem (and remember the OS needs a good chunk too), but it sounds like you have plenty of RAM. Try running half the number of processes and see if it gets better.
Sometimes a memory leak can cause temporary overcommitment - it's a good idea to look into that. Try plotting the memory use of your program over time - if it rises continuously, then it may well be a leak.
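As a hedged sketch of that plotting suggestion on Linux: read the VmRSS line from /proc/self/status at regular intervals and log it (the helper name is hypothetical):
#include <stdio.h>

/* Returns the calling process's resident set size in kB, as reported by
   /proc/self/status (Linux-specific), or -1 on failure. Call periodically
   and log the result to plot memory use over time. */
static long resident_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
    {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}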
Note that forking a process creates a copy that shares the memory the original allocated - until both are closed or one of them 'exec's. But you aren't doing that.
Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
Yes, top and ps are perfectly reasonable ways to look at memory, in particular observe the RES field. Ignore the VIRT field for now. In addition:
To see what the whole system is doing with memory, run:
vmstat 10
while your program is running and for a while after, and look at what happens to the ---memory--- columns.
In addition, after your process has finished, run
cat /proc/meminfo
And post the results in your question.
If the program only shows a 4% memory usage, will something like valgrind find this problem?
Probably, but it can be extremely slow, which might be impractical in this case. There are plenty of other tools which can help, such as Electric Fence and others, which do not slow your program down noticeably. I've even rolled my own in the past.
malloc()/free() work on the heap. This memory is guaranteed to be released to the OS when the process terminates. It is possible to leak memory even after the allocating process terminates using certain shared memory primitives (e.g. System V IPC). However, I don't think any of this is directly relevant.
Stepping back a bit, here's output from a lightly-loaded Linux server:
$ uptime
03:30:56 up 72 days, 8:42, 2 users, load average: 0.06, 0.17, 0.27
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24104      23452        652          0      15821        978
-/+ buffers/cache:       6651      17453
Swap:         3811          5       3806
Oh no, only 652 MB free! Right? Wrong.
Whenever Linux accesses a block device (say, a hard drive), it looks for any unused memory, and stores a copy of the data there. After all, why not? The data's already in RAM, some program clearly wanted that data, and RAM that's unused can't do anyone any good. If a program comes along and asks for more memory, the cached data is discarded to make room -- until then, might as well hang onto it.
The key to this free output is not the first line, but the second. Yes, 23.4 GB of RAM is being used -- but 17.4 GB is available for programs that want it. See Help! Linux ate my RAM! for more.
I can't say why the program is getting slower, but having the "free memory" metric steadily drop down to nothing is entirely normal and not the cause.
The operating system only makes as much memory free as it absolutely needs. Making memory free is wasted effort if the memory is later used normally -- it's more efficient to just directly transition the memory from one use to another than to make the memory free just to have to make it unfree later.
The only thing the system needs free memory for is operations that require memory that can't switch used memory from one purpose to another. This is a very small set of unusual operations such as servicing network interrupts.
If you type this command sysctl vm.min_free_kbytes, the system will tell you the number of KB it needs free. It's likely less than 100MB. So having any amount more than that free is perfectly fine.
If you want more of your memory free, remove it from the computer. Otherwise, the operating system assumes that there is zero cost to using it, and thus zero benefit to making it free.
For example, consider the data you wrote to disk. The operating system could make the memory that was holding that data free. But that's a double loss. If the data you wrote to disk is later read, it will have to read it from disk rather than just grabbing it from memory. And if that memory is later needed for some other purpose, it will just have to undo all the work it went through making it free. Yuck. So if the system doesn't absolutely need free memory, it won't make it free.
My guess would be the problem is not in your program, but in the operating system. The OS keeps a cache of recently used files in memory on the assumption that you are going to access them again. It does not know with certainty what files are going to be needed, so it can end up deciding to keep the wrong ones at the expense of the ones you wish it was keeping.
It may be keeping the output files of the first run cached when you do your second run, which prevents it from effectively using the cache on the second run. You can test this theory by deleting all files from the first run (which should free them from cache) and seeing if that makes the second run go faster.
If that doesn't work, try deleting all the input files for the first run as well.
Answers
1. Yes; there is no requirement in C or C++ that memory which was never freed be released back to the OS.
2. Do you have memory-mapped files, open file handles for deleted files, etc.? Linux will not delete a file until all references to it are released. Also, Linux will cache files in memory in case they need to be read again; file cache memory usage can be ignored, as the OS will deal with it.
3. No.
4. Maybe; valgrind will highlight cases where memory is not freed.

Why does my c program not free memory as it should?

I have made a program in C and wanted to see how much memory it uses, and I noticed that the memory usage grows while using it normally (at launch it uses about 250 KB and now it's at 1.5 MB). AFAIK I freed all the unused memory, and after some time (hours) the app uses less memory. Could it be possible that the freed memory just goes from the 'active' memory to the 'wired' or something, so it's released when free space is needed?
BTW, my machine runs Mac OS X, if this is important.
How do you determine the memory usage? Have you tried using valgrind to locate potential memory leaks? It's really easy. Just start your application with valgrind, run it, and look at the well-structured output.
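For instance, a typical invocation (the program name here is just a placeholder):
valgrind --leak-check=full ./yourprogram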
If you're looking at the memory usage from the OS, you are likely to see this behavior. Freed memory is not automatically returned to the OS, but normally stays with the process, and can be malloced later. What you see is usually the high-water mark of memory use.
As Konrad Rudolph suggested, use something that examines the memory from inside the process to look for memory leaks.
The C library does not usually return "small" allocations to the OS. Instead it keeps the memory around for the next time you use malloc.
However, many C libraries will release large blocks, so you could try doing a malloc of several megabytes and then freeing it.
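A rough, hedged experiment along those lines (whether the memory actually goes back to the OS depends on the allocator's large-allocation threshold):
#include <stdlib.h>

int main(void)
{
    size_t big = 8 * 1024 * 1024;          /* 8 MB: large enough that many
                                              allocators satisfy it with mmap/vm_allocate */
    char *p = malloc(big);
    if (p != NULL)
    {
        for (size_t i = 0; i < big; i += 4096)
            p[i] = 1;                      /* touch the pages so they are really committed */
        free(p);                           /* large blocks are often returned to the OS here,
                                              unlike small ones */
    }
    return 0;
}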
On OSX you should be able to use MallocDebug.app if you have installed the Developer Tools from OSX (as you might have trouble finding a port of valgrind for OSX).
/Developer/Applications/PerformanceTools/MallocDebug.app
I agree with what everyone has already said, but I do want to add just a few clarifying remarks specific to os x:
First, the operating system actually allocates memory using vm_allocate, which allocates entire pages at a time. Because there is a cost associated with this, as others have stated, the C library does not just deallocate the page when you return memory via free(3). Specifically, if there are other allocations within the memory page, it will not be released. Currently, memory pages are 4096 bytes in Mac OS X. The number of bytes in a page can be determined programmatically with sysctl(2) or, more easily, with getpagesize(2). You can use this information to optimize your memory usage.
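For instance, a small sketch of querying the page size as suggested above:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* 4096 bytes on the Mac OS X systems described above, but don't hard-code it */
    printf("page size: %d bytes\n", getpagesize());
    return 0;
}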
Secondly, user-space applications do not wire memory. Generally the kernel wires memory for critical data structures. Wired memory is basically memory that can never be swapped out and will never generate a page fault. If, for some reason, a page fault is generated in a wired memory page, the kernel will panic and your computer will crash. If your application is increasing your computer's wired memory by a noticeable amount, it is a very bad sign. It generally means that your application is doing something that significantly grows kernel data structures, like allocating and not reaping hundreds of threads or child processes. (Of course, this is a general statement... in some cases, this growth is expected, like when developing a virtual host or something like that.)
In addition to what the others have already written:
malloc() allocates bigger chunks from the OS and hands them out in smaller pieces as you malloc() them. When free()ing, the piece first goes into a free list, for quick reuse by another malloc if the size fits. It may at this time be merged with another free item, to form bigger free blocks, to avoid fragmentation (a whole bunch of different algorithms exist there, from free lists to binary-sized fragments to hashing and whatnot).
When freed pieces arrive such that multiple fragments can be joined, free() usually does this, but sometimes fragments remain, depending on the size and order of the malloc() and free() calls. Also, only when a big free block like this has been created will it (sometimes) be returned to the OS as a block. But usually malloc() keeps things in its pocket, depending on the allocated/free ratio (many heuristics exist, and compile-time or flag options are often available).
Notice that there is not ONE malloc/free algorithm. There is a whole bunch of different implementations (and literature). It is highly system, OS and library dependent.
