Checking for memory leaks in a running program - c

I have a question out of curiosity relating to checking for memory leaks.
Being someone who has used valgrind frequently to check for memory leaks in my code for the last year or two, I suddenly came to think that it only detects lost/unfreed memory after the life of the program.
So, in light of that, I was thinking that if you have a long-running program which malloc()'s intermittently and doesn't free() until the application exits, then the potential to eat memory (not necessarily through leaks) is huge and isn't observable using these tools because they only check after the programs lifetime. Are there GDB-like tools which can stop an application while running and check for memory which is and isn't referenced at an instance in the life of the application?

Are there GDB-like tools which can stop an application while running and check for memory which is and isn't referenced at an instance in the life of the application?
Yes: Valgrind.
Specifically, the SVN version of Valgrind has a gdbserver stub embedded into it.
This allows you to do all kinds of cool debugging, not possible before:
You can run program under valgrind and have GDB breakpoints at the same time
You can ask valgrind: is this memory allocated? was this variable initialized?
You can ask valgrind: what new leaks happened since last time I asked for leaks?
I think you can also ask it to list not-leaked new allocations.

What I have done, which was not a tool, for a long-running socket-based server, was to do an operation, but before that print out the amount of free memory, then print it out after my operation, and see if there was any difference.
I knew that in theory my server should have returned all memory used on each call to the server, so if I was the only one calling it, it shouldn't use much more memory than when it started.
You may find that some memory was needed on the first call, so you may want to make several calls, so everything is initialized, then you can do checks like this.
The other option is to create a list of all memory you are mallocing, then when you free it delete from the list that node, and at the end, see which ones still haven't been freed.

That is not generally possible in a language that supports pointer arithmetic, since for example - you could cast an pointer to an integer and back. See http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/pointer.html

Leaked memory is defined as memory that is not referenced by anything in the program.
If you malloced memory and somewhere in your data there is a pointer pointing to that memory it isn't "lost" as far as any automatic check can now.
However, if you allocated memory, never free'ed it but you don't have any pointer pointing to it you have most likely leaked that memory - as there is no way for you to reference it.
Programs like valgrind can find leaks of the kind described above (lost of reference). AFAIK nothing can find "logical" leaks where you still hold a reference to the memory.

Related

malloc causes the application to crash and show memory map [duplicate]

I have a large body of legacy code that I inherited. It has worked fine until now. Suddenly at a customer trial that I cannot reproduce inhouse, it crashes in malloc. I think that I need to add instrumentation e.g on top of malloc I have my own malloc that stores some meta information about each malloc e.g. who has made the malloc call. When it crashes, I can then look up the meta information and see what was happening. I had done something similar years ago but cannot recall it now...I am sure people have come up with better ideas. Will be glad to have inputs.
Thanks
Is memory allocation broken?
Try valgrind.
Malloc is still crashing.
Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, that itself does not cause a segmentation fault, is usually the result of an array access outside of the array's bounds. This is usually nowhere near the point where you call malloc.
malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn't and starts interpreting random memory as part of the heap. Eventually its luck runs out.
Note that free can have the same thing happen, if the block being released or the free block list is messed up.
How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.
So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or that the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data.
Try Asan
AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds:
Use after free (dangling pointer dereference)
Heap buffer overflow
Stack buffer overflow
Global buffer overflow
Use after return
Use after scope
Initialization order bugs
Memory leaks
Please find the links to know more and how to use it
https://github.com/google/sanitizers/wiki/AddressSanitizer and
https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
I know this is old, but issues like this will continue to exist as long as we have pointers. Although valgrind is the best tool for this purpose, it has a steep learning curve and often the results are too intimidating to understand.
Assuming you are working on some *nux, another tool I can suggest is electricfence. Quote:
Electric Fence helps you detect two common programming bugs:
software that overruns the boundaries of a malloc() memory allocation,
software that touches a memory allocation that has been released by free().
Unlike other malloc() debuggers, Electric Fence will detect read accesses
as well as writes, and it will pinpoint the exact instruction that causes
an error.
Usage is amazingly simple. Just link your code with an additional library lefence
When you run the application, a corefile will be generated when memory is corrupted, instead of when corrupted memory is used.

Free() before return 0;

What happen if I end the execution by passing return 0; after using a malloc and without freeing the part of memory allocated?
int * var;
var = (int *)malloc(sizeof(int)) ;
free(var) ;
return 0;
The program contains a memory leak, as explained here. To answer the question specifically, the effect of the memory leak depends on the environment; in the best case, nothing happens, in the worst case, the machine might crash sooner or later. The existence of the memory leak is to be regarded as a bug in any case.
It is implementation specific.
On most operating systems (notably desktop or server OSes, e.g. Linux, MacOSX, Windows...) once a process is terminated, every used resources is released, and that includes its virtual address space. Hence even unfreed heap memory is released.
In particular, if you code a quickly running program (e.g. you know that it always will run in less than few seconds), it might be easier to accept some memory leak (but if you do, please comment that in your code and/or documentation). And many real life programs are doing that (in particular the GCC compiler, and probably several Unix shells).
On the contrary, if you are coding a server (e.g. a database or compute server), you should avoid any memory leaks which would make the server's process RSS grow indefinitely (till some crash). You usually should take great care that every allocated heap memory to deal with one request gets free-d after replying to that request.
On some embedded operating systems, if you don't release all the resources explicitly (free all heap allocated memory, fclose all opened streams) and properly, you have a resource leak.
See also this related question. On many OSes (including Linux) you can use valgrind to hunt memory leak bugs. With a recent gcc you might use debugging options like -g -fsanitize=address
Read also (at least for the concepts and the terminology) something about garbage collection and about fragmentation. If programming in C, you might consider using Boehm's garbage collector.
There is however a very good practical reason to systematically free all previously malloc-ed memory: it is a good programming discipline, and it helps a lot using tools like valgrind which helps debugging genuine memory bugs. Also it makes your code more clean, and you could reuse it in some different contexts (e.g. as part of some library, usable in long-running processes).
If you don't free the memory, your application will have a memory leak. So if you don't free the memory and leave the application on for a couple of days, it will start to slow down, and eventually crash.

Debugging memory leak issues without any tool

Interviewer - If you have no tools to check how would you detect memory leak problems?
Answer - I will read the code and see if all the memory I have allocated has been freed by me in the code itself.
Interviewer wasn't satisfied. Is there any other way to do so?
For all the implementation defined below, one needs to write wrappers for malloc() & free() functions.
To keep things simple, keep track of count of malloc() & free(). If not equal then you have a memory leak.
A better version would be to keep track of the addresses malloc()'ed & free()'ed this way you can identify which addresses are malloc()'ed but not free()'ed. But this again, won't help much either, since you can't relate the addresses to source code, especially it becomes a challenge when you have a large source code.
So here, you can add one more feature to it. For eg, I wrote a similar tool for FreeBSD Kernel, you can modify the malloc() call to store the module/file information (give each module/file a no, you can #define it in some header), the stack trace of the function calls leading to this malloc() and store it in a data structure, along side the above information whenever a malloc() or free() is called. Use addresses returned by malloc() to match with it free(). So, when their's a memory leak, you have information about what addresses were not free()'ed in which file, what were the exact functions called (through the stack trace) to pin point it.
The way, this tool worked was, on a crash, I used to get a core-dump. I had defined globals (this data structure where I was collecting data) in kernel memory space, which I could access using gdb and retrieve the information.
Edit:
Recently while debugging a memeory leak in linux kernel, I came across this tool called kmemleak which implements a similar algorithm I described in point#3 above. Read under the Basic Algorithm section here: https://www.kernel.org/doc/Documentation/kmemleak.txt
My response when I had to do this for real was to build tools... a debugging heap layer, wrapped around the C heap, and macros to switch code to running against those calls rather than accessing the normal heap library directly. That layer included some fencepost logic to detect array bounds violations, some instrumentation to monitor what the heap was doing, optionally some recordkeeping of exactly who allocated and freed each block...
Another approach, of course, is "divide and conquer". Build unit tests to try to narrow down which operations are causing the leak, then to subdivide that code further.
Depending on what "no tools" means, core dumps are also sometimes useful; seeing the content of the heap may tell you what's being leaked, for example.
And so on....

Does every malloc call have to be freed

From what I understand because malloc dynamically assigns mem , you need to free that mem so that it can be used again.
What happens if you return a char* that was created using malloc (i.e. how are you supposed to free that)
If you leave the pointer as it is
and exit the application will it be
freed.(I cant find a definite answer on this , some say yes , some say no).
The caller has to free it (or arrange for it to be freed). This means that functions that create and return resources need to document exactly how it should be freed.
Most OSes will free the memory when the program exits, as part of the definition of a "process". The C standard doesn't care what happens, it's beyond the scope of the program. Not all OSes have a full process abstraction, but desktop-style OSes certainly do.
The main reasons to free it before that are:
If you free memory as soon as possible, often a long time before process exit, your program uses less memory total.
If you don't free it, and you later want to change your program into a routine within another program, that perhaps is called many times, then suddenly you require many times as much memory as before (memory leak).
There are debugging tools that will help you identify memory leaks, by warning you about memory that is still allocated when the program exits. These don't really help much if there's a lot of deliberately-leaked junk to wade through.
If you don't free it and you hit any problems, it's much harder to go back later and find all the memory that needs freeing, than it is to do it right in the first place.
There are so many cases where you do need to free the memory (to prevent huge memory use in long-running programs), that your default strategy must be to clean pretty much everything up anyway.
The vaguely plausible reasons not to free are:
Less code.
If you have squillions of blocks to free individually, immediately before program exit, then it might be much faster to let the OS drop the whole process.
Stuff which is created on demand and stored in globals might be quite difficult to clean up safely, if you don't know exactly where it's used. Think of some kind of cache that's populated as you go along, that might have MRU rules to limit how much memory it occupies, so it's not an unlimited leak. OK, so this is one bad thing (unrestricted globals) causing another bad thing (unfreed memory), but it's worth knowing about as a reason why you might see unfreed blocks in existing code, and you can't necessarily just go in and fix them.
The reasons for freeing almost always outweigh the reasons against.
If you have a pointer to memory created by malloc, freeing that memory, using that pointer, will do the right thing. Yes, there is some magic involved; this will be taken care of by your compiler.
Yes, if you ignore the memory freeing, and exit the application, the OS will release the memory. However, it's considered bad practice to leave it unfreed. The OS may not do the right thing (especially in embedded settings), or may not do it in a timely fashion. Also, if you're running your program continuously, you may end up consuming a growing amount of memory, eventually consuming it all, and running out of memory and crashing.
Yes. If you malloc, you need to free. You are guaranteeing memory leaks while your program is running if you don't free.
Free it.
Always.
Period.
Yes, every call to malloc() has to be matched with a call to free().
To answer your specific questions:
You have to explicitly document your API telling the user whether the returned pointer has to be free()'d
The OS will free all memory allocated to the process.
If you write the function yourself: Avoid doing that.
Instead, let the caller pass a buffer, let the caller specify the buffer's size and copy the data into that buffer. That way, you can use your function from other modules that don't use the same heap (other programming languages, different C runtime...)
If you for whatever reason can not use such an interface, specify in the function's documentation that the caller has to free the returned pointer after it is done with it.
If you are using a library function: Have a look at the documentation.
If the documentation states that you have to free, do so.
If the documentation states that you don't have to, it might be some global cleanup function that has to be called to free the module's resources.
Regarding your second question, freeing before exiting is recommended. Technically it wont hurt, but when you ever want to reuse your code in a bigger project, you will be thankful that you wrote the correct cleanup in the first place.
The C standard has no concept of the system environment outside of a single program's execution, so it cannot specify what happens "after the program exits". At the same time, nowhere does it make any requirement that memory obtained with malloc should or must be released with free before a call to exit or a return from main, and I think it's pretty clear that the intention is that exiting without manually freeing memory will not leave resources tied up - much like how calling exit without closing all files first automatically closes them (including flushing them).
Now, as for whether you should or should not call free, that depends a lot on your particular program.
Any library code should free any memory that it obtained purely for internal use as soon as possible.
A library which returns allocated objects to the calling program should always provide a corresponding call to free those objects.
A library which performs any allocations as part of a global initialization (note: this is a very bad design, but sometimes inevitable) should provide a way for the application to reverse that initialization and free everything that was allocated. This is especially important if the library might ever be loaded dynamically (even as a consequence of satisfying another dynamically-loaded library's dependencies).
So far I've only talked about library code. At this point, all that's left is allocations made by the application itself or on the application's behalf by libraries. My view, and I will admit that it is unorthodox, is that freeing such objects is not just unnecessary but harmful. The main reason I say this is that most long-lived applications will have accumulated quite a bit of allocated memory which they are not making significant use of (think of the undo buffer in a word processor or the history in a browser). On a moderately loaded system, much of this data has been swapped to disk by the time the application terminates. If you want to free it, you're going to end up walking all over swapped-out memory addresses tracking down all the pointers to free,
putting useless wear on the physical components of the hard drive
making the user wait for your application to exit
causing other still-in-use applications' data to get swapped out, making them run slower
All of this in the name of a ridiculous "you must free everything you allocate" rule.
For short-lived applications, it's not so much of a big deal, but you can often simplify the implementation of short-lived applications that perform a single linear task and exit if you don't bother freeing all the memory they allocate. Think of most unix command line utilities. Is there any use to writing the loops for sed to free all its compiled regular expressions before exiting? Couldn't programmers' time be spent on something more productive?
1) The same way you'd free the memory normally, i.e.
p = func();
//...
free(p);
The trick is in making sure that you always do it...
2) Generally speaking, yes. But you should still free any memory you use as good practice. Not spending the time to figure out where to free the memory is just being lazy.
Let's take those one point at a time...
If you return a char * that you know was created with malloc, then yes, it is your responsibility to free that. You can do that with free(myCharPtr).
The OS will claim the memory back, and it won't be lost forever, but there's technically no guarantee that it will be reclaimed right when the application dies. That just depends on the operating system.
I wouldn't go so far as to say every malloc must be freed, but I would say that, no matter how long a program runs, there must be a bounded number of allocations (and total size) that won't be freed. The number need not be a static constant, but it must be specifiable in terms of something else (e.g. this program processes widgets; it will allocate one 64-byte struct for each quizzix in the largest widget). One may not know beforehand the size of the largest widget, but if e.g. one knows that the temporary storage required to process a widget is proportional to the square of its size, one might safely infer that the largest widget will be small enough that the total amount of memory stranded will be pretty slight.

memory leak debug

What are some techniques in detecting/debugging memory leak if you don't have trace tools?
Intercept all functions that allocate and deallocate memory (depending on the platform, the list may look like: malloc, calloc, realloc, strdup, getcwd, free), and in addition to performing what these functions originally do, save information about the calls somewhere, in a dynamically growing global array probably, protected by synchronization primitives for multithreaded programs.
This information may include function name, amount of memory requested, address of the successfully allocated block, stack trace that lets you figure out what the caller was, and so on. In free(), remove corresponding element from the array (if there are none, a wrong pointer is passed to free which is also a error that's good to be detected early). When the program ends, dump the remaining elements of the array - they will be the blocks that leaked. Don't forget about global objects that allocate and deallocate resources before and after main(), respectively. To properly count those resources, you will need to dump the remaining resources after the last global object gets destroyed, so a small hack of your compiler runtime may be necessary
Check out your loops
Look at where you are allocating variables - do you ever de-allocate them?
Try and reproduce the leak with a small subset of suspected code.
MAKE trace tools - you can always log to a file.
One possibility could be to compile the code and execute it on a system where you can take advantage of built in tools (e.g. libumem on Solaris, or the libc capability on Linux)
Divide and conquer is the best approach. If you have written you code in a systematic way, it should be pretty easy to call subsets of you code. Your best bet is to execute each section of code over and over and see if your memory usage steadily climbs, if not move on to the next section of code.
Also, the wikipedia article on memory leaks has several great links in the references section on detecting memory leaks for different systems (window, macos, linux, etc)
Similar questions on SO:
Memory leak detectors for C
Strategies For Tracking Down Memory Leaks When You’ve Done Everything Wrong
In addition to the manual inspection techniques mentioned by others, you should consider a code analysis tool such as valgrind.
Introduction from their site:
Valgrind is an award-winning
instrumentation framework for building
dynamic analysis tools. There are
Valgrind tools that can automatically
detect many memory management and
threading bugs, and profile your
programs in detail. You can also use
Valgrind to build new tools.
The Valgrind distribution currently
includes six production-quality tools:
a memory error detector, two thread
error detectors, a cache and
branch-prediction profiler, a
call-graph generating cache profiler,
and a heap profiler. It also includes
two experimental tools: a
heap/stack/global array overrun
detector, and a SimPoint basic block
vector generator. It runs on the
following platforms: X86/Linux,
AMD64/Linux, PPC32/Linux, PPC64/Linux,
and X86/Darwin (Mac OS X).
I have used memtrace
http://www.fpx.de/fp/Software/MemTrace/
http://sourceforge.net/projects/memtrace/
You may need to call the statistics function to printout if there are any leaks. Best thing is to call this statistics function before and after a module or piece of code gets executed.
* Warning * Memtrace is kind enough to allow memory overwrite/double free. It detects these anomalies and gracefully avoids any crash.

Resources