What are some techniques for detecting/debugging memory leaks if you don't have trace tools?
Intercept all functions that allocate and deallocate memory (depending on the platform, the list may look like: malloc, calloc, realloc, strdup, getcwd, free) and, in addition to performing what these functions originally do, save information about each call somewhere - probably in a dynamically growing global array, protected by synchronization primitives in multithreaded programs.
This information may include the function name, the amount of memory requested, the address of the successfully allocated block, a stack trace that lets you figure out who the caller was, and so on. In free(), remove the corresponding element from the array (if there is none, a wrong pointer was passed to free(), which is also an error that is good to detect early). When the program ends, dump the remaining elements of the array - they are the blocks that leaked. Don't forget about global objects that allocate and deallocate resources before and after main(), respectively; to count those resources properly, you will need to dump the remaining entries after the last global object is destroyed, so a small hack of your compiler's runtime may be necessary.
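A minimal sketch of that approach, assuming calls to malloc()/free() are redirected to these wrappers (for example via macros) and assuming POSIX threads for the lock; the names, the growth policy, and the omitted stack-trace capture are all simplifications:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct {
    void       *addr;   /* block handed back to the caller */
    size_t      size;   /* bytes requested                 */
    const char *func;   /* which allocator was used        */
} alloc_record;

static alloc_record *g_table;       /* dynamically growing global array */
static size_t g_count, g_capacity;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void dump_leaks(void)        /* registered with atexit()         */
{
    pthread_mutex_lock(&g_lock);
    for (size_t i = 0; i < g_count; i++)
        fprintf(stderr, "LEAK: %zu bytes at %p from %s\n",
                g_table[i].size, g_table[i].addr, g_table[i].func);
    pthread_mutex_unlock(&g_lock);
}

void *debug_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    pthread_mutex_lock(&g_lock);
    static int registered;
    if (!registered) {
        atexit(dump_leaks);
        registered = 1;
    }
    if (g_count == g_capacity) {    /* grow the tracking table (error handling omitted) */
        g_capacity = g_capacity ? g_capacity * 2 : 256;
        g_table = realloc(g_table, g_capacity * sizeof *g_table);
    }
    g_table[g_count++] = (alloc_record){ p, size, "malloc" };
    pthread_mutex_unlock(&g_lock);
    return p;
}

void debug_free(void *p)
{
    if (p == NULL) {                /* free(NULL) is a no-op */
        free(p);
        return;
    }
    pthread_mutex_lock(&g_lock);
    size_t i;
    for (i = 0; i < g_count; i++)
        if (g_table[i].addr == p)
            break;
    if (i == g_count)               /* not found: a wrong pointer was passed to free() */
        fprintf(stderr, "BAD FREE: %p was never allocated\n", p);
    else
        g_table[i] = g_table[--g_count];   /* drop the record */
    pthread_mutex_unlock(&g_lock);
    free(p);
}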
Check out your loops
Look at where you are allocating variables - do you ever de-allocate them?
Try and reproduce the leak with a small subset of suspected code.
MAKE trace tools - you can always log to a file.
One possibility is to compile the code and execute it on a system where you can take advantage of built-in tools (e.g. libumem on Solaris, or glibc facilities such as mtrace on Linux).
Divide and conquer is the best approach. If you have written your code in a systematic way, it should be pretty easy to call subsets of your code. Your best bet is to execute each section of code over and over and see if your memory usage steadily climbs; if not, move on to the next section of code.
Also, the Wikipedia article on memory leaks has several great links in its references section on detecting memory leaks on different systems (Windows, macOS, Linux, etc.).
Similar questions on SO:
Memory leak detectors for C
Strategies For Tracking Down Memory Leaks When You’ve Done Everything Wrong
In addition to the manual inspection techniques mentioned by others, you should consider a dynamic analysis tool such as Valgrind.
Introduction from their site:
Valgrind is an award-winning instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.
The Valgrind distribution currently includes six production-quality tools: a memory error detector, two thread error detectors, a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler. It also includes two experimental tools: a heap/stack/global array overrun detector, and a SimPoint basic block vector generator. It runs on the following platforms: X86/Linux, AMD64/Linux, PPC32/Linux, PPC64/Linux, and X86/Darwin (Mac OS X).
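For leak hunting with the default Memcheck tool, a typical invocation is:

valgrind --leak-check=full ./yourprogram

which, at exit, reports each leaked block together with the stack trace of the allocation that created it.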
I have used memtrace
http://www.fpx.de/fp/Software/MemTrace/
http://sourceforge.net/projects/memtrace/
You may need to call its statistics function to print out whether there are any leaks. The best approach is to call this statistics function before and after a module or piece of code is executed.
* Warning * Memtrace tolerates memory overwrites and double frees: it detects these anomalies and gracefully avoids a crash.
Related
What happens if I end execution with return 0; after using malloc and without freeing the allocated memory - for example, if the free(var) call in the code below were omitted?
int * var;
var = (int *)malloc(sizeof(int)) ;
free(var) ;
return 0;
The program contains a memory leak, as explained here. To answer the question specifically: the effect of the memory leak depends on the environment. In the best case nothing happens; in the worst case the machine might crash sooner or later. In any case, the existence of the memory leak should be regarded as a bug.
It is implementation specific.
On most operating systems (notably desktop or server OSes such as Linux, MacOSX, Windows...), once a process terminates, every resource it used is released, and that includes its virtual address space. Hence even unfreed heap memory is released.
In particular, if you code a quickly running program (e.g. you know that it will always run in less than a few seconds), it might be easier to accept some memory leak (but if you do, please note that in your code and/or documentation). Many real-life programs do this (in particular the GCC compiler, and probably several Unix shells).
On the contrary, if you are coding a server (e.g. a database or compute server), you should avoid any memory leak that would make the server process's RSS grow indefinitely (until some crash). You should usually take great care that every piece of heap memory allocated to handle a request gets freed after replying to that request.
On some embedded operating systems, if you don't explicitly and properly release all the resources (free all heap-allocated memory, fclose all opened streams), you have a resource leak.
See also this related question. On many OSes (including Linux) you can use valgrind to hunt memory leak bugs. With a recent gcc you might use debugging options like -g -fsanitize=address
Read also (at least for the concepts and the terminology) something about garbage collection and about fragmentation. If programming in C, you might consider using Boehm's garbage collector.
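For the Boehm collector specifically, a minimal sketch (assuming libgc is installed, the header is <gc.h> on your system - sometimes it is <gc/gc.h> - and you link with -lgc):

#include <gc.h>
#include <stdio.h>

int main(void)
{
    GC_INIT();                          /* initialize the collector           */
    for (int i = 0; i < 1000000; i++) {
        int *p = GC_MALLOC(sizeof *p);  /* allocate from the collected heap   */
        *p = i;                         /* no free(): unreachable blocks are  */
    }                                   /* reclaimed by the collector         */
    printf("done\n");
    return 0;
}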
There is however a very good practical reason to systematically free all previously malloc'ed memory: it is good programming discipline, and it makes it much easier to use tools like valgrind to debug genuine memory bugs. It also makes your code cleaner, so you can reuse it in different contexts (e.g. as part of some library, usable in long-running processes).
If you don't free the memory, your application has a memory leak. If such an application keeps leaking and is left running for a couple of days, it will start to slow down and may eventually crash.
Interviewer - If you have no tools to check how would you detect memory leak problems?
Answer - I will read the code and see if all the memory I have allocated has been freed by me in the code itself.
Interviewer wasn't satisfied. Is there any other way to do so?
For all the implementations described below, one needs to write wrappers for the malloc() & free() functions.
1. To keep things simple, keep track of the counts of malloc() & free() calls. If they are not equal, you have a memory leak. (A minimal sketch of this appears at the end of this answer.)
2. A better version is to keep track of the addresses that were malloc()'ed & free()'ed; this way you can identify which addresses were malloc()'ed but not free()'ed. Even this won't help much, though, since you can't relate the addresses to the source code, which becomes a real challenge in a large code base.
3. So here you can add one more feature. For example, I wrote a similar tool for the FreeBSD kernel: you can modify the malloc() call to store the module/file information (give each module/file a number, which you can #define in some header) and the stack trace of the function calls leading to this malloc(), and store all of that in a data structure, alongside the above information, whenever malloc() or free() is called. Use the addresses returned by malloc() to match them with free(). So when there's a memory leak, you have information about which addresses were not free()'ed, in which file, and exactly what functions were called (through the stack trace), to pin it down.
The way this tool worked was: on a crash, I used to get a core dump. I had defined globals (the data structure where I was collecting data) in kernel memory space, which I could access using gdb to retrieve the information.
Edit:
Recently, while debugging a memory leak in the Linux kernel, I came across a tool called kmemleak, which implements an algorithm similar to the one I described in point 3 above. Read the Basic Algorithm section here: https://www.kernel.org/doc/Documentation/kmemleak.txt
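A minimal sketch of the simple counting from point 1, with the file/line information folded in via wrapper parameters; the names are illustrative and it is not thread-safe:

#include <stdio.h>
#include <stdlib.h>

static long g_mallocs, g_frees;     /* if these differ at exit, something leaked */

void *my_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p != NULL) {
        g_mallocs++;
        fprintf(stderr, "alloc %zu bytes -> %p at %s:%d\n", size, p, file, line);
    }
    return p;
}

void my_free(void *p)
{
    if (p != NULL)
        g_frees++;
    free(p);
}

void leak_report(void)              /* call this just before the program exits */
{
    fprintf(stderr, "mallocs=%ld, frees=%ld, difference=%ld\n",
            g_mallocs, g_frees, g_mallocs - g_frees);
}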
My response when I had to do this for real was to build tools... a debugging heap layer, wrapped around the C heap, and macros to switch code to running against those calls rather than accessing the normal heap library directly. That layer included some fencepost logic to detect array bounds violations, some instrumentation to monitor what the heap was doing, optionally some recordkeeping of exactly who allocated and freed each block...
Another approach, of course, is "divide and conquer". Build unit tests to try to narrow down which operations are causing the leak, then to subdivide that code further.
Depending on what "no tools" means, core dumps are also sometimes useful; seeing the content of the heap may tell you what's being leaked, for example.
And so on....
I have a question out of curiosity relating to checking for memory leaks.
Having used valgrind frequently to check for memory leaks in my code over the last year or two, it suddenly occurred to me that it only detects lost/unfreed memory at the end of the program's life.
So, in light of that, I was thinking: if you have a long-running program which malloc()s intermittently and doesn't free() until the application exits, then the potential to eat memory (not necessarily through leaks) is huge, and it isn't observable using these tools because they only check at the end of the program's lifetime. Are there GDB-like tools which can stop an application while it is running and check for memory which is and isn't referenced at that instant in the life of the application?
Are there GDB-like tools which can stop an application while running and check for memory which is and isn't referenced at an instance in the life of the application?
Yes: Valgrind.
Specifically, the SVN version of Valgrind has a gdbserver stub embedded into it.
This allows you to do all kinds of cool debugging, not possible before:
You can run the program under valgrind and have GDB breakpoints at the same time
You can ask valgrind: is this memory allocated? was this variable initialized?
You can ask valgrind: what new leaks happened since last time I asked for leaks?
I think you can also ask it to list not-leaked new allocations.
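For reference, with a recent Valgrind the workflow is roughly as follows (check your version's manual for the exact flags and monitor commands):

valgrind --vgdb=yes --vgdb-error=0 ./yourprogram
gdb ./yourprogram              (in a second terminal)
(gdb) target remote | vgdb
(gdb) monitor leak_check full reachable any

The last command asks Memcheck for a leak report at that moment, without ending the program.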
What I have done (which was not a tool) for a long-running socket-based server was to perform an operation, but print out the amount of free memory before it, print it out again afterwards, and see if there was any difference.
I knew that in theory my server should have returned all memory used on each call to the server, so if I was the only one calling it, it shouldn't use much more memory than when it started.
You may find that some memory was needed on the first call, so you may want to make several calls, so everything is initialized, then you can do checks like this.
The other option is to create a list of all memory you are mallocing, then when you free it delete from the list that node, and at the end, see which ones still haven't been freed.
That is not generally possible in a language that supports pointer arithmetic, since, for example, you could cast a pointer to an integer and back. See http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/pointer.html
Leaked memory is defined as memory that is not referenced by anything in the program.
If you malloc'ed memory and somewhere in your data there is a pointer pointing to that memory, it isn't "lost" as far as any automatic check can know.
However, if you allocated memory and never free'd it, but you don't have any pointer pointing to it, you have most likely leaked that memory, as there is no way for you to reference it.
Programs like valgrind can find leaks of the kind described above (loss of reference). AFAIK nothing can find "logical" leaks where you still hold a reference to the memory.
I have to do a project in C where I have to constantly allocate memory for big data structures and then free it. Does there exist a library with a function that helps keep track of memory usage, so I can be sure I am doing things correctly? (I'm new to C)
For example, a function that returns:
A) The total memory used by the program at the moment, OR
B) The total memory left,
would do the job. I already googled for that and searched in other answers.
Thanks!
Try tcmalloc: you are looking for a heap profiler, although valgrind might be more useful initially.
If you're worried about memory leaks, valgrind is probably what you need. On the other hand, if you're more concerned with whether your data structures are using excessive memory, you might just use the common mallinfo function included as an extension to malloc in many Unix standard libraries, including glibc on Linux.
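A rough sketch of that on glibc (mallinfo() reports sizes as int, and newer glibc versions offer mallinfo2() instead, so treat the numbers as approximate):

#include <malloc.h>   /* glibc extension */
#include <stdio.h>
#include <stdlib.h>

/* Print how many bytes the allocator currently has handed out. */
static void report_heap_usage(const char *label)
{
    struct mallinfo mi = mallinfo();
    printf("%s: %d bytes in use\n", label, mi.uordblks);
}

int main(void)
{
    report_heap_usage("before");
    void *p = malloc(1024 * 1024);      /* stand-in for a big data structure */
    report_heap_usage("after malloc");
    free(p);
    report_heap_usage("after free");
    return 0;
}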
Although some people excoriate it, the book "Writing Solid Code" by Steve Maguire has a lot of reasonable ideas about how to track your memory usage without modifying the system memory allocation functions. Basically, instead of calling the raw malloc() etc functions directly, you call your own memory allocation API built on top of the standard one. Your API can track allocations and frees, detect double frees, frees of non-allocated memory, unreleased (leaked) memory, complete dumps of what is allocated, etc. You either need to crib the code from the book or write your own equivalent code. One interesting problem is providing a stack trace for each allocation; there isn't a standard way to determine the call stack. (The book is a bit dated now; it was written just a few years after the C89 standard was published and does not exploit const qualifiers.)
Some will argue that these services can be provided by the system malloc(); indeed, they can, and these days often are. You should look carefully at the manual provided for your version of malloc(), and decide whether it provides enough for you. If not, then the wrapper API mechanism is reasonable. Note that using your own API means you track what you explicitly allocate, while leaving library functions not written to use your API using the system services - as, indeed, does your code, under the covers.
You should also look into valgrind. It does a superb job tracking memory abuses, and in particular will report leaked memory (memory that was allocated but not freed). It also spots when you read or write outside the bounds of an allocated space, spotting buffer overflows.
Nevertheless, ultimately, you need to be disciplined in the way you write your code, ensuring that every time you allocate memory, you know when it will be released.
Every time you allocate/free memory, you could log how big your data structure is.
I have a C application which I was building with MS VS2005. It had an output data buffer which was being allocated dynamically using malloc.
For some test cases, the size being malloc'd was smaller than the actual output size in bytes that was generated. The larger output was written into the smaller buffer, causing a buffer overflow. As a result, the test run crashed, with MSVS 2005 showing a "Heap corruption ..." window.
I knew it had to do with some dynamic memory allocation, but it took me a long time to find the root cause, because I did not suspect the allocation - I was allocating what I thought was a large enough size for the output. But one particular test case generated more output than I had calculated, hence the crash.
My question is:
1.) What tools can I use to detect such dynamic memory buffer overflow conditions? Can they also help detect buffer overflow conditions in general (irrespective of whether the buffer/array is on the heap, on the stack, or in the global memory area)?
2.) Would memory leak tools (say, Purify) or code analysis tools like lint or Klocwork have helped in this particular case? I believe it would have to be a runtime analysis tool.
Thank you.
-AD.
One solution, which I first encountered in the book Writing Solid Code, is to "wrap" the malloc() API with diagnostic code.
First, the diagnostic malloc() arranges to allocate additional bytes for a trailing sentinel. For example, an additional four bytes following the allocated memory are reserved and contain the characters 'FINE'.
Later, when the pointer from malloc() is passed to free(), a corresponding diagnostic version of free() is called. Before calling the standard implementation of free() and relinquishing the memory, the trailing sentinel is verified; it should be unmodified. If the sentinel is modified, then the block pointer has been misused at some point subsequent to being returned from the diagnostic malloc().
There are advantages to using a memory protection guard page rather than a sentinel pattern for detecting buffer overflows. In particular, with a pattern-based method the illegal memory access is detected only after the fact, and only illegal writes are detected. The memory protection method catches both illegal reads and writes, and they are detected immediately as they occur.
Diagnostic wrapper functions for malloc() can also address other misuses of malloc(), such as multiple calls to free() for same memory block. Also, realloc() can be modified to always move blocks when executed in a debugging environment, to test the callers of realloc().
In particular, the diagnostic wrappers may record all of the blocks allocated and freed, and report memory leaks when the program exits. Memory leaks are blocks which were allocated but not freed during the program's execution.
When wrapping the malloc() API, one must wrap all of the related functions, including calloc(), realloc(), and strdup().
The typical way of wrapping these functions is via preprocessor macros:
#define malloc(s) diagnostic_malloc(s, __FILE__, __LINE__)
/* etc.... */
If the need arises to code a call to the standard implementation (for example, when the allocated block will be passed to a third-party, binary-only library which expects to free the block using the standard free() implementation), the original function can be accessed rather than the preprocessor macro by writing (malloc)(s) -- that is, by placing parentheses around the function name.
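A minimal sketch of such a diagnostic pair, using the 'FINE' trailing sentinel described above; the header-based size bookkeeping, the lack of alignment padding, and the omitted leak-report table are simplifications:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SENTINEL     "FINE"
#define SENTINEL_LEN 4

/* A small header in front of each block remembers the requested size,
 * so diagnostic_free() knows where the trailing sentinel lives.
 * (A production version would pad this header to preserve alignment.) */
typedef struct { size_t size; } block_header;

void *diagnostic_malloc(size_t size, const char *file, int line)
{
    block_header *h = malloc(sizeof *h + size + SENTINEL_LEN);
    if (h == NULL)
        return NULL;
    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memcpy(user + size, SENTINEL, SENTINEL_LEN);   /* plant the trailing sentinel */
    (void)file; (void)line;   /* a real version records these for leak reporting  */
    return user;
}

void diagnostic_free(void *ptr)
{
    if (ptr == NULL)
        return;
    block_header *h = (block_header *)ptr - 1;
    unsigned char *user = ptr;
    if (memcmp(user + h->size, SENTINEL, SENTINEL_LEN) != 0)   /* sentinel clobbered? */
        fprintf(stderr, "Buffer overrun detected past block %p\n", ptr);
    free(h);
}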
Something you can try is to allocate enough pages + 1 using VirtualAlloc, use VirtualProtect with the PAGE_READONLY | PAGE_GUARD flags on the last page, then align the suspected allocation so the end of the object is near the beginning of the protected page. If all goes well you should get an access violation when the guard page is accessed. It helps if you know approximately which allocation is being overwritten; otherwise it requires overriding all allocations, which may need a lot of extra memory (at least 2 pages per allocation).

A variation on this technique, which I'm hereby christening "statistical page guard", is to allocate memory that way for only a relatively small, random percentage of allocations, to avoid large bloat for small objects. Over a large number of execution runs you should be able to hit the error. The random number generator would have to be seeded off something like the time in this case. Similarly, you can allocate the guard page in front of the object if you suspect an overwrite at a lower address (you can't do both at the same time, but it is possible to randomly mix them as well).
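A rough sketch of that guard-page allocation on Windows; the alignment of the returned pointer and the matching release path (VirtualFree on the base address) are glossed over:

#include <windows.h>

/* Allocate 'size' bytes so the byte just past the object lands on a guard
 * page: any overrun past the end then faults immediately. */
void *guarded_alloc(SIZE_T size)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    SIZE_T page = si.dwPageSize;
    SIZE_T data_pages = (size + page - 1) / page;
    SIZE_T total = (data_pages + 1) * page;        /* +1 guard page at the end */

    unsigned char *base = VirtualAlloc(NULL, total, MEM_RESERVE | MEM_COMMIT,
                                       PAGE_READWRITE);
    if (base == NULL)
        return NULL;

    DWORD old;
    /* Protect the last page; touching it raises an access violation. */
    VirtualProtect(base + data_pages * page, page, PAGE_READONLY | PAGE_GUARD, &old);

    /* Place the object so that its end touches the guard page. */
    return base + data_pages * page - size;
}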
An update: it turns out the gflags.exe utility from Microsoft (it used to be pageheap.exe) already supports this "statistical page guard", so I reinvented the wheel :) All you need to do is run gflags.exe /p /enable [/random 0-100] YourApplication.exe and then run your app. If you are using a custom heap or custom guards on your heap allocations, you can simply switch to HeapAlloc, at least for catching bugs, and then switch back. Gflags.exe is part of the Support Tools package and can be downloaded from the Microsoft Download Center; just do a search there.
PC-Lint can catch some forms of malloc/new size problems, but I'm not sure if it would have found yours.
VS2005 has good buffer overflow checking for stack objects in debug mode (runs at end of function). And it does periodic checking of the heap.
As for helping to track down where the problems occurred, this is where I tend to start using macros to dump all allocations, to match against the corrupted memory later (once it is detected).
Painful process, so I'm keen to learn better ways also.
Consider our Memory Safety Check. I think it will catch all the errors you describe. Yes, it is runtime checking of every access, with some considerable overhead (not as bad as valgrind, we think), but with the benefit of diagnosing the first program action that is erroneous.