how to debug memory overwrite in C?

how to debug memory overwrite in C? - c

This problem maybe has no specific answer appropriate for all situations,but is there some general principle we can respect?
Overwrite happened in own module may be a little easy,but if the overwrite is caused by another module written by other people and the program crashed random, how we can do for these?

I have had a lot of luck with a product called Purify, that performs memory bounds checking, after you include it at compile time. The Wikipedia page I linked to also lists some open source alternatives.

Memory overwrite is often caused by dangling pointers. While this is not the only case, it's quite common and so I've found one technique that's pretty useful:
By implementing your own memory allocator you can turn on a special debug mode where you write some known pattern into freed memory. You then periodically check all free memory to see if the pattern has been overwritten. If it has you assert or log or something. This will often find the culprit that's writing to some dangling pointer.
In addition, you can use the custom allocator to log the allocations made by address. So if you see that someone has overwritten address 0x30203 you can just check who that memory was allocated to.
I realize this sounds like a special case, but it's helped me out of so many cases before

Related

Check if memory was already freed in C

As you know, after we finish using dynamic variables we free() them.
However, sometimes those variables are already free()d.
I need to check if it is free to avoid double free. Would anyone give me a clue?

You can't check if it's already been free'd (the signature of free should tell you that much; it can't modify the pointer from the caller's perspective). However, you can do one of two things.
Change your design. Who is responsible for this memory? It seems as though your design does not make that clear, which is the most common reason for memory leaks. Place ownership on one part of your code and be done with it. Why would the interrupt function conditionally deallocate the memory? Why does that seem like the most logical solution?
Set the pointer to null and double free all you like. free(NULL) is perfectly valid.
I prefer option 1 and learning this lesson now will help you write better code down the road.

+1 to Ed S.'s answer.
But also, run valgrind - it will pick up many dynamic memory allocation errors quickly, and may be better at reading your code than you are.

Debugging memory leak issues without any tool

Interviewer - If you have no tools to check how would you detect memory leak problems?
Answer - I will read the code and see if all the memory I have allocated has been freed by me in the code itself.
Interviewer wasn't satisfied. Is there any other way to do so?

For all the implementation defined below, one needs to write wrappers for malloc() & free() functions.
To keep things simple, keep track of count of malloc() & free(). If not equal then you have a memory leak.
A better version would be to keep track of the addresses malloc()'ed & free()'ed this way you can identify which addresses are malloc()'ed but not free()'ed. But this again, won't help much either, since you can't relate the addresses to source code, especially it becomes a challenge when you have a large source code.
So here, you can add one more feature to it. For eg, I wrote a similar tool for FreeBSD Kernel, you can modify the malloc() call to store the module/file information (give each module/file a no, you can #define it in some header), the stack trace of the function calls leading to this malloc() and store it in a data structure, along side the above information whenever a malloc() or free() is called. Use addresses returned by malloc() to match with it free(). So, when their's a memory leak, you have information about what addresses were not free()'ed in which file, what were the exact functions called (through the stack trace) to pin point it.
The way, this tool worked was, on a crash, I used to get a core-dump. I had defined globals (this data structure where I was collecting data) in kernel memory space, which I could access using gdb and retrieve the information.
Edit:
Recently while debugging a memeory leak in linux kernel, I came across this tool called kmemleak which implements a similar algorithm I described in point#3 above. Read under the Basic Algorithm section here: https://www.kernel.org/doc/Documentation/kmemleak.txt

My response when I had to do this for real was to build tools... a debugging heap layer, wrapped around the C heap, and macros to switch code to running against those calls rather than accessing the normal heap library directly. That layer included some fencepost logic to detect array bounds violations, some instrumentation to monitor what the heap was doing, optionally some recordkeeping of exactly who allocated and freed each block...
Another approach, of course, is "divide and conquer". Build unit tests to try to narrow down which operations are causing the leak, then to subdivide that code further.
Depending on what "no tools" means, core dumps are also sometimes useful; seeing the content of the heap may tell you what's being leaked, for example.
And so on....

What does it mean when malloc() or free() gives you a segmentation fault?

I am currently using dlmalloc() to see how much faster it can be than the original libc malloc().
However, running free() keeps giving me a segmentation fault...
Does anyone know some logical reasons why this could keep happening?

A segfault inside the memory management functions almost always indicates that you've done something wrong (like overwriting memory beyond the valid bounds) before the call that actually segfaults.
Running your code under Valgrind may help you determine the real source of the problem.

I would be looking first into memory corruption issues. For example, if you allocate N bytes and then write to N+100 of them, you're very likely to corrupt the memory arena.
That's because many implementations keep their housekeeping information (block sizes, list pointers and so on) in-line (between the actual data areas).
Another possibility would be double freeing of blocks which may cause problems if that memory has since been used for some other allocation (especially if your address is now in the middle of a data area rather than at the beginning).
First things first, make sure you're following the rules. Any thing else is undefined behaviour and all bets are off.
You may also want to post the source code for the problem you're having so we can examine it. If you do this, try to reduce it to the smallest example that exhibits the problem. Only the most dedicated SOer (a) will want to look over some 10,000-line behemoth to find your issue .
(a) And I'm certainly not that dedicated :-)

Does every malloc call have to be freed

From what I understand because malloc dynamically assigns mem , you need to free that mem so that it can be used again.
What happens if you return a char* that was created using malloc (i.e. how are you supposed to free that)
If you leave the pointer as it is
and exit the application will it be
freed.(I cant find a definite answer on this , some say yes , some say no).

The caller has to free it (or arrange for it to be freed). This means that functions that create and return resources need to document exactly how it should be freed.
Most OSes will free the memory when the program exits, as part of the definition of a "process". The C standard doesn't care what happens, it's beyond the scope of the program. Not all OSes have a full process abstraction, but desktop-style OSes certainly do.
The main reasons to free it before that are:
If you free memory as soon as possible, often a long time before process exit, your program uses less memory total.
If you don't free it, and you later want to change your program into a routine within another program, that perhaps is called many times, then suddenly you require many times as much memory as before (memory leak).
There are debugging tools that will help you identify memory leaks, by warning you about memory that is still allocated when the program exits. These don't really help much if there's a lot of deliberately-leaked junk to wade through.
If you don't free it and you hit any problems, it's much harder to go back later and find all the memory that needs freeing, than it is to do it right in the first place.
There are so many cases where you do need to free the memory (to prevent huge memory use in long-running programs), that your default strategy must be to clean pretty much everything up anyway.
The vaguely plausible reasons not to free are:
Less code.
If you have squillions of blocks to free individually, immediately before program exit, then it might be much faster to let the OS drop the whole process.
Stuff which is created on demand and stored in globals might be quite difficult to clean up safely, if you don't know exactly where it's used. Think of some kind of cache that's populated as you go along, that might have MRU rules to limit how much memory it occupies, so it's not an unlimited leak. OK, so this is one bad thing (unrestricted globals) causing another bad thing (unfreed memory), but it's worth knowing about as a reason why you might see unfreed blocks in existing code, and you can't necessarily just go in and fix them.
The reasons for freeing almost always outweigh the reasons against.

If you have a pointer to memory created by malloc, freeing that memory, using that pointer, will do the right thing. Yes, there is some magic involved; this will be taken care of by your compiler.
Yes, if you ignore the memory freeing, and exit the application, the OS will release the memory. However, it's considered bad practice to leave it unfreed. The OS may not do the right thing (especially in embedded settings), or may not do it in a timely fashion. Also, if you're running your program continuously, you may end up consuming a growing amount of memory, eventually consuming it all, and running out of memory and crashing.

Yes. If you malloc, you need to free. You are guaranteeing memory leaks while your program is running if you don't free.
Free it.
Always.
Period.

Yes, every call to malloc() has to be matched with a call to free().
To answer your specific questions:
You have to explicitly document your API telling the user whether the returned pointer has to be free()'d
The OS will free all memory allocated to the process.

If you write the function yourself: Avoid doing that.
Instead, let the caller pass a buffer, let the caller specify the buffer's size and copy the data into that buffer. That way, you can use your function from other modules that don't use the same heap (other programming languages, different C runtime...)
If you for whatever reason can not use such an interface, specify in the function's documentation that the caller has to free the returned pointer after it is done with it.
If you are using a library function: Have a look at the documentation.
If the documentation states that you have to free, do so.
If the documentation states that you don't have to, it might be some global cleanup function that has to be called to free the module's resources.
Regarding your second question, freeing before exiting is recommended. Technically it wont hurt, but when you ever want to reuse your code in a bigger project, you will be thankful that you wrote the correct cleanup in the first place.

The C standard has no concept of the system environment outside of a single program's execution, so it cannot specify what happens "after the program exits". At the same time, nowhere does it make any requirement that memory obtained with malloc should or must be released with free before a call to exit or a return from main, and I think it's pretty clear that the intention is that exiting without manually freeing memory will not leave resources tied up - much like how calling exit without closing all files first automatically closes them (including flushing them).
Now, as for whether you should or should not call free, that depends a lot on your particular program.
Any library code should free any memory that it obtained purely for internal use as soon as possible.
A library which returns allocated objects to the calling program should always provide a corresponding call to free those objects.
A library which performs any allocations as part of a global initialization (note: this is a very bad design, but sometimes inevitable) should provide a way for the application to reverse that initialization and free everything that was allocated. This is especially important if the library might ever be loaded dynamically (even as a consequence of satisfying another dynamically-loaded library's dependencies).
So far I've only talked about library code. At this point, all that's left is allocations made by the application itself or on the application's behalf by libraries. My view, and I will admit that it is unorthodox, is that freeing such objects is not just unnecessary but harmful. The main reason I say this is that most long-lived applications will have accumulated quite a bit of allocated memory which they are not making significant use of (think of the undo buffer in a word processor or the history in a browser). On a moderately loaded system, much of this data has been swapped to disk by the time the application terminates. If you want to free it, you're going to end up walking all over swapped-out memory addresses tracking down all the pointers to free,
putting useless wear on the physical components of the hard drive
making the user wait for your application to exit
causing other still-in-use applications' data to get swapped out, making them run slower
All of this in the name of a ridiculous "you must free everything you allocate" rule.
For short-lived applications, it's not so much of a big deal, but you can often simplify the implementation of short-lived applications that perform a single linear task and exit if you don't bother freeing all the memory they allocate. Think of most unix command line utilities. Is there any use to writing the loops for sed to free all its compiled regular expressions before exiting? Couldn't programmers' time be spent on something more productive?

1) The same way you'd free the memory normally, i.e.
p = func();
//...
free(p);
The trick is in making sure that you always do it...
2) Generally speaking, yes. But you should still free any memory you use as good practice. Not spending the time to figure out where to free the memory is just being lazy.

Let's take those one point at a time...
If you return a char * that you know was created with malloc, then yes, it is your responsibility to free that. You can do that with free(myCharPtr).
The OS will claim the memory back, and it won't be lost forever, but there's technically no guarantee that it will be reclaimed right when the application dies. That just depends on the operating system.

I wouldn't go so far as to say every malloc must be freed, but I would say that, no matter how long a program runs, there must be a bounded number of allocations (and total size) that won't be freed. The number need not be a static constant, but it must be specifiable in terms of something else (e.g. this program processes widgets; it will allocate one 64-byte struct for each quizzix in the largest widget). One may not know beforehand the size of the largest widget, but if e.g. one knows that the temporary storage required to process a widget is proportional to the square of its size, one might safely infer that the largest widget will be small enough that the total amount of memory stranded will be pretty slight.

Patterns for freeing memory in C?

I'm currently working on a C based application am a bit stuck on freeing memory in a non-antipattern fashion. I am a memory-management amateur.
My main problem is I declare memory structures in various different scopes, and these structures get passed around by reference to other functions. Some of those functions may throw errors and exit().
How do I go about freeing my structures if I exit() in one scope, but not all my data structures are in that scope?
I get the feeling I need to wrap it all up in a psuedo exception handler and have the handler deal with freeing, but that still seems ugly because it would have to know about everything I may or may not need to free...

Consider wrappers to malloc and using them in a disciplined way. Track the memory that you do allocate (in a linked list maybe) and use a wrapper to exit to enumerate your memory to free it. You could also name the memory with an additional parameter and member of your linked list structure. In applications where allocated memory is highly scope dependent you will find yourself leaking memory and this can be a good method to dump the memory and analyze it.
UPDATE:
Threading in your application will make this very complex. See other answers regarding threading issues.

You don't need to worry about freeing memory when exit() is called. When the process exits, the operating system will free all of the associated memory.

I think to answer this question appropriately, we would need to know about the architecture of your entire program (or system, or whatever the case may be).
The answer is: it depends. There are a number of strategies you can use.
As others have pointed out, on a modern desktop or server operating system, you can exit() and not worry about the memory your program has allocated.
This strategy changes, for example, if you are developing on an embedded operating system where exit() might not clean everything up. Typically what I see is when individual functions return due to an error, they make sure to clean up anything they themselves have allocated. You wouldn't see any exit() calls after calling, say, 10 functions. Each function would in turn indicate an error when it returns, and each function would clean up after itself. The original main() function (if you will - it might not be called main()) would detect the error, clean up any memory it had allocated, and take the appropriate actions.
When you just have scopes-within-scopes, it's not rocket science. Where it gets difficult is if you have multiple threads of execution, and shared data structures. Then you might need a garbage collector or a way to count references and free the memory when the last user of the structure is done with it. For example, if you look at the source to the BSD networking stack, you'll see that it uses a refcnt (reference count) value in some structures that need to be kept "alive" for an extended period of time and shared among different users. (This is basically what garbage collectors do, as well.)

You can create a simple memory manager for malloc'd memory that is shared between scopes/functions.
Register it when you malloc it, de-register it when you free it. Have a function that frees all registered memory before you call exit.
It adds a bit of overhead, but it helps keep track of memory. It can also help you hunt down pesky memory leaks.

Michael's advice is sound - if you are exiting, you don't need to worry about freeing the memory since the system will reclaim it anyway.
One exception to that is shared memory segments - at least under System V Shared Memory. Those segments can persist longer than the program that creates them.
One option not mentioned so far is to use an arena-based memory allocation scheme, built on top of standard malloc(). If the entire application uses a single arena, your cleanup code can release that arena, and all is freed at once. (APR - Apache Portable Runtime - provides a pools feature which I believe is similar; David Hanson's "C Interfaces and Implementations" provides an arena-based memory allocation system; I've written one that you could use if you wanted to.) You can think of this as "poor man's garbage collection".
As a general memory discipline, every time you allocate memory dynamically, you should understand which code is going to release it and when it can be released. There are a few standard patterns. The simplest is "allocated in this function; released before this function returns". This keeps the memory largely under control (if you don't run too many iterations on the loop that contains the memory allocation), and scopes it so that it can be made available to the current function and the functions it calls. Obviously, you have to be reasonably sure that the functions you call are not going to squirrel away (cache) pointers to the data and try to reuse them later after you've released and reused the memory.
The next standard pattern is exemplified by fopen() and fclose(); there's a function that allocates a pointer to some memory, which can be used by the calling code, and then released when the program has finished with it. However, this often becomes very similar to the first case - it is usually a good idea to call fclose() in the function that called fopen() too.
Most of the remaining 'patterns' are somewhat ad hoc.

People have already pointed out that you probably don't need to worry about freeing memory if you're just exiting (or aborting) your code in case of error. But just in case, here's a pattern I developed and use a lot for creating and tearing down resources in case of error. NOTE: I'm showing a pattern here to make a point, not writing real code!
int foo_create(foo_t *foo_out) {
int res;
foo_t foo;
bar_t bar;
baz_t baz;
res = bar_create(&bar);
if (res != 0)
goto fail_bar;
res = baz_create(&baz);
if (res != 0)
goto fail_baz;
foo = malloc(sizeof(foo_s));
if (foo == NULL)
goto fail_alloc;
foo->bar = bar;
foo->baz = baz;
etc. etc. you get the idea
*foo_out = foo;
return 0; /* meaning OK */
/* tear down stuff */
fail_alloc:
baz_destroy(baz);
fail_baz:
bar_destroy(bar);
fail_bar:
return res; /* propagate error code */
}
I can bet I'm going to get some comments saying "this is bad because you use goto". But this is a disciplined and structured use of goto that makes code clearer, simpler, and easier to maintain if applied consistently. You can't achieve a simple, documented tear-down path through the code without it.
If you want to see this in real in-use commercial code, take a look at, say, arena.c from the MPS (which is coincidentally a memory management system).
It's a kind of poor-man's try...finish handler, and gives you something a bit like destructors.
I'm going to sound like a greybeard now, but in my many years of working on other people's C code, lack of clear error paths is often a very serious problem, especially in network code and other unreliable situations. Introducing them has occasionally made me quite a bit of consultancy income.
There are plenty of other things to say about your question -- I'm just going to leave it with this pattern in case that's useful.

Very simply, why not have a reference counted implementation, so when you create an object and pass it around you increment and decrement the reference counted number (remember to be atomic if you have more than one thread).
That way, when an object is no longer used (zero references) you can safely delete it, or automatically delete it in the reference count decrement call.

This sounds like a task for a Boehm garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Depends on the system of course whether you can or should afford to use it.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight