Random malloc crash? - c

I'm trying to read a binary file made up of blocks that start with an identifier (like a 3DS file). I loop through the file, and a switch on the identifier determines how each block's data is read into the file struct. Sometimes I need malloc to allocate memory for dynamically sized data. While reading, the switch often takes the same case in which memory is allocated, but at a specific point in the file the program crashes on that same malloc. The file I want to read is about 1 MB, but when I try the program with another file of about 10 kB and the same structure, it reads it successfully.
What could be causing this problem?
The error message that I get when debugging is:
Heap corruption detected at 0441F080
HEAP[prog.exe]: HEAP: Free Heap block 441f078 modified at 441f088 after it was freed
Also, when I execute it in debug mode, I can for some reason read more data from the file; the program lives longer before it crashes.
Here is the code piece where it crashes:
switch (id) {
    case 0x62:
    case 0x63:
    // ...
    {
        char n_vertices = id - 0x60 + 1;  // just how I calculate n_vertices from the block ID
        fread(&mem.blocks[i].data.attr_6n.height, 2, 1, f);
        mem.blocks[i].data.attr_6n.vertices = malloc(2 * n_vertices);  // crash
        for (short k = 0; k < n_vertices; k++) {
            fread(&mem.blocks[i].data.attr_6n.vertices[k], 2, 1, f);  // read shorts
        }
    }
    break;
    // ...
}

You probably have a corrupt heap. This could be caused by invalid deallocations (freeing memory you don't own, or freeing a block that has already been freed), or by some random chunk of code writing outside its own memory area into a place that happens to hold the heap's bookkeeping data structures. That code will most likely be a piece that has nothing whatsoever to do with the dynamically allocated memory itself.
Tracking down bugs like this is a real bear. They tend to appear long after the offending code has executed, and they have an annoying tendency to turn into heisenbugs (bugs that move or go away when you attempt to debug them).
My suggestion for approaching debugging would be to try to comment out portions of your code and see what causes the problem to go away. That isn't foolproof, as you could just end up moving the out-of-bounds write to somewhere else.
Looking over the code you just posted, one thing I would highly suggest is verifying that your malloc requests enough memory to hold all the data you are attempting to load into it. It looks like you are assuming 2 bytes for each vertex, which seems a bit suspicious to me. I don't know your code, but 4 or 8 would be much more common element sizes to see there. Regardless, standard practice is to use sizeof() on the target type to help ensure you have it right.
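For example, a minimal sketch of that idiom (I don't know the declared element type of vertices, so this just sizes the allocation from whatever the pointer actually points to):

    /* allocate n_vertices elements, sized from the pointed-to type itself */
    mem.blocks[i].data.attr_6n.vertices =
        malloc(n_vertices * sizeof *mem.blocks[i].data.attr_6n.vertices);
    if (mem.blocks[i].data.attr_6n.vertices == NULL) {
        /* handle the allocation failure instead of reading into a null pointer */
    }

That way the element size can never silently disagree with the type being read.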
Another option, if that debugger message of yours can show you where the corruption is happening, would be to put a debugger watchpoint on that address (or write some watching code... or manually dump and inspect the area while stepping in the debugger) to try to figure out which line of code is the offender.
Good luck. I hate these bugs.

Most likely the heap gets corrupted somehow, and malloc is crashing while, for example, traversing a corrupted linked list of free blocks (or a similar structure; I'm not exactly sure what modern heap allocators use these days).
Make sure your code is not writing past the end of an allocated block.

You need to run this in a memory debugger like valgrind. Since it looks like you're on Windows, see the following: Is there a good Valgrind substitute for Windows?
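If you can also build the same code on Linux, a typical Memcheck run looks like this (these are standard valgrind options; substitute your own program and input file names):

    valgrind --tool=memcheck --leak-check=full ./prog yourfile.bin

Memcheck reports the first invalid write with a stack trace, which usually points straight at the overflow rather than at the later malloc that crashes.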

Related

why does calling free() after heap overflow result in crash - what is the exact reason?

I wrote a simple program that overflows a heap allocation (providing a larger input than the memory allocated). When I try to print that input, I get the complete string back (because reading continues until a '\0' is encountered in memory); that part is clear to me.
But when I call free(), the program crashes with an error message like this:
free() invalid next size (fast) aborted (core dumped): some_address
Here the address points to the base address of the memory allocated via malloc.
Below is my code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *string = (char*)malloc(sizeof(char) * 10);
    scanf("%s", string);  // intentionally providing input much longer than 10 bytes
    printf("\n string input given by user is %s\n", string);
    free(string);
    return 0;
}
On my way towards finding the exact reason for this crash, I learned what metadata live and free chunks contain, and a little bit about the "bins" managed by GLIBC.
I learned that the metadata a live chunk stores is its size (which records how much I requested, aligned up to a multiple of 8 or 16), a few bits describing the arena, the previous chunk, and off-heap allocation, and at the end a pointer to the previous chunk (if the previous chunk is free).
Initially I thought the previous chunk might be free, and that when GLIBC tried to fetch the previous chunk's address to merge it with the chunk I was passing to free(), it hit junk values in the previous-chunk pointer field because of my overflow. But later, when I looked at the free() definition and the error message more closely, I realized it has something to do with fastbins, and fastbins don't merge chunks with their previous chunks, so my assumption was wrong.
Can anyone explain the exact reason for the crash?
I tried reading the code and got lost at the point where it does "chunk_at_offset".
An explanation based on the code, with some pictorial representation, would be very helpful.
This is the link to the source code I am referring to.
Edit:
I am using the onlinegdb C compiler for this. I tried the same thing on my personal machine, which has Ubuntu GLIBC 2.27-3ubuntu1, and there the program is perfectly stable even for huge inputs.
Can anyone explain the exact reason for the crash?
The exact reason for the crash is that the heap implementation inside GLIBC was able to detect heap corruption in this particular case, and called abort().
As others have said, you are exercising undefined behavior, so anything can happen.
On a different system, or with a different version of GLIBC, or for a different user, or when setting different environment variables, this crash may no longer happen (and that is not a bug).
I feel that this "undefined behaviour" is just un-defined behaviour, meaning we simply haven't put in the effort to analyse how the system behaves.
It is pointless to analyze the system's behavior: you are playing Russian roulette. You may get lucky, and in your particular environment the gun is either completely empty or completely full, so you get predictable behavior. But you can't draw any conclusions from that; on a different system, or tomorrow, the system may behave differently.
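For completeness, a bounds-safe variant of the program from the question (a sketch, keeping the 10-byte buffer from the original) caps the input so the overflow never happens in the first place:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *string = malloc(10);
        if (string == NULL)
            return 1;
        scanf("%9s", string);   /* at most 9 characters plus the terminating '\0' */
        printf("string input given by user is %s\n", string);
        free(string);
        return 0;
    }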

malloc causes the application to crash and show memory map [duplicate]

I have a large body of legacy code that I inherited. It has worked fine until now, but suddenly, at a customer trial that I cannot reproduce in-house, it crashes in malloc. I think I need to add instrumentation, e.g. on top of malloc have my own malloc that stores some metadata about each allocation, such as who made the malloc call. When it crashes, I can then look up the metadata and see what was happening. I had done something similar years ago but cannot recall it now... I am sure people have come up with better ideas. I will be glad to have input.
Thanks
Is memory allocation broken?
Try valgrind.
Malloc is still crashing.
Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, which does not itself cause a segmentation fault, is usually the result of an array access outside of the array's bounds, and that access is usually nowhere near the point where you call malloc.
malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn't and starts interpreting random memory as part of the heap. Eventually its luck runs out.
Note that free can have the same thing happen, if the block being released or the free block list is messed up.
How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.
So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or because the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or because they are performing things in a different order (which hides the bug in your in-house testing) or doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data.
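A minimal sketch of that kind of sanity check (the limit, the int element type, and the alloc_elements helper are all made-up placeholders for whatever your format actually uses):

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ELEMENTS 10000  /* hypothetical upper bound implied by your file format */

    /* Validate a count taken from the input before trusting it for allocation. */
    static int *alloc_elements(size_t n)
    {
        if (n == 0 || n > MAX_ELEMENTS) {
            fprintf(stderr, "bad element count %zu in input\n", n);
            return NULL;
        }
        return malloc(n * sizeof(int));
    }

Rejecting implausible counts up front turns a silent heap overrun into an immediate, diagnosable error.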
Try ASan
AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds:
Use after free (dangling pointer dereference)
Heap buffer overflow
Stack buffer overflow
Global buffer overflow
Use after return
Use after scope
Initialization order bugs
Memory leaks
See the following links to learn more about it and how to use it:
https://github.com/google/sanitizers/wiki/AddressSanitizer and
https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
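For a typical GCC or Clang build, enabling ASan is a matter of adding the standard flags (the file and program names here are placeholders):

    gcc -g -fsanitize=address -fno-omit-frame-pointer prog.c -o prog
    ./prog

When the out-of-bounds access happens, ASan aborts immediately and reports the offending source line along with where the block was allocated.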
I know this is old, but issues like this will continue to exist as long as we have pointers. Although valgrind is the best tool for this purpose, it has a steep learning curve and often the results are too intimidating to understand.
Assuming you are working on some *nix, another tool I can suggest is Electric Fence. Quote:
Electric Fence helps you detect two common programming bugs:
software that overruns the boundaries of a malloc() memory allocation,
software that touches a memory allocation that has been released by free().
Unlike other malloc() debuggers, Electric Fence will detect read accesses
as well as writes, and it will pinpoint the exact instruction that causes
an error.
Usage is amazingly simple. Just link your code against the additional library, libefence.
When you run the application, a core file will be generated when memory is corrupted, instead of when the corrupted memory is later used.
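For example (assuming Electric Fence is installed and its library is linked with -lefence):

    gcc -g prog.c -o prog -lefence
    gdb ./prog
    (gdb) run

The first access outside an allocation then triggers a segmentation fault on the exact offending instruction, and gdb stops right on it.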

Debugging memory corruption

Earlier I encountered a problem with dynamic memory in C (Visual Studio).
I had a more or less working program that threw a run-time error when freeing one of the buffers. It was clearly memory corruption: the program wrote past the end of the buffer.
My problem is that it was very time consuming to track down. The error was thrown long after the corruption, and I had to manually debug the entire run to find out when the end of the buffer was overwritten.
Is there any tool/way to assist in tracking down this issue? If the program had crashed immediately, I would have found the problem a lot faster...
An example of the issue:
int *pNum = malloc(10 * sizeof(int));
//            ||
//            \/
for (int i = 0; i < 13; i++)
{
    pNum[i] = 3;
}
// error....
free(pNum);
I use "data breakpoints" for that. In your case, when the program crashes, it might first complain like this:
Heap block at 00397848 modified at 0039789C past requested size of 4c
Then start your program again and set a data breakpoint at address 0039789C. When the code writes to that address, execution will stop. It often happens that I find the bug immediately at this point.
If your program allocates and deallocates memory repeatedly, and it happens to be at this exact address, just disable deallocations:
_CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_DELAY_FREE_MEM_DF);
I use pageheap. This is a tool from Microsoft that changes how the allocator works. With pageheap on, when you call malloc, the allocation is rounded up to the nearest page (a block of memory), and an additional page of virtual memory that is set to no-read/no-write is placed after it. The memory you allocate is aligned so that the end of your buffer sits right before that guard page. This way, if you go over the edge of your buffer, often by a single byte, the debugger can catch it easily.
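If I remember correctly, full page heap is enabled per executable with the GFlags tool from Debugging Tools for Windows (the image name here is just the one from the question):

    gflags /p /enable prog.exe /full
    gflags /p /disable prog.exe

Run the program under the debugger with this enabled and the first out-of-bounds write faults on the spot instead of corrupting the heap silently.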
Is there any tool/way to assist in tracking down this issue?
Yes, that's precisely the type of error that static code analysers try to locate, e.g. splint/PC-Lint.
Here's a list of such tools:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
Edit: Trying out splint on your code snippet, I get the following warning:
main.c:9:2: Possible out-of-bounds store: pnum[i]
Presumably this warning would have assisted you.
Our CheckPointer tool can help find memory management errors. It works with GCC 3/4 and Microsoft dialects of C.
Many dynamic checkers only catch accesses outside of an object, and then only if the object is heap-allocated. CheckPointer will find memory access errors inside a heap-allocated object; it is illegal to access off the end of a field in a struct regardless of the field type, and most dynamic checkers cannot detect such errors. It will also find accesses off the ends of locals.

Strange C program behaviour

I really have a strange situation. I'm making a Linux multi-threaded C application using all the nitty-gritty memory stuff involving char* strings, and I'm stuck in a really odd position.
Basically, what happens is that, using POSIX threads, I'm reading and writing a two-dimensional char array, and I'm getting unusual errors. You have my word that I have done extensive testing on what the threads access individually: they don't read other threads' data, let alone write to it. But when the last thread that works with the array changes its part of it, it seems to change the last few chars of its strings, putting in characters that I don't see how could possibly have got there; mainly ones that print as those black diamond question marks.
I use valgrind and GDB, and they don't really help. As far as I can tell, everything should work. Valgrind just tells me I'm not freeing everything.
I know all that sounds fairly undescriptive, but here's where it gets weird: if I compile my program with Electric Fence, it all works. Valgrind tells me I'm freeing everything and that there are no memory errors at all, just as I thought it should be. It works absolutely flawlessly!
So I guess my question is: why does my program work fine when compiled with Electric Fence?
(And also as a side question, what steps need to be taken to ensure 100% "thread-safe" code?)
Electric Fence allocates pages (at least two, I've heard) for each allocation you make. It uses the OS's paging mechanisms to check for accesses outside the allocation. This means that if you want a new 14-character array, you end up with a whole new page to hold it, say 8K. Most of the page is unused, but you can detect errant accesses by watching which pages get used. I can imagine that with so much extra space, a problem that slips past the guards might never show up as a visible error.
If you don't have a bad access, but rather corruption caused by two threads not locking correctly, efence won't detect it. efence also likely keeps pointers to allocated memory, fooling valgrind into reporting no problems. You should run valgrind with the --show-reachable=yes flag and see what's unclaimed at the end of your run.
It sounds like you're trashing your data structures. Try putting canaries at the beginning and end of your arrays, open up GDB, then put write breakpoints on the canaries.
A canary is a const value that should never be changed - its only purpose is to detect memory corruption should it be overwritten. For example:
int the_size_i_need = 16;  /* whatever size you actually need */
char* array = malloc((the_size_i_need + 2) * sizeof(char));
array[0] = (char)0xAA;                        /* front canary */
array[the_size_i_need + 1] = (char)0xFF;      /* back canary */
char* real_array = array + 1;
/* Do some stuff here using real_array */
if (array[0] != (char)0xAA || array[the_size_i_need + 1] != (char)0xFF) {
    printf("Oh noes! We're corrupted\n");
}
Oh god, I'm so sorry. I've worked it out: there was a variable given to each thread to put its answer into, but I didn't initialize it to zero, and it contained 2 funny chars. Maybe the Electric Fence malloc() allocates zeroed memory like calloc() does, but standard malloc() of course doesn't.

Why can I write and read memory when I haven't allocated space?

I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...
I'm declaring the hash table structure as a pointer so I can initialize it with the size I want and increase its size whenever the load factor gets high.
The problem is that I'm creating a table with only 2 elements (just for testing purposes) and allocating memory for just those 2 elements, but I'm still able to write to memory locations that I shouldn't. I can also read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>

#define HASHSIZE 2

typedef char *HashKey;
typedef int HashValue;

typedef struct sHashTable {
    HashKey key;
    HashValue value;
} HashEntry;

typedef HashEntry *HashTable;

void hashInsert(HashTable table, HashKey key, HashValue value) {
}

void hashInitialize(HashTable *table, int tabSize) {
    *table = malloc(sizeof(HashEntry) * tabSize);
    if (!*table) {
        perror("malloc");
        exit(1);
    }
    (*table)[0].key = "ABC";
    (*table)[0].value = 45;
    (*table)[1].key = "XYZ";
    (*table)[1].value = 82;
    (*table)[2].key = "JKL";
    (*table)[2].value = 13;
}

int main(void) {
    HashTable t1 = NULL;
    hashInitialize(&t1, HASHSIZE);
    printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
    printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[2].key, t1[2].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
    return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor for (*table)[2].value = 13;. I also shouldn't be able to read the memory locations in the last two printfs in main().
Can someone please explain this to me and if I can/should do anything about it?
EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question, because I don't actually want my code to look like what I posted above. I want to do things slightly differently, which makes this question a bit irrelevant. So I'm just going to treat this as a question I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...
Just don't do it, it's undefined behavior.
It might accidentally work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata the heap manager uses for its own purposes. Or you can overwrite some other unrelated variable and then have a hard time debugging the program that goes nuts because of it. Or anything else harmful, either obvious or subtle yet severe, can happen.
Just don't do it - only read/write memory you legally allocated.
Generally speaking (implementations differ between platforms), when malloc or a similar heap-based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in regions that are equal to or larger than the amount the program requested.
Such an arrangement is done to prevent frequent system calls to the kernel for allocation and to satisfy the program's heap requests faster (this is certainly not the only reason; others exist as well).
Falling into the slack of such an arrangement leads to the problem you are observing. Once again, it's not guaranteed that your program can write to a non-allocated zone without crashing or seg-faulting every time; that depends on the particular binary's memory arrangement. Try writing to an even higher array offset: your program will eventually fault.
As for what you should or should not do, the people who responded above have summarized it fairly well. I have no better answer except that such issues should be prevented, and that can only be done by being careful while allocating memory.
One way of understanding it is through this crude example: when you request 1 byte in userspace, the kernel has to allocate at least a whole page (which would be 4 KB on some Linux systems, for example; the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling library, which the library can then parcel out as more requests come in. Thus, read or write requests to such a region may not necessarily generate a fault; they just give you garbage.
In C, you can read from any address that is mapped, and you can also write to any address that is mapped into a page with read-write permissions.
In practice, the OS gives a process memory in chunks (pages) of normally 8K (but this is OS-dependent). The C library then manages these pages, maintaining lists of what is free and what is allocated, and handing the user addresses within these blocks when asked via malloc.
So when you get a pointer back from malloc(), you are pointing to an area within an 8K page that is read-writable. This area may contain garbage, it may contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory the C library uses to manage its lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:
Corruption of other malloc'ed data
Corruption of stack variables, or the call stack itself, causing crashes when a function returns
Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called
All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.
Only when you read or write from/to an address that does not correspond to a mapped page will you get a crash, e.g. reading from address 0x0 (NULL).
malloc, free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally.
There are many third-party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied.
Think of memory as a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (a group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. A piece of code `Foo` could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:
Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).
Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.
Usually malloc will allocate more memory than you asked for, for alignment purposes. Also, the process really does have read/write access to the whole heap memory region, so reading a few bytes outside the allocated region seldom triggers any error.
But you still should not do it. Since the memory you're writing to may be regarded as unoccupied, or may in fact be occupied by something else, anything can happen: e.g. the 2nd and 3rd key/value pairs may become garbage later, or some unrelated but vital function may crash due to the invalid data you've stomped onto its malloc'ed memory.
(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)
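A minimal sketch of the "malloc the key" option, reusing the types from the question (the slot index here is a placeholder, since the real hash function isn't part of the question):

    #include <string.h>

    void hashInsert(HashTable table, HashKey key, HashValue value) {
        size_t len = strlen(key) + 1;
        char *copy = malloc(len);        /* the table now owns the key's storage */
        if (copy == NULL)
            return;                      /* or report the failure */
        memcpy(copy, key, len);
        table[0].key = copy;             /* placeholder slot; real code would hash the key */
        table[0].value = value;
    }

Remember to free each copied key before freeing the table itself.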
