I really have a strange situation. I'm making a Linux multi-threaded C application using all the nitty-gritty memory stuff involving char* strings, and I'm stuck in a really odd position.
Basically, using POSIX threads, I'm reading and writing to a two-dimensional char array, but I get unusual errors. You have my word that I have tested extensively what each thread accesses: they don't read another thread's data, let alone write to it. When the last thread that works with the array changes its part of the array, it seems to change the last few chars and put characters in there that I can't explain; mainly ones that print as black diamond question marks.
I use valgrind and GDB, and they don't really help. As far as I can tell, all should work. Valgrind tells me I'm not freeing everything.
I know all that sounds fairly undescriptive, but here's where it gets weird: if I compile my program with electric fence, then it all works. Valgrind tells me I'm freeing everything and that there's no memory errors at all, just as I thought it should have been. It works absolutely flawlessly!
So, I guess my question is, why does my program work fine when compiled with electric fence?
(And also as a side question, what steps need to be taken to ensure 100% "thread-safe" code?)
Electric Fence allocates whole pages, at least two I've heard, for each allocation you make. It uses the OS's paging mechanism to detect accesses outside the allocation. This means that if you want a new 14-character array you end up with a whole new page to hold it, say 4 KiB. Most of the page is unused, but an errant access lands on a protected page and faults immediately. I can imagine that with so much extra space around each allocation, a problem that gets past the guards would not show up as an error.
If you don't have a bad access but rather corruption due to two threads not locking correctly, efence won't detect it. efence also likely keeps pointers to allocated memory, fooling valgrind into reporting no problems. You should run valgrind with the --show-reachable=yes flag and see what is still allocated at the end of your run.
It sounds like you're trashing your data structures. Try putting canaries at the beginning and end of your arrays, open up GDB, then put write breakpoints on the canaries.
A canary is a const value that should never be changed - its only purpose is to detect memory corruption should it be overwritten. For example:
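size_t the_size_i_need = 14;   /* example size */
unsigned char *array = malloc(the_size_i_need + 2);   /* +2 for the canaries */
if (array == NULL)
    abort();
array[0] = 0xAA;                      /* leading canary */
array[the_size_i_need + 1] = 0xFF;    /* trailing canary */
/* unsigned char keeps the comparisons below correct; with a signed char,
   0xAA and 0xFF would never compare equal and the check would always fire */
unsigned char *real_array = array + 1;
/* Do some stuff here using real_array */
if (array[0] != 0xAA || array[the_size_i_need + 1] != 0xFF) {
    printf("Oh noes! We're corrupted\n");
}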
Oh god, I'm so sorry. I've worked it out: there was a variable given to the thread for each to put their answer into, but I didn't define it as zero, and it contains 2 funny chars. Maybe the electric fence malloc() allocates 'zeroed' memory like calloc(), but standard malloc() of course doesn't.
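For anyone landing here with the same symptom, a minimal sketch of the fix: allocate the per-thread answer buffer with calloc() so that it starts out as an empty string. The names `make_result_slot` and `append_result` are made up for illustration; the original program's code was never posted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread result slot; `len` is the number of usable
 * characters, not counting the terminating '\0'. */
static char *make_result_slot(size_t len)
{
    /* calloc() zero-fills, so the slot begins as an empty string. */
    return calloc(len + 1, sizeof(char));
}

/* Appends `text` to the slot without writing past `len` characters.
 * Returns 0 on success, -1 if the text would not fit. */
static int append_result(char *slot, size_t len, const char *text)
{
    size_t used, extra;

    assert(slot != NULL && text != NULL);
    used = strlen(slot);    /* only meaningful because the slot was zeroed */
    extra = strlen(text);
    if (extra > len - used)
        return -1;
    memcpy(slot + used, text, extra + 1);   /* +1 copies the '\0' as well */
    return 0;
}
```

With plain malloc() the `strlen(slot)` call would read whatever bytes happened to be in the block, which is exactly how stray characters can end up at the end of a result.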
I have a large body of legacy code that I inherited. It has worked fine until now. Suddenly, at a customer trial that I cannot reproduce in-house, it crashes in malloc. I think I need to add instrumentation, e.g. my own malloc layered on top of malloc that stores some meta information about each allocation, such as who made the call. When it crashes, I can then look up the meta information and see what was happening. I did something similar years ago but cannot recall it now... I am sure people have come up with better ideas. I will be glad to have input.
Thanks
Is memory allocation broken?
Try valgrind.
Malloc is still crashing.
Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, which does not itself cause a segmentation fault at the moment it happens, is usually the result of an array access outside the array's bounds, and that access is usually nowhere near the point where you call malloc.
malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn't and starts interpreting random memory as part of the heap. Eventually its luck runs out.
Note that free can have the same thing happen, if the block being released or the free block list is messed up.
How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.
So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or that the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data.
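The instrumentation idea from the question and the range checking suggested here can be combined in a small debugging wrapper. What follows is only a sketch under my own assumptions: `dbg_malloc`, `dbg_check`, `dbg_free`, and the `DBG_*` constants are invented names, and the header layout has nothing to do with the one your libc's malloc uses internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DBG_MAGIC 0xC0FFEE11u   /* marks an intact header */
#define DBG_TAIL  0xA5          /* canary byte just past the user block */

/* Bookkeeping stored in front of each block by this wrapper. */
typedef union {
    struct {
        const char *file;    /* call site that allocated the block */
        int line;
        size_t size;         /* bytes requested by the caller */
        unsigned magic;      /* damaged by writes before the block */
    } info;
    max_align_t align;       /* keeps the user block suitably aligned */
} dbg_header;

static void *dbg_malloc(size_t size, const char *file, int line)
{
    dbg_header *h;
    unsigned char *user;

    if (size > SIZE_MAX - sizeof *h - 1)
        return NULL;
    h = malloc(sizeof *h + size + 1);
    if (h == NULL)
        return NULL;
    h->info.file = file;
    h->info.line = line;
    h->info.size = size;
    h->info.magic = DBG_MAGIC;
    user = (unsigned char *)(h + 1);
    user[size] = DBG_TAIL;
    return user;
}

/* Returns 0 if the block is intact, 1 if the header was damaged
 * (underrun), 2 if the tail canary was damaged (overrun). */
static int dbg_check(const void *ptr)
{
    const dbg_header *h;
    const unsigned char *user = ptr;

    assert(ptr != NULL);
    h = (const dbg_header *)ptr - 1;
    if (h->info.magic != DBG_MAGIC) {
        fprintf(stderr, "block header corrupted (underrun?)\n");
        return 1;
    }
    if (user[h->info.size] != DBG_TAIL) {
        fprintf(stderr, "overrun of %zu-byte block allocated at %s:%d\n",
                h->info.size, h->info.file, h->info.line);
        return 2;
    }
    return 0;
}

static void dbg_free(void *ptr)
{
    if (ptr != NULL) {
        dbg_check(ptr);                  /* report damage before releasing */
        free((dbg_header *)ptr - 1);
    }
}

#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

Calling `dbg_check` on suspect blocks at a few strategic points narrows down when the damage happens, and the recorded file and line tell you whose block it was. It only catches writes that hit the header or the single tail byte, so treat it as a cheap first net rather than a replacement for valgrind or ASan.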
Try Asan
AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds:
Use after free (dangling pointer dereference)
Heap buffer overflow
Stack buffer overflow
Global buffer overflow
Use after return
Use after scope
Initialization order bugs
Memory leaks
For more details and usage instructions, see
https://github.com/google/sanitizers/wiki/AddressSanitizer and
https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
I know this is old, but issues like this will continue to exist as long as we have pointers. Although valgrind is the best tool for this purpose, it has a steep learning curve and often the results are too intimidating to understand.
Assuming you are working on some *nix, another tool I can suggest is Electric Fence. Quote:
Electric Fence helps you detect two common programming bugs:
software that overruns the boundaries of a malloc() memory allocation,
software that touches a memory allocation that has been released by free().
Unlike other malloc() debuggers, Electric Fence will detect read accesses
as well as writes, and it will pinpoint the exact instruction that causes
an error.
Usage is amazingly simple. Just link your code with one additional library: -lefence
When you run the application, a corefile will be generated when memory is corrupted, instead of when corrupted memory is used.
Can anyone explain to me why this code works perfectly?
int main(int argc, char const *argv[])
{
char* str = (char*)malloc(sizeof(char));
int c, i = 0;
while ((c = getchar()) != EOF)
{
str[i] = c;
i++;
}
printf("\n%s\n", str);
return 0;
}
Shouldn't this program crash when I enter for example "aaaaaassssssssssssddddddddddddddd"? here is what I get with this input :
aaaaaassssssssssssddddddddddddddd
aaaaaassssssssssssddddddddddddddd
And I really don't get why that is.
As you've presumably identified, you're overrunning the sizeof(char) (exactly 1 byte) block of memory you asked malloc to give you, and you are printing a string that you have not explicitly null-terminated.
Either of these could lead to badness such as a crash, but neither happens to right now. Overrunning your allocated block simply means that you are writing into memory that you didn't ask malloc to give you. It could be memory malloc gave you anyway: a minimum allocation well above the single byte you requested would not be particularly surprising. Additionally, since this is the only place you allocate heap memory, you are unlikely to overwrite an address you use somewhere else (i.e. if you allocated a second string, you might overrun the buffer of the first string and write into the space used for the second).
Even if you had multiple allocations, your program might not crash until you tried to write to an address the operating system hadn't mapped for the process. Typically the operating system hands out virtual memory in pages, and a memory allocator such as malloc then distributes that memory within the process and requests more from the operating system as needed. You probably had a fair amount of read/write virtual address space already mapped for the process and wouldn't crash until you ran past it. Were you to have tried to write to the memory that contained your code, you would likely have caused a crash due to the OS protecting that memory from writes (or, if it didn't, you would crash due to garbage instructions getting executed).
That's probably enough on why you didn't crash due to an overflow. I'd suggest having fun experimenting by sending it more data to see how much you can get away with before it crashes, though it may vary from run to run.
The other place you could have crashed or gotten incorrect behavior is in printing your string, because printf's %s assumes a null-terminated string: it starts at the address the pointer holds and prints until it reads a byte with value 0. Since you never wrote a terminator yourself, this could have run on indefinitely. However, it stopped printing in exactly the right spot, which means the next byte 'just happened' to be 0. But that's a simplification. On a 'reasonable' modern OS the kernel zeroes the memory it hands to a process, to prevent leaking information from the memory's prior users. Since this is the first and only allocation you've done, the memory is all shiny and clean; had you freed memory previously, malloc might have reused it, and then it would contain nonzero values from whatever your process had written there.
Now for some useful advice on detecting these problems in future, even in programs that appear to work perfectly. If you are working on Linux (on OS X you'll need to install it), I suggest running 'small' programs through valgrind to see if they produce errors. As an exercise, and an easy way to learn what the output looks like when you already know the errors, try it on this program. Since valgrind slows things down, you may get frustrated running a 'large' program through it, but 'small' covers most single projects (i.e. always run valgrind on a school project and fix the errors).
Additional information about the environment your program runs in, e.g. the C implementation or the OS's memory-zeroing behavior, could lead to further explanations of implementation-specific behavior.
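For completeness, here is one way the reading loop could be written so that it neither overruns the buffer nor prints an unterminated string. It is a sketch, not the only approach: `read_all` is a name I made up, and the starting capacity of 16 and the doubling strategy are arbitrary choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from `in` into a growing heap buffer and returns it
 * as a '\0'-terminated string, or NULL if an allocation fails.
 * The caller is responsible for free()ing the result. */
static char *read_all(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf;
    int c;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = getc(in)) != EOF) {
        if (len + 1 == cap) {             /* always keep room for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';                      /* terminate explicitly */
    return buf;
}
```

In main() you would call `read_all(stdin)`, print the result with `printf("\n%s\n", str)`, and free it afterwards. Running that version under valgrind should come back clean, which makes a nice before/after comparison with the original.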
I am currently using dlmalloc() to see how much faster it can be than the original libc malloc().
However, running free() keeps giving me a segmentation fault...
Does anyone know some logical reasons why this could keep happening?
A segfault inside the memory management functions almost always indicates that you've done something wrong (like overwriting memory beyond the valid bounds) before the call that actually segfaults.
Running your code under Valgrind may help you determine the real source of the problem.
I would be looking first into memory corruption issues. For example, if you allocate N bytes and then write to N+100 of them, you're very likely to corrupt the memory arena.
That's because many implementations keep their housekeeping information (block sizes, list pointers and so on) in-line (between the actual data areas).
Another possibility would be double freeing of blocks which may cause problems if that memory has since been used for some other allocation (especially if your address is now in the middle of a data area rather than at the beginning).
First things first, make sure you're following the rules. Anything else is undefined behaviour and all bets are off.
You may also want to post the source code for the problem you're having so we can examine it. If you do this, try to reduce it to the smallest example that exhibits the problem. Only the most dedicated SOer (a) will want to look over some 10,000-line behemoth to find your issue.
(a) And I'm certainly not that dedicated :-)
I'm trying to read a binary file that has blocks starting with an identifier (like a 3DS file). I loop through the file, and a switch determines what identifier a block has and then reads the data into the file struct. Sometimes I need to use malloc to allocate memory for dynamically sized data. While reading, the switch often goes through the same case wherein memory is allocated, but at a specific point in the file it crashes on that same malloc. The file that I want to read is about 1 MB. But when I try the program with another file of about 10 kB and the same structure, it reads it successfully.
What could be causing this problem?
The error code that I get when debugging is:
Heap corruption detected at 0441F080
HEAP[prog.exe]: HEAP: Free Heap block 441f078 modified at 441f088 after it was freed
Also when I execute it in debug mode, for some reason I can read more data from the file. The program lives longer before it crashes.
Here is the code piece where it crashes:
switch (id) {
case 0x62:
case 0x63:
// ...
{
char n_vertices = id - 0x60 + 1;// just how I calculate the n_vertices from the block ID
fread(&mem.blocks[i].data.attr_6n.height, 2, 1, f);
mem.blocks[i].data.attr_6n.vertices = malloc(2 * n_vertices);// crash
for (short k = 0; k < n_vertices; k++) {
fread(&mem.blocks[i].data.attr_6n.vertices[k], 2, 1, f);// read shorts
}
}
break;
// ...
}
You probably have a corrupt heap. This could be caused by invalid deallocations (deallocating unowned or already freed memory), or by some random chunk of code writing outside its memory area into a place that happens to hold the heap's bookkeeping data structures. This will most likely be a piece of code that has nothing whatsoever to do with that dynamically allocated memory.
Tracking down bugs like this is a real bear. They tend to appear long after the offending code has executed, and they have an annoying tendency to turn into heisenbugs (bugs that move or go away when you attempt to debug them).
My suggestion for approaching debugging would be to try to comment out portions of your code and see what causes the problem to go away. That isn't foolproof, as you could just end up moving the out-of-bounds write to somewhere else.
Looking over the code you just posted, one thing I would highly suggest you do is verify that your malloc specified enough memory to hold all the data you are attempting to load into it. It looks to me like you are assuming 2 bytes for each vertex. That seems a bit suspicious to me. I don't know your code, but 4 or 8 would be much more common element sizes to see there. Regardless, industry practice is to use sizeof() on the target type to help ensure you have it right.
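To illustrate the sizeof advice, here is a sketch of a reader that sizes the allocation from the element type and also checks that the file actually contained that many elements. `read_u16_array` is a hypothetical helper, and I am assuming the vertices really are 16-bit values stored in host byte order, as the posted code implies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `count` 16-bit values from `f` into a newly
 * allocated array. Returns NULL if `count` is 0, if allocation fails, or
 * if the file ends early. The caller free()s the result. */
static uint16_t *read_u16_array(FILE *f, size_t count)
{
    uint16_t *values;

    assert(f != NULL);
    if (count == 0)
        return NULL;

    /* sizeof *values tracks the element type if it ever changes, and
     * calloc() guards against overflow in count * size. */
    values = calloc(count, sizeof *values);
    if (values == NULL)
        return NULL;

    /* fread() returns the number of complete elements it read. */
    if (fread(values, sizeof *values, count, f) != count) {
        free(values);
        return NULL;
    }
    return values;
}
```

A truncated or malformed block then surfaces as a NULL return at the point of reading, instead of as heap damage that malloc trips over much later.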
Another option, if that debugger message of yours can show you where it is happening, would be to put a debugger watch point there (or write some watching code...or manually dump and inspect the area) when stepping in the debugger to try to figure out which is the offending line of code.
Good luck. I hate these bugs.
Most likely the heap is getting corrupted somehow, and malloc crashes while, for example, traversing a corrupted linked list of free blocks (or a similar structure; I'm not exactly sure what modern heap allocators use these days).
Make sure your code is not writing past the end of an allocated block.
You need to run this in a memory debugger like valgrind. Since it looks like you're on Windows, see the following: Is there a good Valgrind substitute for Windows?
I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...
I'm declaring the Hash Table structure as a pointer so I can initialize it with the size I want and increase its size whenever the load factor is high.
The problem is that I'm creating a table with only 2 elements (it's just for testing purposes), I'm allocating memory for just those 2 elements but I'm still able to write to memory locations that I shouldn't. And I also can read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>
#define HASHSIZE 2
typedef char *HashKey;
typedef int HashValue;
typedef struct sHashTable {
HashKey key;
HashValue value;
} HashEntry;
typedef HashEntry *HashTable;
void hashInsert(HashTable table, HashKey key, HashValue value) {
}
void hashInitialize(HashTable *table, int tabSize) {
*table = malloc(sizeof(HashEntry) * tabSize);
if(!*table) {
perror("malloc");
exit(1);
}
(*table)[0].key = "ABC";
(*table)[0].value = 45;
(*table)[1].key = "XYZ";
(*table)[1].value = 82;
(*table)[2].key = "JKL";
(*table)[2].value = 13;
}
int main(void) {
HashTable t1 = NULL;
hashInitialize(&t1, HASHSIZE);
printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
printf("PAIR(%d): %s, %d\n", 2, t1[2].key, t1[2].value);
printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor (*table)[2].value = 13;. I also shouldn't be able to read the memory locations in the last 2 printfs in main().
Can someone please explain this to me and if I can/should do anything about it?
EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question, because I don't want my code to look like what I posted above. I want to do things slightly differently, which makes this question a bit irrelevant. So, I'm just going to assume this was a question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...
Just don't do it, it's undefined behavior.
It might accidentally work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata used by the heap manager for its own purposes. Or you can overwrite some other unrelated variable and then have a hard time debugging the program that goes nuts because of that. Or anything else harmful, either obvious or subtle yet severe, can happen.
Just don't do it - only read/write memory you legally allocated.
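Since C will not check the index for you, one common way to stay inside what you allocated is to store the capacity next to the pointer and check it on every access. Here is a minimal sketch; the `Table` type and the `tableInit`, `tableSet`, and `tableFree` functions are names I made up, not part of the question's code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *key;
    int value;
} Entry;

/* Hypothetical variant of the question's table that remembers its size. */
typedef struct {
    Entry *entries;
    size_t capacity;
} Table;

/* Allocates a zeroed table; returns 0 on success, -1 on failure. */
static int tableInit(Table *t, size_t capacity)
{
    assert(t != NULL);
    t->entries = calloc(capacity, sizeof *t->entries);
    t->capacity = (t->entries != NULL) ? capacity : 0;
    return (t->entries != NULL) ? 0 : -1;
}

/* Stores the pair at `index`; returns -1 instead of writing out of bounds. */
static int tableSet(Table *t, size_t index, const char *key, int value)
{
    assert(t != NULL);
    if (index >= t->capacity)
        return -1;
    t->entries[index].key = key;
    t->entries[index].value = value;
    return 0;
}

static void tableFree(Table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->capacity = 0;
}
```

With a capacity of 2, the third insert from the question is rejected with -1 rather than silently landing in memory the table does not own.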
Generally speaking (different implementation for different platforms) when a malloc or similar heap based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in sets of regions - which would be equal or larger than the amount the program requested.
Such an arrangement is done so as to prevent frequent system calls to kernel for allocation, and satisfying program requests for Heap faster (This is certainly not the only reason!! - other reasons may exist as well).
A side effect of such an arrangement is the behavior you are observing. Once again, your program won't necessarily be able to write to a non-allocated zone without crashing or segfaulting every time; that depends on the particular binary's memory layout. Try writing to an even higher array offset and your program will eventually fault.
As for what you should/should-not do - people who have responded above have summarized fairly well. I have no better answer except that such issues should be prevented and that can only be done by being careful while allocating memory.
One way of understanding this is through a crude example: when you request 1 byte in userspace, the kernel has to allocate at least a whole page (4 KiB on many Linux systems, for example, which is the finest granularity at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling library, which the library can then hand out as and when more requests come in. Thus, reads or writes to such a region will not necessarily generate a fault. You would just get garbage.
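The page size does not have to be guessed; on a POSIX system it can be queried at run time. A small sketch follows, where `page_size` and `round_up_to_pages` are helper names of my own choosing.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the system page size in bytes, or 0 if it cannot be determined. */
static size_t page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    return (n > 0) ? (size_t)n : 0;
}

/* Rounds `bytes` up to a whole number of pages, i.e. the least the
 * kernel can map for a request of that size. */
static size_t round_up_to_pages(size_t bytes)
{
    size_t page = page_size();

    assert(page != 0);
    return (bytes + page - 1) / page * page;
}
```

So a 1-byte request still costs the process at least `round_up_to_pages(1)` bytes of mapped memory the first time around, and everything in that page is readable and writable whether or not malloc has handed it out.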
In C, you can read from any address that is mapped, and you can write to any address that is mapped read-write.
In practice, the OS gives a process memory in chunks (pages), commonly of 4 KiB (but this is OS-dependent). The C library then manages these pages and maintains lists of what is free and what is allocated, giving the user addresses of these blocks when asked to with malloc.
So when you get a pointer back from malloc(), you are pointing to an area within a page that is read-writable. The surrounding memory may contain garbage, it may contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:
Corruption of other malloc'ed data
Corruption of stack variables, or the call stack itself, causing crashes when a function returns
Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called
All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.
Only when you read from or write to an address that does not correspond to a mapped page will you get a crash, e.g. reading from address 0x0 (NULL).
Malloc, free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally.
There are many third-party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied.
Think of memory as being a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. A piece of code `Foo` could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:
Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).
Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.
Usually malloc will allocate more memory than you requested, for alignment purposes. Also, the process really does have read/write access to the whole heap region, so reading a few bytes outside the allocated block seldom triggers any error.
But you still should not do it. Since the memory you're writing to may be regarded as unoccupied or may in fact be occupied by something else, anything can happen: e.g. the 2nd and 3rd key/value pairs could become garbage later, or some unrelated but vital function could crash because of invalid data you've stomped onto its malloc-ed memory.
(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)