Find the source of a crash in a C program

I have a code base where a process running on Linux crashes in free and sometimes in malloc:
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7424e30 in raise () from /lib/libc.so.6
#2 0xf7426765 in abort () from /lib/libc.so.6
#3 0xf7469d33 in malloc_printerr () from /lib/libc.so.6
#4 0xf746e7bc in free () from /lib/libc.so.6
#5 0xf6047e25 in myFree (mem_ptr=0x82165d8) at ../my_code/mylib.c:78
#6 0xf6014a10 in FreeBuffer (buffer=0x82042f8)
In the code I am not seeing anything fishy at the place where memory freeing is happening.
Function myFree() has nothing but a free() function call.
void FreeBuffer(struct MY_BUFFER *buffer)
{
    if (buffer)
    {
        myFree(buffer);
    }
}
void myFree(void *mem_ptr)
{
    free(mem_ptr);
}
I tried using MALLOC_CHECK_, but it did not help.
I suspect the heap is getting corrupted somewhere and want to find out where.
Any hints on how to debug a process in such cases?

It is hard to be 100% sure, but most probably you are freeing the same memory twice. It's good that you check that the pointer is not NULL before freeing it:
if (buffer)
{
    myFree(buffer);
}
but you are probably not setting the pointer to NULL after freeing it. That's my guess.
Check whether you have something like this:
struct MY_BUFFER *buffer;
// do something with buffer
FreeBuffer(buffer);
buffer = NULL;
You can also do this inside FreeBuffer(), but then you have to pass the address of the pointer (a pointer to a pointer), so the signature becomes FreeBuffer(struct MY_BUFFER **buffer).
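A minimal sketch of the pointer-to-pointer variant described above. The struct layout and the name FreeBufferAndClear are hypothetical stand-ins, since the question does not show the real definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the question's struct MY_BUFFER. */
struct MY_BUFFER {
    size_t length;
    char *data;
};

/* Frees *buffer and resets the caller's pointer to NULL, so a second
 * call through the same variable becomes a harmless no-op. */
void FreeBufferAndClear(struct MY_BUFFER **buffer)
{
    if (buffer != NULL && *buffer != NULL) {
        free(*buffer);
        *buffer = NULL;
    }
}
```

A caller would write FreeBufferAndClear(&buffer). This only protects the one variable that is passed in; any other copy of the same pointer still dangles, so it narrows the search for a double free rather than ruling one out.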

Related

Seg Fault at end Main, GDB output provided

Sorry, I'm new to C programming.
As the title says, the code runs perfectly until the end of main, where it returns 0. It then segfaults with no indication why. Some answers said that maybe I wasn't freeing everything I malloced, but I did. So I tried using gdb to figure out why. This was the first time I've ever used it.
This is the output:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7644f1d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff7644f1d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff76450aa in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff760365b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff76036f5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff75eaecc in __libc_start_main ()
from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000400bc9 in _start ()
My main:
int main(int argc, char *argv[]) {
    if(argv[1] == NULL)
    {
        printf("Please enter the path to the map generating file as an argument.\n");
        exit(0);
    }
    run(getName(), argv[1]);
    return 0;
}
My program is an ncurses program; I believe I successfully create the screen and then close it. I've checked that all the malloced variables have been freed as well.
run is in a different C file, where I draw the ncurses board.
Any help would be appreciated.
gdb will not necessarily show you the exact location where the corruption actually happens. (The backtrace suggests it is stack-related.)
For this kind of error, the best tool is Valgrind; run your app under it.
(In my experience, memory corruption errors can be tracked down and eliminated within minutes with Valgrind.)

signal 11 SIGSEGV in malloc?

I usually love well-explained questions and answers, but in this case I really can't give any more clues.
The question is: why is malloc() giving me SIGSEGV? The debug session below shows that the program never gets the chance to test the returned pointer against NULL and exit. The program dies INSIDE MALLOC!
I'm assuming the malloc in my glibc is just fine. I have an up-to-date Debian GNU/Linux wheezy system on an old Pentium (i386/i486 arch).
To be able to track it down, I generated a core dump. Let's follow it:
iguana$gdb xadreco core-20131207-150611.dump
Core was generated by `./xadreco'.
Program terminated with signal 11, Segmentation fault.
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xb76824bc in malloc () from /lib/i386-linux-gnu/libc.so.6
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x...) at xadreco.c:4519
#3 0x0804b93a in geramov (tabu=..., nmovi=0xbfd411f8) at xadreco.c:1473
#4 0x0804e7b7 in minimax (atual=..., deep=1, alfa=-105000, bet...) at xadreco.c:2778
#5 0x0804e9fa in minimax (atual=..., deep=0, alfa=-105000, bet...) at xadreco.c:2827
#6 0x0804de62 in compjoga (tabu=0xbfd41924) at xadreco.c:2508
#7 0x080490b5 in main (argc=1, argv=0xbfd41b24) at xadreco.c:604
(gdb) frame 2
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x ...) at xadreco.c:4519
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
(gdb) l
4516
4517 void enche_pmovi (movimento **cabeca, movimento **pmovi, int c0, int c1, int c2, int c3, int p, int r, int e, int f, int *nmovi)
4518 {
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
4520 if (paux == NULL)
4521 exit(1);
Of course I need to look at frame 2, the last frame on the stack that belongs to my code. But line 4519 itself gives SIGSEGV! The program never gets to test, on line 4520, whether paux==NULL or not.
Here is "movimento" (abbreviated):
typedef struct smovimento
{
    int lance[4]; //move in integer notation
    int roque; // etc. ...
    struct smovimento *prox;// pointer to next
} movimento;
This program can use a LOT of memory, and I know memory is at its limit. But I thought malloc would cope better when memory is not available.
Running $free -h during execution, I can see free memory drop as low as 1MB! That's OK. The old computer only has 96MB, and 50MB is used by the OS.
I don't know where to start looking. Maybe check the available memory BEFORE each malloc call? But that sounds like a waste of computing power, as malloc supposedly does that itself. sizeof (movimento) is about 48 bytes. If I test beforehand, at least I'll have some confirmation of the bug.
If you have any ideas, please share. Thanks.
Any crash inside malloc (or free) is an almost sure sign of heap corruption, which can come in many forms:
overflowing or underflowing a heap buffer
freeing something twice
freeing a non-heap pointer
writing to a freed block
etc.
These bugs are very hard to catch without tool support, because the crash often comes many thousands of instructions, and possibly many calls to malloc or free later, in code that is often in a completely different part of the program and very far from where the bug is.
The good news is that tools like Valgrind or AddressSanitizer usually point you straight at the problem.
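To make the "overflowing or underflowing a heap buffer" case concrete, here is a much simplified sketch of the guard-byte (canary) idea. It is not how Valgrind or AddressSanitizer are implemented; the names guarded_malloc, guarded_check and guarded_free are hypothetical, and the layout assumes sizeof(size_t) is at most 8.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE ((size_t)16) /* stored size + front guard bytes */
#define GUARD_SIZE  ((size_t)8)  /* back guard bytes */
#define GUARD_BYTE  0xA5

/* Layout: [size][front guard][user bytes][back guard] */
void *guarded_malloc(size_t size)
{
    unsigned char *raw;

    if (size > SIZE_MAX - HEADER_SIZE - GUARD_SIZE)
        return NULL;
    raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    memset(raw + sizeof size, GUARD_BYTE, HEADER_SIZE - sizeof size);
    memset(raw + HEADER_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE;
}

/* Returns 1 if both guards are intact, 0 if something wrote outside
 * the user bytes. ptr must come from guarded_malloc. */
int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *raw = user - HEADER_SIZE;
    size_t size, i;

    memcpy(&size, raw, sizeof size);
    for (i = sizeof size; i < HEADER_SIZE; i++)
        if (raw[i] != GUARD_BYTE)
            return 0;
    for (i = 0; i < GUARD_SIZE; i++)
        if (user[size + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Checks the guards, releases the block, and reports the check result. */
int guarded_free(void *ptr)
{
    int ok;

    if (ptr == NULL)
        return 1;
    ok = guarded_check(ptr);
    free((unsigned char *)ptr - HEADER_SIZE);
    return ok;
}
```

Calling guarded_check at suspicious points narrows down when a block was damaged, but it only catches writes that land in the guard bytes, and it cannot say which code did the writing. That gap is what the real tools fill.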

Why am I getting a seg fault?

After getting a seg fault, I used the gdb a.out core command. Afterwards I used backtrace (bt) and this is what gdb tells me
warning: core file may not match specified executable file.
warning: Error reading shared library list entry at 0xfbe8
warning: Error reading shared library list entry at 0x74c085ff
Core was generated by 'family.out smith.ged'.
Program terminated with signal 11, Segmentation fault.
#0 0x08086a6 in count_records ()
(gdb) bt
#0 0x080486a6 in count_records ()
#1 0x08048906 in __libc_csu_init ()
#2 0xbf85624c in ?? ()
#3 0xbf856310 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Could someone give me some insight as to what might have caused this seg fault? Usually gdb gives me the line number in the program, but this time it didn't.
What likely happened here is that you've corrupted the stack. A lot of the program's state (including all the stack frames that tell you what function you're in) resides on the stack, so once that gets overwritten, the debugger only has corrupt information to work with.
A common way to do this is to overflow a string buffer declared as a local variable, e.g.
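#include <stdio.h>
int func1(char *theBuf);
int func2(char *sameBufBackSomeplaceInTheStack);
int main(void)
{
    char buf[4]; /* room for only 3 characters plus the terminating '\0' */
    return func1(buf);
}
int func1(char *theBuf)
{
    return func2(theBuf);
}
int func2(char *sameBufBackSomeplaceInTheStack)
{
    /* Deliberate bug: writes 21 bytes into a 4-byte buffer, overwriting the stack. */
    sprintf(sameBufBackSomeplaceInTheStack, "The stack is doomed.");
    return 0;
}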
Results may vary, but my destroyed stack looks like this in the debugger after I do this:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x0000000100000d00 in _mh_execute_header ()
(gdb) where
#0 0x0000000100000d00 in _mh_execute_header ()
#1 0x0000000000000000 in ?? ()
(gdb)
Anyway, somewhere your program has overwritten the stack, which is often challenging to debug.
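For contrast, here is a minimal sketch of a bounded version of the write in func2. The name fill_message and the extra size parameter are assumptions added for illustration; the point is only that the callee is told how large the buffer is and uses snprintf instead of sprintf.

```c
#include <stdio.h>

/* Writes the message into buf without ever storing more than buf_size
 * bytes. Returns 0 if the whole message fit, -1 if it was truncated
 * or formatting failed. buf_size must be at least 1. */
int fill_message(char *buf, size_t buf_size)
{
    int needed = snprintf(buf, buf_size, "%s", "The stack is doomed.");

    if (needed < 0 || (size_t)needed >= buf_size)
        return -1;
    return 0;
}
```

With char buf[4], calling fill_message(buf, sizeof buf) stores only "The" plus the terminator and reports truncation, so the neighbouring stack frames stay intact.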

Segmentation fault in malloc()?

I am getting a segmentation fault inside of the malloc() routine. Here is the stacktrace from gdb:
#0 0x00007ffff787e882 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff787fec6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7882a45 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000403ab0 in xmalloc (size=1024) at global.c:14
#4 0x00000000004020fb in processConnectionQueue (arguments=0x60a4e0)
at connection.c:117
#5 0x00007ffff7bc4e9a in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007ffff78f24bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()
What's going on? What could cause malloc() to segfault?
EDIT: Here is the code from xmalloc(). It's pretty standard, and as you can see from the stacktrace it's calling malloc with a size of 1024.
void* xmalloc(size_t size)
{
    void* result = malloc(size);
    if(!result)
    {
        if(!size)
        {
            result = malloc(1);
        }
        if(!result)
        {
            fprintf(stderr, "Error allocating memory of size %zu\n", size);
            exit(-1);
        }
    }
    return result;
}
And line 117 in connection.c:
item->readBuffer = xmalloc(kInitialPacketBufferSize);
You are most likely seeing the effect of an error elsewhere in your code, one that accesses memory outside an allocation. If you are lucky, the stray write hits some of the internal values malloc uses to track allocations, so the damage shows up as a crash inside malloc.
If possible, try linking your code with an allocation checker such as libefence or similar, and use it to locate the real problem.
Can you check line 14 of global.c, the one shown in this frame with size=1024?
#3 0x0000000000403ab0 in xmalloc (size=1024) at global.c:14
You can also use the tools Wegge mentioned to track down the cause of this segfault.
There are at least three basic causes of this:
1. You overran the item returned by malloc() and wrote something before its beginning or after its end.
2. You freed something twice, which in a way is a special case of 3.
3. You freed something that wasn't a malloc() result.
Any of these actions will corrupt the malloc() heap, which causes it to malfunction.
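The second and third causes can be caught by hand with a small bookkeeping wrapper. The sketch below is an illustration only: tracked_malloc, tracked_free and the fixed-size table are hypothetical, it is not thread-safe, and it only sees allocations made through the wrapper.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 1024

/* Table of blocks handed out and not yet freed; NULL marks an empty slot. */
static void *live_blocks[MAX_LIVE];

void *tracked_malloc(size_t size)
{
    size_t i;
    void *p = malloc(size);

    if (p == NULL)
        return NULL;
    for (i = 0; i < MAX_LIVE; i++) {
        if (live_blocks[i] == NULL) {
            live_blocks[i] = p;
            return p;
        }
    }
    free(p); /* table full: refuse rather than hand out an untracked block */
    return NULL;
}

/* Returns 0 on success, -1 if ptr is not a live tracked block, which
 * means a double free or a pointer that never came from tracked_malloc.
 * In the -1 case free() is deliberately not called. */
int tracked_free(void *ptr)
{
    size_t i;

    if (ptr == NULL)
        return 0;
    for (i = 0; i < MAX_LIVE; i++) {
        if (live_blocks[i] == ptr) {
            live_blocks[i] = NULL;
            free(ptr);
            return 0;
        }
    }
    return -1;
}
```

Logging or aborting when tracked_free returns -1 reports the bad free at the call that makes it, instead of at some later, unrelated malloc. An address that malloc has already reused for a new block will look live again, so this is a debugging aid and not a guarantee.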

Heap corruption in HP-UX?

I'm trying to understand what's going wrong with a program run in HP-UX 11.11 that results in a SIGSEGV (11, segmentation fault):
(gdb) bt
#0 0x737390e8 in _sigfillset+0x618 () from /usr/lib/libc.2
#1 0x73736a8c in _sscanf+0x55c () from /usr/lib/libc.2
#2 0x7373c23c in malloc+0x18c () from /usr/lib/libc.2
#3 0x7379e3f8 in _findbuf+0x138 () from /usr/lib/libc.2
#4 0x7379c9f4 in _filbuf+0x34 () from /usr/lib/libc.2
#5 0x7379c604 in __fgets_unlocked+0x84 () from /usr/lib/libc.2
#6 0x7379c7fc in fgets+0xbc () from /usr/lib/libc.2
#7 0x7378ecec in __nsw_getoneconfig+0xf4 () from /usr/lib/libc.2
#8 0x7378f8b8 in __nsw_getconfig+0x150 () from /usr/lib/libc.2
#9 0x737903a8 in __thread_cond_init_default+0x100 () from /usr/lib/libc.2
#10 0x737909a0 in nss_search+0x80 () from /usr/lib/libc.2
#11 0x736e7320 in __gethostbyname_r+0x140 () from /usr/lib/libc.2
#12 0x736e74bc in gethostbyname+0x94 () from /usr/lib/libc.2
#13 0x11780 in dnetResolveName (name=0x400080d8 "smtp.org.com", hent=0x737f3334) at src/dnet.c:64
..
The problem seems to be occurring somewhere inside libc! A system call trace ends with:
Connecting to server smtp.org.com on port 25
write(1, "C o n n e c t i n g t o s e ".., 51) .......................... = 51
open("/etc/nsswitch.conf", O_RDONLY, 0666) ............................... [entry]
open("/etc/nsswitch.conf", O_RDONLY, 0666) ................................... = 5
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
Siginfo: si_code: I_NONEXIST, faulting address: 0x400118fc, si_errno: 0
PC: 0xc01980eb, instruction: 0x0d3f1280
exit(11) [implicit] ............................ WIFSIGNALED(SIGSEGV)|WCOREDUMP
Last instruction by the program:
struct hostent *him;
him = gethostbyname(name); // name == "smtp.org.com" as shown by gdb
Is this a problem with the system, or am I missing something?
Any guidance for digging deeper would be appreciated.
Thx.
Long story short: vsnprintf corrupted my heap under HP-UX 11.11.
vsnprintf was introduced in C99 (ISO/IEC 9899:1999) and "is equivalent to snprintf, with the variable argument list" (§7.19.6.12.2); for snprintf (§7.19.6.5.2): "If n is zero, nothing is written".
Well, HP-UX 11.11 doesn't comply with this specification. When the 2nd arg == 0, the formatted arguments are written at the end of the 1st arg, which of course corrupts the heap (I allocate no space when maxsize==0, given that nothing should be written).
The HP manual pages are unclear ("It is the user's responsibility to ensure that enough storage is available."); nothing is said about the case maxsize==0. Nice trap. At the very least, the WARNINGS section of the man page should warn standard-compliant users.
It's a chicken-and-egg problem: vsnprintf is variadic, so to "ensure that enough storage is available" the user must first know how much space is needed. The best way to find out is to call vsnprintf with 2nd arg == 0: it should then return the amount of space required and write nothing. Well, except on HP's implementation!
One way to use vsnprintf despite this standard violation to determine the needed space: malloc 1 byte more for your buffer (1st arg) and call vsnprintf(buf+buf.length,1,..). This only puts a \0 in the new byte you allocated. Silly, but effective. If you're under wchar conditions, malloc(sizeof..).
Anyway, the workaround is trivial: never call v/snprintf under HP-UX with maxsize==0!
I now have a happy, stable program!
Thanks to all contributors.
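A minimal sketch of the workaround described in this answer: measure the required length with a 1-byte buffer instead of size 0, then allocate and format. The function name format_alloc is hypothetical. The sketch relies on vsnprintf returning the full untruncated length when given size 1, which holds for C99-conforming implementations; I have not verified it on HP-UX 11.11.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'ed, NUL-terminated formatted string, or NULL on
 * failure. The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap_copy;
    char probe[1];
    int needed;
    char *out;

    va_start(ap, fmt);
    va_copy(ap_copy, ap);
    /* Size 1, never 0: a conforming vsnprintf stores only '\0' in
     * probe and returns the length the full output would have. */
    needed = vsnprintf(probe, sizeof probe, fmt, ap_copy);
    va_end(ap_copy);

    if (needed < 0) {
        va_end(ap);
        return NULL;
    }

    out = malloc((size_t)needed + 1);
    if (out != NULL)
        vsnprintf(out, (size_t)needed + 1, fmt, ap);
    va_end(ap);
    return out;
}
```

The va_copy is needed because the argument list is consumed by the first vsnprintf call and must not be reused for the second one.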
Heap corruption through vsnprintf under HP-UX B11.11
This program prints "##" under Linux/Cygwin/..
It prints "#fooo#" under HP-UX B11.11:
#include <stdarg.h>
#include <stdio.h>
#include <strings.h> /* bzero */
const int S=2;
void f (const char *fmt, ...) {
    va_list ap;
    int actualLen=0;
    char buf[S];
    bzero(buf, S);
    va_start(ap, fmt);
    /* Size 0: a conforming vsnprintf writes nothing and only returns the length. */
    actualLen = vsnprintf(buf, 0, fmt, ap);
    va_end(ap);
    printf("#%s#\n", buf);
}
int main () {
    f("%s", "fooo");
    return 0;
}
Whenever this situation happens to me (unexpected segfault in a system lib), it is usually because I did something foolish somewhere else, i.e. buffer overrun, double delete on a pointer, etc.
In those instances where my mistake is not obvious, I use valgrind. Something like the following is usually sufficient:
valgrind -v --leak-check=yes --show-reachable=yes ./myprog
I assume valgrind may be used in HP-UX...
Your stack trace is in malloc which almost certainly means that somewhere you corrupted one of malloc's data structures. As a previous answer said, you likely have a buffer overrun or underrun and corrupted one of the items allocated off the heap.
Another explanation is that you tried to do a free on something that didn't come from the heap, but that's less likely--that would probably have crashed right in free.
Reading the (OS X) manpage says that gethostbyname() returns a pointer, but as far as I can tell may not be allocating memory for that pointer. Do you need to malloc() first? Try this:
struct hostent *him = malloc(sizeof(struct hostent));
him = gethostbyname(name);
...
free(him);
Does that work any better?
EDIT: I tested this and it's probably wrong. Granted I used the bare string "stmp.org.com" instead of a variable, but both versions (with and without malloc()ing) worked on OS X. Maybe HP-UX is different.

Resources