Why am I getting a seg fault? - c

After getting a seg fault, I ran gdb a.out core and then backtrace (bt). This is what gdb tells me:
warning: core file may not match specified executable file.
warning: Error reading shared library list entry at 0xfbe8
warning: Error reading shared library list entry at 0x74c085ff
Core was generated by 'family.out smith.ged'.
Program terminated with signal 11, Segmentation fault.
#0 0x08086a6 in count_records ()
(gdb) bt
#0 0x080486a6 in count_records ()
#1 0x08048906 in __libc_csu_init ()
#2 0xbf85624c in ?? ()
#3 0xbf856310 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Could someone give me some insight as to what might have caused this seg fault? Usually gdb gives me the line number in the program, but this time it didn't.

What likely happened here is that you've corrupted the stack. A lot of the program's state (including all the stack frames that tell you which function you're in) resides on the stack, so once that gets overwritten, the debugger only has corrupt information to work with.
A common way to do this is to overflow a character buffer declared as a local variable, e.g.
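#include <stdio.h>

int func1(char* theBuf);
int func2(char* sameBufBackSomeplaceInTheStack);

int main()
{
    char buf[4]; /* room for 3 characters plus the terminator */
    return func1(buf);
}
int func1(char* theBuf)
{
    return func2(theBuf);
}
int func2(char* sameBufBackSomeplaceInTheStack)
{
    /* Deliberate bug: writes 21 bytes into the 4-byte buffer in main's frame */
    sprintf(sameBufBackSomeplaceInTheStack, "The stack is doomed.");
    return 0;
}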
Results may vary, but my destroyed stack looks like this in the debugger after I do this:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x0000000100000d00 in _mh_execute_header ()
(gdb) where
#0 0x0000000100000d00 in _mh_execute_header ()
#1 0x0000000000000000 in ?? ()
(gdb)
Anyway, somewhere someplace your program has overwritten the stack, which is often challenging to debug...
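For contrast, here is a minimal sketch of the same write done with a bound. The helper name fill_message is hypothetical (it is not from the original post); the point is only that snprintf is told the size of the destination and therefore cannot run past it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): writes the message into
   buf but never stores more than buf_size bytes, terminator included.
   Returns the length the full message would have needed, as snprintf does. */
int fill_message(char *buf, size_t buf_size)
{
    return snprintf(buf, buf_size, "%s", "The stack is doomed.");
}
```

Called with a 4-byte buffer and sizeof buf, this leaves a truncated, terminated string in the buffer, and the return value is larger than the buffer size, which tells the caller that the text did not fit.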

Related

why does readlink function cause a segmentation fault instead of moving the path string to a pointer?

When I run the code
char* dirPath = (char*) malloc(pathSize); // pathSize is 512 and its pre defined
readlink("/proc/self/exe",dirPath,pathSize); //segfault here
it segfaults. I've tried increasing the value of pathSize and passing a value one larger than pathSize into readlink. I've also tried putting /proc/self/exe in a separate variable and passing that, but that didn't work either. Running the program through gdb gives:
Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
384 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
The code worked for a long time and only recently broke.
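The two lines shown do not check the result of malloc or of readlink, and readlink does not null-terminate what it writes. The following is only a sketch of a more defensive call, not a diagnosis of the crash above; the helper name read_link_alloc is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the target of link_path into a malloc'ed,
   null-terminated string of at most path_size bytes (terminator included).
   Returns NULL on any failure. The caller frees the result. */
char *read_link_alloc(const char *link_path, size_t path_size)
{
    char *buf;
    ssize_t len;

    if (path_size == 0)
        return NULL;
    buf = malloc(path_size);
    if (buf == NULL)
        return NULL;

    /* Leave one byte free: readlink does not append a terminator. */
    len = readlink(link_path, buf, path_size - 1);
    if (len < 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

With the values from the question this would be called as read_link_alloc("/proc/self/exe", 512). If the crash persists even with these checks, the cause is more likely elsewhere in the program than in these two lines.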

Seg Fault at end Main, GDB output provided

Sorry, I'm new to C programming.
As the title says, the code runs perfectly until the end of main, where it returns 0. It then gives a seg fault with no indication why. Some answers said that maybe I wasn't freeing everything that I malloced, but I did. So I tried using gdb to figure out why. This was the first time I've ever used it.
This is the output:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7644f1d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff7644f1d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff76450aa in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff760365b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff76036f5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff75eaecc in __libc_start_main ()
from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000400bc9 in _start ()
My main:
int main(int argc, char *argv[]) {
    if (argv[1] == NULL)
    {
        printf("Please enter the path to the map generating file as an argument.\n");
        exit(0);
    }
    run(getName(), argv[1]);
    return 0;
}
My program is an ncurses program; I believe I am successfully creating the screen and then closing it. I've checked that all the malloced variables have been freed as well.
run is in a different C file, where I draw the ncurses board.
Any help would be appreciated.
gdb will not necessarily show you the exact location where the corruption really happens (the backtrace suggests it's a stack-related one).
For this kind of error, the best tool is Valgrind; run your app under it.
(In my experience, memory corruption errors can be tracked down and eliminated within minutes with Valgrind.)

trace variable change using valgrind and gdb

I have a program which gets SIGABRT after more than 5 hours of execution. According to valgrind it is most likely caused by a memory leak, but I have trouble tracking down which variable actually causes the issue from the valgrind report (which simply contains addresses and ???).
I am trying to use valgrind and gdb to step through. However, since it takes 5 hours to reach the leak (after looping for 428 rounds), I would like to set a breakpoint at, say, loop = 428 and step into the code. How can I do that?
Based on the simple program below, may I know:
a) how to trace changes in the value of variable 'a'?
b) how to set a breakpoint when loop = 428?
#include <stdlib.h>

/* NUM_NODE is defined elsewhere in the real program */
typedef struct data_attr {
    int a[2500];
} stdata;
typedef struct pcfg {
    stdata *data;
} stConfig;
int funcA(stConfig* pt) {
    int loop = 0;
    while (loop < NUM_NODE) {
        pt->data->a[0] = 1000;
        pt->data->a[0] = 1001;
        loop++;
    }
    return 0;
}
int main() {
    stConfig *p;
    p = (stConfig*) malloc(sizeof(stConfig));
    p->data = (stdata*) malloc(sizeof(stdata));
    funcA(p);
    free(p->data);
    free(p);
    return 0;
}
I am using valgrind 3.7 on ubuntu 10.04
# valgrind terminal,
valgrind -v --vgdb=yes --vgdb-error=0 --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=40 --track-origins=yes --log-file=mr3m1n2500_valgrind_0717_1155.txt ./pt m >& mr3m1n2500_logcheck_0717_1155.txt
# gdb terminal
I tried to get address of 'p' but it returns void, why?
> gdb ./pt
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 12857
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
[Switching to Thread 12857]
0x04000850 in _start () from /lib/ld-linux.so.2
(gdb) p $p
$1 = void
(gdb) bt 10
#0 0x04000850 in _start () from /lib/ld-linux.so.2
To trace changes in the value of a variable, you can set a watchpoint on that variable.
For your case, use: watch p->data->a[index]
To break only when a condition holds, use a conditional breakpoint: break LOCATION if loop == 428, where LOCATION is a line inside the loop.
From help break in GDB:
(gdb) help break
Set breakpoint at specified line or function.
break [LOCATION] [thread THREADNUM] [if CONDITION]
LOCATION may be a line number, function name, or "*" and an address.
If a line number is specified, break at start of code for that line.
If a function is specified, break at start of code for that function.
If an address is specified, break at that exact address.
With no LOCATION, uses current execution address of selected stack frame.
This is useful for breaking on return to a stack frame.
THREADNUM is the number from "info threads".
CONDITION is a boolean expression.
Multiple breakpoints at one place are permitted, and useful if conditional.
Do "help breakpoints" for info on other commands dealing with breakpoints.
To set a breakpoint on a condition, use break LOCATION if CONDITION; in your case, break LOCATION if loop == 428, with LOCATION being a line inside the while loop.
a) To set a breakpoint inside that loop, you can do something like:
if (loop == 428) {
    int nop = 0; /* breakpoint target */
    (void)nop;
}
And then set the breakpoint on the line int nop = 0. This way the program only stops when that line is executed, which happens in loop 428.
b) I am not sure about this one. Where are you trying to examine the value of 'p'?
For your first question (how to trace changes in the value of variable 'a'), please use watch:
watch [-l|-location] expr [thread threadnum] [mask maskvalue]
Set a watchpoint for an expression. gdb will break when the expression expr is written into by the program and its value changes. The simplest (and the most popular) use of this command is to watch the value of a single variable:
(gdb) watch foo
Joachim Pileborg has the answer to your second question.
For your third question, you need to set a breakpoint at the line
p->data = (stdata*) malloc (sizeof(stdata));
and then try to print the value of "p".
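The suggestions above can be combined in one small program. This is a sketch under assumptions: NUM_NODE is given an arbitrary value of 500 because the question does not show it, and debug_hook and run_once are hypothetical names added only for illustration. Built with -g, the program can be stopped with break debug_hook, after which watch pt->data->a[0] in the funcA frame traces writes to the array element.

```c
#include <stdlib.h>

#define NUM_NODE 500        /* assumed value; the question does not give it */
#define BREAK_AT_LOOP 428   /* iteration of interest, from the question */

typedef struct data_attr { int a[2500]; } stdata;
typedef struct pcfg { stdata *data; } stConfig;

/* Hypothetical hook: exists only so the debugger has a function to break on.
   The volatile counter keeps the call from being optimized away. */
static volatile int hook_hits = 0;
void debug_hook(int loop)
{
    (void)loop;
    hook_hits++;
}

int funcA(stConfig *pt)
{
    int loop = 0;
    while (loop < NUM_NODE) {
        if (loop == BREAK_AT_LOOP)
            debug_hook(loop);   /* reached exactly once, in round 428 */
        pt->data->a[0] = 1000;
        pt->data->a[0] = 1001;
        loop++;
    }
    return 0;
}

/* Hypothetical driver mirroring main() from the question.
   Returns how many times the hook fired, or -1 if allocation failed. */
int run_once(void)
{
    stConfig *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->data = malloc(sizeof *p->data);
    if (p->data == NULL) {
        free(p);
        return -1;
    }
    funcA(p);
    free(p->data);
    free(p);
    return hook_hits;
}
```

Compared with a conditional breakpoint, the hook costs one comparison per iteration in the program itself, which may matter when the program runs under valgrind for hours before reaching the round of interest.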

Heap corruption in HP-UX?

I'm trying to understand what's going wrong with a program run in HP-UX 11.11 that results in a SIGSEGV (11, segmentation fault):
(gdb) bt
#0 0x737390e8 in _sigfillset+0x618 () from /usr/lib/libc.2
#1 0x73736a8c in _sscanf+0x55c () from /usr/lib/libc.2
#2 0x7373c23c in malloc+0x18c () from /usr/lib/libc.2
#3 0x7379e3f8 in _findbuf+0x138 () from /usr/lib/libc.2
#4 0x7379c9f4 in _filbuf+0x34 () from /usr/lib/libc.2
#5 0x7379c604 in __fgets_unlocked+0x84 () from /usr/lib/libc.2
#6 0x7379c7fc in fgets+0xbc () from /usr/lib/libc.2
#7 0x7378ecec in __nsw_getoneconfig+0xf4 () from /usr/lib/libc.2
#8 0x7378f8b8 in __nsw_getconfig+0x150 () from /usr/lib/libc.2
#9 0x737903a8 in __thread_cond_init_default+0x100 () from /usr/lib/libc.2
#10 0x737909a0 in nss_search+0x80 () from /usr/lib/libc.2
#11 0x736e7320 in __gethostbyname_r+0x140 () from /usr/lib/libc.2
#12 0x736e74bc in gethostbyname+0x94 () from /usr/lib/libc.2
#13 0x11780 in dnetResolveName (name=0x400080d8 "smtp.org.com", hent=0x737f3334) at src/dnet.c:64
..
The problem seems to be occurring somewhere inside libc! A system call trace ends with:
Connecting to server smtp.org.com on port 25
write(1, "C o n n e c t i n g t o s e ".., 51) .......................... = 51
open("/etc/nsswitch.conf", O_RDONLY, 0666) ............................... [entry]
open("/etc/nsswitch.conf", O_RDONLY, 0666) ................................... = 5
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
Siginfo: si_code: I_NONEXIST, faulting address: 0x400118fc, si_errno: 0
PC: 0xc01980eb, instruction: 0x0d3f1280
exit(11) [implicit] ............................ WIFSIGNALED(SIGSEGV)|WCOREDUMP
Last instruction by the program:
struct hostent *him;
him = gethostbyname(name); // name == "smtp.org.com" as shown by gdb
Is this a problem with the system, or am I missing something?
Any guidance for digging deeper would be appreciated.
Thx.
Long story short: vsnprintf corrupted my heap under HP-UX 11.11.
vsnprintf was introduced in C99 (ISO/IEC 9899:1999) and "is equivalent to snprintf, with the variable argument list" (§7.19.6.12.2); for snprintf (§7.19.6.5.2): "If n is zero, nothing is written".
Well, HP-UX 11.11 doesn't comply with this specification. When the 2nd arg == 0, the arguments are written at the end of the 1st arg, which of course corrupts the heap (I allocate no space when maxsize==0, given that nothing should be written).
The HP manual pages are unclear ("It is the user's responsibility to ensure that enough storage is available."); nothing is said regarding the case of maxsize==0. Nice trap. At the very least, the WARNINGS section of the man page should warn std-compliant users.
It's a chicken-and-egg problem: vsnprintf is variadic, so to fulfil the "responsibility to ensure that enough storage is available", the user must first know how much space is needed. And the best way to do that is to call vsnprintf with 2nd arg == 0: it should then return the amount of space required and print nothing. Well, except on HP's!
One solution for using vsnprintf to determine the needed space despite this std violation: malloc 1 byte more for your buffer (1st arg) and call vsnprintf(buf+buf.length,1,..). This only puts a \0 in the new byte you allocated. Silly, but effective. If you're under wchar conditions, malloc(sizeof..).
Anyway, the workaround is trivial: never call v/snprintf under HP-UX with maxsize==0!
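As an illustration of that workaround, here is a minimal sketch that measures the output with a 1-byte scratch buffer and size 1 instead of size 0. The helper name format_alloc is hypothetical, and the sketch assumes that vsnprintf returns the required length when the output is truncated, as C99 specifies; I have not verified what HP-UX 11.11 returns in that case, so the negative-return check is there as a guard.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a freshly malloc'ed string.
   Measures with a size of 1 (never 0), then formats for real.
   Returns NULL on failure. The caller frees the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    char scratch[1];
    int needed;
    char *out;

    va_start(ap, fmt);
    needed = vsnprintf(scratch, sizeof scratch, fmt, ap);
    va_end(ap);
    if (needed < 0)
        return NULL;   /* a pre-C99 library may report truncation this way */

    out = malloc((size_t)needed + 1);
    if (out == NULL)
        return NULL;

    va_start(ap, fmt);   /* a va_list must be restarted before it is reused */
    vsnprintf(out, (size_t)needed + 1, fmt, ap);
    va_end(ap);
    return out;
}
```

On a C99-conforming library the first call stores only a terminator in scratch and returns the full length, so the second call has exactly the space it needs.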
I now have a happy stable program!
Thanks to all contributors.
Heap corruption through vsnprintf under HP-UX B11.11
This program prints "##" under Linux/Cygwin/..
It prints "#fooo#" under HP-UX B11.11:
#include <stdarg.h>
#include <stdio.h>
#include <strings.h> /* bzero */
const int S=2;
void f (const char *fmt, ...) {
    va_list ap;
    int actualLen=0;
    char buf[S];
    bzero(buf, S);
    va_start(ap, fmt);
    /* size 0: a conforming vsnprintf must write nothing into buf */
    actualLen = vsnprintf(buf, 0, fmt, ap);
    va_end(ap);
    printf("#%s#\n", buf);
}
int main () {
    f("%s", "fooo");
    return 0;
}
Whenever this situation happens to me (unexpected segfault in a system lib), it is usually because I did something foolish somewhere else, i.e. buffer overrun, double delete on a pointer, etc.
In those instances where my mistake is not obvious, I use valgrind. Something like the following is usually sufficient:
valgrind -v --leak-check=yes --show-reachable=yes ./myprog
I assume valgrind may be used in HP-UX...
Your stack trace is in malloc which almost certainly means that somewhere you corrupted one of malloc's data structures. As a previous answer said, you likely have a buffer overrun or underrun and corrupted one of the items allocated off the heap.
Another explanation is that you tried to do a free on something that didn't come from the heap, but that's less likely--that would probably have crashed right in free.
Reading the (OS X) manpage says that gethostbyname() returns a pointer, but as far as I can tell may not be allocating memory for that pointer. Do you need to malloc() first? Try this:
struct hostent *him = malloc(sizeof(struct hostent));
him = gethostbyname(name);
...
free(him);
Does that work any better?
EDIT: I tested this and it's probably wrong. Granted I used the bare string "stmp.org.com" instead of a variable, but both versions (with and without malloc()ing) worked on OS X. Maybe HP-UX is different.

memset and SIGSEGV

I have been facing a weird issue in a piece of code.
void app_ErrDesc(char *ps_logbuf, char *pc_buf_err_recno)
{
    char *pc_logbuf_in;
    char rec_num[10];
    char *y = "|";
    int i, j;
    memset(rec_num, 0, sizeof(rec_num));
    memset(pc_buf_err_recno, 0, LOGBUFF);
    .....
    .....
}
For some reason the first memset call raises a SIGSEGV. What's more strange is that, inside gdb, the same line executes about 30 times even though the function is called only once and there are no loops inside! Here's a piece of the gdb session.
7295 /*Point to logbuffer string*/
(gdb)
7292 memset(rec_num, 0, sizeof(rec_num));
(gdb)
7295 /*Point to logbuffer string*/
(gdb)
7292 memset(rec_num, 0, sizeof(rec_num));
(gdb) n
7295 /*Point to logbuffer string*/
(gdb)
7292 memset(rec_num, 0, sizeof(rec_num));
(gdb)
Program received signal SIGSEGV, Segmentation fault.
I have also tried running the program through valgrind's memcheck tool but not getting anything significant about the above piece of code.
The file that I'm parsing has just one record.
Any pointers are appreciated. Thanks.
It's likely that the crash is in the second memset, and the reason is that the outer function is called with a buffer that is too small. Debuggers can show your position incorrectly. Try adding logging after each step to find out what exactly crashes.
I suspect the call to the function, so ensure the call is not something like
char pc_buf_err_recno[SMALLER_THAN_LOGBUFF];
char ps_logbuf[TOO_SMALL];
app_ErrDesc(ps_logbuf, pc_buf_err_recno);
Debuggers can be incorrect, particularly if you're getting SEGV. Remember, it's quite possible you've trashed the stack when you get a segmentation fault and the debugger will get confused if that happens.
It's also quite possible the calling function has made a mess, not the current one.
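One way to make a mismatch between caller and callee visible is to pass the buffer size along with the pointer. The following is a sketch only: clear_err_buffer is a hypothetical name, and LOGBUFF is given an assumed value of 256 because the question does not show the real one.

```c
#include <stddef.h>
#include <string.h>

#define LOGBUFF 256   /* assumed value; the question does not show the real one */

/* Hypothetical variant of the second memset in app_ErrDesc: the caller states
   how large its buffer really is, and the clear is refused if it is too small.
   Returns 0 on success, -1 if the buffer is missing or smaller than LOGBUFF. */
int clear_err_buffer(char *pc_buf_err_recno, size_t buf_size)
{
    if (pc_buf_err_recno == NULL || buf_size < LOGBUFF)
        return -1;
    memset(pc_buf_err_recno, 0, LOGBUFF);
    return 0;
}
```

A caller would pass sizeof on its own array as the second argument, so an undersized buffer produces an error return at the call site instead of a memset that runs past the end.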
Whats more strange is when inside gdb the same line executes for about 30 times though the function is called only once and there are no loops inside!
This sounds symptomatic of having compiled with optimizations. You may have an easier time pinpointing the problem in GDB if you compile with optimizations turned off.

Resources