How can this code corrupt my stack trace?

This simple test program,
$ cat crash.c
int main() {
    int x = 0;
    *(&x + 5) = 10;
    return 0;
}
Compiled with GCC 7.4.0,
$ gcc -O0 -g crash.c
Has an unexpected stack trace
$ ./a.out
Segmentation fault (core dumped)
$ gdb ./a.out /tmp/wk_cores/core-pid_19675.dump
Reading symbols from ./a.out...done.
[New LWP 19675]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f450000000a in ?? ()
(gdb) bt
#0 0x00007f450000000a in ?? ()
#1 0x0000000000000001 in ?? ()
#2 0x00007fffd6f97598 in ?? ()
#3 0x0000000100008000 in ?? ()
#4 0x00005632be83d66a in frame_dummy ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I don't understand why the stack doesn't show the invalid store to privileged memory? Can someone help me understand this?

I don't understand why the stack doesn't show the invalid store to privileged memory?
Because you didn't store anything to privileged memory.
To do that, you need to write way outside of the stack, something like:
*(&x + 0x10000) = 5;
As is, your program does exhibit undefined behavior, but it doesn't write to "privileged" memory, just to memory that is writable but that you shouldn't write to.
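To put numbers on how far each store lands from x, here is a minimal sketch. The helper name int_index_to_bytes is made up for illustration and is not part of the question's code. It only computes the byte distance that the compiler's pointer scaling produces, without performing any out-of-bounds write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): byte distance covered by
 * `index` elements of type int. The compiler scales `&x + index` the same way. */
size_t int_index_to_bytes(size_t index)
{
    assert(index <= SIZE_MAX / sizeof(int));  /* guard against overflow */
    return index * sizeof(int);
}
```

Assuming a 4-byte int, index 5 is only 20 bytes away from x, which is still in or right next to main's frame, while index 0x10000 is 256 KiB away, which on a typical Linux process is likely to be past the mapped part of the stack.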

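x is the only local variable in main's stack frame, so &x + 5 points 20 bytes past x (with a 4-byte int): still in mapped, writable stack memory, but outside the storage that belongs to x. The store itself succeeds; what it clobbers is the bookkeeping data stored next to the frame. The faulting address 0x00007f450000000a ends in 0x0000000a, the value 10 that was stored, which suggests that the low half of main's saved return address was overwritten and the crash happened when main returned. That would also explain why GDB cannot produce a sensible backtrace.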

Related

How to "unwind" a core dump in gdb

I'm not sure if this is the correct wording of the issue, but let's take the following example where I have a program that will crash/abort:
#include <assert.h>
int main(void)
{
    int z=2;
    assert (z>5);
}
And if I compile it with debugging and then run it:
$ gcc -ggdb3 a.c -o a.o && ./a.o
a.o: a.c:8: main: Assertion `z>5' failed.
Aborted (core dumped)
Now I'll open it up in gdb to see if I can inspect the program:
$ gdb a.o core
Core was generated by `./a.o'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
If I now "run" the program with r, I get something like the following (from the gdb-dashboard viewer).
My question is about the stack, which is now pretty deep into the C runtime / Linux:
─── Stack ──────────────────────────────────────────────────────────────────────────────────────────
[0] from 0x00007ffff7a22f47 in __GI_raise+199 at ../sysdeps/unix/sysv/linux/raise.c:51
[1] from 0x00007ffff7a248b1 in __GI_abort+321 at abort.c:79
[2] from 0x00007ffff7a1442a in __assert_fail_base+330 at assert.c:92
[3] from 0x00007ffff7a144a2 in __GI___assert_fail+66 at assert.c:101
[4] from 0x00005555555546ce in main+52 at a.c:8
Is it possible that I can unwind the stack to where the error was triggered:
[4] from 0x00005555555546ce in main+52 at a.c:8
So that I can see what the registers, variables, etc. were at that point? Another way to phrase the question is "How do I ignore things outside my code when inspecting a core dump / gdb" ?
Here's a stab at this:
To travel up or down the call stack, use the up and down commands. In this case, up 4 takes us back to main:
>>> up 4
#4 0x000055555555467e in main () at a.c:7
7 assert (z>5);
info frame and info locals can tell us high-level information about the function:
>>> info locals
z = 2
__PRETTY_FUNCTION__ = "main"
>>> info frame
Stack level 4, frame at 0x7fffffffe0e0:
rip = 0x55555555467e in main (a.c:7); saved rip = 0x7ffff7a05b97
caller of frame at 0x7fffffffe0c0
source language c.
Arglist at 0x7fffffffe0d0, args:
Locals at 0x7fffffffe0d0, Previous frame's sp is 0x7fffffffe0e0
Saved registers:
rbp at 0x7fffffffe0d0, rip at 0x7fffffffe0d8
For example, given the above, we can make a guess that the un-optimized assembly would put z into %rbp-4 and we can examine its value there:
>>> x/d $rbp-4
0x7fffffffe0cc: 2
# or, in long-form to ensure our rbp address above from `info` is the same:
>>> x/d 0x7fffffffe0d0-4
0x7fffffffe0cc: 2

gdb bt gives only ??, how can I debug?

/var/log/messages:
segfault at 0 ip 00007fcd16e5853a sp 00007ffd98e37e58 error 4 in libc-2.24.so[7fcd16dc9000+195000]
addr2line -e a.out 00007fcd16e5853a
??:0
gdb bt
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fcd16e5853a in ?? ()
(gdb) bt
#0 0x00007fcd16e5853a in ?? ()
#1 0x000055f2f45fe95b in ?? ()
#2 0x000055f200000080 in ?? ()
#3 0x00007fcd068c2040 in ?? ()
#4 0x000055f2f6109c48 in ?? ()
#5 0x0000000000000000 in ?? ()
Built with gcc -Wall -O0 -g.
How can I debug this? Are there other methods?
gdb bt
Surely that is not the command you actually executed.
Most likely you did something like this:
gdb /path/to/core
(gdb) bt
Don't do that. Do this instead:
gdb /path/to/a.out /path/to/core
(gdb) bt
If you already did invoke GDB correctly, other likely reasons why bt did not work:
You are analyzing the core on a different machine from the one on which it was produced. See this answer.
You rebuilt a.out with different flags. Use the exact binary that crashed.
You have updated libc after the core was produced. Restore it to the version that was current as of when the core was produced.
P.S. This command
addr2line -e a.out 00007fcd16e5853a
makes no sense: the error message told you that the address 00007fcd16e5853a is in libc-2.24.so. The a.out has nothing to do with that address.
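The command you want to use is:
addr2line -fe /path/to/libc-2.24.so 0x8f53a
Here 0x8f53a is the faulting ip minus the address at which libc was loaded (0x7fcd16e5853a - 0x7fcd16dc9000). The 195000 in the log line is the size of the mapping, not an offset.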
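The arithmetic behind the address passed to addr2line can be sketched as follows. This is only an illustration: module_offset is a made-up name, and it assumes the base printed in the kernel log line (the number before the +) corresponds to the start of the library file, which is the usual case for a shared library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of the faulting instruction inside the mapped
 * library, given the `ip` and the mapping base from the kernel log line. */
uint64_t module_offset(uint64_t ip, uint64_t base)
{
    assert(ip >= base);  /* the ip must lie inside the mapping */
    return ip - base;
}
```

For the log line above, module_offset(0x7fcd16e5853a, 0x7fcd16dc9000) gives 0x8f53a. The 195000 after the + is the size of the mapping, so it is not a useful input to addr2line.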
P.P.S.
segfault at 0 ip 00007fcd16e5853a ...
This means: NULL pointer dereference inside libc. The most probable cause: not checking for error return, e.g. something like:
FILE *fp = fopen("/some/file", "r");
fgets(buffer, sizeof(buffer), fp); // Oops: fp was never checked for NULL.
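For contrast, a checked version might look like the sketch below. The function name read_first_line and its return convention are assumptions made for this example; the point is only that the result of fopen is tested before it is handed to any other stdio function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the first line of `path` into `buf`.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);  /* report why fopen failed instead of crashing later */
        return -1;
    }

    int ok = fgets(buf, (int)size, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}
```

With this shape, a missing file produces an error message and a -1 return value rather than a NULL pointer dereference inside libc.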

Inspecting caller frames with gdb

Suppose I have:
#include <stdlib.h>
int main()
{
    int a = 2, b = 3;
    if (a!=b)
        abort();
}
Compiled with:
gcc -g c.c
Running this, I'll get a coredump (due to the SIGABRT raised by abort()), which I can debug with:
gdb a.out core
How can I get gdb to print the values of a and b from this context?
Here is another way to get the values of a and b specifically: move to the frame you are interested in, and info locals will show them.
In the session below, a.out was compiled from your code, and frame 2 is the one you are interested in, i.e. main().
$ gdb ./a.out core
[ removed some not-so-interesting info here ]
Reading symbols from ./a.out...done.
[New LWP 14732]
Core was generated by `./a.out'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fac16269f5d in __GI_abort () at abort.c:90
#2 0x00005592862f266d in main () at f.c:7
(gdb) frame 2
#2 0x00005592862f266d in main () at f.c:7
7 abort();
(gdb) info locals
a = 2
b = 3
(gdb) q
You can also use print once you are in frame 2:
(gdb) print a
$1 = 2
(gdb) print b
$2 = 3
Did you compile with debug symbols -g? The command should be bt for backtrace, you can also use bt full for a full backtrace.
More info: https://sourceware.org/gdb/onlinedocs/gdb/Backtrace.html

Who called atexit()?

I have a C program that quits unexpectedly on Linux and I have a hard time finding out why (no core dump, see XIO: fatal IO error 11). I placed an atexit() at the beginning of the program and the callback function is indeed being called when the crash happens.
How can I know what called the atexit callback function? From reading the man page, the atexit handlers are called at exit (d'oh!) or on return from main. I can exclude the latter because there are a bunch of printf calls at the end of main and I don't see their output. And I can exclude the former simply because there aren't any exit() calls in my program.
That leaves only one solution: exit is being called from a library function. Is that the only possibility? And how can I know from where? Is it possible to print out a stack trace or force a core dump from inside the atexit callback?
Call e.g. abort() in your atexit handler, and inspect the coredump in gdb. The gdb backtrace command shows you where it exits, if the atexit handler is run. Here's a demonstration:
#include <stdlib.h>
void exit_handler(void)
{
    abort();
}
void startup()
{
#ifdef DO_EXIT
    exit(99);
#endif
}
int main(int argc, char *argv[])
{
    atexit(exit_handler);
    startup();
    return 0;
}
And doing this:
$ gcc -DDO_EXIT -g atexit.c
$ ulimit -c unlimited
$ ./a.out
Aborted (core dumped)
$ gdb ./a.out core.28162
GNU gdb (GDB) Fedora 7.7.1-19.fc20
..
Core was generated by `./a.out'.
Program terminated with signal SIGABRT, Aborted.
#0 0xb77d7424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install glibc-2.18-16.fc20.i686
(gdb) bt
#0 0xb77d7424 in __kernel_vsyscall ()
#1 0x42e1a8e7 in raise () from /lib/libc.so.6
#2 0x42e1c123 in abort () from /lib/libc.so.6
#3 0x0804851b in exit_handler () at atexit.c:6
#4 0x42e1dd61 in __run_exit_handlers () from /lib/libc.so.6
#5 0x42e1ddbd in exit () from /lib/libc.so.6
#6 0x0804852d in startup () at atexit.c:12
#7 0x08048547 in main (argc=1, argv=0xbfc39fb4) at atexit.c:21
As expected, it shows startup() calling exit.
You can of course debug this interactively too: start your program in gdb and set a breakpoint in the atexit handler.
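If a core dump is not an option, the question's other idea, printing a stack trace from inside the handler, can be sketched with glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. This is an illustration under the assumption that the program runs on glibc; the names dump_backtrace and trace_exit_handler are made up for this example.

```c
#include <assert.h>
#include <execinfo.h>  /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <unistd.h>

/* Hypothetical helper: write the current call stack to `fd`.
 * Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    assert(fd >= 0);

    void *frames[64];
    int depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}

/* Handler to register with atexit(); shows the call chain that led to exit. */
void trace_exit_handler(void)
{
    dump_backtrace(STDERR_FILENO);
}
```

Registering it with atexit(trace_exit_handler) at the start of main should print the frames leading to exit() when the handler runs. Linking with -rdynamic usually helps backtrace_symbols_fd() show function names from the executable instead of bare addresses.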
The standard only says "at normal program termination", so maybe on Linux this is more than exit or return from main. Also, you forgot pthread_exit, which may also terminate the main thread and thus the whole program.
In any case, there is no way to see immediately where the termination was issued from. The atexit handlers are usually called by the initialization function. By definition, all application code other than the atexit handlers is gone at that point.
You could try to trace execution in a debugger to nail down the place where the termination happens.

What does this mean in gdb?

Program received signal SIGSEGV, Segmentation fault.
0x08049795 in execute_jobs ()
Current language: auto; currently asm
(gdb) info symbol 0x08049795
execute_jobs + 22 in section .text
(gdb) ptype 0x08049795
type = int
How to get the line number at which the error occurred?
Your binary was not compiled with debugging information. Rebuild with at least -g (or -ggdb, or -ggdb -g3; see the GCC manual).
The exact lines from GDB output:
(gdb) info symbol 0x08049795
execute_jobs + 22 in section .text
means that the instruction at address 0x08049795, which is 22 bytes from the beginning of function execute_jobs, generated the segmentation fault.
(gdb) ptype 0x08049795
type = int
Here you are asking for the type of an integer, and GDB happily replies. Do
(gdb) x/10i 0x08049795
or
(gdb) disassemble execute_jobs
to see actual instructions.
The gdb command "bt" will show you a backtrace. Unless you've corrupted the stack, this should show the sequence of function calls that led to the segfault. To get more meaningful information, make sure that you've compiled your program with debug information by including -g on the gcc/g++ command line.

Resources