How to make good use of stack trace (from kernel or core dump)? - c

If you are lucky when your kernel module crashes, you get an oops with a log containing a lot of information, such as the register values. One such piece of information is the stack trace. (The same is true for core dumps, but I originally asked this about kernel modules.) Take this example:
[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
[<f97aba45>] ? cleanup_module+0x1e5/0x550 [skin_kernel]
[<c017d0e7>] ? __stop_machine+0x57/0x70
[<c016dec0>] ? __try_stop_module+0x0/0x30
[<c016f069>] ? sys_delete_module+0x149/0x210
[<c0102f24>] ? sysenter_do_call+0x12/0x16
My guess is that the +<number1>/<number2> has something to do with the offset from the start of the function in which the error occurred. That is, by inspecting this number, perhaps while looking at the assembly output, I should be able to find the line (better yet, the instruction) at which the error occurred. Is that correct?
My question is, what are these two numbers exactly? How do you make use of them?

skink_free_devices+0x32/0xb0
This means the offending instruction is 0x32 bytes from the start of the function skink_free_devices() which is 0xB0 bytes long in total.
If you compile your kernel with -g enabled, you can get the line number inside the function where control jumped, using the tool addr2line or our good old gdb.
Something like this:
$ addr2line -e ./vmlinux 0xc01cf0d1
/mnt/linux-2.5.26/include/asm/bitops.h:244
or
$ gdb ./vmlinux
...
(gdb) l *0xc01cf0d1
0xc01cf0d1 is in read_chan (include/asm/bitops.h:244).
(...)
244 return ((1UL << (nr & 31)) & (((const volatile unsigned int *) addr)[nr >> 5])) != 0;
(...)
So just give the address you want to inspect to addr2line or gdb, and they will tell you the source file and line number of the offending instruction.
See this article for full details
EDIT: vmlinux is the uncompressed version of the kernel used for debugging and is generally found at /lib/modules/$(uname -r)/build/vmlinux, provided you have built your kernel from sources. The vmlinuz that you find in /boot is the compressed kernel and may not be that useful for debugging.

For Emacs users, here is a major mode to easily jump around within the stack trace (uses addr2line internally).
Disclaimer: I wrote it :)

To build on this answer: you need to use faddr2line, which takes the symbol+offset/size form directly.
In my case I had the following truncated call trace:
[ 246.790938][ T35] Call trace:
[ 246.794075][ T35] __switch_to+0x10c/0x180
[ 246.798348][ T35] __schedule+0x278/0x6e0
[ 246.802531][ T35] schedule+0x44/0xd0
[ 246.806368][ T35] rpm_resume+0xf4/0x628
[ 246.810463][ T35] __pm_runtime_resume+0x94/0xc0
[ 246.815257][ T35] macb_open+0x30/0x2b8
[ 246.819265][ T35] __dev_open+0x10c/0x188
and ran the following in the mainline linux kernel:
./scripts/faddr2line vmlinux macb_open+0x30/0x2b8
giving the output
macb_open+0x30/0x2b8:
pm_runtime_get_sync at include/linux/pm_runtime.h:386
(inlined by) macb_open at drivers/net/ethernet/cadence/macb_main.c:2726

Related

Debugging C: GDB returns "address where <file> has been loaded is missing"

I'm very new to the C language and have been tasked with modifying GRUB. What a way to learn, right? Anyway, I'm trying to debug my modified GRUB using VMWare and GDB. I've been able to get the debugger working before, but for some reason, every time I load up my VM and connect GDB, during the loading process of GRUB, I get:
.loadsym.gdb:1: Error in sourced command file:
The address where biosdisk.module has been loaded is missing
and I have no idea what to do about it. My first thought was, "Oh, I'll just add-symbol-file <file> and that'll fix it!" but apparently that tells GDB to forget every other symbol it loaded???? So I can't add the symbol-file and set a breakpoint.
My googling only returns one semi-relevant post that doesn't really go all that in-depth on fixing the issue.
This output may also be relevant.
info file biosdisk.module
Symbols from "H:\Workspace\GRUB\Bootloader\Trunk\grub-core\kernel.exec".
Remote serial target in gdb-specific protocol:
Debugging a target over a serial line.
While running this, GDB does not access memory from...
Local exec file:
`H:\Workspace\GRUB\Bootloader\Trunk\grub-core\kernel.exec', file type elf32-i386.
Entry point: 0x9000
0x00009000 - 0x0000e6e0 is .text
0x0000e6e0 - 0x0000f68d is .rodata
0x0000f6a0 - 0x0000fe74 is .data
0x0000fe80 - 0x000175d4 is .bss
Ended up being that my codebase wasn't the same. That is, on my Windows host, I had one copy of my code and on my Ubuntu VM was another.
Using version control solved this issue.

How to fix GDB not finding file: "../sysdeps/unix/sysv/linux/raise.c:50"

We're learning to use GDB in my Computer Architecture class. To do this we do most of our work by using SSH to connect to a raspberry pi. When running GDB on some code he gave us to debug though it ends with an error message on how it can't find raise.c
I've tried:
installing libc6, libc6-dbg (says they're already up-to-date)
apt-get source glibc (gives me: "You must put some 'source' URIs in your sources.list")
https://stackoverflow.com/a/48287761/12015458 (apt source returns same thing as the apt-get source above, the "find $PWD" command the user gave returns nothing)
I've tried looking for it manually where told it may be? (/lib/libc doesn't exist for me)
This is the code he gave us to try debugging on GDB:
#include <stdio.h>
main()
{
int x,y;
y=54389;
for (x=10; x>=0; x--)
y=y/x;
printf("%d\n",y);
}
However, whenever I run the code in GDB I get the following error:
Program received signal SIGFPE, Arithmetic exception.
__GI_raise (sig=8) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
I asked him about it and he didn't really have any ideas on how to fix it.
It does not really matter that the source for raise() is not found. It would only show you the line where the exception is finally raised, but not the place where the error is triggered.
Run the erroneous program again in GDB. When the exception is raised, investigate the call stack and the stack frames with GDB's commands. This is the point of your task, so I won't give you more than this hint.
If you're clever you can see the error in the given source just by looking at it. ;-)
When GDB does not know any symbol, you need to compile with the option -g to get debugger support.
EDIT
Now on a Windows system this is my log (please excuse the colouring, I didn't find a language selector for pure text):
D:\tmp\StackOverflow\so_027 > type crash1.c
#include <stdio.h>
main()
{
int x,y;
y=54389;
for (x=10; x>=0; x--)
y=y/x;
printf("%d\n",y);
}
D:\tmp\StackOverflow\so_027 > gcc crash1.c -g -o crash1.out
crash1.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
main()
^~~~
D:\tmp\StackOverflow\so_027 > dir
[...cut...]
04.09.2019 08:33 144 crash1.c
04.09.2019 08:40 54.716 crash1.out
D:\tmp\StackOverflow\so_027 > gdb crash1.out
GNU gdb (GDB) 8.1
[...cut...]
This GDB was configured as "x86_64-w64-mingw32".
[...cut...]
Reading symbols from crash1.out...done.
(gdb) run
Starting program: D:\tmp\StackOverflow\so_027\crash1.out
[New Thread 4520.0x28b8]
[New Thread 4520.0x33f0]
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) backtrace
#0 0x0000000000401571 in main () at crash1.c:7
(gdb) help stack
Examining the stack.
The stack is made up of stack frames. Gdb assigns numbers to stack frames
counting from zero for the innermost (currently executing) frame.
At any time gdb identifies one frame as the "selected" frame.
Variable lookups are done with respect to the selected frame.
When the program being debugged stops, gdb selects the innermost frame.
The commands below can be used to select other frames by number or address.
List of commands:
backtrace -- Print backtrace of all stack frames
bt -- Print backtrace of all stack frames
down -- Select and print stack frame called by this one
frame -- Select and print a stack frame
return -- Make selected stack frame return to its caller
select-frame -- Select a stack frame without printing anything
up -- Select and print stack frame that called this one
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(gdb) next
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) next
[Inferior 1 (process 4520) exited with code 030000000224]
(gdb) next
The program is not being run.
(gdb) quit
D:\tmp\StackOverflow\so_027 >
Well, it marks directly the erroneous source line. That is different to your environment as you use a Raspi. However, it shows you some GDB commands to try.
Concerning your video:
It is clear that inside raise() you can't access x. That's why GDB moans about it.
If an exception is raised usually the program is about to quit. So there is no value in stepping forward.
Instead, as shown in my log, use GDB commands to investigate the stack frames. I think this is the issue you are about to learn.
BTW, do you know that you should be able to copy the screen content? This will make reading so much easier for us.
From a practical standpoint the other answer is correct, but if you do want the libc sources:
apt-get source is the right way to get the sources of libc, but yes, you do need to have source repositories configured in /etc/apt/sources.list.
If you're using Ubuntu, see the deb-src lines in https://help.ubuntu.com/community/Repositories/CommandLine
For debian, see https://wiki.debian.org/SourcesList#Example_sources.list
Then apt-get source should work. Remember to tell GDB where those sources are using the "directory" command.

Weird exception thrown when using simulavr with avr-gdb

I am debugging a program that I have written for the AVR architecture and compiled using avr-gcc with the -g argument.
I launch simulavr using the following command: simulavr --device atmega8 --gdbserver
Then I invoke avr-gdb and do (gdb) file main.elf as well as (gdb) target remote localhost:1212
Once debugging has started, I can successfully step through the assembly portion of my program .init et al. However, once jmp main is executed and a call to another function is made, simulavr throws the following exception: Assertion failed: (m_on_call_sp != 0x0000), function OnCall, file hwstack.cpp, line 266. Abort trap: 6
It has something to do with the pushing a frame to the stack, but I can't quite put my finger on how to fix it.
That stack value is very far from what it should be. At the start of your program, it should be near the end of RAM, not at the beginning.
It is likely to be a problem with simulavr not configuring RAM properly for your device. A quick look at the source code shows that the stack pointer is set to 0 if the simulator can't determine the correct value.
Did you include -mmcu=atmega8 in the command line when compiling? Try adding -V switch to the simulavr command for more clues.

Buffer overflow works in gdb but not without it

I am on CentOS 6.4 32 bit and am trying to cause a buffer overflow in a program. Within GDB it works. Here is the output:
[root@localhost bufferoverflow]# gdb stack
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/bufferoverflow/stack...done.
(gdb) r
Starting program: /root/bufferoverflow/stack
process 6003 is executing new program: /bin/bash
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.2.i686
sh-4.1#
However when I run the program stack just on its own it seg faults. Why might this be?
Exploit development can lead to serious headaches if you don't adequately account for factors that introduce non-determinism into the debugging process. In particular, the stack addresses in the debugger may not match the addresses during normal execution. This artifact occurs because the operating system loader places both environment variables and program arguments before the beginning of the stack:
Since your vulnerable program does not take any arguments, the environment variables are likely the culprit. Make sure they are the same in both invocations, in the shell and in the debugger. To this end, you can wrap your invocation in env:
env - /path/to/stack
And with the debugger:
env - gdb /path/to/stack
(gdb) show env
LINES=24
COLUMNS=80
In the above example, there are two environment variables set by gdb, which you can further disable:
unset env LINES
unset env COLUMNS
Now show env should return an empty list. At this point, you can start the debugging process to find the absolute stack address you envision to jump to (e.g., 0xbffffa8b), and hardcode it into your exploit.
One further subtle but important detail: there's a difference between calling ./stack and /path/to/stack: since argv[0] holds the program exactly how you invoked it, you need to ensure equal invocation strings. That's why I used /path/to/stack in the above examples and not just ./stack and gdb stack.
When learning to exploit memory safety vulnerabilities, I recommend using the wrapper program below, which does the heavy lifting and ensures equal stack offsets:
$ invoke stack # just call the executable
$ invoke -d stack # run the executable in GDB
Here is the script:
#!/bin/sh
while getopts "dte:h?" opt ; do
case "$opt" in
h|\?)
printf "usage: %s -e KEY=VALUE prog [args...]\n" $(basename $0)
exit 0
;;
t)
tty=1
gdb=1
;;
d)
gdb=1
;;
e)
env=$OPTARG
;;
esac
done
shift $(expr $OPTIND - 1)
prog=$(readlink -f $1)
shift
if [ -n "$gdb" ] ; then
if [ -n "$tty" ]; then
touch /tmp/gdb-debug-pty
exec env - $env TERM=screen PWD=$PWD gdb -tty /tmp/gdb-debug-pty --args $prog "$@"
else
exec env - $env TERM=screen PWD=$PWD gdb --args $prog "$@"
fi
else
exec env - $env TERM=screen PWD=$PWD $prog "$@"
fi
Here is a straightforward way of running your program with identical stacks in the terminal and in gdb:
First, make sure your program is compiled without stack protection,
gcc -m32 -fno-stack-protector -z execstack -o shelltest shelltest.c -g
and ASLR is disabled:
echo 0 > /proc/sys/kernel/randomize_va_space
NOTE: default value on my machine was 2, note yours before changing this.
Then run your program like so (terminal and gdb respectively):
env -i PWD="/root/Documents/MSec" SHELL="/bin/bash" SHLVL=0 /root/Documents/MSec/shelltest
env -i PWD="/root/Documents/MSec" SHELL="/bin/bash" SHLVL=0 gdb /root/Documents/MSec/shelltest
Within gdb, make sure to unset LINES and COLUMNS.
Note: I got those environment variables by playing around with a test program.
Those two runs will give you identical pointers to the top of the stack, so no need for remote script shenanigans if you're trying to exploit a binary hosted remotely.
The address of the stack frame pointer when running the code in gdb differs from the address when running it normally. So you may overwrite the return address correctly in gdb, yet the same value may be wrong in a normal run. The main reason is that the environment variables differ between the two situations.
As this is just a demo, you can change the victim code, and print the address of the buffer. Then change your return address to offset+address of buffer.
In reality, however, you need to guess the return address and add a NOP sled before your malicious code. You may need to guess several times before you hit a correct address.
Hope this can help you.
The reason your buffer overflow works under gdb and segfaults otherwise is that gdb disables address space layout randomization. I believe this was turned on by default in gdb version 7.
You can check this by running this command:
show disable-randomization
And set it with
set disable-randomization on
or
set disable-randomization off
I tried the accepted solution here and it doesn't work (for me). I knew that gdb adds environment variables and that this is why the stack addresses don't match, but even after removing those variables I couldn't get my exploit to work outside gdb (I also tried the script posted in the accepted solution).
But searching the web I found another script that works for me: https://github.com/hellman/fixenv/blob/master/r.sh
Its usage is basically the same as that of the script in the accepted solution:
r.sh gdb ./program [args] to run the program in gdb
r.sh ./program [args] to run the program without gdb
And this script works for me.
I am on CentOS 6.4 32 bit and am trying to cause a buffer overflow in a program... However when I run the program stack just on its own it seg faults.
You should also ensure FORTIFY_SOURCE is not affecting your results. The seg fault sounds like FORTIFY_SOURCE could be the issue because FORTIFY_SOURCE will insert "safer" function calls to guard against some types of buffer overflows. If the compiler can deduce destination buffer sizes, then the size is checked and abort() is called on a violation (i.e., your seg fault).
To turn off FORTIFY_SOURCE for testing, you should compile with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.
One of the main things that gdb does that doesn't happen outside gdb is zeroing memory. More than likely, somewhere in the code you are not initializing your memory and it is getting garbage values. Gdb automatically clears all memory that you allocate, hiding those types of errors.
For example: the following should work in gdb, but not outside it:
#include <stdlib.h>

int main(void){
    int **temp = (int**)malloc(2*sizeof(int*)); //temp[0] and temp[1] are NULL in gdb, but not outside
    if (temp[0] != NULL){
        *temp[0] = 1; //segfault outside of gdb
    }
    return 0;
}
Try running your program under valgrind to see if it can detect this issue.
I think the way that works best for me is to attach gdb to the process of the binary and to use setarch -R <binary> to temporarily disable ASLR for that binary only. This way the stack frame should be the same inside and outside gdb.

Core in libc.so.1

I am using Solaris 10, and my C program crashes and creates a core file. On debugging, it seems the crash happens in libc.so.1. Please let me know if anyone has any clue.
Below is the dbx report.
dbx prock.new core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading prock.new
core file header read successfully
Reading ld.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libl.so.1
Reading libpthread.so.1
Reading librt.so.1
Reading libthread.so.1
Reading libc.so.1
Reading libaio.so.1
Reading libmd.so.1
Reading libc_psr.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
t#null (l#1) terminated by signal SEGV (no mapping at the fault address)
0xffffffff7ea3bc14: strcasecmp+0x0134: orn %i0, %i3, %i0
(dbx) where
=>[1] strcasecmp(0x10014b68e, 0x57, 0x7ffffc00, 0x1001332d7, 0x27, 0x24), at 0xffffffff7ea3bc14
[2] 0x10000af48(0x27, 0x10014b68e, 0x57, 0x10014b68e, 0x57, 0x0), at 0x10000af48
[3] 0x100009c08(0x27, 0x5e, 0x0, 0x9, 0x1001332c3, 0x2b), at 0x100009c08
(dbx) whereis strcasecmp
function: `libc.so.1`strcasecmp
(dbx)
My solaris version is
Solaris 10 8/07 s10s_u4wos_12b SPARC
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
No, the problem is not with the C standard library. You're passing an invalid parameter (NULL string pointer, etc.) to strcasecmp(). Without actual code (which you haven't posted), it's not possible to deduce what exactly the error is.
(Also, you had better compile your program with debug symbols and with optimization turned off! If you're on Solaris, you most probably use GCC:
gcc -O0 -g etc...
)
1) Compile your program to include debug information (add "-g" to the list of options to your compiler), so that you actually get information instead of this:
[2] 0x10000af48(0x27, 0x10014b68e, 0x57, 0x10014b68e, 0x57, 0x0), at 0x10000af48
[3] 0x100009c08(0x27, 0x5e, 0x0, 0x9, 0x1001332c3, 0x2b), at 0x100009c08
2) DBX will now tell you which of your functions has been calling strcasecmp. Step through the source (or have it generate log output), check the parameters of the fatal function call for anything out of the ordinary (like invalid pointers).
The chances of you discovering a bug in a libc function are infinitesimal compared to the chances that your call to that function was in error.
1) Run bt (backtrace) to see who is calling strcasecmp [ this will list frames like #0, #1 ]
2) Now jump in to the specific frame to get the values [ frame 0 ]
3) Then display / print the value of the argument passed to strcasecmp ( using print or display)
I feel the argument is NULL on calling strcasecmp and that's why you are getting segfault.

Resources