segfault on jump from PLT - c

I am trying to find the cause of a segfault, and narrowed it to the PLT using gdb's btrace. The segfault occurs during a jump from the PLT to the GOT, which I interpret to signify that the PLT became corrupted during execution. Based on the analysis presented below, is this interpretation correct? What are likely culprits for corruption of the PLT? Stack overflow? I believe that installing a watchpoint on the GOT address could be helpful in this instance. Would watch -l 0x55555562f048 be the correct approach? Other ideas for debugging are welcome.
For context, the segfault occurs during a call to strlen in function foo:
int foo(char * path, ...) {
...
if (strlen(path) >= PATH_MAX) {
The corresponding lines of assembly are:
0x58114 <foo+212> cmpq $0x0,-0x4c8(%rbp)
0x5811c <foo+220> jne 0x5812a <foo+234>
0x5811e <foo+222> lea 0xb96fb(%rip),%rdi # 0x111820
0x58125 <foo+229> callq 0x376c0 <__ubsan_handle_nonnull_arg#plt>
0x5812a <foo+234> mov -0x4c8(%rbp),%rax
0x58131 <foo+241> mov %rax,%rdi
0x58134 <foo+244> callq 0x37090 <strlen#plt>
First, path is compared to NULL (cmpq $0x0,-0x4c8(%rbp)), which I believe is solely added for ubsan instrumentation in this case. That branch was not followed, and the program jumped to <foo+234>, where it set up for the strlen call, by moving path to rax and then rdi, and finally calling strlen#plt. Running record btrace in gdb just before this produced this instruction history:
(gdb) record btrace
(gdb) c
136 0x00005555555ac114 <foo+212>: cmpq $0x0,-0x4c8(%rbp)
137 0x00005555555ac11c <foo+220>: jne 0x5555555ac12a <foo+234>
138 0x00005555555ac12a <foo+234>: mov -0x4c8(%rbp),%rax
139 0x00005555555ac131 <foo+241>: mov %rax,%rdi
140 0x00005555555ac134 <foo+244>: callq 0x55555558b090 <strlen#plt>
141 0x000055555558b090 <strlen#plt+0>: jmpq *0xa3fb2(%rip) # 0x55555562f048 <strlen#got.plt>
Here, we see that path was not null, and so the program jumped to <foo+234>, and set up for the strlen call (mov, mov, callq <strlen#plt>). The final instruction executed was jmpq *0xa3fb2(%rip) to the entry for strlen in the GOT (strlen#got.plt), whereupon the program crashed and gdb lost context (reporting that it Cannot find bounds of current function).

Related

Why does compiler decide to put memory variables to '%fs' registor? [duplicate]

I've written a piece of C code and I've disassembled it as well as read the registers to understand how the program works in assembly.
int test(char *this){
char sum_buf[6];
strncpy(sum_buf,this,32);
return 0;
}
The piece of my code that I've been examining is the test function. When I disassemble the output my test function I get ...
0x00000000004005c0 <+12>: mov %fs:0x28,%rax
=> 0x00000000004005c9 <+21>: mov %rax,-0x8(%rbp)
... stuff ..
0x00000000004005f0 <+60>: xor %fs:0x28,%rdx
0x00000000004005f9 <+69>: je 0x400600 <test+76>
0x00000000004005fb <+71>: callq 0x4004a0 <__stack_chk_fail#plt>
0x0000000000400600 <+76>: leaveq
0x0000000000400601 <+77>: retq
What I would like to know is what mov %fs:0x28,%rax is really doing?
Both the FS and GS registers can be used as base-pointer addresses in order to access special operating system data-structures. So what you're seeing is a value loaded at an offset from the value held in the FS register, and not bit manipulation of the contents of the FS register.
Specifically what's taking place, is that FS:0x28 on Linux is storing a special sentinel stack-guard value, and the code is performing a stack-guard check. For instance, if you look further in your code, you'll see that the value at FS:0x28 is stored on the stack, and then the contents of the stack are recalled and an XOR is performed with the original value at FS:0x28. If the two values are equal, which means that the zero-bit has been set because XOR'ing two of the same values results in a zero-value, then we jump to the test routine, otherwise we jump to a special function that indicates that the stack was somehow corrupted, and the sentinel value stored on the stack was changed.
If using GCC, this can be disabled with:
-fno-stack-protector
glibc:
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
the _dl_random from kernel.
Looking at http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm, I think %fs:28 is actually an offset of 28 bytes from the address in %fs. So I think it's loading a full register size from location %fs + 28 into %rax.

How to use gdb stacktrace with run time generated machine code?

I've inherited some clever x64 machine code for GNU/Linux that creates a machine code wrapper around a c-function call. I guess that in higher language terms the code might be called a decorator or a closure. The code is functioning well, but with the unfortunate artifact that when the wrapper is called, it gobbles the stack trace in gdb.
From what I have learned from the net gdb uses https://en.wikipedia.org/wiki/DWARF as a guide for separating the stack frames in the stack. This works well for static code, but obviously code generated and called at run time isn't registered in the DWARF framework.
My question is if there is any way to rescue the stack trace in this situation?
Here is some similar c-code that shows the problem.
typedef int (*ftype)(int x);
int wuz(int x) { return x + 7; }
int wbar(int x) { return wuz(x)+5; }
int main(int argc, char **argv)
{
const unsigned char wbarcode[] = {
0x55 , // push %rbp
0x48,0x89,0xe5 , // mov %rsp,%rbp
0x48,0x83,0xec,0x08 , // sub $0x8,%rsp
0x89,0x7d,0xfc , // mov %edi,-0x4(%rbp)
0x8b,0x45,0xfc , // mov -0x4(%rbp),%eax
0x89,0xc7 , // mov %eax,%edi
0x48,0xc7,0xc0,0xf6,0x04,0x40,00, // mov $0x4004f6,%rax
0xff,0xd0, // callq *%rax
0x83,0xc0,0x05 , // add $0x5,%eax
0xc9 , // leaveq
0xc3 // retq
};
int wb = wbar(5);
ftype wf = (ftype)wbarcode;
int fwb = wf(5);
}
Compile it by:
gcc -g -o mcode mcode.c
execstack -s mcode
and run it in gdb by:
$ gdb mcode
(gdb) break wuz
If we disassemble wbar we get something very similar to the byte sequence in wbarcode[]. The only difference is that I changed the calling convention for calling wuz().
(gdb) disas/r wbar
Dump of assembler code for function wbar:
0x0000000000400505 <+0>: 55 push %rbp
0x0000000000400506 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000400509 <+4>: 48 83 ec 08 sub $0x8,%rsp
0x000000000040050d <+8>: 89 7d fc mov %edi,-0x4(%rbp)
0x0000000000400510 <+11>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400513 <+14>: 89 c7 mov %eax,%edi
0x0000000000400515 <+16>: e8 dc ff ff ff callq 0x4004f6 <wuz>
0x000000000040051a <+21>: 83 c0 05 add $0x5,%eax
0x000000000040051d <+24>: c9 leaveq
0x000000000040051e <+25>: c3 retq
End of assembler dump.
If we now run the program it will stop twice in wuz(). The first time
through our c-call and we can ask for a stack trace through bt.
Breakpoint 3, wuz (x=5) at mcode.c:2
=> 0x00000000004004fd <wuz+7>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400500 <wuz+10>: 83 c0 07 add $0x7,%eax
0x0000000000400503 <wuz+13>: 5d pop %rbp
0x0000000000400504 <wuz+14>: c3 retq
(gdb) bt
#0 wuz (x=5) at mcode.c:2
#1 0x000000000040051a in wbar (x=5) at mcode.c:3
#2 0x00000000004005b0 in main (argc=1, argv=0x7fffffffe528) at mcode.c:20
This is a normal stack trace showing that we got from main() → wbar() → wuz().
But if we now continue we reach wuz() a second time, and we again
request a stack trace:
(gdb) c
Continuing.
Breakpoint 3, wuz (x=5) at mcode.c:2
=> 0x00000000004004fd <wuz+7>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400500 <wuz+10>: 83 c0 07 add $0x7,%eax
0x0000000000400503 <wuz+13>: 5d pop %rbp
0x0000000000400504 <wuz+14>: c3 retq
(gdb) bt
#0 wuz (x=5) at mcode.c:2
#1 0x00007fffffffe419 in ?? ()
#2 0x0000000500000001 in ?? ()
#3 0x00007fffffffe440 in ?? ()
#4 0x00000000004005c6 in main (argc=0, argv=0xffffffff) at mcode.c:22
Backtrace stopped: frame did not save the PC
Even though we have done the same two hierarchical calls, we get a
stack trace that contains the wrong frames. In my original inherited
wrapper code the situation was even worse, as the the stack trace
ended after 5 frames with the top level having address 0.
So the question is again, is there any extra code that can be added to
wbarcode[] that will cause gdb to output a valid stacktrace? Or is
there any other run time technique that may be used to make gdb
recognize the stack frames?
On some architectures, you can just make the frame have the layout that is expected by gdb's default unwinder for that port. However, this isn't available on all architectures. Reading the x86-64 port (see gdb/amd64-tdep.c, in particular the function amd64_frame_cache_1), I think here gdb wants to know the function bounds, so it can try to analyze the prologue. But, the function bounds come from the (ELF) symbol table, so you're out of luck there.
There's still a way, though. Due to the recent (in gdb terms) rise of JIT compilers, gdb provides three other ways to deal with this problem.
One way is that your program can emit a special ELF object (really any object format that gdb understands, IIRC) in memory, and call a runtime hook to inform gdb of its existence. gdb will read this object, including any debug information it contains. This approach is rather heavy, but gives access to most of gdb's capabilities -- you can specify not just the unwinding but also types, local variables, etc.
A second way is somewhat similar. Your program still calls a special hook. However, you also provide a plugin that is loaded by gdb. This plugin can read symbols and other information from the inferior, but in this case the symbols and unwind information don't have to be in any particular format.
The final way (new in gdb 7.10) is that you can write an unwinder in Python. When working on my JIT unwinder, I chose this approach because it is simple to debug, simple to deploy, reasonably flexible, and does not require any particular changes in the inferior.
These methods are all documented in the gdb manual. In some cases, though, I think the documentation leaves a bit to be desired. You may have to find some example code or dig into the gdb sources to really understand how it's supposed to work.

Strange segmentation fault in x86 assembly

While I'm debugging a segmentation fault in x86-Linux, I've ran into this problem:
Here goes the seg-fault message from the GDB
0xe2a5a99f in my_function (pSt=pSt#entry=0xe1d09000, version=43)
Here goes the faulting assembly:
0xe2a5a994 <my_function> push %ebp
0xe2a5a995 <my_function+1> push %edi
0xe2a5a996 <my_function+2> push %esi
0xe2a5a997 <my_function+3> push %ebx
0xe2a5a998 <my_function+4> lea -0x100b0c(%esp),%esp
0xe2a5a99f <my_function+11> call 0xe29966cb <__x86.get_pc_thunk.bx>
0xe2a5a9a4 <my_function+16> add $0x9542c,%ebx
As you can see above, the faulting line is "call get_pc_thunk" which is just getting the pc value.
And, I checked the memory at 0xe29966cb is valid and accessible with the following command:
(gdb) x/10i 0xe29966cb
0xe29966cb <__x86.get_pc_thunk.bx>: nop
0xe29966cc <__x86.get_pc_thunk.bx+1>: nop
0xe29966cd <__x86.get_pc_thunk.bx+2>: nop
0xe29966ce <__x86.get_pc_thunk.bx+3>: nop
0xe29966cf <__x86.get_pc_thunk.bx+4>: nop
0xe29966d0 <__x86.get_pc_thunk.bx+5>: nop
0xe29966d1 <__x86.get_pc_thunk.bx+6>: nop
0xe29966d2 <__x86.get_pc_thunk.bx+7>: nop
0xe29966d3 <__x86.get_pc_thunk.bx+8>: mov (%esp),%ebx
0xe29966d6 <__x86.get_pc_thunk.bx+11>: ret
Which looks perfectly fine.
But Strangely, if I use "si" to step into the "get_pc_thunk" function, it seg-faults without even entering the first nop.
Any help would be appreciated.
A crash on CALL (or PUSH, of MOV (%esp)) instruction almost always is due to stack overflow.
Check your program for infinite (or just very deep) recursion.
Also, this:
0xe2a5a998 <my_function+4> lea -0x100b0c(%esp),%esp
means that my_function is allocating slightly over 1MB of local variables for the current stack frame. And now you know why that may not be a good idea.

EIP value incorrect during buffer overflow

I am working on ubuntu 12.04 and 64 bit machine. I was reading a good book on buffer overflows and while playing with one example found one strange moment.
I have this really simple C code:
void getInput (void){
char array[8];
gets (array);
printf("%s\n", array);
}
main() {
getInput();
return 0;
}
in the file overflow.c
I compile it with 32 bit flag cause all example in the book assumed 32 bit machine, I do it like this
gcc -fno-stack-protector -g -m32 -o ./overflow ./overflow.c
In the code char array was only 8 bytes but looking at disassembly I found that that array starts 16 bytes away from saved EBP on the stack, so I executed this line:
printf "aaaaaaaaaaaaaaaaaaaa\x10\x10\x10\x20" | ./overflow
And got:
aaaaaaaaaaaaaaaaaaaa
Segmentation fault (core dumped)
Then I opened core file:
gdb ./overflow core
#0 0x20101010 in ?? ()
(gdb) info registers
eax 0x19 25
ecx 0xffffffff -1
edx 0xf77118b8 -143583048
ebx 0xf770fff4 -143589388
esp 0xffef6370 0xffef6370
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0x20101010 0x20101010
As you see EIP in fact got new value, which I wanted. BUT when I want to put some useful values like this 0x08048410
printf "aaaaaaaaaaaaaaaaaaaa\x10\x84\x04\x08" | ./overflow
Program crashes as usual but than something strange happens when I'm trying to observe the value in EIP register:
#0 0xf765be1f in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) info registers
eax 0x61616151 1633771857
ecx 0xf77828c4 -143120188
edx 0x1 1
ebx 0xf7780ff4 -143126540
esp 0xff92dffc 0xff92dffc
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0xf765be1f 0xf765be1f
Suddenly EIP start to look like this 0xf765be1f, it doesn't look like 0x08048410. In fact I noticed that it's enough to put any hexadecimal value starting from 0 to get this crumbled EIP value. Do you know why this might happen? Is it because I'm on 64 bit machine?
UPD
Well guys in comments asked for more information, here is the disassembly of getInput function:
(gdb) disas getInput
Dump of assembler code for function getInput:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: sub $0x28,%esp
0x0804840a <+6>: lea -0x10(%ebp),%eax
0x0804840d <+9>: mov %eax,(%esp)
0x08048410 <+12>: call 0x8048310 <gets#plt>
0x08048415 <+17>: lea -0x10(%ebp),%eax
0x08048418 <+20>: mov %eax,(%esp)
0x0804841b <+23>: call 0x8048320 <puts#plt>
0x08048420 <+28>: leave
0x08048421 <+29>: ret
Perhaps code at 0x08048410 was executed, and jumped to the area of 0xf765be1f.
What's in this address? I guess it's a function (libC?), so you can examine its assembly code and see what it would do.
Also note that in the successful run, you managed to overrun EBP, not EIP. EBP contains 0x61616161, which is aaaa, and EIP contains 0x20101010, which is \n\n\n. It seems like the corrupt EBP indirectly got EIP corrupt.
Try to make the overrun 4 bytes longer, and then it should overrun the return address too.
This is probably due to the fact that modern OS (Linux does at least, I don't know about Windows) and modern libc have mechanisms that do not allow code found in stack to be executed.
Buffer overflow is invoking undefined behavior, therefore anything can happen. Theorizing what might happen is futile.

Why does this memory address %fs:0x28 ( fs[0x28] ) have a random value?

I've written a piece of C code and I've disassembled it as well as read the registers to understand how the program works in assembly.
int test(char *this){
char sum_buf[6];
strncpy(sum_buf,this,32);
return 0;
}
The piece of my code that I've been examining is the test function. When I disassemble the output my test function I get ...
0x00000000004005c0 <+12>: mov %fs:0x28,%rax
=> 0x00000000004005c9 <+21>: mov %rax,-0x8(%rbp)
... stuff ..
0x00000000004005f0 <+60>: xor %fs:0x28,%rdx
0x00000000004005f9 <+69>: je 0x400600 <test+76>
0x00000000004005fb <+71>: callq 0x4004a0 <__stack_chk_fail#plt>
0x0000000000400600 <+76>: leaveq
0x0000000000400601 <+77>: retq
What I would like to know is what mov %fs:0x28,%rax is really doing?
Both the FS and GS registers can be used as base-pointer addresses in order to access special operating system data-structures. So what you're seeing is a value loaded at an offset from the value held in the FS register, and not bit manipulation of the contents of the FS register.
Specifically what's taking place, is that FS:0x28 on Linux is storing a special sentinel stack-guard value, and the code is performing a stack-guard check. For instance, if you look further in your code, you'll see that the value at FS:0x28 is stored on the stack, and then the contents of the stack are recalled and an XOR is performed with the original value at FS:0x28. If the two values are equal, which means that the zero-bit has been set because XOR'ing two of the same values results in a zero-value, then we jump to the test routine, otherwise we jump to a special function that indicates that the stack was somehow corrupted, and the sentinel value stored on the stack was changed.
If using GCC, this can be disabled with:
-fno-stack-protector
glibc:
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
the _dl_random from kernel.
Looking at http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm, I think %fs:28 is actually an offset of 28 bytes from the address in %fs. So I think it's loading a full register size from location %fs + 28 into %rax.

Resources