How to use gdb stacktrace with run time generated machine code? - c

I've inherited some clever x64 machine code for GNU/Linux that creates a machine code wrapper around a c-function call. I guess that in higher language terms the code might be called a decorator or a closure. The code is functioning well, but with the unfortunate artifact that when the wrapper is called, it gobbles the stack trace in gdb.
From what I have learned from the net, gdb uses DWARF (https://en.wikipedia.org/wiki/DWARF) debugging information as a guide for separating the frames on the stack. This works well for static code, but code generated and called at run time obviously isn't described by any DWARF data.
My question is if there is any way to rescue the stack trace in this situation?
Here is some similar c-code that shows the problem.
typedef int (*ftype)(int x);
int wuz(int x) { return x + 7; }
int wbar(int x) { return wuz(x)+5; }
int main(int argc, char **argv)
{
const unsigned char wbarcode[] = {
0x55 , // push %rbp
0x48,0x89,0xe5 , // mov %rsp,%rbp
0x48,0x83,0xec,0x08 , // sub $0x8,%rsp
0x89,0x7d,0xfc , // mov %edi,-0x4(%rbp)
0x8b,0x45,0xfc , // mov -0x4(%rbp),%eax
0x89,0xc7 , // mov %eax,%edi
0x48,0xc7,0xc0,0xf6,0x04,0x40,0x00, // mov $0x4004f6,%rax
0xff,0xd0, // callq *%rax
0x83,0xc0,0x05 , // add $0x5,%eax
0xc9 , // leaveq
0xc3 // retq
};
int wb = wbar(5);
ftype wf = (ftype)wbarcode;
int fwb = wf(5);
}
Compile it by:
gcc -g -o mcode mcode.c
execstack -s mcode
and run it in gdb by:
$ gdb mcode
(gdb) break wuz
If we disassemble wbar we get something very similar to the byte sequence in wbarcode[]. The only difference is how wuz() is called: through an absolute address loaded into %rax instead of a relative callq.
(gdb) disas/r wbar
Dump of assembler code for function wbar:
0x0000000000400505 <+0>: 55 push %rbp
0x0000000000400506 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000400509 <+4>: 48 83 ec 08 sub $0x8,%rsp
0x000000000040050d <+8>: 89 7d fc mov %edi,-0x4(%rbp)
0x0000000000400510 <+11>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400513 <+14>: 89 c7 mov %eax,%edi
0x0000000000400515 <+16>: e8 dc ff ff ff callq 0x4004f6 <wuz>
0x000000000040051a <+21>: 83 c0 05 add $0x5,%eax
0x000000000040051d <+24>: c9 leaveq
0x000000000040051e <+25>: c3 retq
End of assembler dump.
If we now run the program it will stop twice in wuz(). The first time
through our c-call and we can ask for a stack trace through bt.
Breakpoint 3, wuz (x=5) at mcode.c:2
=> 0x00000000004004fd <wuz+7>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400500 <wuz+10>: 83 c0 07 add $0x7,%eax
0x0000000000400503 <wuz+13>: 5d pop %rbp
0x0000000000400504 <wuz+14>: c3 retq
(gdb) bt
#0 wuz (x=5) at mcode.c:2
#1 0x000000000040051a in wbar (x=5) at mcode.c:3
#2 0x00000000004005b0 in main (argc=1, argv=0x7fffffffe528) at mcode.c:20
This is a normal stack trace showing that we got from main() → wbar() → wuz().
But if we now continue we reach wuz() a second time, and we again
request a stack trace:
(gdb) c
Continuing.
Breakpoint 3, wuz (x=5) at mcode.c:2
=> 0x00000000004004fd <wuz+7>: 8b 45 fc mov -0x4(%rbp),%eax
0x0000000000400500 <wuz+10>: 83 c0 07 add $0x7,%eax
0x0000000000400503 <wuz+13>: 5d pop %rbp
0x0000000000400504 <wuz+14>: c3 retq
(gdb) bt
#0 wuz (x=5) at mcode.c:2
#1 0x00007fffffffe419 in ?? ()
#2 0x0000000500000001 in ?? ()
#3 0x00007fffffffe440 in ?? ()
#4 0x00000000004005c6 in main (argc=0, argv=0xffffffff) at mcode.c:22
Backtrace stopped: frame did not save the PC
Even though we have done the same two hierarchical calls, we get a
stack trace that contains the wrong frames. In my original inherited
wrapper code the situation was even worse, as the stack trace
ended after 5 frames with the top level having address 0.
So the question is again, is there any extra code that can be added to
wbarcode[] that will cause gdb to output a valid stacktrace? Or is
there any other run time technique that may be used to make gdb
recognize the stack frames?

On some architectures, you can just make the frame have the layout that is expected by gdb's default unwinder for that port. However, this isn't available on all architectures. Reading the x86-64 port (see gdb/amd64-tdep.c, in particular the function amd64_frame_cache_1), I think here gdb wants to know the function bounds, so it can try to analyze the prologue. But, the function bounds come from the (ELF) symbol table, so you're out of luck there.
There's still a way, though. Due to the recent (in gdb terms) rise of JIT compilers, gdb provides three other ways to deal with this problem.
One way is that your program can emit a special ELF object (really any object format that gdb understands, IIRC) in memory, and call a runtime hook to inform gdb of its existence. gdb will read this object, including any debug information it contains. This approach is rather heavy, but gives access to most of gdb's capabilities -- you can specify not just the unwinding but also types, local variables, etc.
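As a rough sketch of that first approach: the declarations below follow the "JIT Compilation Interface" chapter of the gdb manual, while register_symfile is a hypothetical helper name of mine. It assumes you already have a complete in-memory object file (symbols plus unwind info for the generated code), and building that image is the hard part that this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declarations as documented in the gdb manual; gdb finds the descriptor
   and the hook function by these exact symbol names. */
typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;   /* in-memory object file, e.g. ELF */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;
    uint32_t action_flag;       /* holds a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* gdb places a breakpoint in this function, so it must not be inlined. */
void __attribute__((noinline)) __jit_debug_register_code(void)
{
    __asm__ volatile("");
}

/* The version field must be 1. */
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

/* Hypothetical helper: link `entry` at the head of the list and notify gdb.
   `image` must stay valid for as long as the entry is registered. */
void register_symfile(struct jit_code_entry *entry, const char *image, uint64_t size)
{
    assert(entry != NULL && image != NULL);
    entry->symfile_addr = image;
    entry->symfile_size = size;
    entry->prev_entry = NULL;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry != NULL)
        entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();   /* gdb reads the descriptor here */
}
```

When no debugger is attached, the call is just an empty function, so the registration is cheap to leave in place.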
A second way is somewhat similar. Your program still calls a special hook. However, you also provide a plugin that is loaded by gdb. This plugin can read symbols and other information from the inferior, but in this case the symbols and unwind information don't have to be in any particular format.
The final way (new in gdb 7.10) is that you can write an unwinder in Python. When working on my JIT unwinder, I chose this approach because it is simple to debug, simple to deploy, reasonably flexible, and does not require any particular changes in the inferior.
These methods are all documented in the gdb manual. In some cases, though, I think the documentation leaves a bit to be desired. You may have to find some example code or dig into the gdb sources to really understand how it's supposed to work.

Related

Change on rax register during debug session with gdb does not affect code execution

I'm planning to participate in some Capture the Flag (CTF) challenges in the near future. For that reason, I've decided to study assembly. As of now I'm focusing on the usage of the CPU registers. Following some examples that I found on the internet, I tried to debug a very simple "Hello World" program written in C, to see how the CPU registers are used. My environment is Linux and GCC version 11. I compiled my code with the -g flag, in order to include debug symbols.
Following is my very simple C source code:
#include <stdio.h>
int main (int argc, char** argv)
{
char message_c_str[] = "Hello World from C!";
printf("%s\n", message_c_str);
return 0;
}
Studying the disassembly of the main function, I understand that the string containing the message gets stored inside the RAX (and RDX registers?), before calling the printf function:
└─$ objdump -M intel -D main| grep -A20 main.:
0000000000001159 <main>:
1159: 55 push rbp
115a: 48 89 e5 mov rbp,rsp
115d: 48 83 ec 30 sub rsp,0x30
1161: 89 7d dc mov DWORD PTR [rbp-0x24],edi
1164: 48 89 75 d0 mov QWORD PTR [rbp-0x30],rsi
1168: 48 b8 48 65 6c 6c 6f movabs rax,0x6f57206f6c6c6548
116f: 20 57 6f
1172: 48 ba 72 6c 64 20 66 movabs rdx,0x6d6f726620646c72
1179: 72 6f 6d
117c: 48 89 45 e0 mov QWORD PTR [rbp-0x20],rax
1180: 48 89 55 e8 mov QWORD PTR [rbp-0x18],rdx
1184: c7 45 f0 20 43 21 00 mov DWORD PTR [rbp-0x10],0x214320
118b: 48 8d 45 e0 lea rax,[rbp-0x20]
118f: 48 89 c7 mov rdi,rax
1192: e8 b9 fe ff ff call 1050 <puts@plt>
1197: b8 00 00 00 00 mov eax,0x0
119c: c9 leave
119d: c3 ret
I thought I would start a debug session and try to change RAX on the fly, just to see whether I could change the string content before it is printed on the command line. Unfortunately, even though it seems that I can change the RAX value, the program still prints the hard-coded message, and I'm not sure why. Is there a gdb command I need to run after updating the value of RAX?
Following is my debug session with the issue:
┌──(alexis㉿kali)-[~/Desktop/Hacking/hello_world]
└─$ gdb -q main
Reading symbols from main...
(gdb) break main
Breakpoint 1 at 0x1168: file /home/alexis/Desktop/Hacking/hello_world/main.cpp, line 5.
(gdb) run
Starting program: /home/alexis/Desktop/Hacking/hello_world/main
Breakpoint 1, main (argc=1, argv=0x7fffffffdf58) at
/home/alexis/Desktop/Hacking/hello_world/main.cpp:5
5 char message_c_str[] = "Hello World from C!";
(gdb) info register rax
rax 0x555555555159 93824992235865
(gdb) next
6 printf("%s\n", message_c_str);
(gdb) info register rax
rax 0x6f57206f6c6c6548 8022916924116329800
(gdb) set $rax=0x6361636361
(gdb) info register rax
rax 0x6361636361 426835665761
(gdb) next
Hello World from C!
8 return 0;
(gdb)
You can see that the code still prints "Hello World from C!", even if the RAX register changed. Why?
The string is only temporarily in rax+rdx. In the following lines it is placed on the stack and the address goes to rdi, that is used by puts.
What's important here is to understand that one line of source code is translated into multiple assembly instructions. By the time execution stops at the line printf("%s\n", message_c_str); the string has already been stored on the stack, and rax only keeps an old value because nothing has overwritten it yet. It is no longer the string that's being printed.
To accomplish your goal you would have to change the string on the stack, or change it in rax before it is stored there (so before your next command).
Also be aware that next advances one source code line. If you want to move one assembly instruction use nexti - with that you have more control about what gets executed.
You're using next (whole block of asm corresponding to a C source line), not nexti or stepi (aka ni or si) to step by asm instruction.
And you made a debug build so GCC doesn't keep anything in registers across C statements. The points where execution stops with next are the ones where the compiler-generated instructions are about to load or LEA a new RAX, so its current value is dead and doesn't matter.
(And it's only using RAX at all because it's a debug build with GCC; otherwise things like lea rax,[rbp-0x20] / mov rdi,rax would LEA straight into RDI, instead of uselessly using RAX as a temporary; see "Return value from writing an unused parameter when falling off the end of a non-void function". For the mov-immediate to memory, though, there's no mov r/m64, imm64, only to a register, so those moves to RAX and RDX do make sense.)
If you wanted to have it print something different, you could si until after movabs rax,0x6f57206f6c6c6548 but before mov QWORD PTR [rbp-0x20],rax, and at that point change the initializer for part of the string data. (Which is in RAX at that point.) e.g. introducing a 0x00 byte will terminate the C string.
Or right before the call puts, you could set $rdi = $rdi+5 to be like puts(message_c_str + 5).
layout reg or layout asm (use layout next / prev to fix the display if it's broken) are helpful for seeing where execution is. See other GDB asm tips at the bottom of https://stackoverflow.com/tags/x86/info

Inserting gdb breakpoints fail

I'm learning about buffer overflow in c.
For that purpose, I'm following this simple example.
I have the following gcc version:
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And this simple c file:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char buf[256];
strcpy(buf, argv[1]);
printf("%s,", buf);
return 0;
}
I then compile this file with $ gcc buf.c -o buf.
I then open in gdb by $ gdb ./buf
I call disas and get the result assembly:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: sub $0x120,%rsp
0x0000000000001198 <+15>: mov %edi,-0x114(%rbp)
0x000000000000119e <+21>: mov %rsi,-0x120(%rbp)
0x00000000000011a5 <+28>: mov %fs:0x28,%rax
0x00000000000011ae <+37>: mov %rax,-0x8(%rbp)
0x00000000000011b2 <+41>: xor %eax,%eax
0x00000000000011b4 <+43>: mov -0x120(%rbp),%rax
0x00000000000011bb <+50>: add $0x8,%rax
0x00000000000011bf <+54>: mov (%rax),%rdx
0x00000000000011c2 <+57>: lea -0x110(%rbp),%rax
0x00000000000011c9 <+64>: mov %rdx,%rsi
0x00000000000011cc <+67>: mov %rax,%rdi
0x00000000000011cf <+70>: callq 0x1070 <strcpy@plt>
0x00000000000011d4 <+75>: lea -0x110(%rbp),%rax
0x00000000000011db <+82>: mov %rax,%rsi
0x00000000000011de <+85>: lea 0xe1f(%rip),%rdi # 0x2004
0x00000000000011e5 <+92>: mov $0x0,%eax
0x00000000000011ea <+97>: callq 0x1090 <printf@plt>
0x00000000000011ef <+102>: mov $0x0,%eax
0x00000000000011f4 <+107>: mov -0x8(%rbp),%rcx
0x00000000000011f8 <+111>: xor %fs:0x28,%rcx
0x0000000000001201 <+120>: je 0x1208 <main+127>
0x0000000000001203 <+122>: callq 0x1080 <__stack_chk_fail@plt>
0x0000000000001208 <+127>: leaveq
0x0000000000001209 <+128>: retq
End of assembler dump.
The memory addresses are really low.
I then want to see what happens if I input a big string of A's into the program, I therefore place a breakpoint at 0x00000000000011db
I then run it:
(gdb) run $(python3 -c "print('A'*256)"
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)"
/bin/bash: -c: line 0: unexpected EOF while looking for matching `)'
/bin/bash: -c: line 1: syntax error: unexpected end of file
During startup program exited with code 1.
(gdb) run $(python3 -c "print('A'*256)")
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)")
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x11db
Ok, so something with the memory addresses is a bit funky.
I google the issue and find this post where I find that this is because Position-Independent Executable (PIE) was probably enabled, and the memory addresses would be changed when the program is actually run.
I can confirm this by running disas after running the program, and seeing that the memory addresses are in a much higher range.
This all makes sense, but it makes me wonder: if the addresses change every time I run it, then how can I place a breakpoint at a memory address before the program runs?
"If the addresses change every time I run it, then how can I place a breakpoint at a memory address before the program runs?"
This happens because GDB by default disables address randomization (to make debugging easier).
If you re-enable ASLR with (gdb) set disable-randomization off, then you wouldn't be able to set the breakpoint on an address.
You would still be able to set breakpoint on e.g. main -- in that case GDB will wait until the executable has been relocated, and will set the breakpoint on the actual runtime instruction (the address will change on every run).
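The relation between the low addresses shown before the program starts and the ones seen at run time is just "load base plus offset". A minimal sketch of that arithmetic follows. Note the assumption: 0x555555554000 is the base commonly seen for a PIE on x86-64 Linux when GDB keeps randomization disabled, but it is not guaranteed, so check info proc mappings on your system. The names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: typical PIE load base under GDB on x86-64 Linux with
   "set disable-randomization on" (the default). Verify on your machine. */
#define ASSUMED_PIE_BASE UINT64_C(0x555555554000)

/* Runtime address of an instruction, given the load base of the executable
   and the offset shown by disas/objdump before the program has started. */
uint64_t runtime_address(uint64_t load_base, uint64_t file_offset)
{
    assert(file_offset <= UINT64_MAX - load_base);  /* no wrap-around */
    return load_base + file_offset;
}
```

With that base, the 0x11db from the question would become 0x5555555551db. The other session on this page shows the same effect: main is at 0x1159 in the objdump listing, and rax holds 0x555555555159 when the breakpoint in main is hit.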

Why segmentation fault doesn't occur with smaller stack boundary?

I'm trying to understand the difference of behavior between a code compiled with the GCC option -mpreferred-stack-boundary=2 and the default value which is -mpreferred-stack-boundary=4.
I have already read a lot of Q/A about this option, but I am not able to understand the case I'll describe below.
Let's consider this code:
#include <stdio.h>
#include <string.h>
void dumb_function() {}
int main(int argc, char** argv) {
dumb_function();
char buffer[24];
strcpy(buffer, argv[1]);
return 0;
}
On my 64-bit architecture, I want to compile it for 32 bits, so I'll use the -m32 option. I create two binaries, one with -mpreferred-stack-boundary=2 and one with the default value:
sysctl -w kernel.randomize_va_space=0
gcc -m32 -g3 -fno-stack-protector -z execstack -o default vuln.c
gcc -mpreferred-stack-boundary=2 -m32 -g3 -fno-stack-protector -z execstack -o align_2 vuln.c
Now, if I execute them with an overflow of two bytes, I get a segmentation fault with the default alignment but not in the other case:
$ ./default 1234567890123456789012345
Segmentation fault (core dumped)
$ ./align_2 1234567890123456789012345
$
I tried to dig into why the default build behaves this way. Here is the disassembly of its main function:
08048411 <main>:
8048411: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048415: 83 e4 f0 and $0xfffffff0,%esp
8048418: ff 71 fc pushl -0x4(%ecx)
804841b: 55 push %ebp
804841c: 89 e5 mov %esp,%ebp
804841e: 53 push %ebx
804841f: 51 push %ecx
8048420: 83 ec 20 sub $0x20,%esp
8048423: 89 cb mov %ecx,%ebx
8048425: e8 e1 ff ff ff call 804840b <dumb_function>
804842a: 8b 43 04 mov 0x4(%ebx),%eax
804842d: 83 c0 04 add $0x4,%eax
8048430: 8b 00 mov (%eax),%eax
8048432: 83 ec 08 sub $0x8,%esp
8048435: 50 push %eax
8048436: 8d 45 e0 lea -0x20(%ebp),%eax
8048439: 50 push %eax
804843a: e8 a1 fe ff ff call 80482e0 <strcpy@plt>
804843f: 83 c4 10 add $0x10,%esp
8048442: b8 00 00 00 00 mov $0x0,%eax
8048447: 8d 65 f8 lea -0x8(%ebp),%esp
804844a: 59 pop %ecx
804844b: 5b pop %ebx
804844c: 5d pop %ebp
804844d: 8d 61 fc lea -0x4(%ecx),%esp
8048450: c3 ret
8048451: 66 90 xchg %ax,%ax
8048453: 66 90 xchg %ax,%ax
8048455: 66 90 xchg %ax,%ax
8048457: 66 90 xchg %ax,%ax
8048459: 66 90 xchg %ax,%ax
804845b: 66 90 xchg %ax,%ax
804845d: 66 90 xchg %ax,%ax
804845f: 90 nop
From the sub $0x20,%esp instruction, we can see that the compiler allocates 32 bytes on the stack, which is consistent with the -mpreferred-stack-boundary=4 option: 32 is a multiple of 16.
First question: why, if I have 32 bytes of stack (24 bytes for the buffer and the rest junk), do I get a segmentation fault with an overflow of just two bytes?
Let's look what's happening with gdb:
$ gdb default
(gdb) b 10
Breakpoint 1 at 0x804842a: file vuln.c, line 10.
(gdb) b 12
Breakpoint 2 at 0x8048442: file vuln.c, line 12.
(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/default 1234567890123456789012345
Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10 strcpy(buffer, argv[1]);
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x804842a in main (vuln.c:10); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcde8, args: argc=2, argv=0xffffce94
Locals at 0xffffcde8, Previous frame's sp is 0xffffce00
Saved registers:
ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffffcdfc
(gdb) x/6x buffer
0xffffcdc8: 0xf7e1da60 0x080484ab 0x00000002 0xffffce94
0xffffcdd8: 0xffffcea0 0x08048481
(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647
Just before the call to strcpy, we can see the saved eip is 0xf7e07647. We can find this value again starting from the buffer address (32 bytes for the stack + 4 bytes for the esp = 36 bytes).
Let's continue:
(gdb) c
Continuing.
Breakpoint 2, main (argc=0, argv=0x0) at vuln.c:12
12 return 0;
(gdb) i f
Stack level 0, frame at 0xffff0035:
eip = 0x8048442 in main (vuln.c:12); saved eip = 0x0
source language c.
Arglist at 0xffffcde8, args: argc=0, argv=0x0
Locals at 0xffffcde8, Previous frame's sp is 0xffff0035
Saved registers:
ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffff0031
(gdb) x/7x buffer
0xffffcdc8: 0x34333231 0x38373635 0x32313039 0x36353433
0xffffcdd8: 0x30393837 0x34333231 0xffff0035
(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647
We can see the overflow in the bytes right after the buffer: 0xffff0035. Also, where the eip was stored, nothing changed: 0xffffcdec: 0xf7e07647, because the overflow is only two bytes. However, the saved eip reported by info frame changed: saved eip = 0x0, and the segmentation fault occurs if I continue:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
What's happening? Why did my saved eip change when the overflow is only two bytes?
Now, let's compare this with the binary compiled with another alignment:
$ objdump -d align_2
...
08048411 <main>:
...
8048414: 83 ec 18 sub $0x18,%esp
...
The stack is exactly 24 bytes. That means an overflow of 2 bytes will overwrite the esp (but still not the eip). Let's check that with gdb:
(gdb) b 10
Breakpoint 1 at 0x804841c: file vuln.c, line 10.
(gdb) b 12
Breakpoint 2 at 0x8048431: file vuln.c, line 12.
(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/align_2 1234567890123456789012345
Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10 strcpy(buffer, argv[1]);
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x804841c in main (vuln.c:10); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
Saved registers:
ebp at 0xffffcdf8, eip at 0xffffcdfc
(gdb) x/6x buffer
0xffffcde0: 0xf7fa23dc 0x080481fc 0x08048449 0x00000000
0xffffcdf0: 0xf7fa2000 0xf7fa2000
(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647
(gdb) c
Continuing.
Breakpoint 2, main (argc=2, argv=0xffffce94) at vuln.c:12
12 return 0;
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x8048431 in main (vuln.c:12); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
Saved registers:
ebp at 0xffffcdf8, eip at 0xffffcdfc
(gdb) x/7x buffer
0xffffcde0: 0x34333231 0x38373635 0x32313039 0x36353433
0xffffcdf0: 0x30393837 0x34333231 0x00000035
(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647
(gdb) c
Continuing.
[Inferior 1 (process 6118) exited normally]
As expected, no segmentation fault here because I don't overwrite the eip.
I don't understand this difference of behavior. In both cases, the eip is not overwritten. The only difference is the size of the stack. What's happening?
Additional information:
This behavior doesn't occur if the dumb_function is not present
I'm using the following version of GCC:
$ gcc -v
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
Some information about my system:
$ uname -a
Linux pierre-Inspiron-5567 4.15.0-107-generic #108~16.04.1-Ubuntu SMP Fri Jun 12 02:57:13 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
You're not overwriting the saved eip, it's true. But you are overwriting a pointer that the function is using to find the saved eip. You can actually see this in your i f output; look at "Previous frame's sp" and notice how the two low bytes are 00 35; ASCII 0x35 is 5 and 00 is the terminating null. So although the saved eip is perfectly intact, the machine is fetching its return address from somewhere else, thus the crash.
In more detail:
GCC apparently doesn't trust the startup code to align the stack to 16 bytes, so it takes matters into its own hands (and $0xfffffff0,%esp). But it needs to keep track of the previous stack pointer value, so that it can find its parameters and the return address when needed. This is the lea 0x4(%esp),%ecx, which loads ecx with the address of the dword just above the saved eip on the stack. gdb calls this address "Previous frame's sp", I guess because it was the value of the stack pointer immediately before the caller executed its call main instruction. I will call it P for short.
After aligning the stack, the compiler pushes -0x4(%ecx), which is a copy of the return address (ecx points just above it), so that the new, aligned frame still has a return address sitting directly above the saved ebp. This copy is the 0xf7e07647 you found at buffer+36. Then it sets up its stack frame with push %ebp; mov %esp, %ebp. We can keep track of all addresses relative to %ebp from now on, in the way compilers usually do when not optimizing.
The push %ecx a couple lines down stores the address P on the stack at offset -0x8(%ebp). The sub $0x20, %esp makes 32 more bytes of space on the stack (ending at -0x28(%ebp)), but the question is, where in that space does buffer end up being placed? We see it happen after the call to dumb_function, with lea -0x20(%ebp), %eax; push %eax; this is the first argument to strcpy being pushed, which is buffer, so indeed buffer is at -0x20(%ebp), not at -0x28 as you might have guessed. The 24 (=0x18) bytes of buffer therefore end exactly at -0x8(%ebp), and the two extra bytes you write (the 25th character and the terminating null) land on our stored P pointer.
It's all downhill from here. The corrupted value of P (call it Px) is popped into ecx, and just before the return, we do lea -0x4(%ecx), %esp. Now %esp is garbage and points somewhere bad, so the following ret is sure to lead to trouble. Maybe Px points to unmapped memory and just attempting to fetch the return address from there causes the fault. Maybe it points to readable memory, but the address fetched from that location does not point to executable memory, so the control transfer faults. Maybe the latter does point to executable memory, but the instructions located there are not the ones we want to be executing.
If you take out the call to dumb_function(), the stack layout changes slightly. It's no longer necessary to push ebx around the call to dumb_function(), so the P pointer from ecx now winds up at -4(%ebp), there are 4 bytes of unused space (to maintain alignment), and then buffer is at -0x20(%ebp). So your two-byte overrun goes into space that's not used at all, hence no crash.
As for the assembly generated with -mpreferred-stack-boundary=2: now there is no need to re-align the stack, because the compiler does trust the startup code to align the stack to at least 4 bytes (it would be unthinkable for this not to be the case). The stack layout is simpler: push ebp, and subtract 24 more bytes for buffer. Thus your overrun overwrites two bytes of the saved ebp. This is eventually popped from the stack back into ebp, and so main returns to its caller with a value in ebp that is not the same as on entry. That's naughty, but it so happens that the system startup code doesn't use the value in ebp for anything (indeed in my tests it is set to 0 on entry to main, likely to mark the top of the stack for backtraces), and so nothing bad happens afterwards.
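The two layouts can be checked with a bit of interval arithmetic. This is only a sketch with a made-up helper name; the offsets are the %ebp-relative ones read from the disassembly and gdb output in the question, and the write length is 26 bytes (25 characters plus the terminating null).

```c
#include <assert.h>

/* Number of bytes that a write of `len` bytes starting at frame offset
   `start` puts into the `slot_size`-byte slot at frame offset `slot`.
   All offsets are relative to %ebp, as in the disassembly. */
int bytes_clobbered(int start, int len, int slot, int slot_size)
{
    assert(len >= 0 && slot_size >= 0);
    int lo = start > slot ? start : slot;              /* overlap begins */
    int write_end = start + len;
    int slot_end = slot + slot_size;
    int hi = write_end < slot_end ? write_end : slot_end;  /* overlap ends */
    return hi > lo ? hi - lo : 0;
}
```

For the default build, bytes_clobbered(-0x20, 26, -0x8, 4) is 2 (the stored P pointer) while bytes_clobbered(-0x20, 26, 4, 4) is 0 (the return address copy at 4(%ebp)). For the align_2 build, buffer sits at -0x18(%ebp), so bytes_clobbered(-0x18, 26, 0, 4) is 2 (the saved ebp) and bytes_clobbered(-0x18, 26, 4, 4) is 0 (the return address).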

How does GDB determine the address to break at when you do "break function-name"?

A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf@plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to decide this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the function f's prologue is complete at 0xffff0010.
I think this is the mode used by gcc when not optimizing.
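The repeated-line convention described above can be sketched in a few lines. This is a simplification for illustration and not gdb's actual code: struct line_entry and prologue_end are hypothetical names, and the table is assumed to hold only the rows of one function, in address order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line-table row: source line and code address. */
struct line_entry {
    unsigned line;
    uint64_t addr;
};

/* Return the address of the second row that carries the function's first
   line number, taken as the end of the prologue. If the first line is
   never repeated, fall back to the function's entry address. */
uint64_t prologue_end(const struct line_entry *rows, size_t n)
{
    assert(rows != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (rows[i].line == rows[0].line)
            return rows[i].addr;
    return rows[0].addr;
}
```

With the two rows from the example (line 23 at 0xffff0000 and line 23 at 0xffff0010) this returns 0xffff0010.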
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25 the skip algorithm seems to be at symtab.c:skip_prologue_sal, which breakpoints.c:break_command, the command definition, calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologue of foo2 is longer than that of foo1 by two instructions:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations, in particular it must reduce the rsp before another call to save room for the local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1 does not have local string constants however.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.
On ELF platforms like Linux, debug information is stored in separate (non-executable) sections of the executable. These sections contain all the information that is needed by the debugger. Check the DWARF2 specification for the specifics.

Does main have a return address, dynamic link or return value in C?

According to our book, each function has an activation record in the run-time stack in C. Each of these activation records has a return address, dynamic link, and return value. Does main have these also?
All of these terms are purely implementation details - C has no notion of "return addresses" or "dynamic links." It doesn't even have a notion of a "stack" at all. Most implementations of C have these objects in them, and in those implementations it is possible that they exist for main. However, there is no requirement that this happen.
Hope this helps!
If you disassemble functions you will realize that most of the time the stack doesn't even contain the return value - often times the EAX register does (intel x86).
You can also look up "calling conventions" - it all pretty much depends on the compiler.
C is a language, how it's interpreted into machine code is not 'its' business.
While this depends on the implementation, it is worth looking at a C program compiled with gcc. If you run objdump -d executable, you will see it disassembled and you can see how main() behaves. Here's an example:
08048680 <_start>:
...
8048689: 54 push %esp
804868a: 52 push %edx
804868b: 68 a0 8b 04 08 push $0x8048ba0
8048690: 68 30 8b 04 08 push $0x8048b30
8048695: 51 push %ecx
8048696: 56 push %esi
8048697: 68 f1 88 04 08 push $0x80488f1
804869c: e8 9f ff ff ff call 8048640 <__libc_start_main@plt>
80486a1: f4 hlt
...
080488f1 <main>:
80488f1: 55 push %ebp
80488f2: 89 e5 mov %esp,%ebp
80488f4: 57 push %edi
80488f5: 56 push %esi
80488f6: 53 push %ebx
...
8048b2b: 5b pop %ebx
8048b2c: 5e pop %esi
8048b2d: 5f pop %edi
8048b2e: 5d pop %ebp
8048b2f: c3 ret
You can see that main behaves similarly to a regular function in that it returns normally. In fact, if you look at the Linux Standard Base documentation, you'll see that the call to __libc_start_main that we see from _start actually requires main to behave like a regular function.
In C++, main() is written just like a function, but isn't quite one: the program isn't allowed to call main() itself. (C does permit calling main() recursively.) In both languages it has several possible prototypes, which an ordinary C function cannot have. Whatever is returned from it gets passed to the operating system (and the program ends).
Individual C implementations might handle main() like a function called from "outside" for uniformity, but nobody forces them to do so (or disallow switching to some other form of doing it without telling anybody). There are traditional ways of implementing C, but nobody is forced to do it that way. It is just the simplest way on our typical architectures.

Resources