GDB and Assembly: how to examine consts variables defined in heap? - c

For example in the following code
"justatest" and the format "%s" is defined in heap:
char str[15]="justatest";
int main(){
printf("%s",str);
return 0;
}
in GDB,i got the assembly code before call to printf as:
=> 0x0804841f <+14>: movl $0x804a020,0x4(%esp)
0x08048427 <+22>: movl $0x80484d8,(%esp)
0x0804842e <+29>: call 0x80482f0 <printf#plt>
Do i have to examine the parameter 1by1 using "x/s 0x804a020" and "x/s 0x80484d8"
or is there a Table of constants defined in heap that i can directly refer to?
thanks!

Your understanding about str reside on heap is not correct. Its global variable which gets stored into the data segment. Regarding your print global variable, you can do as follows on my GNU/Linux terminal.
$ gcc -g -Wall hello.c
$ gdb -q ./a.out
Reading symbols from /home/mantosh/practice/a.out...done.
(gdb) break main
Breakpoint 1 at 0x400524: file hello.c, line 6.
(gdb) run
Starting program: /home/mantosh/practice/a.out
Breakpoint 1, main () at bakwas.c:6
6 printf("%s",str);
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400520 <+0>: push %rbp
0x0000000000400521 <+1>: mov %rsp,%rbp
=> 0x0000000000400524 <+4>: mov $0x601020,%esi
0x0000000000400529 <+9>: mov $0x4005e4,%edi
0x000000000040052e <+14>: mov $0x0,%eax
0x0000000000400533 <+19>: callq 0x4003f0 <printf#plt>
0x0000000000400538 <+24>: mov $0x0,%eax
0x000000000040053d <+29>: pop %rbp
0x000000000040053e <+30>: retq
End of assembler dump.
(gdb) p str
$1 = "justatest\000\000\000\000\000"
(gdb) p &str
$2 = (char (*)[15]) 0x601020
// These are addresses of two arguments which would be passed in printf.
// From assembly instruction we can verify that before calling the printf
// these are getting stored into the registers.
(gdb) x/s 0x4005e4
0x4005e4: "%s"
(gdb) x/s 0x601020
0x601020 <str>: "justatest

later i found that for object files without a debugging symbols table
objdump -t obj
would contains most of the symbols of global variables/functions and their address
,and
objdump -D obj instead of -d
would include all sections such as .text/.data/.rodata instead of .text only
these two combined provided sufficient access to what i mentioned aboved, such as switch tables/const strings/global variables

Related

gdb addresses: 0x565561f5 instead of 0x41414141

I want to try a buffer overflow on a c program. I compiled it like this gcc -fno-stack-protector -m32 buggy_program.c with gcc. If i run this program in gdb and i overflow the buffer, it should said 0x41414141, because i sent A's. But its saying 0x565561f5. Sorry for my bad english. Can somebody help me?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
}
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x565561f5 in main ()
I want to see this:
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in main ()
Looking at the address at which the process segfaulted shows the relevant line in the disassembled code:
gdb a.out <<EOF
set logging on
r < inp
disassemble main
x/i $eip
p/x $esp
Produces the following output:
(gdb) Starting program: .../a.out < in
Program received signal SIGSEGV, Segmentation fault.
0x08048482 in main (argc=, argv=) at tmp.c:10 10 }
(gdb) Dump of assembler code for function main:
0x08048436 <+0>: lea 0x4(%esp),%ecx
0x0804843a <+4>: and $0xfffffff0,%esp
0x0804843d <+7>: pushl -0x4(%ecx)
0x08048440 <+10>: push %ebp
0x08048441 <+11>: mov %esp,%ebp
0x08048443 <+13>: push %ebx
0x08048444 <+14>: push %ecx
0x08048445 <+15>: sub $0x40,%esp
0x08048448 <+18>: call
0x8048370 <__x86.get_pc_thunk.bx>
0x0804844d <+23>: add $0x1bb3,%ebx
0x08048453 <+29>: sub $0xc,%esp
0x08048456 <+32>: lea -0x1af0(%ebx),%eax
0x0804845c <+38>: push %eax
0x0804845d <+39>: call 0x8048300
0x08048462 <+44>: add $0x10,%esp
0x08048465 <+47>: sub $0xc,%esp
0x08048468 <+50>: lea -0x48(%ebp),%eax
0x0804846b <+53>: push %eax
0x0804846c <+54>: call 0x8048310
0x08048471 <+59>: add $0x10,%esp
0x08048474 <+62>: mov $0x0,%eax
0x08048479 <+67>: lea -0x8(%ebp),%esp
0x0804847c <+70>: pop %ecx
0x0804847d <+71>: pop %ebx
0x0804847e <+72>: pop %ebp
0x0804847f <+73>: lea -0x4(%ecx),%esp
=> 0x08048482 <+76>: ret
End of assembler dump.
(gdb) => 0x8048482 : ret
(gdb) $1 = 0x4141413d
(gdb) quit
The failing statement is the ret at the end of main. The program fails, when ret attempts to load the return-address from the top of the stack. The produced executable stores the old value of esp on the stack, before aligning to word-boundaries. When main is completed, the program attempts to restore the esp from the stack and afterwards read the return-address. However the whole top of the stack is compromised, thus rendering the new value of the stack-pointer garbage ($1 = 0x4141413d). When ret is executed, it attempts to read a word from address 0x4141413d, which isn't allocated and produces as segfault.
Notes
The above disassembly was produced from the code in the question using the following compiler-options:
-m32 -fno-stack-protector -g -O0
So guys, i found a solution:
Just compile it with gcc 3.3.4
gcc -m32 buggy_program.c
Modern operating systems use address-space-layout-randomization ASLR to make this stuff not work quite so easily.
I remember the controversy when it was first started. ASLR was kind of a bad idea for 32 bit processes due to the number of other constraints it imposed on the system and dubious security benefit. On the other hand, it works great on 64 bit processes and almost everybody uses it now.
You don't know where the code is. You don't know where the heap is. You don't know where the stack is. Writing exploits is hard now.
Also, you tried to use 32 bit shellcode and documentation on a 64 bit process.
On reading the updated question: Your code is compiled with frame pointers (which is the default). This is causing the ret instruction itself to fault because esp is trashed. ASLR appears to still be in play most likely it doesn't really matter.

How do i bypass a return adress overwrite not redirecting control flow?

Let me illustrate the problem here
This is main
(gdb) disass main
Dump of assembler code for function main:
0x000000000040057c <+0>: push rbp
0x000000000040057d <+1>: mov rbp,rsp
0x0000000000400580 <+4>: sub rsp,0x40
0x0000000000400584 <+8>: mov DWORD PTR [rbp-0x34],edi
0x0000000000400587 <+11>: mov QWORD PTR [rbp-0x40],rsi
0x000000000040058b <+15>: mov rax,QWORD PTR [rbp-0x40]
0x000000000040058f <+19>: add rax,0x8
0x0000000000400593 <+23>: mov rdx,QWORD PTR [rax]
0x0000000000400596 <+26>: lea rax,[rbp-0x30]
0x000000000040059a <+30>: mov rsi,rdx
0x000000000040059d <+33>: mov rdi,rax
0x00000000004005a0 <+36>: call 0x400430 <strcpy#plt>
0x00000000004005a5 <+41>: mov eax,0x0
0x00000000004005aa <+46>: call 0x400566 <function>
0x00000000004005af <+51>: mov eax,0x0
0x00000000004005b4 <+56>: leave
0x00000000004005b5 <+57>: ret
End of assembler dump.
(gdb) list
#include <stdio.h>
#include <string.h>
void function(){
printf("test");
}
int main(int argc, char* argv[]) {
char a[34];
strcpy(a , argv[1]);
function();
return 0;
}
It was compiled using the following:
gcc -g -o exploitable0 vulnerable_program0.c -fno-stack-protector -z execstack
The attacker is obviously tempted to overrun the buffer specified by the array 'a'.
Let's examine more in depth what the stack looks like when i run the exploit
(gdb) run $(perl -e 'print "\xeb\x15\x59\x31\xc0\xb0\x04\x31\xdb\xff\xc3\x31\xd2\xb2\x0f\xcd\x80\xb0\x01\xff\xcb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x1f\x77\x6f\x72\x6c\x64\x21\x21\x21\x21\x21\x21\x21\x21\x66\x05\x40\x00\x00\x00\x00\x00"')
Starting program: /home/twister17/Documents/hacking/exploitable0 $(perl -e 'print "\xeb\x15\x59\x31\xc0\xb0\x04\x31\xdb\xff\xc3\x31\xd2\xb2\x0f\xcd\x80\xb0\x01\xff\xcb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x1f\x77\x6f\x72\x6c\x64\x21\x21\x21\x21\x21\x21\x21\x21\x66\x05\x40\x00\x00\x00\x00\x00"')
Breakpoint 1, main (argc=2, argv=0x7fffffffdcf8) at vulnerable_program0.c:9
9 function();
(gdb) i r rbp
rbp 0x7fffffffdc10 0x7fffffffdc10
(gdb) i r rsp
rsp 0x7fffffffdbd0 0x7fffffffdbd0
(gdb) x/18xw $rsp
0x7fffffffdbd0: 0xffffdcf8 0x00007fff 0x0040060d 0x00000002
0x7fffffffdbe0: 0x315915eb 0x3104b0c0 0x31c3ffdb 0xcd0fb2d2
0x7fffffffdbf0: 0xff01b080 0xe880cdcb 0xffffffe6 0x6c6c6548
0x7fffffffdc00: 0x771f2c6f 0x646c726f 0x21212121 0x21212121
0x7fffffffdc10: 0x00400566 0x00000000
(gdb)
the affected buffer starts at 0x7fffffffdbe0 and ends at 0x7fffffffdc10.
The shellcode injects smoothtly , to make things simpler i overwrite the return address with that of the function.
That is, I actually used 0x400566 as the return address, so if everything runs smoothly it would just call the function and print "test". You can clearly see from the assembly dump that is the address of the function "function"
expected output: "testtest".
actual output: test ( the program calls the function already).
I think you're having troubles with the StackGuard protection from GCC that prevents buffer overflows.
Try compiling your code with the
-fno-stack-protector
flag in order to disable the protection for your program.
Assuming you're running it on Linux, if you want to disable kernel based protection, which is called Address Space Randomization, you can run :
sysctl -w kernel.randomize_va_space=0

Confusing about the implementation of shared library in Linux

I'm doing some experiment about shared library in Linux. By reading several papers I think I know what happens when a shared library function is called.
But when I am trying to trace the memory to get the binary code in a shared library function, I find something strange. In my opinion, after calling a shared library function, the corresponding slot in .got.plt should contain the actual function address, but my experiment shows that it still remains the same, i.e the address of the second instruction in func#plt section. I'm rather confused about this, so if anyone could help me?
Here is my code and output:
#include <stdio.h>
#include <string.h>
typedef unsigned long u_l;
int main()
{
char *p_ch = strstr("abc", "b");
printf("result = %s\n", p_ch);
long long *p = (long long *) &strstr;
printf("data = %llx\n", *(p));
long long k = *p >> 16;
u_l *entry_addr = (u_l *)(k & 0x00000000ffffffff);
printf("entry_addr = %lx\n", entry_addr);
u_l *func_addr = (u_l *)*entry_addr;
printf("func_addr = %lx\n", func_addr);
printf("code = %llx\n", *func_addr);
return 0;
}
output:
result = bc
data = 680804a00c25ff
entry_addr = 804a00c
func_addr = 8048326
code = 68080400000068
Thanks first!
PS: Please don't ask me why I need to get the code of a shared library function. Of course I know the source code and the binary could be obtained easily. It's just a experiment.
My GCC version is 4.7.3. Kernel version is 3.8.0-35
Not sure what is the logic of Your program, but I'll try to show where address changes.
$ gcc -Wall -g test.c
$ gdb a.out
(gdb) break main
Breakpoint 1 at 0x40054c: file test.c, line 8.
(gdb) run
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400544 <+0>: push %rbp
0x0000000000400545 <+1>: mov %rsp,%rbp
0x0000000000400548 <+4>: sub $0x30,%rsp
=> 0x000000000040054c <+8>: movq $0x4006fd,-0x28(%rbp)
0x0000000000400554 <+16>: mov $0x400700,%eax
0x0000000000400559 <+21>: mov -0x28(%rbp),%rdx
0x000000000040055d <+25>: mov %rdx,%rsi
0x0000000000400560 <+28>: mov %rax,%rdi
0x0000000000400563 <+31>: mov $0x0,%eax
0x0000000000400568 <+36>: callq 0x400430 <printf#plt>
0x000000000040056d <+41>: movq $0x400450,-0x20(%rbp)
0x0000000000400575 <+49>: mov -0x20(%rbp),%rax
0x0000000000400579 <+53>: mov (%rax),%rdx
0x000000000040057c <+56>: mov $0x40070d,%eax
0x0000000000400581 <+61>: mov %rdx,%rsi
0x0000000000400584 <+64>: mov %rax,%rdi
0x0000000000400587 <+67>: mov $0x0,%eax
0x000000000040058c <+72>: callq 0x400430 <printf#plt>
0x0000000000400591 <+77>: mov -0x20(%rbp),%rax
0x0000000000400595 <+81>: mov (%rax),%rax
0x0000000000400598 <+84>: sar $0x10,%rax
0x000000000040059c <+88>: mov %rax,-0x18(%rbp)
Let's make breakpoint in PLT table at printf entry (0x400430) and continue:
(gdb) break *0x400430
Breakpoint 2 at 0x400430
(gdb) continue
Continuing.
Breakpoint 2, 0x0000000000400430 in printf#plt ()
(gdb) disassemble
Dump of assembler code for function printf#plt:
=> 0x0000000000400430 <+0>: jmpq *0x200bca(%rip) # 0x601000 <printf#got.plt>
0x0000000000400436 <+6>: pushq $0x0
0x000000000040043b <+11>: jmpq 0x400420
End of assembler dump.
(gdb) x/x 0x601000
0x601000 <printf#got.plt>: 0x00400436
In PLT table You can see indirect jump by address stored in GOT at 0x601000 (0x200bca+0x400430+6), which at first function invocation resolves to next address in PLT (0x00400436: pushq and jump to dynamic linker). Dynamic linker finds real printf, updates it's GOT entry and jumps to it.
Next time You call the same printf function (and hit the breakpoint), it's entry at GOT 0x601000 is already updated to 0xf7a6d840, so there is jump directly to printf, not to dynamic linker.
(gdb) c
Continuing.
result = bc
Breakpoint 2, 0x0000000000400430 in printf#plt ()
(gdb) disassemble
Dump of assembler code for function printf#plt:
=> 0x0000000000400430 <+0>: jmpq *0x200bca(%rip) # 0x601000 <printf#got.plt>
0x0000000000400436 <+6>: pushq $0x0
0x000000000040043b <+11>: jmpq 0x400420
End of assembler dump.
(gdb) x/x 0x601000
0x601000 <printf#got.plt>: 0xf7a6d840
This example is from 64bit Linux. On other *NIX'es assembly or similar details may vary, but idea remains the same.
One thing else, I couldn't find printf in my libc.so, …
This program shows you an address and the containing library (using a Glibc extension) for each function given as an argument:
/* cc -ldl */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(int argc, char *argv[])
{
while (*++argv)
{
void *handle = dlopen(NULL, RTLD_NOW);
if (!handle) puts(dlerror()), exit(1);
void *p = dlsym(handle, *argv);
char *s = dlerror();
if (s) puts(s), exit(1);
printf("%s = %p\n", *argv, p);
Dl_info info;
if (dladdr(p, &info))
printf("%s contains %s\n", info.dli_fname, info.dli_sname);
}
}

Overflow buffer in C on x86_64 to call function

Hello i have such code
#include <stdio.h>
#define SECRET "1234567890AZXCVBNFRT"
int checksecret(){
char buf[32];
gets(buf);
if(strcmp(SECRET,buf)==0) return 1;
else return 0;
}
void outsecret(){
printf("%s\n",SECRET);
}
int main(int argc, char** argv){
if (checksecret()){
outsecret();
};
}
disass of outsecret
(gdb) disassemble outsecret
Dump of assembler code for function outsecret:
0x00000000004005f4 <+0>: push %rbp
0x00000000004005f5 <+1>: mov %rsp,%rbp
0x00000000004005f8 <+4>: mov $0x4006b4,%edi
0x00000000004005fd <+9>: callq 0x400480 <puts#plt>
0x0000000000400602 <+14>: pop %rbp
0x0000000000400603 <+15>: retq
I have an assumption that i don't know SECRET, so i try to run my program with such string python -c 'print "A" * 32 + "\x40\x05\xf4"[::-1]'. But it fails with segmentation fault. What i am doing wrong? Thank you for any help.
PS
I want to call function outsecret by overwriting return code in checksecret
You have to remember that all strings have an extra character that terminates the string, so if you input 32 characters then gets will write 33 characters to the buffer. Writing beyond the limits of an array leads to undefined behavior which often leads to crashes.
The gets function have no bounds-checking, and is very dangerous to use. It has been deprecated since long, and in the latest C11 standard it has even been removed.
$ python -c 'print "A" * 32 + "\x40\x05\xf4"[::1]'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#
$ perl -le 'print length("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#")'
33
Your input string is too long for buffer size of 32 characters (extra one is needed for '\0' terminating null character). You are victim to buffer or array overflow (sometimes also called as array overrun).
Note that gets() is deprecated in C99 and eventually it has been dropped in C11 Standard for security reasons.
I want to call function outsecret by overwriting return code in
checksecret
Beware, you are about to leave relatively safe regions of C Standard. This means that behaviour is relative to compiler, compiler's versions, optimization settings, ABI and so on (maybe inclucing current phase of moon).
As of x86 calling conventions integer return value is stored directly in %eax register (that's assuming that you have x86 or x86-64 CPU). Stack-likely-located array buf is handled by %rbp offsets within current stack frame. Let's consult with gdb disassemble command:
$ gcc -O0 test.c
$ gdb -q a.out
(gdb) b checksecret
(gdb) r
Breakpoint 1, 0x0000000000400631 in checksecret ()
(gdb) disas
Dump of assembler code for function checksecret:
0x000000000040062d <+0>: push %rbp
0x000000000040062e <+1>: mov %rsp,%rbp
=> 0x0000000000400631 <+4>: sub $0x30,%rsp
0x0000000000400635 <+8>: mov %fs:0x28,%rax
0x000000000040063e <+17>: mov %rax,-0x8(%rbp)
0x0000000000400642 <+21>: xor %eax,%eax
0x0000000000400644 <+23>: lea -0x30(%rbp),%rax
0x0000000000400648 <+27>: mov %rax,%rdi
0x000000000040064b <+30>: callq 0x400530 <gets#plt>
0x0000000000400650 <+35>: lea -0x30(%rbp),%rax
0x0000000000400654 <+39>: mov %rax,%rsi
0x0000000000400657 <+42>: mov $0x400744,%edi
0x000000000040065c <+47>: callq 0x400510 <strcmp#plt>
0x0000000000400661 <+52>: test %eax,%eax
0x0000000000400663 <+54>: jne 0x40066c <checksecret+63>
0x0000000000400665 <+56>: mov $0x1,%eax
0x000000000040066a <+61>: jmp 0x400671 <checksecret+68>
0x000000000040066c <+63>: mov $0x0,%eax
0x0000000000400671 <+68>: mov -0x8(%rbp),%rdx
0x0000000000400675 <+72>: xor %fs:0x28,%rdx
0x000000000040067e <+81>: je 0x400685 <checksecret+88>
0x0000000000400680 <+83>: callq 0x4004f0 <__stack_chk_fail#plt>
0x0000000000400685 <+88>: leaveq
0x0000000000400686 <+89>: retq
There is no way overwrite %eax directly from C code, but what you could do is to overwrite selective fragment of code section. In your case what you want is to replace:
0x000000000040066c <+63>: mov $0x0,%eax
with
0x000000000040066c <+63>: mov $0x1,%eax
It's easy to accomplish by gdb itself:
(gdb) x/2bx 0x40066c
0x40066c <checksecret+63>: 0xb8 0x00
set {unsigned char}0x40066d = 1
Now let's confirm it:
(gdb) x/i 0x40066c
0x40066c <checksecret+63>: mov $0x1,%eax
From that point checksecret() is returning 1 even if SECRET does not match. However It wouldn't be so easy to do it by buf itself, as you need to know (guess somehow?) correct offset of particular code section instruction.
Above answers are pretty clear and corret way to exploit buffer overflow vulnerability. But there is a different way to do same thing without exploit vulnerability.
mince#rootlab tmp $ gcc test.c -o test
mince#rootlab tmp $ strings test
/lib64/ld-linux-x86-64.so.2
libc.so.6
gets
puts
__stack_chk_fail
strcmp
__libc_start_main
__gmon_start__
GLIBC_2.4
GLIBC_2.2.5
UH-X
UH-X
[]A\A]A^A_
1234567890AZXCVBNFRT
;*3$
Please look at last 2 row. You will see your secret key in there.

How does GDB determine the address to break at when you do "break function-name"?

A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf#plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to decide this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the function f's prologue is complete at 0xfff0010.
I think this is the mode used by gcc when not optimizing.
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25 the skip algorithm on seems to be at: symtab.c:skip_prologue_sal, which breakpoints.c:break_command, the command definition, calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologues of foo2 is longer than that of foo1 by two instructions because:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations, in particular it must reduce the rsp before another call to save room for the local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1 does not have local strings constants however.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.
On ELF platforms like linux, debug information is stored in a separate (non-executable) section in the executable. In this separate section there is all the information that is needed by the debugger. Check the DWARF2 specification for the specifics.

Resources