GDB skips shared library breakpoint - c

I'm using GDB with a simple C program and trying to set a breakpoint on a shared library function, but GDB skips this breakpoint entirely.
I'm using a simple C program to learn about GDB. I set three breakpoints: one at line 7, one at the strcpy function (with "break strcpy"), and one at line 8. Every time I run the program and press "c", it skips breakpoint 2 entirely.
#include <stdio.h>
#include <string.h>
int main() {
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Whenever I run the program in the debugger, it stops normally at breakpoint 1, which is expected, but then whenever I press "c" to continue to breakpoint 2, it skips breakpoint 2 entirely and just shows the output breakpoint 3 is supposed to have. Is this something to do with GDB's handling of shared libraries?
EDIT: Here is the disassembly
0x0000555555555145 <+0>: push rbp
0x0000555555555146 <+1>: mov rbp,rsp
0x0000555555555149 <+4>: sub rsp,0x20
0x000055555555514d <+8>: lea rax,[rbp-0x20]
0x0000555555555151 <+12>: lea rsi,[rip+0xeac] # 0x555555556004
0x0000555555555158 <+19>: mov rdi,rax
0x000055555555515b <+22>: call 0x555555555030 <strcpy@plt>
0x0000555555555160 <+27>: lea rax,[rbp-0x20]
0x0000555555555164 <+31>: mov rdi,rax
0x0000555555555167 <+34>: mov eax,0x0
0x000055555555516c <+39>: call 0x555555555040 <printf@plt>
0x0000555555555171 <+44>: mov eax,0x0
0x0000555555555176 <+49>: leave
0x0000555555555177 <+50>: ret

You didn't specify your platform. I suspect it's Linux with GLIBC.
The reason GDB behaves this way is that strcpy is not a normal function, but a GNU IFUNC.
Try setting a breakpoint on __strcpy_sse2_unaligned and see this answer.
Update:
the debugger spits out this error whenever it reaches breakpoint 2: "../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: No such file or directory."
That isn't an error. GDB is only reporting that it cannot find the glibc source file for the function it stopped in.
The fact that it reaches that breakpoint confirms that this answer is correct.
You can simply treat __strcpy_sse2_unaligned as an alias to strcpy. Setting a breakpoint there is (on your system) equivalent to setting it on strcpy.
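To make the IFUNC idea concrete, here is a minimal sketch, assuming GCC or Clang targeting Linux/ELF with glibc. The names my_length, resolve_my_length, length_bytewise and length_pointer are made up for illustration; glibc's real resolver for strcpy chooses among variants such as __strcpy_sse2_unaligned based on CPU features. The public symbol has no body of its own: the dynamic loader calls the resolver once and binds the symbol to whatever it returns, so the code that actually runs lives under a different name.

```c
#include <stddef.h>

/* Two hypothetical implementations of the same operation. */
static size_t length_bytewise(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

static size_t length_pointer(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Function-pointer type shared by both implementations. */
typedef size_t (*length_fn)(const char *);

/*
 * Resolver: called once by the dynamic loader while it relocates the
 * binary. Whatever it returns becomes the target of my_length.
 * A real resolver inspects CPU features; this one keys off pointer
 * size only, so that it stays portable and references both variants.
 */
static length_fn resolve_my_length(void)
{
    return sizeof(void *) >= 8 ? length_pointer : length_bytewise;
}

/* my_length is an indirect function: it is bound via the resolver. */
size_t my_length(const char *s) __attribute__((ifunc("resolve_my_length")));
```

Calling my_length("Hello, world!\n") returns 14 through whichever implementation the resolver picked. Depending on the GDB version, a breakpoint on the public name may not end up on that implementation, which is why breaking on the concrete variant is the reliable option.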

Related

Inserting gdb breakpoints fail

I'm learning about buffer overflows in C.
For that purpose, I'm following this simple example.
I have the following gcc version:
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And this simple c file:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char buf[256];
strcpy(buf, argv[1]);
printf("%s,", buf);
return 0;
}
I then compile this file with $ gcc buf.c -o buf.
I then open in gdb by $ gdb ./buf
I call disas and get the result assembly:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: sub $0x120,%rsp
0x0000000000001198 <+15>: mov %edi,-0x114(%rbp)
0x000000000000119e <+21>: mov %rsi,-0x120(%rbp)
0x00000000000011a5 <+28>: mov %fs:0x28,%rax
0x00000000000011ae <+37>: mov %rax,-0x8(%rbp)
0x00000000000011b2 <+41>: xor %eax,%eax
0x00000000000011b4 <+43>: mov -0x120(%rbp),%rax
0x00000000000011bb <+50>: add $0x8,%rax
0x00000000000011bf <+54>: mov (%rax),%rdx
0x00000000000011c2 <+57>: lea -0x110(%rbp),%rax
0x00000000000011c9 <+64>: mov %rdx,%rsi
0x00000000000011cc <+67>: mov %rax,%rdi
0x00000000000011cf <+70>: callq 0x1070 <strcpy@plt>
--Type <RET> for more, q to quit, c to continue without paging--
0x00000000000011d4 <+75>: lea -0x110(%rbp),%rax
0x00000000000011db <+82>: mov %rax,%rsi
0x00000000000011de <+85>: lea 0xe1f(%rip),%rdi # 0x2004
0x00000000000011e5 <+92>: mov $0x0,%eax
0x00000000000011ea <+97>: callq 0x1090 <printf@plt>
0x00000000000011ef <+102>: mov $0x0,%eax
0x00000000000011f4 <+107>: mov -0x8(%rbp),%rcx
0x00000000000011f8 <+111>: xor %fs:0x28,%rcx
0x0000000000001201 <+120>: je 0x1208 <main+127>
0x0000000000001203 <+122>: callq 0x1080 <__stack_chk_fail@plt>
0x0000000000001208 <+127>: leaveq
0x0000000000001209 <+128>: retq
End of assembler dump.
With some really low memory addresses.
I then want to see what happens if I input a big string of A's into the program, so I place a breakpoint at 0x00000000000011db.
I then run it:
(gdb) run $(python3 -c "print('A'*256)"
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)"
/bin/bash: -c: line 0: unexpected EOF while looking for matching `)'
/bin/bash: -c: line 1: syntax error: unexpected end of file
During startup program exited with code 1.
(gdb) run $(python3 -c "print('A'*256)")
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)")
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x11db
OK, so something with the memory addresses is a bit funky.
I googled the issue and found this post, which explains that Position-Independent Executable (PIE) was probably enabled, so the memory addresses change when the program is actually run.
I can confirm this by running disas after running the program and seeing that the memory addresses are in a much higher range.
This all makes sense, but it makes me wonder: if the addresses change every time I run it, how can I place a breakpoint at a memory address before the program runs?
if the addresses change every time I run it, how can I place a breakpoint at a memory address before the program runs?
This happens because GDB by default disables address randomization (to make debugging easier).
If you re-enable ASLR with (gdb) set disable-randomization off, then you won't be able to set the breakpoint on a fixed address.
You would still be able to set a breakpoint on e.g. main. In that case GDB waits until the executable has been relocated and sets the breakpoint on the actual runtime instruction (the address will change on every run).
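As a worked example of the relocation arithmetic: the address shown by disas before the program runs is just an offset into the file, and the runtime address is that offset plus the load base. The sketch below assumes a load base of 0x555555554000, which is a typical value on x86-64 Linux when GDB has randomization disabled (an assumption; the actual base can be checked with info proc mappings once the program has started). runtime_address is a hypothetical helper, not something GDB provides.

```c
#include <stdint.h>

/*
 * Runtime address of an instruction in a position-independent
 * executable: the load base chosen at startup plus the offset that
 * the disassembler shows before the program has been started.
 */
uint64_t runtime_address(uint64_t load_base, uint64_t file_offset)
{
    return load_base + file_offset;
}
```

With the assumed base, runtime_address(0x555555554000, 0x11db) gives 0x5555555551db, which is where the instruction at 0x11db would sit after relocation. With ASLR enabled the base differs on every run, so only a symbolic breakpoint (break main) stays valid.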

gdb addresses: 0x565561f5 instead of 0x41414141

I want to try a buffer overflow on a C program. I compiled it with gcc like this: gcc -fno-stack-protector -m32 buggy_program.c. If I run this program in gdb and overflow the buffer, it should say 0x41414141, because I sent A's. But it says 0x565561f5. Sorry for my bad English. Can somebody help me?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
}
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x565561f5 in main ()
I want to see this:
Starting program: /root/Downloads/a.out
Type in something: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in main ()
Looking at the address at which the process segfaulted shows the relevant line in the disassembled code:
gdb a.out <<EOF
set logging on
r < inp
disassemble main
x/i $eip
p/x $esp
EOF
Produces the following output:
(gdb) Starting program: .../a.out < in
Program received signal SIGSEGV, Segmentation fault.
0x08048482 in main (argc=, argv=) at tmp.c:10
10 }
(gdb) Dump of assembler code for function main:
0x08048436 <+0>: lea 0x4(%esp),%ecx
0x0804843a <+4>: and $0xfffffff0,%esp
0x0804843d <+7>: pushl -0x4(%ecx)
0x08048440 <+10>: push %ebp
0x08048441 <+11>: mov %esp,%ebp
0x08048443 <+13>: push %ebx
0x08048444 <+14>: push %ecx
0x08048445 <+15>: sub $0x40,%esp
0x08048448 <+18>: call 0x8048370 <__x86.get_pc_thunk.bx>
0x0804844d <+23>: add $0x1bb3,%ebx
0x08048453 <+29>: sub $0xc,%esp
0x08048456 <+32>: lea -0x1af0(%ebx),%eax
0x0804845c <+38>: push %eax
0x0804845d <+39>: call 0x8048300
0x08048462 <+44>: add $0x10,%esp
0x08048465 <+47>: sub $0xc,%esp
0x08048468 <+50>: lea -0x48(%ebp),%eax
0x0804846b <+53>: push %eax
0x0804846c <+54>: call 0x8048310
0x08048471 <+59>: add $0x10,%esp
0x08048474 <+62>: mov $0x0,%eax
0x08048479 <+67>: lea -0x8(%ebp),%esp
0x0804847c <+70>: pop %ecx
0x0804847d <+71>: pop %ebx
0x0804847e <+72>: pop %ebp
0x0804847f <+73>: lea -0x4(%ecx),%esp
=> 0x08048482 <+76>: ret
End of assembler dump.
(gdb) => 0x8048482 : ret
(gdb) $1 = 0x4141413d
(gdb) quit
The failing instruction is the ret at the end of main. The program fails when ret attempts to load the return address from the top of the stack. The generated code saves the old value of esp on the stack (via ecx) before aligning the stack to a 16-byte boundary. When main finishes, the program restores esp from that saved value and only then reads the return address. However, the whole top of the stack has been overwritten, so the new stack pointer is garbage ($1 = 0x4141413d). When ret executes, it attempts to read a word from address 0x4141413d, which isn't mapped, and that produces a segfault.
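The value 0x4141413d can be checked by hand. The pop %ecx at main+70 loads four overwritten bytes, "AAAA" = 0x41414141, and the lea -0x4(%ecx),%esp at main+73 then subtracts 4. A minimal sketch of that arithmetic (esp_after_lea is a hypothetical helper that only models this one instruction):

```c
#include <stdint.h>

/* Models `lea -0x4(%ecx),%esp`: esp becomes the restored ecx minus 4. */
uint32_t esp_after_lea(uint32_t restored_ecx)
{
    return restored_ecx - 4u;
}
```

esp_after_lea(0x41414141) yields 0x4141413d, matching the p/x $esp output above. This is also why eip never becomes 0x41414141 here: ret faults while reading from the bogus stack pointer, before any return address is loaded.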
Notes
The above disassembly was produced from the code in the question using the following compiler-options:
-m32 -fno-stack-protector -g -O0
So guys, I found a solution:
Just compile it with gcc 3.3.4
gcc -m32 buggy_program.c
Modern operating systems use address space layout randomization (ASLR) to make this stuff not work quite so easily.
I remember the controversy when it was first started. ASLR was kind of a bad idea for 32 bit processes due to the number of other constraints it imposed on the system and dubious security benefit. On the other hand, it works great on 64 bit processes and almost everybody uses it now.
You don't know where the code is. You don't know where the heap is. You don't know where the stack is. Writing exploits is hard now.
Also, you tried to use 32 bit shellcode and documentation on a 64 bit process.
On reading the updated question: your code is compiled with frame pointers (which is the default). This causes the ret instruction itself to fault because esp is trashed. ASLR appears to still be in play, but most likely it doesn't really matter.

how to make gdb show line numbers with respect to function's head?

This is my gdb output. How can I make it show locations as offsets from the start of the function (for example main+1 instead of ...227), as it does when I disassemble?
It is not clear exactly what you are asking since machine instruction address and source-code line number are not directly related. Possibly suited to your need is to use mixed source/disassembly. For example:
(gdb) disassemble /m main
Dump of assembler code for function main:
5 {
0x08048330 <+0>: push %ebp
0x08048331 <+1>: mov %esp,%ebp
0x08048333 <+3>: sub $0x8,%esp
0x08048336 <+6>: and $0xfffffff0,%esp
0x08048339 <+9>: sub $0x10,%esp
6 printf ("Hello.\n");
0x0804833c <+12>: movl $0x8048440,(%esp)
0x08048343 <+19>: call 0x8048284 <puts@plt>
7 return 0;
8 }
0x08048348 <+24>: mov $0x0,%eax
0x0804834d <+29>: leave
0x0804834e <+30>: ret
End of assembler dump.
This shows each line of source code ahead of the machine code disassembly associated with it. Both the source line numbers and the instruction addresses and offsets are shown. Note that it is likely to be far less comprehensible if you apply optimisation, as code is often eliminated or re-ordered such that it no longer corresponds directly to the source code order.
If rather you want to show the current program counter address/offset as you step, then that can be done with the display /i $pc command:
(gdb) display /i $pc
(gdb) run
Starting program: /home/a.out
Breakpoint 2, main () at main.c:13
13 printf("Hello World");
1: x/i $pc
=> 0x40053a <main+4>: mov $0x4005d4,%edi
(gdb) step
__printf (format=0x4005d4 "Hello World") at printf.c:28
28 printf.c: No such file or directory.
1: x/i $pc
=> 0x7ffff7a686b0 <__printf>: sub $0xd8,%rsp
(gdb)

Gdb jumping some parts of the assembly codes

I'm having difficulty debugging a program at the assembly level because GDB is skipping some parts of the code. The code is:
#include <stdio.h>
#define BUF_SIZE 8
void getInput(){
char buf[BUF_SIZE];
gets(buf);
puts(buf);
}
int main(int argc, char* argv){
printf("Digite alguma coisa, tamanho do buffer eh: %d\n", BUF_SIZE);
getInput();
return 0;
}
The program was compiled with gcc -ggdb -fno-stack-protector -mpreferred-stack-boundary=4 -o exploit1 exploit1.c
In gdb, I added break getInput and when I run disas getInput it returns me:
Dump of assembler code for function getInput:
0x00000000004005cc <+0>: push %rbp
0x00000000004005cd <+1>: mov %rsp,%rbp
0x00000000004005d0 <+4>: sub $0x10,%rsp
0x00000000004005d4 <+8>: lea -0x10(%rbp),%rax
0x00000000004005d8 <+12>: mov %rax,%rdi
0x00000000004005db <+15>: mov $0x0,%eax
0x00000000004005e0 <+20>: callq 0x4004a0 <gets@plt>
0x00000000004005e5 <+25>: lea -0x10(%rbp),%rax
0x00000000004005e9 <+29>: mov %rax,%rdi
0x00000000004005ec <+32>: callq 0x400470 <puts@plt>
0x00000000004005f1 <+37>: nop
0x00000000004005f2 <+38>: leaveq
0x00000000004005f3 <+39>: retq
If I type run, I notice that the program stops at address 0x00000000004005d4 and not at the first instruction of the function, 0x00000000004005cc, as I expected. Why is this happening?
By the way, this is messing me up because I'm noticing that some extra data is being added to the stack, and I want to watch the stack grow step by step.
If I type run, I notice that the program stops at address 0x00000000004005d4 and not at the first instruction of the function, 0x00000000004005cc, as I expected.
Your expectation is incorrect.
Why is this happening?
Because when you set a breakpoint via break getInput, GDB sets the breakpoint after the function prologue. From the documentation:
-function function
The value specifies the name of a function. Operations on function locations
unmodified by other options (such as -label or -line) refer to the line that
begins the body of the function. In C, for example, this is the line with the
open brace.
If you want to set a breakpoint on the first instruction, use break *getInput instead.
Documentation here and here.

How does GDB determine the address to break at when you do "break function-name"?

A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf@plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?
gdb uses a few methods to determine this.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that the function f's prologue is complete at 0xffff0010.
I think this is the mode used by gcc when not optimizing.
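The repeated-line convention described above can be sketched in a few lines of C. This is only an illustration of the rule as stated here, not GDB's actual code; struct line_entry and prologue_end are hypothetical names, and a real line table (as printed by a DWARF dumping tool) carries more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one line-table row of a function. */
struct line_entry {
    unsigned line;     /* source line number */
    uint64_t address;  /* address of the first instruction for that row */
};

/*
 * Returns the address where the prologue is assumed to end: the second
 * row that carries the function's first line number. If that line is
 * never repeated, falls back to the function's first address. Returns
 * 0 for an empty table.
 */
uint64_t prologue_end(const struct line_entry *rows, size_t count)
{
    if (count == 0)
        return 0;
    for (size_t i = 1; i < count; i++) {
        if (rows[i].line == rows[0].line)
            return rows[i].address;
    }
    return rows[0].address;
}
```

Fed the two rows from the example (line 23 at 0xffff0000 and line 23 at 0xffff0010), prologue_end returns 0xffff0010. Subtracting the function's start address from that result gives the +N offset that GDB displays.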
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25, the skip algorithm seems to be at symtab.c:skip_prologue_sal, which breakpoints.c:break_command, the command definition, calls indirectly.
The prologue is a common "boilerplate" used at the start of function calls.
The prologue of foo2 is longer than that of foo1 by two instructions because:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations; in particular, it must decrement rsp before the call to reserve room for its local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang loads the address of local string constants (stored in .rodata) into registers as part of the function prologue.
We know that rax contains the address of "greetings from foo! i = %i" because it is then passed in %rdi, the first argument of printf.
foo1, however, has no local string constants.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.
On ELF platforms like Linux, debug information is stored in separate (non-executable) sections of the executable. These sections contain all the information the debugger needs. Check the DWARF2 specification for the specifics.

Resources