I have the C code:
long fib(long n) {
if (n < 2) return 1;
return fib(n-1) + fib(n-2);
}
int main(int argc, char** argv) {
return 0;
}
which I compiled by running gcc -O0 -fno-optimize-sibling-calls -S file.c yielding assembly code that has not been optimized:
.file "long.c"
.text
.globl fib
.type fib, @function
fib:
.LFB5:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
.cfi_offset 3, -24
movq %rdi, -24(%rbp)
cmpq $1, -24(%rbp)
jg .L2
movl $1, %eax
jmp .L3
.L2:
movq -24(%rbp), %rax
subq $1, %rax
movq %rax, %rdi
call fib
movq %rax, %rbx
movq -24(%rbp), %rax
subq $2, %rax
movq %rax, %rdi
call fib
addq %rbx, %rax
.L3:
addq $24, %rsp
popq %rbx
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE5:
.size fib, .-fib
.globl main
.type main, @function
main:
.LFB6:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE6:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
My question is:
Why do we decrement the stack pointer by 24, subq $24, %rsp? As I see it, we store one element only, first argument n in %rdi, on the stack after the initial two pushes. So why don't we just decrement the stack pointer by 8 and then move n to -8(%rbp)? So
subq $8, %rsp
movq %rdi, -8(%rbp)
GCC does not fully optimize with -O0, not even its stack use. (This may aid in debugging by making some of its use of the stack more transparent to humans. For example, objects a, b, and c may share a single stack location with -O3 if their active lifetimes (defined by uses in the program, not by the model of lifetime in the C standard) do not overlap, but may have separately reserved places on the stack with -O0, which makes it easier for a human to see where a, b, and c are used in the assembly code. The wasted 16 bytes may be a side effect of this, as those spaces may be reserved for some purpose that this small function did not happen to use, such as space to save certain registers if needed.)
Changing optimization to -O3 results in GCC subtracting only eight from the stack pointer.
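One thing the extra 16 bytes are not needed for is alignment. As a rough check, assuming the x86-64 System V rule that %rsp must be a multiple of 16 at every call instruction (so it is 8 past a multiple of 16 on function entry, once the return address has been pushed), the sketch below counts the bytes fib puts on the stack before it calls itself. The helper name rsp_aligned_at_call is made up for this illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: adds up the bytes placed below the caller's
 * 16-byte aligned %rsp and reports whether %rsp is still 16-byte
 * aligned afterwards. Assumes the x86-64 System V ABI. */
bool rsp_aligned_at_call(int pushed_registers, int subtracted_bytes)
{
    int return_address = 8;                      /* pushed by the call instruction */
    int saved_registers = 8 * pushed_registers;  /* each pushq moves %rsp by 8 */
    int total = return_address + saved_registers + subtracted_bytes;
    return total % 16 == 0;
}
```

rsp_aligned_at_call(2, 24) describes the -O0 prologue above (pushq %rbp, pushq %rbx, subq $24) and rsp_aligned_at_call(2, 8) describes the layout proposed in the question. Both return true, while rsp_aligned_at_call(2, 16) returns false. Subtracting 8 would therefore have kept the stack aligned as well, which is consistent with the extra space being unoptimized slack.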
I am trying to change the return address of a function in C in order to skip one instruction. I am doing this on a virtual machine running Ubuntu Server (because on Mac gcc doesn't let me turn off stack protection).
I am compiling my code with gcc:
gcc -g -fno-stack-protector -z execstack -o bufover bufover.c
This is the code:
void foo(int a, int b, int c) {
char buff[256];
long *ret, *ret2;
ret = buff + 256 + 8;
(*ret) += 5;
}
int main() {
char x;
x = '0';
foo(1,2,3);
x = '1';
printf("%c\n",x);
}
To the address of buff I added 256 (the size of buff) and 8 (the size of the saved %rbp). The return address should sit on the stack right after that.
Next I added 5 to that return address, because I checked with gdb that the next instruction is 5 bytes further on.
But it doesn't work...
I am analyzing the variables (addresses) step by step using gdb but I am not seeing any mistake.
Any idea?
Edit: Assembly code:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14
.globl _foo ## -- Begin function foo
.p2align 4, 0x90
_foo: ## @foo
Lfunc_begin0:
.file 1 "me.c"
.loc 1 3 0 ## me.c:3:0
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $160, %rsp
leaq -272(%rbp), %rax
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl %edx, -12(%rbp)
Ltmp0:
##DEBUG_VALUE: foo:buff <- [%rax+0]
.loc 1 7 19 prologue_end ## me.c:7:19
addq $256, %rax ## imm = 0x100
Ltmp1:
.loc 1 7 25 is_stmt 0 ## me.c:7:25
addq $8, %rax
.loc 1 7 12 ## me.c:7:12
movq %rax, -280(%rbp)
.loc 1 8 10 is_stmt 1 ## me.c:8:10
movq -280(%rbp), %rax
.loc 1 8 15 is_stmt 0 ## me.c:8:15
movq (%rax), %rcx
addq $5, %rcx
movq %rcx, (%rax)
.loc 1 9 5 is_stmt 1 ## me.c:9:5
addq $160, %rsp
popq %rbp
retq
Ltmp2:
Lfunc_end0:
.cfi_endproc
## -- End function
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
Lfunc_begin1:
.loc 1 11 0 ## me.c:11:0
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $1, %edi
movl $2, %esi
movl $3, %edx
Ltmp3:
.loc 1 13 9 prologue_end ## me.c:13:9
movb $48, -1(%rbp)
.loc 1 14 7 ## me.c:14:7
callq _foo
leaq L_.str(%rip), %rdi
.loc 1 15 9 ## me.c:15:9
movb $49, -1(%rbp)
.loc 1 16 21 ## me.c:16:21
movsbl -1(%rbp), %esi
.loc 1 16 7 is_stmt 0 ## me.c:16:7
movb $0, %al
callq _printf
xorl %edx, %edx
.loc 1 17 5 is_stmt 1 ## me.c:17:5
movl %eax, -8(%rbp) ## 4-byte Spill
movl %edx, %eax
addq $16, %rsp
popq %rbp
retq
Ltmp4:
Lfunc_end1:
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%c\n"
.section __DWARF,__debug_str,regular,debug
You should use GCC's return-address-related builtins, such as __builtin_frame_address or __builtin_return_address, and you should carefully study the x86-64 ABI specification to understand the relevant x86 calling conventions in detail.
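As a minimal sketch of those two builtins (the function names below are invented for the example, and it assumes GCC or a compatible compiler such as Clang), each helper reports a value for its own frame at level 0:

```c
#include <stdint.h>

/* noinline keeps each helper in its own frame, so level 0 refers to
 * the helper itself and not to a function it was inlined into. */
__attribute__((noinline)) uintptr_t caller_resume_address(void)
{
    /* Address at which execution continues once this function returns. */
    return (uintptr_t)__builtin_return_address(0);
}

__attribute__((noinline)) uintptr_t current_frame_address(void)
{
    /* Frame address of this function, as the compiler defines it
     * for the target. */
    return (uintptr_t)__builtin_frame_address(0);
}
```

Calling caller_resume_address() from main yields an address inside main, just after the call. These builtins only read the values; what a frame contains around them is still up to the compiler and the ABI.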
Also try to understand them by writing some C code in foo.c, compiling it with gcc -O -fverbose-asm -S foo.c, and then looking into the generated foo.s.
Finally, the stack segment is usually not executable (this could matter for some trampoline techniques). Read about the NX bit. On Linux, learn to use mprotect(2), mmap(2), backtrace(3).
There is no guarantee that GCC even uses any call stack. It could optimize to avoid using it (tail-call optimization could happen sometimes), and your code might not even need additional call frames. So of course you cannot achieve your goal in standard C, or without additional assumptions about your particular GCC compiler (and GCC 8 and GCC 9 could optimize differently).
Of course, changing the return address is undefined behavior.
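The arithmetic in the question can be checked against the assembly listing that was posted (the macOS one, so the Ubuntu build may well differ). In that listing, leaq -272(%rbp), %rax is the address of buff, the saved %rbp sits at 0(%rbp), and the return address sits at 8(%rbp). A small sketch of that bookkeeping, with constant and function names invented here:

```c
/* Offsets relative to %rbp, read off the posted listing only.
 * Assumption: another compiler or flag set will give other numbers. */
enum {
    BUFF_OFFSET        = -272, /* leaq -272(%rbp), %rax */
    SAVED_RBP_OFFSET   = 0,    /* pushq %rbp; movq %rsp, %rbp */
    RETURN_ADDR_OFFSET = 8     /* pushed by callq before the prologue runs */
};

/* Distance in bytes from buff[0] to the slot at the given %rbp offset. */
long bytes_from_buff(long rbp_offset)
{
    return rbp_offset - BUFF_OFFSET;
}

/* The %rbp-relative slot that buff + 256 + 8 points at. */
long slot_hit_by_question(void)
{
    return BUFF_OFFSET + 256 + 8;
}
```

Here bytes_from_buff(RETURN_ADDR_OFFSET) is 280, while slot_hit_by_question() is -8. In this particular listing, buff + 256 + 8 therefore points into the function's own locals rather than at the return address, because the compiler placed other data between buff and the saved %rbp. That layout is exactly the kind of compiler-dependent detail that cannot be relied on.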
I am a huge fan of network protocols and libnet, which is why I've been trying to imitate some network protocols that are not included in libnet. Capturing packets, imitating headers, etc. works so far. Now I need a way to actually write these exact packets to my network card. I've tried libnet_adv_write_rawipv4() and -link(); neither works. I can't cull the headers with libnet_adv_cull_header() because of the stupid errors and bugs. So I figured that the problem could be solved with a little assembly: get the assembly code for the actual libnet_build() and libnet_write() calls, alter some bytes, and voila: raw bytes get written to the network card. So I have written a dummy program:
#include <stdio.h>
#include <stdlib.h>
#include <libnet.h>
int main() {
libnet_t *l;
l = libnet_init(LIBNET_RAW4, 0, NULL);
libnet_build_tcp(2000, 450, 0, 1234, TH_SYN, 254, 0, NULL, LIBNET_TCP_H + 5,
"aaaaa", 5, l, 0);
libnet_build_ipv4(LIBNET_TCP_H + LIBNET_IPV4_H + 5, 0, 1, 0, 64, 6, 0,
2186848448, 22587584, NULL, 0, l, 0);
libnet_write(l);
return 0;
}
Works so far. Now I got the assembly version of the program using
gcc -o program program.c -S
And this is where the actual problem starts:
.LC0:
.string "aaaaa"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $0, %edx
movl $0, %esi
movl $1, %edi
call libnet_init
movq %rax, -8(%rbp)
subq $8, %rsp
pushq $0
pushq -8(%rbp)
pushq $5
pushq $.LC0
pushq $25
pushq $0
pushq $0
movl $254, %r9d
movl $2, %r8d
movl $1234, %ecx
movl $0, %edx
movl $450, %esi
movl $2000, %edi
call libnet_build_tcp
addq $64, %rsp
subq $8, %rsp
pushq $0
pushq -8(%rbp)
pushq $0
pushq $0
pushq $22587584
pushq $-2108118848
pushq $0
movl $6, %r9d
movl $64, %r8d
movl $0, %ecx
movl $1, %edx
movl $0, %esi
movl $45, %edi
call libnet_build_ipv4
addq $64, %rsp
movq -8(%rbp), %rax
movq %rax, %rdi
call libnet_write
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
See this?
call libnet_build_ipv4
I can't copy the assembly code of these build() or write() calls, because all that is there is a reference to them. Where would I find the assembly code for these pre-written functions declared in libnet-functions.h (libnet_build_ipv4(), libnet_build_tcp(), libnet_write())?
GDB is your friend in situations like this. You don't say what platform you're on; the following example works on Ubuntu, but it should work similarly on other distributions.
First, make sure that you have the debug symbols for libnet installed:
sudo apt install libnet1-dbg
Find out where libnet is installed:
~$ dpkg -L libnet1 | grep \.so
/usr/lib/x86_64-linux-gnu/libnet.so.1.7.0
/usr/lib/x86_64-linux-gnu/libnet.so.1
Open it (or your own application) with GDB:
~$ gdb /usr/lib/x86_64-linux-gnu/libnet.so.1.7.0
Reading symbols from /usr/lib/x86_64-linux-gnu/libnet.so.1.7.0...Reading symbols from /usr/lib/debug//usr/lib/x86_64-linux-gnu/libnet.so.1.7.0...done.
done.
Use the disassemble command to inspect anything you like:
(gdb) disassemble libnet_build_ipv4
Dump of assembler code for function libnet_build_ipv4:
0x0000000000007d60 <+0>: push %r15
0x0000000000007d62 <+2>: push %r14
0x0000000000007d64 <+4>: push %r13
0x0000000000007d66 <+6>: push %r12
0x0000000000007d68 <+8>: push %rbp
0x0000000000007d69 <+9>: push %rbx
0x0000000000007d6a <+10>: sub $0x48,%rsp
0x0000000000007d6e <+14>: mov 0xa8(%rsp),%rbx
0x0000000000007d76 <+22>: mov %edx,0x8(%rsp)
0x0000000000007d7a <+26>: mov %fs:0x28,%rax
0x0000000000007d83 <+35>: mov %rax,0x38(%rsp)
0x0000000000007d88 <+40>: xor %eax,%eax
0x0000000000007d8a <+42>: mov %ecx,0x14(%rsp)
0x0000000000007d8e <+46>: mov 0x80(%rsp),%r14d
0x0000000000007d96 <+54>: test %rbx,%rbx
0x0000000000007d99 <+57>: mov 0x98(%rsp),%r15
0x0000000000007da1 <+65>: je 0x810a <libnet_build_ipv4+938>
0x0000000000007da7 <+71>: mov %esi,%r13d
0x0000000000007daa <+74>: mov 0xb0(%rsp),%esi
0x0000000000007db1 <+81>: mov %edi,%ebp
0x0000000000007db3 <+83>: mov $0xd,%ecx
0x0000000000007db8 <+88>: mov $0x14,%edx
0x0000000000007dbd <+93>: mov %rbx,%rdi
0x0000000000007dc0 <+96>: mov %r9d,0x1c(%rsp)
0x0000000000007dc5 <+101>: mov %r8d,0x18(%rsp)
0x0000000000007dca <+106>: callq 0xea10 <libnet_pblock_probe>
0x0000000000007dcf <+111>: test %rax,%rax
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb)
I'm writing a program in assembly that has 2 arrays declared at the beginning and 3 functions, which are:
printQArray(int size, long *array1)
invertArray(int size, long *array1)
multQuad(int size, long *array1, long *array2)
Now the program takes these arrays and prints the product of the 2 arrays at each corresponding position.
Then it prints Array1.
Then it prints Array1 Reversed.
Then it should take the reversed array, call the multiplication function again, and print the products of the reversed 1st array and the 2nd array, which never changes. (The array values are in the source code.)
I'm having problems after I reverse the array and attempt to multiply the reversed 1st array by the 2nd array.
The following is the output of my program
Products
200
-925
1386
-2928
9375
64350
Elements in QArray1
10
25
33
48
125
550
Elements in QArray1
550
125
48
33
25
10
Products
0
-1036
-31584
44896
0
0
So this last output is clearly not the products of the reversed array1 and array2.
As you can see in my code below (PS: I have already tried movq in place of leaq), my reversed array is returned in %rax and I put it into %rcx.
This is all fine and dandy, because I successfully print out the reversed array below:
#PRINT Inverted ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
leaq (%rcx), %rsi #put reversed array into rsi
call printQArray
movq $0, %rax
However, once I call multQuads again I get weird results. I'm confident my reversed array isn't getting moved into the register properly. The original array was a constant and thus simple, but I think pushing all the values onto the stack and popping them back off in reverse order has changed the structure somehow. Or maybe I have a typo. Source code below:
.section .rodata
.LC1: .string "Products\n"
.LC3: .string "Elements in QArray1\n"
.LC4: .string "%i\n"
.LC5: .string "\n"
.data
sizeQArrays:
.quad 6
QArray1:
.quad 10
.quad 25
.quad 33
.quad 48
.quad 125
.quad 550
QArray2:
.quad 20
.quad -37
.quad 42
.quad -61
.quad 75
.quad 117
.globl main
.type main, @function
.globl printQArray
.type printQArray, @function
.globl multQuads
.type multQuads, @function
.globl invertArray
.type invertArray, @function
.text
main:
pushq %rbp #stack housekeeping
movq %rsp, %rbp
#order of calls: quad print invert print quad
#MULTQUADS void multQuads(int size, long *array1, long *array2)
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq $QArray1, %rsi #2nd Param
movq $QArray2, %rdx #3rd Param
call multQuads
movq $0, %rax
#PRINT ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
movq $QArray1, %rsi #address of QArray1 to %rsi (parameter 2)
#purposely not pushing anything because I have not put anything in registers
#except parameters and I will be putting new values there after return
call printQArray
movq $0, %rax
#InvertArray void invertArray(long size, long *array1)
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq $QArray1, %rsi #2nd Param
call invertArray
leaq (%rax), %rcx #put inverted array into %rcx
movq $0, %rax #set %rax back to 0
#PRINT Inverted ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
movq %rcx, %rsi #put reversed array into rsi
call printQArray
movq $0, %rax
#MULTQUADS W/ REVERSED ARRAY void multQuads(int size, long *array1, long *array2);
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq %rcx, %rsi #inversed array as 2nd param
movq $QArray2, %rdx #3rd Param
call multQuads
movq $0, %rax
#END of main
leave
ret
.size main, .-main
#printQArray prints an array of 8 byte values
# the size of the array is passed in %rdi,
# a pointer to the beginning of the array is passed in %rsi
printQArray:
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %r13
pushq %rbx
movq %rdi, %r12 #copy size to %r12
movq %rsi, %r13 #copy array pointer to %r13
# print array title
movq $.LC3, %rdi
movq $0, %rax
# purposely not pushing any caller save registers.
callq printf
movq $0, %rbx #array index
printQArrayLoop:
movq (%r13, %rbx, 8), %rsi #element of array in 2nd parameter register
movq $.LC4, %rdi #format literal in 1st parameter register
movq $0, %rax
#purposely not pushing any caller save registers
callq printf
incq %rbx #increment index
decq %r12 #decrement count
jle printQArrayExit
jmp printQArrayLoop
printQArrayExit:
# print final \n
movq $.LC5, %rdi #parameter 1
movq $0, %rax
call printf
popq %rbx
popq %r13
popq %r12
leave
ret
.size printQArray, .-printQArray
multQuads:
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %r13
pushq %r14
pushq %rbx
movq %rdi, %r12 #copy size to %r12
movq %rsi, %r13 #copy array1 pointer to %r13
movq %rdx, %r14 #copy array2 pointer to %r14
# print "Products"
movq $.LC1, %rdi
movq $0, %rax
call printf
movq $0, %rbx #array index
multQuadLoop:
movq (%r13, %rbx, 8), %rsi #element of array in 2nd parameter register
movq (%r14, %rbx, 8), %rdx #element of array in 3rd parameter register
movq $.LC4, %rdi #format literal in 1st parameter register
imulq %rdx, %rsi #insert product into second parameter
movq $0, %rax
callq printf
incq %rbx #increment index
decq %r12 #decrement count
jle multQuadExit
jmp multQuadLoop
multQuadExit:
# print final \n
movq $.LC5, %rdi #parameter 1
movq $0, %rax
call printf
popq %rbx
popq %r13
popq %r12
popq %r14
leave
ret
.size multQuad, .-multQuad
invertArray:
pushq %rbp
movq %rsp, %rbp
pushq %r12 #size
pushq %r13 #array pointer
pushq %rbx #array index
pushq %r9 #holder
pushq %r10 #holder
push %r14
movq %rdi, %r12 #copy size to %r12
movq %rdi, %r9
movq %rsi, %r13 #copy array pointer to %r13
movq $0, %rbx #array index
movq $0, %r10
invertArrayLoop:
pushq (%r13, %rbx, 8) #push elements of array onto stack
incq %rbx #increment index
decq %r12 #decrement count
jle reverseArray
jmp invertArrayLoop
reverseArray:
popq %r14
movq %r14, (%r13, %r10, 8)
incq %r10
decq %r9
subq %r12, %r9
jle invertArrayExit
jmp reverseArray
invertArrayExit:
movq %r13, %rax
popq %r14
popq %r10
popq %r9
popq %rbx
popq %r13
popq %r12
leave
ret
.size invertArray, .-invertArray
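If the multQuads function works the 1st time and I can print out the reversed array properly, then I imagine the problem must be right before I'm calling multQuads and setting the registers.
I was losing the array in printQArray: main keeps the reversed array's pointer in %rcx, which is a caller-saved register, so the printf calls inside printQArray are free to overwrite it.
It was just one line!!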
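For checking the assembly's output, a C reference version can help. The following is only a sketch: invertArray keeps the name and parameters from the question, while multQuadsInto is a hypothetical variant of multQuads that stores the products in an output array instead of printing them.

```c
/* Reverse array1 in place; same net effect as the push/pop loops. */
void invertArray(long size, long *array1)
{
    for (long i = 0, j = size - 1; i < j; i++, j--) {
        long tmp = array1[i];
        array1[i] = array1[j];
        array1[j] = tmp;
    }
}

/* Element-wise products of array1 and array2, written to out. */
void multQuadsInto(long size, const long *array1, const long *array2, long *out)
{
    for (long i = 0; i < size; i++)
        out[i] = array1[i] * array2[i];
}
```

With QArray1 and QArray2 from the question, reversing QArray1 and then multiplying gives 11000, -4625, 2016, -2013, 1875, 1170, which is what the second "Products" block should print.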
The following are parts of three different .s files. The .c file was compiled with three different sets of options:
-fno-inline -fstack-protector-strong,
-fno-inline -fsanitize=address,
-fno-inline -fno-stack-protector -zexecstack.
The contents of the .s files are as follows:
handle_read:
.LFB20:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
movq %rsi, %r12
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
movq 8(%rdi), %rbx
movq %rdi, %rbp
movq 160(%rbx), %rsi
movq 152(%rbx), %rdx
cmpq %rdx, %rsi
jb .L394
cmpq $5000, %rdx
jbe .L421
handle_read:
.LASANPC20:
.LFB20:
.cfi_startproc
pushq %r15
.cfi_def_cfa_offset 16
.cfi_offset 15, -16
pushq %r14
.cfi_def_cfa_offset 24
.cfi_offset 14, -24
pushq %r13
.cfi_def_cfa_offset 32
.cfi_offset 13, -32
pushq %r12
.cfi_def_cfa_offset 40
.cfi_offset 12, -40
pushq %rbp
.cfi_def_cfa_offset 48
.cfi_offset 6, -48
movq %rdi, %rbp
addq $8, %rdi
pushq %rbx
.cfi_def_cfa_offset 56
.cfi_offset 3, -56
movq %rdi, %rax
shrq $3, %rax
subq $24, %rsp
.cfi_def_cfa_offset 80
cmpb $0, 2147450880(%rax)
jne .L1170
movq 8(%rbp), %rbx
leaq 160(%rbx), %r13
movq %r13, %r15
shrq $3, %r15
cmpb $0, 2147450880(%r15)
jne .L1171
leaq 152(%rbx), %r14
movq %rsi, %r12
movq 160(%rbx), %rsi
movq %r14, %rax
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1172
movq 152(%rbx), %rdx
leaq 144(%rbx), %rcx
cmpq %rdx, %rsi
jb .L1054
cmpq $5000, %rdx
jbe .L1055
movl $httpd_err400form, %eax
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1173
movl $httpd_err400title, %eax
movq httpd_err400form(%rip), %r8
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1174
handle_read:
.LFB20:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
movq %rsi, %r12
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
movq 8(%rdi), %rbx
movq %rdi, %rbp
movq 160(%rbx), %rsi
movq 152(%rbx), %rdx
cmpq %rdx, %rsi
jb .L384
cmpq $5000, %rdx
jbe .L411
Can anyone tell me how this code prevents buffer overruns?
Your handle_read function doesn't end up allocating anything on the stack so there's nothing for -fstack-protector-strong to protect, and so this option makes no difference. The -zexecstack option sets a flag in the generated executable, telling the operating system that it should allow code stored in the stack to be executed. It has no effect on the generated assembly.
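For contrast, the kind of function that -fstack-protector-strong does instrument is one with a local array on its stack. The sketch below is hypothetical (copy_and_measure is not part of the posted program). On x86-64 Linux, compiling it with gcc -O2 -fstack-protector-strong -S should show a canary being loaded from %fs:40 in the prologue and a call to __stack_chk_fail on the failure path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: the local array gives the stack protector
 * something to guard, unlike the posted handle_read prologue. */
size_t copy_and_measure(const char *text)
{
    char local[32];                             /* lives in this function's frame */
    snprintf(local, sizeof local, "%s", text);  /* bounded copy, always NUL-terminated */
    return strlen(local);
}
```

The copy here is bounded, so the canary is never actually tripped. The point is only that the compiler has a stack buffer to protect and therefore emits the check.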
Only the -fsanitize=address option has an effect that shows up in the generated assembly output you've posted. It's responsible for the shrq $3, %rXX; cmpb $0, 2147450880(%rXX); jne .LXXXX sequences that appear in your second block of generated assembly. These instructions look up each memory address the function accesses in a "shadow memory" table. The table records which locations have been allocated and which haven't. If the inserted code detects that the program is attempting to access a memory location that hasn't been allocated, it causes the program to exit with an error message.
For more details on how the shadow memory table works, and how AddressSanitizer works in general, you can read the author's Usenix paper AddressSanitizer: A Fast Address Sanity Checker.
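The address computation in those instruction sequences can be written out in C. This is a sketch based only on the constants visible in the posted listing (a shift by 3 and the displacement 2147450880, which is 0x7fff8000). The function names are invented, and the real shadow memory is managed by the AddressSanitizer runtime, so the toy check below takes the shadow byte as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constant taken from the posted listing: 2147450880 == 0x7fff8000. */
#define SHADOW_OFFSET UINT64_C(0x7fff8000)

/* Address of the shadow byte describing the 8-byte granule that
 * contains addr: the shrq $3 plus the displacement used by cmpb. */
uint64_t shadow_byte_address(uint64_t addr)
{
    return (addr >> 3) + SHADOW_OFFSET;
}

/* Toy stand-in for the cmpb $0 / jne pair in the listing: a nonzero
 * shadow byte sends execution to the error-reporting path. */
bool access_is_reported(uint8_t shadow_byte)
{
    return shadow_byte != 0;
}
```

Every address within the same 8-byte granule maps to the same shadow byte, so shadow_byte_address(8) and shadow_byte_address(15) are equal.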