The following are parts of three different .s files. The same .c file was compiled with three different sets of options:
-fno-inline -fstack-protector-strong,
-fno-inline -fsanitize=address,
-fno-inline -fno-stack-protector -zexecstack.
The contents of the .s files follow:
handle_read:
.LFB20:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
movq %rsi, %r12
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
movq 8(%rdi), %rbx
movq %rdi, %rbp
movq 160(%rbx), %rsi
movq 152(%rbx), %rdx
cmpq %rdx, %rsi
jb .L394
cmpq $5000, %rdx
jbe .L421
handle_read:
.LASANPC20:
.LFB20:
.cfi_startproc
pushq %r15
.cfi_def_cfa_offset 16
.cfi_offset 15, -16
pushq %r14
.cfi_def_cfa_offset 24
.cfi_offset 14, -24
pushq %r13
.cfi_def_cfa_offset 32
.cfi_offset 13, -32
pushq %r12
.cfi_def_cfa_offset 40
.cfi_offset 12, -40
pushq %rbp
.cfi_def_cfa_offset 48
.cfi_offset 6, -48
movq %rdi, %rbp
addq $8, %rdi
pushq %rbx
.cfi_def_cfa_offset 56
.cfi_offset 3, -56
movq %rdi, %rax
shrq $3, %rax
subq $24, %rsp
.cfi_def_cfa_offset 80
cmpb $0, 2147450880(%rax)
jne .L1170
movq 8(%rbp), %rbx
leaq 160(%rbx), %r13
movq %r13, %r15
shrq $3, %r15
cmpb $0, 2147450880(%r15)
jne .L1171
leaq 152(%rbx), %r14
movq %rsi, %r12
movq 160(%rbx), %rsi
movq %r14, %rax
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1172
movq 152(%rbx), %rdx
leaq 144(%rbx), %rcx
cmpq %rdx, %rsi
jb .L1054
cmpq $5000, %rdx
jbe .L1055
movl $httpd_err400form, %eax
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1173
movl $httpd_err400title, %eax
movq httpd_err400form(%rip), %r8
shrq $3, %rax
cmpb $0, 2147450880(%rax)
jne .L1174
handle_read:
.LFB20:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
movq %rsi, %r12
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
movq 8(%rdi), %rbx
movq %rdi, %rbp
movq 160(%rbx), %rsi
movq 152(%rbx), %rdx
cmpq %rdx, %rsi
jb .L384
cmpq $5000, %rdx
jbe .L411
Can anyone tell me how this code prevents buffer overruns?
Your handle_read function doesn't end up allocating anything on the stack so there's nothing for -fstack-protector-strong to protect, and so this option makes no difference. The -zexecstack option sets a flag in the generated executable, telling the operating system that it should allow code stored in the stack to be executed. It has no effect on the generated assembly.
Only the -fsanitize=address option has an effect that shows up in the assembly output you've posted. It's responsible for the shrq $3, %rXX; cmpb $0, 2147450880(%rXX); jne .LXXXX sequences that appear in your second block of generated assembly. These instructions look up each memory address that the function accesses in a "shadow memory" table. The table records which locations have been allocated and which haven't. If the inserted code detects that the program is attempting to access a memory location that hasn't been allocated, it causes the program to exit with an error message.
For more details on how the shadow memory table works, and how AddressSanitizer works in general, you can read the authors' USENIX paper AddressSanitizer: A Fast Address Sanity Checker.
I have the C code:
long fib(long n) {
if (n < 2) return 1;
return fib(n-1) + fib(n-2);
}
int main(int argc, char** argv) {
return 0;
}
which I compiled by running gcc -O0 -fno-optimize-sibling-calls -S file.c yielding assembly code that has not been optimized:
.file "long.c"
.text
.globl fib
.type fib, @function
fib:
.LFB5:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
.cfi_offset 3, -24
movq %rdi, -24(%rbp)
cmpq $1, -24(%rbp)
jg .L2
movl $1, %eax
jmp .L3
.L2:
movq -24(%rbp), %rax
subq $1, %rax
movq %rax, %rdi
call fib
movq %rax, %rbx
movq -24(%rbp), %rax
subq $2, %rax
movq %rax, %rdi
call fib
addq %rbx, %rax
.L3:
addq $24, %rsp
popq %rbx
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE5:
.size fib, .-fib
.globl main
.type main, @function
main:
.LFB6:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE6:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
My question is:
Why do we decrement the stack pointer by 24 (subq $24, %rsp)? As I see it, we store only one element on the stack after the initial two pushes: the first argument n, passed in %rdi. So why don't we just decrement the stack pointer by 8 and then move n to -8(%rbp)? That is:
subq $8, %rsp
movq %rdi, -8(%rbp)
GCC does not fully optimize with -O0, not even its stack use. (This may aid in debugging by making some of its use of the stack more transparent to humans. For example, objects a, b, and c may share a single stack location with -O3 if their active lifetimes (defined by uses in the program, not by the model of lifetime in the C standard) do not overlap, but may have separately reserved places in the stack with -O0, which makes it easier for a human to see where a, b, and c are used in the assembly code. The wasted 16 bytes may be a side effect of this, as that space may be reserved for some purpose that this small function did not happen to use, such as saving certain registers if needed.)
Changing optimization to -O3 results in GCC subtracting only eight from the stack pointer.
I know that OS X requires 16-byte stack alignment, but I don't really understand why it is causing an error here.
All I am doing here is passing an object size (which is 24) in %rdi and calling malloc. Does this error mean I have to ask for 32 bytes?
And the error message is:
libdyld.dylib`stack_not_16_byte_aligned_error:
-> 0x7fffc12da2fa <+0>: movdqa %xmm0, (%rsp)
0x7fffc12da2ff <+5>: int3
libdyld.dylib`_dyld_func_lookup:
0x7fffc12da300 <+0>: pushq %rbp
0x7fffc12da301 <+1>: movq %rsp, %rbp
Here is the code:
Object_copy:
pushq %rbp
movq %rbp, %rsp
subq $8, %rsp
movq %rdi, 8(%rsp) # save self address
movq obj_size(%rdi), %rax # get object size
imul $8, %rax
movq %rax, %rdi
callq _malloc <------------------- error in this call
# rsi old object address
# rax new object address
# rdi object size, multiple of 8
# rcx temp reg
# copy object tag
movq 0(%rsi), %rcx
movq %rcx, 0(%rax)
# set rdx to counter, starting from 8
movq $8, %rdx
# add 8 to object size, since we are starting from 8
addq $8, %rdi
start_loop:
cmpq %rdx, %rdi
jle end_loop
movq (%rdx, %rsi, 1), %rcx
movq %rcx, (%rdx, %rax, 1)
addq $8, %rdx
jmp start_loop
end_loop:
leave
ret
Main_protoObj:
.quad 5 ; object tag
.quad 3 ; object size
.quad Main_dispatch_table ; dispatch table
_main:
leaq Main_protoObj(%rip), %rdi
callq Object_copy # copy main proto object
subq $8, %rsp # save the main object on the stack
movq %rax, 8(%rsp)
movq %rax, %rdi # set rdi point to SELF
callq Main_init
callq Main_main
addq $8, %rsp # restore stack
leaq _term_msg(%rip), %rax
callq _print_string
As you said, macOS requires 16-byte stack alignment: the stack pointer %rsp must be a multiple of 16 at the point of every call instruction.
When the stack is misaligned, code that uses aligned SSE instructions on the stack, such as the movdqa %xmm0, (%rsp) in your error message, faults.
Before you call a routine in your code, you need to make sure that the stack is aligned correctly; in this case, that means the stack pointer register is divisible by 16.
subq $8, %rsp # stack is misaligned by 8 bytes
movq %rdi, 8(%rsp) #
movq obj_size(%rdi), %rax #
imul $8, %rax #
movq %rax, %rdi #
callq _malloc # stack is still misaligned when this is called
To fix this, you can subtract 16 from %rsp instead of 8.
subq $16, %rsp # stack is still aligned
movq %rdi, 8(%rsp) # save self inside the 16 reserved bytes (16(%rsp) is the saved %rbp)
... #
callq _malloc # stack is still aligned when this is called, good
I'm writing a program in assembly that has 2 arrays declared at the beginning and 3 functions, which are:
printQArray(int size, long *array1)
invertArray(int size, long *array1)
multQuad(int size, long *array1, long *array2)
The program takes these arrays and prints the product of the 2 arrays at each corresponding position.
Then it prints Array1.
Then it prints Array1 reversed.
Then it should take the reversed array, call the multiplication function again, and print the products of the reversed 1st array and the 2nd array, which never changes. (Array values are in the source code.)
I'm having problems after I reverse the array and attempt to multiply the reversed 1st array and the 2nd array.
The following is the output of my program
Products
200
-925
1386
-2928
9375
64350
Elements in QArray1
10
25
33
48
125
550
Elements in QArray1
550
125
48
33
25
10
Products
0
-1036
-31584
44896
0
0
So this last output is clearly not the products of array1 reversed and array2.
As you can see in my code below (PS: I have already tried movq in place of leaq), my reversed array is returned in %rax and I put it into %rcx.
This is all fine and dandy because I successfully print out a reversed array below.
#PRINT Inverted ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
leaq (%rcx), %rsi #put reversed array into rsi
call printQArray
movq $0, %rax
However, once I call multQuads again I get weird results. I'm confident my reversed array isn't getting moved into the register properly. The original array was a constant and thus simple, but I think pushing all the values onto the stack and popping them back off in reverse order has changed the structure somehow. Or maybe I have a typo. Source code below:
.section .rodata
.LC1: .string "Products\n"
.LC3: .string "Elements in QArray1\n"
.LC4: .string "%i\n"
.LC5: .string "\n"
.data
sizeQArrays:
.quad 6
QArray1:
.quad 10
.quad 25
.quad 33
.quad 48
.quad 125
.quad 550
QArray2:
.quad 20
.quad -37
.quad 42
.quad -61
.quad 75
.quad 117
.globl main
.type main, @function
.globl printQArray
.type printQArray, @function
.globl multQuads
.type multQuads, @function
.globl invertArray
.type invertArray, @function
.text
main:
pushq %rbp #stack housekeeping
movq %rsp, %rbp
#order of calls: quad print invert print quad
#MULTQUADS void multQuads(int size, long *array1, long *array2)
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq $QArray1, %rsi #2nd Param
movq $QArray2, %rdx #3rd Param
call multQuads
movq $0, %rax
#PRINT ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
movq $QArray1, %rsi #address of QArray1 to %rsi (parameter 2)
#purposely not pushing anything because I have not put anything in registers
#except parameters and I will be putting new values there after return
call printQArray
movq $0, %rax
#InvertArray void invertArray(long size, long *array1)
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq $QArray1, %rsi #2nd Param
call invertArray
leaq (%rax), %rcx #put inverted array into %rcx
movq $0, %rax #set %rax back to 0
#PRINT Inverted ARRAY1 void printArray(int size, long *array1);
movq $sizeQArrays, %rax
movq (%rax), %rdi #sizeQArrays to %rdi (parameter 1)
movq %rcx, %rsi #put reversed array into rsi
call printQArray
movq $0, %rax
#MULTQUADS W/ REVERSED ARRAY void multQuads(int size, long *array1, long *array2);
movq $sizeQArrays, %rax
movq (%rax), %rdi #1st param
movq %rcx, %rsi #inversed array as 2nd param
movq $QArray2, %rdx #3rd Param
call multQuads
movq $0, %rax
#END of main
leave
ret
.size main, .-main
#printQArray prints an array of 8 byte values
# the size of the array is passed in %rdi,
# a pointer to the beginning of the array is passed in %rsi
printQArray:
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %r13
pushq %rbx
movq %rdi, %r12 #copy size to %r12
movq %rsi, %r13 #copy array pointer to %r13
# print array title
movq $.LC3, %rdi
movq $0, %rax
# purposely not pushing any caller save registers.
callq printf
movq $0, %rbx #array index
printQArrayLoop:
movq (%r13, %rbx, 8), %rsi #element of array in 2nd parameter register
movq $.LC4, %rdi #format literal in 1st parameter register
movq $0, %rax
#purposely not pushing any caller save registers
callq printf
incq %rbx #increment index
decq %r12 #decrement count
jle printQArrayExit
jmp printQArrayLoop
printQArrayExit:
# print final \n
movq $.LC5, %rdi #parameter 1
movq $0, %rax
call printf
popq %rbx
popq %r13
popq %r12
leave
ret
.size printQArray, .-printQArray
multQuads:
pushq %rbp
movq %rsp, %rbp
pushq %r12
pushq %r13
pushq %r14
pushq %rbx
movq %rdi, %r12 #copy size to %r12
movq %rsi, %r13 #copy array1 pointer to %r13
movq %rdx, %r14 #copy array2 pointer to %r14
# print "Products"
movq $.LC1, %rdi
movq $0, %rax
call printf
movq $0, %rbx #array index
multQuadLoop:
movq (%r13, %rbx, 8), %rsi #element of array in 2nd parameter register
movq (%r14, %rbx, 8), %rdx #element of array in 3rd parameter register
movq $.LC4, %rdi #format literal in 1st parameter register
imulq %rdx, %rsi #insert product into second parameter
movq $0, %rax
callq printf
incq %rbx #increment index
decq %r12 #decrement count
jle multQuadExit
jmp multQuadLoop
multQuadExit:
# print final \n
movq $.LC5, %rdi #parameter 1
movq $0, %rax
call printf
popq %rbx
popq %r13
popq %r12
popq %r14
leave
ret
.size multQuad, .-multQuad
invertArray:
pushq %rbp
movq %rsp, %rbp
pushq %r12 #size
pushq %r13 #array pointer
pushq %rbx #array index
pushq %r9 #holder
pushq %r10 #holder
push %r14
movq %rdi, %r12 #copy size to %r12
movq %rdi, %r9
movq %rsi, %r13 #copy array pointer to %r13
movq $0, %rbx #array index
movq $0, %r10
invertArrayLoop:
pushq (%r13, %rbx, 8) #push elements of array onto stack
incq %rbx #increment index
decq %r12 #decrement count
jle reverseArray
jmp invertArrayLoop
reverseArray:
popq %r14
movq %r14, (%r13, %r10, 8)
incq %r10
decq %r9
subq %r12, %r9
jle invertArrayExit
jmp reverseArray
invertArrayExit:
movq %r13, %rax
popq %r14
popq %r10
popq %r9
popq %rbx
popq %r13
popq %r12
leave
ret
.size invertArray, .-invertArray
If the multQuad function works the 1st time and I can print out the reversed array properly, then I imagine the problem must be right before I'm calling multQuad and setting the registers.
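I was losing the array in printQArray: %rcx is a caller-saved register, so the printf calls there are free to overwrite the pointer I kept in it.
It was just one line!!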
This is strange. In the following code, when I comment out the pushq %r15 (line 9) and its counterpart popq %r15 (line 23), the code works; otherwise it gives a segmentation fault.
The _initialize function is a C function that does some mallocs inside. It works fine when there is no push/pop of %r15.
Any idea as to what could be going on?
1 .globl _main
2 _main:
3 pushq %rbp
4 movq %rsp, %rbp
5
6 pushq %r13
7 pushq %r14
8 pushq %r12
9 pushq %r15
10 pushq %rbx
11 subq $0, %rsp
12
13
14 movq $16384, %rdi
15 movq $16384, %rsi
16 callq _initialize
17
18
19 movq $0, %rax
20 addq $0, %rsp
21
22 popq %rbx
23 popq %r15
24 popq %r12
25 popq %r14
26 popq %r13
27 popq %rbp
28 retq
In our project we make use of global register variables. In particular, we use %r12, %r13, %r14 for 64-bit and %esi, %edi for 32-bit code.
For example:
register void * my_var asm ("r12");
These global vars are accessed from different modules (.c files).
According to the ABI (http://www.x86-64.org/documentation/abi.pdf), these regs “belong” to the calling function, and the called function is required to preserve their values.
For mingw64, we can see these regs are saved on the stack before any call is made, even if that call doesn't use these regs inside. However, this doesn't occur when we compile using gcc on Linux. Has anyone run into this, or does anyone understand why this may be?
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $40, %rsp
movq %rcx, %rbx
xorl %ecx, %ecx
call my_func
testl %eax, %eax
je .L40
movq 168(%rbx), %rax
addq $40, %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
ret