I encountered the following code in my computer architecture class:
void mystery( long A[], long B[], long n )
{
long i;
for ( i = 0; i < n; i++ ) {
B[i] = A[n-(i+1)];
}
}
And my professor showed the corresponding assembly code GCC generates on an Ubuntu machine and he seems to be confused as well:
mystery:
pushq %rbp
movq %rsp, %rbp
movq %rdi, -24(%rbp)
movq %rsi, -32(%rbp)
movq %rdx, -40(%rbp)
movq $0, -8(%rbp)
jmp .L2
.L3:
movq -8(%rbp), %rax
leaq 0(,%rax,8), %rdx
movq -32(%rbp), %rax
addq %rax, %rdx
movq -8(%rbp), %rax
notq %rax
movq %rax, %rcx
movq -40(%rbp), %rax
addq %rcx, %rax
leaq 0(,%rax,8), %rcx
movq -24(%rbp), %rax
addq %rcx, %rax
movq (%rax), %rax
movq %rax, (%rdx)
addq $1, -8(%rbp)
.L2:
movq -8(%rbp), %rax
cmpq -40(%rbp), %rax
jl .L3
popq %rbp
ret
But I can't understand why the compiler will generate this code. It appears the A, B, and n are pushed onto the stack but the stack pointer %rsp doesn't change its value. Also, -16(%rbp) also seems to be allocated but is never put in a value. Is there any reason GCC will behave this way?
Compiler Explorer (godbolt.org) is a great tool to look at generated assembly from various compilers and with different flags. Here's what g++7 -O2 produces for your code:
mystery(long*, long*, long):
test rdx, rdx
jle .L1
lea rax, [rdi-8+rdx*8]
sub rdi, 8
.L3:
mov rdx, QWORD PTR [rax]
sub rax, 8
add rsi, 8
mov QWORD PTR [rsi-8], rdx
cmp rax, rdi
jne .L3
.L1:
rep ret
To answer your question: compiling with optimizations disabled usually unexpected/less sensible output. "Why?" is a difficult question to answer as this highly depends on how the compiler is implemented.
Here's a screenshot showing a comparison of -O2, -O0 and -Ofast:
Try it out here: https://godbolt.org/g/pQ637a
Related
Below is my assembly for the C code
int main() {
int a = 1;
int b = 2;
return a + b;
}
.globl main
main:
push %rbp
mov %rsp, %rbp
mov $1, %rax
push %rax
mov $2, %rax
push %rax
mov 0(%rbp), %rax
push %rax
mov -8(%rbp), %rax
pop %rcx
add %rcx, %rax
mov %rbp, %rsp
pop %rbp
ret
I am partially sure its the current mapping of C code to ASM. The issue am facing which I observed while running the assembled binary through GDB, is that:
mov -8(%rbp), %rax
doesn't update the value of %rax to 2.
The command I used for assembling the assembly code is:
gcc data/stage_5/valid/multiple_vars.s -o data/stage_5/valid/multiple_vars
I'm doing an x86 assembly project for class and we're supposed to implement a heap of personnel records. The call heap_swap line is giving me trouble. If I uncomment it, it throws a seg fault. However, the heap_swap function works fine no matter how I test it. I've really racked my brain and would appreciate any help anyone can give!
sift_up1:
# ecx = i
# rdx = address to heap
# r9 = address to heap[i]
# rax = offset of id
# r8 = address for heap[i].id_number
# r10d = heap[i].id_number
# r11d = index of parent
# rdx = address for parent id number
# ebx = heap[parent].id_number
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
pushq %rbx #a section to keep track of all the callee saved registers
pushq %rdi #that need to be restored
leaq offset_of_id(%rip), %rax #put the id offset into a register
leaq heap(%rip), %rdx
jmp LOOP_TOP
LOOP_TOP:
cmpl $0, %ecx #Check if i=0, if so jump to exit loop
je EXIT_LOOP
movl $8, %r9d
imull %ecx, %r9d #finding heap[i]
addq (%rdx), %r9
movq %r9, %r8 #r8 contains heap[i]
addq (%rax), %r8 #add id offset, it becomes heap[i].id_number
movl (%r8), %r10d #dereference id_number and place it into r10d
movl %ecx, %r11d #find the index of the parent of i
subl $1, %r11d
shrl $1, %r11d
movl $8, %edi
imull %r11d, %edi
addq (%rdx), %rdi #rdi holds the address of heap[parent]
addq (%rax), %rdi #rdi holds the address of heap[parent].id_number
movl (%rdi), %ebx #ebx holds the heap[parent].id_number
cmpl %ebx, %r10d
jle EXIT_LOOP
pushq %rdx
movq %r11, %rdx #put the indexes in the correct parameter functions
# call heap_swap #call heap_swap
popq %rdx
movl %r11d, %ecx #modify i
jmp LOOP_TOP #jump to loop top
I know that OS X is 16 byte stack align, but I don't really understand why it is causing an error here.
All I am doing here is to pass an object size (which is 24) to %rdi, and call malloc. Does this error mean I have to ask for 32 bytes ?
And the error message is:
libdyld.dylib`stack_not_16_byte_aligned_error:
-> 0x7fffc12da2fa <+0>: movdqa %xmm0, (%rsp)
0x7fffc12da2ff <+5>: int3
libdyld.dylib`_dyld_func_lookup:
0x7fffc12da300 <+0>: pushq %rbp
0x7fffc12da301 <+1>: movq %rsp, %rbp
Here is the code:
Object_copy:
pushq %rbp
movq %rbp, %rsp
subq $8, %rsp
movq %rdi, 8(%rsp) # save self address
movq obj_size(%rdi), %rax # get object size
imul $8, %rax
movq %rax, %rdi
callq _malloc <------------------- error in this call
# rsi old object address
# rax new object address
# rdi object size, mutiple of 8
# rcx temp reg
# copy object tag
movq 0(%rsi), %rcx
movq %rcx, 0(%rax)
# set rdx to counter, starting from 8
movq $8, %rdx
# add 8 to object size, since we are starting from 8
addq $8, %rdi
start_loop:
cmpq %rdx, %rdi
jle end_loop
movq (%rdx, %rsi, 1), %rcx
movq %rcx, (%rdx, %rax, 1)
addq $8, %rdx
jmp start_loop
end_loop:
leave
ret
Main_protoObj:
.quad 5 ; object tag
.quad 3 ; object size
.quad Main_dispatch_table ; dispatch table
_main:
leaq Main_protoObj(%rip), %rdi
callq Object_copy # copy main proto object
subq $8, %rsp # save the main object on the stack
movq %rax, 8(%rsp)
movq %rax, %rdi # set rdi point to SELF
callq Main_init
callq Main_main
addq $8, %rsp # restore stack
leaq _term_msg(%rip), %rax
callq _print_string
Like you said, MacOS X has a 16 byte stack alignment, which means that the machine expects each variable on the stack to start on a byte that is a multiple of 16 from the current stack pointer.
When the stack is misaligned, it means we start trying to read variables from the middle of that 16 byte window and usually end up with a segmentation fault.
Before you call a routine in your code, you need to make sure that your stack is aligned correctly; in this case, meaning that the base pointer register is divisible by 16.
subq $8, %rsp # stack is misaligned by 8 bytes
movq %rdi, 8(%rsp) #
movq obj_size(%rdi), %rax #
imul $8, %rax #
movq %rax, %rdi #
callq _malloc # stack is still misaligned when this is called
To fix this, you can subq the %rsp by something like 16 instead of 8.
subq $16, %rsp # stack is still aligned
movq %rdi, 16(%rsp) #
... #
callq _malloc # stack is still aligned when this is called, good
I'm trying to create a green thread implementation based off this tutorial, However my switch function is giving me a segfault because the code to load the registers is not run at the end of the function. Here is my code:
void ThreadSwitch(Thread in, Thread out) {
if (!out && !in) {
return;
}
if (out) {
// save registers for out
}
if (in) {
SetCurrentThread(in);
mtx_lock(&in->mutex);
uint64_t rsp = in->cpu.rsp;
uint64_t r15 = in->cpu.r15;
uint64_t r14 = in->cpu.r14;
uint64_t r13 = in->cpu.r13;
uint64_t r12 = in->cpu.r12;
uint64_t rbx = in->cpu.rbx;
uint64_t rbp = in->cpu.rbp;
mtx_unlock(&in->mutex);
asm volatile("mov %[rsp], %%rsp\n"
"mov %[r15], %%r15\n"
"mov %[r14], %%r14\n"
"mov %[r13], %%r13\n"
"mov %[r12], %%r12\n"
"mov %[rbx], %%rbx\n"
"mov %[rbp], %%rbp\n" : : [rsp] "r"(rsp), [r15] "r"(r15), [r14] "r"(r14), [r13] "r"(r13), [r12] "r"(r12), [rbx] "r"(rbx), [rbp] "r"(rbp));
}
}
Xcode says that the inline assembly is causing a segfault, but my lldb disassembly looks like this (you can ignore 95% of it, just provided for context):
0x1000f88b4: movq -0x8(%rbp), %rdi
0x1000f88b8: callq 0x1000f83a0 ; SetCurrentThread at thread.cc:21
0x1000f88bd: movq -0x8(%rbp), %rdi
0x1000f88c1: addq $0x50, %rdi
0x1000f88c8: callq 0x1000f7b80 ; mtx_lock at tct.c:106
0x1000f88cd: movq -0x8(%rbp), %rdi
0x1000f88d1: movq (%rdi), %rdi
0x1000f88d4: movq %rdi, -0x18(%rbp)
0x1000f88d8: movq -0x8(%rbp), %rdi
0x1000f88dc: movq 0x8(%rdi), %rdi
0x1000f88e0: movq %rdi, -0x20(%rbp)
0x1000f88e4: movq -0x8(%rbp), %rdi
0x1000f88e8: movq 0x10(%rdi), %rdi
0x1000f88ec: movq %rdi, -0x28(%rbp)
0x1000f88f0: movq -0x8(%rbp), %rdi
0x1000f88f4: movq 0x18(%rdi), %rdi
0x1000f88f8: movq %rdi, -0x30(%rbp)
0x1000f88fc: movq -0x8(%rbp), %rdi
0x1000f8900: movq 0x20(%rdi), %rdi
0x1000f8904: movq %rdi, -0x38(%rbp)
0x1000f8908: movq -0x8(%rbp), %rdi
0x1000f890c: movq 0x28(%rdi), %rdi
0x1000f8910: movq %rdi, -0x40(%rbp)
0x1000f8914: movq -0x8(%rbp), %rdi
0x1000f8918: movq 0x30(%rdi), %rdi
0x1000f891c: movq %rdi, -0x48(%rbp)
0x1000f8920: movq -0x8(%rbp), %rdi
0x1000f8924: addq $0x50, %rdi
0x1000f892b: movl %eax, -0x54(%rbp)
0x1000f892e: callq 0x1000f7de0 ; mtx_unlock at tct.c:264
0x1000f8933: movq -0x18(%rbp), %rdi ; beginning of inline asm
0x1000f8937: movq -0x20(%rbp), %rcx
0x1000f893b: movq -0x28(%rbp), %rdx
0x1000f893f: movq -0x30(%rbp), %rsi
0x1000f8943: movq -0x38(%rbp), %r8
0x1000f8947: movq -0x40(%rbp), %r9
0x1000f894b: movq -0x48(%rbp), %r10
0x1000f894f: movq %rdi, %rsp
0x1000f8952: movq %rcx, %r15
0x1000f8955: movq %rdx, %r14
0x1000f8958: movq %rsi, %r13
0x1000f895b: movq %r8, %r12
0x1000f895e: movq %r9, %rbx
0x1000f8961: movq %r10, %rbp ; end of inline asm
-> 0x1000f8964: movl %eax, -0x58(%rbp)
0x1000f8967: addq $0x60, %rsp
0x1000f896b: popq %rbp
0x1000f896c: retq
The segfault happens when it tries to access stuff back on the stack, which makes sense because it just switched out the stack. But why is the compiler inserting this? The compiler also stores %eax on the stack at 0x1000f892b. Is the compiler opening up a register? Because it doesn't use %rax in the inline asm. Is there a workaround?
This is using Apple LLVM version 6.0 (clang-600.0.57) on OSX 10.10.2, if that's any help.
Thanks in advance.
I strongly advise you not to write programs that depend on undefined behaviour.
Jumps into and out of inline assembly are not permitted as the compiler can't analyse control flow it doesn't know about, upon thread creation you jump into the asm statement from nowhere then leaves it. To avoid these implicit jumps you need to save and restore the registers including %rip in the same asm statement.
All registers that an asm statement alters must be listed as outputs or clobbers, for a thread switch routine that is all the registers whose values are not saved, as they are altered by the other threads. If you do not do so the compiler will incorrectly assume that they are not altered.
An asm statement must avoid overwriting it's inputs before they are used, in your code there is nothing prohibiting the compiler from storing the variable r12 in the register %r14.
Your lock is either pointless or inadequate.
It is much simpler to write your function entirely in assembly, like in tutorial you cite.
I'm trying to write an x86-64 assembly program that is the function "int addarray(int n, int * array)". The first arg is the length of the array, second is a pointer to the array. It's supposed to add up the elements of the array and return. Here is the code I have so far, and I don't know why it doesn't work.
.text
.globl addarray
.type addarray, #function
addarray:
movq $0, %rdx
movq $0, %rax
while:
cmpq %rdx, %rdi
jle afterw
movq %rdx, %rcx
imulq $8, %rcx
addq %rsi, %rcx
addq %rcx, %rax
addq $1, %rdx
jmp while
afterw:
ret
I'm not real familiar with the AT&T syntax, but it looks like your line:
addq %rcx, %rax
Is going to add the value of rcx to rax. You want to add the value that rcx references, that is:
addq (%rcx), %rax
At least, I think that's how it's done in AT&T syntax. In Intel syntax, it would be:
add rax,[rcx]
There are a couple of simple optimizations you can do to speed things up somewhat, but I think the above is the key to your problem.