On macOS (on an Intel machine), I tried to build a simple x86 hybrid program, with the main module written in C and one function written in x86 assembly (NASM).
The function is supposed to reverse the string passed as its argument.
My C code is:
#include <stdio.h>
char *revstring(char *s);
int main(int argc, char* argv[]){
for (int i=1; i<argc; i++){
printf("%s->", argv[i]);
printf("%s\n", revstring(argv[i]));
}
}
And my assembly code:
section .text
global _revstring
_revstring:
push rbp
mov rbp, rsp
mov rax, [rbp+8]
mov rcx, rax
_find_end:
mov dl, [rax]
inc rax
test dl, dl
jnz _find_end
sub rax, 2
_swap:
cmp rax, rcx
jbe _fin
mov dl, [rax]
xchg dl, [rcx]
mov [rax], dl
dec rax
inc rcx
jmp _swap
_fin:
mov rax, [rbp+8]
pop rbp
ret
Or, with different label names:
section .text
global _revstring
_revstring:
push rbp
mov rbp, rsp
mov rax, [rbp+8]
mov rcx, rax
find_end:
mov dl, [rax]
inc rax
test dl, dl
jnz find_end
sub rax, 2
swap:
cmp rax, rcx
jbe fin
mov dl, [rax]
xchg dl, [rcx]
mov [rax], dl
dec rax
inc rcx
jmp swap
fin:
mov rax, [rbp+8]
pop rbp
ret
Current macOS cannot run 32-bit programs, so I built the program with these commands:
cc -m64 -std=c99 -c revs.c
nasm -f macho64 revstring.s
cc -m64 -o revs revs.o revstring.o
But when I enter
./revs abc123
the following error occurs:
zsh: bus error ./revs abc123
I cannot find a solution. Could anyone help me?
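For comparison, here is a minimal C sketch of the two-pointer reversal the assembly is aiming for. The name revstring_ref is made up for this illustration and is not part of the original program. One thing worth checking the assembly against: on x86-64 macOS the first pointer argument arrives in rdi, whereas after push rbp / mov rbp, rsp the slot at [rbp+8] holds the return address, not the argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference version (not the asker's code): reverse s in place
   and return it, using the same two-pointer walk as the assembly. */
char *revstring_ref(char *s)
{
    size_t len = strlen(s);
    if (len < 2)
        return s;              /* nothing to swap for "" or one character */

    char *lo = s;              /* plays the role of rcx */
    char *hi = s + len - 1;    /* plays the role of rax after "sub rax, 2" */
    while (lo < hi) {
        char tmp = *hi;
        *hi-- = *lo;
        *lo++ = tmp;
    }
    return s;
}
```

Calling revstring_ref on a writable buffer holding "abc123" leaves "321cba" in that buffer and returns a pointer to it, which is what the printf in main expects from revstring.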
Background
I've assumed for a while that gcc converts while-loops into do-while form. (See Why are loops always compiled into "do...while" style (tail jump)?)
And that at -O0, a while-loop...
while (test-expr)
body-statement
...will generate code in the jump-to-middle do-while form:
goto test;
loop:
body-statement
test:
if (test-expr) goto loop;
And gcc -O2 will generate a guarded do-while:
if (!test-expr)
goto done;
loop:
body-statement
if (test-expr) goto loop;
done:
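To make the two shapes concrete, here is a small sketch that writes the same summing loop three ways in C with goto. The function names are invented for this illustration. Note that in the guarded form the up-front test is the negation of the loop condition, since it skips the loop when the condition is false.

```c
#include <stddef.h>

/* Plain while loop. */
int sum_while(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Jump-to-middle: enter at the test, the loop body sits above it. */
int sum_jump_to_middle(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;
    goto test;
loop:
    s += a[i];
    i++;
test:
    if (i < n) goto loop;
    return s;
}

/* Guarded do-while: one inverted test up front, then a bottom-tested loop. */
int sum_guarded_do(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;
    if (!(i < n)) goto done;
loop:
    s += a[i];
    i++;
    if (i < n) goto loop;
done:
    return s;
}
```

All three return the same value for any n, including n == 0; they differ only in where the conditional branch sits and how many branches run per iteration.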
Concrete examples
Here are godbolt examples of functions for which gcc generates the kind of control flow I'm describing above (I use for-loops but a while loop will give the same code).
This simple function...
int sum1(int a[], size_t N) {
int s = 0;
for (size_t i = 0; i < N; i++) {
s += a[i];
}
return s;
}
At -O0, gcc generates this jump-to-middle code:
sum1:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov QWORD PTR [rbp-32], rsi
mov DWORD PTR [rbp-4], 0
mov QWORD PTR [rbp-16], 0
jmp .L2
.L3:
mov rax, QWORD PTR [rbp-16]
lea rdx, [0+rax*4]
mov rax, QWORD PTR [rbp-24]
add rax, rdx
mov eax, DWORD PTR [rax]
add DWORD PTR [rbp-4], eax
add QWORD PTR [rbp-16], 1
.L2:
mov rax, QWORD PTR [rbp-16]
cmp rax, QWORD PTR [rbp-32]
jb .L3
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
At -O2, it generates this guarded-do code:
sum1:
test rsi, rsi
je .L4
lea rdx, [rdi+rsi*4]
xor eax, eax
.L3:
add eax, DWORD PTR [rdi]
add rdi, 4
cmp rdi, rdx
jne .L3
ret
.L4:
xor eax, eax
ret
My question
What I'm after is a hand-wavy rule to apply when looking at -Os loops. I'm more used to looking at -O2 code, and now that I'm working in the embedded field, where -Os is more prevalent, I'm surprised by the form of the loops I see.
It seems that gcc -Og and -Os both generate code with a jmp at the bottom and an if() break at the top. Clang, on the other hand, generates a guarded do-while. A godbolt link to gcc and clang output
Here is an example of gcc -Os output for the above function:
sum1:
xor eax, eax
xor r8d, r8d
.L2:
cmp rax, rsi
je .L5
add r8d, DWORD PTR [rdi+rax*4]
inc rax
jmp .L2
.L5:
mov eax, r8d
ret
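Read back into C, the -Os listing above has roughly the following shape. This is a hand-written rendering for illustration (the function name is made up), with each statement annotated with the instruction it stands for.

```c
#include <stddef.h>

/* Hypothetical C rendering of the -Os listing: test at the top (cmp/je),
   unconditional jump at the bottom (jmp .L2). */
int sum1_top_tested(const int *a, size_t n)
{
    size_t i = 0;            /* xor eax, eax  (rax is the index) */
    int s = 0;               /* xor r8d, r8d */
top:
    if (i == n) goto done;   /* cmp rax, rsi ; je .L5 */
    s += a[i];               /* add r8d, DWORD PTR [rdi+rax*4] */
    i++;                     /* inc rax */
    goto top;                /* jmp .L2 */
done:
    return s;                /* mov eax, r8d ; ret */
}
```

This form has a single copy of the loop test, so it needs no separate guard in front of the loop, at the cost of executing two branch instructions per iteration (the untaken je and the jmp) instead of one.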
Am I correct in assuming that gcc -Og and -Os generate code of the form I described above?
Does anyone have a resource that describes the rationale for using the while form for -Og and -Os? Is it by design, or an accidental fallout from the way the optimization passes are organized?
I thought that converting loops into do-while form was part of the early canonicalization done by compilers. How come gcc -O0 generates do-while but gcc -Og gives while-loops? Does that canonicalization only happen when optimization is enabled?
Sidenote: I'm surprised by how much the code generated with -Os and -O2 differs, given that not many compiler flags differ between them. Maybe many passes check some variable like tradeoff_speed_vs_space.
This question already has answers here:
Why is the address of static variables relative to the Instruction Pointer?
32-bit absolute addresses no longer allowed in x86-64 Linux?
The C source:
int sum(int a, int b) {
return a + b;
}
int main() {
int (*ptr_sum_1)(int,int) = sum; // assign the address of the "sum"
int (*ptr_sum_2)(int,int) = sum; // to the function pointer
int (*ptr_sum_3)(int,int) = sum;
int a = (*ptr_sum_1)(2,4); // call the "sum" through the pointer
int b = sum(2,4); // call the "sum" by usual way
return 0;
}
The crucial part of the assembly code:
lea rax, sum[rip]
mov QWORD PTR -24[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -16[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -8[rbp], rax
The executing program instructions from GDB:
0x5fa <sum>: push rbp
0x5fb <sum+1>: mov rbp,rsp
0x5fe <sum+4>: mov DWORD PTR [rbp-0x4],edi
0x601 <sum+7>: mov DWORD PTR [rbp-0x8],esi
0x604 <sum+10>: mov edx,DWORD PTR [rbp-0x4]
0x607 <sum+13>: mov eax,DWORD PTR [rbp-0x8]
0x60a <sum+16>: add eax,edx
0x60c <sum+18>: pop rbp
0x60d <sum+19>: ret
0x60e <main>: push rbp
0x60f <main+1>: mov rbp,rsp
0x612 <main+4>: sub rsp,0x20
0x616 <main+8>: lea rax,[rip+0xffffffffffffffdd] # 0x5fa <sum>
0x61d <main+15>: mov QWORD PTR [rbp-0x18],rax
0x621 <main+19>: lea rax,[rip+0xffffffffffffffd2] # 0x5fa <sum>
0x628 <main+26>: mov QWORD PTR [rbp-0x10],rax
0x62c <main+30>: lea rax,[rip+0xffffffffffffffc7] # 0x5fa <sum>
0x633 <main+37>: mov QWORD PTR [rbp-0x8],rax
0x637 <main+41>: mov rax,QWORD PTR [rbp-0x18]
0x63b <main+45>: mov esi,0x4
0x640 <main+50>: mov edi,0x2
0x645 <main+55>: call rax
0x647 <main+57>: mov DWORD PTR [rbp-0x20],eax
0x64a <main+60>: mov esi,0x4
0x64f <main+65>: mov edi,0x2
0x654 <main+70>: call 0x5fa <sum>
0x659 <main+75>: mov DWORD PTR [rbp-0x1c],eax
0x65c <main+78>: mov eax,0x0
0x661 <main+83>: leave
0x662 <main+84>: ret
I think that the sum label is just the starting address of the sum procedure - 0x5fa, so I don't understand why gcc can't use it directly, but uses the calculation sum[rip] for this.
Question:
Why is sum[rip] used in the lea rax, sum[rip] instruction in assembly, instead of the simple sum label, e.g. lea rax, sum?
Will the mov rax, 0x5fa instruction do the same? Because we know the sum address after linking: the call 0x5fa <sum> instruction just uses it directly.
I believe that it might depend on your build of GCC, but on the Linux distributions that I use everything is set up to default to PIC builds. That's Position Independent Code. It's better for both shared libraries and executables, because the result can be mapped into memory anywhere without needing a fixup pass. It's better for security because ASLR can be applied.
With x86-64 there's no significant penalty for using PIC so why wouldn't it be used everywhere?
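A small sketch of why a RIP-relative reference keeps working wherever the image is loaded: the distance between two pieces of code in the same image is fixed at link time, even though their absolute addresses are not known until load time in a position-independent executable. The helper names twice, code_distance and call_indirect are invented for this example, and converting a function pointer to an integer is implementation-defined in C, so treat this as an illustration for mainstream x86-64 toolchains rather than portable code.

```c
#include <stdint.h>

int sum(int a, int b) { return a + b; }
int twice(int a) { return sum(a, a); }   /* hypothetical second function */

/* Distance in bytes between two functions in the same image. The absolute
   addresses may change from run to run under ASLR; this difference does not. */
intptr_t code_distance(void)
{
    return (intptr_t)(uintptr_t)twice - (intptr_t)(uintptr_t)sum;
}

/* Calling through a pointer only needs the run-time address of the target,
   which "lea rax, sum[rip]" computes as "address of next instruction + offset". */
int call_indirect(int (*fn)(int, int), int a, int b)
{
    return fn(a, b);
}
```

By contrast, mov rax, 0x5fa would bake in one absolute address, which is only correct if the image happens to be loaded at base address 0. The small addresses in the GDB listing appear to be offsets within the file, shown before the program has been started and relocated.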
I am trying to read multiple inputs in a loop and display them afterwards. The problem is with the first iteration of the loop: the first time through, it doesn't wait for user input but goes straight on to the second iteration, as shown in the picture:
Here is my code for the loop:
section .text
computer_info:
push rbp
mov rbp, rsp
sub rsp, 300
lea rbx, [computers]
add QWORD [counter], 0
.input_computers:
mov rdi, QWORD echo_computer_name
call print_string
call print_nl
call read_string
mov rsi, rax
mov rdi, rbx
mov rcx, 16
rep movsb
add rbx, 16
add QWORD [counter], 1
cmp QWORD [counter], 2
jne .input_computers
mov QWORD [counter], 0
sub rbx, rbx
.display_loop:
mov rdi, computers
add rdi, rbx
call print_string
call print_nl
add rbx, 16
add QWORD [counter], 1
cmp QWORD [counter], 2
jne .display_loop
add rsp, 300
pop rbp
ret
user_info:
push rbp
mov rbp, rsp
sub rsp, 32
call main
add rsp, 32
pop rbp
ret
search:
call main
delete:
call main
main:
; We have these three lines for compatibility only
push rbp
mov rbp, rsp
sub rsp,32
mov rdi, QWORD echo_welcome
call print_string
call print_nl
mov rdi, QWORD echo_computer
call print_string
call print_nl
mov rdi, QWORD echo_user
call print_string
call print_nl
mov rdi, QWORD echo_search
call print_string
call print_nl
mov rdi, QWORD echo_delete
call print_string
call print_nl
call print_nl
mov rdi, QWORD echo_selection
call print_string
call read_int
mov rdi, rax
cmp rdi, 1
je computer_info
cmp rdi, 2
je user_info
cmp rdi, 3
je search
cmp rdi, 4
je delete
; and these lines are for compatibility
add rsp, 32
pop rbp
ret
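One common cause of this symptom, offered purely as an assumption since the source of read_int and read_string is not shown: the integer read for the menu choice stops right after the digits and leaves the newline in the input buffer, so the next line read returns immediately with an empty string. The sketch below models that with an in-memory "keyboard buffer". All names in it (set_input, demo_read_int, demo_read_line) are hypothetical stand-ins, not the real library routines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for read_int / read_string, reading from an
   in-memory buffer so the behaviour is easy to see. */
static const char *input_cursor;

void set_input(const char *text) { input_cursor = text; }

/* Parses a number and stops right after the last digit,
   leaving the newline that followed it unread. */
int demo_read_int(void)
{
    int value = 0, used = 0;
    if (sscanf(input_cursor, "%d%n", &value, &used) == 1)
        input_cursor += used;
    return value;
}

/* Reads up to the next newline, consumes it, and returns the line length.
   cap must be at least 1. */
size_t demo_read_line(char *out, size_t cap)
{
    size_t n = 0;
    while (*input_cursor != '\0' && *input_cursor != '\n') {
        if (n + 1 < cap)
            out[n++] = *input_cursor;
        input_cursor++;
    }
    if (*input_cursor == '\n')
        input_cursor++;
    out[n] = '\0';
    return n;
}
```

With the input "1\nalpha\nbeta\n", demo_read_int returns 1, the first demo_read_line then returns an empty line (it only consumes the leftover newline), and "alpha" arrives on the second call. If the real routines behave this way, consuming the rest of the line after read_int would be one way to avoid the skipped first iteration.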
I was studying for one of my courses when I ran into a specific exercise that I cannot seem to solve... It is pretty basic, because I am VERY new to assembly. So let's begin.
I have a C function
unsigned int func(int *ptr, unsigned int j) {
unsigned int res = j;
int i = ptr[j+1];
for(; i<8; ++i) {
res >>= 1;
}
return res;
}
I compiled it to assembly with gcc:
.file "func.c"
.intel_syntax noprefix
.text
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
mov eax, DWORD PTR [rbp-28]
mov DWORD PTR [rbp-8], eax
mov eax, DWORD PTR [rbp-28]
add eax, 1
mov eax, eax
lea rdx, [0+rax*4]
mov rax, QWORD PTR [rbp-24]
add rax, rdx
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-4], eax
jmp .L2
.L3:
shr DWORD PTR [rbp-8]
add DWORD PTR [rbp-4], 1
.L2:
cmp DWORD PTR [rbp-4], 7
jle .L3
mov eax, DWORD PTR [rbp-8]
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
The question is as follows: which instruction places j (the variable in the C function) on top of the stack?
I sincerely cannot figure it out; please enlighten me.
The variable j is the second parameter of func; under the x86-64 System V ABI calling convention it is passed in the register esi. The instruction mov DWORD PTR [rbp-28], esi then stores j on the stack.
You can see it very clearly by writing a simple function that calls "func" and compiling it with -O0 (or with -O2 and marking it as noinline, or only providing a prototype so there's nothing for the compiler to inline).
unsigned int func(int *ptr, unsigned int j) {
unsigned int res = j;
int i = ptr[j+1];
for(; i<8; ++i) {
res >>= 1;
}
return res;
}
int main()
{
int a = 1;
int array[10];
func (array, a);
return 0;
}
Using the Godbolt compiler explorer, we can easily get gcc -O0 -fverbose-asm assembly output.
Please focus on the following instructions:
# in main:
...
mov DWORD PTR [rbp-4], 1
mov edx, DWORD PTR [rbp-4]
...
mov esi, edx
...
func(int*, unsigned int):
...
mov DWORD PTR [rbp-28], esi # j, j
...
j, j is a comment added by gcc -fverbose-asm to tell you that the source and destination operands of that instruction are both the C variable j.
The full assembly instructions:
func(int*, unsigned int):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
mov eax, DWORD PTR [rbp-28]
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-28]
add eax, 1
mov eax, eax
lea rdx, [0+rax*4]
mov rax, QWORD PTR [rbp-24]
add rax, rdx
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-8], eax
jmp .L2
.L3:
shr DWORD PTR [rbp-4]
add DWORD PTR [rbp-8], 1
.L2:
cmp DWORD PTR [rbp-8], 7
jle .L3
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-4], 1
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-48]
mov esi, edx
mov rdi, rax
call func(int*, unsigned int)
mov eax, 0
leave
ret
Taking into account these instructions
mov eax, DWORD PTR [rbp-28]
add eax, 1
it seems that j is stored at address rbp-28, while ptr is stored at address rbp-24.
These are the instructions that store the values on the stack:
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
So the arguments are passed to the function in registers: rdi for ptr and esi for j.
On x86-64, the calling convention passes the first few integer and pointer arguments in registers instead of on the stack. Inside the function, unoptimized code then copies the arguments it received in registers to the stack for temporary storage.
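As a rough way to read the -O0 listing, the sketch below pairs the original function with a hand-written model in which the parameters are named after the registers they arrive in and the locals after the stack slots they are copied to. func_model and its variable names are made up for this illustration; the slot offsets follow the first listing in the question.

```c
/* The function from the question, unchanged. */
unsigned int func(int *ptr, unsigned int j)
{
    unsigned int res = j;
    int i = ptr[j + 1];
    for (; i < 8; ++i) {
        res >>= 1;
    }
    return res;
}

/* Hypothetical line-by-line model of the -O0 listing. */
unsigned int func_model(int *rdi, unsigned int esi)
{
    int *slot_m24 = rdi;                    /* mov QWORD PTR [rbp-24], rdi  (ptr) */
    unsigned int slot_m28 = esi;            /* mov DWORD PTR [rbp-28], esi  (j)   */
    unsigned int slot_res = slot_m28;       /* res = j */
    int slot_i = slot_m24[slot_m28 + 1];    /* i = ptr[j+1] */
    while (slot_i <= 7) {                   /* cmp ..., 7 ; jle .L3 */
        slot_res >>= 1;                     /* shr DWORD PTR [...] */
        slot_i += 1;                        /* add DWORD PTR [...], 1 */
    }
    return slot_res;                        /* mov eax, DWORD PTR [...] */
}
```

The second assignment in func_model is the store the question asks about: j arrives in esi and is written to its stack slot at rbp-28.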
Just a suggestion for further explorations on your own. Use gcc -O0 -g2 f.c -Wa,-adhln. It will turn off optimizations and generate assembly code intermixed with the source. It might give you better ideas about what it does.
As an alternative you can use the objdump -Sd f.o on the output '.o' or executable. Just make sure that you add debugging info and turn off optimizations at compilation.
I'm stuck on figuring out how to copy the string source to target, which should be initialized to all zeroes. It appears as though I need to find the size of the string, start a counter register, push stringitem[counter] to the stack, and increment the counter register. I can't figure out how to even get started, let alone search for a word in the string.
Thanks!
bits 64
global main
extern printf
section .text
main:
; function setup
push rbp
mov rbp, rsp
sub rsp, 32
;
lea rdi, [rel message]
mov al, 0
call printf
;
lea rdi, [rel source]
mov al, 0
call printf
;
;mov edi, source
;mov esi, target
;lea rdi, [esi]
;mov al, 0
;call printf
;mov ecx,sizeof source -1
; mov esi,0
;L1:
; mov eax,source[esi];
; push eax
; inc esi
; loop L1
; function return
mov eax, 0
add rsp, 32
pop rbp
ret
section .data
message: db 'Project',0x0D,0x0a,'Author',0x0D,0x0a,0
source: db 0x0D,0x0a,"I can't figure out how to copy this text to target.",0x0D,0x0a,0
target: db '0000000000000000000000000000000000000000000',0x0D,0x0a,0
For your data memory layout this will do
lea rdi, [rel target]
lea rsi, [rel source]
mov rcx, target-source
cld
rep movsb
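In C terms, rep movsb with the direction flag cleared behaves like a byte-by-byte forward copy of rcx bytes from [rsi] to [rdi]. The sketch below models that; rep_movsb_model is a made-up name. The count target-source works for this particular layout because target is declared immediately after source, so the difference of the two labels is the size of source including its terminating 0.

```c
#include <stddef.h>

/* Hypothetical model of "rep movsb" with the direction flag clear:
   copy count bytes from src to dst, lowest address first. */
void rep_movsb_model(unsigned char *dst, const unsigned char *src, size_t count)
{
    while (count != 0) {     /* rcx counts down to zero */
        *dst++ = *src++;     /* [rdi] <- [rsi], then both pointers advance */
        count--;
    }
}
```

As with memcpy, the destination has to be at least count bytes long, which is worth double-checking for the buffers in the question.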
Otherwise as Jester said, a simple byte-to-byte copy will also do
lea rdi, [rel target]
lea rsi, [rel source]
cld
.copy:
lodsb
stosb
test al, al
jnz .copy
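The lodsb / stosb loop above corresponds to the following C, shown as a sketch with an invented function name. It copies one byte at a time and stops after the terminating 0 has been copied, so no length needs to be known in advance.

```c
/* Hypothetical C model of the lodsb / stosb / test / jnz loop:
   copy bytes up to and including the terminating 0. */
char *copy_until_nul(char *dst, const char *src)
{
    char *out = dst;
    char al;
    do {
        al = *src++;     /* lodsb: load a byte from [rsi], advance rsi */
        *out++ = al;     /* stosb: store it to [rdi], advance rdi */
    } while (al != 0);   /* test al, al ; jnz .copy */
    return dst;
}
```

This is the same contract as strcpy: the destination must have room for the whole source string plus its terminator.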