Understanding these assembly instructions [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I'm learning assembly by writing C programs and viewing the assembly output. I've included the C program at the bottom for the page to make it easier. I'm struggling to understand one line of assembly:
cdqe
movzx eax, BYTE PTR [rbp-32+rax] <--- what is this doing?
movsx eax, al
So I think cdqe extends eax into rax (64 bits). Its clear that the string I want to print fits into the al register but I don't understand what is happening deep down with rbp-32+rax. Can someone explain for me?
.file "string_manip.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "Hello"
.string ""
.zero 3
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 48
mov rax, QWORD PTR fs:40
mov QWORD PTR [rbp-8], rax
xor eax, eax
mov DWORD PTR [rbp-36], 0
mov eax, DWORD PTR .LC0[rip]
mov DWORD PTR [rbp-32], eax
movzx eax, WORD PTR .LC0[rip+4]
mov WORD PTR [rbp-28], ax
movzx eax, BYTE PTR .LC0[rip+6]
mov BYTE PTR [rbp-26], al
mov WORD PTR [rbp-25], 0
mov BYTE PTR [rbp-23], 0
mov DWORD PTR [rbp-36], 0
jmp .L2
.L3:
mov eax, DWORD PTR [rbp-36]
cdqe
movzx eax, BYTE PTR [rbp-32+rax] <--- what is this doing?
movsx eax, al
mov edi, eax
call putchar
add DWORD PTR [rbp-36], 1
.L2:
cmp DWORD PTR [rbp-36], 5
jle .L3
mov edi, 10
call putchar
mov eax, 0
mov rdx, QWORD PTR [rbp-8]
xor rdx, QWORD PTR fs:40
je .L5
call __stack_chk_fail
.L5:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",#progbits
#include <string.h>
#include <stdio.h>
int main()
{
int i = 0;
char array[10] = "Hello\0";
for(i=0; i<6; i++)
printf("%c", array[i]);
printf("\n");
return 0;
}

It's just calculating the address of one of the characters.
Presumably your string starts at rbp-32 and then the instruction does the C equivalent of ch = string[rax].
I guess this is unoptimized code, so the compiler does a few extra sign extend and zero extend that are not really needed.

Related

Understanding pointers in assembler from machine's view

Here is a basic program I written on the godbolt compiler, and it's as simple as:
#include<stdio.h>
void main()
{
int a = 10;
int *p = &a;
printf("%d", *p);
}
The results after compilation I get:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 10
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
Question: Pushing the rbp, making the stack frame by making a 16 byte block, how from a register, a value is moved to a stack location and vice versa, how the job of LEA is to figure out the address, I got this part.
Problem:
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
Lea -> getting address of rbp-12 into rax,
then moving the value which is the address of rbp-12 into rax,
but next line again says, move to rax, the value of rbp-8. This seems ambiguous. Then again moving the value of rax to eax. I don't understand the amount of work here. Why couldn't I have done
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov eax, QWORD PTR [rbp-8]
and be done with it? coz on the original line, rbp-12's address is stored onto rax, then rax stored to rbp-8. then rbp-8 stored again into rax, and then again rax is stored into eax? couldn't we have just copied the rbp-8 directly to eax? i guess not. But my question is why?
I know there is de-referencing in pointers, so How LEA helps grabbing the address of rbp-12, I understand, but on the next parts, when did it went from grabbing values from addresses I completely lost. And also, after that, I didn't understand any of the asm lines.
You're seeing very un-optimized code. Here's my line-by-line interpretation:
.LC0:
.string "%d" ; Format string for printf
main:
push rbp ; Save original base pointer
mov rbp, rsp ; Set base pointer to beginning of stack frame
sub rsp, 16 ; Allocate space for stack frame
mov DWORD PTR [rbp-12], 10 ; Initialize variable 'a'
lea rax, [rbp-12] ; Load effective address of 'a'
mov QWORD PTR [rbp-8], rax ; Store address of 'a' in 'p'
mov rax, QWORD PTR [rbp-8] ; Load 'p' into rax (even though it's already there - heh!)
mov eax, DWORD PTR [rax] ; Load 32-bit value of '*p' into eax
mov esi, eax ; Load value to print into esi
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
mov eax, 0 ; Zero out eax (not sure why -- likely printf call protocol)
call printf ; Make the printf call
nop ; No-op (not sure why)
leave ; Remove the stack frame
ret ; Return
Compilers, when not optimizing, generate code like this as they parse the code you gave them. It's doing a lot of unnecessary stuff, but it is quicker to generate and makes using a debugger easier.
Compare this with the optimized code (-O2):
.LC0:
.string "%d" ; Format string for printf
main:
mov esi, 10 ; Don't need those variables -- just a 10 to pass to printf!
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
xor eax, eax ; It's a few cycles faster to xor a register with itself than to load an immediate 0
jmp printf ; Just jmp to printf -- it will handle the return
The optimizer found that the variables weren't necessary, so no stack frame is created. Nothing is left but the printf call! And that's done as a jmp since nothing else need be done here when the printf is complete.

Converting a C program to inline Assembly?

I am trying to convert the following C program to inline assembly:
#include <stdio.h>
#include <string.h>
int main()
{
int counter = 0;
int input = 0;
char name[20];
printf("Please input an integer value 1 - 99: ");
scanf("%d", &input);
printf("You entered: %d\n", input);
if (input <= 0)
{
printf("Error, invalid input \n");
}
if (input > 0)
{
printf("Please input name \n");
scanf("%s", name);
for (int i = 1; i <=input; i++)
{
printf("Your name is %s\n", name);
printf("%d\n", i);
}
}
return 0;
}
Here is my attempt at the inline:
.LC0:
.string "please input an interger value 1-99"
.LC1:
.string "%d"
.LC2:
.string "you entered %d"
.LC3:
.string "Error invalid input"
.LC4:
.string "please input name"
.LC5:
.string "%s"
.LC6:
.string "Your name is %s"
main:
push rbp
mov rbp, rsp
sub rsp, 32
mov DWORD PTR [rbp-8], 0
mov DWORD PTR [rbp-12], 0
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
lea rax, [rbp-12]
mov rsi, rax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call scanf
mov eax, DWORD PTR [rbp-12]
mov esi, eax
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
mov eax, DWORD PTR [rbp-12]
test eax, eax
jg .L2
mov edi, OFFSET FLAT:.LC3
mov eax, 0
call printf
.L2:
mov eax, DWORD PTR [rbp-12]
test eax, eax
jle .L3
mov edi, OFFSET FLAT:.LC4
mov eax, 0
call printf
lea rax, [rbp-32]
mov rsi, rax
mov edi, OFFSET FLAT:.LC5
mov eax, 0
call scanf
mov DWORD PTR [rbp-4], 1
jmp .L4
.L5:
lea rax, [rbp-32]
mov rsi, rax
mov edi, OFFSET FLAT:.LC6
mov eax, 0
call printf
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
add DWORD PTR [rbp-4], 1
.L4:
mov eax, DWORD PTR [rbp-12]
cmp DWORD PTR [rbp-4], eax
jle .L5
.L3:
mov eax, 0
leave
ret
So I guess my question is, am I on the right track? I find this stuff extremely confusing. It doesn't help that all my instructor sais is that "I am on the right track" so any help would be much appreciated! (Whenever I run this I get an error on the first line complaining about the ".")
Thank you guys! Figured out that the inline was unnecessary and that we were supposed to be doing regular assembly! Really appreciate the help!

assembly output + questions about stacks

I was studying one of my courses when I ran into a specific exercise that I cannot seem to resolve... It is pretty basic because I am VERY new to assembly. So lets begin.
I have a C function
unsigned int func(int *ptr, unsigned int j) {
unsigned int res = j;
int i = ptr[j+1];
for(; i<8; ++i) {
res >>= 1;
}
return res;
}
I translated it with gcc to assembly
.file "func.c"
.intel_syntax noprefix
.text
.globl func
.type func, #function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
mov eax, DWORD PTR [rbp-28]
mov DWORD PTR [rbp-8], eax
mov eax, DWORD PTR [rbp-28]
add eax, 1
mov eax, eax
lea rdx, [0+rax*4]
mov rax, QWORD PTR [rbp-24]
add rax, rdx
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-4], eax
jmp .L2
.L3:
shr DWORD PTR [rbp-8]
add DWORD PTR [rbp-4], 1
.L2:
cmp DWORD PTR [rbp-4], 7
jle .L3
mov eax, DWORD PTR [rbp-8]
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",#progbits
The question is as follow. what is the command that place j (variable in the c function) on top of the stack?
I sincerely cannot find out please enlighten me XD.
The variable j is the second parameter for func; it is stored in the register esi in the x86-64 System V ABI calling convention. This instruction mov DWORD PTR [rbp-28], esi put j into the stack.
You can see it very clearly by writing a simple function that calls "func" and compiling it with -O0 (or with -O2 and marking it as noinline, or only providing a prototype so there's nothing for the compiler to inline).
unsigned int func(int *ptr, unsigned int j) {
unsigned int res = j;
int i = ptr[j+1];
for(; i<8; ++i) {
res >>= 1;
}
return res;
}
int main()
{
int a = 1;
int array[10];
func (array, a);
return 0;
}
Using the Godbolt compiler explorer, we can easily get gcc -O0 -fverbose-asm assembly output.
Please focus on the following instructions:
# in main:
...
mov DWORD PTR [rbp-4], 1
mov edx, DWORD PTR [rbp-4]
...
mov esi, edx
...
func(int*, unsigned int):
...
mov DWORD PTR [rbp-28], esi # j, j
...
j, j is a comment added by gcc -fverbose-asm tell you that the source and destination operands are both the C variable j in that instruction.
The full assembly instructions:
func(int*, unsigned int):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
mov eax, DWORD PTR [rbp-28]
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-28]
add eax, 1
mov eax, eax
lea rdx, [0+rax*4]
mov rax, QWORD PTR [rbp-24]
add rax, rdx
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-8], eax
jmp .L2
.L3:
shr DWORD PTR [rbp-4]
add DWORD PTR [rbp-8], 1
.L2:
cmp DWORD PTR [rbp-8], 7
jle .L3
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-4], 1
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-48]
mov esi, edx
mov rdi, rax
call func(int*, unsigned int)
mov eax, 0
leave
ret
Taking into account these instructions
mov eax, DWORD PTR [rbp-28]
add eax, 1
it seems that j is stored at address rbp-28 While ptr is stored at address rbp-24.
These are instructions where the values are stored in the stack
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
It seems the arguments are passed to the function using registers rdi and esi.
Compilers can optimize their calls of functions and use registers instead of the stack to pass arguments of small sizes to functions. Within the functions they can use the stack to temporary store the arguments passed through registers.
Just a suggestion for further explorations on your own. Use gcc -O0 -g2 f.c -Wa,-adhln. It will turn off optimizations and generate assembly code intermixed with the source. It might give you better ideas about what it does.
As an alternative you can use the objdump -Sd f.o on the output '.o' or executable. Just make sure that you add debugging info and turn off optimizations at compilation.

GCC vs CLANG pointer to char* optimization

I have this code:
mainP.c:
int main(int c, char **v){
char *s = v[0];
while (*s++ != 0) {
if ((*s == 'a') && (*s != 'b')) {
return 1;
}
}
return 0;
}
which I compile with clang and gcc generating the assembly code to compare the optimizations:
clang-3.9 -S -masm=intel -O3 mainP.c
gcc -S -masm=intel -O3 mainP.c
The compiler version are:
clang version 3.9.1-9 (tags/RELEASE_391/rc2)
Target: x86_64-pc-linux-gnu
gcc (Debian 6.3.0-18) 6.3.0 20170516
The 2 resulting assembly codes are:
gcc assembly code:
main:
.LFB0:
.cfi_startproc
mov rax, QWORD PTR [rsi]
jmp .L2
.p2align 4,,10
.p2align 3
.L4:
cmp BYTE PTR [rax], 97
je .L5
.L2:
add rax, 1
cmp BYTE PTR -1[rax], 0
jne .L4
xor eax, eax
ret
.L5:
mov eax, 1
ret
clang assembly code:
main: # #main
.cfi_startproc
# BB#0:
mov rcx, qword ptr [rsi]
mov dl, byte ptr [rcx]
inc rcx
.p2align 4, 0x90
.LBB0_1: # =>This Inner Loop Header: Depth=1
xor eax, eax
test dl, dl
je .LBB0_3
# BB#2: # in Loop: Header=BB0_1 Depth=1
movzx edx, byte ptr [rcx]
inc rcx
mov eax, 1
cmp dl, 97
jne .LBB0_1
.LBB0_3:
ret
I notice this: in the gcc assembly code, *s is accessed twice in the loop while *s is accessed only once the clang assembly code.
is there an explanation for the difference?
Then after changing the C code a bit (adding a local char variable), I get about same the assembly code with GCC:
int main(int c, char **v){
char *s = v[0];
char ch;
ch = *s;
while (ch != 0) {
if ((ch == 'a') && (ch != 'b')) {
return 1;
}
ch = *s++;
}
return 0;
}
Resulting assembly code with GCC:
main:
.LFB0:
.cfi_startproc
mov rax, QWORD PTR [rsi]
movzx edx, BYTE PTR [rax]
test dl, dl
je .L6
add rax, 1
cmp dl, 97
jne .L5
jmp .L8
.p2align 4,,10
.p2align 3
.L4:
cmp dl, 97
je .L8
.L5:
add rax, 1
movzx edx, BYTE PTR -1[rax]
test dl, dl
jne .L4
.L6:
xor eax, eax
ret
.L8:
mov eax, 1
ret
The explanation for the difference: compiler optimizations are hard to do, and different compilers will optimize your code in different ways.
Because the expression is relatively simple, we can assume that *s is the same value in both places and only one load is really needed.
But figuring out that optimization is tricky, because the compiler has to absolutely "know" that "whatever s points to" can't be changed between the first reference and the second.
In your example, clang is better at figuring that out than gcc is.

incrementing struct members

Say I have a struct defined as follows
struct my_struct
{
int num;
};
....
Here I have a pointer to my_struct and I want to do an increment on num
void foo(struct my_struct* my_ptr)
{
// increment num
// method #1
my_ptr->num++;
// method #2
++(my_ptr->num);
// method #3
my_ptr->++num;
}
Do these 3 ways of incrementing num do the same thing?
While we're at it, is it true that pre-increment is more efficient than post-increment?
Thanks!
First two will have the same effect (when on a line on their own like that), but the third method isn't valid C code (you can't put the ++ there).
As for efficiency, there is no difference. The difference you may have heard people talking about is when, in C++, you increment a non-pointer data type, such as an iterator. In some cases, pre-increment can be faster there.
You can see the generated code using GCC Explorer.
void foo(struct my_struct* my_ptr)
{
my_ptr->num++;
}
void bar(struct my_struct* my_ptr)
{
++(my_ptr->num);
}
Output:
foo(my_struct*): # #foo(my_struct*)
incl (%rdi)
ret
bar(my_struct*): # #bar(my_struct*)
incl (%rdi)
ret
As you can see, there's no difference whatsoever.
The only possible difference between the first two is when you use them in expressions:
my_ptr->num = 0;
int x = my_ptr->num++; // x = 0
my_ptr->num = 0;
int y = ++my_ptr->num; // y = 1
If your only intention is to increment the value of num then the 1st and 2nd method will yield same intented result to the callee method.
However, if you change your code to the following, you can see the difference between the code generated by gcc (assembly level code):
struct my_struct
{
int num;
};
void foo(struct my_struct* my_ptr)
{
printf("\nPost Increment: %d", my_ptr->num++);
}
int main()
{
struct my_struct a;
a.num = 10;
foo(&a);
}
Now compile it using: gcc -masm=intel -S structTest.c -o structTest.s
This asks gcc to generate the assembly code:
Open structTest.s in a text editor.
foo:
.LFB0:
push rbp
mov rbp, rsp
sub rsp, 16
**mov QWORD PTR [rbp-8], rdi**
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov edx, eax
**lea ecx, [rax+1]**
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], ecx
mov eax, OFFSET FLAT:.LC0
mov esi, edx
mov rdi, rax
mov eax, 0
call printf
leave
ret
.cfi_endproc
main:
.LFB1:
push rbp
mov rbp, rsp
sub rsp, 16
**mov DWORD PTR [rbp-16], 10
lea rax, [rbp-16]
mov rdi, rax
call foo**
leave
ret
.cfi_endproc
And when you change the operation to pre-increment, the follwoing code is generated:
foo:
.LFB0:
.cfi_startproc
push rbp
mov rbp, rsp
sub rsp, 16
**mov QWORD PTR [rbp-8], rdi**
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
**lea edx, [rax+1]**
mov rax, QWORD PTR [rbp-8]
**mov DWORD PTR [rax], edx**
mov rax, QWORD PTR [rbp-8]
**mov edx, DWORD PTR [rax]**
mov eax, OFFSET FLAT:.LC0
mov esi, edx
mov rdi, rax
mov eax, 0
call printf
leave
ret
.cfi_endproc
So, you would see that in the second case, the compiler increments the num value and passes on this num value to printf().
In terms of performance, I would expect the post-increment to be more efficient since the memory locations are touched a fewer number of times.
The important lines have been marked between ** in the above code.

Resources