Why does this generated assembly code seem to contain nonsense? [duplicate] - c

This question already has an answer here:
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
Closed 3 years ago.
I used https://godbolt.org/ with "x86-64 gcc 9.1" to assemble the following C code to understand why passing a pointer to a local variable as a function argument works. Now I have difficulties to understand some steps.
I commented on the lines I have difficulties with.
void printStr(char* cpStr) {
printf("str: %s", cpStr);
}
int main(void) {
char str[] = "abc";
printStr(str);
return 0;
}
.LC0:
.string "str: %s"
printStr:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when using it just for the pointer to str[0] which is 4 bytes long?
mov QWORD PTR [rbp-8], rdi ; why copy rdi to the stack...
mov rax, QWORD PTR [rbp-8] ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when "abc" is just 4 bytes long?
mov DWORD PTR [rbp-4], 6513249
lea rax, [rbp-4] ; pointer to str[0] copied into rax (*)
mov rdi, rax ; why copy the pointer to str[0] to rdi?
call printStr
mov eax, 0
leave
ret

Thanks to the help of Jester I could solve my confusion. The following code is compiled with the "-O1" flag of GCC (for me the best optimization level to understand what's going on):
.LC0:
.string "str: %s"
printStr:
sub rsp, 8
; now the call to printf gets prepared, rdi = first argument, rsi = second argument
mov rsi, rdi ; move str[0] to rsi
mov edi, OFFSET FLAT:.LC0 ; move address of static string literal "str: %s" to edi
mov eax, 0 ; set eax to the number of vector registers used, because printf is a varargs function
call printf
add rsp, 8
ret
main:
sub rsp, 24
mov DWORD PTR [rsp+12], 6513249 ; create string "abc" on the stack
lea rdi, [rsp+12] ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
call printStr
mov eax, 0
add rsp, 24
ret
As Jester said, the 16 bytes were allocated for alignment. There is a good post on Stack Overflow which explains this here.
Edit:
There is a post on Stack Overflow which explains why al is zeroed before a call to a varargs function here.

Related

In which register does the scanf function store input values?

I have the following disassembly of a main function in which a user input is stored using scanf function (at address 0x0000089c). Due to the comparison that is made, I suppose that the user input is stored into the rsp register but I cannot figure out why, as rsp doesn't seem to be pushed on the stack (at least, not near the call to the scanf function).
Here is the disassembly:
0x00000850 sub rsp, 0x18
0x00000854 mov rax, qword fs:[0x28]
0x0000085d mov qword [canary], rax
0x00000862 xor eax, eax
0x00000864 call fcn.00000a3c
0x00000869 lea rsi, str.Insert_input:
0x00000870 mov edi, 1
0x00000875 xor eax, eax
0x00000877 mov dword [rsp], 0
0x0000087e mov dword [var_4h], 0
0x00000886 call sym.imp.__printf_chk
0x0000088b lea rdx, [var_4h]
0x00000890 lea rdi, str.u__u ; "%u %u" ;const char *format
0x00000897 xor eax, eax
0x00000899 mov rsi, rsp
0x0000089c call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x000008a1 mov eax, dword [rsp]
0x000008a4 cmp eax, 0x1336
0x000008a9 jg 0x867
On x86_64, parameters are passed in registers, so your call to scanf has 3 parameters stored in 3 registers:
rdi pointer to the string "%u %u", the format to parse (two unsigned integers)
rsi should be a unsigned *, pointer to where to put the first parsed integer
rdx pointer to where to put the second parsed integer.
If you look just before the call, rsi is set to rsp (the stack pointer) while rdx is set to point at the global variable var_4h (an extern symbol not defined here).
The stack is used to hold local variables, and in this case rsp points at a block 0x18 "free" bytes (allocated in the first instruction in your block), which is enough space for 6 integers. The one at offset 0 from rsp is what rsi points to, and it is the value read by the mov instruction immediately after the call.

Understanding pointers in assembler from machine's view

Here is a basic program I written on the godbolt compiler, and it's as simple as:
#include<stdio.h>
void main()
{
int a = 10;
int *p = &a;
printf("%d", *p);
}
The results after compilation I get:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 10
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
Question: Pushing the rbp, making the stack frame by making a 16 byte block, how from a register, a value is moved to a stack location and vice versa, how the job of LEA is to figure out the address, I got this part.
Problem:
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
Lea -> getting address of rbp-12 into rax,
then moving the value which is the address of rbp-12 into rax,
but next line again says, move to rax, the value of rbp-8. This seems ambiguous. Then again moving the value of rax to eax. I don't understand the amount of work here. Why couldn't I have done
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov eax, QWORD PTR [rbp-8]
and be done with it? coz on the original line, rbp-12's address is stored onto rax, then rax stored to rbp-8. then rbp-8 stored again into rax, and then again rax is stored into eax? couldn't we have just copied the rbp-8 directly to eax? i guess not. But my question is why?
I know there is de-referencing in pointers, so How LEA helps grabbing the address of rbp-12, I understand, but on the next parts, when did it went from grabbing values from addresses I completely lost. And also, after that, I didn't understand any of the asm lines.
You're seeing very un-optimized code. Here's my line-by-line interpretation:
.LC0:
.string "%d" ; Format string for printf
main:
push rbp ; Save original base pointer
mov rbp, rsp ; Set base pointer to beginning of stack frame
sub rsp, 16 ; Allocate space for stack frame
mov DWORD PTR [rbp-12], 10 ; Initialize variable 'a'
lea rax, [rbp-12] ; Load effective address of 'a'
mov QWORD PTR [rbp-8], rax ; Store address of 'a' in 'p'
mov rax, QWORD PTR [rbp-8] ; Load 'p' into rax (even though it's already there - heh!)
mov eax, DWORD PTR [rax] ; Load 32-bit value of '*p' into eax
mov esi, eax ; Load value to print into esi
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
mov eax, 0 ; Zero out eax (not sure why -- likely printf call protocol)
call printf ; Make the printf call
nop ; No-op (not sure why)
leave ; Remove the stack frame
ret ; Return
Compilers, when not optimizing, generate code like this as they parse the code you gave them. It's doing a lot of unnecessary stuff, but it is quicker to generate and makes using a debugger easier.
Compare this with the optimized code (-O2):
.LC0:
.string "%d" ; Format string for printf
main:
mov esi, 10 ; Don't need those variables -- just a 10 to pass to printf!
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
xor eax, eax ; It's a few cycles faster to xor a register with itself than to load an immediate 0
jmp printf ; Just jmp to printf -- it will handle the return
The optimizer found that the variables weren't necessary, so no stack frame is created. Nothing is left but the printf call! And that's done as a jmp since nothing else need be done here when the printf is complete.

how to pass a variable by reference to a c function `sscanf()`

the following code tries to read the command line arguments and then scans them with sscanf() and use the result to emit utf8 text.
I'm failing to call sscanf() and getting segfault error at the line where I call this function
I have already debugged this and I know where is the problem but not how to solve it.
global main
extern puts
extern sscanf
extern printf
extern emit_utf_8
section .text
main:
cmp rdi, 2
jl argumentsError
add rsi, 8 ; skip the name of the program
forloop:
mov r12, rsi
push rdi
push rsi
push r12
sub rsp, 8 ; must align stack before call
; start of for bloc
xor rax, rax
mov rdi, qword [r12]
mov rsi, codePointFormat
mov rdx, qword [codePoint]
call sscanf
cmp rax, 1
je ifthen
jmp else
ifthen:
mov rdi, codePoint
call emit_utf_8
jmp endif
else:
mov rdi, incorrectFormat
mov rsi, r12
call printf
endif:
; end of for bloc
add rsp, 8 ; restore %rsp to pre-aligned value
pop r12
pop rsi
pop rdi
add rsi, 8 ; point to next argument
dec rdi ; count down
jnz forloop ; if not done counting keep going
ret
argumentsError: mov rdi, argumentsRequiredMessage
call puts
mov rdi, argumentDescription
call puts
xor rax, rax
inc rax
ret
section .data
argumentsRequiredMessage:
db "This program requires one or more command line arguments,", 0
argumentDescription: db "one for each code point to encode as UTF-8.", 0
incorrectFormat: db "(%s incorrect format)", 0
codePointFormat: db "U+%6X", 0
section .bss
codePoint: resb 8 ; The code point from sscanf should go here.
Is there a way to pass that third argument?
sscanf() signature.
in __cdecl sscanf(const char *const _buffer, const char *const _Format, ...)
I'm using ubuntu 19.04 64 bit

How are allocated arrays declared in a loop?

I'm puzzled over this function.
int i;
for(i = 1; i<10; i++){
int arr[i];
printf("%d\n",sizeof(arr));
}
return 0;
How can the space grow in a bounded (by ESP) stack memory?
Is there a sort of compilation trick?
EDIT for explanation:
Shouldn't the stack be something like that?
0 ---> val of i uninitialized
-4 ---> arr[0] uninitialized
and after the first loop
0 ---> val of i uninitialized
-4 ---> arr[1] uninitialized
-8 ---> arr[0] uninitialized
I'm tempted to say: is ESP moving below each iteration of the loop?
How can the space grow in a bounded size stack memory?
You refer to the space of char arr - its space does not grow. It's a local variable inside the scope of the for loop. So everytime the loop has a new i it's a brand new char arr.
On every loop there is allocated stack for the array and then dealocated.
A bit different example
#include "stdio.h"
#include "string.h"
int h(int x)
{
for(int i = 1; i<x; i++){
char arr[i];
memset(arr, i, sizeof(arr));
printf("%d\n",sizeof(arr));
}
return 0;
}
int main()
{
h(50);
}
in the compiled code
.LC0:
.string "%d\n"
h:
push rbp
mov rbp, rsp
push r13
push r12
mov r12d, edi
push rbx
mov ebx, 1
push rax
.L2:
cmp r12d, ebx
jle .L6
lea rax, [rbx+15]
mov r13, rsp
mov ecx, ebx
mov rsi, rbx
and rax, -16
sub rsp, rax
mov eax, ebx
inc rbx
mov rdi, rsp
rep stosb
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
mov rsp, r13
jmp .L2
.L6:
lea rsp, [rbp-24]
xor eax, eax
pop rbx
pop r12
pop r13
pop rbp
ret
main:
push rax
mov edi, 50
call h
xor eax, eax
pop rdx
ret
lines 15,19 & 20 allocate the space
and thew line 28 deallocates the space for the array
https://godbolt.org/z/msgrc2
Is there a sort of compilation trick?
Yes, sort of. It uses VLAs (https://en.wikipedia.org/wiki/Variable-length_array)
Godbolt is very useful for inspecting things like this:
https://godbolt.org/z/_uR9Ac
As you can see the -Wvla warning is indeed triggered for the line in question.

C pointers and references

I would like to know what's really happening calling & and * in C.
Is that it costs a lot of resources? Should I call & each time I wanna get an adress of a same given variable or keep it in memory i.e in a cache variable. Same for * i.e when I wanna get a pointer value ?
Example
void bar(char *str)
{
check_one(*str)
check_two(*str)
//... Could be replaced by
char c = *str;
check_one(c);
check_two(c);
}
I would like to know what's really happening calling & and * in C.
There's no such thing as "calling" & or *. They are the address operator, or the dereference operator, and instruct the compiler to work with the address of an object, or with the object that a pointer points to, respectively.
And C is not C++, so there's no references; I think you just misused that word in your question's title.
In most cases, that's basically two ways to look at the same thing.
Usually, you'll use & when you actually want the address of an object. Since the compiler needs to handle objects in memory with their address anyway, there's no overhead.
For the specific implications of using the operators, you'll have to look at the assembler your compiler generates.
Example: consider this trivial code, disassembled via godbolt.org:
#include <stdio.h>
#include <stdlib.h>
void check_one(char c)
{
if(c == 'x')
exit(0);
}
void check_two(char c)
{
if(c == 'X')
exit(1);
}
void foo(char *str)
{
check_one(*str);
check_two(*str);
}
void bar(char *str)
{
char c = *str;
check_one(c);
check_two(c);
}
int main()
{
char msg[] = "something";
foo(msg);
bar(msg);
}
The compiler output can far wildly depending on the vendor and optimization settings.
clang 3.8 using -O2
check_one(char): # #check_one(char)
movzx eax, dil
cmp eax, 120
je .LBB0_2
ret
.LBB0_2:
push rax
xor edi, edi
call exit
check_two(char): # #check_two(char)
movzx eax, dil
cmp eax, 88
je .LBB1_2
ret
.LBB1_2:
push rax
mov edi, 1
call exit
foo(char*): # #foo(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB2_3
movzx eax, al
cmp eax, 120
je .LBB2_2
pop rax
ret
.LBB2_3:
mov edi, 1
call exit
.LBB2_2:
xor edi, edi
call exit
bar(char*): # #bar(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB3_3
movzx eax, al
cmp eax, 120
je .LBB3_2
pop rax
ret
.LBB3_3:
mov edi, 1
call exit
.LBB3_2:
xor edi, edi
call exit
main: # #main
xor eax, eax
ret
Notice that foo and bar are identical. Do other compilers do something similar? Well...
gcc x64 5.4 using -O2
check_one(char):
cmp dil, 120
je .L6
rep ret
.L6:
push rax
xor edi, edi
call exit
check_two(char):
cmp dil, 88
je .L11
rep ret
.L11:
push rax
mov edi, 1
call exit
bar(char*):
sub rsp, 8
movzx eax, BYTE PTR [rdi]
cmp al, 120
je .L16
cmp al, 88
je .L17
add rsp, 8
ret
.L16:
xor edi, edi
call exit
.L17:
mov edi, 1
call exit
foo(char*):
jmp bar(char*)
main:
sub rsp, 24
movabs rax, 7956005065853857651
mov QWORD PTR [rsp], rax
mov rdi, rsp
mov eax, 103
mov WORD PTR [rsp+8], ax
call bar(char*)
mov rdi, rsp
call bar(char*)
xor eax, eax
add rsp, 24
ret
Well, if there were any doubt foo and bar are equivalent, a least by the compiler, I think this:
foo(char*):
jmp bar(char*)
is a strong argument they indeed are.
In C, there's no runtime cost associated with either the unary & or * operators; both are evaluated at compile time. So there's no difference in runtime between
check_one(*str)
check_two(*str)
and
char c = *str;
check_one( c );
check_two( c );
ignoring the overhead of the assignment.
That's not necessarily true in C++, since you can overload those operators.
tldr;
If you are programming in C, then the & operator is used to obtain the address of a variable and * is used to get the value of that variable, given it's address.
This is also the reason why in C, when you pass a string to a function, you must state the length of the string otherwise, if someone unfamiliar with your logic sees the function signature, they could not tell if the function is called as bar(&some_char) or bar(some_cstr).
To conclude, if you have a variable x of type someType, then &x will result in someType* addressOfX and *addressOfX will result in giving the value of x. Functions in C only take pointers as parameters, i.e. you cannot create a function where the parameter type is &x or &&x
Also your examples can be rewritten as:
check_one(str[0])
check_two(str[0])
AFAIK, in x86 and x64 your variables are stored in memory (if not stated with register keyword) and accessed by pointers.
const int foo = 5 equal to foo dd 5 and check_one(*foo) equal to push dword [foo]; call check_one.
If you create additional variable c, then it looks like:
c resd 1
...
mov eax, [foo]
mov dword [c], eax ; Variable foo just copied to c
push dword [c]
call check_one
And nothing changed, except additional copying and memory allocation.
I think that compiler's optimizer deals with it and makes both cases as fast as it is possible. So you can use more readable variant.

Resources