What is the decompiled (C) code construct of this assembly x86 code? - c

This code compares every char of a string (located at ebp+arg_0) with different constants (ASCII chars) like 'I', 'o' and 'S'. I guess, based on other code parts, this code is originally written in C.
This compare-code-part looks very inefficient to. My question, how do you think this code will look in C? What code construct was used originally? my thoughts so far
It's not a for loop. Because i don't see any upward jump and stop-condition.
It's not a while/case/switch code construct
My best guess is that this are a lot of consecutive if/else statements.
Can you help?
Yes it's part of a challenge, i already have the flag/solution, no worries about that. Just trying to better understand the code.

It's not a for loop. Because i don't see any upward jump and stop-condition.
Correct.
It's not a while/case/switch code constuct
It can't be, it compares different indicies of the array.
My best guess is that this are a lot of consecutive if/elses. Can you help?
Looks like it could be this code:
void f(const char* arg_0) {
if(arg_0[4] == 'I' && arg_0[5] == 'o' && arg_0[6] == 'S') {
printf("Gratz man :)");
exit(0); //noreturn, hence your control flow ends here in the assembly
}
puts("Wrong password"); // Or `printf("Wrong password\n");` which gets optimized to `puts`
// leave, retn
}
This is how gcc compiles it without optimizations:
.LC0:
.string "Gratz man :)"
.LC1:
.string "Wrong password"
f(char const*):
push ebp
mov ebp, esp
sub esp, 8
mov eax, DWORD PTR [ebp+8]
add eax, 4
movzx eax, BYTE PTR [eax]
cmp al, 73
jne .L2
mov eax, DWORD PTR [ebp+8]
add eax, 5
movzx eax, BYTE PTR [eax]
cmp al, 111
jne .L2
mov eax, DWORD PTR [ebp+8]
add eax, 6
movzx eax, BYTE PTR [eax]
cmp al, 83
jne .L2
sub esp, 12
push OFFSET FLAT:.LC0
call printf
add esp, 16
sub esp, 12
push 0
call exit
.L2:
sub esp, 12
push OFFSET FLAT:.LC1
call puts
add esp, 16
nop
leave
ret
Looks very similar to your disassembled code.
This compare-code-part looks very inefficient
Looks like it was compiled without optimizations. With optimizations enabled, gcc compiled the code to:
.LC0:
.string "Gratz man :)"
.LC1:
.string "Wrong password"
f(char const*):
sub esp, 12
mov eax, DWORD PTR [esp+16]
cmp BYTE PTR [eax+4], 73
jne .L2
cmp BYTE PTR [eax+5], 111
je .L5
.L2:
mov DWORD PTR [esp+16], OFFSET FLAT:.LC1
add esp, 12
jmp puts
.L5:
cmp BYTE PTR [eax+6], 83
jne .L2
sub esp, 12
push OFFSET FLAT:.LC0
call printf
mov DWORD PTR [esp], 0
call exit
Not sure why gcc decided to jump down and back up again instead of a straight line of jnes. Also, the ret is gone, your printf got tail-call-optimized, i.e. a jmp printf istead of a call printf followed by a ret.

The first argument (arg_0) is a pointer to the given password string, e.g., const char *arg_0. This pointer (the address of the first character) is loaded into the eax register (mov eax, [ebp+arg_0]), and then the index of the current character is added to advance it to that index (add eax, 4 etc.). The single byte at that address is then loaded into eax (movzx eax, byte ptr [eax]).
Then that byte/character is compared against the correct one (cmp eax, 'I', etc). If the result is not zero (i.e., if they are not equal), the program jumps to the "Wrong password" branch (jnz - jump if not zero), otherwise it continues on to the next comparison (and ultimately success).
The nearest direct C equivalent would therefore be something like:
void check(const char *arg_0) {
// presumably comparisons 0-3 are omitted
if (arg_0[4] != 'I') goto fail;
if (arg_0[5] != 'o') goto fail;
if (arg_0[6] != 'S') goto fail;
printf("Gratz man :)");
exit(0);
fail:
puts("Wrong password");
}
(Of course the actual C code is unlikely to have looked like this, since the goto fail arrangement is not typical.)

Related

Am I experiencing demand paging when not altering the values of a newly created array?

I'm learning about OSs memory management and just learned about virtual memory and how it can be implemented using demand paging.
I made this simple program:
#include<stdio.h>
#include<stdlib.h>
int main(void){
int x=0;
scanf("%d",&x);
if(x==1){
int *a=malloc(1073741824*sizeof(int));
while(1){
for(size_t i=0;i<1073741824;i++){
a[i]=1;
}
}
}
else if(x==2){
int *b=malloc(1073741824*sizeof(int));
while(1);
}
return 0;
}
It has 2 paths:
One allocated an array of 4 gb and keeps changing its values.
The other allocated an array of 4 gb, but doesn't change its values.
As I expected, after running the first option, the memory of the program increases to about 4 gb, but the second one doesn't change. I'm guessing this is due to the memory not being accessed, so its pages are "swapped out" to a backing store, until they're needed again.
Is my understanding correct?
Demand paging is when you access pages of memory that aren't in Linux's page cache. For example, if you use memory mapping you can access a file on disk as if it were in RAM. When you dereference a memory mapped pointer Linux first checks the page cache for the data; if the page isn't in the cache then it page faults and loads the missing pages from the disk. Your program doesn't see all the on-demand disk access going on behind the scenes.
Demand paging isn't relevant with a fresh malloc() call since there's nothing on disk. What you're seeing is called overcommit. Linux doesn't allocate memory when malloc() is called; it lies and always returns a pointer whether it can satisfy the request or not. The memory is only actually allocated when (if) you access it. If you ask for 4GB of memory but don't do anything with it nothing's actually allocated.
Unsurprisingly, lying doesn't always work out. Linux might not actually be able to satisfy a program's memory needs, but the program has no way of knowing because malloc() didn't return NULL. It's like an airline that overbooks its flights. The airline gives you a ticket, but when you show up at the airport they tell you they ran out of seats and kick you off the flight. You try to access the memory you allocated and Linux panics, invokes the OOM killer, and starts killing processes to free up memory. Who does it kill? Maybe your process. Maybe you're spared and others die. Suffice it to say, overcommit is a controversial feature.
If you enable code optimizations the generated code will not call malloc at all.
.LC0:
.string "%d"
main:
sub rsp, 24
mov edi, OFFSET FLAT:.LC0
xor eax, eax
lea rsi, [rsp+12]
mov DWORD PTR [rsp+12], 0
call __isoc99_scanf
mov eax, DWORD PTR [rsp+12]
cmp eax, 1
je .L4
cmp eax, 2
je .L6
xor eax, eax
add rsp, 24
ret
.L4:
mov eax, 1073741824
.L3:
sub rax, 1
jne .L3
jmp .L4
.L6:
jmp .L6
To prevent it make pointer volatile:
int main(void){
int x=0;
scanf("%d",&x);
if(x==1){
int * volatile a=malloc(1073741824*sizeof(int));
while(1){
for(size_t i=0;i<1073741824;i++){
a[i]=1;
}
}
}
else if(x==2){
int * volatile b=malloc(1073741824*sizeof(int));
while(1);
}
return 0;
}
Now it should malloc in both cases:
.LC0:
.string "%d"
main:
sub rsp, 24
mov edi, OFFSET FLAT:.LC0
xor eax, eax
lea rsi, [rsp+4]
mov DWORD PTR [rsp+4], 0
call __isoc99_scanf
mov eax, DWORD PTR [rsp+4]
cmp eax, 1
je .L10
cmp eax, 2
je .L11
xor eax, eax
add rsp, 24
ret
.L10:
mov edi, 1
sal rdi, 32
call malloc
mov QWORD PTR [rsp+8], rax
.L4:
mov eax, 1073741824
.L3:
mov rdx, QWORD PTR [rsp+8]
sub rax, 1
jne .L3
jmp .L4
.L11:
mov edi, 1
sal rdi, 32
call malloc
mov QWORD PTR [rsp+8], rax
.L6:
jmp .L6

Understanding pointers in assembler from machine's view

Here is a basic program I written on the godbolt compiler, and it's as simple as:
#include<stdio.h>
void main()
{
int a = 10;
int *p = &a;
printf("%d", *p);
}
The results after compilation I get:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 10
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
Question: Pushing the rbp, making the stack frame by making a 16 byte block, how from a register, a value is moved to a stack location and vice versa, how the job of LEA is to figure out the address, I got this part.
Problem:
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
Lea -> getting address of rbp-12 into rax,
then moving the value which is the address of rbp-12 into rax,
but next line again says, move to rax, the value of rbp-8. This seems ambiguous. Then again moving the value of rax to eax. I don't understand the amount of work here. Why couldn't I have done
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov eax, QWORD PTR [rbp-8]
and be done with it? coz on the original line, rbp-12's address is stored onto rax, then rax stored to rbp-8. then rbp-8 stored again into rax, and then again rax is stored into eax? couldn't we have just copied the rbp-8 directly to eax? i guess not. But my question is why?
I know there is de-referencing in pointers, so How LEA helps grabbing the address of rbp-12, I understand, but on the next parts, when did it went from grabbing values from addresses I completely lost. And also, after that, I didn't understand any of the asm lines.
You're seeing very un-optimized code. Here's my line-by-line interpretation:
.LC0:
.string "%d" ; Format string for printf
main:
push rbp ; Save original base pointer
mov rbp, rsp ; Set base pointer to beginning of stack frame
sub rsp, 16 ; Allocate space for stack frame
mov DWORD PTR [rbp-12], 10 ; Initialize variable 'a'
lea rax, [rbp-12] ; Load effective address of 'a'
mov QWORD PTR [rbp-8], rax ; Store address of 'a' in 'p'
mov rax, QWORD PTR [rbp-8] ; Load 'p' into rax (even though it's already there - heh!)
mov eax, DWORD PTR [rax] ; Load 32-bit value of '*p' into eax
mov esi, eax ; Load value to print into esi
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
mov eax, 0 ; Zero out eax (not sure why -- likely printf call protocol)
call printf ; Make the printf call
nop ; No-op (not sure why)
leave ; Remove the stack frame
ret ; Return
Compilers, when not optimizing, generate code like this as they parse the code you gave them. It's doing a lot of unnecessary stuff, but it is quicker to generate and makes using a debugger easier.
Compare this with the optimized code (-O2):
.LC0:
.string "%d" ; Format string for printf
main:
mov esi, 10 ; Don't need those variables -- just a 10 to pass to printf!
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
xor eax, eax ; It's a few cycles faster to xor a register with itself than to load an immediate 0
jmp printf ; Just jmp to printf -- it will handle the return
The optimizer found that the variables weren't necessary, so no stack frame is created. Nothing is left but the printf call! And that's done as a jmp since nothing else need be done here when the printf is complete.

how to pass a variable by reference to a c function `sscanf()`

the following code tries to read the command line arguments and then scans them with sscanf() and use the result to emit utf8 text.
I'm failing to call sscanf() and getting segfault error at the line where I call this function
I have already debugged this and I know where is the problem but not how to solve it.
global main
extern puts
extern sscanf
extern printf
extern emit_utf_8
section .text
main:
cmp rdi, 2
jl argumentsError
add rsi, 8 ; skip the name of the program
forloop:
mov r12, rsi
push rdi
push rsi
push r12
sub rsp, 8 ; must align stack before call
; start of for bloc
xor rax, rax
mov rdi, qword [r12]
mov rsi, codePointFormat
mov rdx, qword [codePoint]
call sscanf
cmp rax, 1
je ifthen
jmp else
ifthen:
mov rdi, codePoint
call emit_utf_8
jmp endif
else:
mov rdi, incorrectFormat
mov rsi, r12
call printf
endif:
; end of for bloc
add rsp, 8 ; restore %rsp to pre-aligned value
pop r12
pop rsi
pop rdi
add rsi, 8 ; point to next argument
dec rdi ; count down
jnz forloop ; if not done counting keep going
ret
argumentsError: mov rdi, argumentsRequiredMessage
call puts
mov rdi, argumentDescription
call puts
xor rax, rax
inc rax
ret
section .data
argumentsRequiredMessage:
db "This program requires one or more command line arguments,", 0
argumentDescription: db "one for each code point to encode as UTF-8.", 0
incorrectFormat: db "(%s incorrect format)", 0
codePointFormat: db "U+%6X", 0
section .bss
codePoint: resb 8 ; The code point from sscanf should go here.
Is there a way to pass that third argument?
sscanf() signature.
in __cdecl sscanf(const char *const _buffer, const char *const _Format, ...)
I'm using ubuntu 19.04 64 bit

Why does this generated assembly code seem to contain nonsense? [duplicate]

This question already has an answer here:
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
Closed 3 years ago.
I used https://godbolt.org/ with "x86-64 gcc 9.1" to assemble the following C code to understand why passing a pointer to a local variable as a function argument works. Now I have difficulties to understand some steps.
I commented on the lines I have difficulties with.
void printStr(char* cpStr) {
printf("str: %s", cpStr);
}
int main(void) {
char str[] = "abc";
printStr(str);
return 0;
}
.LC0:
.string "str: %s"
printStr:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when using it just for the pointer to str[0] which is 4 bytes long?
mov QWORD PTR [rbp-8], rdi ; why copy rdi to the stack...
mov rax, QWORD PTR [rbp-8] ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when "abc" is just 4 bytes long?
mov DWORD PTR [rbp-4], 6513249
lea rax, [rbp-4] ; pointer to str[0] copied into rax (*)
mov rdi, rax ; why copy the pointer to str[0] to rdi?
call printStr
mov eax, 0
leave
ret
Thanks to the help of Jester I could solve my confusion. The following code is compiled with the "-O1" flag of GCC (for me the best optimization level to understand what's going on):
.LC0:
.string "str: %s"
printStr:
sub rsp, 8
; now the call to printf gets prepared, rdi = first argument, rsi = second argument
mov rsi, rdi ; move str[0] to rsi
mov edi, OFFSET FLAT:.LC0 ; move address of static string literal "str: %s" to edi
mov eax, 0 ; set eax to the number of vector registers used, because printf is a varargs function
call printf
add rsp, 8
ret
main:
sub rsp, 24
mov DWORD PTR [rsp+12], 6513249 ; create string "abc" on the stack
lea rdi, [rsp+12] ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
call printStr
mov eax, 0
add rsp, 24
ret
As Jester said, the 16 bytes were allocated for alignment. There is a good post on Stack Overflow which explains this here.
Edit:
There is a post on Stack Overflow which explains why al is zeroed before a call to a varargs function here.

C pointers and references

I would like to know what's really happening calling & and * in C.
Is that it costs a lot of resources? Should I call & each time I wanna get an adress of a same given variable or keep it in memory i.e in a cache variable. Same for * i.e when I wanna get a pointer value ?
Example
void bar(char *str)
{
check_one(*str)
check_two(*str)
//... Could be replaced by
char c = *str;
check_one(c);
check_two(c);
}
I would like to know what's really happening calling & and * in C.
There's no such thing as "calling" & or *. They are the address operator, or the dereference operator, and instruct the compiler to work with the address of an object, or with the object that a pointer points to, respectively.
And C is not C++, so there's no references; I think you just misused that word in your question's title.
In most cases, that's basically two ways to look at the same thing.
Usually, you'll use & when you actually want the address of an object. Since the compiler needs to handle objects in memory with their address anyway, there's no overhead.
For the specific implications of using the operators, you'll have to look at the assembler your compiler generates.
Example: consider this trivial code, disassembled via godbolt.org:
#include <stdio.h>
#include <stdlib.h>
void check_one(char c)
{
if(c == 'x')
exit(0);
}
void check_two(char c)
{
if(c == 'X')
exit(1);
}
void foo(char *str)
{
check_one(*str);
check_two(*str);
}
void bar(char *str)
{
char c = *str;
check_one(c);
check_two(c);
}
int main()
{
char msg[] = "something";
foo(msg);
bar(msg);
}
The compiler output can far wildly depending on the vendor and optimization settings.
clang 3.8 using -O2
check_one(char): # #check_one(char)
movzx eax, dil
cmp eax, 120
je .LBB0_2
ret
.LBB0_2:
push rax
xor edi, edi
call exit
check_two(char): # #check_two(char)
movzx eax, dil
cmp eax, 88
je .LBB1_2
ret
.LBB1_2:
push rax
mov edi, 1
call exit
foo(char*): # #foo(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB2_3
movzx eax, al
cmp eax, 120
je .LBB2_2
pop rax
ret
.LBB2_3:
mov edi, 1
call exit
.LBB2_2:
xor edi, edi
call exit
bar(char*): # #bar(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB3_3
movzx eax, al
cmp eax, 120
je .LBB3_2
pop rax
ret
.LBB3_3:
mov edi, 1
call exit
.LBB3_2:
xor edi, edi
call exit
main: # #main
xor eax, eax
ret
Notice that foo and bar are identical. Do other compilers do something similar? Well...
gcc x64 5.4 using -O2
check_one(char):
cmp dil, 120
je .L6
rep ret
.L6:
push rax
xor edi, edi
call exit
check_two(char):
cmp dil, 88
je .L11
rep ret
.L11:
push rax
mov edi, 1
call exit
bar(char*):
sub rsp, 8
movzx eax, BYTE PTR [rdi]
cmp al, 120
je .L16
cmp al, 88
je .L17
add rsp, 8
ret
.L16:
xor edi, edi
call exit
.L17:
mov edi, 1
call exit
foo(char*):
jmp bar(char*)
main:
sub rsp, 24
movabs rax, 7956005065853857651
mov QWORD PTR [rsp], rax
mov rdi, rsp
mov eax, 103
mov WORD PTR [rsp+8], ax
call bar(char*)
mov rdi, rsp
call bar(char*)
xor eax, eax
add rsp, 24
ret
Well, if there were any doubt foo and bar are equivalent, a least by the compiler, I think this:
foo(char*):
jmp bar(char*)
is a strong argument they indeed are.
In C, there's no runtime cost associated with either the unary & or * operators; both are evaluated at compile time. So there's no difference in runtime between
check_one(*str)
check_two(*str)
and
char c = *str;
check_one( c );
check_two( c );
ignoring the overhead of the assignment.
That's not necessarily true in C++, since you can overload those operators.
tldr;
If you are programming in C, then the & operator is used to obtain the address of a variable and * is used to get the value of that variable, given it's address.
This is also the reason why in C, when you pass a string to a function, you must state the length of the string otherwise, if someone unfamiliar with your logic sees the function signature, they could not tell if the function is called as bar(&some_char) or bar(some_cstr).
To conclude, if you have a variable x of type someType, then &x will result in someType* addressOfX and *addressOfX will result in giving the value of x. Functions in C only take pointers as parameters, i.e. you cannot create a function where the parameter type is &x or &&x
Also your examples can be rewritten as:
check_one(str[0])
check_two(str[0])
AFAIK, in x86 and x64 your variables are stored in memory (if not stated with register keyword) and accessed by pointers.
const int foo = 5 equal to foo dd 5 and check_one(*foo) equal to push dword [foo]; call check_one.
If you create additional variable c, then it looks like:
c resd 1
...
mov eax, [foo]
mov dword [c], eax ; Variable foo just copied to c
push dword [c]
call check_one
And nothing changed, except additional copying and memory allocation.
I think that compiler's optimizer deals with it and makes both cases as fast as it is possible. So you can use more readable variant.

Resources