How does assembly language work? - c

I am learning assembly and I am having a lot of trouble understanding the following code. Can someone clarify it?
Dump of assembler code for function main:
0x080483ed <+0>: push ebp
0x080483ee <+1>: mov ebp,esp
0x080483f0 <+3>: sub esp,0x10
0x080483f3 <+6>: mov DWORD PTR [ebp-0x8],0x0
0x080483fa <+13>: mov eax,DWORD PTR [ebp-0x8]
0x080483fd <+16>: add eax,0x1
0x08048400 <+19>: mov DWORD PTR [ebp-0x4],eax
0x08048403 <+22>: leave
0x08048404 <+23>: ret
So far, my understanding is the following:
Push something (don't know what) in ebp register. then move content of esp register into ebp (I think the data of ebp should be overwritten), then subtract 10 from the esp and store it in the esp (The function will take 10 byte, This reg is never used again, so no point of doing this operation). Now assign value 0 to the address pointed by 8 bytes less than ebp.
Now store that address into register eax. Now add 1 to the value pointed by eax (the previous value is lost). Now store the eax value on [ebp-0x4], then leave to the return address of main.
Here is my C code for the above program:
int main(){
int x=0;
int y = x+1;
}
Can someone tell me whether I am wrong about anything? I also don't understand the mov at <+13>: it seems to add 1 to the address ebp-0x8, but that is the address of int x, so x would no longer contain 0. Where am I wrong?

First of all, push ebp followed by mov ebp, esp are two instructions commonly found at the beginning of a procedure. The ESP register points to the top of the stack, so it changes constantly as the stack grows or shrinks. EBP is a helper register here. First we push the content of ebp onto the stack, then we copy ESP (the current stack top address) to ebp. That is why, when we refer to other items on the stack, we use the constant value of ebp (and not the changing one of esp).
sub esp, 0x10 ; means we reserve 16 bytes on the stack (0x10 is 16 in hex)
now for the real fun:
mov DWORD PTR [ebp-0x8],0x0 ; remember ebp was pointing at the stack
; top BEFORE reserving 16 bytes.
; DWORD PTR means "double-word pointer": the operand is 32 bits wide.
; so the whole instruction means
; "move 0 to the 32 bits of the stack in a place which
; starts at the address ebp-8".
; this is our int x = 0
mov eax,DWORD PTR [ebp-0x8] ; copy the value of x into the EAX register.
add eax,0x1 ; add 1 to the eax register (x in memory is unchanged)
mov DWORD PTR [ebp-0x4],eax ; send the result (which is in eax) to the stack address
; [ebp-4]; this is y
leave ; clean up the stack frame (mov esp, ebp then pop ebp, undoing the prologue).
ret ; let's say this instruction returns to the caller (it's slightly more
; complicated than that)
Hope this helps! :)

0x080483ed <+0>: push ebp
0x080483ee <+1>: mov ebp,esp
Setting up the stack frame. Save the old base pointer and set the top of the stack as the new base pointer. This allows local variables and arguments within this function to be referenced relative to ebp (base pointer). The advantage of this is that its value is stable unlike esp which is affected by pushes and pops.
0x080483f0 <+3>: sub esp,0x10
On the x86 platform the stack 'grows' downwards. Generally speaking this means esp has a lower value (address in memory) than ebp. When ebp == esp the stack has not reserved any memory for local variables. This does not mean it is 'empty' - a common usage of a stack is [ebp+0x8] for instance. In this case the code is looking for something on the stack which was previously pushed on prior to the call (this could be arguments in the stdcall convention).
Here the stack is extended by 16 bytes. More space is reserved than the two variables need, for alignment purposes.
0x080483f3 <+6>: mov DWORD PTR [ebp-0x8],0x0
The 4 bytes at [ebp-0x8] are initialised to the value 0. This is your x local variable.
0x080483fa <+13>: mov eax,DWORD PTR [ebp-0x8]
The 4 bytes at [ebp-0x8] are moved to a register. Arithmetic opcodes can not operate with two memory operands. Data needs to be moved to a register first before the arithmetic is performed. eax now holds the value of your x variable.
0x080483fd <+16>: add eax,0x1
The value of eax is increased so it now holds the value x + 1.
0x08048400 <+19>: mov DWORD PTR [ebp-0x4],eax
Stores the calculated value back on the stack. Notice the local variable is now [ebp-0x4] - this is your y variable.
0x08048403 <+22>: leave
Destroys the stack frame. Equivalent to mov esp, ebp followed by pop ebp: it releases the local variables and restores the old stack base pointer.
0x08048404 <+23>: ret
Pops the top of the stack treating the value as a return address and sets the program pointer (eip) to this value. The return address typically holds the address of the instruction directly after the call instruction that brought execution into this function.

Related

Assembly and callstack

I'm trying to get an understanding of assembly, but unfortunately I have trouble understanding how the following C code translates to assembly:
void test_function(int a, int b, int c, int d) {
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1,2,3,4);
}
The assembly of main() looks like this:
push ebp
mov ebp, esp
sub esp,0x18
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
mov DWORD PTR [esp+12], 0x4
mov DWORD PTR [esp+8], 0x3
mov DWORD PTR [esp+4], 0x2
mov DWORD PTR [esp], 0x1
call <test_function>
The assembly for test_function(...) looks like this:
push ebp
mov ebp, esp
sub esp,0x28
mov DWORD PTR [ebp-12], 0x7a69 ;this is 31337 in hexadecimal
mov BYTE PTR [ebp-40], 0x41 ;this is the 'A' in ASCII
leave
ret
What is hard for me to understand is:
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
Why are we ANDing esp with 0xfffffff0?
And why do we move a 0 to eax and sub the content of eax from esp?
Second:
Through sub esp,0x28 we are allocating 40 bytes of RAM. Why 40? The integer and the 10 chars of the array are altogether only 14 bytes, aren't they?
And why are we moving 0x7a69 to the position [ebp-12] and not to [ebp]? By operating mov ebp, esp I set ebp to the current ESP. Now ESP is pointing to the end of the stack. The last value I pushed on the stack was the ebp by operating push ebp. So EBP (= esp) points behind the saved ebp. So why couldn't I move 0x7a69 to [ebp] just directly behind the saved EBP?
And why is the 'A' moved to [ebp-40]?
This seems to be some standard compiler-generated assembler.
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
The and makes esp a multiple of 16, i.e. aligns it on a 16-byte boundary. Because it can only clear bits, it moves esp downward (the direction in which the stack grows), so it is essentially a subtraction, not an addition.
The following mov and sub reserve space for local variables. main has none, so the amount is 0x0 and the sub changes nothing. test_function does have local variables, so there 0x28 is subtracted from esp directly. The compiler has probably rounded this size up to some multiple for alignment as well, which would explain why it is larger than the 14 bytes the variables need. Lastly, [ebp-40] is the location within the reserved stack space that the compiler has assigned to buffer.

Causes and benefits of this improvement on gcc version >= 4.9.0 vs gcc version < 4.9?

I have recently exploited a dangerous program and found something interesting about the difference between versions of gcc on x86-64 architecture.
Note:
Wrongful usage of gets is not the issue here.
If we replace gets with any other functions, the problem doesn't change.
This is the source code I use:
#include <stdio.h>
int main()
{
char buf[16];
gets(buf);
return 0;
}
I use gcc.godbolt.org to disassemble the program with flag -m32 -fno-stack-protector -z execstack -g.
At the disassembled code, when gcc with version >= 4.9.0:
lea ecx, [esp+4] # begin of main
and esp, -16
push DWORD PTR [ecx-4] # push esp
push ebp
mov ebp, esp
/* between these comment is not related to the question
push ecx
sub esp, 20
sub esp, 12
lea eax, [ebp-24]
push eax
call gets
add esp, 16
mov eax, 0
*/
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret # end of main
But gcc with version < 4.9.0 just:
push ebp # begin of main
mov ebp, esp
/* between these comment is not related to the question
and esp, -16
sub esp, 32
lea eax, [esp+16]
mov DWORD PTR [esp], eax
call gets
mov eax, 0
*/
leave
ret # end of main
My question is: what causes this difference in the disassembled code, and what are its benefits? Does this technique have a name?
I can't say for sure without the actual values in:
and esp, 0xXX # XX is a number
but this looks a lot like extra code to align the stack to a larger value than the ABI requires.
Edit: The value is -16, which is 32-bit 0xFFFFFFF0 or 64-bit 0xFFFFFFFFFFFFFFF0 so this is indeed stack alignment to 16 bytes, likely meant for use of SSE instructions. As mentioned in comments, there is more code in the >= 4.9.0 version because it also aligns the frame pointer instead of only the stack pointer.
The i386 ABI, used for 32-bit programs, requires that a process, immediately after being loaded, has its stack aligned on 32-bit values:
%esp Performing its usual job, the stack pointer holds the address of the
bottom of the stack, which is guaranteed to be word aligned.
Compare this with the x86_64 ABI [1] used for 64-bit programs:
%rsp The stack pointer holds the address of the byte with lowest address which
is part of the stack. It is guaranteed to be 16-byte aligned at process entry
The opportunity given by AMD's new 64-bit technology to rewrite the old i386 ABI allowed a number of optimizations that had been held back by backward compatibility, among them a bigger (stricter?) stack alignment.
I won't dwell on the benefits of stack alignment but it suffices to say that if a 4-byte alignment was good, so is a 16-byte one.
So much that it is worth spending some instructions aligning the stack.
That's what GCC 4.9.0+ does, it aligns the stack at 16-bytes.
That explains the and esp, -16 but not the other instructions.
Aligning the stack with and esp, -16 is the fastest way to do it when the compiler only knows that the stack is 4-byte aligned (since esp MOD 16 can be 0, 4, 8 or 12).
However it is a destructive method, the compiler loses the original esp value.
But now comes a chicken-and-egg problem: if we save the original esp on the stack before aligning the stack, we lose track of it because we don't know how far the alignment lowers the stack pointer. If we save it after the alignment, well, we can't: we lost it in the alignment.
So the only possible solution is to save it in a register, align the stack and then save said register on the stack.
;Save the stack pointer in ECX, actually is ESP+4 but still does
lea ecx, [esp+4] #ECX = ESP+4
;Align the stack
and esp, -16 #This lowers ESP by 0, 4, 8 or 12
;IGNORE THIS FOR NOW
push DWORD PTR [ecx-4]
;Usual prolog
push ebp
mov ebp, esp
;Save the original ESP (before alignment), actually is ESP+4 but OK
push ecx
GCC saves esp+4 in ecx, I don't know why [2], but this value still does the trick.
The only mystery left is the push DWORD PTR [ecx-4].
But it turns out to be a simple mystery: for debugging purposes GCC pushes the return address just before the old frame pointer (before push ebp), which is where 32-bit tools expect it to be.
Since ecx=esp_o+4, where esp_o is the original stack pointer pre-alignment, [ecx-4] = [esp_o] = return address.
Note that three 4-byte values (12 bytes) have now been pushed since the alignment, so esp is 12 bytes below a 16-byte boundary; the local variable area must therefore be of size 16*k+4 to have the stack aligned at 16 bytes again.
In your example k is 1 and the area is of 20 bytes in size.
The subsequent sub esp, 12 is to align the stack for the gets function (the requirement is to have the stack aligned at the function call).
Finally, the code
mov ebp, esp
mov ecx, DWORD PTR [ebp-4] # ecx = saved esp
leave
lea esp, [ecx-4]
ret
The first instruction is a copy-paste error.
One could check it, or simply reason that
if it were there, [ebp-4] would be below the stack pointer (and there is no red zone in the i386 ABI).
The rest just undoes what was done in the prolog:
;Get the original stack pointer
mov ecx, DWORD PTR [ebp-4] ;ecx = esp_o+4
;Standard epilog
leave ;mov esp, ebp / pop ebp
;The stack pointer points to the copied return address
;Restore the original stack pointer
lea esp, [ecx-4] ;esp = esp_o
ret
GCC has to first get the original stack pointer (+4) saved on the stack, then restore the old frame pointer (ebp) and finally, restore the original stack pointer.
The return address is on the top of the stack when lea esp, [ecx-4] is executed, so in theory GCC could just return but it has to restore the original esp because main is not the first function to be executed in a C program, so it cannot leave the stack unbalanced.
[1] This is not the latest version, but the quoted text is unchanged in later editions.
[2] This has been discussed here on SO, but I can't remember whether in a comment or in an answer.

Why is the assembler using FS:40 [duplicate]

My hello & regards to all. I have a C program, basically wrote for testing Buffer overflow.
#include<stdio.h>
void display()
{
char buff[8];
gets(buff);
puts(buff);
}
main()
{
display();
return(0);
}
Now i disassemble display and main sections of it using GDB. The code:-
Dump of assembler code for function main:
0x080484ae <+0>: push %ebp # saving ebp to stack
0x080484af <+1>: mov %esp,%ebp # saving esp in ebp
0x080484b1 <+3>: call 0x8048474 <display> # calling display function
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
0x080484bb <+13>: pop %ebp # remove ebp from stack
0x080484bc <+14>: ret # return
End of assembler dump.
Dump of assembler code for function display:
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes lower placed value ( WHY???? )
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
0x0804848b <+23>: call 0x8048374 <gets@plt> # calling get, what is "@plt" ????
0x08048490 <+28>: lea -0xc(%ebp),%eax # LEA of 12 bytes lower to eax
0x08048493 <+31>: mov %eax,(%esp) # make esp point to eax contained address
0x08048496 <+34>: call 0x80483a4 <puts@plt> # again what is "@plt" ????
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
End of assembler dump.
So folks, please consider this my homework. I understand the rest of the code except for a few lines. I have put a big "WHY ????" and some more questions in the comments next to each such line. The first hurdle for me is the "mov %gs:0x14,%eax" instruction; I can't make a flow chart past it. Could somebody please explain what these few instructions are meant for and what they do in the program? Thanks...
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
Don't you have this?:
return(0);
They are probably related. :)
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
It means reading 4 bytes into eax from memory at address gs:0x14. gs is a segment register. Most likely thread-local storage (AKA TLS) is referenced through this register.
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Don't know. Could be optimization-related.
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
It makes eax point to a local variable that lives on the stack. sub $0x10,%esp allocated some space for them.
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
Wrong. It writes eax to the stack, to the stack top. It will be passed as an on-stack argument to the called function:
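0x0804848b <+23>: call 0x8048374 <gets@plt> # calling get, what is "@plt" ????
@plt means the call goes through the Procedure Linkage Table, a small stub used to reach functions that live in a shared library (here gets from the C library).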
By now you should've guessed what local variable that was. buff, what else could it be?
0x080484ac <+56>: leave # a new instruction, not known to me
Why don't you look it up in the CPU manual?
Now, I can probably explain the gs/TLS thing to you...
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
...
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me
0x080484ac <+56>
So, this code takes a value from the TLS (at gs:0x14) and stores it right below the saved ebp value (at ebp-4). Then there's your stuff with gets() and puts(). Then this code checks whether the copy of the value from the TLS is unchanged. xor %gs:0x14,%eax does the compare.
If XORed values are the same, the result of the XOR is 0 and flags.zf is 1. Else, the result isn't 0 and flags.zf is 0.
je 0x80484ac <display+56> checks flags.zf and skips call 0x8048394 <__stack_chk_fail@plt> if flags.zf = 1. IOW, this call is skipped if the copy of the value from the TLS is unchanged.
What is that all about? That's a way to try to catch a buffer overflow. If you write beyond the end of the buffer, you will overwrite that value copied from the TLS to the stack.
Why do we take this value from the TLS, why not just a constant, hard-coded value? We probably want to use different, non-hard-coded values to catch overflows more often (and so the value in the TLS will change from a run to another run of your program and it will be different in different threads of your program). That also lowers chances of successfully exploiting the buffer overflow by an attacker if the value is chosen randomly each time your program runs.
Finally, if the copy of the value is found to have been overwritten due to a buffer overflow, call 0x8048394 <__stack_chk_fail@plt> will call a special function dedicated to doing whatever's necessary, e.g. reporting a problem and terminating the program.
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
The gs segment can be used for thread local storage. E.g. it's used for errno, so that each thread in a multi-threaded program effectively has its own errno variable.
The function name above is a big clue. This must be a stack canary.
(leave is some CISC instruction that does everything you need to do before the actual ret. I don't know the details).
Others already explained the GS thing (has to do with threads)..
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Explaining this requires some history of the X86 architecture:
The xor eax, eax instruction clears all bits in register eax (loads a zero), but as you've already found out, this seems to be unnecessary because the register gets loaded with a new value in the next instruction.
However, xor eax, eax does something else on the x86 as well. You probably know that you can access parts of the register eax by using al, ah and ax. It has been that way since the 386, and it was okay back then when eax really was a single register.
However, this is no longer the case. The registers that you see and use in your code are just placeholders. Internally the CPU works with many more registers and a completely different instruction set. Instructions that you write are translated into this internal instruction set.
If you use AL, AH and EAX, for example, you may be using three different registers from the CPU's point of view.
Now if you access EAX after you have used AL or AH, the CPU has to merge these different registers back together to build a valid EAX value.
The line:
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
does not only clear register eax. It also tells the CPU that the renamed sub-registers AL, AH and AX can now be considered invalidated (set to zero), so the CPU does not have to do any sub-register merging.
Why is the compiler emitting this instruction?
One reason given is that the compiler does not know in which context display() will get called. You may call it from a piece of code that does lots of byte arithmetic using AL and AH. Without the XOR, the CPU might have to do the register merging, which costs extra cycles.
In this particular function there is also a more direct reason: eax still holds the copy of the guard value that was just loaded from %gs:0x14 and stored at -0x4(%ebp), and zeroing the register keeps that value from lingering in eax.
__stack_chk_fail is part of gcc's buffer overflow check. It uses libssp (stack-smashing protection). The mov at the beginning sets up a guard value on the stack, and the xor %gs:0x14... checks whether the guard is still intact. If it is, execution jumps to the leave (check the assembler documentation for it; it is a helper instruction for stack handling) and skips the call to __stack_chk_fail, which would abort the program and emit an error message.
You can disable this overflow check with the gcc option -fno-stack-protector.
And as already mentioned in the comments, xor x,x is just a quick way to clear x, and the final mov $0x0,%eax sets the return value of your main.

Mixing C and Assembly

I'm writing a program in assembly to read a disk through ports (0x1f0-0x1f7), and I'm mixing it with C. I have a function in assembly that I call from my C main function. My function has 1 parameter: sectors to read:
Kernel.c
extern int _readd(int nmrsector);
(...)
int sector = 257;
int error = _readd(sector);
if(error == 0) PrintString("Error"); //It is declared on my screen.h file
disk.asm
global _readd
_readd:
push eax
push ebx
push ecx
push edx
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, eax
cmp ecx, 256
jg short _fail
jne short _good
_fail:
xor eax, eax
leave
ret
_good:
xor eax, eax
mov eax, 12
leave
ret
It crashes when I run it in VirtualBox. Any ideas?
If you save CPU registers when you enter a function, you need to restore them before you return. Your PUSHes need to be matched with POPs.
Also, if you use a stack frame to access local variables and parameters, set up the frame (push ebp ; mov ebp, esp) before everything else, so you can refer to them more easily. Here [ebp+8] doesn't refer to a parameter, because you alter the stack before setting up the frame.

Disassembly of a C function

I was trying to understand disassembled code of the following function.
void func(char *string) {
printf("the string is %s\n",string);
}
The disassembled code is given below.
1) 0x080483e4 <+0>: push %ebp
2) 0x080483e5 <+1>: mov %esp,%ebp
3) 0x080483e7 <+3>: sub $0x18,%esp
4) 0x080483ea <+6>: mov $0x80484f0,%eax
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
7) 0x080483f6 <+18>: mov %eax,(%esp)
8) 0x080483f9 <+21>: call 0x8048300 <printf@plt>
Could anyone tell me what lines 4-7 mean (not the literal explanation)? Also, why are 24 bytes allocated on the stack on line 3?
Basically what happens here:
4) 0x080483ea <+6> : mov $0x80484f0,%eax
Load the address of "the string is %s\n" into eax.
5) 0x080483ef <+11>: mov 0x8(%ebp),%edx
Move the argument string into edx.
6) 0x080483f2 <+14>: mov %edx,0x4(%esp)
Store the value of edx (string) on the stack at esp+4, as the second argument of printf.
7) 0x080483f6 <+18>: mov %eax,(%esp)
Store the value of eax (the address of "the string is %s\n") at the top of the stack, as the first argument of printf; then printf is called.
sub $0x18,%esp is not there for local variables, since the function has none. Part of the space holds the two outgoing printf arguments stored at (%esp) and 0x4(%esp); the rest is extra space gcc adds, most likely for alignment.
The stack is a continuous region of memory that starts at a higher address and ends at esp. Whenever you need your stack to grow, you subtract from esp. Every function can have a frame on the stack. It is the part of the stack that the function owns and is responsible for cleaning after it is done. It means, when the function starts it decreases esp to create its frame. When it ends it increases it back. ebp usually points to the beginning of your frame.
Initially this function pushes ebp onto the stack so that it can be restored when the function ends, sets ebp = esp to mark the beginning of its frame, and allocates 24 bytes (0x18). Why 24? For alignment. The call already pushed a 4-byte return address and the function pushed 4 bytes for ebp: 4 + 4 + 24 = 32.
Lines 4-7 prepare the call to printf. It expects its arguments to be on the frame of the caller. When we read mov 0x8(%ebp), %edx, we are taking our argument char* string from the caller's frame. printf will do the same.
Note that your assembly is missing the leave and ret instructions that clear the stack and return to the caller.
