Given Assembly, translate to C - c

I am originally given the function prototype:
void decode1(int *xp, int *yp, int *zp)
now i am told to convert the following assembly into C code:
movl 8(%ebp), %edi //line 1 ;; gets xp
movl 12(%ebp), %edx //line 2 ;; gets yp
movl 16(%ebp),%ecx //line 3 ;; gets zp
movl (%edx), %ebx //line 4 ;; gets y
movl (%ecx), %esi //line 5 ;; gets z
movl (%edi), %eax //line 6 ;; gets x
movl %eax, (%edx) //line 7 ;; stores x into yp
movl %ebx, (%ecx) //line 8 ;; stores y into zp
movl %esi, (%edi) //line 9 ;; stores z into xp
These comments were not given to me in the problem this is what I believe they are doing but am not 100% sure.
My question is, for lines 4-6, am I able to assume that the command
movl (%edx), %ebx
movl (%ecx), %esi
movl (%edi), %eax
just creates a local variables to y,z,x?
also, do the registers that each variable get stored in i.e (edi,edx,ecx) matter or can I use any register in any order to take the pointers off of the stack?
C code:
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
If I wasn't given the function prototype how would I tell what type of return type is used?

Let's focus on a simpler set of instructions.
First:
movl 8(%ebp), %edi
will load into the EDI register the content of the 4 bytes that are situated on memory at 8 eight bytes beyond the address set in the EBP register. This special EBP usage is a convention followed by the compiler code generator, that per each function, saves the stack pointer ESP into the EBP registers, and then creates a stack frame for the function local variables.
Now, in the EDI register, we have the first parameter passed to the function, that is a pointer to an integer, so EDI contains now the address of that integer, but not the integer itself.
movl (%edi), %eax
will get the 4 bytes pointed by the EDI register and load them into the EAX register.
Now in EAX we have the value of the integer pointed by the xp in the first parameter.
And then:
movl %eax, (%edx)
will save this integer value into the memory pointed by the content of the EDX register which was in turn loaded from EBP+12 which is the second parameter passed to the function.
So, your first question, is this assembly code equivalent to this?
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
is, yes, but note that there are no tx,ty,tz local variables created, but just processor registers.
And your second question, is no, you can't tell the type of return, it is, again, a convention on the register usage that you can't infer just by looking at the generated assembly code.

Congratulations, you got everything right :)
You can use any register but some need to be preserved, that is they should be saved before use and restored afterwards. In typical calling conventions you can use eax, ecx and edx, the rest need to be preserved. The assembly you showed doesn't include code to do this, but presumably it is there.
As for the return type, that's hard to deduce. Simple types are returned in the eax register, and something is always in there. We can't tell if that's intended as a return value, or just remains of a local variable. That is, if your function had return tx; it could be the same assembly code. Also, we don't know the type for eax either, it could be anything that fits in there and is expected to be returned there according to the calling convention.

Related

LEAL and MOVL in machine instruction

what is the difference between following machine codes??
movl 8(%ebp), %ecx
leal 8(%ebp), %ecx
can someone explain this to me???
The first fetches the 32 bit value pointed to by 8(%ebp).
The latter computes the flat address.
Thus, in C, given int x = 0; and it is located at 8(%ebp) (i.e. x is in the stack frame of the function):
The first is int y = x;
The latter is int *z = &x;
In machine code [for most/many architectures, such as x86--but not all (e.g. mc68000)] registers are the same regardless of whether they contain a value or address.

Understanding C vs its Assembly counterpart

Say we are given a function:
int exchange(int*xp, int y)
{
x = *xp;
*xp = y;
return x;
}
So, the book I am reading explains that xp is stored at offsets 8 and 12 relative to the address register %ebp. What I am not understanding is why they are stored as any kind of unit 8 and 12, further more: What is an offset in this context? Finally, how do 8 and 12 fit when the register accepts movement in units of 1 2 and 4 bytes respectively?
The assembly code :
xp at %ebp+8, y at%ebp+12
1 movl 8(%ebp), %edx (Get xp By copying to %eax below, x becomes the return value)
2 movl (%edx), %eax (Get x at xp)
3 movl 12(%ebp), %ecx (Get y)
4 movl %ecx, (%edx) (Store y at xp)
What I think the answer is:
So, when examining registries, it was common to see something like registry %rdi holding a value of 0x1004 which is an address and 0x1004 is in the address which holds a value 0xAA.
Of course, this is a hypothetical example that doesn't line up with the registries listed in the book. Each registry is 16-32 bit and the top four can be used to store integers freely. Does offsetting it by 8 make it akin to 0x1000 + 8? Again, I'm not entirely sure what the offset in this scenario is for when we are storing new units into empty space.
Because of how the call stack is structured when using C declaration.
First the caller will push the 4-byte y, then the 4-byte xp (this order is important so C can support Variadic Functions), then the call to your function will implicitly push the return address which is also 4-byte (this is a 32-bit program).
The first thing your function does is push the state of ebp which it will need to recover later so that the caller can continue working properly, and then copy the current state of esp (stack pointer) to ebp. In sum:
push %ebp
movl %esp, %ebp
This is also known as function prologue.
When all this is done you are finally ready to actually run the code you wrote, at this stage the stack is something like this:
%ebp- ? = address of your local variables (which in this example you don't have)
%ebp+ 0 = address of the saved state of previous ebp
%ebp+ 4 = ret address
%ebp+ 8 = address where is stored the value of xp
%ebp+12 = address where is stored the value of y
%ebp+16 = out of bonds, this memory space belongs to the caller
When your function is done it will wrap it up by setting esp back to ebp, then pop the original ebp and ret.
movl %ebp, %esp
pop %ebp
ret
ret is basically a shortcut to pop a pointer from the stack and jmp to it.
Edit: Fixed order of parameters for AT&T assembly
Look at the normal function entry in assembler:
push ebp
mov ebp, esp
sub esp, <size of local variables>
So ebp+4 holds the previous value of ebp. Before the old ebp was the return address, at ebp+8. Before that are the parameters of the function, in reverse order, so the first parameter is at ebp+12 and the second at ebp+8.

Simplifying Assembly Instruction

I'm trying to convert the following code into a single line using leal.
movl 4(%esp), %eax
sall $2, %eax
addl 8(%esp), %eax
addl $4, %eax
My question is of 3 parts:
Does the '%' in front of the register simply define the following string as a register?
Does the '$' in front of the integers define the following value type as int?
Is leal 4(%rsi, 4, %rdi), %eax a correct conversion from the above assembly? (ignoring the change from 32-bit to 64-bit)
Edit: Another question. would
unsigned int fun3(unsigned int x, unsigned int y)
{
unsigned int *z = &x;
unsigned int w = 4+y;
return (4*(*z)+w);
}
generate the above code? I'm unfamiliar with pointers.
1: if % yes
2: there is no int or float or bool or char or... in asm. You are dealing with the machine. It means it is a constant
3: 1 move value in (esp - 4) to eax. esp is the stack pointer, eax is the register used by c function to return values.
2 shift to left two times. same as multiply by 4
3 add value in (esp - 8) to value in eax
4 add 4 to value in eax
x*4+y+4 = eax x is (esp -4), y is (esp-8)
leal is the same as, 4+rsi+4*rdi =eax
so yes it the same in a way.
That depend on the compiler, but yes that is valid translation. 4*x+y+4

C Code represented as Assembler Code - How to interpret?

I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret
Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because`
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(u_int32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return ret; //64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
stack boundary is aligned by 8. I believe it's ABI requirement
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out of you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push and pop always increments and decrements esp respectively before derefing the address whereas mov does not.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time or you can allocate up-front the space required and then load each value onto the stackslot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly incrementing esp before saving the value onto the stack.
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr esp => uint32_t:a_copy = 3 <--. arg1 to 'multiply'
esp + 4 uint32_t:b_copy = 5 <--. arg2 to 'multiply'
^ esp + 8 ????
^ esp + 12 uint32_t:z = ? <--.
| esp + 16 uint32_t:b = 5 | local variables in 'main'
| esp + 20 uint32_t:a = 3 <--.
| ...
| ...
higher addr ebp previous frame

Help translating from assembly to C

I have some code from a function
subl $24, %esp
movl 8(%ebp), %eax
cmpl 12(%ebp), %eax
Before the code is just the 'ENTER' command and afterwards there's an if statement to return 1 if ebp > eax or 0 if it's less. I'm assuming cmpl means compare, but I can't tell what the concrete values are. Can anyone tell me what's happening?
Yes cmpl means compare (with 4-byte arguments). Suppose the piece of code is followed by a jg <addr>:
movl 8(%ebp), %eax
cmpl 12(%ebp), %eax
jg <addr>
Then the code is similar to
eax = ebp[8];
if (eax > ebp[12])
goto <addr>;
Your code fragment resembles the entry code used by some processors and compilers. The entry code is assembly code that a compiler issues when entering a function.
Entry code is responsible for saving function parameters and allocating space for local variables and optionally initializing them. The entry code uses pointers to the storage area of the variables. Some processors use a combination of the EBP and ESP registers to point to the location of the local variables (and function parameters).
Since the compiler knows where the variables (and function parameters) are stored, it drops the variable names and uses numerical indexing. For example, the line:
movl 8(%ebp), %eax
would either move the contents of the 8th local variable into the register EAX, or move the value at 8 bytes from the start of the local area (assuming the the EBP register pointers to the start of the local variable area).
The instruction:
subl $24, %esp
implies that the compiler is reserving 24 bytes on the stack. This could be to protect some information in the function calling convention. The function would be able to use the area after this for its own usage. This reserved area may contain function parameters.
The code fragment you supplied looks like it is comparing two local variables inside a function:
void Unknown_Function(long param1, long param2, long param3)
{
unsigned int local_variable_1;
unsigned int local_variable_2;
unsigned int local_variable_3;
if (local_variable_2 < local_variable_3)
{
//...
}
}
Try disassembling the above function and see how close it matches your code fragment.
This is a comparison between (EBP + 8) and (EBP + 12). Based on the comparison result, the cmpl instruction sets flags that are used by following jump instructions.
In Mac OS X 32 bit ABI EBP + 8 is the first function parameter, and EBP + 12 is the second parameter.

Resources