LEAL and MOVL in machine instruction - c

What is the difference between the following two machine instructions?
movl 8(%ebp), %ecx
leal 8(%ebp), %ecx
Can someone explain this to me?

The first fetches the 32-bit value stored at the address %ebp + 8 and puts that value in %ecx.
The latter computes the address %ebp + 8 and puts the address itself in %ecx, without reading memory.
Thus, in C, given int x = 0; located at 8(%ebp) (i.e. x is in the stack frame of the function):
The first is int y = x;
The latter is int *z = &x;
In machine code [for most/many architectures, such as x86--but not all (e.g. mc68000)] registers are the same regardless of whether they contain a value or address.
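The difference can be sketched in C by modelling the stack frame as a byte buffer. This is only an illustration: the helper names movl_load and leal_addr are hypothetical, and the base argument stands in for %ebp.

```c
#include <stdint.h>
#include <string.h>

/* Models "movl disp(%base), %dst": read the 32-bit value stored at base + disp. */
int32_t movl_load(const unsigned char *base, int disp)
{
    int32_t value;
    memcpy(&value, base + disp, sizeof value);  /* this is the memory access */
    return value;
}

/* Models "leal disp(%base), %dst": compute base + disp, touching no memory. */
const unsigned char *leal_addr(const unsigned char *base, int disp)
{
    return base + disp;
}
```

With a value stored at offset 8 of the buffer, movl_load returns that value while leal_addr returns a pointer to where it lives.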

Related

Given Assembly, translate to C

I am originally given the function prototype:
void decode1(int *xp, int *yp, int *zp)
Now I am told to convert the following assembly into C code:
movl 8(%ebp), %edi //line 1 ;; gets xp
movl 12(%ebp), %edx //line 2 ;; gets yp
movl 16(%ebp),%ecx //line 3 ;; gets zp
movl (%edx), %ebx //line 4 ;; gets y
movl (%ecx), %esi //line 5 ;; gets z
movl (%edi), %eax //line 6 ;; gets x
movl %eax, (%edx) //line 7 ;; stores x into yp
movl %ebx, (%ecx) //line 8 ;; stores y into zp
movl %esi, (%edi) //line 9 ;; stores z into xp
These comments were not given to me in the problem; they are what I believe the instructions are doing, but I am not 100% sure.
My question is, for lines 4-6, am I able to assume that the command
movl (%edx), %ebx
movl (%ecx), %esi
movl (%edi), %eax
just create local variables for y, z and x?
Also, do the registers that each variable gets stored in (i.e. edi, edx, ecx) matter, or can I use any register in any order to take the pointers off of the stack?
C code:
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
If I wasn't given the function prototype, how would I tell what return type is used?
Let's focus on a simpler set of instructions.
First:
movl 8(%ebp), %edi
will load into the EDI register the 4 bytes located in memory 8 bytes beyond the address held in the EBP register. This special use of EBP is a convention followed by the compiler's code generator: on entry to each function it saves the stack pointer ESP into the EBP register and then creates a stack frame for the function's local variables.
Now, in the EDI register, we have the first parameter passed to the function, which is a pointer to an integer. So EDI now contains the address of that integer, not the integer itself.
movl (%edi), %eax
will get the 4 bytes pointed to by the EDI register and load them into the EAX register.
Now EAX holds the value of the integer pointed to by xp, the first parameter.
And then:
movl %eax, (%edx)
will save this integer value into the memory pointed to by the EDX register, which was in turn loaded from EBP+12, the second parameter passed to the function.
So, the answer to your first question, whether this assembly code is equivalent to this:
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
is yes, but note that no tx, ty, tz local variables are created; the values live only in processor registers.
And the answer to your second question is no, you can't tell the return type: it is, again, a convention on register usage that you can't infer just by looking at the generated assembly code.
Congratulations, you got everything right :)
You can use any register but some need to be preserved, that is they should be saved before use and restored afterwards. In typical calling conventions you can use eax, ecx and edx, the rest need to be preserved. The assembly you showed doesn't include code to do this, but presumably it is there.
As for the return type, that's hard to deduce. Simple types are returned in the eax register, and something is always in there. We can't tell if that's intended as a return value, or just remains of a local variable. That is, if your function had return tx; it could be the same assembly code. Also, we don't know the type for eax either, it could be anything that fits in there and is expected to be returned there according to the calling convention.
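Putting the pieces together, the nine instructions correspond to the following complete function. The prototype is the one given in the question; the temporary names x, y and z are only illustrative stand-ins for the registers %eax, %ebx and %esi.

```c
/* Rotates three values: *yp gets old *xp, *zp gets old *yp, *xp gets old *zp. */
void decode1(int *xp, int *yp, int *zp)
{
    int y = *yp;   /* movl (%edx), %ebx */
    int z = *zp;   /* movl (%ecx), %esi */
    int x = *xp;   /* movl (%edi), %eax */

    *yp = x;       /* movl %eax, (%edx) */
    *zp = y;       /* movl %ebx, (%ecx) */
    *xp = z;       /* movl %esi, (%edi) */
}
```

All three loads happen before any store, which is why the order of the loads among themselves does not change the result.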

Confused about leal. Does it access the contents of a memory address, too?

32bit, AT&T/GAS syntax
I get a little confused between what is being stored into a register sometimes. Is it a value or is it an address?
Let's say the start of our function has this in its code.
movl 12(%ebp), %eax //Get i
leal (%eax,%eax,2), %eax //Compute 3*i
So 12(%ebp) isn't the value of i itself. It has the address of where the value of i is actually located in memory. So that means &i is being sent to the register eax.
The next line
Leal. Does it actually dereference the address that's in eax?
Here's what I think leal does. It's about loading an address and sending it to %eax register.
If I follow my definition, does it actually take the &i in eax and multiply it by 2, then add itself again? So it's &i + &i*2? Obviously that can't be right. If &i was FFFF FFF1 (impossible, but just an example), it's going to calculate an address that's outside of my memory range.
Shouldn't the line be ...
movl (%eax,%eax,2), %eax
where it will access the memory location stored in eax and see that at that mem address, i could equal, say 5, for example.
(If you wanted to know the rest of the code, here it is. It's about getting an array element's address.)
movl 16(%ebp), %edx //Get j
sall $2, %edx //Compute j*4
addl 8(%ebp), %edx //Compute xA+4j
movl (%edx,%eax,4), %eax //Read from M[xA+4j+12i]
Thank you.
Although lea stands for "load effective address", it does not actually perform a load. Instead it calculates the address that the memory operand would reference, and leaves that value in the destination register. Perhaps it should have been called "calculate effective address".
In this case it is simply being (ab)used to calculate reg + reg * 2, a multiply by 3. It looks like the x3 has been factored out of a multiply by 12 (the array element size, I suppose) in order to use addressing modes to perform the multiplication work more efficiently. This is a pretty common pattern in x86 code.
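As a rough sketch of what the whole sequence computes, here is the same arithmetic in C. The function names are hypothetical, and the array shape int A[][3] is an assumption inferred from the 12-byte row stride (three 4-byte ints); the original C source is not shown in the question.

```c
#include <stdint.h>

/* leal (%eax,%eax,2), %eax : eax = eax + eax*2, plain arithmetic, no load. */
uint32_t lea_times3(uint32_t i)
{
    return i + i * 2;
}

/* Byte offset from xA that the final movl (%edx,%eax,4), %eax reads from. */
uint32_t element_offset(uint32_t i, uint32_t j)
{
    uint32_t eax = lea_times3(i);  /* 3*i */
    uint32_t edx = j << 2;         /* sall $2, %edx : 4*j */
    return edx + eax * 4;          /* 4j + 12i */
}

/* The C expression this most likely came from (assumed shape: rows of 3 ints). */
int read_element(int A[][3], uint32_t i, uint32_t j)
{
    return A[i][j];
}
```

Only the last instruction, the movl with parentheses, actually reads memory; everything before it is address arithmetic on values already in registers.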
The meaning of an instruction's operands varies a great deal between mnemonics.
You should read the operands of movl and the operands of leal differently.
movl 12(%ebp), %eax //Get i
movl means "move long (32-bit) data from the location given by the first operand to the location given by the second operand."
12(%ebp) is a memory location 12 bytes above the base pointer (%ebp). That is where the variable i lives in memory. The compiler decided on that location, and the comment records it.
%eax is the on-chip register named eax.
Putting these together, "movl 12(%ebp), %eax" moves the data stored at &i, that is the value of i, into register eax.
leal (%eax,%eax,2), %eax //Compute 3*i
leal means "evaluate the address expression in the first operand and store the result in the second operand", without accessing memory.
(%eax,%eax,2) means eax + (eax * 2), where eax is the value held in the register.
As a result, the value in eax is multiplied by 3.
eax held the value of i, so eax holds i*3 after leal executes.

Simplifying Assembly Instruction

I'm trying to convert the following code into a single line using leal.
movl 4(%esp), %eax
sall $2, %eax
addl 8(%esp), %eax
addl $4, %eax
My question has 3 parts:
Does the '%' in front of the register simply define the following string as a register?
Does the '$' in front of the integers define the following value type as int?
Is leal 4(%rsi, 4, %rdi), %eax a correct conversion from the above assembly? (ignoring the change from 32-bit to 64-bit)
Edit: Another question. would
unsigned int fun3(unsigned int x, unsigned int y)
{
unsigned int *z = &x;
unsigned int w = 4+y;
return (4*(*z)+w);
}
generate the above code? I'm unfamiliar with pointers.
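1: Yes, % marks a register name.
2: There is no int or float or bool or char in asm; you are dealing with the machine. The $ means the operand is an immediate constant.
3: Step by step:
1 moves the value at (esp + 4) into eax. esp is the stack pointer; eax is the register C functions use to return values.
2 shifts eax left by two bits, which is the same as multiplying by 4.
3 adds the value at (esp + 8) to eax.
4 adds 4 to eax.
So eax = x*4 + y + 4, where x is at (esp + 4) and y is at (esp + 8).
A memory operand is written displacement(base, index, scale), so with y in rsi and x in rdi the leal should be spelled leal 4(%rsi,%rdi,4), %eax, which computes 4 + rsi + 4*rdi.
So yes, apart from the operand order it is the same computation.
As for the edit: that depends on the compiler, but yes, that is a valid translation: 4*x+y+4.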
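To check that the four instructions and the single lea-style expression agree, both can be written out in C. The names fun3_steps and fun3_lea are hypothetical; the first mirrors the original sequence one instruction at a time, the second is the one-expression form.

```c
/* Mirrors the four-instruction sequence, one statement per instruction. */
unsigned int fun3_steps(unsigned int x, unsigned int y)
{
    unsigned int eax = x;  /* movl 4(%esp), %eax */
    eax <<= 2;             /* sall $2, %eax      */
    eax += y;              /* addl 8(%esp), %eax */
    eax += 4;              /* addl $4, %eax      */
    return eax;
}

/* The same value as one base + index*scale + displacement expression. */
unsigned int fun3_lea(unsigned int x, unsigned int y)
{
    return y + x * 4 + 4;
}
```

Because unsigned arithmetic wraps around in both versions, they agree even when the result overflows 32 bits.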

C Code represented as Assembler Code - How to interpret?

I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret
Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(u_int32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return res; //64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
The stack pointer is aligned down to a multiple of 8. I believe it's an ABI requirement.
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
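One consequence of the imull followed by movl $0 is worth spelling out: the product is computed in 32 bits and only then widened, so any overflow is lost. A small sketch, assuming a platform where int is 32 bits wide; multiply is the function from the question and multiply_wide is a hypothetical variant added for contrast.

```c
#include <stdint.h>

/* As in the question: x*y is a 32-bit multiply, then zero-extended to 64 bits. */
uint64_t multiply(uint32_t x, uint32_t y)
{
    uint64_t res;
    res = x * y;   /* high 32 bits of res are always 0 (movl $0, -4(%ebp)) */
    return res;
}

/* Hypothetical variant: widen one operand first to get the full 64-bit product. */
uint64_t multiply_wide(uint32_t x, uint32_t y)
{
    return (uint64_t)x * y;
}
```

For small inputs such as 3 and 5 the two agree, but for 65536 * 65536 the first returns 0 while the second returns 4294967296.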
Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out if you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push always decrements esp before storing to the address in esp, and pop reads from that address and then increments esp, whereas mov leaves esp unchanged.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time, or you can allocate up-front the space required and then load each value into its stack slot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly adjusting esp before saving each value onto the stack.
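The two ways of storing to the stack can be modelled with a toy machine in C. This is only a sketch: the Machine struct and the function names are hypothetical, and esp is kept as a byte offset into a small array. On x86 the stack grows toward lower addresses, so push subtracts 4 from esp before storing and pop adds 4 after loading, while a mov to an esp-relative address never changes esp.

```c
#include <stdint.h>

typedef struct {
    uint32_t mem[16];  /* 64 bytes of hypothetical stack memory */
    uint32_t esp;      /* byte offset into mem, playing the role of %esp */
} Machine;

/* pushl value: move esp down by 4, then store at the new top of stack. */
void push(Machine *m, uint32_t value)
{
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* popl: load from the top of stack, then move esp up by 4. */
uint32_t pop(Machine *m)
{
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

/* movl value, disp(%esp): store at esp + disp; esp itself is untouched. */
void mov_store(Machine *m, uint32_t disp, uint32_t value)
{
    m->mem[(m->esp + disp) / 4] = value;
}
```

Starting with esp at 64, one push leaves esp at 60, whereas any number of mov_store calls leave it where it was.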
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr   esp      => uint32_t:a_copy = 3   <--. arg1 to 'multiply'
             esp + 4     uint32_t:b_copy = 5   <--. arg2 to 'multiply'
    ^        esp + 8     ????
    ^        esp + 12    uint32_t:z = ?        <--.
    |        esp + 16    uint32_t:b = 5           | local variables in 'main'
    |        esp + 20    uint32_t:a = 3        <--.
    |        ...
    |        ...
higher addr  ebp         previous frame
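The same sequence can be traced in C by treating the reserved stack area as an array of 4-byte slots. This is a sketch only: simulate_main and the slot array are hypothetical names, with slot[n] standing for the memory at n*4(%esp).

```c
#include <stdint.h>

uint64_t multiply(uint32_t x, uint32_t y)
{
    uint64_t res;
    res = x * y;
    return res;
}

/* Walks through main's instructions; slot[n] models the word at n*4(%esp). */
uint32_t simulate_main(void)
{
    uint32_t slot[6] = {0};
    uint32_t eax;

    slot[5] = 3;                 /* movl $3, 20(%esp)   : a = 3 */
    slot[4] = 5;                 /* movl $5, 16(%esp)   : b = 5 */
    eax = slot[4];               /* movl 16(%esp), %eax */
    slot[1] = eax;               /* movl %eax, 4(%esp)  : argument 2 */
    eax = slot[5];               /* movl 20(%esp), %eax */
    slot[0] = eax;               /* movl %eax, (%esp)   : argument 1 */
    eax = (uint32_t)multiply(slot[0], slot[1]);  /* call multiply; low half in %eax */
    slot[3] = eax;               /* movl %eax, 12(%esp) : z */
    return slot[3];
}
```

Slot 2, the word at 8(%esp), is never written, which matches the ???? entry in the diagram.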

Help translating from assembly to C

I have some code from a function
subl $24, %esp
movl 8(%ebp), %eax
cmpl 12(%ebp), %eax
Before the code is just the 'ENTER' command and afterwards there's an if statement to return 1 if ebp > eax or 0 if it's less. I'm assuming cmpl means compare, but I can't tell what the concrete values are. Can anyone tell me what's happening?
Yes cmpl means compare (with 4-byte arguments). Suppose the piece of code is followed by a jg <addr>:
movl 8(%ebp), %eax
cmpl 12(%ebp), %eax
jg <addr>
Then the code is similar to
eax = ebp[8];
if (eax > ebp[12])
goto <addr>;
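Written as ordinary C, and assuming the two stack slots are the function's first two int arguments and that the code after the jump returns 1 or 0 as described in the question, the fragment might look like this. The name compare_args is hypothetical.

```c
/* cmpl 12(%ebp), %eax followed by jg tests "a > b" as a signed comparison. */
int compare_args(int a, int b)
{
    if (a > b)      /* movl 8(%ebp), %eax ; cmpl 12(%ebp), %eax ; jg ... */
        return 1;
    return 0;
}
```

jg treats the operands as signed, so the parameters are declared as int here; an unsigned comparison would normally use ja instead.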
Your code fragment resembles the entry code used by some processors and compilers. The entry code is assembly code that a compiler issues when entering a function.
Entry code is responsible for saving function parameters and allocating space for local variables and optionally initializing them. The entry code uses pointers to the storage area of the variables. Some processors use a combination of the EBP and ESP registers to point to the location of the local variables (and function parameters).
Since the compiler knows where the variables (and function parameters) are stored, it drops the variable names and uses numerical indexing. For example, the line:
movl 8(%ebp), %eax
would move the 4-byte value stored 8 bytes above the address in EBP into the register EAX. With a conventional frame, where EBP points at the saved caller's EBP, positive offsets such as 8 and 12 reach the function's parameters and negative offsets reach its local variables.
The instruction:
subl $24, %esp
implies that the compiler is reserving 24 bytes on the stack. The function can use this area for its local variables, temporaries, and the arguments of any functions it calls.
The code fragment you supplied looks like it is comparing two values in the function's frame. Given the positive offsets from EBP, these are most likely the first two parameters:
void Unknown_Function(long param1, long param2, long param3)
{
if (param1 > param2)
{
//...
}
}
Try disassembling the above function and see how close it matches your code fragment.
This is a comparison between the values stored at (EBP + 8) and (EBP + 12). Based on the comparison result, the cmpl instruction sets flags that are used by the following jump instructions.
In the Mac OS X 32-bit ABI, EBP + 8 is the first function parameter, and EBP + 12 is the second parameter.

Resources