Writing assembly Language in C [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am writing assembly language in C. I was given the following assembly code:
fn:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
movl %eax, -4(%ebp)
subl $1, 8(%ebp)
.f2:
cmpl $1, 8(%ebp)
jg .f3
movl -4(%ebp), %eax
leave
ret
I have written my code in C below:
fn (int x)
{
int y = 1;
if(x > 1):
int z = y *= x
x – 1;
return z;
}
Can someone tell me if I am on the right track with my C code? or if I am off and if I am can you point me in the right direction.
Thank You in advanced

fn:
pushl %ebp
movl %esp, %ebp
This is the usual function startup used in the calling convertion __cdecl, save existing %ebp to the stack then save %esp (current stack address) to %ebp. %ebp will be from now on the reference to the stack call point of this function used by the assembly code to access both parameters and local variables. To conclude this function, the opposite will be done in the future, as well as setting the "return" value to %eax.
At this point you can assume %ebp points to the current stack position where the previous %ebp is stored, %ebp + 4 is the function's return point, and everything from %ebp + 8 on is the data of the function parameters. Negative values will be accessing unused stack space, where the function can store it's local variables.
subl $16, %esp
This is reserving 22 bytes of information in the stack for local variables. It can be 22 variables of 1 byte or 1 variable of 22 bytes, there is no way to know. It can reserve unused bytes to make stack alignment.
Its important to change the value of %esp at this point so any call to push, pop and call won't overwrite this function's local variables.
movl $1, -4(%ebp)
Here it is possible to assume that the first of the local variables present in this function is a 32-bits integer at the address %ebp - 4. It's value was just set to 1. Lets call this variable i.
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
Based on what this imull is doing we can assume the function takes at least one parameter at the address %ebp + 8, and it seems its a 32-bits integer being multiplied by i. Lets call this parameter x.
movl %eax, -4(%ebp)
So far we can assume i = x * i.
subl $1, 8(%ebp)
And now x = x - 1.
.f2:
cmpl $1, 8(%ebp)
jg .f3
Here we are checking if x is greater than 1. If true, jumps to f3, that is a backward jump therefore it seems we have a loop going on.
movl -4(%ebp), %eax
leave
ret
Here outside the loop we see it set %eax to i and terminate the function, we can interpret it as return i.
Compiling the analyzed information together, we can assume the original function must have looked somewhat like this:
int fn(int x)
{
int i = 1;
while (x > 1)
{
i = i * x; // or just i *= x, same thing
x = x - 1; // or just x--, same thing
}
return i;
}

Related

Calling C function from the x86 assembler

I am trying to write a function that converts decimal numbers into binary in assembler. Since printing is so troublesome in there, I have decided to make a separate function in C that just prints the numbers. But when I run the code, it always prints '0110101110110100'
Heres the C function (both print and conversion):
void printBin(int x) {
printf("%d", x);
}
void DecToBin(int n)
{
// Size of an integer is assumed to be 16 bits
for (int i = 15; i >= 0; i--) {
int k = n >> i;
printBin(k & 1);
}
heres the code in asm:
.globl _DecToBin
.extern _printBin
_DecToBin:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp),%eax
movl $15, %ebx
cmpl $0, %ebx
jl end
start:
movl %ebx, %ecx
movl %eax, %edx
shrl %cl, %eax
andl $1, %eax
pushl %eax
call _printBin
movl %edx, %eax
dec %ebx
cmpl $0, %ebx
jge start
end:
movl %ebp, %esp
popl %ebp
ret
Cant figure out where the mistake is. Any help would be appreciated
disassembled code using online program
Your principle problem is that it is very unlikely that %edx is preserved across the function call to printBin.
Also:
%ebx is not a volatile register in most (any?) C calling convention rules. You need to check your compilers documentation and conform to it.
If you are going to use ebx, you need to save and restore it.
The stack pointer needs to be kept aligned to 16 bytes. On my machine (macos), it SEGVs under printBin if you don’t.

Using "lea ($30, %edx), %eax to add edx and 30 and put it into eax

I'm studying for a midterm tomorrow and one of the questions on a previous midterm is:
Consider the following C function. Write the corresponding assembly language function to perform the same operation.
int myFunction (int a)
{
return (a + 30);
}
What I wrote down is:
.global _myFunction
_myFunction:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
lea ($30, %edx), %eax
leave
ret
where a is edx and a+30 would be eax. Is the use of lea correct in this case? Would it instead need to be
lea ($30, %edx, 1), %eax
Thanks.
If you are looking to simply add 30 using leal then you should do it this way:
leal 30(%edx), %eax
The notation is displacement(baseregister, offsetregister, scalarmultiplier). The displacement is placed on the outside. 30 is added to edx and stored in eax. In AT&T/GAS notation you can leave off both the offset and multiplier. In our example this leaves us with the equivalent of base + displacement or edx + 30 in this example.
cHao also brings up a good point. Let us say the professor asks you to optimize your code. There are some inefficiencies in the fact that myFunction uses no local variables and doesn't need stack space for itself. Because of that all the stack frame creation and destruction can be removed. If you remove the stack frame then you no longer push %ebp as well. That means your first parameter int a is at 4(%esp) . With that in mind your function can be reduced to something this simple:
.global _myFunction
_myFunction:
movl 4(%esp), %eax
addl $30, %eax
ret
Of course the moment you change your function so that it needs to store things on the stack you would have to put the stack frame code back in (pushl %ebp, pushl %ebp, leave etc)

C Code represented as Assembler Code - How to interpret?

I got this short C Code.
#include <stdint.h>
uint64_t multiply(uint32_t x, uint32_t y) {
uint64_t res;
res = x*y;
return res;
}
int main() {
uint32_t a = 3, b = 5, z;
z = multiply(a,b);
return 0;
}
There is also an Assembler Code for the given C code above.
I don't understand everything of that assembler code. I commented each line and you will find my question in the comments for each line.
The Assembler Code is:
.text
multiply:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // takes the current stack pointer and uses it as the frame for the called function
subl $16, %esp // it leaves room on the stack, but why 16Bytes. sizeof(res) = 8Bytes
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
movl $0, -4(%ebp) // here as well
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
leave
ret
main:
pushl %ebp // stores the stack frame of the calling function on the stack
movl %esp, %ebp // // takes the current stack pointer and uses it as the frame for the called function
andl $-8, %esp // what happens here and why?
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
movl $3, 20(%esp) // 3 gets pushed on the stack
movl $5, 16(%esp) // 5 also get pushed on the stack
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
movl %eax, 4(%esp) // we got the here as well
movl 20(%esp), %eax // and also here
movl %eax, (%esp) // what does happen in this line?
call multiply // thats clear, the function multiply gets called
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
movl $0, %eax // I suppose, this line is because of "return 0;"
leave
ret
Negative references relative to %ebp are for local variables on the stack.
movl 8(%ebp), %eax // I don't know quite what "8(%ebp) mean? It has to do something with res, because`
%eax = x
imull 12(%ebp), %eax // here is the multiplication done. And again "12(%ebp).
%eax = %eax * y
movl %eax, -8(%ebp) // Now, we got a negative number in front of. How to interpret this?
(u_int32_t)res = %eax // sets low 32 bits of res
movl $0, -4(%ebp) // here as well
clears upper 32 bits of res to extend 32-bit multiplication result to uint64_t
movl -8(%ebp), %eax // and here again.
movl -4(%ebp), %edx // also here
return ret; //64-bit results are returned as a pair of 32-bit registers %edx:%eax
As for the main, see x86 calling convention which may help making sense of what happens.
andl $-8, %esp // what happens here and why?
stack boundary is aligned by 8. I believe it's ABI requirement
subl $24, %esp // here, it leaves room for local variables, but why 24 bytes? a, b, c: the size of each of them is 4 Bytes. So 3*4 = 12
Multiples of 8 (probably due to alignment requirements)
movl $3, 20(%esp) // 3 gets pushed on the stack
a = 3
movl $5, 16(%esp) // 5 also get pushed on the stack
b = 5
movl 16(%esp), %eax // what does 16(%esp) mean and what happened with z?
%eax = b
z is at 12(%esp) and is not used yet.
movl %eax, 4(%esp) // we got the here as well
put b on the stack (second argument to multiply())
movl 20(%esp), %eax // and also here
%eax = a
movl %eax, (%esp) // what does happen in this line?
put a on the stack (first argument to multiply())
call multiply // thats clear, the function multiply gets called
multiply returns 64-bit result in %edx:%eax
movl %eax, 12(%esp) // it looks like the same as two lines before, except it contains the number 12
z = (uint32_t) multiply()
movl $0, %eax // I suppose, this line is because of "return 0;"
yup. return 0;
Arguments are pushed onto the stack when the function is called. Inside the function, the stack pointer at that time is saved as the base pointer. (You got that much already.) The base pointer is used as a fixed location from which to reference arguments (which are above it, hence the positive offsets) and local variables (which are below it, hence the negative offsets).
The advantage of using a base pointer is that it is stable throughout the entire function, even when the stack pointer changes (due to function calls and new scopes).
So 8(%ebp) is one argument, and 12(%ebp) is the other.
The code is likely using more space on the stack than it needs to, because it is using temporary variables that could be optimized out of you had optimization turned on.
You might find this helpful: http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I started typing this as a comment but it was getting too long to fit.
You can compile your example with -masm=intel so the assembly is more readable. Also, don't confuse the push and pop instructions with mov. push and pop always increments and decrements esp respectively before derefing the address whereas mov does not.
There are two ways to store values onto the stack. You can either push each item onto it one item at a time or you can allocate up-front the space required and then load each value onto the stackslot using mov + relative offset from either esp or ebp.
In your example, gcc chose the second method since that's usually faster because, unlike the first method, you're not constantly incrementing esp before saving the value onto the stack.
To address your other question in comment, x86 instruction set does not have a mov instruction for copying values from memory location a to another memory location b directly. It is not uncommon to see code like:
mov eax, [esp+16]
mov [esp+4], eax
mov eax, [esp+20]
mov [esp], eax
call multiply(unsigned int, unsigned int)
mov [esp+12], eax
Register eax is being used as an intermediate temporary variable to help copy data between the two stack locations. You can mentally translate the above as:
esp[4] = esp[16]; // argument 2
esp[0] = esp[20]; // argument 1
call multiply
esp[12] = eax; // eax has return value
Here's what the stack approximately looks like right before the call to multiply:
lower addr esp => uint32_t:a_copy = 3 <--. arg1 to 'multiply'
esp + 4 uint32_t:b_copy = 5 <--. arg2 to 'multiply'
^ esp + 8 ????
^ esp + 12 uint32_t:z = ? <--.
| esp + 16 uint32_t:b = 5 | local variables in 'main'
| esp + 20 uint32_t:a = 3 <--.
| ...
| ...
higher addr ebp previous frame

understanding some assembly code

I am trying to learn some assembly code, so I read in some tutorial that the assembly code for
int proc(void)
{
int x,y;
scanf("%x %x", &y, &x);
return x-y;
}
is
1 proc:
2 pushl %ebp
3 movl %esp, %ebp
4 subl $40, %esp
5 leal -4(%ebp), %eax
6 movl %eax, 8(%esp)
7 leal -8(%ebp), %eax
8 movl %eax, 4(%esp)
9 movl $.LC0, (%esp)
10 call scanf
Diagram stack frame at this point
11 movl -4(%ebp), %eax
12 subl -8(%ebp), %eax
13 leave
14 ret
If I well understood, the instructions of line 5 to 8 store some addresses that will be used to store the values of scanf's input. So is it right to say that scanf uses systematically the address %esp plus a certain number of bytes (depending on the sizeof the input) to fetch the address at which is the data will be stored ?
What's happening here is that a stack frame is built up to pass arguments to scanf. subl is used to allocate space for the new stack frame and the movl is used with offsets from the stack pointer, %esp, to write values for the arguments on the freshly allocated stack frame.
A more thorough explanation on x86 calling conventions and cdecl in particular can be found here. Understanding the high-level structure of the stack frame and the cdecl convention will help you make sense of the intent of this code snippet.
Calling convention of scanf is cdecl. It passes its arguments to stack pointed by esp.

Analyzing the assembly code generated to manipulate command line arguments

#include <stdio.h>
int main(int argc, char * argv[])
{
argv[1][2] = 'A';
return 0;
}
Here is the corresponding assembly code from GCC for a 32-bit Intel architecture. I can't totally understand what is going on.
main:
leal 4(%esp), %ecx - Add 4 to esp and store the address in ecx
andl $-16, %esp - Store first 28 bits from esp's address into esp??
pushl -4(%ecx) - Push the old esp on stack
pushl %ebp - Preamble
movl %esp, %ebp
pushl %ecx - push old esp + 4 on stack
movl 4(%ecx), %eax - move ecx + 4 to eax. this is the address of argv. argc stored at (%ecx).
addl $4, %eax - argv[1]
movl (%eax), %eax - argv[1][0]
addl $2, %eax - argv[1][2]
movb $65, (%eax) - move 'A'
movl $0, %eax - move return value (0)
popl %ecx - get old value of ecx
leave
leal -4(%ecx), %esp - restore esp
ret
What is going on in the beginning of the code before the preamble? Where is argv store according to the following code? On the stack?
The funny code (the first two lines) that you are seeing is the alignment of the stack to 16 bytes (-16 is the same as ~15, and x & ~15 rounds x to a multiple of 16).
argv would be stored at ESP + 8 when entering the function, what leal 4(%esp), %ecx does is create a pointer to a pseudo-struct containing argc and argv, then it proceeds to access them from there. movl 4(%ecx), %eax access argv from this pseudo-struct.
argv is a parameter to "main()", so in many ABIs, it will indeed be passed on the stack.

Resources