I am trying to convert this assembly code into a C snippet.
movl $0, -4(%ebp) # 4
movl -4(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax
jg .L6
.L6:
nop
Here's what I have so far, but I think something is wrong. The line "movl (%eax), %eax" confuses me in particular.
int local = 0;
if ((int*)((local << 2) + param1) > parameter2) {
; // do nothing
}
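Your interpretation of movl (%eax), %eax is correct, but your interpretation of addl 8(%ebp), %eax is not: the scaled index is added to the pointer, and the value at that address is what gets compared. The correct code would be something like this: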
// parameter1 is an int* at 8(%ebp)
// parameter2 is an int at 12(%ebp)
int local = 0; // at -4(%ebp)
if (parameter1[local] > parameter2) {
; // nop
} else {
// whatever is between jg and .L6
}
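To make the pointer arithmetic concrete, here is a minimal sketch that mirrors the four instructions one at a time. The helper name load_scaled is made up for illustration, and the sketch assumes a 4-byte int, as on IA32:

```c
#include <stddef.h>

/* Hypothetical helper: follows the listing step by step.
   Assumes sizeof(int) == 4, as on IA32. */
int load_scaled(const int *parameter1, int local)
{
    ptrdiff_t offset = (ptrdiff_t)local * 4;      /* sall $2, %eax: index * 4 bytes */
    const char *addr = (const char *)parameter1;  /* 8(%ebp): base address */
    addr += offset;                               /* addl 8(%ebp), %eax */
    return *(const int *)addr;                    /* movl (%eax), %eax: load the int */
}
```

Under that assumption, load_scaled(p, i) reads the same element as p[i], which is why the comparison can be written as parameter1[local] > parameter2.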
Related
Recently I have been looking at IA32 assembly, and I wrote a simple toy example:
#include <stdio.h>
int array[10];
int i = 0;
int sum = 0;
int main(void)
{
for (i = 0; i < 10; i++)
{
array[i] = i;
sum += array[i];
}
printf("SUM = %d\n",sum);
return 0;
}
Yes, I know that using global variables is not recommended. I compiled the above code without optimizations, using the -S flag, and got this assembly:
main:
...
movl $0, %eax
subl %eax, %esp
movl $0, i
.L2:
cmpl $9, i
jle .L5
jmp .L3
.L5:
movl i, %edx
movl i, %eax
movl %eax, array(,%edx,4)
movl i, %eax
movl array(,%eax,4), %eax
addl %eax, sum
incl i
jmp .L2
Nothing too fancy, and easy to understand: it is a normal while loop. Then I compiled the same code with -O2 and got the following assembly:
main:
...
xorl %eax, %eax
movl $0, i
movl $1, %edx
.p2align 2,,3
.L6:
movl sum, %ecx
addl %eax, %ecx
movl %eax, array-4(,%edx,4)
movl %edx, %eax
incl %edx
cmpl $9, %eax
movl %ecx, sum
movl %eax, i
jle .L6
subl $8, %esp
pushl %ecx
pushl $.LC0
call printf
xorl %eax, %eax
leave
ret
In this case the loop was transformed into a do-while loop. What I do not understand in the above assembly is why it uses "movl $1, %edx" and then "movl %eax, array-4(,%edx,4)".
%edx starts at 1 instead of 0, and then the array access subtracts 4 from the base address (4 bytes = one integer). Why not simply:
movl $0, %edx
...
array (,%edx,4)
instead of starting at 1 and then having to subtract 4 on every access?
I am using "GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-24)", for educational reasons to generate easily understandable assembly.
I think I finally get the point. I tested with:
...
int main(void)
{
for (i = 0; i < 10; i+=2)
{
...
}
}
and got:
movl $2, %edx
and with for (i = 0; i < 10; i +=3) and got :
movl $3, %edx
and finally with (i = 1; i < 10; i +=3) and got:
movl $4, %edx
Therefore, the compiler initializes %edx to the initial value of i plus the increment step, i.e. %edx always holds the next value of i.
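One way to read the -O2 listing for the original i++ loop back into C is sketched below. The names cur, next and sum_loop are made up for illustration: cur plays the role of %eax and next the role of %edx. This is only an interpretation of the listing, not what GCC literally works with internally:

```c
int array[10];
int i = 0;
int sum = 0;

/* Hypothetical C rendering of the -O2 loop body. */
int sum_loop(void)
{
    int cur = 0;    /* xorl %eax, %eax */
    int next = 1;   /* movl $1, %edx: always cur + 1 */
    i = 0;          /* movl $0, i */
    do {
        int s = sum + cur;       /* movl sum, %ecx ; addl %eax, %ecx */
        array[next - 1] = cur;   /* movl %eax, array-4(,%edx,4) */
        cur = next;              /* movl %edx, %eax */
        next++;                  /* incl %edx */
        sum = s;                 /* movl %ecx, sum */
        i = cur;                 /* movl %eax, i */
    } while (cur <= 9);          /* cmpl $9, %eax ; jle .L6 */
    return sum;
}
```

Because next is already one step ahead when the store happens, the address has to be written as array[next - 1], which is where the array-4 displacement comes from. The -4 is folded into the address constant when the program is assembled and linked, so no subtraction is executed inside the loop.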
So I've been working on a problem (and before you ask, yes, it is homework, but I've been putting in faithful effort!) where I have some assembly code and want to be able to convert it (as faithfully as possible) to C.
Here is the assembly code:
A1:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $0, -4(%ebp)
jmp .L2
.L4:
movl -4(%ebp), %eax
sall $2, %eax
addl 8(%ebp), %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax
jg .L6
.L2:
movl -4(%ebp), %eax
cmpl 16(%ebp), %eax
jl .L4
jmp .L3
.L6:
nop
.L3:
movl -4(%ebp), %eax
leave
ret
And here's some of the C code I wrote to mimic it:
int A1(int a, int b, int c) {
int local = 0;
while(local < c) {
if(b > (int*)((local << 2) + a)) {
return local;
}
}
return local;
}
I have a few questions about how assembly works.
First, I notice that in L4, the body of the while loop, nothing is ever assigned to local. It's initialized to be 0 at the start of the function, and then never modified again. Looking at the C code I made for it, though, that seems odd, considering that the loop will go on indefinitely if the if-condition fails. Am I missing something there? I was under the impression that you'd need a snippet of code like:
movl %eax, -4(%ebp)
in order to actually assign anything to the local variable, and I don't see anything like that in the body of the while loop.
Secondly, you'll see that in the assembly code, the only local variable that's declared is "local". Hence, I have to use a snippet of code like:
if(b > (int*)((local << 2) + a))
The output of this line doesn't look much like the assembly code, though, and I think I might have made a mistake. What did I do wrong here?
And finally (thanks for your patience!), on a related note, I understand that the purpose of the if statement inside the while loop is to break out when the condition is fulfilled and then return local. Hence .L6 and "nop" (which basically does nothing). However, I don't know how to replicate this in my program. I've tried "break", and I've tried returning local as you see here. I understand the functionality; I just don't know how to replicate it in C (short of using goto, but that kind of defeats the purpose of the exercise...).
Thank you for your time!
This is my guess:
int A1 (int *a, int value, int size)
{
int i = 0;
while (i<size)
{
if (a[i] <= value)
break;
}
return i;
}
Which, compiled back to assembly, gives me this code:
A1:
.LFB0:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $0, -4(%ebp)
jmp .L2
.L4:
movl -4(%ebp), %eax
leal 0(,%eax,4), %edx
movl 8(%ebp), %eax
addl %edx, %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax
jg .L2
jmp .L3
.L2:
movl -4(%ebp), %eax
cmpl 16(%ebp), %eax
jl .L4
.L3:
movl -4(%ebp), %eax
leave
ret
This looks almost identical to your original ASM code. Only the code starting at .L4 differs, but if we annotate both versions:
ORIGINAL
movl -4(%ebp), %eax ;EAX = local
sall $2, %eax ;EAX = EAX*4
addl 8(%ebp), %eax ;EAX = EAX+a, hence EAX=a+local*4
ASM-C-ASM
movl -4(%ebp), %eax ;EAX = i
leal 0(,%eax,4), %edx ;EDX = EAX*4
movl 8(%ebp), %eax ;EAX = a
addl %edx, %eax ;EAX = EAX+EDX, hence EAX=a+i*4
Both codes continue with
movl (%eax), %eax
Because of this, I guess a is actually a pointer to some type that occupies 4 bytes. From the comparison between the second argument and the value read from memory, I guess that type must be either int or long. I chose int purely for convenience.
Of course this also means that this code (and the original one) does not make any sense. It lacks the i++ part somewhere. If this is so, then a is an array, and the third argument is the size of the array. I've named my local variable i to keep with the tradition of naming index variables like this.
This code would scan the array searching for an element that is less than or equal to value. If it finds one, the index of that element is returned. If not, the size of the array is returned.
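One detail worth double-checking: the original listing jumps to .L6 (the exit path) on jg, i.e. when a[i] > value, whereas the guess above breaks on a[i] <= value. A minimal sketch that follows the original branch direction is shown below. The i++ is an assumption, since no increment appears in the listing, and the function name is made up for illustration:

```c
/* Hypothetical variant: exits on a[i] > value, matching "jg .L6". */
int A1_with_increment(const int *a, int value, int size)
{
    int i = 0;
    while (i < size) {       /* .L2: cmpl 16(%ebp), %eax ; jl .L4 */
        if (a[i] > value)    /* cmpl 12(%ebp), %eax ; jg .L6 */
            break;
        i++;                 /* assumed: not present in the listing */
    }
    return i;                /* .L3: movl -4(%ebp), %eax */
}
```

Read this way, the function returns the index of the first element greater than value, or size if there is none.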
I have this IA32 assembly language code I'm trying to convert into regular C code.
.globl fn
.type fn, #function
fn:
pushl %ebp #setup
movl $1, %eax #setup 1 is in A
movl %esp, %ebp #setup
movl 8(%ebp), %edx # pointer X is in D
cmpl $1, %edx # (*x > 1)
jle .L4
.L5:
imull %edx, %eax
subl $1, %edx
cmpl $1, %edx
jne .L5
.L4:
popl %ebp
ret
The trouble I'm having is deciding what type of comparison is going on. I don't get how the program gets to the .L5 label. .L5 seems to be a loop since there's a comparison within it. I'm also unsure of what is being returned, because it seems like most of the work is done in the %edx register but never goes back to %eax for returning.
What I have so far:
int fn(int x)
{
}
It looks to me like it's computing a factorial. Ignoring the stack frame manipulation and such, we're left with:
movl $1, %eax #setup 1 is in A
Puts 1 into eax.
movl 8(%ebp), %edx # pointer X is in D
Retrieves a parameter into edx
imull %edx, %eax
Multiplies eax by edx, putting the result into eax.
subl $1, %edx
cmpl $1, %edx
jne .L5
Decrements edx and repeats if edx != 1.
In other words, this is roughly equivalent to:
unsigned fact(unsigned input) {
unsigned retval = 1;
for ( ; input != 1; --input)
retval *= input;
return retval;
}
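The loop above leaves out the initial cmpl $1, %edx / jle .L4 test, which skips the loop entirely when the argument is 1 or less. A more literal sketch, keeping the asker's signature int fn(int x) and a signed comparison to match jle, might look like this:

```c
/* Line-by-line reading of the listing; the return value lives in %eax throughout. */
int fn(int x)
{
    int result = 1;          /* movl $1, %eax */
    if (x > 1) {             /* cmpl $1, %edx ; jle .L4 */
        do {
            result *= x;     /* imull %edx, %eax */
            x -= 1;          /* subl $1, %edx */
        } while (x != 1);    /* cmpl $1, %edx ; jne .L5 */
    }
    return result;           /* .L4: result is already in %eax */
}
```

This also answers the question about the return value: the running product is accumulated directly in %eax, so nothing needs to be copied back from %edx.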
Need some help converting assembly code to C. To my understanding it is a while loop with condition (a < c) but I do not understand the body of the while loop.
movl $0, -8(%ebp) # variable B is at ebp - 8
movl $0, -4(%ebp) # variable A is at ebp - 4
jmp .L3
.L2:
movl 8(%ebp), %eax # parameter C is at ebp + 8
addl $2, %eax
addl %eax, %eax
addl %eax, -8(%ebp)
addl $1, -4(%ebp)
.L3:
movl -4(%ebp), %eax
cmpl 8(%ebp), %eax
jl .L2
Also explain why you did what you did, thanks.
This is what I got so far
int a,b = 0;
while (a < c) {
c += 4 + 2*c;
a++;
}
If I did all that correctly, then the only thing I don't understand is the line
addl %eax, -8(%ebp)
addl %eax, -8(%ebp) adds the value in eax to the value stored at ebp-8. If you understand the other add instructions, this one works the same way, only with a memory destination. There is no "add 4" instruction; the 4 + 2*c in your expression is just (c + 2)*2 expanded, and the result is added to B at -8(%ebp), not to c.
movl $0, -8(%ebp) # B = 0
movl $0, -4(%ebp) # A = 0
jmp .L3
.L2:
movl 8(%ebp), %eax # eax = C
addl $2, %eax # eax = C + 2
addl %eax, %eax # eax *= 2
addl %eax, -8(%ebp) # B += eax
addl $1, -4(%ebp) # A++
.L3:
movl -4(%ebp), %eax
cmpl 8(%ebp), %eax
jl .L2
So the result is as below
int a = 0, b = 0;
while (a < c) {
b += (c + 2)*2;
a++;
}
which, for c >= 0, is simply
int a = c, b = c*(c+2)*2;
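As a quick sanity check, here is a small sketch comparing the loop with the closed form. The function names loop_b and closed_form_b are made up for illustration. Note that for negative c the loop body never runs, so b stays 0 and the closed form needs a guard:

```c
/* Hypothetical helper: the loop as translated from the listing. */
int loop_b(int c)
{
    int a = 0, b = 0;
    while (a < c) {
        b += (c + 2) * 2;   /* addl $2 ; addl %eax, %eax ; addl %eax, -8(%ebp) */
        a++;                /* addl $1, -4(%ebp) */
    }
    return b;
}

/* Hypothetical helper: closed form, guarded for c <= 0 where the loop is skipped. */
int closed_form_b(int c)
{
    return c > 0 ? c * (c + 2) * 2 : 0;
}
```

For example, with c = 3 the loop adds 10 three times, so both functions return 30.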
I have the following recursive function to count all the nodes having the value 20 in a circular doubly linked list. I need to convert it to a tail-recursive function to prevent safety issues. Please help me with this. Thanks
int count(node *start)
{
return count_helper(start, start);
}
int count_helper(node *current, node *start)
{
int c;
c = 0;
if(current == NULL)
return 0;
if((current->roll_no) == 20)
c = 1;
if(current->next == start) return c;
return (c + count_helper(current->next, start));
}
In order to take advantage of tail recursion, the recursive call simply has to be the last thing performed. Currently, the only thing standing in the way of this goal is the addition that happens after the call returns. So, to transform the function, that addition has to be moved before the call. A common way to accomplish this is to pass the variable c as a parameter to the recursive helper function, like so:
int count(node *start)
{
return count_helper(start,start,0);
}
int count_helper(node *current, node *start, int c)
{
if(current == NULL)
return c;
if((current->roll_no) == 20)
c+=1;
if(current->next == start)
return c;
return count_helper(current->next, start,c);
}
The compiler turns this into a loop with no recursive call, as follows (using gcc 4.6.1, as produced by gcc -S -O2):
count_helper:
.LFB23:
.cfi_startproc
pushl %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
movl 8(%esp), %edx
movl 12(%esp), %ebx
movl 16(%esp), %eax
testl %edx, %edx
jne .L15
jmp .L10
.p2align 4,,7
.p2align 3
.L14:
testl %edx, %edx
je .L10
.L15:
xorl %ecx, %ecx
cmpl $20, 4(%edx)
movl (%edx), %edx
sete %cl
addl %ecx, %eax
cmpl %ebx, %edx
jne .L14 # <-- this is the key line right here
.L10:
popl %ebx
.cfi_def_cfa_offset 4
.cfi_restore 3
ret
.cfi_endproc
Compare this to your original (done without -O2, as apparently the compiler finds a way to make your original tail recursive as well, although in the process it mucks it up so much that I can barely read it):
count_helper:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $40, %esp
movl $0, -12(%ebp)
cmpl $0, 8(%ebp)
jne .L3
movl $0, %eax
jmp .L4
.L3:
movl 8(%ebp), %eax
movl 4(%eax), %eax
cmpl $20, %eax
jne .L5
movl $1, -12(%ebp)
.L5:
movl 8(%ebp), %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax
jne .L6
movl -12(%ebp), %eax
jmp .L4
.L6:
movl 8(%ebp), %eax
movl (%eax), %eax
movl 12(%ebp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call count_helper # <-- this is the key line right here
addl -12(%ebp), %eax
.L4:
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
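Read back into C, the -O2 listing for the tail-recursive version behaves roughly like the loop below. This is only a sketch: the node layout is an assumption taken from the offsets in the listing (next at offset 0, roll_no at offset 4 on IA32), the prev pointer of the doubly linked list is left out because the code never touches it, and the name count_helper_loop is made up for illustration:

```c
#include <stddef.h>

/* Assumed layout, inferred from the listing; the real struct may differ. */
typedef struct node {
    struct node *next;   /* movl (%edx), %edx */
    int roll_no;         /* cmpl $20, 4(%edx) */
} node;

/* Hypothetical loop equivalent of the optimized count_helper. */
int count_helper_loop(const node *current, const node *start, int c)
{
    while (current != NULL) {            /* testl %edx, %edx */
        c += (current->roll_no == 20);   /* sete %cl ; addl %ecx, %eax */
        current = current->next;
        if (current == start)            /* cmpl %ebx, %edx ; jne .L14 */
            break;
    }
    return c;                            /* .L10: result in %eax */
}
```

The key point is that the jne .L14 jump takes the place of the call instruction, so the stack does not grow with the length of the list.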