Reverse engineer optimized c code from assembly - c

The point of this problem is to reverse engineer c code that was made after running the compiler with level 2 optimization. The original c code is as follows (computes the greatest common divisor):
int gcd(int a, int b){
int returnValue = 0;
if (a != 0 && b != 0){
int r;
int flag = 0;
while (flag == 0){
r = a % b;
if (r ==0){
flag = 1;
} else {
a = b;
b = r;
}
}
returnValue = b;
}
return(returnValue);
}
when I ran the optimized compile I ran this from the command line:
gcc -O2 -S Problem04b.c
to get the assembly file for this optimized code
.gcd:
.LFB12:
.cfi_startproc
testl %esi, %esi
je .L2
testl %edi, %edi
je .L2
.L7:
movl %edi, %edx
movl %edi, %eax
movl %esi, %edi
sarl $31, %edx
idivl %esi
testl %edx, %edx
jne .L9
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L2:
xorl %esi, %esi
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L9:
movl %edx, %esi
jmp .L7
.cfi_endproc
I need to convert this assembly code back to c code here is where I am at right now:
int gcd(int a int b){
/*
testl %esi %esi
sets zero flag if a is 0 (ZF) but doesn't store anything
*/
if (a == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
/*
testl %edi %edi
sets zero flag if b is 0 (ZF) but doesn't store anything
*/
if (b == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
do{
int r = b;
int returnValue = b;
}while();
}
Can anyone help me write this back in to c code? I'm pretty much lost.

First of all, you have the values mixed in your code. %esi begins with the value b and %edi begins with the value a.
You can infer from the testl %edx, %edx line that %edx is used as the condition variable for the loop beginning with .L7 (if %edx is different from 0 then control is transferred to the .L9 block and then returned to .L7). We'll refer to %edx as remainder in our reverse-engineered code.
Let's begin reverse-engineering the main loop:
movl %edi, %edx
Since %edi stores a, this is equivalent to initializing the value of remainder with a: int remainder = a;.
movl %edi, %eax
Store int temp = a;
movl %esi, %edi
Perform int a = b; (remember that %edi is a and %esi is b).
sarl $31, %edx
This arithmetic shift instruction shifts our remainder variable 31 bits to the right whilst maintaining the sign of the number. By shifting 31 bits you're setting remainder to 0 if it's positive (or zero) and to -1 if it's negative. So it's equivalent to remainder = (remainder < 0) ? -1 : 0.
idivl %esi
Divide %edx:%eax by %esi, or in our case, divide remainder * temp by b (the variable). The remainder will be stored in %edx, or in our code, remainder. When combining this with the previous instruction: if remainder < 0 then remainder = -1 * temp % b, and otherwise remainder = temp % b.
testl %edx, %edx
jne .L9
Check to see if remainder is equal to 0 - if it's not, jump to .L9. The code there simply sets b = remainder; before returning to .L7. In order to implement this in C, we'll keep a count variable that will store the amount of times the loop has iterated. We'll perform b = remainder at the beginning of the loop but only after the first iteration, meaning when count != 0.
We're now ready to build our full C loop:
int count = 0;
do {
if (count != 0)
b = remainder;
remainder = a;
temp = a;
a = b;
if (remainder < 0){
remainder = -1 * temp % b;
} else {
remainder = temp % b;
}
count++;
} while (remainder != 0)
And after the loop terminates,
movl %esi, %eax
ret
Will return the GCD that the program computed (in our code it'll be stored in the b variable).

Related

A reference for converting assembly 'shl', 'OR', 'AND', 'SHR' operations into C?

I'm to convert the following AT&T x86 assembly into C:
movl 8(%ebp), %edx
movl $0, %eax
movl $0, %ecx
jmp .L2
.L1
shll $1, %eax
movl %edx, %ebx
andl $1, %ebx
orl %ebx, %eax
shrl $1, %edx
addl $1, %ecx
.L2
cmpl $32, %ecx
jl .L1
leave
But must adhere to the following skeleton code:
int f(unsigned int x) {
int val = 0, i = 0;
while(________) {
val = ________________;
x = ________________;
i++;
}
return val;
}
I can tell that the snippet
.L2
cmpl $32, %ecx
jl .L1
can be interpreted as while(i<32). I also know that x is stored in %edx, val in %eax, and i in %ecx. However, I'm having a hard time converting the assembly within the while/.L1 loop into condensed high-level language that fits into the provided skeleton code. For example, can shll, shrl, orl, and andl simply be written using their direct C equivalents (<<,>>,|,&), or is there some more nuance to it?
Is there a standardized guide/"cheat sheet" for Assembly-to-C conversions?
I understand assembly to high-level conversion is not always clear-cut, but there are certainly patterns in assembly code that can be consistently interpreted as certain C operations.
For example, can shll, shrl, orl, and andl simply be written using
their direct C equivalents (<<,>>,|,&), or is there some more nuance
to it?
they can. Let's examine the loop body step-by-step:
shll $1, %eax // shift left eax by 1, same as "eax<<1" or even "eax*=2"
movl %edx, %ebx
andl $1, %ebx // ebx &= 1
orl %ebx, %eax // eax |= ebx
shrl $1, %edx // shift right edx by 1, same as "edx>>1" = "edx/=2"
gets us to
%eax *=2
%ebx = %edx
%ebx = %ebx & 1
%eax |= %ebx
%edx /= 2
ABI tells us (8(%ebp), %edx) that %edx is x, and %eax (return value) is val:
val *=2
%ebx = x // a
%ebx = %ebx & 1 // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert a into b:
val *=2
%ebx = (x & 1) // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert b into c:
val *=2
val |= (x & 1)
x /= 2
final step: combine both 'val =' into one
val = 2*val | (x & 1)
x /= 2
while (i < 32) { val = (val << 1) | (x & 1); x = x >> 1; i++; } except val and the return value should be unsigned and they aren't in your template. The function returns the bits in x reversed.
The actual answer to your question is more complicated and is pretty much: no there is no such guide and it can't exist because compilation loses information and you can't recreate that lost information from assembler. But you can often make a good educated guess.

Count even numbers in matrix

In this function I want to get even numbers in matrix. My parameters are a bidimensional pointer and the dimensions of matrix.
Anyone could help me? Where I got wrong...
subl $12, %esp # 12 bytes to local variables
movl $0, -12(%ebp) # counter = 0
movl 8(%ebp), %esi # **m
movl $0, %ebx # i = 0
it_rows:
cmpl 12(%ebp), %ebx # while(i != y)
jz end # if not equal jump
leal (%esi, %ebx, 4), %edi # m+i
movl $0, %ecx # j = 0
it_col:
cmpl 16(%ebp), %ecx # while(j != k)
jz next_row
movl (%edi, %ecx, 4), %eax # *(*(m+i)+j)
andl $0x1, %eax # LSB
cmpl $0, %eax # LSB == 0
jnz not_even
movl -12(%ebp), %eax # counter
incl %eax # counter++
movl %eax, -12(%ebp) # update local variable
not_even:
incl %ecx # j++
jmp it_col
next_row:
incl %ebx # i++
jmp it_rows
My implementation in C:
int count_even_matrix(int **m, int y, int k) {
int i, j;
int c = 0;
for(i = 0; i < y; i++) {
for(j = 0; j < k; j++) {
if(*(*(m+i)+j)%2 == 0) {
c++;
}
}
}
return c;
}
I'm just a beginner in C, but if you're passing the array name of your two dimensional array to your function, you're actually passing a pointer to the first element of the array, which in this case is a 3 element one dimensional array. So I believe that your pointer should be declared as int (*m)[3].
Change the if with
if ((m[i][j] % 2 ) == 0) {
c++;
}
i is your matrix rows index, and j the columns index. The two for already increment them.
I want to write my C implementation to assembly.
Your assembly has just one error: the leal, which has to be movl. Consider, what you call a bidimensional pointer is, as the type int ** conveys, a pointer to pointers to ints. With
leal (%esi, %ebx, 4), %edi # m+i
you only load the address of the pointer to the ints in the ebxth row, but you actually need the value of that pointer, i. e. the address of the ints, and that you get with
movl (%esi, %ebx, 4), %edi # *(m+i)

Recursive function in x86-assembly

I have a method method(int a, int b) in x86 assembly code:
method:
pushl %ebx
subl $24 , %esp
movl 32(%esp ) , %ebx
movl 36(%esp ) , %edx
movl $1 , %eax
testl %edx , %edx
je .L2
subl $1 , %edx
movl %edx , 4(%esp )
movl %ebx , (%esp )
call method
imull %ebx , %eax
.L2:
addl $24 , %esp
popl %ebx
ret
But I just cant wrap my head around its function.
a is written on %ebx, b is written on %edx.
%eax is initialized with 1.
If %edx is not 0, I substract 1 from %edx and push %edx and %ebx on the stack and call method again. I just dont understand what it does. And isn't it impossible to reach the line imull %ebx, %eax?
I would be really happy, if someone could explain the basic function of this method to me.
It is basically equivalent to following C function:
int method(int a, int b)
{
if (b == 0)
return 1;
return method(a, b-1) * a;
}
or closer to the assembly code:
int method(int a, int b)
{
if (b == 0)
return 1;
int temp = method(a, b-1);
return temp * a; // we do get here when the recursion is over,
// the same ways as we get to imull %ebx, %eax
// in your assembly code when the recursion is over
}
imull %ebx, %eax is reached when the recursive call returns.
The function appears to be calculating the power of the input variables (ab) via recursion and the value is returned via %eax.
The way this works is that the base case is when b is 0, and 1 is returned. When b > 0, method(a, b-1) * a is returned.

Interpret Assembly Code

I found the following assembly code and I have no idea what it is supposed to be doing (mainly because cmovg follows the movl instruction ):
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl %edx, %eax
sarl $31, %eax
testl %edx, %edx
movl $1, %edx
cmovg %edx, %eax
popl %ebp
ret
So here is how I have interpreted it so far:
pushes onto stack
a new pointer (stack pointer) creates to point at the same location as base pointer
gets the input (let's call it x)
copies x into register %eax (res = x)
res = res >> 31 sign extension
tests x
sets x = 1
if >, res = x
restores pointer
returns res
However, I am not sure what the significance of this subroutine is. To me it seems useless. I would appreciate it if you could point out what is being done here.
This code returns the sign of X. In C:
int sign(int x) {
if (x>0)
return 1;
else if (x==0)
return 0;
else
return -1;
}
The instruction sarl $31, %eax will put -1 in eax if it was negative, or 0 otherwise. Then the cmovg instruction will replace this value with 1 if x was positive.

Assembly language to C

So I have the following assembly language code which I need to convert into C. I am confused on a few lines of the code.
I understand that this is a for loop. I have added my comments on each line.
I think the for loop goes like this
for (int i = 1; i > 0; i << what?) {
//Calculate result
}
What is the test condition? And how do I change it?
Looking at the assembly code, what does the variable 'n' do?
This is Intel x86 so the format is movl = source, dest
movl 8(%ebp), %esi //Get x
movl 12(%ebp), %ebx //Get n
movl $-1, %edi //This should be result
movl $1, %edx //The i of the loop
.L2:
movl %edx, %eax
andl %esi, %eax
xorl %eax, %edi //result = result ^ (i & x)
movl %ebx, %ecx //Why do we do this? As we never use $%ebx or %ecx again
sall %cl, %edx //Where did %cl come from?
testl %edx, %edx //Tests if i != what? - condition of the for loop
jne .L2 //Loop again
movl %edi, %eax //Otherwise return result.
sall %cl, %edx shifts %edx left by %cl bits. (%cl, for reference, is the low byte of %ecx.) The subsequent testl tests whether that shift zeroed out %edx.
The jne is called that because it's often used in the context of comparisons, which in ASM are often just subtractions. The flags would be set based on the difference; ZF would be set if the items are equal (since x - x == 0). It's also called jnz in Intel syntax; i'm not sure whether GNU allows that too.
All together, the three instructions translate to i <<= n; if (i != 0) goto L2;. That plus the label seem to make a for loop.
for (i = 1; i != 0; i <<= n) { result ^= i & x; }
Or, more correctly (but achieving the same goal), a do...while loop.
i = 1;
do { result ^= i & x; i <<= n; } while (i != 0);

Resources