So I have the following assembly language code which I need to convert into C. I am confused on a few lines of the code.
I understand that this is a for loop. I have added my comments on each line.
I think the for loop goes like this
for (int i = 1; i > 0; i << what?) {
//Calculate result
}
What is the test condition? And how do I change it?
Looking at the assembly code, what does the variable 'n' do?
This is Intel x86 so the format is movl = source, dest
movl 8(%ebp), %esi //Get x
movl 12(%ebp), %ebx //Get n
movl $-1, %edi //This should be result
movl $1, %edx //The i of the loop
.L2:
movl %edx, %eax
andl %esi, %eax
xorl %eax, %edi //result = result ^ (i & x)
movl %ebx, %ecx //Why do we do this? As we never use $%ebx or %ecx again
sall %cl, %edx //Where did %cl come from?
testl %edx, %edx //Tests if i != what? - condition of the for loop
jne .L2 //Loop again
movl %edi, %eax //Otherwise return result.
sall %cl, %edx shifts %edx left by %cl bits. (%cl, for reference, is the low byte of %ecx.) The subsequent testl tests whether that shift zeroed out %edx.
The jne is called that because it's often used in the context of comparisons, which in ASM are often just subtractions. The flags would be set based on the difference; ZF would be set if the items are equal (since x - x == 0). It's also called jnz in Intel syntax; i'm not sure whether GNU allows that too.
All together, the three instructions translate to i <<= n; if (i != 0) goto L2;. That plus the label seem to make a for loop.
for (i = 1; i != 0; i <<= n) { result ^= i & x; }
Or, more correctly (but achieving the same goal), a do...while loop.
i = 1;
do { result ^= i & x; i <<= n; } while (i != 0);
Related
I'm to convert the following AT&T x86 assembly into C:
movl 8(%ebp), %edx
movl $0, %eax
movl $0, %ecx
jmp .L2
.L1
shll $1, %eax
movl %edx, %ebx
andl $1, %ebx
orl %ebx, %eax
shrl $1, %edx
addl $1, %ecx
.L2
cmpl $32, %ecx
jl .L1
leave
But must adhere to the following skeleton code:
int f(unsigned int x) {
int val = 0, i = 0;
while(________) {
val = ________________;
x = ________________;
i++;
}
return val;
}
I can tell that the snippet
.L2
cmpl $32, %ecx
jl .L1
can be interpreted as while(i<32). I also know that x is stored in %edx, val in %eax, and i in %ecx. However, I'm having a hard time converting the assembly within the while/.L1 loop into condensed high-level language that fits into the provided skeleton code. For example, can shll, shrl, orl, and andl simply be written using their direct C equivalents (<<,>>,|,&), or is there some more nuance to it?
Is there a standardized guide/"cheat sheet" for Assembly-to-C conversions?
I understand assembly to high-level conversion is not always clear-cut, but there are certainly patterns in assembly code that can be consistently interpreted as certain C operations.
For example, can shll, shrl, orl, and andl simply be written using
their direct C equivalents (<<,>>,|,&), or is there some more nuance
to it?
they can. Let's examine the loop body step-by-step:
shll $1, %eax // shift left eax by 1, same as "eax<<1" or even "eax*=2"
movl %edx, %ebx
andl $1, %ebx // ebx &= 1
orl %ebx, %eax // eax |= ebx
shrl $1, %edx // shift right edx by 1, same as "edx>>1" = "edx/=2"
gets us to
%eax *=2
%ebx = %edx
%ebx = %ebx & 1
%eax |= %ebx
%edx /= 2
ABI tells us (8(%ebp), %edx) that %edx is x, and %eax (return value) is val:
val *=2
%ebx = x // a
%ebx = %ebx & 1 // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert a into b:
val *=2
%ebx = (x & 1) // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert b into c:
val *=2
val |= (x & 1)
x /= 2
final step: combine both 'val =' into one
val = 2*val | (x & 1)
x /= 2
while (i < 32) { val = (val << 1) | (x & 1); x = x >> 1; i++; } except val and the return value should be unsigned and they aren't in your template. The function returns the bits in x reversed.
The actual answer to your question is more complicated and is pretty much: no there is no such guide and it can't exist because compilation loses information and you can't recreate that lost information from assembler. But you can often make a good educated guess.
The point of this problem is to reverse engineer c code that was made after running the compiler with level 2 optimization. The original c code is as follows (computes the greatest common divisor):
int gcd(int a, int b){
int returnValue = 0;
if (a != 0 && b != 0){
int r;
int flag = 0;
while (flag == 0){
r = a % b;
if (r ==0){
flag = 1;
} else {
a = b;
b = r;
}
}
returnValue = b;
}
return(returnValue);
}
when I ran the optimized compile I ran this from the command line:
gcc -O2 -S Problem04b.c
to get the assembly file for this optimized code
.gcd:
.LFB12:
.cfi_startproc
testl %esi, %esi
je .L2
testl %edi, %edi
je .L2
.L7:
movl %edi, %edx
movl %edi, %eax
movl %esi, %edi
sarl $31, %edx
idivl %esi
testl %edx, %edx
jne .L9
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L2:
xorl %esi, %esi
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L9:
movl %edx, %esi
jmp .L7
.cfi_endproc
I need to convert this assembly code back to c code here is where I am at right now:
int gcd(int a int b){
/*
testl %esi %esi
sets zero flag if a is 0 (ZF) but doesn't store anything
*/
if (a == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
/*
testl %edi %edi
sets zero flag if b is 0 (ZF) but doesn't store anything
*/
if (b == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
do{
int r = b;
int returnValue = b;
}while();
}
Can anyone help me write this back in to c code? I'm pretty much lost.
First of all, you have the values mixed in your code. %esi begins with the value b and %edi begins with the value a.
You can infer from the testl %edx, %edx line that %edx is used as the condition variable for the loop beginning with .L7 (if %edx is different from 0 then control is transferred to the .L9 block and then returned to .L7). We'll refer to %edx as remainder in our reverse-engineered code.
Let's begin reverse-engineering the main loop:
movl %edi, %edx
Since %edi stores a, this is equivalent to initializing the value of remainder with a: int remainder = a;.
movl %edi, %eax
Store int temp = a;
movl %esi, %edi
Perform int a = b; (remember that %edi is a and %esi is b).
sarl $31, %edx
This arithmetic shift instruction shifts our remainder variable 31 bits to the right whilst maintaining the sign of the number. By shifting 31 bits you're setting remainder to 0 if it's positive (or zero) and to -1 if it's negative. So it's equivalent to remainder = (remainder < 0) ? -1 : 0.
idivl %esi
Divide %edx:%eax by %esi, or in our case, divide remainder * temp by b (the variable). The remainder will be stored in %edx, or in our code, remainder. When combining this with the previous instruction: if remainder < 0 then remainder = -1 * temp % b, and otherwise remainder = temp % b.
testl %edx, %edx
jne .L9
Check to see if remainder is equal to 0 - if it's not, jump to .L9. The code there simply sets b = remainder; before returning to .L7. In order to implement this in C, we'll keep a count variable that will store the amount of times the loop has iterated. We'll perform b = remainder at the beginning of the loop but only after the first iteration, meaning when count != 0.
We're now ready to build our full C loop:
int count = 0;
do {
if (count != 0)
b = remainder;
remainder = a;
temp = a;
a = b;
if (remainder < 0){
remainder = -1 * temp % b;
} else {
remainder = temp % b;
}
count++;
} while (remainder != 0)
And after the loop terminates,
movl %esi, %eax
ret
Will return the GCD that the program computed (in our code it'll be stored in the b variable).
I'm reviewing a practice midterm at the moment. the question gives a piece of assembly code (IA32) and instructs to write the C equivalent of it. Just want to make sure I'm doing it correctly. Thanks!
Assembly program given:
.global _someOperation
_someOperation:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ebx
movl 12(%ebp), %edx
decl %edx
xorl %esi, %esi
movl (%ebx, %esi, 4), %eax
continue:
incl %esi
cmpl (%ebx, %esi, 4), %eax
jl thelabel
movl (%ebx, %esi, 4), %eax
thelabel:
cmp %esi, %edx
jne continue
movl %ebp, %esp
popl %ebp
ret
This is the code I've written:
void someOperation(int *num, int count) //Given
{
int k; //Given
count--;
int i = 0;
k = num[i];
i++;
while(count != i)
{
if(k >= num[i]
k = num[i];
i++;
}
return (k);
}
Looks pretty close to me, although in the ASM the increment is only at the beginning of the loop, and the condition is not checked the first time through. Consider using DO...WHILE instead.
EDIT: also, your assignment is wrong. MOV instruction copies from the 2nd parameter to the first. You have it going the other way in your C code.
I found the following assembly code and I have no idea what it is supposed to be doing (mainly because cmovg follows the movl instruction ):
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl %edx, %eax
sarl $31, %eax
testl %edx, %edx
movl $1, %edx
cmovg %edx, %eax
popl %ebp
ret
So here is how I have interpreted it so far:
pushes onto stack
a new pointer (stack pointer) creates to point at the same location as base pointer
gets the input (let's call it x)
copies x into register %eax (res = x)
res = res >> 31 sign extension
tests x
sets x = 1
if >, res = x
restores pointer
returns res
However, I am not sure what the significance of this subroutine is. To me it seems useless. I would appreciate it if you could point out what is being done here.
This code returns the sign of X. In C:
int sign(int x) {
if (x>0)
return 1;
else if (x==0)
return 0;
else
return -1;
}
The instruction sarl $31, %eax will put -1 in eax if it was negative, or 0 otherwise. Then the cmovg instruction will replace this value with 1 if x was positive.
I can usually figure out most C code but this one is over my head.
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
an example usage would be something like:
int x = 57;
kroundup32(x);
//x is now 64
A few other examples are:
1 to 1
2 to 2
7 to 8
31 to 32
60 to 64
3000 to 4096
I know it's rounding an integer to it's nearest power of 2, but that's about as far as my knowledge goes.
Any explanations would be greatly appreciated.
Thanks
(--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
Decrease x by 1
OR x with (x / 2).
OR x with (x / 4).
OR x with (x / 16).
OR x with (x / 256).
OR x with (x / 65536).
Increase x by 1.
For a 32-bit unsigned integer, this should move a value up to the closest power of 2 that is equal or greater. The OR sections set all the lower bits below the highest bit, so it ends up as a power of 2 minus one, then you add one back to it. It looks like it's somewhat optimized and therefore not very readable; doing it by bitwise operations and bit shifting alone, and as a macro (so no function call overhead).
The bitwise or and shift operations essentially set every bit between the highest set bit and bit zero. This will produce a number of the form 2^n - 1. The final increment adds one to get a number of the form 2^n. The initial decrement ensures that you don't round numbers which are already powers of two up to the next power, so that e.g. 2048 doesn't become 4096.
At my machine kroundup32 gives 6.000m rounds/sec
And next function gives 7.693m rounds/sec
inline int scan_msb(int x)
{
#if defined(__i386__) || defined(__x86_64__)
int y;
__asm__("bsr %1, %0"
: "=r" (y)
: "r" (x)
: "flags"); /* ZF */
return y;
#else
#error "Implement me for your platform"
#endif
}
inline int roundup32(int x)
{
if (x == 0) return x;
else {
const int bit = scan_msb(x);
const int mask = ~((~0) << bit);
if (x & mask) return (1 << (bit+1));
else return (1 << bit);
}
}
So #thomasrutter I woudn't say that it is "highly optimized".
And appropriate (only meaningful part) assembly (for GCC 4.4.4):
kroundup32:
subl $1, %edi
movl %edi, %eax
sarl %eax
orl %edi, %eax
movl %eax, %edx
sarl $2, %edx
orl %eax, %edx
movl %edx, %eax
sarl $4, %eax
orl %edx, %eax
movl %eax, %edx
sarl $8, %edx
orl %eax, %edx
movl %edx, %eax
sarl $16, %eax
orl %edx, %eax
addl $1, %eax
ret
roundup32:
testl %edi, %edi
movl %edi, %eax
je .L6
movl $-1, %edx
bsr %edi, %ecx
sall %cl, %edx
notl %edx
testl %edi, %edx
jne .L10
movl $1, %eax
sall %cl, %eax
.L6:
rep
ret
.L10:
addl $1, %ecx
movl $1, %eax
sall %cl, %eax
ret
By some reason I haven't found appropriate implementation of scan_msb (like #define scan_msb(x) if (__builtin_constant_p (x)) ...) within standart headers of GCC (only __TBB_machine_lg/__TBB_Log2).