I have C code and Assembly code. I do not understand line 3 and 6 of assembled code.
C code:
int arith(int x, int y, int z)
{
int t1 = x+y;
int t2 = z*48
int t3= t1& 0xFFFF
int t4 = t2 * t3
return t4;
}
Assembly code:
x at % ebp+8, y at %ebp*12, z at %ebp+16
mol 16(%ebp), %eax
leal (%eax, %eax, 2) % eax
sall $4, %eax
movl 12(%ebp) %edx
addl 8(%ebp) %edx
andl $65535, %edx
imull %edx. $eax
on line 6, I do not understand how 65535 becomes 0xFFFF so that we have t3 = t1 & 0xFFFF.
Different question:
Consider the following C functino prototype, where num_t is a data type declared using typedef:
void store_prod(num_t *dest, unsigned x, num_t y)
{*dest=x*yl}
Gcc generates the following assemblyu code implementing the body of the computation:
dest at %ebp+8, x at ebp+12, y at %ebp+16
mov1 12($ebp), $eax
movl 20($ebp), $ecx
imull $eax, $ecx
mull 16(%ebp)
leal (%ecx,%edx), %edx
movl 8(%ebp), %ecx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
line movl 20(%ebp), %ecx there is value in 20(%ebp), how is it grabbing y_t?
line leal (%ecx, %edx), %edx; there is nothing in edx? so what is being added to %ecx to be stored in %edx?
We want to calculate
t2=z*48
So to do that we first do z=z*3 then shift z left by 4(multiply by 16 ==left shift by 4)
line 3 calculates
z=z*3 (z+2z)
And line 4 does left shift by 4.
The compiler often generates combination of add and shift to perform multiplication as multiplication is more costly
As for line 6 65535 is decimal for 0xFFFF.
Related
I'm to convert the following AT&T x86 assembly into C:
movl 8(%ebp), %edx
movl $0, %eax
movl $0, %ecx
jmp .L2
.L1
shll $1, %eax
movl %edx, %ebx
andl $1, %ebx
orl %ebx, %eax
shrl $1, %edx
addl $1, %ecx
.L2
cmpl $32, %ecx
jl .L1
leave
But must adhere to the following skeleton code:
int f(unsigned int x) {
int val = 0, i = 0;
while(________) {
val = ________________;
x = ________________;
i++;
}
return val;
}
I can tell that the snippet
.L2
cmpl $32, %ecx
jl .L1
can be interpreted as while(i<32). I also know that x is stored in %edx, val in %eax, and i in %ecx. However, I'm having a hard time converting the assembly within the while/.L1 loop into condensed high-level language that fits into the provided skeleton code. For example, can shll, shrl, orl, and andl simply be written using their direct C equivalents (<<,>>,|,&), or is there some more nuance to it?
Is there a standardized guide/"cheat sheet" for Assembly-to-C conversions?
I understand assembly to high-level conversion is not always clear-cut, but there are certainly patterns in assembly code that can be consistently interpreted as certain C operations.
For example, can shll, shrl, orl, and andl simply be written using
their direct C equivalents (<<,>>,|,&), or is there some more nuance
to it?
they can. Let's examine the loop body step-by-step:
shll $1, %eax // shift left eax by 1, same as "eax<<1" or even "eax*=2"
movl %edx, %ebx
andl $1, %ebx // ebx &= 1
orl %ebx, %eax // eax |= ebx
shrl $1, %edx // shift right edx by 1, same as "edx>>1" = "edx/=2"
gets us to
%eax *=2
%ebx = %edx
%ebx = %ebx & 1
%eax |= %ebx
%edx /= 2
ABI tells us (8(%ebp), %edx) that %edx is x, and %eax (return value) is val:
val *=2
%ebx = x // a
%ebx = %ebx & 1 // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert a into b:
val *=2
%ebx = (x & 1) // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert b into c:
val *=2
val |= (x & 1)
x /= 2
final step: combine both 'val =' into one
val = 2*val | (x & 1)
x /= 2
while (i < 32) { val = (val << 1) | (x & 1); x = x >> 1; i++; } except val and the return value should be unsigned and they aren't in your template. The function returns the bits in x reversed.
The actual answer to your question is more complicated and is pretty much: no there is no such guide and it can't exist because compilation loses information and you can't recreate that lost information from assembler. But you can often make a good educated guess.
The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to
be equivalent to long long.
gcc generates the following assembly code implementing the computation:
typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y)
{
*dest = x*y;
}
dest at %ebp+8, x at %ebp+12, y at %ebp+16
1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)
This code uses three multiplications to implement the multi precision arithmetic required to implement 64-bit arithmetic
on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how
it realizes your algorithm.
Question: What does line 5 do? what value is it moving to register ecx?
also what does line 11 do ?
Line 5: it's copying the value of some local variable to ECX. The value is unkown as of this listing, as we lack part of the original function code.
Line 11: it's equivalent to: EDX = EDX+ECX. The LEA instruction is used to compute the EA of a memory value and store that EA into a destination register, thus, it can be used to quickly do additions and constant multiplication.
C version:
int arith(int x, int y, int z)
{
int t1 = x+y;
int t2 = z*48;
int t3 = t1 & 0xFFFF;
int t4 = t2 * t3;
return t4;
}
ATT Assembly version of the same program:
x at %ebp+8, y at %ebp+12, z at %ebp+16
movl 16(ebp), %eax
leal (%eax, %eax, 2), %eax
sall $4, %eax // t2 = z* 48... This is where I get confused
movl 12(%ebp), %edx
addl 8(%ebp), %edx
andl $65535, %edx
imull %edx, %eax
I understand everything it is doing at all points of the program besides the shift left.
I assume it is going to shift left 4 times. Why is that?
Thank you!
Edit: I also understand that the part I'm confused on is equivalent to the z*48 part of the C version.
What I'm not understanding is how does shifting left 4 times equate to z*48.
You missed the leal (%eax, %eax, 2), %eax line. Applying some maths the assembly code reads:
a := x
a := a + 2*a // a = 3*x
a := a * 2^4 // a = x * 3*16
I found the following assembly code and I have no idea what it is supposed to be doing (mainly because cmovg follows the movl instruction ):
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl %edx, %eax
sarl $31, %eax
testl %edx, %edx
movl $1, %edx
cmovg %edx, %eax
popl %ebp
ret
So here is how I have interpreted it so far:
pushes onto stack
a new pointer (stack pointer) creates to point at the same location as base pointer
gets the input (let's call it x)
copies x into register %eax (res = x)
res = res >> 31 sign extension
tests x
sets x = 1
if >, res = x
restores pointer
returns res
However, I am not sure what the significance of this subroutine is. To me it seems useless. I would appreciate it if you could point out what is being done here.
This code returns the sign of X. In C:
int sign(int x) {
if (x>0)
return 1;
else if (x==0)
return 0;
else
return -1;
}
The instruction sarl $31, %eax will put -1 in eax if it was negative, or 0 otherwise. Then the cmovg instruction will replace this value with 1 if x was positive.
I can usually figure out most C code but this one is over my head.
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
an example usage would be something like:
int x = 57;
kroundup32(x);
//x is now 64
A few other examples are:
1 to 1
2 to 2
7 to 8
31 to 32
60 to 64
3000 to 4096
I know it's rounding an integer to it's nearest power of 2, but that's about as far as my knowledge goes.
Any explanations would be greatly appreciated.
Thanks
(--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
Decrease x by 1
OR x with (x / 2).
OR x with (x / 4).
OR x with (x / 16).
OR x with (x / 256).
OR x with (x / 65536).
Increase x by 1.
For a 32-bit unsigned integer, this should move a value up to the closest power of 2 that is equal or greater. The OR sections set all the lower bits below the highest bit, so it ends up as a power of 2 minus one, then you add one back to it. It looks like it's somewhat optimized and therefore not very readable; doing it by bitwise operations and bit shifting alone, and as a macro (so no function call overhead).
The bitwise or and shift operations essentially set every bit between the highest set bit and bit zero. This will produce a number of the form 2^n - 1. The final increment adds one to get a number of the form 2^n. The initial decrement ensures that you don't round numbers which are already powers of two up to the next power, so that e.g. 2048 doesn't become 4096.
At my machine kroundup32 gives 6.000m rounds/sec
And next function gives 7.693m rounds/sec
inline int scan_msb(int x)
{
#if defined(__i386__) || defined(__x86_64__)
int y;
__asm__("bsr %1, %0"
: "=r" (y)
: "r" (x)
: "flags"); /* ZF */
return y;
#else
#error "Implement me for your platform"
#endif
}
inline int roundup32(int x)
{
if (x == 0) return x;
else {
const int bit = scan_msb(x);
const int mask = ~((~0) << bit);
if (x & mask) return (1 << (bit+1));
else return (1 << bit);
}
}
So #thomasrutter I woudn't say that it is "highly optimized".
And appropriate (only meaningful part) assembly (for GCC 4.4.4):
kroundup32:
subl $1, %edi
movl %edi, %eax
sarl %eax
orl %edi, %eax
movl %eax, %edx
sarl $2, %edx
orl %eax, %edx
movl %edx, %eax
sarl $4, %eax
orl %edx, %eax
movl %eax, %edx
sarl $8, %edx
orl %eax, %edx
movl %edx, %eax
sarl $16, %eax
orl %edx, %eax
addl $1, %eax
ret
roundup32:
testl %edi, %edi
movl %edi, %eax
je .L6
movl $-1, %edx
bsr %edi, %ecx
sall %cl, %edx
notl %edx
testl %edi, %edx
jne .L10
movl $1, %eax
sall %cl, %eax
.L6:
rep
ret
.L10:
addl $1, %ecx
movl $1, %eax
sall %cl, %eax
ret
By some reason I haven't found appropriate implementation of scan_msb (like #define scan_msb(x) if (__builtin_constant_p (x)) ...) within standart headers of GCC (only __TBB_machine_lg/__TBB_Log2).