implement 64-bit arithmetic on a 32-bit machine - assembly code

implement 64-bit arithmetic on a 32-bit machine - assembly code - c

The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to
be equivalent to long long.
gcc generates the following assembly code implementing the computation:
typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y)
{
*dest = x*y;
}
dest at %ebp+8, x at %ebp+12, y at %ebp+16
1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)
This code uses three multiplications to implement the multi precision arithmetic required to implement 64-bit arithmetic
on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how
it realizes your algorithm.
Question: What does line 5 do? what value is it moving to register ecx?
also what does line 11 do ?

Line 5: it's copying the value of some local variable to ECX. The value is unkown as of this listing, as we lack part of the original function code.
Line 11: it's equivalent to: EDX = EDX+ECX. The LEA instruction is used to compute the EA of a memory value and store that EA into a destination register, thus, it can be used to quickly do additions and constant multiplication.

Related

How can translate this statement in C to assembly language?

(C)
d = b * 7
(Assembly language)
(1)
movl b, %eax
movl $7, %ebx
mull %ebx
movl %eax, d
movl %edx, e
if 32bit integer type is multiplied with same type, it needs 64bit space for return value. so i use eax register(for lower 32bit) and edx register(for upper 32bit).
(2)
movl b, %eax
movl %eax, %ebx
sall $2, %eax
subl %ebx, %eax
movl %eax, d
in (2) case, i dont have idea how can control upper 32bit return value. pls tell me some idea...

Calling C function from the x86 assembler

I am trying to write a function that converts decimal numbers into binary in assembler. Since printing is so troublesome in there, I have decided to make a separate function in C that just prints the numbers. But when I run the code, it always prints '0110101110110100'
Heres the C function (both print and conversion):
void printBin(int x) {
printf("%d", x);
}
void DecToBin(int n)
{
// Size of an integer is assumed to be 16 bits
for (int i = 15; i >= 0; i--) {
int k = n >> i;
printBin(k & 1);
}
heres the code in asm:
.globl _DecToBin
.extern _printBin
_DecToBin:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp),%eax
movl $15, %ebx
cmpl $0, %ebx
jl end
start:
movl %ebx, %ecx
movl %eax, %edx
shrl %cl, %eax
andl $1, %eax
pushl %eax
call _printBin
movl %edx, %eax
dec %ebx
cmpl $0, %ebx
jge start
end:
movl %ebp, %esp
popl %ebp
ret
Cant figure out where the mistake is. Any help would be appreciated
disassembled code using online program

Your principle problem is that it is very unlikely that %edx is preserved across the function call to printBin.
Also:
%ebx is not a volatile register in most (any?) C calling convention rules. You need to check your compilers documentation and conform to it.
If you are going to use ebx, you need to save and restore it.
The stack pointer needs to be kept aligned to 16 bytes. On my machine (macos), it SEGVs under printBin if you don’t.

Disassembling IA32 32 bit AT&T assembly code of a function in C

[Edited]
Could someone explain to me how we get the values of M and N in this problem, going through each line of the corresponding assembly code?
I always get stumped at the movl array2 part.
M and N are constants defined using #define
#define M <some value>
#define N <some value>
int array1[M][N];
int array2[N][M];
int copy(int i, int j)
{
array1[i][j] = array2[j][i];
}
If the above code generates the following assembly code: How do we deduce the values of the constants M and N?
copy:
pushl %ebp
movl %esp, %ebp
pushl %ebx
movl 8(%ebp), %ecx
movl 12(%ebp), %ebx
leal (%ecx, %ecx, 8), %edx
sall $2, %edx
movl %ebx, %eax
sall $4, %eax
subl %ebx, %eax
sall $2, %eax
movl array2(%eax, %ecx, 4), %eax
movl %eax, array1(%edx, %ebx, 4)
popl %ebx
movl %ebp,%esp
popl %ebp
ret

You need to check other parts of the assembly. For example, if you define M and N as 8 both, you will find the following in the assembly
array1:
.zero 256
array2:
.zero 256
because on my machine, int is 4 bytes and 8 times 8 is 64. And 64 * 4 = 256. The sample assembly can be found here .

Alright guys, after much research I was able to find a solution. Correct me if I am wrong.
So going through the following assembly step by step: (Added line numbers for ease)
M and N are constants defined using #define
int array1[M][N];
int array2[N][M];
int copy(int i, int j)
{
array1[i][j] = array2[j][i];
}
copy:
1 pushl %ebp
2 movl %esp, %ebp
3 pushl %ebx
4 movl 8(%ebp), %ecx
5 movl 12(%ebp), %ebx
6 leal (%ecx, %ecx, 8), %edx
7 sall $2, %edx
8 movl %ebx, %eax
9 sall $4, %eax
10 subl %ebx, %eax
11 sall $2, %eax
12 movl array2(%eax, %ecx, 4), %eax
13 movl %eax, array1(%edx, %ebx, 4)
14 popl %ebx
15 movl %ebp,%esp
16 popl %ebp
ret
Push %ebp into stack
%ebp points to %esp
Push %ebx into stack
%ecx equals int i (index for array access)
%ebx equals int j (index for array access)
%edx equals 8 * %ecx + %ecx or 9i
%edx equals 36i after a left binary shift of 2
%eax equals %ebx or j
%eax equals 16j after a left binary shift of 4
%eax equals %eax - %ebx = 16j - j = 15j
%eax equals 60j after a left binary shift of 2
%eax equals array2 element with index [4%ecx + %ebx] or [4i + 60j]
Element with index [ 4%ebx + %edx ] or [ 4j + 36i ] of array1 equals %eax or [4i + 60j]
A swap of the two array elements done in 12 and 13 using %eax as
intermediary register.
%ebx popped
%esp's old value restored
%ebp popped
Now we assume array1[i][j]'s element access to be equal to 4Ni + 4j
And array2[j][i]'s element access to be equal to 4Mj + 4i.
( The 4 in each index term as int is of 4 bytes and i, j are individual offsets from starting array location )
This is true because C stores arrays in a row major form.
So equating we get, M = 15 and N = 9.

understand assembled code line andl 65535 %edx

I have C code and Assembly code. I do not understand line 3 and 6 of assembled code.
C code:
int arith(int x, int y, int z)
{
int t1 = x+y;
int t2 = z*48
int t3= t1& 0xFFFF
int t4 = t2 * t3
return t4;
}
Assembly code:
x at % ebp+8, y at %ebp*12, z at %ebp+16
mol 16(%ebp), %eax
leal (%eax, %eax, 2) % eax
sall $4, %eax
movl 12(%ebp) %edx
addl 8(%ebp) %edx
andl $65535, %edx
imull %edx. $eax
on line 6, I do not understand how 65535 becomes 0xFFFF so that we have t3 = t1 & 0xFFFF.
Different question:
Consider the following C functino prototype, where num_t is a data type declared using typedef:
void store_prod(num_t *dest, unsigned x, num_t y)
{*dest=x*yl}
Gcc generates the following assemblyu code implementing the body of the computation:
dest at %ebp+8, x at ebp+12, y at %ebp+16
mov1 12($ebp), $eax
movl 20($ebp), $ecx
imull $eax, $ecx
mull 16(%ebp)
leal (%ecx,%edx), %edx
movl 8(%ebp), %ecx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
line movl 20(%ebp), %ecx there is value in 20(%ebp), how is it grabbing y_t?
line leal (%ecx, %edx), %edx; there is nothing in edx? so what is being added to %ecx to be stored in %edx?

We want to calculate
t2=z*48
So to do that we first do z=z*3 then shift z left by 4(multiply by 16 ==left shift by 4)
line 3 calculates
z=z*3 (z+2z)
And line 4 does left shift by 4.
The compiler often generates combination of add and shift to perform multiplication as multiplication is more costly
As for line 6 65535 is decimal for 0xFFFF.

Could someone help explain what this C one liner does?

I can usually figure out most C code but this one is over my head.
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
an example usage would be something like:
int x = 57;
kroundup32(x);
//x is now 64
A few other examples are:
1 to 1
2 to 2
7 to 8
31 to 32
60 to 64
3000 to 4096
I know it's rounding an integer to it's nearest power of 2, but that's about as far as my knowledge goes.
Any explanations would be greatly appreciated.
Thanks

(--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
Decrease x by 1
OR x with (x / 2).
OR x with (x / 4).
OR x with (x / 16).
OR x with (x / 256).
OR x with (x / 65536).
Increase x by 1.
For a 32-bit unsigned integer, this should move a value up to the closest power of 2 that is equal or greater. The OR sections set all the lower bits below the highest bit, so it ends up as a power of 2 minus one, then you add one back to it. It looks like it's somewhat optimized and therefore not very readable; doing it by bitwise operations and bit shifting alone, and as a macro (so no function call overhead).

The bitwise or and shift operations essentially set every bit between the highest set bit and bit zero. This will produce a number of the form 2^n - 1. The final increment adds one to get a number of the form 2^n. The initial decrement ensures that you don't round numbers which are already powers of two up to the next power, so that e.g. 2048 doesn't become 4096.

At my machine kroundup32 gives 6.000m rounds/sec
And next function gives 7.693m rounds/sec
inline int scan_msb(int x)
{
#if defined(__i386__) || defined(__x86_64__)
int y;
__asm__("bsr %1, %0"
: "=r" (y)
: "r" (x)
: "flags"); /* ZF */
return y;
#else
#error "Implement me for your platform"
#endif
}
inline int roundup32(int x)
{
if (x == 0) return x;
else {
const int bit = scan_msb(x);
const int mask = ~((~0) << bit);
if (x & mask) return (1 << (bit+1));
else return (1 << bit);
}
}
So #thomasrutter I woudn't say that it is "highly optimized".
And appropriate (only meaningful part) assembly (for GCC 4.4.4):
kroundup32:
subl $1, %edi
movl %edi, %eax
sarl %eax
orl %edi, %eax
movl %eax, %edx
sarl $2, %edx
orl %eax, %edx
movl %edx, %eax
sarl $4, %eax
orl %edx, %eax
movl %eax, %edx
sarl $8, %edx
orl %eax, %edx
movl %edx, %eax
sarl $16, %eax
orl %edx, %eax
addl $1, %eax
ret
roundup32:
testl %edi, %edi
movl %edi, %eax
je .L6
movl $-1, %edx
bsr %edi, %ecx
sall %cl, %edx
notl %edx
testl %edi, %edx
jne .L10
movl $1, %eax
sall %cl, %eax
.L6:
rep
ret
.L10:
addl $1, %ecx
movl $1, %eax
sall %cl, %eax
ret
By some reason I haven't found appropriate implementation of scan_msb (like #define scan_msb(x) if (__builtin_constant_p (x)) ...) within standart headers of GCC (only __TBB_machine_lg/__TBB_Log2).