movl in assembly language - c

I'm having trouble understanding how the following equivalents work:
x86-64:
/*long loop(long x, int n)*/
/*x in %rdi, n in %esi*/
loop:
movl %esi, %ecx
movl $1, %edx
movl $0, %eax
jmp .L2
.L3:
movq %rdi,%r8
andq %rdx,%r8
orq %r8,%rax
salq %cl,%rdx
.L2:
testq %rdx,%rdx
jne .L3
rep; ret
C:
long loop(long x, int n)
{
long result = 0;
long mask;
for (mask = 1; mask != 0; mask = mask << n) {
result |= (x & mask);
}
return result;
}
From what I see,
n = %esi and is copied into %ecx.
1 is copied into mask.
0 is copied into result.
I would like to know why 1 is copied to mask when the first variable in the C code is result? Wouldn't result = 1 and mask = 0 since that is the correct order in the C program? Furthermore, when I convert the C code to assembly language, I get:
loop:
movl %rsi, %rcx
movl $1, %eax
movl $0, %edx
jmp .L2
...
So are the registers %eax and %edx interchangeable?
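One way to check the mapping is to annotate the C with the registers the compiler's output uses. The sketch below does that. The name loop_sketch and the switch to uint64_t are my assumptions (the original uses long; unsigned makes shifting the top bit out well defined in C). The order of declarations in C has no bearing on which register holds which variable. Here result ends up in %rax because that is where the x86-64 calling convention expects the return value, so in a hand-written version with result in %edx the value would still have to be moved to %rax before ret. As an aside, movl takes 32-bit operands, so movl %rsi, %rcx would not assemble; movl %esi, %ecx is the valid form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical re-statement of loop() with register annotations taken
   from the compiler output in the question. */
uint64_t loop_sketch(uint64_t x, int n)
{
    assert(n > 0 && n < 64);      /* n == 0 would never terminate */
    uint64_t result = 0;          /* movl $0, %eax  (result lives in %rax) */
    uint64_t mask;                /* movl $1, %edx  (mask lives in %rdx)   */
    for (mask = 1; mask != 0; mask <<= n) {   /* salq %cl, %rdx */
        result |= (x & mask);                 /* andq, then orq  */
    }
    return result;                /* already in %rax, so just ret */
}
```

With n = 4 the mask visits bits 0, 4, 8 and so on, so loop_sketch(0xFF, 4) keeps only bits 0 and 4 of the input.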

Related

A reference for converting assembly 'shl', 'OR', 'AND', 'SHR' operations into C?

I'm to convert the following AT&T x86 assembly into C:
movl 8(%ebp), %edx
movl $0, %eax
movl $0, %ecx
jmp .L2
.L1:
shll $1, %eax
movl %edx, %ebx
andl $1, %ebx
orl %ebx, %eax
shrl $1, %edx
addl $1, %ecx
.L2:
cmpl $32, %ecx
jl .L1
leave
But must adhere to the following skeleton code:
int f(unsigned int x) {
int val = 0, i = 0;
while(________) {
val = ________________;
x = ________________;
i++;
}
return val;
}
I can tell that the snippet
.L2:
cmpl $32, %ecx
jl .L1
can be interpreted as while(i<32). I also know that x is stored in %edx, val in %eax, and i in %ecx. However, I'm having a hard time converting the assembly within the while/.L1 loop into condensed high-level language that fits into the provided skeleton code. For example, can shll, shrl, orl, and andl simply be written using their direct C equivalents (<<,>>,|,&), or is there some more nuance to it?
Is there a standardized guide/"cheat sheet" for Assembly-to-C conversions?
I understand assembly to high-level conversion is not always clear-cut, but there are certainly patterns in assembly code that can be consistently interpreted as certain C operations.
For example, can shll, shrl, orl, and andl simply be written using
their direct C equivalents (<<,>>,|,&), or is there some more nuance
to it?
They can. Let's examine the loop body step by step:
shll $1, %eax // shift left eax by 1, same as "eax<<1" or even "eax*=2"
movl %edx, %ebx
andl $1, %ebx // ebx &= 1
orl %ebx, %eax // eax |= ebx
shrl $1, %edx // shift right edx by 1, same as "edx>>1" = "edx/=2"
which gets us to:
%eax *=2
%ebx = %edx
%ebx = %ebx & 1
%eax |= %ebx
%edx /= 2
The calling convention tells us that %edx holds x (it is loaded from 8(%ebp), the first argument) and that %eax, the return-value register, holds val:
val *=2
%ebx = x // a
%ebx = %ebx & 1 // b
val |= %ebx // c
x /= 2
Combine a, b, c. Step 1, insert a into b:
val *=2
%ebx = (x & 1) // b
val |= %ebx // c
x /= 2
Combine a, b, c. Step 2, insert b into c:
val *=2
val |= (x & 1)
x /= 2
Final step: combine both updates of val into one:
val = 2*val | (x & 1)
x /= 2
In short: while (i < 32) { val = (val << 1) | (x & 1); x = x >> 1; i++; }, except that val and the return value should be unsigned, which they aren't in your template. The function returns x with its bits reversed.
The actual answer to your question is more complicated and is pretty much: no, there is no such guide, and it can't exist, because compilation loses information and you can't recreate that lost information from the assembly. But you can often make a good educated guess.
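For reference, the filled-in skeleton can be written out and checked as below. This is a sketch, assuming a 32-bit unsigned int. Unlike the template, val and the return type are unsigned here, as suggested above, so the left shift cannot overflow a signed int. check_f is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/* The skeleton from the question with the blanks filled in. */
unsigned int f(unsigned int x)
{
    unsigned int val = 0;
    int i = 0;
    while (i < 32) {
        val = (val << 1) | (x & 1);  /* shll $1,%eax; andl $1,%ebx; orl %ebx,%eax */
        x = x >> 1;                  /* shrl $1,%edx */
        i++;                         /* addl $1,%ecx */
    }
    return val;
}

/* Hypothetical self-check: bit 0 moves to bit 31, and reversing twice
   gives back the input. */
void check_f(void)
{
    assert(f(1u) == 0x80000000u);
    assert(f(0x0000000Fu) == 0xF0000000u);
    assert(f(f(0x12345678u)) == 0x12345678u);
}
```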

Disassembling IA32 32 bit AT&T assembly code of a function in C

Could someone explain to me how we get the values of M and N in this problem, going through each line of the corresponding assembly code?
I always get stumped at the movl array2 part.
M and N are constants defined using #define
#define M <some value>
#define N <some value>
int array1[M][N];
int array2[N][M];
int copy(int i, int j)
{
array1[i][j] = array2[j][i];
}
If the above code generates the following assembly code, how do we deduce the values of the constants M and N?
copy:
pushl %ebp
movl %esp, %ebp
pushl %ebx
movl 8(%ebp), %ecx
movl 12(%ebp), %ebx
leal (%ecx, %ecx, 8), %edx
sall $2, %edx
movl %ebx, %eax
sall $4, %eax
subl %ebx, %eax
sall $2, %eax
movl array2(%eax, %ecx, 4), %eax
movl %eax, array1(%edx, %ebx, 4)
popl %ebx
movl %ebp,%esp
popl %ebp
ret
You need to check other parts of the assembly. For example, if you define both M and N as 8, you will find the following in the assembly:
array1:
.zero 256
array2:
.zero 256
because on my machine int is 4 bytes, and 8 * 8 = 64 elements of 4 bytes each take 256 bytes. The sample assembly can be found here.
Alright guys, after much research I was able to find a solution. Correct me if I am wrong.
So going through the following assembly step by step: (Added line numbers for ease)
M and N are constants defined using #define
int array1[M][N];
int array2[N][M];
int copy(int i, int j)
{
array1[i][j] = array2[j][i];
}
copy:
1 pushl %ebp
2 movl %esp, %ebp
3 pushl %ebx
4 movl 8(%ebp), %ecx
5 movl 12(%ebp), %ebx
6 leal (%ecx, %ecx, 8), %edx
7 sall $2, %edx
8 movl %ebx, %eax
9 sall $4, %eax
10 subl %ebx, %eax
11 sall $2, %eax
12 movl array2(%eax, %ecx, 4), %eax
13 movl %eax, array1(%edx, %ebx, 4)
14 popl %ebx
15 movl %ebp,%esp
16 popl %ebp
17 ret
1. Push %ebp onto the stack.
2. Set %ebp to %esp.
3. Push %ebx onto the stack.
4. %ecx = i (first argument, index for array access).
5. %ebx = j (second argument, index for array access).
6. %edx = %ecx + 8 * %ecx = 9i.
7. %edx = 36i after a left shift by 2.
8. %eax = %ebx = j.
9. %eax = 16j after a left shift by 4.
10. %eax = %eax - %ebx = 16j - j = 15j.
11. %eax = 60j after a left shift by 2.
12. %eax = the array2 element at byte offset %eax + 4 * %ecx = 60j + 4i.
13. The array1 element at byte offset %edx + 4 * %ebx = 36i + 4j is set to %eax.
Lines 12 and 13 together copy one array element into the other (a copy, not a swap), using %eax as the intermediate register.
14. Pop %ebx.
15. Restore the old value of %esp.
16. Pop %ebp.
Now, array1[i][j] is located at byte offset 4Ni + 4j from the start of array1,
and array2[j][i] is located at byte offset 4Mj + 4i from the start of array2.
(The factor 4 appears in each term because an int is 4 bytes; i and j are element indices counted from the start of the array.)
This holds because C stores arrays in row-major order.
Comparing with the offsets computed above, 4Ni + 4j = 36i + 4j gives N = 9, and 4Mj + 4i = 60j + 4i gives M = 15.
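The result can be cross-checked by letting the compiler compute the offsets itself. The sketch below plugs in M = 15 and N = 9 and compares the byte offsets of array1[i][j] and array2[j][i] with the 36i + 4j and 60j + 4i read off the assembly. The helper names are hypothetical, and the check assumes a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

#define M 15
#define N 9

int array1[M][N];
int array2[N][M];

/* Byte offset of array1[i][j] from the start of array1. */
size_t offset_array1(int i, int j)
{
    return (size_t)((char *)&array1[i][j] - (char *)array1);
}

/* Byte offset of array2[j][i] from the start of array2. */
size_t offset_array2(int j, int i)
{
    return (size_t)((char *)&array2[j][i] - (char *)array2);
}

/* Compare every valid index pair against the offsets from the assembly. */
void check_offsets(void)
{
    assert(sizeof(int) == 4);   /* assumption: 4-byte int, as in the question */
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            assert(offset_array1(i, j) == (size_t)(36 * i + 4 * j));
            assert(offset_array2(j, i) == (size_t)(60 * j + 4 * i));
        }
    }
}
```

Any other choice of M and N makes at least one of the assertions fail, since the row strides would no longer be 36 and 60 bytes.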

Assembly x86/C -Recursive Binomial Coefficient Segfault/printing pascals triangle

I've written some code (main in C, subprogram in x86 assembly) to calculate all the binomial coefficients recursively and print out all the binomial coefficients with n=10, restricted by m<=n.
So basically I'm trying to output Pascal's triangle for n=10 (without the triangle formatting).
My problem is that I'm getting a segfault when I run the program, and I'm having trouble figuring out how to print the individual values generated by the recursive function.
Segmentation fault (core dumped)
Here's the main program:
#include <stdio.h>
unsigned int result,m,n,i;
unsigned int binom(int,int);
int main(){
n=10;
for (i=0; i<n+1;i++){
printf("i=%d | %d \n", i, binom(n,i) );
}
return;
}
And the recursive sub program:
.text
.globl binom
binom:
mov $0x00, %edx #for difference calculation
cmp %edi, %esi #m=n?
je equalorzero #jump to equalorzero for returning of value 1
cmp $0x00, %esi #m=0?
je equalorzero
cmp $0x01, %esi #m=1?
mov %esi,%edx
sub %edi, %edx
cmp $0x01, %edx # n-m = 1 ?
je oneoronedifference
jmp otherwise
equalorzero:
add $1, %eax #return 1
ret
oneoronedifference:
add %edi, %eax #return n
ret
otherwise:
sub $1, %edi #binom(n-1,m)
call binom
sub $1, %esi #binom(n-1,m-1)
call binom
This is what I get when I run it:
./runtimes
i=0 | 12
Segmentation fault (core dumped)
The two major issues with your assembly code are: 1) you neither add nor return the sum of the two recursive calls; 2) you don't save your locals on the stack, so they are wiped out by the recursive calls and you end up using the wrong values once the calls return. Here's my rework of your code; some of the changes are due to my writing this under OSX:
The recursive sub program:
.text
.globl _binom
_binom:
pushq %rbp # allocate space on stack for locals
movq %rsp, %rbp
subq $24, %rsp
cmpl %edi, %esi # m == n ?
je equalorzero # jump to equalorzero for returning of value 1
cmpl $0, %esi # m == 0 ?
je equalorzero
movl %edi, %edx
subl %esi, %edx # edx = n - m
cmpl $1, %edx # n - m == 1 ?
je oneoronedifference
subl $1, %edi # binom(n - 1, m)
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
callq _binom
movl %eax, -12(%rbp) # save result to stack
movl -4(%rbp), %edi
movl -8(%rbp), %esi
subl $1, %esi # binom(n - 1, m - 1)
callq _binom
addl -12(%rbp), %eax # add results of the two recursive calls
addq $24, %rsp # release locals space on stack
popq %rbp
retq
equalorzero:
movl $1, %eax # return 1
addq $24, %rsp # release locals space on stack
popq %rbp
retq
oneoronedifference:
movl %edi, %eax # return n
addq $24, %rsp # release locals space on stack
popq %rbp
retq
The main program:
#include <stdio.h>
extern unsigned int binom(int, int);
int main() {
int n = 10;
for (int i = 0; i <= n; i++) {
printf("i=%d | %d\n", i, binom(n, i));
}
return 0;
}
And the results:
i=0 | 1
i=1 | 10
i=2 | 45
i=3 | 120
i=4 | 210
i=5 | 252
i=6 | 210
i=7 | 120
i=8 | 45
i=9 | 10
i=10 | 1
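For comparison, here is a C sketch of the same recursion, which can help when checking the assembly against the expected values. binom_ref is a hypothetical name, not part of the original program. In C the compiler keeps n, m and the first partial result alive across the recursive calls for you; in the assembly version that is exactly what the stack slots at -4(%rbp), -8(%rbp) and -12(%rbp) are for.

```c
#include <assert.h>

/* Reference version of the recursive binomial coefficient, assuming
   0 <= m <= n. Mirrors the branches of the assembly routine. */
unsigned int binom_ref(int n, int m)
{
    assert(m >= 0 && m <= n);
    if (m == n || m == 0)        /* equalorzero: return 1 */
        return 1;
    if (n - m == 1)              /* oneoronedifference: return n */
        return (unsigned int)n;
    /* otherwise: binom(n - 1, m) + binom(n - 1, m - 1) */
    return binom_ref(n - 1, m) + binom_ref(n - 1, m - 1);
}
```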

intro to x86 assembly

I'm looking over an example on assembly in CSAPP (Computer Systems - A programmer's Perspective 2nd) and I just want to know if my understanding of the assembly code is correct.
Practice problem 3.23
int fun_b(unsigned x) {
int val = 0;
int i;
for ( ____;_____;_____) {
}
return val;
}
The gcc C compiler generates the following assembly code:
x at %ebp+8
// what I've gotten so far
1 movl 8(%ebp), %ebx // ebx: x
2 movl $0, %eax // eax: val, set to 0 since eax is where the return
// value is stored and val is being returned at the end
3 movl $0, %ecx // ecx: i, set to 0
4 .L13: // loop
5 leal (%eax,%eax), %edx // edx = val+val
6 movl %ebx, %eax // val = x (?)
7 andl $1, %eax // x = x & 1
8 orl %edx, %eax // x = (val+val) | (x & 1)
9 shrl %ebx // shift right by 1: x = x >> 1
10 addl $1, %ecx // i++
11 cmpl $32, %ecx // if i < 32 jump back to loop
12 jne .L13
There was a similar post on the same problem with the solution, but I'm looking for more of a walk-through and explanation of the assembly code line by line.
You already seem to have the meaning of the instructions figured out. The comments on lines 7-8 are slightly wrong, however, because those instructions assign to eax, which is val, not x:
7 andl $1, %eax // val = val & 1 = x & 1
8 orl %edx, %eax // val = (val+val) | (x & 1)
Putting this into the C template could be:
for(i = 0; i < 32; i++, x >>= 1) {
val = (val + val) | (x & 1);
}
Note that (val + val) is just a left shift by one, so what this function is doing is shifting bits out of x on the right and shifting them into val from the right. As such, it's mirroring the bits.
PS: if the body of the for must be empty you can of course merge it into the third expression.
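The empty-body variant mentioned in the PS could look like the sketch below. fun_b_sketch is a hypothetical name, and val is unsigned here (the book's template uses int) so that val + val cannot overflow a signed int; a 32-bit unsigned is assumed.

```c
#include <assert.h>

/* Everything the loop does is folded into the third for-expression,
   in the same order as the assembly: update val, shift x, bump i. */
unsigned fun_b_sketch(unsigned x)
{
    unsigned val = 0;
    int i;
    for (i = 0; i < 32; val = (val + val) | (x & 1), x >>= 1, i++) {
    }
    return val;
}

/* Hypothetical self-check: 6 is binary 110, so bits 1 and 2 land on
   bits 30 and 29 after mirroring. */
void check_fun_b(void)
{
    assert(fun_b_sketch(6u) == 0x60000000u);
}
```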

Could someone help explain what this C one liner does?

I can usually figure out most C code but this one is over my head.
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
an example usage would be something like:
int x = 57;
kroundup32(x);
//x is now 64
A few other examples are:
1 to 1
2 to 2
7 to 8
31 to 32
60 to 64
3000 to 4096
I know it's rounding an integer to its nearest power of 2, but that's about as far as my knowledge goes.
Any explanations would be greatly appreciated.
Thanks
(--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
1. Decrease x by 1.
2. OR x with (x / 2).
3. OR x with (x / 4).
4. OR x with (x / 16).
5. OR x with (x / 256).
6. OR x with (x / 65536).
7. Increase x by 1.
For a 32-bit unsigned integer, this should move a value up to the closest power of 2 that is equal to or greater than it. The OR steps set all the bits below the highest set bit, so x ends up as a power of 2 minus one, and then one is added back. It looks somewhat optimized and therefore not very readable: it uses only bitwise operations and shifts, and it is a macro (so there is no function call overhead).
The bitwise or and shift operations essentially set every bit between the highest set bit and bit zero. This will produce a number of the form 2^n - 1. The final increment adds one to get a number of the form 2^n. The initial decrement ensures that you don't round numbers which are already powers of two up to the next power, so that e.g. 2048 doesn't become 4096.
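The same steps can be written as a function and traced on the examples from the question. This is a sketch: roundup_pow2 is a hypothetical name, and it takes a uint32_t instead of modifying its argument in place as the macro does.

```c
#include <assert.h>
#include <stdint.h>

/* Function form of kroundup32. Trace for x = 57:
   57 - 1 = 56 = 0b111000; the OR steps fill it to 0b111111 = 63;
   63 + 1 = 64. */
uint32_t roundup_pow2(uint32_t x)
{
    --x;            /* so exact powers of two map to themselves */
    x |= x >> 1;    /* copy the highest set bit into the bit below it */
    x |= x >> 2;    /* now the top 4 bits of the run are set */
    x |= x >> 4;    /* top 8 */
    x |= x >> 8;    /* top 16 */
    x |= x >> 16;   /* every bit below the highest set bit is set */
    return x + 1;   /* 2^n - 1 becomes 2^n */
}

/* Hypothetical self-check using the examples listed in the question. */
void check_roundup_pow2(void)
{
    assert(roundup_pow2(57) == 64);
    assert(roundup_pow2(7) == 8);
    assert(roundup_pow2(3000) == 4096);
    assert(roundup_pow2(2048) == 2048);  /* already a power of two */
}
```

Note that an input of 0 wraps around to 0xFFFFFFFF after the decrement and comes back as 0, which is not a power of two, so callers may want to treat 0 separately.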
On my machine kroundup32 gives 6.000m rounds/sec,
and the following function gives 7.693m rounds/sec:
inline int scan_msb(int x)
{
#if defined(__i386__) || defined(__x86_64__)
int y;
__asm__("bsr %1, %0"
: "=r" (y)
: "r" (x)
: "flags"); /* ZF */
return y;
#else
#error "Implement me for your platform"
#endif
}
inline int roundup32(int x)
{
if (x == 0) return x;
else {
const int bit = scan_msb(x);
const int mask = ~((~0) << bit);
if (x & mask) return (1 << (bit+1));
else return (1 << bit);
}
}
So @thomasrutter, I wouldn't say that it is "highly optimized".
And the corresponding assembly (meaningful part only, for GCC 4.4.4):
kroundup32:
subl $1, %edi
movl %edi, %eax
sarl %eax
orl %edi, %eax
movl %eax, %edx
sarl $2, %edx
orl %eax, %edx
movl %edx, %eax
sarl $4, %eax
orl %edx, %eax
movl %eax, %edx
sarl $8, %edx
orl %eax, %edx
movl %edx, %eax
sarl $16, %eax
orl %edx, %eax
addl $1, %eax
ret
roundup32:
testl %edi, %edi
movl %edi, %eax
je .L6
movl $-1, %edx
bsr %edi, %ecx
sall %cl, %edx
notl %edx
testl %edi, %edx
jne .L10
movl $1, %eax
sall %cl, %eax
.L6:
rep
ret
.L10:
addl $1, %ecx
movl $1, %eax
sall %cl, %eax
ret
For some reason I haven't found an appropriate implementation of scan_msb (like #define scan_msb(x) if (__builtin_constant_p (x)) ...) within the standard headers of GCC (only __TBB_machine_lg/__TBB_Log2).
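One portable-across-x86 alternative to the inline bsr is the __builtin_clz builtin that GCC and Clang provide (count leading zeros, undefined for an argument of 0). The sketch below is my own variation, not the code benchmarked above: the names scan_msb_builtin and roundup32_builtin are hypothetical, a 32-bit unsigned int is assumed, and inputs above 2^31 are rejected because the result would not fit.

```c
#include <assert.h>

/* Index of the highest set bit of x. x must be nonzero, because
   __builtin_clz(0) is undefined. Assumes a 32-bit unsigned int. */
int scan_msb_builtin(unsigned int x)
{
    return 31 - __builtin_clz(x);
}

/* Round x up to the next power of two; 0 and 1 are returned unchanged. */
unsigned int roundup32_builtin(unsigned int x)
{
    assert(x <= 0x80000000u);   /* larger inputs have no 32-bit answer */
    if (x <= 1)
        return x;
    /* Highest set bit of x - 1, plus one, is the exponent we need. */
    return 1u << (scan_msb_builtin(x - 1) + 1);
}
```

Working on x - 1 plays the same role as the initial decrement in kroundup32: exact powers of two map to themselves.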
