Understanding ATT Assembly Language - c

C version:
int arith(int x, int y, int z)
{
int t1 = x+y;
int t2 = z*48;
int t3 = t1 & 0xFFFF;
int t4 = t2 * t3;
return t4;
}
ATT Assembly version of the same program:
x at %ebp+8, y at %ebp+12, z at %ebp+16
movl 16(%ebp), %eax
leal (%eax, %eax, 2), %eax
sall $4, %eax // t2 = z* 48... This is where I get confused
movl 12(%ebp), %edx
addl 8(%ebp), %edx
andl $65535, %edx
imull %edx, %eax
I understand everything it is doing at all points of the program besides the shift left.
I assume it is going to shift left 4 times. Why is that?
Thank you!
Edit: I also understand that the part I'm confused on is equivalent to the z*48 part of the C version.
What I'm not understanding is how does shifting left 4 times equate to z*48.

You missed the leal (%eax, %eax, 2), %eax line. Applying some maths the assembly code reads:
a := z
a := a + 2*a // a = 3*z
a := a * 2^4 // a = z * 3*16 = z * 48
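The same decomposition can be checked in C. This is only a minimal sketch: the names times48 and check_times48 are made up for illustration and do not appear in the question, and the shift is written as a multiplication by 16 because left-shifting a negative int is undefined in C.

```c
#include <assert.h>

/* Hypothetical helper mirroring the two instructions:
   leal (%eax,%eax,2), %eax  ->  a = a + 2*a   (3*z)
   sall $4, %eax             ->  a = a * 2^4   (3*z * 16) */
int times48(int z)
{
    int a = z;
    a = a + 2 * a;  /* a = 3*z */
    a = a * 16;     /* what sall $4 computes: multiply by 2^4 */
    return a;
}

/* Self-check: compare against plain multiplication for small values. */
void check_times48(void)
{
    for (int z = -1000; z <= 1000; z++)
        assert(times48(z) == z * 48);
}
```

Calling check_times48() from a main function should pass silently if the two forms agree.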

Related

How to deal with single line wrapped into multiple lines in gdb?

For example,
int main()
{
int aa = 1, bb =2, cc = 3;
int dd = ( (aa + 3 - 1)
/ bb)
<< cc;
printf("%d\n", dd);
return 1;
}
I broke the int dd line into multiple lines just to demonstrate.
Then I use gdb to debug.
Breakpoint 1, main () at a.c:25
25 int aa = 1, bb =2, cc = 3;
(gdb) n
26 int dd = ( (aa + 3 - 1)
(gdb) n
27 / bb)
(gdb) n
26 int dd = ( (aa + 3 - 1)
(gdb) n
29 printf("%d\n", dd);
(gdb) n
8
30 return 1;
As you can see, the int dd line is shown multiple times and the << cc line is never shown.
How can I avoid this? For example, when I type n and press enter, gdb would show the complete int dd statement once, and when I type n again, gdb would go to the next statement.
There is nothing you can do from within GDB because it merely follows the debugging information produced by the compiler. You can see the assembly output with gcc -g -S -fverbose-asm -fno-dwarf2-cfi-asm -o - main.c. Thanks to the annotations you should be able to get the gist of it even if you don't know assembly.
Here is an excerpt:
# main.c:4: int dd = ( (aa + 3 - 1)
movl -4(%rbp), %eax # aa, tmp91
addl $2, %eax #, _1
# main.c:5: / bb)
cltd
idivl -8(%rbp) # bb
movl %eax, %edx # tmp92, _2
# main.c:4: int dd = ( (aa + 3 - 1)
movl -12(%rbp), %eax # cc, tmp94
movl %eax, %ecx # tmp94, tmp99
sall %cl, %edx # tmp99, _2
movl %edx, %eax # _2, tmp95
movl %eax, -16(%rbp) # tmp95, dd
The last block is equivalent to
int dd = _2 << cc;
Since the result of an operation has to be stored somewhere, the shift and the assignment are not split. The debug information refers to the line containing the assignment, which is why the << cc line never shows up.
You should rewrite the code to have one operation per statement:
int t1 = (aa + 3 - 1);
int t2 = t1 / bb;
int dd = t2 << cc;
Here is the corresponding assembly:
# main.c:4: int t1 = (aa + 3 - 1);
movl -4(%rbp), %eax # aa, tmp92
addl $2, %eax #, tmp91
movl %eax, -16(%rbp) # tmp91, t1
# main.c:5: int t2 = t1 / bb;
movl -16(%rbp), %eax # t1, tmp96
cltd
idivl -8(%rbp) # bb
movl %eax, -20(%rbp) # tmp94, t2
# main.c:6: int dd = t2 << cc;
movl -12(%rbp), %eax # cc, tmp100
movl -20(%rbp), %edx # t2, tmp102
movl %eax, %ecx # tmp100, tmp106
sall %cl, %edx # tmp106, tmp102
movl %edx, %eax # tmp102, tmp101
movl %eax, -24(%rbp) # tmp101, dd
It is essentially the same as before but now the blocks match the statements perfectly. You can't control the compiler's output but it will try to preserve the boundaries between statements, so that is your best option.

A reference for converting assembly 'shl', 'OR', 'AND', 'SHR' operations into C?

I'm to convert the following AT&T x86 assembly into C:
movl 8(%ebp), %edx
movl $0, %eax
movl $0, %ecx
jmp .L2
.L1:
shll $1, %eax
movl %edx, %ebx
andl $1, %ebx
orl %ebx, %eax
shrl $1, %edx
addl $1, %ecx
.L2:
cmpl $32, %ecx
jl .L1
leave
But must adhere to the following skeleton code:
int f(unsigned int x) {
int val = 0, i = 0;
while(________) {
val = ________________;
x = ________________;
i++;
}
return val;
}
I can tell that the snippet
.L2:
cmpl $32, %ecx
jl .L1
can be interpreted as while(i<32). I also know that x is stored in %edx, val in %eax, and i in %ecx. However, I'm having a hard time converting the assembly within the while/.L1 loop into condensed high-level language that fits into the provided skeleton code. For example, can shll, shrl, orl, and andl simply be written using their direct C equivalents (<<,>>,|,&), or is there some more nuance to it?
Is there a standardized guide/"cheat sheet" for Assembly-to-C conversions?
I understand assembly to high-level conversion is not always clear-cut, but there are certainly patterns in assembly code that can be consistently interpreted as certain C operations.
For example, can shll, shrl, orl, and andl simply be written using
their direct C equivalents (<<,>>,|,&), or is there some more nuance
to it?
They can. Let's examine the loop body step-by-step:
shll $1, %eax // shift left eax by 1, same as "eax<<1" or even "eax*=2"
movl %edx, %ebx
andl $1, %ebx // ebx &= 1
orl %ebx, %eax // eax |= ebx
shrl $1, %edx // shift right edx by 1, same as "edx>>1" = "edx/=2"
gets us to
%eax *=2
%ebx = %edx
%ebx = %ebx & 1
%eax |= %ebx
%edx /= 2
The ABI tells us (movl 8(%ebp), %edx) that %edx is x, and %eax (the return value) is val:
val *=2
%ebx = x // a
%ebx = %ebx & 1 // b
val |= %ebx // c
x /= 2
combine a,b,c: #1 insert a into b:
val *=2
%ebx = (x & 1) // b
val |= %ebx // c
x /= 2
combine a,b,c: #2 insert b into c:
val *=2
val |= (x & 1)
x /= 2
final step: combine both 'val =' into one
val = 2*val | (x & 1)
x /= 2
Filling in the skeleton gives:
while (i < 32) {
val = (val << 1) | (x & 1);
x = x >> 1;
i++;
}
except that val and the return value should be unsigned, and they aren't in your template. The function returns the bits of x reversed.
The actual answer to your question is more complicated, and is pretty much: no, there is no such guide, and there can't be one, because compilation loses information that you can't recreate from the assembly. But you can often make a good educated guess.
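For illustration, here is a sketch of the same loop with val and the return type made unsigned. The name reverse_bits is hypothetical, and the sketch assumes unsigned int is 32 bits wide, as it is on the x86 target in the question.

```c
#include <assert.h>

/* Hypothetical unsigned variant of the skeleton; not the assignment's
   required signature, which uses int for val and the return value. */
unsigned int reverse_bits(unsigned int x)
{
    unsigned int val = 0;
    int i = 0;
    while (i < 32) {
        val = (val << 1) | (x & 1);  /* shll $1 / andl $1 / orl */
        x = x >> 1;                  /* shrl $1 */
        i++;
    }
    return val;
}
```

With this version, reverse_bits(1u) gives 0x80000000 and reverse_bits(0x80000000u) gives 1, with no signed overflow along the way.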

intro to x86 assembly

I'm looking over an example on assembly in CSAPP (Computer Systems - A programmer's Perspective 2nd) and I just want to know if my understanding of the assembly code is correct.
Practice problem 3.23
int fun_b(unsigned x) {
int val = 0;
int i;
for ( ____;_____;_____) {
}
return val;
}
The gcc C compiler generates the following assembly code:
x at %ebp+8
// what I've gotten so far
1 movl 8(%ebp), %ebx // ebx: x
2 movl $0, %eax // eax: val, set to 0 since eax is where the return
// value is stored and val is being returned at the end
3 movl $0, %ecx // ecx: i, set to 0
4 .L13: // loop
5 leal (%eax,%eax), %edx // edx = val+val
6 movl %ebx, %eax // val = x (?)
7 andl $1, %eax // x = x & 1
8 orl %edx, %eax // x = (val+val) | (x & 1)
9 shrl %ebx // shift right by 1: x = x >> 1
10 addl $1, %ecx // i++
11 cmpl $32, %ecx // if i < 32 jump back to loop
12 jne .L13
There was a similar post on the same problem with the solution, but I'm looking for more of a walk-through and explanation of the assembly code line by line.
You already seem to have the meaning of the instructions figured out. The comments on lines 7-8 are slightly wrong, however, because those instructions assign to eax, which is val, not x:
7 andl $1, %eax // val = val & 1 = x & 1
8 orl %edx, %eax // val = (val+val) | (x & 1)
Putting this into the C template could be:
for(i = 0; i < 32; i++, x >>= 1) {
val = (val + val) | (x & 1);
}
Note that (val + val) is just a left shift by one, so what this function is doing is shifting bits out of x on the right and shifting them into val from the right. As such, it's mirroring the bits.
PS: if the body of the for must be empty you can of course merge it into the third expression.
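A sketch of that empty-body variant, under one assumption that departs from the template: val is accumulated as unsigned so that val + val cannot overflow a signed int, and it is converted back to int only at the return (for results above INT_MAX that conversion is implementation-defined).

```c
#include <assert.h>

/* Empty-body variant: all the work happens in the third expression.
   Using an unsigned accumulator is an assumption, not part of the
   original template. */
int fun_b(unsigned x)
{
    unsigned val = 0;
    int i;
    for (i = 0; i < 32; i++, val = (val + val) | (x & 1), x >>= 1) {
        /* intentionally empty */
    }
    return (int)val;
}
```

The third expression runs once after each of the 32 passes, so val is updated exactly 32 times, just as in the version with a loop body.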

Reverse engineer optimized c code from assembly

The point of this problem is to reverse engineer C code from the assembly the compiler produces at optimization level 2. The original C code is as follows (it computes the greatest common divisor):
int gcd(int a, int b){
int returnValue = 0;
if (a != 0 && b != 0){
int r;
int flag = 0;
while (flag == 0){
r = a % b;
if (r ==0){
flag = 1;
} else {
a = b;
b = r;
}
}
returnValue = b;
}
return(returnValue);
}
For the optimized build I ran this from the command line:
gcc -O2 -S Problem04b.c
to get the assembly file for the optimized code:
.gcd:
.LFB12:
.cfi_startproc
testl %esi, %esi
je .L2
testl %edi, %edi
je .L2
.L7:
movl %edi, %edx
movl %edi, %eax
movl %esi, %edi
sarl $31, %edx
idivl %esi
testl %edx, %edx
jne .L9
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L2:
xorl %esi, %esi
movl %esi, %eax
ret
.p2align 4,,10
.p2align 3
.L9:
movl %edx, %esi
jmp .L7
.cfi_endproc
I need to convert this assembly code back to c code here is where I am at right now:
int gcd(int a, int b){
/*
testl %esi %esi
sets zero flag if a is 0 (ZF) but doesn't store anything
*/
if (a == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
/*
testl %edi %edi
sets zero flag if b is 0 (ZF) but doesn't store anything
*/
if (b == 0){
/*
xorl %esi %esi
sets the value of a variable to 0. More compact than movl
*/
int returnValue = 0;
/*
movl %esi %eax
ret
return the value just assigned
*/
return(returnValue);
}
do{
int r = b;
int returnValue = b;
}while();
}
Can anyone help me write this back into C code? I'm pretty much lost.
First of all, you have the values mixed in your code. %esi begins with the value b and %edi begins with the value a.
You can infer from the testl %edx, %edx line that %edx is used as the condition variable for the loop beginning with .L7 (if %edx is different from 0 then control is transferred to the .L9 block and then returned to .L7). We'll refer to %edx as remainder in our reverse-engineered code.
Let's begin reverse-engineering the main loop:
movl %edi, %edx
Since %edi stores a, this is equivalent to initializing the value of remainder with a: int remainder = a;.
movl %edi, %eax
Store int temp = a;
movl %esi, %edi
Perform int a = b; (remember that %edi is a and %esi is b).
sarl $31, %edx
This arithmetic shift instruction shifts our remainder variable 31 bits to the right while preserving the sign bit. That leaves remainder at 0 if it was positive (or zero) and at -1 (all bits set) if it was negative, so it's equivalent to remainder = (remainder < 0) ? -1 : 0. Together with the copy of a in %eax, this sign-extends a into the 64-bit pair %edx:%eax, which is the dividend idivl expects (cltd has the same effect).
idivl %esi
Divide the 64-bit value %edx:%eax by %esi. %edx holds the upper 32 bits of the dividend, not a separate factor, so after the sign extension the dividend is simply temp, and we are dividing temp by b (the variable). The quotient is stored in %eax and the remainder in %edx, or in our code, remainder = temp % b.
testl %edx, %edx
jne .L9
Check to see if remainder is equal to 0. If it's not, jump to .L9. The code there simply sets b = remainder; before returning to .L7. In order to implement this in C, we'll keep a count variable that stores the number of times the loop has iterated. We'll perform b = remainder at the beginning of the loop, but only after the first iteration, meaning when count != 0.
We're now ready to build our full C loop:
int count = 0, remainder, temp;
do {
if (count != 0)
b = remainder;
temp = a;
a = b;
remainder = temp % b;
count++;
} while (remainder != 0);
And after the loop terminates,
movl %esi, %eax
ret
Will return the GCD that the program computed (in our code it'll be stored in the b variable).
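Putting the pieces together, the whole function can also be written without the count variable. This is only a sketch of one possible reconstruction, not the original source: gcd_rev is a made-up name, and the sarl $31 / idivl pair is treated as an ordinary signed temp % b, since the shift only sign-extends the dividend.

```c
#include <assert.h>

/* Hypothetical reconstruction of the -O2 output, annotated with the
   instructions each statement corresponds to. */
int gcd_rev(int a, int b)
{
    if (b == 0 || a == 0)          /* testl %esi / testl %edi, je .L2 */
        return 0;                  /* .L2: xorl %esi, %esi; movl; ret */
    for (;;) {                     /* .L7 */
        int temp = a;              /* movl %edi, %eax */
        a = b;                     /* movl %esi, %edi */
        int remainder = temp % b;  /* sarl $31, %edx; idivl %esi */
        if (remainder == 0)        /* testl %edx, %edx */
            return b;              /* movl %esi, %eax; ret */
        b = remainder;             /* .L9: movl %edx, %esi; jmp .L7 */
    }
}
```

For example, gcd_rev(12, 18) walks through remainders 12, 6, 0 and returns 6.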

understand assembled code line andl 65535 %edx

I have C code and assembly code. I do not understand lines 3 and 6 of the assembly code.
C code:
int arith(int x, int y, int z)
{
int t1 = x+y;
int t2 = z*48;
int t3 = t1 & 0xFFFF;
int t4 = t2 * t3;
return t4;
}
Assembly code:
x at %ebp+8, y at %ebp+12, z at %ebp+16
movl 16(%ebp), %eax
leal (%eax, %eax, 2), %eax
sall $4, %eax
movl 12(%ebp), %edx
addl 8(%ebp), %edx
andl $65535, %edx
imull %edx, %eax
On line 6, I do not understand how 65535 becomes 0xFFFF so that we have t3 = t1 & 0xFFFF.
Different question:
Consider the following C function prototype, where num_t is a data type declared using typedef:
void store_prod(num_t *dest, unsigned x, num_t y)
{ *dest = x*y; }
GCC generates the following assembly code implementing the body of the computation:
dest at %ebp+8, x at %ebp+12, y at %ebp+16
movl 12(%ebp), %eax
movl 20(%ebp), %ecx
imull %eax, %ecx
mull 16(%ebp)
leal (%ecx,%edx), %edx
movl 8(%ebp), %ecx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
On the line movl 20(%ebp), %ecx: there is a value at 20(%ebp), so how is it grabbing y_t?
On the line leal (%ecx,%edx), %edx: there is nothing in %edx, is there? So what is being added to %ecx to be stored in %edx?
We want to calculate
t2 = z*48
To do that, we first compute z = z*3 and then shift z left by 4 (a left shift by 4 is a multiplication by 16).
Line 3 calculates
z = z*3 (z + 2z)
and line 4 does the left shift by 4.
The compiler often generates a combination of adds and shifts to multiply by a constant, because a multiply instruction is more costly.
As for line 6, 65535 is simply the decimal value of 0xFFFF (16^4 - 1 = 65535).
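A quick way to convince yourself is a small C check. This is a minimal sketch; low16 is a hypothetical helper name, not something from the question.

```c
#include <assert.h>

/* Hypothetical helper: keeps only the low 16 bits of t1,
   which is what andl $65535, %edx does to %edx. */
int low16(int t1)
{
    return t1 & 0xFFFF;  /* 0xFFFF and 65535 are the same constant */
}
```

The assembler simply prints the mask in decimal; the bit pattern is identical, so low16(0x12345) gives 0x2345 either way.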

Resources