Translate C code to assembly code?

Translate C code to assembly code? - c

I have to translate this C code to assembly code:
#include <stdio.h>
int main(){
int a, b,c;
scanf("%d",&a);
scanf("%d",&b);
if (a == b){
b++;
}
if (a > b){
c = a;
a = b;
b = c;
}
printf("%d\n",b-a);
return 0;
}
My code is below, and incomplete.
rdint %eax # reading a
rdint %ebx # reading b
irmovl $1, %edi
subl %eax,%ebx
addl %ebx, %edi
je Equal
irmov1 %eax, %efx #flagged as invalid line
irmov1 %ebx, %egx
irmov1 %ecx, %ehx
irmovl $0, %eax
irmovl $0, %ebx
irmovl $0, %ecx
addl %eax, %efx #flagged as invalid line
addl %ebx, %egx
addl %ecx, %ehx
halt
Basically I think it is mostly done, but I have commented next to two lines flagged as invalid when I try to run it, but I'm not sure why they are invalid. I'm also not sure how to do an if statment for a > b. I could use any suggestions from people who know about y86 assembly language.

From what I can find online (1, 2), the only supported registers are: eax, ecx, edx, ebx, esi, edi, esp, and ebp.
You are requesting non-existent registers (efx and further).
Also irmov is for moving an immediate operand (read: constant numerical value) into a register operand, whereas your irmov1 %eax, %efx has two register operands.
Finally, in computer software there's a huge difference between the character representing digit "one" and the character representing letter "L". Mind your 1's and l's. I mean irmov1 vs irmovl.

Jens,
First, Y86 does not have any efx, egx, and ehx registers, which is why you are getting the invalid lines when you pour the code through YAS.
Second, you make conditional branches by subtracting two registers using the subl instruction and jumping on the condition code set by the Y86 ALU by ways of the jxx instructions.
Check my blog at http://y86tutoring.wordpress.com for details.

Related

Trying to reverse engineer a function

I'm trying to understand assembly in x86 more. I have a mystery function here that I know returns an int and takes an int argument.
So it looks like int mystery(int n){}. I can't figure out the function in C however. The assembly is:
mov %edi, %eax
lea 0x0(,%rdi, 8), %edi
sub %eax, %edi
add $0x4, %edi
callq < mystery _util >
repz retq
< mystery _util >
mov %edi, %eax
shr %eax
and $0x1, %edi
and %edi, %eax
retq
I don't understand what the lea does here and what kind of function it could be.

The assembly code appeared to be computer generated, and something that was probably compiled by GCC since there is a repz retq after an unconditional branch (call). There is also an indication that because there isn't a tail call (jmp) instead of a call when going to mystery_util that the code was compiled with -O1 (higher optimization levels would likely inline the function which didn't happen here). The lack of frame pointers and extra load/stores indicated that it isn't compiled with -O0
Multiplying x by 7 is the same as multiplying x by 8 and subtracting x. That is what the following code is doing:
lea 0x0(,%rdi, 8), %edi
sub %eax, %edi
LEA can compute addresses but it can be used for simple arithmetic as well. The syntax for a memory operand is displacement(base, index, scale). Scale can be 1, 2, 4, 8. The computation is displacement + base + index * scale. In your case lea 0x0(,%rdi, 8), %edi is effectively EDI = 0x0 + RDI * 8 or EDI = RDI * 8. The full calculation is n * 7 - 4;
The calculation for mystery_util appears to simply be
n &= (n>>1) & 1;
If I take all these factors together we have a function mystery that passes n * 7 - 4 to a function called mystery_util that returns n &= (n>>1) & 1.
Since mystery_util returns a single bit value (0 or 1) it is reasonable that bool is the return type.
I was curious if I could get a particular version of GCC with optimization level 1 (-O1) to reproduce this assembly code. I discovered that GCC 4.9.x will yield this exact assembly code for this given C program:
#include<stdbool.h>
bool mystery_util(unsigned int n)
{
n &= (n>>1) & 1;
return n;
}
bool mystery(unsigned int n)
{
return mystery_util (7*n+4);
}
The assembly output is:
mystery_util:
movl %edi, %eax
shrl %eax
andl $1, %edi
andl %edi, %eax
ret
mystery:
movl %edi, %eax
leal 0(,%rdi,8), %edi
subl %eax, %edi
addl $4, %edi
call mystery_util
rep ret
You can play with this code on godbolt.
Important Update - Version without bool
I apparently erred in interpreting the question. I assumed the person asking this question determined by themselves that the prototype for mystery was int mystery(int n). I thought I could change that. According to a related question asked on Stackoverflow a day later, it seems int mystery(int n) is given to you as the prototype as part of the assignment. This is important because it means that a modification has to be made.
The change that needs to be made is related to mystery_util. In the code to be reverse engineered are these lines:
mov %edi, %eax
shr %eax
EDI is the first parameter. SHR is logical shift right. Compilers would only generate this if EDI was an unsigned int (or equivalent). int is a signed type an would generate SAR (arithmetic shift right). This means that the parameter for mystery_util has to be unsigned int (and it follows that the return value is likely unsigned int. That means the code would look like this:
unsigned int mystery_util(unsigned int n)
{
n &= (n>>1) & 1;
return n;
}
int mystery(int n)
{
return mystery_util (7*n+4);
}
mystery now has the prototype given by your professor (bool is removed) and we use unsigned int for the parameter and return type of mystery_util. In order to generate this code with GCC 4.9.x I found you need to use -O1 -fno-inline. This code can be found on godbolt. The assembly output is the same as the version using bool.
If you use unsigned int mystery_util(int n) you would discover that it doesn't quite output what we want:
mystery_util:
movl %edi, %eax
sarl %eax ; <------- SAR (arithmetic shift right) is not SHR
andl $1, %edi
andl %edi, %eax
ret

The LEA is just a left-shift by 3, and truncating the result to 32 bit (i.e. zero-extending EDI into RDI implicilty). x86-64 System V passes the first integer arg in RDI, so all of this is consistent with one int arg. LEA uses memory-operand syntax and machine encoding, but it's really just a shift-and-add instruction. Using it as part of a multiply by a constant is a common compiler optimization for x86.
The compiler that generated this function missed an optimization here; the first mov could have been avoided with
lea 0x0(,%rdi, 8), %eax # n << 3 = n*8
sub %edi, %eax # eax = n*7
lea 4(%rax), %edi # rdi = 4 + n*7
But instead, the compiler got stuck on generating n*7 in %edi, probably because it applied a peephole optimization for the constant multiply too late to redo register allocation.
mystery_util returns the bitwise AND of the low 2 bits of its arg, in the low bit, so a 0 or 1 integer value, which could also be a bool.
(shr with no count means a count of 1; remember that x86 has a special opcode for shifts with an implicit count of 1. 8086 only has counts of 1 or cl; immediate counts were added later as an extension and the implicit-form opcode is still shorter.)

The LEA performs an address computation, but instead of dereferencing the address, it stores the computed address into the destination register.
In AT&T syntax, lea C(b,c,d), reg means reg = C + b + c*d where C is a constant, and b,c are registers and d is a scalar from {1,2,4,8}. Hence you can see why LEA is popular for simple math operations: it does quite a bit in a single instruction. (*includes correction from prl's comment below)
There are some strange features of this assembly code: the repz prefix is only strictly defined when applied to certain instructions, and retq is not one of them (though the general behavior of the processor is to ignore it). See Michael Petch's comment below with a link for more info. The use of lea (,rdi,8), edi followed by sub eax, edi to compute arg1 * 7 also seemed strange, but makes sense once prl noted the scalar d had to be a constant power of 2. In any case, here's how I read the snippet:
mov %edi, %eax ; eax = arg1
lea 0x0(,%rdi, 8), %edi ; edi = arg1 * 8
sub %eax, %edi ; edi = (arg1 * 8) - arg1 = arg1 * 7
add $0x4, %edi ; edi = (arg1 * 7) + 4
callq < mystery _util > ; call mystery_util(arg1 * 7 + 4)
repz retq ; repz prefix on return is de facto nop.
< mystery _util >
mov %edi, %eax ; eax = arg1
shr %eax ; eax = arg1 >> 1
and $0x1, %edi ; edi = 1 iff arg1 was odd, else 0
and %edi, %eax ; eax = 1 iff smallest 2 bits of arg1 were both 1.
retq
Note the +4 on the 4th line is entirely spurious. It cannot affect the outcome of mystery_util.
So, overall this ASM snippet computes the boolean (arg1 * 7) % 4 == 3.

Y86 Stopped in 1 Steps Exception HLT

I translated a simple c program into IA32 and then transliterated it into Y86 but I am getting an error I don't understand or know how to debug since I am just learning Y86. The error is:
Stopped in 1 steps at PC = 0x1. Exception 'HLT', CC Z=1 S=0 O=0
Changes to registers:
Changes to memory:
The program is supposed to initialize i to 0 and then proceed through a for loop until i is greater than or equal to 5 and increment i each time. Inside the for loop I set j equal to i*2 and k is equal to j+1. My Y86 code is as follows:
main:
irmovl $0, %ebx
jmp L2
halt
L3:
rrmovl %ebx, %eax
addl %eax, %eax
rrmovl %eax, %ecx
rrmovl %ecx, %eax
irmovl $1, %esi
addl %esi, %eax
rrmovl %eax, %edx
addl %esi, %ebx
L2:
irmovl $4, %edi
subl %edi, %ebx
jle L3
I can provide the C code and IA32 code I transliterated from if it would help you answer my problem I really need some help thanks.

You forgot to add the extra NL (CR) in the source file. YAS is flawed. When it assembles it wraps the assembly around if there is no NL (CR) at the end of the source (ys) file, overwriting the beginning of the created object code (yo) file.
The result... well it is bad. In your case, YAS probably inserted a HLT instruction in the first character of the object file.

xorl %eax - Instruction set architecture in IA-32

I am experiencing some difficulties interpreting this exercise;
What does exactly xorl does in this assembly snippet?
C Code:
int i = 0;
if (i>=55)
i++;
else
i--;
Assembly
xorl ____ , %ebx
cmpl ____ , %ebx
Jel .L2
____ %ebx
.L2:
____ %ebx
.L3:
What's happening on the assembly part?

It's probably:
xorl %ebx, %ebx
This is a common idiom for zeroing a register on x86. This would correspond with i = 0 in the C code.
If you are curious "but why ?" the short answer is that the xor instruction is fewer bytes than mov $0, %ebx. The long answer includes other subtle reasons.
I am leaving out the rest of the exercise since there's nothing idiosyncratic left.

This is the completed and commented assembly equivalent to your C code:
xorl %ebx , %ebx ; i = 0
cmpl $54, %ebx
jle .L2 ; if (i <= 54) jump to .L2, otherwise continue with the next instruction (so if i>54... which equals >=55 like in your C code)
addl $2, %ebx ; >54 (or: >=55)
.L2:
decl %ebx ; <=54 (or <55, the else-branch of your if) Note: This code also gets executed if i >= 55, hence why we need +2 above so we only get +1 total
.L3:
So, these are the (arithmetic) instructions that get executed for all numbers >=55:
addl $2, %ebx
decl %ebx
So for numbers >=55, this is equal to incrementing. The following (arithmetic) instructions get executed for numbers <55:
decl %ebx
We jump over the addl $2, %ebx instruction, so for numbers <55 this is equal to decrementing.
In case you're not allowed to type addl $2, (since it's not just the instruction but also an argument) into a single blank there's probably an error in the asm code you've been given (missing a jump between line 4 and 5 to .L3).
Also note that jel is clearly a typo for jle in the question.

XORL is used to initialize a register to Zero, mostly used for the counter. The code from ccKep is correct, only that he incremented by a wrong value ie. 2 instead of 1. The correct version is therefore:
xorl %ebx , %ebx # i = 0
cmpl $54, %ebx # compare the two
jle .L2 #if (i <= 54) jump to .L2, otherwise continue with the next instruction (so if i>54... which equals >=55 like in your C code)
incl %ebx #i++
jmp .DONE # jump to exit position
.L2:
decl %ebx # <=54 (or <55
.DONE:

Segmentation Fault when getting the max of three numbers in Assembly x86

I am trying to get the max of three numbers using C to call a method in Assembly 32 bit AT & T. When the program runs, I get a segmentation fault(core dumped) error and cannot figure out why. My input has been a mix of positive/negative numbers and 1,2,3, both with the same error as a result.
Assembly
# %eax - first parameter
# %ecx - second parameter
# %edx - third parameter
.code32
.file "maxofthree.S"
.text
.global maxofthree
.type maxofthree #function
maxofthree:
pushl %ebp # old ebp
movl %esp, %ebp # skip over
movl 8(%ebp), %eax # grab first value
movl 12(%ebp), %ecx # grab second value
movl 16(%ebp), %edx # grab third value
#test for first
cmpl %ecx, %eax # compare first and second
jl firstsmaller # first smaller than second, exit if
cmpl %edx, %eax # compare first and third
jl firstsmaller # first smaller than third, exit if
leave # reset the stack pointer and pop the old base pointer
ret # return since first > second and first > third
firstsmaller: # first smaller than second or third, resume comparisons
#test for second and third against each other
cmpl %edx, %ecx # compare second and third
jg secondgreatest # second is greatest, so jump to end
movl %eax, %edx # third is greatest, move third to eax
leave # reset the stack pointer and pop the old base pointer
ret # return third
secondgreatest: # second > third
movl %ecx, %eax #move second to eax
leave # reset the stack pointer and pop the old base pointer
ret # return second
C code
#include <stdio.h>
#include <inttypes.h>
long int maxofthree(long int, long int, long int);
int main(int argc, char *argv[]) {
if (argc != 4) {
printf("Missing command line arguments. Instructions to"
" execute this program:- .\a.out <num1> <num2> <num3>");
return 0;
}
long int x = atoi(argv[1]);
long int y = atoi(argv[2]);
long int z = atoi(argv[3]);
printf("%ld\n", maxofthree(x, y, z)); // TODO change back to (x, y, z)
}

The code is causing a segmentation fault because it is trying to jump back to an invalid return address when the ret instruction is executed. This happens for all three different ret instructions.
The reason why it is occurring is because you don't pop the old base pointer before returning. A small change to the code will remove the fault. Change each ret instruction to:
leave
ret
The leave instruction will do the following:
movl %ebp, %esp
popl %ebp
Which will reset the stack pointer and pop the old base pointer that you saved.
Also, your comparisons are not doing what they are specified to do in the comments. When you do:
cmp %eax, %edx
jl firstsmaller
The jump will happen when %edx is smaller than %eax. So you want the code be
cmpl %edx, %eax
jl firstsmaller
which will jump when %eax is smaller than %edx, as specified in the comment.
Reference this this page for details on the cmp instruction in AT&T/GAS syntax.

You forgot to pop ebp before returning from the function.
Also, cmpl %eax, %ecx compares ecx to eax not the other way. So the code
cmpl %eax, %ecx
jl firstsmaller
will jump if ecx is smaller than eax.

How to interpret this IA32 assembly code [duplicate]

This question already has an answer here:
Pointers in Assembly
(1 answer)
Closed 9 years ago.
As an assignment, I must read the assembly code and write its high-level C program. The structure in this case is a switch statement, so for each case, the assembly code is translated to the case in C code. Below will only be one of the cases. If you can help me interpret this, it should give me a better understanding of the rest of the cases.
p1 is in %ebp+8
p2 is in %ebp+12
action is in %ebp+16
result is in %edx
...
.L13:
movl 8(%ebp), %eax # get p1
movl (%eax), %edx # result = *p1?
movl 12(%ebp), %ecx # get p2
movl (%ecx), %eax # p1 = *p2
movl 8(%ebp), %ecx # p2 = p1
movl %eax, (%ecx) # *p1 = *p2?
jmp .L19 #jump to default
...
.L19
movl %edx, %eax # set return value
Of course, the comments were added by me to try to make sense of it, except it leaves me more confused. Is this meant to be a swap? Probably not; the formatting would be different. What really happens in the 2nd and 6th lines? Why is %edx only changed once so early if it's the return value? Please answer with some guidelines to interpreting this code.

The above snippet is x86_32 assembly in the (IMO broken) AT&T syntax.
AT&T syntax attaches a size suffix to every operand.
movl means a 32 bit operand. (l for long)
movw means 16 bit operand (w for word)
movb means an 8 bit operand (b for byte)
The operands are reversed in order, so the destination is on the right and the source is on the left.
This is opposite to almost every other programming language.
Register names are prefixed by a % to distinguish them from variable names.
If a register is surrounded by brackets () that means that the memory address pointed to by the register is used, rather than the value inside the register itself.
This makes sense, because EBP is used as a pointer to a stackframe.
Stackframes are used to access parameters and local variables in functions.
Instead of writing: mov eax, dword ptr [ebp+8] (Intel syntax)
AT&T syntax lists it as: movl 8(%ebp), %eax (gas syntax)
Which means: put the contends of the memory address pointed to by (ebp + 8) into eax.
Here's the translation:
.L13: <<-- label used as a jump target.
movl 8(%ebp), %eax <<-- p1, stored at ebp+8 goes into EAX
movl (%eax), %edx <<-- p1 is a pointer, EDX = p1->next
movl 12(%ebp), %ecx <<-- p2, stored at ebp+12 goes in ECX
movl (%ecx), %eax <<-- p2 is (again) a pointer, EAX = p2->next
movl 8(%ebp), %ecx <<-- ECX = p1
movl %eax, (%ecx) <<-- p2->next = p1->next
jmp .L19 <<-- jump to exit
...
.L19
movl %edx, %eax <<-- EAX is always the return value
<<-- return p1->data.
In all of the many calling conventions on x86 the return value of a function is put into the EAX register. (or EAX:EDX if it's an INT64)
In prose: p1 and p2 are pointers to data, in this data pointers to pointers.
This code looks like it manipulates a linked list.
p2->next is set to p1->next.
Other than that the snippet looks incomplete because whatever was in p2->next to begin with is not worked on so there probably more code that you're not showing.
Apart from the confusing AT&T syntax it's really very simple code.
In C the code would look like:
(void *)p2->next = (void *)p1->next;
Note that the code is quite inefficient and no decent compiler (or human) would generate this code.
The following equivalent would make more sense:
mov eax,[ebp+8]
mov ecx,[ebp+12]
mov eax,[eax]
mov [ecx],eax
jmp done
More info on the difference between AT&T and Intel syntax can be found here: http://www.ibm.com/developerworks/linux/library/l-gas-nasm/index.html

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Translate C code to assembly code? - c

Related

Trying to reverse engineer a function

Y86 Stopped in 1 Steps Exception HLT

xorl %eax - Instruction set architecture in IA-32

Segmentation Fault when getting the max of three numbers in Assembly x86

How to interpret this IA32 assembly code [duplicate]

Categories

Resources