understanding decompiling assembly code

I have the following code that I was to decompile:
movl $0x2feaf, -0x18(%ebp)
mov 0x8(%ebp), %eax
mov %eax, -0x14(%ebp)
My problem is that I don't understand what 0x8(%ebp) means in this context. I tried the following C code:
int b = 196271;
int a = b;
but that gives me
movl $0x2feaf, -0x8(%ebp)
mov -0x8(%ebp), %eax
mov %eax, -0x4(%ebp)
What does 0x8(%ebp) mean? Thanks!

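mov 0x8(%ebp), %eax loads the 32-bit value stored at address EBP+8 into the EAX register.
With the usual 32-bit frame layout, [EBP] holds the caller's saved EBP and [EBP+4] the return address, so [EBP+8] is normally the first parameter of the current function. Locals, by contrast, sit at negative offsets from EBP, which is why copying one local into another produced -0x8(%ebp) in your attempt.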
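As a rough illustration, C along these lines would plausibly produce the listing in the question when built as unoptimized 32-bit code. The function name `f` and the parameter name `x` are made up for the sketch, and the exact stack offsets depend on the compiler and its flags:

```c
/* Hypothetical reconstruction: f and x are illustrative names only. */
int f(int x)
{
    int b = 0x2feaf;  /* movl $0x2feaf, -0x18(%ebp): constant into a local */
    int a = x;        /* mov 0x8(%ebp), %eax: load the first argument,
                         mov %eax, -0x14(%ebp): store it into another local */
    (void)b;          /* silence the unused-variable warning */
    return a;         /* returned only so the behaviour can be checked */
}
```

Calling `f(7)` returns 7: the value comes from the caller through the parameter slot, not from `b`.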

Related

How to import a library from C into assembly code?

I'm trying to convert some C code to assembly, but I have a problem in the assembly code. In the C code I include the 'conio.h' header to use the getch() function, but in the assembly code the call to this function produces an error saying that getch() is undefined. I've already tried using the extern directive in the asm code, but it didn't work. Any ideas how to solve this?
C code for reference:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
int main()
{
    int cont=0;
    char *string;
    string = (char*)malloc(sizeof(char)*20);
    do{
        string[cont] = getch();
        cont++;
    }while (string[cont] != '\0');
    printf("%s", string);
    return 0;
}
Assembly code for reference:
.intel_syntax
.global main
.text
main:
push %rbp
mov %rbp, %rsp
push %rbx
sub %rsp, 40
mov DWORD PTR [%rbp-28], 0
mov %edi, 20
call malloc
mov QWORD PTR [%rbp-24], %rax
.L2:
mov %eax, DWORD PTR [%rbp-28]
cdqe
mov %rbx, %rax
add %rbx, QWORD PTR [%rbp-24]
mov %eax, 0
call getch ; Call of the getch function
mov BYTE PTR [%rbx], %al
inc DWORD PTR [%rbp-28]
mov %eax, DWORD PTR [%rbp-28]
cdqe
add %rax, QWORD PTR [%rbp-24]
movzx %eax, BYTE PTR [%rax]
test %al, %al
jne .L2
mov %rsi, QWORD PTR [%rbp-24]
mov %edi, OFFSET FLAT:.LC0
mov %eax, 0
call printf
mov %eax, 0
add %rsp, 40
pop %rbx
leave
ret
.LC0:
.string "%s"

Segmentation fault in assembly when multiplying registers?

I was trying to convert the following C code into assembly. Here is the C code:
typedef struct {
    int x;
    int y;
} point;
int square_distance( point * p ) {
    return p->x * p->x + p->y * p->y;
}
My assembly code is as follows:
square_distance:
.LFB23:
.cfi_startproc
movl (%edi), %edx
imull %edx, %edx
movl 4(%edi), %eax
imull %eax, %eax
addl %edx, %eax
ret
.cfi_endproc
I get a segmentation fault when I try to run this program. Could someone please explain why? Thanks! I would be grateful!

Your code is 32-bit (x86) code, but it applies the calling convention used by 64-bit (x64) code. That cannot work.
The usual x86 calling convention passes all parameters on the stack.
The x64 calling convention passes the first parameter in rdi, the second in rsi, the third in rdx, and so on. (I'm not sure which registers are used when there are more than three parameters; this may also depend on your platform.)
Your code is presumably more or less correct as x64 code once the pointer is read through the full 64-bit register. That would be something like this:
square_distance:
movl (%rdi), %edx
imull %edx, %edx
movl 4(%rdi), %eax
imull %eax, %eax
addl %edx, %eax
ret
With x86 code the parameters are passed on the stack and the corresponding correct code would be something like this:
square_distance:
movl 4(%esp), %edx
movl (%edx), %eax
imull %eax, %eax
movl 4(%edx), %edx
imull %edx, %edx
addl %edx, %eax
ret
Calling conventions are a vast subject in general: other conventions exist on other platforms, and in certain cases several conventions coexist on the same platform.

I just want to supplement Jabberwocky's answer, since I don't have enough reputation to comment.
The way parameters are passed when a function is called (the calling convention) differs between architectures and operating systems. You can find many common calling conventions in this wiki
According to the wiki, the x64 calling convention on *nix passes the first six parameters in the registers RDI, RSI, RDX, RCX, R8 and R9, and any others on the stack.
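To make the register assignment concrete, here is a small sketch for the *nix x64 (System V AMD64) case. The function `sum8` is a hypothetical helper invented for the illustration; the comments state where each integer argument is expected to travel under that convention, which you can confirm by compiling with `gcc -S` on such a platform:

```c
/* Hypothetical helper, used only to show where each argument travels
   under the System V AMD64 convention. */
long sum8(long a, long b, long c, long d,   /* rdi, rsi, rdx, rcx */
          long e, long f,                   /* r8, r9 */
          long g, long h)                   /* seventh and eighth: stack */
{
    return a + b + c + d + e + f + g + h;
}
```

A call such as `sum8(1, 2, 3, 4, 5, 6, 7, 8)` returns 36. In a 32-bit cdecl build, all eight arguments would be pushed on the stack instead.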

I cannot find where the logical errors are in these two loops. GAS/AT&T Assembly

The goal of this function is to replicate the strupr C function. It gets stuck in an infinite loop and I cannot figure out why. As I understand it, the argv value from the command line arrives as a single array of chars; I simply try to access each character, decide whether it has to be changed, and then print it, changed or not. The function receives a single parameter, which is that same argv value.
void lowerToUpper (char *msg);
.globl lowerToUpper
.type lowerToUpper, @function
lowerToUpper:
mov $0,%r12
lea (%rdi), %rdx
jmp .restartLoop
.restartLoop:
add $8, %rdx
cmp $0,%r13
jne .isValid
ret
.isValid:
cmp $97, %r13
jbe .printValue
jg .changeValue
.changeValue:
#sub $32, %r13
jmp .printValue
.printValue:
mov $format2, %edi
mov %r13, %rsi
mov $0, %eax
call printf
add $1,%rdx
jmp .restartLoop
The goal of this function is to print the highest number in an array. It receives a pointer to an unsigned int (an array of numbers) and its length.
void printHigher (unsigned int *data, int len);
printHigher:
mov $0,%r14
mov $0,%r15
jmp .Loop
ret
.Loop:
mov (%rdi), %r8
cmp %r8,%rdi
jl .calculateMax
jmp .printMax
.calculateMax:
cmp %r8, %r15
jg .assignMax
add $1,%r14
add $8,(%rdi)
jmp .Loop
.assignMax:
mov %r8, %r15
.printMax:
mov $format, %edi
mov %r15, %rsi
mov $0, %eax
call printf
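For comparison, the behaviour the two routines are described as aiming for can be written in C roughly as follows. This is only a reference sketch: the names `lower_to_upper` and `highest` are made up, and the functions return their results instead of calling printf so they are easy to check. Note that a char occupies 1 byte and an unsigned int 4 bytes on x86-64, so a loop over either array advances its pointer by 1 or 4, not 8, and has to load each element from memory before comparing it.

```c
/* Uppercase ASCII letters in place, one byte at a time, up to the NUL. */
void lower_to_upper(char *msg)
{
    for (; *msg != '\0'; msg++) {
        if (*msg >= 'a' && *msg <= 'z')
            *msg -= 'a' - 'A';   /* a difference of 32 in ASCII */
    }
}

/* Return the largest element; len is assumed to be at least 1. */
unsigned int highest(const unsigned int *data, int len)
{
    unsigned int max = data[0];
    for (int i = 1; i < len; i++) {
        if (data[i] > max)       /* unsigned comparison */
            max = data[i];
    }
    return max;
}
```

Compiling a version like this with `gcc -S` and comparing the output against the hand-written loops is one way to spot where they diverge.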

Explain compiled code structure and static allocation

I was looking at the difference in C between char* c = "thomas"; and char c[] = "thomas";. I saw questions about this here and while trying to understand the answers I wanted to check that I was right by looking at the assembly. And a few questions were born.
Here is what I thought:
1. char* c = ...: the characters are allocated somewhere in static memory (read-only from the program's perspective), alongside the code. That's why it should be marked const. The string can be printed but not modified.
2. char c[] = ...: same as 1, except that when the function is called, the characters are copied into an array on the stack, so it can be modified, etc.
I wanted to check this, so I wrote this C code:
#include <stdio.h>
int main(){
    char c [] = "thomas blabljbflkjbsdflkjbds";
    printf("%s\n", c);
}
Looking at the generated assembly :
0x400564 <main>: push rbp
0x400565 <main+1>: mov rbp,rsp
0x400568 <main+4>: sub rsp,0x30
0x40056c <main+8>: mov rax,QWORD PTR fs:0x28
0x400575 <main+17>: mov QWORD PTR [rbp-0x8],rax
0x400579 <main+21>: xor eax,eax
0x40057b <main+23>: mov DWORD PTR [rbp-0x30],0x6978616d
0x400582 <main+30>: mov DWORD PTR [rbp-0x2c],0x6220656d
0x400589 <main+37>: mov DWORD PTR [rbp-0x28],0x6c62616c
0x400590 <main+44>: mov DWORD PTR [rbp-0x24],0x6c66626a
0x400597 <main+51>: mov DWORD PTR [rbp-0x20],0x73626a6b
0x40059e <main+58>: mov DWORD PTR [rbp-0x1c],0x6b6c6664
0x4005a5 <main+65>: mov DWORD PTR [rbp-0x18],0x7364626a
0x4005ac <main+72>: mov BYTE PTR [rbp-0x14],0x0
0x4005b0 <main+76>: lea rax,[rbp-0x30]
0x4005b4 <main+80>: mov rdi,rax
0x4005b7 <main+83>: call 0x400450 <puts@plt>
0x4005bc <main+88>: mov rdx,QWORD PTR [rbp-0x8]
0x4005c0 <main+92>: xor rdx,QWORD PTR fs:0x28
0x4005c9 <main+101>: je 0x4005d0 <main+108>
So characters are copied into the stack, which is what I thought.
Questions :
The characters are stored by bytes at addresses 0x6978616d, 0x6220656d and so on. Why aren't they allocated contiguously in an array? Is it a simple compiler optimization?
explains why char* doesn't behave like an array and why c[10] isn't the 11th character of the string. However it doesn't explain why
char* c = "thomas blabljbflkjbsdflkjbds";
printf("%s\n", c);
works (note the [] -> *). I guess that printf reads character by character until it reaches a 0, so knowing just c (i.e. &c[0]), how does it access c[10]? (The characters seem non-contiguous, and this time they are not copied into an array on the stack.)
I hope this is clear; I can rephrase if anything is hard to follow. Thanks

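1: 0x6978616d and 0x6220656d are not addresses; they are the data of your string, four ASCII characters packed into each 32-bit immediate. x86 is little-endian, so the lowest byte is stored first: 0x6978616d lands in memory as 6d 61 78 69 ("maxi") and 0x6220656d as 6d 65 20 62 ("me b"). (That spells "maxime b", so the listing apparently comes from a build whose string began with a different word than the "thomas" in the source; the later constants, such as 0x6c62616c = "labl", do match it.) The stores go to rbp-0x30, rbp-0x2c and so on, so the characters end up contiguous on the stack.
2: When used in a function call, arrays decay into pointers, so printf receives a pointer to char regardless of whether c is an array or a pointer. A string literal is stored contiguously too, so printf can walk from that pointer to the terminating 0 in both cases.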
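To check by hand which characters a 32-bit immediate from the listing encodes, the bytes can be unpacked in little-endian order. The helper below is a hypothetical sketch (the name `unpack_le32` is not from the question); it extracts the bytes with shifts, so it gives the same result whatever the byte order of the machine it runs on:

```c
#include <stdint.h>

/* Write the four bytes of value in little-endian (x86 memory) order,
   followed by a terminating NUL. */
void unpack_le32(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((value >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

With a 5-byte buffer `buf`, `unpack_le32(0x6c62616c, buf)` leaves "labl" in `buf`, which is the third group of four characters in the string.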

A compiler may actually choose to compile character array initialisation as a copy from read-only storage, but as Klas suggests, that is not happening in your example.
Here is an example of code for which that does happen (using gcc). It may be illuminating to change the definition of STR to strings of various lengths and look at the difference in assembly output.
/* 99 characters */
#define STR "123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789"
void observe(const char *);
void test1() {
    char *str = STR;
    observe(str);
}
void test2() {
    char str[] = STR;
    observe(str);
}
And the assembly:
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789"
.text
test2:
pushl %ebp
movl $25, %ecx
movl %esp, %ebp
subl $136, %esp
movl %esi, -8(%ebp)
movl $.LC0, %esi
movl %edi, -4(%ebp)
leal -108(%ebp), %edi
rep movsl
leal -108(%ebp), %eax
movl %eax, (%esp)
call observe
movl -8(%ebp), %esi
movl -4(%ebp), %edi
movl %ebp, %esp
popl %ebp
ret
test1:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $.LC0, (%esp)
call observe
leave
ret
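The practical difference between the two functions can also be observed from C itself. The following is a small sketch with a made-up function name, `array_is_private_copy`: the array version owns a writable copy of the characters, while the pointer version only refers to the literal, which must not be written to.

```c
#include <string.h>

/* Returns 1 if modifying the array leaves the literal untouched. */
int array_is_private_copy(void)
{
    const char *p = "thomas";  /* points at the literal itself */
    char a[] = "thomas";       /* 7 bytes of local storage, initialised
                                  from the literal (6 chars + NUL) */

    a[0] = 'T';                /* fine: a is writable storage */
    return p[0] == 't' && strcmp(a, "Thomas") == 0 && sizeof a == 7;
}
```

Whether the compiler fills `a` with immediate stores or with a block copy from read-only data is its own choice; the observable behaviour is the same.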

Assembly operator AND

In order to continue this:
Debugging C program (int declaration)
I decided to test more code and see how compiler reacts to it.
So I decided to try this one to test local variables:
#include <stdio.h>
main()
{
    int a,b,c,d,e,f,g;
    a=0xbeef;
    b=0xdead;
    c=0x12;
    d=0x65;
    e=0xfed;
    f=0xaa;
    g=0xfaceb00c;
    a=a+b;
    printf("%d",a);
}
OK, I declared int a,b,c... just to test the size of main's frame and watch the sub $0x10,%esp grow (I'm on Linux, which may be why it is a sub); it is now sub $0x30,%esp.
Here is the gdb output of the "disas main" command:
0x0804841c <+0>: push %ebp
0x0804841d <+1>: mov %esp,%ebp
0x0804841f <+3>: and $0xfffffff0,%esp
0x08048422 <+6>: sub $0x30,%esp ;7 int vars 4-byte is 7*4=28. 30 is enough
0x08048425 <+9>: movl $0xbeef,0x14(%esp)
0x0804842d <+17>: movl $0xdead,0x18(%esp)
0x08048435 <+25>: movl $0x12,0x1c(%esp)
0x0804843d <+33>: movl $0x65,0x20(%esp)
0x08048445 <+41>: movl $0xfed,0x24(%esp)
0x0804844d <+49>: movl $0xaa,0x28(%esp)
0x08048455 <+57>: movl $0xfaceb00c,0x2c(%esp)
0x0804845d <+65>: mov 0x18(%esp),%eax
0x08048461 <+69>: add %eax,0x14(%esp)
0x08048465 <+73>: mov 0x14(%esp),%eax
0x08048469 <+77>: mov %eax,0x4(%esp)
0x0804846d <+81>: movl $0x8048510,(%esp)
0x08048474 <+88>: call 0x80482f0 <printf@plt>
0x08048479 <+93>: leave
0x0804847a <+94>: ret
This line: 0x0804841f <+3>: and $0xfffffff0,%esp
What is the and operator doing here, and why is the number so large?
And why isn't the offset in the movl instructions negative, as in movl $0xa,-0x4(%ebp)?
As far as I know, AND is a logical operator: 1 and 1 is 1, 0 and 0 is 0, 1 and 0 is 0, etc.
If that is the case, %esp holds the ebp value, which was the frame base address of whoever called main.
Can anyone explain why this is compiled like this?
I think I'm missing something.
Edit: I found some related questions on Stack Overflow about this. Sharing them: link1, link2, link3

Why is the offset in movl $0xbeef,0x14(%esp) not negative?
Because, unlike in the other example, addressing is relative to esp, not ebp. esp is at one end of the stack frame and ebp at the other. So to get an address inside the current stack frame, you add to esp or subtract from ebp.
Why and $0xfffffff0,%esp?
For alignment: the and clears the low four bits of esp, which rounds it down to a multiple of 16. @BlackBear explains this in the answer to your previous question: Debugging C program (int declaration)
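The effect of the mask is easy to reproduce in C. The sketch below uses a hypothetical function name, `align_down16`, and an arbitrary example address; it performs the same bitwise AND on a 32-bit value:

```c
#include <stdint.h>

/* Clear the low four bits, as and $0xfffffff0,%esp does to esp. */
uint32_t align_down16(uint32_t addr)
{
    return addr & 0xfffffff0u;
}
```

For instance, `align_down16(0xbffff7cc)` gives 0xbffff7c0, and a value that is already a multiple of 16 is left unchanged. Because the stack grows downward, rounding esp down can only reserve extra space, never release any.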