Not understanding Hopper decompiler output - c

I know some C and a little bit of assembly and wanted to start learning about reverse engineering, so I downloaded the trial of Hopper Disassembler for Mac. I created a super basic C program:
int main() {
int a = 5;
return 0;
}
And compiled it with the -g flag (because I saw this before and wasn't sure if it mattered):
gcc -g simple.c
Then I opened the a.out file in Hopper Disassembler and clicked on the Pseudo Code button and it gave me:
int _main() {
rax = 0x0;
var_4 = 0x0;
var_8 = 0x5;
rsp = rsp + 0x8;
rbp = stack[2047];
return 0x0;
}
The only line I sort of understand here is setting a variable to 0x5. I'm unable to comprehend what all these additional lines are for (such as the rsp = rsp + 0x8;), for such a simple program. Would anyone be willing to explain this to me?
Also if anyone knows of good sources/tutorials for an intro into reverse engineering that'd be very helpful as well. Thanks.

Looks like it is doing a particularly poor job of producing "disassembly pseudocode" (whatever that is -- is it a disassembler or a decompliler? Can't decide)
In this case it looks like it has has elided the stack frame setup (the function prolog), but not the cleanup (function epilog). So you'll get a much better idea of what is going on by using an actual disassembler to look at the actual disassembly code:
$ gcc -c simple.c
$ objdump -d simple.o
simple.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
b: b8 00 00 00 00 mov $0x0,%eax
10: 5d pop %rbp
11: c3 retq
So what we have here is code to set up a stack frame (address 0-1), the assignment you have (4), setting up the return value (b), tearing down the frame (10) and then returning (11). You might see something different due to using a different version of gcc or a different target.
In the case of your disassembly, the first part has been elided (left out as being an uninteresting housekeeping task) by the disassembler, but the second to last part (which undoes the first part) has not.

What you're looking at is decompiled code. Every decompiler ouptutwill look something close to that because it's not going to try and get variable names because they can be changed so often and usually are.
So it will put them in a 'var_??' with a number attached to the end. Once you learn about reverse engineering and know the language you're programming in very well, you can understand the code. It's no different when you're trying to de-obfuscate PHP, JavaScript code, etc.
If you ever get into reverse engineering malware be prepared because nothing is going to be easy. You're going to have different packers, obfuscators, messed-up code, VM detection routines, etc. So buckle down and get ready for a long road ahead if reverse engineering is your goal.

Related

Apple clang -O1 not optimizing enough?

I have this code in C:
int main(void)
{
int a = 1 + 2;
return 0;
}
When I objdump -x86-asm-syntax=intel -d a.out which is compiled with -O0 flag with GCC 9.3.0_1, I get:
0000000100000f9e _main:
100000f9e: 55 push rbp
100000f9f: 48 89 e5 mov rbp, rsp
100000fa2: c7 45 fc 03 00 00 00 mov dword ptr [rbp - 4], 3
100000fa9: b8 00 00 00 00 mov eax, 0
100000fae: 5d pop rbp
100000faf: c3 ret
and with -O1 flag:
0000000100000fc2 _main:
100000fc2: b8 00 00 00 00 mov eax, 0
100000fc7: c3 ret
which removes the unused variable a and stack managing altogether.
However, when I use Apple clang version 11.0.3 with -O0 and -O1, I get
0000000100000fa0 _main:
100000fa0: 55 push rbp
100000fa1: 48 89 e5 mov rbp, rsp
100000fa4: 31 c0 xor eax, eax
100000fa6: c7 45 fc 00 00 00 00 mov dword ptr [rbp - 4], 0
100000fad: c7 45 f8 03 00 00 00 mov dword ptr [rbp - 8], 3
100000fb4: 5d pop rbp
100000fb5: c3 ret
and
0000000100000fb0 _main:
100000fb0: 55 push rbp
100000fb1: 48 89 e5 mov rbp, rsp
100000fb4: 31 c0 xor eax, eax
100000fb6: 5d pop rbp
100000fb7: c3 ret
respectively.
I never get the stack managing part stripped off as in GCC.
Why does (Apple) Clang keep unnecessary push and pop?
This may or may not be a separate question, but with the following code:
int main(void)
{
// return 0;
}
GCC creates a same ASM with or without the return 0;.
However, Clang -O0 leaves this extra
100000fa6: c7 45 fc 00 00 00 00 mov dword ptr [rbp - 4], 0
when there is return 0;.
Why does Clang keep these (probably) redundant ASM codes?
I suspect you were trying to see the addition happen.
int main(void)
{
int a = 1 + 2;
return 0;
}
but with optimization say -O2, your dead code went away
00000000 <main>:
0: 2000 movs r0, #0
2: 4770 bx lr
The variable a is local, it never leaves the function it does not rely on anything outside of the function (globals, input variables, return values from called functions, etc). So it has no functional purpose it is dead code it doesn't do anything so an optimizer is free to remove it and did.
So I assume you went to use no or less optimization and then saw it was too verbose.
00000000 <main>:
0: cf 93 push r28
2: df 93 push r29
4: 00 d0 rcall .+0 ; 0x6 <main+0x6>
6: cd b7 in r28, 0x3d ; 61
8: de b7 in r29, 0x3e ; 62
a: 83 e0 ldi r24, 0x03 ; 3
c: 90 e0 ldi r25, 0x00 ; 0
e: 9a 83 std Y+2, r25 ; 0x02
10: 89 83 std Y+1, r24 ; 0x01
12: 80 e0 ldi r24, 0x00 ; 0
14: 90 e0 ldi r25, 0x00 ; 0
16: 0f 90 pop r0
18: 0f 90 pop r0
1a: df 91 pop r29
1c: cf 91 pop r28
1e: 08 95 ret
If you want to see addition happen instead first off don't use main() it has baggage, and the baggage varies among toolchains. So try something else
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a+b);
}
now the addition relies on external items so the compiler cannot optimize any of this away.
00000000 <_fun>:
0: 1d80 0002 mov 2(sp), r0
4: 6d80 0004 add 4(sp), r0
8: 0087 rts pc
If we want to figure out which one is a and which one is b then.
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a+(b<<1));
}
00000000 <_fun>:
0: 1d80 0004 mov 4(sp), r0
4: 0cc0 asl r0
6: 6d80 0002 add 2(sp), r0
a: 0087 rts pc
Want to see an immediate value
unsigned int fun ( unsigned int a )
{
return(a+0x321);
}
00000000 <fun>:
0: 8b 44 24 04 mov eax,DWORD PTR [esp+0x4]
4: 05 21 03 00 00 add eax,0x321
9: c3 ret
you can figure out what the compilers return address convention is, etc.
But you will hit some limits trying to get the compiler to do things for you to learn asm likewise you can easily take the code generated by these compilations
(using -save-temps or -S or disassemble and type it in (I prefer the latter)) but you can only get so far on your operating system in high level/C callable functions. Eventually you will want to do something bare-metal (on a simulator at first) to get maximum freedom and to try instructions you cant normally try or try them in a way that is hard or you don't quite understand yet how to use in the confines of an operating system in a function call. (please do not use inline assembly until down the road or never, use real assembly and ideally the assembler not the compiler to assemble it, down the road then try those things).
The one compiler was built for or defaults to using a stack frame so you need to tell the compiler to omit it. -fomit-frame-pointer. Note that one or both of these can be built to default not to have a frame pointer.
../gcc-$GCCVER/configure --target=$TARGET --prefix=$PREFIX --without-headers --with-newlib --with-gnu-as --with-gnu-ld --enable-languages='c' --enable-frame-pointer=no
(Don't assume gcc nor clang/llvm have a "standard" build as they are both customizable and the binary you downloaded has someone's opinion of the standard build)
You are using main(), this has the return 0 or not thing and it can/will carry other baggage. Depends on the compiler and settings. Using something not main gives you the freedom to pick your inputs and outputs without it warning that you didn't conform to the short list of choices for main().
For gcc -O0 is ideally no optimization although sometimes you see some. -O3 is max give me all you got. -O2 is historically where folks live if for no other reason than "I did it because everyone else is doing it". -O1 is no mans land for gnu it has some items not in -O0 but not a lot of good ones in -O2, so depends heavily on your code as to whether or not you landed in one/some of the optimizations associated with -O1. These numbered optimization things if your compiler even has a -O option is just a pre-defined list 0 means this list 1 means that list and so on.
There is no reason to expect any two compilers or the same compiler with different options to produce the same code from the same sources. If two competing compilers were able to do that most if not all of the time something very fishy is going on...Likewise no reason to expect the list of optimizations each compiler supports, what each optimization does, etc, to match much less the -O1 list to match between them and so on.
There is no reason to assume that any two compilers or versions conform to the same calling convention for the same target, it is much more common now and further for the processor vendor to create a recommended calling convention and then the competing compilers to often conform to that because why not, everyone else is doing it, or even better, whew I don't have to figure one out myself, if this one fails I can blame them.
There are a lot of implementation defined areas in C in particular, less so in C++ but still...So your expectations of what come out and comparing compilers to each other may differ for this reason as well. Just because one compiler implements some code in some way doesn't mean that is how that language works sometimes it is how that compiler author(s) interpreted the language spec or had wiggle room.
Even with full optimizations enabled, everything that compiler has to offer there is no reason to assume that a compiler can outperform a human. Its an algorithm with limits programmed by a human, it cannot outperform us. With experience it is not hard to examine the output of a compiler for sometimes simple functions but often for larger functions and find missed optimizations, or other things that could have been done "better" for some opinion of "better". And sometimes you find the compiler just left something in that you think it should have removed, and sometimes you are right.
There is education as shown above in using a compiler to start to learn assembly language, and even with decades of experience and dabbling with dozens of assembly languages/instruction sets, if there is a debugged compiler available I will very often start with disassembling simple functions to start learning that new instruction set, then look those up then start to get a feel from what I find there for how to use it.
Very often starting with this one first:
unsigned int fun ( unsigned int a )
{
return(a+5);
}
or
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a+b);
}
And going from there. Likewise when writing a disassembler or a simulator for fun to learn the instruction set I often rely on an existing assembler since it is often the documentation for a processor is lacking, the first assembler and compiler for that processor are very often done with direct access to the silicon folks and then those that follow can also use existing tools as well as documentation to figure things out.
So you are on a good path to start learning assembly language I have strong opinions on which ones to or not to start with to improve the experience and chances of success, but I have been in too many battles on Stack Overflow this week, I'll let that go. You can see that I chose an array of instruction sets in this answer. And even if you don't know them you can probably figure out what the code is doing. "standard" installs of llvm provide the ability to output assembly language for several instruction sets from the same source code. The gnu approach is you pick the target (family) when you compile the toolchain and that compiled toolchain is limited to that target/family but you can easily install several gnu toolchains on your computer at the same time be they variations on defaults/settings for the same target or different targets. A number of these are apt gettable without having to learn to build the tools, arm, avr, msp430, x86 and perhaps some others.
I cannot speak to the why does it not return zero from main when you didn't actually have any return code. See comments by others and read up on the specs for that language. (or ask that as a separate question, or see if it was already answered).
Now you said Apple clang not sure what that reference was to I know that Apple has put a lot of work into llvm in general. Or maybe you are on a mac or in an Apple supplied/suggested development environment, but check Wikipedia and others, clang had a lot of corporate help not just Apple, so not sure what the reference was there. If you are on an Apple computer then the apt gettable isn't going to make sense, but there are still lots of pre-built gnu (and llvm) based toolchains you can download and install rather than attempt to build the toolchain from sources (which isn't difficult BTW).

How do I run a function from its hex version in C?

Say I want to convert a certain function into hex
void func(char* string) {
puts(string);
}
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 83 ec 10 sub $0x10,%rsp
1141: 48 89 7d f8 mov %rdi,-0x8(%rbp)
1145: 48 8b 45 f8 mov -0x8(%rbp),%rax
1149: 48 89 c7 mov %rax,%rdi
114c: e8 df fe ff ff callq 1030 <puts#plt>
1151: 90 nop
1152: c9 leaveq
1153: c3 retq
This is what I got on x86_64: \x55\x48\x89\xe5\x48\x83\xec\x10\x48\x89\x7d\xf8\x48\x8b\x45\xf8\x48\x89\xc7\xe8\xdf\xfe\xff\xff\x90\xc9\xc3
encrypt it and use it in this program. A decryptor at the start to decrypt these instructions at run time so it can't be analyzed statically.
Converting the above function into hex and creating a function pointer for it doesn't run and ends with SIGSEGV at push %rbp.
My aim is to make this code print Hi.
int main() {
char* decrypted = decrypt(hexcode);
void (*func)(char*) = (void)(*)(char)) decrypted;
func("HI");
}
My questions are:
How do I convert a function into hex properly.
How do I then run this hex code from main as shown above?
If you want to execute a binary blob; then you need to do something like this:
void *p = mmap(0, blob_size, PROT_WRITE, MAP_ANON, NOFD, 0);
read(blob_file, p, blob_size);
mprotect(p, blob_size, PROT_EXEC);
void (*UndefinedBehaviour)(char *x) = p;
UndefinedBehaviour("HI");
The allocates some memory, copies a blob into it, changes the memory to be PROT_EXEC, then invokes the blob at its beginning. You need to add some error checking, and depending upon what sort of system you are on, it may be running malware monitors to prevent you from doing this.
Answer for 1. : It is near impossible to do it automatically, because there is no simple way for determining the length of function code - it depends to machine CPU, compiler optimizations etc. Only way is "manual" analysis of disassembled binary.
You can't for those instructions because they're not fully position-independent and self-contained.
e8 df fe ff ff is a call rel32 (with a little-endian relative displacement as the call target). It only works if that displacement reaches the puts#plt stub, and that only happens in the executable you're disassembling, where this code appears at a fixed distance from the PLT. (So the executable itself is position-independent when relocated as a whole, but taking the machine code for one function and trying to run it from some other address will break.)
In theory you could fixup the call target using a function pointer to puts in some code that included this machine code in an array, but if you're trying to make shellcode you can't depend on the "target" process helping you that way.
Instead you should use system calls directly via the syscall instruction, for example Linux syscall with RAX=1=__NR_write is write. (Not via their libc wrapper functions like write(), that would have exactly the same problem as puts).
Then you can refer to How to get c code to execute hex bytecode? for how to put machine code in a C array, make sure that's in an executable page (e.g. gcc -z execstack or mprotect or mmap), and cast that to a function pointer + call it like you're doing here.
ends with SIGSEGV at push %rbp
Yup, code-fetch from a page without EXEC permission will do that. gcc -z execstack is an easy way to fix that, or mmap like other answers suggest, at which point execution will get as far as the call -289 and fault or run bad instructions.

Why assembly code is different for simple C program with different gcc version?

I'm understanding the basics of assembly and c programming.
I compiled following simple program in C,
#include <stdio.h>
int main()
{
int a;
int b;
a = 10;
b = 88
return 0;
}
Compiled with following command,
gcc -ggdb -fno-stack-protector test.c -o test
The disassembled code for above program with gcc version 4.4.7 is:
5 push %ebp
89 e5 mov %esp,%ebp
83 ec 10 sub $0x10,%esp
c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
c7 45 fc 58 00 00 00 movl $0x58,-0x4(%ebp)
b8 00 00 00 00 mov $0x0,%eax
c9 leave
c3 ret
90 nop
However disassembled code for same program with gcc version 4.3.3 is:
8d 4c 23 04 lea 0x4(%esp), %ecx
83 e4 f0 and $0xfffffff0, %esp
55 push -0x4(%ecx)
89 e5 mov %esp,%ebp
51 push %ecx
83 ec 10 sub $0x10,%esp
c7 45 f4 0a 00 00 00 00 movl $0xa, -0xc(%ebp)
c7 45 f8 58 00 00 00 00 movl $0x58, -0x8(%ebp)
b8 00 00 00 00 mov $0x0, %eax
83 c4 10 add $0x10,%esp
59 pop %ecx
5d pop %ebp
8d 61 fc lea -0x4(%ecx),%esp
c3 ret
Why there is difference in the assembly code?
As you can see in second assembled code, Why pushing %ecx on stack?
What is significance of and $0xfffffff0, %esp?
note: OS is same
Compilers are not required to produce identical assembly code for the same source code. The C standard allows the compiler to optimize the code as they see fit as long as the observable behaviour is the same. So, different compilers may generate different assembly code.
For your code, GCC 6.2 with -O3 generates just:
xor eax, eax
ret
because your code essentially does nothing. So, it's reduced to a simple return statement.
To give you some idea, how many ways exists to create valid code for particular task, I thought this example may help.
From time to time there are size coding competitions, obviously targetting Assembly programmers, as you can't compete with compiler against hand written assembly at this level at all.
The competition tasks are fairly trivial to make the entry level and total effort reasonable, with precise input and output specifications (down to single byte or pixel perfection).
So you have almost trivial exact task, human produced code (at the moment still outperforming compilers for trivial task), with single simple rule "minimal size" as a goal.
With your logic it's absolutely clear every competitor should produce the same result.
The real world answer to this is for example:
Hugi Size Coding Competition Series - Compo29 - Random Maze Builder
12 entries, size of code (in bytes): 122, 122, 128, 135, 136, 137, 147, ... 278 (!).
And I bet the first two entries, both having 122B are probably different enough (too lazy to actually check them).
Now producing valid machine code from high level programming language and by machine (compiler) is lot more complex task. And compilers can't compete with humans in reasoning, most of the "how good code is produced by c++ compiler" stems from C++ language itself being defined quite close to machine code (easy to compile) and from brute CPU power allowing the compilers to work on thousands of variants for particular code path, searching for near-optimal solution mostly by brute force.
Still the numerical "reasoning" behind the optimizers are state of art in their own way, getting to the point where human are still unreachable, but more like in their own way, just as humans can't achieve the efficiency of compilers within reasonable effort for full-sized app compilation.
At this point reasoning about some debug code being different in few helper prologue/epilogue instructions... Even if you would find difference in optimized code, and the difference being "obvious" to human, it's still quite a feat the compiler can produce at least that, as compiler has to apply universal rules on specific code, without truly understanding the context of task.

How do I get the size of C instruction ?

How do you get the size of a C instruction? For example I want to know, if I am working on Windows XP with Intel Core2Duo processor, how much space does the instruction:
while(1==9)
{
printf("test");
}
takes?
C does not have a notion of "instruction", let alone a notion of "size of an instruction", so the question doesn't make sense in the context of "programming in C".
The fundamental element of a C program is a statement, which can be arbitrarily complex.
If you care about the details of your implementation, simply inspect the machine code that your compiler generates. That one is broken down in machine instructions, and you can usually see how many bytes are required to encode each instruction.
For example, I compile the program
#include <cstdio>
int main() { for (;;) std::printf("test"); }
with
g++ -c -o /tmp/out.o /tmp/in.cpp
and disassemble with
objdump -d /tmp/out.o -Mintel
and I get:
00000000 <main>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: 83 ec 10 sub esp,0x10
9: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
10: e8 fc ff ff ff call 11 <main+0x11>
15: eb f2 jmp 9 <main+0x9>
As you can see, the call instruction requires 5 bytes in my case, and the jump for the loop requires 2 bytes. (Also note the conspicuous absence of a ret statement.)
You can look up the instruction in your favourite x86 documentation and see that E8 is the single-byte call opcode; the subsequent four bytes are the operand.
gcc has an extension that returns the address of a label as a value of type void*.
Maybe you can use that for your intents.
/* UNTESTED --- USE AT YOUR OWN RISK */
labelbegin:
while (1 == 9) {
printf("test\n");
}
labelend:
printf("length: %d\n",
(int)((unsigned char *)&&labelend - (unsigned char *)&&labelbegin));
You can find size of a binary code block like following (x86 target, gcc compiler):
#define CODE_BEGIN(name) __asm__ __volatile__ ("jmp ."name"_op_end\n\t""."name"_op_begin:\n\t")
#define CODE_END(name) __asm__ __volatile__ ("."name"_op_end:\n\t""movl $."name"_op_begin, %0\n\t""movl $."name"_op_end, %1\n\t":"=g"(begin), "=g"(end));
void func() {
unsigned int begin, end;
CODE_BEGIN("func");
/* ... your code goes here ... */
CODE_END("func");
printf("Code length is %d\n", end - begin);
}
This technique is used by "lazy JIT" compilers that copy binary code written in high level language (C), instead of emitting assembly binary codes, like normal JIT compilers do.
It's usually a job for a tracer or a debugger that includes a tracer, there are also external tools that can inspect and profile the memory usage of your program like valgrind.
Refer to the documentation of your IDE of choice or your compiler of choice.
To add on previous answers, there are occasions on which you can determine the size of the code pieces - look into the linker-generated map file, you should be able to see there global symbols and their size, including global functions. See here for more info: What's the use of .map files the linker produces? (for VC++, but the concept is similar for evey compiler).

learning disassembly

In an attempt to understand what occurs underneath I am making small C programs and then reversing it, and trying to understand its objdump output.
The C program is:
#include <stdio.h>
int function(int a, int b, int c) {
printf("%d, %d, %d\n", a,b,c);
}
int main() {
int a;
int *ptr;
asm("nop");
function(1,2,3);
}
The objdump output for function gives me the following.
080483a4 <function>:
80483a4: 55 push ebp
80483a5: 89 e5 mov ebp,esp
80483a7: 83 ec 08 sub esp,0x8
80483aa: ff 75 10 push DWORD PTR [ebp+16]
80483ad: ff 75 0c push DWORD PTR [ebp+12]
80483b0: ff 75 08 push DWORD PTR [ebp+8]
80483b3: 68 04 85 04 08 push 0x8048504
80483b8: e8 fb fe ff ff call 80482b8 <printf#plt>
80483bd: 83 c4 10 add esp,0x10
80483c0: c9 leave
Notice that before the call to printf, three DWORD's with offsets 8,16,12(they must be the arguments to function in the reverse order) are being pushed onto the stack. Later a hex address which must be the address of the format string is being pushed.
My doubt is
Rather than pushing 3 DWORDS and the format specifier onto the stack directly, I expected to see the esp being manually decremented and the values being pushed onto the stack after that. How can one explain this behaviour?
Well, some machines have a stack pointer that is kind of like any other register, so the way you push something is, yes, with a decrement followed by a store.
But some machines, like x8632/64 have a push instruction that does a macro-op: decrementing the pointer and doing the store.
Macro-ops, btw, have a funny history. At times, certain examples on certain machines have been slower than performing the elementary operations with simple instructions.
I doubt if that's frequently the case today. Modern x86 is amazingly sophisticated. The CPU will be disassembling your opcodes themselves into micro-ops which it then stores in a cache. The micro-ops have specific pipeline and time slot requirements and the end result is that there is a RISC cpu inside the x86 these days, and the whole thing goes really fast and has good architectural-layer code density.
The stack pointer is adjusted with the push instruction. So it's copied to ebp and the parameters are pushed onto the stack so they exist in 2 places each: function's stack and printf's stack. The pushes affect esp, thus ebp is copied.
There is no mov [esp+x], [ebp+y] instruction, too many operands. It would take two instructions and use a register. Push does it in one instruction.
This is a standard cdecl calling convention for x86 machine. There are several different types of calling conventions. You can read the following article in the Wikipedia about it:
http://en.wikipedia.org/wiki/X86_calling_conventions
It explains the basic principle.
You raise an interesting point which I think has not been directly addressed so far. I suppose that you have seen assembly code which looks something like this:
sub esp, X
...
mov [ebp+Y], eax
call Z
This sort of disassembly is generated by certain compilers. What it is doing is extending the stack, then assigning the value of the new space to be eax (which has hopefully been populated with something meaningful by that point). This is actually equivalent to what the push mnemonic does. I can't answer why certain compilers generate this code instead but my guess is that at some point doing it this way was judged to be more efficient.
In your effort to learn assembly language and disassemble binaries, you might find ODA useful. It's a web-based disassembler, which is handy for disassembling lots of different architectures without having to build binutil's objdump for each one of them.
http://onlinedisassembler.com/

Resources