GCC + LD + NDISASM = huge amount of assembler instructions

I'm a newbie to C and GCC compilers and trying to study how C is compiled into machine code by disassembling binaries produced, but the result of compiling and then disassembling a very simple function seems overcomplicated.
I have basic.c file:
int my_function(){
int a = 0xbaba;
int b = 0xffaa;
return a + b;
}
Then I compile it using gcc -ffreestanding -c basic.c -o basic.o
And when I disassemble the basic.o object file, I get the expected output:
0000000000000000 <my_function>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 fc ba ba 00 00 movl $0xbaba,-0x4(%rbp)
b: c7 45 f8 aa ff 00 00 movl $0xffaa,-0x8(%rbp)
12: 8b 55 fc mov -0x4(%rbp),%edx
15: 8b 45 f8 mov -0x8(%rbp),%eax
18: 01 d0 add %edx,%eax
1a: 5d pop %rbp
1b: c3 retq
Looks great. But then I use linker to produce raw binary: ld -o basic.bin -Ttext 0x0 --oformat binary basic.o
So after disassembling this basic.bin file with command ndisasm -b 32 basic.bin > basic.dis, I get something interesting here:
00000000 55 push ebp
00000001 48 dec eax
00000002 89E5 mov ebp,esp
00000004 C745FCBABA0000 mov dword [ebp-0x4],0xbaba
0000000B C745F8AAFF0000 mov dword [ebp-0x8],0xffaa
00000012 8B55FC mov edx,[ebp-0x4]
00000015 8B45F8 mov eax,[ebp-0x8]
00000018 01D0 add eax,edx
0000001A 5D pop ebp
0000001B C3 ret
0000001C 0000 add [eax],al
0000001E 0000 add [eax],al
00000020 1400 adc al,0x0
00000022 0000 add [eax],al
00000024 0000 add [eax],al
00000026 0000 add [eax],al
00000028 017A52 add [edx+0x52],edi
0000002B 0001 add [ecx],al
0000002D 7810 js 0x3f
0000002F 011B add [ebx],ebx
00000031 0C07 or al,0x7
00000033 08900100001C or [eax+0x1c000001],dl
00000039 0000 add [eax],al
0000003B 001C00 add [eax+eax],bl
0000003E 0000 add [eax],al
00000040 C0FFFF sar bh,byte 0xff
00000043 FF1C00 call far [eax+eax]
00000046 0000 add [eax],al
00000048 00410E add [ecx+0xe],al
0000004B 108602430D06 adc [esi+0x60d4302],al
00000051 57 push edi
00000052 0C07 or al,0x7
00000054 0800 or [eax],al
00000056 0000 add [eax],al
I don't really know where the commands like SAR, JS, DEC come from and why they are required. I guess, that's because I specify invalid arguments for compiler or linker.

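As I concluded from @Michael Petch's comments:
The binary representation of the function occupies offsets 00000000-0000001B of the disassembled file and ends with ret, so the rest of the file (0000001C-00000056) is never executed: it is metadata that ndisasm decodes as if it were code, which is where SAR and JS come from. The DEC is a separate decoding artifact: ndisasm -b 32 reads 64-bit code as 32-bit, so the REX.W prefix byte 48 of mov %rsp,%rbp shows up as dec eax; ndisasm -b 64 decodes it correctly.
As per @Michael Petch's and @Jester's comments: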
I could figure out that the object file consists of many sections https://en.wikipedia.org/wiki/Object_file
The generated basic.o file originally had three sections:
.text (function itself)
.comment (not represented in the binary file)
.eh_frame
What the .eh_frame section is and why GCC creates it is described here:
Why GCC compiled C program needs .eh_frame section?
By running gcc with the -fno-asynchronous-unwind-tables argument I could get rid of the .eh_frame section in the object file.

Related

Reverse engineering C object files

I'm practicing reverse engineering C object files. Suppose I have an object file of the C program:
#include <stdio.h>
#include <string.h>
int main (int argc, char ** argv) {
char * input = argv[1];
int result = strcmp(input, "text_to_compare");
if (result == 0) {
printf("%s\n", "text matches");
}
else {
printf("%s\n", "text doeesn't match");
}
return 0;
}
How would I go about finding "text_to_compare" from the object file given it was compiled with a -g flag and an x86-64 architecture?
Running strings on a binary file will print all sequences of four or more printable characters in the file. For a simple file this might be sufficient, but for a larger file you can end up with a lot of false positives. For example, compiling your code with gcc and running strings on the resulting binary will return 295 results.
We can start by using the objdump command to disassemble the code in your sample file:
$ objdump --disassemble=main a.out
a.out: file format elf64-x86-64
Disassembly of section .init:
Disassembly of section .plt:
Disassembly of section .text:
0000000000401136 <main>:
401136: 55 push %rbp
401137: 48 89 e5 mov %rsp,%rbp
40113a: 48 83 ec 20 sub $0x20,%rsp
40113e: 89 7d ec mov %edi,-0x14(%rbp)
401141: 48 89 75 e0 mov %rsi,-0x20(%rbp)
401145: 48 8b 45 e0 mov -0x20(%rbp),%rax
401149: 48 8b 40 08 mov 0x8(%rax),%rax
40114d: 48 89 45 f8 mov %rax,-0x8(%rbp)
401151: 48 8b 45 f8 mov -0x8(%rbp),%rax
401155: be 10 20 40 00 mov $0x402010,%esi
40115a: 48 89 c7 mov %rax,%rdi
40115d: e8 de fe ff ff call 401040 <strcmp@plt>
401162: 89 45 f4 mov %eax,-0xc(%rbp)
401165: 83 7d f4 00 cmpl $0x0,-0xc(%rbp)
401169: 75 0c jne 401177 <main+0x41>
40116b: bf 20 20 40 00 mov $0x402020,%edi
401170: e8 bb fe ff ff call 401030 <puts@plt>
401175: eb 0a jmp 401181 <main+0x4b>
401177: bf 2d 20 40 00 mov $0x40202d,%edi
40117c: e8 af fe ff ff call 401030 <puts@plt>
401181: b8 00 00 00 00 mov $0x0,%eax
401186: c9 leave
401187: c3 ret
Disassembly of section .fini:
Looking at the disassembly, we can see a call to strcmp at offset 40115d:
40115d: e8 de fe ff ff call 401040 <strcmp@plt>
If we look a couple of lines before that, we can see an instruction that loads an address outside of this section (0x402010) into %esi, the second argument register:
401155: be 10 20 40 00 mov $0x402010,%esi
If we look at the output of objdump -h a.out, we see that this address falls in the .rodata section (we're looking for sections for which the given address is in the block of memory starting at the address in the VMA column):
$ objdump -h a.out
Idx Name Size VMA LMA File off Algn
[...]
15 .rodata 00000041 0000000000402000 0000000000402000 00002000 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
[...]
We can extract the data in that section using the objcopy command:
$ objcopy -j .rodata -O binary a.out >(xxd -o 0x402000)
00402000: 0100 0200 0000 0000 0000 0000 0000 0000 ................
00402010: 7465 7874 5f74 6f5f 636f 6d70 6172 6500 text_to_compare.
00402020: 7465 7874 206d 6174 6368 6573 0074 6578 text matches.tex
00402030: 7420 646f 6565 736e 2774 206d 6174 6368 t doeesn't match
00402040: 00 .
And we can see that the string at address 0x402010 is text_to_compare.

Understanding RIP relative relocations

I am trying to understand the relocations for the object file generated by this simple program:
int answer = 42;
int compute()
{
return answer;
}
int main()
{
return compute();
}
I compile it with simply gcc -c main.cpp.
Then, examining objdump -DCrw -M intel main.o, we see among other things:
Disassembly of section .text:
0000000000000000 <compute()>:
0: f3 0f 1e fa endbr64
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0] # e <compute()+0xe>
a: R_X86_64_PC32 answer-0x4
e: 5d pop rbp
f: c3 ret
(...)
Disassembly of section .data:
0000000000000000 <answer>:
0: 2a 00 sub al,BYTE PTR [rax]
...
We can also look at the relocations as reported by readelf -rW main.o:
Relocation section '.rela.text' at offset 0x250 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
(...)
Let's look at the relocation for the use of answer in compute():
To return answer from the function, we have this instruction to put the value of answer into eax:
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0]
I.e. "copy whatever is at rip + an offset we don't know yet so let's just put 0s for it for now into eax".
The relocation for that instruction is
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
The offset value 0xa means that there's a relocation to be done at 0xa, or the third byte of the mov instruction above which starts on 0x8. This is the placeholder 00 00 00 00 we saw above. The relocation type is R_X86_64_PC32, which according to the System V Application Binary Interface means "S+A-P", where:
S = the value of the symbol whose index resides in the relocation entry
A = the addend used to compute the value of the relocatable field
P = the place (section offset or address) of the storage unit being relocated (computed using r_offset).
The value of the symbol (the value at 0xa) is 0. A is the addend -4 on that line.
I am not certain what P is, is it the address of the .data section?
And where does the -4 come from?
This calculation seems to end up as "4 bytes less the address of the .data section". How does that represent the distance from RIP (0xe) to wherever answer ends up in the final executable?
If I look at the final executable though (gcc -o main main.o, objdump -DCrw -M intel main), I see our mov instruction has been relocated:
1131: 8b 05 d9 2e 00 00 mov eax,DWORD PTR [rip+0x2ed9]
The 00 00 00 00 placeholder has been replaced with d9 2e 00 00. rip is now at 0x1137, so rip+0x2ed9 is 0x4010. And sure enough, at 0x4010 we find the answer:
Disassembly of section .data:
(...)
0000000000004010 <answer>:
4010: 2a 00 sub al,BYTE PTR [rax]
So everything works out fine in the end, I just don't understand exactly how it happens.

How to define a subroutine/macro in separate .asm file?

I'm programming in AVR assembly for the ATmega32 using Atmel Studio. How can I write a subroutine or function in a separate file and then call it from main.asm?
My current issue is that I have a subroutine genArrays inside genArray.asm.
Using .include "genArray.asm" at the start of main.asm causes the program to run genArrays immediately at the start of main.asm, when I don't want it to run until I actually call it with the call instruction (example below).
main.asm:
.include "genArray.asm"
.org 0x0000
start:
...
... ---- ; do some stuff
...
...
call genArrays ---- ; call to genArrays subroutine that is defined in genArray.asm (separate file)
... ---- ; return here and continue with program
... ---- ; do some more stuff
genArray.asm:
genArrays: ---- ; start of subroutine
...
...
...
... -----; do some stuff
...
ret
With gnu assembler/binutils. I am very rusty on my avr, but...
so.s
.globl _start
_start:
nop
nop
rcall fun
nop
nop
here:
rjmp here
fun.s
.globl fun
fun:
nop
nop
ret
build
avr-as so.s -o so.o
avr-as fun.s -o fun.o
avr-ld -Ttext=0 so.o fun.o -o so.elf
avr-objdump -d so.elf
so.elf: file format elf32-avr
Disassembly of section .text:
00000000 <__ctors_end>:
0: 00 00 nop
2: 00 00 nop
4: 03 d0 rcall .+6 ; 0xc <fun>
6: 00 00 nop
...
0000000a <here>:
a: ff cf rjmp .-2 ; 0xa <here>
0000000c <fun>:
c: 00 00 nop
e: 00 00 nop
10: 08 95 ret
Assembly language is specific to the assembler, not the target, so you need to use the syntax of the assembler you are using; the above is GNU assembler for AVR. Likewise, how you link the files is very much tool specific. GNU ld has a myriad of features, and AVR is a PITA to build for (from scratch), so you may want to use an already built toolchain and linker script (on Linux I simply apt-got a toolchain).
As utterly horrible as this is:
avr-gcc -nostdlib so.s fun.s -o so.elf
avr-objdump -d so.elf
so.elf: file format elf32-avr
Disassembly of section .text:
00000000 <__ctors_end>:
0: 00 00 nop
2: 00 00 nop
4: 03 d0 rcall .+6 ; 0xc <fun>
6: 00 00 nop
...
0000000a <here>:
a: ff cf rjmp .-2 ; 0xa <here>
0000000c <fun>:
c: 00 00 nop
e: 00 00 nop
10: 08 95 ret
works (thus far).
Try the simple thing: put .include "genArray.asm" at the end of main.asm so its code will come after main in program memory.
That should be good enough for now. You could also take a look at the assembly generated by avr-gcc and see how it defines its functions.

Understanding assembly .long directive

In Secure Programming Cookbook for C and C++ by John Viega I came across the following statement:
asm("value_stored: \n"
".long 0xFFFFFFFF \n"
);
I do not really understand the use of the .long directive in assembly, but here it is used to embed a precalculated value in the executable. Can I somehow force the position of these bytes in the executable? I have tried to put it at the end of main (thinking that this way it will be at the end of the .text section), but I got a segmentation fault. Putting it outside main works.
Even at the end of main, the inline assembler sequence is emitted into the instruction stream, so the CPU will try to execute the embedded bytes as code. In my environment objdump -d foo.o shows:
00000000004004b4 <main>:
4004b4: 55 push %rbp
4004b5: 48 89 e5 mov %rsp,%rbp
00000000004004b8 <value>:
4004b8: ff (bad)
4004b9: ff (bad)
4004ba: ff (bad)
4004bb: ff (bad)
4004bc: b8 01 00 00 00 mov $0x1,%eax
4004c1: 5d pop %rbp
4004c2: c3 retq
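This can be mitigated by jumping over it (note the \n after each line, otherwise the concatenated string is not valid assembly):
asm("jmp 1f\n"
"value: .long 0xffffffff\n"
"1:");
Numeric labels such as 1: are local temporary labels; the references Nf and Nb jump forward or backward to the nearest definition of label N.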
Another option is to place the variable in a named section, which the linker script can then place last in either .text or .data.

GCC 4.3/4.4 vs MSC 6 on i386 optimization for size fail

I am not sure what I am doing wrong, but I've tried reading manuals about the calling conventions of GCC and found nothing useful there. My current problem is that GCC generates excessively LARGE code for a very simple operation, as shown below.
main.c:
#ifdef __GNUC__
// defines for GCC
typedef void (* push1)(unsigned long);
#define PUSH1(P,A0)((push1)P)((unsigned long)A0)
#else
// defines for MSC
typedef void (__stdcall * push1)(unsigned long);
#define PUSH1(P,A0)((push1)P)((unsigned long)A0)
#endif
int main() {
// pointer to nasm-linked exit syscall "function".
// will not work for win32 target, provided as an example.
PUSH1(0x08048200,0x7F);
}
Now, let's build and dump it with gcc: gcc -c main.c -Os;objdump -d main.o:
main.o: file format elf32-i386
Disassembly of section .text:
00000000 <.text>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: b8 00 82 04 08 mov $0x8048200,%eax
f: 55 push %ebp
10: 89 e5 mov %esp,%ebp
12: 51 push %ecx
13: 83 ec 10 sub $0x10,%esp
16: 6a 7f push $0x7f
18: ff d0 call *%eax
1a: 8b 4d fc mov -0x4(%ebp),%ecx
1d: 83 c4 0c add $0xc,%esp
20: c9 leave
21: 8d 61 fc lea -0x4(%ecx),%esp
24: c3 ret
That's the minimum size code I am able to get... If I don't specify -O* or specify other values, it will be 0x29 + bytes long.
Now, let's build it with ms c compiler v 6 (yea, one of year 98 iirc): wine /mnt/ssd/msc/6/cl /c /TC main.c;wine /mnt/ssd/msc/6/dumpbin /disasm main.obj:
Dump of file main.obj
File Type: COFF OBJECT
_main:
00000000: 55 push ebp
00000001: 8B EC mov ebp,esp
00000003: 6A 7F push 7Fh
00000005: B8 00 82 04 08 mov eax,8048200h
0000000A: FF D0 call eax
0000000C: 5D pop ebp
0000000D: C3 ret
How do I make GCC generate similarly sized code? Any hints or tips? Don't you agree the resulting code should be as small as that? Why does GCC append so much useless code? I thought it'd be smarter than such old stuff as MSC 6 when optimizing for size. What am I missing here?
main() is special here: gcc is doing some extra work to make the stack 16-byte aligned at the entry point of the program. So the sizes of the results aren't directly comparable... try renaming main() to f() and you'll see gcc generates drastically different code.
(The MSVC-compiled code doesn't need to care about alignment because Windows has different rules for stack alignment.)
This is the best reference I can get. I'm on Windows now and too lazy to log in to my Linux box to test. Here (MinGW GCC 4.5.2), the code is smaller than yours. One difference is the calling convention: stdcall has a few bytes' advantage over cdecl (the GCC default if nothing else is specified, at -O1 and I guess at -Os too) because the caller does not have to clean up the stack.
Here's the way I compile and the result (source code is purely copy pasted from your post)
gcc -S test.c:
_main:
pushl %ebp #
movl %esp, %ebp #,
andl $-16, %esp #,
subl $16, %esp #,
call ___main #
movl $127, (%esp) #,
movl $134513152, %eax #, tmp59
call *%eax # tmp59
leave
ret
gcc -c -o test.o test.c && objdump -d test.o:
test.o: file format pe-i386
Disassembly of section .text:
00000000 <_main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 e4 f0 and $0xfffffff0,%esp
6: 83 ec 10 sub $0x10,%esp
9: e8 00 00 00 00 call e <_main+0xe>
e: c7 04 24 7f 00 00 00 movl $0x7f,(%esp)
15: b8 00 82 04 08 mov $0x8048200,%eax
1a: ff d0 call *%eax
1c: c9 leave
1d: c3 ret
1e: 90 nop
1f: 90 nop

Resources