powerpc disassemble instruction by its raw data - disassembly

How can I disassemble some intructions from memory dump? I have only raw dump, objdump understands only object format.
My processor is PowerPC 440 (PowerPC Book E Architecture).

In fact, objdump can disassemble raw binary just fine. Try this:
objdump -m ppc -D -b binary -EB dump.bin

$ xxd -p asm
3800000060000000
Using devkitPPC:
$ powerpc-eabi-objdump --disassemble-zeroes -m powerpc -D -b binary -EB asm
Yields:
asm: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 38 00 00 00 li r0,0
4: 60 00 00 00 nop

Related

What would happen if a system executes a part of the file that is zero-padded?

I've seen in some posts/videos/files that they are zero-padded to look bigger than they are, or match "same file size" criteria some file system utilities have for moving files, mostly they are either prank programs, or malware.
But I often wondered, what would happen if the file corrupted, and would "load" the next set of "instructions" that are in the big zero-padded space at the end of the file?
Would anything happen? What's the instruction set for 0x0?
The decoding of 0 bytes completely depends on the CPU architecture. On many architectures, instruction are fixed length (for example 32-bit), so the relevant thing would be 00 00 00 00 (using hexdump notation).
On most Linux distros, clang/llvm comes with support for multiple target architectures built-in (clang -target and llvm-objdump), unlike gcc / gas / binutils, so I was able to use that to check for some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the supported list. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)
On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination. The first byte is the opcode, the 2nd byte is the ModR/M that specifies the operands.
This usually segfaults right away (if eax/rax isn't a valid pointer), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call), although in those cases the following bytes aren't always all zero. e.g. data, or ELF metadata.)
x86 64-bit mode: ndisasm -b64 /dev/zero | head:
address machine code disassembly
00000000 0000 add [rax],al
x86 32-bit mode (-b32):
00000000 0000 add [eax],al
x86 16-bit mode: (-b16):
00000000 0000 add [bx+si],al
AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of all-zero and shows ...)
addr machine code disassembly
0: 00000000 andeq r0, r0, r0
ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero
0: 0000 movs r0, r0
2: 0000 movs r0, r0
AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero
0: 00000000 .inst 0x00000000 ; undefined
MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o
zero.o: file format ELF32-mips
Disassembly of section .text:
0: 00 00 00 00 nop
PowerPC 32 and 64-bit: -target powerpc and -target powerpc64. IDK if any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or if it's still an illegal instruction on modern IBM POWER chips.
zero.o: file format ELF32-ppc (or ELF64-ppc64)
Disassembly of section .text:
0: 00 00 00 00 <unknown>
IBM S390: clang -c -target systemz zero.S
zero.o: file format ELF64-s390
Disassembly of section .text:
0: 00 00 <unknown>
2: 00 00 <unknown>

How can a linker determine the address of certain data in the .rodata section?

So the test platform is on Linux 32 bit.
I use gcc to generate a obj file of quickSort in this way:
gcc -S quickSort.c
and the generated quickSort.o is a relocatable ELF:
#file quickSort.o
quickSort.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
I then use objdump to disassemble it :
objdump -d quickSort.o
and looking into the asm file generated, I am confused with this:
51: b8 00 00 00 00 mov $0x0,%eax
56: 89 04 24 mov %eax,(%esp)
59: e8 fc ff ff ff call 5a <main+0x5a>
5e: c7 44 24 3c 00 00 00 movl $0x0,0x3c(%esp)
The above code is call printf function and print out a string, and if I compile quicksort.c into quicksort.s, it should like this:
movl $.LC0, %eax
movl %eax, (%esp)
call printf
So by looking at the relocation table, I can easily find out the relation between "5a" and printf, and I am sure linker can use this way to relocate printf and substitute "fc ff ff ff" into the real address of printf,
But I am confused with how the address of .LC0 (which is a string in the .rodata section) be relocated? I cannot find any clue in the relation table (I got the relocation table using readelf -r quickSort.o)
Could anyone give me some help about how the linker will find the real memory address of some data in the .rodata section?
It's done in the same way. You should be seeing a relocation entry against .rodata, where .rodata means the start address of the part of .rodata that's in the current object file.
Note that objdump -dr might be a better tool for the job.

Why isn't a.out in machine language?

I compile the following program with gcc and receive an output executable file a.out.:
#include <stdio.h>
int main () {
printf("hello, world\n");
}
When I execute cat a.out, why is the file in "gibberish" (what is this called?) and not machine language of 0s and 1s:
??????? H__PAGEZERO(__TEXT__text__TEXT?`??__stubs__TEXT
P__unwind_info__TEXT]P]__eh_frame__TEXT?H??__DATA__program_vars [continued]
The file is in 0 and 1, but when you open it with text editor those bits are grouped in bytes and then treated as text ;) In Linux you could try to disassemble the output file to ensure that it contains machine instructions (x86 architecture):
objdump -D -mi386 a.out
Example output:
1: 83 ec 08 sub $0x8,%esp
4: be 01 00 00 00 mov $0x1,%esi
9: bf 00 00 00 00 mov $0x0,%edi
The second column contains that 0's and 1's in hexadecimal notation and the third column contains mnemonic assembler instructions.
If you want to display those 0's and 1's simply type:
xxd -b a.out
Example output:
0000000: 01111111 01000101 01001100 01000110 00000010 00000001 .ELF..
0000006: 00000001 00000000 00000000 00000000 00000000 00000000 ......
It's in some kind of executable file format. On Linux, it's probably ELF, on Mac OS X it's probably Mach-O, and so on. There's even an a.out format, but it's not that common anymore.
It can't just be bare machine instructions - the operating system needs some information about how to load it, what dynamic libraries to attach to it, etc.
Characters are also made of 0's and 1's, and the computer has no way of knowing the difference. You asked it to show the file and it did.
In addition to the machine instructions, the binary file also contains layout and optional debug information which can be readable strings.
The a.out is in a format the loader of the OS you are using can understand. Those different texts you see are markers for different parts of the 0s and 1s you expect.
The ? and ` show spots where there are binary unprintable data.
The typical format on Linux systems these days is ELF. The ELF file may contain machine code, which you can examine with the objdump utility.
$ gcc main.c
$ objdump -d -j .text a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
(code omitted for brevity)
00000000004005ac :
4005ac: 55 push %rbp
4005ad: 48 89 e5 mov %rsp,%rbp
4005b0: bf 6c 06 40 00 mov $0x40066c,%edi
4005b5: e8 d6 fe ff ff callq 400490
4005ba: 5d pop %rbp
4005bb: c3 retq
4005bc: 0f 1f 40 00 nopl 0x0(%rax)
See? Machine code. The objdump utility helpfully prints it in hexadecimal with the corresponding disassempled code on the right, and the addresses on the left.

What are these extra bytes in my binary file?

I am in the process of writing a small operating system in C. I have written a bootloader and I'm now trying to get a simple C file (the "kernel") to compile with gcc:
int main(void) { return 0; }
I compile the file with the following command:
gcc kernel.c -o kernel.o -nostdlib -nostartfiles
I use the linker to create the final image using this command:
ld kernel.o -o kernel.bin -T linker.ld --oformat=binary
The contents of the linker.ld file are as follows:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
}
(The bootloader loads the image at address 0x7e00.)
This seems to work quite well - ld produces a 128-byte file containing the following instructions in the first 11 bytes:
00000000 55 push ebp
00000001 48 dec eax
00000002 89 E5 mov ebp, esp
00000004 B8 00 00 00 00 mov eax, 0x00000000
00000009 5D pop ebp
0000000A C3 ret
However, I can't figure out what the other 117 bytes are for. Disassembling them seems to produce a bunch of garbage that doesn't make any sense. The existence of the additional bytes has me wondering if I'm doing something wrong.
Should I be concerned?
These are additional sections, which were not stripped and not discarded. You want your linker.ld file to look like this:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
/DISCARD/ :
{
*(.comment)
*(.eh_frame_hdr)
*(.eh_frame)
}
}
I know what sections to discard from the output of objdump -t kernel.o.
Simple, you're using gcc, and it always put its initialization code before passing control to your main.
What's on that start up code I don't know, but they are there. As you may see there's also an comment 'GNU' on your binary, you can't print specific sectors by using objdump -s -j 'section name'.

How can I get the _GLOBAL_OFFSET_TABLE_ address in my program?

I want to get the address of _GLOBAL_OFFSET_TABLE_ in my program. One way is to use the nm command in Linux, maybe redirect the output to a file and parse that file to get address of _GLOBAL_OFFSET_TABLE_. However, that method seems to be quite inefficient. What are some more efficient methods of doing it?
This appears to work:
// test.c
#include <stdio.h>
extern void *_GLOBAL_OFFSET_TABLE_;
int main()
{
printf("_GLOBAL_OFFSET_TABLE = %p\n", &_GLOBAL_OFFSET_TABLE_);
return 0;
}
In order to get consistent address of _GLOBAL_OFFSET_TABLE_, matching nm's result, you will need to compile your code with -fPIE to do code-gen as if linking into a position-independent executable. (Otherwise you get a small integer like 0x2ed6 with -fno-pie -no-pie). The GCC default for most modern Linux distros is -fPIE -pie, which would make nm addresses be just offsets relative to an image base, and the runtime address be ASLRed. (This is normally good for security, but you may not want it.)
$: gcc -fPIE -no-pie test.c -o test
It gives:
$ ./test
_GLOBAL_OFFSET_TABLE = 0x6006d0
However, nm thinks different:
$ nm test | fgrep GLOBAL
0000000000600868 d _GLOBAL_OFFSET_TABLE_
Or with a GCC too old to know about PIEs at all, let alone have it -fPIE -pie as the default, -fpic can work.
If you use assembly language, you can get _GLOBAL_OFFSET_TABLE_ address without get_pc_thunk.
It is tricky way. :)
Here is the sample code :
$ cat test.s
.global main
main:
lea HEREIS, %eax # Now %eax holds address of _GLOBAL_OFFSET_TABLE_
.section .got
HEREIS:
$ gcc -o test test.s
This is available because .got section is adjacent to the <.got.plt>
Therefore the symbol HEREIS and _GLOBAL_OFFSET_TABLE_ locate at same address.
PS. You can check it works with objdump.
Disassembly of section .got:
080495e8 <HEREIS-0x4>:
80495e8: 00 00 add %al,(%eax)
...
Disassembly of section .got.plt:
080495ec <_GLOBAL_OFFSET_TABLE_>:
80495ec: 00 95 04 08 00 00 add %dl,0x804(%ebp)
80495f2: 00 00 add %al,(%eax)
80495f4: 00 00 add %al,(%eax)

Resources