defining code offset for objdump disassembler in ARMv8 - disassembly

I have a file containing ARM v8 binary code. I want to disassemble the file and get the actual assembly code contained in it.
Assuming the file name is tmp.o I run:
/opt/linaro/A64-tools/bin/aarch64-linux-gnu-objdump -b binary -m AARCH64 -D tmp.o
This gives me the correct disassembly. However, the targets shown for branch instructions assume that the code sits at address 0x00000000.
Suppose I know that the code will sit at address 0x12345678 in memory.
Is there a way to tell objdump to use this address as the start address?
If not, can I add some header to the binary file that says something like:
. = 0x12345678
Thanks in Advance..

A quick poke around reveals objdump's --adjust-vma option, which seems to do exactly this.
Using the first raw binary which came to hand:
$ aarch64-linux-gnu-objdump -b binary -m aarch64 -D arch/arm64/boot/Image
arch/arm64/boot/Image: file format binary
Disassembly of section .data:
0000000000000000 <.data>:
0: 91005a4d add x13, x18, #0x16
4: 140003ff b 0x1000
...
vs.
$ aarch64-linux-gnu-objdump -b binary -m aarch64 --adjust-vma=0x12345678 -D arch/arm64/boot/Image
arch/arm64/boot/Image: file format binary
Disassembly of section .data:
0000000012345678 <.data>:
12345678: 91005a4d add x13, x18, #0x16
1234567c: 140003ff b 0x12346678
...
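To see why the branch target moves with the base address, here is a minimal Python sketch that decodes the B instruction from the listing above (0x140003ff) by hand. The function name decode_b_target is made up for illustration; the only assumption is the standard A64 encoding of an unconditional B (opcode 0b000101 in bits [31:26], followed by a signed 26-bit word offset).

```python
def decode_b_target(insn: int, pc: int) -> int:
    """Return the target of an AArch64 unconditional B located at address pc."""
    # Bits [31:26] must be 0b000101 for an unconditional B.
    if (insn >> 26) != 0b000101:
        raise ValueError("not an unconditional B instruction")
    imm26 = insn & 0x03FFFFFF
    # Sign-extend the 26-bit immediate.
    if imm26 & (1 << 25):
        imm26 -= 1 << 26
    # The offset is counted in 4-byte words and is relative to the B itself.
    return (pc + imm26 * 4) & 0xFFFFFFFFFFFFFFFF


INSN = 0x140003FF
print(hex(decode_b_target(INSN, 0x4)))         # 0x1000, as in the first listing
print(hex(decode_b_target(INSN, 0x1234567C)))  # 0x12346678, as with --adjust-vma
```

The encoded offset (0xffc bytes) is the same in both cases; only the address of the instruction differs, which is exactly what --adjust-vma changes.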

Related

Aarch64 baremetal Hello World program on QEMU

I have compiled and run a simple hello world program on the ARM Foundation Platform. The code is given below.
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("Hello world!\n");
return 0;
}
This program is compiled and linked using ARM GNU tool chain as given below.
$aarch64-none-elf-gcc -c -march=armv8-a -g hello.c hello.o
aarch64-none-elf-gcc: warning: hello.o: linker input file unused because linking not done
$aarch64-none-elf-gcc -specs=aem-ve.specs -Wl,-Map=linkmap.txt hello.o -o hello.axf
I could successfully execute this program on the ARM Foundation Platform (I think the Foundation Platform is similar to the ARM Fixed Virtual Platform), and it prints "Hello world!".
The contents of 'aem-ve.specs' file is given below:
cat ./aarch64-none-elf/lib/aem-ve.specs
# aem-ve.specs
#
# Spec file for AArch64 baremetal newlib, libgloss on VE platform with version 2
# of AngelAPI semi-hosting.
#
# This Spec file is also appropriate for the foundation model.
%rename link old_link
*link:
-Ttext-segment 0x80000000 %(old_link)
%rename lib libc
*libgloss:
-lrdimon
*lib:
cpu-init/rdimon-aem-el3.o%s --start-group %(libc) %(libgloss) --end-group
*startfile:
crti%O%s crtbegin%O%s %{!pg:rdimon-crt0%O%s} %{pg:rdimon-crt0%O%s}
Is it possible to execute the same binary on QEMU? If so could you please share the sample command. If not, could you please share the right and best way to execute it on QEMU?
I have traced instruction execution of Foundation platform using the "trace" option and analysed using 'tarmac-profile' tool and 'Tarmac-calltree' tool. The outputs are given below:
$ ./tarmac-profile hello.trace --image hello.axf
Address Count Time Function name
0x80000000 1 120001
0x80001030 1 30001 register_fini
0x80001050 1 60001 deregister_tm_clones
0x800010c0 1 210001 __do_global_dtors_aux
0x80001148 1 23500001 frame_dummy
0x800012c0 1 10480001 main
0x80002000 1 40001 main
0x80003784 1 360001 main
0x80003818 1 460001 _cpu_init_hook
0x80003870 1 590001 _write_r
0x800038d0 1 1090001 __call_exitprocs
...
...
./tarmac-calltree hello.trace --image hello.axf
o t:10000 l:2288 pc:0x80001148 - t:23510000 l:7819 pc:0x80008060 : frame_dummy
- t:240000 l:2338 pc:0x800011a8 - t:720000 l:2443 pc:0x800011ac
o t:250000 l:2340 pc:0x80003818 - t:710000 l:2442 pc:0x80003828 : _cpu_init_hook
- t:260000 l:2343 pc:0x8000381c - t:320000 l:2354 pc:0x80003820
o t:270000 l:2345 pc:0x80002000 - t:310000 l:2353 pc:0x80002010 : main
- t:320000 l:2354 pc:0x80003820 - t:700000 l:2436 pc:0x80003824
o t:330000 l:2356 pc:0x80003784 - t:690000 l:2435 pc:0x80003814 : main
- t:760000 l:2453 pc:0x800011bc - t:2010000 l:2970 pc:0x800011c0
o t:770000 l:2455 pc:0x80004200 - t:2000000 l:2969 pc:0x800042c0 : memset
- t:2010000 l:2970 pc:0x800011c0 - t:4870000 l:3587 pc:0x800011c4
o t:2020000 l:2972 pc:0x80007970 - t:4860000 l:3586 pc:0x80007b04 : initialise_monitor_handles
- t:2960000 l:3165 pc:0x80007b24 - t:4340000 l:3465 pc:0x80007b28
I have tried the following method to execute on QEMU, without any success.
I have started the QEMU with gdb based debugging through TCP Port
$ qemu-system-aarch64 -semihosting -m 128M -nographic -monitor none -serial stdio -machine virt,gic-version=2,secure=on,virtualization=on -cpu cortex-a53 -kernel hello.axf -S -gdb tcp::9000
The result of debug session is given below:
(gdb) target remote localhost:9000
Remote debugging using localhost:9000
_start () at /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/aarch64/crt0.S:90
90 /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory.
(gdb) si
<The system hangs here>
I have tried to disassemble the code using gdb and its output is given below. It looks like the code is not loaded correctly
(gdb) disas frame_dummy
Dump of assembler code for function frame_dummy:
0x0000000080001110 <+0>: udf #0
0x0000000080001114 <+4>: udf #0
0x0000000080001118 <+8>: udf #0
0x000000008000111c <+12>: udf #0
0x0000000080001120 <+16>: udf #0
0x0000000080001124 <+20>: udf #0
0x0000000080001128 <+24>: udf #0
0x000000008000112c <+28>: udf #0
0x0000000080001130 <+32>: udf #0
0x0000000080001134 <+36>: udf #0
0x0000000080001138 <+40>: udf #0
0x000000008000113c <+44>: udf #0
0x0000000080001140 <+48>: udf #0
0x0000000080001144 <+52>: udf #0
End of assembler dump.
Could you please shed some light on this. Any hint given is appreciated very much.
The Foundation Model and the QEMU 'virt' board are not the same machine type. They have different devices at different physical addresses, and in particular they do not have RAM at the same address. To run bare metal code on the 'virt' machine type you will need to adjust your code. This is normal for bare metal -- the whole point is that you are running directly on some piece of (emulated) hardware and need to match the details of it.
Specifically, the minimum change you need to make here is that the RAM on the 'virt' board starts at 0x4000_0000, not the 0x8000_0000 that the Foundation Model uses. There may be others as well, but that is the immediate cause of what you're seeing.
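As a rough illustration of the mismatch, the following Python sketch checks whether a link address falls inside the RAM window of the 'virt' board. It assumes only what is stated above (RAM starts at 0x4000_0000) plus the -m 128M option from the question; the helper in_ram is hypothetical, and other regions of the board's memory map are deliberately not modelled.

```python
VIRT_RAM_BASE = 0x40000000  # RAM base of the 'virt' board, as stated above


def in_ram(addr: int, ram_size: int, ram_base: int = VIRT_RAM_BASE) -> bool:
    """Return True if addr lies inside [ram_base, ram_base + ram_size)."""
    return ram_base <= addr < ram_base + ram_size


ram_size = 128 * 1024 * 1024  # matches "-m 128M" in the question

print(in_ram(0x80000000, ram_size))  # False: the address from aem-ve.specs
print(in_ram(0x40000000, ram_size))  # True: start of RAM on 'virt'
```

With 128 MiB of RAM the window ends at 0x48000000, so the -Ttext-segment 0x80000000 from aem-ve.specs places the image outside RAM, which would be consistent with gdb showing udf #0 at those addresses.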

GDB shows 'no line number information' for the address pointed by the instruction pointer

My program is crashing, and this is the segfault message:
python3[26871]: segfault at 4cac ip 00007fe49938248a sp 00007fe498a64820 error 4 in libswsscommon.so.0.0.0[7fe499359000+7a000]
The offset into the library should be (IP - load address of the shared library): 0x7fe49938248a - 0x7fe499359000 = 0x2948a.
When I try to get the info, gdb says 'no line number info'
Reading symbols from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0.0.0...Reading symbols from /usr/lib/debug/.build-id/94/d57d3ce6dd6901ddf7f7d8985a8334c7622fc6.debug...done.
done.
(gdb) info line *0x2948A
No line number information available for address 0x2948a
How is this possible? Does it have anything to do with the start address, which is greater than 0x2948a, as shown by objdump -f?
objdump -f /usr/lib/x86_64-linux-gnu/libswsscommon.so.0.0.0
/usr/lib/x86_64-linux-gnu/libswsscommon.so.0.0.0: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000034d60
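The offset arithmetic in this question can be double-checked with a small Python sketch that parses the kernel segfault line. The regular expression and the helper module_offset are illustrative, not part of any tool, and the sketch assumes that the bracketed base in the message is the address the library was loaded at, which may not hold if the text segment is mapped separately.

```python
import re

# Matches the "segfault at ... ip ... sp ... error ... in obj[base+size]" message.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def module_offset(line: str):
    """Return (object name, ip - mapping base) for a kernel segfault line."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    if not base <= ip < base + size:
        raise ValueError("ip is outside the reported mapping")
    return m.group("obj"), ip - base


LINE = ("python3[26871]: segfault at 4cac ip 00007fe49938248a sp 00007fe498a64820 "
        "error 4 in libswsscommon.so.0.0.0[7fe499359000+7a000]")
name, offset = module_offset(LINE)
print(name, hex(offset))  # libswsscommon.so.0.0.0 0x2948a
```

This confirms the 0x2948a figure; it says nothing about why gdb finds no line information for that address.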

Translate Instruction Pointer Address (in shared library) to Source Instruction

Are there any tools or libraries one can use on Linux to get the original (source) instruction only from the PID and the current instruction pointer address, even if the IP currently points into a shared library?
AFAIK it should be possible, since the location of the library mapping is available through /proc/[PID]/maps, though I haven't found any applications or examples doing so.
Any suggestions?
EDIT: an assembly instruction or the nearest symbol suffice (source code line is not necessarily needed)
I found a way to do this with GDB:
Interactive:
$ gdb --pid 1566
(gdb) info symbol 0x7fe28b8a2b79
pselect + 89 in section .text of /lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol 0x5612550f14a4
copy_word_list + 20 in section .text of /usr/bin/bash
(gdb) info symbol 0x7fe28b878947
execve + 7 in section .text of /lib/x86_64-linux-gnu/libc.so.6
Shows exactly what I wanted!
It can also be scripted:
gdb -q --pid PID --batch -ex 'info symbol HEX_SYMBOL_ADDR'
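For the /proc/[PID]/maps route mentioned in the question, a minimal Python sketch of the lookup could look like this. The helper find_mapping is hypothetical and the sample maps text is invented for illustration (only the bash address is taken from the gdb session above); the sketch returns the backing file and the offset into that file, which would still need a symbol lookup to get a name.

```python
def find_mapping(maps_text: str, addr: int):
    """Return (path, file offset) of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            pgoff = int(fields[2], 16)
            path = fields[5].strip() if len(fields) > 5 else ""
            return path, addr - start + pgoff
    return None


# Invented sample in the format of /proc/[PID]/maps.
SAMPLE_MAPS = """\
5612550c3000-5612550f0000 r--p 00000000 08:01 1234 /usr/bin/bash
5612550f0000-5612551df000 r-xp 0002d000 08:01 1234 /usr/bin/bash
7fe28b800000-7fe28b828000 r--p 00000000 08:01 5678 /lib/x86_64-linux-gnu/libc.so.6
"""

path, offset = find_mapping(SAMPLE_MAPS, 0x5612550F14A4)
print(path, hex(offset))  # /usr/bin/bash 0x2e4a4
```

For a live process the text would come from reading /proc/PID/maps instead of the sample string.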

Debugging the Code

Hello guys, I am starting out on the voyage of debugging code. Following the book, I ran the commands below to analyse this source code:
// hello_world-1.c
#include <stdio.h>
int main(void)
{
printf("hello world\n");
return 0;
}
$ gcc -Wall -Wextra -c hello_world-1.c // What is wall and wextra here ?
$ size hello_world-1 hello_world-1.o
text data bss dec hex filename
916 256 4 1176 498 hello_world-1
48 0 0 48 30 hello_world-1.o
$ objdump -h hello_world-1.o
hello_world-1.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000023 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 00000000 00000000 00000058 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 00000058 2**2
ALLOC
3 .rodata 0000000d 00000000 00000000 00000058 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .note.GNU-stack 00000000 00000000 00000000 00000065 2**0
CONTENTS, READONLY
5 .comment 0000001b 00000000 00000000 00000065 2**0
CONTENTS, READONLY
I have some questions here:
1) There are no global variables in hello_world-1.c. Why, then, does size report that the data and bss segments have zero length for the object file but non-zero length for the executable?
2) Why do size and objdump report different sizes for the text segment?
The object file consists of .text (the binary CPU instructions), .rodata (read-only data: the string "hello world\n" plus its terminating NUL, 13 bytes in total) and .comment (a compiler identification string).
Executable file consists of the same minus .comment plus standard library stuff plus import dynamic library data, if any.
Standard library adds at least startup code, which makes executable bigger. So your difference is: executable .text = .object text + startup code + stdlibrary code (if static linking).
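The 48 versus 0x23 discrepancy for the object file can be reproduced with a short Python sketch. It assumes the classification that, to my understanding, GNU size uses in Berkeley format: only ALLOC sections are counted, code and read-only sections go to "text", other sections with contents go to "data", and the rest go to "bss". The section list is copied from the objdump -h output in the question; berkeley_totals is a made-up helper name.

```python
# (name, size, flags) taken from the objdump -h output in the question.
SECTIONS = [
    (".text", 0x23, {"CONTENTS", "ALLOC", "LOAD", "RELOC", "READONLY", "CODE"}),
    (".data", 0x00, {"CONTENTS", "ALLOC", "LOAD", "DATA"}),
    (".bss", 0x00, {"ALLOC"}),
    (".rodata", 0x0D, {"CONTENTS", "ALLOC", "LOAD", "READONLY", "DATA"}),
    (".note.GNU-stack", 0x00, {"CONTENTS", "READONLY"}),
    (".comment", 0x1B, {"CONTENTS", "READONLY"}),
]


def berkeley_totals(sections):
    """Sum section sizes into (text, data, bss) under the assumed rule."""
    text = data = bss = 0
    for _name, size, flags in sections:
        if "ALLOC" not in flags:
            continue  # sections that are not loaded are not counted
        if "CODE" in flags or "READONLY" in flags:
            text += size
        elif "CONTENTS" in flags:
            data += size
        else:
            bss += size
    return text, data, bss


print(berkeley_totals(SECTIONS))  # (48, 0, 0)
```

Under that rule the "text" column is .text (0x23 = 35 bytes) plus .rodata (0xd = 13 bytes), which gives the 48 reported by size, while objdump -h lists each section separately.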
Regarding your question:
"gcc -Wall -Wextra -c hello_world-1.c // What is wall and wextra here ?"
(note that capitalization matters)
-Wall tells the compiler to enable most warnings
-Wextra tells the compiler to enable even more warnings
-c tells the compiler to only compile, not link.
Because no '-o objfilename.o' parameter was included,
the compiler will output an object file with the same name as the input file, with the extension replaced by '.o'.
I suggest always including the '-o objfilename.o' parameter explicitly.
I also suggest searching online for such things; you would have found pages similar to:
https://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcc_3.html
here is a copy of the man page for 'size'
SIZE(1) GNU Development Tools SIZE(1)
NAME
size - list section sizes and total size.
SYNOPSIS
size [-A|-B|--format=compatibility]
[--help]
[-d|-o|-x|--radix=number]
[--common]
[-t|--totals]
[--target=bfdname] [-V|--version]
[objfile...]
DESCRIPTION
The GNU size utility lists the section sizes---and the total size---for
each of the object or archive files objfile in its argument list. By
default, one line of output is generated for each object file or each
module in an archive.
objfile... are the object files to be examined. If none are specified,
the file "a.out" will be used.
OPTIONS
The command line options have the following meanings:
-A
-B
--format=compatibility
Using one of these options, you can choose whether the output from
GNU size resembles output from System V size (using -A, or
--format=sysv), or Berkeley size (using -B, or --format=berkeley).
The default is the one-line format similar to Berkeley's.
Here is an example of the Berkeley (default) format of output from
size:
$ size --format=Berkeley ranlib size
text data bss dec hex filename
294880 81920 11592 388392 5ed28 ranlib
294880 81920 11888 388688 5ee50 size
This is the same data, but displayed closer to System V
conventions:
$ size --format=SysV ranlib size
ranlib :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11592 385024
Total 388392
size :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11888 385024
Total 388688
--help
Show a summary of acceptable arguments and options.
-d
-o
-x
--radix=number
Using one of these options, you can control whether the size of
each section is given in decimal (-d, or --radix=10); octal (-o, or
--radix=8); or hexadecimal (-x, or --radix=16). In --radix=number,
only the three values (8, 10, 16) are supported. The total size is
always given in two radices; decimal and hexadecimal for -d or -x
output, or octal and hexadecimal if you're using -o.
--common
Print total size of common symbols in each file. When using
Berkeley format these are included in the bss size.
-t
--totals
Show totals of all objects listed (Berkeley format listing mode
only).
--target=bfdname
Specify that the object-code format for objfile is bfdname. This
option may not be necessary; size can automatically recognize many
formats.
-V
--version
Display the version number of size.
@file
Read command-line options from file. The options read are inserted
in place of the original @file option. If file does not exist, or
cannot be read, then the option will be treated literally, and not
removed.
Options in file are separated by whitespace. A whitespace
character may be included in an option by surrounding the entire
option in either single or double quotes. Any character (including
a backslash) may be included by prefixing the character to be
included with a backslash. The file may itself contain additional
@file options; any such options will be processed recursively.
SEE ALSO
ar(1), objdump(1), readelf(1), and the Info entries for binutils.
COPYRIGHT
Copyright (c) 1991-2013 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License".
binutils-2.23.91 2013-11-18 SIZE(1)

ARM Instruction Decoding

I need to decode ARM (ARM926EJ) instructions in C. I have the 32-bit instruction in hex, and I want to decode it to get the opcode and operands. Does anyone know of any good material for this?
N.B. I looked into QEMU's translate.c file, but it is very complex and doesn't explain why it does what it does.
Assuming you don't want/can't use a program to do it for you, you can refer to the ARM Reference Manual.
There are sections in it that are dedicated to instruction encoding.
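As a taste of what those encoding sections describe, here is a minimal Python sketch that splits a classic 32-bit ARM data-processing instruction into its fields. It is a simplified illustration, not a full decoder: it only checks that bits [27:26] are 00 and does not separate out the multiply, miscellaneous or extra load/store encodings that share that space. The function names are made up; the shifts and masks carry over directly to C.

```python
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_data_processing(insn: int) -> dict:
    """Extract the fields of an ARM data-processing instruction (simplified)."""
    if (insn >> 26) & 0x3 != 0:
        raise ValueError("bits [27:26] are not 00: not data-processing")
    fields = {
        "cond": (insn >> 28) & 0xF,             # condition code, 0xE = always
        "op": OPCODES[(insn >> 21) & 0xF],      # 4-bit opcode
        "s": (insn >> 20) & 0x1,                # set-flags bit
        "rn": (insn >> 16) & 0xF,               # first operand register
        "rd": (insn >> 12) & 0xF,               # destination register
    }
    if (insn >> 25) & 0x1:
        # Immediate operand: 8-bit value rotated right by 2 * rotate field.
        fields["imm"] = ror32(insn & 0xFF, 2 * ((insn >> 8) & 0xF))
    else:
        # Register operand: Rm in bits [3:0]; shift fields are left undecoded.
        fields["rm"] = insn & 0xF
    return fields


print(decode_data_processing(0xE3A00001))  # MOV r0, #1
print(decode_data_processing(0xE0810002))  # ADD r0, r1, r2
```

Other instruction classes (loads and stores, branches, coprocessor instructions) follow the same pattern with different field layouts, as listed in the manual.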
I use a script which combines gas and objdump to do this for me. I'm sure there are better ways but this works well for me.
#!/bin/sh
cat > /tmp/foo.S <<EOF
.text
.arm
.word $1
EOF
arm-linux-gnueabi-as /tmp/foo.S -o /tmp/foo.o
echo "ARM: " `arm-linux-gnueabi-objdump -d /tmp/foo.o | grep " 0:"`
echo "Thumb:" `arm-linux-gnueabi-objdump --disassembler-options=force-thumb -d /tmp/foo.o | grep " 0:"`
rm -rf /tmp/foo.o /tmp/foo.S

Resources