Can objdump use bss variable names in text section? - disassembly

I am using objdump to generate the disassembly of C code and am wondering whether there is a way to have the names of statically allocated variables (.bss section) shown in the .text section disassembly, rather than their hex addresses.
For example,
int main(void)
{
while (1)
{
request_ID = Receive(0, buffer, MSG_SIZE, 0);
}
return 0;
}
This is compiled, and then using objdump -D file.o I get the disassembly, including the .bss section:
400: 55 push ebp
401: 89 e5 mov ebp,esp
403: 83 ec 18 sub esp,0x18
...
43d: a1 cc 24 00 00 mov eax,ds:0x24cc
...
Disassembly of section .bss:
000024cc <request_pID>:
What I would like is for the hex addresses of variables to be replaced by their name:
43d: a1 cc 24 00 00 mov eax, <request_pID>
I could write a sed script or something similar to achieve this, but was wondering if there was a simpler option.
Even better would be for both the address and the variable name to be printed to aid in debugging.
43d: a1 cc 24 00 00 mov eax, ds:0x24cc <request_pID>
The code is for an operating system development being tested in Bochs, so if there is some other way of loading symbols into Bochs' debugger that would be a good workaround, although I would still like the objdump output to be created as well.
thanks, Paul

Related

Understanding assembly .long directive

In the Secure Programming Cookbook for C and C++ by John Viega I came across the following statement:
asm("value_stored: \n"
".long 0xFFFFFFFF \n"
);
I do not really understand the use of the .long directive in assembly, but here it is used to embed a precalculated value in the executable. Can I somehow force the position of these bytes in the executable? I tried putting it at the end of main (thinking it would then land at the end of the .text section), but I got a segmentation fault. Putting it outside main works.
Even at the end of main, the bytes emitted by the inline assembler sequence sit in the instruction stream and will be executed as code. In my environment objdump -d foo.o shows:
00000000004004b4 <main>:
4004b4: 55 push %rbp
4004b5: 48 89 e5 mov %rsp,%rbp
00000000004004b8 <value>:
4004b8: ff (bad)
4004b9: ff (bad)
4004ba: ff (bad)
4004bb: ff (bad)
4004bc: b8 01 00 00 00 mov $0x1,%eax
4004c1: 5d pop %rbp
4004c2: c3 retq
This can be mitigated by jumping over it:
asm("jmp 1f\n"
"value: .long 0xffffffff\n"
"1:\n");
A reference of the form Nf or Nb targets the nearest local temporary label N:, searching forward or backward respectively.
Another option is to place the variable in a named section, which the linker script can then position as the last section in either .text or .data.

Creating x86 bootloader

I am writing a bootloader as follows:
bits 16
[org 0x7c00]
KERN_OFFSET equ 0x1000
mov [BOOTDISK], dl
mov dl, 0x0 ;0 is for floppy-disk
mov ah, 0x2 ;Read function for the interrupt
mov al, 0x15 ;Read 0x15 (21) sectors containing kernel
mov ch, 0x0 ;Use cylinder 0
mov cl, 0x2 ;Start from the second sector which contains kernel
mov dh, 0x0 ;Read head 0
mov bx, KERN_OFFSET
int 0x13
jc disk_error
cmp al, 0x15
jne disk_error
jmp KERN_OFFSET:0x0
jmp $
disk_error:
jmp $
BOOTDISK: db 0
times 510-($-$$) db 0
dw 0xaa55
The kernel is a simple C program which prints "e" on the VGA display (seen in QEMU):
void main()
{
extern void put_in_mem();
char c = 'e';
put_in_mem(c, 0xA0);
}
I am running this code in 16-bit real mode in QEMU, so I compile it with bcc:
bcc -ansi -c -o kernel.o kernel.c
I have the following questions:
1. When I try to disassemble this code, using
objdump -D -b binary -mi386 kernel.o
I get an output like this (only initial portion of output):
kernel.o: file format binary
Disassembly of section .data:
00000000 <.data>:
0: a3 86 01 00 2a mov %eax,0x2a000186
5: 3e 00 00 add %al,%ds:(%eax)
8: 00 22 add %ah,(%edx)
a: 00 00 add %al,(%eax)
c: 00 19 add %bl,(%ecx)
e: 00 00 add %al,(%eax)
10: 00 55 55 add %dl,0x55(%ebp)
13: 55 push %ebp
14: 55 push %ebp
15: 00 00 add %al,(%eax)
17: 00 02 add %al,(%edx)
19: 22 00 and (%eax),%al
This output does not seem to correspond to the kernel.c file I wrote. For example, I could not see where 'e' is stored as ASCII 0x65 or where the call to put_in_mem is made. Is something wrong with the way I am disassembling the code?
To link the kernel binary for QEMU I used the following command:
ld86 -o kernel -d kernel.o put_in_mem.o
Here put_in_mem.o is the object file created after assembling the put_in_mem.asm file which contains the definition of the function put_in_mem() used in kernel.c.
Then the floppy image for QEMU is made using:
cat boot.o kernel > floppy_img
But when I look at address 0x10000 (using GDB), where the kernel was supposed to be after loading (by the boot.asm program), it is not there.
Why is this happening?
Further, with ld we used the -Ttext option to specify the load address of the binary. Should we use a similar option here with ld86?
Your kernel.o is in an object file format that objdump does not understand, so with -b binary it tries to disassemble everything in it, including headers and whatnot. Try disassembling the linked output kernel instead. Also, -mi386 decodes the bytes as 32-bit code; for 16-bit code pass -mi8086 instead, or use objdump86 if you have it available.
As to why it's not present: you are looking at the wrong place. You are loading it to offset 0x1000 (3 zeroes) but you are looking at 0x10000 (4 zeroes). Also note that you don't set up ES which is bad practice. Maybe you intended to set ES to 0x1000 and BX to 0x0000 and then you would find your kernel at 0x10000 physical address.
The -Ttext doesn't influence loading, it only specifies where the code expects to find itself.

How can a linker determine the address of certain data in the .rodata section?

The test platform is 32-bit Linux.
I use gcc to generate an object file for quickSort like this:
gcc -c quickSort.c
The generated quickSort.o is a relocatable ELF file:
#file quickSort.o
quickSort.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
I then use objdump to disassemble it :
objdump -d quickSort.o
and looking into the asm file generated, I am confused with this:
51: b8 00 00 00 00 mov $0x0,%eax
56: 89 04 24 mov %eax,(%esp)
59: e8 fc ff ff ff call 5a <main+0x5a>
5e: c7 44 24 3c 00 00 00 movl $0x0,0x3c(%esp)
The code above calls printf to print a string. If I compile quickSort.c into quickSort.s instead, it looks like this:
movl $.LC0, %eax
movl %eax, (%esp)
call printf
By looking at the relocation table, I can easily find the relation between "5a" and printf, and I am sure the linker uses it to relocate printf and replace "fc ff ff ff" with the real displacement to printf.
But I am confused about how the address of .LC0 (which is a string in the .rodata section) gets relocated. I cannot find any clue in the relocation table (which I got using readelf -r quickSort.o).
Could anyone explain how the linker finds the real memory address of data in the .rodata section?
It's done in the same way. You should be seeing a relocation entry against .rodata, where .rodata means the start address of the part of .rodata that's in the current object file.
Note that objdump -dr might be a better tool for the job.

Why isn't a.out in machine language?

I compile the following program with gcc and receive an output executable file a.out:
#include <stdio.h>
int main () {
printf("hello, world\n");
}
When I execute cat a.out, why is the file in "gibberish" (what is this called?) and not machine language of 0s and 1s:
??????? H__PAGEZERO(__TEXT__text__TEXT?`??__stubs__TEXT
P__unwind_info__TEXT]P]__eh_frame__TEXT?H??__DATA__program_vars [continued]
The file is made of 0s and 1s, but when you open it in a text editor those bits are grouped into bytes and then interpreted as text ;) On Linux you could disassemble the output file to confirm that it contains machine instructions (x86 architecture):
objdump -D -mi386 a.out
Example output:
1: 83 ec 08 sub $0x8,%esp
4: be 01 00 00 00 mov $0x1,%esi
9: bf 00 00 00 00 mov $0x0,%edi
The second column contains those 0s and 1s in hexadecimal notation, and the third column contains the assembler mnemonics.
If you want to display the 0s and 1s themselves, simply type:
xxd -b a.out
Example output:
0000000: 01111111 01000101 01001100 01000110 00000010 00000001 .ELF..
0000006: 00000001 00000000 00000000 00000000 00000000 00000000 ......
It's in some kind of executable file format. On Linux, it's probably ELF, on Mac OS X it's probably Mach-O, and so on. There's even an a.out format, but it's not that common anymore.
It can't just be bare machine instructions - the operating system needs some information about how to load it, what dynamic libraries to attach to it, etc.
Characters are also made of 0's and 1's, and the computer has no way of knowing the difference. You asked it to show the file and it did.
In addition to the machine instructions, the binary file also contains layout and optional debug information which can be readable strings.
The a.out is in a format the loader of the OS you are using can understand. Those different texts you see are markers for different parts of the 0s and 1s you expect.
The ? and ` show spots where there are binary unprintable data.
The typical format on Linux systems these days is ELF. The ELF file may contain machine code, which you can examine with the objdump utility.
$ gcc main.c
$ objdump -d -j .text a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
(code omitted for brevity)
00000000004005ac :
4005ac: 55 push %rbp
4005ad: 48 89 e5 mov %rsp,%rbp
4005b0: bf 6c 06 40 00 mov $0x40066c,%edi
4005b5: e8 d6 fe ff ff callq 400490
4005ba: 5d pop %rbp
4005bb: c3 retq
4005bc: 0f 1f 40 00 nopl 0x0(%rax)
See? Machine code. The objdump utility helpfully prints it in hexadecimal, with the corresponding disassembled code on the right and the addresses on the left.

Using GCC to produce readable assembly?

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able to find a way with GCC.
I am trying to re-write a C method in assembly and seeing how GCC does it would be a big help.
If you compile with debug symbols (add -g to your GCC command line, even if you're also using -O3; see footnote 1), you can use objdump -S to produce a more readable disassembly interleaved with C source.
>objdump --help
[...]
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output
objdump -drwC -Mintel is nice:
-r shows symbol names on relocations (so you'd see puts in the call instruction below)
-R shows dynamic-linking relocations / symbol names (useful on shared libraries)
-C demangles C++ symbol names
-w is "wide" mode: it doesn't line-wrap the machine-code bytes
-Mintel: use GAS/binutils MASM-like .intel_syntax noprefix syntax instead of AT&T
-S: interleave source lines with disassembly.
You could put something like alias disas="objdump -drwCS -Mintel" in your ~/.bashrc. If not on x86, or if you like AT&T syntax, omit -Mintel.
Example:
> gcc -g -c test.c
> objdump -d -M intel -S test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main(void)
{
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: 83 ec 10 sub esp,0x10
puts("test");
9: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
10: e8 fc ff ff ff call 11 <main+0x11>
return 0;
15: b8 00 00 00 00 mov eax,0x0
}
1a: c9 leave
1b: c3 ret
Note that this isn't using -r, so the call rel32=-4 isn't annotated with the puts symbol name. It looks like a broken call that jumps into the middle of the call instruction in main. Remember that the rel32 displacement in the call encoding is just a placeholder until the linker fills in a real offset (to a PLT stub in this case, unless you statically link libc).
Footnote 1: Interleaving source can be messy and not very helpful in optimized builds; for that, consider https://godbolt.org/ or other ways of visualizing which instructions go with which source lines. In optimized code there's not always a single source line that accounts for an instruction but the debug info will pick one source line for each asm instruction.
If you give GCC the flag -fverbose-asm, it will
Put extra commentary information in the generated assembly code to make it more readable.
[...] The added comments include:
information on the compiler version and command-line options,
the source code lines associated with the assembly instructions, in the form FILENAME:LINENUMBER:CONTENT OF LINE,
hints on which high-level expressions correspond to the various assembly instruction operands.
Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:
gcc -O2 -S foo.c
will leave the generated assembly code on the file foo.s.
Ripped straight from http://www.delorie.com/djgpp/v2faq/faq8_20.html (but removing erroneous -c)
Using the -S switch to GCC on x86 based systems produces a dump of AT&T syntax, by default, which can be specified with the -masm=att switch, like so:
gcc -S -masm=att code.c
Whereas if you'd like to produce a dump in Intel syntax, you could use the -masm=intel switch, like so:
gcc -S -masm=intel code.c
(Both produce dumps of code.c into their various syntax, into the file code.s respectively)
To get the same choice of syntax from objdump, use the --disassembler-options=att or --disassembler-options=intel switch on the compiled binary (here called code), not on the source file. An example (with code dumps to illustrate the differences in syntax):
$ objdump -d --disassembler-options=att code
080483c4 <main>:
80483c4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483c8: 83 e4 f0 and $0xfffffff0,%esp
80483cb: ff 71 fc pushl -0x4(%ecx)
80483ce: 55 push %ebp
80483cf: 89 e5 mov %esp,%ebp
80483d1: 51 push %ecx
80483d2: 83 ec 04 sub $0x4,%esp
80483d5: c7 04 24 b0 84 04 08 movl $0x80484b0,(%esp)
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov $0x0,%eax
80483e6: 83 c4 04 add $0x4,%esp
80483e9: 59 pop %ecx
80483ea: 5d pop %ebp
80483eb: 8d 61 fc lea -0x4(%ecx),%esp
80483ee: c3 ret
80483ef: 90 nop
and
$ objdump -d --disassembler-options=intel code
080483c4 <main>:
80483c4: 8d 4c 24 04 lea ecx,[esp+0x4]
80483c8: 83 e4 f0 and esp,0xfffffff0
80483cb: ff 71 fc push DWORD PTR [ecx-0x4]
80483ce: 55 push ebp
80483cf: 89 e5 mov ebp,esp
80483d1: 51 push ecx
80483d2: 83 ec 04 sub esp,0x4
80483d5: c7 04 24 b0 84 04 08 mov DWORD PTR [esp],0x80484b0
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov eax,0x0
80483e6: 83 c4 04 add esp,0x4
80483e9: 59 pop ecx
80483ea: 5d pop ebp
80483eb: 8d 61 fc lea esp,[ecx-0x4]
80483ee: c3 ret
80483ef: 90 nop
godbolt is a very useful tool. Its list only has C++ compilers, but you can use the -x c flag to make it treat the code as C. It then generates an assembly listing side by side with your code, and you can use the Colourise option to generate colored bars that visually indicate which source code maps to which generated assembly. For example, the following code:
#include <stdio.h>
void func()
{
printf( "hello world\n" ) ;
}
using the following command line:
-x c -std=c99 -O3
and Colourise would generate the following (screenshot not reproduced here).
Did you try gcc -S -fverbose-asm -O source.c and then look into the generated source.s assembler file?
The generated assembler code goes into source.s (you could override that with -o assembler-filename ); the -fverbose-asm option asks the compiler to emit some assembler comments "explaining" the generated assembler code. The -O option asks the compiler to optimize a bit (it could optimize more with -O2 or -O3).
If you want to understand what gcc is doing try passing -fdump-tree-all but be cautious: you'll get hundreds of dump files.
BTW, GCC is extensible thru plugins or with MELT (a high level domain specific language to extend GCC; which I abandoned in 2017)
You can also use gdb for this, much like objdump.
This excerpt is taken from http://sources.redhat.com/gdb/current/onlinedocs/gdb_9.html#SEC64
Here is an example showing mixed source+assembly for Intel x86:
(gdb) disas /m main
Dump of assembler code for function main:
5 {
0x08048330 : push %ebp
0x08048331 : mov %esp,%ebp
0x08048333 : sub $0x8,%esp
0x08048336 : and $0xfffffff0,%esp
0x08048339 : sub $0x10,%esp
6 printf ("Hello.\n");
0x0804833c : movl $0x8048440,(%esp)
0x08048343 : call 0x8048284
7 return 0;
8 }
0x08048348 : mov $0x0,%eax
0x0804834d : leave
0x0804834e : ret
End of assembler dump.
I haven't tried this with gcc, but with g++ the command below works for me.
-g for a debug build
-Wa,-adhln is passed to the assembler to produce a listing with source code
g++ -g -Wa,-adhln src.cpp
For RISC-V disassembly, these flags are nice:
riscv64-unknown-elf-objdump -d -S -l --visualize-jumps --disassembler-color=color --inlines
-d: disassemble, most basic flag
-S: intermix source. Note: must use -g flag while compiling
-l: line numbers
--visualize-jumps: fancy arrows, not too useful but why not. Sometimes it gets too messy and actually makes reading the source harder. Taken from Peter Cordes's comment: --visualize-jumps=color is also an option, to use different colors for different arrows
--disassembler-color=color: give the disassembly some color
--inlines: print out inlines
Possibly useful:
-M numeric: Use numeric register names instead of ABI names, useful if you are doing CPU development and don't know the ABI names by heart
-M no-aliases: don't use pseudoinstructions like li and call
Example:
main.c:
#include <stdio.h>
#include <stdint.h>
static inline void example_inline(const char* str) {
for (int i = 0; str[i] != 0; i++)
putchar(str[i]);
}
int main() {
printf("Hello world");
example_inline("Hello! I am inlined");
return 0;
}
I recommend using -O0 if you want to intermix sources. Intermixed source becomes very messy with -O2.
Command:
riscv64-unknown-elf-gcc main.c -c -O0 -g
riscv64-unknown-elf-objdump -d -S -l --disassembler-color=color --inlines main.o
Disassembly:
main.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <example_inline>:
example_inline():
/Users/cyao/test/main.c:4
#include <stdio.h>
#include <stdint.h>
static inline void example_inline(const char* str) {
0: 7179 addi sp,sp,-48
2: f406 sd ra,40(sp)
4: f022 sd s0,32(sp)
6: 1800 addi s0,sp,48
8: fca43c23 sd a0,-40(s0)
000000000000000c <.LBB2>:
/Users/cyao/test/main.c:5
for (int i = 0; str[i] != 0; i++)
c: fe042623 sw zero,-20(s0)
10: a01d j 36 <.L2>
0000000000000012 <.L3>:
/Users/cyao/test/main.c:6 (discriminator 3)
putchar(str[i]);
12: fec42783 lw a5,-20(s0)
16: fd843703 ld a4,-40(s0)
1a: 97ba add a5,a5,a4
1c: 0007c783 lbu a5,0(a5)
20: 2781 sext.w a5,a5
22: 853e mv a0,a5
24: 00000097 auipc ra,0x0
28: 000080e7 jalr ra # 24 <.L3+0x12>
/Users/cyao/test/main.c:5 (discriminator 3)
for (int i = 0; str[i] != 0; i++)
2c: fec42783 lw a5,-20(s0)
30: 2785 addiw a5,a5,1
32: fef42623 sw a5,-20(s0)
0000000000000036 <.L2>:
/Users/cyao/test/main.c:5 (discriminator 1)
36: fec42783 lw a5,-20(s0)
3a: fd843703 ld a4,-40(s0)
3e: 97ba add a5,a5,a4
40: 0007c783 lbu a5,0(a5)
44: f7f9 bnez a5,12 <.L3>
0000000000000046 <.LBE2>:
/Users/cyao/test/main.c:7
}
46: 0001 nop
48: 0001 nop
4a: 70a2 ld ra,40(sp)
4c: 7402 ld s0,32(sp)
4e: 6145 addi sp,sp,48
50: 8082 ret
0000000000000052 <main>:
main():
/Users/cyao/test/main.c:9
int main() {
52: 1141 addi sp,sp,-16
54: e406 sd ra,8(sp)
56: e022 sd s0,0(sp)
58: 0800 addi s0,sp,16
/Users/cyao/test/main.c:10
printf("Hello world");
5a: 000007b7 lui a5,0x0
5e: 00078513 mv a0,a5
62: 00000097 auipc ra,0x0
66: 000080e7 jalr ra # 62 <main+0x10>
/Users/cyao/test/main.c:11
example_inline("Hello! I am inlined");
6a: 000007b7 lui a5,0x0
6e: 00078513 mv a0,a5
72: 00000097 auipc ra,0x0
76: 000080e7 jalr ra # 72 <main+0x20>
/Users/cyao/test/main.c:13
return 0;
7a: 4781 li a5,0
/Users/cyao/test/main.c:14
}
7c: 853e mv a0,a5
7e: 60a2 ld ra,8(sp)
80: 6402 ld s0,0(sp)
82: 0141 addi sp,sp,16
84: 8082 ret
P.S. The disassembled output is colored in the terminal.
Use -Wa,-adhln as an option to gcc or g++ to produce a listing on stdout.
-Wa,... passes command-line options to the assembler stage, which gcc/g++ runs after C/C++ compilation. It invokes as internally (as.exe on Windows).
Run
>as --help
on the command line to see more help for the assembler tool used by gcc.

Resources