How does objdump manage to display source code with the -S option?

Is there a reference to the source file in the binary? I tried running strings on the binary and couldn't find any reference to the source file listed...

objdump uses the DWARF debugging information compiled into the binary, which references the source file name. objdump tries to open the named source file to load the source and display it in the output. If the binary isn't compiled with debugging information, or objdump can't find the source file, then you don't get source code in your output - only assembly.
You don't see the source file name when you run strings on the binary because DWARF stores its tables in an encoded form (and debug sections are often compressed), so the name need not appear as a plain string.

The DWARF information in a binary stores the mapping between instructions (instruction addresses, i.e. the instruction pointer) and the source file and line number. The source file is recorded with its complete path, so it can be found even if the binary is moved around. To see this information you can run objdump --dwarf=decodedline <binary> (the binary, of course, has to be compiled with -g).
Once you run objdump -S <binary>, it uses this DWARF info to show source code along with the disassembly.

My understanding is that for objdump to display source code alongside the binary code, there is a precondition: the DWARF debugging information must be compiled into the binary (via gcc -g sourcefile or gcc -gdwarf-2 sourcefile).
By processing this DWARF information, objdump is able to recover the source code, as @vlcekmi3 and @vkrnt answered.
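The whole pipeline can be seen end to end with a throwaway file (a sketch, assuming gcc and GNU objdump are installed; demo.c is a made-up name):

```shell
# Hypothetical throwaway source file
cat > demo.c <<'EOF'
int main(void) { return 42; }
EOF

# Without -g there is no DWARF, so there is no line table to decode
gcc -o demo_nodbg demo.c

# With -g, objdump can decode the address -> demo.c:line mapping,
# and -S interleaves the source it reads back from demo.c
gcc -g -o demo_dbg demo.c
objdump --dwarf=decodedline demo_dbg
objdump -S demo_dbg
```

If demo.c is deleted or moved after compiling, objdump -S silently falls back to plain disassembly, since it re-reads the source from the recorded path.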

Related

How to dump path to source from object-file

Assume I have a C object-file app.o compiled with gcc. How can I dump the file path to the original app.c from which app.o was compiled. My goal is to create a listing of all symbols + respective source file path using the binutils and gcc toolsuite.
By no means am I expecting an all-in-one solution. So I tried playing with multiple tools to gather the information I need.
Inspecting the object-file with a text-editor reveals that (apart from a lot of unreadable binary gibberish) the file does contain a reference to app.c as a string embedded into the object-file format. However I did not find a way to extract that string using objdump or nm.
I was hoping objdump would have some flag that could extract this source file string, but after trying virtually all options documented in the man page I still couldn't find it.
With the path of the source file I was hoping I could run gcc -M <path-to-source>. This would allow me to look through all the headers included by app.c and find the in-source declarations.
Suppose a simple app.c like this:
void foo(void) {
}
Compile it via gcc -c app.c -o app.o.
Running objdump -t app.o dumps the symbol table, but does not refer anywhere to the original app.c.
Running cat app.o does show that the object-file contains the file path to app.c (relative to pwd at compile-time). But I wasn't exactly planning on writing my own object-file parser just to get to that string.
To answer my own question minutes after posting it (duh!):
readelf -s app.o prints a symbol table including the name of the source file (app.c). With that I am able to run gcc -M app.c and then parse through all header files to gather the symbol declarations.

Difference using avr-gcc and avr-ld in ELF (Executable and Linkable Format) file generation

Sorry if this might be off-topic.
In the process of generating .hex (Intel HEX format) files using avr-gcc or avr-ld, the output (final result) is significantly different. To clarify: I am talking about the step of generating the ELF file, just after generating the object files.
On my first attempt, I used avr-ld to generate my ELF file. The process works smoothly, but after generating the HEX file and uploading it to my board, it did nothing (as if I had uploaded a blank HEX file).
On my second try, I followed the advice found here:
It is important to specify the MCU type when linking. The compiler uses the -mmcu option to choose start-up files and run-time libraries that get linked together. If this option isn't specified, the compiler defaults to the 8515 processor environment, which is most certainly what you didn't want.
It did as I expected. Uploaded the HEX file and my board updated accordingly.
So my questions are as follows:
Why did the linker (avr-ld) lose information about the micro-controller I am using? I thought that the MCU information is stored in the object files.
What is the logic behind this configuration? Is my way of thinking wrong (using avr-gcc for compiling/generating the .o files, avr-ld to link the .o files and generate the ELF file, and avr-objcopy to strip all but the useful information and change the file format, ELF -> HEX)?
Is there any way to achieve the same output using avr-ld as when using avr-gcc for generating my ELF file?
Why did the linker (avr-ld) lose information about the micro-controller I am using? I thought that the MCU information is stored in the object files.
The linker doesn't lose that information; it was never supplied in the first place. Object files, or rather their ELF headers, only record the "emulation" level, i.e. a granularity like -mmcu=arch where arch is one of avr2, avr25, avrxmega2, avrtiny, etc.
Using avr-gcc for compiling/generating the .o files, avr-ld to link the .o files and generate the ELF file, and avr-objcopy to strip all but the useful information and change the file format, ELF → HEX?
avr-gcc is not a compiler; it's just a driver program that calls other programs, like the compiler proper (cc1 or cc1plus), the assembler (as), or the linker (ld), depending on the file type and the options provided. The driver adds options to these programs, which greatly simplifies their usage; many of them are described in the device-specs file, e.g. specs-attiny25 (since v5 onwards).
As an example, take a simple main.c with a main function returning 0, and process it with
avr-gcc main.c -o main.elf -mmcu=attiny25 -save-temps -v -Wl,-v
The -v makes the driver show the commands it is issuing, and -save-temps stores intermediate files like assembly. For avr-gcc v8.5, the link process starts with a call of collect2:
.../bin/../libexec/gcc/avr/8.5.0/collect2 -plugin .../bin/../libexec/gcc/avr/8.5.0/liblto_plugin.so -plugin-opt=.../bin/../libexec/gcc/avr/8.5.0/lto-wrapper -plugin-opt=-fresolution=main.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lm -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lattiny25 -mavr25 -o main.elf .../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack/crtattiny25.o -L.../bin/../lib/gcc/avr/8.5.0/avr25/tiny-stack -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack -L.../bin/../lib/gcc/avr/8.5.0 -L.../bin/../lib/gcc -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib main.o -v --start-group -lgcc -lm -lc -lattiny25 --end-group
where ... stands for the absolute path where the tools are installed. As you can see, the driver adds some salt, for example it links against startup code crtattiny25.o, standard libs like libc.a, libm.a, libgcc.a. collect2 gathers some extra information needed to build startup code, and then calls back the compiler1 and finally ld.
The options provided to ld look very much like the ones provided to collect2. The only device-specific parts are the startup code crtattiny25.o and the device library libattiny25.a. Much other device-specific material has already been compiled into the code, like SFR addresses, #ifdef __AVR_ATtiny25__, etc.
Is there any way to achieve the same output using avr-ld as when using avr-gcc for generating my ELF file?
You could provide all those options by hand.
1 Calling back the compiler is needed for LTO (link-time optimization), as enabled by -flto. The linker calls a plugin which invokes the compiler on the LTO byte-code, compiles it to assembly with the LTO compiler lto1, then runs as, then ld. Newer versions of the tools use the linker plugin even without LTO compilation; one can pass -fno-use-linker-plugin, which makes the call chain and options somewhat simpler.

Why does readelf show an odd entry point address on an ARM binary?

I compiled a C++ HelloWorld on an Odroid-XU3 with gcc/g++ version 4.8.2 and clang version 3.5. I also wrote a C HelloWorld for comparison.
g++ -static -o HelloWorld hello.cc
readelf -h HelloWorld shows the following entry point addresses:
HelloWorld: 0x8be5
HelloClang: 0x8c45
HelloC: 0x88b5
These are odd addresses. Thumb code uses odd addresses, so does this have something to do with Thumb?
Additionally, objdump -lSd HelloWorld shows the _start symbol at 0x8be4, which looks like the "right" address.
Why do these two tools show different addresses?
Yes, the addresses are odd because they are Thumb functions; that part is the simple question. Why the two tools report different values is the good question.
readelf deliberately doesn't use BFD (unlike objdump) and is mostly used to verify other tools against.
Here:
The difference between readelf and objdump: Both programs are
capable of displaying the contents of ELF format files, so why does
the binutils project have two file dumpers ?
The reason is that
objdump sees an ELF file through a BFD filter of the world; if BFD
has a bug where, say, it disagrees about a machine constant in
e_flags, then the odds are good that it will remain internally
consistent. The linker sees it the BFD way, objdump sees it the BFD
way, GAS sees it the BFD way. There was need for a tool to go find
out what the file actually says.
This is why the readelf program
does not link against the BFD library - it exists as an independent
program to help verify the correct working of BFD.
There is also the
case that readelf can provide more information about an ELF file
than is provided by objdump. In particular it can display DWARF
debugging information which (at the moment) objdump cannot.
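The split is easy to see side by side: readelf prints the raw e_entry field from the ELF header, while objdump goes through BFD, which on ARM clears the low "Thumb" bit. A sketch (on x86 the two simply agree, since there is no Thumb bit; hello.c is a made-up file):

```shell
cat > hello.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -o hello hello.c

# Raw ELF header field, exactly as stored in the file
readelf -h hello | grep 'Entry point'

# BFD's interpretation; on ARM/Thumb this is the readelf
# value with bit 0 cleared
objdump -f hello | grep 'start address'
```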

How do I emulate objdump --dwarf=decodedline in .bundle files?

I've been successfully using objdump --dwarf=decodedline to find the source location of each offset in a .so file on Linux.
Unfortunately, on Mac OS X it seems that .bundle files (used as shared libraries) are not queryable in this manner.
I'm optimistic that there's something I can do, because gdb is able to correctly debug and step through code in these bundles — does anyone know what it's doing?
Further information:
The dwarfdump utility claims that the .bundle file contains no DWARF data, but that it does contain STABS data; however objdump --stabs cannot find any stabs data either.
(If it makes the question easier to answer, I don't actually need all of the offsets; being able to query the source location of any given offset would be good enough).
The bundle file I've been testing this on was generated using:
cc -dynamic -bundle -undefined suppress -flat_namespace -g -o c_location.bundle c_location.o -L. -L/Users/User/.rvm/rubies/ruby-1.8.7-p357/lib -L. -lruby -ldl -lobjc
The original c_location.o file does contain the necessary information for objdump --dwarf=decodedline to work.
So it turns out that one way to do this is to use Apple's nm -pa *.bundle to find the symbol name and the original object file for a given offset.
Once you have that, you can first use objdump -tT to find the offset of the symbol name in the original object file; and then use objdump --dwarf=decodedline as before.
Each step requires a little bit of simplistic output parsing, but it does seem to work™. I'd be interested if there are more robust approaches.

How can I tell if a library was compiled with -g?

I have some compiled libraries on x86 Linux and I want to quickly determine whether they were compiled with debugging symbols.
If you're running on Linux, use objdump --debugging. There should be an entry for each object file in the library. For object files without debugging symbols, you'll see something like:
objdump --debugging libvoidincr.a
In archive libvoidincr.a:
voidincr.o: file format elf64-x86-64
If there are debugging symbols, the output will be much more verbose.
The suggested command
objdump --debugging libinspected.a
objdump --debugging libinspected.so
always gives me the same result, at least on Ubuntu/Linaro 4.5.2:
libinspected.a: file format elf64-x86-64
libinspected.so: file format elf64-x86-64
no matter whether the archive/shared library was built with or without -g option
What really helped me to determine whether -g was used is readelf tool:
readelf --debug-dump=decodedline libinspected.so
or
readelf --debug-dump=line libinspected.so
This will print a set of lines consisting of source file name, line number and address if such debug info is included in the library; otherwise it prints nothing.
You may pass whatever value you find necessary for the --debug-dump option instead of decodedline.
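A quick way to convince yourself the readelf check behaves as described (a sketch; lib.c and the library names are made up):

```shell
cat > lib.c <<'EOF'
int inc(int x) { return x + 1; }
EOF

gcc -g -shared -fPIC -o libdbg.so   lib.c
gcc    -shared -fPIC -o libnodbg.so lib.c

# Non-empty only when DWARF line info is present
readelf --debug-dump=decodedline libdbg.so
readelf --debug-dump=decodedline libnodbg.so
```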
What helped is:
gdb mylib.so
It prints when debug symbols are not found:
Reading symbols from mylib.so...(no debugging symbols found)...done.
Or when found:
Reading symbols from mylib.so...done.
None of the earlier answers gave meaningful results for me: libs without debug symbols still produced lots of output, etc.
nm -a <lib> will print all symbols from the library, including debug ones.
So you can compare the outputs of nm <lib> and nm -a <lib> - if they differ, your lib contains some debug symbols.
On OSX you can use dsymutil -s and dwarfdump.
Using dsymutil -s <lib_file> | more you will see source file paths in files that have debug symbols, but only the function names otherwise.
You can use objdump for this.
EDIT: From the man-page:
-W
--dwarf
Displays the contents of the DWARF debug sections in the file, if
any are present.
Answers suggesting the use of objdump --debugging or readelf --debug-dump=... don't work in the case that debug information is stored in a file separate from the binary, i.e. the binary contains a debug link section. Perhaps one could call that a bug in readelf.
The following code should handle this correctly:
# Test whether debug information is available for a given binary
has_debug_info() {
readelf -S "$1" | grep -q " \(.debug_info\)\|\(.gnu_debuglink\) "
}
See Separate Debug Files in the GDB manual for more information.
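A quick check of that function (repeated here so the snippet is self-contained; t.c and the binary names are made up):

```shell
# Same has_debug_info as above, repeated for a self-contained test
has_debug_info() {
  readelf -S "$1" | grep -q " \(.debug_info\)\|\(.gnu_debuglink\) "
}

cat > t.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -g -o t_dbg   t.c
gcc    -o t_nodbg t.c

has_debug_info t_dbg   && echo "t_dbg: has debug info"
has_debug_info t_nodbg || echo "t_nodbg: stripped or built without -g"
```

Checking sections rather than dumping DWARF also catches the .gnu_debuglink case, where the debug info lives in a separate file.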
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/debugging
The command readelf -wi file is a good way to verify the debug info compiled into your program.