Why does readelf show an odd entry point address on an ARM binary?

I compiled a C++ HelloWorld on an Odroid-XU3 with gcc/g++ version 4.8.2 and clang version 3.5. I also wrote a C HelloWorld for comparison.
g++ -static -o HelloWorld hello.cc
readelf -h HelloWorld shows the following entry point addresses:
HelloWorld: 0x8be5
HelloClang: 0x8c45
HelloC: 0x88b5
These are odd addresses. Thumb code sits at odd addresses, so does this have something to do with Thumb?
Additionally, objdump -lSd HelloWorld shows the _start symbol at 0x8be4, which looks like the "right" address.
Why do these two tools show different addresses?

Yes, the addresses are odd because they are Thumb functions - that part is simple. On ARM, bit 0 of the entry point (and of Thumb function symbol values) is set to tell the processor to start in Thumb state, so readelf prints the raw e_entry value 0x8be5 while objdump, going through BFD, masks that bit off and shows 0x8be4. Why the two tools report differently is the better question.
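To make that concrete, here is a minimal sketch in plain C (the 0x8be5 value is taken from the question) of recovering the real instruction address from a Thumb-marked entry point:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t e_entry = 0x8be5;          /* raw value, exactly what readelf prints */
    int is_thumb = e_entry & 1;         /* bit 0 set: processor starts in Thumb state */
    uint32_t insn_addr = e_entry & ~1u; /* address of the first instruction */
    printf("entry %#x: %s code at %#x\n",
           (unsigned)e_entry, is_thumb ? "Thumb" : "ARM", (unsigned)insn_addr);
    return 0;
}

Compiled and run anywhere, this prints: entry 0x8be5: Thumb code at 0x8be4.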
readelf deliberately does not use BFD (unlike objdump) and is mostly used to verify other tools against.
From the binutils documentation:
The difference between readelf and objdump: Both programs are capable of displaying the contents of ELF format files, so why does the binutils project have two file dumpers?

The reason is that objdump sees an ELF file through a BFD filter of the world; if BFD has a bug where, say, it disagrees about a machine constant in e_flags, then the odds are good that it will remain internally consistent. The linker sees it the BFD way, objdump sees it the BFD way, GAS sees it the BFD way. There was need for a tool to go find out what the file actually says.

This is why the readelf program does not link against the BFD library - it exists as an independent program to help verify the correct working of BFD.

There is also the case that readelf can provide more information about an ELF file than is provided by objdump. In particular, it can display DWARF debugging information which (at the moment) objdump cannot.

Related

Difference using avr-gcc and avr-ld in ELF (Executable and Linkable Format) file generation

Sorry if this might be off-topic.
In the process of generating .hex (Intel HEX format) files using avr-gcc or avr-ld, the output (final result) is significantly different. As a minimal clarification, I am talking about the step of generating the ELF file, just after generating the object files.
On my first attempt, I used avr-ld to generate my ELF file. The process worked smoothly, but after generating the HEX file and uploading it to my board, it did nothing (as if I had uploaded a blank HEX file).
On my second try, I followed the advice found here:
It is important to specify the MCU type when linking. The compiler uses the -mmcu option to choose start-up files and run-time libraries that get linked together. If this option isn't specified, the compiler defaults to the 8515 processor environment, which is most certainly what you didn't want.
This did as I expected: I uploaded the HEX file and my board updated accordingly.
So my questions are as follows:
Why did the linker (avr-ld) lose information about the microcontroller I am using? I thought that the MCU information was stored in the object files.
What is the logic behind this configuration? Is my way of thinking wrong (using avr-gcc for compiling/generating the .o files, avr-ld to link the .o files and generate the ELF file, and avr-objcopy to strip all but the useful information and change the file format from ELF to HEX)?
Is there any way to achieve the same output using avr-ld as when using avr-gcc to generate my ELF file?
Why did the linker (avr-ld) lose information about the microcontroller I am using? I thought that the MCU information was stored in the object files.
The linker doesn't lose that information; it was never supplied in the first place. Object files and ELF headers carry it only at the level of the "emulation", i.e. at a granularity like -mmcu=arch, where arch is one of avr2, avr25, avrxmega2, avrtiny, etc.
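You can see this granularity yourself by dumping the ELF header of an object file (a sketch: the tool is typically installed as avr-readelf, and the exact flag bits vary with device and binutils version):

avr-readelf -h main.o | grep Flags

Look at the Flags line: for an ATtiny25 object it names only the architecture family, something like "avr:25", not the concrete MCU.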
using avr-gcc for compiling/generating the .o files, avr-ld to link the .o files and generate the ELF file, and avr-objcopy to strip all but the useful information and change the file format from ELF → HEX?
avr-gcc is not a compiler; it's just a driver program that calls other programs - the compiler proper cc1 or cc1plus, the assembler as, the linker ld - depending on the file type and the options provided. The driver adds options to these programs, which greatly simplifies their usage; many of them are described in device spec files like specs-attiny25 (since v5 onwards).
As an example, take a simple main.c with a main function returning 0, and process it with
avr-gcc main.c -o main.elf -mmcu=attiny25 -save-temps -v -Wl,-v
The -v makes the driver show the commands it is issuing, and -save-temps stores intermediate files like the generated assembly. For avr-gcc v8.5, the link process starts with a call to collect2:
.../bin/../libexec/gcc/avr/8.5.0/collect2 \
    -plugin .../bin/../libexec/gcc/avr/8.5.0/liblto_plugin.so \
    -plugin-opt=.../bin/../libexec/gcc/avr/8.5.0/lto-wrapper \
    -plugin-opt=-fresolution=main.res \
    -plugin-opt=-pass-through=-lgcc \
    -plugin-opt=-pass-through=-lm \
    -plugin-opt=-pass-through=-lc \
    -plugin-opt=-pass-through=-lattiny25 \
    -mavr25 -o main.elf \
    .../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack/crtattiny25.o \
    -L.../bin/../lib/gcc/avr/8.5.0/avr25/tiny-stack \
    -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack \
    -L.../bin/../lib/gcc/avr/8.5.0 \
    -L.../bin/../lib/gcc \
    -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib \
    main.o -v --start-group -lgcc -lm -lc -lattiny25 --end-group
where ... stands for the absolute path where the tools are installed. As you can see, the driver adds quite a bit: for example, it links against the startup code crtattiny25.o and standard libraries like libc.a, libm.a and libgcc.a. collect2 gathers some extra information needed to build the startup code, and then calls back the compiler¹ and finally ld.
The options provided to ld look very much like the ones provided to collect2. The only device-specific parts are the startup code crtattiny25.o and the device library libattiny25.a. Much of the other device-specific information has already been compiled into the code, like SFR addresses, #ifdef __AVR_ATtiny25__, etc.
Is there any way to achieve the same output using avr-ld as when using avr-gcc to generate my ELF file?
You could provide all those options by hand.
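For illustration, a hand-written link step modeled on the collect2 command above might look like this (a sketch: the ... placeholders stand for the toolchain's install prefix, exactly as in the dump above):

avr-ld -m avr25 -o main.elf \
    .../avr/lib/avr25/tiny-stack/crtattiny25.o \
    main.o \
    -L.../lib/gcc/avr/8.5.0/avr25/tiny-stack \
    -L.../avr/lib/avr25/tiny-stack \
    --start-group -lgcc -lm -lc -lattiny25 --end-group

Getting all the paths, the emulation (-m avr25) and the library group right by hand is exactly the work the driver normally does for you.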
¹ Calling back the compiler is needed for LTO (link-time optimization), as enabled by -flto. The linker calls a plugin which calls back the compiler with LTO byte-code, compiles it to assembly with the LTO compiler lto1, then runs as, then ld. Newer versions of the tools use the linker plugin even for non-LTO compilation; one can pass -fno-use-linker-plugin, which makes the call chain and options somewhat simpler.

Create non-PIC shared libraries with ld

I have a bunch of object files that have been compiled without the -fPIC option, so the calls to functions do not use @PLT. (The source code is C and is compiled with clang.)
I want to link these object files into a shared library that I can load at runtime using dlopen. I need to do this because I have to do a lot of setup before the actual .so is loaded.
But every time I try to link with the -shared option, I get the error -
relocation R_X86_64_PC32 against symbol splay_tree_lookup can not be used when making a shared object; recompile with -fPIC
I have no issues recompiling from source. But I don't want to use -fPIC. This is part of a research project where we are working on a custom compiler. PIC wouldn't work for the type of guarantees we are trying to provide in the compiler.
Is there some flag I can use with ld so that it generates load-time-relocated libraries? In fact, I am okay with no relocations at all. I can provide a base address for the library, and dlopen can fail if that virtual address is not available.
The command I am using to compile my C files is equivalent to -
clang -m64 -c foo.c
and for linking I am using
clang -m64 -shared *.o -o foo.so
I say equivalent because it is a custom compiler (forked off clang) with some extra steps, but it is equivalent.
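For context, the runtime-loading side the question describes looks roughly like this (a minimal sketch; foo.so is from the question, while foo_entry is a hypothetical symbol name - link with -ldl where required):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* ... the extensive setup that must happen before the .so is loaded ... */
    void *handle = dlopen("./foo.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* look up and call a (hypothetical) entry point inside foo.so */
    void (*entry)(void) = (void (*)(void))dlsym(handle, "foo_entry");
    if (entry)
        entry();
    dlclose(handle);
    return 0;
}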
It is not possible to dynamically load your existing non-PIC objects and expect them to work without problems.
If you cannot recompile the original code to create a proper shared library that supports PIC, then I suggest you create a service executable that links to a static library composed of those objects. The service executable can then provide IPC/RPC/REST API/shared memory/whatever to allow your object code to be used by your program.
Then, you can author a shared library which is compiled with PIC that provides wrapper APIs that launches and communicates with the service executable to perform the actual work.
On further thought, this wrapper API library may as well be static. The dynamic aspect of it is performed by launching the service executable.
Recompiling the library's object files with the -fpic -shared options would be the best option, if this is possible!
man ld says:
-i Perform an incremental link (same as option -r).
-r
--relocatable
Generate relocatable output---i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file’s magic number to "OMAGIC". If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur.
When an input file does not have the same format as the output file, partial linking is only supported if that input file does not contain any relocations. Different output formats can have further restrictions; for example some "a.out"-based formats do not support partial linking with input files in other formats at all.
I believe you can partially link your library object files into a relocatable (PIC) library, then link that library with your source code object file to make a shared library.
ld -r -o libfoo.so *.o
cp libfoo.so /foodir/libfoo.so
cd foodir
clang -m32 -fpic -c foo.c
clang -m32 -fpic -shared *.o -o foo.so
Regarding library base address:
(Again from man ld)
--section-start=sectionname=org
Locate a section in the output file at the absolute address given by org. You may use this option as many times as necessary to locate multiple sections in the command line. org must be a single hexadecimal integer; for compatibility with other linkers, you may omit the leading 0x usually associated with hexadecimal values. Note: there should be no white space between sectionname, the equals sign ("="), and org.
You could perhaps move your library's .text section?
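For example, something along these lines might pin the code (a sketch: the address is arbitrary, and whether the dynamic loader tolerates a fixed .text address in a shared object is platform-dependent):

ld -shared -o foo.so --section-start=.text=0x20000000 *.o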
--image-base value
Use value as the base address of your program or dll. This is the lowest memory location that will be used when your program or dll is loaded. To reduce the need to relocate and improve performance of your dlls, each should have a unique base address and not overlap any other dlls. The default is 0x400000 for executables, and 0x10000000 for dlls. [This option is specific to the i386 PE targeted port of the linker]

Trying to understand the main function with GCC and Windows

They say that main() is a function like any other function, but "marked" as an entry point inside the binary - an entry point that the operating system can somehow find and start the program from. So I'm trying to find out more about this function. Here is what I did: I created a simple .c file with this code inside:
int main(int argc, char **argv) {
    return (0);
}
I saved the file, installed the GCC compiler (in Windows, MingW environment) and created a batch file like this:
gcc -c test.c -nostartfiles -nodefaultlibs -nostdlib -nostdinc -o test.o
gcc -o test.exe -nostartfiles -nodefaultlibs -nostdlib -nostdinc -s -O2 test.o
#%comspec%
I did this to obtain a very simplistic compiler and linker: no libraries, no headers, just the compiler. The compiling goes well, but the linking stops with this error:
test.c:(.text+0xa): undefined reference to '___main'
collect2.exe: error: ld returned 1 exit status
I thought that the main function was handled by the linker itself and that you didn't need any library with additional information about it, but it looks like you do. I supposed it must be the standard GCC library, so I downloaded its source code and opened this file: libgcc2.c
Now, I don't know if that is the file where the main function is constructed to be linked by GCC. In fact, I don't understand how the main function is used by GCC. Why does the linker need the GCC standard libraries? To know what about main? I hope this has made my question specific and clear. Thanks!
When gcc puts together all object files (test.o) and libraries to form a binary, it also prepends a small object (usually crt0.o or crt1.o) which is responsible for calling your main(). You can see what gcc is doing when you add -v on the command line:
$ gcc -v -o test.exe test.o
crt0/crt1 does some setup and then calls into main(). But the linker is ultimately responsible for building the executable according to the OS. With -v you can also see an option for the target system. In my case it's for 64-bit Linux: -m elf_x86_64. For your system this will be something like -m windows or -m mingw.
The error happens because you use these two options: -nodefaultlibs -nostdlib
These tell GCC that it should not link your code against libc.a/c.lib, which contains the code that actually calls main(). In a nutshell: every OS is slightly different, and most of them don't care about C and main(). Each has its own special way to start a process, and most of them are not compatible with the C API.
So the C developers' solution was to put "glue code" into the C standard library libc.a: it implements the interface the OS expects, creates the standard C environment (setting up the memory allocation structures so malloc() maps to the OS's memory management functions, setting up stdio, etc.) and eventually calls main().
For C developers, this means they get a libc.a for their OS (along with the compiler binaries) and they don't need to care about how the setup works.
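Conceptually, that glue code boils down to something like the following (a sketch only, not any particular libc's real startup code, which also has to fetch argc/argv from the OS and initialize the runtime before the call):

/* conceptual crt0-style startup stub */
extern int main(int argc, char **argv);
extern void exit(int status);

void _start(void) {
    /* a real crt0 would set up the stack, heap and stdio, and
       collect argc/argv from the OS, before making this call */
    int status = main(0, (char **)0);
    exit(status); /* hand main's return value back to the OS */
}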
Another source of confusion is the name of the reference. On most systems, the symbolic name of main() is _main (i.e. one underscore), while __main is the name of an internal function called by the setup code, which eventually calls the real main().
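If you insist on linking with -nostdlib, one known workaround (assuming, per the above, that your gcc emits a call to __main to run global constructors, and that you have no constructors that actually need running) is to provide an empty stub yourself; note that further startup glue may still be required:

/* empty stub for gcc's internal __main; safe only if the program
   does not rely on global constructors being run */
void __main(void) {}

int main(int argc, char **argv) {
    (void)argc;
    (void)argv;
    return 0;
}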

How do I emulate objdump --dwarf=decodedline in .bundle files?

I've been successfully using objdump --dwarf=decodedline to find the source location of each offset in a .so file on Linux.
Unfortunately, on Mac OS X it seems that .bundle files (used as shared libraries) are not queryable in this manner.
I'm optimistic that there's something I can do, because gdb is able to correctly debug and step through code in these bundles — does anyone know what it's doing?
Further information:
The dwarfdump utility claims that the .bundle file contains no DWARF data but that it does contain STABS data; however, objdump --stabs cannot find any STABS data either.
(If it makes the question easier to answer, I don't actually need all of the offsets; being able to query the source location of any given offset would be good enough).
The bundle file I've been testing this on was generated using:
cc -dynamic -bundle -undefined suppress -flat_namespace -g -o c_location.bundle c_location.o -L. -L/Users/User/.rvm/rubies/ruby-1.8.7-p357/lib -L. -lruby -ldl -lobjc
The original c_location.o file does contain the necessary information for objdump --dwarf=decodedline to work.
So it turns out that one way to do this is to use Apple's nm -pa *.bundle to find the symbol name and the original object file for a given offset.
Once you have that, you can first use objdump -tT to find the offset of the symbol name in the original object file; and then use objdump --dwarf=decodedline as before.
Each step requires a little bit of simplistic output parsing, but it does seem to work™. I'd be interested if there are more robust approaches.
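Spelled out as commands, the pipeline looks like this (a sketch; c_location.bundle and c_location.o are the files from the question, and each step's output needs the bit of parsing mentioned above):

# 1. map a bundle offset to a symbol name and its original .o file
nm -pa c_location.bundle
# 2. find that symbol's offset inside the original object file
objdump -t c_location.o
# 3. decode line information for the object file, as on Linux
objdump --dwarf=decodedline c_location.o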

How can I tell if a library was compiled with -g?

I have some compiled libraries on x86 Linux and I want to quickly determine whether they were compiled with debugging symbols.
If you're running on Linux, use objdump --debugging. There should be an entry for each object file in the library. For object files without debugging symbols, you'll see something like:
objdump --debugging libvoidincr.a
In archive libvoidincr.a:
voidincr.o: file format elf64-x86-64
If there are debugging symbols, the output will be much more verbose.
The suggested commands
objdump --debugging libinspected.a
objdump --debugging libinspected.so
always give me the same result, at least on Ubuntu/Linaro 4.5.2:
libinspected.a: file format elf64-x86-64
libinspected.so: file format elf64-x86-64
no matter whether the archive/shared library was built with or without the -g option.
What really helped me determine whether -g was used is the readelf tool:
readelf --debug-dump=decodedline libinspected.so
or
readelf --debug-dump=line libinspected.so
This will print a set of lines consisting of source filename, line number and address if such debug info is included in the library; otherwise it will print nothing.
You may pass whatever value you find appropriate to the --debug-dump option instead of decodedline.
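When debug info is present, the output looks roughly like this (the file name, line number and address here are illustrative only):

readelf --debug-dump=decodedline libinspected.so
CU: ./inspected.c:
File name    Line number    Starting address
inspected.c           42              0x1135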
What helped is:
gdb mylib.so
It prints this when debug symbols are not found:
Reading symbols from mylib.so...(no debugging symbols found)...done.
Or when found:
Reading symbols from mylib.so...done.
None of the earlier answers gave meaningful results for me: libs without debug symbols produced lots of output, and so on.
nm -a <lib> will print all symbols from the library, including debugging symbols.
So you can compare the outputs of nm <lib> and nm -a <lib>; if they differ, your lib contains some debug symbols.
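In bash, the comparison can be done in one line with process substitution (mylib.so is a placeholder):

diff <(nm mylib.so) <(nm -a mylib.so) > /dev/null && echo "no debug symbols" || echo "debug symbols present"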
On OSX you can use dsymutil -s and dwarfdump.
Using dsymutil -s <lib_file> | more you will see source file paths in files that have debug symbols, but only the function names otherwise.
You can use objdump for this.
EDIT: From the man-page:
-W
--dwarf
Displays the contents of the DWARF debug sections in the file, if
any are present.
Answers suggesting the use of objdump --debugging or readelf --debug-dump=... don't work in the case that debug information is stored in a file separate from the binary, i.e. the binary contains a debug link section. Perhaps one could call that a bug in readelf.
The following code should handle this correctly:
# Test whether debug information is available for a given binary
has_debug_info() {
    # Match either a .debug_info section (embedded debug info) or a
    # .gnu_debuglink section (debug info kept in a separate file).
    readelf -S "$1" | grep -q -E ' \.(debug_info|gnu_debuglink) '
}
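Usage (mylib.so is a placeholder):

has_debug_info mylib.so && echo "debug info present"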
See Separate Debug Files in the GDB manual for more information.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/debugging
The command readelf -wi file is a good way to verify that debug info was compiled into your program.

Resources