What is the difference between executable files?

I have the following C program:
#include <stdio.h>
int main(void)
{
    printf("hhhh");
    return 0;
}
Commands to compile, copy and compare:
$ gcc print.c -o a.out
$ objcopy a.out b.out
$ cmp a.out b.out
I compiled this program into an executable and then used the objcopy command to make a copy of it. But when I compare the two files, I get this:
files differ: byte 41, line 1
How can I know what contents are missing?
Any help or pointers would be appreciated. Thanks!

How can I know what contents are missing?
What made you believe that any contents are missing?
The way objcopy works is:
1. Parse the contents of the input file into an internal representation.
2. Copy parts of the original file to the output file, as instructed by the options.
Nowhere does objcopy guarantee that the result will be bit-identical to the input, even when it is asked to copy the entire file.
In particular, non-loadable sections could be reordered or changed in other ways.
The difference is that the EntSize of the .init_array section is 0 bytes in a.out and 8 bytes in b.out.
An EntSize of 0 doesn't make sense for a non-empty section. If you really have such a section in your a.out, it's likely that your linker has a bug.
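To see exactly which bytes differ rather than only the first one, every mismatching offset can be listed. The following is a minimal sketch of that idea in C, not anything objcopy or cmp provides as an API; the function name count_diff_bytes is made up for illustration. It prints the 1-based offset and both byte values in octal, similar in spirit to what cmp -l prints.

```c
#include <stdio.h>

/* Hypothetical helper: prints every position where the two files differ
 * (1-based offset, then both byte values in octal) and returns the number
 * of differences, or -1 if either file cannot be opened.
 * A length mismatch is counted as one extra difference. */
long count_diff_bytes(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    long offset = 0, diffs = 0;
    int ca, cb;

    if (!fa || !fb) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    for (;;) {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca == EOF || cb == EOF)
            break;                      /* at least one file has ended */
        offset++;
        if (ca != cb) {
            printf("%ld %3o %3o\n", offset, (unsigned)ca, (unsigned)cb);
            diffs++;
        }
    }
    if (ca != cb) {                     /* exactly one file ended first */
        printf("length differs after byte %ld\n", offset);
        diffs++;
    }
    fclose(fa);
    fclose(fb);
    return diffs;
}
```

Calling count_diff_bytes("a.out", "b.out") from a small main would list the differing offsets. To map an offset back to a section, comparing the output of readelf -S a.out and readelf -S b.out is probably the quicker route.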

Related

How to remove 'bloat' from a compiled shared object?

I have a C application, built with gcc, which compiles to a shared object using the -fpic option. The intent is to create an 'executable' that can run anywhere in memory. This is how a sample C program is compiled:
./armeb-eabi-gcc -march=armv5t -mbig-endian -nostdlib -fpic -c main.c
main.c
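int main(void)
{
    /* "| 1" sets bit 0 of the address, presumably to select Thumb mode for the call */
    void (*UART)(const char *) = (void (*)(const char *))(0x594323 | 1);
    UART("Hello");
    return 0;
}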
The problem is that the compiled file has 'bloat': I am only looking for the machine code, with no symbols. I was unable to extract the exact portions with objcopy and objdump, which did absolutely nothing. The file size is around 948 bytes, which is insane for such a simple program.
Here is a snippet of the 'portion' of the file I am looking for.
(The exact highlighted parts could be skewed.)
Running
objcopy -I elf32-big -O binary main.o test.bin
gives a 64-byte file which, for some odd reason, moves part of the string to the top of the file. This makes tools like Ghidra and IDA unable to disassemble it properly.
Hopefully it can be seen that the reference to "Hello" is incorrect.

How to see the instructions inside of compiled language executable files?

When I compile a C/C++ source file, the compiler generates an executable file. How can I see the instructions in that file? What is this process called?
gcc hello.c -o hello
./hello
Here, after executing the first line, a file named 'hello' is generated. I need to see the instructions in this 'hello' file.
The executable file (a.out by default) is in binary format.
You can open it in any text editor (e.g. vi or vim) or hex editor, but you won't be able to understand the contents.
You can use some commands to get more information about what the executable file contains.
Some example commands are nm, strings and objdump.
Example:
$ nm a.out
$ strings a.out
$ objdump -xD --demangle a.out
Read their manual pages to learn more about them.

What I get after I compile the c file?

I compiled hello.c with gcc:
dele-MBP:temp ldl$ ls
a.out hello.c
Now, when I cat a.out:
$ cat a.out
??????? H__PAGEZERO?__TEXT__text__TEXTP1P?__stubs__TEXT??__stub_helper__TEXT???__cstring__TEXT??__unwind_info__TEXT?H??__DATA__nl_symbol_ptr__DATA__la_symbol_ptr__DATH__LINKEDIT ?"? 0 0h ? 8
P?
/usr/lib/dyldס??;K????t22
?*(?P
8??/usr/lib/libSystem.B.dylib&`)h UH??H?? ?E??}?H?u?H?=5??1ɉE??H?? ]Ð?%?L?yAS?%i?h?????Hello
P44?4
it shows this messy output.
I want to know what type of file a.out is. Is it assembly language? If so, why are there so many ??? or %%% characters?
There are several intermediate file formats, depending on the compiler system you use. Most systems use the following steps, shown here with GCC as an example:
1. Preprocessed C source (gcc -E test.c -o test.i), but this is before compilation, strictly speaking
2. Assembly source (gcc -S test.c -o test.s)
3. Object file containing machine code, not executable because calls to external functions are not resolved (gcc -c test.c -o test.o)
4. Executable file containing machine code (gcc test.c -o test)
Only the first two steps generate text files that you can read with cat or in a text editor. This is, by the way, a valuable source of insight. However, you can use objdump to see most of the information contained in the other formats. Please read its documentation.
Each step also performs all the steps before it. So gcc test.c -o test generates the assembly source and the object file as temporary files that are removed automatically. You can watch that process by giving GCC the -v option.
Use gcc --help to see some entry points for further investigation.
There is a lot more to say about this process, but it would fill a book.
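The __PAGEZERO and __TEXT names in the cat output above are Mach-O segment names, so that a.out is a Mach-O executable rather than assembly text. The container format of such a file can usually be recognised from its first four bytes, which is roughly what the file command does first. Below is a small sketch of that check; the function name classify_magic is hypothetical, and it only knows the ELF magic and the little-endian Mach-O magics.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the container format from the first four
 * bytes of a file. Only a few well-known magic numbers are covered. */
const char *classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char elf[4]     = { 0x7f, 'E', 'L', 'F' };
    /* MH_MAGIC (0xfeedface) and MH_MAGIC_64 (0xfeedfacf), as stored on
     * disk by a little-endian machine */
    static const unsigned char macho32[4] = { 0xce, 0xfa, 0xed, 0xfe };
    static const unsigned char macho64[4] = { 0xcf, 0xfa, 0xed, 0xfe };

    if (len < 4)
        return "unknown";
    if (memcmp(buf, elf, 4) == 0)
        return "ELF";
    if (memcmp(buf, macho32, 4) == 0)
        return "Mach-O (32-bit)";
    if (memcmp(buf, macho64, 4) == 0)
        return "Mach-O (64-bit)";
    return "unknown";
}
```

To use it, read the first four bytes of the file with fread and pass them in. In practice, running file a.out gives the same answer with far more detail.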

How to check if a macro exists in an object file in C?

For example, I define a macro:
#ifdef VERSION
//.... do something
#endif
How can I check whether VERSION exists in my object file or not? I tried to disassemble it with objdump, but found no actual value of my macro VERSION. VERSION is defined in the Makefile.
Try compiling with the -g3 option in gcc. It stores macro information in the generated ELF file as well.
After this, if you've defined a macro MACRO_NAME, just grep for it in the output executable or your object file. For example:
$ grep MACRO_NAME a.out # any object file will do instead of a.out
Binary file a.out matches
Or you can try:
$ strings -a -n 1 a.out | grep MACRO_NAME
-a          Do not scan only the initialized and loaded sections of object files; scan the whole files.
-n min-len  Print sequences of characters that are at least min-len characters long, instead of the default 4.
The following command displays the contents of the .debug_macro DWARF section:
$ readelf --debug-dump=macro path/to/binary
or
$ objdump --dwarf=macro path/to/binary
You can also use dwarfdump path/to/binary, but it's not easy to restrict its output to the .debug_macro section.
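A different approach from the ones above, offered only as a sketch: macros vanish during preprocessing, so without debug info the only way to find one in an object file is to turn its value into real data. The snippet below stringifies VERSION into a constant string. The names STR_, STR, version_tag and build_version_tag are made up for illustration, and the "unknown" fallback is an assumption for builds where the Makefile does not define VERSION.

```c
/* Fallback so the file still compiles when VERSION is not passed in
 * (assumption: the Makefile normally supplies it, e.g. via -DVERSION=...) */
#ifndef VERSION
#define VERSION unknown
#endif

/* Two-level stringification so that VERSION is expanded before '#' applies */
#define STR_(x) #x
#define STR(x) STR_(x)

/* External linkage keeps the string in the object file even if unused */
const char version_tag[] = "VERSION=" STR(VERSION);

/* Hypothetical accessor, handy for printing the tag at run time */
const char *build_version_tag(void)
{
    return version_tag;
}
```

After compiling with something like gcc -DVERSION=1.2 -c tag.c, running strings -a tag.o | grep VERSION= should show the embedded value, with no -g3 required. Note that a later link step with section garbage collection could still drop an unreferenced string.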

How to use the addr2line command in Linux?

I am trying to use the addr2line command in Unix, but every time it gives the same output: ??:0. The command I run is addr2line -e a.out 0x4005BDC. I got this address while running the a.out executable under valgrind to find memory leaks. I also compiled the source code with the -g option.
You can also use gdb instead of addr2line to examine a memory address. Load the executable file in gdb and print the name of the symbol stored at that address. See "16 Examining the Symbol Table" in the GDB manual.
(gdb) info symbol 0x4005BDC
You need to specify an offset to addr2line, not a virtual address (VA). Presumably if you had address space randomization turned off, you could use a full VA, but in most modern OSes, address spaces are randomized for a new process.
Given the VA 0x4005BDC by valgrind, find the base address of your process or library in memory. Do this by examining the /proc/<PID>/maps file while your program is running. The line of interest is the text segment of your process, which is identifiable by the permissions r-xp and the name of your program or library.
Let's say that the base VA is 0x4005000. Then you would find the difference between the valgrind-supplied VA and the base VA: 0xbdc. Then, supply that to addr2line:
addr2line -e a.out -j .text 0xbdc
And see if that gets you your line number.
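The subtraction described above can be sketched in C. This is only an illustration of the arithmetic in this answer: the function name va_minus_base is hypothetical, the sample maps line in the comment is invented, and the code assumes the line passed in is the executable (r-xp) mapping whose start address serves as the base VA.

```c
#include <stdio.h>

/* Hypothetical helper: given one line from /proc/<PID>/maps, such as
 *   "04005000-04006000 r-xp 00000000 08:01 1234 /tmp/a.out"
 * (an invented example), store va minus the mapping's start address in
 * *offset. Returns 0 on success, or -1 if the line cannot be parsed, the
 * mapping is not executable, or va lies outside the mapping. */
int va_minus_base(const char *maps_line, unsigned long va,
                  unsigned long *offset)
{
    unsigned long start, end;
    char perms[5];

    if (sscanf(maps_line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return -1;
    if (perms[2] != 'x')            /* not an executable mapping */
        return -1;
    if (va < start || va >= end)    /* address belongs to another mapping */
        return -1;
    *offset = va - start;
    return 0;
}
```

With the numbers from this answer, a base of 0x4005000 and a VA of 0x4005BDC give an offset of 0xbdc, which is the value passed to addr2line above.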
That's exactly how you use it. There is a possibility that the address you have does not correspond to something directly in your source code though.
For example:
$ cat t.c
#include <stdio.h>
int main()
{
    printf("hello\n");
    return 0;
}
$ gcc -g t.c
$ addr2line -e a.out 0x400534
/tmp/t.c:3
$ addr2line -e a.out 0x400550
??:0
0x400534 is the address of main in my case. 0x400408 is also a valid function address in a.out, but it's a piece of code generated/imported by GCC, that has no debug info. (In this case, __libc_csu_init. You can see the layout of your executable with readelf -a your_exe.)
addr2line will also fail if the address lies in a library that has no debug information.
Try adding the -f option to show the function names:
addr2line -f -e a.out 0x4005BDC

Resources