READELF - How to add "@GLIBX_XXX" after symbol name - c

I'm learning ELF and was given the task of writing a custom READELF program in C on 32-bit Linux.
As part of the task, I created pointers to the '.symtab' and '.strtab' tables so that I could print each symbol name. However, when comparing my output with the original READELF output, I noticed that mine lacks the "@..." part after some of the symbol names.
Where can I find this data?
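For reference, the step described in the question (walking '.symtab' and resolving each name through '.strtab') might look roughly like the sketch below. It assumes the caller has already located both tables in a mapped ELF32 file and computed the entry count as sh_size / sh_entsize from the '.symtab' section header. The function names symbol_name and print_symbol_names are made up for illustration. It prints only the plain names and does not attempt to produce the version suffix.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Return the name of one symbol, or NULL if st_name points outside .strtab.
   (Assumes .strtab itself ends with a NUL byte, as a well-formed one does.) */
const char *symbol_name(const Elf32_Sym *sym, const char *strtab,
                        size_t strtab_size)
{
    if (sym->st_name >= strtab_size)
        return NULL;
    return strtab + sym->st_name;
}

/* Print "index: name" for every entry; return how many names were printed. */
size_t print_symbol_names(const Elf32_Sym *symtab, size_t count,
                          const char *strtab, size_t strtab_size)
{
    size_t printed = 0;

    for (size_t i = 0; i < count; i++) {
        const char *name = symbol_name(&symtab[i], strtab, strtab_size);
        if (name == NULL)
            continue;               /* skip entries with a bad name offset */
        printf("%zu: %s\n", i, name);
        printed++;
    }
    return printed;
}
```

Entry 0 of a symbol table is the reserved null symbol, so its name normally resolves to the empty string at offset 0.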

Related

How to get main function address from dynamically added library .so?

I have dynamically loaded a shared library into a target process. The target process has a function
void printNum(). Is it possible to get its address in the shared library?
Actually, I need the start address of the .text segment plus the function offset. In the target process I can get it with &printNum; is it possible to do the same in the shared library?
Is it possible to get its address in the shared library?
Maybe.
It's easy if the main executable exports the symbol in its dynamic symbol table. If so, you can access its address using the same &printNum syntax.
To see whether the main executable exports the symbol, use nm -D a.out | grep ' printNum'.
If the main executable doesn't export the symbol, and you try to access it from the foo.so with &printNum, your dlopen("foo.so", ...) will fail with "printNum: unresolved symbol" or a similar error.
If you can't rebuild the main executable with e.g. the -rdynamic flag, things get trickier.
If the main executable has a symbol table (i.e. it is not stripped), the function will be visible in the output from nm a.out. You can read the symbol table using this code, and obtain the address of printNum from it. If a.out is position-independent, you will also need to find the address where it is loaded, using e.g. dl_iterate_phdr().
If the main executable doesn't have a symbol table (i.e. it is stripped), then there is no way to find where printNum actually is, and the answer in that case is "no".
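The dl_iterate_phdr() step can be sketched as follows. This is a minimal illustration that assumes glibc on Linux, where the first object passed to the callback is the main program. The helper names record_main_base, main_load_bias and runtime_address are made up for this example, and the st_value passed to runtime_address is assumed to come from the symbol table you read separately.

```c
#define _GNU_SOURCE   /* must precede all includes; exposes dl_iterate_phdr() */
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Callback: the first object reported is the main program, so record its
   load bias and return non-zero to stop the iteration. */
static int record_main_base(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(uintptr_t *)data = (uintptr_t)info->dlpi_addr;
    return 1;
}

/* Store the main executable's load bias in *bias; return 1 on success. */
int main_load_bias(uintptr_t *bias)
{
    /* dl_iterate_phdr() returns whatever the last callback call returned. */
    return dl_iterate_phdr(record_main_base, bias) == 1;
}

/* Convert a symbol-table value (st_value) into a run-time address. */
uintptr_t runtime_address(uintptr_t bias, uintptr_t st_value)
{
    return bias + st_value;
}
```

For an executable that is not position-independent the bias is 0, so the symbol value is already the run-time address.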

How to check the values of a struct from an image/binary file?

Is there any way I can look into the values of a structure after compilation? objdump -td gives the function definitions and only the address where the structure is stored. The problem is that I am getting a wrong address for one of the threads/functions in a structure when I run the program. The target MCU is the LPC1347 (ARM Cortex-M3).
objdump parses object files (products of the compiler), which are relocatable (not executable) ELF files. At this stage, there is no such notion as the memory address these compiled pieces will run at.
You have the following possibilities:
Link your *.obj files into the final non-stripped (-g passed to compiler) executable ELF image and parse it using readelf.
Generate the linker map file by adding -Wl,-Map,file.map to your LDFLAGS and see the output sections and addresses your data is located at in the map file.
Use a debugger/gdb.

binutils - kernel - "_binary" meaning?

I am reading xv6 lectures.
I have a file named initcode.S that is to be linked in the kernel.
Now two symbols are created that way :
extern char _binary_initcode_start[], _binary_initcode_size[];
inside a function.
The lecture says :
as part of the kernel build process, the linker embeds that binary and defines two special symbols, _binary_initcode_start and _binary_initcode_size, indicating the location and size of the binary.
I understand that binutils is getting the address and the size of this assembled code.
I wonder about the notation: is it a default? My searches didn't settle that clearly.
_binary -> it is originally an assembly code
_initcode -> the name of my file
_start -> the parameter i am interested in.
It would imply that any compiled assembly code would have those variables too.
I have no proof of that, though.
The question is:
Is _binary_myAsmFileHere_myParameterHere the default naming scheme binutils gives to an assembly file to export its address, size and so on?
Could someone tell me whether my assumption is right or, better still, what the actual rule is?
Thanks
Strangely enough, it doesn't seem to be documented in the ld manual. However, man objcopy does say this:
You can access this binary data inside a program by referencing the
special symbols that are created by the conversion process. These
symbols are called _binary_objfile_start, _binary_objfile_end and
_binary_objfile_size. e.g. you can transform a picture file into an object file and then access it in your code using these symbols.
Apparently the same logic is used by ld when embedding binary files.
Notice that the Makefile for xv6 contains this line for linking the kernel:
$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother
As you can see, it uses -b binary to embed the files initcode and entryother, so the above symbols will be defined during this process.
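To make the naming rule concrete, here is a small sketch that derives the symbol name from the input file name. It relies on the behaviour that characters in the file name which are not alphanumeric are replaced by underscores when the symbols are created. The function binary_symbol_name is a made-up helper for illustration, not part of binutils.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Write "_binary_<mangled file name>_<suffix>" into out.
   suffix is "start", "end" or "size".
   Returns 0 on success, -1 if out is too small. */
int binary_symbol_name(const char *filename, const char *suffix,
                       char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Mangle only the file-name part: non-alphanumerics become '_'. */
    char *p = out + strlen("_binary_");
    size_t len = strlen(filename);
    for (size_t i = 0; i < len; i++) {
        if (!isalnum((unsigned char)p[i]))
            p[i] = '_';
    }
    return 0;
}
```

With this rule, the file initcode yields _binary_initcode_start, and a hypothetical input such as data/logo.png would yield _binary_data_logo_png_start.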
On some targets, the C compiler prepends a '_' to every symbol name it emits, so a .global symbol defined in an assembly file needs that leading '_' for the linker to match it with the name used in the C file. ELF targets such as Linux do not add this prefix.

How to set the section number of a symbol when compiling ELF binary?

The test is on 32-bit Linux, x86.
Suppose in my assembly program final.s, I have to load some library symbols, say, stdin@@GLIBC_2.0, and I want to load these symbols at a fixed address.
So following instructions in this question, I did this:
echo "\"stdin@@GLIBC_2.0\" = 0x080a7390;" > symbolfile
echo "\"stdin@GLIBC_2.0 (4)\" = 0x080a7390;" >> symbolfile
gcc -Wl,--just-symbols=symbolfile final.s -g
And when I checked the output of symbol table, I got this:
readelf -s a.out | grep stdin
53: 080a7390 4 OBJECT GLOBAL DEFAULT ABS stdin@@GLIBC_2.0
17166: 080a7390 0 NOTYPE GLOBAL DEFAULT ABS stdin@GLIBC_2.0 (4)
And comparing this with a common ELF binary that requires the stdin symbol:
readelf -s hello.out | grep stdin
17199: 0838b8c4 4 OBJECT GLOBAL DEFAULT 25 stdin@@GLIBC_2.0
52: 0838b8c4 4 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.0 (4)
So an obvious difference is the Ndx column, that is, the section number: for my fixed-position symbols it is ABS. Please check the references here.
When I execute a.out, it crashes with a segmentation fault.
So my question is, how to set the section number of the symbol at a fixed position?
I want to load these symbols in a fixed address.
You are importing these symbols from GLIBC. Unless you are doing a fully-static linking, you get no say in what address these symbols end up at.
So my question is, how to set the section number of the symbol
That question makes no sense: section number itself is meaningless and 25 may refer to .bss in one executable, but to .text in another.
Your section 25 just happens to be .bss on this particular system and for this particular build. Try building a fully-static binary, and you are likely to see section 24 instead.
Anyway, a normal executable gets stdin copied from libc.so.6. You will do well to read this description of the process, and pay special attention to "Extra credit #2: Referencing shared library data from the executable" section.
But it may be easier to understand the fully-static case first.
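As background for the Ndx column discussed above: the value comes from the st_shndx field of each symbol. A few reserved values have fixed meanings, while any other value is just an index into that particular file's section header table. The sketch below shows one way to format it. The function name ndx_to_string is made up for this example, and the labels are the ones seen in readelf output.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Format a symbol's st_shndx the way the Ndx column is usually shown.
   Reserved indices get a fixed label; anything else is printed as a number
   whose meaning depends on the section header table of that one file. */
const char *ndx_to_string(Elf32_Half shndx, char *buf, size_t buf_size)
{
    switch (shndx) {
    case SHN_UNDEF:
        return "UND";   /* undefined: must be resolved from elsewhere */
    case SHN_ABS:
        return "ABS";   /* absolute value, not relative to any section */
    case SHN_COMMON:
        return "COM";   /* unallocated common block */
    default:
        snprintf(buf, buf_size, "%u", (unsigned)shndx);
        return buf;
    }
}
```

This is why 25 can mean .bss in one executable and something else in another: the number only identifies a row in that file's own section table.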

How to find out *.c and *.h files that were used to build a binary?

I am building a project that builds multiple shared libraries and executable files. All the source files that are used to build these binaries are in a single /src directory. So it is not easy to figure out which source files were used to build each of the binaries (there is a many-to-many relation).
My goal is to write a script that would parse a set of C files for each binary and make sure that only the right functions are called from them.
One option seems to be to try to extract this information from Makefile. But this does not work well with generated files and headers (due to dependence on Includes).
Another option could be to simply browse call graphs, but this would get complicated, because a lot of functions are called by using function pointers.
Any other ideas?
You can first compile your project with debug information (gcc -g) and use objdump to get which source files were included.
objdump -W <some_compiled_binary>
Dwarf format should contain the information you are looking for.
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
< c> DW_AT_producer : (indirect string, offset: 0x5f): GNU C 4.4.3
<10> DW_AT_language : 1 (ANSI C)
<11> DW_AT_name : (indirect string, offset: 0x28): test_3.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x36): /home/auselen/trials
<19> DW_AT_low_pc : 0x82f0
<1d> DW_AT_high_pc : 0x8408
<21> DW_AT_stmt_list : 0x0
In this example, the object file was compiled from test_3.c, which was located in the .../trials directory. Then of course you need to write some script around this to collect the related source file names.
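The "script around this" could start from something like the following line scanner, written here in C. It is only a sketch that assumes the dump looks like the excerpt above: a line containing "Abbrev Number:" starts a new entry, and the value of an attribute is whatever follows the last ": " on its line. The function name scan_dwarf_line is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Feed one line of the dump at a time. *in_cu tracks whether the current
   entry is a DW_TAG_compile_unit. Returns 1 and fills out with the
   DW_AT_name value when the line names a compile unit's source file,
   0 otherwise. */
int scan_dwarf_line(int *in_cu, const char *line, char *out, size_t out_size)
{
    if (out_size == 0)
        return 0;

    if (strstr(line, "Abbrev Number:") != NULL) {
        /* A new entry starts here; remember whether it is a compile unit. */
        *in_cu = strstr(line, "(DW_TAG_compile_unit)") != NULL;
        return 0;
    }
    if (!*in_cu || strstr(line, "DW_AT_name") == NULL)
        return 0;

    /* The value is whatever follows the last ": " on the line. */
    const char *value = NULL;
    for (const char *p = strstr(line, ": "); p != NULL; p = strstr(p + 2, ": "))
        value = p + 2;
    if (value == NULL)
        return 0;

    snprintf(out, out_size, "%s", value);
    out[strcspn(out, "\r\n")] = '\0';   /* drop a trailing newline, if any */
    return 1;
}
```

Tracking the current entry matters because DW_AT_name also appears on functions and variables, not only on compile units.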
First you need to separate the debug symbols from the binary you just compiled. Check this question on how to do so:
How to generate gcc debug symbol outside the build target?
Then you can try to parse this file on your own. I know how to do so for Visual Studio but as you are using GCC I won't be able to help you further.
Here is an idea that you will need to refine based on your specific build. Make a build and log it using script (for example script log.txt make clean all). The last (or one of the last) steps should be the linking of object files. (Tip: look for cc -o <your_binary_name>). That line should link all .o files, which should have corresponding .c files in your tree. Then grep those .c files for all the included header files.
If you have duplicate names in your .c files in your tree, then we'll need to look at the full path in the linker line or work from the Makefile.
What Mahmood suggests below should work too. If you have an image with symbols, strings <debug_image> | grep <full_path_of_src_directory> should give you a list of C files.
You can use the Unix nm tool. It shows all symbols that are defined in the object. So you need to:
1. Run nm on your binary and grab all undefined symbols.
2. Run ldd on your binary to grab the list of all its dynamic dependencies (the .so files your binary is linked to).
3. Run nm on each .so file you found in step 2.
That will give you the full list of dynamic symbols that your binary uses.
Example:
nm -C --dynamic /bin/ls
....skipping.....
00000000006186d0 A _edata
0000000000618c70 A _end
U _exit
0000000000410e34 T _fini
0000000000401d88 T _init
U _obstack_begin
U _obstack_newchunk
U _setjmp
U abort
U acl_extended_file
U bindtextdomain
U calloc
U clock_gettime
U closedir
U dcgettext
U dirfd
All the symbols marked with a capital "U" are undefined in ls itself, i.e. ls uses them and expects a shared library to provide them.
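Step 1 (grabbing the undefined symbols) could be automated with a small filter over the nm output. The C sketch below assumes the layout shown in the example: an optional address, a one-letter type, then the name. The function name nm_undefined_name is made up for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of nm output. Returns 1 and copies the symbol name into
   name when the symbol type is 'U' (undefined), 0 otherwise. */
int nm_undefined_name(const char *line, char *name, size_t name_size)
{
    const char *p = line;
    const char *tok;

    if (name_size == 0)
        return 0;

    while (isspace((unsigned char)*p))
        p++;
    tok = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;

    if (p - tok > 1) {
        /* First field was longer than one character, so it was the
           address; the type letter is the next field. */
        while (isspace((unsigned char)*p))
            p++;
        tok = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    if (p - tok != 1 || *tok != 'U')
        return 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;

    /* The rest of the line is the name (demangled names may contain spaces). */
    snprintf(name, name_size, "%s", p);
    name[strcspn(name, "\r\n")] = '\0';
    return 1;
}
```

Feeding it each line read from nm and collecting the names for which it returns 1 gives the list to look up in the libraries reported by ldd.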
If your goal is to analyze C source files, you can do that by customizing the GCC compiler. You could use MELT for that purpose (MELT is a high-level domain specific language to extend GCC) -adding your own analyzing passes coded in MELT inside GCC-, but you should first learn about GCC middle-end internal representations (Gimple, Tree, ...).
Customizing GCC takes several days of work (mostly because GCC internals are quite complex in the details).
Feel free to ask me more about MELT.
