How to set the section number of a symbol when compiling an ELF binary?

The test is on 32-bit Linux, x86.
Suppose my assembly program final.s has to load some library symbols, say stdin@@GLIBC_2.0, and I want these symbols loaded at a fixed address.
So following instructions in this question, I did this:
echo "\"stdin@@GLIBC_2.0\" = 0x080a7390;" > symbolfile
echo "\"stdin@GLIBC_2.0 (4)\" = 0x080a7390;" >> symbolfile
gcc -Wl,--just-symbols=symbolfile final.s -g
And when I checked the output of symbol table, I got this:
readelf -s a.out | grep stdin
53: 080a7390 4 OBJECT GLOBAL DEFAULT ABS stdin@@GLIBC_2.0
17166: 080a7390 0 NOTYPE GLOBAL DEFAULT ABS stdin@GLIBC_2.0 (4)
Compare that with an ordinary ELF binary that uses the stdin symbol:
readelf -s hello.out | grep stdin
17199: 0838b8c4 4 OBJECT GLOBAL DEFAULT 25 stdin@@GLIBC_2.0
52: 0838b8c4 4 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.0 (4)
So one obvious difference is the Ndx column: the section number of my fixed-position symbols is ABS. Please check the references here.
When executing the a.out, it throws a segmentation fault error.
So my question is, how to set the section number of the symbol at the fixed position?

I want to load these symbols in a fixed address.
You are importing these symbols from GLIBC. Unless you are linking fully statically, you get no say in what address these symbols end up at.
So my question is, how to set the section number of the symbol
That question makes no sense: section number itself is meaningless and 25 may refer to .bss in one executable, but to .text in another.
Your section 25 just happens to be .bss on this particular system and for this particular build. Try building a fully-static binary, and you are likely to see section 24 instead.
Anyway, a normal executable gets stdin copied from libc.so.6. You will do well to read this description of the process, and pay special attention to "Extra credit #2: Referencing shared library data from the executable" section.
But it may be easier to understand the fully-static case first.

Related

Why are some relocations .text + addend instead of symbol's name + addend?

Why are some relocation entries in an ELF file symbol name + addend while others are section + addend? I am looking to clear up some confusion and gain a deeper understanding of ELFs. Below is my investigation.
I have a very simple C file, test.c:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
static void func1(void)
{
fprintf(stdout, "Inside func1\n");
}
// ... a couple other simple *static* functions
int main (void)
{
func1();
// ... call some other functions
exit(EXIT_SUCCESS);
}
I then compile this into an object file with:
clang -O0 -Wall -g -c test.c -o test.o
If I look at the relocations with readelf -r test.o, I see the entries that refer to my static functions as follows (this one is picked from the .rela.debug_info section):
    Offset             Info             Type          Symbol's Value   Symbol's Name + Addend
...
000000000000006f  0000000400000001 R_X86_64_64    0000000000000000 .text + b0
...
Why are these functions referred to as section + addend rather than symbol name + addend? I see entries for the functions in the .symtab using readelf -s test.o:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
     2: 00000000000000b0    31 FUNC    LOCAL  DEFAULT    2 func1
...
Additionally, when I disassemble the object file (via objdump -d), I see that the functions are there and weren't optimized into main or anything.
If I don't make the functions static and then look at the relocations, I see the same as before when the type is R_X86_64_64, but I also see entries that use the symbol name plus an addend with type R_X86_64_PC32. So for example in .rela.text:
    Offset             Info             Type          Symbol's Value   Symbol's Name + Addend
...
00000000000000fe  0000001200000002 R_X86_64_PC32  0000000000000000 func1 + 1c
...
Please let me know if more examples/readelf output would be helpful. Thank you for taking the time to read this.
Why are these functions referred to as section + addend rather than symbol name + addend?
The function names for static functions are not guaranteed to be present at link time. You could remove them with e.g. objcopy --strip-unneeded or objcopy --strip-symbol, and the result will still link.
I see entries for the functions in the .symtab using readelf -s test.o
I believe the only reason they are kept is to help debugging, and they are not used by the linker at all. But I have not verified this by looking at linker source, and so did not answer this related question.
Eli Bendersky also mentions this in a blog post. From the section titled "Extra credit: Why was the call relocation needed?":
In short, however, when ml_util_func is global, it may be overridden in the executable or another shared library, so when linking our shared library, the linker can't just assume the offset is known and hard-code it [12]. It makes all references to global symbols relocatable in order to allow the dynamic loader to decide how to resolve them. This is why declaring the function static makes a difference - since it's no longer global or exported, the linker can hard-code its offset in the code.
The full post should be read to get complete context, but I thought I would share it here as it presents better examples than in my question and reinforces the solution that Employed Russian gave.

READELF - How to add "@GLIBX_XXX" after symbol name

I'm learning ELF, and was given a task to create a custom READELF program in C on 32-bit Linux.
As part of my task, I created pointers to the '.symtab' and the '.strtab' tables, so I could print each symbol name. However, when comparing my output to the original READELF output, I noticed that I'm missing the "@..." part after some of the symbol names.
Where can I find this data?

How to print message to stdout from GNU ld script?

I have quite a large ld linker script for an embedded platform that is low on RAM and ROM. I want to know how much memory is left available after I have relocated all the code. Actually, I want to print out the value of the location counter . to stdout.
How can I do it? Is there some magic command like print(.)?
I have a post-link step in my projects that dumps the size of stuff so I can see how close I'm getting. Just add something along the lines of:
arm-none-eabi-size binary_image.axf
That will get you output like:
   text    data     bss     dec     hex filename
 204808     704   23188  228700   37d5c Foo.axf
On my cortex-m3 chip, this would be text+data = flash usage, data+bss = ram usage. dec/hex are useless values.
And as Olaf says, use a map file for more specific memory consumption. I have this added to my link step:
-Xlinker -Map=Foo.map
Another solution might be to add the following command to the linker:
-Xlinker --print-memory-usage
This gives me the following output:
Memory region         Used Size  Region Size  %age Used
    m_interrupts:         576 B        576 B    100.00%
          m_text:       22988 B      32192 B     71.41%
          m_data:       26552 B        32 KB     81.03%
Read the manual. There are no such commands - there cannot be.
Linker "scripts" are actually more like configuration/descriptor files. They are not "executed" like a script. There is also no single value of . (how could there be, with different memory areas?).
You can, however, output a map which might exactly be what you need. Try option -M. If you have set up the memory regions in the linker script correctly, the linker will warn if some memory area overflows, which is actually what you want for automatic builds.
Update: You could grep/filter the map file if you want to insist seeing the section sizes on each build.
You can't print the value of a symbol while the script is being executed, but you can create a symbol and then look it up afterwards with nm. Like this:
value_of_dot = .;
Then
nm my_file.elf | grep value_of_dot
Edit: If you really want it printed to stdout you would have to modify the linker. E.g. for lld, add printfs in LinkerScript.cpp in LinkerScript::assignSymbol().
For your particular use-case of checking how much memory is used, it is probably better to use size, as escrafford suggested, or objdump --section-headers.

Find start point, int main()

I am currently compiling a purchased data stack written in C. I use the vendor's own tool to compile it, which uses gcc in the background. I can pass flags and parameters to gcc as I see fit. I want to know which file main() comes from, that is, which file in the project is the starting point. Is there any way to tell gcc to generate a list of files, or similar, given that I don't know which file main() is being taken from? Thank you.
You can disassemble the final executable to find the starting point. Although you have not provided any additional info to help you more. I'm using a sample code to demonstrate the process.
#include <stdio.h>
int main() {
printf("hello world\n");
return 0;
}
Now the object main.o has the following symbols:
[root@s1 sf]# gcc -c main.c
[root@s1 sf]# nm main.o
0000000000000000 T main
                 U puts
You can see that main still has address 0, because its final address is only assigned at the link stage. Now, after linking:
$ gcc main.o
$ nm a.out
                 U __libc_start_main@@GLIBC_2.2.5
0000000000600874 A _edata
0000000000600888 A _end
00000000004005b8 T _fini
0000000000400390 T _init
00000000004003e0 T _start
000000000040040c t call_gmon_start
0000000000600878 b completed.6347
0000000000600870 W data_start
0000000000600880 b dtor_idx.6349
00000000004004a0 t frame_dummy
00000000004004c4 T main
You see that main has an address now. But that is not the whole story, because main is called by the C runtime. You can see which library resolves the undefined __libc_start_main@@GLIBC_2.2.5:
[root@s1 sf]# ldd a.out
linux-vdso.so.1 => (0x00007fff61de1000) /* the linux system call interface */
libc.so.6 => /lib64/libc.so.6 (0x0000003c96000000) /* libc runtime, this will invoke your main */
/lib64/ld-linux-x86-64.so.2 (0x0000003c95c00000) /* dynamic loader */
Now you can verify this by viewing the disassembly :
00000000004003e0 <_start>:
..........
4003fd: 48 c7 c7 c4 04 40 00 mov rdi,0x4004c4 /* address of start of main */
400404: e8 bf ff ff ff call 4003c8 <__libc_start_main@plt> /* this will set up the environment for main, like pushing argc and argv to stack */
...........
If you don't have the source, you can search the executable for references to __libc_start_main, main, or _start to see how the executable is initialized and how main gets started.
All of this applies when linking is done with the default linker script. Many big projects use their own linker script. If your project has a custom linker script, finding the start point will depend on that script. There are also projects that do not use glibc's runtime. In that case, it is still possible to find the start point by digging through the object files, library archives, etc.
If your binary is stripped of symbols, you have to rely on your assembly skills to find where it starts.
I've assumed that you don't have the source, that is, the stack is distributed as some libraries and header definitions only (a common practice among commercial software vendors).
But if you have the source, then it's trivial: just grep your way through it. Some answers already pointed that out.
Where main() is called from is implementation-dependent. Using GCC, it will most likely be called from a stub object file in /usr/lib named crt0.o or crt1.o. (This file contains the OS-dependent symbol that is invoked automatically when your app is loaded into memory. On Linux this symbol is called _start; on Mac OS X it is start.)
You can use objdump -t to list symbols from object files. So assuming you are on Linux, and also assuming that the object files are still around somewhere, you can do this:
find -name '*.o' -print0 \
| xargs -0 objdump -t \
| awk '/\.o:/{f=$1} /\.text\.main/{print f, $6}'
This will print a list of object files and the references to main they contain. Usually there should be a simple map from object files to source files. If there are multiple object files containing that symbol, then it depends on which one of those actually got linked into the binary you're looking at, as there can be no more than one main per executable binary (except perhaps for some really exotic black magic).
After the application is linked and debugging symbols are stripped, there usually is no indication from which source file a specific function came. The exception to this are files which include the function names as string literals, e.g. using the __FILE__ macro. Before stripping debugging symbols, you might use the debugger to obtain that information. If debugging symbols are included, that is.

How to 'link' object file to executable/compiled binary?

Problem
I wish to inject an object file into an existing binary. As a concrete example, consider a source Hello.c:
#include <stdlib.h>
int main(void)
{
return EXIT_SUCCESS;
}
It can be compiled to an executable named Hello through gcc -std=gnu99 -Wall Hello.c -o Hello. Furthermore, now consider Embed.c:
void func1(void)
{
}
An object file Embed.o can be created from this through gcc -c Embed.c. My question is how to generically insert Embed.o into Hello in such a way that the necessary relocations are performed, and the appropriate ELF internal tables (e.g. symbol table, PLT, etc.) are patched properly?
Assumptions
It can be assumed that the object file to be embedded has its dependencies statically linked already. Any dynamic dependencies, such as the C runtime can be assumed to be present also in the target executable.
Current Attempts/Ideas
Use libbfd to copy sections from the object file into the binary. The progress I have made with this is that I can create a new object with the sections from the original binary and the sections from the object file. The problem is that since the object file is relocatable, its sections can not be copied properly to the output without performing the relocations first.
Convert the binary back to an object file and relink with ld. So far I tried using objcopy to perform the conversion objcopy --input elf64-x86-64 --output elf64-x86-64 Hello Hello.o. Evidently this does not work as I intend since ld -o Hello2 Embed.o Hello.o will then result in ld: error: Hello.o: unsupported ELF file type 2. I guess this should be expected though since Hello is not an object file.
Find an existing tool which performs this sort of insertion?
Rationale (Optional Read)
I am making a static executable editor, where the vision is to allow the instrumentation of arbitrary user-defined routines into an existing binary. This will work in two steps:
The injection of an object file (containing the user-defined routines) into the binary. This is a mandatory step and can not be worked around by alternatives such as injection of a shared object instead.
Performing static analysis on the new binary and using this to statically detour routines from the original code to the newly added code.
I have, for the most part, already completed the work necessary for step 2, but I am having trouble with the injection of the object file. The problem is definitely solvable given that other tools use the same method of object injection (e.g. EEL).
If it were me, I'd look to create Embed.c into a shared object, libembed.so, like so:
gcc -Wall -shared -fPIC -o libembed.so Embed.c
That should create a relocatable shared object from Embed.c. With that, you can force your target binary to load this shared object by setting the environment variable LD_PRELOAD when running it (see more information here):
LD_PRELOAD=/path/to/libembed.so Hello
The "trick" here will be to figure out how to do your instrumentation, especially considering it's a static executable. There, I can't help you, but this is one way to have code present in a process' memory space. You'll probably want to do some sort of initialization in a constructor, which you can do with an attribute (if you're using gcc, at least):
void __attribute__ ((constructor)) my_init()
{
// put code here!
}
Assuming the source code for the first executable is available and it is compiled with a linker script that allocates space for later object file(s), there is a relatively simpler solution. Since I am currently working on an ARM project, the examples below are compiled with the GNU ARM cross-compiler.
Primary source code file, hello.c
#include <stdio.h>
int main ()
{
return 0;
}
is built with a simple linker script allocating space for an object to be embedded later:
SECTIONS
{
.text :
{
KEEP (*(embed)) ;
*(.text .text*) ;
}
}
Like:
arm-none-eabi-gcc -nostartfiles -Ttest.ld -o hello hello.c
readelf -s hello
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
     5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
     6: 00000000     0 FILE    LOCAL  DEFAULT  ABS
     7: 00000000    28 FUNC    GLOBAL DEFAULT    1 main
Now let's compile the object to be embedded, whose source is in embed.c:
void func1()
{
/* Something useful here */
}
Recompile with the same linker script this time inserting new symbols:
arm-none-eabi-gcc -c embed.c
arm-none-eabi-gcc -nostartfiles -Ttest.ld -o new_hello hello embed.o
See the results:
readelf -s new_hello
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
     5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
     6: 00000000     0 FILE    LOCAL  DEFAULT  ABS
     7: 00000000     0 FILE    LOCAL  DEFAULT  ABS embed.c
     8: 0000001c     0 NOTYPE  LOCAL  DEFAULT    1 $a
     9: 00000000     0 FILE    LOCAL  DEFAULT  ABS
    10: 0000001c    20 FUNC    GLOBAL DEFAULT    1 func1
    11: 00000000    28 FUNC    GLOBAL DEFAULT    1 main
The problem is that .o files are not fully linked yet, and most references are still symbolic. Binaries (shared libraries and executables) are one step closer to finally linked code.
Linking to a shared library does not mean you must load it via the dynamic library loader. The suggestion is rather that writing your own loader for a binary or shared library might be simpler than writing one for a .o file.
Another possibility would be to customize the linking process yourself: call the linker and have it link the code to be loaded at some fixed address. You might also look at how e.g. bootloaders are prepared, which also involves a basic linking step to do exactly this (fix a piece of code to a known load address).
If you don't link to a fixed address and want to relocate at runtime, you will have to write a basic linker that takes the object file and relocates it to the destination address by applying the appropriate fixups.
I assume you already have it, seeing it is your master's thesis, but this book: http://www.iecc.com/linker/ is the standard introduction about this.
You must make room for the relocatable code to fit in the executable by extending the executable's text segment, just like a virus infection. Then, after writing the relocatable code into that space, update the symbol table by adding symbols for anything in that relocatable object, and then apply the necessary relocation computations. I've written code that does this pretty well with 32-bit ELF files.
You cannot do this in any practical way. The intended solution is to make that object into a shared lib and then call dlopen on it.
Have you looked at the DyninstAPI? It appears support was recently added for linking a .o into a static executable.
From the release site:
Binary rewriter support for statically linked binaries on x86 and x86_64 platforms
