I'm still struggling to understand relocation entries in relocatable object files. Let's say I have this simple C program:
//main1.c
void functionTest(void);
void functionTest(void){
...
}
int main()
{
functionTest();
return 0;
}
My questions are:
Q1. Since everything is known within main1.c, there are no relocation entries in the .rel.text or .rel.data sections of main1.o. Is my understanding correct?
Q2. Below is a picture that illustrates how a DLL works.
For libc.so, everything is known (it has all the definitions, just like main1), so why are there still relocation entries in libc.so? I can understand that the symbol table information needs to be copied, because it exists, but how can you copy something that doesn't exist?
Q3. Let's say the relocation entry structure is as follows:
typedef struct {
    int offset;     /* Offset of the reference to relocate */
    int symbol:24,  /* Symbol the reference should point to */
        type:8;     /* Relocation type */
} Elf32_Rel;
My understanding is that main2.o already contains a relocation entry for printf(): the offset will be something like 8 or 9 bytes from the start of the calling function, the symbol will be 'printf', and the type will be R_386_PC32. So if another entry needs to be copied from libc.so to main2.o, what's the structure of that relocation entry?
Q1: Yes, if you compile the main1.c in your question, it will build without the need to link in anything, because it's not using functions that are defined elsewhere.
Q2: That diagram won't apply to building main1.c, because main1.c does not use external functions. But in a program that does have a call to, say, printf(), here's what's going on: the diagram shows that relocation entries about libc.so are being placed into main2.o. You ask "why are there still relocation entries in libc.so?", but the relocation entries are not being put into libc.so; they are being put into main2.o, and they refer to things in libc.so.
Q2 follow-up #1: When you say "for libc.so, everything is known", that is true only within libc.so. Anything that uses a function defined in libc.so will not know how that function is defined until linking takes place. That's the function of ld: to copy reference info from a library like libc.so into a program being built, like main2 in the diagram. The reference info allows the loader that starts main2 to also load libc.so into memory in such a way that execution can flow from the main2 code over to the libc.so code and back to main2 wherever main2 calls a function whose definition / code resides in libc.so.
Q3: I think the best way to put it is this: The information that is used to populate the relocation structure within main2.o comes from libc.so. Where I say that relocation entries are copied from libc.so, that's what I mean: information about the target (e.g. printf()) is taken from libc.so and used to provide values for the relocation entry in main2.o whose purpose is to tell the loader where to load the code for printf() from.
Q3 follow-up #1: There is another sense in which libc.so has relocation entries: whatever built libc.so added relocation entries to libc.so, so that anything that wants to use its (exportable) functions and variables can do so. These don't need to be copied anywhere. Part of building an object file is to create information for internal things that other programs might use, and part of building a program is to populate the information about external things that it makes use of. But the diagram looks to me like it's only meant to show that information about libc.so and libvector.so is added to main2.o so that the loader can load all the needed code into memory when main2 is executed.
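To make "providing values for the relocation entry" a little more concrete, here is a minimal sketch of how a linker might apply one R_386_PC32 entry once it knows where the target symbol ends up. It is only an illustration under stated assumptions: rel32_t, REL_SYM, REL_TYPE and apply_pc32 are hypothetical names, the entry uses the packed r_info form (symbol index in the high 24 bits, type in the low 8) rather than the bit-field layout quoted in the question, and the host is assumed to have the same byte order as the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit REL entry with a packed info word. */
typedef struct {
    uint32_t r_offset;  /* offset of the reference inside the section */
    uint32_t r_info;    /* symbol index << 8 | relocation type */
} rel32_t;

#define REL_SYM(info)  ((info) >> 8)
#define REL_TYPE(info) ((info) & 0xffu)
#define TYPE_PC32      2u   /* numeric value of R_386_PC32 */

/* Patch one PC-relative 32-bit reference and return the value written.
   section      : bytes of the section being relocated
   section_addr : run-time address chosen for that section
   sym_addr     : run-time address chosen for the target symbol */
uint32_t apply_pc32(uint8_t *section, uint32_t section_addr,
                    const rel32_t *rel, uint32_t sym_addr)
{
    uint32_t addend, place, value;

    assert(REL_TYPE(rel->r_info) == TYPE_PC32);

    /* REL entries keep the addend in the bytes being patched. */
    memcpy(&addend, section + rel->r_offset, sizeof addend);

    place = section_addr + rel->r_offset;   /* address of the reference */
    value = sym_addr + addend - place;      /* S + A - P */

    memcpy(section + rel->r_offset, &value, sizeof value);
    return value;
}
```

For a call instruction whose operand starts at offset 7 and initially holds -4, a section placed at 0x08048000 and a target at 0x08048100, the patched operand becomes 0xf5, which is the distance from the instruction after the call to the target.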
A shared object, such as glibc, when compiled appropriately, defines many symbols, such as main_arena, that are not normally used by other programs (although they can be seen in objdump and gdb), but are defined, with their addresses, as local symbols:
$ objdump -t ../.glibc/glibc_2.30_no-tcache/libc.so.6 | grep main_arena
00000000003b4b60 l O .data 0000000000000898 main_arena
Yet, when I reference one of these in C (via extern), and attempt to link, the linker can't find it:
$ gcc -g -Og -no-pie -Wl,-rpath ../.glibc/glibc_2.30_no-tcache/ -Wl,--dynamic-linker=../.glibc/glibc_2.30_no-tcache/ld.so.2 s1.c -o s1
/usr/bin/ld: /tmp/ccjKyCNh.o: in function `printf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:112: undefined reference to `main_arena'
/usr/bin/ld: /usr/include/x86_64-linux-gnu/bits/stdio2.h:112: undefined reference to `main_arena'
collect2: error: ld returned 1 exit status
Note: I've updated this question with extensive research:
This is by design:
c language, global symbol, local symbol clarification "local (static): local symbols that are defined and referenced exclusively by module m.... These symbols are visible anywhere within module m, but cannot be referenced by other modules."
See also "Symbol Visibility
Symbols can be categorized as local or global. Local symbols can not be referenced from an object other than the object that contains the symbol definition." https://docs.oracle.com/cd/E26505_01/html/E26506/chapter2-90421.html
and https://reverseengineering.stackexchange.com/questions/14895/why-are-symbols-with-local-binding-present-in-the-symbol-table-of-my-elf-files and http://web.cse.ohio-state.edu/~reeves.92/CSE2421au12/SlidesDay52.pdf
Nonetheless, for debugging, exploration, and reverse engineering, it's sometimes desirable to reference an external local symbol defined in a shared object. All the information is there, as evidenced by gdb's ability to display it; it's simply a flag that tells ld to not resolve symbols to it.
Given such, is it possible to tell ld to ignore the local flag, and resolve to the symbol anyway?
For example:
$ objdump -t ../.glibc/glibc_2.30_no-tcache/libc.so.6 | grep -E ' malloc$| main_arena$'
00000000003b4b60 l O .data 0000000000000898 main_arena
0000000000083500 g F .text 0000000000000213 malloc
$ man objdump 2>/dev/null | grep -A10 'flag characters'
The flag characters are divided into 7 groups as follows:
"l"
"g"
"u"
"!" The symbol is a local (l), global (g), unique global (u), neither global nor local (a space) or both global and
local (!). ...
I'd like to be able to write code that, for debugging and reverse engineering, references the symbol main_arena regardless. How can I do this?
Update
I've read Employed Russian's excellent posts on related topics, and seen his reference to the XY Problem. With that in mind, let me ask my question X:
For exploratory purposes, I'd like to be able to look at the behavior of things like main_arena, and other malloc internals, as I use malloc and free. I can do this with gdb. But I'd like to do this programmatically, in C. One way to do this might have been to actually link to these symbols (question Y), but there's no reason to assume that's the best way, the only way, or even a viable way. Given that:
How can I inspect the value of local symbols in a shared library from within a different program, without having to drop to gdb?
Given such, is it possible to tell ld to ignore the local flag, and resolve to the symbol anyway?
No.
All the information is there, as evidenced by gdb's ability to display it; it's simply a flag that tells ld to not resolve symbols to it.
You are mistaken. While the symbol is present in the static symbol table (in the .symtab section), it is not present in the dynamic symbol table (in the .dynsym section). It is not just a matter of a flag: fundamental parts needed to perform dynamic linking at runtime are missing.
You can confirm this by looking in readelf --dyn-syms .../libc.so.6 | grep main_arena -- the symbol will not be there.
You could binary patch the "flag", changing STB_LOCAL binding of the symbol in .symtab to STB_GLOBAL. After you do that, the symbol will show as g in the objdump output, but the linker will still not be able to use it.
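For reference, the "flag" in question lives in the st_info byte of a symbol table entry: the binding occupies the high four bits and the symbol type the low four bits. The sketch below shows what such a patch would change; sym_bind, sym_type and with_binding are hypothetical helper names, while the numeric values are the ones given in the ELF specification.

```c
#include <assert.h>

/* Binding and type values from the ELF specification. */
enum { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };
enum { TYPE_OBJECT = 1, TYPE_FUNC = 2 };

/* st_info packs the binding in the high nibble and the type in the low one. */
unsigned char sym_bind(unsigned char st_info) { return st_info >> 4; }
unsigned char sym_type(unsigned char st_info) { return st_info & 0x0f; }

/* Return st_info with the binding replaced and the type preserved. */
unsigned char with_binding(unsigned char st_info, unsigned char bind)
{
    assert(bind <= 0x0f);
    return (unsigned char)((bind << 4) | sym_type(st_info));
}
```

A local data object such as main_arena would have st_info 0x01, and with_binding(0x01, BIND_GLOBAL) gives 0x11. As the answer says, changing this byte in .symtab alters what tools display but does not add the symbol to .dynsym.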
P.S. You should never use objdump to examine ELF binaries -- it's highly deficient for that purpose. Use readelf instead.
Update:
How does GDB find ...
By reading .symtab section.
Is there a way I can tell ld to do something similar?
No. The linker could easily be made to read the .symtab section as well, and could then link a binary that imports the main_arena symbol in the same way it imports e.g. stdout.
But such a binary will not run.
At runtime, as soon as the binary is loaded, the loader (ld.so) will need to resolve the reference to main_arena. And since the symbol is not present in the dynamic symbol table (which is the only symbol table ld.so can use), the symbol resolution will fail and ld.so will exit with a fatal error.
This is precisely the same thing as linking a.out against foo.so with int foo defined, and then running that a.out against a different version of foo.so, one without foo in it.
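One way to see the same restriction from inside a running program is dlsym(), which performs its lookups through the dynamic symbol tables of the loaded objects. The following is a minimal sketch, not a way around the problem: in_dynamic_symbols is a hypothetical helper name, and on older glibc versions the program may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `name` can be resolved through the dynamic symbol tables of
   the running program and the libraries loaded with it, 0 otherwise. */
int in_dynamic_symbols(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);  /* handle for the main program */
    int found;

    assert(self != NULL);                 /* fails in a static executable */
    found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

in_dynamic_symbols("malloc") should return 1 in a dynamically linked program. Given that main_arena is absent from .dynsym, in_dynamic_symbols("main_arena") would be expected to return 0, for the same reason that ld.so could not resolve a reference to it.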
Update 2:
Is that simply a feature that ld lacks (because it's not needed outside of reverse engineering and other nonstandard use cases), or is it inherently not possible?
It's a feature that both ld (the static linker) and ld.so (the dynamic loader) lack.
It's possible to do (GDB can resolve these symbols after all), but a lot of work, for very little gain.
Could one possibly augment ld to use the regular .symtab (I understand it would be slower due to lack of hashes)?
Like I said, you would need to modify both ld and ld.so. The latter is part of GLIBC, and modifying GLIBC has complications. Making any mistakes in the process can easily render your system un-bootable.
And if you are going to modify GLIBC anyway, it would likely be much simpler to expose all the symbols you want (make them non-local). That way you only need to change GLIBC, and can use standard ld and the rest of standard symbol resolution mechanisms.
I'm trying to figure out how relocation works, but I can't seem to get my head around it.
This document describes the different types one can encounter when relocating an ELF file.
Let's take R_ARM_ALU_SB_G0_NC (#70) for example.
Type: static
Class: ARM, describes the type of place being relocated (which I do not understand)
Operation: ((S + A) | T) - B(S)
I'm guessing that the mathematical expression is the operation I'm looking for. However, I do not completely understand how this fits in my function.
The method where the relocation takes place looks as follows:
int elfloader_arch_relocate(int input_fd, struct elfloader_output *output,
                            unsigned int sectionoffset, char *sectionaddr,
                            struct elf32_rela *rela, char *addr);
input_fd is a file descriptor for the ELF file, *output is used when writing the output segment, sectionoffset is the file offset at which the relocation can be found, *sectionaddr is the section start address (absolute runtime) and *addr is the relocated address.
The 32-bit relocation structure looks like this
struct elf32_rela {
elf32_addr r_offset;
elf32_word r_info;
elf32_sword r_addend;
};
On page 26 of the above mentioned document the nomenclature is explained:
S (when used on its own) is the address of the symbol.
A is the addend for the relocation.
T is 1 if the target symbol S has type STT_FUNC and the symbol addresses a Thumb instruction; it is 0 otherwise.
B(S) is the addressing origin of the output segment defining the symbol.
So my question is, which of the parameters in the relocate function correspond to the ones used in the formula?
If I'm reading this right, and I'm not sure I am, it goes like this:
S is addr.
B(S) is sectionaddr.
A is rela->r_addend.
T may be derivable from information in rela->r_info; if not, I don't know where you need to look.
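Under that tentative mapping, the value named by the operation could be computed as in the sketch below. This is an illustration only: rela_type, rela_sym and sb_relative are hypothetical names, addresses are passed as plain 32-bit integers rather than the char pointers used by elfloader_arch_relocate, and the code stops at the operation's result. Encoding that result into the ALU instruction's immediate field, which this relocation type also requires, is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* In a 32-bit ELF r_info word, the relocation type is the low 8 bits
   and the symbol index is the remaining high 24 bits. */
uint32_t rela_type(uint32_t r_info) { return r_info & 0xffu; }
uint32_t rela_sym(uint32_t r_info)  { return r_info >> 8; }

/* Value of ((S + A) | T) - B(S).
   s   : address of the symbol             (addr, per the mapping above)
   a   : addend                            (rela->r_addend)
   t   : 1 for a Thumb function, else 0
   b_s : origin of the defining segment    (sectionaddr, per the mapping) */
uint32_t sb_relative(uint32_t s, int32_t a, uint32_t t, uint32_t b_s)
{
    assert(t == 0 || t == 1);
    return ((s + (uint32_t)a) | t) - b_s;
}
```

For example, a symbol at 0x20001000 with addend 8 in a segment whose origin is 0x20000000 gives 0x1008, and rela_type() of an r_info word for this relocation returns 70.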
This is a really complicated-looking relocation. Consider starting with the simple ones (like R_ARM_ABS16). In a dynamic loader, you should not have to implement all of the relocation types in the spec you linked to, only a small subset. If it seems like you need a lot of relocation types, this is probably because you are trying to feed unlinked object files to the dynamic loader; you should turn them into shared objects, using your existing ARM linker. (With the GNU toolchain, a first approximation to how you do that is gcc -shared foo.o -o foo.so.)
Cribbing off an existing dynamic loader for the architecture is often a good plan; there tends to be a lot of undocumented wisdom buried in the code for such things. For instance, here is GNU libc's ld.so's ARM relocator. (LGPL)
When linking an application against a dynamic shared library such as in
gcc -o myprog myprog.o -lmylib
I know the linker (ld on my Linux system) uses the -l option to store, in the produced myprog ELF executable, the name of the library (mylib in this case) that will be used at load and link time (both happen when the program is started, if we ignore lazy dynamic linking). I am wondering what other jobs ld performs regarding the dynamic shared library (I am only speaking of the static linking step done at compilation time):
ld must check that undefined symbols exist in the provided dynamic shared libraries
anything else?
Moreover, I would be interested in pointers to the resources you use (books, online documentation) regarding the ELF format and the dynamic linking and loading processes.
While you hit the most obvious things ld needs to do when linking to ELF shared libraries, there are a few more you missed. I'll re-state the ones you mentioned and add some more:
Ensuring that all undefined symbols are resolved (unless the output is a shared library itself, in which case undefined symbols are valid).
Storing a reference to the library in a DT_NEEDED record of the _DYNAMIC object of the output file.
If the output is not position-independent and references objects (in the sense of data, as opposed to functions) in the shared library, generating a copy relocation to copy the original image of the object into the main program's data segment at load time, and the proper symbol table entry so that references to the object in the shared library itself get resolved to the new copy in the main program, rather than the original copy in the library.
Generating PLT thunks for the destination of each function call in the output that's not resolved at ld-time to a definition in the output.
These are the tasks I can think of that are specific to use of shared libraries, and of course don't include all the work that the linker already does which would be the same as for static linking. One way to think of what ld does with dynamic linking is that it takes object files with a huge repertoire of relocation types (representing anything the compiler or assembler can produce) and resolves all but a small number of them (for static linking, that number would be zero), where all of the remaining relocations fit into a much more limited set of types resolvable by the dynamic linker at load time.
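To illustrate the DT_NEEDED step, here is a small sketch of how a loader-like tool might walk a dynamic array and collect the library names recorded by ld. It works on an in-memory array rather than a real file: dyn_entry and list_needed are hypothetical names, dyn_entry only mirrors the tag/value shape of a real dynamic entry, and the tag numbers are the ones from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values from the ELF specification. */
enum { TAG_NULL = 0, TAG_NEEDED = 1 };

/* Hypothetical stand-in for one entry of the dynamic array. */
typedef struct {
    int64_t  tag;
    uint64_t val;  /* for TAG_NEEDED: offset of the name in the string table */
} dyn_entry;

/* Store up to `max` needed-library names in `out` and return the total
   number of TAG_NEEDED entries seen before the terminating TAG_NULL. */
size_t list_needed(const dyn_entry *dyn, const char *strtab,
                   const char **out, size_t max)
{
    size_t n = 0;

    for (; dyn->tag != TAG_NULL; dyn++) {
        if (dyn->tag == TAG_NEEDED) {
            if (n < max)
                out[n] = strtab + dyn->val;
            n++;
        }
    }
    return n;
}
```

On a real binary the same information can be inspected with readelf -d, which prints each NEEDED entry with the library name it refers to.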
One important step is the creation of a dynamic symbol table, which the runtime linker ld.so can use to link the executable against the library at runtime. It will also write the dynamic relocation table to note which machine code locations need to be changed to point to dynamically linked symbols. To see details:
objdump -T myprog
objdump -R myprog
Also note that the string written to the executable will actually be the SONAME of the library, which might be something like libmylib.so.0. This will ensure that even when you install a newer and incompatible libmylib.so.1.42 at some later point, the executable will use the compatible ABI version 0 instead. For details:
ldd myprog
Of course, the linker will also link your object files against one another, but since it does that even in the absence of a dynamic shared library, I take it that you are not interested in this part of its operation.
I am developing an operating system, and I need to load some modules BEFORE paging is set up. Since paging is not set up at this point, I need to relocate all of the symbols in the program to their physical addresses. My problem is that not all symbols can be found in the symbol table and not all relocation info can be found in .rel.text. How can I get GCC to export all symbol data?
Surely, ANYTHING needing relocation will be in the relocation table. How else could it be loaded? Whether paging is enabled or not, relocation works exactly the same - entries that are absolute locations in the binary are listed with an offset, and then processed by the loading software. Everything else should be fine without relocation.
Note that a symbol table is not meaningful for resolving relocations in and of itself, as that only gives the location of a symbol.
Are you perhaps thinking of the symbols in your OS itself? If so, it's really a case of exporting the symbols from your OS in an appropriate way. Linux has EXPORT_SYMBOL(name), which builds a symbol table within the kernel itself. [Note that these are NOT the symbols generated by gcc or ld, but symbols built by macros and processed in the kernel.]
Edit to clarify, as I ran out of space in "comment":
There are two types of "relocations". Internal ones are where you have absolute references to things in your own module, e.g. pointers to strings, pointers to functions, jump tables for switch statements, and so on. These should simply be a question of adding the current value to the offset at which the binary is actually located (virtual address, of course). The other type is "external references", such as when your module calls, say, spinlock(): this is not implemented inside the module, so it will have an "external reference". In this case, there will be a relocation entry with "spinlock" as the name and an offset of where the call to spinlock goes in the module. Now you obviously need a symbol table to look up where in your kernel "spinlock" is located [and if you want to be really complicated, allow for modules to reference other modules, but I'd leave that until you have one module loading OK first!].
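The two cases can be sketched as a single loop in a module loader. Everything here is an assumption made for illustration: export_sym, mod_reloc and relocate_module are hypothetical names, a NULL symbol name marks an internal reference, and every reference is treated as an absolute 32-bit word. A PC-relative reference (such as an x86 call) would also need the address of the patched location subtracted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel export table entry. */
typedef struct { const char *name; uint32_t addr; } export_sym;

/* Hypothetical, simplified relocation record. */
typedef struct {
    uint32_t    offset;  /* where in the module image to patch */
    const char *name;    /* NULL: internal reference; else external symbol */
} mod_reloc;

static const export_sym *find_export(const export_sym *tab, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Patch the image in place; return the number of unresolved externals. */
size_t relocate_module(uint8_t *image, uint32_t load_base,
                       const mod_reloc *rel, size_t nrel,
                       const export_sym *tab, size_t ntab)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < nrel; i++) {
        uint32_t word;
        memcpy(&word, image + rel[i].offset, sizeof word);

        if (rel[i].name == NULL) {
            word += load_base;           /* internal: add the load address */
        } else {
            const export_sym *e = find_export(tab, ntab, rel[i].name);
            if (e == NULL) { unresolved++; continue; }
            word += e->addr;             /* external: stored word is the addend */
        }
        memcpy(image + rel[i].offset, &word, sizeof word);
    }
    return unresolved;
}
```

A loader would normally refuse to start the module if the return value is non-zero, since some call sites would still point at unpatched addresses.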
Really, your question is about the linker, and the answer depends on which linker you are using.
If it is the standard linker ld under gcc, try the "-Wl,-r" option.
I've got a small static library (.a). In the static library is a pointer that points to a large, statically allocated, 1D array.
When I link my code to this library, the pointer's address is hardcoded in various locations, easily found through the disassembly. The issue is, I'd like my code to be able to have access to this array (the library is faulting, and I want to know why).
Naturally, it would be trivial to get that pointer by disassembling, hardcoding that address into my code, and then recompiling. That wouldn't be a problem except the library can be configured in different ways with other modules, and the array's pointer changes depending on what modules are linked in.
What are my options for getting that pointer? Because the starting state of the array is predictable, I could walk through memory, catching segfaults with a signal handler, until I found something that looks reasonable. Is there a better way?
Since your library is a .a archive, I'll assume you are on some kind of UNIX.
The global array should have a symbolic name associated with it. Your job would be easier or harder depending on what kind of symbol describes it.
If there is a global symbol describing this array, then you can just reference it directly, e.g.
extern char some_array[];
/* cast to unsigned char so that bytes above 0x7f are not sign-extended */
for (int i = 0; i < 100; i++) printf("%2d: 0x%02x\n", i, (unsigned char)some_array[i]);
If the symbol is local, then you can first globalize it with objcopy --globalize-symbol=some_array, then proceed as above.
So how can you determine what is the symbol describing that array? Run objdump -dr foo.o, where foo.o contains instructions which you know reference that array. The relocation that will appear next to the referring instruction will tell you the name.
Finally, run nm foo.o | grep some_array. If you see 00000XX D some_array, you are done -- the array is globally visible (same for B). If you see 000XX d some_array, you need to globalize it first (likewise for b).
Update:
The -dr to objectdump didn't work
Right, because the symbol turned out to be local, the relocation probably referred to .bss + 0xNNN.
00000000006b5ec0 b grid
00000000006c8620 b grid
00000000006da4a0 b grid
00000000006ec320 b grid
00000000006fe1a0 b grid
You must have run nm on the final linked executable, not on the individual foo.o objects inside your archive. There are five separate static arrays called grid in your binary; only the first one is the one you apparently care about.
declaring "extern int grid[];" and using it gives an undefined reference
That's expected for local symbols: the code in the library was something like:
// foo.c
static char grid[1000];
and you can't reference this grid from outside foo.o without globalizing the symbol first.
I'm not allowed to run a changed binary of the library on our server for security reasons
I hope you understand that that argument is total BS: if you can link your own code into that binary, then you can do anything on the server (subject to user-id restrictions); you are already trusted. Modifying a third-party library should be the least of the server admin's worries if he doesn't trust you.