What is the entry count of ELF _DYNAMIC symbol table? - c

I have two uncertainties regarding _DYNAMIC defined in elf(5) - Dynamic tags (Dyn).
Does the symbol table received via DT_SYMTAB correspond to .dynsym, and does the string table received via DT_STRTAB correspond to .dynstr?
The symbol table entry size in bytes can be received via DT_SYMENT, but there is nothing written about the symbol table entry count. Is it correct to assume that the symbol table size in bytes is the address of the string table minus the address of the symbol table, and therefore to get the count by dividing that size by the entry size in bytes?

"there is nothing written about the symbol table entry count."
That's because the entry count can be deduced from the hash table of symbols.
There are two common formats: DT_HASH and DT_GNU_HASH (the latter is a GNU extension). For DT_HASH, the number of symbols is nchain, which is the second word in the table. See e.g. this document.
"Is it correct to assume that symbol table size in bytes is address of string table minus address of symbol table"
Not at all: there is no guarantee that .dynsym is followed by .dynstr, and even when they are laid out like that, there is no guarantee that there aren't holes due to alignment.
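As a minimal sketch of the DT_HASH approach: the table begins with two words, nbucket and then nchain, and nchain is the symbol count. The code below assumes 32-bit words (true on most platforms, but not all) and a made-up table; the function name dynsym_count_from_dt_hash is hypothetical, not part of any library.

```python
import struct

def dynsym_count_from_dt_hash(hash_table: bytes, little_endian: bool = True) -> int:
    """Return the number of dynamic symbol table entries from raw DT_HASH bytes.

    Assumes 32-bit hash table words.
    """
    fmt = "<II" if little_endian else ">II"
    # Word 0 is nbucket, word 1 is nchain.
    nbucket, nchain = struct.unpack_from(fmt, hash_table, 0)
    # nchain equals the number of entries in the symbol table.
    return nchain

# Made-up table for illustration: nbucket=3, nchain=7,
# followed by 3 bucket words and 7 chain words (all zero here).
example = struct.pack("<II", 3, 7) + bytes(4 * (3 + 7))
print(dynsym_count_from_dt_hash(example))
```

In a real process the bytes would come from the address stored in the DT_HASH entry of _DYNAMIC; DT_GNU_HASH needs a different walk and is not covered by this sketch.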

Related

Does every reference have a symbol entry?

I'm currently reading Computer Systems: A Programmer's Perspective, chapter 7 (Linking).
The chapter covers references, symbols and entries. The book says that an entry holds the definition of a symbol, and my understanding is: "Every symbol has an entry, and the entry holds a symbol reference, like a pointer; this reference actually holds some address".
Therefore, whenever I read code involving a global variable or a function/procedure, each of them can be regarded as a corresponding entry, which holds the symbol reference and other information.
Is my understanding right? Can I keep going with it? I really want to understand computer systems and the techniques related to programming.
One final question: is the symbol table in the .symtab section the same as the relocation entry table?
Please avoid associating symbol with entry. The term entry is reserved in most linkers to specify the entry point of the whole linked program, i.e. the address of the first instruction performed at the start of program execution. I prefer the term records for items arranged into a table, for instance the symbol table.
When you create a procedure or function in your program, it will be loaded at a certain address in memory when the program runs. You don't know exactly where (at which numeric address) the procedure will be located at run time, which is why you give that address a symbolic name (label). That is a symbol: a human-readable name for a certain position in the program (address symbol) or for a constant value (scalar symbol).
You can call the procedure or refer to it at write time using its symbolic name: CALL MyProcedure or MOV register,MyProcedure. Again, the final value of the MyProcedure address is not known yet, so the compiler temporarily puts 0 into the instruction body instead of this address, and creates a relocation record in the relocation table. Each such record specifies
1) a pointer to the temporary 0 inside the instruction body, and
2) the target symbol, in the form of an index into the symbol table.
Global symbols such as MyProcedure should be unique in the program, but they may be referred to many times, and each reference will create a record in the relocation table.
The relation between symbol and relocation is not 1:1.
When the linker has enough information to decide on the final address of each symbol, it goes through the relocation table and replaces each temporary 0 in the code with the symbol's final address.
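The patching step described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: absolute 32-bit little-endian addresses, made-up opcode bytes, and a hypothetical apply_relocations helper; real relocation types are more varied.

```python
import struct

def apply_relocations(code: bytearray, relocations, symbol_addresses):
    """Overwrite each 32-bit placeholder with its target symbol's final address.

    relocations: list of (offset_of_placeholder, symbol_table_index) records.
    symbol_addresses: final address of each symbol, indexed like the symbol table.
    """
    for offset, symbol_index in relocations:
        address = symbol_addresses[symbol_index]
        struct.pack_into("<I", code, offset, address)
    return code

# Two made-up instructions, each a 1-byte opcode followed by a 4-byte placeholder 0.
code = bytearray(bytes([0x01]) + bytes(4) + bytes([0x02]) + bytes(4))
# Both instructions refer to symbol 0 (say, MyProcedure): one symbol, two records.
relocations = [(1, 0), (6, 0)]
symbol_addresses = [0x00401000]  # hypothetical final address chosen by the linker

apply_relocations(code, relocations, symbol_addresses)
print(code.hex())
```

Note that the single symbol produces two relocation records, which is the not-1:1 relation mentioned above.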

Does a linker need to adjust segment size when relocation takes more space?

Let's assume the following:
There's a jump or a data reference whose address is encoded in 2 bytes. Now, when statically linking, the relocation happens so that the new address does not fit in 2 bytes -- maybe it needs 4 bytes.
I assume the linker will rewrite the code, possibly using a different instruction, and use 4 bytes for the new address.
Does the linker then need to update the size of the current segment/section, and shift all subsequent addresses by the same offset (+2 bytes in this example)?
Machine instructions that refer to external symbols cannot use the abbreviated form, in which the displacement or immediate operand is encoded in one byte (extended at run time) instead of a full word.
Linkers are not smart enough to recompile segments that have already been assembled (at least the one that I wrote isn't :-)

How to read the relocation records of an object file

I'm trying to understand the linking stage of C toolchain. I wrote a sample program and dissected the resulting object file. While this helped me to get a better understanding of the processes involved, there are some things which remain unclear to me.
Here are:
My (blazingly simple) sample program
Relevant parts of the object disassembly
The object's symbol table
The object's relocation table
Part 1: Handling of initialized variables.
Is it correct that these relocation table entries...
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000002b dir32 .data
00000035 dir32 .data
0000003f dir32 .data
... are basically telling the linker that the addresses stored at offsets 2b, 35 and 3f from .text are not absolute addresses, but relative addresses (= offsets) in relation to .data? It is my understanding that this enables the linker to
either convert these relative addresses to absolute addresses for creation of a non-relocatable object file,
or just adjust them accordingly in case the object file gets linked with some other object file.
Part 2: Handling of uninitialized variables.
I don't understand why uninitialized variables are handled so differently from initialized variables. Why are the register addresses stored in the opcodes equal for all the uninitialized variables (0x0, 0x0 and 0x0), while being different for all the initialized variables (0x0, 0x4 and 0x8)?
Also the value field of their relocation table entries is entirely unclear to me. I would have expected the .bss section to be referenced there.
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000d dir32 _var1_zeroed-0x00000004
00000017 dir32 _var2_zeroed-0x00000004
00000021 dir32 _var3_zeroed-0x00000004
"... are basically telling the linker, that the addresses stored at offset ..."
No, the linker is no longer involved with this. The relocation tables tell the loader, the part of the operating system that's responsible for loading the executable image into memory, about the addresses.
The linker builds the executable image based on the assumption that everything is ideal and the image can be loaded at the intended address. If that's the case then everything is hunky-dory and nothing needs to be done. If there's a conflict, however, because the virtual address space is already in use by something else, then the image needs to be relocated to a different address.
That requires addresses to be patched: the offset between the ideal and the actual load address needs to be added. So if the .data section ends up at another address, then the addresses at .text+0x2b, .text+0x35, etc. must be changed. It is no different for the uninitialized variables: the linker already picked an address for them, but when _var1_zeroed-0x00000004 ends up at another address, then .text+0x0d, .text+0x17, etc. need to be changed.
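The patching arithmetic described here can be sketched as below. It is a simplified illustration, not how any particular loader is implemented: it assumes 32-bit little-endian addresses, and the rebase helper, the base addresses and the image contents are all made up.

```python
import struct

def rebase(image: bytearray, fixup_offsets, preferred_base: int, actual_base: int):
    """Add the load-address difference to every 32-bit address listed for patching."""
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        # Keep the result within 32 bits.
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)
    return image

# Made-up 8-byte image holding one absolute address at offset 4.
image = bytearray(8)
struct.pack_into("<I", image, 4, 0x00403000)

# Hypothetical bases: the image was built for 0x00400000 but loaded at 0x10000000.
rebase(image, [4], preferred_base=0x00400000, actual_base=0x10000000)
print(hex(struct.unpack_from("<I", image, 4)[0]))
```

When the actual base equals the preferred base, delta is 0 and the image is left unchanged, matching the "nothing needs to be done" case.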

How do I resolve dyld imported symbols?

I'm trying to implement some parts of what dyld does and I'm a little bit stuck at stub trampolines.
Consider the following ARM instruction:
BL 0x2fec
It branches with link (subprocedure call) to 0x2fec. I'm aware of the fact, that there is a section __symbolstub1 in the __TEXT segment starting at 0x2fd8, so it's a jump to 20 bytes inside of __symbolstub1.
Now, there is a symbol
(undefined) external _objc_autoreleasePoolPush (from libobjc)
that I've resolved through the LC_SYMTAB load command. There is no known address provided. I know for a fact that the 0x2fec address is a trampoline to _objc_autoreleasePoolPush, but I cannot prove it by any means.
I've checked the LC_DYLD_INFO_ONLY command, and I had a slight hint in there, in the lazy_bind symbols I've found:
{:offset=>20, :segment=>2, :library=>6, :flags=>[], :name=>"_objc_autoreleasePoolPush"}
where the name and offset match what I have exactly, and the library #6 is "/usr/lib/libobjc.A.dylib", which is also perfect. Now the issue is that segment #2 is __TEXT, but __TEXT starts at 0x1000, and __symbolstub1 is way down there at 0x2fd8. So I'm missing some reference down to section.
Any ideas on how I am supposed to map the 0x2fec virtual address to _objc_autoreleasePoolPush?
Heh, just a little more digging and I've found it at LC_DYSYMTAB's indirect symbols.
Now the long answer.
1) Find the section containing the given address;
2) The section should be of type S_NON_LAZY_SYMBOL_POINTERS, S_LAZY_SYMBOL_POINTERS, S_LAZY_DYLIB_SYMBOL_POINTERS, S_THREAD_LOCAL_VARIABLE_POINTERS or S_SYMBOL_STUBS;
3) If the section type is S_SYMBOL_STUBS, then the byte size of each entry is stored in reserved2, otherwise it is considered equal to 4;
4) The offset into the indirect symbols table is stored in reserved1;
5) The index into the indirect symbols table is calculated as
index = sect.reserved1 + (vmaddr - sect.addr) / bytesize;
6) The symbol in the symbols table is found at symbols[indirect_symbols[index]].
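The lookup above can be sketched as follows. The Section class and symbol_for_address function are hypothetical stand-ins for already-parsed Mach-O data, the sketch assumes every section passed in is already one of the listed types, and the stub size of 4, the table contents and the symbol names other than _objc_autoreleasePoolPush are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Section:
    addr: int              # virtual address where the section starts
    size: int              # section size in bytes
    reserved1: int         # offset into the indirect symbols table
    reserved2: int         # stub size in bytes (meaningful for S_SYMBOL_STUBS)
    is_symbol_stubs: bool  # True if the section type is S_SYMBOL_STUBS

def symbol_for_address(vmaddr, sections, indirect_symbols, symbols):
    """Map a stub or pointer address to its symbol via the indirect symbols table."""
    for sect in sections:
        if sect.addr <= vmaddr < sect.addr + sect.size:
            entry_size = sect.reserved2 if sect.is_symbol_stubs else 4
            index = sect.reserved1 + (vmaddr - sect.addr) // entry_size
            return symbols[indirect_symbols[index]]
    return None  # address is not inside any of the given sections

# Made-up data: a stub section at 0x2fd8 with 4-byte stubs.
stubs = Section(addr=0x2fd8, size=40, reserved1=0, reserved2=4, is_symbol_stubs=True)
symbols = ["_sym0", "_sym1", "_objc_autoreleasePoolPush"]
indirect_symbols = [0, 1, 0, 1, 0, 2, 0, 1, 0, 1]

print(symbol_for_address(0x2fec, [stubs], indirect_symbols, symbols))
```

With these values, 0x2fec is 20 bytes into the section, so the index is 0 + 20 // 4 = 5, and entry 5 of the indirect table points at symbol 2.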

What exactly is the "size" of a function in a symbol table?

When you are given the size in bytes of a function symbol in the symbol table of an ELF file, what exactly does this mean, and how does it relate to the range of execution of the function?
It's purely the difference between the byte address one past the end of the machine code comprising the function and the byte address of the first byte of the function, and it's largely useless. While size matters for data (due to the possibility of copy relocations), incorrect/meaningless values in the size field for function symbols should not cause any problem, and it's probably a bad idea to rely on the values except for debugging or reverse engineering purposes.
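For the debugging or reverse engineering use mentioned above, a minimal sketch of treating the size as an address range might look like this. The function_containing helper and the symbol values are made up, and the result is only as trustworthy as the size fields themselves.

```python
def function_containing(address, function_symbols):
    """Return the name of the function whose [start, start + size) range holds address.

    function_symbols: iterable of (name, start_address, size_in_bytes) tuples,
    as they might be read from a symbol table.
    """
    for name, start, size in function_symbols:
        # size is "one past the end" minus the first byte, so the range is half-open.
        if start <= address < start + size:
            return name
    return None

# Made-up symbols for illustration.
functions = [("main", 0x1000, 0x40), ("helper", 0x1040, 0x10)]
print(function_containing(0x1044, functions))
```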

Resources