ARM Bootloader: value of "start_armboot" - arm

In U-Boot for the S3C24X0 (ARM920T), the following instructions are used to jump to the C part:
ldr pc, _start_armboot
_start_armboot: .word start_armboot
But how can I find out the value of start_armboot? I couldn't find when or where the address of start_armboot is defined. It doesn't exist in the .lds file, either. Or is it that, because of
_start_armboot: .word start_armboot
start_armboot is placed in memory directly after the current position? If so, how is this instruction/address associated with the C function "void start_armboot(void)"?

_start_armboot: .word start_armboot just means to put the address of the symbol start_armboot at that location.
The linker is responsible for filling it with the correct address at link time.
Internally, the reference to start_armboot is just a placeholder filled with some dummy value (usually zero) when the file is assembled into an object file. Later, when all the object files have been gathered together, the linker lays out the pieces. Once everything is laid out and the symbol locations are known, it goes back through the object files and fills in the placeholders.
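As a rough C analogue of the two assembly lines, assuming a hosted build rather than the real U-Boot tree: a constant function pointer plays the role of the `.word` slot, and calling through it plays the role of `ldr pc, _start_armboot`. The names `start_armboot_slot`, `jump_via_slot` and `entered` are made up for this sketch, and the body of `start_armboot` is only a stand-in.

```c
/* Stand-in for the real C entry point; here it only records that it ran. */
static int entered;

void start_armboot(void)
{
    entered = 1;
}

/* Analogue of `_start_armboot: .word start_armboot`.
 * The compiler emits a pointer-sized slot plus a relocation record;
 * the linker writes the final address of start_armboot into the slot. */
void (*const start_armboot_slot)(void) = start_armboot;

/* Analogue of `ldr pc, _start_armboot`: load the slot, branch to it. */
void jump_via_slot(void)
{
    start_armboot_slot();
}
```

Nothing in the source names a numeric address. The address only exists after linking, which is why it cannot be found in the assembly or in the .lds file.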

Understand vector table definition assembly in stm32cubeIDE startup

The code to initialize vector table is placed in startup code of STM32cubeIDE:
.global g_pfnVectors
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
g_pfnVectors:
.word _estack
.word Reset_Handler
.word NMI_Handler
.word HardFault_Handler
.
.
.
.word DMA1_Stream0_IRQHandler /* DMA1 Stream 0 */
.
.
.
I want to understand it, and I have some questions, if anyone could help:
g_pfnVectors is declared twice, once with .global and once with .word. Is it first declared as global and then given its size in hardware?
Lines 2, 3 and 4 each have several comma-separated items; what are they?
Is there any reference for understanding them?
Note: I know line 6 is the array definition of g_pfnVectors.
It seems the startup file has to hold the IRQ handlers' function pointers; since we have defined the handlers in *_it.c, these entries will link to them, am I right?
In "arm interrupt vector relocation usage and description" they reassigned SysTick_Handler after relocating the table. If we use the linker to move this function to RAM, then we don't need to reassign it, am I right? (This function executes every 1 ms by default in CubeIDE apps and has to be fast.)
I can't cover the assembly syntax well, but I think some explanation of the vector table and interrupt entry would be helpful. In fact, it's easy to have zero assembly code in startup: you can make the vector table in C as an array that you place in its own section. Also, I'm not very well versed in assembly myself. Anyway, here is what the vector table is, how it works and what you can do with it.
At the beginning of Flash, before any executable code, you have the vector table. It's just a list of pointers to void (void) functions, except for the first word, which is loaded into the main stack pointer on power-on. Since Flash is not intended to be changed dynamically at runtime, we can safely assume that in practice those pointers are fixed.
When an interrupt occurs, the CPU reads the corresponding handler address from the vector table and jumps to it. Thus, if you move the vector table, you also need to make sure that all interrupts have handler addresses in the correct places in the new vector table. Basically, you want to copy the old vector table to the new place.
Small addition: in Cortex-M, the least significant bit of every vector table entry is always set to 1 (as in all function pointers). This is an architecture requirement: the bit indicates the Thumb instruction set rather than the 32-bit ARM instruction set, and an entry without it leads to a usage fault exception. (So for an interrupt handler at 0x20040000, the vector table entry is 0x20040001.)
Let's consider a basic example.
Your vector table is in Flash and it's fixed (unchangeable). Imagine you have interrupt number 20. Imagine you have void IRQ20_Handler (void) handler for that interrupt, located also in Flash, where your executable code typically resides. Let's imagine the address of the handler is 0x08004000 (any address in Flash that's not overlapping with vector table, just an example).
In your Flash, 32-bit word number 36 (counting from 0) is going to contain 0x08004001. It is word 36 because the initial MSP value and the exceptions (HardFault, MemManage fault, BusFault, SysTick, etc.) take the first 16 words, and IRQs start only after that. The vector table layout is in the STM MCU's reference manual.
So right now, your vector table is at location 0x08000000 (beginning of Flash), and word 36 of it has a pointer to IRQ20_Handler, which is also somewhere in Flash. So if IRQ20 interrupt happens, your MCU reads word number 36 from Flash - 0x08004001 - and jumps to 0x08004000 to the handler.
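The arithmetic in this example can be written out as a few small helpers. This is only a sketch using the numbers from the text (table at 0x08000000, IRQ 20, handler at 0x08004000); the function names are invented here and are not part of any ST or CMSIS header.

```c
#include <stdint.h>

/* Initial MSP word plus 15 system exception slots come before IRQ0. */
#define SYSTEM_VECTOR_WORDS 16u

/* Zero-based word index of an IRQ's slot in the vector table. */
uint32_t vector_word_index(uint32_t irq)
{
    return SYSTEM_VECTOR_WORDS + irq;
}

/* Address of that slot, given where the table starts (4 bytes per word). */
uint32_t vector_slot_address(uint32_t table_base, uint32_t irq)
{
    return table_base + 4u * vector_word_index(irq);
}

/* Value stored in the slot: handler address with the Thumb bit set. */
uint32_t vector_entry_for(uint32_t handler_addr)
{
    return handler_addr | 1u;
}

/* Address the core actually branches to: the entry with bit 0 cleared. */
uint32_t branch_target_of(uint32_t entry)
{
    return entry & ~1u;
}
```

With these, `vector_word_index(20)` is 36, `vector_slot_address(0x08000000u, 20)` is 0x08000090, and `vector_entry_for(0x08004000u)` is 0x08004001.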
Imagine you want an interrupt handler function in SRAM, located at 0x20040000. You would have to overwrite Flash so that vector table word number 36 holds the new handler's address, and rewriting Flash for just one entry is not recommended. Instead, you move the vector table to RAM, and then you can change vectors dynamically at runtime as much as you want.
Thus, you first move the IRQ handler to RAM (say, 0x20040000), then you move the vector table to RAM and make sure that all handler addresses are copied over so that all interrupts work just like before. Finally, in the new vector table, you overwrite the IRQ20_Handler entry (word 36) with the address of the SRAM function (+1), in this case 0x20040001.
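A host-side sketch of the copy-then-patch sequence, with the tables modelled as plain arrays of 32-bit words. The function names and the table length passed by the caller are assumptions made for illustration. On real hardware you would additionally have to point the core at the new table (through the vector table offset register on cores that have one) and respect its alignment rules, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Initial MSP word plus 15 system exception slots come before IRQ0. */
#define IRQ0_WORD_INDEX 16u

/* Step 1: copy every entry so all existing interrupts keep working. */
void copy_vector_table(uint32_t *ram_table, const uint32_t *flash_table,
                       size_t word_count)
{
    memcpy(ram_table, flash_table, word_count * sizeof ram_table[0]);
}

/* Step 2: overwrite one IRQ slot in the RAM copy.
 * The Thumb bit (bit 0) is set on the stored value. */
void install_irq_handler(uint32_t *ram_table, uint32_t irq,
                         uint32_t handler_addr)
{
    ram_table[IRQ0_WORD_INDEX + irq] = handler_addr | 1u;
}
```

After `copy_vector_table(ram, flash, n)` and `install_irq_handler(ram, 20, 0x20040000u)`, word 36 of the RAM copy holds 0x20040001 while every other entry still matches Flash.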
Unfortunately, can't provide more details about assembly, since I use it only a bit and mainly as inline assembly blocks for context switching or special instructions not natively supported through C. It looks like it declares a section (and makes it globally visible) and gives it a pair of attributes.

How can compilation occur without symbol resolution?

Here is my question. Suppose you want to compile the c code:
void some_function() {
write_string("Hello, World!\n");
}
For this example, I want to focus specifically on the string: "Hello, World!\n". My understanding is that the compiler will put the string into the .rodata section in an elf file. A symbol, referring to its location in the .rodata section, is added to the symbol table and that symbol is kept in the .text section as a placeholder for the location of the string.
Here is the problem. How can you leave a value like that unresolved in machine code? In x86, it should be easy enough for the linker to do a find and replace on the symbol when the location is known. However, there are many CPU architectures where an address cannot be encoded in its entirety into a single machine instruction. Therefore the value would have to be loaded in two stages, using separate machine instructions, and the linker would have to figure that out. It would have to be smart enough to manipulate the machine code with half the address in one place and half the address in another. Furthermore, the ELF file somehow has to represent this complex encoding scheme for the linker later on. How does this all work?
In most programs, this will be in a user space application. So the kernel may load the .rodata section wherever it wants in memory. So it would seem that when the program is loaded, somehow, at runtime, the kernel loader would have to resolve all these symbols in the program prior to beginning execution. It would have to inject into the machine code where it put each section so they may be referenced appropriately. How does this work?
I have a feeling that my understanding and the descriptions above are wrong, or that I am missing something very important, because this does not seem right to me. Either that, or modern kernels and linkers do in fact contain the logic to perform these complex functions. I am looking for some further explanation and understanding.
Compilation takes place, emitting something like this:
lea rdi, [rip+some_function.hello_world]
mov rax, [rip+some_function.write_string]
call rax
after the asm pass, we end up with something that disassembles to
lea rdi, [rip+00000000]
mov rax, [rip+00000000]
call rax
where the two 00000000 slots are filled as load-time fixups. The loader performs symbol resolution and fills in the 00000000 values with the correct values.
This is a simplification. In reality there's an extra layer of indirection called the global offset table, which is used (among other things) to put all the fixups right next to each other.
The innards of how this works are CPU- and OS-specific, but in general you don't really have to care exactly how it works, and it could change in the next release of the compiler (and has changed at least twice already). The loader understands fixups at a very generic level using a fixup table, and can deal with new ideas so long as they resolve to "put the (absolute or relative) address of a symbol at offset + size".
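To make the "put an address at offset + size" idea concrete, here is a deliberately simplified fixup applier. The record layout (`struct fixup`), the two kinds, and the function name are invented for this sketch and do not match any real object format exactly. The addend follows the common convention where a PC-relative slot is computed as symbol + addend - slot address.

```c
#include <stdint.h>
#include <string.h>

/* Two hypothetical fixup kinds: absolute address, or PC-relative offset. */
enum fixup_kind { FIXUP_ABS32, FIXUP_PCREL32 };

/* One entry of a hypothetical fixup table. */
struct fixup {
    uint32_t offset;        /* where the 4-byte slot sits inside the image */
    enum fixup_kind kind;   /* how to compute the value */
    int32_t addend;         /* constant folded into the result */
};

/* Patch one 4-byte slot once the symbol's address is known.
 * image_base is the address the image occupies in memory. */
void apply_fixup(uint8_t *image, uint32_t image_base,
                 const struct fixup *f, uint32_t symbol_addr)
{
    uint32_t value = symbol_addr + (uint32_t)f->addend;

    if (f->kind == FIXUP_PCREL32)
        value -= image_base + f->offset;  /* make it relative to the slot */

    /* memcpy avoids alignment problems: slots can sit at odd offsets. */
    memcpy(image + f->offset, &value, sizeof value);
}
```

For a slot at offset 3 of an image based at 0x1000, with addend -4 and the symbol at 0x2000, the PC-relative value written is 0x2000 - 4 - 0x1003 = 0xFF9. The applier never needs to understand the surrounding instruction, only where the slot is and how to compute its contents.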
The Alpha processor had it kind of bad back in the day. Fixups had to be in between functions, and relative addressing could be only done in signed 16 bit sizes, so the fixups for functions were located immediately before or after each function, and presumably you got an error in the ASM pass if the pointer didn't fit because the function was too big. I did come up with a clever sequence that would have fixed the problem on Alpha, but that was long after the platform was retired, and nobody cares anymore so it never got implemented.
I remember the bad old days from before the loader could do good patchups. There once was a global (and I really do mean global) table of shared library load addresses, and the compiler emitted absolute addresses, so you had to rebuild your application if you changed a library, even though you used shared libraries. That just wasn't the brightest idea, and no wonder people kept statically linked emergency binaries lying around. Breaking libc wasn't fun.

Linux: using backtrace(), /proc/self/maps and addr2line together results in invalid result

I'm trying to implement a way to record callstacks of my program into a file then display it later.
Here are the steps:
Write the content of /proc/self/maps to a log file.
In this example, the content of /proc/self/maps is:
00400000-05cdc000 r-xp 00000000 00:51 12974779926 helloworld
Which means the base address of helloworld program is 0x400000.
In the program, whenever an interesting code needs to have its callstack recorded, I use the function backtrace() to obtain the callstack's addresses then write to the log file. Let say the callstack in this example is:
0x400001
0x400003
At some point later, in a separate log viewer program, the log file is opened and parsed. The base address of the program is subtracted from each address in the callstack. In this case:
0x400001 - 0x400000 = 1
I then use this offset to obtain the line number using the addr2line program:
addr2line -fCe helloworld 0x1
However, this produces a ??? result, i.e. an invalid offset.
But if I don't subtract the base address and instead pass the actual value to the addr2line command:
addr2line -fCe helloworld 0x400001, then it returns the correct file and line number.
The thing is, if the address is within a shared object, then an absolute address won't work, while a subtracted offset will.
Why is there such a difference in the way the addresses are mapped for the main executable and the shared objects? Or maybe this is specific to the backtrace implementation, such that it always returns an absolute address for a function within the main executable?
Why is there such a difference in the way the addresses are mapped for the main executable and the shared objects?
Shared libraries are usually linked at address 0 and relocated. A non-position-independent executable is usually linked at address 0x400000 on x86_64 Linux and must not be relocated (or it wouldn't work).
To find out where a given ELF binary is linked, look at the p_vaddr of the first PT_LOAD segment (readelf -Wl foo will show you that). In addition, only ET_DYN ELF binaries can be relocated, while ET_EXEC binaries must not be.
Note that position-independent executables exist, and for them you need to do the subtraction.
Note that shared libraries are usually linked at address 0 (and so subtraction works), but they don't have to. Running prelink on a shared library will result in a shared library linked at non-0 address, and then the subtraction you use will not work either.
Really, what you need to do is subtract the linked-at address from the runtime load address to get the relocation (which would be 0 for non-PIE executables, and non-0 for shared libraries), and then subtract that relocation from the program counter recorded by backtrace to get the symbol value.
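Written as arithmetic, the relocation is the runtime load address minus the linked-at address, and the value to hand to addr2line is the recorded program counter minus that relocation. The two helper names below are made up for this sketch, and the shared-library addresses in the usage note are hypothetical.

```c
#include <stdint.h>

/* relocation = where the image was actually loaded - where it was linked */
uint64_t image_relocation(uint64_t runtime_load_addr, uint64_t linked_at_addr)
{
    return runtime_load_addr - linked_at_addr;
}

/* symbol value = program counter from backtrace() - relocation */
uint64_t pc_to_symbol_value(uint64_t pc, uint64_t relocation)
{
    return pc - relocation;
}
```

For the non-PIE helloworld above, the image is linked at 0x400000 and loaded at 0x400000, so the relocation is 0 and 0x400001 is passed through unchanged. For a shared library linked at 0 and loaded at, say, 0x7f1200000000, a recorded address of 0x7f1200001234 becomes 0x1234.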
Finally, if you iterate over all loaded ELF images with dl_iterate_phdr, the dlpi_addr it provides is exactly the relocation that you need to subtract.

How to read the relocation records of an object file

I'm trying to understand the linking stage of C toolchain. I wrote a sample program and dissected the resulting object file. While this helped me to get a better understanding of the processes involved, there are some things which remain unclear to me.
Here are:
My (blazingly simple) sample program
Relevant parts of the object disassembly
The objects symbol table
The objects relocation table
Part 1: Handling of initialized variables.
Is it correct that these relocation table entries...
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000002b dir32 .data
00000035 dir32 .data
0000003f dir32 .data
... are basically telling the linker that the addresses stored at offsets 2b, 35 and 3f from .text are not absolute addresses, but relative addresses (= offsets) in relation to .data? It is my understanding that this enables the linker to
either convert these relative addresses to absolute addresses for creation of a non-relocatable object file,
or just adjust them accordingly in case the object file gets linked with some other object file.
Part 2: Handling of uninitialized variables.
I don't understand why uninitialized variables are handled so differently from initialized variables. Why are the register addresses stored in the opcode
equal for all the uninitialized variables (0x0, 0x0 and 0x0), while being
different for all the initialized variables (0x0, 0x4 and 0x8)?
Also, the value field of their relocation table entries is entirely unclear to me. I would have expected the .bss section to be referenced there.
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000d dir32 _var1_zeroed-0x00000004
00000017 dir32 _var2_zeroed-0x00000004
00000021 dir32 _var3_zeroed-0x00000004
... are basically telling the linker, that the addresses stored at offset ...
No, the linker is no longer involved with this. The relocation tables tell the loader, the part of the operating system that's responsible for loading the executable image into memory about the addresses.
The linker builds the executable image based on the assumption that everything is ideal and the image can be loaded at the intended address. If that's the case then everything is hunky-dory, nothing needs to be done. If there's a conflict however, the virtual address space is already in use by something else, then the image needs to be relocated at a different address.
That requires addresses to be patched, the offset between the ideal and the actual load address needs to be added. So if the .data section ends up at another address then addresses .text+0x2b, .text+0x35, etcetera, must be changed. No different for the uninitialized variables, the linker already picked an address for them but when _var1_zeroed-0x00000004 ends up at another address then .text+0x0d, .text+0x17, etcetera, need to be changed.
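The patching step described here, adding the difference between the actual and the ideal address to each recorded slot, can be sketched as follows. The function names, the in-memory image buffer and the base addresses in the usage note are assumptions for illustration only, not the format of any particular toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Difference between where the image ended up and where it was meant to go. */
uint32_t rebase_delta(uint32_t actual_base, uint32_t preferred_base)
{
    return actual_base - preferred_base;
}

/* Add the delta to every 32-bit address slot listed in the table.
 * slot_offsets holds byte offsets into the image, e.g. 0x2b, 0x35, 0x3f. */
void rebase_slots(uint8_t *image, const uint32_t *slot_offsets,
                  size_t slot_count, uint32_t delta)
{
    for (size_t i = 0; i < slot_count; i++) {
        uint32_t addr;

        /* Slots are not necessarily aligned, so go through memcpy. */
        memcpy(&addr, image + slot_offsets[i], sizeof addr);
        addr += delta;
        memcpy(image + slot_offsets[i], &addr, sizeof addr);
    }
}
```

If a slot at offset 0x2b holds 0x00403000 and the image moves from a preferred base of 0x00400000 to 0x10000000, the delta is 0x0FC00000 and the slot becomes 0x10003000.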

How do I resolve dyld imported symbols?

I'm trying to implement some parts of what dyld does and I'm a little bit stuck at stub trampolines.
Consider the following ARM instruction:
BL 0x2fec
It branches with link (subprocedure call) to 0x2fec. I'm aware of the fact, that there is a section __symbolstub1 in the __TEXT segment starting at 0x2fd8, so it's a jump to 20 bytes inside of __symbolstub1.
Now, there is a symbol
(undefined) external _objc_autoreleasePoolPush (from libobjc)
that I've resolved through LC_SYMTAB load command. There is no known address provided. I know, as a fact, that 0x2fec address is a trampoline to _objc_autoreleasePoolPush, but I cannot prove it via any means.
I've checked the LC_DYLD_INFO_ONLY command, and I had a slight hint in there, in the lazy_bind symbols I've found:
{:offset=>20, :segment=>2, :library=>6, :flags=>[], :name=>"_objc_autoreleasePoolPush"}
where the name and offset match what I have exactly, and the library #6 is "/usr/lib/libobjc.A.dylib", which is also perfect. Now the issue is that segment #2 is __TEXT, but __TEXT starts at 0x1000, and __symbolstub1 is way down there at 0x2fd8. So I'm missing some reference down to section.
Any ideas on how am I supposed to map 0x2fec virtual address to _objc_autoreleasePoolPush?
Heh, just a little more digging and I've found it at LC_DYSYMTAB's indirect symbols.
Now the long answer.
Find a section for given address;
The section should be of type S_NON_LAZY_SYMBOL_POINTERS, S_LAZY_SYMBOL_POINTERS, S_LAZY_DYLIB_SYMBOL_POINTERS, S_THREAD_LOCAL_VARIABLE_POINTERS or S_SYMBOL_STUBS;
If the section type is S_SYMBOL_STUBS, then the size of each stub in bytes is stored in reserved2; otherwise the entry size is taken to be 4 (the size of a pointer);
The offset into indirect symbols table is stored in reserved1;
The index into indirect symbols table is calculated as
index = sect.reserved1 + (vmaddr - sect.addr) / bytesize;
The symbol in the symbols table is found at symbols[indirect_symbols[index]].
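The steps above can be sketched in C. The `struct section_info` type and the `SEC_*` enum are simplified stand-ins invented for this example, not the real Mach-O section header, and the reserved1, reserved2 and table contents used in the usage note are made-up values. The 4-byte pointer size follows the rule stated above.

```c
#include <stdint.h>

/* Simplified stand-in for the section types that use indirect symbols. */
enum section_kind { SEC_SYMBOL_POINTERS, SEC_SYMBOL_STUBS };

/* Only the section fields the lookup needs. */
struct section_info {
    uint32_t addr;        /* virtual address of the section */
    uint32_t size;        /* section size in bytes */
    enum section_kind kind;
    uint32_t reserved1;   /* first index into the indirect symbol table */
    uint32_t reserved2;   /* stub size in bytes (stub sections only) */
};

/* Returns the symbol table index for vmaddr, or -1 if it cannot be mapped. */
int64_t symbol_index_for_address(const struct section_info *sect,
                                 const uint32_t *indirect_symbols,
                                 uint32_t indirect_count,
                                 uint32_t vmaddr)
{
    if (vmaddr < sect->addr || vmaddr - sect->addr >= sect->size)
        return -1;                      /* address is outside this section */

    uint32_t entry_size =
        (sect->kind == SEC_SYMBOL_STUBS) ? sect->reserved2 : 4u;
    if (entry_size == 0)
        return -1;                      /* malformed section header */

    uint32_t index = sect->reserved1 + (vmaddr - sect->addr) / entry_size;
    if (index >= indirect_count)
        return -1;                      /* index falls outside the table */

    return indirect_symbols[index];
}
```

With a stub section at 0x2fd8, a hypothetical stub size of 4 and reserved1 of 2, address 0x2fec maps to indirect index 2 + 20 / 4 = 7, and the result is whatever symbol index the indirect table stores at position 7.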