Moving memcpy into another code section - c

I am building a piece of software meant to run on an ARM Cortex-M0+ microcontroller. It includes a USB bootloader of sorts that runs as a secondary program upon a call to a function. I'm having an issue with the insertion of the memcpy function during compilation.
Background
The linker script is where it all starts. Most of it is pretty straightforward and standard. The program is stored in .text and is executed from there as well. Everything in .text is stored in the flash section of the chip.
The strangeness is the part where the bootloader runs. In order to be able to write all of the flash without overwriting the bootloader code, my bootloader entry point initiates a copy of the bootloader program into the SRAM portion of the microcontroller and then executes it from there. This way, the bootloader can safely erase all of the flash on the device without inadvertently deleting itself.
This is implemented by doing a faked "overlay" in the linker script (the real OVERLAY didn't quite match my use case):
/**
* The bootloader and general ram live in the same area of memory
* NOTE: The bootloader gets its own special RAM space and it lives on top
* of both .data and .bss.
*/
_shared_start = .;
.bootloader _shared_start : AT(_end_flash)
{
/* We keep the bootloader and its data together */
_start_bootloader_flash = LOADADDR(.bootloader);
_start_bootloader = .;
*(.bootloader.data)
*(.bootloader.data.*)
. = ALIGN(1024); /* Interrupt vector tables must be aligned to a 1024-byte boundary */
*(.bootloader.interrupt_vector_table)
*(.bootloader)
_end_bootloader = .;
}
.data _shared_start : AT(_end_flash + SIZEOF(.bootloader))
{
_start_data_flash = LOADADDR(.data);
_start_data = .;
*(.data)
*(.data.*)
*(.shdata)
_end_data = .;
}
. = _shared_start + SIZEOF (.data);
_bootloader_size = _end_bootloader - _start_bootloader;
_data_size = _end_data - _start_data;
_end_flash is a reference to the end of the previous section which stored all of its data in flash (.text, .rodata, .init...basically anything read-only gets stuck there).
What this accomplishes is that the .data and .bss sections normally live in RAM. However, the .bootloader sections also live in the same place in RAM. Both sections are stored to the flash sequentially when compiled. In my crt0 routines, the .data section is copied from the flash into its appropriate address in RAM (specified by _start_data) and the .bss section is zeroed. I have an additional section stored in the .text section which initiates the bootloader by copying its data from the flash into RAM, overwriting whatever was in .data and .bss. The only exit from the bootloader is a system reset, so it is ok that it destroys the data for the running program. After copying the bootloader into RAM, it executes it.
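The crt0 steps described above can be sketched in C roughly as follows. This is a minimal sketch, not the actual startup code from this project: the helper names `copy_words` and `zero_words` are hypothetical, and the regions are assumed to be word-aligned.

```c
#include <stdint.h>

/* Hypothetical helper: copy 32-bit words from a load address (flash)
 * to a run address (RAM). dst_end points one past the last word. */
void copy_words(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Hypothetical helper: zero a word-aligned region such as .bss. */
void zero_words(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}
```

On the target, crt0 would call something like `copy_words(&_start_data, &_end_data, &_start_data_flash)` with the linker symbols declared as `extern uint32_t`, and the bootloader entry point would make the same kind of call with `_start_bootloader`, `_end_bootloader` and `_start_bootloader_flash`.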
The Question
Obviously, there are some possible issues with compiling an overlaid program and making sure all the references line up. In order to mitigate issues that would crop up accessing bootloader code from the normal program or accessing the normal .data or .bss from the bootloader, I have the following three lines in my linker script:
NOCROSSREFS(.bootloader .text);
NOCROSSREFS(.bootloader .data);
NOCROSSREFS(.bootloader .bss);
Now, whenever I have a cross-reference between the .text (which might be erased by the bootloader), .data (which the bootloader lives on top of), or .bss (again, the bootloader lives on top of it) and the .bootloader section, a linker error will be issued.
This worked great until I actually started writing code. Part of my code includes some struct copying and other such things. Apparently, the compiler decided to do this (bootloader_ functions live in the .bootloader section):
20000340 <bootloader_usb_endp0_handler>:
...
20000398: 1c11 adds r1, r2, #0
2000039a: 1c1a adds r2, r3, #0
2000039c: f000 f8e0 bl 20000560 <__memcpy_veneer>
...
20000560 <__memcpy_veneer>:
20000560: b401 push {r0}
20000562: 4802 ldr r0, [pc, #8] ; (2000056c <__memcpy_veneer+0xc>)
20000564: 4684 mov ip, r0
20000566: bc01 pop {r0}
20000568: 4760 bx ip
2000056a: bf00 nop
2000056c: 00000869 andeq r0, r0, r9, ror #16
In my chip's architecture, addresses 0x20000000 until 0xE000000 or so are located in SRAM (I only have 4 KB of that actually on the device). Any address below 0x1fffffc00 is located in the flash section.
The problem is this: In my function located in my .bootloader section (bootloader_usb_endp0_handler), a reference to memcpy (2000039c, 20000562, and 2000056c) was inserted because I'm doing a struct copy among other things. The reference it put to memcpy is at address 0x00000869, which lives in the flash...which could be erased.
The particular code is:
static setup_t last_setup;
last_setup = *((setup_t*)(bdt->addr));
Where setup_t is a two-word struct and bdt->addr is a void* which I know points to data that looks like a setup_t. This line generates the call to memcpy.
My question is: I'd really like to keep my struct copying. It is convenient. Is there any way to tell the compiler to place the memcpy into a specific section other than the default? I want that to happen just for the bootloader module. All the other code can have its memcpy...I just want a special copy for my bootloader module that lives inside .bootloader.
If this simply isn't possible, I'm going to either write the entire bootloader in assembly (not as fun) or go the route of compiling the bootloader separately, including it as a fairly long hexadecimal string in the end program, and executing the string after copying it to RAM. The string route doesn't appeal to me very well because it is breakable and difficult to implement...so any other suggestions would also be appreciated.
The compilation line for this module is:
arm-none-eabi-gcc -Wall -fno-common -mthumb -mcpu=cortex-m0plus -ffreestanding -fno-builtin -nodefaultlibs -nostdlib -O0 -c src/bootloader.c -o obj/bootloader.o
Normally the optimization would be -Os, but I was trying to get rid of the memcpy...it didn't work.
Also, I've looked at this question and it didn't fix the problem.

I never tried it, but you might get away with using the EXTERN() linker script directive to force-load your newlib memcpy() twice: first in the bootloader link stage into your desired section, and later undefining it and linking it a second time into your "normal" code.
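Another option, for a struct this small, is to avoid the library call rather than relocate it. The sketch below is an illustration under stated assumptions, not a tested fix for this toolchain: it assumes `setup_t` is exactly two 32-bit words (the field names are made up), and it copies member by member through a `volatile` pointer, so each member is read with its own load instead of a block copy. The `BOOTLOADER_CODE` macro and the helper name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: the question only says setup_t is two words. */
typedef struct {
    uint32_t word0;
    uint32_t word1;
} setup_t;

/* Hypothetical macro: place the helper in .bootloader on the target only. */
#if defined(__arm__)
#define BOOTLOADER_CODE __attribute__((section(".bootloader")))
#else
#define BOOTLOADER_CODE
#endif

/* Copy a setup packet one word at a time.
 * src must be 4-byte aligned; the Cortex-M0+ faults on unaligned access. */
BOOTLOADER_CODE void bootloader_copy_setup(setup_t *dst, const void *src)
{
    const volatile setup_t *s = (const volatile setup_t *)src;
    dst->word0 = s->word0;
    dst->word1 = s->word1;
}
```

In the handler, `bootloader_copy_setup(&last_setup, bdt->addr);` would then stand in for the struct assignment. It is worth checking the disassembly afterwards to confirm that no `__memcpy_veneer` remains.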


Relocation of data from flash to RAM during boot phase

I'm currently trying to solve a problem which requires moving data from flash to RAM during the booting phase. Right now everything is only being simulated using a microcontroller architecture which is based on the open-source PULPissimo. For simulation I use QuestaSim by Mentor Graphics. Toolchain is GNU.
Unfortunately I have pretty much zero experience on how to relocate data during the boot phase, so I've read some posts and tutorials on this topic, but I'm still confused about quite a few things.
The situation is as follows: I set my boot mode to boot from flash, which in this case means that the code will already reside pre-loaded inside the flash memory. The code is just a simple hello world or any other program really. When I simulate, everything is compiled and the modules are loaded. After the boot phase the output "hello world" is displayed and the simulation is done. This means everything works as intended, which is obviously a good sign and a good starting point.
Side note: As far as I know the PULPissimo architecture does not support direct boot from flash at the moment, so the data from flash has to be moved to RAM (which they call L2) and executed.
From what I understand there are multiple things involved in the booting process. Please correct me if anything in the next paragraph is wrong:
First: The code that will be executed. It's written in C and has to be translated into a language which the architecture understands. This should be done automatically and reside in the flash memory pre boot phase. Considering that the code is actually being executed as mentioned above there is not much confusion here.
Second: The bootloader. This is also written in C. It is also translated and will be burned into ROM later on, so changing this wouldn't make much sense. It loads the data which is necessary for booting. It can also differentiate whether you want to boot from flash or JTAG.
Third: The main startup file crt0.S. This is one of the things that confuse me, especially what exactly it does and what the difference between the bootloader and the main startup file is. Wikipedia (yes, I know...) defines it as: "crt0 (also known as c0) is a set of execution startup routines linked into a C program that performs any initialization work required before calling the program's main function." So does that mean that it has nothing to do with the boot phase but instead kind of "initializes" and/or loads only the code that I want to execute?
Fourth: The linker script link.ld. Even though this is the part I read the most about, there are still quite a lot of questions. From what I understand, the linker script contains information on where to relocate data. The data that is to be relocated is the data of the code I want to execute(?). It consists of different parts explained here.
.text program code;
.rodata read-only data;
.data read-write initialized data;
.bss read-write zero initialized data.
Sometimes I see more than those sections, not just text, rodata, data, bss. But how does the linker script know what the "text" is and what the "data" is and so on?
I know that's quite a lot and probably pretty basic stuff for a lot of you but I'm genuinely confused.
What I am trying to accomplish is relocating data from flash to RAM during the boot phase. Not only the code that I want to execute but more data that is also located in the flash memory. Consider the following simple scenario: I want to run a hello world C program. I want to boot from flash. Up to this point nothing special and everything works fine. Now, after the data of the code, I also load more data into flash, let's say 256 bytes of A (hex) so I can check my memory in QuestaSim by looking for AAAAAAAA sections. I also want to say where I want that data to be loaded during the boot phase, for example 0x1C002000. I tried playing around with the crt0.S and the linker.ld files but with no success. The only time it actually worked was when I modified the bootloader.c file, but I have to assume that this is already burned into ROM and I can't make any modifications to it. To be honest, I'm not even sure if what I'm trying to do is even possible without any changes to the bootloader.c.
Thank you for your time.
Update
So I was playing around a bit and tried to create a simple example to understand what's happening and what manipulations or relocations I can do.
First I created a C file which basically contains only data.
Let's call it my_test_data.c
int normal_arr[] = {0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555};
int attribute_arr[] __attribute__ ((section(".my_test_section"))) = {0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666};
static int static_arr[] = {0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777};
int normal_var = 0xCCCCCCCC;
static int static_var = 0xDDDDDDDD;
int result_var;
Then I created the object file. I looked into it via objdump and could see my section my_test_section :
4 .my_test_section 00000020 00000000 00000000 00000054 2**2
After that I tried to modify my linker script so that this section would be loaded to an address that I specified. These are the lines I added in the linker script (probably more than needed). It is not the whole linker script!:
CUT01 : ORIGIN = 0x1c020000, LENGTH = 0x1000
.my_test_section : {
. = ALIGN(4);
KEEP(*(.my_test_section))
_smytest = .;
*(.my_test_section)
*(.my_test_section.*)
_endmytest = .;
} > CUT01
I wanted to see what data from my_test_data.c gets moved and where it gets moved. Remember that my goal is to have the data inside the RAM (Addr.: 0x1c020000) after booting (or during booting however you prefer). Unfortunately only:
int normal_arr[] = {0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555};
gets moved into ROM (Addr.: 0x1A000000) as it seems to be part of the .text section (iirc) which is already being handled by the linker script:
.text : {
. = ALIGN(4);
KEEP(*(.vectors))
_stext = .;
*(.text)
*(.text.*)
_etext = .;
*(.lit)
( ... more entries ...)
_endtext = .;
} > ROM
What also confuses me is the fact that I can add this line in the above .text section:
*(.my_test_section)
and then the data from the attribute_arr will be located in ROM but if I try to move it to the address I added (CUT01) nothing will ever end up there.
I also generated the map file which also lists my_test_section. This is an excerpt from the map file (don't mind the locations of where the output files are on my machine).
.my_test_section
0x000000001c020000 0x3c
0x000000001c020000 _mts_start = .
*(.text)
*(.text.*)
*(.comment)
.comment 0x000000001c020000 0x1a /.../bootloader.o
0x1b (size before relaxing)
.comment 0x000000001c02001a 0x1b /.../my_test_data.o
*(.comment.*)
*(.rodata)
*(.rodata.*)
*(.data)
*(.data.*)
*(.my_test_section)
*fill* 0x000000001c02001a 0x2
.my_test_section
0x000000001c02001c 0x20 /.../my_test_data.o
0x000000001c02001c attribute_arr
*(.my_test_section.*)
*(.bss)
*(.bss.*)
*(.sbss)
*(.sbss.*)
0x000000001c02003c . = ALIGN (0x4)
0x000000001c02003c _mts_end = .
OUTPUT(/.../bootloader elf32-littleriscv)
I will continue to try to get this to work but right now I'm kind of confused as to why it seems like my_test_section gets recognized but not moved to the location which I specified. This makes me wonder if I made a mistake (or several mistakes) in the linker script or if one of the other files (bootloader.c or crt0.S) might be the reason.
There is a lot being asked here. I'm going to take a stab at answering part of the questions. You ask:
But how does the linker script know what the "text" is and what the
"data" is and so on?
The additional, custom, sections, and the predefined sections, are handled differently.
Custom sections usually require the related variables to have the section specified explicitly, with a pragma or, in GCC, a section attribute.
The standard sections are defined by their type:
text: this is the code, that is, the instructions telling the computer what to do, not the data
rodata: const data, such as literal strings (e.g. "This is a literal string" in the code). A good compiler/linker should put variables defined as 'const' (not const parameters) in the rodata section as well.
bss: static or global variables which are not initialized when declared:
int global_var_not_a_good_idea; // not in a function; local variables are different
static int anUninitializedArray[10];
data: static or global variables which are initialized when declared
int initializedGlobalVarStillNotRecommended = 10;
static int initializedArray[] = { 1, 2, 3, 4, 5, 6};
This data should be copied to RAM when the program loads.
EDIT:
Somewhere in your startup code should be a reset handler. This function will be called on processor reset. It should be the function that copies data to RAM, possibly clears the zero segment, initializes the C library, etc. When finished with initializations, it should call main();
Here is an example (in this case, from generated or example code for the Atmel SAMG55 processor, but the idea should be the same) of relocating data to RAM.
In the linker script memory space definitions (I'm going to leave out the real numbers):
ram (rwx) : ORIGIN = 0x########, LENGTH = 0x########
in the linker script section definitions:
.relocate : AT (_etext)
{
    . = ALIGN(4);
    _srelocate = .;
    *(.ramfunc .ramfunc.*);
    *(.data .data.*);
    . = ALIGN(4);
    _erelocate = .;
} > ram
note that _etext is the end of the previous section
_srelocate and _erelocate are used in the startup code to relocate, I believe, everything in .data (and, apparently, .ramfunc as well) in all the files:
/* Initialize the relocate segment */
pSrc = &_etext;
pDest = &_srelocate;
if (pSrc != pDest) {
    for (; pDest < &_erelocate;) {
        *pDest++ = *pSrc++;
    }
}
This is a pretty standard example. If you search in your project for where main() is called, you should find something similar.
If all you want to do is relocate the entire .data section to the address you are specifying in RAM, you should only need to change the definition of the location of the RAM section, not define your own. You only need to define your own section if you want to move specific variables to a different location.
I am not familiar with the platform on which you are working, but there should be either a C or assembly file with the startup code that runs before crt0. This will set up the stack, heap, and interrupt vectors. In some implementations, this code also copies the .data section to RAM, and may be set up to copy everything from the beginning of the data section until the beginning of the .bss section, to RAM. If your platform is set up in this way, if you locate your section between .data and .bss, it should be copied with no other changes from you (see here, for example).
If, however, you want to copy the data to a different location, you will probably have to add code to copy it, either in the loader code or at the very beginning of main, using the symbols you defined for the beginning and ending of the section.
Since you mention, though, that it is read-only data, I would recommend leaving it in read-only memory if you can.

Compiling and linking position independent code (PIC) for ARM M4

I'm working on making a bootloader and giving it the ability to update itself. This process involves copying the binary to a new location, jumping to it, and using it to flash the new bootloader in the original location. This is all being developed for an M4 processor in Eclipse, using the ARM GCC toolchain.
To do this, I've gathered that I need to compile as Position Independent Code (PIC).
I've searched around and found this excellent article, so when I added "-fPIC" to the ARM GCC compiler call I expected to see linker errors about GOT and PLT being missing
https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
In my linker script, I added these sections to the .data section as follows:
.data : AT(__DATA_ROM)
{
. = ALIGN(4);
__DATA_RAM = .;
__data_start__ = .; /* Create a global symbol at data start. */
*(.got*) /* .got and .plt for position independent code */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
KEEP(*(.jcr*))
. = ALIGN(4);
__data_end__ = .; /* Define a global symbol at data end. */
} > m_data
However, this code fails to copy-up from ROM to RAM.
My next thought was that perhaps my linker needed to be aware it was linking a PIC executable. To find out, I added "--pic-executable" to the ld linker call. However, the linker now generated sections for "interp", "dyn", "rel.dyn" and "hash". I tried throwing these into the data section as well, but got the following errors:
gcc-arm-none-eabi-4_9/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe:
could not find output section .hash
gcc-arm-none-eabi-4_9/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe:
final link failed: Nonrepresentable section on output
I assume this means the compiler didn't actually fill the ".hash" section with anything, so the link failed.
Am I going about this correctly? Is there anything else I need to add to get the compiler to do this? Any help would be greatly appreciated.
Recently I researched extensively Cortex-M4 bootloading a PIC firmware image.
There are a couple of things needed:
A very simple bootloader. It only needs to read two 4-byte words from the flash location of the firmware image. The first is the stack pointer address and the second is the Reset_Handler address. The bootloader needs to jump to Reset_Handler, but because the firmware is relocated further into the flash, the Reset_Handler is actually a bit further along as well.
So the bootloader adds the correct offset before jumping to the Reset_Handler of the firmware image. No other patching is done, but the bootloader (in my solution at least) stores the location and offset of the firmware image, calculates a checksum and passes this information in registers for the firmware image to use.
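The jump computation described above might look like the following in C. This is only a sketch under stated assumptions: `image_header_t`, `image_entry` and `image_offset` are hypothetical names, and the offset is taken to be the distance between the address the firmware was linked for and the flash address where the image actually sits.

```c
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;     /* word 0: initial stack pointer */
    uint32_t reset_handler;  /* word 1: Reset_Handler address, Thumb bit set */
} image_header_t;

/* Address to branch to when the image sits image_offset bytes past the
 * address it was linked for. An even offset preserves the Thumb bit. */
uint32_t image_entry(const image_header_t *hdr, uint32_t image_offset)
{
    return hdr->reset_handler + image_offset;
}
```

Loading the stack pointer from `initial_sp` and performing the actual branch are target-specific and are left out of the sketch.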
Then we need modifications in the firmware linker file to force the ISR vector table to the beginning of RAM. For Cortex-M4 and VTOR (Vector Table Offset Register), the ISR vector table needs to be aligned to a 512-byte boundary. At the beginning of RAM it is aligned naturally. In the linker script we also dedicate a position in RAM for the Global Offset Table (GOT) for easier manipulation. Address ranges should also be exported via linker script symbols.
We need also C compiler options. Basically these:
-fpic
-mpic-register=r9
-msingle-pic-base
-mno-pic-data-is-text-relative
They go to C compiler only! This creates the Global Offset Table (GOT) accounting.
Finally we need dedicated and tedious assembly bootstrap routines in the firmware image project. These routines perform the normal startup task of setting up the C runtime environment, but they additionally read the ISR vector table and the GOT from flash and copy them to RAM. In RAM, the entries of those tables that point to flash are offset by the amount the firmware image has been relocated in flash.
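The table fix-up described above is done in assembly in this solution, but the idea can be illustrated in C. In this sketch, which is an assumption rather than the article's code, `relocate_table` and its parameters are hypothetical: every entry that points into the range the image was linked for is shifted by the relocation offset, and other entries (for example RAM addresses) are left alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adjust the entries of a table already copied to RAM.
 * Entries in [link_start, link_end) refer to flash and are moved by offset. */
void relocate_table(uint32_t *table, size_t count,
                    uint32_t link_start, uint32_t link_end, uint32_t offset)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i] >= link_start && table[i] < link_end) {
            table[i] += offset;
        }
    }
}
```

The same routine would serve for both the ISR vector table and the GOT, since both are arrays of 32-bit addresses.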
I spent last 6 months researching this subject and I have written an in-depth article about it here:
https://techblog.paalijarvi.fi/2022/01/16/portable-position-independent-code-pic-bootloader-and-firmware-for-arm-cortex-m0-and-cortex-m4/
By providing the link I hope I can contribute to StackOverflow, which I used as a starting point for my research. I hope this helps some people in learning the intricacies.
Booting code and relocating it involves many careful steps, including initializing and configuring RAM, SPI and other necessary peripherals.
I know U-Boot performs the sequence you are trying to achieve, so a better starting point is to walk through the U-Boot documentation and sources, in the machine-specific folders for the processor or board of interest.
For what it's worth, neither I nor NXP's technical support team were able to get their S32DS IDE to compile truly position independent code.
To this day, we have two bootloaders - one compiled for location A, the other is an intermediate for location B. Both of which are required for an update.
TI does this with the Tivaware bootloader. You can do it with some GNU ld linker trickery:
.text 0x20000000 : AT (0x00000000)
{
_text = .;
KEEP(*(.isr_vector))
*(.text*)
*(.rodata*)
_etext = .;
}
Startup code that copies this code from Flash at 0x0 to SRAM at 0x2000_0000 is left as an exercise to the reader.

Understanding certain ELF file structure

From ARM's infocenter, regarding section static linking and relocations:
** Section #1 'ER_RO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 28 bytes (alignment 4)
Address: 0x00008000
$a
.text
bar
0x00008000: E59f000C .... LDR r0,[pc,#12] ; [0x8014] = 0x801C
0x00008004: E5901000 .... LDR r1,[r0,#0]
0x00008008: E2411001 ..A. SUB r1,r1,#1
0x0000800C: E5801000 .... STR r1,[r0,#0]
0x00008010: E12FFF1E ../. BX lr
$d
0x00008014: 0000801C .... DCD 32796
$a
.text
foo
0x00008018: EAFFFFF8 .... B bar ; 0x8000
and from ELF for the ARM architecture:
Table 4-7, Mapping symbols
Name Meaning
$a - Start of a sequence of ARM instructions
$d - Start of a sequence of data items (for example, a literal pool)
As you can see, the ELF file contains a section in which there is code (bar), then data/ro (32796), then more code (foo) in consecutive addresses.
Now, a basic principle regarding any SW file structure is that the SW is composed from different and separate sections - text (code), data, and bss. (and rodata if we want to be pedantic) as we can see if we examine the MAP file.
So this ELF structure is not consistent with that basic principle. What is going on here? Am I mistaken about the basic principle? If not, will this ELF structure be changed at run time to restore the separation between sections?
And why does the ELF section contain mixed types in one sequential address range?
NOTE: I assume the scatter file used in the example is the default one, since the document containing the example does not provide any scatter file along with it.
At run time, the sections do not matter, only the PT_LOAD segments in the program header. The ELF specification is quite flexible there as well, but some loaders have restrictions on the PT_LOAD segments they can process.
The reason for splitting code and data this way could be that this architecture supports only a limited range of PC-relative addressing and needs a constant pool for loading most constants (because constructing them via immediates is too expensive). Having as few large constant pools as possible is attractive because it leads to improved data and instruction cache utilization (instead of caching memory which is not of the right type and thus can never be used), but you may still need more than one if the code size exceeds what can be addressed directly.
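The addresses in the question's listing can be checked by hand. In ARM state the PC reads as the instruction address plus 8, so `LDR r0,[pc,#12]` at 0x8000 loads from 0x8014. The 12-bit immediate limits such loads to within 4095 bytes of that PC value, which is why the literal sits right after `bar`. The helper names below are made up for illustration.

```c
#include <stdint.h>

/* Address read by an ARM-state "LDR Rd, [pc, #imm]" located at insn_addr. */
uint32_t arm_literal_address(uint32_t insn_addr, uint32_t imm)
{
    return insn_addr + 8u + imm;
}

/* Target of an ARM-state B/BL: the low 24 bits of the encoding are a
 * signed word offset relative to PC (insn_addr + 8). */
uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    /* Sign-extend from 24 to 32 bits using unsigned arithmetic. */
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;
    return insn_addr + 8u + (offset << 2);
}
```

Applied to the listing, the literal load resolves to 0x8014 and the `B bar` encoding `EAFFFFF8` at 0x8018 resolves to 0x8000, matching the disassembler's annotations.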

Why do I get the same address every time I build + disassemble a function inside GDB?

Every time when I disassemble a function, why do I always get the same instruction address and constants' address?
For example, after executing the following commands,
gcc -o hello hello.c -ggdb
gdb hello
(gdb) disassemble main
the dump code would be:
When I quit gdb and re-disassemble the main function, I will get the same result as before. The instruction address and even the address of constants are always the same for each disassemble command in gdb. Why is that? Does the compiled file hello contain certain information about the address of each assembly instruction as well as the constants' addresses?
If you made a position-independent executable (e.g. with gcc -fpie -pie, which is the default for gcc in many recent Linux distros), the kernel would randomize the address it mapped your executable at. (Except when running under GDB: GDB disables ASLR by default even for shared libraries, and for PIE executables.)
But you're making a position-dependent executable, which can take advantage of static addresses being link-time constants (by using them as immediates and so on without needing runtime relocation fixups). e.g. you or the compiler can use mov $msg, %edi (like your code) instead of lea msg(%rip), %rdi (with -fpie).
Regular (position-dependent) executables have their load-address set in the ELF headers: use readelf -a ./a.out to see the ELF metadata.
A non-PIE executable will load at the same address every time even without running it under GDB, at the address specified in the ELF program headers.
(gcc / ld chooses 0x400000 by default on x86-64-linux-elf; you could change this with a linker script). Relocation information for all the static addresses hard-coded into the code + data is not available, so the loader couldn't fix up the addresses even if it wanted to.
e.g. in a simple executable (with only a text segment, not data or bss) I built with -no-pie (which seems to be the default in your gcc):
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000c5 0x00000000000000c5 R E 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
So the ELF headers request that offset 0 in the file be mapped to virtual address 0x0000000000400000. (And the ELF entry point is 0x400080; that's where _start is.) I'm not sure what the relevance of PhysAddr = VirtAddr is; user-space executables don't know and can't easily find out what physical addresses the kernel used for pages of RAM backing their virtual memory, and it can change at any time as pages are swapped in / out.
Note that readelf does line wrapping; note there are two rows of columns headers. The 0x200000 is the Align column for that one LOADed segment.
By default, the GNU toolchain for x86-64 Linux produces position-dependent executables which are mapped at address 0x400000. (Position-independent executables will be mapped at 0x55… addresses instead.) It is possible to change that by building GCC with --enable-default-pie, or by specifying compiler and linker flags.
However, even for a position-independent executable (PIE), the addresses would be constant between GDB runs because GDB disables address space layout randomization by default. GDB does this so that breakpoints at absolute addresses can be re-applied after the program has been started.
There are a variety of executable file formats. Typically, an executable file contains information about several memory sections or segments. Inside the executable, references to memory addresses may be expressed relative to the beginning of a section. The executable also contains a relocation table. The relocation table is a list of those references, including where each one is in the executable, what section it refers to, and what type of reference it is (what field of an instruction it is used in, etc.).
The loader (software that loads your program into memory) reads the executable and writes the sections to memory. In your case, the loader appears to be using the same base addresses for sections every time it runs. After initially putting the sections in memory, the loader reads the relocation table and uses it to fix up all the references to memory by adjusting them based on where each section was loaded into memory. For example, the compiler may write an instruction as, in effect, “Load register 3 from the start of the data section plus 278 bytes.” If the loader puts the data section at address 2000, it will adjust this instruction to use the sum of 2000 and 278, making “Load register 3 from address 2278.”
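The fix-up pass described above can be sketched as follows. The record layout is hypothetical and far simpler than a real relocation format: each entry just names a word in the loaded image and the section whose base address must be added to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record. */
typedef struct {
    size_t   word_index;  /* which word of the image holds the reference */
    unsigned section;     /* index of the section the reference is relative to */
} reloc_t;

/* Add each section's load address to the section-relative references. */
void apply_relocations(uint32_t *image, const reloc_t *relocs, size_t count,
                       const uint32_t *section_base)
{
    for (size_t i = 0; i < count; i++) {
        image[relocs[i].word_index] += section_base[relocs[i].section];
    }
}
```

With the data section loaded at address 2000, a stored offset of 278 becomes 2278, matching the example in the text.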
Good modern loaders randomize where sections are loaded. They do this because malicious people are sometimes able to exploit bugs in programs to cause them to execute code injected by the attacker. Randomizing section locations prevents the attacker from knowing the address where their code will be injected, which can hinder their ability to prepare the code to be injected. Since your addresses are not changing, it appears your loader does not do this. You may be using an older system.
Some processor architectures and/or loaders support position independent code (PIC). In this case, the form of an instruction may be “Load register 3 from 694 bytes beyond where this instruction is.” In that case, as long as the data is always at the same distance from the instruction, it does not matter where they are in memory. When the processor executes the instruction, it will add the address of the instruction to 694, and that will be the address of the data. Another way of implementing PIC-like code is for the loader to provide the addresses of each section to the program, by putting those addresses in registers or fixed locations in memory. Then the program can use those base addresses to do its own address calculations. Since your program has an address built into the code, it does not appear your program is using these methods.
A program not really intended to be executed:
bootstrap
.globl _start
_start:
bl one
b .
first c file
extern unsigned int hello;
unsigned int one ( void )
{
return(hello+5);
}
second c file (being extern forces the compiler to compile the first object in a certain way)
unsigned int hello;
linker script
MEMORY
{
ram : ORIGIN = 0x00001000, LENGTH = 0x4000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
building position dependent
Disassembly of section .text:
00001000 <_start>:
1000: eb000000 bl 1008 <one>
1004: eafffffe b 1004 <_start+0x4>
00001008 <one>:
1008: e59f3008 ldr r3, [pc, #8] ; 1018 <one+0x10>
100c: e5930000 ldr r0, [r3]
1010: e2800005 add r0, r0, #5
1014: e12fff1e bx lr
1018: 0000101c andeq r1, r0, r12, lsl r0
Disassembly of section .bss:
0000101c <hello>:
101c: 00000000 andeq r0, r0, r0
The key here is that at address 0x1018 the compiler had to leave a placeholder for the address of the external item. In the disassembly of the unlinked object below, that placeholder is at offset 0x10.
00000000 <one>:
0: e59f3008 ldr r3, [pc, #8] ; 10 <one+0x10>
4: e5930000 ldr r0, [r3]
8: e2800005 add r0, r0, #5
c: e12fff1e bx lr
10: 00000000 andeq r0, r0, r0
The linker fills this in at link time. You can see in the linked disassembly above that, when built position dependent, it fills in the absolute address of where to find that item. For this code to work, it must be loaded in such a way that the item shows up at that address. It has to be loaded at a specific position or address in memory (at address 0x1000, basically). That is what position dependent means.
If your toolchain supports position independent code (GNU does), then building that way is one solution.
Disassembly of section .text:
00001000 <_start>:
1000: eb000000 bl 1008 <one>
1004: eafffffe b 1004 <_start+0x4>
00001008 <one>:
1008: e59f3014 ldr r3, [pc, #20] ; 1024 <one+0x1c>
100c: e59f2014 ldr r2, [pc, #20] ; 1028 <one+0x20>
1010: e08f3003 add r3, pc, r3
1014: e7933002 ldr r3, [r3, r2]
1018: e5930000 ldr r0, [r3]
101c: e2800005 add r0, r0, #5
1020: e12fff1e bx lr
1024: 00000014 andeq r0, r0, r4, lsl r0
1028: 00000000 andeq r0, r0, r0
Disassembly of section .got:
0000102c <.got>:
102c: 0000103c andeq r1, r0, r12, lsr r0
Disassembly of section .got.plt:
00001030 <_GLOBAL_OFFSET_TABLE_>:
...
Disassembly of section .bss:
0000103c <hello>:
103c: 00000000 andeq r0, r0, r0
This has a performance cost, of course. Instead of the compiler and linker cooperating through a single placeholder word per reference, there is now a table, the global offset table (GOT) in this solution, at a known position relative to the code, and it holds the linker-supplied values.
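To make the lookup concrete, here is a small sketch of the arithmetic behind the "add r3, pc, r3" step in the position independent disassembly. It relies on the fact that in ARM (A32) state a read of pc yields the address of the current instruction plus 8. The function names are made up for illustration.

```c
#include <stdint.h>

/* In ARM (A32) state, reading pc gives the instruction's own address + 8. */
uint32_t arm_pc_value(uint32_t instr_addr)
{
    return instr_addr + 8u;
}

/* Address produced by "add r3, pc, r3", where r3 was loaded with the
 * pc-relative offset stored in the literal pool (0x14 in the listing). */
uint32_t got_address(uint32_t add_instr_addr, uint32_t stored_offset)
{
    return arm_pc_value(add_instr_addr) + stored_offset;
}
```

With the values from the listing, got_address(0x1010, 0x14) gives 0x102c, which is where the .got section sits. The word stored there (0x103c) is the address of hello.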
The program is not fully position independent yet; it will certainly not work if you load it just anywhere. The loader has to patch up the table based on where it wants to place the items. This is far simpler than keeping a very long list of every location to patch in the first solution, although that would also have been a perfectly possible way to do it: a table in the executable could list all of those locations and the loader could patch them too. (Executables contain more than the program and data; they carry other information, as you know if you have run objdump or readelf on an ELF file.)
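As a rough sketch of what "patching up the table" could look like, assume the simplest possible scheme: every table entry holds a link-time absolute address, and the whole image has been moved by one constant amount. The name relocate_got and its parameters are hypothetical; a real loader works from relocation records rather than blindly adjusting every entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift every GOT entry by the distance between the
 * address the image was linked for and the address it was loaded at.
 * Assumes all sections moved together by the same amount. */
void relocate_got(uint32_t *got, size_t entries,
                  uint32_t link_base, uint32_t load_base)
{
    uint32_t delta = load_base - link_base; /* unsigned wrap-around is fine */
    for (size_t i = 0; i < entries; i++)
        got[i] += delta;
}
```

For the listing above, an image linked at 0x1000 but loaded at 0x5000 would have its single GOT entry changed from 0x103c to 0x503c.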
If your .data, .bss, and other memory sections are at fixed positions relative to .text, as I have built here, then a GOT was not necessary: the linker could have computed the relative offset to the resource at link time and, together with the compiler, found the item in a position independent way. The binary could then have been loaded just about anywhere (some minimum alignment may have been required) and it would work without any patching. With the GNU solution I think you can move the segments relative to each other.
It is incorrect to state that the kernel will always randomize your location if the program is built position independent. It is possible, so long as the toolchain and the operating system's loader (a completely separate development) work hand in hand; the loader has the opportunity. But that does not in any way mean that every loader does or will. Specific operating systems/distros/versions may have it set as the default when they come across a binary that is position independent (built in the way that loader expects). It is like saying all mechanics on the planet will use a specific brand and type of oil if you show up in their garage with a specific brand of car. A specific mechanic may always use a specific oil brand and type for a specific car, but that doesn't mean all mechanics will, or perhaps even can, obtain that specific oil brand or type. If that individual business chooses to do so as a policy, then you as a customer can start to form an assumption that that is what you will get (with that assumption then failing when they change their policy).
As far as disassembly goes, you can statically disassemble your project at build time or whenever you like. If it is loaded at a different position, there will be an offset from what you are seeing, but the .text code will still be in the same place relative to other code in that segment. If the static disassembly shows a call target 0x104 bytes ahead, then even when loaded somewhere else you should see that relative jump also be 0x104 bytes ahead; only the absolute addresses may differ.
Then there is the debugger part of this. For the debugger to show the correct information, it also has to be part of the toolchain/loader(/OS) team. It has to know the program was position independent and where it was loaded, and/or the debugger does the loading for you and may not use the standard OS loader in the same way that a command line or GUI does. So you might still see the binary in the same place every time when using the debugger.
The main bug here was your expectation. First, operating systems like Windows, Linux, etc. want to use an MMU so that they can manage memory better: they pick some or many scattered blocks of physical memory and create a linear area of virtual memory for your program to live in. More importantly, the virtual address space of each separate program can look the same. Every program can load at 0x8000 in its virtual address space without interfering with the others, given an MMU designed for this and an operating system that takes advantage of it. Even with position independent loading, one would hope they are not using physical addresses; they still create a virtual address space, just possibly with a different load point for each program or each instance of a program. Expecting all operating systems to do this all the time is an expectation problem. Second, when using a debugger you are not in a stock environment: the program runs differently, can be loaded differently, and so on. It is not the same as running without the debugger, so using a debugger also changes what you should expect to see. There are two levels of expectation to deal with here.
Use an external component in a very simple program, as I did above, and check in the disassembly of both the object and the linked binary that it was built position independent. Then try Linux, as Peter has indicated, and see if it loads in a different place each time. If not, look on Super User or search around for how to get Linux (and/or gdb) to change the load location.
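A minimal sketch of such a test, reusing the hello and one names from the example above; print_load_addresses is a made-up helper. It prints where the code and the data actually ended up at run time.

```c
#include <stdint.h>
#include <stdio.h>

unsigned int hello; /* zero-initialized, so it lives in .bss */

unsigned int one(void)
{
    return hello + 5;
}

/* Print the run-time addresses of one item of code and one item of data. */
void print_load_addresses(void)
{
    /* Going through uintptr_t avoids a direct function-pointer to void* cast. */
    printf("one   (.text) at %p\n", (void *)(uintptr_t)&one);
    printf("hello (.bss)  at %p\n", (void *)&hello);
}
```

Call print_load_addresses() from main, build with something like gcc -fPIE -pie, and run the program several times, both from the shell and under the debugger. Whether the printed addresses change between runs depends on the loader and the system configuration, which is exactly the point made above.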

Loading HEX data into memory

I am compiling bare-metal software (no OS) for the BeagleBoard (ARM Cortex-A8) with CodeSourcery's GCC ARM EABI compiler. This compiles to a binary or image file that I can load with the U-Boot bootloader.
The question is: can I load hex data into memory dynamically at run time (so that I can load other image files into memory)? I can use objcopy to generate a hex dump of the software. Could I use this information and load it into the appropriate address? Would all the addresses of the .text, .data, and .bss sections be loaded correctly, as stated in the linker script?
The hexdata output generated by
$(OBJCOPY) build/$(EXE).elf -O binary build/$(EXE).bin
od -t x4 build/$(EXE).bin > build/$(EXE).bin.hex
looks like this:
0000000 e321f0d3 e3a00000 e59f1078 e59f2078
0000020 e4810004 e1510002 3afffffc e59f006c
0000040 e3c0001f e321f0d2 e1a0d000 e2400a01
0000060 e321f0d1 e1a0d000 e2400a01 e321f0d7
... and so on.
Is it as simple as to just load 20 bytes for each line into the desired memory address and everything would work by just branching the PC into the correct address? Did I forget something?
When you use -O binary you pretty much give up your .text, .data, .bss control. For example, suppose you have one word, 0x12345678, at address 0x10000000 (call that .text) and one word of .data, 0xAABBCCDD, at 0x20000000. With -O binary you will get a file 0x10000004 bytes long that starts with 0x12345678, ends with 0xAABBCCDD, and has 0x0FFFFFFC bytes of zeros in between. Try to dump that into a chip and you might wipe out your bootloader (U-Boot, etc.) or trash a bunch of registers, not to mention having to deal with potentially huge files and an eternity to transfer them to the board, depending on how you intend to do that.
What you can do, which is typical with ROM-based bootloaders, if you are using the GNU tools, is:
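The numbers in that example can be checked with a few lines of arithmetic. This is only a sketch of the size calculation, assuming the flat image runs from the lowest load address to the end of the highest section; the function names are invented for illustration.

```c
#include <stdint.h>

/* Total size of a flat image that spans from the first section's address
 * to the end of the last section. */
uint64_t flat_image_size(uint64_t first_addr, uint64_t last_addr,
                         uint64_t last_size)
{
    return (last_addr + last_size) - first_addr;
}

/* Bytes of padding: everything in the image that is not real content. */
uint64_t flat_image_padding(uint64_t image_size, uint64_t content_bytes)
{
    return image_size - content_bytes;
}
```

For one 4-byte word at 0x10000000 and one at 0x20000000, flat_image_size gives 0x10000004 bytes, of which only 8 bytes are content and 0x0FFFFFFC are padding.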
MEMORY
{
bob : ORIGIN = 0x10000000, LENGTH = 16K
ted : ORIGIN = 0x20000000, LENGTH = 16K
}
SECTIONS
{
.text : { *(.text*) } > bob
.bss : { *(.bss*) } > ted AT > bob
.data : { *(.data*) } > ted AT > bob
}
The code (.text) will be linked as if .bss and .data are at their proper places in memory, 0x20000000, but the bytes loaded from the executable (by an ELF loader, or via -O binary, etc.) are tacked onto the end of .text. Normally you use more linker script magic to find out where the linker put them. On boot, your .text code should first zero .bss and copy .data from the .text space to the .data space; then you can run normally.
U-Boot can probably handle formats other than .bin, yes? It is also quite easy to write an ELF tool that extracts the different parts of the binary and makes your own .bin files, without using objcopy. It is also quite easy to write code that never relies on .bss being zero and has no .data, which sidesteps all of these problems.
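A minimal sketch of that boot-time step, written as a plain function so it can be tried on a host machine. On a real target the pointers and word counts would come from symbols exported by the linker script; those symbol names vary by project and are not shown here, and init_ram itself is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical startup helper: copy the initial values of .data from where
 * they were loaded (after .text) to where the code expects them, then clear
 * .bss. Sizes are given in 32-bit words. */
void init_ram(uint32_t *data_dst, const uint32_t *data_load, size_t data_words,
              uint32_t *bss_dst, size_t bss_words)
{
    for (size_t i = 0; i < data_words; i++)
        data_dst[i] = data_load[i]; /* load image -> run-time location */

    for (size_t i = 0; i < bss_words; i++)
        bss_dst[i] = 0;             /* C code expects .bss to read as zero */
}
```

This has to run before any C code that depends on initialized globals, which is why it usually sits in or right after the assembly bootstrap.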
If you can write to random addresses without an OS getting in the way, there's no point in using some random hex dump format. Just load the binary data directly to the desired address. Converting on the fly from hex to binary to store in memory buys you nothing. You can load binary data to any address using plain read() or fread(), of course.
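For illustration, a small sketch of loading raw binary data to an address with fread(), as mentioned above. It assumes an environment that actually provides stdio and a file to read from, which a bare-metal target may not; load_image is a hypothetical name, and the caller is responsible for making sure dest really has max_len writable bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Read up to max_len bytes of a raw binary image directly into memory at
 * dest. Returns the number of bytes loaded, or 0 if the file cannot be
 * opened. No format conversion is involved. */
size_t load_image(const char *path, void *dest, size_t max_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;

    size_t n = fread(dest, 1, max_len, f);
    fclose(f);
    return n;
}
```

On the target, dest would be the address the image was linked for, after which execution can be transferred to its entry point.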
If you're loading full-blown ELF files (or similar), you of course need to implement whatever tasks that particular format expects from the object loader, such as allocating memory for BSS data, possibly resolving any unresolved addresses in the code (jumps and such), and so on.
Yes, it is possible to write to memory (on an embedded system) during run-time.
Many bootloaders copy data from a read-only memory (e.g. Flash), into writeable memory (SRAM) then transfer execution to that address.
I've worked on other systems that can download a program from a port (USB, SD Card) into writeable memory then transfer execution to that location.
I've written functions that download data from a serial port and programmed it into a Flash Memory (and EEPROM) device.
For memory-to-memory copies, either use memcpy or write your own copy loop, using pointers that are assigned the physical addresses.
For copying data from a port to memory, figure out how to get data from the device (such as a UART), then copy the data from its register into your desired location via pointers.
Example:
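#include <stdint.h>
#define UART_RECEIVE_REGISTER_ADDR (0x2000)
//...
// The cast keeps the volatile qualifier so every read really hits the device.
volatile uint8_t * p_uart_receive_reg = (volatile uint8_t *) UART_RECEIVE_REGISTER_ADDR;
// my_memory_location is a uint8_t * that points at the destination in RAM.
*my_memory_location = *p_uart_receive_reg; // Read device and put into memory.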
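Extending that idea to a whole block of bytes might look like the sketch below. It assumes a device with a status register whose ready bit is set whenever a byte is available; that status register, the mask, and the name uart_receive are all hypothetical, so check the data sheet for how your UART really signals received data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical polling receive: wait for the ready bit in the status
 * register, then copy one byte from the data register into memory. */
void uart_receive(volatile const uint8_t *status_reg, uint8_t ready_mask,
                  volatile const uint8_t *data_reg,
                  uint8_t *dest, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        while ((*status_reg & ready_mask) == 0) {
            /* busy-wait until the device reports a byte is available */
        }
        dest[i] = *data_reg; /* each read fetches the next received byte */
    }
}
```

Passing the register addresses as parameters keeps the function testable on a host, where ordinary variables can stand in for the device registers.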
Also, search Stack Overflow for "embedded C write to memory"

Resources