how does the program gets loaded in memory when you have several memory region? - c

Let's say I have SOC that have different processors (embedded SOC) and different memory regions. let's say it has internal ram for 3 different processors it has and an external ddr ram, let's call them sram1,sram2,sram3, and ddr1.
let's say I write some code and write a linker that uses all 4 different memories (and perhaps few sections per memory). I think what it means at this stage is that linker assumes your code is running at those addresses and resolves addresses based on them. But in order for this code to actually run, it has to be put in the correct addresses and then run. and I'm confused about this part, who makes sure that the code is copied into the right memory addresses before it's run ? is it the initial software that is running ?
In my case, the initial code (bootloader) is loaded by ROM. the ROM looks for an address in a specidfied format (specific to my processors) and copies the code from boot media (sd card) there. which is all the code in one region (let's say sram1). this makes sense, few hundred KBs of bootloader code gets copied in one of the regions which matches the linker file. What confuses me is after this : let's say my bootloader wants to load another app, and as I mentioned, that app uses all 4 different memory regions. is it the duty of bootloader to make sure the code is copied from sd card to the correct memory regions before passing control to it ? I think it should be, but then the bootloader should completely know how you have linked your app with the linker script, and I think this doesn't make sense. Can you enlighten me of how it works ?

Related

.text area of memory layout

My Question is related to the memory layout in embedded system
I learned that when we flash(or burn) a executable file it sits either in ROM or FLASH depending on the hardware we use.
But i also learned from the memory layout of a c-program that program segment contains the .text area ( i.e compiled code)
My question is :
1) Is it the same code what we burn in flash/ROM sits in RAM(as depicted in .text area of program code)
2) Two copies are created one in flash/ROM and RAM ??
1) Depending on your hardware, the .text (and other segments) may be accessible directly in flash/ROM, or (in the case of serial flash), it may need to be copied into RAM to be executable.
2) The version in flash/ROM is the only version, UNTIL execution starts. Then (depending on the answers in 1), some start up code MAY copy the ROM into RAM to execute it, OR it may execute directly from the flash/ROM. Once executing, the C start up code MAY copy (or initialise) some of the non-code segments into RAM (e.g. .data, .bss etc).
Older, slower processors, may execute from ROM (think 8086/6502 era), whereas more modern processors (Pentium+ era, FPGA etc) would run incredibly slowly if executing from flash, so they will copy the executable into RAM (and even then, the currently executing code will be cached into the processor itself, so there may be a 3rd copy of the code).

Is there a case when const data should be load in RAM rather then direct flash access

I've used few types microcontrollers.
When I write code like this:
const int var = 5;
usually var is kept in flash. I understand that const variables are not always kept only in flash. Sometimes (depending compiler, processor, options like pic etc.) they are loaded from flash to RAM before main. Is there a case, when it is better to load var into RAM?
Many microcontrollers have a unified address space, but some (such as AVRs) do not. When there are multiple address spaces, then you may need to use different instructions to access data in those spaces. (For AVR, the LPM (Load Program Memory) instruction must be used to access data in Flash, instead of one of the ordinary LD (Load) instructions.)
If you imagine a pointer to your const variable, a function receiving the pointer would not be able to access the data unless it knows which address space the pointer points into. In a case like that, you either have to copy the data to RAM, even though it's const, or the user of the pointer has to know which instruction to use to access the data.
Assuming a Microcontroller architecture like ARM Cortex or Microchip MIPS (and many others), RAM and Flash are mapped to different parts of the internal address space, like a huge array. So the Assembly commands reading from RAM are the same like reading from Flash. No difference here.
Access times of RAM and Flash shouldn't be too different, so no waiting needed on any of the controllers I've worked with.
The only case I can imagine where storing const vars in flash could cause problems is in some sort of bootloader app, when the flash is written. Of course, writing to a flash range where you are executing from is a bad idea and will cause much heavier problems than overwritten const values.

Virtual/Logical Memory and Program relocation

Virtual memory along with logical memory helps to make sure programs do not corrupt each others data.
Program relocation does an almost similar thing of making sure that multiple programs does not corrupt each other.Relocation modifies object program so that it can be loaded at a new, alternate address.
How are virtual memory, logical memory and program relocation related ? Are they similar ?
If they are same/similar, then why do we need program relocation ?
Relocatable programs, or said another way position-independent code, is traditionally used in two circumstances:
systems without virtual memory (or too basic virtual memory, e.g. classic MacOS), for any code
for dynamic libraries, even on systems with virtual memory, given that a dynamic library could find itself lodaded on an address that is not its preferred one if other code is already at that space in the address space of the host program.
However, today even main executable programs on systems with virtual memory tend to be position-independent (e.g. the PIE* build flag on Mac OS X) so that they can be loaded at a randomized address to protect against exploits, e.g. those using ROP**.
* Position Independent Executable
** Return-Oriented Programming
Virtual memory does not prevent programs from interfering with out other. It is logical memory that does so. Unfortunately, it is common for the two concepts to be conflated to "virtual memory."
There are two types of relocation and it is not clear which you are referring to. However, they are connected. On the other hand, the concept is not really related to virtual memory.
The first concept of relocatable code. This is critical for shared libraries that usually have to be mapped to different addresses.
Relocatable code uses offsets rather than absolute addresses. When a program results in an instruction sequence something like:
JMP SOMELABEL
. . .
SOMELABEL:
The computer or assembler encodes this as
JUMP the-number-of-bytes-to-SOMELABEL
rather than
JUMP to-the-address-of-somelabel.
By using offsets the code works the same way no matter where the JMP instruction is located.
The second type of relocation uses the first. In the past relocation was mostly used for libraries. Now, some OS's will load program segments at different places in memory. That is intended for security. It is designed to keep malicious cracks that depend upon the application being loaded at a specific address.
Both of these concepts work with or without virtual memory.
Note that generally the program is not modified to relocated it. I generally, because an executable file will usually have some addresses that need to be fixed up at run time.

How does OS execute binary files in virtual memory?

For example in my program I called a function foo(). The compiler and assembler would eventually write jmp someaddr in the binary. I know the concept of virtual memory. The program would think that it has the whole memory at disposal, and the start position is 0x000. In this way the assembler can calculate the position of foo().
But in fact this is not decided until runtime right? I have to run the program to know where I loaded the program into, hence the address of the jmp. But when the program actually runs, how does the OS come in and change the address of the jmp? These are direct CPU instructions right?
This question can't be answered in general because it's totally hardware and OS dependent. However a typical answer is that the initially loaded program can be compiled as you say: Because the VM hardware gives each program its own address space, all addresses can be fixed when the program is linked. No recalculation of addresses at load time is needed.
Things get much more interesting with dynamically loaded libraries because two used by the same initially loaded program might be compiled with the same base address, so their address spaces overlap.
One approach to this problem is to require Position Independent Code in DLLs. In such code all addresses are relative to the code itself. Jumps are usually relative to the PC (though a code segment register can also be used). Data are also relative to some data segment or base register. To choose the runtime location, the PIC code itself needs no change. Only the segment or base register(s) need(s) be set whenever in the prelude of every DLL routine.
PIC tends to be a bit slower than position dependent code because there's additional address arithmetic and the PC and/or base registers can bottleneck the processor's instruction pipeline.
So the other approach is for the loader to rebase the DLL code when necessary to eliminate address space overlaps. For this the DLL must include a table of all the absolute addresses in the code. The loader computes an offset between the assumed code and data base addresses and actual, then traverses the table, adding the offset to each absolute address as the program is copied into VM.
DLLs also have a table of entry points so that the calling program knows where the library procedures start. These must be adjusted as well.
Rebasing is not great for performance either. It slows down loading. Moreover, it defeats sharing of DLL code. You need at least one copy per rebase offset.
For these reasons, DLLs that are part of Windows are deliberately compiled with non-overlapping VM address spaces. This speeds loading and allows sharing. If you ever notice that a 3rd party DLL crunches the disk and loads slowly, while MS DLLs like the C runtime library load quickly, you are seeing the effects of rebasing in Windows.
You can infer more about this topic by reading about object file formats. Here is one example.
Position-independent code is code that you can run from any address. If you have a jmp instruction in position-independent code, it will often be a relative jump, which jumps to an offset from the current location. When you copy the code, it won't change the offsets between parts of the code so it will still work.
Relocatable code is code that you can run from any address, but you might have to modify the code first (maybe you can't just copy it). The code will contain a relocation table which tells how it needs to be modified.
Non-relocatable code is code that must be loaded at a certain address or it will not work.
Each program is different, it depends on how the program was written, or the compiler settings, or other various factors.
Shared libraries are usually compiled as position-independent code, which allows the same library to be loaded at different locations in different processes, without having to load multiple copies into memory. The same copy can be shared between processes, even though it is at a different address in each process.
Executables are often non-relocatable, but they can be position-independent. Virtual memory allows each program to have the entire address space (minus some overhead) to itself, so each executable can choose the address at which it's loaded without worrying about collisions with other executables. Some executables are position-independent, which can be used to increase security (ASLR).
Object files and static libraries are usually relocatable code. The linker will relocate them when combining them to create a shared library, executable, or other image.
Boot loaders and operating system kernels are almost always non-relocatable.
Yes, it is at runtime. The operating system, the part managing starting and switching tasks is ideally at a different protection level, it has more power. It knows what memory is in use and allocates some for the new task. It configures the mmu so that the new task has a virtual address space starting at zero or whatever the rule is for that operating system and processor. How you get into user mode at that starting address, is very processor specific.
One method for example is the hardware might save some state not just address but mode or virtual id or something when an interrupt occurs, lets say on the stack. And the return from interrupt instruction as defined by that processor takes the address, and state/mode, off of the stack and switches there (causing lets assume the mmu to react to its next fetch based on the new mode not the old). For a processor that works like that then you may be able to fake an interrupt return by placing the right items on the stack such that when you kick the interrupt return instruction it basically does a jump with additional features of mode switching, etc.
The ARM family for example (not cortex-m) has a processor state register for what you are running now (in the case of an interrupt or service call) and a second state register for where you came from, the state that was interrupted, when you do the proper return you give it the address and it switches back to that mode using the other register. You can directly access that register from the non-users modes so you can manipulate the state of the return. There is no return instruction in arm, just flavors of jump (modifications to the program counter), so it is a special kind of jump.
The short answer is that it is very specific to the processor as to what your choices are for jumping to the first time or returning to after a task switch to a running task in an application mode in a virtual address space. Either directly or indirectly the processor documentation will describe these modes and how you change them. If not explicitly described then you have to figure out on your own from the instructions and the mmu protections and such how to switch tasks.

How to use external memory on a microcontroller

In the past, I've worked a lot with 8 bit AVR's and MSP430's where both the RAM and flash were stored on the chip directly. When you compile and download your program, it sort of "just works" and you don't need to worry about where and how variables are actually stored.
Now I'm starting a project where I'd like to be able to add some external memory to a microcontroller (a TI Stellaris LM3S9D92 if that matters) but I'm not entirely sure how you get your code to use the external RAM. I can see how you configure the external bus pretty much like any other peripheral but what confuses me is how the processor keeps track of when to talk to the external memory and when to talk to the internal one.
From what I can tell, the external RAM is mapped to the same address space as the internal SRAM (internal starts at 0x20000000 and external starts at 0x60000000). Does that mean if I wrote something like this:
int* x= 0x20000000;
int* y= 0x60000000;
Would x and y would point to the first 4 bytes (assuming 32 bit ints) of internal and external RAM respectively? If so, what if I did something like this:
int x[999999999999]; //some super big array that uses all the internal ram
int y[999999999999]; //this would have to be in external ram or it wouldn't fit
I imagine that I'd need to tell something about the boundaries of where each type of memory is or do I have it all wrong and the hardware figures it out on its own? Do linker scripts deal with this? I know they have something to do with memory mapping but I don't know what exactly. After reading about how to set up an ARM cross compiler I get the feeling that something like winavr (avr-gcc) was doing a lot of stuff like this for me behind the scenes so I wouldn't have to deal with it.
Sorry for rambling a bit but I'd really appreciate it if someone could tell me if I'm on the right track with this stuff.
Update
For any future readers I found this after another few hours of googling http://www.bravegnu.org/gnu-eprog/index.html. Combined with answers here it helped me a lot.
Generally that is exactly how it works. You have to properly setup the hardware and/or the hardware may already have things hardcoded at fixed addresses.
You could ask the same question, how does the hardware know that when I write a byte to address 0x21000010 (I just made that up) that that is the uart transmit holding register and that write means I want to send a byte out the uart? The answer because it is hardcoded in the logic that way. Or the logic might have an offset, the uart might be able to move it might be at some other control register contents plus 0x10. change that control register (which itself has some hardcoded address) from 0x21000000, to 0x90000000 and then write to 0x90000010 and another byte goes out the uart.
I would have to look at that particular part, but if it does support external memory, then in theory that is all you have to do know what addresses in the processors address space are mapped to that external memory and reads and writes will cause external memory accesses.
Intel based computers, PC's, tend to like one big flat address space, use the lspci command on your Linux box (if you have one) or some other command if windows or a mac, and you will find that your video card has been given a chunk of address space. If you get through the protection of the cpu/operating system and were to write to an address in that space it will go right out the processor through the pcie controllers and into the video card, either causing havoc or maybe just changing the color of a pixel. You have already dealt with this with your avr and msp430s. Some addresses in the address space are flash, and some are ram, there is some logic outside the cpu core that looks at the cpu cores address bus and makes decisions on where to send that access. So far that flash bank and ram bank and logic are all self contained within the boundaries of the chip, this is not too far of a stretch beyond that the logic responds to an address, and from that creates an external memory cycle, when it is done or the result comes back on a read it completes the internal memory cycle and you go on to the next thing.
Does that make any sense or am I making it worse?
You can use the reserved word register to suggest to the compiler that it put that variable into an internal memory location:
register int iInside;
Use caution; the compiler knows how many bytes of register storage are available, and when all available space is gone it won't matter.
Use register variables only for things that are going to be used very, very frequently, such as counters.

Resources