Remapping interrupt vectors and boot block - ARM

I am not able to understand the concept of remapping interrupt vectors or the boot block. What is the use of remapping the vector table? How does it work with and without remapping? Any links to good articles on this? I googled for this, but couldn't find a good answer. What is the advantage of mapping RAM to 0x0000 and mapping whatever exists at 0x0000 elsewhere? Is it that execution is faster from 0x0000?

It's a simple matter of practicality. The reset vector is at 0x0*, and when the system first powers up the core is going to start fetching instructions from there. Thus you have to have some code available there immediately from powerup - it's got to be some kind of ROM, since RAM would be uninitialised at this point. Now, once you've got through the initial boot process and started your application proper, you have a problem - your exception vectors, and the code to handle them, are in ROM! What if you want to install a different interrupt handler? What if you want to switch the reset vector for a warm-reset handler? By having the vector area remappable, the application is free to switch out the ROM boot firmware for the RAM area in which it's installed its own vectors and handler code.
Of course, this may not always be necessary - e.g. for a microcontroller running a single dedicated application which handles powerup itself - but as soon as you get into the more complex realm of separate bootloaders and application code it becomes more important. Performance is also a theoretical concern, at least - if you have slow flash but fast RAM you might benefit from copying your vectors and interrupt handlers into that RAM - but I think that's far less of an issue on modern micros.
Furthermore, if an application wants to be able to update the boot flash at runtime, then it absolutely needs a way of putting the vectors and handlers elsewhere. Otherwise, if an interrupt fires whilst the flash block is in programming mode, the device will lock up in a recursive hard fault due to not being able to read from the vectors, never finish the programming operation and brick itself.
Whilst most types of ARM core have some means to change their own vector base address, some (like Cortex-M0), not to mention plenty of non-ARM cores, do not, which necessitates this kind of non-architecture-specific system-level remapping functionality to achieve the same result. In the case of microcontrollers built around older cores like ARM7TDMI, it's also quite likely for there to be no RAM behind the fixed alternative "high vectors" address (more suited for use with an MMU), rendering that option useless.
* Yeah, OK, 0x4 if we're talking Cortex-M, but you know what I mean... ;)
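As a concrete illustration, here is a minimal sketch of that switch-over on an NXP LPC2xxx-style ARM7TDMI part, where a system-level MEMMAP register selects what is mapped at address 0x0. The linker symbols are assumptions; check your own part's user manual for the register details:

#include <stdint.h>
#include <string.h>

#define MEMMAP    (*(volatile uint32_t *)0xE01FC040) /* system memory map control (LPC2xxx) */
#define SRAM_BASE ((uint8_t *)0x40000000)            /* on-chip SRAM origin */

extern const uint8_t __vectors_start[], __vectors_end[]; /* assumed linker-provided symbols */

void remap_vectors_to_ram(void)
{
    /* Copy the exception vectors (plus the literal pool holding the
       handler addresses) from flash into the bottom of SRAM. */
    memcpy(SRAM_BASE, __vectors_start, (size_t)(__vectors_end - __vectors_start));

    /* Flip the remap: from now on, fetches from 0x00-0x3F are served by
       SRAM, so the application owns (and can rewrite) its own vectors. */
    MEMMAP = 0x2; /* "user RAM mode" on LPC2xxx parts */
}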

Related

Using OpenOCD to determine RAM usage in microcontroller (ARM Cortex-M3)

I'd like to see how much RAM is used by the firmware by writing a known pattern, and comparing RAM contents to see how much has been modified.
I've tried
reset halt
load_image pattern.bin 0xaddress
resume
(let target run for a bit)
halt
dump_image sram.bin 0xaddress 0xsize
but it appears I have obtained flash contents and cannot see the test pattern anywhere.
Am I using the proper commands? If I "verify" manually by loading and dumping, the data is identical.
Could halt affect the RAM contents? Otherwise, is it safe to assume that the application in fact initializes all of the RAM, making analysis difficult/impossible?
I should point out that I only have a "dump" of the firmware, i.e. I am not building it.
I had to do soft_reset_halt to get the PC to the reset vector address.
My version of OpenOCD warns me that the command is deprecated.
Then I was able to spot a few occurrences of my test pattern in the RAM dump.
Also, there are notable differences between the RAM image and the firmware, so it seems that the firmware is indeed using most of the RAM.
(this might not be an issue if your interface is using a physical reset line?)
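For anyone following along, the sequence that appeared to work was along these lines - a best-guess reconstruction from the notes above, with the placeholders left elided as in the question:

halt
soft_reset_halt                       (deprecated, but forces the PC back to the reset vector)
load_image pattern.bin 0xaddress
resume
(let the target run for a bit)
halt
dump_image sram.bin 0xaddress 0xsize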

Vector table relocation in bootloader application

I've written a bootloader application for an NXP Kinetis microcontroller. Here is what I did:
1. Created a bootloader application in flash at addresses 0x0000 to 0x8000
2. Created my main application code at addresses 0x8000 to 0x1FFFF
This code is working fine. Now my doubt is: I have ISRs placed in both the bootloader and the main application code, and I didn't use any ISR vector relocation. Is it necessary to relocate the vector table in the main application?
PS: I may not be facing the issue just because the ISRs in both apps are the same.
On most modern MCUs vector table relocation is not required, as the vector table base address can be specified as a parameter when building the application.
If your target doesn't have such a feature and the vector table is in the bootloader area (0x0000 to 0x8000), then you will need to relocate the vector table for the application, so that an interrupt occurring in the application results in a jump to the correct handler.
Although I don't know the specifics of a Kinetis microcontroller, the following is based on general behavior of other Freescale/NXP controllers.
A bootloader is meant to allow you to update your firmware. (Otherwise, you don't need one.) And a bootloader has to be kept in protected memory to prevent accidental erasure. By protecting the bootloader you also protect the vectors, so you can't update the vectors anymore.
Unless you go to extremes to guarantee that each firmware update has its ISR code start at exactly the same addresses as in the previous version(s), you'd rather be able to let ISRs move freely in the address space. That's where vector relocation or redirection comes into play.
Currently, you have both bootloader and app use the same addresses in both sets of vectors, and everything works fine.
As soon as you update your firmware to another version, where the ISR entry points have most likely moved, your code will stop working, because the MCU/bootloader will be sending the ISR events to the wrong addresses.
If you enable/implement vector relocation/redirection, the original bootloader vectors will effectively be ignored, and the relocated vectors will be used. Since these are updated along with your application, no problem.
There are two methods for vector relocation. One is hardware-based (with the advantage of no ISR call overhead) and the other is software-based (some minimal overhead, but it can be implemented even on microcontrollers that have no hardware vector redirection available).
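To illustrate the hardware method on a Cortex-M-class Kinetis part, the relocation itself is a single register write during the bootloader-to-application handoff. This is a minimal sketch, assuming a Cortex-M3/M4-style core (Cortex-M0 lacks VTOR); the function and macro names are mine, not from any vendor SDK:

#include <stdint.h>

#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08) /* Vector Table Offset Register */
#define APP_BASE 0x8000u                            /* application base, as in the question */

void jump_to_application(void)
{
    /* Interrupts should already be disabled here. */
    uint32_t app_sp    = ((volatile uint32_t *)APP_BASE)[0]; /* word 0: initial stack pointer */
    uint32_t app_reset = ((volatile uint32_t *)APP_BASE)[1]; /* word 1: reset handler address */

    SCB_VTOR = APP_BASE; /* hardware relocation: vectors are now fetched from 0x8000 */

    __asm volatile ("msr msp, %0" :: "r" (app_sp)); /* restore the app's initial stack */
    ((void (*)(void))app_reset)();                  /* enter the application */
}

The software method, for parts without such a register, typically keeps a table of handler addresses in RAM and has the fixed flash vectors jump through it - that indirection is the "minimal overhead" mentioned above.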

Real-life use cases of barriers (DSB, DMB, ISB) in ARM

I understand that DSB, DMB, and ISB are barriers that prevent reordering of instructions.
I can also find lots of very good explanations for each of them, but it is pretty hard to imagine the cases where I would have to use them.
Also, in open source code, I see those barriers from time to time, but it is quite hard to understand why they are used. Just as an example, in the Linux kernel 3.7 tcp_rcv_synsent_state_process function, there is the following:
if (unlikely(po->origdev))
        sll->sll_ifindex = orig_dev->ifindex;
else
        sll->sll_ifindex = dev->ifindex;

smp_mb();

if (po->tp_version <= TPACKET_V2)
        __packet_set_status(po, h.raw, status);
where smp_mb() is basically DMB.
Could you give me some of your real-life examples? It would help me understand barriers better.
Sorry, not going to give you a straight-out example like you're asking, because as you are already looking through the Linux source code, you have plenty of those to go around, and they don't appear to help. No shame in that - every sane person is at least initially confused by memory access ordering issues :)
If you are mainly an application developer, then there is every chance you won't need to worry too much about it - whatever concurrency frameworks you use will resolve it for you.
If you are mainly a device driver developer, then examples are fairly straightforward to find - whenever there is a dependency in your code on a previous access having had an effect (cleared an interrupt source, written a DMA descriptor) before some other access is performed (re-enabling interrupts, initiating the DMA transaction).
If you are in the process of developing a concurrency framework (or debugging one), you probably need to read up on the topic a bit more - but your question suggests a superficial curiosity rather than an immediate need?
If you are developing your own method for passing data between threads, not based on primitives provided by a concurrency framework, that is for all intents and purposes a concurrency framework.
Paul McKenney wrote an excellent paper on the need for memory barriers, and what effects they actually have in the processor: Memory Barriers: a Hardware View for Software Hackers
If that's a bit too hardcore, I wrote a 3-part blog series that's a bit more lightweight, and finishes off with an ARM-specific view. First part is Memory access ordering - an introduction.
But if it is specifically lists of examples you are after, especially for the ARM architecture, you could do a lot worse than Barrier Litmus Tests and Cookbook.
The extra-extra light programmer's view and not entirely architecturally correct version is:
DMB - whenever a memory access requires ordering with regards to another memory access.
DSB - whenever a memory access needs to have completed before program execution progresses.
ISB - whenever instruction fetches need to explicitly take place after a certain point in the program, for example after memory map updates or after writing code to be executed. (In practice, this means "throw away any prefetched instructions at this point".)
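To make those three bullet points concrete, here is a hedged sketch of typical placements, using GCC inline assembly on ARMv7; the scenarios and names are illustrative, not taken from any particular codebase:

#include <stdint.h>

static inline void dmb(void) { __asm volatile ("dmb" ::: "memory"); }
static inline void dsb(void) { __asm volatile ("dsb" ::: "memory"); }
static inline void isb(void) { __asm volatile ("isb" ::: "memory"); }

/* DMB: order one memory access against another - make the payload
   visible before the flag that publishes it. */
void publish(volatile uint32_t *data, volatile uint32_t *flag, uint32_t v)
{
    *data = v;
    dmb();      /* any observer that sees *flag == 1 also sees *data */
    *flag = 1;
}

/* DSB + ISB: after writing instructions to memory (e.g. a code upload),
   wait for the writes to complete, then refetch the instruction stream.
   Real code would also need cache maintenance; elided here. */
void finish_code_update(void)
{
    dsb();      /* the new instructions have actually reached memory */
    isb();      /* discard anything prefetched before this point */
}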
Usually you need to use a memory barrier in cases where you have to make SURE that memory accesses occur in a specific order. This might be required for a number of reasons; usually it's required when two or more processes/threads or a hardware component access the same memory structure, which has to be kept consistent.
It's used very often in DMA transfers. A simple DMA control structure might look like this:
struct dma_control {
        u32   owner;
        void *data;
        u32   len;
};
The owner will usually be set to something like OWNER_CPU or OWNER_HARDWARE, to indicate which of the two participants is allowed to work with the structure.
Code which changes this will usually look like this:
dma->data = data;
dma->len = length;
smp_mb();
dma->owner = OWNER_HARDWARE;
So, data and len are always set before the ownership gets transferred to the DMA hardware. Otherwise the engine might see stale data, like a pointer or length which was not updated, because the CPU reordered the memory accesses.
The same goes for processes or threads running on different cores. They could communicate in a similar manner.
One simple example of a barrier requirement is a spinlock. If you implement a spinlock using compare-and-swap (or LDREX/STREX on ARM) and without a barrier, the processor is allowed to speculatively load values from memory and lazily store computed values to memory, and neither of those is required to happen in the order of the loads/stores in the instruction stream.
The DMB in particular prevents memory accesses from being reordered across it. Without DMB, the processor could reorder a store to memory protected by the spinlock until after the spinlock is released. Or the processor could read memory protected by the spinlock before it was actually locked, or while it was locked by a different context.
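Here is a hedged ARMv7 sketch of that pattern, showing where the barriers sit relative to LDREX/STREX; it mirrors the idea behind the Linux ARM spinlock but is simplified (no WFE/SEV, no fairness):

#include <stdint.h>

typedef struct { volatile uint32_t v; } spinlock_t;

static inline void spin_lock(spinlock_t *l)
{
    uint32_t tmp, one = 1;
    __asm volatile (
        "1: ldrex  %0, [%2]      \n\t"  /* exclusive read of the lock word */
        "   cmp    %0, #0        \n\t"
        "   bne    1b            \n\t"  /* already held: spin */
        "   strex  %0, %1, [%2]  \n\t"  /* try to claim it */
        "   cmp    %0, #0        \n\t"
        "   bne    1b            \n\t"  /* lost the exclusive: retry */
        : "=&r" (tmp)
        : "r" (one), "r" (&l->v)
        : "cc", "memory");
    __asm volatile ("dmb" ::: "memory"); /* no critical-section access may be
                                            observed before the lock is taken */
}

static inline void spin_unlock(spinlock_t *l)
{
    __asm volatile ("dmb" ::: "memory"); /* all critical-section accesses
                                            complete before the release */
    l->v = 0;
}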
unixsmurf already pointed it out, but I'll also point you toward Barrier Litmus Tests and Cookbook. It has some pretty good examples of where and why you should use barriers.

Significance of Reset Vector in Modern Processors

I am trying to understand, in great detail, how a computer boots up.
I came across two things which made me more curious:
1. RAM is placed at the bottom of ROM, to avoid memory holes, as in the Z80 processor.
2. A reset vector is used, which takes the processor to a memory location in ROM whose contents point to the actual location (again in ROM) from where the processor will actually start executing instructions (the POST instructions). Why so?
If you still can't understand me, this link explains it briefly:
http://lateblt.tripod.com/bit68.txt
The processor logic is generally rigid and fixed, thus the term hardware. Software is something that can be changed, molded, etc. thus the term software.
The hardware needs to start somehow; there are two basic methods:
1) an address, hardcoded in the logic, in the processor's memory space is read, and that value is an address at which to start executing code
2) an address, hardcoded in the logic, is where the processor starts executing code
When the processor itself is integrated with other hardware, anything can be mapped into any address space. You can put RAM at address 0x1000 or 0x40000000 or both. You can map a peripheral to 0x1000 or 0x4000 or 0xF0000000 or all of the above. It is the choice of the system designers, or a combination of the teams of engineers, where things will go. One important factor is how the system will boot once reset is released. The booting of the processor is well known due to its architecture. The designers often choose one of two paths:
1) put a ROM in the memory space that contains the reset vector or the entry point, depending on the boot method of the processor (no matter what the architecture, there is a first address or first block of addresses that is read, and its contents drive the booting of the processor). The software places code or a vector table or both in this ROM so that the processor will boot and run (a concrete sketch follows below).
2) put RAM in the memory space, in such a way that some host can download a program into that RAM, then release reset on the processor. The processor then follows its hardcoded boot procedure and the software is executed.
The first one is most common; the second is found in some peripherals - mice and network cards and things like that (some of the firmware in /usr/lib/firmware/ is used for this, for example).
The bottom line, though, is that the processor is usually designed with one boot method, a fixed method, so that all software written for that processor can conform to that one method and not have to keep changing. Also, the processor when designed doesn't know its target application, so it needs a generic solution. The target application often defines the memory map - what is where in the processor's memory space - and one of the tasks in that assignment is deciding how the product will boot. From there the software is compiled and placed such that it conforms to the processor's rules and the product's hardware rules.
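To make option 1 concrete for one common case: on a Cortex-M part, the "first block of addresses" is literally a table of words at the boot address, and placing it there is a linker job. A minimal hedged sketch in C; the section name and symbols are GNU-toolchain conventions, not universal:

#include <stdint.h>

extern uint32_t _estack;   /* top of stack, defined by the linker script (assumed name) */
void Reset_Handler(void);  /* first code to run after reset */

/* The linker script must KEEP this section and place it at the boot address. */
__attribute__((section(".isr_vector"), used))
const void * const vector_table[] = {
    &_estack,               /* word 0 (address 0x0): initial stack pointer */
    (void *)Reset_Handler,  /* word 1 (address 0x4): reset vector */
};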
It varies completely by architecture. There are a few reasons why cores might want to do this, though. Embedded cores (think along the lines of ARM and MicroBlaze) tend to be used within system-on-chip machines with a single address space. Such architectures can have multiple memories all over the place and tend to dictate only that the bottom area of memory (i.e. 0x00) contains the interrupt vectors. This then allows the programmer to easily specify where to boot from. On MicroBlaze, you can attach memory wherever the hell you like in XPS.
In addition, it can be used to easily support bootloaders. These are typically small programs that do a bit of initialization, then fetch a larger program from a medium that can't be accessed simply (e.g. USB or Ethernet). In these cases, the bootloader typically copies itself to high memory, fetches the larger program into the memory below itself, and then jumps there. The reset vector simply allows the programmer to bypass the first step.

Flow of Startup code in an embedded system , concept of boot loader?

I am working with an embedded board, but I don't know the flow of its startup code (C/assembly).
Can we discuss the general modules/steps performed by the startup code in an embedded system?
Just a high-level (algorithmic) overview is enough. All examples are welcome.
/Kanu__
The CPU gets a power-on reset and jumps to a defined point: the reset vector, the beginning of flash, ROM, etc.
The startup code (the CRT - C runtime) is run. This is an important piece of code generated by your compiler/libc, which performs the following steps (a minimal sketch follows the list):
Configure and turn on any external memory (if absolutely required; otherwise this is left for later user code).
Establish a stack pointer
Clear the .bss segment (usually). .bss is the name for the uninitialized (or zeroed) global memory region. Global variables, arrays, etc which don't have an initializing value (beyond 0) are located here. The general practice on a microcontroller is to loop over this region and set all bytes to 0 at startup.
Copy the non-const .data from the end of .text. As most microcontrollers run from flash, they cannot store variable data there. For statements such as int thisGlobal = 5;, the value of thisGlobal must be copied from a persistent area (usually placed after the program in flash by your linker) to RAM. This applies to static values too, including statics in functions. Values which are left undefined are not copied, but instead cleared as part of the .bss step above.
Perform other static initializers.
Call main()
From here, your code is run. Generally, the CPU is left in an interrupts-off state (platform dependent).
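Here is that sequence as a minimal sketch for a flash-based Cortex-M-style part, assuming common GNU linker symbol names (_sidata, _sdata, _edata, _sbss, _ebss); your toolchain's names may differ:

#include <stdint.h>

extern uint32_t _sidata;        /* load address of .data, in flash */
extern uint32_t _sdata, _edata; /* .data bounds in RAM */
extern uint32_t _sbss, _ebss;   /* .bss bounds in RAM */
extern int main(void);

void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;

    while (dst < &_edata)       /* copy initialized globals into RAM */
        *dst++ = *src++;

    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;             /* zero .bss */

    main();                     /* on Cortex-M the stack pointer was already
                                   loaded from word 0 of the vector table */

    for (;;) ;                  /* trap here if main ever returns */
}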
Pretty open-ended question, but here are a few things I have picked up.
For super simple processors, there is no true startup code. The CPU gets power and then starts running the first instruction in its memory: no muss, no fuss.
A little further up we have MCUs like AVRs and PICs. These have very little startup code. The only thing that really needs to be done is to set up the interrupt jump table with appropriate addresses. After that, it is up to the application code (the only program) to do its thing. The good news is that you as the developer generally don't have to worry about these things: that's what libc is for.
After that we have things like simple ARM-based chips; more complicated than the AVRs and PICs, but still pretty simple. These also have to set up the interrupt table, as well as make sure the clock is set correctly, and start any needed on-chip components (basic interrupts etc.). Have a look at this PDF from Atmel; it details the startup procedure for an ARM7 chip.
Farther up the food chain we have full-on PCs (x86, amd64, etc.). The startup code for these is really the BIOS, which is horrendously complicated.
The big question is whether or not your embedded system will be running an operating system. In general, you'll either want to start your operating system, start up some form of inversion of control (an example I remember from a school project was a telnet server that would listen for requests using RL-ARM or an open-source TCP/IP stack, and execute callbacks when connections were made or data was received), or enter your own control loop (maybe displaying a menu, then looping until a key has been pressed).
Functions of Startup Code for C/C++
Disables all interrupts
Copies any initialized data from ROM to RAM
Sets the uninitialized data area to zero
Allocates space for and initializes the stack
Initializes the processor’s stack pointer
Creates and initializes the heap
Executes the constructors and initializers for all global variables (C++ only)
Enables interrupts
Calls main
Where is "BOOT LOADER" placed then? It should be placed before the start-up code right?
As per my understanding, from the reset vector the control goes to the bootloader. There the code waits for a small period of time, during which it expects data to be flashed/downloaded to the controller/processor. If it does not detect any data, then control is transferred to the next step as specified by theatrus. But my doubt is whether the BOOT LOADER code can be rewritten. E.g., can a UART bootloader be changed to an Ethernet/CAN bootloader, or is it that data sent using any protocol is converted to UART using a gateway and then flashed?
