ARM Cortex-m4 boot sequence - arm

I am a bit confused about the boot sequence of ARM Cortex-M processors. From many different resources, I read that upon reset, the Cortex-M copies the contents of address 0x0 to the stack pointer and copies the reset handler address from 0x4 to the PC...
My questions are:
1) How does the Cortex-M processor copy these two values to the appropriate registers? I mean, a processor needs LDR/STR instructions to do so, but here the values are copied automatically. How does the processor know that these two words need to be copied?
2) Does a Cortex-M controller contain any built-in firmware that is executed initially?
3) Normally, processors start executing after reset from a specific memory location in the reset vector, where a jump instruction to the reset handler is placed... but here in Cortex-M the processor starts by copying the first two words into registers, and then the program counter points to the reset handler. There is no jump instruction and no specific memory location the processor jumps to on reset. How is that possible?

2) Does a Cortex-M controller contain any built-in firmware that is executed initially?
Depends highly on the model and make. Example: NXP LPC series Cortex-M chips (like LPC17xx) have some mask ROM code that is executed before the program in flash. Others may have no such memory built in.
1) How does the Cortex-M processor copy these two values to the appropriate registers? I mean, a processor needs LDR/STR instructions to do so
This happens in hardware before any code execution, so no LDR instructions are needed.
It's ridiculously simple if you know what a state machine is and how to implement one in a hardware description language like VHDL or Verilog. The reset logic is such a state machine: it reads the word at 0x0 into SP and the word at 0x4 into PC, and only then starts fetching instructions.
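As a rough illustration of what that reset state machine does, here is a small host-side C model (it runs on a PC, it is not target code). The names cpu_model and model_reset are made up for this sketch, and memory at address 0x0 is modeled as a plain array of words.

```c
#include <stdint.h>

/* Hypothetical model of the two registers that reset touches. */
struct cpu_model {
    uint32_t sp;  /* main stack pointer */
    uint32_t pc;  /* program counter    */
};

/*
 * Model of the Cortex-M reset sequence. `vectors` stands for the memory
 * visible at address 0x0. No instruction runs here: the reset logic does
 * two bus reads and latches the results into the registers.
 */
static void model_reset(struct cpu_model *cpu, const uint32_t *vectors)
{
    cpu->sp = vectors[0];                 /* word at 0x0: initial SP     */
    cpu->pc = vectors[1] & ~UINT32_C(1);  /* word at 0x4: reset handler, */
                                          /* bit 0 only marks Thumb code */
}
```

With an image whose first two words are 0x20008000 and 0x00000101, the model ends up with SP = 0x20008000 and PC = 0x00000100, which is where the first instruction fetch happens.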

Related

Are memory mapped registers separate registers on the bus?

I will use the TM4C123 Arm Microcontroller/Board as an example.
Most of its I/O registers are memory mapped, so you can get/set their values using
regular memory load/store instructions.
My question is: is there some type of register outside the CPU, somewhere on the bus, that is mapped into the address space and that you read/write through that address range, so that the value effectively exists twice (once in the register and once in memory)? Or is the memory the register itself?
There are many buses even in an MCU. Bus after bus after bus splitting off like branches in a tree. (sometimes even merging unlike a tree).
It may predate the intel/motorola battle but certainly in that time frame you had segmented vs flat addressing and you had I/O mapped I/O vs memory mapped I/O, since motorola (and others) did not have a separate I/O bus (well one extra...address...signal).
Look at the arm architecture documents and the chip documentation (arm makes IP not chips). You have load and store instructions that operate on addresses. The documentation for the chip (and to some extent ARM provides rules for the cortex-m address space) provides a long list of addresses for things. As a programmer you simply line up the address you do loads and stores with and the right instructions.
Someone's marketing may still carry on about terms like memory-mapped I/O because Intel x86 still exists (how????), so some folks will continue to use those terms. To a programmer, they are first of all just bits that go into registers, and for single instructions here and there those bits are addresses. If you want to add adjectives to that, go for it.
If the address you are using based on the chip and core documentation is pointing at an sram, then that is a read or write of memory. If it is a flash address, then that is the flash. The uart, the uart. timer 5, then timer 5 control and status registers. Etc...
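To make the "the address decides what you are talking to" point concrete, here is a minimal sketch that classifies an address by the top-level regions of the architectural Cortex-M memory map. The function name cortex_m_region is made up for this example, and which actual flash, SRAM, UART or timer sits at a given address still comes from the chip documentation, not from this table.

```c
#include <stdint.h>

/* Top-level regions of the architectural Cortex-M memory map. */
static const char *cortex_m_region(uint32_t addr)
{
    if (addr < 0x20000000u) return "code";            /* typically flash/ROM */
    if (addr < 0x40000000u) return "sram";
    if (addr < 0x60000000u) return "peripheral";      /* UARTs, timers, GPIO */
    if (addr < 0xA0000000u) return "external ram";
    if (addr < 0xE0000000u) return "external device";
    return "system";  /* NVIC, SysTick, debug, vendor-specific space */
}
```

A load from 0x20000100 is an SRAM read, while the same load instruction aimed at 0x40004000 goes to whatever peripheral the vendor placed there. There is no second copy of the register in RAM; the bus simply routes the access to a different block.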
There are addresses in these MCUs that point at two or three things, but not at the same time (address 0x00000000 and some number of kbytes after that). In many of these Cortex-M MCUs this overlap does not mix "memory" and peripherals (I/O); instead, these special address ranges select which place you boot the chip from and run some code. With these Cortex-Ms I do not think you can even use their sort-of MMU (the MPU) to mix these spaces. In full-sized ARMs and other architectures you definitely can use a full-blown MMU to mix up the address spaces and have a virtual address space that lands on a physical address space of a different type.

Vector table relocation in bootloader application

I've written a bootloader application for an NXP Kinetis microcontroller. These are the things I did:
1) Created a bootloader application in CFlash addresses 0x0000 to 0x8000
2) Created my main application code from addresses 0x8000 to 0x1FFFF
This code is working fine. Now my doubt is, I have ISRs placed in both bootloader as well as main application code and didn't use any ISR vector relocation. Is it necessary to relocate the vector tables in main application?
PS: I may not be facing the issue just because the ISRs in both the apps are same.
On most modern MCUs the vector table relocation is not required, as the vector table base address can be specified as a parameter when compiling an application.
If your target doesn't have such a feature and the vector table is in the bootloader area 0x0000 to 0x8000, then you will need to relocate the vector table for the application so that an interrupt occurring in the application results in jumping to the correct handler.
Although I don't know the specifics of a Kinetis microcontroller, the following is based on general behavior of other Freescale/NXP controllers.
A bootloader is meant to allow you to update your firmware. (Otherwise, you don't need one.) And, a bootloader has to be kept in protected memory to prevent accidental erasures. By protecting the bootloader you also protect the vectors. So, you can't update the vectors anymore.
Unless you go to extremes to guarantee each firmware update will have the ISR code start at the exact same address as in the previous version(s), you'd rather be able to have ISRs move freely in the address space. That's where vector relocation or redirection comes in to play.
Currently, you have both bootloader and app use the same addresses in both sets of vectors, and everything works fine.
As soon as you update your firmware to another version where the ISR entry points most likely have moved address, your code will stop working because the MCU/bootloader will be sending the ISR events to the wrong addresses.
If you enable/implement vector relocation/redirection, the original bootloader vectors will effectively be ignored, and the relocated vectors will be used. Since these are updated along with your application, no problem.
There are two methods for vector relocation. One is hardware based (has the advantage of no ISR call overhead) and the other is software based (some minimal overhead but can be implemented even in microcontrollers that have no hardware vector redirection available).
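The software-based method could look roughly like the sketch below: the bootloader's fixed ISR stubs do nothing except look up the real handler in a table that ships with the application. All names here (app_vectors, redirect_irq, app_timer_isr) are made up for illustration, and the table is an ordinary pointer so the sketch can run on a PC; on a real part it would sit at a fixed flash address such as the 0x8000 from the question.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*isr_t)(void);

/* Hypothetical: handler table that is updated together with the app. */
static const isr_t *app_vectors = NULL;

/* Example application handler, here just counting calls. */
static int timer_ticks = 0;
static void app_timer_isr(void) { timer_ticks++; }

/*
 * Called from the bootloader's fixed ISR stub for interrupt `irq`.
 * Returns 0 if an application handler ran, -1 if none is installed.
 */
static int redirect_irq(unsigned irq, size_t table_len)
{
    if (app_vectors == NULL || irq >= table_len || app_vectors[irq] == NULL)
        return -1;
    app_vectors[irq]();  /* the extra indirect call is the overhead */
    return 0;
}
```

The bootloader's own vectors never change; only the table behind app_vectors moves with each firmware update, which is why the ISR entry points are free to change address.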

ARM Cortex M3 Bootloader with User code at normal 0 address?

I have been researching ARM M3 bootloaders and most seem to work with the bootloader code sitting in low memory and the user code in higher memory. This requires the users application to be linked differently when used with the bootloader. That is some address above the bootloader code which is located at 0x00000000. I think this an inconvenience.
I would like to know if it is possible to have a bootloader which is located in high memory and yet still allow the users application to be linked normally so it starts at 0x00000000 in low memory ?
I have done this before with PIC bootloaders which also have reset & stack code at 0x0000 similar to ARM M3's NVIC table.
The way this has worked for me on the PIC is that the bootloader is located in the top 1k of the flash. The reset vector and SP code at 0x0000 point to the start address of the bootloader code. Thus the bootloader starts after reset.
Now when the bootloader is downloading the user code, it puts it into low memory where the linker thinks it should go, the only exception being that it does not allow overwriting of its own reset and SP vectors located at 0x0000. Instead it 'catches' these and stores them in flash within its high memory.
When the bootloader is ready for the users code to start it retrieves the initial reset and SP vectors it stored during the download and starts the user code using these.
At the next reset the bootloader will start and it can do a check to see if it should wait for a new user program or just start the users code using the vectors it has stored.
Could the above method be translated for use on the ARM M3 ?
I think the expression 'go with the flow' applies here. You are trying to do something that the ARM is not intended to do so I think you are making unnecessary problems for yourself. Putting the app above the bootloader just requires linking at a different address then setting the VTOR register to that address.
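A minimal sketch of that approach, written so it can be tried on a PC: the bootloader sanity-checks the first two words of the application's vector table and then writes the table address to VTOR. The function names and the address ranges in the usage note are assumptions for illustration; the VTOR address 0xE000ED08 and the 128-byte minimum table alignment are taken from the Cortex-M3 documentation (larger tables need larger alignment).

```c
#include <stdbool.h>
#include <stdint.h>

#define SCB_VTOR_ADDR 0xE000ED08u  /* Vector Table Offset Register */

/* Check that an application image looks bootable before jumping to it. */
static bool app_image_plausible(const uint32_t *vectors, uint32_t app_base,
                                uint32_t flash_end,
                                uint32_t ram_start, uint32_t ram_end)
{
    uint32_t sp    = vectors[0];  /* initial stack pointer  */
    uint32_t reset = vectors[1];  /* reset handler address  */

    if (app_base % 128u != 0)
        return false;             /* table not aligned for VTOR */
    if (sp < ram_start || sp > ram_end)
        return false;             /* SP may equal ram_end (stack grows down) */
    if ((reset & 1u) == 0)
        return false;             /* Thumb bit must be set */
    if (reset < app_base || reset >= flash_end)
        return false;             /* handler outside the application area */
    return true;
}

/* Point the core at the application's vector table. */
static void relocate_vectors(volatile uint32_t *vtor, uint32_t app_base)
{
    *vtor = app_base;
}
```

On the target, the call would be something like relocate_vectors((volatile uint32_t *)SCB_VTOR_ADDR, 0x8000u), followed by loading SP from the first table word and branching to the second. An erased flash region (all 0xFF) fails the check, so the bootloader can stay in download mode instead of jumping into nothing.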

Significance of Reset Vector in Modern Processors

I am trying to understand how computer boots up in very detail.
I came across two things which made me more curious,
1. RAM is placed at the bottom of ROM, to avoid Memory Holes as in Z80 processor.
2. Reset Vector is used, which takes the processor to a memory location in ROM, whose contents point to the actual location (again ROM) from where processor would actually start executing instructions (POST instruction). Why so?
If you still can't understand me, this link will explain you briefly,
http://lateblt.tripod.com/bit68.txt
The processor logic is generally rigid and fixed, thus the term hardware. Software is something that can be changed, molded, etc. thus the term software.
The hardware needs to start some how, two basic methods,
1) an address, hardcoded in the logic, in the processors memory space is read and that value is an address to start executing code
2) an address, hardcoded in the logic, is where the processor starts executing code
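The difference between the two methods is just one extra memory read, which a tiny host-side model can show. Everything here (the enum, the function name, the word-array memory) is made up for illustration and does not describe any particular processor.

```c
#include <stdint.h>

enum boot_method {
    BOOT_VECTOR,  /* method 1: fetch a start address from a fixed location */
    BOOT_ENTRY    /* method 2: start executing at a fixed location         */
};

/*
 * `mem` models memory as an array of 32-bit words; `fixed_addr` is the
 * byte address hardcoded in the processor logic.
 * Returns the address of the first instruction that gets executed.
 */
static uint32_t first_instruction_address(enum boot_method m,
                                          const uint32_t *mem,
                                          uint32_t fixed_addr)
{
    if (m == BOOT_VECTOR)
        return mem[fixed_addr / 4u];  /* read the vector, then go there  */
    return fixed_addr;                /* the contents are the code itself */
}
```

With method 1 the software can place its entry point anywhere and only has to store that address at the fixed location; with method 2 the first instruction (often a jump) has to live at the fixed location itself.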
When the processor itself is integrated with other hardware, anything can be mapped into any address space. You can put RAM at address 0x1000 or 0x40000000 or both. You can map a peripheral to 0x1000 or 0x4000 or 0xF0000000 or all of the above. It is the choice of the system designers, or a combination of the teams of engineers, where things will go. One important factor is how the system will boot once reset is released. The booting of the processor is well known due to its architecture. The designers often choose one of two paths:
1) put a rom in the memory space that contains the reset vector or the entry point depending on the boot method of the processor (no matter what architecture there is a first address or first block of addresses that are read and their contents drive the booting of the processor). The software places code or a vector table or both in this rom so that the processor will boot and run.
2) put ram in the memory space, in such a way that some host can download a program into that ram, then release reset on the processor. The processor then follows its hardcoded boot procedure and the software is executed.
The first one is most common, the second is found in some peripherals, mice and network cards and things like that (Some of the firmware in /usr/lib/firmware/ is used for this for example).
The bottom line though is that the processor is usually designed with one boot method, a fixed method, so that all software written for that processor can conform to that one method and not have to keep changing. Also, the processor when designed doesn't know its target application, so it needs a generic solution. The target application often defines the memory map (what is where in the processor's memory space), and one of the tasks in that assignment is deciding how that product will boot. From there the software is compiled and placed such that it conforms to the processor's rules and the product's hardware rules.
It completely varies by architecture. There are a few reasons why cores might want to do this though. Embedded cores (think along the lines of ARM and Microblaze) tend to be used within system-on-chip machines with a single address space. Such architectures can have multiple memories all over the place and tend to only dictate that the bottom area of memory (i.e. 0x00) contains the interrupt vectors. This then allows the programmer to easily specify where to boot from. On Microblaze, you can attach memory wherever the hell you like in XPS.
In addition, it can be used to easily support bootloaders. These are typically used as a small program to do a bit of initialization, then fetch a larger program from a medium that can't be accessed simply (e.g. USB or Ethernet). In these cases, the bootloader typically copies itself to high memory, fetches below it and then jumps there. The reset vector simply allows the programmer to bypass the first step.

How is data from the RAM fetched?

In C each byte is individually addressable. Suppose an integer (say one that uses 4 bytes) has an address 0xaddr (which is 32 bits, assuming that we have a 32 bit processor with a 32 bit address bus and a 32 bit data bus) and suppose the value of the integer is 0x12345678. Now if I am fetching this value from memory, how does the processor do this? Does the processor place 0xaddr (which is a 32 bit address) on the address lines and then fetch 8 bits of data, say 0x12, and then place 0xaddr+1 on the address lines and fetch another 8 bits of data, 0x34, and so on for the 4 bytes of the integer? Or does the processor just place 0xaddr and read the 4 bytes at once, thus utilizing its full 32 bit data bus?
This ("What Every Programmer Should Know About Memory" by Ulrich Drepper) is a well known article by the GNU C library lead that describes memory access (particularly in x86 - current PC - systems). It goes into far more detail than you can ever possibly need.
The entire article is spread across many parts:
Introduction
CPU Caches
Virtual Memory
NUMA Support
Programmers
More Programmers
Performance Tools
Future
Appendices
one thing i'd add to gbulmer's answer is that in many systems getting a stream of data is faster than you would expect from getting a single word. in other words, selecting where you want to read from takes some time, but once you have that selected, reading from that point, and then the next 32 or 64 or whatever bits, and then the next... is faster than switching to some unconnected place and reading another value.
and what dominates modern programming is not the behaviour of fetching from memory on the motherboard, but whether the data are in a cpu cache.
If you search the web for "Computer Architecture" you are likely to get some answers to your questions.
For your specific question: in the simple case, a 32bit computer with a 32bit data and address bus and no obfuscating hardware will read 32bits at once from a 32bit wide memory.
This is the sort of hardware which has existed since the late 1970s as minicomputers (e.g. DEC VAX), and still exists as microprocessors (x86, ARM, Cortex-A8, MIPS32) and inside some microcontrollers (e.g. ARM, Cortex-M3, PIC32, etc.).
The simplest case:
The address bus is a set of signals (wires) which carry address signals to memory, plus a few more signals to communicate whether memory is to be 'read from' or 'written to' (data direction), and whether the signals on the address and data direction wires are valid. In the case of your example, there might be 32 wires to carry the bit pattern of the address.
The data bus is a second set of wires which communicate the value to and from memory. Memory might assert a signal to say the data is valid, but it might just be fast enough that everything 'just works'.
When the processor puts the address on the address signals, says it wants to read from memory (data direction is 'read'), memory will retrieve the value stored at that address, and put it onto the data bus signals. The processor (after suitable delays and signals) will sample the data bus wires, and that is the value it uses.
The processor might read the whole 32bits and extract a byte internally (if that is all the instruction requires), or the external address bus may provide extra signals so that the external memory system can be built to provide the appropriate byte, double byte or quad byte values. For many years, versions of the ARM processor architecture could only read the whole 32bits, and smaller pieces, e.g. a byte, were extracted internally.
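The "read the whole word, extract the byte internally" case can be sketched in a few lines of C. This is a host-side model only: memory is an array of 32-bit words, the function name is made up, and little-endian byte order is assumed.

```c
#include <stdint.h>

/*
 * Read one byte through a memory that can only deliver aligned 32-bit
 * words. Little-endian byte order is assumed for this sketch.
 */
static uint8_t read_byte_via_word(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t word = mem[byte_addr >> 2];    /* one full-width bus read     */
    unsigned lane = byte_addr & 3u;         /* which byte within the word  */
    return (uint8_t)(word >> (8u * lane));  /* extraction inside the CPU   */
}
```

For a word holding 0x12345678, byte offset 0 yields 0x78 and byte offset 3 yields 0x12, and in both cases the memory only ever sees one 32-bit read.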
You can see an example of this sort of signal set at http://www.cpu-world.com/info/Pinouts/68000.html
That chip only has a 24bit address bus, and a 16bit data bus.
It has two signals (UDS and LDS) which signal whether the upper data signals are being used, the lower data signals, or both.
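As a small sketch of how those two strobes encode the access size, the function below models which of UDS and LDS the 68000 asserts for a byte or word access. The struct and function names are made up; true means "asserted" here, whereas the real pins are active low, and word accesses are assumed to be at even addresses as the 68000 requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* true = strobe asserted (the physical pins are active low). */
struct strobes {
    bool uds;  /* upper data strobe: D15-D8 carry valid data */
    bool lds;  /* lower data strobe: D7-D0 carry valid data  */
};

/* size_bytes is 1 (byte) or 2 (word, must be at an even address). */
static struct strobes m68k_strobes(uint32_t addr, unsigned size_bytes)
{
    struct strobes s = { false, false };

    if (size_bytes == 2) {
        s.uds = true;             /* word: both halves of the bus  */
        s.lds = true;
    } else if ((addr & 1u) == 0) {
        s.uds = true;             /* even byte travels on D15-D8   */
    } else {
        s.lds = true;             /* odd byte travels on D7-D0     */
    }
    return s;
}
```

So the memory system never needs address bit 0 as such; the pair of strobes tells it which half (or both halves) of the 16-bit bus to drive.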
I found a reasonably detailed explanation at research.cs.tamu.edu/prism/lectures/mbsd/mbsd_l15.pdf
I found that by searching for "68000 memory bus cycle".
You might usefully look for MIPS, ARM, or x86 to see their bus cycle.

Resources