Significance of Reset Vector in Modern Processors

I am trying to understand, in detail, how a computer boots up.
I came across two things which made me more curious:
1. RAM is placed below ROM in the address map, to avoid memory holes, as in the Z80 processor.
2. A reset vector is used, which takes the processor to a memory location in ROM, whose contents point to the actual location (again in ROM) from which the processor actually starts executing instructions (the POST code). Why so?
If my question is still unclear, this link explains it briefly:
http://lateblt.tripod.com/bit68.txt

The processor logic is generally rigid and fixed, thus the term hardware. Software is something that can be changed, molded, etc., thus the term software.
The hardware needs to start somehow. There are two basic methods:
1) an address, hardcoded in the logic, in the processors memory space is read and that value is an address to start executing code
2) an address, hardcoded in the logic, is where the processor starts executing code
When the processor itself is integrated with other hardware, anything can be mapped into any address space. You can put RAM at address 0x1000 or 0x40000000 or both. You can map a peripheral to 0x1000 or 0x4000 or 0xF0000000 or all of the above. It is the choice of the system designers, or a combination of the teams of engineers, where things will go. One important factor is how the system will boot once reset is released. The booting of the processor is well known due to its architecture. The designers usually choose between two paths:
1) put a rom in the memory space that contains the reset vector or the entry point depending on the boot method of the processor (no matter what architecture there is a first address or first block of addresses that are read and their contents drive the booting of the processor). The software places code or a vector table or both in this rom so that the processor will boot and run.
2) put ram in the memory space, in such a way that some host can download a program into that ram, then release reset on the processor. The processor then follows its hardcoded boot procedure and the software is executed.
The first one is most common, the second is found in some peripherals, mice and network cards and things like that (Some of the firmware in /usr/lib/firmware/ is used for this for example).
The bottom line, though, is that the processor is usually designed with one boot method, a fixed method, so that all software written for that processor can conform to that one method and not have to keep changing. Also, the processor, when designed, doesn't know its target application, so it needs a generic solution. The target application often defines the memory map, what is where in the processor's memory space, and one of the tasks in that assignment is deciding how that product will boot. From there the software is compiled and placed such that it conforms to the processor's rules and the product's hardware rules.

It completely varies by architecture. There are a few reasons why cores might want to do this, though. Embedded cores (think along the lines of ARM and MicroBlaze) tend to be used within system-on-chip machines with a single address space. Such architectures can have multiple memories all over the place and tend to dictate only that the bottom area of memory (i.e. 0x00) contains the interrupt vectors. This then allows the programmer to easily specify where to boot from. On MicroBlaze, you can attach memory wherever the hell you like in XPS.
In addition, it can be used to easily support bootloaders. These are typically small programs that do a bit of initialization, then fetch a larger program from a medium that can't be accessed simply (e.g. USB or Ethernet). In these cases, the bootloader typically copies itself to high memory, fetches the larger program into the memory below it, and then jumps there. The reset vector simply allows the programmer to bypass the first step.

Are memory mapped registers separate registers on the bus?

I will use the TM4C123 Arm Microcontroller/Board as an example.
Most of its I/O registers are memory-mapped, so you can get/set their values using
regular memory load/store instructions.
My question is: is there some type of register outside the CPU, somewhere on the bus, which is mapped to memory, so that reading/writing that memory region essentially keeps duplicate values (one in the register and one in memory)? Or is the memory the register itself?
There are many buses even in an MCU. Bus after bus after bus splitting off like branches in a tree. (sometimes even merging unlike a tree).
It may predate the intel/motorola battle but certainly in that time frame you had segmented vs flat addressing and you had I/O mapped I/O vs memory mapped I/O, since motorola (and others) did not have a separate I/O bus (well one extra...address...signal).
Look at the ARM architecture documents and the chip documentation (ARM makes IP, not chips). You have load and store instructions that operate on addresses. The documentation for the chip (and to some extent ARM provides rules for the Cortex-M address space) provides a long list of addresses for things. As a programmer you simply use the right addresses with your load and store instructions.
Someone's marketing may still care about terms like memory-mapped I/O; because Intel x86 still exists (how????), some folks will continue to carry those terms. As a programmer, these are first and foremost just bits that go into registers, and for single instructions here and there those bits are addresses. If you want to add adjectives to that, go for it.
If the address you are using based on the chip and core documentation is pointing at an sram, then that is a read or write of memory. If it is a flash address, then that is the flash. The uart, the uart. timer 5, then timer 5 control and status registers. Etc...
There are addresses in these MCUs that point at two or three things, but not at the same time (address 0x00000000 and some number of kbytes after that). And this overlap, at least in many of these Cortex-M MCUs, does not mix "memory" and peripherals (I/O) in these special address spaces; instead these are places where you can boot the chip and run some code. With these Cortex-Ms I do not think you can even use the sort-of MMU (the MPU) to mix these spaces. Definitely in full-sized ARMs and other architectures you can use a full-blown MMU to mix up the address spaces and have a virtual address space that lands on a physical address space of a different type.

What exactly is "memory" in C Programming?

I'm curious to know what "Memory" really stands for.
When I compile and execute this code:
#include <stdio.h>
int main(void)
{
int n = 50;
printf("%p\n", &n);
}
As we know, we get a Hex output like:
0x7ffeee63dabc
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
Where can I read more about this, any references would be helpful. Thank you.
Some Background:
I've recently picked up learning Computer Science again after a break of a few years (I was working in the industry as a low-code / no-code web developer) and realised there are a few gaps in my knowledge I want to colour in.
In learning C (via CS50x) I'm on the week of Memory. And I realise I don't actually know what Memory this is referring to. The course either assumes that the students already know this, or that it isn't pertinent to the context of this course (it's an Intro course so abstractions make sense to avoid going down rabbit holes), but I'm curious and I'd like to chase it down to find out the answers.
computer architecture 101
In your computer there is a CPU chip and there are RAM chips.
The CPU's job is to calculate things. The RAM's job is to remember things.
The CPU is in charge. When it wants to remember something, or look up something it's remembering, it asks the RAM.
The RAM has a bunch of slots where it can store things. Each slot holds 1 byte. The slot number (not the number in the slot, but the number of the slot) is called an address. Each slot has a different address. They start from 0 and go up: 0, 1, 2, 3, 4, ... Like letterboxes on a street, but starting from 0.
The way the CPU tells the RAM which thing to remember is by using a number called an address.
The CPU can say: "Put the number 126 into slot 73224." And it can say, "Which number is in slot 97221?"
We normally write slot numbers (addresses) in hexadecimal, with 0x in front to remind us that they're hexadecimal. It's tradition.
How does the CPU know which address it wants to access? Simple: the program tells it.
operating systems 101
An operating system's job is to keep the system running smoothly.
That doesn't happen when faulty programs are allowed to access memory that doesn't belong to them.
So the operating system decides which memory the program is allowed to access, and which memory it isn't. It tells the CPU this information.
The "Am I allowed to access this memory?" information applies in 4 kilobyte chunks called "pages". Either you can access the entire page, or none of it. That's because if every byte had separate access information, you'd need to waste half your RAM just storing the access information!
If you try to access an address in a page that the OS said you can't access, the CPU narcs to the OS, which then stops running your program.
operating systems 102
Remember this shiny new "virtual memory" feature from the Windows 95 days?
"Virtual memory" means the addresses your program uses aren't the real RAM addresses.
Whenever you access an address, the CPU looks up the real address. This also uses pages. So the OS can make any "address page" go to any "real page".
These are not official terms - OS designers actually say that any "virtual page" can "map" to any "physical page".
If the OS wants a physical page but there aren't any left, it can pick one that's already used, save its data onto the disk, make a little note that it's on disk, and then it can reuse the page.
What if the program tries to access a page that's on disk? The OS lies to the CPU: it says "The program is not allowed to access this page." even though it is allowed.
When the CPU narcs to the OS, the OS doesn't stop the program. It pauses the program, finds something else to store on disk to make room, reads in the data for the page the program wants, then it unpauses the program and tells the CPU "actually, he's allowed to access this page now." Neat trick!
So that's virtual memory. The CPU doesn't know the difference between a page that's on disk, and one that's not allocated. Your program doesn't know the difference between a page that's on disk, and one that isn't. Your program just suffers a little hiccup when it has to get something from disk.
The only way to know whether a virtual page is actually stored in RAM (in a physical page), or whether it's on disk, is to ask the OS.
Virtual page numbers don't have to start from 0; the OS can choose any virtual page number it wants.
computer architecture 102
A cache is a little bit of memory in the CPU so it doesn't have to keep asking the RAM chip for things.
The first time the CPU wants to read from a certain address, it asks the RAM chip. Then, it chooses something to delete from its cache, deletes it, and puts the value it just read into the cache instead.
Whenever the CPU wants to read from a certain address, it checks if it's in the cache first.
Things that are in the cache are also in RAM. It's not one or the other.
The cache typically stores chunks of 64 bytes, called cache lines. Not pages!
There isn't a good way to know whether a cache line is stored in the cache or not. Even the OS doesn't know.
programming language design 101
C doesn't want you to know about all this stuff.
C is a set of rules about how you can and can't write programs. The people who design C don't want to have to explain all this stuff, so they make rules about what you can and can't do with pointers, and that's the end of it.
For example, the language doesn't know about virtual memory, because not all types of computers have virtual memory. Dishwashers or microwaves have no use for it and it would be a waste of money.
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
The address 0x7ffeee63dabc means address 0xabc within virtual page 0x7ffeee63d. It might be on your SSD at the moment or in RAM; if you access it then it has to come into RAM. It might also be in cache at the moment but there's no good way to tell. The address doesn't change no matter where it goes to.
You should think of memory as an abstract mapping from addresses to values, nothing more.
Whether your actual hardware implements it as a single chunk of memory, or a complicated hierarchy of caches is not relevant, until you try to optimize for a very specific hardware, which you will not want to do 99% of the time.
In general, memory is anything that stores data either temporarily or in a non-volatile way. Temporary memory is lost when the machine is turned off and is usually referred to as RAM or simply "memory". Non-volatile memory is kept on a hard disk, flash drive, EEPROM, etc. and is usually referred to as ROM or storage.
Caches are also a type of temporary memory, but they are referred to just as cache and are not considered part of the RAM. The RAM in your PC is also referred to as "physical memory" or "main memory".
When programming, all the variables are usually in main memory (more on this later) and brought into the caches (L1, L2, etc.) when they are being used. But the caches are for the most part transparent to the app developers.
Now there is another thing to mention before I answer your question. The addresses of a program are not necessarily the addresses of the physical memory. The addresses are translated from "virtual addresses" to "physical addresses" by an MMU (memory management unit) or similar CPU feature. The OS handles the MMU. The MMU is used for many reasons; two of them are to hide and secure the OS's memory and other apps' memory from wrong memory accesses by a program. This way a program cannot access or alter the OS's or another program's memory.
Further, when there is not enough RAM to store all the memory that apps are requesting, the OS can store some of that memory in non-volatile storage. Using virtual addresses, a program cannot easily know whether its memory is actually in RAM or in storage. This way programs can allocate a lot more memory than there is RAM. This is also why programs become very slow when they are consuming a lot of memory: it takes a long time to bring the data from storage back into main memory.
So, the address that you are printing is most likely the virtual address.
You can read something about those topics here:
https://en.wikipedia.org/wiki/Memory_management_(operating_systems)
https://en.wikipedia.org/wiki/Virtual_memory
Memory, from the C standard's point of view, is just storage for objects. How it works and how it is organized is left to the implementation.
Even printing pointers is, from the C point of view, meaningless (though it can be informative and interesting from the implementation point of view).
If your code is running under a modern operating system¹, pointer values almost certainly correspond to virtual memory addresses, not physical addresses. There's a virtual memory system that maps the virtual address your code sees to a physical address in main memory (RAM), but as pages get swapped in and out that physical address may change.
¹ For desktops, anything newer than the mid-'90s. For mainframes and minis, almost anything newer than the mid-'60s.
Is it a part of my computer's L1 Cache? RAM? SSD?
The short answer is RAM. This address is usually associated with a unique location inside your RAM. The long answer is, well - it depends!
Most machines today have a Memory Management Unit (MMU) which sits in-between the CPU and the peripherals attached to it, translating 'virtual' addresses seen by a program to real ones that actually refer to something physically attached to the bus. Setting up the MMU and allotting memory regions to your program is generally the job of the Operating System. This allows for cool stuff like sharing code/data with other running programs and more.
So the address that you see here may not be the actual physical address of a RAM location at all. However, with the help of the MMU, the OS can accurately and quickly map this number to an actual physical memory location somewhere in the RAM and allow you to store data in RAM.
Now, any accesses to the RAM may be cached in one or more of the available caches. Alternatively, it could happen that your program memory temporarily gets moved to disk (swapfile) to make space for another program. But all of this is completely automatic and transparent to the programmer. As far as your program is concerned, you are directly reading from or writing to the available RAM and the address is your handle to the unique location in RAM that you are accessing.

remapping Interrupt vectors and boot block

I am not able to understand the concept of remapping interrupt vectors or the boot block. What is the use of remapping the vector table? How does it work with and without the remap? Any links to good articles on this? I googled for this, but was unable to get a good answer. What is the advantage of mapping RAM to 0x0000 and mapping whatever exists at 0x0000 elsewhere? Is it that execution is faster from 0x0000?
It's a simple matter of practicality. The reset vector is at 0x0*, and when the system first powers up the core is going to start fetching instructions from there. Thus you have to have some code available there immediately from powerup - it's got to be some kind of ROM, since RAM would be uninitialised at this point. Now, once you've got through the initial boot process and started your application proper, you have a problem - your exception vectors, and the code to handle them, are in ROM! What if you want to install a different interrupt handler? What if you want to switch the reset vector for a warm-reset handler? By having the vector area remappable, the application is free to switch out the ROM boot firmware for the RAM area in which it's installed its own vectors and handler code.
Of course, this may not always be necessary - e.g. for a microcontroller running a single dedicated application which handles powerup itself - but as soon as you get into the more complex realm of separate bootloaders and application code it becomes more important. Performance is also a theoretical concern, at least - if you have slow flash but fast RAM you might benefit from copying your vectors and interrupt handlers into that RAM - but I think that's far less of an issue on modern micros.
Furthermore, if an application wants to be able to update the boot flash at runtime, then it absolutely needs a way of putting the vectors and handlers elsewhere. Otherwise, if an interrupt fires whilst the flash block is in programming mode, the device will lock up in a recursive hard fault due to not being able to read from the vectors, never finish the programming operation and brick itself.
Whilst most types of ARM core have some means to change their own vector base address, some (like Cortex-M0), not to mention plenty of non-ARM cores, do not, which necessitates this kind of non-architecture-specific, system-level remapping functionality to achieve the same result. In the case of microcontrollers built around older cores like ARM7TDMI, it's also quite likely for there to be no RAM behind the fixed alternative "high vectors" address (more suited for use with an MMU), rendering that option useless.
* Yeah, OK, 0x4 if we're talking Cortex-M, but you know what I mean... ;)

PoU and PoC in cache maintenance operations in arm

When reading ARM arch. ref. manual v7, I've found two concepts; point of coherency (PoC) and point of unification (PoU).
For PoC, it looks like the point that all agents (i.e., CPU cores) can see the same copy of memory.
For PoU, it looks like the point that all agents (in this case, CPU cores and MMU) can see the same copy of memory.
I have several follow up questions:
Is my understanding correct?
If so, if I issue DCCMVAC (Data Cache Clean by MVA to PoC), giving an MVA of 0x40000000 (and let's say the PoC happens to be 0x70000000),
are all cache entries between VAs 0x40000000 and 0x70000000 cleaned?
Then, if I issue DCCMVAC with MVA 0x0, all data cache entries are cleaned?
PoU sounds like that MMU itself has its own data caches (not TLB) for page table walk inside main memory. Is this correct?
According to ARM training materials:
The PoU (Point of Unification) for a processor is the point (physical location within the hardware) where the instruction and data caches and the translation table walks of the processor are guaranteed to see the same copy of a memory location. For example, a unified level 2 cache would be the point of unification in a system with Harvard level 1 caches and a TLB (to cache page table entries). If no external cache is present, main memory would be the Point of unification.
The PoC (Point of [system] Coherency) is the point at which all blocks (for example, CPUs, DSPs, or DMA engines) which can access memory, are guaranteed for a particular address to see the same copy of a memory location. Typically, this will be the main external system memory.
It's an old question; however, I'm adding some comments in case someone searches for this.
In my opinion, PoU and PoC are terms coined by ARM to define levels for cache maintenance. The definitions of PoC and PoU are in the ARM ARM specification, while the ARMv8 programming guide (not the ARM spec) gives some diagrams for better understanding: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch11s04.html
One point: under some ARMv8 processors' implementations, the I-side can snoop the D-side. For example, on an I-cache miss, the core will check the D-cache, so you could treat the PoU as the level of the L1 cache, while other ARMv8 processors may not have this behaviour.
back to the original questions:
2) DCCMVAC 0x40000000 will clean to the PoC only the cache line containing that address.
The PoC is defined by the SoC implementation, not by an address.
3) Considering Q2, DCCMVAC 0x0 likewise applies only to one cache line.
If you want to clean and invalidate the whole cache, you need to use the by set/way operations to walk through the whole cache.
4) The PoU has nothing to do with the MMU.
The MMU hardware block owns some buffers to hold TLB entries; that is common practice. As for the page table, it is built by software and lives in memory; normally it is mapped as the Normal memory type, so it can be in the cache, both during setup by CPU instructions and during walks by the MMU's hardware walk unit.

Flow of Startup code in an embedded system , concept of boot loader?

I am working with an embedded board, but I don't know the flow of the startup code (C/assembly) for it.
Can we discuss the general modules/steps performed by the startup code in the case of an embedded system?
Just a high-level (algorithmic) overview is enough. All examples are welcome.
CPU gets a power on reset, and jumps to a defined point: the reset vector, beginning of flash, ROM, etc.
The startup code (crt - C runtime) is run. This is an important piece of code generated by your compiler/libc, which performs the following:
1) Configures and turns on any external memory (if absolutely required; otherwise left for later user code).
2) Establishes a stack pointer.
3) Clears the .bss segment (usually). .bss is the name for the uninitialized (or zeroed) global memory region. Global variables, arrays, etc. which don't have an initializing value (beyond 0) are located here. The general practice on a microcontroller is to loop over this region and set all bytes to 0 at startup.
4) Copies the non-const .data from the end of .text. As most microcontrollers run from flash, they cannot store variable data there. For statements such as int thisGlobal = 5;, the value of thisGlobal must be copied from a persistent area (usually after the program in flash, as generated by your linker) to RAM. This applies to static values, and to static values in functions. Values which are left undefined are not copied but instead cleared as part of the .bss step.
5) Performs other static initializers.
6) Calls main()
From here, your code is run. Generally, the CPU is left in an interrupts-off state (platform dependent).
Pretty open-ended question, but here are a few things I have picked up.
For super simple processors, there is no true startup code. The cpu gets power and then starts running the first instruction in its memory: no muss no fuss.
A little further up we have MCUs like AVRs and PICs. These have very little startup code. The only thing that really needs to be done is to set up the interrupt jump table with appropriate addresses. After that it is up to the application code (the only program) to do its thing. The good news is that you as the developer don't generally have to worry about these things: that's what libc is for.
After that we have things like simple ARM-based chips; more complicated than the AVRs and PICs, but still pretty simple. These also have to set up the interrupt table, as well as make sure the clock is set correctly, and start any needed on-chip components (basic interrupts, etc.). Have a look at this PDF from Atmel; it details the startup procedure for an ARM7 chip.
Farther up the food chain we have full-on PCs (x86, amd64, etc.). The startup code for these is really the BIOS, which are horrendously complicated.
The big question is whether or not your embedded system will be running an operating system. In general, you'll either want to run your operating system, start up some form of inversion of control (an example I remember from a school project was a telnet that would listen for requests using RL-ARM or an open source tcp/ip stack and then had callbacks that it would execute when connections were made/data was received), or enter your own control loop (maybe displaying a menu then looping until a key has been pressed).
Functions of Startup Code for C/C++
1) Disables all interrupts
2) Copies any initialized data from ROM to RAM
3) Sets the uninitialized data area to zero
4) Allocates space for and initializes the stack
5) Initializes the processor's stack pointer
6) Creates and initializes the heap
7) Executes the constructors and initializers for all global variables (C++ only)
8) Enables interrupts
9) Calls main
Where is the boot loader placed, then? It should be placed before the startup code, right?
As per my understanding, from the reset vector the control goes to the boot loader. There the code waits for a small period of time, during which it expects data to be flashed/downloaded to the controller/processor. If it does not detect any data, control is transferred to the next step as specified by theatrus. But my doubt is whether the boot loader code can be rewritten. E.g., can a UART bootloader be changed to an Ethernet/CAN bootloader, or is it that data sent using any protocol is converted to UART using a gateway and then flashed?
