external abort in arm processor [closed] - arm

What is a typical external abort on an arm processor?
How does it differ from a normal data abort and prefetch abort?
How does the system inform an application about an external abort?

Usually an ARM processor comes with coprocessors; one of them is CP15, which controls the MMU.
If the MMU cannot find a page for a given virtual address, or it encounters a translation fault, the result is a data abort or a prefetch abort, depending on the respective path (I-cache or D-cache).
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/BABFFDFD.html
Now suppose you access a virtual address that has a valid physical address in the mapping, but the physical address itself is not valid (or the address belongs to the secure world, i.e. TrustZone). The system bus will generate an abort in this case, because it will not be able to decode the physical address.
In simple words, every abort that is not detected by the MMU is called an external abort, and the application is notified with a SIGBUS signal.
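To illustrate that last point, a user-space process can catch SIGBUS with an ordinary signal handler. A minimal sketch (the handler just reports the signal and exits; how much fault detail the kernel can supply depends on the platform):

    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Minimal SIGBUS handler: report and bail out.  Only async-signal-safe
       functions (write, _exit) are used inside the handler. */
    static void bus_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        static const char msg[] = "caught SIGBUS (bus error / external abort)\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(EXIT_FAILURE);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO;        /* deliver a siginfo_t with fault details */
        sa.sa_sigaction = bus_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        /* ... rest of the application; a bus error on one of its memory
           accesses would now be delivered to bus_handler(). */
        return 0;
    }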
What is a typical external abort on an arm processor?
A typical external abort is something hardware related; it is not normally possible for a user process to cause one. Typical causes are not enabling clocks to an SoC module and/or not initializing dependent SoC blocks (bus configuration, pin multiplexing, etc.). It will also happen with TrustZone when accessing protected memory (i.e. secure memory from the normal world).
How does it differ from a normal data abort and prefetch abort?
A normal data abort or prefetch abort comes from using memory that is not mapped by the MMU. With an external abort the address is mapped, but when the CPU runs the bus cycle, the peripheral at the physical address does not respond (or returns an error to the CPU).
How does the system inform an application about an external abort?
As an external abort means something external to the CPU (i.e. hardware) misbehaved, it is not normally the case that an application will get one. An application should not deal with hardware directly.

Related

Enforcing the type of medium on a Virtual Memory system [closed]

Suppose I'm designing a software application that requires high bandwidth / low latency memory transfers to operate properly.
My OS uses Virtual Memory addressing.
Is there a way to force the variables I choose to reside in DDR, and not, for example, on the hard drive?
You're conflating virtual memory with swap memory. Virtual memory just means that the address space in which a process operates is an abstraction presenting a very orderly structure, while the actual physical address space is occupied in an almost chaotic manner. And yes, virtual memory is part of memory page swapping, but it is not a synonym for it.
One way to achieve what you want is simply to turn off page swapping for the whole system. It can also be done for specific parts of the virtual address space. But before I explain how to do that, I need to tell you this:
You're approaching this from the wrong angle. The system main memory banks you're referring to as DDR (which is just a particular transfer clocking mode, by the way) are just one level in a whole hierarchy of memory. In fact, even system main memory is slow compared to the computational throughput of processors, and this has been so since the dawn of computing. This is why computers have cache memory: small amounts of fast memory. On modern architectures these caches also form the interface between the layers of the memory hierarchy.
If you perform a memory operation on a modern CPU, it will hit the cache. If it's a read and the cache is hot, the cache will deliver the data; otherwise the operation escalates to the next layer. Writes affect only the caches in the short term and propagate to main memory only through cache eviction or explicit memory barriers.
Normally you don't want to interfere with the decisions an OS takes regarding virtual memory management; you will hardly be able to outsmart it. If you have a bunch of data sitting in memory that you access at high frequency, the memory manager will see that and won't even consider paging out that part of memory. To spell it out clearly: on every modern OS, regions of memory that are in active and repeated use will not be paged out. If swapping happens, it is because the system is running out of memory and is trying to juggle things around. This is called thrashing, and locking pages into memory will not help against it; all it will do is force the OS to go around and kill memory-hogging processes (likely your process) to get some breathing space.
Anyway, if you really feel you need to lock pages into memory, have a look at the mlock(2) syscall.
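A minimal sketch of pinning a buffer with mlock(2) (the buffer size is arbitrary here; locked memory counts against the RLIMIT_MEMLOCK limit, and mlockall(MCL_CURRENT | MCL_FUTURE) exists if you want to lock everything):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4 * 1024 * 1024;        /* 4 MiB working set (arbitrary) */
        void *buf = malloc(len);
        if (buf == NULL)
            return EXIT_FAILURE;

        /* Ask the kernel to keep these pages resident in RAM.  Fails with
           ENOMEM or EPERM if RLIMIT_MEMLOCK is too small for the request. */
        if (mlock(buf, len) != 0) {
            perror("mlock");
            free(buf);
            return EXIT_FAILURE;
        }

        memset(buf, 0, len);                 /* touch the pages */
        /* ... latency-sensitive work on buf ... */

        munlock(buf, len);
        free(buf);
        return 0;
    }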
As far as I can tell, there is no way to force certain variables to be stored in DDR vs. HDD when virtual memory handles the address translations. What you can do is configure your operating system to use different types of secondary storage for backing virtual memory, such as solid-state disks, HDDs, etc.

ARM Linux reboot process [closed]

How does the reboot procedure work on ARM SoCs running Linux? E.g., do boot loaders reinitialize DDR memory? Can anybody please explain the rebooting process in detail?
How does the reboot procedure work on ARM SoCs running Linux, ... ?
The typical ARM processor in use today is integrated with peripherals on a single IC called a SoC, system on a chip. Typically the reboot procedure is nearly identical to a power-on boot procedure. On a reset the ARM processor typically jumps to address 0.
Main memory, e.g. DRAM, and non-volatile storage, e.g. NAND flash, are typically external to the SoC (that is Linux capable) for maximum design flexibility.
But typically there is a small (perhaps 128KB) embedded ROM (read-only memory) to initialize the minimal system components (e.g. clocks, external memories) to begin bootstrap operations. A processor reset will cause execution of this boot ROM. (This ROM is truly read-only, and cannot be modified. The code is masked into the silicon during chip fabrication.)
The SoC may have a strapping option to instead execute an external boot memory, such as NOR flash or EEPROM, which can be directly executed (i.e. XIP, execute in place).
The salient characteristic of any ROM, flash, and SRAM that the first-stage boot program uses is that these memories must be accessible immediately after a reset.
One of the problems of bootstrapping a system that uses DRAM for main memory is its hardware initialization. The DRAM memory controller has to be initialized with board-specific parameters before code can be loaded into DRAM and executed. So from where does this board-specific initialization code execute, since it can't be in main memory?
Each vendor has their own solution.
Some require memory configuration data to be stored in nonvolatile memory for the boot ROM to access.
Some SoCs have integrated SRAM (which does not require initialization like DRAM) to execute a small second-stage bootstrap program.
Some SoCs use NOR flash to hold a XIP (execute in place) bootstrap program (e.g. the SPL program of U-Boot).
Each SoC vendor has its own bootstrap method to get the OS loaded and executing.
Some use hardware strapping read through GPIO pins to determine the source of the next stage of the bootstrap sequence.
Another vendor may use an ordered list of memories and devices to probe for a bootstrap program.
Another technique is to branch to firmware in NOR flash, which can be directly executed (i.e. XIP, execute in place).
Once the bootstrap program has initialized the DRAM, this main memory can be used to load the next stage of booting. That could be a sophisticated boot utility such as U-Boot, or (if the bootstrap program is capable) the Linux kernel itself. A ROM boot program could do everything needed to load an ARM Linux kernel (e.g. ETRAX), but more commonly there are several bootstrap programs or stages that have to be performed between processor reset and execution of the OS.
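As a rough illustration of those stages, here is a sketch of what a second-stage bootstrap program might look like; every helper, offset, and address below is a hypothetical placeholder for board-specific code, not any particular vendor's API:

    /* Hypothetical second-stage bootstrap, running from on-chip SRAM or XIP flash. */

    #define NEXT_STAGE_FLASH_OFFSET  0x00100000u   /* where the next-stage image sits in flash */
    #define NEXT_STAGE_LOAD_ADDR     0x80008000u   /* where it should run from in DRAM         */
    #define NEXT_STAGE_MAX_SIZE      0x00800000u

    void init_clocks(void);                        /* PLLs, peripheral clock gating (placeholder) */
    void init_ddr_controller(void);                /* board-specific DRAM timing setup (placeholder) */
    void flash_read(unsigned off, void *dst, unsigned len);   /* placeholder */

    void second_stage_boot(void)
    {
        init_clocks();
        init_ddr_controller();                     /* DRAM is unusable before this point */

        /* Only now can code or data be placed in main memory. */
        flash_read(NEXT_STAGE_FLASH_OFFSET, (void *)NEXT_STAGE_LOAD_ADDR, NEXT_STAGE_MAX_SIZE);

        /* Hand off to the next stage (U-Boot proper, or the kernel directly). */
        void (*next_stage)(void) = (void (*)(void))NEXT_STAGE_LOAD_ADDR;
        next_stage();
    }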
The requirements of booting the Linux ARM kernel are spelled out in the following document: Booting ARM Linux
Older versions of Linux ARM used the ATAGs list to pass basic configuration information to the kernel. Modern versions provide a complete board configuration using a compiled binary of a Device Tree.
... e.g. do boot loaders reinitialize DDR memory?
Of the few examples that I have seen, the boot programs unconditionally configure the dynamic RAM controller.
PCs have a BIOS and Power On Self Tests, aka POST. The execution of POST is the primary difference between a power-on reset (aka cold boot) versus a software reset (aka warm boot or reboot). ARM systems typically do not perform POST, so you typically will see minimal to no difference between types of reset.
This is way too broad. It's not only SoC vendor dependent, but also hardware and software dependent.
However, the most typical setup is:
CPU executes first-stage bootloader (FSB).
The FSB is located on the chip itself in ROM or EEPROM and is very small (the AT91RM9200 FSB is 10 kB max, AFAIR). The FSB then initializes a minimal set of peripherals (clocks, RAM, flash), transfers the second-stage bootloader (U-Boot) to RAM, and executes it.
U-Boot starts.
U-Boot initializes some other hardware (serial, Ethernet, etc.), transfers the Linux kernel to RAM, prepares the pointer to the kernel input parameters, and jumps to its entry point.
Linux kernel starts.
Magic happens here. The system is now able to serve you cookies via an SSH console and/or execute whatever needs to be executed.
A bit more in-depth info about warm start:
A warm start is a software reset, while a cold start is a power-on or hardware reset. Some (most?) SoCs are able to pass information about a warm start to the FSB/SSB. This way bootloaders can minimize the overall boot time by skipping re-initialization of peripherals that are already initialized.
Again, this is the most typical setup, from my 15+ years of experience in the embedded world.
It varies a lot depending on the SoC. I'll describe something like a "typical" one (Freescale iMX6)...
Typically an on-chip Watchdog Timer is used to reset the SoC cleanly. Sometimes, an external Power Management IC can be provoked to perform a board-wide reset (this method may be better, as it avoids the risk of external chips getting "stuck" in an unexpected state, but not all board designs support it).
Upon reset, the SoC will start its normal boot process: checking option pins, fuse settings and initializing clocks and the boot device (e.g. eMMC). This is typically controlled by CPU code executing from a small on-chip ROM.
Either the internal boot ROM will initialize the DDR SDRAM (using settings taken from fuses or read from a file on the boot device), or the bootloader gets loaded into internal RAM and then takes care of DDR initialization (and other things). The U-Boot bootloader can be configured to work either way.
Finally, the kernel and DTB are loaded into memory and started.
Note that U-Boot etc. are not required; they are gross overkill, operating systems in their own right. To load and run Linux you need memory up and running, then you copy the kernel into it and branch to it, with a few registers set to point at tables that you set up or copied from flash along with the kernel.
What you do on a cold reset versus a warm one is up to you; on the same chip and board there is no reason two solutions necessarily have to do exactly the same thing, unless it is driven by hardware (e.g. if you do a WDT reset to start over and that reset wipes out the whole chip, including the DDR controller). You just have to put the system into the state that Linux expects.
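To make that "registers pointing at tables" handoff concrete: per the "Booting ARM Linux" document mentioned earlier, a 32-bit kernel expects r0 = 0, r1 = machine type number, and r2 = the physical address of the ATAGs list or DTB. Under the AAPCS calling convention the first three arguments of a C call are passed in r0-r2, so a bare-metal loader can do the jump with a plain function-pointer call. A sketch (the addresses are placeholders):

    #define KERNEL_ENTRY  0x80008000u   /* placeholder: where the kernel image was copied          */
    #define DTB_ADDR      0x83000000u   /* placeholder: where the flattened device tree was placed */

    typedef void (*kernel_entry_t)(unsigned long r0, unsigned long r1, unsigned long r2);

    void boot_linux(void)
    {
        kernel_entry_t enter_kernel = (kernel_entry_t)KERNEL_ENTRY;

        /* Boot protocol requirements at this point: MMU off, data cache off,
           interrupts disabled, kernel and DTB already copied into DRAM. */
        enter_kernel(0, ~0UL, DTB_ADDR);   /* r1 = ~0: machine type unused when booting with a DTB */
    }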

How primary memory is organised in a microcontroller? [closed]

My query is: how is the memory organized and managed in a microcontroller?
(It doesn't have any OS, i.e. no MMU is in use.)
I am working on a Zynq 7000 (ZC702) FPGA. It has a separate ARM core and separate DDR memory, connected together with AXI interconnects.
I wrote 1111111111 in decimal (ten 1's) to DDR, and it gives me the same value back after reading.
When I write 11111111111 in decimal (eleven 1's) to DDR, it gives me -1.
Here the whole physical memory is consumed. This will not be the case when I use a microcontroller.
Who will manage the memory in a microcontroller?
Okay, that is a Cortex-A9; in no way, shape or form is that a microcontroller.
When you read the ARM documentation for that architecture you will find the MMU in there. As expected, it is part of the Cortex-A core.
Who/how/when is the DDR initialized? Someone has to initialize it before you can write to it and read it back. Ten ones in decimal fits in a 32-bit write/read; eleven ones does not fit in 32 bits. What size write/read did you use? How much DDR do you have? I suspect you are not ready to be talking to DDR.
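To make the width point concrete: 1111111111 (ten decimal ones) fits in 32 bits, but 11111111111 (eleven decimal ones) needs 34 bits, so it only survives a round trip through a 64-bit access. A sketch (DDR_BASE is a placeholder for wherever your interconnect actually maps the DDR):

    #include <stdint.h>

    #define DDR_BASE 0x00100000u   /* placeholder: use your design's actual DDR mapping */

    void width_demo(void)
    {
        volatile uint32_t *w32 = (volatile uint32_t *)DDR_BASE;
        volatile uint64_t *w64 = (volatile uint64_t *)(DDR_BASE + 8);

        *w32 = 1111111111u;        /* ten ones: fits in 32 bits, reads back intact      */

        *w64 = 11111111111ull;     /* eleven ones: needs 34 bits, so it must be written
                                      and read back as a 64-bit quantity; pushing it
                                      through a 32-bit or signed-int path truncates it. */
    }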
There is 256 KB of on-chip RAM; maybe you should mess with that first.
All of this is in the Xilinx and ARM documentation. Experienced bootloader developers can take months to get DDR initialized. How, by whom, and when was your DDR initialized (your code, some code before yours, FPGA logic, etc.), and was it done before your code ran? If you weren't the one to initialize it, do you have DRAM tests to confirm that it is up and working? Has Xilinx provided routines for that? Maybe it is your job to initialize it.
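A minimal DRAM sanity test along those lines might look like this (a sketch only; the base address and length are placeholders and must match what your interconnect actually maps):

    #include <stdint.h>

    /* Walk an address-dependent pattern through a window of DDR and read it back.
       Catches gross problems (no clock, wrong timings, bad address decode), not
       subtle ones. */
    static int ddr_sanity_test(volatile uint32_t *base, uint32_t words)
    {
        for (uint32_t i = 0; i < words; i++)
            base[i] = i ^ 0xA5A5A5A5u;       /* pattern derived from the address */

        for (uint32_t i = 0; i < words; i++)
            if (base[i] != (i ^ 0xA5A5A5A5u))
                return -1;                   /* mismatch: DDR is not usable yet  */

        return 0;
    }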
As for the MMU, you just read the ARM docs, as with everything else about that core. Then work through the Xilinx docs for more. And then, of course, who did the rest of the FPGA design that connects the ARM core to the DRAM? What address space are they using, what decoding, what alignments do they support, what AXI transfers are supported, etc.? That is not something anyone here can answer for you; you have to talk to the FPGA logic designers for your specific design.
If you have no operating system then you are running bare metal. You, the programmer, are responsible for memory management. You don't necessarily need the MMU; sometimes it helps, sometimes it just adds more work. It depends completely on your programming task and the overall design of the software and system. The MMU is hardware; an operating system is software that runs on hardware. One may use the other, but they are in no way tied to each other, any more than the interrupt controller, DRAM controller, UART, etc. are tied to the operating system. The operating system or other software may use that hardware, but you can use it with other software as well.

How does kernel restrict processes to their own memory pool?

This is a purely academic question, not related to any particular OS.
We have an x86 CPU and operating memory. This memory resembles a memory pool consisting of addressable memory units that can be read or written, using their addresses, with the CPU's MOV instruction (we can move data from/to this memory pool).
Given that our program is the kernel, we have full access to this whole memory pool. However, if our program is not running directly on the hardware, the kernel creates a "virtual" memory pool that lies somewhere inside the physical memory pool; our process treats it just like the physical memory pool and can write to it, read from it, or change its size, usually by calling something like sbrk or brk (on Linux).
My question is: how is this virtual pool implemented? I know I could read the whole Linux source code and maybe in a year I'd find it, but I can also ask here :)
I suppose that one of these 3 potential solutions is being used:
Interpret the instructions of the program (very inefficient and unlikely): the kernel would just read the byte code of the program and interpret each instruction individually, e.g. if it saw a request to access memory the process isn't allowed to touch, it wouldn't let it through.
Create some OS-level API that would have to be used in order to read/write memory, and disallow access to raw memory, which is probably just as inefficient.
A hardware feature (probably best, but I have no idea how it works): the kernel would say, "dear CPU, I will now send you instructions from some unprivileged process; please restrict its memory accesses to the area 0x00ABC023 - 0xDEADBEEF", and the CPU would not let the user process touch anything outside the range approved by the kernel.
The reason I am asking is to understand whether there is any overhead in running a program unprivileged under a kernel (let's not consider the overhead of multithreading implemented by the kernel itself) versus running the program natively on the CPU (with no OS), as well as the overhead in memory access caused by machine virtualization, which probably uses a similar technique.
You're on the right track when you mention a hardware feature. This is a feature known as protected mode and was introduced to x86 by Intel on the 80286 model. That evolved and changed over time, and currently x86 has 4 modes.
Processors start running in real mode, and later privileged software (ring 0, your kernel for example) can switch between these modes.
The virtual addressing is implemented and enforced using the paging mechanism (How does x86 paging work?) supported by the processor.
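For a flavour of how the paging hardware enforces this, each 32-bit x86 page-table entry carries permission bits that the MMU checks on every access: bit 0 = present, bit 1 = writable, bit 2 = user-accessible. A simplified sketch of the check (it ignores page-directory flags, NX, SMEP/SMAP and so on):

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_PRESENT  (1u << 0)   /* page is mapped                     */
    #define PTE_WRITABLE (1u << 1)   /* writes allowed                     */
    #define PTE_USER     (1u << 2)   /* accessible from ring 3 (user mode) */

    /* Would a user-mode access of the given kind be allowed by this entry? */
    static bool user_access_ok(uint32_t pte, bool is_write)
    {
        if (!(pte & PTE_PRESENT)) return false;   /* -> page fault          */
        if (!(pte & PTE_USER))    return false;   /* kernel-only page       */
        if (is_write && !(pte & PTE_WRITABLE)) return false;
        return true;
    }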
On a normal system, memory protection is enforced at the MMU, or memory management unit, which is a hardware block that configurably maps virtual to physical addresses. Only the kernel is allowed to directly configure it, and operations which are illegal or go to unmapped pages raise exceptions to the kernel, which can then discipline the offending process or fetch the missing page from disk as appropriate.
A virtual machine typically uses CPU hardware features to trap and emulate privileged operations or those which would too literally interact with hardware state, while allowing ordinary operations to run directly and thus with moderate overall speed penalty. If those are unavailable, the whole thing must be emulated, which is indeed slow.

Low interrupt latency via dedicated architectures and operating systems

This question may seem slightly vague; however, I am researching how interrupt systems work and their latency times. I am trying to understand how architectural facilities such as FIQ on ARM help decrease latency times. How does this differ from using an operating system that does not have, or cannot provide, access to these facilities? For example, Windows RT is made for ARM etc., and this operating system cannot be ported to other architectures.
Simply put: how is interrupt latency different in dedicated architectures that have dedicated operating systems, compared to operating systems that can be ported across many different architectures (Linux, for example)?
Sorry for the rant - I'm pretty confused as you can probably tell.
I'll start with your Windows RT example. Windows RT is a port of Windows to the ARM architecture; it is not a "dedicated operating system". There are (probably) many OSes that run on only one architecture, but that is more a function of nobody being bothered to port them for some reason.
What does 'port' really mean though?
Windows has a kernel (we'll call it NT here, it doesn't matter), and that NT kernel has a bunch of concepts that need to be implemented. These concepts are things like timers, memory virtualisation, exceptions etc...
These concepts are implemented differently between architectures, so the port of the kernel and drivers (I will ignore the rest of the OS here; often that is a recompile only) is a matter of using the available pieces of silicon to implement the required concepts. This implementation is called a "port".
Let's zoom in on interrupts (AKA exceptions) on an ARM that has FIQ and IRQ.
In general an interrupt can occur asynchronously, by which I mean at any time. The CPU is generally busy doing something when an IRQ is asserted, so that context (we'll call it UserContext1) needs to be stored before the CPU can use any resources in use by UserContext1. Generally this means storing registers on the stack before using them.
On ARM, when an IRQ occurs the CPU switches to IRQ mode. Registers r13 and r14 have their own copies in IRQ mode; the rest need to be saved if they are used, so that is what happens. Those stores to memory take some time. The IRQ is handled, UserContext1 is popped back off the stack, and IRQ mode is exited.
So the latency in this case might be the time from IRQ assertion to the time the IRQ vector starts executing. That is going to be some set number of clock cycles, based on what the CPU was doing when the IRQ happened.
The latency before the IRQ handling can occur is the time from the IRQ assert to the time the CPU has finished storing the context.
The latency before user-mode code can execute depends on too much stuff in the OS/kernel to explain here, but the minimum boils down to the time from the IRQ assertion to the return after restoring UserContext1, plus the time for the OS context switch.
FIQ - if you are a hard-as-nails programmer, you might only need 7 registers to completely handle your interrupt servicing. I mentioned that IRQ mode has its own copy of 2 registers; well, FIQ mode has its own copy of 7 registers. Yup, that's 28 bytes of context that doesn't need to be pushed onto the stack (actually one of them is the link register, so it's really 6 you have free). That can remove the need to store UserContext1 and then restore it, so the latency can be reduced by up to the length of time needed to do that save/restore.
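A sketch of what that can look like with GCC on 32-bit ARM, where the interrupt attribute tells the compiler to emit FIQ-style entry/exit code; the device register addresses here are made-up placeholders:

    #include <stdint.h>

    #define FIQ_SOURCE_STATUS ((volatile uint32_t *)0x40001000u)  /* placeholder */
    #define FIQ_SOURCE_ACK    ((volatile uint32_t *)0x40001004u)  /* placeholder */

    volatile uint32_t fiq_event_count;

    /* GCC emits the FIQ-appropriate prologue/epilogue for this function.
       Because r8-r14 are banked in FIQ mode, the compiler only has to save
       whatever low registers the body actually clobbers, which is where the
       latency saving comes from. */
    void fiq_handler(void) __attribute__((interrupt("FIQ")));

    void fiq_handler(void)
    {
        uint32_t status = *FIQ_SOURCE_STATUS;   /* read the pending source  */
        *FIQ_SOURCE_ACK = status;               /* acknowledge / clear it   */
        fiq_event_count++;
    }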
None of this has much to do with the OS. The OS can choose to use or not use these features. The OS can choose to make guarantees regarding how long it will take to execute the OSes concept of an interrupt handler, or it may not. This is one of the basic concepts of an RTOS, the contract about how long before the handler will run.
The OS is designed for some purpose (and that purpose may be "general"); that target design goal will have a lot more effect on latency than how many targets the OS has been ported to.
Go have a read about something like FreeRTOS, then buy some hardware and try it. Annotate the code to figure out the latencies you really want to look at. It will likely be the best way to get your head around it.
(Multi-CPU systems do it much the same way, but with some synchronization and barrier functions and a sprinkling of complexity.)
