Boot process in android devices - arm

I was reading this informative blog post here about the booting process on x64. In what ways is the booting process on ARM different? I had a look at Raspberry pi and it seems that the GPU is what executes before control is handed over to ARM processor. Are there any similar resources you have come across for ARM processors?

Just like the x86 boot process is documented in documents available at intel.com arm boot process is documented at arm.com as is any other processor documented.
The full sized (non-cortex-m) arm cores start by executing at address zero for reset. There is one instruction location for reset, one for data abort, undefined instruction, etc. Similar to an interrupt vector table but instead of an address there is an instruction there ideally a branch.
processors historically have some non-volatile ram mapped into the boot space or vector table or whatever, then volatile ram if any elsewhere. x86 historically near the top of ram, arm at the bottom.
ARM does not make chips like intel, it designs processor cores which other folks that make chips include in their chip designs instead of having to design their own cores and maintain compilers, etc. So the chip vendor can solve the boot process in a number of ways, some have something non-volatile mapped low then after booting you can swap in ram to that address space, whatever. In the case of the Raspberry Pi chips which are made by broadcom, they have their own gpu which actually boots the chip, it eventually reads a file assumed to be the linux kernel and root file system, but doesnt have to be. It places that file in ram (by default) in the place where a linux kernel would be loaded by a bootloader like redboot or uboot, in this case by the gpu's arm loader. the gpu then places a few breadcrumbs including the reset instruction(s) required to branch into the linux kernel (generally a very trivial thing), then release reset on the arm core allowing it to boot. so basically the arm sees only ram which is kind of nice, but that is somewhat atypical for arm or other processors to do that. Normally the main processor boots from non-volatile storage (eeprom, flash, etc) and then itself loads linux or whatever and branches to it.
A number of other processor types will have an interrupt vector table, including the arm cortex-m series which are thumb instruction set only cores. they are designed to be microcontrollers and not carry as much overhead, so the first address slot is actually meant to be filled with the init value for the stack pointer, then the second is the address for reset, and then a zillion others. The hardware is designed to preserve registers for you so that you can have the address to C functions right in the table and not have to have "some assembly required" wrappers written by you or the folks that ported the toolchain to this platform. Other processor types will just have a vector table and some assembly is required to be wrapped around interrupts and such.

Related

ARM Linux reboot process [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
How reboot procedure works on ARM SOCs running Linux, e.g do boot loaders reinitialize DDR memory? can anybody please explain me rebooting process in detail.
How reboot procedure works on ARM SOCs running Linux, ... ?
The typical ARM processor in use today is integrated with peripherals on a single IC called a SoC, system on a chip. Typically the reboot procedure is nearly identical to a power-on boot procedure. On a reset the ARM processor typically jumps to address 0.
Main memory, e.g. DRAM, and non-volatile storage, e.g. NAND flash, are typically external to the SoC (that is Linux capable) for maximum design flexibility.
But typically there is a small (perhaps 128KB) embedded ROM (read-only memory) to initialize the minimal system components (e.g. clocks, external memories) to begin bootstrap operations. A processor reset will cause execution of this boot ROM. (This ROM is truly read-only, and cannot be modified. The code is masked into the silicon during chip fabrication.)
The SoC may have a strapping option to instead execute an external boot memory, such as NOR flash or EEPROM, which can be directly executed (i.e. XIP, execute in place).
The salient characteristic of any ROM, flash, and SRAM that the first-stage boot program uses is that these memories must be accessible immediately after a reset.
One of the problems of bootstrapping a system that uses DRAM for main memory is its hardware initialization. The DRAM memory controller has to be initialized with board-specific parameters before code can be loaded into DRAM and executed. So from where does this board-specific initialization code execute, since it can't be in main memory?
Each vendor has their own solution.
Some require memory configuration data to be stored in nonvolatile memory for the boot ROM to access.
Some SoCs have integrated SRAM (which does not require initialization like DRAM) to execute a small second-stage bootstrap program.
Some SoCs use NOR flash to hold a XIP (execute in place) bootstrap program (e.g. the SPL program of U-Boot).
Each SoC vendor has its own bootstrap method to get the OS loaded and executing.
Some use hardware strapping read through GPIO pins to determine the source of the next stage of the bootstrap sequence.
Another vendor may use an ordered list of memories and devices to probe for a bootstrap program.
Another technique, is to branch to firmware in NOR flash, which can be directly executed (i.e. XIP, execute in place).
Once the bootstrap program has initialized the DRAM, then this main memory can be used to load the next stage of booting. That could be a sophisticated boot utility such as U-Boot, or (if the bootstrap program is capable) the Linux kernel. A ROM boot program could do everything to load an ARM Linux kernel (e.g. ETRAX), but more common is that there will be several bootstrap programs or stages that have be performed between processor reset to execution of the OS.
The requirements of booting the Linux ARM kernel are spelled out in the following document: Booting ARM Linux
Older versions of Linux ARM used the ATAGs list to pass basic configuration information to the kernel. Modern versions provide a complete board configuration using a compiled binary of a Device Tree.
... e.g do boot loaders reinitialize DDR memory?
Of the few examples that I have seen, the boot programs unconditionally configure the dynamic RAM controller.
PCs have a BIOS and Power On Self Tests, aka POST. The execution of POST is the primary difference between a power-on reset (aka cold boot) versus a software reset (aka warm boot or reboot). ARM systems typically do not perform POST, so you typically will see minimal to no difference between types of reset.
This is way too broad. It's not only SoC vendor dependent, but also hardware and software dependent.
However, the most typical setup is:
CPU executes first-stage bootloader (FSB).
FSB is located on the chip itself in ROM or EEPROM and is very small (AT91RM9200 FSB is 10kB max, AFAIR). FSB then initializes minimum set of peripherals (clocks, RAM, flash), transfers second-stage bootloader (U-Boot) to RAM, and executes it.
U-Boot starts.
U-Boot initializes some other hardware (serial, ethernet, etc), transfers Linux kernel to RAM, prepares the pointer to kernel input parameters and jumps into it's entry point.
Linux kernel starts.
Magic happens here. The system now able to serve you cookies via SSH console and/or executes whatever needs to be executed.
A bit more in-depth info about warm start:
Warm start is a software reset, while cold start is power-on or hardware reset. Some (most?) SoC's are able to pass the info to FSB/SSB about warm start. This way bootloaders are able to minimize the overall boot time by skipping re-initializion of already initialized peripherals.
Again, this is most typical setup from my 15+ years experience in embedded world.
It varies a lot depending on the SoC. I'll describe something like a "typical" one (Freescale iMX6)...
Typically an on-chip Watchdog Timer is used to reset the SoC cleanly. Sometimes, an external Power Management IC can be provoked to perform a board-wide reset (this method may be better, as it avoids the risk of external chips getting "stuck" in an unexpected state, but not all board designs support it).
Upon reset, the SoC will start its normal boot process: checking option pins, fuse settings and initializing clocks and the boot device (e.g. eMMC). This is typically controlled by CPU code executing from a small on-chip ROM.
Either the internal boot ROM will initialize DDR SDRAM (using settings taken from fuses or read from a file on the boot device), or the bootloader gets loaded into internal RAM then it takes care of DDR initialization (and other things). The U-Boot bootloader can be configured to work either way.
Finally, the kernel and DTB are loaded into memory and started.
note uboot, etc are not required they are GROSS overkill, they are operating systems in their own right. to load and run linux you need memory up and running copy the kernel branch to it with some registers set to point at tables that you setup or copied from flash along with the kernel.
What you do on a cold reset or warm is up to you, same chip and board no reason necessarily why any two solutions have to do the exact same thing unless it is driven by hardware (if you do a wdt reset to start over and that reset wipes out the whole chip including the ddr controller). You just have to put the system in the same state that linux expects.

ARM Architecture Initialization

In the case of x86 the same (real mode) bootloader works on virtually any x86 device.
Is that possible on ARM or do I need to create a specific bootloader for each 'cortex'?
x86 or lets say PC compatible systems are ... pc compatible. They support the ancient bios calls so that there is massive compatibility. by design, by the chip vendor (intel) the software vendors (bios, operating system) and the motherboard vendors.
ARM is in now way shape or form like that. There are instruction sets you can choose that work almost or all the way across, but remember ARM systems you buy an ARM core and add it to your special chip, you and your special/custom stuff, then that is put on one or more different boards. There is little to no compatibility. Instruction set and arm core is a small part of the whole picture most of the code is for the non-arm stuff.
u-boot and perhaps others are fairly massive bootloaders, pretty much an operating system themselves, and have to be ported just like an operating system to each chip/board combination. The chip vendor, if this is a linux compatible system, most likely has a reference design and a BSP including a u-boot port and/or some other solution (rasberry pi is a good example). it is fairly trivial to boot linux or used to be, there is no reason for the massively overcomplicated u-boot. without a DTB you setup a few memory locations a register or two and branch to the kernel, thats it (again look at the raspberry pi), I assume with DTB you build the dtb then put it somewhere, setup a few registers and branch to the linux kernel (raspberry pi? ntc chip?)
There is a Arm open source project that can cover Armv7/v8 Cortex-A processors bootloaders.
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/
Another open source project for Cortex-M processors:
https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/

How ARM CPU loading bootloader?

As I know, CPU can access RAM directly. Device RAM is empty on start and CPU don't know from where to load bootloader into the RAM for executing it. Even it can do nothing because call stack should be empty too as I think.
Yet how is bootloader program copied into the RAM for further execution?
This should happening with embedded devices such as smartphones. On a x86 PCs BIOS is responsible for loading MBR section from disk to RAM as I know.
A bootloader in RAM is a secondary bootloader; invariably there is code in a ROM of some kind containing a primary bootstrap that loads the secondary bootstrap. Often that ROM is mask-ROM on the chip it self.
Typically on an ARM application processor such as a Cortex-A the primary bootstrap will load code to RAM from NAND flash or SD card. ARM Cortex-M often run code directly from ROM in any case.
you have the same problem with an x86. and the same solution in general (as with any other processor as well), you put a rom/flash in the address space where the processor boots and/or where its vector table is.
there are some other solutions like having other logic that reads from some non volatile storage and places it in the boot/vector space, or other logic that provides an interface for some other processor/computer to download into the board/chip and then release reset.

Explaination of ARM (especifically mobile) Peripherals Addressing and Bus architecture?

I will first say that I'm not expert in the field and my question might contain misunderstanding, in which case, I'll be glad if you correct me and attach resources so I can learn further details.
I'm trying to figure out the way that the system bus and how the various devices that appear in a mobile device (such as sensors chips, wifi/BT SoC, touch panel, etc.) are addressed by the CPU (and by other MCUs).
In the PC world we have the bus arbitrator that route the commands/data to the devices, and, afaik, the addresses are hardwired on the board (correct me if I'm wrong). However, in the mobile world I didn't find any evidence of that type of addressing; I did find that ARM has standardized the Advanced Microcontroller Bus Architecture, I don't know, though, whether that standard applied for the components (cpu-cores) which lies inside the same SoC (that is Exynos, OMAP, Snapdragon etc.) or also influence peripheral interfaces. Specifically I'm asking what component is responsible on allocating addresses to peripheral devices and MMIO addresses?
A more basic question would be whether there even exist a bus management in the mobile device architecture or maybe there is some kind of "star" topology (where the CPU is the center).
From this question I get the impression that these devices are considered as platform devices, i.e., devices that are connected directly to the CPU, and not through a bus. Still, my question is how does the OS knows how to address them? Then other threads, this and this about platform devices/drivers made me confused..
A difference between ARM and the x86 is PIO. There are no special instruction on the ARM to access an I/O device. Everything is done through memory mapped I/O.
A second difference is the ARM (and RISC in general) has a separate load/store unit(s) that are separate from normal logic.
A third difference is that ARM licenses both the architecture and logic core. The first is used by companies like Apple, Samsung, etc who make a clean room version of the cores. For the second set, who actually buy the logic, the ARM CPU will include something from the AMBA family.
Other peripherals from ARM such as a GIC (Cortex-A interrupt controller), NVIC (Cortex-M interrupt controller), L2 controllers, UARTs, etc will all come with an AMBA type interface. 3rd party companies (ChipIdea USB, etc) may also make logic that is setup for a specific ARM bus.
Note AMBA at Wikipedia documents several bus types.
APB - a lower speed peripheral bus; sort of like south bridge.
AHB - several versions (older north bridge).
AXI - a newer multi-CPU (master) high speed bus. Example NIC301.
ACE - an AXI extension.
A single CPU/core may have one, two, or more master connection to an AXI bus. There maybe multiple cores attached to the AXI bus. The load/store and instruction fetch units of a core can use the multiple ports to dispatch requests to separate slaves. The SOC vendor will balance the number of ports with expected memory bandwidth needs. GPUs are also often connected to the AXI BUS along with DDR slaves.
It is true that there is no 100% standard topology; especially if you consider all possible future ARM designs. However, typical topologies will include a top level AXI with some AHB peripherals attached. One or multiple 2nd level APB (buses) will provide access to low speed peripherals. Not every SOC vendor wants to spend time to redesign peripherals and the older AHB interface speeds maybe quite fine for a device.
Your question is tagged embedded-linux. For the most part Linux just needs to know the physical addresses. On occasion, the peripheral BUS controllers may need configuration. For instance, an APB may be configure to allow or disallow user mode. This configuration could be locked at boot time. Generally, Linux doesn't care too much about the bus structure directly. Programmers may have coded a driver with knowledge of the structure (like IRAM is fasters, etc).
Still, my question is how does the OS knows how to address them?
Older Linux kernels put these definitions in a machine file and passed a platform resource structure including interrupt number, and the physical address of a register bank. In newer Linux versions, this information is included with Open Firmware or device tree files.
Specifically I'm asking what component is responsible on allocating addresses to peripheral devices and MMIO addresses?
The physical addresses are set by the SOC manufacturer. Linux platform support will use the MMU to map them as non-cacheable to some un-used range. Often the physical addresses may be very sparse so the virtual remapping pack more densely. Each one incurs a TLB hit (MMU cache).
Here is a sample SOC bus structure using AXI with a Cortex-M and Cortex-A connected.
The PBRIDGE components are APB bridges and it is connected in a star topology. As others suggests, you need to look a your particular SOC documentation for specifics. However, if you have no SOC and are trying to understand ARM generally, some of the information above will help you, no matter what SOC you have.
1) ARM does not make chips, they make IP that is sold to chip vendors who make chips. 2) yes the amba/axi bus is the interface from ARM to the world. But that is on chip, so it is up to the chip vendor to decide what to hook up to it. Within a chip vendor you may find standards or habits, those standards or habits may be that for a family of parts the same peripherals may be find at the same addresses (same uart peripheral, same spi peripheral, clock tree, etc). And of course sometimes the same peripheral at different addresses in the family and sometimes there is no consistency. In the intel x86 world intel makes the processors they have historically made many of the peripherals be they individual parts to super I/O parts to north and south bridges to being in the same package. Intels processor success lies primarily in reverse compatibility so you can still access a clone uart at the same address that you could access it on your original ibm pc. When you have various chip vendors you simply cannot do that, arm does not incorporate the peripherals for the most part, so getting the vendors to agree on stuff simply will not happen. This has driven folks crazy yes, and linux is in a constant state of emergency with arm since it rarely if ever works on any platform. The additions tend to be specific to one chip or vendor or nuance not caring to check that the addition is in the wrong place or the workaround or whatever does not apply everywhere and should not be applied everywhere. The cortex-ms have taken a small step, before the arm7tdmi you had the freedom to use whatever address space you wanted for anything. The cortex-m has divided the space up into some major chunks along with some internal addresses (not just the cortex-ms this is true on a number of the cores). But beyond a system timer and maybe a interrupt controller it is still up to the chip vendor. The x86 reverse compatibility habits extend beyond intel so pcs have a lot of consistency across motherboard vendors (partly driven by software that they want to run on their system namely windows). Embedded in general be it arm or mips or whomever puts stuff wherever and the software simply adapts so embedded/phone software the work is on the developer to select the right drivers and adjust physical addresses, etc.
AMBA/AXI is simply the bus standard like wishbone or isa or pci, usb, etc. It defines how to interface to the arm core the processor from arm, this is basically on chip, the chip vendor then adds or buys from someone IP to bridge the amba/axi bus to pci or usb or dram or flash, etc, on chip or off is their choice it is their product. Other than perhaps a few large chunks the chip vendor is free to define the address space, and certainly free to define what peripherals and where. They dont have to use the same usb IP or dram IP as anyone else.
Is the arm at the center? Well with your smart phone processors you tend to have a graphics coprocessor, so then you have to ask who owns the world the arm, the gpu, or someone else? In the case of the raspberry pi which is to some extent one of these flavor of processors albeit older and slower now, the gpu appears to be the center of the world and the arm is a side fixture that has to time share on the gpu's bus, who knows what the protocol/architecture of that bus is, the arm is axi of course but is the whole chip or does the bridge from the arm to gpu side also switch to some other bus protocol? The point being is the answer to your question is no there is no rule there is no standard sometimes the arm is at the center sometimes it isnt. Up to the chip and board vendors.
not interested in terminology maybe someone else will answer, but I would say outside an elementary sim you wont have just one peripheral (okay I will use that term for generic stuff the processor accesses) tied to the amba/axi bus. You need a first level amba/axi interface that then divides up the address space per your design, and then using amba/axi or whatever bus protocol you want (generally you adapt to the interface for the purchased or designed IP). You, the chip vendor decides on the address space. You the programmer, has to read the documentation from the chip vendor or also board vendor to find the physical address space for each thing you want to talk to and you compile that knowledge into your operating system or application per the rules of that software or build system.
This is not unique to arm based systems you have the same problem with mips and powerpc and other cores you can buy in ip form, for whatever reason arm has dominated the world (there are many arm processors in or outside your computer for every x86 you own, x86 processors are extremely low volume compared to arm based). Like Gates had a desktop in every home, a long time ago ARM had a "touch an ARM once a day" type of a thing to push their product and now most things with a power switch and in particular with a battery has an arm in it somewhere. Which is a nightmare for developers because there are so many arm cores now with nuances and every chip vendor and every family and sometimes members within a family are different so as a developer you simply have to adapt, write your stuff in a modular form, mix and match modules, change addresses, etc. Making one binary like windows does for example that runs everywhere, is not in any way a wise goal for arm based products. Make the modules portable and build the modules per target.
Each SoC will be designed to have its own (possibly configurable) memory map. You will need to read the relevant technical reference manual to get the exact details.
Examples are:
Raspeberry pi datasheet (pdf)
OMAP 5 TRM

Memory space of ARM microprocessors

In ARM microprocessors, is the only available memory space the 37 or so general and status registers, or is there a separate accessible memory space within the microprocessor chip?
For example, in the Atmel AVR microcontroller, to my understanding, the memory is mapped internally within the same chip, with data memory, program memory (containing program memory) and EEPROM memory. Does the same apply to ARM microprocessors, or does a microcontroller with an ARM microprocessor require separate external memory?
Your interpretation of the Atmel AVR architecture is not quite correct.
Of course it's possible to integrate memory of virtually any kind on the same die as the CPU core. However, that doesn't mean you can compare flash memory available on one such integrated system to registers on another.
A CPU core needs a memory interface and that's all that counts: Flash is slower than registers. So if you connect Flash to an ARM processor it will behave similar (in the same order of magniture regarding speed) as the on-board Flash of the AVR.
Besides, ARM is solely an IP (design concept) and licenced by numerous companies which build efficient peripherals and sometimes also memory around the core. So you will find chips with an ARM core and on-board memory on the market.
(I simplified things a bit in the above description but I was focusing on trying to point out where I think you misunderstand how the two processors compare.)
Below link talks a lot about how memory management is done in ARM processor. Hope it helps
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/CHDDJIFI.html

Resources