How does an ARM boot from an SD card?

I'm a bit lost regarding how a modern system boots from a serial flash device. Having programmed quite a number of simple micros, ranging from 8-bit PICs to 32-bit Power Architecture parts, down on the bare metal (always by reprogramming the normal bus-addressable flash), I wonder how modern SoCs boot from serial devices. I didn't find much on the net either, as every system seems to rely on a combination of an SD-card programming tool and a secondary bootloader, both receiving little to no attention.

The 'how' is a mask-PROM-based primary bootloader baked into the SoC. It doesn't need to do much besides initialise the SD card interface and possibly some SDRAM with enormously conservative timings (although some devices have embedded SRAM to use at this point).
It then enumerates the card interface, reads the FAT from the first card it finds, and from there copies the secondary bootloader into SRAM or SDRAM and executes it.
Often there are restrictions, such as the secondary bootloader being the first file on the card and the allocated blocks being contiguous.
Many systems then load yet another bootloader at this point, which is the one that boots the operating system.
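As a rough illustration of the sequence (every function name and address below is an invented placeholder, not any particular SoC's actual ROM interface), the primary bootloader's job boils down to something like this:

#include <stdint.h>

#define SECONDARY_LOAD_ADDR ((void *)0x40200000)   /* on-chip SRAM; address made up */

/* hypothetical helpers standing in for the ROM's own routines */
void sd_init(void);
void fat_mount(void);
uint32_t fat_read_first_file(void *dest);          /* returns bytes copied */

void rom_boot(void)
{
    sd_init();                        /* bring up the SD/MMC controller */
    fat_mount();                      /* read the FAT from the first card found */

    /* Copy the secondary bootloader (often required to be the first file on
       the card, stored in contiguous blocks) into SRAM or SDRAM. */
    uint32_t size = fat_read_first_file(SECONDARY_LOAD_ADDR);

    if (size) {
        /* Jump to the freshly loaded secondary bootloader. */
        void (*secondary)(void) = (void (*)(void))SECONDARY_LOAD_ADDR;
        secondary();
    }
}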

Related

Is there any MCU based on Cortex-A whose on-chip mask primary bootloader can be changed?

I want to study Cortex-A internals, but the AM335x's and S5PV210's internal flash cannot be changed, so I want to know whether there is any MCU based on Cortex-A whose on-chip mask primary bootloader can be changed.
Please recommend some for me, if any exist.
Please forgive my poor English, thank you!
There is usually no flash in a Cortex-A part. The ROM code is usually, well, in a read-only memory. When there is a bug in this code, you need to produce a new wafer mask to fix it, but as millions of parts are produced the cost reduction is significant, and ROM avoids data retention issues.
Flash memory is by definition re-writable; the parts you mentioned simply have no on-chip flash.
Parts that have on-chip flash typically execute code directly from it, so because of the relatively low speed of flash memory it is usually found on lower-frequency processors (sub-200 MHz).
Fast "applications" processors do not normally have on-chip flash because it takes up a large amount of die space and and has insufficient capacity to support the for the kind of applications and OS (such as Linux, Android or Windows) typically used on such processors. Instead they often have an on-chip mask ROM primary bootloader than loads a secondary bootloader from external media such as NOR flash, NAND flash, SD card, eMMC etc. The secondary bootloader then boots the OS and/or application code.
Code on such processors is loaded to and executed from SDRAM which is much faster than flash. Also the boot media is not always memory mapped so cannot be executed directly in any case.

Boot process in Android devices

I was reading this informative blog post here about the booting process on x64. In what ways is the booting process on ARM different? I had a look at the Raspberry Pi and it seems that the GPU is what executes first, before control is handed over to the ARM processor. Are there any similar resources you have come across for ARM processors?
Just as the x86 boot process is documented in material available at intel.com, the ARM boot process is documented at arm.com, as it is for any other processor.
The full-sized (non-Cortex-M) ARM cores start by executing at address zero on reset. There is one instruction location for reset, one for data abort, one for undefined instruction, etc. It is similar to an interrupt vector table, but instead of an address each slot holds an instruction, ideally a branch.
Processors historically have some non-volatile memory mapped into the boot space or vector table, with volatile RAM, if any, elsewhere: x86 historically near the top of the address space, ARM at the bottom.
ARM does not make chips the way Intel does; it designs processor cores which other folks that make chips include in their chip designs, instead of having to design their own cores and maintain compilers, etc. So the chip vendor can solve the boot process in a number of ways: some have something non-volatile mapped low, and after booting you can swap RAM into that address space, and so on.

In the case of the Raspberry Pi chips, which are made by Broadcom, there is an in-house GPU which actually boots the chip. It eventually reads a file assumed to be the Linux kernel and root file system (but it doesn't have to be) and places that file in RAM (by default) in the place where a Linux kernel would be loaded by a bootloader like RedBoot or U-Boot; in this case it is done by the GPU's ARM loader. The GPU then places a few breadcrumbs, including the reset instruction(s) required to branch into the Linux kernel (generally a very trivial thing), and then releases reset on the ARM core, allowing it to boot. So basically the ARM sees only RAM, which is kind of nice, but that is somewhat atypical for ARM or other processors. Normally the main processor boots from non-volatile storage (EEPROM, flash, etc.) and then itself loads Linux (or whatever) and branches to it.
A number of other processor types have an interrupt vector table instead, including the ARM Cortex-M series, which are Thumb-instruction-set-only cores. They are designed to be microcontrollers and not carry as much overhead, so the first address slot is actually meant to be filled with the initial value for the stack pointer, the second is the address of the reset handler, and then there are a zillion others. The hardware is designed to preserve registers for you, so that you can put the addresses of C functions right in the table and not need "some assembly required" wrappers written by you or by the folks that ported the toolchain to the platform. Other processor types will just have a vector table, and some assembly is required to be wrapped around interrupts and such.
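As an illustration, here is a minimal sketch of a Cortex-M vector table written entirely in C. The _estack symbol, the section name, and the handler names are placeholders of the kind a real linker script and startup file would provide; they are not tied to any particular part.

#include <stdint.h>

extern uint32_t _estack;                   /* top of stack, defined by the linker script (assumed name) */

void Reset_Handler(void);
void Default_Handler(void) { for (;;) ; }  /* catch-all for unused vectors */

/* Placed at address 0 (or the boot alias) by the linker. */
__attribute__((section(".isr_vector")))
const void * const vector_table[] = {
    &_estack,            /* slot 0: initial stack pointer value */
    Reset_Handler,       /* slot 1: address of the reset handler */
    Default_Handler,     /* NMI */
    Default_Handler,     /* HardFault */
    /* ... the remaining exception and interrupt vectors ... */
};

void Reset_Handler(void)
{
    /* A real startup file would copy .data, zero .bss, then call main(). */
    for (;;) ;
}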

Explanation of ARM (specifically mobile) Peripherals Addressing and Bus Architecture?

I will first say that I'm not an expert in the field and my question might contain misunderstandings, in which case I'll be glad if you correct me and attach resources so I can learn further details.
I'm trying to figure out how the system bus works and how the various devices that appear in a mobile device (such as sensor chips, WiFi/BT SoCs, touch panels, etc.) are addressed by the CPU (and by other MCUs).
In the PC world we have the bus arbitrator that routes the commands/data to the devices, and, AFAIK, the addresses are hardwired on the board (correct me if I'm wrong). However, in the mobile world I didn't find any evidence of that type of addressing. I did find that ARM has standardized the Advanced Microcontroller Bus Architecture, but I don't know whether that standard applies only to the components (CPU cores) which lie inside the same SoC (that is, Exynos, OMAP, Snapdragon, etc.) or also covers peripheral interfaces. Specifically, I'm asking what component is responsible for allocating addresses to peripheral devices and MMIO addresses.
A more basic question would be whether there even exists bus management in the mobile device architecture, or maybe there is some kind of "star" topology (where the CPU is the center).
From this question I get the impression that these devices are considered platform devices, i.e., devices that are connected directly to the CPU and not through a bus. Still, my question is: how does the OS know how to address them? Then other threads (this and this) about platform devices/drivers made me more confused.
A difference between ARM and x86 is PIO (port I/O). There are no special instructions on the ARM to access an I/O device; everything is done through memory-mapped I/O (see the short sketch after these points).
A second difference is that the ARM (and RISC in general) has load/store unit(s) that are separate from the normal logic.
A third difference is that ARM licenses both the architecture and the logic cores. The first is used by companies like Apple, Samsung, etc., who make clean-room versions of the cores. For the second group, who actually buy the logic, the ARM CPU will include something from the AMBA family.
Other peripherals from ARM such as a GIC (Cortex-A interrupt controller), NVIC (Cortex-M interrupt controller), L2 controllers, UARTs, etc., will all come with an AMBA-type interface. 3rd-party companies (ChipIdea USB, etc.) may also make logic that is set up for a specific ARM bus.
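To make the memory-mapped I/O point concrete, here is a tiny sketch of writing to a UART purely with ordinary loads and stores; the base address, register offsets, and status bit are made up for illustration:

#include <stdint.h>

#define UART_BASE  0x4806a000u                                  /* hypothetical UART base address */
#define UART_THR   (*(volatile uint32_t *)(UART_BASE + 0x00))   /* transmit holding register */
#define UART_LSR   (*(volatile uint32_t *)(UART_BASE + 0x14))   /* line status register */

static void uart_putc(char c)
{
    while ((UART_LSR & 0x20) == 0)    /* wait until the TX FIFO has room */
        ;
    UART_THR = (uint32_t)c;           /* an ordinary store; no special I/O instruction */
}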
Note AMBA at Wikipedia documents several bus types.
APB - a lower speed peripheral bus; sort of like south bridge.
AHB - several versions (older north bridge).
AXI - a newer multi-CPU (master) high speed bus. Example NIC301.
ACE - an AXI extension.
A single CPU/core may have one, two, or more master connections to an AXI bus. There may be multiple cores attached to the AXI bus. The load/store and instruction-fetch units of a core can use the multiple ports to dispatch requests to separate slaves. The SoC vendor will balance the number of ports against expected memory bandwidth needs. GPUs are also often connected to the AXI bus, along with DDR slaves.
It is true that there is no 100% standard topology, especially if you consider all possible future ARM designs. However, typical topologies will include a top-level AXI with some AHB peripherals attached. One or more second-level APB buses will provide access to low-speed peripherals. Not every SoC vendor wants to spend time redesigning peripherals, and the older AHB interface speeds may be quite fine for a given device.
Your question is tagged embedded-linux. For the most part Linux just needs to know the physical addresses. On occasion, the peripheral bus controllers may need configuration. For instance, an APB may be configured to allow or disallow user-mode access. This configuration could be locked at boot time. Generally, Linux doesn't care too much about the bus structure directly. Programmers may have coded a driver with knowledge of the structure (like IRAM being faster, etc.).
Still, my question is: how does the OS know how to address them?
Older Linux kernels put these definitions in a machine file and passed a platform resource structure including the interrupt number and the physical address of a register bank. In newer Linux versions, this information comes from Open Firmware or device tree files.
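For instance, a pre-device-tree board file might describe a peripheral to the kernel roughly as in the sketch below; the device name, addresses, and IRQ number are invented for illustration:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/ioport.h>
#include <linux/platform_device.h>

static struct resource my_uart_resources[] = {
    {
        .start = 0x4806a000,             /* hypothetical physical base of the register bank */
        .end   = 0x4806a000 + 0xfff,
        .flags = IORESOURCE_MEM,
    },
    {
        .start = 72,                     /* hypothetical interrupt number */
        .end   = 72,
        .flags = IORESOURCE_IRQ,
    },
};

static struct platform_device my_uart_device = {
    .name          = "my-uart",
    .id            = 0,
    .resource      = my_uart_resources,
    .num_resources = ARRAY_SIZE(my_uart_resources),
};

/* called from the machine init code */
static int __init my_board_init(void)
{
    return platform_device_register(&my_uart_device);
}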
Specifically, I'm asking what component is responsible for allocating addresses to peripheral devices and MMIO addresses?
The physical addresses are set by the SoC manufacturer. Linux platform support will use the MMU to map them, as non-cacheable, into some unused virtual range. Often the physical addresses are very sparse, so the virtual remapping packs them more densely. Each mapping occupies entries in the TLB (the MMU's cache).
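In a driver, that mapping typically looks something like this sketch (the physical base, size, and register offset are made up):

#include <linux/errno.h>
#include <linux/io.h>

#define MY_PERIPH_PHYS  0x48020000   /* hypothetical physical base from the SoC manual */
#define MY_PERIPH_SIZE  0x1000

static void __iomem *base;

static int my_periph_map(void)
{
    base = ioremap(MY_PERIPH_PHYS, MY_PERIPH_SIZE);   /* non-cacheable MMU mapping */
    if (!base)
        return -ENOMEM;

    writel(0x1, base + 0x10);        /* poke a (made-up) control register */
    return 0;
}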
Here is a sample SoC bus structure using AXI with a Cortex-M and a Cortex-A connected.
The PBRIDGE components in it are APB bridges, connected in a star topology. As others suggest, you need to look at your particular SoC's documentation for specifics. However, if you have no particular SoC and are trying to understand ARM generally, some of the information above will help you, no matter what SoC you end up with.
1) ARM does not make chips; they make IP that is sold to chip vendors who make chips. 2) Yes, the AMBA/AXI bus is the interface from the ARM core to the world, but that is on chip, so it is up to the chip vendor to decide what to hook up to it. Within a chip vendor you may find standards or habits; those standards or habits may mean that, for a family of parts, the same peripherals can be found at the same addresses (same UART peripheral, same SPI peripheral, clock tree, etc.). And of course sometimes the same peripheral sits at different addresses within the family, and sometimes there is no consistency at all.

In the Intel x86 world, Intel makes the processors and has historically made many of the peripherals, be they individual parts, super-I/O parts, north and south bridges, or parts in the same package. Intel's processor success lies primarily in backward compatibility, so you can still access a clone UART at the same address that you could on the original IBM PC. When you have various chip vendors you simply cannot do that; ARM does not provide the peripherals for the most part, so getting the vendors to agree on such things simply will not happen. This has driven folks crazy, yes, and Linux is in a constant state of emergency with ARM, since it rarely, if ever, just works on any platform. The additions tend to be specific to one chip or vendor or nuance, without anyone checking whether the addition is in the wrong place or whether the workaround applies (and should be applied) everywhere.

The Cortex-Ms have taken a small step. Previously (the ARM7TDMI, for example) you had the freedom to use whatever address space you wanted for anything; the Cortex-M divides the space up into some major chunks, along with some internal addresses (and not just the Cortex-Ms; this is true of a number of the cores). But beyond a system timer and maybe an interrupt controller, it is still up to the chip vendor. The x86 backward-compatibility habits extend beyond Intel, so PCs have a lot of consistency across motherboard vendors (partly driven by the software they want to run on their systems, namely Windows). Embedded in general, be it ARM or MIPS or whoever, puts stuff wherever, and the software simply adapts; for embedded/phone software the work is on the developer to select the right drivers and adjust physical addresses, etc.
AMBA/AXI is simply a bus standard, like Wishbone or ISA or PCI or USB, etc. It defines how to interface to the ARM core, the processor IP from ARM; this is basically on chip. The chip vendor then adds, or buys from someone, IP to bridge the AMBA/AXI bus to PCI or USB or DRAM or flash, etc.; on chip or off is their choice, since it is their product. Other than perhaps a few large chunks, the chip vendor is free to define the address space, and certainly free to define what peripherals go where. They don't have to use the same USB IP or DRAM IP as anyone else.
Is the ARM at the center? Well, with your smartphone processors you tend to have a graphics coprocessor, so then you have to ask who owns the world: the ARM, the GPU, or someone else? In the case of the Raspberry Pi, which is to some extent one of these flavors of processor (albeit older and slower now), the GPU appears to be the center of the world and the ARM is a side fixture that has to time-share on the GPU's bus. Who knows what the protocol/architecture of that bus is; the ARM side is AXI of course, but does that carry through the whole chip, or does the bridge from the ARM to the GPU side switch to some other bus protocol? The point is, the answer to your question is no: there is no rule, there is no standard; sometimes the ARM is at the center, sometimes it isn't. It is up to the chip and board vendors.
I'm not interested in the terminology, so maybe someone else will answer that part, but I would say that outside an elementary simulation you won't have just one peripheral (okay, I will use that term for generic stuff the processor accesses) tied to the AMBA/AXI bus. You need a first-level AMBA/AXI interface that divides up the address space per your design, and below that you use AMBA/AXI or whatever bus protocol you want (generally you adapt to the interface of the purchased or designed IP). You, the chip vendor, decide on the address space. You, the programmer, have to read the documentation from the chip vendor (and also the board vendor) to find the physical address for each thing you want to talk to, and you compile that knowledge into your operating system or application per the rules of that software or build system.
This is not unique to ARM-based systems; you have the same problem with MIPS and PowerPC and other cores you can buy in IP form. For whatever reason, ARM has dominated the world (there are many ARM processors in or around your computer for every x86 you own; x86 processors are extremely low volume compared to ARM-based ones). Just as Gates wanted a desktop in every home, a long time ago ARM had a "touch an ARM once a day" type of goal to push their product, and now most things with a power switch, and in particular with a battery, have an ARM in them somewhere. That is a nightmare for developers, because there are so many ARM cores now, each with its nuances; every chip vendor and every family, and sometimes members within a family, are different. So as a developer you simply have to adapt: write your stuff in a modular form, mix and match modules, change addresses, etc. Making one binary that runs everywhere, as Windows does for example, is not in any way a wise goal for ARM-based products. Make the modules portable and build the modules per target.
Each SoC will be designed to have its own (possibly configurable) memory map. You will need to read the relevant technical reference manual to get the exact details.
Examples are:
Raspberry Pi datasheet (pdf)
OMAP 5 TRM

Linux - How to upload code to a dedicated Freescale NIC chip on my motherboard?

I have bought a Gigabyte G1.Guerilla motherboard, and the NIC is a dedicated Freescale chip on the motherboard. It is connected to the PCI bus.
I am running Linux and unfortunately there is no driver for it. I am working on writing one; however, I am hitting a basic problem: how do I communicate with it and upload code to its dedicated CPU/RAM?
Much help appreciated.
I am running on Ubuntu and the chip is an MPC8308VMAGD PowerQUICC II Pro.
I don't know anything about your specific motherboard or the processor, but are you totally sure you need to upload any code to the processor?
Usually, if a peripheral needs any code (firmware), it's already present in a ROM or a flash chip, and you only need to touch it if you specifically want to write your own firmware for it. AFAIK the way it usually works is that the peripheral exposes a set of registers on the PCI bus and you interact with it by poking the registers (usually with MMIO). That is, you don't write code for the peripheral; you write a kernel driver that pokes the registers (i.e., the API for the peripheral) when it wants the device to do something.
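A bare-bones sketch of that pattern for a PCI device might look like the following; the register offset and value are invented, and a real driver would also need a pci_device_id table, interrupt handling, and a remove path:

#include <linux/pci.h>

static int mynic_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    void __iomem *regs;
    int err;

    err = pci_enable_device(pdev);
    if (err)
        return err;

    err = pci_request_regions(pdev, "mynic");
    if (err)
        return err;

    regs = pci_iomap(pdev, 0, 0);     /* map BAR 0 */
    if (!regs)
        return -ENOMEM;

    iowrite32(0x1, regs + 0x00);      /* hypothetical "reset" register */
    return 0;
}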
Now, in general the register descriptions aren't often freely available, which can make writing drivers really hard.
If you really want/need to write your own firmware for the thing, it probably depends on where the code is stored. If it sits in ROM or in an inaccessible flash, you'll probably need to do some soldering. If the firmware is updatable, I'd probably try to reverse-engineer the software they provide for updating the firmware, if one is available. (Unless it allows uploading arbitrary files already, of course)

Can I run a program from SD memory instead of flash on an evaluation board (embedded programming)?

I have an evaluation board (Olimex STM32-P103) which has an SD-card connector. I want to put my program on an SD card instead of the internal flash of the microcontroller, and run it from there.
I don't know whether that is possible, given how the boot loader works.
P.S. My goal is to run Linux on this board and then port my application to it.
To run programs from an SD card, in general you should know that you can't run them "right away". This means you have to load the program into executable memory somewhere in your address space, which is done by a (more or less) simple bootloader. In the simplest case, the bootloader is capable of reading a specific binary from the SD card and copying it into memory.
That being said, you should think about this considering you only have 20 KB of RAM and 128 KB of flash on your board. So where should your program go? Or better: why not flash the program into the 128 KB of flash from the very beginning? In particular, you should know that Linux is rather "hungry" in terms of memory.
If your goal is to run a "normal" Linux on this board, I'm afraid you're screwed. This is because, from what I know, Linux needs an MMU to run, and the chip on this board does not provide one (as far as can be researched without access to datasheets from ST).
If you're lucky you can go with uClinux. I'm not sure whether a finished port exists for the STM32, but it seems there are some resources, based on a short Google search for "STM32 uClinux". But even if you manage to run uClinux, I'm afraid there's not much left in your system for your application, so the result might be a bit disappointing.
Depending on why you want Linux running on this MCU, there may be other solutions, like FreeRTOS in combination with an lwIP stack (if networking is needed), or a FAT library like FullFAT if you want to read SD cards and such.
Edit: One thing I'd like to add is that booting from the SD card is typically something you do with "bigger" (not by much, but slightly) systems, where you have enough RAM to keep the whole image you'd like to run and still have some space left for the data you want to process.
You're going to have to have some code in the STM's onboard flash (typically called a "boot loader") that implements this, since the "bare metal" very likely can't boot from an SD card.
You're going to have to build that code, which figures out how to use the STM's onboard peripherals to talk to the SD card, finds the file you want to run in the file system (which you also have to implement), and loads it.
I wanted to include a link to the STM standard peripheral library, but it seems to be down (being moved). :/
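To give a rough idea of the shape of that code, here is a sketch using ChaN's FatFs as one example FAT library. The file name, load address, and jump sequence are illustrative only, not a drop-in solution for this board:

#include <stdint.h>
#include "ff.h"

#define LOAD_ADDR 0x20002000u               /* hypothetical RAM address to load into */

int load_and_run(void)
{
    FATFS fs;
    FIL file;
    UINT bytes_read;

    if (f_mount(&fs, "", 1) != FR_OK)       /* mount the SD card volume */
        return -1;
    if (f_open(&file, "app.bin", FA_READ) != FR_OK)
        return -1;
    if (f_read(&file, (void *)LOAD_ADDR, (UINT)f_size(&file), &bytes_read) != FR_OK)
        return -1;
    f_close(&file);

    /* Jump to the loaded image; on a Cortex-M the target address must have the
       Thumb bit set, and the image's stack/vector table must be set up first. */
    void (*entry)(void) = (void (*)(void))(LOAD_ADDR | 1u);
    entry();
    return 0;                               /* not reached */
}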
The data on the SD card is not memory mapped, so cannot be executed directly.
It is possible to dynamically load the data from the card into RAM for execution. Wind River's VxWorks RTOS supports loading and linking object modules dynamically; I know of no other OS that scales down to a Cortex-M and directly supports that, but it would be possible to write your own.
However, I would suggest that in the case of the microcontroller you are using the idea is ill-advised; optimal performance on a Cortex-M is achieved when the code is in on-chip flash and the data in RAM, allowing data and instruction fetches to occur simultaneously on separate buses (Harvard architecture). If you execute the code from RAM, performance takes a severe hit, since data and instructions must then be fetched sequentially over the same bus.
The board is entirely unsuited to running Linux; with only 128 KB of program flash and 20 KB of RAM it is not at all feasible. Even the smallest Linux configuration requires around 600 KB of RAM, plus whatever is needed by the application code. uClinux can just about run on a higher-end STM32 with external RAM and flash, but that would suffer from the same bus-contention performance hit, and Linux without an MMU rather misses the one major benefit of using Linux at all. The part on your board lacks an external memory interface, so it cannot be expanded to support Linux.
If you need an OS, consider an RTOS such as uC/OS-II, FreeRTOS, or embOS, for example.
As others say, you cannot execute your code directly from the SD card.
But like those "Linux boards", you can load the stored kernel/program into external SDRAM that is memory-mapped, and execute it from there.
You'll still need to write that "bootloader" and store it in the internal flash.
That's a lot of work, in my opinion, for a limited application.
If you want to write your application in a Linux environment and then port it to such a small target, I would rather design the application using dependency injection, or even use an emulator.
