L1 cache ports in ARM Cortex processors - arm

I did some reseach, but could not find much information.
I'd like to know how many L1 read and L1 write ports ARM embedded processors have and how wide the ports are. Specifically, I am interested in Cortex-A8, Cortex-A9, and Cortext-A15.
My blind guess is that a Cortex-A9 processor has one L1 read port and one L1 write port which are 64 bits wide. My other guess is that it has a single shared read/write port. Any thoughts on that?

These processors have separate L1 instruction and data caches. I'm pretty sure all ARM cores' L1 I-cache and D-cache each have 1 read and 1 write port Furber p.81.
L1 Cache is in each core, so for details I'd go to core TRM e.g. Cortex-A9 TRM rather than an MPCore TRM. Ch 7 there tells of 64-bit datapath for each.

Afaik you should check the AXI capabilities of each processors.
For example page for Cortex-A9 contains a detailed table for AXI master interface attributes and states:
The Cortex-A9 MPCore L2 interface can have two 64-bit wide AXI bus masters.
Page for Cortex-A15 contains less information, stating:
The processor implements an AMBA 4 AXI Coherency Extensions (ACE) master interface and an AMBA 3 AXI Accelerator Coherency Port (ACP) slave interface. Both the ACE and ACP support a hardware configurable 64-bit or 128-bit data width.
There also exists a similar page or Cortex-A8.

Related

How does memory mapped I/O (MMIO) work on ARM architectures?

I would like to understand how the MMIO works on ARM architecture.
I realized that ARM provides 1:1 mapping from physical address to specific peripheral.
For example, to manage the GPIOX on arm, for example in Raspberry Pi, the processor accesses the specific physical addresses (seems that preconfigured by the manufacturer?) without configuring some registers beforehand.
I thought that there are some specific BAR register that arbitrates the read/write request to specific physical addresses to peripherals. However, when I check the spec for BCM2835 (Raspberry pi 3), the physical addresses are translated by another MMU called VC/ARM MMU. Is it a common design to have another MMU that translates the physical addresses to bus addresses in ARM architecture?
Also, I was wondering how the SMMU (IOMMU in x86) is utilized in this concept. I found one article mentioning that the VC/ARM MMU is an example of SMMU but I think that is not true? When I check some monitor code & kernel driver code implementation (not raspberry pi), it seems that the SMMU is also mapped to specific physical addresses and the monitor/kernel uses those addresses to initialize and communicate with SMMU. If the arm architecture utilize VC/ARM MMU as an SMMU, how that physical address mapping for SMMU itself can be accessed to initialize the SMMU..?
Lastly, I thought that all peripherals are managed by the SMMU. If some peripherals are always mapped to fixed physical addresses, what is the role of SMMU? Why some peripherals are communicated with PE (CPU) through fixed physical addresses, and some are communicated with SMMU..? How exactly the peripherals and PE (Processors) can communicate in ARM architecture..?
I am just answering your questions. There are many things wrong with some assumptions you have, which Old Timer tries to clarify.
Is it a common design to have another MMU that translates the physical addresses to bus addresses in ARM architecture?
No. Broadcom has an architecture license. They designed there own HDL code for the ARM CPU and system details can be different. In fact, it is quite common for even direct license (using ARM HDL) that the systems differ from vendor to vendor.
If the arm architecture utilize VC/ARM MMU as an SMMU, how that physical address mapping for SMMU itself can be accessed to initialize the SMMU?
If some peripherals are always mapped to fixed physical addresses, what is the role of SMMU? Why some peripherals are communicated with PE (CPU) through fixed physical addresses, and some are communicated with SMMU..? How exactly the peripherals and PE (Processors) can communicate in ARM architecture..?
The typical solution to this is 'TrustZone'. Here, the access is defined as having a 'secure' or 'normal' access. The master (CPU) tags the access and either the peripheral or a bus access controller permits or prohibits access to the peripheral. These are best statically mapped at boot time and locked.
By defining permitted use cases, the peripheral/master access patterns can be defined for the system. The complication is 'dynamic' peripherals and masters. The ARM TrustZone CPU is dynamic. A 'world switch' changes the CPU access and there are attacks on the communication interface between 'normal' and 'secure' worlds.
The ARM AXI/AHB buses were originally designed for embedded devices. PCs in contrast have dynamic buses ISA->EISA->PCMIA->PCI (etc.) These addresses are typically dynamic. Note However, this bus structure has an expense. So some of your question is like asking why isn't ARM just like an x86 PC. They had different goals and they are different. You can't put a new grahpics card into your cell phone.
Reference:
Handling ARM Trustzones
Explanation of arm Bus architecture
ARM Differences from PC
Note: It is from the modular nature of the PC system which was part of why the PC dominated Apple, Commodore, Atari, etc. in my opinion (the other aspect was piracy), contrary to others who think the answer is Microsoft. The hardware matters.

Buffer depth of UART in Arm cortex M7 processor

Can any one tell me what is the buffer depth of Arm cortex M7 processor.Where can i find?Since i am net to the processor development?i need to know what is the buffer depth of UART in Arm Cortex M7 processor.
Arm CORTEX micros do not have any uarts. Those ones are developed by the chip venfors, and every single vendor has its own design. UARTS do not have hardware buffers only FIFOs of the size 2 or 4 depending on the uart design and vendor.
Microcontroller vendors design the UART part, ARM is only designing cortex -m core, NVIC (nested vectored interrupt controller) and some buses. One mcu vendor can put 1 byte uart buffer while another one can design it with 10 bytes fifo. That totally depends on the design of the mcu.

Boot process in android devices

I was reading this informative blog post here about the booting process on x64. In what ways is the booting process on ARM different? I had a look at Raspberry pi and it seems that the GPU is what executes before control is handed over to ARM processor. Are there any similar resources you have come across for ARM processors?
Just like the x86 boot process is documented in documents available at intel.com arm boot process is documented at arm.com as is any other processor documented.
The full sized (non-cortex-m) arm cores start by executing at address zero for reset. There is one instruction location for reset, one for data abort, undefined instruction, etc. Similar to an interrupt vector table but instead of an address there is an instruction there ideally a branch.
processors historically have some non-volatile ram mapped into the boot space or vector table or whatever, then volatile ram if any elsewhere. x86 historically near the top of ram, arm at the bottom.
ARM does not make chips like intel, it designs processor cores which other folks that make chips include in their chip designs instead of having to design their own cores and maintain compilers, etc. So the chip vendor can solve the boot process in a number of ways, some have something non-volatile mapped low then after booting you can swap in ram to that address space, whatever. In the case of the Raspberry Pi chips which are made by broadcom, they have their own gpu which actually boots the chip, it eventually reads a file assumed to be the linux kernel and root file system, but doesnt have to be. It places that file in ram (by default) in the place where a linux kernel would be loaded by a bootloader like redboot or uboot, in this case by the gpu's arm loader. the gpu then places a few breadcrumbs including the reset instruction(s) required to branch into the linux kernel (generally a very trivial thing), then release reset on the arm core allowing it to boot. so basically the arm sees only ram which is kind of nice, but that is somewhat atypical for arm or other processors to do that. Normally the main processor boots from non-volatile storage (eeprom, flash, etc) and then itself loads linux or whatever and branches to it.
A number of other processor types will have an interrupt vector table, including the arm cortex-m series which are thumb instruction set only cores. they are designed to be microcontrollers and not carry as much overhead, so the first address slot is actually meant to be filled with the init value for the stack pointer, then the second is the address for reset, and then a zillion others. The hardware is designed to preserve registers for you so that you can have the address to C functions right in the table and not have to have "some assembly required" wrappers written by you or the folks that ported the toolchain to this platform. Other processor types will just have a vector table and some assembly is required to be wrapped around interrupts and such.

Explaination of ARM (especifically mobile) Peripherals Addressing and Bus architecture?

I will first say that I'm not expert in the field and my question might contain misunderstanding, in which case, I'll be glad if you correct me and attach resources so I can learn further details.
I'm trying to figure out the way that the system bus and how the various devices that appear in a mobile device (such as sensors chips, wifi/BT SoC, touch panel, etc.) are addressed by the CPU (and by other MCUs).
In the PC world we have the bus arbitrator that route the commands/data to the devices, and, afaik, the addresses are hardwired on the board (correct me if I'm wrong). However, in the mobile world I didn't find any evidence of that type of addressing; I did find that ARM has standardized the Advanced Microcontroller Bus Architecture, I don't know, though, whether that standard applied for the components (cpu-cores) which lies inside the same SoC (that is Exynos, OMAP, Snapdragon etc.) or also influence peripheral interfaces. Specifically I'm asking what component is responsible on allocating addresses to peripheral devices and MMIO addresses?
A more basic question would be whether there even exist a bus management in the mobile device architecture or maybe there is some kind of "star" topology (where the CPU is the center).
From this question I get the impression that these devices are considered as platform devices, i.e., devices that are connected directly to the CPU, and not through a bus. Still, my question is how does the OS knows how to address them? Then other threads, this and this about platform devices/drivers made me confused..
A difference between ARM and the x86 is PIO. There are no special instruction on the ARM to access an I/O device. Everything is done through memory mapped I/O.
A second difference is the ARM (and RISC in general) has a separate load/store unit(s) that are separate from normal logic.
A third difference is that ARM licenses both the architecture and logic core. The first is used by companies like Apple, Samsung, etc who make a clean room version of the cores. For the second set, who actually buy the logic, the ARM CPU will include something from the AMBA family.
Other peripherals from ARM such as a GIC (Cortex-A interrupt controller), NVIC (Cortex-M interrupt controller), L2 controllers, UARTs, etc will all come with an AMBA type interface. 3rd party companies (ChipIdea USB, etc) may also make logic that is setup for a specific ARM bus.
Note AMBA at Wikipedia documents several bus types.
APB - a lower speed peripheral bus; sort of like south bridge.
AHB - several versions (older north bridge).
AXI - a newer multi-CPU (master) high speed bus. Example NIC301.
ACE - an AXI extension.
A single CPU/core may have one, two, or more master connection to an AXI bus. There maybe multiple cores attached to the AXI bus. The load/store and instruction fetch units of a core can use the multiple ports to dispatch requests to separate slaves. The SOC vendor will balance the number of ports with expected memory bandwidth needs. GPUs are also often connected to the AXI BUS along with DDR slaves.
It is true that there is no 100% standard topology; especially if you consider all possible future ARM designs. However, typical topologies will include a top level AXI with some AHB peripherals attached. One or multiple 2nd level APB (buses) will provide access to low speed peripherals. Not every SOC vendor wants to spend time to redesign peripherals and the older AHB interface speeds maybe quite fine for a device.
Your question is tagged embedded-linux. For the most part Linux just needs to know the physical addresses. On occasion, the peripheral BUS controllers may need configuration. For instance, an APB may be configure to allow or disallow user mode. This configuration could be locked at boot time. Generally, Linux doesn't care too much about the bus structure directly. Programmers may have coded a driver with knowledge of the structure (like IRAM is fasters, etc).
Still, my question is how does the OS knows how to address them?
Older Linux kernels put these definitions in a machine file and passed a platform resource structure including interrupt number, and the physical address of a register bank. In newer Linux versions, this information is included with Open Firmware or device tree files.
Specifically I'm asking what component is responsible on allocating addresses to peripheral devices and MMIO addresses?
The physical addresses are set by the SOC manufacturer. Linux platform support will use the MMU to map them as non-cacheable to some un-used range. Often the physical addresses may be very sparse so the virtual remapping pack more densely. Each one incurs a TLB hit (MMU cache).
Here is a sample SOC bus structure using AXI with a Cortex-M and Cortex-A connected.
The PBRIDGE components are APB bridges and it is connected in a star topology. As others suggests, you need to look a your particular SOC documentation for specifics. However, if you have no SOC and are trying to understand ARM generally, some of the information above will help you, no matter what SOC you have.
1) ARM does not make chips, they make IP that is sold to chip vendors who make chips. 2) yes the amba/axi bus is the interface from ARM to the world. But that is on chip, so it is up to the chip vendor to decide what to hook up to it. Within a chip vendor you may find standards or habits, those standards or habits may be that for a family of parts the same peripherals may be find at the same addresses (same uart peripheral, same spi peripheral, clock tree, etc). And of course sometimes the same peripheral at different addresses in the family and sometimes there is no consistency. In the intel x86 world intel makes the processors they have historically made many of the peripherals be they individual parts to super I/O parts to north and south bridges to being in the same package. Intels processor success lies primarily in reverse compatibility so you can still access a clone uart at the same address that you could access it on your original ibm pc. When you have various chip vendors you simply cannot do that, arm does not incorporate the peripherals for the most part, so getting the vendors to agree on stuff simply will not happen. This has driven folks crazy yes, and linux is in a constant state of emergency with arm since it rarely if ever works on any platform. The additions tend to be specific to one chip or vendor or nuance not caring to check that the addition is in the wrong place or the workaround or whatever does not apply everywhere and should not be applied everywhere. The cortex-ms have taken a small step, before the arm7tdmi you had the freedom to use whatever address space you wanted for anything. The cortex-m has divided the space up into some major chunks along with some internal addresses (not just the cortex-ms this is true on a number of the cores). But beyond a system timer and maybe a interrupt controller it is still up to the chip vendor. The x86 reverse compatibility habits extend beyond intel so pcs have a lot of consistency across motherboard vendors (partly driven by software that they want to run on their system namely windows). Embedded in general be it arm or mips or whomever puts stuff wherever and the software simply adapts so embedded/phone software the work is on the developer to select the right drivers and adjust physical addresses, etc.
AMBA/AXI is simply the bus standard like wishbone or isa or pci, usb, etc. It defines how to interface to the arm core the processor from arm, this is basically on chip, the chip vendor then adds or buys from someone IP to bridge the amba/axi bus to pci or usb or dram or flash, etc, on chip or off is their choice it is their product. Other than perhaps a few large chunks the chip vendor is free to define the address space, and certainly free to define what peripherals and where. They dont have to use the same usb IP or dram IP as anyone else.
Is the arm at the center? Well with your smart phone processors you tend to have a graphics coprocessor, so then you have to ask who owns the world the arm, the gpu, or someone else? In the case of the raspberry pi which is to some extent one of these flavor of processors albeit older and slower now, the gpu appears to be the center of the world and the arm is a side fixture that has to time share on the gpu's bus, who knows what the protocol/architecture of that bus is, the arm is axi of course but is the whole chip or does the bridge from the arm to gpu side also switch to some other bus protocol? The point being is the answer to your question is no there is no rule there is no standard sometimes the arm is at the center sometimes it isnt. Up to the chip and board vendors.
not interested in terminology maybe someone else will answer, but I would say outside an elementary sim you wont have just one peripheral (okay I will use that term for generic stuff the processor accesses) tied to the amba/axi bus. You need a first level amba/axi interface that then divides up the address space per your design, and then using amba/axi or whatever bus protocol you want (generally you adapt to the interface for the purchased or designed IP). You, the chip vendor decides on the address space. You the programmer, has to read the documentation from the chip vendor or also board vendor to find the physical address space for each thing you want to talk to and you compile that knowledge into your operating system or application per the rules of that software or build system.
This is not unique to arm based systems you have the same problem with mips and powerpc and other cores you can buy in ip form, for whatever reason arm has dominated the world (there are many arm processors in or outside your computer for every x86 you own, x86 processors are extremely low volume compared to arm based). Like Gates had a desktop in every home, a long time ago ARM had a "touch an ARM once a day" type of a thing to push their product and now most things with a power switch and in particular with a battery has an arm in it somewhere. Which is a nightmare for developers because there are so many arm cores now with nuances and every chip vendor and every family and sometimes members within a family are different so as a developer you simply have to adapt, write your stuff in a modular form, mix and match modules, change addresses, etc. Making one binary like windows does for example that runs everywhere, is not in any way a wise goal for arm based products. Make the modules portable and build the modules per target.
Each SoC will be designed to have its own (possibly configurable) memory map. You will need to read the relevant technical reference manual to get the exact details.
Examples are:
Raspeberry pi datasheet (pdf)
OMAP 5 TRM

Send Inter-Processor Interrupts in Zynq (arm-v7 / cortex-a9)

I am trying to add multiprocessor support for an embedded operating system (DNA-OS) on the Zynq platform in the ZedBoard.
The OS is actually flawlessly functional with CPU_0 alone. The OS architecture requires the implementation of a cpu_send_ipi function in order to activate multiprocessing support: Basically, this function would interrupt a processor and give him a new thread to process.
I looked for an IPI register in the ug585 (Technical Reference Manual for Zynq) but couldn't find any.
I tried digging further in the Cortex-A9 spec for an IPI register, and found out that software generated interrupts could be used as IPI.
After adding software interrupt support to my OS, the problem is that CPU_0 can interrupt itself, but cannot interrupt CPU_1 !
PS: for my OS to handle SGIs, I used the register spec from the ug585 in page 1486:
So is there any other special configuration to permit CPUs to interrupt each others? or any other way to implement IPI ?
Regards,
Your reference documentation is a form of the GIC (global interrupt controller). The Cortex-A9 MP cores include an integrated GIC controller. Each CPU includes an Interrupt interface. As well, there is a system wide distributor. In order to receive the IPI (also known as SGI or software generate interrupt), you need to enable the CPU interface to receive the SGI interrupts on the 2nd CPU. This entails several steps,
Configuring the GIC interrupt interface registers on CPU2.
Setting the CP15 vector table for CPU2.
Enabling the CPSR I-bit on CPU2
Possibly setting up some banked PPI distributor registers. note1
Note1: While most distributor registers are system global, some are banked per CPU as well. For instance, see section 3.3.8. PPI Status Register in the Cortex-A9 MPcore TRM. I don't see any from a cursory investigation, but I would not rule it out.
Testing that an unused SPI (shared peripheral interrupt) works by handling the vector on CPU2 by setting the GIC distributor GICD_ISPEND register on the CPU1. This should verify that you have steps 2 and 3 covered. You may also need to set the type to ensure that they are interrupts and not FIQ; especially if you have security support. You need to use the GICD_ITARGETSR register to include CPU2.
GIC reference list
ARM Generic GIC document - registration needed, GICv1 (ignore GICv2 info).
ARM Cortex-A9 MPcore TRM - chapter 3, for specific info.
PL390 TRM - it is not spelled out anywhere, but I think this is the integrated GIC. It maybe worth looking at if you use more esoteric features.
Especially useful in the Appendix B of the Generic GIC manual. For some reason, ARM likes to keep changing the register names in each and every document they publish.

Resources