Unified ARM Cortex-M3 documentation

I have noticed some overlap between the ARM core documentation (from ARM)
and vendor-specific documentation, which repeats core information already
given by ARM itself. Taking STM32F10xxx/20xxx/21xxx/L1xxxx as a device
family example, there is quite a lot of redundant core-specific information
spread across six datasheets and manuals, coming from ARM and ST.
My question is: are there any efforts by ARM-licensee companies to make this
information less redundant? Thanks for any replies.

Typically with ARM, and I assume other IP vendors' cores (MIPS, etc.), you want to get the documentation for the core specifically from the IP vendor (ARM). In this case, from http://infocenter.arm.com get the ARM Architectural Reference Manual (ARM ARM) for the ARMv7-M, and then under Cortex-M get the Cortex-M3 (or M4 or M0) Technical Reference Manual (TRM). Ideally get the manual specific to the revision of the core in the device, if the chip vendor has provided that information (even if it is not the most recent manual).
From a chip vendor's perspective, any time you create a programmer's reference manual for a part that contains purchased IP, you still want users to have a complete manual. I think ARM restricts how much you can/should publish and prefers to provide the core documentation itself. A peripheral like the system timer inside the Cortex-M3, with registers defined by ARM but sitting in a chip whose overall memory map is the vendor's, you will often see described in both the chip vendor's and the IP vendor's manuals. Unlike ARM, some IP vendors don't want users to see their manuals at all; they want the chip vendor to handle providing and supporting programming documentation. So on an IP-vendor-by-IP-vendor and chip-vendor-by-chip-vendor basis, you are going to see a large mix of solutions to the documentation problem.
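This split shows up directly in code. Here is a minimal sketch in C: the Cortex-M3 system timer (SysTick) registers live at addresses fixed by ARM and documented in the ARM manuals, while a vendor peripheral's base address comes only from the chip vendor's reference manual. The vendor address below is made up for illustration, and the reload helper is just a small pure function on top.

```c
#include <stdint.h>

/* ARM-defined Cortex-M3 SysTick registers: these addresses come from the
 * ARM documentation and are the same on every Cortex-M3, whatever the
 * chip vendor. */
#define SYSTICK_CTRL  (*(volatile uint32_t *)0xE000E010u)
#define SYSTICK_LOAD  (*(volatile uint32_t *)0xE000E014u)

/* Vendor-defined peripheral: this base address is hypothetical and would
 * come from the chip vendor's reference manual, not from ARM. */
#define VENDOR_UART0_BASE 0x40004400u  /* example only */

/* Pure helper: compute the SysTick reload value for a desired tick rate.
 * SysTick counts from LOAD down to 0, so the reload is (cycles/tick) - 1. */
uint32_t systick_reload(uint32_t cpu_hz, uint32_t tick_hz)
{
    return (cpu_hz / tick_hz) - 1u;
}
```

On a 72 MHz part you would write something like `SYSTICK_LOAD = systick_reload(72000000u, 1000u);` for a 1 kHz tick; the SysTick documentation comes from ARM, the clock frequency from the vendor's datasheet.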
Within a single chip vendor company you will see the same habits, perhaps because only one person or team writes all the manuals for a family of chips, or through company policies/practices, or because the next chip (and its manual) borrows a fair amount of logic and content from the prior one (cut and paste, changing a few things as needed).
What you won't see is two competing chip vendors standardizing. See if you can get Intel and AMD to jointly create an x86 manual, or to conform to the same standards for chip pinout, footprint, power, etc. One might copy the other to make a drop-in replacement or source-code-compatible part, but you won't normally see chip vendors get along that well; it normally is not in their best interest to do so. They might have to buy each other's IP blocks or have lawyers write up patent royalty agreements and such, but you won't see them sit side by side and work on something unless forced to for some reason.
The perfect example is the system timer inside the Cortex-M3, or information about the core's internal interrupt handling, especially if the vendor has added their own interrupt logic and registers outside the ARM core. Rarely, outside of a successfully licensed clone (XScale), will you see a chip vendor document the full instruction set.
If the chip vendors were to publish that information themselves, it would be more chaotic and less standardized. Forcing programmers to go here, there, and everywhere to find docs that have moved or don't exist, because you don't want to or won't publish them in your own document, is a bit off-putting. At the same time, though, the current experience is consistent: ARM programmers use ARM docs, and the ARM-core side of things is a consistent experience across chip vendors.

Related

Writing device library C/C++ for STM32 or ARM

I need to develop device libraries for parts like uBlox GPS modules, IMUs, BLE chips, etc., (almost) from scratch. Is there any doc or tutorial that can help me?
The question is: how do you write a device library in C/C++ (Arduino style, if you like) given a datasheet and a platform like STM32 or another ARM?
Thanks so much.
I've tried reading device libraries from the Arduino ecosystem and various GitHub repositories, but I would like a guide/template (general rules) to follow for writing proper device libraries from a given datasheet.
I'm not asking for a full definitive guide, just where to start: docs, methods, approaches.
I've found the one below, but it's very basic and quite light for my targets.
http://blog.atollic.com/device-driver-development-the-ultimate-guide-for-embedded-system-developers
I don't think that you can actually write libraries for STM32 in Arduino style. Most Arduino libraries you find in the wild promote ease of use rather than performance. For example, a simple library designed for a specific sensor works well if reading the sensor and reporting the results via the serial port is the only thing the firmware must do. When you work on more complex projects, where the uC has a lot to do and must satisfy real-time constraints, the general Arduino approach doesn't solve your problems.
The problem with STM32 library development is the complex connection between peripherals, DMA and interrupts. I code at the register level, without using the Cube framework, and I often find myself digging through the reference manual for tables that show the connections between DMA channels, or things like timer master-slave relations. Some peripherals (timers mostly) work similarly, but each one has small differences. That makes developing a hardware library that fits all scenarios practically impossible.
The tasks you need to accomplish are also more complex in STM32 projects. For example, in one of my projects, I fool SPI with a dummy/fake DMA transfer triggered by a timer, so that it can generate periodic 8-pulse trains from its clock pin (data pins are unused). No library can provide you this kind of flexibility.
Still, I believe not all is lost. I think it may be possible to build a hardware abstraction layer (HAL, but not The HAL by ST). So, it's possible to create useful libraries if you can abstract them from the hardware. A USB library can be a good example of this approach, as the STM32 devices have roughly three different USB peripheral hardware variations, and it makes sense to write a separate HAL for each of them. The upper application layer, however, can be the same.
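The usual C idiom for that kind of HAL is a struct of function pointers: one implementation per hardware variant, one upper layer coded purely against the interface. A minimal sketch, with made-up names (this is not ST's actual API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HAL interface: each USB hardware variant supplies its own
 * implementation; the upper (application) layer only sees this struct. */
typedef struct {
    int (*init)(void);
    int (*write_packet)(const uint8_t *data, size_t len);
} usb_hal;

/* Upper layer coded purely against the interface, so it is identical for
 * every hardware variant. */
int usb_send(const usb_hal *hal, const uint8_t *data, size_t len)
{
    if (hal == NULL || hal->init() != 0)
        return -1;
    return hal->write_packet(data, len);
}

/* A trivial stand-in implementation; a real one would touch the USB
 * peripheral registers of one specific hardware variant. */
static int stub_init(void) { return 0; }
static size_t stub_sent;
static int stub_write(const uint8_t *data, size_t len)
{
    (void)data;
    stub_sent += len;          /* just count bytes "sent" */
    return (int)len;
}
const usb_hal stub_usb = { stub_init, stub_write };
```

Swapping hardware variants then means swapping which `usb_hal` instance you hand to the upper layer, nothing more.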
Maybe that was the reason ST created the Cube framework. But as you know, Cube relies on external code-generation tools which are aware of the hardware of each device, so some of the work can be avoided at runtime. You can't achieve the same result when you write your own libraries unless you also design a similar external code-generation tool. And the code Cube generates is bloated in most cases: you save development time at the cost of runtime performance and code space.
I assume you will be using a cross toolchain on some platform like Linux, and that the cross toolchain is compatible with some method to load object code on the target CPU. I also assume that you already have a working STM32 board that is documented well enough to figure out how the sensors will connect to the board or to the CPU.
First, you should define what your library is supposed to provide. This part is usually surprisingly difficult. It’s a bit hard to know what it can provide, without knowing a bit about what the hardware sensors are capable of providing. Some iteration on the requirements is expected.
You will need to have access to the documentation for the sensors, usually in the form of the manufacturer’s data sheets. Using the datasheet, and knowing how the device is connected to the target CPU/board, you will need to access the STM32 peripherals that comprise the interface to the sensors. Back to the datasheets, this time for the STM32, to see how to access its peripheral interfaces. That might be simple GPIO bits and bytes, or might be how to use built-in peripherals such as SPI or I2C.
The datasheets for the sensors will detail a bunch of registers, describing the meaning of each, including the meanings of each bit, or group of bits, in certain registers. You will write code in C that accesses the STM32 peripherals, and those peripherals will access the sensors across the electrical interface that is part of the STM32 board.
The workflow usually starts out by writing to a register or three to see if there is some identifiable effect. For example, if you are exercising a digital IO port, you might wire up an LED to see if you can turn it on or off, or a switch to see if you can correctly read its state. This establishes that your code can poke or peek at IO using register-level access. There may be existing helper functions to do this work as part of the cross toolchain, or you might have to develop your own, using pointer indirection to access memory-mapped IO. Or there might be special instructions needed that can only be accessed from inline assembler code. This answer is generic, as I don't know the specifics of the STM32 processor or its typical ecosystem.
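The pointer-indirection approach looks something like the sketch below. The register address is a made-up example (on a real part it comes from the vendor's memory map); the helpers take the register pointer as a parameter, so the same code can be pointed at real hardware or at a plain variable while bench-testing on a host machine.

```c
#include <stdint.h>

/* Hypothetical GPIO output-data register address, for illustration only.
 * On a real chip this comes from the vendor's reference manual. */
#define GPIO_ODR ((volatile uint32_t *)0x40010C0Cu)  /* example only */

/* volatile forces a real load/store on every access, which is essential
 * for memory-mapped IO. */
static inline void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg |= (1u << bit);
}

static inline void reg_clear_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg &= ~(1u << bit);
}

static inline int reg_read_bit(const volatile uint32_t *reg, unsigned bit)
{
    return (int)((*reg >> bit) & 1u);
}
```

On target you might then write `reg_set_bit(GPIO_ODR, 5);` to drive the LED, and `reg_read_bit` to poll the switch.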
Then you move on to more complex operations that might involve sequences of operations, like cycling a bit or two to effect some communication with the device. Or it might be as simple as finding the proper sequence of register accesses to operate a SPI interface. Often you will find small chunks of code that are complete enough to be reused by your driver, like how to read or write an individual byte. You can then make that a reusable function to simplify the rest of the work, like accessing certain registers in sequence and printing the contents of the registers you read to see if they make sense. Ultimately, you will have two important pieces of information: an understanding of the low-level register accesses needed to create a formal driver, and an understanding of what components and capabilities make up the hardware (i.e., you know how the device(s) work).
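As a concrete (hypothetical) example of such a reusable chunk, here is a bit-banged "shift one byte out" routine. The pin operations are injected as callbacks, which on target would be register writes; here a bench-test double plays the part of the receiving device so the sequencing can be checked without hardware.

```c
#include <stdint.h>

/* Callback bundle abstracting the actual pin wiggling. The names and
 * shapes are illustrative, not any real library's API. */
typedef struct {
    void (*set_clock)(int level, void *ctx);
    void (*set_data)(int level, void *ctx);
    void *ctx;
} bitbang_ops;

/* Shift one byte out MSB-first: data is set while the clock is low and
 * latched by the receiver on the rising edge. */
void shift_out_byte(const bitbang_ops *ops, uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--) {
        ops->set_clock(0, ops->ctx);
        ops->set_data((byte >> bit) & 1, ops->ctx);
        ops->set_clock(1, ops->ctx);
    }
}

/* Bench-test double: samples the data line on each rising clock edge,
 * reconstructing the byte as a real device would see it. */
struct capture { uint8_t value; int clk, data; };

static void cap_clock(int level, void *ctx)
{
    struct capture *c = ctx;
    if (level && !c->clk)                     /* rising edge: sample */
        c->value = (uint8_t)((c->value << 1) | c->data);
    c->clk = level;
}

static void cap_data(int level, void *ctx)
{
    ((struct capture *)ctx)->data = level;
}

uint8_t shift_out_roundtrip(uint8_t byte)
{
    struct capture c = {0, 0, 0};
    bitbang_ops ops = { cap_clock, cap_data, &c };
    shift_out_byte(&ops, byte);
    return c.value;
}
```

Once a chunk like this works, the rest of the driver (register sequences, multi-byte transfers) is built by composing it.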
Now, throw away most of what you've done, and develop a formal spec. Use what you now know to include everything that can be useful, and to develop a spec that includes an appropriate interface API that your application code can use. Rewrite the driver, armed with the knowledge of how all the pieces work, and taking advantage of the blank canvas afforded by the fresh rewrite of the spec. Only reuse code that you are completely confident is optimal and appropriate to the format dictated by the spec. Write test code for all of the modules, and use the test code to actually verify that the code works and that it conforms to the spec. Re-run the test code every time you modify anything it tests.

ARM Architecture Initialization

In the case of x86 the same (real mode) bootloader works on virtually any x86 device.
Is that possible on ARM or do I need to create a specific bootloader for each 'cortex'?
x86, or let's say PC-compatible, systems are just that: PC compatible. They support the ancient BIOS calls, so there is massive compatibility, by design, by the chip vendor (Intel), the software vendors (BIOS, operating system), and the motherboard vendors.
ARM is in no way, shape, or form like that. There are instruction sets you can choose that work almost or all the way across the line, but remember that with ARM systems you buy an ARM core and add it to your special chip, with your special/custom stuff, and then that is put on one or more different boards. There is little to no compatibility. The instruction set and ARM core are a small part of the whole picture; most of the code is for the non-ARM stuff.
u-boot and perhaps others are fairly massive bootloaders, pretty much operating systems themselves, and have to be ported just like an operating system to each chip/board combination. The chip vendor, if this is a Linux-capable system, most likely has a reference design and a BSP including a u-boot port and/or some other solution (the Raspberry Pi is a good example). It is fairly trivial to boot Linux, or at least it used to be; there is no real need for the massively overcomplicated u-boot. Without a DTB, you set up a few memory locations and a register or two and branch to the kernel; that's it (again, look at the Raspberry Pi). With a DTB, I assume you build the DTB, put it somewhere, set up a few registers and branch to the Linux kernel (Raspberry Pi? NTC CHIP?).
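The "set up a few registers and branch" part can be sketched in C. The classic (pre-DTB) 32-bit ARM Linux convention is r0 = 0, r1 = machine type number, r2 = pointer to the ATAGs (or, later, the DTB); modelled here as a call through a function pointer. This is only a sketch: a real bootloader also has to satisfy the kernel's documented cache/MMU entry conditions, and the fake kernel below exists purely so the handoff can be exercised off-target.

```c
#include <stdint.h>

/* ARM Linux boot convention (32-bit): r0 = 0, r1 = machine type,
 * r2 = ATAGs/DTB address. A plain C call maps the first three arguments
 * onto r0-r2 under the standard ARM calling convention. */
typedef void (*kernel_entry)(uint32_t zero, uint32_t mach_type,
                             uint32_t params);

void boot_linux(kernel_entry entry, uint32_t mach_type, uint32_t params_addr)
{
    entry(0u, mach_type, params_addr);   /* never returns on real hardware */
}

/* Off-target stand-in for the kernel: just records what it was handed. */
uint32_t captured_r0 = 1, captured_r1, captured_r2;
void fake_kernel(uint32_t r0, uint32_t r1, uint32_t r2)
{
    captured_r0 = r0;
    captured_r1 = r1;
    captured_r2 = r2;
}
```

On a real board, `entry` would be the load address of the kernel image and `params_addr` the address where you placed the ATAGs or DTB.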
There is an Arm open-source project that covers bootloaders for Armv7/v8 Cortex-A processors:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/
Another open source project for Cortex-M processors:
https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/

How safe is my code inside an ARM Cortex-M? Threat of hackers pulling it? MPU required?

I have an older product that is Microchip PIC 18F based. It's being knocked off in China. The clones are a cheap copy of the hardware, but 1:1 software; it's clearly my code on these. The code is programmed at initial device assembly and never again (no updates, no bootloader, etc.), so the only way this could happen is if they defeated the read protections on the PIC 18F. I don't think that's unreasonable to assume, given the age of that chip and the impression of it that led me to switch away from it before this came to my attention.
I've already migrated all new projects to ARM (M0+, M3, M4) for other reasons, though that doesn't do anything for my old code now. I am hoping the protections are better on the ARM Cortex chips (NXP, ST, Freescale, TI, etc.). I can find very little information on how this works.
Is it possible to defeat the chip read protections in place on ARM? Assuming full JTAG, Serial Wire Debug, whatever. Even if you decap the chip to expose the die? Even if you really knew what you were doing? How safe is it? Because 8-bit PIC is apparently extremely unsafe.
I had a thought that in new projects I could require a connection to our server, where I would record the unique ID (96-128 bits) and authorize the device from there. Clone IDs would not be recognized. This is a logistical mess, because I'll have a master list from the assembler, it'll be online, the user's device code will have to be sent and authorized with a firmware download... There are a few places for spoofing and abuse. This is a hassle on many levels. Are there simpler ways of ensuring protection than this?
We record the unique id at manufacturing
The user locks his name/info/unique number into the part
The user plugs into USB
Our web/java software talks with module
We store the unique ID of the chip and their unique info
If the chip isn't recognized (id not in the list), stop
If the chip is already linked to another user's info (spoofed ID), stop
If it's a new or already verified module, allow the user to work with it
The above is under the assumption that the unique IDs are built into the ARM die and could never be written over, which seems likely. But every manufacturer seems to have a different system in place; this appears to be a peripheral feature and not part of the core (on some STM chips it's 96-bit, on some Freescale parts it's 128-bit). I assume this is largely what the unique IDs are for: serialization and encryption schemes.
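The server-side decision in the list above is straightforward to express in code. A minimal sketch, assuming a 96-bit ID handled as 12 raw bytes (as on many STM32 parts); a real server would of course use a database rather than an in-memory array, and all names here are illustrative:

```c
#include <stdint.h>
#include <string.h>

#define UID_LEN 12   /* 96-bit unique ID, as on many STM32 parts */

/* Registry entry: a known chip ID and the user info it is locked to.
 * owner == NULL means manufactured/registered but not yet linked. */
typedef struct {
    uint8_t uid[UID_LEN];
    const char *owner;
} device_record;

typedef enum { DEV_UNKNOWN, DEV_SPOOFED, DEV_ALLOWED } device_status;

/* Implements the decision steps from the list above:
 * ID not in the list -> stop; ID linked to another user -> stop;
 * new or already-verified module -> allow. */
device_status check_device(const device_record *registry, size_t n,
                           const uint8_t uid[UID_LEN],
                           const char *claimed_owner)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(registry[i].uid, uid, UID_LEN) == 0) {
            if (registry[i].owner == NULL ||
                strcmp(registry[i].owner, claimed_owner) == 0)
                return DEV_ALLOWED;
            return DEV_SPOOFED;
        }
    }
    return DEV_UNKNOWN;
}
```

This only checks the logistics; it does nothing against an attacker who clones a legitimate ID wholesale, which is why the thread's point about hardware read protection still matters.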
(Note: I hate DRM, and that's not really what I'm trying to do, but this product is part of a system that could be indirectly responsible for people's lives. We cannot have cheap knockoffs from China out there, for many reasons: even if the software is mine, we can't vouch for their cheap hardware.)
I would recommend the following:
Look at ARM chips that have TrustZone and encryption. Some have per-peripheral key locking.
If you want absolute security, then buy the ARM IP and design a silicon chip with the code embedded in the hardware with no way to read/write it. Make it part of the silicon logic or an on-chip ROM.

msp432 - Can code written for the TI MSP432 ARM Cortex M4 be ported automatically to other Cortex M4 microcontrollers?

I will preface by saying that I am fairly new to programming at the hardware level and that I'm interested in building apps based on the MSP432 microcontroller by Texas Instruments.
I understand that to program this controller, one writes C code, links against the MSPWare library/drivers, and compiles with gcc. Is it possible to take code written for this controller and deploy it to other controllers that are also based on the Cortex-M4 32-bit architecture? What sorts of differences are there between the various implementations of the Cortex-M4?
I will say generally not. This is unlike an x86 PC or a Mac, where the masses are used to one operating system, and that operating system allows for a lot of backward compatibility: you write a program today on a Dell and it works on an Acer, and will probably run for another 10 years or more on your daily-driver computer, or at least many 10-year-old programs run today. (Actually, in 10 years, today's programs probably won't run, on your phone or brain implant.)
The Cortex-M4 is a processor core; ARM doesn't make chips, they make processor cores that chip companies buy and surround with chip-company stuff. So instead of "I can drive a car; move me from one car to another and there is a pretty good chance I can drive it," this is "I am a specific sized tire; move me from one car to another and it is quite likely I don't fit."
Almost all of the code in the libraries you are calling is for things within the chip but outside the ARM core: the chip-vendor-specific stuff. While a particular chip vendor may make libraries that are close to the same from one of their ARM chips to another, within a class of chips or within the same production time frame, that doesn't mean the code, the APIs, or how the peripherals work is in any way portable from one family of that company's chips to another, and certainly not from one chip vendor to another. Your TI code is likely not close to what you see on an Atmel, NXP, or ST ARM-based chip.
Now, that said, there are folks trying. The mbed stuff is an attempt to be Arduino-like, where Arduino is an attempt to make a high-enough-level set of libraries and port them to specific boards (which are mostly within a family of chips from one vendor). There are some ARM-based attempts to make Arduino libraries such that code developed on a real Arduino will compile for these ARM-based things and just work, but those ARM-based things are specific boards designed to be Arduino compatible, and the libraries are thick and hold all the conversion magic from AVR/Atmel peripherals to whatever ARM-based chip was chosen.
mbed is probably closer to it; originally it was just NXP chips, but now there are some ST boards with ST chips that are trying to be both Arduino compatible and mbed compatible. Not sure how that will work out.
Then there are phones, of course, but that is a lot closer to the Windows situation: write an iPhone app and it will/should work on all the iPhones for some period of time, even though those phones use different ARM-based chips from different vendors with vastly different peripherals.
This question will likely be closed for being primarily opinion based, since it is not really a black-and-white factual question. I suggest you just enjoy the board you bought, make the LEDs blink and so on, and get used to dealing with a whole new environment compared to operating-system programming, and with the very limited resources compared to a laptop/desktop.
If you have a specific porting question, or something that admits a more specific answer, then ask it that way; for example, if you want to play with this board but ultimately do X with it (port the code to an STM32F4, say), ask whether that specific thing will work.
Now, it is quite likely that if you wanted to create your own abstraction layer, then you could create it such that it works on top of multiple chips/platforms.
ARM has this CMSIS thing, but I think that is mostly about giving debuggers common access to the board. You may or may not have noticed that the debug access on the Stellaris LaunchPad, now Tiva C, is a different interface/protocol than the one used now; the current one, on the Hercules and now the MSP432, uses the same XDS100-compatible front end. (I hate the MSP432 name: it is in no way, shape, or form related to the MSP430. Perhaps this is a PIC vs PIC32 thing, which are in no way related to each other except for coming from the same parent company.)

mbed was formerly a board with a web-based, Arduino-like, easy-to-use environment (Arduino is Java based rather than web based, but "run anywhere" is the idea) and a lot of libraries so you don't have to know as many details. Now mbed appears to be becoming an RTOS or something, so, much like writing for Arduino or for Android, you might... might... be able to develop on top of that and have it port.

Understand that the more layers and the thicker the abstraction, the more resources you need, the more power you consume, the more the chip costs, and so on. So it is a tradeoff: saving a bit of software development time versus the price, size, or power consumption of the product. We don't know, nor necessarily need to know, what you are making; that is your business. But there are tradeoffs to making the software "simpler", portable, readable, etc.

Explanation of ARM (specifically mobile) peripheral addressing and bus architecture?

I will first say that I'm not an expert in the field, and my question might contain misunderstandings; if so, I'll be glad if you correct me and point to resources so I can learn the details.
I'm trying to figure out how the system bus works and how the various devices in a mobile device (such as sensor chips, WiFi/BT SoC, touch panel, etc.) are addressed by the CPU (and by other MCUs).
In the PC world we have the bus arbitrator that routes commands/data to the devices, and, AFAIK, the addresses are hardwired on the board (correct me if I'm wrong). In the mobile world, however, I didn't find any evidence of that type of addressing. I did find that ARM has standardized the Advanced Microcontroller Bus Architecture, but I don't know whether that standard applies only to the components (CPU cores) inside the same SoC (Exynos, OMAP, Snapdragon, etc.) or also governs peripheral interfaces. Specifically, I'm asking: what component is responsible for allocating addresses to peripheral devices and MMIO addresses?
A more basic question would be whether there even exists bus management in the mobile device architecture, or whether there is some kind of "star" topology (with the CPU at the center).
From this question I get the impression that these devices are considered platform devices, i.e., devices that are connected directly to the CPU and not through a bus. Still, my question is: how does the OS know how to address them? Other threads, this and this, about platform devices/drivers left me confused.
A difference between ARM and x86 is PIO. There are no special instructions on the ARM to access an I/O device; everything is done through memory-mapped I/O.
A second difference is that the ARM (and RISC in general) has separate load/store unit(s), distinct from the normal integer logic.
A third difference is that ARM licenses both the architecture and the logic core. The first is used by companies like Apple, Samsung, etc., who make a clean-room implementation of the cores. For the second group, who actually buy the logic, the ARM CPU will include something from the AMBA family.
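The first point above (memory-mapped I/O instead of x86-style `in`/`out` port instructions) means a device register is just an address, reached with ordinary load/store instructions. A hedged sketch in C; the UART addresses and ready bit below are invented for illustration, and the routine takes the registers as parameters so the polling logic can be exercised off-target:

```c
#include <stdint.h>

/* On ARM there is no separate I/O port space: a device register is just
 * an address. These addresses are hypothetical, for illustration only. */
#define UART_STATUS ((volatile uint32_t *)0x4000C018u)  /* example only */
#define UART_DATA   ((volatile uint32_t *)0x4000C000u)  /* example only */
#define TX_READY    (1u << 5)                           /* example bit  */

/* Busy-wait until the TX-ready bit is set, then store the character. */
void uart_putc(volatile uint32_t *status, volatile uint32_t *data, char c)
{
    while ((*status & TX_READY) == 0)
        ;                        /* plain load in a loop: still just MMIO */
    *data = (uint32_t)c;         /* plain store transmits the byte */
}
```

On target this would be called as `uart_putc(UART_STATUS, UART_DATA, 'A');` with the real addresses taken from the SoC's reference manual.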
Other peripherals from ARM, such as the GIC (Cortex-A interrupt controller), NVIC (Cortex-M interrupt controller), L2 cache controllers, UARTs, etc., all come with an AMBA-type interface. Third-party companies (ChipIdea USB, etc.) may also make logic that is set up for a specific ARM bus.
Note: AMBA at Wikipedia documents several bus types.
APB - a lower-speed peripheral bus; sort of like a south bridge.
AHB - several versions (like an older north bridge).
AXI - a newer multi-CPU (master) high speed bus. Example NIC301.
ACE - an AXI extension.
A single CPU/core may have one, two, or more master connections to an AXI bus, and there may be multiple cores attached to the AXI bus. The load/store and instruction-fetch units of a core can use the multiple ports to dispatch requests to separate slaves. The SoC vendor will balance the number of ports against expected memory bandwidth needs. GPUs are also often connected to the AXI bus, along with DDR slaves.
It is true that there is no 100% standard topology, especially if you consider all possible future ARM designs. However, a typical topology will include a top-level AXI with some AHB peripherals attached, and one or more second-level APB buses providing access to low-speed peripherals. Not every SoC vendor wants to spend time redesigning peripherals, and the older AHB interface speeds may be quite fine for a given device.
Your question is tagged embedded-linux. For the most part, Linux just needs to know the physical addresses. On occasion, the peripheral bus controllers may need configuration; for instance, an APB may be configured to allow or disallow user-mode access, and this configuration could be locked at boot time. Generally, Linux doesn't care much about the bus structure directly, though programmers may have coded a driver with knowledge of the structure (like "IRAM is faster", etc.).
Still, my question is how does the OS knows how to address them?
Older Linux kernels put these definitions in a machine file and passed a platform resource structure including the interrupt number and the physical address of a register bank. Newer Linux versions get this information from Open Firmware-style device tree files.
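A device tree describes each peripheral declaratively; the kernel reads the physical register range and interrupt number from it instead of having them hard-coded in a machine file. The fragment below is entirely hypothetical (node name, addresses, and the `compatible` binding string are made up for illustration):

```dts
/* Hypothetical device-tree fragment: the OS learns the register bank's
   physical address and size from "reg", and the IRQ from "interrupts". */
uart0: serial@4000c000 {
        compatible = "vendor,example-uart";   /* made-up binding name */
        reg = <0x4000c000 0x1000>;            /* base address, size */
        interrupts = <5>;
        status = "okay";
};
```

A driver whose binding matches `compatible = "vendor,example-uart"` is then probed with those resources at boot.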
Specifically I'm asking what component is responsible on allocating addresses to peripheral devices and MMIO addresses?
The physical addresses are set by the SoC manufacturer. Linux platform support will use the MMU to map them as non-cacheable into some unused virtual range. Often the physical addresses are very sparse, so the virtual remapping packs them more densely. Each mapping consumes TLB entries (the MMU's cache).
Here is a sample SoC bus structure using AXI, with a Cortex-M and a Cortex-A connected.
The PBRIDGE components are APB bridges, and each is connected in a star topology. As others suggest, you need to look at your particular SoC documentation for specifics. However, if you have no SoC and are trying to understand ARM generally, some of the information above will help you no matter what SoC you have.
1) ARM does not make chips; they make IP that is sold to chip vendors who make chips. 2) Yes, the AMBA/AXI bus is the interface from the ARM core to the world, but that is on-chip, so it is up to the chip vendor to decide what to hook up to it.

Within a chip vendor you may find standards or habits; those may mean that across a family of parts the same peripherals are found at the same addresses (same UART peripheral, same SPI peripheral, clock tree, etc.). And of course sometimes the same peripheral sits at different addresses across the family, and sometimes there is no consistency at all. In the Intel x86 world, Intel makes the processors and has historically made many of the peripherals, be they individual parts, super-I/O parts, north and south bridges, or blocks in the same package. Intel's processor success lies primarily in backward compatibility, so you can still access a clone UART at the same address you could access it on the original IBM PC. With many independent chip vendors you simply cannot do that; ARM does not provide the peripherals for the most part, so getting the vendors to agree on addresses simply will not happen.

This has driven folks crazy, yes, and Linux is in a constant state of emergency with ARM, since it rarely, if ever, works untouched on any given platform. The additions tend to be specific to one chip or vendor or nuance, without checking whether the addition is in the wrong place, or whether the workaround applies (or should be applied) everywhere. The Cortex-Ms have taken a small step: before, back to the ARM7TDMI, you had the freedom to use whatever address space you wanted for anything, whereas the Cortex-M divides the space into some major chunks, along with some internal addresses (and not just the Cortex-Ms; this is true of a number of the cores). But beyond a system timer and maybe an interrupt controller, it is still up to the chip vendor.
The x86 backward-compatibility habits extend beyond Intel, so PCs have a lot of consistency across motherboard vendors (partly driven by the software they want to run on their systems, namely Windows). Embedded in general, be it ARM or MIPS or whoever, puts stuff wherever, and the software simply adapts; for embedded/phone software the work is on the developer to select the right drivers and adjust physical addresses, etc.
AMBA/AXI is simply a bus standard, like Wishbone or ISA or PCI or USB. It defines how to interface to the ARM core, the processor product from ARM; this is basically on-chip. The chip vendor then adds, or buys from someone, IP to bridge the AMBA/AXI bus to PCI or USB or DRAM or flash, etc.; on-chip or off is their choice, it is their product. Other than perhaps a few large chunks, the chip vendor is free to define the address space, and certainly free to define what peripherals go where. They don't have to use the same USB IP or DRAM IP as anyone else.
Is the ARM at the center? Well, with your smartphone processors you tend to have a graphics coprocessor, so then you have to ask who owns the world: the ARM, the GPU, or someone else? In the case of the Raspberry Pi, which is to some extent one of these flavors of processor, albeit older and slower now, the GPU appears to be the center of the world and the ARM is a side fixture that has to time-share on the GPU's bus. Who knows what the protocol/architecture of that bus is; the ARM side is AXI of course, but is that true of the whole chip, or does the bridge from the ARM to the GPU side switch to some other bus protocol? The point being: the answer to your question is no, there is no rule, there is no standard. Sometimes the ARM is at the center, sometimes it isn't. It is up to the chip and board vendors.
I'm not interested in arguing terminology, so maybe someone else will answer that part, but I would say that outside an elementary simulation you won't have just one peripheral (okay, I will use that term for the generic stuff the processor accesses) tied to the AMBA/AXI bus. You need a first-level AMBA/AXI interconnect that divides up the address space per your design, and then AMBA/AXI or whatever bus protocol you want behind it (generally you adapt to the interface of the purchased or designed IP). The chip vendor decides on the address space. You, the programmer, have to read the documentation from the chip vendor (or also the board vendor) to find the physical address for each thing you want to talk to, and you compile that knowledge into your operating system or application per the rules of that software or build system.
This is not unique to ARM-based systems; you have the same problem with MIPS and PowerPC and other cores you can buy in IP form. For whatever reason, ARM has dominated the world (there are many ARM processors in and around your computer for every x86 you own; x86 processors are extremely low volume compared to ARM-based ones). Like Gates wanted a desktop in every home, a long time ago ARM had a "touch an ARM once a day" type of push for their product, and now most things with a power switch, and in particular with a battery, have an ARM in them somewhere. This is a nightmare for developers, because there are so many ARM cores now, with nuances, and every chip vendor, every family, and sometimes members within a family are different. As a developer you simply have to adapt: write your stuff in modular form, mix and match modules, change addresses, etc. Making one binary that runs everywhere, like Windows does, is not in any way a wise goal for ARM-based products. Make the modules portable and build the modules per target.
Each SoC will be designed to have its own (possibly configurable) memory map. You will need to read the relevant technical reference manual to get the exact details.
Examples are:
Raspberry Pi datasheet (PDF)
OMAP 5 TRM
