Do an 8-bit write to a 32-bit register - c

I'm trying to read the clock source value of a given generic clock generator on the SAMD21 MCU.
The datasheet says that if I wish to read the GENCTRL register (containing the clock source value), I need to "do an 8-bit write" and read the register afterwards. How can I do that, given that the register is 32 bits wide?
I'm afraid that by doing the following, I am actually changing generic clock generator X's configuration:
GCLK->GENCTRL.reg = (GCLK->GENCTRL.reg & 0xFFFFFFF0) | 0x0000000X;
Keep in mind that the lowest bits of GENCTRL are reserved for the generic clock generator's ID.
Below is the part of the datasheet containing the instructions for reading the GENCTRL register.

The ARM registers are 32 bit. The peripheral registers (in general) will be arranged at 4 byte offsets, but will not always implement all 32 bits that this implies.
This is most obvious when the upper bits of a peripheral register are 'read as zero, write ignored'. You might occasionally see a newer or more featured version of the peripheral where some of these unused bits become used in the future.
Depending on exactly how a specific peripheral is connected to the core, it is generally possible to perform byte, half-word or word accesses to any region of memory. Provided this is supported, only the relevant bytes will be updated. Where there is a restriction (for example, a 32-bit APB bus where only word accesses are supported), this should be clearly identified in the documentation. With an AArch64 processor, it is even possible to write two adjacent registers at once (e.g. with a store-pair instruction)!
Do note that the peripheral 'knows' the access size (at least the information is present on the internal bus), so it is possible to specify different behaviour for a byte access than for a word access (even if this is the sort of confusing behaviour that is best avoided). To generalise, any memory-mapped peripheral is more of an observer of the bus than a true implementation of memory - the designer is free to play tricks with the full address/data/control bus bit combinations, and implement bitmasks, read-modify-write, access locks, magic values, etc.
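For the SAMD21 case specifically, here is a minimal sketch of the byte-write technique, assuming the usual CMSIS-style device header where GCLK->GENCTRL.reg is the 32-bit register; the read_genctrl name is mine, not from the datasheet:

#include <stdint.h>
#include "sam.h"  /* SAMD21 device header; exact name varies by toolchain */

static inline uint32_t read_genctrl(uint8_t id)
{
    /* Byte-sized store: only the lowest byte of GENCTRL (the ID field)
       is written, so the generator's configuration bits are untouched. */
    *(volatile uint8_t *)&GCLK->GENCTRL.reg = id;
    /* A full 32-bit read now returns generator 'id' configuration. */
    return GCLK->GENCTRL.reg;
}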

Related

How to define memory pointers when programming AVR chips?

Preamble: after working a couple of years as an application developer, the world of software engineering became more obscure to me than it was before. The reason is that the real stuff is hidden under zillions of layers of abstraction: OS, frameworks, etc. The young generation is deprived of the pleasure of working with PDP-like machines where all programming was done by toggling electrical switches. Another problem is the ephemeral nature of modern programming languages. Once there was Python 2.x; now it is deprecated, and there is Python 3.x, which in its turn will be deprecated in a couple of months. Likewise for other languages. ANSI C looks like the Pyramid of Cheops: it was there in the 70s, and I don't doubt it will be there after the Sun becomes a red giant.
It seems that now the only way to understand the interaction between hardware and software is to play with embedded development. From a pedagogical point of view, physical chips are very handy because they allow you to tackle the most difficult part of the C language, namely pointers. When coding in an OS environment, the */& notation is still very confusing because it refers to some location somewhere inside virtual memory. And before you get an understanding of what virtual memory is, you have to read a couple of monographs about OS development, etc. You may find it stupid, but I really want to know which transistor is holding my bit right now. At least then I can wire a physical pin voltage to the programming abstractions.
Currently I am working with Atmel chips and the WinAVR package because of the numerous textbooks and accessible hardware. Though all the books promise to teach AVR coding in plain C, the reality is that all pointers are hidden behind macros like PORTA, DDRB, etc. All code examples include the header file 'io.h', which in its turn refers to other header files specific to a given chip, like 'iomx8.h'. So far, I cannot find any macro definitions in these headers. The code to raise the voltage on physical pin 14 of the ATmega168 looks like
DDRB = 0x01;
PORTB = 0x01;
Fortunately, the Microchip site provides some basic documents where it is stated, for example, that if I want to raise the voltage on physical pin 14, I need to follow these steps:
volatile unsigned char *ddrB;
ddrB = (volatile unsigned char *)0x24; // the address of DDRB is 0x24
*ddrB |= 0x01; // set up low impedance / high current (output) state for transistor 0
volatile unsigned char *portB;
portB = (volatile unsigned char *)0x25;
*portB |= 0x01; // voltage on
*portB &= ~(0x01); // voltage off
Unfortunately, this is the only info I have found after one week of lurking. Now I am going through USART programming, and things become more complicated with all these UBRR0H, UCSR0C. Since the provided header files don't contain macro definitions for any register, where else can I find them?
A similar question was asked several years ago: accessing AVR registers with C?. However, the answers provided were somewhat unhelpful, beside the clue that GCC itself can map some mythical PORTB to a real physical location. Could someone describe the mechanism behind the mapping?
From a memory-mapping standpoint: the general-purpose registers, special function + I/O registers, and SRAM share non-overlapping ranges of a single address space, as described in the datasheets for the various processors in the AVR series. All of your pointers will reference this memory space, unless annotated as pointers to PROGMEM (which causes different instructions to be emitted). The reference is made without any sort of virtual memory mapping.
For example, the ATtiny25/45/85 has the memory map shown on page 18 of its datasheet.
Your linker is aware of this memory map and will place variables accordingly. For example, a global variable declared in one of your compilation units will end up at an address above 0x0060 in the example device described above, so that it lands in SRAM.
From an instruction-encoding standpoint: although there is one address space, there is special functionality reserved for certain important regions. For example, the IN and OUT instructions have six bits in their instruction encoding which can directly refer to one of the 64 addresses in [0x20, 0x60).
The IN and OUT instructions are unique in their ability to load and store to a fixed address encoded directly in the instruction; the normal load and store instructions require an indirect access, with e.g. the 'Z' register loaded first.
As a result, when the compiler sees memory operations on a fixed I/O register, it may generate these more efficient instructions. However, a normal load/store via a pointer will have the same effect (although with a different number of clock cycles required). For extended I/O registers that didn't fit into the first 64 (e.g. OSCCAL on an ATmega328P), normal load/store instructions will always be generated.
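As a sketch of the two access styles (assuming an ATmega168, where PORTB sits at data-space address 0x25 as in the question):

#include <avr/io.h>
#include <stdint.h>

int main(void)
{
    /* Macro access: PORTB resolves to a fixed I/O address, so the
       compiler may emit the short single-cycle SBI/OUT encodings. */
    DDRB |= _BV(DDB0);
    PORTB |= _BV(PORTB0);

    /* Equivalent raw-pointer access: forces an ordinary load/store
       through a pointer register, same observable effect on the pin. */
    volatile uint8_t *portB = (volatile uint8_t *)0x25;
    *portB &= (uint8_t)~0x01;   /* voltage off again */

    for (;;) {}
}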
Short answer - hidden away in the headers included from Atmel is a collection of macros that create pointers to the register locations. If you want to see any of the source, as well as additional necessary headers like interrupt.h, they are in WinAVR-20100110/avr/include/
Here's a brief overview of the process:
Your Makefile defines the device to be used and passes it to the compiler, which in turn defines a matching preprocessor macro (__AVR_ATmega2560__ here):
DEVICE = atmega2560
...
-mmcu=$(DEVICE)
You then include avr/io.h, which automatically includes the necessary headers based on your device:
// In main source file
#include <avr/io.h>
// In io.h
#include <avr/sfr_defs.h>
// ...
#elif defined (__AVR_ATmega2560__)
# include <avr/iom2560.h>
// In sfr_defs.h
#define _MMIO_BYTE(mem_addr) (*(volatile uint8_t *)(mem_addr))
#define __SFR_OFFSET 0x20
#define _SFR_IO8(io_addr) _MMIO_BYTE((io_addr) + __SFR_OFFSET)
// In iom2560.h
#include <avr/iomxx0_1.h>
// Other device specific definitions
// In iomxx0_1.h
#define PINA _SFR_IO8(0X00)
// Other device family shared definitions
So if you unroll all of that, what you get is a dereferenced volatile pointer to the register address. Whenever you use PINA in your code, the preprocessor replaces it with the fully expanded macros:
PINA
_SFR_IO8(0X00)
_MMIO_BYTE((0X00) + __SFR_OFFSET)
(*(volatile uint8_t *)((0X00) + 0x20))
which says that PINA is the volatile 8-bit object at memory address 0x20 (a dereferenced pointer, so you can read and assign it directly). The internal chip architecture then maps that address to the appropriate peripheral register whenever it is accessed.
Different devices have different register addresses and offsets. If you want to define your own, you'll need to check the relevant datasheet. For most AVR chips, there is a section towards the end titled "Register Summary" that lists all of the register addresses and the names of the individual control bits. In my experience (for AVR, at least), the names of the registers and bits in the datasheet are exactly what is defined in the io.h headers.
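For instance, here is a hedged sketch of rolling your own definition, mirroring the _MMIO_BYTE pattern above. The MY_* names are mine, and the register chosen (UBRR0H at data-space address 0xC5, per the ATmega168/328 register summary) is just for illustration:

#include <stdint.h>

#define MY_MMIO8(addr)  (*(volatile uint8_t *)(addr))
#define MY_UBRR0H       MY_MMIO8(0xC5)  /* USART0 baud rate, high byte */

/* Usage: write the high byte of the baud-rate divisor. */
void set_baud_high(uint8_t v) { MY_UBRR0H = v; }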
Also notice the use of "uint8_t" rather than "char." It's common (and highly encouraged) to use the width-specific types from <stdint.h> to get signed/unsigned 8-, 16-, or 32-bit variables whenever appropriate. Since the AVR is 8-bit, any use of 16- or 32-bit (or float) variables will require multiple clock cycles for each operation. In this case, stdint.h will contain:
typedef unsigned char uint8_t;

Processor word size? 8-bit processor = 8-bit word?

I thought that if I have an 8-bit AVR microcontroller, the word size is 8 bits / 1 byte? But the datasheet states that most AVR processors have a 16-bit word, without saying whether this specific processor does. Weird to state something general in a specific datasheet.
And what is the "8-bit" or "32-bit" in "8-bit MCU" about, if not the word size?
If the word size is 2 bytes, then this is atomic in C, right:
U16 Position;
Position = 1000;
But if the word is 1 byte, should I disable interrupts when writing to this variable (an interrupt uses it)?
How slow is it to disable interrupts?
The traditional AVR families (i.e. ATtiny, ATmega, ATxmega - not the AVR32) are 8-bit MCUs working on 8-bit registers/accumulators, though there are a few 16-bit operations, such as when dealing with pointers through address pairs.
Unfortunately there is no universally accepted definition of what a "word" is. In this context I suspect the author is simply referring to a 16-bit value as a word, as opposed to an 8-bit byte or a 32-bit double-word.
So, no, you cannot count on a 16-bit variable being accessed atomically. Thankfully, some of the most important I/O registers where this matters, such as the timers, have internal latches to hide the fact, but you do need to be careful with RAM variables shared with interrupts.
Temporarily disabling interrupts is quite fast - a cycle each for the CLI/SEI instructions. One gotcha with certain compilers (ImageCraft comes to mind) is that using inline assembly like this in a function may disable optimizations, so the actual cost can be somewhat higher. Consider disabling only the contentious interrupt in question, to avoid this issue and to reduce latency.
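For RAM variables shared with an ISR, here is a minimal sketch using avr-libc's <util/atomic.h>, reusing the Position variable from the question:

#include <stdint.h>
#include <util/atomic.h>

volatile uint16_t Position;   /* shared with an interrupt handler */

void set_position(uint16_t v)
{
    /* A 16-bit store is two 8-bit writes on AVR, and an interrupt
       could fire between them. ATOMIC_BLOCK brackets the store with
       CLI and then restores the previous SREG interrupt flag. */
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        Position = v;
    }
}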
Beware that, unlike on some other MCUs, atomic bit access is normally restricted to a small subset of registers in the lowest I/O range - typically a few PORTs and the general-purpose I/O registers.

How to load the largest integer possible in one memory operation?

I'm building a small bytecode VM that will run on a variety of platforms including exotic embedded and microcontroller environments.
Each opcode in my VM is variable length (no more than 4 bytes, no less than 1 byte). When interpreting the opcodes, I want to keep a tiny "cache" of the current opcode. However, because it will run on many different platforms, this is hard to do portably.
So, here are a few examples of the expected behavior:
On an 8-bit microcontroller with an 8-bit memory bus, I'd want it to load only 1 byte, because it would take multiple (slow) memory operations to load any more, and in theory it might take only 1 byte to execute the current opcode
On an 8086 (16-bit), I'd want to load 2 bytes, because loading only 1 byte would basically throw away useful data that has to be read later; but I don't want to load more than 2 bytes because that would take multiple operations
On a 32-bit ARM processor, I'd want to load 4 bytes, because otherwise we're either throwing away data that might have to be read again, or doing multiple operations
I would say this could be handled easily by just assuming that unsigned int is good enough, but on 8-bit AVR microcontrollers int is defined as 16-bit while the memory data bus is only 8 bits wide, so 2 memory load operations would be required.
Anyway, current ideas:
Using uint_fast16_t seems to work as expected on most platforms (32 bits on ARM, 16 bits on 8086, 64 bits on x86-64). However, it clearly still leaves out AVR and other 8-bit microcontrollers.
I thought uint_fast8_t might work, but on most platforms it appears to be defined as unsigned char, which is definitely not optimal.
Also, there is another problem to solve: unaligned memory access. On x86 this probably isn't a problem (in theory it takes 2 memory operations, but it's probably hidden away in hardware); on ARM, however, I know that an unaligned 32-bit access can cost up to 3 times as much as an aligned 32-bit load. If the address is unaligned, I want to do the aligned load and keep as much of the data as possible, but at all costs avoid another memory operation.
Is there a way to do this with some magical preprocessor includes or the like, or does it just require manually defining the optimum cache size before compiling for each platform?
There is no automatic way to do this using the types or information provided by standard C (in headers such as <stdint.h> and so on).
Problems such as this are sometimes handled by executing and measuring sample code on the target platform, and using the results to determine what code to use in practice. The samples might be executed during a build and the results baked into the final code, or might be executed at the start of each program run and used for the duration of execution.
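A lower-tech fallback is to let each port pick its own fetch type via the preprocessor. A sketch under the assumption that per-target defaults are acceptable - the vm_cell_t name is mine, and only __AVR__/__MSP430__ are tested for here as examples:

#include <stdint.h>

#if defined(__AVR__)
typedef uint8_t vm_cell_t;        /* 8-bit data bus: fetch one byte at a time */
#elif defined(__MSP430__)
typedef uint16_t vm_cell_t;       /* 16-bit bus: two opcode bytes per load */
#else
typedef uint_fast32_t vm_cell_t;  /* 32-bit and wider targets */
#endif

/* The dispatch loop would then refill its opcode cache one
   vm_cell_t per memory operation. */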

Dealing with reserved register bits of an ARM chip

I am working with the registers of an ARM Cortex-M3. In the documentation, some of the bits may be "reserved". It is unclear to me how I should deal with these reserved bits when writing to the registers.
Are these reserved bits even writeable? Should I be cautious to not touch them? Will something bad happen if I touch them?
This is a classic embedded-world problem: what to do with reserved bits? First, you should NOT write arbitrary values to them, lest your code become unportable. What happens when the architecture assigns a new meaning to the reserved bits in the future? Your code will break. So the best mantra when dealing with registers that have reserved bits is read-modify-write: read the register contents, modify only the bits you want, and then write the value back so that the reserved bits are untouched (untouched does not mean we don't write to them, but that we write back whatever was already there).
For example, say there is a register in which only the LSBit has meaning and all the others are reserved. I would do this:
ldr r0, =memoryAddress   @ address of the register
ldr r1, [r0]             @ read the current contents
orr r1, r1, #1           @ set only the documented LSB
str r1, [r0]             @ write back; reserved bits keep their old values
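The same pattern in C, as a sketch - the register address is a made-up placeholder, and the volatile access keeps the compiler from merging or reordering the read and write:

#include <stdint.h>

#define REG (*(volatile uint32_t *)0x40001000u)  /* hypothetical address */

void set_lsb(void)
{
    uint32_t v = REG;  /* read                        */
    v |= 1u;           /* modify only the defined bit */
    REG = v;           /* write back                  */
}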
If there is no other clue in the documentation, write a zero; you cannot avoid writing to the few reserved bits scattered around a 32-bit register.
Read-modify-write should work most of the time; however, there are cases where reserved bits are undefined on read but must be written with a specific value. See this post from the LPC2000 group (the whole thread is quite interesting too). So always check the docs carefully, and also any errata that's available. When in doubt, or when the docs are unclear, don't hesitate to write to the manufacturer.
Ideally you should read-modify-write, but there is no guarantee of success; when you change to a newer chip with different bits, you are changing your code anyway. I have seen vendors where writing zeros to the reserved bits failed when they revved the chip, and the code had to be touched. So there are no guarantees. The biggest clue is when, in the vendor's example code, you see one register or set of registers that is clearly read-modify-write while others are clearly just writes. That could be different developers writing different sections of the example - or there is a register in that peripheral that is sensitive, has an undocumented bit, and needs the read-modify-write.
On the chips that I work on, I make sure that bits which are undocumented (to the customer) but not unused are marked in some way to stand out from the other unused bits. We normally mark unused/reserved bits as zero, and these other bits get a name and a "must write this value" marking. Not all vendors do this.
The bottom line is that there is no guarantee; assume all documentation and example programs have bugs, and you have to hack your way through to figure out what is right and what is wrong. No matter what path you take (read-modify-write, write zeros, etc.) you will be wrong from time to time and have to re-do the code to match a hardware change. I strongly suggest that if a vendor has a chip ID of some sort, your software read that ID, and if it is an ID you have not tested your code against, declare a failure and don't program that part. In production testing, long before a customer sees the product, the part change will then be detected, and software will be involved in understanding the reason for the part change - the resolution being either that the alternate part is incompatible and rejected, or that the software changes, etc.
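A sketch of that chip-ID gate, with everything (register address, expected values) as placeholder assumptions rather than any real part's map:

#include <stdint.h>

#define CHIP_ID_REG (*(volatile uint32_t *)0x400E0740u)  /* hypothetical */

int chip_supported(void)
{
    switch (CHIP_ID_REG) {
    case 0x285B0A60u:   /* silicon revision A - tested  */
    case 0x285B0A61u:   /* silicon revision B - tested  */
        return 1;
    default:
        return 0;       /* unknown silicon: fail loudly */
    }
}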
Reserved most of the time means that the bits aren't used in this chip, but they might be used on future devices (another product line). Most chip manufacturers produce one peripheral design and use it for all their chips; this way it's mostly copy-paste work and there is less chance of errors. Most of the time it doesn't matter if you write to reserved bits in peripheral registers, because there isn't any logic attached to them.
It is possible that if you write something to them, it won't be stored, and the next time you read the register/bits they will seem unchanged.

Why does ARM have 16 registers?

Why does ARM have only 16 registers? Is that the ideal number?
Does having more registers also increase processing time/power?
As the number of general-purpose registers becomes smaller, you need to start using the stack for variables. Using the stack requires more instructions, so code size increases. Using the stack also increases the number of memory accesses, which hurts both performance and power usage. The trade-off is that to address more registers you need more bits in your instruction, and you need more room on the chip for the register file, which increases power requirements. You can see how differing register counts affect code size and the frequency of load/store instructions by compiling the same set of code with different numbers of registers. The results of that type of exercise can be seen in Table 1 of this paper:
Extendable Instruction Set Computing
Register Count    Program Size    Load/Store Frequency
27                100.00          27.90%
16                101.62          30.22%
8                 114.76          44.45%
(They used 27 as a base because that is the number of GPRs available on a MIPS processor)
As you can see, there is only a marginal increase in both program size and the number of load/stores required as you drop the register count from 27 down to 16. The real penalties don't kick in until you drop to 8 registers. I suspect the ARM designers felt that 16 registers was a kind of sweet spot when looking for the best performance per watt.
To choose one of 16 registers you need 4 bits, so it could simply be that this is the best match for the opcode encoding (machine instructions); otherwise you would have to introduce a more complex instruction set, which would lead to bigger code and imply additional costs (execution time).
Wikipedia says ARM has a "fixed instruction width of 32 bits to ease decoding and pipelining",
so it is a reasonable trade-off.
32-bit ARM has 16 registers because it only uses 4 bits for encoding the register, not because 16 is the ideal number. Likewise, x86 has only 8 registers because historically 3 bits were used to encode the register, so that some instructions would fit in a byte.
That's such a limited number that both x86 and ARM, when going 64-bit, increased the count to 16 and 32 registers respectively. The old ARM instruction encoding had no spare bits left for larger register numbers, so a trade-off had to be made: dropping the ability to execute almost every instruction conditionally and reusing the 4-bit condition field for the new features (that's an oversimplification - in reality the encoding is new - but you do need 3 more bits for the wider register fields).
Back in the 80's (IIRC) an academic paper was published that examined a number of different workloads, comparing expected performance benefits of different numbers of registers. This was at a time when RISC processors were transitioning from academic ideas to mainstream hardware, and it was important to decide what was optimal. CPUs were already pulling ahead of memory in speed, and RISC was making this worse by limiting addressing modes and having separate load and store instructions. Having more registers meant you could "cache" more data for immediate access and therefore access main memory less.
Considering only powers of two, it was found that 32 registers was optimal, although 16 wasn't terribly far behind.
ARM is unusual in that almost every instruction can carry a conditional execution code, avoiding tests & branches. Don't forget, many 32-register machines fix R0 to 0, so conditional tests are done by comparing against R0. I know this from experience. 20 years ago I had to program a 'Mode 7' (in SNES terminology) floor effect. The CPUs were the SH2 for the 32X (or rather two of them), the MIPS R3000 (PlayStation) and the 3DO (ARM); the inner loops of the code were 19, 15 & 11 instructions respectively. If the 3DO had been running at the same speed as the other two, it would have been twice as fast. As it was, it was just a bit slower.
