Assigning a 2 byte variable to a 3 byte register? - c

My watchdog timer register has a default value of 0x0fffff and I want to write a 2-byte variable (u2 compare) to it. What happens when I simply assign the value like this:
wdt_register = compare;
What happens to the most significant byte of the register?
Register definition: it's a 3-byte register made up of the 8-bit registers H, M and L. The 4 most significant bits of H are not used, so it's actually a 20-bit register. The datasheet names all of them together as WDTCR_20.
My question is what happens when I assign a value to the register using this line (just an example of a 2-byte value written to a 3-byte register):
WDTCR_20 = 0x1234;

Your WDT is a so-called special function register (SFR). In hardware it may occupy three bytes, or four bytes of which some bits are fixed, read-only, or unused. How your compiler performs the write is itself implementation-dependent, particularly if the SFR is declared in a way that makes the compiler emit SFR-specific write instructions.
This effectively makes the result of the assignment implementation-dependent: the high eight bits might be discarded, might set some other microarchitectural flags, or might cause a trap/crash if they aren't set to a specific value (likely all zeros). It depends on the processor's datasheet; since you didn't mention a processor or toolchain, we can't say exactly.
For example, the AVR-based atmega328p datasheet shows an example of such a register:
In this case, the one-byte register is actually only three bits, effectively (bits 7..3 are fixed to zero on read and ignored on write, and could very well have no physical flip-flop or SRAM cell associated with them).
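As far as the C language itself is concerned, a 16-bit unsigned value is zero-extended when it is converted to a wider unsigned type, so the bits destined for the high byte are all zero; what the hardware then does with them is the device-specific part. A minimal sketch of writing the three bytes explicitly, assuming a hypothetical `wdt20_t` struct that stands in for the H, M and L registers (the real names, addresses and required write order come from your device header and datasheet):

```c
#include <stdint.h>

/* Hypothetical stand-in for the three 8-bit parts of a 20-bit register.
   On real hardware these would be SFRs declared in the vendor header. */
typedef struct {
    volatile uint8_t h;  /* bits 19..16 in the low nibble, high nibble unused */
    volatile uint8_t m;  /* bits 15..8 */
    volatile uint8_t l;  /* bits 7..0 */
} wdt20_t;

/* Write a 16-bit compare value into the 20-bit register.
   The value is zero-extended first, so the H part receives 0. */
static void wdt20_write(wdt20_t *reg, uint16_t compare)
{
    uint32_t wide = compare;                   /* 0x1234 becomes 0x00001234 */
    reg->h = (uint8_t)((wide >> 16) & 0x0Fu);  /* always 0 for a 16-bit input */
    reg->m = (uint8_t)(wide >> 8);
    reg->l = (uint8_t)wide;
}
```

Writing the parts one by one makes the intended value of the high nibble explicit instead of leaving it to the compiler's handling of the SFR.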

Related

How are 14-bit memory addresses accessed?

I know that every two digits in a hexadecimal address represent a byte, meaning that 0xFFFF is a 16-bit address and can address 65,536 bytes of memory if the system is byte-addressable. However, for a number of bits that is not a multiple of 8 (such as a 14-bit address), how can the operating system represent these addresses?
0xFFFF -> 16 bit (e.g. Virtual memory address)
14 bit -> 0xFFF ? (Physical memory address)
One might say that the system cannot be byte-addressable if its address width is not a multiple of 8. Then what will it be? I want the 16-bit virtual address to be byte-addressable so we can easily access the data stored at that address, and I want to represent it in C code, but I have trouble representing the 14-bit physical memory addresses.
#define MAX_VIRT_ADDR ((0xffff) - 1) /* 65,536 */
#define MAX_PHYS_ADDR ((?) - 1) /* Max of 14bit physical memory space */
The addresses are still just numbers. On a system with a 14-bit address bus, they go from 00000000000000 (binary) up to 11111111111111 (binary).
We can also write those numbers in decimal: they go from 0 up to 16383.
Or we can write them in hexadecimal: they go from 0 up to 3FFF (or 0000 up to 3FFF).
Or in octal: they go from 0 up to 37777 (or 00000 up to 37777).
Or in any other system we like.
Typically a system based on 8-bit bytes will allow 2 bytes to be used to access a memory address. If the system has some kind of memory protection unit, and it's configured appropriately, then addresses above 3FFF may cause some kind of invalid address exception (i.e. a segfault). Otherwise, usually the extra bits are just ignored, so that no matter whether the program accesses "address" 0x0005, 0x4005, 0x8005, or 0xC005, the actual address sent to the address bus is 5 (binary: 00000000000101).
Your maximum "virtual" 16-bit address is 0xFFFF. Your maximum "physical" 14-bit address is 0x3FFF.
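A small sketch of those two limits and of the "extra bits are ignored" behaviour in C. The names `PHYS_ADDR_BITS` and `bus_address` are made up for illustration, and the function only models a bus that drops the upper two bits; it does not describe any particular chip.

```c
#include <stdint.h>

#define PHYS_ADDR_BITS 14u
#define MAX_VIRT_ADDR  0xFFFFu                        /* 65,535 */
#define MAX_PHYS_ADDR  ((1u << PHYS_ADDR_BITS) - 1u)  /* 0x3FFF = 16,383 */

/* Model of a 14-bit address bus that ignores address bits 14 and 15. */
static uint16_t bus_address(uint16_t addr)
{
    return (uint16_t)(addr & MAX_PHYS_ADDR);
}
```

With this model, 0x0005, 0x4005, 0x8005 and 0xC005 all end up as address 5 on the bus.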
There's no rule that says address sizes have to be powers of 2, or that they have to be the same size as the words being addressed. It's massively more convenient to do things that way, but not required.
The old Motorola 68K had 32-bit words but a 24-bit address bus - address values were stored in 32-bit words with the upper 8 bits left unused.
As for mapping your 16-bit "virtual" address space onto a 14-bit "physical" address space, treat the upper two bits in the virtual address as a page number, treat the lower 14 bits as the offset into the page, map them directly to "physical" addresses. Store in a 16-bit type like uint16_t, then use macros to extract page number and address like so:
#define PAGENO(vaddr) ((0xC000u & (vaddr)) >> 14) /* upper 2 bits: page number */
#define PHADDR(vaddr) (0x3FFFu & (vaddr)) /* lower 14 bits: offset into the page */
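The same split can be written as small functions, which avoids macro argument pitfalls and lets the compiler check types. This is only a sketch: `page_number`, `page_offset` and `make_vaddr` are illustrative names, and the layout (2-bit page number, 14-bit offset) is the one described above.

```c
#include <stdint.h>

/* Upper 2 bits select one of four 16 KiB pages. */
static inline unsigned page_number(uint16_t vaddr)
{
    return (vaddr & 0xC000u) >> 14;
}

/* Lower 14 bits are the offset within the page. */
static inline uint16_t page_offset(uint16_t vaddr)
{
    return (uint16_t)(vaddr & 0x3FFFu);
}

/* Recombine a page number and an offset into a 16-bit virtual address. */
static inline uint16_t make_vaddr(unsigned page, uint16_t offset)
{
    return (uint16_t)(((page & 0x3u) << 14) | (offset & 0x3FFFu));
}
```

For example, virtual address 0x8005 is page 2, offset 5.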
It's simple: your 14bit address is
struct {
    unsigned addr : 14;
};
You simply ignore 2 bits of the 16bit value.
Another way is to sign-extend the 14-bit value to 16 bits. That's what 64-bit systems like AMD64 do: they implement fewer than 64 virtual address bits (48 on most x86-64 CPUs, 57 with 5-level paging), and the address is sign-extended to 64 bits. ARM64 similarly requires the unused top bits to be all 0 or all 1. Addresses where the top bits aren't all 0 or all 1 are illegal.
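Scaled down to this question, the sign-extension idea might look like the sketch below. The function names are invented for illustration; "canonical" here means that bits 15..13 are all equal, mirroring the rule that the unused top bits must copy the highest implemented bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 14-bit address: copy bit 13 into bits 14 and 15. */
static uint16_t sign_extend14(uint16_t addr14)
{
    uint16_t low = (uint16_t)(addr14 & 0x3FFFu);
    return (low & 0x2000u) ? (uint16_t)(low | 0xC000u) : low;
}

/* A 16-bit value is a valid sign-extended 14-bit address
   only if bits 15..13 are all 0 or all 1. */
static bool is_canonical14(uint16_t addr16)
{
    uint16_t top = (uint16_t)(addr16 & 0xE000u);
    return top == 0x0000u || top == 0xE000u;
}
```

Under this scheme the valid addresses are 0x0000..0x1FFF and 0xE000..0xFFFF, and everything in between is rejected.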
Exotic systems where bytes have more than 8 bits still (as far as I know) use the same addressing on byte level. Only the size of a byte changes. Certain digital signal processors like this do exist in the real world, although those would not run on an OS but get coded as "bare metal". Traditionally, DSP programming is also most often done in assembler rather than C.
So for any system with 16 bit wide address bus, you'll have:
#define MAX_VIRT_ADDR 0xffffu /* 65,535 */
#define MAX_PHYS_ADDR 0xffffu /* 65,535 */
And the size of one byte is irrelevant. If you'd design the system in any other way, you'd probably lose the advantage of having a larger byte size.

Intrinsic to set value in array based on a BitMask

Is there an intrinsic that will set a single value at all the places in an input array where the corresponding position had a 1 bit in the provided BitMask?
10101010 is bitmask
value is 121
it will set positions 0,2,4,6 with value 121
With AVX512, yes. Masked stores are a first-class operation in AVX512.
Use the bitmask as an AVX512 mask for a vector store to an array, using _mm512_mask_storeu_epi8 (void* mem_addr, __mmask64 k, __m512i a) vmovdqu8. (AVX512BW. With AVX512F, you can only use 32 or 64-bit element size.)
#include <immintrin.h>
#include <stdint.h>
void set_value_in_selected_elements(char *array, uint64_t bitmask, uint8_t value) {
    __m512i broadcastv = _mm512_set1_epi8(value);
    // integer types are implicitly convertible to/from __mmask types
    // the compiler emits the KMOV instruction for you.
    _mm512_mask_storeu_epi8 (array, bitmask, broadcastv);
}
This compiles (with gcc7.3 -O3 -march=skylake-avx512) to:
vpbroadcastb zmm0, edx
kmovq k1, rsi
vmovdqu8 ZMMWORD PTR [rdi]{k1}, zmm0
vzeroupper
ret
If you want to write zeros in the elements where the bitmap was zero, either use a zero-masking move to create a constant from the mask and store that, or create a 0 / -1 vector using AVX512BW or DQ __m512i _mm512_movm_epi8(__mmask64 ). Other element sizes are available. But using a masked store makes it possible to safely use it when the array size isn't a multiple of the vector width, because the unmodified elements aren't read / rewritten or anything; they're truly untouched. (The CPU can take a slow microcode assist if any of the untouched elements would have faulted on a real store, though.)
Without AVX512, you still asked for "an intrinsic" (singular).
There's pdep, which you can use to expand a bitmap to a byte-map. See my AVX2 left-packing answer for an example of using _pdep_u64(mask, 0x0101010101010101); to unpack each bit in mask to a byte. This gives you 8 bytes in a uint64_t. In C, if you use a union between that and an array, then it gives you an array of 0 / 1 elements. (But of course indexing the array will require the compiler to emit shift instructions, if it hasn't spilled it somewhere first. You probably just want to memcpy the uint64_t into a permanent array.)
But in the more general case (larger bitmaps), or even with 8 elements when you want to blend in new values based on the bitmask, you should use multiple intrinsics to implement the inverse of pmovmskb, and use that to blend. (See the without pdep section below)
In general, if your array fits in 64 bits (e.g. an 8-element char array), you can use pdep. Or if it's an array of 4-bit nibbles, then you can do a 16-bit mask instead of 8.
Otherwise there's no single instruction, and thus no intrinsic. For larger bitmaps, you can process it in 8-bit chunks and store 8-byte chunks into the array.
If your array elements are wider than 8 bits (and you don't have AVX512), you should probably still expand bits to bytes with pdep, but then use [v]pmovzx to expand from bytes to dwords or whatever in a vector. e.g.
// only the low 8 bits of the input matter
__m256i bits_to_dwords(unsigned bitmap) {
    uint64_t mask_bytes = _pdep_u64(bitmap, 0x0101010101010101); // expand bits to bytes
    __m128i byte_vec = _mm_cvtsi64x_si128(mask_bytes);
    return _mm256_cvtepu8_epi32(byte_vec);
}
If you want to leave elements unmodified instead of setting them to zero where the bitmask had zeros, OR with the previous contents instead of assigning / storing.
This is rather inconvenient to express in C / C++ (compared to asm). To copy 8 bytes from a uint64_t into a char array, you can (and should) just use memcpy (to avoid any undefined behaviour because of pointer aliasing or misaligned uint64_t*). This will compile to a single 8-byte store with modern compilers.
But to OR them in, you'd either have to write a loop over the bytes of the uint64_t, or cast your char array to uint64_t*. This usually works fine, because char* can alias anything so reading the char array later doesn't have any strict-aliasing UB. But a misaligned uint64_t* can cause problems even on x86, if the compiler assumes that it is aligned when auto-vectorizing. See: Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?
Assigning a value other than 0 / 1
Use a multiply by 0xFF to turn the mask of 0/1 bytes into a 0 / -1 mask, and then AND that with a uint64_t that has your value broadcasted to all byte positions.
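A portable scalar sketch of that multiply-and-AND step, extended with a merge so that unselected bytes keep their old contents. Note that it does not use pdep: `bits_to_bytes` is a plain-C substitute (multiply, mask, add) that produces the same 0 / 1 bytes as `_pdep_u64(bits, 0x0101010101010101)`. The function names are illustrative, and the code assumes a little-endian host (as on x86) so that the lowest byte of the uint64_t corresponds to array[0].

```c
#include <stdint.h>
#include <string.h>

/* Expand the 8 bits of `bits` into 8 bytes of 0x00 / 0x01.
   Plain-C stand-in for _pdep_u64(bits, 0x0101010101010101). */
static uint64_t bits_to_bytes(uint8_t bits)
{
    uint64_t x = bits * 0x0101010101010101ULL;  /* copy the byte into all 8 lanes */
    x &= 0x8040201008040201ULL;                 /* lane i keeps only bit i */
    x += 0x7F7F7F7F7F7F7F7FULL;                 /* nonzero lane gets its top bit set */
    return (x >> 7) & 0x0101010101010101ULL;    /* move that top bit down to bit 0 */
}

/* Set array[i] = value wherever bit i of `bits` is 1; leave the rest unchanged.
   Assumes a little-endian host so that lane 0 maps to array[0]. */
static void set_selected8(uint8_t array[8], uint8_t bits, uint8_t value)
{
    uint64_t mask  = bits_to_bytes(bits) * 0xFFu;    /* 0x01 lanes become 0xFF */
    uint64_t bcast = value * 0x0101010101010101ULL;  /* value in every lane */
    uint64_t old;
    memcpy(&old, array, sizeof old);                 /* avoids aliasing/alignment UB */
    uint64_t merged = (old & ~mask) | (bcast & mask);
    memcpy(array, &merged, sizeof merged);
}
```

Dropping the `old & ~mask` term gives the variant that writes zeros where the bitmask had zeros.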
If you want to leave elements unmodified instead of setting them to zero or value=121, you should probably use SSE2 / SSE4 or AVX2 even if your array has byte elements. Load the old contents, then vpblendvb with set1(121), using the byte-mask as a control vector.
vpblendvb only uses the high bit of each byte, so your pdep constant can be 0x8080808080808080 to scatter the input bits to the high bit of each byte, instead of the low bit. (So you don't need to multiply by 0xFF to get an AND mask).
If your elements are dword or larger, you could use _mm256_maskstore_epi32. (Use pmovsx instead of zx to copy the sign bit when expanding the mask from bytes to dwords). This can be a perf win over a variable-blend + always read / re-write. See: Is it possible to use SIMD instruction for replace?
Without pdep
pdep is very slow on Ryzen, and even on Intel it's maybe not the best choice.
The alternative is to turn your bitmask into a vector mask:
is there an inverse instruction to the movemask instruction in intel avx2? and
How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?.
i.e. broadcast your bitmap to every position of a vector (or shuffle it so the right bit of the bitmap is in the corresponding byte), and use a SIMD AND to mask off the appropriate bit for that byte. Then use pcmpeqb/w/d against the AND-mask to find the elements that had their bit set.
You're probably going to want to load / blend / store if you don't want to store zeros where the bitmap was zero.
Use the compare-mask to blend on your value, e.g. with _mm_blendv_epi8 or the 256bit AVX2 version. You can handle bitmaps in 16-bit chunks, producing 16-byte vectors with just a pshufb to send bytes of it to the right elements.
It's not safe for multiple threads to do this at the same time on the same array even if their bitmaps don't intersect, unless you use masked stores, though.

How to increment number with different endianess?

I am doing some microcontroller programming where I have to load the firmware of a DSP chip at run time. The DSP chip requires that the register addresses be written in a different endianness, so the address 1024 becomes 0x04, 0x00. I have the address in a 2-element uint8_t array with the most significant byte at position 0 and the least significant byte at position 1. However, I need to run through a loop where I increment each register address by one every iteration. The microcontroller has a different endianness, so I can't simply cast the array to uint16_t* and increment.
How would I go about incrementing the address?
I would use a normal int counter, and then convert to the correct endianness before sending it to the DSP. You can use macros in the byteorder or endian family. This will be easier to debug and more portable.
What is it you are looking for from us?
1) swap before sending
2) increment the lower byte, add the carry to the upper byte (asm makes this easy)
3) endian swap and increment (x=(upper<<8)|lower; x++)
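A sketch of options 1 and 3 using only shifts, so it behaves the same regardless of the microcontroller's own byte order. The function names are made up for illustration; the byte layout (most significant byte first) is the one described in the question.

```c
#include <stdint.h>

/* Store a 16-bit address most significant byte first,
   independent of the host's endianness. */
static void addr_to_be(uint16_t addr, uint8_t out[2])
{
    out[0] = (uint8_t)(addr >> 8);
    out[1] = (uint8_t)(addr & 0xFFu);
}

/* Rebuild the native integer from the two big-endian bytes. */
static uint16_t addr_from_be(const uint8_t in[2])
{
    return (uint16_t)(((unsigned)in[0] << 8) | in[1]);
}

/* Increment an address held as two big-endian bytes, carry included. */
static void increment_be(uint8_t bytes[2])
{
    addr_to_be((uint16_t)(addr_from_be(bytes) + 1u), bytes);
}
```

In a loop it is usually simpler to keep a plain uint16_t counter and call `addr_to_be` just before each transfer to the DSP.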

Is there a naming convention for n*3 registers which are semantically connected?

I know the naming convention which says if there are n*2 registers or variables which are semantically connected you should name them like following:
REGH REGL
In the case of 2*2 registers it would be:
REGHH REGHL REGLH REGLL
The last two letters stand for high-high, high-low, low-high and low-low. Is there any convention which declares the same thing for 3 registers? Like:
REGH REGM REGL
In this case the last letters stand for high, middle and low. 6 byte would look like this:
REGHH REGHM REGHL REGLH REGLM REGLL
I hope you understand what I mean. Is there any convention for this case?
The Atmel AVR Microcontroller, 1st ed. [P. 173; 6.10.1]
For a register larger than 16 bits, the bytes are numbered from the least significant byte. For example, the 32-bit ADC calibration register is named CAL. The four bytes are named CAL0, CAL1, CAL2, CAL3 (from the least to the most significant byte).
So in an 8-bit system we shouldn't even do:
REGHH REGHL REGLH REGLL
but:
REG3 REG2 REG1 REG0

Alternative of FP_SEG and FP_OFF for converting pointer to linear address

On a 16-bit DOS machine there are macros like FP_SEG and FP_OFF for converting a pointer to a linear address, but since these no longer exist in 32-bit compilers, what other functions can do the same on a 32-bit machine?
They're luckily not needed as 32-bit mode is unsegmented and hence the addresses are always linear (a simplification, but let's keep it simple).
EDIT: The first version was confusing let's try again.
In 16-bit segmented mode (I'm exclusively referring to legacy DOS programs here; it'll probably be similar for other 16-bit x86 OSes) addresses are given in a 32-bit format consisting of a 16-bit segment and a 16-bit offset. These are combined to form a 20-bit linear address by multiplying the segment by 16 = 0x10 (equivalent to shifting left by 4) and adding the offset, i.e. linear = segment*0x10 + offset. (This is where the infamous 640K barrier comes from: 2**20 = 1MB, of which 384K are reserved for the system and BIOS, leaving ~640K for user programs.)
This means that 2**12 segment:address type pointers will refer to the same linear address, so in general there is no way to obtain the 32-bit value used to form the linear address.
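A short sketch of that calculation, showing several different segment:offset pairs landing on the same linear address. The function name is illustrative, and the final mask models an 8086-style wrap at 1 MB, which is an assumption about the target rather than something stated above.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
   The result is masked to 20 bits, modelling a wrap at 1 MB. */
static uint32_t seg_off_to_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For instance 0xA000:0x0123, 0xA012:0x0003 and 0x9013:0xFFF3 all translate to linear address 0xA0123, which is why the original segment and offset cannot be recovered from the linear address alone.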
In old DOS programs that used far - segmented - pointers (as opposed to near pointers, which only contained an offset and implicitly used ds segment register) they were usually treated as 32-bit unsigned integer values where the 16 most significant bits were the segment and the 16 least significant bits the offset. This gives the following macro definitions for FP_SEG and FP_OFF (using the types from stdint.h):
#define FP_SEG(x) ((uint16_t)((uint32_t)(x) >> 16)) /* grab 16 most significant bits */
#define FP_OFF(x) ((uint16_t)((uint32_t)(x))) /* grab 16 least significant bits */
To convert a 20-bit linear address to a segmented address you have many options (2**12). One way could be:
#define LIN_SEG(x) ((uint16_t)(((uint32_t)(x) & 0xf0000) >> 4)) /* top 4 bits become the segment */
#define LIN_OFF(x) ((uint16_t)((uint32_t)(x))) /* low 16 bits become the offset */
Finally a quick example of how it all works together:
Segmented address: a = 0xA000:0x0123
As 32-bit far pointer b = 0xA0000123
20-bit linear address: c = 0xA0123
FP_SEG(b) == 0xA000
FP_OFF(b) == 0x0123
LIN_SEG(c) == 0xA000
LIN_OFF(c) == 0x0123
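Since many segmented forms are possible for one linear address, here is a sketch comparing two of them: the coarse split used by LIN_SEG / LIN_OFF above, and a "normalized" split that keeps the offset in 0..15. The type and function names are invented for illustration.

```c
#include <stdint.h>

typedef struct {
    uint16_t seg;
    uint16_t off;
} segoff_t;

/* Same split as LIN_SEG / LIN_OFF: the top 4 bits pick a 64 KB-aligned segment. */
static segoff_t split_coarse(uint32_t linear)
{
    segoff_t p = { (uint16_t)((linear & 0xF0000u) >> 4),
                   (uint16_t)(linear & 0xFFFFu) };
    return p;
}

/* Normalized split: largest possible segment, offset in 0..15. */
static segoff_t split_normalized(uint32_t linear)
{
    segoff_t p = { (uint16_t)((linear & 0xFFFFFu) >> 4),
                   (uint16_t)(linear & 0xFu) };
    return p;
}

/* Recombine: linear = segment * 16 + offset. */
static uint32_t join(segoff_t p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}
```

For c = 0xA0123 the coarse split gives 0xA000:0x0123 and the normalized split gives 0xA012:0x0003; both join back to 0xA0123.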
