C bitfield element with non-contiguous layout

I'm looking for input on the most elegant interface to put around a memory-mapped register interface where the target object is split in the register:
union __attribute__ ((__packed__)) epsr_t {
    uint32_t storage;
    struct {
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;  // TOP HALF
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;  // BOTTOM HALF
        unsigned reserved2   : 5;
    } bits;
};
In this case, accessing the single bit T or any of the reserved fields works fine, but reading or writing the ICI_IT requires code more like:
union epsr_t epsr;
// Reading:
uint8_t ici_it = (epsr.bits.ICI_IT_2to7 << 2) | epsr.bits.ICI_IT_0to1;
// Writing:
epsr.bits.ICI_IT_2to7 = ici_it >> 2;
epsr.bits.ICI_IT_0to1 = ici_it & 0x3;
At this point I've lost a chunk of the simplicity / convenience that the bitfield abstraction is trying to provide. I considered the macro solution:
#define GET_ICI_IT(_e) (((_e).bits.ICI_IT_2to7 << 2) | (_e).bits.ICI_IT_0to1)
#define SET_ICI_IT(_e, _i) do { \
    (_e).bits.ICI_IT_2to7 = (_i) >> 2; \
    (_e).bits.ICI_IT_0to1 = (_i) & 0x3; \
} while (0)
But I'm not a huge fan of macros like this as a general rule, I hate chasing them down when I'm reading someone else's code, and far be it from me to inflict such misery on others. I was hoping there was a creative trick involving structs / unions / what-have-you to hide the split nature of this object more elegantly (ideally as a simple member of an object).

I don't think there's ever a 'nice' way, and actually I wouldn't rely on bitfields... Sometimes it's better to just have a bunch of exhaustive macros to do everything you'd want to do, document them well, and then rely on them having encapsulated your problem...
#define ICI_IT_HI_SHIFT 14
#define ICI_IT_HI_MASK  0xfc
#define ICI_IT_LO_SHIFT 5
#define ICI_IT_LO_MASK  0x03
// Bits containing the ICI_IT value split in the 32-bit EPSR
#define ICI_IT_PACKED_MASK ((ICI_IT_HI_MASK << ICI_IT_HI_SHIFT) | \
                            (ICI_IT_LO_MASK << ICI_IT_LO_SHIFT))
// Packs a single 8-bit ICI_IT value x into a 32-bit EPSR e
#define PACK_ICI_IT(e,x) (((e) & ~ICI_IT_PACKED_MASK) | \
                          (((x) & ICI_IT_HI_MASK) << ICI_IT_HI_SHIFT) | \
                          (((x) & ICI_IT_LO_MASK) << ICI_IT_LO_SHIFT))
// Unpacks a split 8-bit ICI_IT value from a 32-bit EPSR e
#define UNPACK_ICI_IT(e) ((((e) >> ICI_IT_HI_SHIFT) & ICI_IT_HI_MASK) | \
                          (((e) >> ICI_IT_LO_SHIFT) & ICI_IT_LO_MASK))
Note that I haven't put type casting and normal macro stuff in, for the sake of readability. Yes, I get the irony in mentioning readability...

If you dislike macros that much, just use an inline function; the macro solution you have is fine, though.
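For instance, a pair of static inline accessors over the question's epsr_t might look like this (a sketch; the function names are my own, and the packed attribute is omitted for brevity):

```c
#include <stdint.h>

union epsr_t {
    uint32_t storage;
    struct {
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;
        unsigned reserved2   : 5;
    } bits;
};

/* Same job as GET_ICI_IT / SET_ICI_IT, but type-checked and debuggable. */
static inline uint8_t get_ici_it(const union epsr_t *e)
{
    return (uint8_t)((e->bits.ICI_IT_2to7 << 2) | e->bits.ICI_IT_0to1);
}

static inline void set_ici_it(union epsr_t *e, uint8_t v)
{
    e->bits.ICI_IT_2to7 = v >> 2;    /* top six bits of the 8-bit value */
    e->bits.ICI_IT_0to1 = v & 0x3u;  /* bottom two bits */
}
```

With any optimizing compiler these inline to the same code as the macros, and a debugger can still step into them.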

Does your compiler support anonymous structs/unions?
I find them an elegant solution that gets rid of your .bits part. They are not C99 compliant, but most compilers do support them, and the feature became standard in C11.
See also this question: Anonymous union within struct not in c99?.
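To sketch what that looks like here (C11, or a common extension on older compilers; note it is the inner struct that goes anonymous, so its members hoist up a level):

```c
#include <stdint.h>

union epsr_t {
    uint32_t storage;
    struct {                      /* anonymous: members accessed directly */
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;
        unsigned reserved2   : 5;
    };                            /* no .bits member name */
};
```

Now `epsr.T = 1;` works instead of `epsr.bits.T = 1;`.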

Related

Optimized code for big to little endian conversion

In an interview, I was asked to implement big_to_little_endian() as a macro. I implemented it using shift operators, but the interviewer wanted me to optimize it further. I could not do it. Later I googled and searched but could not find an answer. Can someone help me understand how to optimize this code further?
#define be_to_le(x) (((x) >> 24) | (((x) & 0x00FF0000) >> 8) | (((x) & 0x0000FF00) << 8) | ((x) << 24))
He might have been referring to using a 16-bit operation to swap the top two halfwords, then 8-bit operations to swap the bytes within them -- it saves a couple of instructions. This is easiest done through a union, though C technically doesn't like it (many compilers will accept it), and it is still compiler-dependent, since you are hoping the compiler optimizes a couple of things out:
union dword {
    unsigned int i;
    struct {
        unsigned short s0, s1;
    } s;
    struct {
        unsigned char c0, c1, c2, c3;
    } c;
};

union dword in, temp, out;
in.i = x;
temp.s.s0 = in.s.s1;   /* swap the two 16-bit halves */
temp.s.s1 = in.s.s0;
out.c.c0 = temp.c.c1;  /* then swap the bytes within each half */
out.c.c1 = temp.c.c0;
out.c.c2 = temp.c.c3;
out.c.c3 = temp.c.c2;
You get the idea -- the type punning through the union is the part C technically frowns on, and I still don't think the compiler will emit what I'm hoping it will.
Or you can save an op, but introduce a data dependency, so it probably runs slower:
temp = (x << 16) | (x >> 16);
out = ((0xff00ff00 & temp) >> 8) | ((0x00ff00ff & temp) << 8);
Best is just use the compiler intrinsic since it maps to a single bswap instruction.
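On GCC and Clang, for example, that intrinsic is __builtin_bswap32 (MSVC has _byteswap_ulong):

```c
#include <stdint.h>

static inline uint32_t be_to_le32(uint32_t x)
{
    return __builtin_bswap32(x);  /* compiles to a single bswap/rev */
}
```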

How do I implement a bitset of k bits in C? [duplicate]

I have been using the Bitset class in Java and I would like to do something similar in C. I suppose I would have to do it manually, as with most stuff in C. What would be an efficient way to implement it?
byte bitset[]
maybe
bool bitset[]
?
CCAN has a bitset implementation you can use: http://ccan.ozlabs.org/info/jbitset.html
But if you do end up implementing it yourself (for instance if you don't like the dependencies on that package), you should use an array of ints and use the native size of the computer architecture:
#define WORD_BITS (8 * sizeof(unsigned int))

unsigned int *bitarray = calloc((size + WORD_BITS - 1) / WORD_BITS,
                                sizeof(unsigned int));

static inline void setIndex(unsigned int *bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] |= 1u << (idx % WORD_BITS);
}
Don't use a specific size (e.g. with uint64 or uint32), let the computer use what it wants to use and adapt to that using sizeof.
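For completeness, the matching clear and test helpers in the same style (the names are mine; WORD_BITS and setIndex are restated so the snippet stands alone):

```c
#include <stddef.h>  /* size_t */

#define WORD_BITS (8 * sizeof(unsigned int))

static inline void setIndex(unsigned int *bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] |= 1u << (idx % WORD_BITS);
}

static inline void clearIndex(unsigned int *bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] &= ~(1u << (idx % WORD_BITS));
}

static inline int testIndex(const unsigned int *bitarray, size_t idx) {
    return (bitarray[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1u;
}
```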
Nobody mentioned what the C FAQ recommends, which is a bunch of good-old-macros:
#include <limits.h> /* for CHAR_BIT */
#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a, b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)
(via http://c-faq.com/misc/bitsets.html)
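Used like this (47 and 23 are arbitrary example numbers; the macros are repeated so the snippet compiles on its own):

```c
#include <limits.h>  /* CHAR_BIT */
#include <string.h>  /* memset */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a, b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)

/* Declare a 47-bit set, set bit 23, then test it. */
static int bit23_roundtrip(void)
{
    char bitarray[BITNSLOTS(47)];
    memset(bitarray, 0, sizeof bitarray);
    BITSET(bitarray, 23);
    return BITTEST(bitarray, 23) != 0;
}
```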
Well, byte bitset[] seems a little misleading, no?
Use bit fields in a struct and then you can maintain a collection of these types (or use them otherwise as you see fit)
struct packed_struct {
    unsigned int b1:1;
    unsigned int b2:1;
    unsigned int b3:1;
    unsigned int b4:1;
    /* etc. */
} packed;
I recommend my BITSCAN C++ library (version 1.0 has just been released). BITSCAN is specifically oriented for fast bitscan operations. I have used it to implement NP-Hard combinatorial problems involving simple undirected graphs, such as maximum clique (see BBMC algorithm, for a leading exact solver).
A comparison between BITSCAN and standard solutions STL bitset and BOOST dynamic_bitset is available here:
http://blog.biicode.com/bitscan-efficiency-at-glance/
You can give my PackedArray code a try with a bitsPerItem of 1.
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate, say, a uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
| b0 | b1 | b2 |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
As usual you need to first decide what sort of operations you need to perform on your bitset. Perhaps some subset of what Java defines? After that you can decide how best to implement it. You can certainly look at the source for BitSet.java in OpenJDK for ideas.
Make it an array of uint64_t.

Casting troubles when using bit-banding macros with a pre-cast address on Cortex-M3

TL;DR:
Why isn't (unsigned long)(0x400253FC) equivalent to (unsigned long)((*((volatile unsigned long *)0x400253FC)))?
How can I make a macro which works with the former work with the latter?
Background Information
Environment
I'm working with an ARM Cortex-M3 processor, the LM3S6965 by TI, with their StellarisWare (free download, export controlled) definitions. I'm using gcc version 4.6.1 (Sourcery CodeBench Lite 2011.09-69). Stellaris provides definitions for some 5,000 registers and memory addresses in "inc/lm3s6965.h", and I really don't want to redo all of those. However, they seem to be incompatible with a macro I want to write.
Bit Banding
On the ARM Cortex-M3, a portion of memory is aliased with one 32-bit word per bit of the peripheral and RAM memory space. Setting the memory at address 0x42000000 to 0x00000001 will set the first bit of the memory at address 0x40000000 to 1, but not affect the rest of the word. To change bit 2, change the word at 0x42000004 to 1. That's a neat feature, and extremely useful. According to the ARM Technical Reference Manual, the algorithm to compute the address is:
bit_word_offset = (byte_offset × 32) + (bit_number × 4)
bit_word_addr = bit_band_base + bit_word_offset
where:
bit_word_offset is the position of the target bit in the bit-band memory region.
bit_word_addr is the address of the word in the alias memory region that maps to the
targeted bit.
bit_band_base is the starting address of the alias region.
byte_offset is the number of the byte in the bit-band region that contains the targeted bit.
bit_number is the bit position, 0 to 7, of the targeted bit
Implementation of Bit Banding
The "inc/hw_types.h" file includes the following macro which implements this algorithm. To be clear, it implements it for a word-based model which accepts 4-byte-aligned words and 0-31-bit offsets, but the resulting address is equivalent:
#define HWREGBITW(x, b) \
        HWREG(((unsigned long)(x) & 0xF0000000) | 0x02000000 | \
              (((unsigned long)(x) & 0x000FFFFF) << 5) | ((b) << 2))
This algorithm takes the base (which is either in SRAM at 0x20000000 or in the peripheral memory space at 0x40000000) and ORs it with 0x02000000, adding the bit-band base offset. Then, it multiplies the offset from the base by 32 (equivalent to a five-position left shift) and adds the bit number.
The referenced HWREG simply performs the requisite cast for writing to a given location in memory:
#define HWREG(x) \
(*((volatile unsigned long *)(x)))
This works quite nicely with assignments like
HWREGBITW(0x400253FC, 0) = 1;
where 0x400253FC is a magic number for a memory-mapped peripheral and I want to set bit 0 of this peripheral to 1. The above code computes (at compile-time, of course) the bit offset and sets that word to 1.
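Worked through by hand for this example (the address arithmetic only, wrapped in a hypothetical helper so it can be checked on a host machine; nothing is dereferenced):

```c
/* HWREGBITW's address computation for x = 0x400253FC, b = 0:
   (x & 0xF0000000)          -> 0x40000000    peripheral region base
   | 0x02000000              -> 0x42000000    bit-band alias base
   | ((x & 0x000FFFFF) << 5) -> | 0x004A7F80  byte offset * 32
   | (0 << 2)                -> | 0           bit number * 4
   = 0x424A7F80 */
static unsigned long bitband_alias(unsigned long x, unsigned long b)
{
    return (x & 0xF0000000UL) | 0x02000000UL |
           ((x & 0x000FFFFFUL) << 5) | (b << 2);
}
```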
What doesn't work
Unfortunately, the aforementioned definitions in "inc/lm3s6965.h" already perform the cast done by HWREG. I want to avoid magic numbers and instead use provided definitions like
#define GPIO_PORTF_DATA_R (*((volatile unsigned long *)0x400253FC))
An attempt to paste this into HWREGBITW causes the macro to no longer work, as the cast interferes:
HWREGBITW(GPIO_PORTF_DATA_R, 0) = 1;
The preprocessor generates the following mess (indentation added):
(*((volatile unsigned long *)
((((unsigned long)((*((volatile unsigned long *)0x400253FC)))) & 0xF0000000)
| 0x02000000 |
((((unsigned long)((*((volatile unsigned long *)0x400253FC)))) & 0x000FFFFF) << 5)
| ((0) << 2))
)) = 1;
Note the two instances of
(((unsigned long)((*((volatile unsigned long *)0x400253FC)))))
I believe that these extra casts are what is causing my process to fail. The following result of preprocessing HWREGBITW(0x400253FC, 0) = 1; does work, supporting my assertion:
(*((volatile unsigned long *)
((((unsigned long)(0x400253FC)) & 0xF0000000)
| 0x02000000 |
((((unsigned long)(0x400253FC)) & 0x000FFFFF) << 5)
| ((0) << 2))
)) = 1;
The (type) cast operator has right-to-left precedence, so the last cast should apply and an unsigned long used for the bitwise arithmetic (which should then work correctly). There's nothing implicit anywhere, no float to pointer conversions, no precision/range changes...the left-most cast should simply nullify the casts to the right.
My question (finally...)
Why isn't (unsigned long)(0x400253FC) equivalent to (unsigned long)((*((volatile unsigned long *)0x400253FC)))?
How can I make the existing HWREGBITW macro work? Or, how can a macro be written to do the same task but not fail when given an argument with a pre-existing cast?
1- Why isn't (unsigned long)(0x400253FC) equivalent to (unsigned long)((*((volatile unsigned long *)0x400253FC)))?
The former is an integer constant with value 0x400253FC, while the latter is the unsigned long value stored at (memory-mapped peripheral) address 0x400253FC. The dereference reads the register's contents, so the macro ends up doing its address arithmetic on the value read, not on the address itself.
2- How can I make the existing HWREGBITW macro work? Or, how can a macro be written to do the same task but not fail when given an argument with a pre-existing cast?
Use HWREGBITW(&GPIO_PORTF_DATA_R, 0) = 1; instead.
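The & works because taking the address of a dereference cancels it: &*(volatile unsigned long *)0x400253FC is just the pointer again, so the macro's arithmetic once more sees a plain address. A host-side sketch, with an ordinary variable standing in for the register (fake_reg is hypothetical, of course):

```c
static volatile unsigned long fake_reg;  /* stand-in for the real peripheral */
#define GPIO_PORTF_DATA_R (*((volatile unsigned long *)&fake_reg))

/* &GPIO_PORTF_DATA_R strips the dereference baked into the definition,
   yielding the register's address -- exactly what HWREGBITW needs. */
```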

Casting unsigned int to unsigned short int with bit operator

I would like to cast unsigned int (32-bit) A to unsigned short int (16-bit) B in the following way:
if A <= 2^16-1 then B=A
if A > 2^16-1 then B=2^16-1
In other words, cast A, but if it exceeds the maximum value representable in 16 bits, clamp B to that maximum.
How can this be achieved with bit operations or another non-branching method?
It will work for unsigned values:
b = -!!(a >> 16) | a;
or, something similar:
static inline unsigned short int fn(unsigned int a) {
    return (-(a >> 16) >> 16) | a;
}
Find minimum of two integers without branching:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
On some rare machines where branching is very expensive and no condition move instructions exist, the above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. (Typically, the obvious approach is best, though.)
Just to kick things off, here's a brain-dead benchmark. I'm trying to get a 50/50 mix of large and small values "at random":
#include <iostream>
#include <stdint.h>

int main() {
    uint32_t total = 0;
    uint32_t n = 27465;
    for (int i = 0; i < 1000*1000*500; ++i) {
        n *= 30029; // worst PRNG in the world
        uint32_t a = n & 0x1ffff;
#ifdef EMPTY
        uint16_t b = a; // gives the wrong total, of course.
#endif
#ifdef NORMAL
        uint16_t b = (a > 0xffff) ? 0xffff : a;
#endif
#ifdef RUSLIK
        uint16_t b = (-(a >> 16) >> 16) | a;
#endif
#ifdef BITHACK
        uint16_t b = a ^ ((0xffff ^ a) & -(0xffff < a));
#endif
        total += b;
    }
    std::cout << total << "\n";
}
On my compiler (gcc 4.3.4 on cygwin with -O3), NORMAL wins, followed by RUSLIK, then BITHACK, respectively 0.3, 0.5 and 0.9 seconds slower than the empty loop. Really this benchmark means nothing, I haven't even checked the emitted code to see whether the compiler's smart enough to outwit me somewhere. But I like ruslik's anyway.
1) With an intrinsic on a CPU that natively does this sort of conversion.
2) You're probably not going to like this, but:
c = a >> 16; /* previously declared as a short */
/* Saturate 'c' with 1s if there are any 1s, by first propagating
1s rightward, then leftward. */
c |= c >> 8;
c |= c >> 4;
c |= c >> 2;
c |= c >> 1;
c |= c << 1;
c |= c << 2;
c |= c << 4;
c |= c << 8;
b = a | c; /* implicit truncation */
First off, the phrase "non-branching method" doesn't technically make sense when discussing C code; the optimizer may find ways to remove branches from "branchy" C code, and conversely would be entirely within its rights to replace your clever non-branching code with a branch just to spite you (or because some heuristic said it would be faster).
That aside, the simple expression:
uint16_t b = a > UINT16_MAX ? UINT16_MAX : a;
despite "having a branch", will be compiled to some sort of (branch-free) conditional move (or possibly just a saturate) by many compilers on many systems (I just tried three different compilers for ARM and Intel, and all generated a conditional move).
I would use that simple, readable expression. If and only if your compiler isn't smart enough to optimize it (or your target architecture doesn't have conditional moves), and if you have benchmark data that shows this to be a bottleneck for your program, then I would (a) find a better compiler and (b) file a bug against your compiler and only then look for clever hacks.
If you're really, truly devoted to being too clever by half, then ruslik's second suggestion is actually quite beautiful (much nicer than a generic min/max).
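For what it's worth, a quick host-side check that the obvious expression and both of ruslik's branch-free variants agree across the 16-bit boundary (the function names are mine):

```c
#include <stdint.h>

/* Each saturates a 32-bit value down to 16 bits. */
static uint16_t sat_obvious(uint32_t a) { return a > 0xFFFF ? 0xFFFF : (uint16_t)a; }
static uint16_t sat_mask(uint32_t a)    { return (uint16_t)(-!!(a >> 16) | a); }
static uint16_t sat_shift(uint32_t a)   { return (uint16_t)((-(a >> 16) >> 16) | a); }
```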

C-macro: set a register field defined by a bit-mask to a given value

I've got 32-bit registers with field defined as bit-masks, e.g.
#define BM_TEST_FIELD 0x000F0000
I need a macro that allows me to set a field (defined by its bit-mask) of a register (defined by its address) to a given value. Here's what I came up with:
#include <stdio.h>
#include <assert.h>
typedef unsigned int u32;
/*
* Set a given field defined by a bit-mask MASK of a 32-bit register at address
* ADDR to a value VALUE.
*/
#define SET_REGISTER_FIELD(ADDR, MASK, VALUE)                                 \
{                                                                             \
    u32 mask = (MASK); u32 value = (VALUE);                                   \
    u32 mem_reg = *(volatile u32*)(ADDR); /* Get current register value */    \
    assert((MASK) != 0);                  /* Null masks are not supported */  \
    while (0 == (mask & 0x01))            /* Shift the value to the left */   \
    {                                     /* until it aligns with the field */\
        mask = mask >> 1; value = value << 1;                                 \
    }                                                                         \
    mem_reg &= ~(MASK);                   /* Clear previous field value */    \
    mem_reg |= value;                     /* Update field with new value */   \
    *(volatile u32*)(ADDR) = mem_reg;     /* Update actual register */        \
}
/* Test case */
#define BM_TEST_FIELD 0x000F0000
int main()
{
    u32 reg = 0x12345678;
    printf("Register before: 0x%.8X\n", reg); /* should be 0x12345678 */
    SET_REGISTER_FIELD(&reg, BM_TEST_FIELD, 0xA);
    printf("Register after:  0x%.8X\n", reg); /* should be 0x123A5678 */
    return 0;
}
Is there a simpler way to do it?
EDIT: in particular, I'm looking for a way to reduce the run-time computing requirements. Is there a way to have the pre-processor compute the number of required left-shifts for the value?
EDIT: in particular, I'm looking for a way to reduce the run-time computing requirements. Is there a way to have the pre-processor compute the number of required left-shifts for the value?
Yes:
value *= ((MASK) & ~((MASK) << 1))
This multiplies value by the lowest set bit in MASK. The multiplier is known to be a constant power of 2 at compile time, so this will be compiled as a simple left shift by any remotely sane compiler.
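Concretely, with the question's BM_TEST_FIELD mask (the LOWEST_BIT name is my own; note the expression isolates the lowest bit of a contiguous mask):

```c
#define BM_TEST_FIELD 0x000F0000u
#define LOWEST_BIT(m) ((m) & ~((m) << 1))  /* lowest set bit of a contiguous mask */

/* LOWEST_BIT(0x000F0000) == 0x00010000, so multiplying by it is the same
   as shifting left by 16 -- and the compiler sees a constant power of 2. */
```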
Why not just put both the mask and the value in the right place?
#define BM_TEST_FIELD (0xfUL << 16)
#define BM_TEST_VALUE (0xaUL << 16)
#define mmioMaskInsert(reg, mask, value) \
    (*(volatile u32 *)(reg) = (*(volatile u32 *)(reg) & ~(mask)) | (value))
Then you can just use it like:
mmioMaskInsert(reg, BM_TEST_FIELD, BM_TEST_VALUE);
For sure what you have there is very dangerous. Register writing can often have side-effects, and these operations:
mem_reg &= ~(MASK);
mem_reg |= value;
are actually writing to the register twice, instead of once, like you probably intend to. Also, why isn't a mask of 0 supported? What if I want to write to the whole register (timer count match or something)? Do you have a different macro for that operation? If so, why not use it as part of this system?
Another note - it might be a good idea to apply the mask to the value before sticking it in the register, in case someone passes a value that has more bits than the mask does. Something like:
#define maskInsert(r, m, v) \
    (*(volatile u32 *)(r) = (*(volatile u32 *)(r) & ~(m)) | ((v) & (m)))
I would consider using bitfields to "format" bits for hardware, e.g.:
#include <stdio.h>
#include <inttypes.h>

/* Note: bit-field allocation order is implementation-defined. The order
   below assumes the compiler allocates from the least significant bit up
   (typical on little-endian targets), which puts myfield at bits 16-19. */
struct myregister {
    unsigned lower_bits:16;
    unsigned myfield:4;
    unsigned upper_bits:12;
};

typedef union {
    struct myregister fields;
    uint32_t value;
} myregister_t;

int main (void) {
    myregister_t r;
    r.value = 0x12345678;
    (void) printf("Register before: 0x%.8" PRIX32 "\n", r.value);
    r.fields.myfield = 0xA;
    (void) printf("Register after:  0x%.8" PRIX32 "\n", r.value);
    return 0;
}
Edit: Note the follow-up discussion in the comments. There are valid arguments against using bitfields, but, in my opinion, also benefits (especially in syntax, which I value greatly). One should decide based on the circumstances the code will be used in.
If you insist on this specific interface (the position of the field defined only by its bit-mask), then probably the only thing that can be improved in your implementation is the loop that shifts the value into position (to align it with the mask). Essentially, you have to find the field's offset expressed in bits, then shift the value left by that many bits. You used a plain loop, shifting the value left one bit per iteration instead of explicitly calculating the offset. This works, but it can be inefficient, especially for fields in the upper portion of the register, which require more iterations.
To improve efficiency you can use any of the well-known, potentially more efficient methods for finding the position of the lowest set bit, such as the ones described on this page. I don't know whether it is worth the effort in your case, though: it might make your code more efficient, but also less readable. Decide for yourself.
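On GCC or Clang, for example, one such method comes for free: __builtin_ctz returns the index of the lowest set bit, which is exactly the shift count (a sketch; field_prep is my own name, u32 as in the question):

```c
typedef unsigned int u32;

/* Shift a field value into the position given by its mask, branch-free.
   __builtin_ctz is a GCC/Clang builtin; mask must be non-zero. */
static inline u32 field_prep(u32 mask, u32 value)
{
    return (value << __builtin_ctz(mask)) & mask;
}
```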
