I long knew there are bit-fields in C and occasionally I use them for defining densely packed structs:
typedef struct Message_s {
    unsigned int flag : 1;
    unsigned int channel : 4;
    unsigned int signal : 11;
} Message;
When I read open source code, I instead often find bit-masks and bit-shifting operations used to store and retrieve such information in hand-rolled bit-fields. This is so common that I doubt the authors were simply unaware of the bit-field syntax, so I wonder whether there are reasons to roll your own bit-fields with bit-masks and shift operations instead of relying on the compiler to generate the code for getting and setting them.
Why do other programmers use hand-coded bit manipulation instead of bit-fields to pack multiple fields into a single word?
This answer is opinion-based, as the question is quite open:
Many programmers are unaware of the availability of bitfields or unsure about their portability and precise semantics. Some even distrust the compiler's ability to produce correct code. They prefer to write explicit code that they understand.
As commented by Cornstalks, this attitude is rooted in real life experience as explained in this article.
A bit-field's actual memory layout is implementation-defined: if the memory layout must follow a precise specification, bit-fields should not be used and hand-coded bit manipulation may be required.
The handling of signed values in signed bit-fields is implementation-defined. If signed values are packed into a range of bits, it may be more reliable to hand-code the access functions (a sketch follows).
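As a minimal sketch, here is what such hand-coded access functions might look like for the Message struct above (the bit positions are my own choice for illustration, not anything the standard fixes):
#define MSG_FLAG_MASK     0x0001u
#define MSG_CHANNEL_SHIFT 1
#define MSG_CHANNEL_MASK  (0xFu << MSG_CHANNEL_SHIFT)
#define MSG_SIGNAL_SHIFT  5
#define MSG_SIGNAL_MASK   (0x7FFu << MSG_SIGNAL_SHIFT)

/* Extract the 4-bit channel from a packed word. */
static unsigned get_channel(unsigned msg) {
    return (msg & MSG_CHANNEL_MASK) >> MSG_CHANNEL_SHIFT;
}

/* Return a copy of msg with the channel replaced; ch must fit in 4 bits. */
static unsigned set_channel(unsigned msg, unsigned ch) {
    return (msg & ~MSG_CHANNEL_MASK) | ((ch & 0xFu) << MSG_CHANNEL_SHIFT);
}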
Are there reasons to avoid bitfield-structs?
bitfield-structs come with some limitations:
Bit fields result in non-portable code. Also, the bit-field length has a high dependency on word size.
Reading into bit fields (e.g. with scanf()) and using pointers to bit fields is not possible, because they are not addressable.
Bit fields are used to pack more variables into a smaller data space, but cause the compiler to generate additional code to manipulate these variables. This results in an increase in both space as well as time complexities.
The sizeof() operator cannot be applied to the bit fields, since sizeof() yields the result in bytes and not in bits.
Source
So whether you should use them or not depends. Read more in Why bit endianness is an issue in bitfields?
PS: When to use bit-fields in C?
There is no reason to avoid them. Bitfields are useful and convenient. They are in common use in embedded projects. Some architectures (like ARM) even have special instructions to manipulate bitfields.
Just compare the code (and write the rest of the function foo1)
https://godbolt.org/g/72b3vY
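Since the link's contents aren't reproduced here, here is a minimal sketch of the kind of comparison meant (my own example, not the code behind the link):
typedef struct {
    unsigned int flag    : 1;
    unsigned int channel : 4;
    unsigned int signal  : 11;
} Message;

/* Bit-field version: the compiler emits the shift and mask for you. */
unsigned int channel_bitfield(const Message *m) {
    return m->channel;
}

/* Hand-rolled version: the same extraction written out explicitly
   (assuming channel occupies bits 1-4 of the word). */
unsigned int channel_manual(unsigned int word) {
    return (word >> 1) & 0xFu;
}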
In many cases, it is useful to be able to address individual groups of bits within a word, or to operate on a word as a unit. The Standard presently does not provide any practical and portable way to achieve such functionality. If code is written to use bitfields and it later becomes necessary to access multiple groups as a word, there would be no nice way to accommodate that without reworking all the code using the bit fields, or else disabling type-based aliasing optimizations, using type punning, and hoping everything gets laid out as expected.
Using shifts and masks may be inelegant, but until C provides a means of treating an explicitly-designated sequence of bits within one lvalue as another lvalue, it is often the best way to ensure that code will be adaptable to meet needs.
I'm reading paragraph 7 of 6.5 in ISO/IEC 9899:TC2.
It condones lvalue access to an object through:
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union),
Please refer to the document for what 'aforementioned' types are but they certainly include the effective type of the object.
It is in a section noted as:
The intent of this list is to specify those circumstances in which an
object may or may not be aliased.
I read this as saying (for example) that the following is well defined:
#include <stdlib.h>
#include <stdio.h>

typedef struct {
    unsigned int x;
} s;

int main(void) {
    unsigned int array[3] = {73, 74, 75};
    s *sp = (s *)&array;
    sp->x = 80;
    printf("%u\n", array[0]); /* %u: array[0] is unsigned */
    return EXIT_SUCCESS;
}
This program should output 80.
I'm not advocating this as a good (or very useful) idea, and I concede I'm in part interpreting it that way because I can't think what else that sentence could mean and can't believe it's a meaningless sentence!
That said, I can't see a very good reason to forbid it. What we do know is that the alignment and memory contents at that location are compatible with sp->x so why not?
It seems to go so far as to say that if I add (say) a double y; to the end of the struct, I can still access array[0] through sp->x in this way.
However, even if the array is larger than sizeof(s), any attempt to access sp->y is 'all bets off' undefined behaviour.
Might I politely ask people to say what that sentence condones, rather than go into a flat spin shouting 'strict aliasing UB, strict aliasing UB', as seems all too often to be the way of these things.
The answer to this question is covered in the proposal Fixing the rules for type-based aliasing (N1520), which, as we will see, was unfortunately not resolved when it was discussed in 2010 (see the Hedquist, Bativa, November 2010 minutes). Therefore C11 does not contain a resolution to N1520, so this is an open issue:
There does not seem to be any way that this matter will be
resolved at this meeting. Each thread of proposed approaches
leads to more questions. 1 1:48 am, Thurs, Nov 4, 2010.
ACTION – Clark do more work in
N1520 opens saying (emphasis mine going forward):
Richard Hansen pointed out a problem in the type-based aliasing
rules, as follows:
My question concerns the phrasing of bullet 5 of 6.5p7 (aliasing as it applies to unions/aggregates). Unless my understanding of
effective type is incorrect, it seems like the union/aggregate
condition should apply to the effective type, not the lvalue type.
Here are some more details:
Take the following code snippet as an example:
union {int a; double b;} u;
u.a = 5;
From my understanding of the definition of effective type (6.5p6), the effective type of the object at location &u is union {int a; double b;}. The type of the lvalue expression that is accessing the object at &u (in the second line) is int.
From my understanding of the definition of compatible type (6.2.7), int is not compatible with union {int a; double b;}, so
bullets 1 and 2 of 6.5p7 do not apply. int is not the signed or
unsigned type of the union type, so bullets 3 and 4 do not apply. int
is not a character type, so bullet 6 does not apply.
That leaves bullet 5. However, int is not an aggregate or union
type, so that bullet also does not apply. That means that the above
code violates the aliasing rule, which it obviously should not.
I believe that bullet 5 should be rephrased to indicate that if the
effective type (not the lvalue type) is an aggregate or union type
that contains a member with type compatible with the lvalue type, then
the object may be accessed.
Effectively, what he points out is that the rules are asymmetrical
with respect to struct/union membership. I have been aware of this
situation, and considered it a (non-urgent) problem, for quite some
time. A series of examples will better illustrate the problem. (These
examples were originally presented at the Santa Cruz meeting.)
In my experience with questions about whether aliasing is valid based
on type constraints, the question is invariably phrased in terms of
loop invariance. Such examples bring the problem into extremely sharp
focus.
And the relevant example that applies to this situation is example 3, which is as follows:
struct S { int a, b; };

void f3(int *pi, struct S *ps1, struct S const *ps2)
{
    for (*pi = 0; *pi < 10; ++*pi) {
        *ps1++ = *ps2;
    }
}
The question here is whether the object *ps2 may be accessed (and
especially modified) by assigning to the lvalue *pi — and if so,
whether the standard actually says so. It could be argued that this is
not covered by the fifth bullet of 6.5p7, since *pi does not have
aggregate type at all.
Perhaps the intention is that the question should be turned around: is it allowed to access the value of the object *pi by the lvalue *ps2.
Obviously, this case would be covered by the fifth bullet.
All I can say about this interpretation is that it never occurred to
me as a possibility until the Santa Cruz meeting, even though I've
thought about these rules in considerable depth over the course of
many years. Even if this case might be considered to be covered by the
existing wording, I'd suggest that it might be worth looking for a
less opaque formulation.
The following discussion and proposed solutions are very long and hard to summarize, but they seem to end with a removal of the aforementioned bullet five, resolving the issue with adjustments to other parts of 6.5. As noted above, though, the issues involved were not resolvable at the time, and I don't see a follow-up proposal.
So it would seem the standard's wording permits the scenario the OP demonstrates. My understanding, however, is that this was unintentional; I would therefore avoid it, and it could potentially change in later standards to become non-conforming.
I think this text does not apply:
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
sp->x has type unsigned int which is not an aggregate or union type.
In your code there is no strict aliasing violation: it is OK to read unsigned int as unsigned int.
The struct might have different alignment requirements to the array but other than that there is no problem.
Accessing via "an aggregate or union type" would be:
s t = *sp;
I confess that the idea that I can lay a struct over a locally defined array in this way is frankly exotic.
I still maintain that C99 and all subsequent standards permit it.
In fact it's very arguable that, members being objects in themselves, the first bullet point of 6.5p7 allows it:
a type compatible with the effective type of the object
I think that's M.M's point.
Looking at the problem the other way, let's notice that it's absolutely legitimate (in a strictly conforming environment) to alias the member sp->x as an object in its own right.
In the context of the code in my OP, consider a function with prototype void doit(unsigned int *ip, s *sp); the following call is expected to behave logically:
doit(&(sp->x), sp);
NB: Program logic may (of course) not behave as desired. For example, if doit increments sp->x until it exceeds *ip then there's a problem! However, what is not allowed in a conforming compiler is for the outcome to be corrupted by artifacts of the optimizer ignoring aliasing potential.
I maintain that C would be all the weaker if the language required me to code:
unsigned int temp = sp->x;
doit(&temp, sp);
sp->x = temp;
Imagine all the cases where any call to any function has to be policed for the potential aliasing access to any part of the structures being passed. Such a language would probably be unusable.
Obviously a hard-optimizing (i.e. non-compliant) compiler might make a complete hash of doit() if it doesn't recognize that ip might be an alias of a member in the middle of sp.
That's irrelevant to this discussion.
Setting out when a compiler can (and cannot) make such assumptions is understood to be the reason why the standard needs to set very precise parameters around aliasing: to give the optimizer some conditions it can discount. In a low-level language such as C it could be reasonable (even desirable) to say that a suitably aligned pointer to an accessible valid bit pattern can be used to access a value.
It is absolutely established that sp->x in my OP is pointing to a properly aligned location holding a valid unsigned int.
The intelligent concerns are whether the compiler/optimizer agree that's then a legitimate way to access that location or ignorable as undefined behavior.
As the doit() example shows it's absolutely established that a structure can be broken down and treated as individual objects which merely happen to have a special relationship.
This question appears to be about the circumstances when a set of members that happen to have that special relationship can have a structure 'laid over them'.
I think most people will agree that the program at the bottom of this answer performs valid, worthwhile functionality that, if associated with some I/O library, could 'abstract' a great deal of the work required to read and write structures.
You might think there's a better way of doing it, but I'm not expecting many people to think it's an unreasonable approach.
It operates by exactly that means - it builds a structure member by member then accesses it through that structure.
I suspect some of the people who object to the code in the OP are more relaxed about this.
Firstly, it operates on memory allocated from the free store as 'un-typed', universally aligned storage.
Secondly, it builds a whole structure. In the OP I'm pointing out that the rules (at least appear to) permit lining up bits of a structure, and so long as you only dereference those bits everything is OK.
I somewhat share that attitude. I think the OP is slightly perverse and language-stretching in a poorly written corner of the standard. Not something to put your shirt on.
However, I absolutely think it would be a mistake to forbid the technique below, as that would rule out a logically very valid technique which recognizes that structures can be built up from objects just as much as broken down into them.
That said, something like this is the only thing I could come up with where this sort of approach seems worthwhile. On the other hand, if you can't pull data apart AND/OR put it together, then you quickly start to break the notion that C structures are POD: the possibly padded sum of their parts, nothing more, nothing less.
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

typedef enum {
    is_int, is_double //NB:TODO: support more types but this is a toy.
} type_of;

//This function allocates and 'builds' an array based on a provided set of types, offsets and sizes.
//It's a stand-in for some function that (say) reads structures from a file and builds them according to a provided
//recipe.
int buildarray(void **array, const type_of *types, const size_t *offsets,
               size_t mems, size_t sz, size_t count) {
    const size_t asize = count * sz;
    char *const data = malloc(asize == 0 ? 1 : asize);
    if (data == NULL) {
        return 1; //Allocation failure.
    }
    int input = 1; //Dummy...
    const char *end = data + asize; //One past end. Make const for safety!
    for (char *curr = data; curr < end; curr += sz) {
        for (size_t i = 0; i < mems; ++i) {
            char *mem = curr + offsets[i];
            switch (types[i]) {
            case is_int:
                *((int *)mem) = input++; //Dummy...Populate from file...
                break;
            case is_double:
                *((double *)mem) = ((double)input) + ((double)input) / 10.0; //Dummy...Populate from file...
                ++input;
                break;
            default:
                free(data); //Better than returning an incomplete array. Should not leak even on error conditions.
                return 2;   //Invalid type!
            }
        }
    }
    if (array != NULL) {
        *array = data;
    } else {
        free(data); //Just for fun apparently...
    }
    return 0;
}

typedef struct {
    int a;
    int b;
    double c;
} S;

int main(void) {
    const type_of types[] = {is_int, is_int, is_double};
    const size_t offsets[] = {offsetof(S, a), offsetof(S, b), offsetof(S, c)};
    S *array = NULL;
    const size_t size = 4;
    int err = buildarray((void **)&array, types, offsets, 3, sizeof(S), size);
    if (err != 0) {
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < size; ++i) {
        printf("%zu: %d %d %f\n", i, array[i].a, array[i].b, array[i].c);
    }
    free(array);
    return EXIT_SUCCESS;
}
I think it's an interesting tension.
C is intended to be that low-level high-level language, giving the programmer almost direct access to machine operations and memory.
That means the programmer can meet the arbitrary demands of hardware devices and write highly efficient code.
However, if the programmer is given absolute control, as with my 'if it fits, it's OK' approach to aliasing above, then the optimizer's game is spoilt.
So weirdly it's worth holding a little bit of performance back to return a dividend from the optimizer.
Section 6.5 of the C99 standard tries (and doesn't entirely succeed) to set out that boundary.
I want to declare a bitfield with the size specified using the colon (I can't remember what the syntax is called). I want to write this:
void myFunction()
{
    unsigned int thing : 12;
    ...
}
But GCC says it's a syntax error (it thinks I'm trying to write a nested function). I have no problem doing this though:
struct thingStruct
{
    unsigned int thing : 4;
};
and then putting one such struct on the stack
void myFunction()
{
    struct thingStruct thing;
    ...
}
This leads me to believe that it's being prevented by syntax, not semantic issues.
So why won't the first example work? What am I missing?
The first example won't work because you can only declare bitfields inside structs. This is syntax, not semantics, as you said, but there it is. If you want a bitfield, use a struct.
Why would you want to do such a thing? A bit field of 12 would on all common architectures be padded to at least 16 or 32 bits.
If you want to ensure the width of an integer variable, use the types in inttypes.h, e.g. int16_t or int32_t.
As others have said, bitfields must be declared inside a struct (or union, but that's not really useful). Why? Here are two reasons.
Mainly, it's to make the compiler writer's job easier. Bitfields tend to require more machine instructions to extract the bits from the bytes. Only fields can be bitfields, and not variables or other objects, so the compiler writer doesn't have to worry about them if there is no . or -> operator involved.
But, you say, sometimes the language designers make the compiler writer's job harder in order to make the programmer's life easier. Well, there is not a lot of demand from programmers for bitfields outside structs. The reason is that programmers pretty much only bother with bitfields when they're going to cram several small integers inside a single data structure. Otherwise, they'd use a plain integral type.
Other languages have integer range types, e.g., you can specify that a variable ranges from 17 to 42. There isn't much call for this in C because C never requires that an implementation check for overflow. So C programmers just choose a type that's capable of representing the desired range; it's their job to check bounds anyway.
C89 (i.e., the version of the C language that you can find just about everywhere) offers a limited selection of types that have at least n bits. There's unsigned char for 8 bits, unsigned short for 16 bits and unsigned long for 32 bits (plus signed variants). C99 offers a wider selection of types called uint_least8_t, uint_least16_t, uint_least32_t and uint_least64_t. These types are guaranteed to be the smallest types with at least that many value bits. An implementation can provide types for other numbers of bits, such as uint_least12_t, but most don't. These types are defined in <stdint.h>, which is available on many C89 implementations even though it's not required by the standard.
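For example (a minimal sketch using only standard C99 names):
#include <stdint.h>

/* Each uint_leastN_t is the smallest type the implementation offers
   with at least N value bits; unlike the exact-width uintN_t types,
   these are required to exist on every C99 implementation. */
uint_least8_t  flags8;   /* at least  8 bits */
uint_least16_t flags16;  /* at least 16 bits */
uint_least32_t flags32;  /* at least 32 bits */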
Bitfields provide a consistent syntax to access certain implementation-dependent functionality. The most common purpose of that functionality is to place certain data items into bits in a certain way, relative to each other. If two items (bit-fields or not) are declared as consecutive items in a struct, they are guaranteed to be stored consecutively. No such guarantee exists with individual variables, regardless of storage class or scope. If a struct contains:
struct foo {
    unsigned bar: 1;
    unsigned boz: 1;
};
it is guaranteed that bar and boz will be stored consecutively (most likely in the same storage location, though I don't think that's actually guaranteed). By contrast, if 'bar' and 'boz' were single-bit automatic variables, there's no telling where they would be stored, so there'd be little benefit to having them as bitfields. If they did share space with some other variable, it would be hard to make sure that different functions reading and writing different bits in the same byte didn't interfere with each other.
Note that some embedded-systems compilers do expose a genuine 'bit' type, packed eight to a byte. Such compilers generally have an area of memory which is allocated for storing nothing but bit variables, and the processors for which they generate code have atomic instructions to test, set, and clear individual bits. Since the memory locations holding the bits are only accessed using such instructions, there's no danger of conflicts.
I have just discovered the joy of bitflags. I have several questions related to "best-practices" regarding the use of bitflags in C. I learned everything from various examples I found on the web but still have questions.
In order to save space, I am using a single 32bit integer field in a struct (A->flag) to represent several different sets of boolean properties. In all, 20 different bits are #defined. Some of these are truly presence/absence flags (STORAGE-INTERNAL vs. STORAGE-EXTERNAL). Others have more than two values (e.g. mutually exclusive set of formats: FORMAT-A, FORMAT-B, FORMAT-C). I have defined macros for setting specific bits (and simultaneously turning off mutually exclusive bits). I have also defined macros for testing if specific combination of bits are set in the flag.
However, what is lost in the above approach is the specific grouping of flags that is best captured by enums. For writing functions, I would like to use enums (e.g., STORAGE-TYPE and FORMAT-TYPE), so that function definitions look nice. I expect to use enums only for passing parameters and #defined macros for setting and testing flags.
(a) How do I define flag (A->flag) as a 32 bit integer in a portable fashion (across 32 bit / 64 bit platforms)?
(b) Should I worry about potential size differences in how A->flag vs. #defined constants vs. enums are stored?
(c) Am I making things unnecessarily complicated, meaning should I just stick to using #defined constants for passing parameters as ordinary ints? What else should I worry about in all this?
I apologize for the poorly articulated question. It reflects my ignorance about potential issues.
There is a C99 header that was intended to solve that exact problem (a), but for some reason Microsoft doesn't implement it. Fortunately, you can get <stdint.h> for Microsoft Windows here. Every other platform will already have it. The 32-bit types are uint32_t and int32_t. These also come in 8-, 16-, and 64-bit flavors.
So, that takes care of (a).
(b) and (c) are kind of the same question. We do make assumptions whenever we develop something. You assume that C will be available. You assume that <stdint.h> can be found somewhere. You could always assume that int was at least 16 bits and now a >= 32 bit assumption is fairly reasonable.
In general, you should try to write conforming programs that don't depend on layout, but they will make assumptions about word length. You should worry about performance at the algorithm level, that is, am I writing something that is quadratic, polynomial, exponential?
You should not worry about performance at the operation level until (a) you notice a performance lag, and (b) you have profiled your program. You need to get your job done without bogging down worrying about individual operations. :-)
Oh, I should add that you particularly don't need to worry about low-level operation performance when you are writing the program in C in the first place. C is the close-to-the-metal, go-as-fast-as-possible language. We routinely write stuff in PHP, Python, Ruby, or Lisp because we want a powerful language and the CPUs are so fast these days that we can get away with an entire interpreter, never mind a not-perfect choice of bit-twiddle-word-length ops. :-)
You can use bit-fields and let the compiler do the bit twiddling. For example:
struct PropertySet {
    unsigned internal_storage : 1;
    unsigned format : 4;
};

int main(void) {
    struct PropertySet x = {0};       /* initialize before reading */
    struct PropertySet y[10] = {{0}}; /* array of structures containing bit-fields */
    if (x.internal_storage) x.format |= 2;
    if (y[2].internal_storage) y[2].format |= 2;
    return 0;
}
Edited to add array of structures
As others have said, your problem (a) is resolvable by using <stdint.h> and either uint32_t or uint_least32_t (if you want to worry about Burroughs mainframes, which have 36-bit words). Note that MSVC does not support C99, but @DigitalRoss shows where you can obtain a suitable header to use with MSVC.
Your problem (b) is not an issue; C will type convert safely for you if it is necessary, but it probably isn't even necessary.
The area of most concern is (c) and in particular the format sub-field. There, 3 values are valid. You can handle this by allocating 3 bits and requiring that the 3-bit field is one of the values 1, 2, or 4 (any other value is invalid because of too many or too few bits set). Or you could allocate a 2-bit number, and specify that either 0 or 3 (or, if you really want to, one of 1 or 2) is invalid. The first approach uses one more bit (not currently a problem since you're only using 20 of 32 bits) but is a pure bitflag approach.
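As a sketch of the two encodings just described (the bit positions here are my own illustrative choice):
/* Approach 1: one bit per format; exactly one of the three may be set. */
#define FORMAT_A (1u << 4)
#define FORMAT_B (1u << 5)
#define FORMAT_C (1u << 6)

/* Approach 2: a 2-bit number in bits 4-5, with one code (say 0) invalid. */
#define FORMAT_SHIFT 4
#define FORMAT_MASK  (3u << FORMAT_SHIFT)
#define FORMAT_OF(flags) (((flags) & FORMAT_MASK) >> FORMAT_SHIFT)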
When writing function calls, there is no particular problem writing:
some_function(FORMAT_A | STORAGE_INTERNAL, ...);
This will work whether FORMAT_A is a #define or an enum (as long as you specify the enum value correctly). The called code should check whether the caller had a lapse in concentration and wrote:
some_function(FORMAT_A | FORMAT_B, ...);
But that is an internal check for the module to worry about, not a check for users of the module to worry about.
If people are going to be switching bits in the flags member around a lot, the macros for setting and unsetting the format field might be beneficial. Some might argue that any pure boolean fields barely need it, though (and I'd sympathize). It might be best to treat the flags member as opaque and provide 'functions' (or macros) to get or set all the fields. The less people can get wrong, the less will go wrong.
Consider whether using bit-fields works for you. My experience is that they lead to big code and not necessarily very efficient code; YMMV.
Hmmm...nothing very definitive here, so far.
I would use enums for everything because those are guaranteed to be visible in a debugger where #define values are not.
I would probably not provide macros to get or set bits, but I'm a cruel person at times.
I would provide guidance on how to set the format part of the flags field, and might provide a macro to do that.
Like this, perhaps:
enum { ..., FORMAT_A = 0x0010, FORMAT_B = 0x0020, FORMAT_C = 0x0040, ... };
enum { FORMAT_MASK = FORMAT_A | FORMAT_B | FORMAT_C };
#define SET_FORMAT(flag, newval) (((flag) & ~FORMAT_MASK) | (newval))
#define GET_FORMAT(flag) ((flag) & FORMAT_MASK)
SET_FORMAT is safe if used accurately but horrid if abused. One advantage of the macros is that you could replace them with a function that validated things thoroughly if necessary; this works well if people use the macros consistently.
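For instance, a hypothetical validating replacement (the function name and the assert are mine, building on the enums above):
#include <assert.h>

/* Like SET_FORMAT, but rejects anything that is not exactly one format. */
static unsigned set_format_checked(unsigned flag, unsigned newval)
{
    assert(newval == FORMAT_A || newval == FORMAT_B || newval == FORMAT_C);
    return (flag & ~FORMAT_MASK) | newval;
}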
For question a, if you are using C99 (you probably are using it), you can use the uint32_t predefined type (or, if it is not predefined, it can be found in the stdint.h header file).
Regarding (c): if your enumerations are defined correctly you should be able to pass them as arguments without a problem. A few things to consider:
Enumeration storage is often compiler-specific, so depending on what kind of development you are doing (you don't mention if it's Windows vs. Linux vs. embedded vs. embedded Linux :) ) you may want to visit the compiler options for enum storage to make sure there are no issues there. I generally agree with the above consensus that the compiler should cast your enumerations appropriately - but it's something to be aware of.
In the case that you are doing embedded work, many static quality-checking programs such as PC-lint will "bark" if you start getting too clever with enums, #defines, and bitfields. If you are doing development that will need to pass through any quality gates, this might be something to keep in mind. In fact, some automotive standards (such as MISRA-C) get downright irritable if you try to get tricky with bitfields.
"I have just discovered the joy of bitflags." I agree with you - I find them very useful.
I added comments to each answer above. I think I have some clarity. It seems enums are cleaner, as they show up in the debugger and keep fields separate; macros can be used for setting and getting values.
I have also read that enums are stored as small integers - which, as I understand it, is not a problem for the boolean tests, as these would be performed starting at the rightmost bits. But can enums be used to store large integers (1 << 21)??
thanks again to you all. I have already learned more than I did two days ago!!
~Russ
The classic problem of testing and setting individual bits in an integer in C is perhaps one of the most common intermediate-level programming skills. You set and test with simple bitmasks such as
unsigned int mask = 1 << 11;

if (value & mask) { .... }  // test for the bit
value |= mask;              // set the bit
value &= ~mask;             // clear the bit
An interesting blog post argues that this is error prone, difficult to maintain, and poor practice. The C language itself provides bit level access which is typesafe and portable:
typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE

typedef union {
    struct {
        boolean_t user:1;
        boolean_t zero:1;
        boolean_t force:1;
        int :28;            /* unused */
        boolean_t compat:1; /* bit 31 */
    };
    int raw;
} flags_t;
int
create_object(flags_t flags)
{
    boolean_t is_compat = flags.compat;

    if (is_compat)
        flags.force = FALSE;

    if (flags.force) {
        [...]
    }
    [...]
}
But this makes me cringe.
The interesting argument my coworker and I had about this is still unresolved. Both styles work, and I maintain the classic bitmask method is easy, safe, and clear. My coworker agrees it's common and easy, but argues the bitfield union method is worth the extra few lines to make it portable and safer.
Are there any more arguments for either side? In particular, is there some possible failure, perhaps with endianness, that the bitmask method may miss but where the structure method is safe?
Bitfields are not quite as portable as you think, as "C gives no guarantee of the ordering of fields within machine words" (The C book)
Ignoring that, used correctly, either method is safe. Both methods also allow symbolic access to integral variables. You can argue that the bitfield method is easier to write, but it also means more code to review.
If the issue is that setting and clearing bits is error prone, then the right thing to do is to write functions or macros to make sure you do it right.
// off the top of my head
#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)
#define CLEAR_BIT(val, bitIndex) val &= ~(1 << bitIndex)
#define TOGGLE_BIT(val, bitIndex) val ^= (1 << bitIndex)
#define BIT_IS_SET(val, bitIndex) (val & (1 << bitIndex))
This makes your code readable, if you don't mind that val has to be an lvalue (except for BIT_IS_SET). If that doesn't make you happy, you take out the assignment, parenthesize it, and use it as val = SET_BIT(val, someIndex); which will be equivalent.
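That is, something like the following (a sketch; the names are mine):
/* Expression-style variants: each yields the new value instead of
   modifying val in place, so val no longer has to be an lvalue. */
#define WITH_BIT_SET(val, bitIndex)     ((val) | (1u << (bitIndex)))
#define WITH_BIT_CLEARED(val, bitIndex) ((val) & ~(1u << (bitIndex)))
#define WITH_BIT_TOGGLED(val, bitIndex) ((val) ^ (1u << (bitIndex)))

/* usage: val = WITH_BIT_SET(val, someIndex); */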
Really, the answer is to consider decoupling what you want from how you want to do it.
Bitfields are great and easy to read, but unfortunately the C language does not specify the layout of bitfields in memory, which means they are essentially useless for dealing with packed data in on-disk formats or binary wire protocols. If you ask me, this decision was a design error in C—Ritchie could have picked an order and stuck with it.
You have to think about this from the perspective of a writer -- know your audience. So there are a couple of "audiences" to consider.
First there's the classic C programmer, who has bitmasked their whole life and could do it in their sleep.
Second there's the newb, who has no idea what all this |, & stuff is. They were programming php at their last job and now they work for you. (I say this as a newb who does php)
If you write to satisfy the first audience (that is, bitmask-all-day-long), you'll make them very happy, and they'll be able to maintain the code blindfolded. However, the newb will likely need to overcome a large learning curve before they are able to maintain your code. They will need to learn about binary operators, how you use these operations to set/clear bits, etc. You're almost certainly going to have bugs introduced by the newb as he/she learns all the tricks required to get this to work.
On the other hand, if you write to satisfy the second audience, the newbs will have an easier time maintaining the code. They'll have an easier time grokking
flags.force = 0;
than
flags &= 0xFFFFFFFE;
and the first audience will just get grumpy, but it's hard to imagine they wouldn't be able to grok and maintain the new syntax. It's just much harder to screw up. There won't be new bugs, because the newb will more easily maintain the code. You'll just get lectures about how "back in my day you needed a steady hand and a magnetized needle to set bits... we didn't even HAVE bitmasks!" (thanks, XKCD).
So I would strongly recommend using the fields over the bitmasks to newb-safe your code.
The union usage yields unspecified values according to the ANSI C standard, and thus should not be used (or at least not be considered portable).
From the ISO/IEC 9899:1999 (C99) standard:
Annex J - Portability Issues:
1 The following are unspecified:
— The value of padding bytes when storing values in structures or unions (6.2.6.1).
— The value of a union member other than the last one stored into (6.2.6.1).
6.2.6.1 - Language Concepts - Representation of Types - General:
6 When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.[42]) The value of a structure or union object is never a trap
representation, even though the value of a member of the structure or union object may be
a trap representation.
7 When a value is stored in a member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond to other members
take unspecified values.
So, if you want to keep the bitfield ↔ integer correspondence and keep portability, I strongly suggest you use the bitmasking method, which, contrary to the linked blog post's claim, is not poor practice.
What is it about the bitfield approach that makes you cringe?
Both techniques have their place, and the only decision I have is which one to use:
For simple "one-off" bit fiddling, I use the bitwise operators directly.
For anything more complex - eg hardware register maps, the bitfield approach wins hands down.
Bitfields are more succinct to use (at the expense of slightly more verbosity to write).
Bitfields are more robust (what size is "int", anyway?).
Bitfields are usually just as fast as bitwise operators.
Bitfields are very powerful when you have a mix of single- and multiple-bit fields; extracting a multiple-bit field by hand involves loads of manual shifts.
Bitfields are effectively self-documenting. By defining the structure and thereby naming the elements, I know what it's meant to do.
Bitfields also seamlessly handle structures bigger than a single int.
With bitwise operators, typical (bad) practice is a slew of #defines for the bit masks.
The only caveat with bitfields is to make sure the compiler has really packed the object into the size you wanted. I can't remember if this is defined by the standard, so an assert(sizeof(myStruct) == N) is a useful check (a compile-time sketch follows).
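A sketch of that check (the struct is hypothetical; _Static_assert is C11, so with C99 use a run-time assert instead):
struct ctrl_reg {
    unsigned int enable : 1;
    unsigned int mode   : 3;
    unsigned int        : 28; /* pad the word out to 32 bits */
};

/* Fails at compile time if the compiler packed it differently
   (assumes 8-bit bytes, so 32 bits == 4 bytes). */
_Static_assert(sizeof(struct ctrl_reg) == 4, "ctrl_reg must be 32 bits");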
The blog post you are referring to mentions the raw union field as an alternative access method for bitfields.
The purposes the blog post author used raw for are OK. However, if you plan to use it for anything else (e.g. serialization of bit fields, setting/checking individual bits), disaster is just waiting for you around the corner. The ordering of bits in memory is architecture-dependent, and memory padding rules vary from compiler to compiler (see Wikipedia), so the exact position of each bitfield may differ; in other words, you can never be sure which bit of raw each bitfield corresponds to.
However, if you don't plan to mix the two views, you'd better take raw out, and you will be safe.
Well, you can't go wrong with structure mapping: since both fields are accessible, they can be used interchangeably.
One benefit of bit masks is that you can easily aggregate options:
mask = USER|FORCE|ZERO|COMPAT;
vs
flags.user = true;
flags.force = true;
flags.zero = true;
flags.compat = true;
In some environments, such as dealing with protocol options, it can get quite old having to individually set options or use multiple parameters to ferry intermediate states to effect a final outcome.
But sometimes setting flag.blah and having the list pop up in your IDE is great, especially if you're like me and can't remember the name of the flag you want to set without constantly referencing the list.
I personally will sometimes shy away from declaring boolean types because at some point I'll end up with the mistaken impression that the field I just toggled was not dependent (think multi-threaded concurrency) on the r/w status of other "seemingly" unrelated fields which happen to share the same 32-bit word.
My vote is that it depends on the context of the situation and in some cases both approaches may work out great.
Either way, bitfields have been used in GNU software for decades and it hasn't done them any harm. I like them as parameters to functions.
I would argue that bit masks are conventional, as opposed to bit-field structs. Everyone knows how to AND the values to set various options off, and the compiler boils this down to very efficient bitwise operations on the CPU.
Providing you use the masks and tests in the correct way, the abstractions the compiler provides should make it robust, simple, readable and clean.
When I need a set of on/off switches, I'm going to continue using them in C.
In C++, just use std::bitset<N>.
It is error-prone, yes. I've seen lots of errors in this kind of code, mainly because some people feel that they should mess with it and the business logic in a totally disorganized way, creating maintenance nightmares. They think "real" programmers can write value |= mask; , value &= ~mask; or even worse things at any place, and that's just ok. Even better if there's some increment operator around, a couple of memcpy's, pointer casts and whatever obscure and error-prone syntax happens to come to their mind at that time. Of course there's no need to be consistent and you can flip bits in two or three different ways, distributed randomly.
My advice would be:
Encapsulate this ---- in a class, with methods such as SetBit(...) and ClearBit(...). (If you don't have classes in C, in a module.) While you're at it, you can document all their behaviour.
Unit test that class or module.
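A minimal sketch of such a module in C (the names are mine; a real version would come with documentation and unit tests):
/* flagbits.h - the one place where all the bit twiddling lives. */
typedef unsigned int flagword;

static inline flagword flag_set(flagword w, unsigned bit)   { return w |  (1u << bit); }
static inline flagword flag_clear(flagword w, unsigned bit) { return w & ~(1u << bit); }
static inline int      flag_test(flagword w, unsigned bit)  { return (w >> bit) & 1u; }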
Your first method is preferable, IMHO. Why obfuscate the issue? Bit fiddling is a really basic thing. C did it right. Endianness doesn't matter. The only thing the union solution does is name things. 11 might be mysterious, but #defined to a meaningful name or enum'ed it should suffice.
Programmers who can't handle fundamentals like "|&^~" are probably in the wrong line of work.
I nearly always use bitwise operations with a bit mask, either directly or as a macro, e.g.
#define ASSERT_GPS_RESET() { P1OUT &= ~GPS_RESET ; }
Incidentally, your union definition in the original question would not work on my processor/compiler combination: the int type is only 16 bits wide and the bitfield definitions total 32. To make it slightly more portable you would have to define a new 32-bit type that you could then map to the required base type on each target architecture as part of the porting exercise. In my case:
typedef unsigned long int uint32_t;
and in the original example
typedef unsigned int uint32_t;
typedef union {
    struct {
        boolean_t user:1;
        boolean_t zero:1;
        boolean_t force:1;
        int :28;            /* unused */
        boolean_t compat:1; /* bit 31 */
    };
    uint32_t raw;
} flags_t;
The overlaid int should also be made unsigned.
Well, I suppose that's one way of doing it, but I would always prefer to keep it simple.
Once you're used to it, using masks is straightforward, unambiguous and portable.
Bitfields are straightforward, but they are not portable without additional work.
If you ever have to write MISRA-compliant code, the MISRA guidelines frown on bitfields, unions, and many, many other aspects of C, in order to avoid undefined or implementation-dependent behaviour.
When I google for "C operators", the first three pages are:
Operators in C and C++
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40F_HTML/AQTLTBTE/DOCU_059.HTM
http://www.cs.mun.ca/~michael/c/op.html
..so I think that argument about people new to the language is a little silly.
Generally, the one that is easier to read and understand is the one that is also easier to maintain. If you have co-workers that are new to C, the "safer" approach will probably be the easier one for them to understand.
Bitfields are great, except that the bit-manipulation operations are not atomic, and can thus lead to problems in multi-threaded applications.
For example one could assume that a macro:
#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)
defines an atomic operation, since |= is one statement. But the ordinary code generated by a compiler will not try to make |= atomic.
So if multiple threads execute different set-bit operations, one of the set-bit operations could be lost, since both threads will execute:
thread 1 thread 2
LOAD field LOAD field
OR mask1 OR mask2
STORE field STORE field
The result can be field' = field OR mask1 OR mask2 (intended), or the result can be field' = field OR mask1 (not intended), or the result can be field' = field OR mask2 (not intended).
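One way to repair this, sketched under the assumption that C11 <stdatomic.h> is available on the target: atomic_fetch_or performs the whole load-OR-store sequence as one indivisible operation.
#include <stdatomic.h>

atomic_uint field; /* shared between threads */

void set_bit_atomic(unsigned bitIndex) {
    /* The read-modify-write happens atomically: no other thread's
       update can be lost between the load and the store. */
    atomic_fetch_or(&field, 1u << bitIndex);
}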
I'm not adding much to what's already been said, except to emphasize two points:
The compiler is free to arrange bits within a bitfield any way it wants. This means if you're trying to manipulate bits in a microcontroller register, or if you want to send the bits to another processor (or even to the same processor with a different compiler), you MUST use bitmasks.
On the other hand, if you're trying to create a compact representation of bits and small integers for use within a single processor, bitfields are easier to maintain and thus less error prone, and -- with most compilers -- are at least as efficient as manually masking and shifting.