We have data coming in over serial (Bluetooth), which maps to a particular structure. Some parts of the structure are sub-byte size, so the "obvious" solution is to map the incoming data to a bit-field. What I can't work out is whether the bit-endianness of the machine or compiler will affect it (which is difficult to test), and whether I should just abandon the bit-fields altogether.
For example, we have a piece of data which is 1.5 bytes, so we used the struct:
{
uint8_t data1; // lsb
uint8_t data2:4; // msb
uint8_t reserved:4;
} Data;
The reserved bits are always 1
So for example, if the incoming data is 0xD2,0xF4, the value is 0x04D2, or 1234.
The struct we have used is always working on the systems we have tested on, but we need it to be as portable as possible.
My questions are:
Will data1 always represent the correct value as expected regardless of endianness (I assume yes, and that the hardware/software interface should always handle that correctly for a single, whole byte - if 0xD2 is sent, 0xD2 should be received)?
Could data2 and reserved be the wrong way around, with data2 representing the upper 4 bits instead of the lower 4 bits?
If yes:
Is the bit endianness (generally) dependent on the byte endianness, or can they differ entirely?
Is the bit-endianness determined by the hardware or the compiler? It seems all linux systems on Intel are the same - is that true for ARM as well? (If we can say we can support all Intel and ARM linux builds, we should be OK)
Is there a simple way to determine in the compiler which way around it is, and reserve the bit-fields entries if needed?
Although bit-fields are the neatest way, code-wise, to map the incoming data, I suppose I am just wondering if it's a lot safer to just abandon them, and use something like:
struct {
uint8_t data1; // lsb (0xFF)
uint8_t data2; // msb (0x0F) & reserved (0xF0)
} Data;
Data d;
int value = (d.data2 & 0x0F) << 16 + d.data1
The reason we have not just done this in the first place is because a number of the data fields are less than 1 byte, rather than more than 1 - meaning that generally with a bit-field we don't have to do any masking and shifting, so the post-processing is simpler.
Should I use bit-fields for mapping incoming serial data?
No. Bit-fields have a lot of implementation specified behaviour that makes using them a nightmare.
Will data1 always represent the correct value as expected regardless of endianness.
Yes, but that is because uint8_t is smallest possible addressable unit: a byte. For larger data types you need to take care of the byte endianness.
Could data2 and reserved be the wrong way around, with data2 representing the upper 4 bits instead of the lower 4 bits?
Yes. They could also be on different bytes. Also, compiler doesn't have to support uint8_t for bitfields, even if it would support the type otherwise.
Is the bit endianness (generally) dependent on the byte endianness, or can they differ entirely?
The least signifact bit will always be in the least significant byte, but it's impossible to determine in C where in the byte the bit will be.
Bit shifting operators give reliable abstraction of the order that is good enough: For data type uint8_t the (1u << 0) is always the least significant and (1u << 7) the most significant bit, for all compilers and for all architectures.
Bit-fields on the other hand are so poorly defined that you cannot determine the order of bits by the order of your defined fields.
Is the bit-endianness determined by the hardware or the compiler?
Compiler dictates how datatypes map to actual bits, but hardware heavily influences it. For bit-fields, two different compilers for the same hardware can put fields in different order.
Is there a simple way to determine in the compiler which way around it is, and reserve the bit-fields entries if needed?
Not really. It depends on your compiler how to do it, if it's possible at all.
Although bit-fields are the neatest way, code-wise, to map the incoming data, I suppose I am just wondering if it's a lot safer to just abandon them, and use something like:
Definitely abandon bit-fields, but I would also recommend abandoning structures altogether for this purpose, because:
You need to use compiler extensions or manual work to handle byte order.
You need to use compiler extensions to disable padding to avoid gaps due to alignment restrictions. This affects member access performance on some systems.
You cannot have variable width or optional fields.
It's very easy to have strict aliasing violations if you are unaware of those issues. If you define byte array for the data frame and cast that to pointer to structure and then dereference that, you have problems in many cases.
Instead I recommend doing it manually. Define byte array and then write each field into it manually by breaking them apart using bit shifting and masking when necessary. You can write a simple reusable conversion functions for the basic data types.
Related
I see that some embedded system Firmware books/articles suggest not to use C's structure bit field as it isn't portable. I know that the order and padding is implementation defined, but is it always not portable to use bit fields?
I mean, if for example I defined a configuration structure for a 8-bit microcontroller driver like this:
typedef struct
{
int channel_name :3 ; /*7 possible channels*/
int Enable :1 ; /*if 1 enable,otherwise disable*/
int Mode;
} conf_t
I don't understand how can the implementation defined behavior raise a portability issue in such a case, can anyone explain?
Here are some of the portability issues that are likely to occur:
Byte padding issues. What will be the size of this struct?
Endianess and bit order issues caused by it. Will channel_name get allocated in the 3 MSB or the 3 LSB?
Different behavior from different compilers when you declare an int bitfield of size 1. What goes into that bit, the sign bit or data? In a bitfield (and only there), compilers may treat int either as signed or unsigned, and in case of signed, they may behave differently in regards of the sign bit.
Behavior in terms of bit/byte padding upon mixing different types in the same bitfield.
Then there's a bunch of other things that are poorly-defined as well, but less likely to cause actual problems in reality.
Device registers often have side effects, for example, reading a status register might clear a detected conditions. You have no way of controlling how many accesses the compiler might make in a given expression. Reflexively, when you update a structure with bitfields, the compiler is free to make multiple writes to the storage location, which could have a dramatic effect.
Even if you have sorted out your compiler what advantage are you gaining with this? Does it really make the code more readable, or just shorter? Often the latter implies the former, but there are limits.
The numbering of bits in a bitfield normally follows the byte ordering of the machine; so { int x:1; } would be the least significant bit on an intel machine, but the most significant bit on a motorola machine. In contrast (1 << 0) is the least significant bit on all machines. [ I once had to go through an 8kloc video capture driver stuffed with bitfields to move it to another architecture ].
The casual notion that *p is sufficient to read a register with an appropriate bus protocol is a long dead notion, and should stay there. x = io_readb(device) is inherently self documenting; or even better : if (io_readb(device, &x) != 0) { panic("device failed"); }.
I long knew there are bit-fields in C and occasionally I use them for defining densely packed structs:
typedef struct Message_s {
unsigned int flag : 1;
unsigned int channel : 4;
unsigned int signal : 11;
} Message;
When I read open source code, I instead often find bit-masks and bit-shifting operations to store and retrieve such information in hand-rolled bit-fields. This is so common that I do not think the authors were not aware of the bit-field syntax, so I wonder if there are reasons to roll bit-fields via bit-masks and shifting operations your own instead of relying on the compiler to generate code for getting and setting such bit-fields.
Why other programmers use hand-coded bit manipulations instead of bitfields to pack multiple fields into a single word?
This answer is opinion based as the question is quite open:
Many programmers are unaware of the availability of bitfields or unsure about their portability and precise semantics. Some even distrust the compiler's ability to produce correct code. They prefer to write explicit code that they understand.
As commented by Cornstalks, this attitude is rooted in real life experience as explained in this article.
Bitfield's actual memory layout is implementation defined: if the memory layout must follow a precise specification, bitfields should not be used and hand-coded bit manipulations may be required.
The handing of signed values in signed typed bitfields is implementation defined. If signed values are packed into a range of bits, it may be more reliable to hand-code the access functions.
Are there reasons to avoid bitfield-structs?
bitfield-structs come with some limitations:
Bit fields result in non-portable code. Also, the bit field length has a high dependency on word size.
Reading (using scanf()) and using pointers on bit fields is not possible due to non-addressability.
Bit fields are used to pack more variables into a smaller data space, but cause the compiler to generate additional code to manipulate these variables. This results in an increase in both space as well as time complexities.
The sizeof() operator cannot be applied to the bit fields, since sizeof() yields the result in bytes and not in bits.
Source
So whether you should use them or not depends. Read more in Why bit endianness is an issue in bitfields?
PS: When to use bit-fields in C?
There is no reason for it. Bitfields are useful and convenient. They are in the common use in the embedded projects. Some architectures (like ARM) have even special instructions to manipulate bitfields.
Just compare the code (and write the rest of the function foo1)
https://godbolt.org/g/72b3vY
In many cases, it is useful to be able to address individual groups of bits within a word, or to operate on a word as a unit. The Standard presently does not provide
any practical and portable way to achieve such functionality. If code is written to use bitfields and it later becomes necessary to access multiple groups as a word, there would be no nice way to accommodate that without reworking all the code using the bit fields or disabling type-based aliasing optimizations, using type punning, and hoping everything gets laid out as expected.
Using shifts and masks may be inelegant, but until C provides a means of treating an explicitly-designated sequence of bits within one lvalue as another lvalue, it is often the best way to ensure that code will be adaptable to meet needs.
In my code there is a structure which have padding issues. I fixed them and my code is running fine on a little endian machine. Can there be a chance that this stucture cause a problem for a big endian machine ??
You need to keep the following in mind:
Whenever doing data communication, the endianess of the communication protocol is what matters. All data communication protocols have (should have) a specified endianess. Big endian is probably most common, because back in the days where CRC calculations were done with digital electonic gates rather than software, the checksum itself had to be big endian.
(This can lead to quite obscure protocols, like the industry standard field bus CANopen, where all integers in the sent data must be little endian, but the identifier and checksum must be big endian.)
Struct padding will always cause issues when you are writing portable code. Code like send(&my_struct, sizeof(my_struct) is never portable! Because it will send the data and any padding bytes. And padding bytes may be anywhere inside the struct and not just in the end. If you need to write truly portable code, you cannot use structs/unions for the data protocol, everything needs to be stored in arrays of bytes or similar, where the data is guaranteed to be allocated in adjacent cells. Struct padding has nothing to do with endianess, but rather of the CPU instruction set.
(Motorola CPUs have traditionally had better support for reading and storing at unaligned addresses, while Intel derivates have alignment requirements and are therefore more prone to use padding. As it happens, Motorola were with the big endians and Intel were with the little endians. So by coincidence, little endian CPUs are more likely to have padding, but this is only because of the CPU instruction set and not because of the endianess itself.)
A structure, in C, is a way of representing data in memory. (It gives "structure" to memory.)
Any conversion from "struct" to "sequence of bytes" that just casts the "struct" bit away, and uses whatever underlying byte representation C is using is going to be affected by endianness. (And padding. Maybe other issues too, like pointers, sizeof(some-integral-type), etc.)
I suspect you're doing something like this:
// Some non-standard way to get rid of padding in Foo
struct Foo
{
// Some fields...
}
// Meanwhile, in a function somewhere...
fwrite(a_foo, sizeof(a_foo), 1, fp);
Maybe you're not calling fwrite, maybe it's send, but yes, if you're doing serialization like this, you are going to be effected by endianness.
In class I've been tasked with writing a C program that decompresses a text file and prints out the characters it contains. Each character in the file is represented by 2 bits (4 possible characters).
I've recently been informed that a byte is not necessarily 8 bits on all systems, and a char is not necessarily 1 byte. This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte. Also how am I supposed to keep the loaded data in memory when there are no data types that can guarantee a set amount of bits.
How do I work with bit data in C?
A byte is not necessarily 8 bits. That much is certainly true. A char, on the other hand, is defined to be a byte - C does not differentiate between the two things.
However, the systems you will write for will almost certainly have 8-bit bytes. Bytes of different sizes are essentially non-existant outside of really, really old systems, or certain embedded systems.
If you have to write your code to work for multiple platforms, and one or more of those have differently sized chars, then you write code specifically to handle that platform - using e.g. CHAR_BIT to determine how many bits each byte contains.
Given that this is for a class, assume 8-bit bytes, unless told otherwise. The point is not going to be extreme platform independence, the point is to teach you something about bit fiddling (or possibly bit fields, but that depends on what you've covered in class).
This then makes me wonder how on earth I'm supposed to know how many
bits got loaded from a file when I loaded 1 byte.
You'll be hard pressed to find a platform where a byte is not 8 bits. (though as noted above CHAR_BIT can be used to verify that). Also clarify the portability requirements with your instructor or state your assumptions.
Usually bits are extracted using shifts and bitwise operations, e.g. (x & 3) are the rightmost 2 bits of x. ((x>>2) & 3) are the next two bits. Pick the right data type for the platforms you are targettiing or as others say use something like uint8_t if available for your compiler.
Also see:
Type to use to represent a byte in ANSI (C89/90) C?
I would recommend not using bit fields. Also see here:
When is it worthwhile to use bit fields?
You can use bit fields in C. These indices explicitly let you specify the number of bits in each part of the field, if you are truly concerned about width. This page gives a discussion: http://msdn.microsoft.com/en-us/library/yszfawxh(v=vs.80).aspx
As an example, check out the ieee754.h for usage in the context of implementing IEEE754 floats
I want to declare a bitfield with the size specified using the a colon (I can't remember what the syntax is called). I want to write this:
void myFunction()
{
unsigned int thing : 12;
...
}
But GCC says it's a syntax error (it thinks I'm trying to write a nested function). I have no problem doing this though:
struct thingStruct
{
unsigned int thing : 4;
};
and then putting one such struct on the stack
void myFunction()
{
struct thingStruct thing;
...
}
This leads me to believe that it's being prevented by syntax, not semantic issues.
So why won't the first example work? What am I missing?
The first example won't work because you can only declare bitfields inside structs. This is syntax, not semantics, as you said, but there it is. If you want a bitfield, use a struct.
Why would you want to do such a thing? A bit field of 12 would on all common architectures be padded to at least 16 or 32 bits.
If you want to ensure the width of an integer variable use the types in inttypes.h, e.g int16_t or int32_t.
As others have said, bitfields must be declared inside a struct (or union, but that's not really useful). Why? Here are two reasons.
Mainly, it's to make the compiler writer's job easier. Bitfields tend to require more machine instructions to extract the bits from the bytes. Only fields can be bitfields, and not variables or other objects, so the compiler writer doesn't have to worry about them if there is no . or -> operator involved.
But, you say, sometimes the language designers make the compiler writer's job harder in order to make the programmer's life easier. Well, there is not a lot of demand from programmers for bitfields outside structs. The reason is that programmers pretty much only bother with bitfields when they're going to cram several small integers inside a single data structure. Otherwise, they'd use a plain integral type.
Other languages have integer range types, e.g., you can specify that a variable ranges from 17 to 42. There isn't much call for this in C because C never requires that an implementation check for overflow. So C programmers just choose a type that's capable of representing the desired range; it's their job to check bounds anyway.
C89 (i.e., the version of the C language that you can find just about everywhere) offers a limited selection of types that have at least n bits. There's unsigned char for 8 bits, unsigned short for 16 bits and unsigned long for 32 bits (plus signed variants). C99 offers a wider selection of types called uint_least8_t, uint_least16_t, uint_least32_t and uint_least64_t. These types are guaranteed to be the smallest types with at least that many value bits. An implementation can provide types for other number of bits, such as uint_least12_t, but most don't. These types are defined in <stdint.h>, which is available on many C89 implementations even though it's not required by the standard.
Bitfields provide a consistent syntax to access certain implementation-dependent functionality. The most common purpose of that functionality is to place certain data items into bits in a certain way, relative to each other. If two items (bit-fields or not) are declared as consecutive items in a struct, they are guaranteed to be stored consecutively. No such guarantee exists with individual variables, regardless of storage class or scope. If a struct contains:
struct foo {
unsigned bar: 1;
unsigned boz: 1;
};
it is guaranteed that bar and boz will be stored consecutively (most likely in the same storage location, though I don't think that's actually guaranteed). By contrast, 'bar' and 'boz' were single-bit automatic variables, there's no telling where they would be stored, so there'd be little benefit to having them as bitfields. If they did share space with some other variable, it would be hard to make sure that different functions reading and writing different bits in the same byte didn't interfere with each other.
Note that some embedded-systems compilers do expose a genuine 'bit' type, which are packed eight to a byte. Such compilers generally have an area of memory which is allocated for storing nothing but bit variables, and the processors for which they generate code have atomic instructions to test, set, and clear individual bits. Since the memory locations holding the bits are only accessed using such instructions, there's no danger of conflicts.