EDIT: The premise was faulty, and thanks to other users' comments I realize the system I described below cannot work. But I wonder: is there a scheme that would work for storing flags in different positions within a variable, so that it can simultaneously be used to store either a high-precision small value or a lower-precision large value?
Original question:
In C11 and C++11, I want to stuff two single-bit flags into a size_t variable that I am simultaneously using to store an unrelated value. Since that value will usually be low, my idea is to use the two most significant bits to store the flags unless the third-most significant bit is set, in which case I store the flags in the two least significant bits. That way, if the value is within the usual range, it can have a precision of one, and if the value is huge, it can have a precision of four. I can figure out where the flags are stored and how to interpret the value just by checking the third-most significant bit.
Unlike the uintN_t types, the standards don't seem to guarantee that size_t has no padding bits. I'm not well-versed in bit-twiddling. In the unlikely event of a system that uses padding bits in size_t, will the bit-wise operations I need to implement this system result in undefined behavior?
To satisfy the curious, I don't want to store the flags in a separate char because memory usage is a priority and doing so would enlarge the containing struct by the size of max_align_t on most systems (because of struct alignment/padding).
In the unlikely event of a system that uses padding bits in size_t, will the bit-wise operations I need to implement this system result in undefined behavior?
I infer that your concern is that you might inadvertently try to twiddle a padding bit. You do not need to worry about that, because the bitwise operations are defined in terms of values, not the details of the representations of those values.
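As a small illustration (not the asker's exact scheme), here is a minimal sketch of flag bits held in a size_t; the FLAG_A/FLAG_B masks are invented for the example:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag masks in the two least significant bits. */
#define FLAG_A ((size_t)1u << 0)
#define FLAG_B ((size_t)1u << 1)

int main(void)
{
    size_t word = 0;

    word |= FLAG_A;   /* set flag A   */
    word &= ~FLAG_B;  /* clear flag B */

    /* The operators act on the value of word; any padding bits in the
       object representation are irrelevant to the result. */
    printf("A=%d B=%d\n", (word & FLAG_A) != 0, (word & FLAG_B) != 0);
    return 0;
}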
Related
We have data coming in over serial (Bluetooth), which maps to a particular structure. Some parts of the structure are sub-byte size, so the "obvious" solution is to map the incoming data to a bit-field. What I can't work out is whether the bit-endianness of the machine or compiler will affect it (which is difficult to test), and whether I should just abandon the bit-fields altogether.
For example, we have a piece of data which is 1.5 bytes, so we used the struct:
typedef struct {
    uint8_t data1;        // lsb
    uint8_t data2 : 4;    // msb
    uint8_t reserved : 4;
} Data;
The reserved bits are always 1
So for example, if the incoming data is 0xD2,0xF4, the value is 0x04D2, or 1234.
The struct we have used has always worked on the systems we have tested on, but we need it to be as portable as possible.
My questions are:
Will data1 always represent the correct value as expected regardless of endianness (I assume yes, and that the hardware/software interface should always handle that correctly for a single, whole byte - if 0xD2 is sent, 0xD2 should be received)?
Could data2 and reserved be the wrong way around, with data2 representing the upper 4 bits instead of the lower 4 bits?
If yes:
Is the bit endianness (generally) dependent on the byte endianness, or can they differ entirely?
Is the bit-endianness determined by the hardware or the compiler? It seems all Linux systems on Intel are the same - is that true for ARM as well? (If we can say we can support all Intel and ARM Linux builds, we should be OK.)
Is there a simple way to determine in the compiler which way around it is, and reverse the bit-field entries if needed?
Although bit-fields are the neatest way, code-wise, to map the incoming data, I suppose I am just wondering if it's a lot safer to just abandon them, and use something like:
typedef struct {
    uint8_t data1; // lsb (0xFF)
    uint8_t data2; // msb (0x0F) & reserved (0xF0)
} Data;

Data d;
int value = ((d.data2 & 0x0F) << 8) + d.data1;
The reason we did not just do this in the first place is that a number of the data fields are smaller than one byte rather than larger - meaning that with a bit-field we generally don't have to do any masking and shifting, so the post-processing is simpler.
Should I use bit-fields for mapping incoming serial data?
No. Bit-fields have a lot of implementation-defined behaviour that makes using them a nightmare.
Will data1 always represent the correct value as expected regardless of endianness?
Yes, but that is because uint8_t is the smallest possible addressable unit: a byte. For larger data types you need to take care of the byte endianness.
Could data2 and reserved be the wrong way around, with data2 representing the upper 4 bits instead of the lower 4 bits?
Yes. They could also be in different bytes. Also, a compiler doesn't have to support uint8_t as a bit-field type, even if it supports the type otherwise.
Is the bit endianness (generally) dependent on the byte endianness, or can they differ entirely?
The least significant bit will always be in the least significant byte, but it's impossible to determine in C where in that byte the bit will be.
Bit shift operators give a reliable abstraction of the order that is good enough: for a uint8_t, (1u << 0) is always the least significant bit and (1u << 7) the most significant bit, on all compilers and all architectures.
Bit-fields, on the other hand, are so poorly defined that you cannot determine the order of the bits from the order of your declared fields.
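For instance, a minimal sketch (the function name is mine) that decodes the 1.5-byte example from the question with shifts and masks only, so the result does not depend on any bit-field layout:

#include <stdint.h>

/* Reassemble the 12-bit value from the two received bytes.
   buf[0] = 0xD2 (low byte), buf[1] = 0xF4 (value nibble in the low half,
   reserved bits in the high half). */
static uint16_t decode_value(const uint8_t *buf)
{
    return (uint16_t)(((buf[1] & 0x0Fu) << 8) | buf[0]);  /* 0x04D2 == 1234 */
}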
Is the bit-endianness determined by the hardware or the compiler?
The compiler dictates how data types map to actual bits, but the hardware heavily influences it. For bit-fields, two different compilers for the same hardware can put the fields in a different order.
Is there a simple way to determine in the compiler which way around it is, and reverse the bit-field entries if needed?
Not really. How to do it depends on your compiler, if it is possible at all.
Although bit-fields are the neatest way, code-wise, to map the incoming data, I suppose I am just wondering if it's a lot safer to just abandon them, and use something like:
Definitely abandon bit-fields, but I would also recommend abandoning structures altogether for this purpose, because:
You need to use compiler extensions or manual work to handle byte order.
You need to use compiler extensions to disable padding to avoid gaps due to alignment restrictions. This affects member access performance on some systems.
You cannot have variable width or optional fields.
It's very easy to commit strict-aliasing violations if you are unaware of the issue. If you define a byte array for the data frame, cast it to a pointer to the structure, and then dereference that pointer, you have problems in many cases.
Instead, I recommend doing it manually. Define a byte array and then write each field into it manually, breaking the fields apart with bit shifting and masking where necessary. You can write simple reusable conversion functions for the basic data types.
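A sketch of the kind of reusable helpers this suggests; the function names and the field layout are my own assumptions:

#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t get_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Extract `width` bits starting at bit `shift` from one byte. */
static uint8_t get_bits(uint8_t byte, unsigned shift, unsigned width)
{
    return (uint8_t)((byte >> shift) & ((1u << width) - 1u));
}

For the 1.5-byte frame above, the value would be buf[0] | (get_bits(buf[1], 0, 4) << 8) and the reserved nibble get_bits(buf[1], 4, 4).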
I have long known that there are bit-fields in C, and occasionally I use them for defining densely packed structs:
typedef struct Message_s {
    unsigned int flag : 1;
    unsigned int channel : 4;
    unsigned int signal : 11;
} Message;
When I read open-source code, however, I often find bit-masks and bit-shifting operations used to store and retrieve such information in hand-rolled bit-fields. This is so common that I doubt the authors were unaware of the bit-field syntax, so I wonder whether there are reasons to roll your own bit-fields with bit-masks and shift operations instead of relying on the compiler to generate the code for getting and setting such fields.
Why do other programmers use hand-coded bit manipulation instead of bit-fields to pack multiple fields into a single word?
This answer is opinion-based, as the question is quite open:
Many programmers are unaware of the availability of bitfields or unsure about their portability and precise semantics. Some even distrust the compiler's ability to produce correct code. They prefer to write explicit code that they understand.
As commented by Cornstalks, this attitude is rooted in real-life experience, as explained in this article.
A bit-field's actual memory layout is implementation-defined: if the memory layout must follow a precise specification, bit-fields should not be used and hand-coded bit manipulation may be required.
The handling of signed values in signed bit-fields is implementation-defined. If signed values are packed into a range of bits, it may be more reliable to hand-code the access functions, as in the sketch below.
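For example, a minimal sketch of hand-coded accessors for a signed field, assuming an invented layout: a 5-bit two's-complement field in bits 3..7 of a 16-bit word.

#include <stdint.h>

#define FIELD_SHIFT 3u
#define FIELD_WIDTH 5u
#define FIELD_MASK  (((1u << FIELD_WIDTH) - 1u) << FIELD_SHIFT)

static int get_field(uint16_t word)
{
    int v = (int)((word & FIELD_MASK) >> FIELD_SHIFT);
    if (v & (1 << (FIELD_WIDTH - 1)))   /* sign-extend by hand */
        v -= 1 << FIELD_WIDTH;
    return v;
}

static uint16_t set_field(uint16_t word, int v)
{
    return (uint16_t)((word & ~FIELD_MASK) |
                      (((unsigned)v << FIELD_SHIFT) & FIELD_MASK));
}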
Are there reasons to avoid bitfield-structs?
bitfield-structs come with some limitations:
Bit fields result in non-portable code. Also, the bit field length has a high dependency on word size.
Reading into bit fields (e.g. with scanf()) and using pointers to them is not possible, because they are not addressable.
Bit fields are used to pack more variables into a smaller data space, but cause the compiler to generate additional code to manipulate these variables. This results in an increase in both space as well as time complexities.
The sizeof operator cannot be applied to bit fields, since sizeof yields its result in bytes, not bits.
Source
So whether you should use them or not depends. Read more in Why bit endianness is an issue in bitfields?
PS: When to use bit-fields in C?
There is no reason to avoid them. Bit-fields are useful and convenient. They are in common use in embedded projects. Some architectures (like ARM) even have special instructions to manipulate bit-fields.
Just compare the code (and write the rest of the function foo1)
https://godbolt.org/g/72b3vY
In many cases, it is useful to be able to address individual groups of bits within a word, or to operate on a word as a unit. The Standard presently does not provide any practical and portable way to achieve such functionality. If code is written to use bit-fields and it later becomes necessary to access multiple groups as a word, there would be no nice way to accommodate that without reworking all the code that uses the bit-fields, or else disabling type-based aliasing optimizations, using type punning, and hoping everything gets laid out as expected.
Using shifts and masks may be inelegant, but until C provides a means of treating an explicitly-designated sequence of bits within one lvalue as another lvalue, it is often the best way to ensure that code will be adaptable to meet needs.
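A brief sketch with an invented layout, showing both views of the same object: named masks expose groups of bits, while the word itself remains an ordinary uint32_t that can be compared or stored as a unit.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: mode in bits 0..2, level in bits 3..7. */
#define MODE_MASK   0x0007u
#define LEVEL_MASK  0x00F8u
#define LEVEL_SHIFT 3u

static unsigned get_mode(uint32_t status)  { return status & MODE_MASK; }
static unsigned get_level(uint32_t status) { return (status & LEVEL_MASK) >> LEVEL_SHIFT; }

/* Operating on the whole word needs nothing special. */
static bool status_changed(uint32_t before, uint32_t after)
{
    return before != after;
}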
First of all, I am aware that what I am trying to do might be outside the C standard.
I'd like to know if it is possible to make a uint4_t/int4_t or uint128_t/int128_t type in C.
I know I could do this using bitshifts and complex functions, but can I do it without those?
You can use bit-fields within a structure to get fields narrower than a uint8_t, but the base data type they're stored in will not be any smaller.
struct SmallInt
{
    unsigned int a : 4;
};
will give you a structure with a member called a that is 4 bits wide.
Individual storage units (bytes) are no less than CHAR_BIT bits wide¹; even if you create a struct with a single 4-bit bit-field, the associated object will always take up a full storage unit.
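A quick way to see this, assuming a typical hosted compiler:

#include <stdio.h>

struct SmallInt
{
    unsigned int a : 4;
};

int main(void)
{
    /* a is declared as 4 bits, but the object still occupies whole bytes;
       on many compilers this prints 4 because the storage unit here is an
       unsigned int. */
    printf("sizeof(struct SmallInt) = %zu\n", sizeof(struct SmallInt));
    return 0;
}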
There are multiple-precision libraries, such as GMP, that allow you to work with values that can't fit into 32 or 64 bits. You might want to check them out.
¹ 8 bits minimum, but may be wider.
In practice, if you want very wide numbers (which standard C11 does not specify), you probably want to use some external arbitrary-precision arithmetic library (a.k.a. bignums). I recommend using GMPlib.
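A minimal GMP sketch (assuming libgmp and its headers are installed; link with -lgmp):

#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t n;
    mpz_init_set_ui(n, 1);

    /* 100! does not fit in any built-in integer type. */
    for (unsigned i = 2; i <= 100; ++i)
        mpz_mul_ui(n, n, i);

    gmp_printf("100! = %Zd\n", n);
    mpz_clear(n);
    return 0;
}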
In some cases, for tiny ranges of numbers, you might use bit-fields inside a struct to get tiny integers. Practically speaking, they can be costly (the compiler will emit shift and bit-mask instructions to deal with them).
See also this answer mentioning __int128_t as an extension in some compilers.
I want to declare a bit-field with the size specified using a colon (I can't remember what the syntax is called). I want to write this:
void myFunction()
{
    unsigned int thing : 12;
    ...
}
But GCC says it's a syntax error (it thinks I'm trying to write a nested function). I have no problem doing this though:
struct thingStruct
{
    unsigned int thing : 4;
};
and then putting one such struct on the stack
void myFunction()
{
    struct thingStruct thing;
    ...
}
This leads me to believe that it's being prevented by syntax, not semantic issues.
So why won't the first example work? What am I missing?
The first example won't work because you can only declare bitfields inside structs. This is syntax, not semantics, as you said, but there it is. If you want a bitfield, use a struct.
Why would you want to do such a thing? A 12-bit bit-field would, on all common architectures, be padded to at least 16 or 32 bits.
If you want to ensure the width of an integer variable, use the types in inttypes.h, e.g. int16_t or int32_t.
As others have said, bitfields must be declared inside a struct (or union, but that's not really useful). Why? Here are two reasons.
Mainly, it's to make the compiler writer's job easier. Bitfields tend to require more machine instructions to extract the bits from the bytes. Only fields can be bitfields, and not variables or other objects, so the compiler writer doesn't have to worry about them if there is no . or -> operator involved.
But, you say, sometimes the language designers make the compiler writer's job harder in order to make the programmer's life easier. Well, there is not a lot of demand from programmers for bitfields outside structs. The reason is that programmers pretty much only bother with bitfields when they're going to cram several small integers inside a single data structure. Otherwise, they'd use a plain integral type.
Other languages have integer range types, e.g., you can specify that a variable ranges from 17 to 42. There isn't much call for this in C because C never requires that an implementation check for overflow. So C programmers just choose a type that's capable of representing the desired range; it's their job to check bounds anyway.
C89 (i.e., the version of the C language that you can find just about everywhere) offers a limited selection of types that have at least n bits. There's unsigned char for 8 bits, unsigned short for 16 bits and unsigned long for 32 bits (plus signed variants). C99 offers a wider selection of types called uint_least8_t, uint_least16_t, uint_least32_t and uint_least64_t. These types are guaranteed to be the smallest types with at least that many value bits. An implementation can provide types for other number of bits, such as uint_least12_t, but most don't. These types are defined in <stdint.h>, which is available on many C89 implementations even though it's not required by the standard.
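As a sketch under those constraints, a 12-bit quantity simply lives in the smallest type with at least 16 value bits, with wrap-around applied by hand:

#include <stdint.h>

/* A "12-bit" counter simulated in uint_least16_t; C has no 12-bit type,
   so the wrap-around is done explicitly with a mask. */
static uint_least16_t bump12(uint_least16_t c)
{
    return (uint_least16_t)((c + 1u) & 0x0FFFu);
}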
Bitfields provide a consistent syntax to access certain implementation-dependent functionality. The most common purpose of that functionality is to place certain data items into bits in a certain way, relative to each other. If two items (bit-fields or not) are declared as consecutive items in a struct, they are guaranteed to be stored consecutively. No such guarantee exists with individual variables, regardless of storage class or scope. If a struct contains:
struct foo {
    unsigned bar: 1;
    unsigned boz: 1;
};
it is guaranteed that bar and boz will be stored consecutively (most likely in the same storage location, though I don't think that's actually guaranteed). By contrast, if 'bar' and 'boz' were single-bit automatic variables, there's no telling where they would be stored, so there'd be little benefit to having them as bit-fields. If they did share space with some other variable, it would be hard to make sure that different functions reading and writing different bits in the same byte didn't interfere with each other.
Note that some embedded-systems compilers do expose a genuine 'bit' type, whose objects are packed eight to a byte. Such compilers generally have an area of memory that is allocated for storing nothing but bit variables, and the processors for which they generate code have atomic instructions to test, set, and clear individual bits. Since the memory locations holding the bits are only accessed using such instructions, there's no danger of conflicts.
A question was asked, and I am not sure whether I gave an accurate answer or not.
The question was: why use int, why not char - why are they separate types? It's all just memory and bits in the end, so why do data types have categories?
Can anyone shed some light upon it?
char is the smallest addressable chunk of memory - it is well suited to manipulating data buffers, but it can't hold more than 256 distinct values (if char is 8 bits, which is usual) and is therefore not very good for numeric calculations. int is usually bigger than char - more suitable for calculations, but not as suitable for byte-level manipulation.
Remember that C is sometimes used as a higher-level assembly language - to interact with low-level hardware. You need data types to match machine-level features, such as byte-wide I/O registers.
From Wikipedia, C (programming language):
C's primary use is for "system programming", including implementing operating systems and embedded system applications, due to a combination of desirable characteristics such as code portability and efficiency, ability to access specific hardware addresses, ability to "pun" types to match externally imposed data access requirements, and low runtime demand on system resources.
In the past, computers had little memory. That was the prime reason why you had different data types. If you needed a variable to only hold small numbers, you could use an 8-bit char instead of using a 32-bit long. However, memory is cheap today. Therefore, this reason is less applicable now but has stuck anyway.
However, bear in mind that every processor has a default data type, in the sense that it operates at a certain width (usually 32 bits). So, if you used an 8-bit char, the value would need to be extended to 32 bits and back again for computation. This may actually slow down your algorithm slightly.
The standard mandates very few limitations on char and int:
A char must be able to hold an ASCII value, that is, 7 bits minimum (EDIT: CHAR_BIT is at least 8 according to the C standard). It is also the smallest addressable block of memory.
An int is at least 16 bits wide and is the "recommended" default integer type. This recommendation is left to the implementation (your C compiler).
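A quick check of what a given implementation actually provides (the values vary by platform):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("char: %d .. %d\n", CHAR_MIN, CHAR_MAX);
    printf("int:  %d .. %d\n", INT_MIN, INT_MAX);
    return 0;
}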
In general, algorithms and designs are abstractions, and data types help in implementing those abstractions. For example, there is a good chance that a weight is represented as a rational number, which is best stored as a float/double, i.e. a number with a fractional part.
I hope this helps.
int is the "natural" integer type; you should use it for most computations.
char is essentially a byte; it is the smallest addressable memory unit. char is not 8 bits wide on all platforms, although that is the case most of the time.