"Bit-fields are assigned left to right on some machines and right to left on others"- unable to get the concept from "The C Programming Language" book - c

I was going through the text "The C Programming Language" by Kernighan and Ritchie. While discussing bit-fields at the end of that section, the authors say:
"Fields are assigned left to right on some machines and right to left on others. This means that although fields are useful for maintaining internally-defined data structures, the question of which end comes first has to be carefully considered when picking apart externally-defined data; programs that depend on such things are not portable."
- The C Programming Language [2e] by Kernighan & Ritchie [Section 6.9, p.150]
Honestly, I do not get the meaning of these lines. Can anyone please explain them to me, possibly with a diagram?
PS: I have taken a computer organization and architecture course. I know how computers deal with bits and bytes. In a computer system, the smallest unit of information is a single bit, which can be either 0 or 1. 8 such bits form a byte. Memories are byte-addressable, which means that each byte in memory has an address associated with it. But processors usually have word lengths of 2 bytes (very old systems), 4 bytes, 8 bytes... This means that in one memory cycle the CPU can take a word's worth of bytes from main memory and put it into its registers. How these bytes are placed in the registers depends on the endianness of the system.
But I do not get what the authors mean by "left to right" or "right to left". Those words sound related to endianness, but endianness depends on the CPU, and C compilers have nothing to do with it... The question that comes to my mind is: "left to right" of what? What object are the authors referring to?

When a structure contains bit-fields, the C implementation uses some storage unit to hold them (or multiple storage units if needed). The storage unit might be one eight-bit byte or it might be four bytes, or it might be other sizes—this is a determination made by each C implementation. The C standard only requires that it be addressable, which effectively means it has to be a whole number of bytes.
Once we have a storage unit, it is some number of bits. Say it is 32 bits, and number the bits from 31 to 0, where, if we consider the bits to represent a binary numeral, bit 0 represents 2^0, and bit 31 represents 2^31. Note that Kernighan and Ritchie are imprecise to use “left” and “right” here. There is no inherent left or right. We usually write numerals with the most significant digits on the left, so we might consider bit 31 to be the leftmost and bit 0 to be the rightmost.
Now we have a storage unit with some number of bits and some labeling for those bits (31 to 0 or left to right). Say you want to put two bit-fields in them, say fields of width 7 and 5.
Which 7 of the bits from bit 31 to bit 0 are used for the first field? Which 5 of the bits are used for the second field?
We could use bits 31-25 for the first field and bits 24-20 for the second field. Or we could use bits 6-0 for the first field and bits 11-7 for the second field.
In theory, we could also use bits 27-21 for the first field and bits 15-11 for the second field. However, the C standard does say that “If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit” (C 2018 6.7.2.1 11). “Adjacent” is not formally defined, but we can assume it means consecutively numbered bits. So, if the C implementation puts the first field in bits 31-25, it is required to put the second field in bits 24-20. Conversely, if it puts the first field in bits 6-0, it must put the second field in 11-7.
Thus, the C standard requires an implementation to arrange successive bit-fields in a storage unit from left-to-right or from right-to-left, but it does not say which.
(I do not see anything in the standard that says the first field must start at one end of the storage unit or the other, rather than somewhere in the middle. That would lead to wasting some bits.)
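To make this concrete, here is a small probe of my own (not from the book): it overlays two bit-fields of width 7 and 5 on a 32-bit storage unit and prints which bits each field landed in. Whether the first declared field ends up at the bit-31 end or the bit-0 end is entirely up to the implementation.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout probe; the observed placement is implementation-defined. */
union probe {
    uint32_t raw;
    struct {
        uint32_t first  : 7;   /* first declared bit-field  */
        uint32_t second : 5;   /* second declared bit-field */
    } f;
};

int main(void)
{
    union probe p = { .raw = 0 };
    p.f.first = 0x7F;                  /* set all 7 bits of the first field  */
    printf("first  occupies: 0x%08X\n", (unsigned)p.raw);
    p.raw = 0;
    p.f.second = 0x1F;                 /* set all 5 bits of the second field */
    printf("second occupies: 0x%08X\n", (unsigned)p.raw);
    return 0;
}

On one implementation the first field may show up in the low-order bits, on another in the high-order bits; both are conforming.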

When you write:
struct {
    unsigned int version : 4;
    unsigned int length : 4;
    unsigned char dcsn;
};
you end up with a big headache you weren't expecting because your code is non-portable.
When you set version to 4 and length to 5, some systems may set the first byte of the structure to 0x45 and other systems may set the first byte of the structure to 0x54.
When I went to college this thing was #ifdef'd as follows (incorrect):
struct {
#if BIG_ENDIAN
    unsigned int version : 4;
    unsigned int length : 4;
#else
    unsigned int length : 4;
    unsigned int version : 4;
#endif
    unsigned char dcsn;
};
but this is still rolling the dice, as there's no rule that the order of the bits in the bytes of a bit-field corresponds to the order of the bytes in the word on the machine. I would not be surprised if, when you cross-compile, the bit order in the struct came from the host machine's rules while the bit order of integers came from the target machine's rules (as it must). In theory the code could be corrected by having a separate #ifdef for BIG_ENDIAN_BITFIELD, but I've never seen it done.
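A portable alternative (my own sketch, not from the original answer) is to avoid bit-fields for externally defined formats and build the byte with shifts and masks, so the layout is the same on every platform; with it, version 4 and length 5 give 0x45 everywhere:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical helpers: pack/unpack a 4-bit version and a 4-bit length
   into one byte, version always in the high nibble, regardless of platform. */
static uint8_t pack_version_length(unsigned version, unsigned length)
{
    return (uint8_t)(((version & 0x0Fu) << 4) | (length & 0x0Fu));
}

static unsigned unpack_version(uint8_t byte) { return (byte >> 4) & 0x0Fu; }
static unsigned unpack_length(uint8_t byte)  { return byte & 0x0Fu; }

int main(void)
{
    uint8_t first_byte = pack_version_length(4, 5);
    printf("%02X %u %u\n", first_byte,
           unpack_version(first_byte), unpack_length(first_byte)); /* prints: 45 4 5 */
    return 0;
}

Because the shifts operate on values rather than on a compiler-chosen memory layout, the result does not depend on the platform.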

Here is some demonstration code. The only goal is to demonstrate what you are asking about. Clean coding etc. is neglected.
#include <stdio.h>
#include <stdint.h>

union
{
    uint32_t Everything;
    struct
    {
        uint32_t FirstMentionedBit : 1;
        uint32_t FewOtherBits      : 30;
        uint32_t LastMentionedBit  : 1;
    } bitfield;
} Demonstration;

int main(void)
{
    Demonstration.Everything = 0;
    Demonstration.bitfield.LastMentionedBit = 1;
    printf("%x\n", Demonstration.Everything);

    Demonstration.Everything = 0;
    Demonstration.bitfield.FirstMentionedBit = 1;
    printf("%x\n", Demonstration.Everything);

    return 0;
}
If you use this here https://www.tutorialspoint.com/compile_c_online.php
the output is
80000000
1
But in other environments it might easily be
1
80000000
This is because compilers are free to consider the first mentioned bit the MSB or the LSB and correspondingly the last mentioned bit to be the LSB or MSB.
And that is what your quote describes.

Related

Printing actual bit representation of integers in C

I wanted to print the actual bit representation of integers in C. These are the two approaches that I found.
First:
union int_char {
int val;
unsigned char c[sizeof(int)];
} data;
data.val = n1;
// printf("Integer: %p\nFirst char: %p\nLast char: %p\n", &data.f, &data.c[0], &data.c[sizeof(int)-1]);
for(int i = 0; i < sizeof(int); i++)
printf("%.2x", data.c[i]);
printf("\n");
Second:
for(int i = 0; i < 8*sizeof(int); i++) {
int j = 8 * sizeof(int) - 1 - i;
printf("%d", (val >> j) & 1);
}
printf("\n");
For the two approaches, the outputs are 00000002 and 02000000. I also tried other numbers, and it seems that the bytes are swapped between the two. Which one is correct?
Welcome to the exotic world of endian-ness.
Because we write numbers most significant digit first, you might imagine the most significant byte is stored at the lower address.
The electrical engineers who build computers are more imaginative.
Sometimes they store the most significant byte first, but on your platform it's the least significant.
There are even platforms where it's all a bit mixed up - but you'll rarely encounter those in practice.
So we talk about big-endian and little-endian for the most part. It's a joke about Gulliver's Travels where there's a pointless war about which end of a boiled egg to start at. Which is itself a satire of some disputes in the Christian Church. But I digress.
Because your first snippet looks at the value as a series of bytes, it encounters them in endian order.
But because >> is defined as operating on the bits of the value, it works 'logically', without regard to the storage layout.
C is right not to define the byte order, because hardware that didn't match whatever model C chose would be burdened with the overhead of shuffling bytes around endlessly and pointlessly.
Sadly there isn't a built-in identifier telling you which model you're on, though code that detects it can be found.
It will become relevant to you if (a) as above, you want to break integer types down into bytes and manipulate them, or (b) you receive files from other platforms containing multi-byte structures.
Unicode offers something called a BOM (Byte Order Mark) in UTF-16 and UTF-32.
In fact a good reason (among many) for using UTF-8 is that the problem goes away, because each code unit is a single byte.
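If you do need to know which model you're on at run time, a common trick (my own sketch, not part of the original answer) is to look at the lowest-addressed byte of a known multi-byte value:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t probe = 1;
    unsigned char first = *(unsigned char *)&probe; /* inspect the lowest-addressed byte */

    if (first == 1)
        puts("little-endian");
    else
        puts("big-endian (or something more exotic)");
    return 0;
}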
Footnote:
It's been pointed out quite fairly in the comments that I haven't told the whole story.
The C language specification admits more than one representation of integers, and particularly of signed integers: specifically sign-and-magnitude, two's complement, and one's complement.
It also permits 'padding bits' that don't represent part of the value.
So in principle along with tackling endian-ness we need to consider representation.
In principle, that is. All modern computers use two's complement, extant machines that use anything else are very rare, and unless you have a genuine requirement to support such platforms, I recommend assuming you're on a two's-complement system.
The correct hex representation as a string is 00000002, as you would see if you declared the integer with a hex literal:
int n = 0x00000002; //n=2
or as you would get when printing the integer as hex, as in:
printf("%08x", n);
But when printing the integer's bytes one byte after another, you must also consider the endianness, which is the byte order of multi-byte integers:
In a big-endian system (some UNIX systems use it) the 4 bytes will be ordered in memory as:
00 00 00 02
While in a little-endian system (most common systems) the bytes will be ordered in memory as:
02 00 00 00
The first prints the bytes that represent the integer in the order they appear in memory. Platforms with different endianness will print different results, as they store integers in different ways.
The second prints the bits that make up the integer value, most significant bit first. This result is independent of endianness. The result is also independent of how the >> operator is implemented for signed ints, as it does not look at the bits that may be influenced by the implementation.
The second is a better match to the question "Printing actual bit representation of integers in C", although there is a lot of ambiguity in that phrase.
It depends on your definition of "correct".
The first one will print the data exactly like it's laid out in memory, so I bet that's the one you're getting the maybe unexpected 02000000 for. *) IMHO, that's the correct one. It could be done more simply by just aliasing with unsigned char * directly (char pointers are always allowed to alias any other pointers; in fact, accessing representations is a use case for char pointers mentioned in the standard):
int x = 2;
unsigned char *rep = (unsigned char *)&x;
for (int i = 0; i < sizeof x; ++i) printf("0x%hhx ", rep[i]);
The second one will print only the value bits **) and take them in the order from the most significant byte to the least significant one. I wouldn't call it correct because it also assumes that bytes have 8 bits, and because the shifting used is implementation-defined for negative numbers. ***) Furthermore, just ignoring padding bits doesn't seem correct either if you really want to see the representation.
Edit: As Gerhardh commented in the meantime, this second code doesn't print byte by byte but bit by bit. So the output you claim to see isn't possible. Still, it's the same principle: it only prints value bits and starts at the most significant one.
*) You're on a "little endian" machine. On these machines, the least significant byte is stored first in memory. Read more about Endianness on wikipedia.
**) Representations of types in C may also have padding bits. Some types aren't allowed to include padding (like char), but int is allowed to have them. This second option doesn't alias to char, so the padding bits remain invisible.
***) A correct version of this code (for printing all the value bits) must a) correctly determine the number of value bits (8 * sizeof(int) is wrong because bytes (char) can have more than 8 bits, and even CHAR_BIT * sizeof(int) is wrong, because this would also count padding bits if present) and b) avoid the implementation-defined shifting behavior by first converting to unsigned. It could look for example like this:
#include <stdio.h>

#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
        + (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))

int main(void)
{
    int x = 2;
    for (unsigned mask = 1U << (IMAX_BITS((unsigned)-1) - 1); mask; mask >>= 1)
    {
        putchar((unsigned) x & mask ? '1' : '0');
    }
    puts("");
}
See this answer for an explanation of this strange macro.

C Pointer Arithmetic for Unusual Architectures

I'm trying to get a better understanding of the C standard. In particular I am interested in how pointer arithmetic might work in an implementation for an unusual machine architecture.
Suppose I have a processor with 64 bit wide registers that is connected to RAM where each address corresponds to a cell 8 bits wide. An implementation for C for this machine defines CHAR_BIT to be equal to 8. Suppose I compile and execute the following lines of code:
char *pointer = 0;
pointer = pointer + 1;
After execution, pointer is equal to 1. This gives one the impression that in general data of type char corresponds to the smallest addressable unit of memory on the machine.
Now suppose I have a processor with 12 bit wide registers that is connected to RAM where each address corresponds to a cell 4 bits wide. An implementation of C for this machine defines CHAR_BIT to be equal to 12. Suppose the same lines of code are compiled and executed for this machine. Would pointer be equal to 3?
More generally, when you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
Would pointer be equal to 3?
Well, the standard doesn't say how pointers are implemented. The standard tells what is to happen when you use a pointer in a specific way but not what the value of a pointer shall be.
All we know is that adding 1 to a char pointer will make the pointer point at the next char object, wherever that is. It says nothing about the pointer's value.
So when you say that
pointer = pointer + 1;
will make the pointer equal 1, it's wrong. The standard doesn't say anything about that.
On most systems a char is 8 bits and pointers are (virtual) memory addresses referencing an 8-bit-addressable memory location. On such systems, incrementing a char pointer will increase the pointer value (aka the memory address) by 1. However, on unusual architectures there is no way to tell.
But if you have a system where each memory address references 4 bits and a char is 12 bits, it seems a good guess that ++pointer will increase the pointer by three.
Pointers are incremented by at least the width of the data type they "point to", but they are not guaranteed to increment by exactly that size.
For memory alignment purposes, there are many times where a pointer might increment to the next memory word alignment past the minimum width.
So, in general, you cannot assume this pointer to be equal to 3. It very well may be 3, 4, or some larger number.
Here is an example.
struct char_three {
char a;
char b;
char c;
};
struct char_three* my_pointer = 0;
my_pointer++;
/* I'd be shocked if my_pointer was now 3 */
Memory alignment is machine specific. One cannot generalize about it, except that most machines define a WORD as the first address that can be aligned to a memory fetch on the bus. Some machines can specify addresses that don't align with the bus fetches. In such a case, selecting two bytes that span the alignment may result in loading two WORDS.
Most systems don't accept WORD loads on non-aligned boundaries without complaining. This means that a bit of boilerplate assembly is applied to translate the fetch to the preceding WORD boundary, if maximum density is desired.
Most compilers prefer speed to maximum density of data, so they align their structured data to take advantage of WORD boundaries, avoiding the extra calculations. This means that in many cases, data that is not carefully aligned might contain "holes" of bytes that are not used.
If you are interested in details of the above summary, you can read up on Data Structure Alignment which will discuss alignment (and as a consequence) padding.
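As an illustration of such padding "holes" (my own sketch; the exact sizes and padding are implementation-defined), a struct whose members are not naturally ordered typically ends up larger than the sum of its members:

#include <stdio.h>

struct padded {
    char a;   /* 1 byte, then typically 3 bytes of padding */
    int  b;   /* usually 4-byte aligned                    */
    char c;   /* 1 byte, then trailing padding             */
};

int main(void)
{
    /* On a typical 32-/64-bit system this prints 12, not 6. */
    printf("sizeof(struct padded) = %zu\n", sizeof(struct padded));
    return 0;
}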
char *pointer = 0;
After execution, pointer is equal to 1
Not necessarily.
This special case gives you a null pointer, since 0 is a null pointer constant. Strictly speaking, such a pointer is not supposed to point at a valid object. If you look at the actual address stored in the pointer, it could be anything.
Null pointers aside, the C language expects you to do pointer arithmetic by first pointing at an array. Or in case of char, you can also point at a chunk of generic data such as a struct. Everything else, like your example, is undefined behavior.
An implementation of C for this machine defines CHAR_BIT to be equal to 12
The C standard defines char to be equal to a byte, so your example is a bit weird and self-contradictory. Pointer arithmetic will always increase the pointer to point at the next object in the array. The standard doesn't really speak of the representation of addresses at all, but in your fictional example it would sensibly increase the address by 12 bits, because that's the size of a char.
Fictional computers are quite meaningless to discuss even from a learning point-of-view. I'd advise to focus on real-world computers instead.
When you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
On a "conventional" machine -- indeed on the vast majority of machines where C runs -- CHAR_BIT simply is the width of a memory cell on the machine, so the answer to the question is vacuously "yes" (since CHAR_BIT / CHAR_BIT is 1.).
A machine with memory cells smaller than CHAR_BIT would be very, very strange -- arguably incompatible with C's definition.
C's definition says that:
sizeof(char) is exactly 1.
CHAR_BIT, the number of bits in a char, is at least 8. That is, as far as C is concerned, a byte may not be smaller than 8 bits. (It may be larger, and this is a surprise to many people, but it does not concern us here.)
There is a strong suggestion (if not an explicit requirement) that char (or "byte") is the machine's "minimum addressable unit" or some such.
So for a machine that can address 4 bits at a time, we would have to pick unnatural values for sizeof(char) and CHAR_BIT (which would otherwise probably want to be 2 and 4, respectively), and we would have to ignore the suggestion that type char is the machine's minimum addressable unit.
C does not mandate the internal representation (the bit pattern) of a pointer. The closest a portable C program can get to doing anything with the internal representation of a pointer value is to print it out using %p -- and that's explicitly defined to be implementation-defined.
So I think the only way to implement C on a "4 bit" machine would involve having the code
char a[10];
char *p = a;
p++;
generate instructions which actually incremented the address behind p by 2.
It would then be an interesting question whether %p should print the actual, raw pointer value, or the value divided by 2.
It would also be lots of fun to watch the ensuing fireworks as too-clever programmers on such a machine used type punning techniques to get their hands on the internal value of pointers so that they could increment them by actually 1 -- not the 2 that "proper" additions of 1 would always generate -- such that they could amaze their friends by accessing the odd nybble of a byte, or confound the regulars on SO by asking questions about it. "I just incremented a char pointer by 1. Why is %p showing a value that's 2 greater?"
Seems like the confusion in this question comes from the fact that the word "byte" in the C standard doesn't have the typical definition (which is 8 bits). Specifically, the word "byte" in the C standard means a collection of bits, where the number of bits is specified by the implementation-defined constant CHAR_BIT. Furthermore, a "byte" as defined by the C standard is the smallest addressable object that a C program can access.
This leaves open the question as to whether there is a one-to-one correspondence between the C definition of "addressable", and the hardware's definition of "addressable". In other words, is it possible that the hardware can address objects that are smaller than a "byte"? If (as in the OP) a "byte" occupies 3 addresses, then that implies that "byte" accesses have an alignment restriction. Which is to say that 3 and 6 are valid "byte" addresses, but 4 and 5 are not. This is prohibited by section 6.2.8 which discusses the alignment of objects.
Which means that the architecture proposed by the OP is not supported by the C specification. In particular, an implementation may not have pointers that point to 4-bit objects when CHAR_BIT is equal to 12.
Here are the relevant sections from the C standard:
§3.6 The definition of "byte" as used in the standard
[A byte is an] addressable unit of data storage large enough to hold
any member of the basic character set of the execution environment.
NOTE 1 It is possible to express the address of each individual byte
of an object uniquely.
NOTE 2 A byte is composed of a contiguous sequence of bits, the number
of which is implementation-defined. The least significant bit is
called the low-order bit; the most significant bit is called the
high-order bit.
§5.2.4.2.1 describes CHAR_BIT as the
number of bits for smallest object that is not a bit-field (byte)
§6.2.6.1 restricts all objects that are larger than a char to be a multiple of CHAR_BIT bits:
[...]
Except for bit-fields, objects are composed of contiguous sequences of
one or more bytes, the number, order, and encoding of which are either
explicitly specified or implementation-defined.
[...] Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes.
§6.2.8 restricts the alignment of objects
Complete object types have alignment requirements which place
restrictions on the addresses at which objects of that type may be
allocated. An alignment is an implementation-defined integer value
representing the number of bytes between successive addresses at which
a given object can be allocated.
Valid alignments include only those values returned by an _Alignof
expression for fundamental types, plus an additional
implementation-defined set of values, which may be empty. Every
valid alignment value shall be a nonnegative integral power of two.
§6.5.3.4 specifies the sizeof a char, and hence a "byte"
When sizeof is applied to an operand that has type char, unsigned
char, or signed char, (or a qualified version thereof) the result is
1.
The following code fragment demonstrates an invariant of C pointer arithmetic -- no matter what CHAR_BIT is, no matter what the hardware least addressable unit is, and no matter what the actual bit representation of pointers is,
#include <assert.h>
int main(void)
{
T x[2]; // for any object type T whatsoever
assert(&x[1] - &x[0] == 1); // must be true
}
And since sizeof(char) == 1 by definition, this also means that
#include <assert.h>
int main(void)
{
T x[2]; // again for any object type T whatsoever
char *p = (char *)&x[0];
char *q = (char *)&x[1];
assert(q - p == sizeof(T)); // must be true
}
However, if you convert to integers before performing the subtraction, the invariant evaporates:
#include <assert.h>
#include <inttypes.h>
int main(void)
{
T x[2];
uintptr_t p = (uintptr_t)&x[0];
uintptr_t q = (uintptr_t)&x[1];
assert(q - p == sizeof(T)); // implementation-defined whether true
}
because the transformation performed by converting a pointer to an integer of the same size, or vice versa, is implementation-defined. I think it's required to be bijective, but I could be wrong about that, and it is definitely not required to preserve any of the above invariants.

Why does an int take up 4 bytes in c or any other language? [duplicate]

This is a bit of a general question and not completely related to the c programming language but it's what I'm on studying at the moment.
Why does an integer take up 4 bytes, or however many bytes it does depending on the system?
Why does it not take up 1 byte per integer?
For example why does the following take up 8 bytes:
int a = 1;
int b = 1;
Thanks
I am not sure whether you are asking why int objects have fixed sizes instead of variable sizes or whether you are asking why int objects have the fixed sizes they do. This answers the former.
We do not want the basic types to have variable lengths. That makes it very complicated to work with them.
We want them to have fixed lengths, because then it is much easier to generate instructions to operate on them. Also, the operations will be faster.
If the size of an int were variable, consider what happens when you do:
b = 3;
b += 100000;
scanf("%d", &b);
When b is first assigned, only one byte is needed. Then, when the addition is performed, the compiler needs more space. But b might have neighbors in memory, so the compiler cannot just grow it in place. It has to release the old memory and allocate new memory somewhere.
Then, when we do the scanf, the compiler does not know how much data is coming. scanf will have to do some very complicated work to grow b over and over again as it reads more digits. And, when it is done, how does it let you know where the new b is? The compiler has to have some mechanism to update the location for b. This is hard and complicated and will cause additional problems.
In contrast, if b has a fixed size of four bytes, this is easy. For the assignment, write 3 to b. For the addition, add 100000 to the value in b and write the result to b. For the scanf, pass the address of b to scanf and let it write the new value to b. This is easy.
The basic integral type int is guaranteed to have at least 16 bits; "at least" means that compilers/architectures may also provide more bits, and on 32/64-bit systems int will most likely comprise 32 bits, i.e. 4 bytes (cf., for example, cppreference.com):
Integer types
... int (also accessible as signed int): This is the most optimal
integer type for the platform, and is guaranteed to be at least 16
bits. Most current systems use 32 bits (see Data models below).
If you want an integral type with exactly 8 bits, use int8_t or uint8_t from <stdint.h>.
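For instance, a quick check of my own (the value printed for int is implementation-defined, the fixed-width types are exact):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("sizeof(int)     = %zu\n", sizeof(int));     /* typically 4 */
    printf("sizeof(int8_t)  = %zu\n", sizeof(int8_t));  /* exactly 1  */
    printf("sizeof(int64_t) = %zu\n", sizeof(int64_t)); /* exactly 8  */
    return 0;
}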
It doesn't. It's implementation-defined. A signed int in gcc on an Atmel 8-bit microcontroller, for example, is a 16-bit integer. An unsigned int is also 16-bits, but from 0-65535 since it's unsigned.
The fact that an int uses a fixed number of bytes (such as 4) is a compiler/CPU efficiency and limitation, designed to make common integer operations fast and efficient.
There are types (such as BigInteger in Java) that take a variable amount of space. These types would have 2 fields, the first being the number of words being used to represent the integer, and the second being the array of words. You could define your own VarInt type, something like:
struct VarInt {
    unsigned char length;   /* number of value bytes that follow    */
    unsigned char bytes[];  /* value bytes, least significant first */
};

/* Static initialization of a flexible array member relies on a
   common compiler extension; this is only illustrative. */
struct VarInt one    = {1, {1}};        // 2 bytes: represents 1
struct VarInt v257   = {2, {1, 1}};     // 3 bytes: 1 + 1*256
struct VarInt v65537 = {3, {1, 0, 1}};  // 4 bytes: 1 + 1*65536
and so on, but this would not be very fast to perform arithmetic on. You'd have to decide how you would want to treat overflow; resizing the storage would require dynamic memory allocation.

When to use bit-fields in C

On the question 'why do we need to use bit-fields?', searching on Google I found that bit fields are used for flags.
Now I am curious,
Is it the only way bit-fields are used practically?
Do we need to use bit fields to save space?
A way of defining a bit-field, from the book:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
Why do we use int?
How much space is occupied?
I am confused why we are using int, but not short or something smaller than an int.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
A quite good resource is Bit Fields in C.
The basic reason is to reduce the used size. For example, if you write:
struct {
unsigned int is_keyword;
unsigned int is_extern;
unsigned int is_static;
} flags;
You will use at least 3 * sizeof(unsigned int), typically 12 bytes, to represent three small flags that should only need three bits.
So if you write:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
This uses up the same space as one unsigned int, so 4 bytes. You can throw 32 one-bit fields into the struct before it needs more space.
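To verify the difference on your own system, here is a quick check of my own (the exact numbers are implementation-defined):

#include <stdio.h>

struct wide   { unsigned int a, b, c; };              /* three full ints      */
struct narrow { unsigned int a : 1, b : 1, c : 1; };  /* three one-bit fields */

int main(void)
{
    printf("without bit-fields: %zu bytes\n", sizeof(struct wide));   /* typically 12 */
    printf("with bit-fields:    %zu bytes\n", sizeof(struct narrow)); /* typically 4  */
    return 0;
}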
This is sort of equivalent to the classical home brew bit field:
#define IS_KEYWORD 0x01
#define IS_EXTERN 0x02
#define IS_STATIC 0x04
unsigned int flags;
But the bit field syntax is cleaner. Compare:
if (flags.is_keyword)
against:
if (flags & IS_KEYWORD)
And it is obviously less error-prone.
Now I am curious, [are flags] the only way bitfields are used practically?
No, flags are not the only way bitfields are used. They can also be used to store values larger than one bit, although flags are more common. For instance:
typedef enum {
NORTH = 0,
EAST = 1,
SOUTH = 2,
WEST = 3
} directionValues;
struct {
unsigned int alice_dir : 2;
unsigned int bob_dir : 2;
} directions;
Do we need to use bitfields to save space?
Bitfields do save space. They also allow an easier way to set values that aren't byte-aligned. Rather than bit-shifting and using bitwise operations, we can use the same syntax as setting fields in a struct. This improves readability. With a bitfield, you could write
directions.alice_dir = WEST;
directions.bob_dir = SOUTH;
However, to store multiple independent values in the space of one int (or other type) without bitfields, you would need to write something like:
#define ALICE_OFFSET 0
#define BOB_OFFSET 2
directions &= ~(3<<ALICE_OFFSET); // clear Alice's bits
directions |= WEST<<ALICE_OFFSET; // set Alice's bits to WEST
directions &= ~(3<<BOB_OFFSET); // clear Bob's bits
directions |= SOUTH<<BOB_OFFSET; // set Bob's bits to SOUTH
The improved readability of bitfields is arguably more important than saving a few bytes here and there.
Why do we use int? How much space is occupied?
The space of an entire int is occupied. We use int because in many cases, it doesn't really matter. If, for a single value, you use 4 bytes instead of 1 or 2, your user probably won't notice. For some platforms, size does matter more, and you can use other data types which take up less space (char, short, uint8_t, etc.).
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No, that is not correct. The entire unsigned int will exist, even if you're only using 8 of its bits.
Another place where bitfields are common are hardware registers. If you have a 32 bit register where each bit has a certain meaning, you can elegantly describe it with a bitfield.
Such a bitfield is inherently platform-specific. Portability does not matter in this case.
We use bit fields mostly (though not exclusively) for flag structures - bytes or words (or possibly larger things) in which we try to pack tiny (often 2-state) pieces of (often related) information.
In these scenarios, bit fields are used because they correctly model the problem we're solving: what we're dealing with is not really an 8-bit (or 16-bit or 24-bit or 32-bit) number, but rather a collection of 8 (or 16 or 24 or 32) related, but distinct pieces of information.
The problems we solve using bit fields are problems where "packing" the information tightly has measurable benefits and/or "unpacking" the information doesn't have a penalty. For example, if you're exposing 1 byte through 8 pins and the bits from each pin go through their own bus that's already printed on the board so that it leads exactly where it's supposed to, then a bit field is ideal. The benefit in "packing" the data is that it can be sent in one go (which is useful if the frequency of the bus is limited and our operation relies on frequency of its execution), and the penalty of "unpacking" the data is non-existent (or existent but worth it).
On the other hand, we don't use bit fields for booleans in other cases like normal program flow control, because of the way computer architectures usually work. Most common CPUs don't like fetching one bit from memory - they like to fetch bytes or integers. They also don't like to process bits - their instructions often operate on larger things like integers, words, memory addresses, etc.
So, when you try to operate on bits, it's up to you or the compiler (depending on what language you're writing in) to write out additional operations that perform bit masking and strip the structure of everything but the information you actually want to operate on. If there are no benefits in "packing" the information (and in most cases, there aren't), then using bit fields for booleans would only introduce overhead and noise in your code.
To answer the original question »When to use bit-fields in C?« … according to the book "Write Portable Code" by Brian Hook (ISBN 1-59327-056-9, I read the German edition ISBN 3-937514-19-8) and to personal experience:
Never use the bit-field idiom of the C language; do it yourself instead.
A lot of implementation details are compiler-specific, especially in combination with unions, and things are not guaranteed across different compilers and different endiannesses. If there's even a tiny chance your code has to be portable and will be compiled for different architectures and/or with different compilers, don't use it.
We had this case when porting code from a little-endian microcontroller with some proprietary compiler to another big-endian microcontroller with GCC, and it was not fun. :-/
This is how I have used flags (host byte order ;-) ) since then:
#define SOME_FLAG        (1 << 0)
#define SOME_OTHER_FLAG  (1 << 1)
#define AND_ANOTHER_FLAG (1 << 2)
/* test flag */
if ( someint & SOME_FLAG ) {
/* do this */
}
/* set flag */
someint |= SOME_FLAG;
/* clear flag */
someint &= ~SOME_FLAG;
No need for a union with the int type and some bit-field struct then. If you read a lot of embedded code, those test, set, and clear patterns become familiar, and you will spot them easily in your code.
Why do we need to use bit-fields?
When you want to store pieces of data that each take less than one byte, they can be grouped together in a structure using bit-fields.
In the embedded world, when different parts of one 32-bit register word have different meanings, you can also use bit-fields to make the code more readable.
I found that bit fields are used for flags. Now I am curious, is it the only way bit-fields are used practically?
No, this not the only way. You can use it in other ways too.
Do we need to use bit fields to save space?
Yes.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No. Memory can only be occupied in multiples of bytes.
Bit-fields can be used to save memory space (but using bit-fields for this purpose is rare). They are used where there is a memory constraint, e.g. when programming embedded systems.
But they should be used only when really needed, because we cannot take the address of a bit-field, so the address operator & cannot be used with them.
A good use would be to implement a chunk that translates to and from Base64, or any other byte-unaligned data structure.
struct base64enc {
    unsigned int e1 : 6;
    unsigned int e2 : 6;
    unsigned int e3 : 6;
    unsigned int e4 : 6;
}; // I don't know if declaring a 4-byte array would have the same effect.

struct base64dec {
    unsigned char d1;
    unsigned char d2;
    unsigned char d3;
};

union base64chunk {
    struct base64enc enc;
    struct base64dec dec;
};

union base64chunk b64c;
// You can assign three bytes to b64c.dec, and read four 0-63 codes from b64c.enc instantly.
This example is a bit naive, since Base64 must also consider termination (i.e. a string whose length l is not a multiple of 3). But it works as a sample of accessing unaligned data structures.
Another example: using this feature to break a TCP packet header into its components (or whatever other network protocol packet header you want to discuss), although it is a more advanced and less end-user example. In general, this is useful for PC internals, OS code, drivers, and encoding systems.
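As a small illustration of that idea (my own sketch, not from the original answer), the first byte of an IPv4 header packs a 4-bit version and a 4-bit header length; a bit-field description might look like this, with the usual caveat from the question above that which nibble comes first is implementation-defined:

/* Hypothetical, non-portable description of the first IPv4 header byte.
   Whether 'version' lands in the high or the low nibble depends on the
   implementation, which is exactly why parsing wire formats this way
   needs care. */
struct ipv4_first_byte {
    unsigned int version : 4;   /* IP version, 4 for IPv4         */
    unsigned int ihl     : 4;   /* header length, in 32-bit words */
};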
Another example: analyzing a float number.
struct _FP32 {
    unsigned int sign     : 1;
    unsigned int exponent : 8;
    unsigned int mantissa : 23;
};  /* NB: which field overlays which bits of the float depends on the
       implementation's bit-field allocation order; on common
       little-endian compilers you would list mantissa first. */

union FP32_t {
    struct _FP32 parts;
    float number;
};
(Disclaimer: I don't know the file name / type name where this is applied, but in C this would be declared in a header. I also don't know how this can be done for 64-bit floating-point numbers, since the mantissa needs 52 bits and, on a 32-bit target, ints have only 32 bits.)
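A possible usage sketch of my own, using the union above (same portability caveat; reading a union member other than the one last written re-interprets the stored bytes):

#include <stdio.h>

int main(void)
{
    union FP32_t f;
    f.number = -1.5f;

    /* Which bits each field reports depends on the bit-field layout
       chosen by the compiler. */
    printf("sign=%u exponent=%u mantissa=%u\n",
           f.parts.sign, f.parts.exponent, f.parts.mantissa);
    return 0;
}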
Conclusion: As the concept and these examples show, this is a rarely used feature, because it's mostly for internal purposes and not for day-to-day software.
To answer the parts of the question no one else answered:
Ints, not Shorts
The reason to use int rather than short, etc. is that in most cases no space will be saved by using a smaller type.
Modern computers have a 32 or 64 bit architecture and that 32 or 64 bits will be needed even if you use a smaller storage type such as a short.
The smaller types are only useful for saving memory if you can pack them together (for example a short array may use less memory than an int array as the shorts can be packed together tighter in the array). For most cases, when using bitfields, this is not the case.
Other uses
Bitfields are most commonly used for flags, but there are other things they are used for. For example, one way to represent a chess board, used in a lot of chess algorithms, is to use a 64-bit integer to represent the board (8*8 squares) and set flags in that integer to give the positions of all the white pawns. Another integer shows all the black pawns, etc.
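For instance, a minimal bitboard sketch of my own (the square numbering is just one common convention, not part of the original answer):

#include <stdint.h>
#include <stdio.h>

/* One bit per square: bit index = rank*8 + file, with a1 = 0. */
static void set_square(uint64_t *board, int rank, int file)
{
    *board |= (uint64_t)1 << (rank * 8 + file);
}

int main(void)
{
    uint64_t white_pawns = 0;
    for (int file = 0; file < 8; file++)
        set_square(&white_pawns, 1, file);   /* all white pawns on rank 2 */
    printf("%016llx\n", (unsigned long long)white_pawns);
    return 0;
}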
You can also use them to get more unsigned widths that wrap around. Ordinarily you would have only widths of 8, 16, 32, 64... bits, but with bit-fields you can have any width.
#include <stdio.h>

struct a
{
    unsigned int b : 3;
};

int main(void)
{
    struct a w = { 0 };
    while (1)
    {
        printf("%u\n", w.b++);   /* wraps around after 7 */
        getchar();
    }
}
To make better use of memory space, we can use bit-fields.
As far as I know, in real-world programming, if we need to we can simply use booleans instead of declaring integers and then carving bit-fields out of them.
If they are also values we use often, not only do we save space, we can also gain performance, since we pollute the caches less.
However, caching is also where the danger lies in using bit-fields: concurrent reads and writes to different bits of the same word cause a data race, and updates to completely separate bits might overwrite new values with old values...
Bitfields are much more compact and that is an advantage.
But don't forget packed structures are slower than normal structures. They are also more difficult to construct since the programmer must define the number of bits to use for each field. This is a disadvantage.
Why do we use int? How much space is occupied?
One answer to this question that I haven't seen mentioned in any of the other answers, is that the C standard guarantees support for int. Specifically:
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
It is common for compilers to allow additional bit-field types, but not required. If you're really concerned about portability, int is the best choice.
Nowadays, microcontrollers (MCUs) have peripherals, such as I/O ports, ADCs, DACs, onboard the chip along with the processor.
Before MCUs became available with the needed peripherals, we would access some of our hardware by connecting to the buffered address and data buses of the microprocessor. A pointer would be set to the memory address of the device and if the device saw its address along with the R/W signal and maybe a chip select, it would be accessed.
Oftentimes we would want to access individual or small groups of bits on the device.
In our project, we used this to extract a page table entry and page directory entry from a given memory address:
/* ULONG64 is a 64-bit unsigned type (as in the Windows SDK); outside
   that environment you could use: typedef unsigned long long ULONG64; */
union VADDRESS {
    struct {
        ULONG64 BlockOffset : 16;
        ULONG64 PteIndex    : 14;
        ULONG64 PdeIndex    : 14;
        ULONG64 ReservedMBZ : (64 - (16 + 14 + 14));
    };
    ULONG64 AsULONG64;
};
Now suppose, we have an address:
union VADDRESS tempAddress;
tempAddress.AsULONG64 = 0x1234567887654321;
Now we can access PTE and PDE from this address:
cout << tempAddress.PteIndex;

Write 9 bits binary data in C

I am trying to write binary data to a file that does not fit in 8 bits. From what I understand, you can write binary data of any length if you can group it into a predefined length of 8, 16, 32, or 64 bits.
Is there a way to write just 9 bits to a file? Or two values of 9 bits?
I have one value in the range -+32768 and 3 values in the range +-256. What would be the way to save most space?
Thank you
No, I don't think there's any way, using C's file I/O APIs, to express storing less than 1 char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need:
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits can be put, each time incrementing i. When i reaches CHAR_BIT, write it to s and reset i to zero.
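A minimal sketch of such a bit writer (my own; error handling is omitted, and bits are packed most significant first within each byte):

#include <stdio.h>
#include <limits.h>

struct bitwriter {
    FILE *s;          /* byte stream               */
    unsigned i;       /* bits used in current byte */
    unsigned char x;  /* current partial byte      */
};

/* Append the low 'n' bits of 'value' to the stream, MSB first. */
static void put_bits(struct bitwriter *w, unsigned value, unsigned n)
{
    while (n--) {
        w->x = (unsigned char)((w->x << 1) | ((value >> n) & 1u));
        if (++w->i == CHAR_BIT) {
            fputc(w->x, w->s);
            w->i = 0;
            w->x = 0;
        }
    }
}

/* Pad the last byte with zero bits and flush it. */
static void flush_bits(struct bitwriter *w)
{
    if (w->i != 0)
        put_bits(w, 0, CHAR_BIT - w->i);
}

For data like yours you could call put_bits with 16 bits for the first value and 9 bits for each of the other three (43 bits total), then flush_bits to pad the final byte.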
You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
int a : 16;
int b : 9;
int c : 9;
int d : 9;
};
Objects such as this will still be rounded up to a whole number of bytes, so the above will have six bytes on typical systems, since it uses 43 bits total, and the next whole number of eight-bit bytes has 48 bits.
You can either accept this padding of 43 bits to 48 or use more complicated code to concatenate bits further before writing to a file. This requires additional code to assemble bits into sequences of bytes. It is rarely worth the effort, since storage space is currently cheap.
You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value will be written into two bytes and combined with the previous/next byte by shift and OR operations.
I hope this very abstract description helps you.
