Are bits in the structure guaranteed - c

I have a question about structure bit-fields. Please see below, as I am not sure which keywords best describe my issue:
Context: I am writing a disassembler for MIPS R3000A assembly instructions, the ones that were used for PlayStation programs in the early 2000s.
Issue: I would like to know if in this code:
struct Instruction {
u32 other:26;
u32 op:6;
};
//main:
Instruction instruction = *(Instruction*)(data + pc);
printf("%02x\n", instruction.op);
it is guaranteed that all compilers, on little-endian targets, will always use the op:6 bit-field to store the 6 most significant bits? (This is a bit counterintuitive: you would assume that the last 6 bits are stored in the op bit-field.)
It is an alternative to the following code:
static uint32_t get_op_code(uint32_t data) {
uint16_t mask = (1 << 6) - 1;
return (data >> 26) & mask;
}
//main:
uint32_t instruction = *(uint32_t*)(data + pc);
uint32_t op = get_op_code(instruction);
printf("%02x\n", op);
It is working fine on my side and it seems slightly faster using the structure approach, not to mention that it is more intuitive and clear, but I am afraid that it is not guaranteed that the 6 most significant bits are stored in the second bit-field "op" of the structure.

The C standard does not guarantee how bit-fields are arranged. It does require each implementation to define it, so it should be in the documentation for a compiler. Per C 2018 6.7.2.1 11:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Related

"Bit-fields are assigned left to right on some machines and right to left on others"- unable to get the concept from "The C Programming Language" book

I was going through the text "The C Programming Language" by Kernighan and Ritchie. While discussing bit-fields at the end of that section, the authors say:
"Fields are assigned left to right on some machines and right to left on others. This means that although fields are useful for maintaining internally-defined data structures, the question of which end comes first has to be carefully considered when picking apart externally-defined data; programs that depend on such things are not portable."
- The C Programming Language [2e] by Kernighan & Ritchie [Section 6.9, p.150]
I do not really get the meaning of these lines. Can anyone please explain them to me, possibly with a diagram?
PS: Well I have taken a computer organization and architecture course. I know how computers deal with bits and bytes. In a computer system, the smallest unit of information is a single bit which can be either 0 or 1. 8 such bits form a byte. Memories are byte-addressable, which means that each byte in the memory has an address associated with it. But usually, the processors have word lengths as 2 bytes (very old systems),4 bytes, 8 bytes... This means in one memory cycle, the CPU can take up a word length number of bytes from the main memory and put it inside its registers. Now how these bytes are placed in registers depends on the endianness of the system.
But I do not get what the authors mean by "left to right" or "right to left". The words seem like they are related to the endianness but endianness depends on the CPU and C compilers have nothing to do with it... The question which comes to my mind is "left to right" of "what"? What object are the authors referring to?
When a structure contains bit-fields, the C implementation uses some storage unit to hold them (or multiple storage units if needed). The storage unit might be one eight-bit byte or it might be four bytes, or it might be other sizes—this is a determination made by each C implementation. The C standard only requires that it be addressable, which effectively means it has to be a whole number of bytes.
Once we have a storage unit, it is some number of bits. Say it is 32 bits, and number the bits from 31 to 0, where, if we consider the bits to represent a binary numeral, bit 0 represents 2^0, and bit 31 represents 2^31. Note that Kernighan and Ritchie are imprecise to use “left” and “right” here. There is no inherent left or right. We usually write numerals with the most significant digits on the left, so we might consider bit 31 to be the leftmost and bit 0 to be the rightmost.
Now we have a storage unit with some number of bits and some labeling for those bits (31 to 0 or left to right). Say you want to put two bit-fields in them, say fields of width 7 and 5.
Which 7 of the bits from bit 31 to bit 0 are used for the first field? Which 5 of the bits are used for the second field?
We could use bits 31-25 for the first field and bits 24-20 for the second field. Or we could use bits 6-0 for the first field and bits 11-7 for the second field.
In theory, we could also use bits 27-21 for the first field and bits 15-11 for the second field. However, the C standard does say that “If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit” (C 2018 6.7.2.1 11). “Adjacent” is not formally defined, but we can assume it means consecutively numbered bits. So, if the C implementation puts the first field in bits 31-25, it is required to put the second field in bits 24-20. Conversely, if it puts the first field in bits 6-0, it must put the second field in 11-7.
Thus, the C standard requires an implementation to arrange successive bit-fields in a storage unit from left-to-right or from right-to-left, but it does not say which.
(I do not see anything in the standard that says the first field must start at one end of the storage unit or the other, rather than somewhere in the middle. That would lead to wasting some bits.)
When you write:
struct {
unsigned int version: 4;
unsigned int length: 4;
unsigned char dcsn;
};
you end up with a big headache you weren't expecting because your code is non-portable.
When you set version to 4 and length to 5, some systems may set the first byte of the structure to 0x45 and other systems may set the first byte of the structure to 0x54.
When I went to college this thing was #ifdef'd as follows (incorrect):
struct {
#if BIG_ENDIAN
unsigned int version: 4;
unsigned int length: 4;
#else
unsigned int length: 4;
unsigned int version: 4;
#endif
unsigned char dcsn;
};
but this is still rolling the dice, as there's no rule that the order of the bits in the bytes in a bitfield corresponds to the order of bytes in the word in the machine. I would not be surprised if, when you cross-compile, the bit order in the struct came from the host machine's rules while the bit order of integers comes from the target machine's rules (as it must). In theory the code could be corrected by having a separate #ifdef for BIG_ENDIAN_BITFIELD, but I've never seen it done.
Here is some demonstration code. The only goal is to demonstrate what you are asking about. Clean coding etc. is neglected.
#include <stdio.h>
#include <stdint.h>
union
{
uint32_t Everything;
struct
{
uint32_t FirstMentionedBit : 1;
uint32_t FewOTherBits :30;
uint32_t LastMentionedBit : 1;
} bitfield;
} Demonstration;
int main()
{
Demonstration.Everything =0;
Demonstration.bitfield.LastMentionedBit=1;
printf("%x\n", Demonstration.Everything);
Demonstration.Everything =0;
Demonstration.bitfield.FirstMentionedBit=1;
printf("%x\n", Demonstration.Everything);
return 0;
}
If you use this here https://www.tutorialspoint.com/compile_c_online.php
the output is
80000000
1
But in other environments it might easily be
1
80000000
This is because compilers are free to consider the first mentioned bit the MSB or the LSB and correspondingly the last mentioned bit to be the LSB or MSB.
And that is what your quote describes.

Reversing order of struct bitfields

I'm trying to implement a pseudo-bit-array in C, where I define a structure with eight 1-bit members and then reinterpret its memory as an unsigned char. This approach does work, yet it flips the order of the bits in the binary number, e.g. 127 becomes 254. How can I undo this?
struct bit_array {
unsigned b8:1, b7:1, b6:1, b5:1, b4:1, b3:1, b2:1, b1:1;
};
unsigned char join(struct bit_array num) {
return *(unsigned char*)&num;
}
int main() {
struct bit_array test_bits = { 1, 1, 1, 1, 1, 1, 1, 1 };
printf("%u", join(test_bits));
return 0;
}
The C standard specifies neither the order of bit-fields within whatever unit of storage is used for them nor the order of bits in an unsigned integer except to say they are implementation-defined and to imply the bit-fields are either high-to-low or low-to-high (rather than mixed).
C 2018 6.7.2.1 says:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
C 2018 6.2.6.2 says unsigned integers are represented with pure binary, and a footnote indicates this means the values attributed to successive bits are successive powers of two. So there must be some “succession” of the bits, an order in which their positions in the succession correspond to their values. However, it is not possible to correlate this order with any hardware property, as nothing in C makes bits individually addressable, so there is no mechanism in C to discern a bit at a lower address than another or to otherwise inspect the individual bits in any memory cell (whether in register or main memory or elsewhere).
The C standard does not provide any means for a program to request that an implementation order the bits in a particular way.
If you wish the bit-fields to be allocated in the reverse order, you can reverse their declarations. However, this will not maintain that order in C implementations that allocate bit-fields in the other order.
According to the GCC 10.3 documentation, section 4.9, its order of bit-fields within a unit is “Determined by ABI,” meaning it follows the Application Binary Interface for the target platform. So I expect the order is not selectable by a command-line switch. I also do not see a predefined preprocessor macro to report it.
In any case, using named bit-fields is generally not a good way to implement an “array” of bits. You can simply use bit operators to access the bits in a unit and can write functions to set and get them if desired:
unsigned int GetBit(UnitType u, int n) { return (u >> n) & 1; }
void SetBit(UnitType *u, int n, UnitType b) { *u ^= (((*u >> n) & 1) ^ b) << n; }

Concurrent update of bit-fields in C

Section 3.15.3 of the C standard states:
"it is not safe to concurrently update two non-atomic bit-fields in
the same structure if all members declared between them are also
non-zero-length bit-fields, no matter what the size of those
intervening bit-fields happen to be."
Consider the below example:
struct S {
unsigned a: 8;
unsigned b: 4;
unsigned c: 4;
unsigned d: 8;
};
Based on the standard, it's not safe to update bit-fields a and d concurrently.
Why not?
Bit fields aren't individually addressable, so to set a bit field, the compiler makes machine code to:
1. Read the byte that includes the bits to set
2. Set the required bits in that byte
3. Write the whole byte back.
Sometimes this is done in a single instruction, but then the processor does the same job.
Either way, if another thread is simultaneously doing the same sort of thing on other bits in the same byte, then the operations of the two threads can interfere with each other.
Note also: you can't rely on the unit of access being a byte, it could be a whole int or unsigned, for example.

C - Why #pragma pack(1) Consider 6-bit struct member as an 8-bit?

I am stuck on what looks like wrong behavior from #pragma pack(1): when I define a 6-bit field, it is treated as 8 bits. I read this question to try to solve my problem, but it did not help me at all.
In Visual Studio 2012 I defined the struct below for saving Base64 characters:
#pragma pack(1)
struct BASE64 {
CHAR cChar1 : 6;
CHAR cChar2 : 6;
CHAR cChar3 : 6;
CHAR cChar4 : 6;
};
Now I got its size with sizeof, but the result isn't what I expected :
printf("%d", sizeof(BASE64)); // should print 3
Result : 4
I was expecting to get 3 (because 6 * 4 = 24, and 24 bits is 3 bytes).
I even tested it with 2-bit fields instead and got the correct size (1 byte):
#pragma pack(1)
struct BASE64 {
CHAR cChar1 : 2;
CHAR cChar2 : 2;
CHAR cChar3 : 2;
CHAR cChar4 : 2;
};
So why is a 6-bit field treated as 8 bits with #pragma pack(1)?
#pragma pack generally packs on byte boundaries, not bit boundaries. It's to prevent the insertion of padding bytes between fields that you want to keep compressed. From Microsoft's documentation (since you provided the winapi tag, and with my emphasis):
n (optional) : Specifies the value, in bytes, to be used for packing.
How an implementation treats bit-fields when you try to get them to cross a byte boundary is implementation-defined. From the C11 standard (section 6.7.2.1 Structure and union specifiers /11, again my emphasis):
An implementation may allocate any addressable storage unit large enough to hold a bitfield. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
More of the MS documentation calls out this specific behaviour:
Adjacent bit fields are packed into the same 1-, 2-, or 4-byte allocation unit if the integral types are the same size and if the next bit field fits into the current allocation unit without crossing the boundary imposed by the common alignment requirements of the bit fields.
The simple answer is: this is NOT wrong behavior.
Packing tries to put separate chunks of data in bytes, but it can't pack two 6-bit chunks in one 8-bit byte. So the compiler puts them in separate bytes, probably because accessing a single byte for retrieving or storing your 6-bit data is easier than accessing two consecutive bytes and handling some trailing part of one byte and some leading part from another one.
This is implementation defined, and you can do little about that. Probably there is an option for an optimizer to prefer size over speed – maybe you can use it to achieve what you expected, but I doubt the optimizer would go that far. Anyway the size optimization usually shortens the code, not data (as far as I know, but I am not an expert and I may well be wrong here).
In some implementations, bit fields cannot span across variable boundaries. You can define multiple bit fields within a variable only if their total number of bits fits within the data type of that variable.
In your first example, there are not enough available bits in a CHAR to hold both cChar1 and cChar2 when they are 6 bits each, so cChar2 has to go in the next CHAR in memory. The same goes for cChar3 and cChar4. That is why the total size of BASE64 is 4 bytes, not 3 bytes:
(6 bits + 2 bits padding) = 8 bits
+ (6 bits + 2 bits padding) = 8 bits
+ (6 bits + 2 bits padding) = 8 bits
+ 6 bits
- - - - - - - - - -
= 30 bits
= needs 4 bytes
In your second example, there are enough available bits in a CHAR to hold all of cChar1...cChar4 when they are 2 bits each. That is why the total size of BASE64 is 1 byte, not 4 bytes:
2 bits
+ 2 bits
+ 2 bits
+ 2 bits
- - - - - - - - - -
= 8 bits
= needs 1 byte

Handling bit arrays in C without padding

I'm writing a C program that runs on the Altera NIOS II processor. The program has to interface to a VHDL module on an FPGA test board through a specific memory location. My interface is provided through a Macro, which specifies a base memory address. The VHDL programmer has allocated 32-bits of memory off of that base address, which I'm to fill with binary data separated into four "elements", i.e. [0-11|12-15|16-23|24-31].
My question is, what is the best way to handle these array "elements" as separate data types. I'd like to declare the entire array as a structure to handle the data and declare the different fields using bit-fields, but it's my understanding that this will introduce padding into the 32 bit array.
it's my understanding that [using bit fields] will introduce padding into the 32 bit array
Using bit-fields will not introduce padding unless you explicitly request it: the language standard prohibits the compiler from padding in between bit-fields:
C99 Standard, section 6.7.2.1.10: If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
You can force padding to happen by specifying a bit field of zero width, like this:
struct hw_reg {
int a:10;
int :0; // Yes, this is legal.
int b:6;
};
In your case, sufficient space remains after the first 12 bits to allocate the next four, so there will be no padding. If you needed to split the register differently (say, 12-5-7-8), the use of padding would be implementation-defined.
binary data separated into four "elements", i.e.
[0-11|12-15|16-23|24-31].
I'd try as
struct vhdl_data {
uint32_t a : 12; // bits 0-11
uint32_t b : 4; // bits 12-15
uint32_t c : 8; // bits 16-23
uint32_t d : 8; // bits 24-31
};
