Bit-Fields in C/C++: what is guaranteed, what is implementation-defined?

Reading https://en.cppreference.com/w/c/language/bit_field, are the following conclusions correct?
Adjacent bit-fields have no padding in between (this seems to be different in 6.7.2.1 of the C standard).
The placement of a bit-field within the storage-unit is implementation-defined.
The position of the bits inside a bit-field is implementation-defined.
(For C++ see also: Characteristics of bit-Fields in C++.)

As a preliminary, there is no language "C/C++" such as is referenced by the question title. C and C++ are distinct languages sharing a common subset. In particular, C is not a subset of C++.
With regard to C, all the specifics the current language spec (C17 at this moment) provides about bitfield layout are in paragraphs 6.7.2.1/11-12.
are the following conclusions correct?
Adjacent bit-fields have no padding in between (this seems to be different in 6.7.2.1 of the C standard).
Bit fields are not laid out directly within a structure. The C implementation lays out "addressable storage units" for them within the structure, and lays out bitfields within those. The sizes and alignment requirements of the ASUs are unspecified.
The spec does say that if there is sufficient space in the ASU to which one bitfield is assigned, then an immediately-following bitfield is packed into adjacent bits of the same ASU. This means that there are no padding bits between those bitfields. However, if there is not sufficient space, then it is implementation-defined whether the immediately-following bitfield spans two ASUs or whether all its bits are assigned to a separate one, leaving unused (padding) bits in the first. Additionally, a zero-width bitfield can be used to force the bitfield following it to be assigned to a new ASU, possibly requiring padding bits in a previous one.
Moreover, the spec has nothing to say about whether there are padding bytes between ASUs. ASUs are not required to be uniform in size or to have the same alignment requirements as each other, so it is plausible that padding bytes would sometimes be required between them even in an implementation that is not intentionally perverse in this regard.
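As a rough illustration of the packing rules described above, the sketch below declares two structs. The names `struct packed` and `struct split` and the helper functions are invented for this example. The zero-width member in `struct split` asks the implementation to start `b` in a new ASU. The sizes reported are whatever the implementation chooses; the standard does not fix them.

```c
#include <stddef.h>

/* Hypothetical example types; the names are illustrative only. */
struct packed {
    unsigned int a : 3;   /* a and b fit in one ASU, so they share it */
    unsigned int b : 5;
};

struct split {
    unsigned int a : 3;
    unsigned int   : 0;   /* zero-width: b must start in a new ASU */
    unsigned int b : 5;
};

/* Report the sizes the implementation chose for each layout. */
size_t packed_size(void) { return sizeof(struct packed); }
size_t split_size(void)  { return sizeof(struct split); }
```

On many common ABIs `split_size()` comes out larger than `packed_size()`, but neither value is something portable code can rely on.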
The placement of a bit-field within the storage-unit is implementation-defined.
The spec explicitly says that the order of bitfields within an ASU is implementation defined. That's in the right-to-left vs left-to-right sense. "Order" is not exactly the same thing as "placement", but I guess that's what you mean.
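One way to observe the order an implementation uses is to inspect the object representation. The following is a minimal sketch, assuming that clearing the struct with `memset` and then setting a 1-bit field changes exactly one bit; `struct order_probe` and `first_field_bit_index` are hypothetical names.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical probe type: `first` is declared before `second`. */
struct order_probe {
    unsigned int first  : 1;
    unsigned int second : 1;
};

/* Returns the index (0 = least significant) of the bit within its byte
   that is set after storing 1 in `first`, or -1 if no byte holds
   exactly one set bit. */
int first_field_bit_index(void)
{
    struct order_probe p;
    unsigned char bytes[sizeof p];

    memset(&p, 0, sizeof p);      /* clear every bit, padding included */
    p.first = 1;
    memcpy(bytes, &p, sizeof p);  /* inspect the object representation */

    for (size_t i = 0; i < sizeof p; i++) {
        for (int bit = 0; bit < CHAR_BIT; bit++) {
            if (bytes[i] == (1u << bit)) {
                return bit;
            }
        }
    }
    return -1;
}
```

The result is a property of the compiler and target ABI, so it is useful for documentation or a build-time sanity check, not as a portable assumption.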
The position of the bits inside a bit-field is implementation-defined.
Not really. This is a question of representation, not layout, and the relevant paragraphs of C17 are 6.2.6.1/3-4:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
[...] Values stored in bit-fields consist of m bits, where m is the
size specified for the bit-field. The object representation is the set
of m bits the bit-field comprises in the addressable storage unit
holding it.
Footnote 49 clarifies the meaning of "pure binary notation" if you need that. All other details of bitfield representation are unspecified or undefined, not implementation-defined, which means that you cannot rely on them being documented.
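The part that is guaranteed can be shown without looking at the layout at all. This sketch (with the hypothetical names `struct three_bits` and `store_three_bits`) stores a value in a 3-bit unsigned bit-field and reads it back; because the field behaves as a 3-bit unsigned integer, values are reduced modulo 8.

```c
/* Hypothetical holder for a 3-bit unsigned field. */
struct three_bits {
    unsigned int v : 3;
};

/* Store `value` in the 3-bit field and read it back.
   Conversion to the unsigned bit-field reduces the value modulo 2^3. */
unsigned int store_three_bits(unsigned int value)
{
    struct three_bits t = { 0 };
    t.v = value;
    return t.v;
}
```

For example, `store_three_bits(7u)` yields 7 and `store_three_bits(9u)` yields 1, regardless of where the implementation places the three bits.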
Differences in C++ include, but are not necessarily limited to:
C++ officially sanctions more declared types for bit fields than does C.
C++ defines a mechanism for declaring bitfields that contain padding bits, but C does not.
Bitfield allocation and alignment are implementation defined in C++ (vs unspecified in C).
The relevant section of the C++ spec is [class-bit], 11.4.10 in the current draft spec.

Related


Is the size of an object equivalent to the size of another based upon the same alignment and/or representation?

The C standard states (emphasis mine):
28 A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.48) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
48) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
Source: C11, §6.2.5/28
The wording "the same representation and alignment" appears here often.
But what about the same size?
I wonder whether these pointer objects can differ in allocated size, since the size of a pointer object can vary with the type pointed to, even if the alignment and representation are the same.
Or in other words: Is there a guarantee that if the alignment and/or representation is equal, the size is equal too?
Question:
Is the size of an object equivalent to the size of another based upon the same alignment and/or representation?
Annotations:
The question is not specific to pointer objects. I mentioned pointers only because they are a good example of what I have in mind.
Citations from the standard are highly appreciated. The accepted answer must have quotations from the standard.
The setting is one specific implementation. I am not asking about differing alignments/representations/sizes of objects between different implementations.
Related (regarding the pointer example):
Are there any platforms where pointers to different types have different sizes?
Is the sizeof(some pointer) always equal to four?
Does the size of pointers vary in C?
Two types having the same representation does not imply that the two types need to have the same alignment requirements, though this is often true in practice.
Having the same representation means that the same object value is represented by the same bytes in the same order. Alignment tells what the address of the lowest byte of the object needs to be divisible by.
C 2018 clause 6.2.6, entitled “Representations of types,” specifies representations of types. Paragraph 2 says:
Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.
From this, it is clear that a representation of an object is a sequence of bytes, and that sequence has some number of bytes, some order, and some encoding. So the number of bytes, the order of the bytes, and the encoding of the bytes is part of the representation. Therefore, if two objects have the same representation, they have the same number of bytes, the same order, and the same encoding.
Since they have the same number of bytes, they have the same size.
As an example, if an object X is represented with bytes A, B, and C, and object Y is represented with bytes A, B, C, and D, then X and Y do not have the same representation.
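The conclusion can be checked at compile time for the pointer types named in 6.2.5/28. This is a small sketch, assuming a C11 compiler for `_Static_assert`; the struct tags and the function name are hypothetical.

```c
struct first_tag;    /* incomplete types are enough to form pointers */
struct second_tag;

/* Same representation implies same number of bytes, hence same size. */
_Static_assert(sizeof(void *) == sizeof(char *),
               "void* and char* must have the same size");
_Static_assert(sizeof(struct first_tag *) == sizeof(struct second_tag *),
               "all struct pointers must have the same size");

/* Run-time version of the same checks; returns 1 when they all hold. */
int same_representation_same_size(void)
{
    return sizeof(void *) == sizeof(char *)
        && sizeof(struct first_tag *) == sizeof(struct second_tag *)
        && sizeof(const int *) == sizeof(int *);
}
```

Pointers outside those groups, such as `int *` versus `char *`, get no such guarantee, so they are deliberately not compared here.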

How to check if an alignment is valid in C?

The code I am writing needs to be fully standard compliant. The standard does not promise any alignment options stronger than that of max_align_t. I want to try to align to a cache line, but I understand that to be undefined behavior if the implementation does not support alignments of that strength.
Is there any way around this? Any ways to check at preprocessing what extended alignments are available? Or are there any ways to ask for an alignment, and just not get it, rather than have undefined behavior, if the alignment is not available?
aligned_alloc works for allocated memory. However, I am also interested in statically stored memory.
EDIT:
To illustrate my problem, here are the statements from the C11 standard I have problems with:
6.2.8
Alignments are represented as values of the type size_t. Valid alignments include only those values returned by an _Alignof expression for fundamental types, plus an additional implementation-defined set of values, which may be empty. Every valid alignment value shall be a nonnegative integral power of two.
So any given power of 2 is not necessarily a valid alignment, and I can't count on 64 to be less than or equal to the alignment of max_align_t, and so 64 may not be a valid alignment. If it is not a valid alignment, here is my undefined behavior issue:
6.7.5 Alignment specifier
The constant expression shall be an integer constant expression. It shall evaluate to a valid fundamental alignment, or to a valid extended alignment supported by the implementation in the context in which it appears, or to zero.
No alignment that your compiler chooses on its own should be wider than that of max_align_t, but that is all there is to it. There is no prohibition against asking for a wider alignment.
So to ensure that a specific field of a struct lies on the boundary you want, you just have to use _Alignas. Everything is well-defined as long as the value you ask for is a power of 2 and that particular alignment is supported by your compiler. If it isn't, your compiler must complain.
This is exactly one of the reasons _Alignas has been added in C11.
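For statically stored memory, a request for cache-line alignment might look like the sketch below. It assumes the implementation supports 64 as an extended alignment for static objects and that `uintptr_t` is available; the buffer and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 64 is a supported extended alignment in this context.
   If it is not, this declaration violates a constraint and the
   compiler must issue a diagnostic instead of silently misaligning. */
static _Alignas(64) unsigned char cache_line_buffer[64];

/* Returns 1 if the buffer's address is a multiple of 64.
   The pointer-to-integer mapping is implementation-defined, so this
   check is meaningful only on the usual flat-address platforms. */
int buffer_is_64_byte_aligned(void)
{
    return ((uintptr_t)cache_line_buffer % 64u) == 0;
}
```

The value 64 is only a typical cache-line size; the standard offers no way to query the real one.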
I think you should look at the functions valloc() or memalign().
Another very useful call:
int pagesize = sysconf(_SC_PAGESIZE); // POSIX, declared in <unistd.h>
If you need to align a buffer to the page size, you can write:
char *buf = ...; // pointer into unaligned memory
buf -= (uintptr_t)buf & (pagesize - 1); // round down to a page boundary (uintptr_t is from <stdint.h>)
buf += pagesize; // advance to the next page boundary
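The same rounding can be written as small helpers that work on integer addresses. This is a sketch with hypothetical names (`align_up`, `align_down`); it assumes the alignment is a nonzero power of two and ignores overflow near the top of the address range. Unlike the two-step pointer adjustment, `align_up` leaves an already aligned address where it is.

```c
#include <stdint.h>

/* Round `addr` up to the next multiple of `align`.
   Assumes `align` is a nonzero power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Round `addr` down to the previous multiple of `align`.
   Assumes `align` is a nonzero power of two. */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}
```

When rounding up inside a buffer, the buffer has to be over-allocated by at least `align - 1` bytes so the aligned region still fits.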

Have fields in bit fields got contiguous memory location for its elements? Is this behavior compiler independent?

In a bit field in C, are the memory locations of its elements contiguous?
If yes, is this behavior equal for all compilers?
Example:
typedef struct
{
uint8_t in_alarm :1;
uint8_t fault :1;
uint8_t overridden :1;
uint8_t out_of_service:1;
}StatusFlag_t;
Will the fields in_alarm, fault etc have contiguous memory locations?
Almost every aspect of bit-field handling is compiler dependent according to the C standard. The chances are that the 4 bits in your structure will be contiguous in a single byte, but it is not guaranteed whether they'll be the most-significant or least-significant 4 bits. If there were more than 8 bits, then the values would cross more than one storage unit (because the base type is uint8_t; note that the C standard does not require a compiler to support the use of uint8_t as the type of a bit-field). Note that bit-field members usually do not have different addresses; you can't take the address of a bit-field element.
There is no guarantee that different compilers will behave the same across different platforms. There is typically an ABI (application binary interface) which defines the behaviour for a particular O/S, and the compiler will adhere to the ABI for the platform on which it is running. But the C standard does not mandate this behaviour.
For relevant quotes from the standard, see (amongst other possibilities) How do bit-fields and their alignments work in C programming?
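To see what a particular compiler does with the struct from the question, the object representation can be examined directly. This is only a sketch: it assumes the compiler accepts `uint8_t` as a bit-field type, and the helper name `all_flags_raw_byte` is invented here.

```c
#include <stdint.h>
#include <string.h>

/* Same members as the struct in the question. Using uint8_t as a
   bit-field type is implementation-defined, not guaranteed by C. */
typedef struct {
    uint8_t in_alarm       : 1;
    uint8_t fault          : 1;
    uint8_t overridden     : 1;
    uint8_t out_of_service : 1;
} StatusFlag_t;

/* Set all four flags and return the first byte of the object
   representation, so the chosen bit positions become visible. */
unsigned int all_flags_raw_byte(void)
{
    StatusFlag_t s;
    unsigned char first_byte;

    memset(&s, 0, sizeof s);   /* clear padding bits as well */
    s.in_alarm = 1;
    s.fault = 1;
    s.overridden = 1;
    s.out_of_service = 1;
    memcpy(&first_byte, &s, 1);
    return first_byte;
}
```

A result of 0x0F means the fields were allocated from the low-order end of the byte, 0xF0 from the high-order end. Either is allowed, which is why the value should be checked per compiler and not assumed.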
Well, they don't have memory locations of their own, because they are smaller than a byte. You can't take a pointer to a bitfield.
The fields should be packed together as much as possible.
Formally the exact bit allocation is implementation-defined. It may differ between compilers.
In practice the three main compilers, GCC, Clang and Visual C++, do the same thing.
The bits are actually all at a single memory location. One location corresponds to 8 bits.
The order can be compiler dependent, i.e. the first member of the structure (in_alarm in your case) may be at either the MSB or the LSB position. (At least my compiler has an option to set that.)
Edit:
Since the answer was downvoted and I don't understand why, I am pasting a snapshot of my compiler settings to explain what I am trying to convey:

Why this union's size is 2 with bitfields?

I am working with Turbo C on Windows, where char takes one byte. My problem is with the union below.
union a
{
unsigned char c:2;
}b;
void main()
{
printf("%d", (int)sizeof(b)); // or even sizeof(union a)
}
This program prints 2, whereas the union should take only 1 byte. Why is that?
For a struct it correctly gives 1 byte, but this union behaves unexpectedly.
One more thing: how do I read input into these bit-fields?
scanf("%d",&b.c); //even scanf("%x",b.c);
does not work because bit-fields have no address. So we have to use another variable, as below:
int x;
scanf("%d",&x);
b.c=x;
Can't we avoid this? Is there any other way?
Compilers are allowed to add padding to structs and unions. I admit it is a little surprising that yours rounds the union up to a two-byte size when you are able to get a one-byte struct, but it is perfectly allowed.
In answer to your second question: no it's not avoidable. Bit fields are a struct packing optimization and the performance and convenience penalty to pay is that bit field members are not individually addressable.
Turbo C targets the 8086 microprocessor, which has a two-byte word. Atomic reads and writes are typically tied to the CPU's architecture, so the compiler adds some slack bytes to align your data structure.
Using #pragma pack(1) may disable this, but I am not sure whether it works in Turbo C.
I'm not sure where you find the requirement that the union must be precisely the minimum size. An object must be at least as big as its members, but that is a lower bound only.
You can't take the address of a bitfield; what would be its type? It can't be int*. scanf("%d") will write sizeof(int) * CHAR_BIT bits through the int* you pass in. That's writing more than 2 bits, yet you don't have that space.
There is a paragraph in the standard that states there shall be no padding before the first member of a struct. But it does not say so explicitly about unions. The difference in size could arise because the compiler wants to align the union on a 2-byte boundary, but as it cannot pad before the first member of a struct, the struct gets one-byte alignment. Also note that a union could have more members with different types, which could widen the required alignment of your union. There could be reasons for the compiler to give unions at least 2-byte alignment, for example to simplify code that has to deal with the required alignment of a union.
Anyway, there is no requirement that your union should be exactly one byte. It just has to have room for all its members.
Here is what the C standard has to say about your second question:
The operand of the unary & operator shall be either a function
designator or an lvalue that designates an object that is not a
bit-field and is not declared with the register storage-class
specifier.
So your best bet is to do it your way, using the int. You may put braces around the code so that the temporary variable is kept local:
void func(void)
{
    struct bits f;
    {
        int x;
        scanf("%d", &x);
        f.bitfield = x;
    }
    /* ... */
}
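The temporary-variable approach can also validate the input before it is narrowed into the bit-field. Below is a minimal sketch with hypothetical names (`union two_bits`, `parse_two_bits`). It uses `unsigned int` for the bit-field so the type is one the standard guarantees, and `sscanf` in place of `scanf` so it can be exercised without standard input.

```c
#include <stdio.h>

/* Hypothetical stand-in for the union in the question, declared with
   unsigned int so the bit-field type is standard. */
union two_bits {
    unsigned int c : 2;
};

/* Parse a decimal integer from `text` into a temporary int, and store
   it in the 2-bit field only if it fits in the range 0..3.
   Returns the value read back from the field, or -1 on bad input. */
int parse_two_bits(const char *text)
{
    union two_bits u;
    int x;

    if (sscanf(text, "%d", &x) != 1) {
        return -1;              /* not a number */
    }
    if (x < 0 || x > 3) {
        return -1;              /* would not fit in 2 bits */
    }
    u.c = (unsigned int)x;
    return (int)u.c;
}
```

Without the range check, an out-of-range value would be silently reduced modulo 4 when assigned to the field.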
There is a lot of misinformation in the answers so I will clarify. It could be for one of 2 reasons (I am not familiar with the compiler).
The bitfield storage unit is 2 bytes.
Alignment is forced to word (2 byte) boundary.
I doubt it is the first case as it is a common extension to take the bitfield storage unit as the size of the declared "base" type. In this case the type is char which always has a size of 1.
[In standard C you can only declare bitfields of type int or unsigned int, and the "storage unit" in which bitfields are grouped is fixed (usually the same size as an int). Even a single-bit bitfield will use one storage unit.]
In the 2nd case it is common for C compilers to implement #pragma pack to allow control of alignment. I suspect the default packing is 2 in which case a pad byte will be added at the end of the union. The way to avoid this is to use:
#pragma pack(1)
You should also use #pragma pack() afterward to set back to the default (or even better use the push and pop arguments if supported by your compiler).
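A push/pop pairing might look like the following sketch. `#pragma pack` is a compiler extension, so this assumes a compiler that supports the `push` and `pop` forms (GCC, Clang and Visual C++ do); the struct and function names are hypothetical.

```c
#include <stddef.h>

/* #pragma pack is not part of standard C. The push/pop forms are
   assumed to be supported by the compiler in use. */
#pragma pack(push, 1)
struct packed_record {
    char tag;
    int  value;   /* with packing 1, no padding precedes this member */
};
#pragma pack(pop)  /* restore whatever packing was in effect before */

struct default_record {
    char tag;
    int  value;   /* default packing usually inserts padding here */
};

size_t packed_record_size(void)  { return sizeof(struct packed_record); }
size_t default_record_size(void) { return sizeof(struct default_record); }
```

Restoring the previous setting with `pop` keeps the packing change from leaking into headers included later. Members of a packed struct may be misaligned, which some targets handle slowly or not at all.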
To all the repliers who said that you must put up with what the compiler does, this is contrary to the spirit of C. You should be able to use bitfields to map to any size or bit order in situations where you have no control over it such as a file format or hardware mapping.
Of course this is highly non-portable since different implementations have different byte orders, orders that bits are added to a bitfield storage unit (from top or bottom), storage units size, default alignment etc.
As to your 2nd question, I can't see the problem, though I never use scanf as it is problematic.
In addition to the fact that "there may also be unnamed padding at the end of a structure or union", the compiler is permitted to place a bitfield in "any addressable storage unit large enough to hold a bit-field". (Both quotes are from the C90 standard; there is similar, but different, wording in the C99 standard.)
Also note that the standard says that a "bit-field shall have a type that is a qualified or unqualified version of int, unsigned int, or signed int", so having a bit-field in a char type is non-standard.
Because the behavior of bitfields is so dependent on unspecified compiler implementation details (there are several other non-portable issues with bit-fields that I have not mentioned), using them is almost always a bad idea. In particular, they are a bad idea when you are trying to model bit-fields in a file format, network protocol, or hardware register.
More information from another SO answer:
In general you should avoid bitfields
and use other manifest constants
(enums or whatever) with explicit bit
masking and shifting to access the
'sub-fields' in a field.
Here's one reason why bitfields should
be avoided - they aren't very portable
between compilers even for the same
platform. From the C99 standard
(there's similar wording in the C90
standard):
An implementation may allocate any
addressable storage unit large enough
to hold a bitfield. If enough space
remains, a bit-field that immediately
follows another bit-field in a
structure shall be packed into
adjacent bits of the same unit. If
insufficient space remains, whether a
bit-field that does not fit is put
into the next unit or overlaps
adjacent units is
implementation-defined. The order of
allocation of bit-fields within a unit
(high-order to low-order or low-order
to high-order) is
implementation-defined. The alignment
of the addressable storage unit is
unspecified.
You cannot guarantee whether a bit
field will 'span' an int boundary or
not and you can't specify whether a
bitfield starts at the low-end of the
int or the high end of the int (this
is independent of whether the
processor is big-endian or
little-endian).
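The masking-and-shifting alternative recommended above can be sketched as follows. All names and bit positions here are hypothetical, chosen only for illustration; the point is that the layout is fixed by the source code and not by the compiler.

```c
#include <stdint.h>

/* Hypothetical single-bit flags at explicitly chosen positions. */
enum {
    STATUS_IN_ALARM       = 1 << 0,
    STATUS_FAULT          = 1 << 1,
    STATUS_OVERRIDDEN     = 1 << 2,
    STATUS_OUT_OF_SERVICE = 1 << 3
};

/* Hypothetical 2-bit sub-field occupying bits 4 and 5. */
#define MODE_SHIFT 4
#define MODE_MASK  (0x3u << MODE_SHIFT)

uint8_t set_flag(uint8_t reg, uint8_t flag)   { return (uint8_t)(reg | flag); }
uint8_t clear_flag(uint8_t reg, uint8_t flag) { return (uint8_t)(reg & ~flag); }
int     test_flag(uint8_t reg, uint8_t flag)  { return (reg & flag) != 0; }

/* Replace the 2-bit mode sub-field, leaving the other bits untouched. */
uint8_t set_mode(uint8_t reg, unsigned int mode)
{
    return (uint8_t)((reg & ~MODE_MASK) | ((mode << MODE_SHIFT) & MODE_MASK));
}

/* Extract the 2-bit mode sub-field. */
unsigned int get_mode(uint8_t reg)
{
    return (reg & MODE_MASK) >> MODE_SHIFT;
}
```

Because every position is spelled out, the same byte value means the same thing under any compiler, which is what a file format, network protocol, or hardware register needs.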
