How do bit fields interplay with padding bits in C

I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or is it platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C standard say about it?

Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or is it platform dependent?
No, it is both system and compiler dependent. You can never assume or know that they are MSB or LSB.
Question 2: Are those unused 30 bits always initialized to 0? What do the C and C++ standards say about it?
It depends on how you initialize the struct. A struct at local scope that isn't initialized may contain garbage values in padding bits/bytes. A struct that is initialized with at least one initializer is guaranteed to contain zero even in padding bytes: my_struct = { something };.
Sources
The language-lawyer details of why the above works are somewhat complex.
C17 6.7.9/9 (emphasis mine) says this:
Except where explicitly stated otherwise, for the purposes of this subclause unnamed
members of objects of structure and union type do not participate in initialization.
Unnamed members of structure objects have indeterminate value even after initialization.
This means that we cannot trust padding bits/bytes at all.
But then there's this exception to the above rule (§21, emphasis mine):
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
Meaning that if there's at least one initializer, then the following rule of static storage initialization applies:
C17 6.7.9/10 (emphasis mine):
If an object that has static or thread storage duration is not initialized
explicitly, then: /--/
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

Related

Linking behavior and padding bytes

I have two source files I am linking:
src1.c:
#include <stdio.h>
typedef struct my_struct {
char a;
short b;
int c;
} my_struct;
my_struct x;
void add1();
int main()
{
x.a = 0;
x.b = 0;
x.c = 0;
add1();
printf("%x, %x, %x\n", x.a, x.b, x.c);
}
and src2.c:
#include <stdio.h>
typedef struct my_struct {
int c;
short b;
char a;
} my_struct;
extern my_struct x;
void add1(){
x.a += 1;
x.b += 1;
x.c += 1;
}
The output is "1, 0, 10001" due to the nature of the type definitions and alignment. However, this relies on the second byte of the struct to be 0x00 (which is a padding byte in the struct for src1.c).
Is this guaranteed behavior? Are padding bytes typically initialized to 0? Are there cases when they wouldn't be?
The two struct types in each source file are not compatible with each other because the members aren't declared in the same order.
This is spelled out in section 6.2.7p1 of the C standard:
Two types have compatible type if their types are the same. Additional
rules for determining whether two types are compatible are described
in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in
6.7.6 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible
if their tags and members satisfy the following requirements: If one
is declared with a tag, the other shall be declared with the same tag.
If both are completed anywhere within their respective translation
units, then the following additional requirements apply: there shall
be a one-to-one correspondence between their members such that each
pair of corresponding members are declared with compatible types; if
one member of the pair is declared with an alignment specifier, the
other is declared with an equivalent alignment specifier; and if one
member of the pair is declared with a name, the other is declared with
the same name. For two structures, corresponding members shall be
declared in the same order. For two structures or unions,
corresponding bit-fields shall have the same widths. For two
enumerations, corresponding members shall have the same values
This means that my_struct x; and extern my_struct x; are not compatible, and declaring an identifier multiple times with different types triggers undefined behavior, which loosely speaking means there are no guarantees at all about what your program will do.
Unrelated to this, as far as padding bytes go, structures declared at file scope will have padding bytes initialized to 0 if the struct is not explicitly initialized.
The proper way to examine padding bytes in a struct is to use an unsigned char * to point to the start of the struct and iterate through the individual bytes.
Is this guaranteed behavior?
No. C 2018 6.5 7 specifies rules for when the behavior is defined if you try to access an object defined as one type using a different type. Accessing one type of structure, say A, with a different structure type, say B, has defined behavior only if:
B is compatible with A,
B is a qualified version of a type compatible with A, or
one of the members of B (possibly nested) is of type A or a qualified version of A.
Option 3 clearly does not apply. Regarding 1 and 2, the rules for compatibility of structures in different translation units are in C 2018 6.2.7 1, and they require there be a one-to-one correspondence between structure members in their names, types, and alignments. The structures do not have that, so they do not satisfy the requirements.
This means the behavior of accessing the my_struct x object defined in one translation unit using the my_struct type defined in the other translation unit is not defined by the C standard. In other words, the C standard does not guarantee the behavior.
In C implementations without any cross-unit optimization or other information transfer outside of the ordinary linkage, we can reason that using the my_struct in src2.c must access the bytes of the my_struct defined in src1.c because there is no other way to implement the behavior that the C standard does require.[1] This would generally be bad programming practice, as, even if it is necessary to reinterpret the bytes of one type of my_struct as another type, there are ways to do that that are defined by the C standard.
Are padding bytes typically initialized to 0?
Yes, for objects with static or thread storage duration, C 2018 6.7.9 says “… if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;…”
Are there cases when they wouldn't be?
There are cases where initialization will not set the padding bits to zero. If a structure object is defined with automatic storage duration (the default inside a function definition) or is dynamically allocated, it will not be initialized at all. If you explicitly initialize a structure object, the C standard does not specify what the padding bytes will be set to; they will have unspecified values.
There are also cases where the padding bytes will not remain zero. Whenever you store a value in a structure or its members, the padding bytes may take on unspecified values, per C 2018 6.2.6.1 6. For example, in main, you have x.a = 0;. Since a is a char and is presumably followed by one byte of padding (due to the short that follows it), the compiler is allowed to implement this by clearing eight bits of a processor register and then issuing a 16-bit store to the structure. This will set the byte for the a member to zero and will set the padding byte to whatever happened to be in the other bits of the register.
Then reading that byte through the my_struct type in the other translation unit will get those other bits.
Footnote
[1] This speaks to the particular code in the question. In other circumstances, there can be further complications. For example, suppose translation unit U defines an object X with type A and then translation unit V both attempts to access X (directly by name) with an incompatible type B and to access X through a pointer of type “pointer to A”. The compiler in V is entitled to assume the two accesses refer to different objects, so it has no responsibility to coordinate them. For example, if V writes to X through the pointer and then reads X by name, the compiler has no obligation to actually read the bytes of X from memory; it could use a previously read value it is holding in a register, since it has no reason to believe the write through the pointer changed X.
If you use the same compiler for both units, the padding rules should be the same, unless they change due to optimizations - try disabling them (-O0 switch for GNU C Compiler).

zeroing memory area of a structure in c [duplicate]

I have the following structure and I want to initialize it to zero.
struct example {
int a;
int b;
char c;
};
struct example mystruct;
I can memset mystruct to zero in following ways:
memset(&mystruct, 0, sizeof(mystruct));
or
mystruct = (struct example) {0};
Is the second way equal to the first one?
Note that I want to set the mystruct variable to zero not only at its definition, but also later, for example when reusing it.
Well, in your case, they will be equivalent (disregarding the padding bytes).
Quoting C11, chapter §6.7.9/ P21, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
and, regarding the initialization of static storage duration objects, P10 (again emphasis mine)
[...] If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules,
and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these
rules, and any padding is initialized to zero bits;
For sake of completeness, as mentioned in the below comment by Serge Ballesta:
all remaining members will be initialized implicitly the same as objects that have static storage duration. And the standard does not mandate anything for the padding bits of the top level object
This indicates that the assignment can differ from the memset() call in the padding values, because the explicitly initialized first member and the top-level object are not covered by the rule above.
No, they are not equal.
The first is more low-level, it will set all bits of the value to 0, including those that are not part of any member (padding).
The latter is only guaranteed to set the actual members to zero, so it might be faster since it has the opportunity to touch less memory.

What's the purpose of unnamed bit field at the end of structure

I am learning C. In C Primer Plus, I saw a bit-field example as follows:
struct box_props {
bool opaque : 1;
unsigned int fill_color : 3;
unsigned int : 4;
bool show_border : 1;
unsigned int border_color : 3;
unsigned int border_style : 2;
unsigned int : 2;
};
I do understand the 4-bit unnamed bit field in the middle is used for letting the following bits start at a new byte. However, I don't understand why there is another unnamed bit field at the end of the structure. What's the purpose of it? Is it necessary?
Is it necessary?
Nope, it's optional.
What's the purpose of it?
Here's what the standard says in §9.6.2, C++11 (draft N3337, emphasis mine):
A declaration for a bit-field that omits the identifier declares an unnamed bit-field. Unnamed bit-fields are not members and cannot be initialized. [Note: An unnamed bit-field is useful for padding to conform to externally-imposed layouts. — end note ] As a special case, an unnamed bit-field with a width of zero specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed bit-field may the value of the constant-expression be equal to zero.
So it's a hint to the compiler that the members of the struct add up to 2 octets, written in the hope that the compiler will make the struct 2 octets long. However, as per the standard there's no such requirement. Here's the excerpt from the previous point, §9.6.1:
extra bits are used as padding bits and do not participate in the value representation of the bit-field. Allocation of bit-fields within a class
object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Hence the standard does not guarantee any further than this regarding the size or alignment of a struct/class using bit-fields.
What's the purpose of it? Is it necessary?
It is used for padding. You can look at it as anonymous member which can not be referenced.
It is optional and totally dependent on your layout requirement.

Which member of a global union variable that is not initialized explicitly will be initialized to 0 implicitly?

e.g.
union
{
int n;
void *p;
} u;
Is the initial value of u.n or that of u.p equal to 0?
It should be noted that a NULL pointer is not necessarily stored in all-zero bits. Therefore, even if u.n and u.p have the same size,
u.n == 0
doesn't guarantee
u.p == 0
and vice versa.
(Sorry for my poor English)
If an object with static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
So u.n will be initialized to zero and u.p is undetermined.
EDIT: Response to comment
The above is copied from ISO/IEC 9899:201x 6.7.9/10.
Since u is static then the first member will be initialized to zero, from the C99 draft standard section 6.7.8 Initialization paragraph 10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
Since n is an arithmetic type, it will be initialized to zero. The value of p is unspecified, but in practice type punning is usually supported by the compiler. For example, the gcc manual discusses type-punning, and under the -fstrict-aliasing section it says:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
It is also worth noting that you may be able to initialize any member of a union like so:
union { int n; void *p; } u = { .p = NULL } ;
I am not sure if all compilers support this though.
I think by 'global variable' you mean that it's at file scope. If so, and if it's declared 'static', it will be initialized to all zero bits, eg because it gets allocated in .BSS.
What those zero bits in the union's storage mean in terms of the value of whichever of its members you access depends on their types. In your case, all zero bits in an int means it has the value zero, and all zero bits in a pointer make it NULL on typical platforms (the standard does not require a null pointer to be all zero bits), i.e. @Dukeling's bang on here.
I am not sure that all zero bits in a float would yield a float with value zero.
It depends upon how you instantiate the variable itself. Usually statically defined variables are initialized to zero. If you malloc the union to a pointer, you may get uninitialized memory. However, if you use calloc to allocate memory for the union, calloc will initialize the allocated memory to zero per the man page.
It may also depend on any libraries that you may be using. Google's perftools library may or may not zero out the memory when you make the call to calloc that it overwrites.

Is zero initialization of structures guaranteed to wipe padded areas?

Suppose I have the following structure:
typedef struct
{
unsigned field1 :1;
unsigned field2 :1;
unsigned field3 :1;
} mytype;
The first 3 bits will be usable but sizeof(mytype) will return 4 which means 29 bits of padding.
My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
mytype testfields = {0};
or:
mytype myfields = {1, 1, 1};
Such that it's safe to perform the following memcmp() on the assumption that bits 4..29 will be zero and therefore won't affect the comparison:
if ( memcmp(&myfields, &testfields, sizeof(myfields)) == 0 )
printf("Fields have no bits set\n");
else
printf("Fields have bits set\n");
Yes and no. The actual standard, C11, specifies:
If an object that has static or thread storage duration is not
initialized explicitly, then:
....
if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
So this only holds for objects of static storage, at a first view. But then later it says in addition:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
So this means that padding inside sub-structures that are not initialized explicitly is zero-bit initialized.
In summary, some padding in a structure is guaranteed to be zero-bit initialized, some isn't. I don't think that such confusion is intentional; I will file a defect report for this.
Older versions of the standard didn't have that at all. So with most existing compilers you'd have to be even more careful, since they don't implement C11 yet. But AFAIR, clang already does in that respect.
Also be aware that this only holds for initialization. Padding isn't necessarily copied on assignment.
The C99 standard doesn't specify the padding bits would be set to zero. In fact, it specifically mentions that the values of any padding bits are unspecified, so that padding need not be copied in an assignment.
Footnote 51 to 6.2.6.1 (6) (n1570):
Thus, for example, structure assignment need not copy any padding bits.
The new C2011 standard - thanks to Jens Gustedt for sharing that knowledge - specifies that padding bits in objects of static or thread storage duration without explicit initialisation are initialised to 0.
There are still no guarantees for assignment.
My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
No.
The value of the padding is unspecified:
(C99, 6.2.6.1p6) "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values"
EDIT: See Jens Gustedt's answer: C11 now guarantees the padding is set to 0 in certain (rare) circumstances.

Resources