Is zero initialization of structures guaranteed to wipe padded areas?

Suppose I have the following structure:
typedef struct
{
unsigned field1 :1;
unsigned field2 :1;
unsigned field3 :1;
} mytype;
The first 3 bits will be usable but sizeof(mytype) will return 4 which means 29 bits of padding.
My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
mytype testfields = {0};
or:
mytype myfields = {1, 1, 1};
Such that it's safe to perform the following memcmp() on the assumption that the 29 padding bits will be zero and therefore won't affect the comparison:
if ( memcmp(&myfields, &testfields, sizeof(myfields)) == 0 )
printf("Fields have no bits set\n");
else
printf("Fields have bits set\n");

Yes and no. The current standard, C11, specifies:
If an object that has static or thread storage duration is not
initialized explicitly, then:
....
if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
So this only holds for objects of static storage, at a first view. But then later it says in addition:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
So this means that padding inside sub-structures that are not initialized explicitly is zero-bit initialized.
In summary, some padding in a structure is guaranteed to be zero-bit initialized, and some isn't. I don't think this confusion is intentional; I will file a defect report for it.
Older versions of the standard didn't have this at all. So with most existing compilers you'd have to be even more careful, since they don't implement C11 yet. But AFAIR, clang already does in this respect.
Also be aware that this only holds for initialization. Padding isn't necessarily copied on assignment.
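Given these caveats, one way to sidestep the padding question entirely is to compare the bit-fields themselves instead of calling memcmp(). The following is only a minimal sketch; mytype_equal and demo_mytype_equal are hypothetical names introduced for illustration and do not come from the question or the standard:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    unsigned field1 :1;
    unsigned field2 :1;
    unsigned field3 :1;
} mytype;

/* Hypothetical helper: compares member by member, so padding bits
   never take part in the comparison. */
bool mytype_equal(const mytype *a, const mytype *b)
{
    return a->field1 == b->field1
        && a->field2 == b->field2
        && a->field3 == b->field3;
}

/* Small self-check of the helper. */
void demo_mytype_equal(void)
{
    mytype testfields = {0};
    mytype myfields = {1, 1, 1};
    mytype copy = myfields;   /* assignment need not copy padding */

    assert(!mytype_equal(&myfields, &testfields));
    assert(mytype_equal(&myfields, &copy));
}
```

The comparison costs a few more instructions than memcmp(), but its result does not depend on how any particular compiler treats padding.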

The C99 standard doesn't specify the padding bits would be set to zero. In fact, it specifically mentions that the values of any padding bits are unspecified, so that padding need not be copied in an assignment.
Footnote 51 to 6.2.6.1 (6) (n1570):
Thus, for example, structure assignment need not copy any padding bits.
The new C2011 standard - thanks to Jens Gustedt for sharing that knowledge - specifies that padding bits in objects of static or thread storage duration without explicit initialisation are initialised to 0.
There are still no guarantees for assignment.

My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
No.
The value of the padding is unspecified:
(C99, 6.2.6.1p6) "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values"
EDIT: See Jens Gustedt's answer: C11 now guarantees that the padding is set to 0 in certain (rare) circumstances.

Related

When is memset to 0 nonportable? [duplicate]

From this comment in GCC bug #53119:
In C, {0} is the universal zero initializer equivalent to C++'s {} (the latter being invalid in C). It is necessary to use whenever you want a zero-initialized object of a complete but conceptually-opaque or implementation-defined type. The classic example in the C standard library is mbstate_t:
mbstate_t state = { 0 }; /* correctly zero-initialized */
versus the common but nonportable:
mbstate_t state;
memset(&state, 0, sizeof state);
It strikes me as odd that the latter version could be unportable (even for implementation-defined types, the compiler has to know the size). What is the issue here and when is a memset(x, 0, sizeof x) unportable?
memset(p, 0, n) sets to all-bits-0.
An initializer of { 0 } sets to the value 0.
On just about any machine you've ever heard of, the two concepts are equivalent.
However, there have been machines where the floating-point value 0.0 was not represented by a bit pattern of all-bits-0. And there have been machines where a null pointer was not represented by a bit pattern of all-bits-0, either. On those machines, an initializer of { 0 } would always get you the zero initialization you wanted, while memset might not.
See also question 7.31 and question 5.17 in the C FAQ list.
Postscript: One other difference, as pointed out by @ryker: memset will set any "holes" in a padded structure to 0, while setting that structure to { 0 } might not.
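To illustrate the "value 0" side of this distinction, here is a small sketch. The struct mixed type and both function names are hypothetical, made up for this example. It checks values only, which is exactly what { 0 } guarantees, and says nothing about the underlying bit patterns:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mixing the member kinds discussed above. */
struct mixed
{
    void  *ptr;
    double real;
    int    count;
};

/* Returns 1 if every member of a { 0 }-initialized object compares
   equal to zero (or to a null pointer), 0 otherwise. */
int brace_zero_gives_zero_values(void)
{
    struct mixed m = { 0 };
    return m.ptr == NULL && m.real == 0.0 && m.count == 0;
}

/* Small self-check. */
void demo_brace_zero(void)
{
    assert(brace_zero_gives_zero_values() == 1);
}
```

The same check after memset() would be expected to pass on mainstream platforms, but it relies on all-bits-zero being the representation of a null pointer and of 0.0, which the standard does not promise.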
The reason for this has to do with how types are represented.
Section 6.7.9p10 of the C standard describes how fields are initialized as follows:
If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate. If an object
that has static or thread storage duration is not initialized
explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized
to zero bits
And p21 also states:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known
size than there are elements in the array, the remainder of
the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
The difference between this and setting all bytes to zero is that some of the above values may not necessarily be represented by all bits zero.
For example, there are some architectures where the address 0 is a valid address. This means that a null pointer is not represented as all bits zero. (Note: (void *)0 is specified as a null pointer constant by the standard, however the implementation will treat this as whatever the representation of a null pointer is)
The standard also doesn't mandate a particular representation for floating point types. While the most common representation, IEEE754, does use all bits 0 to represent the value +0, this is not necessarily true for other representations.
Noting a difference in behavior between the two methods...
In ...= {0}; if padding bytes exist, they will not be cleared.
But a call to memset() will clear padding.
From here
"Possible implementation of mbstate_t is a struct type holding an array representing the incomplete multibyte character, an integer
counter indicating the number of bytes in the array that have been
processed, and a representation of the current shift state."
In the case mbstate_t is implemented as a struct it is notable that {0} will not set padding bytes that may exist to zero, making the following assumption debatable:
mbstate_t state = { 0 }; /* correctly zero-initialized */
memset() however does include padding bytes.
memset(&state, 0, sizeof state); // all bytes in the memory region of state will be cleared
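The memset() half of this claim can be checked by walking the object representation as unsigned char. This is only a sketch: struct padded and the helper names are hypothetical, and whether struct padded actually contains padding depends on the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; most ABIs insert padding after tag. */
struct padded
{
    char tag;
    int  value;
};

/* Counts the bytes of an object representation that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t nonzero = 0;

    for (size_t i = 0; i < size; i++)
        if (bytes[i] != 0)
            nonzero++;
    return nonzero;
}

/* After memset(), every byte is zero, padding included. */
void demo_memset_clears_everything(void)
{
    struct padded p;

    memset(&p, 0, sizeof p);
    assert(count_nonzero_bytes(&p, sizeof p) == 0);
}
```

Running the same count on an object initialized with { 0 } may well report zero too, but that result is what the compiler happened to do rather than something this answer claims is guaranteed.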

How do bit fields interplay with bits padding in C

I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C standard say about it?
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
No, it is both system and compiler dependent. You can never assume or know that they are MSB or LSB.
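If you are curious where a given compiler puts x, you can inspect the object representation. The sketch below is only an experiment, not a portable technique: lowest_set_bit and position_of_x are hypothetical names, the result differs between implementations, and strictly speaking the padding bits may hold anything after the store to t.x.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

struct T {
    unsigned int x: 1;
    unsigned int y: 1;
};

/* Returns the index of the lowest set bit in the object representation
   (byte index * CHAR_BIT + bit within the byte), or -1 if no bit is set. */
int lowest_set_bit(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;

    for (size_t i = 0; i < size; i++)
        for (int bit = 0; bit < CHAR_BIT; bit++)
            if (bytes[i] & (1u << bit))
                return (int)(i * CHAR_BIT) + bit;
    return -1;
}

/* Clears a struct T, sets x, and reports which bit ended up set. */
int position_of_x(void)
{
    struct T t;

    memset(&t, 0, sizeof t);
    t.x = 1;
    return lowest_set_bit(&t, sizeof t);
}

/* The position is implementation-defined; only its range is checked. */
void demo_position_of_x(void)
{
    int pos = position_of_x();

    assert(pos >= 0 && pos < (int)(sizeof(struct T) * CHAR_BIT));
}
```

Printing position_of_x() on a few different targets is a quick way to see that the answer really does vary.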
Question 2: Are those unused 30 bits always initialized to 0? What do the C and C++ standards say about it?
Depends on how you initialize the struct. A struct at local scope which isn't initialized may contain garbage values in padding bits/bytes. A struct that is initialized with at least one initializer set, is guaranteed to contain zero even in padding bytes: my_struct = { something };.
Sources
The language-lawyer details of why the above works are somewhat complex.
C17 6.7.9/9 (emphasis mine) says this:
Except where explicitly stated otherwise, for the purposes of this subclause unnamed
members of objects of structure and union type do not participate in initialization.
Unnamed members of structure objects have indeterminate value even after initialization.
This means that we cannot trust padding bits/bytes at all.
But then there's this exception to the above rule (§21 emphasis mine):
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
Meaning that if there's at least one initializer, then the following rule of static storage initialization applies:
C17 6.7.9/10 (emphasis mine):
If an object that has static or thread storage duration is not initialized
explicitly, then: /--/
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

zeroing memory area of a structure in c [duplicate]

I have the following structure and I want to initialize it to zero.
struct example {
int a;
int b;
char c;
};
struct example mystruct;
I can memset mystruct to zero in following ways:
memset(&mystruct, 0, sizeof(mystruct));
or
mystruct = (struct example) {0};
Is the second way equal to the first one?
Note that I want to set the mystruct variable to zero not only at its definition, but possibly later too, for reusing it.
Well, in your case, they will be equivalent (disregarding the padding bytes).
Quoting C11, chapter §6.7.9/ P21, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
and, regarding the initialization of static storage duration objects, P10 (again emphasis mine)
[...] If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules,
and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these
rules, and any padding is initialized to zero bits;
For sake of completeness, as mentioned in the below comment by Serge Ballesta:
all remaining members will be initialized implicitly the same as objects that have static storage duration. And the standard does not mandate anything for the padding bits of the top level object
This indicates that the assignment may differ from the memset() call in the padding values of the top-level object.
No, they are not equal.
The first is more low-level, it will set all bits of the value to 0, including those that are not part of any member (padding).
The latter is only guaranteed to set the actual members to zero, so it might be faster since it has the opportunity to touch less memory.
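For the reuse case in the question, both approaches can be wrapped in small helpers. This is a sketch only; reset_by_value and reset_by_bytes are hypothetical names. Both leave every member equal to zero for this struct, since all members are integer types; they may differ only in what ends up in the padding.

```c
#include <assert.h>
#include <string.h>

struct example {
    int a;
    int b;
    char c;
};

/* Sets every member to the value zero; padding is not guaranteed
   to be touched. */
void reset_by_value(struct example *e)
{
    *e = (struct example){0};
}

/* Sets every byte, including padding, to zero. */
void reset_by_bytes(struct example *e)
{
    memset(e, 0, sizeof *e);
}

/* Small self-check: the members read as zero either way. */
void demo_reset(void)
{
    struct example s = { 1, 2, 'x' };

    reset_by_value(&s);
    assert(s.a == 0 && s.b == 0 && s.c == 0);

    s = (struct example){ 3, 4, 'y' };
    reset_by_bytes(&s);
    assert(s.a == 0 && s.b == 0 && s.c == 0);
}
```

If the struct is later compared with memcmp() or hashed byte by byte, reset_by_bytes is the safer starting point; otherwise the two are interchangeable here.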

Are struct padding bytes preserved by member assignment?

Sample code:
#include <assert.h>
#include <string.h>
struct S
{
unsigned char ch;
int i;
};
int main()
{
struct S s;
memset(&s, 0, sizeof s);
s.ch = 257;
assert( 0 == ((unsigned char *)&s)[1] );
}
Can the assertion fail?
The motivation for the question is whether a compiler on a little-endian system could decide to use a 4-byte store to implement s.ch = 257;. Obviously nobody would ever write code like I did in my example, but something similar might realistically occur if ch is assigned in various ways in a program which then goes on to use memcmp to check for struct equality.
For example, if the code does --s.ch instead of s.ch = 257 - can the compiler emit a word-size decrement instruction?
I don't think the discussion around DR 451 is relevant, as that only applies to uninitialized padding; however the memset initializes all the padding to zero bytes.
Yes, it can fail. The behavior is unspecified, but not undefined.
After the assignment s.ch = 257;, all padding bytes take unspecified values [1]. If the second byte of the structure is a padding byte, its value is therefore unspecified, and so is the result of comparing it to zero: the assertion may or may not fire.
The value read in the assert cannot be a trap representation because unsigned char doesn't have trap representations, and because the value is unspecified, not indeterminate.
[1] (Quoted from: ISO/IEC 9899:201x 6.2.6.1 General 6):
When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.
ISO/IEC 9899:2011 §6.2.6.1 (Representations of types) General says:
¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
51) Thus, for example, structure assignment need not copy any padding bits.
However, your example doesn't do a structure assignment, so maybe that doesn't apply. I believe there is no reason to think that an assignment to a simple type member of a structure would modify the data.
However, your assert code does exhibit undefined behaviour, trying to access structure padding, which is simply not allowed.
So, it is unlikely that the assertion would fire, but because your code exhibits undefined behaviour, it could happen and you'd have no recourse.

Which member of a global union variable that is not initialized explicitly will be initialized to 0 implicitly?

e.g.
union
{
int n;
void *p;
} u;
Is the initial value of u.n or that of u.p equal to 0?
It should be noted that a NULL pointer is not necessarily stored in all-zero bits. Therefore, even if u.n and u.p have the same size,
u.n == 0
doesn't guarantee
u.p == 0
and vice versa.
(Sorry for my poor English)
If an object with static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
So u.n will be initialized to zero and u.p is undetermined.
EDIT: Response to comment
The above is copied from ISO/IEC 9899:201x 6.7.9.10
Since u is static then the first member will be initialized to zero, from the C99 draft standard section 6.7.8 Initialization paragraph 10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
Since n is an arithmetic type, it will be initialized to zero. The value of p is unspecified, but in practice type punning is usually supported by the compiler. For example, the gcc manual points here for Type-punning, and under the -fstrict-aliasing section it says:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
It is also worth noting that you may be able to initialize any member of a union like so:
union { int n; void *p; } u = { .p = NULL } ;
^^^^^^^^^^^^^
I am not sure if all compilers support this though.
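Both cases can be put side by side in a short sketch. The union tag number_or_pointer and the function names are hypothetical, added for this example. The .p = NULL form is a designated initializer, which is part of C99 and later, so older or non-conforming compilers may reject it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag for the union from the question. */
union number_or_pointer
{
    int n;
    void *p;
};

/* Static storage duration, no initializer: the first named member (n)
   is initialized to zero. */
static union number_or_pointer implicit_u;

/* Returns the implicitly initialized first member. */
int implicit_n(void)
{
    return implicit_u.n;
}

/* Returns 1 if a union initialized through .p holds a null pointer. */
int designated_p_is_null(void)
{
    union number_or_pointer u = { .p = NULL };

    return u.p == NULL;
}

/* Small self-check of both paths. */
void demo_union_init(void)
{
    assert(implicit_n() == 0);
    assert(designated_p_is_null() == 1);
}
```

Note that each check reads only the member that was actually initialized, so no type punning is involved.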
I think by 'global variable' you mean that it's at file scope. If so, and if it's declared 'static', it will be initialized to all zero bits, eg because it gets allocated in .BSS.
What those zero bits in the union's storage mean in terms of the value of whichever of its members you access depends on their types. In your case, all zero bits in an int means it has the value zero, and all zero bits in a pointer makes it NULL, i.e. @Dukeling's bang on here.
I am not sure that all zero bits in a float would yield a float with value zero.
It depends upon how you instantiate the variable itself. Usually, statically defined variables are initialized to zero. If you malloc the union to a pointer, you may get uninitialized memory. However, if you use calloc to allocate memory for the union, calloc will initialize the allocated memory to zero per the man page.
It may also depend on any libraries that you may be using. Google's perftools library may or may not zero out the memory when you make the call to calloc that it overwrites.
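The calloc case can be sketched as follows, assuming a standard-conforming calloc. The union tag and function names are hypothetical. Only the int member is read, because all-bits-zero is a valid representation of zero for integer types, while the same is not promised for the pointer member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag for the union from the question. */
union number_or_pointer
{
    int n;
    void *p;
};

/* Returns 1 if the int member of a calloc'ed union reads as zero,
   0 if it does not, and -1 if the allocation failed. */
int calloc_gives_zero_int(void)
{
    union number_or_pointer *u = calloc(1, sizeof *u);
    int is_zero;

    if (u == NULL)
        return -1;
    is_zero = (u->n == 0);
    free(u);
    return is_zero;
}

/* Small self-check; a failed allocation is not treated as an error. */
void demo_calloc_union(void)
{
    assert(calloc_gives_zero_int() != 0);
}
```

With malloc in place of calloc, reading u->n before writing to it would yield an indeterminate value.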
