When is memset to 0 nonportable? [duplicate] - c

This question already has answers here:
memset() or value initialization to zero out a struct?
(8 answers)
what is the difference between struct {0} and memset 0 [duplicate]
(2 answers)
Is memset(&mystruct, 0, sizeof mystruct) same as mystruct = { 0 };?
(4 answers)
Is it guaranteed that memset will zero out the padding bits in a structure?
(3 answers)
Using memset() on struct which contains a floating point number
(4 answers)
Closed 1 year ago.
From this comment in GCC bug #53119:
In C, {0} is the universal zero initializer equivalent to C++'s {} (the latter being invalid in C). It is necessary to use whenever you want a zero-initialized object of a complete but conceptually-opaque or implementation-defined type. The classic example in the C standard library is mbstate_t:
mbstate_t state = { 0 }; /* correctly zero-initialized */
versus the common but nonportable:
mbstate_t state;
memset(&state, 0, sizeof state);
It strikes me as odd that the latter version could be unportable (even for implementation-defined types, the compiler has to know the size). What is the issue here and when is a memset(x, 0, sizeof x) unportable?

memset(p, 0, n) sets to all-bits-0.
An initializer of { 0 } sets to the value 0.
On just about any machine you've ever heard of, the two concepts are equivalent.
However, there have been machines where the floating-point value 0.0 was not represented by a bit pattern of all-bits-0. And there have been machines where a null pointer was not represented by a bit pattern of all-bits-0, either. On those machines, an initializer of { 0 } would always get you the zero initialization you wanted, while memset might not.
See also question 7.31 and question 5.17 in the C FAQ list.
Postscript: One other difference, as pointed out by #ryker: memset will set any "holes" in a padded structure to 0, while setting that structure to { 0 } might not.

The reason for this has to do with how types are represented.
Section 6.7.9p10 of the C standard describes how fields are initialized as follows:
If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate. If an object
that has static or thread storage duration is not initialized
explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized
to zero bits
And p21 also states:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known
size than there are elements in the array, the remainder of
the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
The difference between this and setting all bytes to zero is that some of the above values may not necessarily be represented by all bits zero.
For example, there are some architectures where the address 0 is a valid address. This means that a null pointer is not represented as all bits zero. (Note: (void *)0 is specified as a null pointer constant by the standard, however the implementation will treat this as whatever the representation of a null pointer is)
The standard also doesn't mandate a particular representation for floating point types. While the most common representation, IEEE754, does use all bits 0 to represent the value +0, this is not necessarily true for other representations.

Noting a difference in behavior between the two methods...
In ...= {0}; if padding bytes exist, they will not be cleared.
But a call to memset() will clear padding.
From here
"Possible implementation of mbstate_t is a struct type holding an array representing the incomplete multibyte character, an integer
counter indicating the number of bytes in the array that have been
processed, and a representation of the current shift state."
In the case mbstate_t is implemented as a struct it is notable that {0} will not set padding bytes that may exist to zero, making the following assumption debatable:
mbstate_t state = { 0 }; /* correctly zero-initialized */
memset() however does include padding bytes.
memset(state , 0, sizeof state);//all bytes in memory region of test will be cleared

Related

How do bit fields interplay with bits padding in C

I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C standard say about it?
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
No, it is both system and compiler dependent. You can never assume or know that they are MSB or LSB.
Question 2: Are those unused 30 bits always initialized to 0? What do the C and C++ standards say about it?
Depends on how you initialize the struct. A struct at local scope which isn't initialized may contain garbage values in padding bits/bytes. A struct that is initialized with at least one initializer set, is guaranteed to contain zero even in padding bytes: my_struct = { something };.
Sources
The language-lawyer details of why the above works are somewhat complex.
C17 6.7.9/9 (emphasis mine) says this:
Except where explicitly stated otherwise, for the purposes of this subclause unnamed
members of objects of structure and union type do not participate in initialization.
Unnamed members of structure objects have indeterminate value even after initialization.
This means that we cannot trust padding bits/bytes at all.
But then there's this exception to the above rule (§20 emphasis mine):
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
Meaning that if there's at least one initializer, then the following rule of static storage initialization applies:
C17 6.7.9/10 (emphasis mine):
If an object that has static or thread storage duration is not initialized
explicitly, then: /--/
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

zeroing memory area of a structure in c [duplicate]

This question already has answers here:
what is the difference between struct {0} and memset 0 [duplicate]
(2 answers)
memset() or value initialization to zero out a struct?
(8 answers)
Closed 4 years ago.
I have following structure and i want to initialize it to zero.
struct example {
int a;
int b;
char c;
};
struct example mystruct;
I can memset mystruct to zero in following ways:
memset(&mystruct, 0, sizeof(mystruct));
or
mystruct = (struct example) {0};
Is the second way is equal to the first one?
Note that i want to initialize mystruct variable to zero not only at variable definition, maybe for reusing it.
Well, in your case, they will be equivalent (disregarding the padding bytes).
Quoting C11, chapter §6.7.9/ P21, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
and, regarding the initialization of static storage duration objects, P10 (again emphasis mine)
[...] If an object that has static or thread storage duration is not initialized
explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules,
and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these
rules, and any padding is initialized to zero bits;
For sake of completeness, as mentioned in the below comment by Serge Ballesta:
all remaining members will be initialized implicitly the same as objects that have static storage duration. And the standard does not mandate anything for the padding bits of the top level object
which indicates, the assignment will differ from the memset() call in case of the first object / member, for the padding values.
No, they are not equal.
The first is more low-level, it will set all bits of the value to 0, including those that are not part of any member (padding).
The latter is only guaranteed to set the actual members to zero, so it might be faster since it has the opportunity to touch less memory.

Value of uninitialized elements in array of C language

I have an array of 3 elements. But I only want to initialize 2 of them.
I let the third element blank.
unsigned char array[3] = {1,2,};
int main(){
printf("%d",array[2]);
return 0;
}
The print result is 0. I tested it on IAR, and some online compiler.
Is there any C rule for the value of third element?
Is there any compiler filling the third element by 0xFF ? (Especially cross compiler)
Yes, the C standard does define what happens in this case. So no, there should be no C standard compliant compiler that intialises with 0xFF in this case.
Section 6.7.9 of the standard says:
Initialisation
...
10 ...If an object that has static or thread storage duration is not
initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is
initialized to (positive or unsigned) zero;
if it is an aggregate,
every member is initialized (recursively) according to these rules,
and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
...
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
From this post, it appears that that syntax will initialize all elements after the comma to zero. Moreover; all uninitialized data in the data segment of the program (in other words all uninitialized global variables) are automatically set to zero, so if you are looking for undefined behavior in this program, there isn't any; it will always be 0.
This can be achieved with gcc extension as below
unsigned char array[10] = {1,2,[2 ... 9] = 0xFF};

Which member of a global union variable that is not initialized explicitly will be initialized to 0 implicitly?

e.g.
union
{
int n;
void *p;
} u;
Is the initial value of u.n or that of u.p equal to 0?
It should be noted that a NULL pointer is not necessarily stored in all-zero bits. Therefore, even if u.n and u.p have the same size,
u.n == 0
doesn't guarantee
u.p == 0
and vice versa.
(Sorry for my poor English)
if Object with Static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
So u.n will be initilaized to zero and u.p is undetermined.
EDIT: Response to comment
above info copied from ISO/IEC 9899:201x 6.7.9.10
Since u is static then the first member will be initialized to zero, from the C99 draft standard section 6.7.8 Initialization paragraph 10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
since n is a arithmetic type it will be initialized to zero. The value of p is unspecified but in practice type punning is usually supported by the compiler for example the gcc manual points here for Type-punning and we can see under -fstrict-aliasing section is says:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
It is also worth noting that you may be able to initialize any member of a union like so:
union { int n; void *p; } u = { .p = NULL } ;
^^^^^^^^^^^^^
I am not sure if all compilers support this though.
I think by 'global variable' you mean that it's at file scope. If so, and if it's declared 'static', it will be initialized to all zero bits, eg because it gets allocated in .BSS.
What those zero bits in the union's storage mean in terms of the value of whichever of its members you access depends on their types. In your case, all zero bits in an int means it has the value zero, and all zero bits in a pointer makes it NULL, ie #Dukeling's bang on here.
I am not sure that all zero bits in a float would yield a float with value zero.
It depends upon how you instantiate the variable itself. Usually static defined variables are initialized to zero. If you malloc the union to a pointer, you may get uninitialized memory. However, if you use calloc to allocate memory for the union, calloc will initialized the allocated memory to zero per the man page.
It may also depend on any libraries that you may be using. Google's perftools library may or may not zero out the memory when you make the call to calloc that it overwrites.

Is zero initialization of structures guaranteed to wipe padded areas?

Suppose I have the following structure:
typedef struct
{
unsigned field1 :1;
unsigned field2 :1;
unsigned field3 :1;
} mytype;
The first 3 bits will be usable but sizeof(mytype) will return 4 which means 29 bits of padding.
My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
mytype testfields = {0};
or:
mytype myfields = {1, 1, 1};
Such that it's safe to perform the following memcmp() on the assumption that bits 4..29 will be zero and therefore won't affect the comparison:
if ( memcmp(&myfields, &testfields, sizeof(myfields)) == 0 )
printf("Fields have no bits set\n");
else
printf("Fields have bits set\n");
Yes and no. The actual standard, C11, specifies:
If an object that has static or thread storage duration is not
initialized explicitly, then:
....
if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
So this only holds for objects of static storage, at a first view. But then later it says in addition:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
So this means that padding inside sub-structures that are not initialized explicitly is zero-bit initialized.
In summarry, some padding in a structure is guaranteed to be zero-bit initialized, some isn't. I don't think that such a confusion is intentional, I will file a defect report for this.
Older versions didn't have that at all. So with most existing compilers you'd have to be even more careful, since they don't implement C11, yet. But AFAIR, clang already does on that behalf.
Also be aware that this only holds for initialization. Padding isn't necessarily copied on assignment.
The C99 standard doesn't specify the padding bits would be set to zero. In fact, it specifically mentions that the values of any padding bits are unspecified, so that padding need not be copied in an assignment.
Footnote 51 to 6.2.6.1 (6) (n1570):
Thus, for example, structure assignment need not copy any padding bits.
The new C2011 standard - thanks to Jens Gustedt for sharing that knowledge - specifies that padding bits in objects of static or thread storage duration without explicit initialisation are initialised to 0.
There are still no guarantees for assignment.
My question is, are these padding bits guaranteed by the standard to be zero initialized by the statement:
No.
The value of the padding is unspecified:
(C99, 6.2.6.1p6) "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values"
EDIT: See Jens Gustedt answer, C11 now guarantees the padding is set to 0 in (rare) certain circumstances

Resources