Undefined behaviour due to this for-loop? - c

Is this undefined behaviour because in the for loop the variable i has no initial value?
#include <stdio.h>

static int i;

int foo(int i)
{
    int ret = i;
    for (int i; i < 4; i++) {
        ret += i;
    }
    return ret;
}

int main() {
    printf("%d", i + foo(4));
    return 0;
}

Main Answer
Is this undefined behaviour because in the for loop the variable i has no initial value?
Yes. C 2018 6.3.2.1 2 says that using the value of i in this situation has undefined behavior.
In more detail, static int i; defines an object i with static storage duration. Objects with static storage duration are “created” (in the C model of computing) when your program starts and are initialized with zero if no explicit initializer is given for them.
Then int foo(int i) defines a parameter i that is initialized when the function is called.
Then for(int i; i<4;i++) defines an object i that is not initialized. This i has automatic storage duration (which is the default for objects defined inside functions without a storage-class keyword like static). Due to a rule in the C standard, it is also relevant that your program never takes the address of this i, as with using the & operator on it. That rule is in C 2018 6.3.2.1 2, and it talks about the process of getting the value of an object:
… If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In i<4, i is the lvalue that sentence speaks of, and the program evaluates it when starting the for loop. Because of the rule above, this evaluation has undefined behavior.
Supplement
Suppose inside the for loop you had the simple statement &i;, which merely evaluates the address of i and does nothing with it. Then the rule cited above would not apply, because taking the address of an object prevents declaring it with register. In this case, another rule applies. 6.2.4 6 says uninitialized objects of automatic storage duration have indeterminate values:
… The initial value of the object is indeterminate…
Indeterminate means an object may behave as though it has a different value each time it is used, or it can be a trap representation. Using a trap representation is another way to have undefined behavior, but many C implementations do not have trap representations for int objects these days. In this case, each time your program evaluates the i<4 test in the for or the i++ update or the ret+=i in the body, the C standard allows the program to behave as if i has a new value.
In this case, a variety of outcomes are allowed by the C standard. Some of them lead to undefined behavior in your program due to integer overflow. Others could lead to the function looping indefinitely or executing the loop as if i had been initialized to zero. Although the behavior is not formally undefined, it is not defined well enough to predict what the program will do.

In the for loop's scope, i is an uninitialized automatic storage variable whose initial value is indeterminate. The result of any operation using this variable is undefined (Annex J.2):
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8)
6.7.9 p10: If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

Related

Is reading an uninitialized value always an undefined behaviour? Or are there exceptions to it?

An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue-to-rvalue conversion. The last sentence of that passage applies here, as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals. The result of attempting to indirectly access an object with automatic storage duration from a thread other than the one with which the object is associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. So int i = i; is still undefined behavior, but not for this reason.
The last sentence of 6.3.2.1p2 does, however, open the door for use of an uninitialized variable not being undefined behavior, and that is if the variable in question had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In that case, the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.
Use of uninitialized automatic storage duration objects invokes UB.
Use of uninitialized static storage duration objects is defined, as they are initialized to 0.
#include <stdio.h>

int a;

int foo(void)
{
    static int b;
    int c;
    int d = d;                //UB
    /* static int e = e; */   //error: a static initializer must be a constant expression
    printf("%d\n", a);        //OK
    printf("%d\n", b);        //OK
    printf("%d\n", c);        //UB
    return 0;
}
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct zstr15 { char dat[16]; } x, y;

void test(void)
{
    struct zstr15 hey;
    strcpy(hey.dat, "Hey");
    x = hey;
    y = hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
1. Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
2. Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
3. Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
4. Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to ensure that all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.

Does a union or struct permit assignment from an uninitialised instance?

This question is about the definedness or otherwise of assigning an uninitialised automatic variable to another one of the same type.
Consider
typedef struct
{
    int s1;
    int s2;
} Foo;

typedef union
{
    int u1;
    Foo u2;
} Bar;

int main()
{
    {
        int a;
        int b = a; // (1)
    }
    {
        Foo a;
        Foo b = a; // (2)
    }
    {
        Bar a;
        a.u1 = 0;
        Bar b = a; // (3)
    }
}
Referring to the comments in main:
(1) is undefined since a is uninitialised. That much I know.
But what about (2)? The struct members s1 and s2 are uninitialised.
Furthermore, what about (3)? The memory u2.s2 is uninitialised, so reading it is undefined behaviour, no?
The behavior is undefined in (1) and (2).
Per the C standard, the value of an object with automatic storage duration that is not initialized is indeterminate (C 2011 [N1570] 6.7.9 10). Nominally, this means it has some value, but we do not know what it is while writing the program.
However, the standard also says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined” (6.3.2.1 2). In your sample code, the address of a is never taken, a is not initialized, and a is used in an expression as an lvalue. Therefore, the behavior is undefined.
(This passage, 6.3.2.1 2, was designed to accommodate processors that can detect use of an uninitialized register. Nonetheless, the rule in the C standard applies to all implementations.)
(3) is not clearly addressed by the C standard. Although a member of the union has been assigned a value, and hence is not uninitialized for purposes of 6.3.2.1 2, the object being used in b = a is the union, not its member. Obviously, our intuitive notion is that, if a member of a union is assigned a value, the union has a value. However, I do not see this specified in the C standard.
We can infer 6.3.2.1 2 is not intended to consider a union or structure to be uninitialized, at least if part of it has been assigned a value, because:
Structures can have unnamed members, such as unnamed bit fields.
Per C 6.7.9 9, unnamed members of structures have indeterminate value, even after initialization (of the structures).
If 6.3.2.1 2 applied to structures in which not every member had been assigned a value, then b = a would always be undefined if a were a structure with an unnamed member and had automatic storage duration.
That seems unreasonable and not what the standard intended.
However, there is some wiggle room here. The standard could have specified that a structure is not uninitialized only if it were initialized or all of its named members have been assigned values. In that case (3) would be undefined if a were a structure in which only one member had been assigned a value. I do not think this wiggle room exists with a union; if a member of the union has been assigned a value, it is only reasonable to consider the union not to be uninitialized.
In general, assigning from an uninitialized object isn't undefined behavior, it only makes the result unspecified.
But the code you show indeed has undefined behavior -- for a different reason than you assume. Citing N1570 (latest C11 draft), §6.3.2.1 p2 here:
[...] If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
Explaining this a bit: The C standard is prepared to handle values that aren't stored in an addressable location. This is typically the case when they are held in one of the CPU's registers. Explicitly giving an object the register storage class is only a hint to the compiler that it should, if sensible, hold that object in a register. The other way around, a compiler is free to hold any object with automatic storage duration in a register as long as the code doesn't need to address it (by taking a pointer).
In your code, you have uninitialized objects with automatic storage duration that never have their address taken, so the compiler would be free to place them in registers. This means there is no value for the object (not even an unspecified one) before it is initialized. Therefore, using this potentially non-existent value to initialize another object (or, for other purposes) is undefined behavior.
If your code took a pointer to the respective a in all these examples, the result of the assignment would be unspecified (of course), but the behavior would be defined.
It's worth adding that structs and unions have nothing to do with the answer to your question. The rules are the same for all kinds of objects with automatic storage duration. That said, in your third example, a isn't uninitialized any more after you assign one member of the union. So for your third example, the behavior is well-defined. It doesn't matter what's in the other member of the union; a union can only hold a value for one of its members at a time.

Could invoking a void statement cause undefined behavior?

Imagine this:
int X;
X = X;
this would be undefined behavior as
1 The behavior is undefined in the following circumstances:
[...]
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
But what about this?
int X;
X;
would the invocation of X; in reference to the quote allow the compiler to cause undefined behavior? Or does this not count as X is "used"?
In C 1999, it is not directly an error to use an uninitialized object. (Your quotes from Annex J are not a normative part of the standard; they are just informative.) An uninitialized object with automatic storage duration has an indeterminate value. For some objects, that value may be a trap representation, so using it may result in undefined behavior.
However, for some objects, it is possible to determine that an uninitialized object cannot have a trap value. For example, an unsigned char cannot have a trap value, and the exact-width signed integer types defined in stdint.h cannot have trap values (because they are two’s complement with no padding bits). For other types, it may be that properties defined by your C implementation cause them not to have trap values. Using an uninitialized int X does not have defined behavior in all C 1999 implementations (but does in some), but using an uninitialized unsigned char X does.
In C 2011, this text was added in 6.3.2.1 2: “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.” Therefore, in C 2011, both X = X; and X; have undefined behavior.
History/background:
The C 2011 change supports a Hewlett-Packard machine which has a special flag for certain registers that indicates whether the register contents are valid or not. The machine can generate an exception if a register is used while its contents are invalid. Hence, if the compiler assigns the X of unsigned char X to such a register, using the register when it is invalid may cause an exception in the machine even though there is no unsigned char trap value.
X = X;
The above is undefined behavior because X is not initialized. The compiler should at least generate a warning about this.
The standard states 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However:
X;
The above is similar to:
1212342413;
as X will simply evaluate to some value.

Lifetime of temporary objects in C11 vs C99

I am trying to decipher a note that led to a change between C99 and C11.
The change proposed in that note ended up in C11's 6.2.4:8, namely:
A non-lvalue expression with structure or union type, where the structure or union contains a member with array type (including, recursively, members of all contained structures and unions) refers to an object with automatic storage duration and temporary lifetime. Its lifetime begins when the expression is evaluated and its initial value is the value of the expression. Its lifetime ends when the evaluation of the containing full expression or full declarator ends. Any attempt to modify an object with temporary lifetime results in undefined behavior.
I understand why the change was needed (some discussion can be found here. Note that the discussion goes back to before C11). However, what I don't understand is a side remark that Clark Nelson made in writing his note:
Please note that this approach additionally declares an example like
this, which was conforming under C99, to be non-conforming:
struct X { int a[5]; } f();
int *p = f().a;
printf("%p\n", p);
I understand why this example is non-conforming under C11. What I specifically fail to understand is how it is conforming under C99. And, if it is defined under C99, what is it supposed to do then, definedly print the value of a dangling pointer?
My understanding is that in C99, the finest grain of lifetime for an object is the block. Thus, while 6.5.2.2 (and some other § mentioned in the note you refer to) specifically says that you can't access the returned value after the next sequence point, technically its address is not indeterminate until after you have left the enclosing block (the reason why you should have some storage reserved for an inaccessible object is left as an exercise for the reader, though). Thus, something like
struct X { int a[5]; } f();
int *p;
{ p = f().a; }
printf("%p\n", p);
is undefined in C99 as well as in C11. In C11, the notion of "temporary lifetime", which does not exist in C99, allows one to consider that the pointer becomes indeterminate as soon as the full expression ends.

What does printf print for an uninitialized variable?

What should the code print? 0 or any garbage value or will it depend on the compiler?
#include <stdio.h>
int a;
int main()
{
printf("%d\n",a);
return 0;
}
The answer is 0. Global variables are initialized to zero.
I would say your code might output anything or simply anything can happen because your code invokes Undefined Behaviour as per C99.
You don't have a prototype for printf in scope.
J.2 Undefined behavior
— For call to a function without a function prototype in scope where the function is defined with a function prototype, either the prototype ends with an ellipsis or the types of the arguments after promotion are not compatible with the types of the parameters (6.5.2.2).
If the question is about initialization of global variables then a would be initialized to 0 because it has static storage duration.
I found this in the C99 standard, section 6.7.8 paragraph 10 (Initialization):
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
Section 6.2.4 paragraph 3 defines:
An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
In other words, globals are initialized as 0. Automatic variables (i.e. non-static locals) are not automatically initialized.
Apart from automatic variables (which are what we generally use inside functions), all other variables are initialized to 0.

Resources