Imagine this:
int X;
X = X;
This would be undefined behavior, as Annex J.2 says:
1 The behavior is undefined in the following circumstances:
[...]
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
But what about this?
int X;
X;
Would evaluating X; allow the compiler to treat this as undefined behavior, in light of the quote above? Or does this not count as X being "used"?
In C 1999, it is not directly an error to use an uninitialized object. (Your quotes from Annex J are not a normative part of the standard; they are just informative.) An uninitialized object with automatic storage duration has an indeterminate value. For some objects, that value may be a trap representation, so using it may result in undefined behavior.
However, for some objects, it is possible to determine that an uninitialized object cannot have a trap value. For example, an unsigned char cannot have a trap value, and the exact-width signed integer types defined in stdint.h cannot have trap values (because they are two’s complement with no padding bits). For other types, it may be that properties defined by your C implementation cause them not to have trap values. Using an uninitialized int X does not have defined behavior in all C 1999 implementations (but does in some), but using an uninitialized unsigned char X does.
In C 2011, this text was added in 6.3.2.1 2: “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.” Therefore, in C 2011, both X = X; and X; have undefined behavior.
History/background:
The C 2011 change supports a Hewlett-Packard machine which has a special flag for certain registers that indicates whether the register contents are valid or not. The machine can generate an exception if a register is used while its contents are invalid. Hence, if the compiler assigns the X of unsigned char X to such a register, using the register when it is invalid may cause an exception in the machine even though there is no unsigned char trap value.
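As a minimal sketch of the defined counterpart of the two statements from the question: once the object has been given a value, both the self-assignment and the bare expression statement are well-defined. The function name use_after_init is hypothetical, and compilers may warn that both statements have no effect.

```c
/* Hypothetical helper: the same two statements as in the question,
   but executed only after x has been given a value. */
int use_after_init(void)
{
    int x = 42;  /* the initializer gives x a determinate value */
    x = x;       /* lvalue conversion reads 42 and stores it back */
    x;           /* expression statement: the value is read and discarded */
    return x;
}
```

Removing the initializer turns both statements into the undefined cases discussed above, because x never has its address taken.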
X = X;
The above is undefined behavior because X is not initialized. The compiler should at least generate a warning about this.
The standard states 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However:
X;
The above is similar to:
1212342413;
as X is simply an expression that is evaluated and whose value is discarded.
Related
An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion. The last sentence of the quoted paragraph applies here, as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. So int i = i; is still undefined behavior, but not for this reason.
The last sentence of 6.3.2.1p2 does, however, open the door for the use of an uninitialized variable not being undefined behavior: namely, when the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In which case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example as these implementations do not have trap representations for integer types.
Using uninitialized objects with automatic storage duration invokes UB.
Using uninitialized objects with static storage duration is defined, as they are initialized to zero.
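#include <stdio.h>

int a;
int foo(void)
{
    static int b;
    int c;
    int d = d;              // UB
    // static int e = e;    // does not compile: a static initializer must be a constant expression
    printf("%d\n", a);      // OK, prints 0
    printf("%d\n", b);      // OK, prints 0
    printf("%d\n", c);      // UB
    return 0;
}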
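The static-storage half of that rule can be checked directly. The following is a small sketch with hypothetical function names; it reads only objects that are guaranteed to hold a value.

```c
static int file_scope_counter;   /* static storage duration: zero-initialized */

int read_file_scope(void)
{
    return file_scope_counter;   /* defined: reads 0 */
}

int read_block_static(void)
{
    static int calls;            /* block scope, but still static storage: starts at 0 */
    return calls;
}

int read_automatic_initialized(void)
{
    int local = 0;               /* automatic storage: needs an explicit value before it is read */
    return local;
}
```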
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct zstr15 { char dat[16]; } x, y;
void test(void)
{
    struct zstr15 hey;
    strcpy(hey.dat, "Hey");  // writes only dat[0..3]; dat[4..15] stay indeterminate
    x = hey;
    y = hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
1. Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
2. Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
3. Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
4. Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to ensure that all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.
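When the contents of the unwritten bytes do matter, a programmer can get the effect of the zero-fill approach by hand. This is only a sketch, not something the Standard requires of an implementation. The names x_copy, y_copy and test_zero_filled are hypothetical stand-ins for x, y and test().

```c
#include <string.h>

struct zstr15 { char dat[16]; };

/* Hypothetical globals standing in for x and y. */
struct zstr15 x_copy, y_copy;

void test_zero_filled(void)
{
    struct zstr15 hey = { {0} };  /* the initializer zero-fills all 16 bytes of dat */
    strcpy(hey.dat, "Hey");       /* overwrites dat[0..3] with 'H', 'e', 'y', '\0' */
    x_copy = hey;                 /* every byte copied has a determinate value */
    y_copy = hey;
}
```

The cost is exactly the one described above: the twelve trailing bytes are written even when no client code ever looks at them.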
Is this undefined behaviour because in the for loop the variable i has no initial value?
#include <stdio.h>
static int i;
int foo(int i)
{
int ret = i;
for(int i; i<4;i++){
ret+=i;
}
return ret;
}
int main() {
printf("%d", i+foo(4));
return 0;
}
Main Answer
Is this undefined behaviour because in the for loop the variable i has no initial value?
Yes. C 2018 6.3.2.1 says that using the value of i in this situation has undefined behavior.
In more detail, static int i; defines an object i with static storage duration. Objects with static storage duration are “created” (in the C model of computing) when your program starts and are initialized with zero if no explicit initializer is given for them.
Then int foo(int i) defines a parameter i that is initialized when the function is called.
Then for(int i; i<4;i++) defines an object i that is not initialized. This i has automatic storage duration (which is the default for objects defined inside functions without a storage-class keyword like static). Due to a rule in the C standard, it is also relevant that your program never takes the address of this i, for example by applying the & operator to it. That rule is in C 2018 6.3.2.1 2, and it talks about the process of getting the value of an object:
… If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In i<4, i is the lvalue that sentence speaks of, and the program evaluates it when starting the for loop. Because of the rule above, this evaluation has undefined behavior.
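A minimal sketch of the repaired function, assuming the intent was to add 0 through 3 to the argument. The name foo_fixed is hypothetical; the only change from the question's foo is the initializer in the for statement.

```c
int foo_fixed(int i)
{
    int ret = i;                   /* this i is the parameter */
    for (int i = 0; i < 4; i++) {  /* a new i that shadows the parameter, now initialized */
        ret += i;
    }
    return ret;
}
```

With the initializer in place, foo_fixed(4) evaluates 4 + 0 + 1 + 2 + 3 and returns 10.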
Supplement
Suppose inside the for loop you had the simple statement &i;, which merely evaluates the address of i and does nothing with it. Then the rule cited above would not apply, because taking the address of an object prevents declaring it with register. In this case, another rule applies. 6.2.4 6 says uninitialized objects of automatic storage duration have indeterminate values:
… The initial value of the object is indeterminate…
Indeterminate means an object may behave as though it has a different value each time it is used, or it can be a trap representation. Using a trap representation is another way to have undefined behavior, but many C implementations do not have trap representations for int objects these days. In this case, each time your program evaluates the i<4 test in the for or the i++ update or the ret+=i in the body, the C standard allows the program to behave as if i has a new value.
In this case, a variety of outcomes are allowed by the C standard. Some of them lead to undefined behavior in your program due to integer overflow. Others could lead to the function looping indefinitely or executing the loop as if i had been initialized to zero. Although the behavior is not formally undefined, it is not defined well enough to predict what the program will do.
In the scope of the for loop, i is an uninitialized variable with automatic storage duration whose initial value is indeterminate. The result of an operation using this variable is undefined:
— The value of an object with automatic storage
duration is used while it is indeterminate (6.2.4, 6.7.9,
6.8)
6.7.9 p10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
Say I have this code:
void foo() {
char s[10];
char v1 = s[0]; // UB
char v2 = s[10]; // also UB
}
void bar() {
char s[10];
strcpy(s, "foo");
char v3 = s[3]; // v3 is zero
char v4 = s[0]; // v4 is 'f'
char v5 = s[4]; // What?
}
As the addresses of s[0] to s[3] are accessed in strcpy, and s[0] to s[9] are in contiguous memory, I suppose the whole array should contain some value (possibly indeterminate).
Is the initialization of v5 well-defined? Or is v5 merely an indeterminate value (without triggering any UB)?
What if the array is of type int and still partially assigned?
It can't be undefined on the grounds that the char there might have a trap representation, because 6.2.6.1p5 says that accessing anything with a character type is well defined.
It could be undefined because of 6.3.2.1p2, which Annex J.2 summarizes as:
An lvalue designating an object of automatic storage duration that
could have been declared with the register storage class is used in a
context that requires the value of the designated object, but the
object is uninitialized.
so the question is, could the array have been declared with the register storage class?
The answer to that is no, it couldn't have, because you're indexing it. Indexing is defined according to 6.5.2.1p2
(
A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
)
in terms of the array converting to the address of its first element, but for a register-classified array, such a conversion would have been undefined as per this bullet point:
An lvalue having array type is converted to a pointer to the initial
element of the array, and the array object has register storage class
(6.3.2.1).
in appendix J.2 Undefined behavior, which means the array couldn't have been declared register.
Footnote 121 in 6.7.1 Storage class specifiers further elaborates this:
the address of any part of an object declared with storage-class
specifier register cannot be computed, either explicitly (by use of
the unary & operator as discussed in 6.5.3.2) or implicitly (by
converting an array name to a pointer as discussed in 6.3.2.1). Thus,
the only operators that can be applied to an array declared with
storage-class specifier register are sizeof and _Alignof
(In other words, while the language allows register arrays, they're essentially unusable).
Consequently, code like:
char unspecified(void){ char s[1]; return s[0]; }
will return an unspecified value but will not render your program's behavior undefined.
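The equivalence that this argument rests on, E1[E2] being identical to (*((E1)+(E2))), can be illustrated with a short sketch. The function name is hypothetical, and the array is fully initialized so that every read is defined.

```c
/* Returns 1 if the different spellings of "element 2 of s" all agree. */
int subscript_forms_agree(void)
{
    char s[4] = { 'a', 'b', 'c', 'd' };

    /* In each expression, s is converted to a pointer to s[0];
       that implicit conversion is what a register array would forbid. */
    return s[2] == *(s + 2)   /* the definition of [] */
        && s[2] == 2[s]       /* + is commutative, so this is the same element */
        && &s[2] == s + 2;    /* same address either way */
}
```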
The authors of the Standard did not think that it was necessary to explicitly describe corner cases which every compiler to date had consistently handled the same way, and for which they saw no reason why any implementation might behave differently if its designer wasn't being deliberately obtuse. Scenarios involving partially-written aggregates fall into this category.
The behavior of array subscripting is defined as taking the address of the array, performing arithmetic on the resulting pointer, and then accessing the resulting address. Personally I think it should be defined as a separate kind of operation with slightly different corner cases from explicitly taking an array's address, doing the pointer arithmetic, and casting the result, but the Standard defines the operation in terms of those steps. As such, a compiler that is not being deliberately obtuse should regard an array which is accessed using the subscript operator as an object whose address is taken, and which may thus be accessed whether or not it has been written. That does, however, still leave open a question about the behavior of such code.
Assuming "unsigned char" is 8 bits and "unsigned" is 24 or more, what values could the following return:
unsigned test1(unsigned char *p)
{
unsigned x=p[0];
unsigned y=p[0];
unsigned z=y;
return x | (y << 8) | (z << 16);
}
unsigned test(void)
{
unsigned char foo[1];
return test1(foo); // Note that this takes the address of 'foo'.
}
Personally, I doubt there would be any real disadvantage to requiring that code generated for test1 must behave as though x, y and z all hold the same value in the range 0..255, or--at absolute minimum--behaving as though y and z hold the same value. I don't think the authors of the Standard would have expected that any non-obtuse implementation wouldn't behave that way, but the Standard doesn't actually require it, and some people seem to believe that requiring such behavior would unduly restrict optimization.
Yes it is undefined behavior.
A partially assigned array is an array containing both initialized and uninitialized memory areas. Reading the uninitialized areas is undefined behavior, just like reading any other uninitialized memory.
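Whichever reading of the standard is right for the unwritten elements, the question goes away if the whole array is given a value first. The sketch below shows two common ways to do that; the function names are hypothetical.

```c
#include <string.h>

char fifth_element_initialized(void)
{
    char s[10] = "foo";      /* s[0..2] hold "foo"; s[3..9] are initialized to 0 */
    return s[4];             /* defined: reads 0 */
}

char fifth_element_memset(void)
{
    char s[10];
    memset(s, 0, sizeof s);  /* every element now has a determinate value */
    strcpy(s, "foo");
    return s[4];             /* defined: still 0, strcpy wrote only s[0..3] */
}
```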
This question is about the definedness or otherwise of assigning an uninitialised automatic variable to another one of the same type.
Consider
typedef struct
{
int s1;
int s2;
} Foo;
typedef union
{
int u1;
Foo u2;
} Bar;
int main()
{
{
int a;
int b = a; // (1)
}
{
Foo a;
Foo b = a; // (2)
}
{
Bar a;
a.u1 = 0;
Bar b = a; // (3)
}
}
Referring to the comments in main:
(1) is undefined since a is uninitialised. That much I know.
But what about (2)? The struct members s1 and s2 are uninitialised.
Furthermore, what about (3)? The memory u2.s2 is uninitialised, so reading it is undefined behaviour, no?
The behavior is undefined in (1) and (2).
Per the C standard, the value of an object with automatic storage duration that is not initialized is indeterminate (C 2011 [N1570] 6.7.9 10). Nominally, this means it has some value, but we do not know what it is while writing the program.
However, the standard also says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined” (6.3.2.1 2). In your sample code, the address of a is never taken, a is not initialized, and a as used in the expression is an lvalue designating it. Therefore, the behavior is undefined.
(This passage, 6.3.2.1 2, was designed to accommodate processors that can detect use of an uninitialized register. Nonetheless, the rule in the C standard applies to all implementations.)
(3) is not clearly addressed by the C standard. Although a member of the union has been assigned a value, and hence is not uninitialized for purposes of 6.3.2.1 2, the object being used in b = a is the union, not its member. Obviously, our intuitive notion is that, if a member of a union is assigned a value, the union has a value. However, I do not see this specified in the C standard.
We can infer 6.3.2.1 2 is not intended to consider a union or structure to be uninitialized, at least if part of it has been assigned a value, because:
Structures can have unnamed members, such as unnamed bit fields.
Per C 6.7.9 9, unnamed members of structures have indeterminate value, even after initialization (of the structures).
If 6.3.2.1 2 applied to structures in which not every member had been assigned a value, then b = a would always be undefined if a were a structure with an unnamed member and had automatic storage duration.
That seems unreasonable and not what the standard intended.
However, there is some wiggle room here. The standard could have specified that a structure is not uninitialized only if it were initialized or all of its named members have been assigned values. In that case (3) would be undefined if a were a structure in which only one member had been assigned a value. I do not think this wiggle room exists with a union; if a member of the union has been assigned a value, it is only reasonable to consider the union not to be uninitialized.
In general, assigning from an uninitialized object isn't undefined behavior; it only makes the result unspecified.
But the code you show indeed has undefined behavior -- for a different reason than you assume. Citing N1570 (latest C11 draft), §6.3.2.1 p2 here:
[...] If
the lvalue designates an object of automatic storage duration that could have been
declared with the register storage class (never had its address taken), and that object
is uninitialized (not declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
Explaining this a bit: The C standard is prepared to handle values that aren't stored in an addressable location. This is typically the case when they are held in one of the CPU's registers. Explicitly giving an object the register storage class is only a hint to the compiler that it should, if sensible, hold that object in a register. The other way around, a compiler is free to hold any object with automatic storage duration in a register as long as the code doesn't need to address it (by taking a pointer).
In your code, you have uninitialized objects with automatic storage duration that never have their address taken, so the compiler would be free to place them in registers. This means there is no value for the object (not even an unspecified one) before it is initialized. Therefore, using this potentially non-existent value to initialize another object (or, for other purposes) is undefined behavior.
If your code would take a pointer to the respective a in all these examples, the result of the assignment would be unspecified (of course), but the behavior would be defined.
It's worth adding that structs and unions have nothing to do with the answer to your question. The rules are the same for all kinds of objects with automatic storage duration. That said, in your third example, a isn't uninitialized any more after you assign one member of the union. So for your third example, the behavior is well-defined. It doesn't matter what's in the other member of the union; a union can only hold a value for one of its members at a time.
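The answers disagree on some details, but they agree that the problem in (1) and (2) is the missing initialization. As a sketch under that assumption, the copies below are made only from objects that have been given a value. Foo and Bar are the question's types; the function names are hypothetical.

```c
typedef struct { int s1; int s2; } Foo;
typedef union  { int u1; Foo u2; } Bar;

int copy_initialized_struct(void)
{
    Foo a = { 1, 2 };     /* every member has a value before the copy */
    Foo b = a;
    return b.s1 + b.s2;
}

int copy_union_after_member_init(void)
{
    Bar a = { .u1 = 7 };  /* the union holds a value for u1 */
    Bar b = a;
    return b.u1;          /* reads back the member that was stored */
}
```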
In C is it valid to use a variable in the same statement in which it is declared?
In both gcc 4.9 and clang 3.5 the following program compiles and runs without error:
#include "stdio.h"
int main() {
int x = x;
printf("%d\n", x);
}
In gcc it outputs 0 and in clang 32767 (which is the largest value of a signed 2-byte integer).
Why does this not cause a compilation error? Is this valid in any particular C specification? Is its behavior explicitly undefined?
int x = x;
This is "valid" in the sense that it doesn't violate a constraint or syntax rule, so no compile-time diagnostic is required. The name x is visible within the initializer, and refers to the object being declared. The scope is defined in N1570 6.2.1 paragraph 7:
Any other identifier [other than a struct, union, or enum tag, or
an enum constant] has scope that begins just after the completion of
its declarator.
The declarator in this case is x (int is the type specifier, not part of the declarator).
This allows for things like:
int x = 10, y = x + 1;
But the declaration has undefined behavior, because the initializer refers to an object that hasn't been initialized.
The explicit statement that the behavior is undefined is in N1570 6.3.2.1 paragraph 2, which describes the "conversion" of an lvalue (an expression that designates an object) to the value stored in that object.
Except when [list of cases that don't apply here], an
lvalue that does not have array type is converted to the value stored
in the designated object (and is no longer an lvalue); this is called
lvalue conversion.
[...]
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never
had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
The object in question is x, referenced in the initializer. At that point, no value has been assigned to x, so the expression has undefined behavior.
In practice, you'll probably get a compile-time warning if you enable a high enough warning level. The actual behavior might be the same as if you had omitted the initializer:
int x;
but don't count on it.
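The scope rule quoted above can be seen in a short sketch with hypothetical function names. In both functions every object is initialized before it is read, so the behavior is defined.

```c
int scope_after_declarator(void)
{
    int x = 10, y = x + 1;   /* x is in scope and already initialized when y's initializer runs */
    return y;
}

int inner_shadows_outer(void)
{
    int x = 5;
    int result;
    {
        int x = 3;           /* a new x; the outer x is hidden from the end of this declarator on */
        result = x;
    }
    return result * 10 + x;  /* the outer x is visible again: 3 * 10 + 5 */
}
```

Writing int x = x; in the inner block would therefore read the new, uninitialized x rather than the outer one.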
According to the language specification
6.7.8.10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
Further, it says
6.7.8.11 The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion).
Hence, the value of the initializer expression (x to the right of =) is indeterminate, so we are dealing with undefined behavior, because the initializer reads from the variable x, which has an indeterminate value.
Various compilers provide warning settings to catch such conditions.
int x = x;
causes undefined behavior. Don't count on any predictable behavior.
Clang does warn about this:
$ clang -c -Wall ub_or_not_ub.c
ub_or_not_ub.c:4:11: warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]
  int x = x;
      ~   ^
So I guess it's undefined behavior.