In C, is it valid to use a variable in the same statement in which it is declared?
In both gcc 4.9 and clang 3.5 the following program compiles and runs without error:
#include <stdio.h>
int main() {
int x = x;
printf("%d\n", x);
}
In gcc it outputs 0 and in clang 32767 (which is the largest positive value of a 2-byte signed integer).
Why does this not cause a compilation error? Is this valid in any particular C specification? Is its behavior explicitly undefined?
int x = x;
This is "valid" in the sense that it doesn't violate a constraint or syntax rule, so no compile-time diagnostic is required. The name x is visible within the initializer, and refers to the object being declared. The scope is defined in N1570 6.2.1 paragraph 7:
Any other identifier [other than a struct, union, or enum tag, or
an enum constant] has scope that begins just after the completion of
its declarator.
The declarator in this case is int x.
This allows for things like:
int x = 10, y = x + 1;
But the declaration has undefined behavior, because the initializer refers to an object that hasn't been initialized.
The explicit statement that the behavior is undefined is in N1570 6.3.2.1 paragraph 2, which describes the "conversion" of an lvalue (an expression that designates an object) to the value stored in that object.
Except when [list of cases that don't apply here], an
lvalue that does not have array type is converted to the value stored
in the designated object (and is no longer an lvalue); this is called
lvalue conversion.
[...]
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never
had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
The object in question is x, referenced in the initializer. At that point, no value has been assigned to x, so the expression has undefined behavior.
In practice, you'll probably get a compile-time warning if you enable a high enough warning level. The actual behavior might be the same as if you had omitted the initializer:
int x;
but don't count on it.
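Because the scope rule quoted above puts x in scope inside its own initializer, some self-references are well defined: the ones that never read the object's value. The sketch below uses hypothetical helper names (self_sizeof, self_address, later_declarator) to show three such cases. It relies only on the facts that sizeof does not evaluate a non-VLA operand and that taking an address is not an lvalue conversion.

```c
#include <stddef.h>

/* sizeof does not evaluate its (non-VLA) operand, so n is never read. */
size_t self_sizeof(void)
{
    size_t n = sizeof n;
    return n;
}

/* &p uses only p's address, never its value, so this is well defined. */
int self_address(void)
{
    void *p = &p;
    return p == (void *)&p;
}

/* x is fully initialized before y's initializer is evaluated. */
int later_declarator(void)
{
    int x = 10, y = x + 1;
    return y;
}
```

By contrast, int x = x; performs an lvalue conversion on x, which is exactly what 6.3.2.1p2 makes undefined.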
According to the language specification
6.7.8.10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
Further, it says
6.7.8.11 The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion).
Hence, the value of the initializer expression (the x to the right of =) is indeterminate. We are dealing with undefined behavior because the initializer reads from the variable x while its value is indeterminate.
Various compilers provide warning settings to catch such conditions.
int x = x;
is cause for undefined behavior. Don't count on any predictable behavior.
Clang does warn about this:
$ clang -c -Wall ub_or_not_ub.c
ub_or_not_ub.c:4:11: warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]
int x = x;
~ ^
So I guess it's undefined behavior.
Related
An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion (often called lvalue-to-rvalue conversion). The last sentence of the quoted paragraph applies here, because the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
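For contrast, the exception list at the start of 6.3.2.1p2 means that some uses of a still-uninitialized variable perform no lvalue conversion at all. The following is a minimal sketch with a hypothetical function name, exempt_uses. It touches u only as the operand of sizeof and as the left operand of =, and reads it only after it has been assigned.

```c
#include <stddef.h>

size_t exempt_uses(void)
{
    int u;                 /* automatic, uninitialized, address never taken */
    size_t s = sizeof u;   /* operand of sizeof: no lvalue conversion */
    u = 7;                 /* left operand of =: no lvalue conversion */
    return s + (size_t)u;  /* u has now been assigned, so reading it is fine */
}
```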
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached.
So the lifetime of i begins before its declaration is encountered. int i = i; is therefore still undefined behavior, but not for this reason.
The last sentence of 6.3.2.1p2 does, however, open the door to using an uninitialized variable without undefined behavior: that is the case when the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In that case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.
Using an uninitialized object with automatic storage duration invokes UB.
Using an object with static storage duration that has no explicit initializer is well defined, because such objects are initialized to zero.
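#include <stdio.h>

int a;
int foo(void)
{
static int b;
int c;
int d = d; //UB
/* static int e = e; does not compile: e is not a constant expression */
static int e; //OK: zero-initialized
printf("%d\n", a); //OK
printf("%d\n", b); //OK
printf("%d\n", c); //UB
printf("%d\n", e); //OK
return 0;
}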
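The zero-initialization rule for static storage duration covers every member of an aggregate: arithmetic members become 0 and pointer members become null pointers. The sketch below illustrates this with a hypothetical struct counters and hypothetical functions. It also shows that a static local starts at 0 and keeps its value between calls.

```c
#include <stddef.h>

struct counters { int hits; double ratio; const char *name; };

static struct counters file_totals;   /* no initializer: zero-initialized */

int statics_are_zeroed(void)
{
    static struct counters local_totals;   /* also zero-initialized */
    return file_totals.hits == 0 && file_totals.ratio == 0.0
        && file_totals.name == NULL
        && local_totals.hits == 0 && local_totals.name == NULL;
}

/* A static local starts at 0 and keeps its value between calls. */
int next_call_number(void)
{
    static int calls;
    return ++calls;
}
```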
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct zstr15 { char dat[16]; } x,y;
void test(void)
{
struct zstr15 hey;
strcpy(hey.dat, "Hey");
x=hey;
y=hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define the behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function would be to ensure that all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.
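Portable code that cannot rely on any of these implementation choices can take approach #2 itself by zero-filling hey through its initializer. Every byte that x and y receive is then determinate. The following is a sketch that uses a hypothetical function name, copy_hey_zero_filled, and passes the destinations as pointers rather than using globals.

```c
#include <string.h>

struct zstr15 { char dat[16]; };

void copy_hey_zero_filled(struct zstr15 *x, struct zstr15 *y)
{
    struct zstr15 hey = { { 0 } };  /* all 16 bytes of dat start as '\0' */
    strcpy(hey.dat, "Hey");         /* overwrites only the first 4 bytes */
    *x = hey;
    *y = hey;
}
```

The cost is writing the 12 bytes that, as discussed above, may be irrelevant to program behavior.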
Is this undefined behaviour because in the for loop the variable i has no initial value?
#include <stdio.h>
static int i;
int foo(int i)
{
int ret = i;
for(int i; i<4;i++){
ret+=i;
}
return ret;
}
int main() {
printf("%d", i+foo(4));
return 0;
}
Main Answer
Is this undefined behaviour because in the for loop the variable i has no initial value?
Yes. C 2018 6.3.2.1 says that using the value of i in this situation has undefined behavior.
In more detail, static int i; defines an object i with static storage duration. Objects with static storage duration are “created” (in the C model of computing) when your program starts and are initialized with zero if no explicit initializer is given for them.
Then int foo(int i) defines a parameter i that is initialized when the function is called.
Then for(int i; i<4;i++) defines an object i that is not initialized. This i has automatic storage duration (which is the default for objects defined inside functions without a storage-class keyword like static). Due to a rule in the C standard, it is also relevant that your program never takes the address of this i, as with using the & operator on it. That rule is in C 2018 6.3.2.1 2, and it talks about the process of getting the value of an object:
… If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In i<4, i is the lvalue that sentence speaks of, and the program evaluates it when starting the for loop. Because of the rule above, this evaluation has undefined behavior.
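The minimal fix is to give the loop variable an initializer. The sketch below is foo() from the question under the hypothetical name foo_fixed. As in the original, the loop's i still shadows the parameter i. With this change, the program in the question would print 10: 0 from the static i plus 4 + 0 + 1 + 2 + 3 from foo(4).

```c
int foo_fixed(int i)
{
    int ret = i;                    /* the parameter i */
    for (int i = 0; i < 4; i++) {   /* a new, now initialized, i */
        ret += i;
    }
    return ret;
}
```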
Supplement
Suppose inside the for loop you had the simple statement &i;, which merely evaluates the address of i and does nothing with it. Then the rule cited above would not apply, because taking the address of an object prevents declaring it with register. In this case, another rule applies. 6.2.4 6 says uninitialized objects of automatic storage duration have indeterminate values:
… The initial value of the object is indeterminate…
Indeterminate means an object may behave as though it has a different value each time it is used, or it can be a trap representation. Using a trap representation is another way to have undefined behavior, but many C implementations do not have trap representations for int objects these days. In this case, each time your program evaluates the i<4 test in the for or the i++ update or the ret+=i in the body, the C standard allows the program to behave as if i has a new value.
In this case, a variety of outcomes are allowed by the C standard. Some of them lead to undefined behavior in your program due to integer overflow. Others could lead to the function looping indefinitely or executing the loop as if i had been initialized to zero. Although the behavior is not formally undefined, it is not defined well enough to predict what the program will do.
In the for loop's scope, i is an uninitialized automatic storage variable whose initial value is indeterminate. The result of an operation using this variable is undefined (C11 Annex J.2):
— The value of an object with automatic storage
duration is used while it is indeterminate (6.2.4, 6.7.9,
6.8)
6.7.9 p10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
I have encountered a strange behaviour when using compound literals for static struct initialization in GCC in c99/gnu99 modes.
Apparently this is fine:
struct Test
{
int a;
};
static struct Test tt = {1}; /* 1 */
However, this is not:
static struct Test tt = (struct Test) {1}; /* 2 */
This triggers the following error:
initializer element is not constant
This does not help either:
static struct Test tt = (const struct Test) {1}; /* 3 */
I do understand that initializer value for a static struct should be a compile-time constant. But I do not understand why this simplest initializer expression is not considered constant anymore? Is this defined by the standard?
The reason I'm asking is that I have encountered some legacy code written in GCC in gnu90 mode, that used such compound literal construct for static struct initialization (2). Apparently this was a GNU extension at the time, which was later adopted by C99.
And now the code that compiled successfully with GNU90 can be compiled with neither C99 nor even GNU99.
Why would they do this to me?
This is (or was) a gcc bug (hat tip to cremno); the bug report says:
I believe we should just allow initializing objects with static
storage duration with compound literals even in gnu99/gnu11. [...]
(But warn with -pedantic.)
We can see from the gcc document on compound literals that initialization of objects with static storage duration should be supported as an extension:
As a GNU extension, GCC allows initialization of objects with static
storage duration by compound literals (which is not possible in ISO
C99, because the initializer is not a constant).
This is fixed in gcc 5.2. In gcc 5.2 you only get a warning when using the -pedantic flag; without -pedantic it does not complain.
Using -pedantic means that gcc should provide diagnostics as the standard requires:
to obtain all the diagnostics required by the standard, you should
also specify -pedantic (or -pedantic-errors if you want them to be
errors rather than warnings)
A compound literal is not a constant expression as defined in the C99 draft standard section 6.6 Constant expressions, and we see from section 6.7.8 Initialization that:
All the expressions in an initializer for an object that has static storage duration shall be
constant expressions or string literals.
gcc is allowed to accept other forms of constant expressions as an extension, from section 6.6:
An implementation may accept other forms of constant expressions.
It is interesting to note that clang does not complain about this, even with -pedantic.
C language relies on an exact definition of what is constant expression. Just because something looks "known at compile time" does not mean that it satisfies the formal definition of constant expression.
C language does not define constant expressions of non-scalar types. It allows implementations to introduce their own kinds of constant expressions, but the ones defined by the standard are restricted to scalar types only.
In other words, C language does not define the concept of constant expression for your type struct Test. Any value of struct Test is not a constant. Your compound literal (struct Test) {1} is not a constant (and is not a string literal) and, for this reason, it cannot be used as an initializer for objects with static storage duration. Adding a const qualifier to it will not change anything since in C const qualifier has no relation whatsoever to the concept of constant expression. It will never make any difference in such contexts.
Note that your first variant does not involve a compound literal at all. It uses a raw { ... } initializer syntax with constant expressions inside. This is explicitly allowed for objects with static storage duration.
So, in the most restrictive sense, the initialization with a compound literal is illegal, while the initialization with ordinary { ... } initializer is fine. Some compilers might accept compound literal initialization as an extension. (By extending the concept of constant expression or by taking some other extension path. Consult compiler documentation to figure out why it compiles.)
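The following sketch shows initializers that stay within ISO C for a static struct Test. It uses a plain brace list, a C99 designated initializer, and a macro that expands to a brace list rather than to a compound literal. The names TEST_INIT, TEST_DEFAULT and sum_static_tests are hypothetical. The commented-out lines illustrate the point about const made above.

```c
struct Test { int a; };

/* Hypothetical macro: expands to { (value) }, not to a compound literal. */
#define TEST_INIT(value) { (value) }

enum { TEST_DEFAULT = 1 };   /* an enumeration constant is a constant expression */

static struct Test t1 = { 1 };          /* variant 1 from the question */
static struct Test t2 = { .a = 1 };     /* C99 designated initializer */
static struct Test t3 = TEST_INIT(TEST_DEFAULT);

/* static const int one = 1;
   static struct Test t4 = { one };
   one is not a constant expression in ISO C despite const;
   some compilers accept this only as an extension. */

int sum_static_tests(void)
{
    return t1.a + t2.a + t3.a;
}
```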
Interestingly, the clang does not complain with this code, even with -pedantic-errors flag.
This is most certainly about C11 §6.7.9/p4 Initialization (emphasis mine going forward)
All the expressions in an initializer for an object that has static or
thread storage duration shall be constant expressions or string
literals.
Another subclause to look into is §6.5.2.5/p5 Compound literals:
The value of the compound literal is that of an unnamed object
initialized by the initializer list. If the compound literal occurs
outside the body of a function, the object has static storage
duration; otherwise, it has automatic storage duration associated with
the enclosing block.
and (for completeness) §6.5.2.5/p4:
In either case, the result is an lvalue.
but this does not mean that such an unnamed object can be treated as a constant expression. §6.6 Constant expressions says, inter alia:
2) A constant expression can be evaluated during translation rather
than runtime, and accordingly may be used in any place that a constant
may be.
3) Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.
10) An implementation may accept other forms of constant expressions.
There is no explicit mention of compound literals, though, so I would interpret this as meaning that they are invalid as constant expressions in a strictly conforming program (thus I'd say that clang has a bug).
Section J.2 Undefined behavior (informative) also clarifies that:
A constant expression in an initializer is not, or does not evaluate
to, one of the following: an arithmetic constant expression, a null
pointer constant, an address constant, or an address constant for a
complete object type plus or minus an integer constant expression
(6.6).
Again, there is no mention of compound literals.
Nevertheless, there is light at the end of the tunnel. Another way, which is fully conforming, is to convey such an unnamed object as an address constant. The standard states in §6.6/p9 that:
An address constant is a null pointer, a pointer to an lvalue
designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type. The
array-subscript [] and member-access . and -> operators, the address &
and indirection * unary operators, and pointer casts may be used in
the creation of an address constant, but the value of an object shall
not be accessed by use of these operators.
Hence you can safely initialize a pointer with a constant expression of this form, because such a compound literal at file scope designates an lvalue of an object that has static storage duration:
#include <stdio.h>
struct Test
{
int a;
};
static struct Test *tt = &((struct Test) {1}); /* 2 */
int main(void)
{
printf("%d\n", tt->a);
return 0;
}
I checked that it compiles fine with the -std=c99 -pedantic-errors flags on both gcc 5.2.0 and clang 3.6.
Note that, as opposed to C++, in C the const qualifier has no effect on constant expressions.
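The storage-duration rule from §6.5.2.5/p5 quoted above also means that compound literals are unproblematic inside a function. There they have automatic storage duration and can initialize automatic objects at run time in ISO C99. The following is a small sketch with a hypothetical function name, block_scope_literal.

```c
struct Test { int a; };

int block_scope_literal(void)
{
    struct Test local = (struct Test){ 2 };  /* automatic literal copied into local */
    struct Test *p = &(struct Test){ 3 };    /* unnamed object lives until the end of this block */
    return local.a + p->a;
}
```

Returning p itself would be a mistake, because the unnamed object's lifetime ends with the block.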
ISO C99 does support compound literals (according to the GCC documentation quoted below). However, initialization of objects with static storage duration by compound literals is currently provided only as a GNU extension; GCC also supports compound literals themselves as an extension in C90 mode and in C++.
A compound literal looks like a cast containing an initializer. Its value is an object of the type specified in the cast, containing the elements specified in the initializer; it is an lvalue. As an extension, GCC supports compound literals in C90 mode and in C++, though the semantics are somewhat different in C++.
Usually, the specified type is a structure. Assume that struct foo and structure are declared as shown:
struct foo {int a; char b[2];} structure;
Here is an example of constructing a struct foo with a compound literal:
structure = ((struct foo) {x + y, 'a', 0});
This is equivalent to writing the following:
{
struct foo temp = {x + y, 'a', 0};
structure = temp;
}
GCC Extension:
As a GNU extension, GCC allows initialization of objects with static storage duration by compound literals ( which is not possible in ISO C99, because the initializer is not a constant ). It is handled as if the object is initialized only with the bracket enclosed list if the types of the compound literal and the object match. The initializer list of the compound literal must be constant. If the object being initialized has array type of unknown size, the size is determined by compound literal size.
static struct foo x = (struct foo) {1, 'a', 'b'};
static int y[] = (int []) {1, 2, 3};
static int z[] = (int [3]) {1};
Note:
The compiler tags on your post include only GCC; however, you make comparisons to C99 (and multiple GCC versions). It is important to note that GCC is quicker to add extended capabilities to its compilers than the larger C standards groups are. This has sometimes led to buggy behavior and inconsistencies between versions. It is also important to note that extensions to a well-known and popular compiler that do not comply with an accepted C standard lead to potentially non-portable code. It is always worth considering target customers when deciding to use an extension that has not yet been accepted by the larger C working groups and standards organizations (see ISO and ANSI on Wikipedia).
There are several examples where the smaller, more nimble open-source C working groups or committees have responded to interest expressed by their user base by adding extensions. One example is the switch case range extension.
Quoting the C11 standard, chapter §6.5.2.5, Compound literals, paragraph 3, (emphasis mine)
A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list.
So, a compound literal is treated as an unnamed object, which is not considered a compile-time constant.
Just as you cannot use another variable to initialize a static variable, from C99 onward you cannot use this compound literal to initialize a static variable either.
Imagine this:
int X;
X = X;
this would be undefined behavior as
1 The behavior is undefined in the following circumstances:
[...]
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
But what about this?
int X;
X;
Would the invocation of X;, in reference to the quote, allow the compiler to cause undefined behavior? Or does this not count as X being "used"?
In C 1999, it is not directly an error to use an uninitialized object. (Your quotes from Annex J are not a normative part of the standard; they are just informative.) An uninitialized object with automatic storage duration has an indeterminate value. For some objects, that value may be a trap representation, so using it may result in undefined behavior.
However, for some objects, it is possible to determine that an uninitialized object cannot have a trap value. For example, an unsigned char cannot have a trap value, and the exact-width signed integer types defined in stdint.h cannot have trap values (because they are two’s complement with no padding bits). For other types, it may be that properties defined by your C implementation cause them not to have trap values. Using an uninitialized int X does not have defined behavior in all C 1999 implementations (but does in some), but using an uninitialized unsigned char X does.
In C 2011, this text was added in 6.3.2.1 2: “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.” Therefore, in C 2011, both X = X; and X; have undefined behavior.
History/background:
The C 2011 change supports a Hewlett-Packard machine which has a special flag for certain registers that indicates whether the register contents are valid or not. The machine can generate an exception if a register is used while its contents are invalid. Hence, if the compiler assigns the X of unsigned char X to such a register, using the register when it is invalid may cause an exception in the machine even though there is no unsigned char trap value.
X = X;
The above is undefined behavior because X is not initialized. The compiler should at least generate a warning about this.
The standard states 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However:
X;
The above is similar to:
1212342413;
as X is simply evaluated as an expression whose value is discarded.
I was answering a question and made this test program.
#include <stdio.h>
int main()
{
volatile const int v = 5;
int * a = &v;
*a =4;
printf("%d\n", v);
return 0;
}
Without the volatile keyword, the compiler (apple clang 4.2 at -O3) optimizes the change to the variable away; with it, the code works as expected and the const variable is modified.
I was wondering if a more experienced C developer knows if there is a part of the standard that says this is unsafe or UB.
UPDATE: @EricPostpischil gave me this standards quote:
A program may not modify its own object defined with a const-qualified type, per C 2011 (N1570) 6.7.3 6: “If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.” An external agent may modify an object that has volatile-qualified type, per 6.7.3 7: “An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects.”
My program breaks the first rule but I thought that the second rule may exempt a program from the first.
UPDATE 2:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.
If you look at this quote, you can see that the variable must be evaluated according to certain rules. I haven't read through all of section 5.1.2.3, but I believe it may shed some light on the issue.
It is unsafe because the same behavior cannot be guaranteed with other compilers. Your code is compiler-dependent and may even depend on compiler switches. That's why it's a bad idea.
This line:
int * a = &v;
is a constraint violation. The compiler must produce a diagnostic message, and may reject the program. If the compiler produces an executable anyway, then that executable has completely undefined behaviour (i.e. the C Standard no longer covers the program at all).
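The constraint violated is that neither volatile nor const may be implicitly converted away: initialization follows the constraints for simple assignment (C11 6.7.9p11), and 6.5.16.1p1 requires the type pointed to by the left operand to have all the qualifiers of the type pointed to by the right operand.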
To comply with the C standard, the pointer must have its pointed-to type having the same or stronger qualifiers as the object being pointed to, e.g.:
int const volatile *a = &v;
after which you will find that the line *a = 4; causes a compilation error.
A possible attempt might be:
int *a = (int *)&v;
This line must compile, but reading or writing via *a then causes undefined behaviour. The undefined behaviour is specified by C11 6.7.3/6 (C99 and C89 had similar text):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.
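Both sentences of 6.7.3/6 are about how the object was defined, not about the pointer used to reach it. The sketch below uses hypothetical names (shared, reader, update_and_read, cast_away_const_ok) to show the two well-defined counterparts. The first defines the volatile object without const and gives it a read-only const volatile view. The second casts const away from a pointer whose underlying object was never defined const, which is allowed.

```c
/* The object is defined without const, so modifying it is allowed. */
static volatile int shared = 5;

/* Read-only view: adds const and keeps volatile, an allowed implicit conversion. */
static const volatile int *const reader = &shared;

int update_and_read(int value)
{
    shared = value;   /* write through a volatile, non-const lvalue */
    return *reader;   /* read through a const volatile lvalue */
}

/* Casting away const is undefined only if the object was defined const. */
int cast_away_const_ok(void)
{
    int plain = 5;
    const int *view = &plain;
    *(int *)view = 4;   /* well defined: plain itself is modifiable */
    return plain;
}
```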