Does the C99 standard permit assignment of a variable to itself? - c

Does the C99 standard allow variables to be assigned to themselves? For instance, are the following valid:
int a = 42;
/* Case 1 */
a = a;
/* Case 2 */
int *b = &a;
a = *b;
While I suspect Case 1 is valid, I'm hesitant to say the same for Case 2.
In the case of an assignment, is the right side completely evaluated before assigning the value to the variable on the left -- or is a race condition introduced when dereferencing a pointer to the variable being assigned?

Both cases are perfectly valid, since the value of a is only used to determine the value that is to be stored, not to determine the object in which this value is to be store.
In essence in an assignment you have to distinguish three different operations
determine the object to which the value is to be stored
evaluate the RHS
store the determined value in the determined object
the first two of these three operations can be done in any order, even in parallel. The third is obviously a consequence of the two others, so it will come after.

This is perfectly valid, you are only using the previous value to determine the value to be stored. This is covered in the draft C99 standard section 6.5.2 which says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.Furthermore, the prior value shall be read only to
determine the value to be stored.
One of the examples of valid code is as follows:
i = i + 1;
The C and C++ section here covers the different places where a sequence point can occur.

C99 6.5.16.1 Simple assignment
3 If the value being stored in an object is read from another object that overlaps in any way
the storage of the first object, then the overlap shall be exact and the two objects shall
have qualified or unqualified versions of a compatible type; otherwise, the behavior is
undefined.
I think the example code qualifies the "overlap" condition. Since they do have qualified version of a compatible type, the result is valid.
Also 6.5.16 Assignment operators
4 The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.
Still, there's no "attempt to modify the result" so the result is valid.

Assuming the compiler doesn't optimize the first instruction out by simply removing it, there is even a race condition here. On most architecture, if a is stored in memory a = a will be compiled in two move instructions (mem => reg, reg => mem) and therefore is not atomic.
Here is an example:
int a = 1;
int main()
{ a = a; }
Result on an Intel x86_64 with gcc 4.7.1
4004f0: 8b 05 22 0b 20 00 mov 0x200b22(%rip),%eax # 601018 <a>
4004f6: 89 05 1c 0b 20 00 mov %eax,0x200b1c(%rip) # 601018 <a>

I can't see a C compiler not permitting a = a. Such an assignment may occur serendipitously due to macros without a programmer knowing it. It may not even generate any code for that is an optimizing issue.
#define FOO (a)
...
a = FOO;
Sample code readily compiles and my review of the C standard shows no prohibition.
As to race conditions #Yu Hao answers that well: no race condition.

Related

Does every expression in C make a copy of the variables?

I'm trying to understand R-values and L-values when it comes to expressions in C and I know that many expressions are not valid L-values because the address of the end result is not known. This is because in many instances, a copy of the variables is used in the expression and so the address of the copy is not known. For example,
char ch = 'a';
char *p = &ch;
p + 1; //This expression can be used as an R-value, but not an L-value
I believe what is happening (please correct me if I'm wrong) in the expression is that a copy of p is created, 1 is added to that copy so that p+1 points at the char after ch, but the address of this new p+1 pointer is not known, so it can't be used as an L-value.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C? For example, if I have
int a = 100;
int b = 25;
a - b;
Is a copy of variable a created (and stored at an unknown location) and a copy of variable b created (the copy of b is also stored at an unknown location), the data in the copies is used to subtract and then the result is stored in another unknown location or is the data taken from the original variables, subtracted and then the result is stored at an unknown location?
Does every expression in C make a copy of the variables?
It doesn't really matter.
The compiler has optimization. Compiler is able to do anything with the code, as long as side effects are the same. In your examples p + 1; and a - b; are doing nothing, they can be optimized out by the compiler, so nothing happens.
What happens on a actual machine with the code generated by the compiler is highly machine specific. Compiler can first load the values of variables to some registers, then perform computation on these registers - or may not, may perform computation on values as they are stored, or may not even reserve any memory for the variables.
I know that many expressions are not valid L-values because the address of the end
result is not known
That could be the underlying reason you could make to justify that, but the rules are just more direct. C standard just lists operators and says that the result of + is a value or the result of * is a lvalue. There is no "generic" rule.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C?
C standard doesn't talk about "variables" or "copies of variables", C standard talks about "values". An expression has a defined value, defined end result. C standard doesn't concern really how the compiler will arrive at that value. So the actual behavior can be anything. If a particular compiler decides to make a copy of the variables or decides not to - good for him, as long as the end results are correct.
lvalue is something that can be on the left side of = operator. Not necessarily it is a variable, for example ((char*)0x01)[1] is a lvalue although it's a result of some operators. rvalue (or just value) is a value that you can't assign to.
The implementation of such calculations are not standard so they can be different in different compilers. The important thing to understand is that R-values represent temporary values. It can be a register or some allocated memory.
Calculations are made in the CPU on registers. So, the compiler will move the value of the variables in to registers and than will calculate the subtraction.
this is an assembly of such a calculation:
659: c7 45 f8 64 00 00 00 movl $0x64,-0x8(%rbp)
660: c7 45 fc 19 00 00 00 movl $0x19,-0x4(%rbp)
667: 8b 45 f8 mov -0x8(%rbp),%eax
66a: 2b 45 fc sub -0x4(%rbp),%eax
You can see that the values 100(64 in hex) and 25(19 in hex) are saved on the stack at addresses relative to base pointer(-0x8, -0x4 respectively).
than the value from -0x8 is moved into the eax register and the value from -0x4 is subtracted from the value in the register and stored in the register itself.
As you can see, the result will end up in a register, so it doesn't have an address at the memory.
According to the C11 standard(ISO-IEC-9899-2011)--6.3.2.1 Lvalues, arrays and function designators:
Except when it is the operand of the sizeof operator, the unary & operator, the ++
operator, the -- operator, or the left operand of the . operator or an assignment operator,
an lvalue that does not have array type is converted to the value stored in the designated
object (and is no longer an lvalue); this is called lvalue conversion.

Is an optimized out variable allowed to hold a value out of its range? [duplicate]

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.50) Such a representation is called
a trap representation.
[3] C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
volatile uint32_t a,b;
uin16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}
a compiler for a platform like the ARM where all instructions other than
loads and stores operate on 32-bit registers might reasonably process the
code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uin32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile reads yield a non-zero value, r0 will get loaded with a value in the range 0...65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

Operands in "int i = 0"

I would like to ask if this short code:
int i = 0;
has 1 operand or 2? The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't (or maybe I misunderstand). If 0 isn't operand, is it a constant or something?
If it is important, the code is in C99.
In int i = 0;, = is not an operator. It's simply a part of the variable initializaton syntax. On the other hand, in int i; i = 0; it would be an operator.
Since = here is not an operator, there are no operands. Instead, 0 is the initializer.
Since you've tagged the question as "C", one way to look at it is by reading the C standard. As pointed out by this answer, initialization (such as int i = 0) is not an expression in itself, and by a strict reading of the standard, the = here is not an operator according to the usage of those terms in the standard.
It is not as clear whether i and 0 are operands, however. On one hand, the C standard does not seem to refer to the parts of the initialization as operands. On the other hand, it doesn't define the term "operand" exhaustively. For example, one could interpret section 6.3 as calling almost any non-operator part of an expression an "operand", and therefore at least the 0 would qualify as one.
(Also note that if the code was int i; i = 0; instead, the latter i and the 0 would definitely both be operands of the assignment operator =. It remains unclear whether the intent of the question was to make a distinction between assignment and initialization.)
Apart from the C standard, you also refer to Wikipedia, in particular:
In computing, an operand is the part of a computer instruction which specifies what data is to be manipulated or operated on, while at the same time representing the data itself.
If we consider the context of a "computer instruction", the C code might naively be translate to assembly code like mov [ebp-4], 0 where the two operands would clearly be the [ebp-4] (a location where the variable called i is stored) and the 0, which would make both i and 0 operands by this definition. Yet, in reality the code is likely to be optimized by the compiler into a different form, such as only storing i in a register, in which case zeroing it might become xor eax, eax where the 0 no longer exists as an explicit operand (but is the result of the operation). Or, the whole 0 might be optimized away and replaced by some different value that inevitably gets assigned. Or, the whole variable might end up being removed, e.g., if it is used as a loop counter and the loop is unrolled.
So, in the end, either it is something of a philosophical question ("does the zero exist as an operand if it gets optimized away"), or just a matter of deciding on the desired usage of the terms (perhaps depending on the context of discussion).
The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't
(or maybe I misunderstand).
The question links to the Wikipedia page describing the term "operand" in a mathematical context. This likely factors in to your confusion about whether the 0 on the right-hand side of the = is an operand, because in a mathematical statement of equality such as appears in the article, it is in no way conventional to consider = an operator. It does not express a computation, so the expressions it relates are not operated upon. I.e. they do not function as operands.
You have placed the question in C context, however, and in C, = can be used as an assignment operator. This is a bona fide operator in that it expresses a (simple) operation that produces a result from two operands, and also has a side effect on the left-hand operand. In an assignment statement such as
i = 0;
, then, both i and 0 are operands.
With respect to ...
If 0 isn't operand, is it a constant or something?
... 0 standing on its own is a constant in C, but that has little to do with whether it serves as an operand in any given context. Being an operand is a way an expression, including simply 0, can be used. More on that in a moment.
Now it's unclear whether it was the intent, but the code actually presented in the question is very importantly different from the assignment statement above. As the other answers also observe,
int i = 0;
is a declaration, not a statement. In that context, the i is the identifier being declared, the 0 is used as its initializer, and just as the = in a mathematical equation is not an operator, the = introducing an initializer in a C declaration is not an operator either. No operation is performed here, in that no value computation is associated with this =, as opposed to when the same symbol is used as an assignment operator. There being no operator and no operation being performed, there also are no operands in this particular line of code.

Parentheses operator clarification

int a=10,b=20;
b = a+b-(a=b);
In this expression why (a=b) is not first operation? if it is performs according to priority then it b has to get 20 itself. But b is getting 10 itself, why? Could any one clarify my doubt?
This invokes undefined behavior. Anything could be happen. Note that it is sure here that (a=b) evaluates before the subtraction but it does not guarantee that the value of b get assigned to a just after the evaluation. a may get modified after the next sequence point (the ; of the statement here).
The Standard states that
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
Suggested reading: c-faq Question 3.8

(Why) is using an uninitialized variable undefined behavior?

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.50) Such a representation is called
a trap representation.
[3] C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
volatile uint32_t a,b;
uin16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}
a compiler for a platform like the ARM where all instructions other than
loads and stores operate on 32-bit registers might reasonably process the
code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uin32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile reads yield a non-zero value, r0 will get loaded with a value in the range 0...65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

Resources