Can two implementation defined identical expressions give different results? - c

Related to: Three questions: Is NULL - NULL defined? Is (uintptr_t)NULL - (uintptr_t)NULL defined?
Lets consider:
Case 1:
(uintptr_t)NULL - (uintptr_t)NULL will the result always be zero?
Case 2 (ispired by the Eric comment):
uintptr_t x = (uintptr_t)NULL;
will x - x be always zero?
case 3:
uintptr_t x = (uintptr_t)NULL, y = (uintptr_t)NULL;
Will x-y be always zero?
Case 4:
void *a;
/* .... */
uintptr_t x = (uintptr_t)a, y = (uintptr_t)a;
Will x-y be always zero?
If not - why?

Can two implementation defined identical expressions give different results?
Yes. It's "implementation-defined" - all rules are up to implementation. An imaginary implementation may look like this:
int main() {
void *a = 0;
#pragma MYCOMPILER SHIFT_UINTPTR 0
printf("%d\n", (int)(uintptr_t)a); // prints 0
#pragma MYCOMPILER SHIFT_UINTPTR 5
printf("%d\n", (int)(uintptr_t)a); // prints 5
}
Still such an implementation would be insane on most platforms.
I could imagine a example: architecture that has to deal with memory in "banks". A compiler for that architecture uses a #pragma switch to select the "bank" that is used for dereferencing pointers.
(uintptr_t)NULL - (uintptr_t)NULL will the result always be zero?
Not necessarily.
will x - x be always zero?
Yes. uintptr_t is an unsigned integer type, it has to obey the laws of mathematics.
Will x-y be always zero?
Not necessarily.
Will x-y be always zero?
Not necessarily.
If not - why?
The result of conversion from void* to uintptr_t is implementation defined - the implementation may convert the pointer value to different uintptr_t value each time, which would result in a non-zero difference between the values.
I could see a example: on some imaginary architecture pointers have 48-bits, while uintptr_t has 64-bits. A compiler for such architecture just "doesn't care" what is in those 16 extra bits and when converting uintptr_t to a pointer it uses only the 48-bits. When converting pointer to an uintrpt_t compiler uses whatever garbage value was leftover in registers for the extra 16-bits, because it's fast to do that in that specific architecture and because they will never be used when converting back..

You won't find a system in use anywhere where NULL isn't defined as some form of 0, be it the literal value or a void * with value 0, so all of your checks will either work as you'd expect or be syntax errors (can't subtract void * values).
Is it possible to be defined as anything else though? Theoretically. Though it's still a constant, so subtracting them (language and type allowing) will still equal 0.

Related

Is an optimized out variable allowed to hold a value out of its range? [duplicate]

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.50) Such a representation is called
a trap representation.
[3] C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
volatile uint32_t a,b;
uin16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}
a compiler for a platform like the ARM where all instructions other than
loads and stores operate on 32-bit registers might reasonably process the
code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uin32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile reads yield a non-zero value, r0 will get loaded with a value in the range 0...65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

unexpected byte order after casting pointer-to-char into pointer-to-int

unsigned char tab[4] = 14;
If I print as individual bytes...
printf("tab[1] : %u\n", tab[0]); // output: 0
printf("tab[2] : %u\n", tab[1]); // output: 0
printf("tab[3] : %u\n", tab[2]); // output: 0
printf("tab[4] : %u\n", tab[3]); // output: 14
If I print as an integer...
unsigned int *fourbyte;
fourbyte = *((unsigned int *)tab);
printf("fourbyte : %u\n", fourbyte); // output: 234881024
My output in binary is : 00001110 00000000 00000000 00000000, which is the data I wanted but in this order tab[3] tab[2] tab[1] tab[0].
Any explanation of that, why the unsigned int pointer points to the last byte instead of the first ?
The correct answer here is that you should not have expected any relationship, order or otherwise. Except for unions, the C standard does not define a linear address space in which objects of different types can overlap. It is the case on many architecture/compiler-tool-chain combinations that these coincidences can occur from time to time, but you should never rely on them. The fact that by casting a pointer to a suitable scalar type yields a number comparable to others of the same type, in no-way implies that number is any particular memory address.
So:
int* p;
int z = 3;
int* pz = &z;
size_t cookie = (size_t)pz;
p = (int*)cookie;
printf("%d", *p); // Prints 3.
Works because the standard says it must work when cookie is derived from the same type of pointer that it is being converted to. Converting to any other type is undefined behavior. Pointers do not represent memory, they reference 'storage' in the abstract. They are merely references to objects or NULL, and the standard defines how pointers to the same object must behave and how they can be converted to scalar values and back again.
Given:
char array[5] = "five";
The standard says that &(array[0]) < &(array[1]) and that (&(array[0])) + 1) == &(array[1]), but it is mute on how elements in array are ordered in memory. The compiler writers are free to use whatever machine codes and memory layouts that they deem are appropriate for the target architecture.
In the case of unions, which provides for some overlap of objects in storage, the standard only says that each of its fields must be suitably aligned for their types, but just about everything else about them is implementation defined. The key clause is 6.2.6.1 p7:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
The gist of all of this is that the C standard defines an abstract machine. The compiler generates an architecture specific simulation of that machine based on your code. You cannot understand the C abstract machine through simple empirical means because implementation details bleed into your data set. You must limit your observations to those that are relevant to the abstraction. Therefore, avoid undefined behavior and be very aware of implementation defined behaviors.
Your example code is running on a computer that is Little-Endian. This term means that the "first byte" of an integer contains the least significant bits. By contrast, a Big-Endian computer stores the most significant bits in the first byte.
Edited to add: the way that you've demonstrated this is decidedly unsafe, as it relies upon undefined behavior to get "direct access" to the memory. There is a safer demonstration here

Using bit operations to "turn off" binary digits of a pointer

I was able to use bit operations to "turn off" binary digits of a number.
Ex:
x = x & ~(1<<0)
x = x & ~(1<<1)
(and repeat until desired number of digits starting from the right are changed to 0)
I would like to apply this technique to a pointer's address.
Unfortunately, the & operator cannot be used with pointers. Using the same lines of code as above, where x is a pointer, the compiler says "invalid operands to binary & (have int and int)."
I tried to typecast the pointers as ints, but that doesn't work as I assume the ints are too small (and I just realized I'm not allowed to cast).
(note: though this is part of a homework problem, I've already reasoned out why I need to turn off some digits after a good couple hours, so I'm fine in that regard. I'm simply trying to see if I can get a clever technique to do what I want to do here).
Restrictions: I cannot use loops, conditionals, any special functions, constants greater than 255, division, mod.
(edit: added restrictions to the bottom)
Use uintptr_t from <stdint.h>. You should always use unsigned types for bit twiddling, and (u)intptr_t is specifically chosen to be able to hold a pointer's value.
Note however that adjusting a pointer manually and dereferencing it is undefined behaviour, so watch your step. You shall be able to recover the exact original value of the pointer (or another valid pointer) before doing so.
Edit : from your comment I understand that you don't plan on dereferencing the twiddled pointer at all, so no undefined behaviour for you. Here is how you can check if your pointers share the same 64-byte block :
uintptr_t p1 = (uintptr_t)yourPointer1;
uintptr_t p2 = (uintptr_t)yourPointer2;
uintptr_t mask = ~(uintptr_t)63u; // Shave off 5 low-order bits
return (p1 & mask) == (p2 & mask);
C language standard library includes the (optional though) type intptr_t, for which there is guarantee that "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Of course if you perform bitwise operation on the integer than the result is undefined behaviour.
Edit:
How unfortunate haha. I need a function to show two pointers are in
the same 64-byte block of memory. This holds true so long as every
digit but the least significant 6 digits of their binary
representations are equal. By making sure the last 6 digits are all
the same (ex: 0), I can return true if both pointers are equal. Well,
at least I hope so.
You should be able to check if they're in the same 64 block of memory by something like this:
if ((char *)high_pointer - (char *)low_pointer < 64) {
// do stuff
}
Edit2: This is likely to be undefined behaviour as pointed out by chris.
Original post:
You're probably looking for intptr_t or uintptr_t. The standard says you can cast to and from these types to pointers and have the value equal to the original.
However, despite it being a standard type, it is optional so some library implementations may choose not to implement it. Some architectures might not even represent pointers as integers so such a type wouldn't make sense.
It is still better than casting to and from an int or a long since it is guaranteed to work on implementations that supply it. Otherwise, at least you'll know at compile time that your program will break on a certain implementation/architecture.
(Oh, and as other answers have stated, manually changing the pointer when casted to an integer type and dereferencing it is undefined behaviour)

Short int values above 32767

When trying to store short integer values above 32,767 in C, just to see what happens, I notice the result that gets printed to the screen is the number I am trying to store, - 65,536. E.g. if i try to store 65,536 in a short variable, the number which is printed to the screen is 0. If I try to store 32,768 I get -32,768 printed to the screen. And if I try to store 65,546 and print it to the screen I get 10. I think you get the picture. Why am I seeing these results?
Integer values are stored using Twos Complement. In twos complement, the range of possible values is -2^n to 2^n-1, where n is the number of bits used for storage. Because of the way the storage is done, when you go above 2^n-1, you end up wrapping around back to 2^n.
Shorts use 16 bits (15 bits for numeric storage, with the final bit being the sign).
EDIT: Keep in mind that this behavior is NOT guaranteed to occur. Programming languages may act completely differently. Technically, going above or below the max/min value is undefined behavior and it should be treated as such. (Thanks to Eric Postpischil for keeping me on my toes)
When a value is converted to a signed integer type but cannot be represented in that type, overflow occurs, and the behavior is undefined. It is common to see results as if a two’s complement encoding is used and as if the low bits are stored (or, equivalently, the value is wrapped modulo an appropriate power of two). However, you cannot rely on this behavior. The C standard says that, when signed integer overflow occurs, the behavior is undefined. So a compiler may act in surprising ways.
Consider this code, compiled for a target where short int is 16 bits:
void foo(int a, int b)
{
if (a)
{
short int x = b;
printf("x is %hd.\n", x);
}
else
{
printf("x is not assigned.\n");
}
}
void bar(int a, int b)
{
if (b == 65536)
foo(a, b);
}
Observe that foo is a perfectly fine function on its own, provided b is within range of a short int. And bar is a perfectly fine function, as long as it is called only with a equal to zero or b not equal to 65536.
While the compiler is in-lining foo in bar, it may deduce from the fact that b must be 65536 at this point, that there would be an overflow in short int x = b;. This implies either that this path is never taken (i.e., a must be zero) or that any behavior is permitted (because the behavior upon overflow is undefined by the C standard). In either case, the compiler is free to omit this code path. Thus, for optimization, the compiler could omit this path and generate code only for printf("x is not assigned.\n");. If you then executed code containing bar(1, 65536), the output would be “x is not assigned.”!
Compilers do make optimizations of this sort: The observation that one code path has undefined behavior implies the compiler may conclude that code path is never used.
To an observer, it looks like the effect of assigning a too-large value to a short int is to cause completely different code to be executed.

(Why) is using an uninitialized variable undefined behavior?

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.50) Such a representation is called
a trap representation.
[3] C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
volatile uint32_t a,b;
uin16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}
a compiler for a platform like the ARM where all instructions other than
loads and stores operate on 32-bit registers might reasonably process the
code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uin32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile reads yield a non-zero value, r0 will get loaded with a value in the range 0...65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

Resources