Signed integers' undefined behavior and Apple Secure Coding Guide - c

Apple Secure Coding Guide says the following (page 27):
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped.
However, with regard to signed integer overflow, the C standard (C89) says:
An example of undefined behavior is the behavior on integer overflow.
and
If an exception occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined.
Is the Coding Guide wrong? Is there something here that I don't get? I find it hard to believe that the Apple Secure Coding Guide could get this wrong.

Here is a second opinion, from a static analyzer described as detecting undefined behavior:
int x;
int main(){
x = 0x7fffffff + 1;
}
The analyzer is run so:
$ frama-c -val -machdep x86_32 t.c
And it produces:
[kernel] preprocessing with "gcc -C -E -I. t.c"
[value] Analyzing a complete application starting at main
...
t.c:4:[kernel] warning: signed overflow. assert 0x7fffffff+1 ≤ 2147483647;
...
[value] Values at end of function main:
NON TERMINATING FUNCTION
This means that the program t.c contains undefined behavior, and that no execution of it ever terminates without causing undefined behavior.

Let's take this example:
1 << 32
If we assume 32-bit int, C clearly says it is undefined behavior. Period.
But any implementation can define this undefined behavior.
gcc for example says (while not very explicit in defining the behavior):
GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change.
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
I don't know about clang, but I suspect that, as with gcc, the evaluation of an expression like 1 << 32 would give no surprise (that is, evaluate to 0).
But even if it is defined on implementations running in Apple operating systems, a portable program should not make use of expressions that invoke undefined behavior in the C language.
EDIT: I thought the Apple sentence was dealing only with the bitwise << operator. It looks like it's more general, and in that case, for the C language, they are utterly wrong.

The two statements are not mutually incompatible.
The standard does not define what behaviour each implementation is required to provide (so different implementations can do different things and still be standard conformant).
Apple is allowed to define the behaviour of its implementation.
You as a programmer would be well advised to treat the behaviour as undefined since your code may need to be moved to other platforms where the behaviour is different, and perhaps because Apple could, in theory, change its mind in the future and still conform to the standard.
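A portable program can avoid depending on any one platform's overflow behaviour by testing the operands before adding. The following is only a minimal sketch, not something taken from the Apple guide or the standard; the helper name checked_add_int is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
   is representable as int; returns false, leaving *out untouched, when the
   addition would overflow. The range test happens before the addition, so
   no overflowing expression is ever evaluated. */
bool checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller would test the return value and handle the failure case explicitly instead of relying on wrap-around.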

Consider the code
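#include <stdint.h>

int32_t test(int mode)
{
    int32_t a = 0x12345678;
    /* mode != 0: a*0x10000 is evaluated in int and overflows (undefined).
       mode == 0: a*0x10000LL is evaluated in long long, then narrowed. */
    int32_t b = mode ? a*0x10000 : a*0x10000LL;
    return b;
}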
If this method is invoked with a mode value of zero, the code will compute the long long value 0x0000123456780000 and store it into b. The behavior of this is fully defined by the C standard: if bit 31 of the result is clear, it will lop off all but the bottom 32 bits and store the resulting (positive) integer into b. If bit 31 were set and the result were being stored to a 32-bit int rather than a variable of type int32_t, the implementation would have some latitude, but implementations are only allowed to define int32_t if they would perform such narrowing conversions according to the rules of two's-complement math.
If this method were invoked with a non-zero mode value, then the numerical computation would yield a result outside the range of the temporary expression value, and as such would cause Undefined Behavior. While the rules dictate what should happen if a calculation performed on a longer type is stored into a shorter one, they do not indicate what should happen if calculations don't fit in the type with which they are performed. A rather nasty gap in the standard (which should IMHO be plugged) occurs with:
uint16_t multiply(uint16_t x, uint16_t y)
{
return x*y;
}
For all combinations of x and y values where the Standard says anything about what this function should do, the Standard requires that it compute and return the product mod 65536. If the Standard were to mandate that for all combinations of x and y values 0-65535 this method must return the arithmetical value of (x*y) mod 65536, it would be mandating behavior with which 99.99% of standards-compliant compilers would already be in conformance. Unfortunately, on machines where int is 32 bits, the Standard presently imposes no requirements with regard to this function's behavior in cases where the arithmetical product would be larger than 2147483647. Even though any portion of the intermediate result beyond the bottom 16 bits will be ignored, the code will try to evaluate the result using a 32-bit signed integer type; the Standard imposes no requirements on what should happen if a compiler recognizes that the product will overflow that type.
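One way to sidestep the gap, sketched here under the assumption that the intended result is the product mod 65536, is to force the multiplication into an unsigned type so that no signed intermediate is involved. The function name multiply_u16 is made up for this illustration.

```c
#include <stdint.h>

/* Converting one operand to uint32_t before multiplying keeps the
   intermediate product out of signed int on machines where int is
   32 bits: the operands are at most 65535, so the product fits in
   32 bits, and unsigned arithmetic wraps rather than overflowing. */
uint16_t multiply_u16(uint16_t x, uint16_t y)
{
    uint32_t product = (uint32_t)x * y;
    return (uint16_t)product;   /* keep the bottom 16 bits */
}
```

With this form, the largest case, 65535 * 65535, is computed as 4294836225 and reduced to 1.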

Related

Is an optimized out variable allowed to hold a value out of its range? [duplicate]

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes, this behavior is undefined, but for different reasons than most people are aware of.
First, using an uninitialized value is by itself not undefined behavior; the value is simply indeterminate. Accessing it is then UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just indeterminate. Since unsigned char never has trap representations, that indeterminate value is just unspecified: any value of unsigned char could happen. At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1)-1) and 2^(n-1)-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
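The register trace can be replayed as ordinary, fully initialized C to check the arithmetic. This sketch only simulates the hypothetical allocation described above (r0 shared by z and x, with the final step computing y + 1); the struct and function names are invented for the illustration. Because the variables are unsigned, the negative results appear modulo UINT_MAX + 1.

```c
/* Final values of x and y under the hypothetical register allocation. */
struct final_values {
    unsigned x;
    unsigned y;
};

struct final_values simulate_shared_register(void)
{
    unsigned r0, r1;
    struct final_values result;

    r1 = 0;          /* y = 0                                */
    r0 = 4;          /* z = 4                                */
    r0 = -r0;        /* x = -x, but x shares r0 with z       */
    r1 += r0;        /* y = y + z, reading the clobbered r0  */
    r0 = r1 + 1;     /* x = y + 1                            */

    result.x = r0;   /* (unsigned)-3 */
    result.y = r1;   /* (unsigned)-4 */
    return result;
}
```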
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is read is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N - 1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
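Whether unsigned int has padding bits on a given platform can be checked by comparing the number of value bits, derived from UINT_MAX, against the size of the object representation. This is a rough sketch with hypothetical function names, not a technique quoted from the standard.

```c
#include <limits.h>
#include <stddef.h>

/* Number of value bits in unsigned int, counted from UINT_MAX. */
size_t unsigned_value_bits(void)
{
    size_t bits = 0;
    unsigned max = UINT_MAX;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Non-zero when the object representation of unsigned int is wider than
   its value bits, i.e. when it contains padding bits. */
int unsigned_has_padding_bits(void)
{
    return unsigned_value_bits() < sizeof(unsigned) * CHAR_BIT;
}
```

If unsigned_has_padding_bits() returns 0, the trap-representation argument does not apply to unsigned int on that platform.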
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined. Such a representation is called
a trap representation.
[3]: C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
#include <stdint.h>

volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
    uint16_t temp;  /* left uninitialized if both a and b read as zero */
    if (a)
        temp = y;
    else if (b)
        temp = z;
    return temp;
}
A compiler for a platform like the ARM, where all instructions other than loads and stores operate on 32-bit registers, might reasonably process the code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile read yields a non-zero value, r0 will get loaded with a value in the range 0..65535. Otherwise it will hold whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of a value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.
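The problem disappears once temp is assigned on every path. The sketch below is one possible variant, not the original author's code: the names flag_a, flag_b and moo_defined are invented, and returning 0 when both flags are clear is an arbitrary choice.

```c
#include <stdint.h>

volatile uint32_t flag_a, flag_b;   /* stand-ins for a and b above */

/* Same shape as moo(), but temp is assigned on every path, so the
   function never returns an uninitialized value. Returning 0 when both
   flags are clear is an arbitrary choice made for this sketch. */
uint16_t moo_defined(uint32_t x, uint16_t y, uint32_t z)
{
    uint16_t temp = 0;
    (void)x;                 /* unused, kept to mirror the original */
    if (flag_a)
        temp = y;
    else if (flag_b)
        temp = (uint16_t)z;  /* explicit truncation to 16 bits */
    return temp;
}
```

With the initializer in place, a compiler that maps temp onto the register holding x must still load 0 into it before the tests, so the result always lies in 0..65535.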

using bit shifting variables inside if statement - error or not

Suppose we have some variables x and y, and the following if statement which involves bit shifting:
if (x<<y)
I've read some posts which also deal with using bit shifting with variables (of some type) inside an if statement, but unfortunately I haven't been able to reach an unequivocal conclusion about whether it is an error or not.
I assume that if it is an error, then it's a semantic error or a run-time error.
But is it necessarily an error?
If x is of an unsigned integer type that is at least as large as unsigned int, and y is less than the number of bits in x's type, then the above partial statement will test whether bits in x that aren't in the top y are set. The C89 Standard would require that implementations behave likewise if x is of a signed type or a small unsigned type, with the caveat that setting the top bit of a small signed type is regarded as setting all bits beyond. The C99 and later standards, however, wouldn't require that implementations usefully process any situation in which x is non-zero but the expression x<<y would yield zero, unless x is an unsigned integer type at least as large as unsigned int.
It's not a syntactic error. if expects a parenthesized expression. (int_x<<int_y) satisfies that. The shift expression may cause a runtime error, but only if the particular values of int_x and int_y invoke undefined behavior (see 6.5.7 for when that might happen).
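When x is unsigned, the same test can be written without any shift whose count might be out of range. The following is a sketch under that assumption; the function names are hypothetical, and treating an oversized count as "no bits left to test" is a choice made here, not something the language prescribes.

```c
#include <limits.h>
#include <stdbool.h>

/* Number of value bits in unsigned int. */
static unsigned uint_width(void)
{
    unsigned width = 0;
    for (unsigned max = UINT_MAX; max != 0; max >>= 1)
        width++;
    return width;
}

/* Hypothetical replacement for `if (x << y)` with unsigned x: true when
   any bit of x outside its top y bits is set. A count of uint_width() or
   more leaves no bits to test, so the function returns false instead of
   performing an undefined shift. */
bool any_bit_below_top(unsigned x, unsigned y)
{
    if (y >= uint_width())
        return false;
    return (x & (UINT_MAX >> y)) != 0;
}
```

For counts below the width, this gives the same answer as (x << y) != 0, because the bits discarded by the left shift are exactly the ones masked off by UINT_MAX >> y.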

Why shifting a negative value with literal is giving [-Wshift-negative-value] warning

I am doing a bitwise left shift operation on negative number.
#include <stdio.h>

int main(void) {
    int count = 2;
    printf("%d\n", ~0<<count);
    printf("%d\n", ~0<<2); // warning:shifting a negative signed value is undefined [-Wshift-negative-value]
    return 0;
}
My question is why the warning appears when an integer literal is used as the shift count, but not when a variable is used.
Under C89, ones'-complement and sign-magnitude implementations were required to process left shifts of negative values in ways that may not have been the most logical on those platforms. For example, on a ones'-complement platform, C89 defined -1<<1 as -3. The authors of the Standard decided to correct this problem by allowing compiler writers to handle left shifts of negative numbers in any way they saw fit. The fact that they allowed that flexibility to all implementations, including two's-complement ones, shouldn't be taken to imply that they intended two's-complement implementations to deviate from the C89 behavior. Much more likely, they intended and expected that the sensible behavior on two's-complement platforms would be sufficiently obvious that compiler writers would figure it out with or without a mandate.
Compilers often squawk about left-shifting negative constants by other constants because x<<y can be simplified when both x and y are constants, but such simplification would require performing the shift at compile time whether or not the code containing the shift is executed. By contrast, given someConstant << nonConstant, no simplification would usually be possible and thus the compiler would simply generate code that does the shift at run-time.
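If the goal of ~0 << count is a mask with the low bits cleared, one well-defined alternative is to build it from an unsigned constant, since left shifts of unsigned values are defined for any count below the width. A minimal sketch, with a made-up function name and the assumption that the caller keeps count below the width of unsigned int:

```c
/* All-ones mask with the low `count` bits cleared. The operand is ~0u
   (unsigned), so the shift never involves a negative value.
   Assumption: count is less than the width of unsigned int. */
unsigned high_bits_mask(unsigned count)
{
    return ~0u << count;
}
```

The result is unsigned, so printing it calls for %u or %x rather than %d.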

what values are allowed for the shift count operation?

#include <stdio.h>
int main(void)
{
unsigned int var=1;
var = var<<32;
printf("%u ",var);
}
This code yields 1 as its output. If I write var = var<<31; it yields 2147483648.
If I set var = 12; and then do var = var<<32; it yields 12. I read from my textbook, an old one, that ANSI C does not allow to shift all the bits out of a value in a single operation.
Do all major compilers behave the same way (copying the input to the output), or is it just GCC that copies 12 from input to output when I write var = var<<32;?
C11 6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined.
Meaning that there is no well-defined behavior if you shift 32 or more bits in this case. Anything could happen, including crashes and strange results.
I read from my textbook, an old one, that ANSI C does not allow to shift all the bits out of a value in a single operation.
This is correct: you would have to use several shift operations, for example x<<=16; x<<=16;, to avoid undefined behavior.
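An alternative to splitting the shift is to check the count first. The helper below is a sketch only: shl_u32 is a hypothetical name, and returning 0 for counts of 32 or more (the arithmetic result of shifting every bit out) is a choice made by this example, not behavior the language guarantees.

```c
#include <stdint.h>

/* Shifts value left by count bits. Counts of 32 or more are handled
   explicitly and yield 0, the result of shifting every bit out; this is
   a choice made by this sketch, not something the language guarantees. */
uint32_t shl_u32(uint32_t value, unsigned count)
{
    if (count >= 32)
        return 0;
    return value << count;
}
```

With this helper, shl_u32(12, 32) is 0 on every platform, whereas the plain expression 12u << 32 is undefined when unsigned int is 32 bits wide.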
GCC literally tells you the answer:
left shift count >= width of type [-Wshift-count-overflow].
It also depends on the architecture you are using. And your question has already been answered here.
Just for a little more depth:
Classically the C language developed as a "portable assembler." The built-in operations are simply the ones implemented by most CPUs, and their semantics provide a lowest common denominator for portability.
Almost every CPU provides shift-left and -right operations. But platforms differ about details such as overflow and negative numbers, so the C standard leaves corner cases undefined.
In particular, for shift counts greater than the register width, many CPUs only use as many bits as needed and truncate the rest. When the count is a constant, the compiler may note the undefined behavior and simply discard the operation as garbage.

(Why) is using an uninitialized variable undefined behavior?

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
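The quoted wording only covers reads through an lvalue that does not have character type. As a minimal sketch of the exempt case, an object's representation can be inspected byte by byte through unsigned char. The helper names copy_representation and rebuild_unsigned are invented for illustration and do not come from the standard or from the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the n bytes of an object's representation into
   out, reading them one at a time through unsigned char lvalues. */
void copy_representation(unsigned char *out, const void *src, size_t n)
{
    assert(out != NULL && src != NULL);
    const unsigned char *p = src;
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
}

/* Hypothetical helper: rebuild an unsigned int from bytes that were copied
   out of a valid (initialized) unsigned int. */
unsigned int rebuild_unsigned(const unsigned char *bytes)
{
    unsigned int v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

Copying the bytes of an initialized unsigned int out with copy_representation and passing them to rebuild_unsigned should give back the original value.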
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
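To make the first half of that argument concrete, here is a minimal sketch. The function names are made up for illustration, and x arrives as a parameter, so nothing here reads an indeterminate value. It only shows that the subtraction yields zero for every valid value of x.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: x is a parameter, so it always holds a valid value. */
unsigned int subtract_self(unsigned int x)
{
    x -= x;   /* v - v is 0 for every valid v; unsigned arithmetic wraps, it never overflows */
    return x;
}

/* Spot checks over a few valid values, including both extremes. */
void check_subtract_self(void)
{
    assert(subtract_self(0u) == 0u);
    assert(subtract_self(12345u) == 0u);
    assert(subtract_self(UINT_MAX) == 0u);
}
```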
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N - 1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
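Whether unsigned int has padding bits on a given platform can be checked from limits.h. The sketch below is one possible way to do it, with hypothetical function names. It counts the value bits in UINT_MAX (which is 2^N - 1) and compares that count with the storage width of the type.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of value bits N in unsigned int: UINT_MAX is 2^N - 1,
   so counting its bits gives N. */
int unsigned_int_value_bits(void)
{
    int n = 0;
    for (unsigned int m = UINT_MAX; m != 0u; m >>= 1)
        n++;
    return n;
}

/* Number of padding bits: storage width minus value bits.
   A result of 0 means unsigned int has no padding bits. */
int unsigned_int_padding_bits(void)
{
    int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    int value_bits = unsigned_int_value_bits();
    assert(value_bits <= width);
    return width - value_bits;
}
```

If unsigned_int_padding_bits() returns 0, every bit pattern of an unsigned int is a valid value, which is the situation the paragraph above describes.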
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined. Such a representation is called
a trap representation.
[3]: C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
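#include <stdint.h>

volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
  uint16_t temp;   // deliberately left uninitialized
  if (a)
    temp = y;
  else if (b)
    temp = z;
  return temp;     // uninitialized if both a and b read as zero
}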
A compiler for a platform like the ARM, where all instructions other than loads and stores operate on 32-bit registers, might reasonably process the code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
  // Since x is never used past this point, and since the return value
  // will need to be in r0, a compiler could map temp to r0
  uint32_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z & 0xFFFF;
  return temp;
}
If either volatile read yields a non-zero value, r0 will get loaded with a value in the range 0..65535. Otherwise it will keep whatever it held when the function was called (i.e. the value passed in as x), which might not be in the range 0..65535. The Standard lacks any terminology to describe the behavior of a value whose type is uint16_t but whose value lies outside the range 0..65535, except to say that any action which could produce such behavior invokes UB.
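For contrast, here is a sketch of a variant that never returns an indeterminate value. The name moo_fixed and the default of 0 are assumptions made for illustration; the original code does not say what should be returned when both a and b read as zero.

```c
#include <assert.h>
#include <stdint.h>

volatile uint32_t a, b;

/* Hypothetical variant of moo: temp gets a value on every path. */
uint16_t moo_fixed(uint32_t x, uint16_t y, uint32_t z)
{
    (void)x;                 /* unused; kept to mirror the original signature */
    uint16_t temp = 0;       /* assumed default when neither a nor b is set */
    if (a)
        temp = y;
    else if (b)
        temp = (uint16_t)z;  /* explicit truncation to the low 16 bits */
    return temp;
}

/* Usage sketch: the result stays in 0..65535 whatever is passed as x. */
void check_moo_fixed(void)
{
    a = 0; b = 0;
    assert(moo_fixed(0x12345678u, 1, 2) == 0);
    a = 1; b = 0;
    assert(moo_fixed(0x12345678u, 7, 2) == 7);
    a = 0; b = 1;
    assert(moo_fixed(0x12345678u, 7, 0x12345u) == 0x2345);
}
```

Because temp is written before it is read on every path, the compiler can no longer reuse the register holding x as the return value without first storing something to it.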
