Default value of uninitialized variables in C

I'm reading this tutorial about debugging. I pasted the factorial code into my .c file:
#include <stdio.h>
int main()
{
    int i, num, j;
    printf ("Enter the number: ");
    scanf ("%d", &num );
    for (i=1; i<num; i++)
        j=j*i;
    printf("The factorial of %d is %d\n",num,j);
}
When I run the executable, it always prints 0; however, the author of the tutorial says that it returns a garbage value. I've googled this and read that this is right, except for static variables. So it should return a garbage number instead of 0.
I thought that this might be due to a different version of C, but the guide is from 2010.
Why do I always see 0, instead of a garbage value?

Both the C99 draft standard and the C11 draft standard say the value of an uninitialized automatic variable is indeterminate. The draft C99 standard, section 6.2.4 Storage durations of objects, paragraph 5, says (emphasis mine):
For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
the draft standard defines indeterminate as:
either an unspecified value or a trap representation
and an unspecified value is defined as:
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
so the value can be anything. It can vary with the compiler and the optimization settings, and it can even vary from run to run, but it cannot be relied upon; thus any program that uses an indeterminate value invokes undefined behavior.
The standard says this is undefined in one of the examples in section 6.5.2.5 Compound literals paragraph 17 which says:
Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime of the unnamed object would be the body of the loop only, and on entry next time around p would have an indeterminate value, which would result in undefined behavior.
this is also covered in Annex J.2 Undefined behavior:
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
In some very specific cases you can make some predictions about such behavior; the presentation Deep C goes into some of them. These types of examination should only be used as a tool to further understand how systems work and should never even come close to a production system.

You need to initialize j to 1. If j happens to be zero, the answer will always be zero (one type of garbage). If j happens to be non-zero, you'll get different garbage. Using uninitialized variables is undefined behaviour; 'undefined' does not exclude always being zero in the tests you've done so far.

Some systems hand out memory that is set to 0 (Mac OS, for example), so your variable will often contain 0 when you leave it uninitialised, but relying on that is bad practice and will lead to unstable results.

You can't say what should happen in this case because the language specification doesn't say what should happen. In fact it says that the values of uninitialised non-static variables are indeterminate.
That means they can be any value. They can be different values on different runs of your program, or when your code is compiled on a different compiler, or when compiled on the same compiler with different optimisation settings. Or on different days of the week, national holidays or after 6pm.
An uninitialised variable can even hold what's called a trap representation, which is a value which is not valid for that type. If you access such a value then you're into the scary world of undefined behaviour where literally anything can happen.

Related

In C, I'm struggling with Pointers

I have this code that I'm using for something else, but, boiled it down to the root problem I think. If I enter 5 for the scanf variable when I run it, the printf out is 0,16. I don't understand why this is giving me 16 for *pScores?
#include <stdio.h>
int main(void) {
    int a=0;
    int sum=0;
    scanf("%d",&a);
    int scores[a];
    int *pScores = &scores[0];
    printf("%d, %d\n",scores[0],*pScores);
}
You are declaring an array
int scores[a];
and then printing out the value of scores[0] in two different ways. However, you have not stored anything into any of the elements of the scores array, so the values there are indeterminate.
Whether use of uninitialized (and therefore indeterminate) values in this way actually rises to the level of Undefined Behavior is a surprisingly deep and actually somewhat contentious question. (See the comment thread raging at the other answer.) Nevertheless, printing an uninitialized value like this isn't terribly useful. If you fill in a well-defined value to at least scores[0], I believe you'll find that both scores[0] and *pScores will print the same — that same — value.
Now, you might expect that the uninitialized value — whatever it is — would at least be consistent no matter how you print it (and I might agree with you), but when it comes to gray areas like this, and especially when a modern compiler starts leveraging every nuance of the rules in performing aggressive optimizations, the end results can be pretty surprising. When I tried your program, I got the same number printed twice (that is, I couldn't initially reproduce your result), but as suggested by Barmar in a comment, when I turned on optimization (with -O3), I started seeing conflicting results, also.
You have undefined behavior, caused by reading a variable with automatic storage duration whose value is indeterminate.
In 6.2.4 Storage durations of objects one finds the following rule
For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration. If the scope is entered recursively, a new instance of the object is created
each time. The initial value of the object is indeterminate.
Then in J.2 Undefined behavior:
The behavior is undefined in the following circumstances
...
The value of an object with automatic storage duration is used while it is indeterminate.
...
Among permitted very weird outcomes when dealing with indeterminate values is that they have a different value each time you read them. The Schroedinger wavefunction does not collapse!

Auto Initialization of local variables

I have the following code snippet.
int j;
printf("%d",j);
As expected, I get a garbage value.
32039491
But when I include a loop in the above snippet, like
int j;
printf("%d",j);
while(j);
I get the following output on multiple trials of the program.
0
I always thought local variables are initialized to a garbage value by default, but it looks like variables get auto initialized when a loop is used.
It has an indeterminate value. It can be anything.
Quoting C11 §6.7.9
If an object that has automatic storage duration is not initialized explicitly, its value is
indeterminate. [...]
Automatic local variables, unless initialized explicitly, contain an indeterminate value. If you try to use a variable while it holds an indeterminate value, and the variable either
does not have its address taken, or
can have a trap representation,
the usage leads to undefined behavior.
As expected, I get a garbage value.
Then your expectation is unjustifiably hopeful. When you use the indeterminate value of an uninitialized object, you generally get (and for your code snippets alone you do get) undefined behavior. Printing a garbage value is but one of infinitely many possible manifestations.
I always thought local variables are initialized to a garbage value by default, but it looks like variables get auto initialized when a loop is used.
You thought wrong, and you're also drawing the wrong conclusion. Both of your code snippets, when standing alone, exhibit undefined behavior. You cannot safely rely on any particular result.

Reading an indeterminate value invokes UB? [duplicate]

Various esteemed, high-rep users on SO keep insisting that reading a variable with indeterminate value "is always UB". So where exactly is this mentioned in the C standard?
It is very clear that an indeterminate value could either be an unspecified value or a trap representation:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
It is also clear that reading a trap representation invokes undefined behavior, 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined. Such a representation is called
a trap representation.
However, an indeterminate value does not necessarily contain a trap representation. In fact, trap representations are very rare for systems using two's complement.
Where in the C standard does it actually say that reading an indeterminate value invokes undefined behavior?
I was reading the non-normative Annex J of C11 and found that this is indeed listed as one case of UB:
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.9, 6.8).
However, the listed sections are irrelevant. 6.2.4 only states rules regarding lifetime and when a variable's value becomes indeterminate. Similarly, 6.7.9 is about initialization and states how a variable's value becomes indeterminate. 6.8 seems mostly irrelevant. None of the sections contains any normative text saying that accessing an indeterminate value can lead to UB. Is this a defect in Annex J?
There is however some relevant, normative text in 6.3.2.1 regarding lvalues:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
But that is a special case, which only applies to variables of automatic storage duration that never had their address taken. I have always thought that this section of 6.3.2.1 is the only case of UB regarding indeterminate values (that are not trap representations). But people keep insisting that "it is always UB". So where exactly is this mentioned?
As far as I know, there is nothing in the standard that says that using an indeterminate value is always undefined behavior.
The cases that are spelled out as invoking undefined behavior are:
If the value happens to be a trap representation.
If the indeterminate value is held by an object of automatic storage duration.
If the value is a pointer to an object whose lifetime has ended.
As an example, the C standard specifies that the type unsigned char has no padding bits and therefore none of its values can ever be a trap representation.
Portable implementations of functions such as memcpy take advantage of this fact to perform a copy of any value, including indeterminate values. Those values could potentially be trap representations when used as values of a type that contains padding bits, but they are simply unspecified when used as values of unsigned char.
I believe that it is erroneous to assume that if something could invoke undefined behavior then it does invoke undefined behavior when the program has no safe way of checking. Consider the following example:
int read(int* array, int n, int i)
{
    if (0 <= i)
        if (i < n)
            return array[i];
    return 0;
}
In this case, the read function has no safe way of checking whether array really is of (at least) length n. Clearly, if the compiler considered these possible UB operations as definite UB, it would be nearly impossible to write any pointer code.
More generally, if the compiler cannot prove that something is UB, it has to assume that it isn't UB, otherwise it risks breaking conforming programs.
The only case where the possibility is treated like a certainty, is the case of objects of automatic storage. I think it's reasonable to assume that the reason for that is because those cases can be statically rejected, since all the information the compiler needs can be obtained through local flow analysis.
On the other hand, declaring it as UB for non-automatic storage objects would not give the compiler any useful information in terms of optimizations or portability (in the general case). Thus, the standard probably doesn't mention those cases because it wouldn't change anything in realistic implementations anyway.
To allow the best blend of optimization opportunities and useful semantics, types which have no trap representations should have Indeterminate Values subdivided into three kinds:
1. The first read will yield any value that could result from an unspecified
bit pattern; subsequent reads would be guaranteed to yield the same value.
This would be similar to "Unspecified value", except that the Standard
doesn't generally distinguish between types which do and don't have trap
representations, and in cases where the Standard calls for "Unspecified
Value" it requires that an implementation ensure the value is not a trap
representation; in the general case, that would require that an
implementation include code to guard against certain bit patterns.
2. Each read may independently yield any value that could result from an
unspecified bit pattern.
3. The value read, and the result of most computations performed upon it,
may behave non-deterministically as though the read had yielded any
possible value.
Unfortunately, the Standard doesn't make such distinctions, and there is some
disagreement about what it calls for. I would suggest that #2 should be the
default, but it should be possible for code to indicate all places where code
needs to force the compiler to pick a concrete value, and indicate that a
compiler may use #3-style semantics everywhere else. For example, if code for
a collection of distinct 16-bit values stored as:
struct COLLECTION { size_t count; uint16_t values[65536], locations[65536]; };
maintains the invariant that for each i < count, locations[values[i]]==i, it
should be possible to initialize such a structure merely by setting "count"
to zero, even if the storage had previously been used as some other type.
If casts are specified as always yielding concrete values, code which wants
to see if something is in the collection could use:
uint32_t index = (uint32_t)(collection->locations[value]);
if (index < collection->count && collection->values[index]==value)
... value was found
It would be acceptable to have the above code arbitrarily yield any number for "index" each time it reads an item from the array, but it would be essential that both uses of "index" in the second line use the same value.
Unfortunately, some compiler writers seem to think compilers should treat all indeterminate values as #3, while some algorithms require #1 and some require #2, and there's no real way to distinguish the varying requirements.
3.19.2 permits an indeterminate value to be a trap representation, and both reading and writing one are undefined behaviour.
Your platform may give you guarantees (e.g. that integer types never have trap representations) but that is not required by the Standard, and if you rely on that, your code loses some portability. That's a valid choice, but shouldn't be made in ignorance.
More systems have trap representations for floating-point types than for integer types, but C programs may be run on processors that track register validity - see (Why) is using an uninitialized variable undefined behavior in C?. This degree of latitude is the principal reason for C's wide adoption across many hardware architectures.

Is using any indeterminate value undefined or just those stored in objects with automatic storage?

According to C99 J.2, the behavior is undefined when:
The value of an object with automatic storage duration is used while it is
indeterminate
What about all the other cases where an object has an indeterminate value? Do we also always invoke UB if we use them? Or do we invoke UB only when they contain a trap representation?
Examples include:
the value of an object allocated using malloc (7.20.3.3p2)
[storing in non-automatic storage] a FILE* after calling fclose on it (7.19.3p4)
[storing in non-automatic storage] a pointer after calling free on it (6.2.4p2)
...and so on.
I've used C99 for my references, but feel free to refer to C99 or C11 in your answer.
I am using C11 revision here:
The definitions from the standard are:
indeterminate value
either an unspecified value or a trap representation
trap representation
an object representation that need not represent a value of the object type
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
An unspecified value is a valid value of the relevant type and as such it does not cause undefined behaviour. Using a trap representation will.
The reason this wording exists is that it lets compilers issue diagnostics for, or reject, programs that use the value of uninitialized local variables while still staying standard-compliant. Some types, for example unsigned char, cannot have trap representations in memory, so in their indeterminate state they would always hold an unspecified value. Since using an unspecified value is not undefined behaviour, without this wording the standard would not allow a compiler to reject such a program.
Additionally, an unsigned char normally does not have a trap representation. However, IIRC there are computer architectures where a register can be marked "uninitialized", and reading from such a register triggers a fault. Thus even if an unsigned char does not have trap representations in memory, on such an architecture reading it will cause a hardware fault with 100% probability if it has automatic storage duration, the compiler decides to keep it in a register, and it is still uninitialized at the time of the read.

Variable initialization after being called is reflected beforehand

I am learning about scoping of variable in C.
Can anyone please explain what is going on below?
int w;
printf("\nw=%d\n", w);
w =-1;
Despite the fact that I initialized variable 'w' after 'printf', it always gets the value of "-1". This confused me, as I expect it to run sequentially. Hence, it should have printed some random value.
*** I also tried changing the value there, and it always read the written value. Hence, it did not randomly show "-1"
For experiment, I again tried the code below.
int w;
printf("\nw=%d\n", w);
w =-9;
w =-1;
Now it reads a value of "2560", as I expect, since it was not properly initialized before.
In your code
int w;
printf("\nw=%d\n", w);
invokes undefined behavior as you're trying to read the value of an uninitialized (automatic local) variable. The content of w is indeterminate at this point, and the output result is, well, undefined.
Always initialize your local variable before reading (using) the value.
Related: Quoting C11, chapter §6.7.9, Initialization
If an object that has automatic storage duration is not initialized explicitly, its value is
indeterminate. [....]
and, related to Undefined behavior, annex §J.2
The value of an object with automatic storage duration is used while it is
indeterminate
The variable is uninitialized. In C, this means its value is indeterminate. In practice, the variable generally gets a value based on what's "lying around" at the memory address to which it gets assigned. In this case, it's some value left on the stack.
It just so happens that often you will get consistent results across multiple runs simply due to external factors on which a program should not rely.
The compiler is optimizing the assignment of w in the first case. In the second case, it is deciding not to optimize.
In both cases, the compiler could choose to optimize out both assignments, since w is not used after they appear.
Initialize your variables before using them.
In both the above cases
int w;
printf("\nw=%d\n", w);
prints a garbage value, as we might call it, which could be anything, including -1 or 2560.
When you do not initialize a variable, it can contain a garbage value. Reading it is undefined behaviour, and in most cases it will print arbitrary numbers, as you experienced. By the way, as pointed out by others, it's up to the compiler, so it may print the expected value or it may not.

Resources