Why we must initialize a variable before using it? [duplicate] - c

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
What happens to a declared, uninitialized variable in C? Does it have a value?
Now I'm reading Teach Yourself C in 21 Days. In chapter 3, there is a note like this:
DON'T use a variable that hasn't been initialized. Results can be
unpredictable.
Please explain to me why this is the case. The book provide no further clarification about it.

Because, unless the variable has static storage space, it's initial value is indeterminate. You cannot rely on it being anything as the standard does not define it. Even statically allocated variables should be initialized though. Just initialize your variables and avoid a potential headache in the future. There is no good reason to not initialize a variable, plenty of good reasons to do the opposite.
On a side note, don't trust any book that claims to teach you X programming language in 21 days. They are lying, get yourself a decent book.

When a variable is declared, it will point to a piece of memory.
Accessing the value of the variable will give you the contents of that piece of memory.
However until the variable is initialised, that piece of memory could contain anything. This is why using it is unpredictable.
Other languages may assist you in this area by initialising variables automatically when you assign them, but as a C programmer you are working with a fairly low-level language that makes no assumptions about what you want to do with your program. You as the programmer must explicitly tell the program to do everything.
This means initialising variables, but it also means a lot more besides. For example, in C you need to be very careful that you de-allocate any resources that you allocate once you're finished with them. Other languages will automatically clean up after you when the program finished; but in C if you forget, you'll just end up with memory leaks.
C will let you get do a lot of things that would be difficult or impossible in other languages. But this power also means that you have to take responsibility for the housekeeping tasks that you take for granted in those other languages.

You might end up using a variable declared outside the scope of your current method.
Consider the following
int count = 0;
while(count < 500)
{
doFunction();
count ++;
}
...
void doFunction() {
count = sizeof(someString);
print(count);
}

In C, using the values of uninitialized automatic variables (non-static locals and parameter variables) that satisfy the requirement for being a register variable is undefined behavior, because such variables values may have been fetched from a register and certain platforms may abort your program if you read such uninitialized values. This includes variables of type unsigned char (this was added to a later C spec, to accommodate these platforms).
Using values of uninitialized automatic variables that do not satisfy the requirement for being a register variable, like variables that have their addresses taken, is fine as long as the C implementation you use hasn't got trap representations for the variable type you use. For example if the variable type is unsigned char, the Standard requires all platforms not to have trap representations stored in such variables, and a read of an indeterminate value from it will always succeed and is not undefined behavior. Types like int or short don't have such guarantees, so your program may crash, depending on the C implementation you use.
Variables of static storage duration are always initialized if you don't do it explicitly, so you don't have to worry here.
For variables of allocated storage duration (malloc ...), the same applies as for automatic variables that don't satisfy the requirements for being a register variable, because in these cases the affected C implementations need to make your program read from memory, and won't run into the problems in which register reads may raise exceptions.

In C the value of an uninitialized variable is indeterminate. This means you don't know its value and it can differ depending on platform and compiler. The same is also true for compounds such as struct and union types.
Why? Sometimes you don't need to initialize a variable as you are going to pass it to a function that fills it and that doesn't care about what is in the variable. You don't want the overhead of initialization imposed upon you.
How? Primitive types can be initialized from literals or function return values.
int i = 0;
int j = foo();
Structures can be zero intialized with the aggregate intializer syntax:
struct Foo { int i; int j; }
struct Foo f = {0};

At the risk of being pedantic, the statement
DON'T use a variable that hasn't been initialized.
is incorrect. If would be better expressed:
Do not use the value of an uninitialised variable.
The language distinguishes between initialisation and assignment, so the first warning is, in that sense over cautious - you need not provide an initialiser for every variable, but you should have either assigned or initialised a variable with a useful and meaningful value before you perform any operations that subsequently use its value.
So the following is fine:
int i ; // uninitialised variable
i = some_function() ; // variable is "used" in left of assignment expression.
some_other_function( i ) ; // value of variable is used
even though when some_function() is called i is uninitialised. If you are assigning a variable, you are definitely "using" it (to store the return value in this case); the fact that it is not initialised is irrelevant because you are not using its value.
Now if you adhere to
Do not use the value of an uninitialised variable.
as I have suggested, the reason for this requirement becomes obvious - why would you take the value of a variable without knowing that it contained something meaningful? A valid question might then be "why does C not initialise auto variables with a known value. And possible answers to that would be:
Any arbitrary compiler supplied value need not be meaningful in the context of the application - or worse, it may have a meaning that is contrary to the actual state of the application.
C was deliberately designed to have no hidden overhead due to its roots as systems programming language - initialisation is only performed when explicitly coded, because it required additional machine instructions and CPU cycles to perform.
Note that static variables are always initialised to zero. .NET languages, such as C# have the concept of an null value, or a variable that contains nothing, and that can be explicitly tested and even assigned. A variable in C cannot contain nothing, but what it contains can be indeterminate, and therefore code that uses its value will behave non-deterministically.

Related

What value does a const variable take when it's initialized by a non-const and non-initialized variable?

#include <stdio.h>
int main()
{
int a;
const int b = a;
printf("%d %d\n", a, b);
return 0;
}
The same code I tried to execute on onlinegdb.com compiler and on Ubuntu WSL. In onlinegdb.com, I got both a and b as 0 with every run, whereas in WSL it was a garbage value. I am not able to understand why garbage value is not coming onlinegdb.com
Using int a; inside a function is described by C 2018 6.2.4 5:
An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals…
Paragraph 6 continues:
… The initial value of the object is indeterminate…
An indeterminate value is not an actual specific value but is a theoretical state used to describe the semantics of C. 3.19.2 says it is:
… either an unspecified value or a trap representation…
and 3.19.3 says an unspecified value is:
… valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance…
That “any instance” part means that the program may behave as if a has a different value each time it is used. For example, these two statements may print different values for a:
printf("%d\n", a);
printf("%d\n", a);
Regarding const int b = a;, this is not covered explicitly by the C standard, but I have seen a committee response: When an indeterminate value is assigned (or initialized into) another object, the other object is also said to have an indeterminate value. So, after this declaration, b has an indeterminate value. The const is irrelevant; it means the source code of the program is not supposed to change b, but it cannot remedy the fact that b does not have a determined value.
Since the C standard permits any value to be used in each instance, onlinegdb.com conforms when it prints zero, and WSL conforms when it prints other values. Any int values printed for printf("%d %d\n", a, b); conform to the C standard.
Further, another provision in the C standard actually renders the entire behavior of the program undefined. C 2018 6.3.2.1 2 says:
… If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
This applies to a: It could have been declared register because its address is never taken, and it is not initialized or assigned a value. So, using a has undefined behavior, and that extends to the entire behavior of the program on the code branch where a is used, which is the entire program execution.
I am not able to understand why garbage value is not coming
This is a very strange statement. I wonder: what kind of answer or explanation do you expect you might get? Something like:
Everyone who said that "uninitialized local variables start out containing random values" lied to you. WSL was wrong for giving you random values. You should have gotten 0, like you did with onlinegdb.com.
onlinegdb.com is buggy. It should have given truly random values.
The rules for const variables are special. When you say const int b = a;, it magically makes a's uninitialized value more predictable.
Are you expecting to get an answer like any of those? Because, no, none of those is true, none of those can possibly be true.
I'm sorry if it sounds like I'm teasing you here. I agree, it's surprising at first if an uninitialized local variable always starts out containing 0, because that's not very random, is it?
But the point is, the value of an uninitialized local variable is not defined. It is unspecified, indeterminate, and/or undefined. You cannot know what it is going to be. But that means that no value — no possible value — that it contains can ever be "wrong". In particular, onlinegdb.com is not wrong for not giving you random values: remember, it's not obligated to give you anything!
Think about it like this. Suppose you buy a carton of milk. Suppose it's printed on the label, "Contents may spoil — keep refrigerated." Suppose you leave the carton of milk on the counter overnight. That is, suppose you fail to properly refrigerate it. Suppose that, a day later, you realize your mistake. Horrified, you carefully open the milk carton and take a small taste, to see if it has spoiled. But you got lucky! It's still okay! It didn't spoil!
Now, at this point, what do you do?
Hastily put the milk in the refrigerator, and vow to be more careful next time.
March back to the store where you bought the milk, and accuse the shopkeeper of false advertising: "The label says 'contents may spoil', but it didn't!!" What do you think the shopkeeper is going to say?
This may seem like a silly analogy, but really, it's just like your C/C++ coding situation. The rules say you're supposed to initialize your local variables before you use them. You failed. Yet, somehow, you got predictable values anyway, at least under one compiler. But you can't complain about this, because it's not causing you a problem. And you can't depend on it, because as your experience with the other compiler showed you, it's not a guaranteed result.
Typically, local variables are stored on the stack. The stack gets used for all sorts of stuff. When you call a function, it receives a new stack frame where it can store its local variables, and where other stuff pertaining to that function call is stored, too. When a function returns, its stack frame is popped, but the memory is typically not cleared. That means that, when the next function is called, its newly-allocated stack frame may end up containing random bits of data left over from the previous function whose stack frame happened to occupy that part of stack memory.
So the question of what value an uninitialized local variable contains ends up depending on what the previous function might have been and what it might have left lying around on the stack.
In the case of main, it's quite possible that since it's the first function to be called, and the stack might start out empty, that main's stack frame always ends up being built on top of virgin, untouched, all-0 memory. That would mean that uninitialized variables in main might always seem to start out containing 0.
But this is not, not, not, not, not guaranteed!!!
Nobody said the stack was guaranteed to start out containing 0. Nobody said that there wasn't some startup code that ran before main that might have left some random garbage lying around on the stack.
If you want to enumerate possibilities, I can think of 3:
The function you're wondering about is one that always gets called first, or always gets called at a "deep leaf" of the call stack, meaning that it always gets a brand-new stack frame, and on a machine where the stack always starts out containing 0. Under these circumstances, uninitialized variables might always seem to start out containing 0.
The function you're wondering about does not always get a brand-new stack frame. It always gets a "dirty" stack frame with some previous function's random data lying around, and the program is such that, during every run, that previous function was doing something different and left something different on the stack, such that the next function's uninitialized local variables always seem to start out containing different, seemingly random values.
The function you're wondering about is always called right after a previous function that always does the same thing, meaning that it always leaves the same values lying around, meaning that the next function's uninitialized local variables always seem to start out with the same values every time, which aren't zero but aren't random.
But I hope it's obvious that you absolutely can't depend on any of this! And of course there's no reason to depend on any of this. If you want your local variables to have predictable values, you can simply initialize them. But if you're curious what happens when you don't, hopefully this explanation has helped you understand that.
Also, be aware that the explanation I've given here is somewhat of a simplification, and is not complete. There are systems that don't use a conventional stack at all, meaning that none of those possibilities 1, 2, or 3 could apply. There are systems that deliberately randomize the stack every time, either to help new programs not to accidentally become dependent on uninitialized variables, or to make sure that attackers can't exploit certain predictable results of a badly written program's undefined behavior.
When your operating system gives your program memory to work with, it will likely be zero to start (though not guaranteed). As your program calls functions it creates stack frames, and your program will effectively go from the .start assembly function to the int main() c function, so when main is called, no stack frame has written the memory that local variables are placed at. Therefore, a and b are both likely to be 0 (and b is guaranteed to be the same as a). However, it's not guaranteed to be 0, especially if you call some functions that have local variables or lots of parameters. For instance, if your code was instead
#include <stdio.h>
void foo()
{
int x = 42;
}
int main()
{
foo();
int a;
const int b = a;
printf("%d %d\n", a, b);
return 0;
}
then a would PROBABLY have the value 42 (in unoptimized builds), but that would depend on the ABI (https://en.wikipedia.org/wiki/Application_binary_interface) that your compiler uses and probably a few other things.
Basically, don't do that.

Local Variable in Function

I found out something hilarious problem in C.
Here is my Code
#include <stdio.h>
void method()
{
int indx;
printf("%d\n", indx);
indx++;
}
int main(void)
{
method();
method();
method();
}
This is simple example.
indx variable is not initialized. So, The result of printf in method function would be something strange value.
And indx is local variable. So indx++ is useless.
But The answer is
0
1
2
It seems like that indx is remain
I can't understand. why?
Local variables are stored on the stack. Because you are doing no other action between each function call, the stack is essentially preserved (unaltered) between calls, and the lack of initialisation means that the value of indx is whatever happens to be stored in that stack location.
If you made other function calls, the stack might well be overridden, resulting in non-sequential values, and likely not starting from zero.
One reason why the stack might contain zero values at startup can be because, depending on compiler, compiler options and standard library startup, the stack might be deliberately initialised to zero (or just because the region of memory used happens to contain zeros from a previous application etc).
One should never rely on such behaviour, it is specifically undefined behaviour, and thus initialising variables and ensuring you understand variable scope is an important thing to learn in software development.
I think there are plenty of descriptions of how stack is used in software development languages (and variability across hardware architectures and implementations) that you should research that separately.
The local variables are especially stored in the memory area of the corresponding function, hence you can see the increment of the indx value by single value each time the function is called.
But the answer returns the memory values of the stored variable.

C : Passing auto variable by reference

I came across a piece of code which is working fine as of now,but in my opinion its undefined behavior and might introduce a bug in future.
Pseudo code :
void OpertateLoad(int load_id)
{
int value = 0;
/* code to calculate value */
SetLoadRequest(load_id,&value);
/*some processing not involving value**/
}
void SetLoadRequest(int load_id, int* value)
{
/**some processing**/
LoadsArray[load_id] = *value;
/**some processing**/
}
In my understanding C compiler will not guarantee where Auto variables will be stored. It could be stack/register(if available and suitable for processing).
I am suspecting that if compiler decides to store value onto general purpose register then, SetLoadRequest function might refer to wrong data.
Am I getting it right?or I am overthinking it?
I am using IARARM compiler for ARM CORTEX M-4 processor.
----------:EDIT:----------
Answers Summarize that " Compiler will ensure that data is persisted between the calls, no matter where the variable is stored ".
Just want to confirm : Is this behavior also true 'if a function is returning the address of local auto variable and caller is de-referencing it?'.
If NO then is there anything in C standard which guarantees the behavior in both cases? Or As I stated earlier its undefined behavior?
You are overthinking it. The compiler knows that, if value is in a register, that it must be stored to memory before passing a pointer to that memory to SetLoadRequest.
More generally, don't think about stack and registers at all. The language says there's a variable (without saying how it's implemented), and that you can take its address and use that in another function to refer to the variable. So you can!
The language also says that local variables cease to exist when leaving a block, so this permission does not extend to returning pointers to local variables (which causes undefined behavior if the caller does anything at all with the pointer).
I am overthinking it?
Yes!
The compiler will take care of this. If it stores it in a register, it will know how to handle it (with a memory load).
C11 draft standard n1570:
6.2.4 Storage durations of objects
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration or
compound literal is reached in the execution of the block; otherwise, the value becomes
indeterminate each time the declaration is reached.
To my understanding, the scope of value is the body of OpertateLoad. However, SetLoadRequest assigns the value pointed to, so the actual value of value is copied. No undefined behaviour is involved.

Why are static variables auto-initialized to zero? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why global and static variables are initialized to their default values?
What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?
It is required by the standard (§6.7.8/10).
There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.
Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:
int foo() {
static int *ptr;
if (NULL == ptr)
// initialize it
}
If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.
Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss and initialized at load time along with all your constants and other globals. Since the section of memory where they live is just copied straight from your executable, they're initialized to a value known at compile-time for free. The value that was chosen is 0.
Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.
The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.
Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.
Program code of course goes in the code (or text) segment but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in the block storage (bss) segment.
When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM based systems it will copy the value of initialized variables from the code (text) segment into their respective actual addresses in RAM. RAM (i.e. disk) based systems can load the initial values directly to the final RAM addresses.
The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
Note that since autovariables don't exist at program load time they can't be initialized by the runtime startup code.
Mostly because the static variables are grouped together in one block by the linker, so it's real easy to just memset() the whole block to 0 on startup. I to not believe that is required by the C or C++ Standards.
There is discussion about this here:
First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.
Take a look : here 6.2.4(3) and 6.7.8 (10)
Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.
Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.
If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.
I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.

What is the default state of variables?

When a C program starts and variables are assigned to memory locations, does the C standard say if that value is initialized?
// global variables
int a;
int b = 0;
static int c;
In the above code, 'b' will be initialized to 0. What is the initial value of 'a'? Is 'c' any different since it is static to this module?
Since you specifically mention global variables: In the case of global variables, whether they are declared static or not, they will be initialised to 0.
Local variables on the other hand will be undefined (unless they are declared static, in which case they too will be initialised to 0 -- thanks Tyler McHenry). Translated, that means that you can't rely on them containing any particular thing -- they will just contain whatever random garbage was already in memory at that location, which could be different from run to run.
Edit: The following only applies to local variables - not globals.
The initial value of a variable is undefined. In some languages a variable's location in memory is zero'd out upon declaration but in C (and C++ as well) an uninitialized variable will contain whatever garbage that lives at that location at that time.
So the best way to think of it is that an uninitialized variable will most likely contain garbage and have undefined behavior.
a will be zero, and c too, if they are global and not explicitly initialized. This is also true for local static variables.
Only local non-static variables are not initialized. Also memory allocated with malloc is not initialized.
See here for rules of initializations and allocation in C for the different objects.
I'm typing way too slowly this morning. Three people popped in quickly as I was answering, so I've removed most of my post. The link I found was clear and brief, so I'm posting that anyhow: Wikipedia on "uninitialized variable" for a discussion of the basic issues.
A quick test shows a and c to be 0.
int a;
static int c;
int main() {
printf("%d %d\n", a, c);
return 0;
}
The location of a (and c) are determined at compile-time; that is, they're neither put on the stack nor in a memory interval returned by malloc. I think the C standard says they're initialized to 0 in all cases, then.
I'm 99.9% confident about with respect to c, and 98% confident with regard to a. The keyword static, in the context of global variables, really is analogous to private in (say) C++ and Java: it's about visibility, not storage location.
What Andrew Hare says about uninitialized variables is true for data stored on the stack or in malloc'd memory. Not so for statically stored variables.

Resources