What is the default state of variables? - c

When a C program starts and variables are assigned to memory locations, does the C standard say if that value is initialized?
// global variables
int a;
int b = 0;
static int c;
In the above code, 'b' will be initialized to 0. What is the initial value of 'a'? Is 'c' any different since it is static to this module?

Since you specifically mention global variables: In the case of global variables, whether they are declared static or not, they will be initialised to 0.
Local (automatic) variables, on the other hand, have an indeterminate value (unless they are declared static, in which case they too will be initialised to 0; thanks to Tyler McHenry). Translated, that means you can't rely on them containing any particular thing: they will just contain whatever garbage was already in memory at that location, which could differ from run to run.
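A minimal sketch of the rules above, assuming a C99-or-later compiler. The names g_plain, g_static, g_real, g_ptr and read_defaults() are hypothetical. It reads only objects with static storage duration (plus one explicitly initialized local), so every value it checks is well defined.

```c
#include <stddef.h>

/* Objects with static storage duration are zero-initialized when no
   initializer is given, whatever their linkage. */
int g_plain;            /* external linkage, starts at 0 */
static int g_static;    /* internal linkage, also starts at 0 */
static double g_real;   /* arithmetic types become 0 (0.0 here) */
static int *g_ptr;      /* pointers become null pointers */

/* Hypothetical helper: returns 1 if every object below still holds
   its default value. */
int read_defaults(void)
{
    static int local_static;   /* block scope, but static storage: 0 */
    int automatic = 0;         /* automatic: no default, set explicitly */

    return g_plain == 0 && g_static == 0 && g_real == 0.0
        && g_ptr == NULL && local_static == 0 && automatic == 0;
}
```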

Edit: The following only applies to local variables - not globals.
The initial value of a local variable is indeterminate. In some languages a variable's location in memory is zeroed out upon declaration, but in C (and C++ as well) an uninitialized variable will contain whatever garbage lives at that location at that time.
So the best way to think of it is that an uninitialized variable will most likely contain garbage, and reading it can lead to undefined behavior.

a will be zero, and c too, if they are global and not explicitly initialized. This is also true for local static variables.
Only local non-static variables are not initialized. Also memory allocated with malloc is not initialized.
See here for rules of initializations and allocation in C for the different objects.
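If you need heap memory that starts out zeroed, calloc() is the standard alternative to malloc(): it sets every byte of the block to zero. A sketch, with the hypothetical helper names make_zeroed_ints() and make_zeroed_ints_malloc(). It uses int because the standard guarantees that all-bits-zero means 0 for integer types (it does not promise that for floating-point or pointer types).

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints that all start at 0.
   calloc() zero-fills the block; malloc() would leave its contents
   indeterminate. Returns NULL on allocation failure. */
int *make_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Same result with malloc(): the caller has to clear the memory. */
int *make_zeroed_ints_malloc(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p != NULL) {
        for (size_t i = 0; i < n; i++)
            p[i] = 0;   /* explicit initialization */
    }
    return p;
}
```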

I'm typing way too slowly this morning. Three people popped in quickly as I was answering, so I've removed most of my post. The link I found was clear and brief, so I'm posting that anyhow: Wikipedia on "uninitialized variable" for a discussion of the basic issues.

A quick test shows a and c to be 0.
#include <stdio.h>

int a;
static int c;
int main(void) {
    printf("%d %d\n", a, c);
    return 0;
}
The location of a (and c) is determined at compile time; that is, they're neither put on the stack nor in a memory interval returned by malloc. I think the C standard says they're initialized to 0 in all cases, then.
I'm 99.9% confident with respect to c, and 98% confident with regard to a. The keyword static, in the context of global variables, really is analogous to private in (say) C++ and Java: it's about visibility, not storage location.
What Andrew Hare says about uninitialized variables is true for data stored on the stack or in malloc'd memory. Not so for statically stored variables.

Related

When is a variable freed in a C program?

In dynamic memory allocation, the memory which is used by malloc() and calloc() can be freed by using free().
In static memory allocation, when is the variable in main() freed from the program?
In a tutorial, I learned that all variables are freed from RAM once the whole program has finished.
But someone told me that when the program is long enough, and a variable is used early on and never used again in the rest of the code, the compiler will automatically free the variable before the end of the program.
Can someone please clarify if both statements are correct or not?
The language guarantees that the lifetime of a static storage duration variable is the whole program. So it can be safely accessed at any time.
That being said, the standard only requires the code produced by a conforming compiler to behave as if all language rules were observed. That means that, for optimization purposes, a compiler is free to release the memory used by a static variable if it knows that the variable will not be used past a certain point. Said differently, it is neither required nor forbidden, and as a programmer you should not even worry about it except for very specific low-level optimization questions.
Example:
/* ... */
int main(void) {
    static int arr[10240];
    /* ... uses arr ... */
    /* no uses of arr past this point - called B */
    /* ... */
    return 0;
}
The program shall behave as if the array arr existed from the beginning of the program until its end. But as long as arr is not used past point B, the compiler may reuse its memory, because there will be no change in any observable state.
This is only a low-level optimization allowed (but not required) by the as-if rule.
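One practical consequence of "the lifetime of a static variable is the whole program": a pointer to a block-scope static stays valid after the function that defines it returns, which is not true for an automatic variable. A small sketch of that guarantee; the function name counter_slot() is hypothetical.

```c
/* Returns the address of a block-scope static int. The object has
   static storage duration, so the pointer remains valid after this
   function returns, for the rest of the program. Returning the
   address of an automatic (non-static) local instead would leave the
   caller with a dangling pointer. */
int *counter_slot(void)
{
    static int slot;   /* zero-initialized, lives until program exit */
    return &slot;
}
```

Every call returns the same address, so a value stored through one call's pointer is visible through the next.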
As you can see in this question: Why does C not require a garbage collector?
Stephen C, the author of the answer, says that:
The culture of these languages (C and C++) is to leave storage management to the programmer.
Would the correct answer be that when the process is terminated, all memory it used is freed? I think yes. The C compiler does not look for garbage or unreachable variables; that is the programmer's job.
But I have read that garbage collectors for C and C++ exist, like Java's; they can be useful, but remember that the implementation will be slower.
Again, I recommend that you read the question I linked at the beginning for more information.
In a tutorial, I learned that ...
This tutorial is talking about what you as a programmer can see (unless you are debugging your program):
If you write a program, you can rely on the fact that you can read a static variable until the program has finished.
But someone tells me ...
This person is talking about what is really happening in the background (not visible to the programmer unless you are debugging) when you use an "optimizing" compiler.
... when is the variable ... freed from the program?
We have to distinguish between three types of variables:
Local variables
Local variables reside on the stack. When the function returns, all memory allocated on the stack during the execution of the function is automatically freed.
For this reason, local variables are freed when the function returns or even earlier.
In the following function:
void myFunction(void)
{
int a, b;
a = func1(); /* Line "1" */
func2();
func3(a); /* Line "2" */
b = func4(); /* Line "3" */
func5();
func6(b); /* Line "4" */
}
... an "optimizing" C compiler will detect that the variable a is not needed any longer when the variable b is set. For this reason, it may allocate only enough memory for one int variable. This is as if you only defined one variable (a_and_b) and the compiler replaced both a and b by a_and_b:
int a_and_b;
a_and_b = func1();
...
func6(a_and_b);
If you debug the program, the debugger will tell you that variable b does not exist when debugging lines "1"-"2"; and it will tell you that variable a does not exist when debugging lines "3"-"4".
For this reason, you might say that variable a is "freed" after line "2".
Global variables
Global variables reside in the .data or in the .bss section.
On modern desktop computers (on microcontrollers and on MS-DOS computers it is different) the operating system allocates this memory before the program is started and the operating system frees this memory after the program has finished.
Theoretically, it might be possible to optimize global variables the same way as local variables; this means: To use the same memory for two different global variables if one variable is set the first time after the other one is no longer needed.
However, this would be very complicated because a global variable can be accessed from any C source file in the project. For this reason, I doubt that many compilers and linkers have such a feature.
Static variables
Variables marked with the keyword static are normally also stored in .data and .bss sections - just like global ones.
However, because they can only be accessed from one C source file (or even from only one function), it is much easier to detect that such a variable is no longer used at a certain point in time.
For this reason, a compiler may optimize a static variable the same way as a local variable (so the memory is shared between two variables) or even replace a static variable by a local one (on the stack).
One example:
int someFunction()
{
static int a;
int b;
a = func7();
b = func8(a); /* Line "5" */
return b + func9(); /* Line "6" */
}
In this case, the program behaves the same way if the variable a is not static. For this reason, we can replace static int by just int.
Now we see that a is no longer read after writing to b. We can replace the two variables a and b by one variable a_and_b.
If the "optimizing" compiler does this, you will see some message "variable does not exist" in the debugger if you stop your program in line "6" and want to read the value of a.
You may say that the variable a has been "freed" in line "5".
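To make that argument concrete, here is a runnable version of someFunction() in both forms. func7(), func8() and func9() are hypothetical stand-ins for the answer's unspecified functions, and someFunction_merged() is a hypothetical name for the transformed version. Because a is always written before it is read, both versions return the same values, which is what lets an optimizer treat the static as an ordinary local.

```c
/* Hypothetical stand-ins for the answer's func7/func8/func9. */
static int func7(void)  { return 5; }
static int func8(int v) { return v * 2; }
static int func9(void)  { return 1; }

/* Original form: a is static, but it is assigned before every read,
   so no value observably carries over from one call to the next. */
int someFunction(void)
{
    static int a;
    int b;
    a = func7();
    b = func8(a);         /* last read of a (line "5") */
    return b + func9();   /* a is dead here (line "6") */
}

/* Equivalent form: a and b merged into one automatic variable. */
int someFunction_merged(void)
{
    int a_and_b;
    a_and_b = func7();
    a_and_b = func8(a_and_b);
    return a_and_b + func9();
}
```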

Are there valid reasons to declare variables static inside the main() function of a C program?

I am aware of the various meanings of the static keyword in C, but my question is more specific: Is there any valid reason to declare some variables in the main() function of an embedded C-language program as static?
Since we are talking about variables declared inside the braces of main(), the scope is already local to main(). As regards persistence, the main function is at the top of the call stack and can't exit as long as the program is running. Hence, on the face of it, there would seem to be no valid reason to declare using static inside of main().
However, I notice that an effect of using static declarations is to keep variables and arrays from being placed on the stack. In cases of a limited stack size, this could be important.
Another possible, but rather uncommon case is that of main() calling itself recursively, where you might need some variables to persist at different levels of the recursion.
Are there any other possible valid reasons to use static declarations of variables inside the body of a main() function?
.. valid reasons to use static declarations of variables inside the body of a main() function?
Initialization
int main(void) {
    static int a; // initialized to 0
    int b;        // uninitialized
    ...
}
In C, main() can be called like any other function (ref). So the usual issues about static variables apply.
int main(void) {
    static int c; // Only one instance
    int d;        // One instance per call
    ...
    main();
    ...
}
Memory location. Different compilers organize main()'s variables differently. This is not specified by C, so it is a compiler-dependent issue.
A variable has three attributes besides type:
Scope (visibility)
Life-time
Location
Selection of these attributes is best done according to the semantics of the code. While a static variable in main() may appear to have the same scope and lifetime as a non-static one, and so to differ only in location, it does not have the same semantics. If, as is likely during development, you decide to move or reorganise the code in main() into a subroutine, the use of static will cause that code to behave differently and potentially incorrectly.
My advice, therefore, is to apply the same criteria for using static in main() as you would in any other function, and not to treat it as a special case that is semantically identical to a non-static. Treating it as a special case does not lead to reusable or maintainable code.
You have to partition the memory into stack, heap and static space in any case. If a stack variable (such as a local in main()) lives for the entire process anyway, keeping it on the stack rather than in static space just robs Peter to pay Paul: you use more stack in exchange for less static space. So it is not really an argument, except that insufficient memory then becomes a build-time issue rather than a run-time one. If that is a serious concern, you might consider making all or most variables static as a matter of course (which necessarily precludes reentrancy, and thus recursion and multi-threading). If stack space were truly an issue, recursion would be a bad idea in any case; it is a bad idea in most cases already, and certainly recursion of main().
I see a practical reason to declare a static variable inside main (with most compilers on most desktop or laptop operating systems): a local variable inside main consumes space on the call stack.
A static variable consumes space outside the call stack (generally in the .bss or .data segment, at least in ELF executables).
That would make a big difference if that variable takes a lot of space (think of an array of million integers).
The stack space is often limited (to one or a few megabytes) on current (desktop, laptop, tablet) systems.
On some embedded processors, the stack is limited to less than a kilobyte.
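A sketch of the "array of a million integers" case, assuming a typical desktop toolchain where static objects are placed in .bss rather than on the stack. The function name fill_and_sum() and the TABLE_SIZE macro are hypothetical. With static, the table (about 4 MB if int is 4 bytes) is not part of the function's stack frame; declared as a plain local, the same array could exceed a stack limit of a few megabytes.

```c
#define TABLE_SIZE 1000000

/* Fills the first n entries of a large table and returns their sum,
   or -1 if n is out of range. Declaring the array static gives it
   static storage duration, so on common toolchains it lives in .bss,
   not in this function's stack frame. The trade-off: there is only
   one copy, so the function is not reentrant. */
long long fill_and_sum(int n)
{
    static int table[TABLE_SIZE];
    long long sum = 0;

    if (n < 0 || n > TABLE_SIZE)
        return -1;
    for (int i = 0; i < n; i++)
        table[i] = i;
    for (int i = 0; i < n; i++)
        sum += table[i];
    return sum;
}
```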
I think you have said it. First, it makes them static, which for main() is not very interesting; second, it makes them what I call local globals: it essentially puts them in .data with the globals (and not on the stack), but restricts their scope of access like a local. So for main() I guess the reason is to save some stack. If you read around Stack Overflow, though, it seems that some compilers put a big stack frame on main() anyway, which most of the time doesn't make sense, so are you really saving any space? Likewise, if they are on the stack in main(), then unless you recursively call main() they take no more and no less space than they would in .data. If they are optimized into registers, you still burn the .data space where you wouldn't have burned stack space, so it could cost you a little space.
The result of trying to indirectly access an object with automatic storage duration, i.e. a local variable, from a thread other than the one with which the object is associated is implementation-defined (see: 6.2.4 Storage duration of objects, p5).
If you want your code to be portable you should only share with threads objects defined with static, thread, or allocated storage duration.
In this example, the automatic object defined in the main should have been defined with static:
int main( void )
{
CreateThreads();
type object = { 0 }; //missing static storage-class specifier
ShareWithThreads( &object );
}
As already mentioned, the main reason would be to save stack space and also give you a better idea of the actual stack size need of your program.
Another reason to declare such variables static would be program safety. There are various embedded system design rules that need to be considered here:
The stack should always be memory-mapped so that it grows towards invalid memory and not towards .bss or .data. That means that in case of stack overflow, there is a chance for the program to raise an exception, rather than going Toyota all over your static variables.
For similar reasons, you should avoid declaring large chunks of data on the stack, as that makes the code more prone to stack overflows. This applies particularly to small, memory-constrained microcontroller systems.
Another possible, but rather uncommon case is that of main() calling itself recursively
That's nonsense, no sane person would ever write such code. Using recursion in an embedded system is very questionable practice and usually banned by coding standards. Truth is, there are very few cases where recursion makes sense to use in any program, embedded or not.
One reason is that the offset of a static variable in the executable file is determined at link time and can be relied upon to be at the same place.
Therefore, a useful purpose of a static variable at the main level is to include program version data as readable strings or binary readable data that later can be used to analyze/organize executables without needing to have the original source code or any program-specific utilities, other than a hex dump program. We used this hack back in the day when deploying programs where the target system could not be relied upon to have any development tools.
For example,
#include <stdio.h>

int main(void)
{
    // tag to search for revision number
    static char versionID[3] = {'V','E','R'};
    // statically embedded program version is 2.1
    static unsigned char major_revision = 2;
    static unsigned char minor_revision = 1;
    // note: an optimizing compiler may discard statics that are never used
    printf("\nHello World");
    return 0;
}
Now the version of the program can be determined without running it:
$ od -c -x hello-world | grep "V E R"
0010040 001 002 V E R G C C : ( U b u n t
Actually, there's an enormous reason!
If you declare a variable as static within main() (or, any function ...), you are declaring that this local variable is also static.
What this means is that, if this function calls itself ... (which main() probably wouldn't do, although it certainly could ...) ... each recursive instance would see the same value, because this variable isn't being allocated on the stack. (Because, ummm, "it is static!")
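A small sketch of that difference, using a hypothetical recursive function descend() rather than main() itself: the static counter is one object shared by every level of the recursion, while each call gets its own copy of the automatic parameter depth.

```c
/* Hypothetical recursive function. calls is static: one object shared
   by every level of the recursion, and it keeps counting across
   separate top-level calls too. depth is automatic: each activation
   has its own copy. Returns the total number of calls made so far. */
int descend(int depth)
{
    static int calls;   /* zero-initialized once, never reset */

    calls++;
    if (depth > 0)
        descend(depth - 1);
    return calls;
}
```

Calling descend(3) runs four activations that all increment the same counter, so it returns 4; a later descend(0) returns 5.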
"Stack size" is not a valid reason to use static. (In fact, it has nothing to do with it.) If you're concerned about having room "on the stack" to store something, the correct thing to do is to "store it in the heap, instead," using a pointer variable. (Which, like any variable, could be static, or not.)

How does memory allocation for dynamic variables work?

When a function is called, a space in memory is reserved for local variables (formal parameters and those declared within the function's scope).
I understand that this is the case in ANSI C, because there the variables must be declared at the beginning of a block.
However, in the case of the following C code compiled with GCC, will the variable z have its space allocated at the beginning of the block, or only when y is equal to 42?
#include <stdio.h>

void foo(int x) {
    int y;
    scanf("%d%*c", &y);
    if (y != 42)
        return;
    int z;
    return;
}
Is the behavior the same for other higher level languages such as Python and Ruby, with similar code?
This is typically implemented by reserving space on the stack for all variables that are declared in the method. It would certainly be possible to do it dynamically, but that would require each "potential" variable to internally be represented as a pointer (since its address cannot be known in advance), and the overhead would almost certainly not be worth it. If you really want "dynamic" variables, you can implement it yourself with pointers and dynamic memory allocation.
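A sketch of that do-it-yourself approach applied to the question's foo(): z gets storage only on the y == 42 path. The function name alloc_z_if() and the choice to hand the pointer back to the caller are hypothetical, made so the behavior can be checked.

```c
#include <stdlib.h>

/* Storage for z is requested only when y == 42, unlike a declared
   local, whose stack slot is typically reserved when the function is
   entered. Returns NULL when no storage was needed (or malloc
   failed); the caller must free() a non-NULL result. */
int *alloc_z_if(int y)
{
    int *z;

    if (y != 42)
        return NULL;
    z = malloc(sizeof *z);
    if (z != NULL)
        *z = 0;   /* malloc does not initialize; do it explicitly */
    return z;
}
```

The price is exactly what the answer describes: an extra pointer, a heap allocation, and a free() that the caller must not forget.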
Java and C# do the same thing: they reserve space for the total collection of local variables.
I don't really know about Python or Ruby, but in these languages, there is no such thing as a primitive data type: all values are references and stored on the heap. As such, it is entirely possible that the storage space for the value referred to by a variable won't appear until the variable "declaration" is executed (although "declaration" isn't really a thing in dynamic languages; it's more of an assignment to a variable that happens not to exist yet). Note, though, that the variable itself also requires storage space (it's a pointer, after all); however, the variables of dynamic languages are often implemented as hashmaps, so the variables themselves may also dynamically appear and disappear.

Why must we initialize a variable before using it? [duplicate]

Possible Duplicate:
What happens to a declared, uninitialized variable in C? Does it have a value?
Now I'm reading Teach Yourself C in 21 Days. In chapter 3, there is a note like this:
DON'T use a variable that hasn't been initialized. Results can be unpredictable.
Please explain to me why this is the case. The book provides no further clarification.
Because, unless the variable has static storage duration, its initial value is indeterminate. You cannot rely on it being anything, as the standard does not define it. Even statically allocated variables should be initialized, though. Just initialize your variables and avoid a potential headache in the future. There is no good reason not to initialize a variable, and plenty of good reasons to do so.
On a side note, don't trust any book that claims to teach you X programming language in 21 days. They are lying, get yourself a decent book.
When a variable is declared, it refers to a piece of memory.
Accessing the value of the variable will give you the contents of that piece of memory.
However until the variable is initialised, that piece of memory could contain anything. This is why using it is unpredictable.
Other languages may assist you in this area by initialising variables automatically when you declare them, but as a C programmer you are working with a fairly low-level language that makes no assumptions about what you want to do with your program. You as the programmer must explicitly tell the program to do everything.
This means initialising variables, but it also means a lot more besides. For example, in C you need to be very careful to de-allocate any resources you allocate once you're finished with them. Other languages will automatically clean up after you; in C, if you forget, you'll just end up with memory leaks.
C will let you do a lot of things that would be difficult or impossible in other languages. But this power also means that you have to take responsibility for the housekeeping tasks that you take for granted in those other languages.
You might end up using a variable declared outside the scope of your current function.
Consider the following:
int count = 0;
while (count < 500)
{
    doFunction();
    count++;
}
...
void doFunction(void) {
    count = sizeof(someString);   // no local count here: this overwrites the outer one
    printf("%d\n", count);
}
In C, using the values of uninitialized automatic variables (non-static locals and parameter variables) that satisfy the requirement for being a register variable (that is, whose address is never taken) is undefined behavior, because such variables' values may have been fetched from a register, and certain platforms may abort your program if you read such uninitialized values. This includes variables of type unsigned char (this was added in a later C standard, to accommodate these platforms).
Using values of uninitialized automatic variables that do not satisfy the requirement for being a register variable, like variables that have their addresses taken, is fine as long as the C implementation you use hasn't got trap representations for the variable type you use. For example if the variable type is unsigned char, the Standard requires all platforms not to have trap representations stored in such variables, and a read of an indeterminate value from it will always succeed and is not undefined behavior. Types like int or short don't have such guarantees, so your program may crash, depending on the C implementation you use.
Variables of static storage duration are always initialized if you don't do it explicitly, so you don't have to worry here.
For variables of allocated storage duration (malloc ...), the same applies as for automatic variables that don't satisfy the requirements for being a register variable, because in these cases the affected C implementations need to make your program read from memory, and won't run into the problems in which register reads may raise exceptions.
In C the value of an uninitialized variable is indeterminate. This means you don't know its value and it can differ depending on platform and compiler. The same is also true for compounds such as struct and union types.
Why? Sometimes you don't need to initialize a variable as you are going to pass it to a function that fills it and that doesn't care about what is in the variable. You don't want the overhead of initialization imposed upon you.
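A common instance of that pattern, sketched with a hypothetical helper parse_pair() built on sscanf(): the caller's variables need no initializer because the callee writes them before anything reads them. The caller must still check the return value, since on failure they may be left unwritten.

```c
#include <stdio.h>

/* Reads two ints from s into *x and *y. Returns 1 on success, 0 on
   failure; on failure *x and *y may not have been written, so the
   caller must not read them. */
int parse_pair(const char *s, int *x, int *y)
{
    return sscanf(s, "%d %d", x, y) == 2;
}
```

Usage: declare int x, y; without initializers, call parse_pair("3 4", &x, &y), and read x and y only if it returned 1.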
How? Primitive types can be initialized from literals or function return values.
int i = 0;
int j = foo();
Structures can be zero-initialized with the aggregate initializer syntax:
struct Foo { int i; int j; };
struct Foo f = {0};
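Why {0} clears the whole struct: the 0 initializes the first member, and every member without an explicit initializer is then initialized as if it had static storage duration, i.e. to zero or a null pointer. A sketch; struct Foo comes from the answer, while struct Bar and bar_is_zeroed() are hypothetical, added to show nested, floating-point, pointer and array members.

```c
#include <stddef.h>

struct Foo { int i; int j; };

/* Hypothetical richer struct with nested, pointer and array members. */
struct Bar {
    struct Foo inner;
    double ratio;
    const char *name;
    int values[4];
};

/* Returns 1 if a {0}-initialized struct Bar is all zeros/nulls. */
int bar_is_zeroed(void)
{
    struct Bar b = {0};   /* 0 goes to b.inner.i; every remaining
                             member is implicitly zero-initialized */
    for (int k = 0; k < 4; k++)
        if (b.values[k] != 0)
            return 0;
    return b.inner.i == 0 && b.inner.j == 0
        && b.ratio == 0.0 && b.name == NULL;
}
```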
At the risk of being pedantic, the statement
DON'T use a variable that hasn't been initialized.
is incorrect. It would be better expressed as:
Do not use the value of an uninitialised variable.
The language distinguishes between initialisation and assignment, so the first warning is, in that sense over cautious - you need not provide an initialiser for every variable, but you should have either assigned or initialised a variable with a useful and meaningful value before you perform any operations that subsequently use its value.
So the following is fine:
int i ; // uninitialised variable
i = some_function() ; // variable is "used" in left of assignment expression.
some_other_function( i ) ; // value of variable is used
even though when some_function() is called i is uninitialised. If you are assigning a variable, you are definitely "using" it (to store the return value in this case); the fact that it is not initialised is irrelevant because you are not using its value.
Now if you adhere to
Do not use the value of an uninitialised variable.
as I have suggested, the reason for this requirement becomes obvious: why would you take the value of a variable without knowing that it contained something meaningful? A valid question might then be "why does C not initialise auto variables with a known value?", and possible answers to that would be:
Any arbitrary compiler supplied value need not be meaningful in the context of the application - or worse, it may have a meaning that is contrary to the actual state of the application.
C was deliberately designed to have no hidden overhead, due to its roots as a systems programming language: initialisation is only performed when explicitly coded, because it requires additional machine instructions and CPU cycles.
Note that static variables are always initialised to zero. .NET languages, such as C#, have the concept of a null value, or a variable that contains nothing, and that can be explicitly tested and even assigned. A variable in C cannot contain nothing, but what it contains can be indeterminate, and therefore code that uses its value will behave non-deterministically.

Why don't I get 0, 1, 2 as output?

I made a small test to see how the stack is used in a C application.
#include <stdio.h>
void a(void)
{
int a = 0;
}
void b(void)
{
int b;
printf("%i\n", b++);
}
int main(void)
{
a();
b();
b();
b();
fflush(stdin), getc(stdin);
return 0;
}
Isn't b allocated in the same place on the stack where a was? I would expect the output to be 0 1 2, but instead I get the same garbage value three times. Why is that?
About the only way to get a definitive answer about why you're getting what you're getting is to get the assembly language output from the compiler and see what it's doing. At a guess, the entirety of a() is being removed as dead code, and b is probably being allocated in a register, so even if a had been allocated and initialized, there'd still be a fair chance that b wouldn't end up in the same storage.
From a viewpoint of the language, there's not really any answer -- your code simply has undefined behavior from using an uninitialized variable. Just to add insult to injury, your fflush(stdin) also causes undefined behavior, so even if the rest of the code made some sense, you still wouldn't have any guarantee about what output it would produce.
I might guess that you got all zeroes as output, but this is not necessarily the case. In function b, you are declaring a new int b which is created for each execution of the code. b is uninitialized in your code, but some compilers will zero this value. This is not standard, and should NOT be counted on. You should always initialize your variables.
As far as the stack goes, that is implementation-specific and dependent upon the compiler, optimizer settings, etc. There is no guarantee that this variable ever lives on the stack. Chances are it does not: given the short duration of its scope, it may just live in a CPU register.
In the above code b and a are completely independent variables, and thus should not be counted on to have the same value, even if they are stored in the same memory location.
What you are doing is invoking undefined behaviour, except in C99 where the value is indeterminate. Either way you don't know exactly what will happen.
There are no guarantees about the condition of the stack when you leave a method, nor what value uninitialized variables will have.
The assignment that is part of b++ in your function b() need not be performed by the compiler, since b is not read afterwards. But what is more important here is what happens if you don't have an initializer:
The initial value of the object is indeterminate.
that's it. (Not UB, as others say.) The compiler is free to implement this in any way it likes.
NB: The word "stack" doesn't appear anywhere in the C standard. While this is a convenient concept for implementing auto variables in C, there is no obligation for the compiler to use it for a given variable, and in particular there is no obligation at all to store a variable in memory. It can well hold all variables in registers, if the platform allows for that. So if you look at the assembler that is produced for a(), you will most probably see nothing but an empty return.
Have you read these slides already?
http://www.slideshare.net/olvemaudal/deep-c
There is some discussion about stack behaviour like this. Of course, you can never rely on the value in an auto variable. The compiler is free to put these variables in registers, or on the stack.
B's value is indeterminate. You can't learn anything by running this program.
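If the goal was the output 0, 1, 2, the portable way is to give b static storage duration instead of relying on leftover stack contents. A sketch; next_b() returns the value rather than printing it so it can be checked, and both next_b() and b_fixed() are hypothetical names.

```c
#include <stdio.h>

/* b has static storage duration: it is initialized once (to 0)
   before the program starts and keeps its value between calls. */
int next_b(void)
{
    static int b = 0;
    return b++;
}

/* Drop-in replacement for the question's b(): successive calls
   print 0, then 1, then 2. */
void b_fixed(void)
{
    printf("%i\n", next_b());
}
```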
