When variable will free in C program - c

In dynamic memory allocation, the memory which is used by malloc() and calloc() can be freed by using free().
In static memory allocation, when is the variable in main() freed from the program?
In a tutorial, I learned that when the whole programs is finished then after all variables are freed from RAM.
But someone tells me that when the program is long enough and variable is used early and after that if the variable has no use in the whole code then compiler will automatically free the variable before the end of program.
Can someone please clarify if both statements are correct or not?

The language guarantees that the lifetime of a static storage duration variable is the whole program. So it can be safely accessed at any time.
That being said, the standard only requires the code produced by a conformant compiler to behave as if all language rules were observed. That means that for optimization purposes a compiler is free to release the memory used by a static variable if it knows that the variable will not be used passed a point. Said differently it is neither required nor forbidden and as a programmer you should not even worry for that except for very specific low level optimization questions.
Example:
...
int main() {
static int arr[10240];
// uses arr
..
// no uses of arr passed this point - called B
...
}
The program shall behave as is the array arr existed from the beginning of the program until its end. But as long as arr is not used past point B, the compiler may reuse its memory, because there will be no change in any observable state.
This in only a low level optimization point allowed (but not required) by the as if rule.

As you can see in this question: Why does C not require a garbage collector?
Stephen C, the author of the answer says that:
The culture of these languages (C and C++) is to leave storage
management to the programmer.
Would the correct answer be when the process is terminated all memory used is freed? I think yes. C compiler does not look for garbage or non reachable variables, this is a progammer work.
But I have read that C or C++ garbage collectors exists like Java, they can be useful but remember, the implementation will be slower.
Again, I recommend to you read the question I have attached at the beggining for more information.

In a tutorial, I learned that ...
This tutorial is talking about what you as a programmer can see (unless you are debugging your program):
If you write a program, you can rely on the fact that you can read a static variable until the program has finished.
But someone tells me ...
This person is talking about what is really happening in the background (not visible to the programmer unless you are debugging) when you use an "optimizing" compiler.
... when is the variable ... freed from the program?
We have to distinguish between three types of variables:
Local variables
Local variables reside on the stack. When the function returns, all memory allocated on the stack during the execution of the function is automatically freed.
For this reason, local variables are freed when the function returns or even earlier.
In the following function:
void myFunction(void)
{
int a, b;
a = func1(); /* Line "1" */
func2();
func3(a); /* Line "2" */
b = func4(); /* Line "3" */
func5();
func6(b); /* Line "4" */
}
... an "optimizing" C compiler will detect that the variable a is not needed any longer when the variable b is set. For this reason, it may allocate only enough memory for one int variable. This is as if you only defined one variable (a_and_b) and the compiler replaced both a and b by a_and_b:
int a_and_b;
a_and_b = func1();
...
func6(a_and_b);
If you debug the program, the debugger will tell you that variable b does not exist when debugging lines "1"-"2"; and it will tell you that variable a does not exist when debugging lines "3"-"4".
For this reason, you might say that variable a is "freed" after line "2".
Global variables
Global variables reside in the .data or in the .bss section.
On modern desktop computers (on microcontrollers and on MS-DOS computers it is different) the operating system allocates this memory before the program is started and the operating system frees this memory after the program has finished.
Theoretically, it might be possible to optimize global variables the same way as local variables; this means: To use the same memory for two different global variables if one variable is set the first time after the other one is no longer needed.
However, this would be very complicated because a global variable can be accessed from any C source file in the project. For this reason, I doubt that many compilers and linkers have such a feature.
Static variables
Variables marked with the keyword static are normally also stored in .data and .bss sections - just like global ones.
However, because they can only be accessed from one C source file (or even from only one function), it is much easier to detect that such a variable is no longer used at a certain point in time.
For this reason, a compiler may optimize a static variable the same way as a local variable (so the memory is shared between two variables) or even replace a static variable by a local one (on the stack).
One Example:
int someFunction()
{
static int a;
int b;
a = func7();
b = func8(a); /* Line "5" */
return b + func9(); /* Line "6" */
}
In this case, the program behaves the same way if the variable a is not static. For this reason, we can replace static int by just int.
Now we see that a is no longer read after writing to b. We can replace the two variables a and b by one variable a_and_b.
If the "optimizing" compiler does this, you will see some message "variable does not exist" in the debugger if you stop your program in line "6" and want to read the value of a.
You may say that the variable a has been "freed" in line "5".

Related

How stack structure works with compound statements inside a function?

I'm trying to learn c programming and can't understand how stacks work.
Everywhere I read I find that when a function is called stack frame is created in the stack which contains all the data for the function call- parameters, return address and local variables. And the stack frame is removed releasing the memory when the function returns.
But what if we had a compound statement inside the function which have its own variables. Is the memory for the local variables for block is also allocated inside the stack frame when the function call and released when it returns.
Example
int main(){
int a = 10;
if(int a<50){
int b=9;
}
else{
int c=10;
}
}
Is the memory for b and c is allocated with a when the function starts executing?
And deallocated when the function returns?
If so than there is no difference other than the visibility of the variable when declaring it in the beginning of the function or inside a another block in the function.
Please explain.
The C standard doesn't specify how such things are to be implemented. The C standard doesn't even mention a stack! A stack is a common way of implementing function calls but nothing in the standard requires a stack. All such things are implementation specific details. For the posted code, the standard only specifies when the variables are in scope.
So there is no general answer to your question. The answer depends on your specific system, i.e. processor, compiler, etc.
Provided that your system uses a stack (which is likely), the compiler may reserve stack space for all 3 variables or it may reserve space for 2 variables, i.e. one for awhile b and c share the other. Both implementations will be legal. The compiler is even allowed to place the variables directly in some registers so that nothing needs to be reserved on the stack.
You can check your specific system by looking at the generated assembly code.
A C implementation may implement this in multiple ways. Let’s suppose your example objects, a, b, and c, are actually used in your code in some way that results in the compiler actually allocating memory for them and not optimizing them away. Then:
The compiler could allocate stack space (by decreasing the top-of-stack pointer) for all of a, b, and c when the function starts, and release it when the function ends.
The compiler could allocate stack space for a when the function starts, then allocate space (again by decreasing the stack pointer) in the middle of the function when space for b or c is needed, then release that stack space as each block ends.
In a good modern compiler, the compiler is likely to analyze all the active lifetimes of the objects and find a somewhat optimal solution for using stack space in overlapping ways. By “active lifetime”, I mean the time from when the value of an object is set to the last time that value is needed (not the C standard’s definition of “lifetime”). For example, in int a = f(x); … g(a); h(y); a = f(y); … g(a);, there are actually two lifetimes for a, from its initial assignment to the first g(a) and from the assignment a = f(y); to the second g(a);. If the compiler needs memory to store a, it might use different memory for these two lifetimes.
Because of the above, what memory is used for which C object can get quite complicated. A particular memory location might be used for a at one time and for b at another. It may depend on loops and goto statements in your code. It also depends on whether the address of an object is taken—if the address is taken, the compiler may have to keep the object in one place, so that the address is consistent. (It might be able to get away without doing that, depending on how it can see the address is used.)
Basically, the compiler is free to use the stack, other memory, and registers in whatever way it chooses as long as the observable behavior of your program remains as it is defined by the C standard.
(The observable behavior is the input/output interactions of your program, the data written to files, and the accesses to volatile objects.)
Your example as stated is not valid since you have no brackets in the if-else statement. However, in the example below all variables are typically allocated when the function is entered:
int main(void)
{
int a = 10;
if (a < 50) {
int b = 9;
} else {
int c = 10;
}
}
As mentioned by user "500 - Internal Server Error", this is an implementation issue.

Are there valid reasons to declare variables static inside the main() function of a C program?

I am aware of the various meaning of the static keyword in C, but my question is more specific: Is there any valid reason to declare some variables in the main() function of an embedded C-language program as static?
Since we are talking about variables declared inside the braces of main(), the scope is already local to main(). As regards persistence, the main function is at the top of the call stack and can't exit as long as the program is running. Hence, on the face of it, there would seem to be no valid reason to declare using static inside of main().
However, I notice that an effect of using static declarations is to keep variables and arrays from being placed on the stack. In cases of a limited stack size, this could be important.
Another possible, but rather uncommon case is that of main() calling itself recursively, where you might need some variables to persist at different levels of the recursion.
Are there any other possible valid reasons to use static declarations of variables inside the body of a main() function?
.. valid reasons to use static declarations of variables inside the body of a main() function?
Initialization
int main(void) {
static int a; // initialized to 0
int b; // uninitialized
In C, main() call be called. ref So the usual issues about static variables apply.
int main(void) {
static int c; // Only one instance
int d; // One instance per call
...
main();
Memory location. Various compilers will organize main()'s variable. It is not specified by C, so a compiler dependent issue.
A variable has three attributes besides type:
Scope (visibility)
Life-time
Location
Selection of these attributes is best done according to the semantics of the code. While while a static variable in main() may appear to have the same scope and lifetime as a non-static so differ only in location, it does not have the same semantics. If - as is likely under development - you decide to move or reorganise the code in main() into a sub-routine, the use of static will cause such code to behave differently and potentially incorrectly.
My advice therefore is that you use the same determination of the use of static in main() as you would any other function and not treat it as a special case semantically identical to a non-static. Such an approach does not lead to reusable or maintainable code.
You have to partition the memory into stack, heap and static space in any case, if you have a stack variable with a lifetime of the entire process you rob Peter to pay Paul by using more stack in exchange for less static space, so it is not really an argument, except for the fact that insufficient memory is then a build-time issue rather then run-time, but if that is a serious concern, you might consider making all or most variables static as a matter of course (necessarily precluding reentrancy, and thus recursion and multi-threading). If stack space were truly an issue, recursion would be a bad idea in any case - it is a bad idea in most cases already, and certainly recursion of main().
I see a practical reason to declare a static variable inside  main (with most compilers on most desktop or laptop operating systems): a local variable inside main is consuming space on the call stack.
A static variable consumes space outside the call stack (generally in the .bss or .data segment, at least in ELF executables)
That would make a big difference if that variable takes a lot of space (think of an array of million integers).
The stack space is often limited (to one or a few megabytes) on current (desktop, laptop, tablet) systems.
On some embedded processors, the stack is limited to less than a kilobyte.
I think you have said it. It one makes them static which for main() is not very interesting, second it makes them what I call local globals, it essentially puts them in .data with the globals (and not on the stack), but restricts their scope of access like a local. So for main() I guess the reason is to save some stack. If you read around stack overflow though it seems like some compilers put a big stack frame on main() anyway, which most of the time doesnt make sense, so are you really saving any space? Likewise if they are on the stack from main unless you recursively call main they are not taking any more or less space being in .data vs the stack. if they are optimized into registers, then you still burn the .data where you wouldnt have burned the stack space, so it could cost you a little space.
The result of trying to indirectly access an object defined with automatic storage duration, i.e. a local variable, in a different thread from which the object is associated with, is implementation defined (see: 6.2.4 Storage duration of objects, p5).
If you want your code to be portable you should only share with threads objects defined with static, thread, or allocated storage duration.
In this example, the automatic object defined in the main should have been defined with static:
int main( void )
{
CreateThreads();
type object = { 0 }; //missing static storage-class specifier
ShareWithThreads( &object );
}
As already mentioned, the main reason would be to save stack space and also give you a better idea of the actual stack size need of your program.
Another reason to declare such variables static would be program safety. There are various embedded system design rules that need to be considered here:
The stack should always be memory-mapped so that it grows towards invalid memory and not towards .bss or .data. That means that in case of stack overflow, there is a chance for the program to raise an exception, rather than going Toyota all over your static variables.
For similar reasons, you should avoid declaring large chunks of data on the stack, as that makes the code more prone to stack overflows. This applies particularly to small, memory-constrained microcontroller systems.
Another possible, but rather uncommon case is that of main() calling itself recursively
That's nonsense, no sane person would ever write such code. Using recursion in an embedded system is very questionable practice and usually banned by coding standards. Truth is, there are very few cases where recursion makes sense to use in any program, embedded or not.
Once reason is that the offset of a static variable in the executable file is determined at link time and can be relied upon to be at the same place.
Therefore, a useful purpose of a static variable at the main level is to include program version data as readable strings or binary readable data that later can be used to analyze/organize executables without needing to have the original source code or any program-specific utilities, other than a hex dump program. We used this hack back in the day when deploying programs where the target system could not be relied upon to have any development tools.
For example,
int main(void)
{
// tag to search for revision number
static char versionID[3] = {'V','E','R'};
// statically embedded program version is 2.1
static unsigned char major_revision = 2;
static unsigned char minor_revision = 1;
printf("\nHello World");
return 0;
}
Now the version of the program can be determined without running it:
$ od -c -x hello-world | grep "V E R"
0010040 001 002 V E R G C C : ( U b u n t
Actually, there's an enormous reason!
If you declare a variable as static within main() (or, any function ...), you are declaring that this local variable is also static.
What this means is that, if this function calls itself ... (which main() probably wouldn't do, although it certainly could ...) ... each recursive instance would see the same value, because this variable isn't being allocated on the stack. (Because, ummm, "it is static!")
"Stack size" is not a valid reason to use static. (In fact, it has nothing to do with it.) If you're concerned about having room "on the stack" to store something, the correct thing to do is to "store it in the heap, instead," using a pointer variable. (Which, like any variable, could be static, or not.)

Organization of Virtual Memory in C

For each of the following, where does it appear to be stored in memory, and in what order: global variables, local variables, static local variables, function parameters, global constants, local constants, the functions themselves (and is main a special case?), dynamically allocated variables.
How will I evaluate this experimentally,i.e., using C code?
I know that
global variables -- data
static variables -- data
constant data types -- code
local variables(declared and defined in functions) -- stack
variables declared and defined in main function -- stack
pointers(ex: char *arr,int *arr) -- data or stack
dynamically allocated space(using malloc,calloc) -- heap
You could write some code to create all of the above, and then print out their addresses. For example:
void func(int a) {
int i = 0;
printf("local i address is %x\n", &i);
printf("parameter a address is %x\n", &a);
}
printf("func address is %x\n", (void *) &func);
note the function address is a bit tricky, you have to cast it a void* and when you take the address of a function you omit the (). Compare memory addresses and you will start to get a picture or where things are. Normally text (instructions) are at the bottom (closest to 0x0000) the heap is in the middle, and the stack starts at the top and grows down.
In theory
Pointers are no different from other variables as far as memory location is concerned.
Local variables and parameters might be allocated on the stack or directly in registers.
constant strings will be stored in a special data section, but basically the same kind of location as data.
numerical constants themselves will not be stored anywhere, they will be put into other variables or translated directly into CPU instructions.
for instance int a = 5; will store the constant 5 into the variable a (the actual memory is tied to the variable, not the constant), but a *= 5 will generate the code necessary to multiply a by the constant 5.
main is just a function like any other as far as memory location is concerned. A local main variable is no different from any other local variable, main code is located somewhere in code section like any other function, argc and argv are just parameters like any others (they are provided by the startup code that calls the main), etc.
code generation
Now if you want to see where the compiler and runtime put all these things, a possibility is to write a small program that defines a few of each, and ask the compiler to produce an assembly listing. You will then see how each element is stored.
For heap data, you will see calls to malloc, which is responsible for interfacing with the dynamic memory allocator.
For stack data, you will see strange references to stack pointers (the ebp register on x86 architectures), that will both be used for parameters and (automatic) local variables.
For global/static data, you will see labels named after your variables.
Constant strings will probably be labelled with an awful name, but you will notice they all go into a section (usually named bss) that will be linked next to data.
runtime addresses
Alternatively, you can run this program and ask it to print the addresses of each element. This, however, will not show you the register usage.
If you use a variable address, you will force the compiler to put it into memory, while it could have kept it into a register otherwise.
Note also that the memory organization is compiler and system dependent. The same code compiled with gcc and MSVC may have completely different addresses and elements in a completely different order.
Code optimizer is likely to do strange things too, so I advise to compile your sample code with all optimizations disabled first.
Looking at what the compiler does to gain size and/or speed might be interesting though.

How a pointer initialization C statement in global space gets its assigned value during compile/link time?

The background of this question is to understand how the compiler/linker deals with the pointers when it is initialized in global space.
For e.g.
#include <stdio.h>
int a = 8;
int *p = &a;
int main(void) {
printf("Address of 'a' = %x", p);
return 0;
}
Executing the above code prints the exact address for a.
My question here is, during at which process (compile? or linker?) the pointer p gets address of a ? It would be nice if your explanation includes equivalent Assembly code of the above program and how the compiler and linker deals with pointer assignment int *p = &a; in global space.
P.S: I could find lot of examples when the pointer is declared and initialized in local scope but hardly for global space.
Thanks in advance!
A module is linked (often named crt0.o) along with your program code, which is responsible for setting up the environment for a C program. There will be global and static variables initialized which is executed before main is called.
The actual address of the global variables are determined by the operating system, when it loads an executable and performs the necessary relocations so that the new process can be executed.
To run a program, the system has to load it into RAM. So it creates one huge memory block containing the actual compiled instructions. This block usually also contains a "data section" which contains strings etc. If you declare a global variable, what compilers usually do is reserve space for that variable in such a data section (there's usually several, non-writable ones for strings, and writable ones for globals etc.).
Whenever you reference the global, it just records the offset from the current instruction to that global. So an instruction can just calculate [current instruction address] + [offset] to get at the global, wherever it ended up being loaded. Since space in the data section has been reserved in the file anyway, they can write any (constant) value in there you want, and it will get loaded with the rest of the code.
This is how it works in C, and is why C only allows constants. C++ works like Devolus wrote, where there is extra code that is run before main(). Effectively they rename the main function and give you a function that does the setup, then calls your main function. This allows C++ to call constructors.
There are also some optimizations like, if a global is initialized to zero, it usually just gets an offset in a "zero" section that doesn't exist in the file. The file just says: "After this code, I want 64 bytes of zeroes". That way, your file doesn't waste space on disk with hundreds of "empty" bytes.
It gets a tad more complicated if you have dynamically loaded libraries (dylibs or DLLs), where you have two segments loaded into separate memory blocks. Since neither knows where in RAM the other one ended up, the executable file contains a list of name -> offset mappings. When you load a library, the loader looks up the symbol in the (already loaded) other library and calculates the actual address at which e.g. the global is at, before main() is called (and before any of the constructors run).

Why are static variables auto-initialized to zero? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why global and static variables are initialized to their default values?
What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?
It is required by the standard (§6.7.8/10).
There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.
Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:
int foo() {
static int *ptr;
if (NULL == ptr)
// initialize it
}
If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.
Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss and initialized at load time along with all your constants and other globals. Since the section of memory where they live is just copied straight from your executable, they're initialized to a value known at compile-time for free. The value that was chosen is 0.
Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.
The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.
Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.
Program code of course goes in the code (or text) segment but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in the block storage (bss) segment.
When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM based systems it will copy the value of initialized variables from the code (text) segment into their respective actual addresses in RAM. RAM (i.e. disk) based systems can load the initial values directly to the final RAM addresses.
The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
Note that since autovariables don't exist at program load time they can't be initialized by the runtime startup code.
Mostly because the static variables are grouped together in one block by the linker, so it's real easy to just memset() the whole block to 0 on startup. I to not believe that is required by the C or C++ Standards.
There is discussion about this here:
First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.
Take a look : here 6.2.4(3) and 6.7.8 (10)
Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.
Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.
If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.
I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.

Resources