If I understand right that global variables (which go into data segment) in C are initialized where auto variables (which go into stack) are not. or perhaps the other way round?
Why is it so? What is merit of compiler not initializing both kind of variables? Does it increase speed etc?
As you say, global variables go in the data segment, so their value is contained in the final executable, and it might as well be an initialised value as there is no performance difference either way.
On the other hand, local variables are allocated onto the stack, which is set up at run time, so initialising them would have a performance hit.
You understand right, global are initialized, auto are not. This is because globals are loaded directly from the program binary image and the initialization is "free", whereas auto are on stack, and code needs to run to change values and initialize them (i.e.: performance hit).
Related
I've actually seen some results by testing it, but I want to know which way is better and why.
Question #1: Do local variables get declared every time when I call that function again and again? I know that it is better to declare variables in the narrowest scope possible. But I can not stop myself thinking about declaring it as a global variable and make it get declared only once, not in every function call. Or, does it get declared again in every function call? I know that the scope of a local variable is only that function. So when it leaves that function, it must forget that variable as it is going out of its scope right?
Question #2: When I have some function variables which need to store its previous content(e.g. timer counter variables), which way is better: to declare them as a global variable or to declare them as a static local variable? I don't need them to get their initial values whenever I call that function, I am already setting them to zero or etc whenever I need.
Question #1: Do local variables get declared every time when I call that function again and again?
A1: Yes, but it's not an issue really.
Declaring a local variable means that space is made for that variable on the stack, within the stack frame of that function. Declaring a variable global means that space is made for that variable in the data section of the executable (if the variable is initialized), or the BSS section (if not).
Allocating on the stack comes at zero cost. At function entry, the stack frame is sized to make room for all local variables of the function. One more or less does not matter. Statically allocating (for a global variable) is a tad quicker, but you only get that one variable. This can become a huge issue at some later point, e.g. if you want to make your program multithreaded, your function re-entrant, or your algorithm recursive. It can also become a major hassle during debugging, wasting hours of unproductive time while you are hunting down that bug.
(This is the main point of it all: The performance difference is really negligible. The time you can waste on a suboptimal design riddled with globals, on the other hand, can be quite significant.)
Question #2: [...] which way is better: to declare them as a global variable or to declare them as a static local variable?
A2: From an architectural standpoint, avoid globals wherever possible. There are a few specific cases where they make sense, but you know them when you see them. If you can make it work without globals, avoid them. (The same is true, actually, for static locals. They are better than globals as they are limited in scope, and there are cases where they make sense, but local variables should really be the "default" in your mind.)
Global variable - declared at the start of the program, their global scope means they can be used in any procedure or subroutine in
the program
It is seldom advisable to use Global variables as they are liable to cause bugs, waste memory and can be hard to follow when tracing code. If you declare a global variable it will continue to use memory whilst a program is running even if you no longer need/use it.
Local variable - declared within subroutines or programming blocks, their local scope means they can only be used within the
subroutine or program block they were declared in
Local variables are initiated within a limited scope, this means they are declared when a function or subroutine is called, and once the function ends, the memory taken up by the variable is released. This contrasts with global variables which do not release memory.
Question #1: YES. Local variables get declared every time when you call that function again and again. After it leaves the function it forgots the variables that you declared in that scope. You must also remember that when some variable faced, the program will start to search for it. So when it is closer like declaring in the same scope, it will find faster and be able to continue. Also this will be more efficent while you are coding and will cause less bugs and mistakes etc.
Question #2: If you use the same variable with different functions, I strongly suggest you to declare them as global or define, this will lead the program to carry your "counter" with it. So it can be fastly use it when you need between the scopes you travel.
But after these conditions I must strongly suggest you to:
avoid globals wherever possible (as #DevSolar said)
Q2:
It is usually more preferable to use static variables in your function. The main reason is that since all functions can access global variables, it is very hard to keep track of and debug your program.
Q1:
Yes, local variables are created every time the function it belongs to is run, and deleted when the ends.
Suppose your program has 5 functions (that are rarely used), and each function uses 6 local variables. If you change them all to global variables, you will have all 30 variables taking up space for the entire duration of your program, instead of only have 5 variables occasionally being created and destroyed. Moreover, allocation does not really take much time.
I read some topics over optimization and it is mentioned that global variables can not be stored in registers and hence if we need to optimize we use register variable to store the global data and modify that register variable. Is this applies to static variables too?
For auto storage, what if we store auto variables in register variables? Won't it faster the access from register instead of stack?
Both global variables and static variables exist in the data segment, which includes the data, BSS, and heap sections. If the static variable is initialized to 0 or not initialized to anything, it goes in the BSS section. If it is given a non-zero initialization value, then it is in the "data" section. See:
http://en.wikipedia.org/wiki/Data_segment
As for auto vs. register variables: register does not guarantee that the variable will be stored in a register, it is more providing a hint from the programmer. See:
http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/CONCEPT/storage_class.html
Yes, it is (much) faster to access a register than to access the stack memory, but this optimization nowadays is left up to the compiler (the problem of register allocation) as well as the CPU architecture (which has a great many optimizations too complex to explain here).
Unless you're programming for a really simple or old architecture and/or using a really outdated compiler, you probably should not worry about this kind of optimization.
global variables' values can be held in registers for so long as the compiler can prove there is no other access to the stored value. With values that can't be held in a register themselves, declaring a pointer with the restrict keyword declares that a value isn't being accessed via any other means for that pointer's lifetime; just don't give away any copies and the compiler will take care of the rest. For scalars declaring thistype localval=globalval; works at least as well if you're not changing the value or you've got good control over scope exits -- or even better.
You can only use the restrict declaration if the value really won't be accessed otherwise. Optimizers these days can for instance deduce from your declaring the object won't be accessed in one function that a code path that does access it in another won't be executed, and from that deduce the content of the expression used to take that code path, and so on. "If you lie to the compiler, it will have its revenge" is more true today than ever.
Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer war:
Local Variables are stored in Stack.
Register variables are stored in Register
Global & static variables are stored in data segment.
The memory created dynamically are stored in Heap.
The next question he asked me was: why are they getting stored in those specific memory area? Why is the Local variable not getting stored in register (though I need an auto variable getting used very frequently in my program)? Or why global or static variables are not getting stored in stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope(areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not define where any variables are stored, actually. It does, however, define three storage classes: static, automatic, and dynamic.
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when they go out of scope; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and heap during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
It is actually just an implementation detail that is convenient.
The compiler could, if he wanted to, generate local variables on the heap if he wishes.
It is just easier to create them on the stack since when leaving a function you can adjust the frame pointer with a simple add/subtract depending on the growth direction of the stack and so automatically free the used space for the next function. Creating locals on the heap however would mean more house-keeping work.
Another point is local variables must not be created on the stack, they can be stored and used just in a register if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are stored in registers in most cases, because registers are pushed and poped from stack when you make function calls It looks like they are on stack.
There is actually no such tings as register variables because it is just some rarely used keyword in C that tells compiler to try to put this in registers. I think that most compilers just ignore this keyword.
That why asked you more, because he was not sure if you deeply understand topic. Fact is that register variables are virtually on stack.
in embedded systems we have different types of memories(read only non volatile(ROM), read write non volatile(EEPROM, PROM, SRAM, NVRAM, flash), volatile(RAM)) to use and also we have different requirements(cannot change and also persist after power cycling, can change and also persist after power cycling, can change any time) on data we have. we have different sections because we have to map our requirements of data to different types of available memories optimistically.
If i have a test.c file with the following
#include ...
int global = 0;
int main() {
int local1 = 0;
while(1) {
int local2 = 0;
// Do some operation with one of them
}
return 0;
}
So if I had to use one of this variables in the while loop, which one would be preferred?
Maybe I'm being a little vague here, but I want to know if the difference in time/space allocation is actually relevant.
If you are wondering whether declaring a variable inside a for loop causes it to be created/destroyed at every iteration, there is nothing really to worry about. These variables are not dynamically allocated at runtime, nothing is being malloced here - just some memory is being set aside for use inside the loop. So having the variable inside is just the same as having it outside the loop in terms of performance.
The real difference here is scope not performance. Whether you use a global or local variable only affects where you want this variable to be visible.
In case you're wondering about performance differences: most likely there aren't any. If there are theoretical performance differences, you'll find it hard to actually devise a test to measure them.
A decision like this should not be based on performance but semantics. Unless the semantic behavior of a global variable is required, you should always use automatic (local non-static) variables.
As others have said and surely will say, there are unlikely to be any differences in performance. If there are, the automatic variable will be faster.
The C compiler will have an easier time making optimizations on the variables declared local to the function. The global variable would require an optimizer to perform "Inter-Procedural Data Flow Analysis", which isn't that commonly done.
As an example of the difference, consider that all your declarations initialize the variable to zero. However, in the case of the global variable, the compiler cannot use that information unless it verifies that no flow of control in your program can change the global prior to using it in your example function. In the case of the locally declared ("automatic") variables, there is no way the initial value can be changed by another function (in particular, the compiler verifies that their address is never passed to a sub-function) and the compiler can perform "killed definitions" and "value liveness" analysis to determine whether the zero value can be assumed in some code paths.
Of the two local variables, as a guideline, the optimizer will always have an easier time optimizing access to the variable with the smaller (more limited) scope.
Having stated the above, I would suggest that other answers concerning a bias toward semantics over optimizer-meta-optimization is correct. Use the variable which causes the code to read best, and you will be rewarded with more time returned to you than assisting the def-use optimization calculation.
In general, avoid using a global variable, or any variable which can be accessed more broadly than absolutely necessary. Limited scoping of variables helps prevent bugs from being introduced during later program maintenance.
There are three broad classes of variables: static (global), stack (auto), and register.
Register variables are stored in CPU registers. Registers are very fast word-sized memories, which are integrated in the CPU pipeline. They are free to access, but there are a very limited number of them (typically between 8 and 32 depending on your processor and what operations you're doing).
Stack variables are stored in an area of RAM called the stack. The stack is almost always going to be in the cache, so stack variables typically take 1-4 cycles to access.
Generally, local variables can be either in registers or on the stack. It doesn't matter whether they are allocated at the top of a function or in a loop; they will only be allocated once per function call, and allocation is basically free. The compiler will put variables in registers if at all possible, but if you have more active variables than registers, they won't all fit. Also, if you take the address of a variable, it must be stored on the stack since registers don't have addresses.
Global and static variables are a different beast. Since they are not usually accessed frequently, they may not be in cache, so it could take hundreds of cycles to access them. Also, since the compiler may not know the address of a global variable ahead of time, it may need to be looked up, which is also expensive.
As others have said, don't worry too much about this stuff. It's definitely good to know, but it shouldn't affect the way you write your programs. Write code that makes sense, and let the compiler worry about optimization. If you get into compiler development, then you can start worrying about it. :)
Edit: more details on allocation:
Register variables are allocated by the compiler, so there is no runtime cost. The code will just put a value in a register as soon as the value is produced.
Stack variables are allocated by your program at runtime. Typically, when a function is called, the first thing it will do is reserve enough stack space for all of its local variables. So there is no per-variable cost.
Why do the variables take garbage values?
I guess the rationale for this is that your program will be faster.
If compiler automatically reset (ie: initialize to 0 or to NaN for float/doubles etc etc) your variables, it would take some time doing that (it'd have to write to memory).
In many cases initializing variables could be unneeded: maybe you will never access your variable, or will write on it the first time you access it.
Today this optimization is arguable: the overhead due to initializing variables is maybe not worth the problems caused by variables uninitialized by mistake, but when C has been defined things were different.
Unassigned variables has so-called indeterminate state that can be implemented in whatever way, usually by just keeping unchanged whatever data was in memory now occupied by the variable.
It just takes whatever is in memory at the address the variable is pointing to.
When you allocate a variable you are allocating some memory. if you dont overwrite it, memory will contain whatever "random" information was there before and that is called garbage value.
Why would it not? A better question might be "Can you explain how it comes about that a member variable in C# which is not initialised has a known default value?"
When variable is declared in C, it involves only assigning memory to variable and no implicit assignment. Thus when you get value from it, it has what is stored in memory cast to your variable datatype. That value we call as garbage value. It remains so, because C language implementations have memory management which does not handle this issue.
This happens with local variables and memory allocated from the heap with malloc(). Local variables are the more typical mishap. They are stored in the stack frame of the function. Which is created simply by adjusting the stack pointer by the amount of storage required for the local variables.
The values those variables will have upon entry of the function is essentially random, whatever happened to be stored in those memory locations from a previous function call that happened to use the same stack area.
It is a nasty source of hard to diagnose bugs. Not in the least because the values aren't really random. As long as the program has predictable call patterns, it is likely that the initial value repeats well. A compiler often has a debug feature that lets it inject code in the preamble of the function that initializes all local variables. A value that's likely to produce bizarre calculation results or a protected mode access violation.
Notable perhaps as well is that managed environments initialize local variables automatically. That isn't done to help the programmer fall into the pit of success, it's done because not initializing them is a security hazard. It lets code that runs in a sandbox access memory that was written by privileged code.