global variables (memory binding) - c

Consider the following code:
#include<stdio.h>
int a=0;
int main()
{
//some code
}
I have learned that physical memory binding for static variables is done at loadtime.
When is the memory binding done for 'a'? And where is it stored, in the stack area or static area?

As has been pointed out, the general behavior is platform-dependent and thus there's no universally valid answer, but on most modern, "normal" systems, what happens is that the compiler generates a .data section in the resulting object file, containing the initialization values of the variables you define.
When you start the program, then, the program loader memory-maps that .data section directly from the executable file into the newly created process' virtual memory, available for your program to read from and write to (probably using some COW scheme to keep each process' copy private).
The term "memory binding" that you use is not part of the normal terminology, so I don't know exactly what you're asking, but perhaps this helps?

a is in static storage, since it is global. Only the local variables of a function are on the stack.
You can use the static keyword in a function to make the storage type of that variable static, too.
However, static on globals has a different meaning (since they already have static storage): the variable gets internal linkage, so its symbol is not exported from the object file and the variable is not directly accessible from other modules (.c files).
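A minimal sketch of the three cases described above, assuming any hosted C compiler. The names file_private, global_address, static_local_address and read_file_private are hypothetical and exist only for illustration.

```c
int a = 0;                    /* file scope, external linkage, static storage */
static int file_private = 1;  /* file scope, internal linkage, static storage */

/* Returns the address of the global; it is the same on every call. */
int *global_address(void)
{
    return &a;
}

/* Returns the address of a static local. Handing out this pointer is safe
   because the object lives for the whole program run, unlike an automatic
   local, whose lifetime ends when the function returns. */
int *static_local_address(void)
{
    static int kept = 0;      /* block scope, no linkage, static storage */
    return &kept;
}

/* Reads the file-private variable; other .c files cannot name it directly. */
int read_file_private(void)
{
    return file_private;
}
```

Calling static_local_address() twice should yield the same pointer, and a value written through it is still there on the next call, which is what "static storage" means in practice.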

When compiling, the compiler knows that "a" is a global variable and puts it in the data section of the object file. The linker then assigns "a" a virtual address within the executable (for a position-independent executable, an offset that the loader adjusts at startup). When the executable is loaded and "a" is first used, the OS maps a physical page to that virtual address. The rest of the program's code only needs the virtual address of "a" to access it; the OS and the MMU handle the translation to physical memory for each read or write.
For more knowledge, the book "Computer Systems: A Programmer's Perspective" is a good source.

Related

How/when memory is assigned to global variables in C

I am aware of C memory layout and binary formation process.
My question is about which phase assigns addresses to global variables, and which tool does it.
extern int dummy; //Declared in some other file
int * pTest = &dummy;
This code compiles well. Here pTest will have address of dummy only if address is assigned to it.
I want to know in which phase (compilation or linking) the variable dummy gets its address.
The compiler says:
int *pTest = &<where is dummy?>;
The linker says:
int *pTest= &<dummy is here>;
The loader says:
int *pTest= <dummy is at 0x1234>;
This somewhat simplified explanation tries to convey the following:
The compiler identifies that an external variable dummy is used
The linker identifies where and in which module this variable resides
But only once the executable program is placed in memory is the actual location of the variable known and the loader puts this actual address in all the places where dummy is used.
The actual process is a bit different.
The compiler records the assignment and the external object reference in the object file.
Depending on the hardware ISA and the implementation, the linker either calculates the absolute address (if the code will be placed at a fixed address, as in an embedded microcontroller project) or assigns a virtual address and adds an entry to the relocation table (if the code is position independent); the loader then changes this virtual address to the correct one during program loading and start-up.
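A small sketch of the situation in the question, assuming a hosted C compiler. In a real project the declaration and the definition of dummy would sit in different .c files; they are kept in one file here only so the sketch is self-contained. The helper read_through_pointer and the value 42 are hypothetical.

```c
extern int dummy;        /* declaration only: no storage is reserved here */

int *pTest = &dummy;     /* an address constant: the compiler emits a
                            placeholder, and the linker/loader fill in the
                            real address; no code runs to compute it */

int dummy = 42;          /* definition: this is where storage is reserved */

/* Reads dummy through the pointer that was initialized before main ran. */
int read_through_pointer(void)
{
    return *pTest;
}
```

By the time any function runs, pTest already holds the address of dummy, whichever tool ended up fixing that address.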

Static variable inside a function

This is more of a theoretic question.
Say I have the following C program:
int a;
int f(){
double b;
static float c;
}
The question reads: For each of the variables (a, b, c), name the following: storage duration (lifetime), scope of identifier, the memory segment in which it is kept and its initial value.
As far as I've understood the theory so far:
For the variable a:
lifetime: static
scope of identifier: file level scope
memory segment: data segment
initial value: 0
For the variable b:
lifetime: automatic (local)
scope level: block level scope
memory segment: stack
initial value: undefined (random)
But the variable c is what confuses me.
As far as I understand, its lifetime is static and its scope is block level, but I'm not sure about the memory segment or the initial value.
Usually, the local variables of a function are kept in the stack segment, but since the variable is static, should it then be kept in the data segment instead?
Normally you don't need to deal with concepts like "segment"; they depend on the file format (ELF, Mach-O, etc.).
Static variables have the same lifetime and initialization rules no matter where they are defined. The only difference is the visibility of the symbol to the compiler and linker. In your particular example, static float c is also zero-initialized, just like int a.
Technically, on Linux with the ELF format, a static variable without an explicit initializer is put in the .bss section, not the .data section. The .bss section takes up no space in the file but is zero-initialized when the ELF file is loaded for execution.
You can use the nm command to see the symbols in your file if you are interested.
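The zero-initialization rule can be checked with a variant of the program from the question. This is only a sketch; the function name read_static_c is hypothetical, and b is given an initializer so the code never reads an indeterminate value.

```c
int a;                      /* static storage: zero-initialized */

/* Returns the value of the static local c, which has never been assigned. */
float read_static_c(void)
{
    double b = 0.0;         /* automatic: indeterminate without an initializer */
    static float c;         /* static storage: zero-initialized, like a */

    (void)b;                /* silence the unused-variable warning */
    return c;
}
```

On a typical ELF toolchain, running nm on the resulting object file would be expected to list both a and c among the BSS symbols, though the exact symbol name the compiler gives c may vary.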
This is just a complement to your own analysis and @liliscent's answer. Variable a has external linkage because it is declared at file level with no static specifier. That means it can be accessed from a different translation unit, provided it is declared there as extern int a;. The other variables cannot be accessed from other translation units.
The concept of a segment can refer to two different things:
either a segment as seen by the CPU, which is a region of memory referenced through a segment register, or a logical segment, which is a name for some kind of data (as seen in assembler source code).
For example, the .bss segment has no real existence. It only means a part of the data segment that is initialized to zero and for this reason doesn't need to be saved as data in the program file.
For the rest, one can assume there are three kinds of segments: code, data and stack, with a special case for the heap, which is dynamically allocated in the data segment; this is merely an implementation detail, which may vary between implementations.
However, for the purpose of simplification, one can assume that all static variables are allocated in the data segment, with one special case for data initialized to 0, which goes in .bss (and thus is still in the data segment, but not stored in the program file).
The only difference between a global and a local static variable is its visibility and its "name space": you can have multiple static variables with the same name, local to different functions; each is visible only in the function in which it is declared, but all are initialized at the beginning of execution.
So, unlike automatic variables, which are allocated on the stack each time the function is called (and thus exist multiple times if the function is called recursively), a static variable is shared by all simultaneous instances of the function. That is, if a function calls itself and the callee changes the value of a static variable, the value changes for the caller too.
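The sharing between recursive instances can be sketched as follows. The function count_calls is a hypothetical example, not taken from the question.

```c
/* Recurses n levels deep and returns the shared call count as seen by the
   outermost call. */
int count_calls(int n)
{
    static int total_calls;   /* one object shared by every activation */

    total_calls++;
    if (n > 0)
        count_calls(n - 1);   /* the callee increments the same object */

    return total_calls;       /* the caller sees the callee's changes */
}
```

In a fresh program, count_calls(3) returns 4: the outermost activation incremented the counter once, and the other three increments came from the nested calls.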

What uses up more space in FLASH? static variable or global variable

As the title says, what uses up more space in FLASH (in an STM32 µC for example)? Declaring a global variable or declaring a static variable inside a function? Or do they take equal space? Both variables are available throughout the whole runtime of the program in my understanding. Just their scopes are different.
You can have 0-initialized global and static variables. Those normally take up no flash, because they are placed in a memory location that is allocated and zeroed when the program starts and does not come from flash.
You can also initialize the variables with a value. In that case they are placed in the initialized data segment, so they take up flash space according to the size of the data type.
You can also initialize static variables inside functions with code (in C++; in C the initializer of a static variable must be a constant expression). That initialization must happen at runtime, but only once, so it generates more code, which in almost every case takes more space than the data itself (not necessarily, at least if you initialize a large enough struct with a function return value). You can do almost the same for non-const global variables: leave them 0-initialized originally and put an assignment (for example) at the start of main(), where it takes the same space as initializing a function-scope static variable by code elsewhere.
In conclusion, both global and function-scope static variables take up the same amount of space.
The above assumes a "global variable" in an embedded context, or a file-scope static variable. If it is an exported global symbol in a dynamically linkable executable, then the relocation information for that symbol will take some space in the executable binary. However, I don't think the given example system supports or uses relocatable executables.
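The trade-off between initializing with data and initializing with code can be sketched in plain C. C does not allow a non-constant initializer on a static variable, so the run-once guard below is written by hand; all names (squares, cubes, square_of, cube_of) are hypothetical.

```c
static int squares[4];                   /* no initializer: zeroed at startup,
                                            no initial-value image needed */
static int cubes[4] = { 0, 1, 8, 27 };   /* constant initializer: the four
                                            values are stored in the image */

/* Fills the table with code on the first call only; this costs
   instructions instead of stored data. */
int square_of(int i)
{
    static int ready;                    /* 0 until the first call */

    if (!ready) {
        for (int k = 0; k < 4; k++)
            squares[k] = k * k;
        ready = 1;
    }
    return squares[i];
}

/* Reads a table whose contents were fixed at build time. */
int cube_of(int i)
{
    return cubes[i];
}
```

Which variant is smaller depends on the table size and the target; for a handful of values the stored initializer is usually the cheaper one.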
The formal term for "available throughout the whole runtime" is static storage duration. Variables declared at file scope ("global") as well as all variables declared with static both have static storage duration.
So there is a relation between scope and storage duration: scope can dictate what storage duration a variable gets. But there is no relation between scope and memory usage.
How much space a variable takes up depends only on how large that variable's type is. Scope and storage duration have nothing to do with it.
On most compilers/linkers, there are usually two things required for a variable to end up in flash:
It must be declared as const, and
It must have static storage duration
If these conditions aren't met, the variable will not end up in flash/nvm, regardless of which scope it is declared at.
As the title says, what uses up more space in FLASH (in an STM32 µC for example)? Declaring a global variable or declaring a static variable inside a function? Or do they take equal space?
Using arm-none-eabi-gcc as the reference for an STM32 build, neither takes any flash space for the variable itself.
Global and static variables that are not declared const go either into the .data section if they require startup initialisation or into .bss if they don't. Both of those sections are placed into SRAM by your linker script, although the initial values for .data are typically kept in flash and copied to SRAM by the startup code. If you're doing C++ then static C++ classes end up in .bss.
If you do declare them const then they'll be placed into the .rodata section which, if you consult your linker script, you should find placed in a subsection of .text, which is in flash. Flash is usually more plentiful than SRAM, so do make use of const where you can.
Finally, the optimizer can come along and totally rearrange anything it sees fit, including the elimination of storage in favour of inlining.
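As a rough illustration of the placement described above, here is a sketch with one variable of each kind. The section names in the comments are what a typical arm-none-eabi-gcc build with a conventional linker script would be expected to produce, not a guarantee; all identifiers are hypothetical.

```c
#include <stdint.h>

uint32_t zeroed_counter;             /* no initializer: typically .bss (SRAM) */
uint32_t start_value = 1000u;        /* initialized: typically .data (SRAM,
                                        with its initial value kept in flash) */
const uint16_t crc_seed = 0xFFFFu;   /* const: typically .rodata (flash) */

/* The function-scope static lands in .bss just like the global above;
   only its scope differs. */
uint32_t tick(void)
{
    static uint32_t ticks;

    ticks++;
    return start_value + zeroed_counter + ticks;
}

/* Reads the constant; the optimizer may fold it into the caller instead. */
uint16_t seed(void)
{
    return crc_seed;
}
```

Inspecting the linker map file, or running the toolchain's size and nm tools on the output, is the reliable way to see where each object actually ended up.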

How the static variable gets retrieved for every function call

We know that when control exits a function, its stack space is freed. So what happens to static variables? Are they saved somewhere in memory and retrieved when the function is called again?
The wiki says:
In the C programming language, static is used with global variables
and functions to set their scope to the containing file. In local
variables, static is used to store the variable in the statically
allocated memory instead of the automatically allocated memory. While
the language does not dictate the implementation of either type of
memory, statically allocated memory is typically reserved in data
segment of the program at compile time, while the automatically
allocated memory is normally implemented as a transient call stack.
and
Static local variables: variables declared as static inside a function
are statically allocated while having the same scope as automatic
local variables. Hence whatever values the function puts into its
static local variables during one call will still be present when the
function is called again.
Yes, static variables persist between function calls. They reside in data section of the program, like global variables.
You can (and probably should) read more about general memory layout of C applications here.
Adding some more information on top of previously given answers -
The memory for static objects is allocated at compile/link time. Their addresses are fixed by the linker based on the linker control file.
The linker file defines the physical memory layout (Flash/SRAM) and the placement of the different program regions.
The static region is actually subdivided into two further sections, one for initial values and the other for changes made at run time.
And finally, remember that if you do not specify otherwise, the value will be set to 0 before the program starts.
You made an incorrect assumption that static variables are placed on the stack* when the function that uses them is running, so they need to be saved and retrieved.
This is not how C does it: static variables are allocated in an entirely different memory segment outside of stack, so they do not get freed when the function ends the scope of its automatic variables.
Typically, static data segment is created and initialized once upon entering the program. After that the segment stays allocated for as long as your program is running. All your global variables, along with the static variables from all functions, are placed in this segment by the compiler. That is why entering or leaving functions has no effect on these variables.
* The official name for "stack" is "automatic storage area".
Consider this example:
static int foo;
void f(void)
{
static int bar;
}
The only difference between foo and bar is that foo has file scope whereas bar has block scope (it is visible only inside f). Both variables exist during the whole lifetime of the program.
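To see the persistence directly, compare a static local with an automatic one. This is a minimal sketch; count_static and count_auto are hypothetical names.

```c
/* The static local keeps its value between calls. */
int count_static(void)
{
    static int bar;     /* zero-initialized once, before the program starts */
    return ++bar;
}

/* The automatic local is created and initialized anew on every call. */
int count_auto(void)
{
    int bar = 0;
    return ++bar;
}
```

In a fresh program, successive calls to count_static() return 1, 2, 3 and so on, while count_auto() returns 1 every time. Nothing is saved or restored around the call; the static object simply never went away.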

How is scope of variable implemented in compiler at machine level or memory level

How is the scope of a variable implemented by compilers?
I mean, when we declare a static variable, its scope is limited to the block, or to the functions defined in the same file, where the static variable is defined.
How is this achieved in machine level or at memory level?
How actually is this restriction achieved?
How is this scoping resolved at program run time?
It is not achieved at all at the machine level. The compiler checks for scopes before machine code is actually generated. The rules of C are implemented by the compiler, not by the machine. The compiler must check those rules, the machine does not and cannot.
A very simplistic explanation of how the compiler checks this:
Whenever a scope is introduced, the compiler gives it a name and puts it in a structure (a tree) that makes it easy to determine the position of that scope in relation to other scopes, and it is marked as the current scope. When a variable is declared, it is assigned to the current scope. When a variable is accessed, it is looked up in the current scope. If it is not found, the compiler moves up the tree to the scope above the current one. This continues until the topmost scope is reached. If the variable is still not found, then we have a scope violation.
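The lookup described above can be sketched as a chain of scopes with parent pointers. This is a toy illustration, not how any particular compiler does it; struct scope, declare, lookup and MAX_NAMES are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8

struct scope {
    const struct scope *parent;      /* enclosing scope, NULL at file scope */
    const char *names[MAX_NAMES];    /* identifiers declared in this scope */
    size_t count;
};

/* Declares a name in the given scope; returns 0 on success, -1 if full. */
int declare(struct scope *s, const char *name)
{
    if (s->count >= MAX_NAMES)
        return -1;
    s->names[s->count++] = name;
    return 0;
}

/* Walks from the current scope outward and returns the scope that declares
   the name, or NULL if no enclosing scope does (an "undeclared identifier"
   error in a real compiler). */
const struct scope *lookup(const struct scope *s, const char *name)
{
    for (; s != NULL; s = s->parent)
        for (size_t i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;
}
```

With a file scope that declares a and a nested block scope that declares b, a lookup of a from the block succeeds by walking up to the parent, while a lookup of b from the file scope fails. All of this happens at compile time; none of the structure survives into the generated machine code.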
Inside compilers, it's implementation-defined. For example, if I were writing a compiler, I would use a tree to define 'scope', most likely a symbol table inside a binary tree.
Some would use an arbitrary-depth hash table. It's all implementation-defined.
I'm not 100% sure I understand what you are asking, but if you mean "how are static variables and functions stored in the final program", that is implementation-defined.
That said, a common way of storing such variables and functions is in the same place as any other global symbols (and some non-global ones) -- the difference is that these are not "exported", and thus not visible in any outside code trying to link to our software.
In other words, a program which has the following in it:
int var;
static int svar;
int func() { static int func_static; ... }
static int sfunc() { ... }
... might have the following layout in memory (let's say our data starts at 0xF000 and functions at 0xFF00):
0xF000: var
0xF004: svar
0xF008: func.func_static
...
0xFF00: func's data
0xFF40: sfunc's data /* assuming we needed 0x40 bytes for `func`! */
The list of exports, however, would only contain the non-static symbols, aka the exported ones:
var v 0xF000
func f 0xFF00
Again -- note how, while the static data is still written into the files (it has to be stored somewhere!), it is not exported; in layman's terms, our program does not tell anyone that it contains svar, sfunc and similar.
In Unices, you can list the symbols that a library or a program exports with the nm tool: http://unixhelp.ed.ac.uk/CGI/man-cgi?nm ; there do exist similar tools for Windows (GnuWin32 might have something similar).
In practice, executable code is often stored separately from the data (so that it can be protected from writes, for example), and both may get reordered to minimize memory use and cache misses, but the idea remains the same.
Of course, optimizations can be applied. For example, a static function could be inlined at every invocation, meaning that no code is generated for the function itself at all, and thus it does not exist on its own anywhere.

Resources