Extern variable memory location and Compile/Run time behaviour - c

I have read a lot about extern variables, but no one seems to address this properly. If I declare and define a variable in C, it gets memory assigned within that file. But at a later stage, in a multi-file modular project, that variable is declared as extern elsewhere, which should place it in the data segment so that it exhibits the global behavior intended by extern.
So I am trying to figure out how and when the memory is allocated, i.e. the compile-time and run-time behavior of an extern variable.

Exactly one of the compilation units has to define the variable as a global variable. When that file is compiled, space for the variable is reserved in the data segment of its object file, just as for file-scope variables. The difference is that the variable's name is recorded in the object file's symbol table so that other object files can find it.
All the other compilation units declare it using the extern keyword. This tells the compiler not to reserve memory for the variable and to leave an unresolved reference for the linker instead.
When you link all the object files together, the linker finds every object file that has an external reference to the variable and binds each reference to the memory reserved by the defining object file.
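To make the declaration/definition split concrete, here is a minimal single-file sketch. The names shared_counter, bump_counter and extern_demo are made up for illustration; in a real project the extern line would normally sit in a header and the definition in exactly one .c file.

```c
#include <assert.h>

/* Declaration only: states the type, reserves no storage.
   In a multi-file project this line would live in a header
   (a hypothetical "shared.h") included by every user. */
extern int shared_counter;

/* Compiles against the declaration alone; the reference is
   bound to the definition when the program is linked. */
int bump_counter(void)
{
    return ++shared_counter;
}

/* Definition: exactly one translation unit provides it, and
   this is where the storage is reserved. */
int shared_counter = 10;

/* Hypothetical usage check, meant to be called from main. */
void extern_demo(void)
{
    assert(bump_counter() == 11);
    assert(shared_counter == 11);
}
```

If the definition were missing from every file, the build would typically fail at link time with an undefined-reference error rather than at compile time.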

Related

Static global variable vs function static variable C language

My first question here.
Can anyone explain to me the difference between a global static variable (that is, one defined above the functions and "main", combined with "static") and a static variable that is defined inside a function?
As I understand it, they do the same: both enlarge the variable scope to the whole file, and both restrict it to the specific file. So... what am I missing?
Plus another question (that came up when looking for a solution): what is extern? I thought that defining a variable above main would make it global (as in visible to other files too), but then I read someone explaining that in order to do so, I must add "extern" before the variable definition.
For variables defined outside of a function, the static keyword limits the variable to being referenced using its identifier from the file in which it is defined. Variables of this type will be assigned a fixed address by the linker (outside of the heap and stack), but will not be assigned a global symbol. If variables of this type are defined in multiple files with the same identifier, the linker should allocate separate memory in each case and should compile without warnings.
Variables defined with the static keyword inside of a function have similar memory allocation, but references to the variable using its identifier are limited to within that function. You could have multiple functions defining static variables with the same identifier and each will be allocated separately.
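A small sketch of the two cases above, assuming nothing beyond standard C. All names (count, next_small, next_big, file_count, static_demo) are invented for the example. Three distinct objects share the identifier count, and each keeps its own value:

```c
#include <assert.h>

static int count = 100;      /* file scope, internal linkage */

int next_small(void)
{
    static int count = 0;    /* separate object, visible only here */
    return ++count;
}

int next_big(void)
{
    static int count = 0;    /* a third object with the same name */
    count += 10;
    return count;
}

int file_count(void)
{
    return count;            /* refers to the file-scope object */
}

/* Hypothetical usage check, meant to be called from main. */
void static_demo(void)
{
    assert(next_small() == 1);
    assert(next_small() == 2);   /* value persisted across calls */
    assert(next_big() == 10);    /* unaffected by next_small */
    assert(file_count() == 100); /* unaffected by either */
}
```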
If an initializer is used for static variables, the variable is initialized once before the program begins executing, and the initializer is evaluated at compile time. The initializer must be a constant expression in this case (no function calls and no references to parameters or variables).
The compiler will typically build each C file independently of all the others in a program, using symbols to represent external dependencies such as variables and functions located in other files. After each source file is compiled, the linker processes the compiler output to replace symbols with fixed addresses and build the entire program. To do its part, the compiler must know the types and sizes of all external functions and variables. The parameter types, parameter count and return type of functions are typically shared in a header file using function prototype declarations, which the compiler implicitly treats as external. The extern keyword is used to specify the type and size of a variable that is defined in a different file from the one referencing it. Declaring a file-scope variable without the extern keyword causes it to be defined within the module where it is declared. If variables with the same identifier are defined in multiple files without the static keyword, the linker will typically report a multiple-definition error, since the same global symbol is defined more than once in the program.

Where is the local static variable stored? If it is data segment, why its scope is not whole program?

If a static local variable is also stored in the data segment, why doesn't its value persist when the variable is used in two different functions? An example like this:
void func()
{
    static int i = 0;
    i++;
}

void func1()
{
    i++; // here i is stored in the data segment,
         // then the scope should be available for entire program
}
Why is the value of 'i' only accessible in block scope if it is stored in the data segment? It might be a silly question, but I am trying to understand the concept. Please help me understand it. Thanks in advance.
You need to differentiate between the scope and the lifetime of a variable.
In simple words:
"scope" means the region of your source code where the variable is known to the compiler. If a variable is (by the rules) not visible to the compiler, it will refuse to compile accesses to it.
"lifetime" means the time beginning with the allocation of memory for the variable until the moment the memory is assigned to another variable or released. A static variable lives as long as the program runs. A non-static variable lives just as long as its scope is in control.
However, just because both the scope and the lifetime of a variable are "finished", that does not mean that the memory disappears. The physical cells are still there, and they often keep their last contents. That's why a function that returns a pointer to one of its local variables can appear to work, with the caller retrieving that variable's contents after both its scope and its lifetime are gone. Doing so is undefined behavior in C, and it is a classic source of confusion for beginners.
Consider a compiler for an embedded processor like the 8051. Granted, a quite old and simple machine, but a good example. This compiler will commonly put local variables in its data segment. But to make the most of the limited memory space (128 bytes in total, including working registers and stack), the same memory locations are re-used for variables with non-overlapping lifetimes. Even so, you could access any memory location from anywhere in the program.
Now, language lawyers, start picking on me. ;-)
A variable in C consists of two things:
A name, called an identifier. An identifier has a scope, which is a region of the program source code in which it is visible (may be used).
A region of storage (memory), called an object. An object has a lifetime, which is a portion of program execution during which memory is reserved for it. This is also called storage duration.
For a variable declared inside a function, its identifier has block scope, and the identifier is visible only from its declaration to the } that closes the innermost block it is in. (A block is a list of statements and declarations inside { and }.)
Inside a function, declaring a variable with static makes its object have static storage duration, causing it to exist for all of program execution, but it does not change the scope of its identifier. The object exists throughout program execution, but the identifier is visible only inside the function.
When another function is called, the object still exists (and it can be used if the function has its address, perhaps because it has been passed as a parameter). However, the identifier for the variable is not known inside the source code of other functions, so they cannot use the identifier.
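The point about passing the address can be sketched as follows. The function names are hypothetical; the only thing taken from the question is the static int i inside a function.

```c
#include <assert.h>

int *counter_object(void)
{
    static int i = 0;   /* static storage duration, block scope */
    return &i;          /* valid: the object outlives the call */
}

int increment_via_pointer(int *p)
{
    /* The identifier i is not visible here, but the object
       it names is reachable through the pointer. */
    return ++*p;
}

/* Hypothetical usage check, meant to be called from main. */
void scope_demo(void)
{
    assert(increment_via_pointer(counter_object()) == 1);
    assert(*counter_object() == 1);   /* same object, value kept */
}
```

This works only because i has static storage duration. Returning the address of an automatic variable in the same way would leave the caller with a dangling pointer.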

How global and local with same static variable names stored in C internally memory?

#include <stdio.h>

static int a = 5;

int main(void)
{
    static int a = 15;
    printf("%d\n", a);
    return 0;
}
So, how are both variables a stored in internal memory?
How are global and local variables with the same variable names stored internally in memory?
#include <stdio.h>

static int a = 5;

int main(void)
{
    printf("%p\n", (void *)&a);   /* file-scope a */
    static int a = 15;
    printf("%p\n", (void *)&a);   /* block-scope a */
    return 0;
}
The output of the program above is
0x564e6b67a030
0x564e6b67a034
So you can see that the two are stored at different addresses: one is the file-scope variable and the other is the local one.
The names are only of interest to the human reader and the compiler/linker translating that code to machine executable code. The final object code resolves these to addresses and the names no longer exist.
The compiler distinguishes these the same way you do - by scope; when two identical symbols in the same namespace are in scope simultaneously, the symbol with the most restrictive scope is visible (i.e. may be accessed via the name).
For symbols with external linkage (in your example there are none other than main), the compiler retains the symbol name in order to resolve links between separately compiled modules. In the fully linked executable the symbol names cease to exist (except in debug build symbol meta-data).
Scope is what keeps them from clashing. The first one has file scope and the other has block scope. (They are different variables, stored in separate memory locations.)
When you use it in the block - compiler checks whether this reference is resolved by anything in the same block. It gets one. And done.
And in case it is in some other function - if it doesn't find anything named a - the search ends in file scope where it finds the name a. That is where the story ends.
Both being static, their storage duration is the same: they live until the program exits. But their scope is different. If the scope were the same too, the compiler would have reported an error.
Here if you compile with -Wshadow option - it will warn you about shadowing a variable. You shadowed the outer a with the inner on that block. That's it.
The facetious answer is that they are stored in different places.
Remember that the names of variables do not (normally) form part of the compiled program, so the compiler just follows the normal rules of variable shadowing. So in your case your print function (that's not a standard C function by the way - did you mean printf?) outputs the a declared in main. The fact that you've used the same name will not bother the compiler at all.
Finally, C provides no way of accessing the file-scope a by name once the other declaration is encountered in main, because it is static. (If it wasn't static you could use extern.) See How can I access a shadowed global variable in C?
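One possible workaround, sketched here under the assumption that you are free to add a helper: a function defined at file scope still sees the outer a, so code inside the shadowing block can call it. The helper names read_outer, read_both and shadow_demo are invented for this example.

```c
#include <assert.h>

static int a = 5;            /* file scope */

int read_outer(void)
{
    return a;                /* no inner a in scope: file-scope a */
}

int read_both(void)
{
    static int a = 15;       /* shadows the file-scope a from here on */
    /* Pack both values into one number: inner * 100 + outer. */
    return a * 100 + read_outer();
}

/* Hypothetical usage check, meant to be called from main. */
void shadow_demo(void)
{
    assert(read_outer() == 5);
    assert(read_both() == 1505);   /* 15 from the inner a, 5 from the outer */
}
```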

Static variable inside a function

This is more of a theoretic question.
Say I have the following C program:
int a;
int f(){
    double b;
    static float c;
}
The question reads: For each of the variables (a, b, c), name the following: storage duration (lifetime), scope of identifier, the memory segment in which it is kept and its initial value.
As far as I've understood the theory so far:
For the variable a:
lifetime: static
scope of identifier: file level scope
memory segment: data segment
initial value: 0
For the variable b:
lifetime: automatic (local)
scope level: block level scope
memory segment: stack
initial value: undefined (random)
But the variable c is what confuses me.
As far as I understand, its lifetime is static and its scope is block scope, but I'm not sure about the memory segment or the initial value.
Usually, the local variables of a function are kept in the stack segment, but since the variable is static, should it then be kept in the data segment instead?
Normally you don't need to deal with concepts like "segment"; it depends on the file format (ELF, Mach-O, etc.).
For a static variable, no matter where it is defined, the lifetime and initialization rules are the same. The only difference is the visibility of the symbol to the compiler and linker. In your particular example, static float c is also zero-initialized, just like int a.
And technically, if you are dealing with Linux and the ELF format, a static variable without explicit initialization is put in the .bss section, not the .data section. The .bss section takes up no space in the file, but it is zero-initialized when the ELF file is loaded for execution.
You can use the nm command to see the symbols in your file if you are interested.
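A short sketch of the initialization rule, with section placement noted in comments as what a typical ELF toolchain does (an assumption, not something the C standard specifies). The names initialized, read_c and zero_init_demo are made up; a and c mirror the question.

```c
#include <assert.h>

int a;                  /* no initializer: zero, typically .bss */
int initialized = 7;    /* nonzero initializer: typically .data */

float read_c(void)
{
    static float c;     /* also zero-initialized, typically .bss */
    return c;
}

/* Hypothetical usage check, meant to be called from main. */
void zero_init_demo(void)
{
    assert(a == 0);
    assert(read_c() == 0.0f);
    assert(initialized == 7);
}
```

Running nm on the resulting object file usually marks .bss symbols with b or B and .data symbols with d or D, though the exact letters depend on the toolchain and its options.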
This is just a complement to your own analysis and @liliscent's answer. Variable a has external linkage, because it is declared at file level with no static specifier. That means that it can be accessed from a different translation unit provided it is declared there as extern int a;. The other variables cannot be accessed by name from other translation units.
The concept of segment can refer to 2 different things:
Either the segments as seen by the CPU, which are references to a part of the memory pointed to by a segment register, or a logical segment, which is a name for some kind of data (as seen in assembler source code).
For example, the .bss segment has no real existence. It only means: a part of the data segment which is initialized to zero and which, for this reason, doesn't need to be saved as data in the program file.
For the rest, one can assume there are 3 kinds of segments: code, data and stack, with a special case for the heap, which is dynamically allocated in the data segment; but this is merely an implementation detail, which may vary from one implementation to another.
However, for the purpose of simplification, one can assume that all static variables are allocated in the data segment, with one special case for data initialized to 0, which goes in .bss (and thus is still in the data segment, but not imaged in the program file).
The only difference between a global and a local static is its visibility and its "name space": you can have multiple static variables with the same name, local to different functions, and each will be visible only in the function in which it was declared, yet all are initialized at the beginning of execution.
So, in contrast to automatic variables, which are allocated on the stack each time the function is called (and thus exist multiple times if the function is called recursively), a static variable is shared by all simultaneous instances of the function: if a function calls itself and the callee changes the value of a static variable, the value is changed for the caller too.
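The recursion point can be checked with a pair of otherwise identical functions, one counting with a static variable and one with an automatic variable. The function names are hypothetical.

```c
#include <assert.h>

int shared_calls(int n)
{
    static int calls = 0;   /* one object shared by every activation */
    calls++;
    if (n > 0)
        shared_calls(n - 1);
    return calls;           /* sees increments made by the callees */
}

int private_calls(int n)
{
    int calls = 0;          /* a fresh object for each activation */
    calls++;
    if (n > 0)
        private_calls(n - 1);
    return calls;           /* only this activation's increment */
}

/* Hypothetical usage check, meant to be called from main. */
void recursion_demo(void)
{
    assert(shared_calls(3) == 4);    /* 4 activations, one counter */
    assert(private_calls(3) == 1);   /* each activation had its own */
}
```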

How are globals handled when multiple .c files are involved

I have two .c files (main.c and support.c). Support.c is compiled first and then main.c is compiled and linked with support.o. I have several non-static global variables in support.c.
How are those global variables from support.c stored? If main.c is multithreaded and has two threads calling the functions in support.c, are they sharing those globals, or do they each have their own copy?
A global variable is a global variable, and there's always just one, no matter in how many pieces you compile and link your program. If multiple threads access global data concurrently, you need to ensure the proper synchronization yourself.
The only way to get a separate copy of a global or block-static variable for each thread is to declare it _Thread_local, which was introduced in C11. Each thread's copy is initialized when the thread starts, and its lifetime ends when the thread exits.
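For the ordinary shared case, one way to get the synchronization mentioned above is a C11 atomic. This is only a sketch: it assumes the implementation provides <stdatomic.h> (the header is optional in C11), the names g_hits, record_hit and current_hits are invented, and the check at the end runs on a single thread, so it demonstrates the API rather than an actual race.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int g_hits;   /* file scope: one object, zero-initialized */

int record_hit(void)
{
    /* atomic_fetch_add returns the value held before the add,
       so adding 1 gives the count after this hit. */
    return atomic_fetch_add(&g_hits, 1) + 1;
}

int current_hits(void)
{
    return atomic_load(&g_hits);
}

/* Hypothetical usage check, meant to be called from main. */
void atomic_demo(void)
{
    assert(record_hit() == 1);
    assert(record_hit() == 2);
    assert(current_hits() == 2);
}
```

A mutex is the more general tool when several variables have to change together. An atomic fits the case of a single counter or flag.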
I think you might be confusing the usage of the static keyword when it applies to variables that exist top-level in C source (i.e. outside of any functions), vs when you use static on variables within a function.
A variable declared top-level in the source code, outside of any functions, will be global unless you declare it as static. If it's static, it will be local only to that file. Here static controls the linkage of the variable, i.e. whether other files can refer to it by name.
If you declare it static inside a function, it controls the lifetime of the variable. In this case, the variable will retain itself in memory even after the function call exits, resulting in its value persisting across multiple function calls.
If you declare a global variable (i.e. it's not static and is top-level in a source file), there will always only be one instance of it in memory. In other source files, you will have to declare it as extern so the linker knows to look for its memory location as defined in the object file for your other file, but there will be only one of it in memory.
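Don't forget to declare the globals as volatile, or the compiler might not realize they can be modified by another thread and make unsafe optimizations. Note that volatile by itself provides neither atomicity nor ordering between threads, so it does not replace proper synchronization (a mutex or C11 atomics).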
volatile int g_example;

Resources