How/when memory is assigned to global variables in C - c

I am aware of C memory layout and binary formation process.
My question is about when, and by which tool, addresses are assigned to global variables.
extern int dummy; //Declared in some other file
int * pTest = &dummy;
This code compiles fine. pTest can hold the address of dummy only if dummy has already been assigned an address.
I want to know in which phase (compilation or linking) the variable dummy gets its address.

The compiler says:
int *pTest = &<where is dummy?>;
The linker says:
int *pTest= &<dummy is here>;
The loader says:
int *pTest= <dummy is at 0x1234>;
This somewhat simplified explanation tries to convey the following:
The compiler identifies that an external variable dummy is used
The linker identifies where and in which module this variable resides
But only once the executable program is placed in memory is the actual location of the variable known and the loader puts this actual address in all the places where dummy is used.
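A minimal single-file sketch of the three stages above, assuming a hosted toolchain such as GCC or Clang. In a real project the definition of dummy would live in another source file; it is placed here only so the example links on its own. The helper ptest_points_at_dummy is hypothetical and not part of the question.

```c
/* Declaration only: tells the compiler that dummy exists somewhere. */
extern int dummy;

/* &dummy is an address constant. The compiler cannot know its value yet,
   so it records a reference for the linker (or loader) to fill in. */
int *pTest = &dummy;

/* Definition. Assumption: normally this line sits in a different .c file. */
int dummy = 42;

/* Hypothetical helper: returns 1 if pTest ended up pointing at dummy. */
int ptest_points_at_dummy(void)
{
    return pTest == &dummy;
}
```

By the time any function runs, pTest already holds the final address of dummy, so no code in the program has to perform the assignment.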

The actual process is a bit different.
The compiler saves information about the assignment and the external object reference in the object file.
Depending on the hardware instruction set and the implementation, the linker either calculates the absolute address (if the code will be placed at a fixed address, as in an embedded microcontroller project) or assigns some virtual address and adds an entry to the relocation table (if the code is position independent). In the latter case the loader changes this virtual address to the correct one while the program is loaded and started.

Related

Static variable inside a function

This is more of a theoretic question.
Say I have the following C program:
int a;
int f(){
double b;
static float c;
}
The question reads: For each of the variables (a, b, c), name the following: storage duration (lifetime), scope of identifier, the memory segment in which it is kept and its initial value.
As far as I've understood the theory so far:
For the variable a:
lifetime: static
scope of identifier: file level scope
memory segment: data segment
initial value: 0
For the variable b:
lifetime: automatic (local)
scope level: block level scope
memory segment: stack
initial value: undefined (random)
But the variable c is what confuses me.
As far as I understand, its lifetime is static and its scope is block level, but I'm not sure about the memory segment or the initial value.
Usually, the local variables of a function are kept in the stack segment, but since the variable is static, should it then be kept in the data segment instead?
Normally you don't need to deal with concepts like "segment"; they depend on the file format (ELF, Mach-O, etc.).
The lifetime and initialization rules of a static variable are the same no matter where it is defined. The only difference is the visibility of the symbol to the compiler and linker. In your particular example, static float c is zero-initialized, just like int a.
Technically, if you are dealing with Linux and the ELF format, a static variable without an explicit initializer is put in the .bss section, not the .data section. The .bss section takes up no space in the file, but it is zero-initialized when the ELF file is loaded for execution.
You can use the nm command to see the symbols in your file if you are interested.
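As a small sketch of the zero-initialization rule, the two static-duration variables from the question can be read back without ever being assigned. The accessor names read_a and read_c are hypothetical, added only so the values can be observed.

```c
int a;                  /* file scope, static storage duration */

/* Hypothetical accessor for the file-scope variable. */
int read_a(void)
{
    return a;
}

/* Hypothetical accessor wrapping the block-scope static from the question. */
float read_c(void)
{
    static float c;     /* block scope, but static storage duration */
    return c;           /* never assigned, yet guaranteed to be 0.0f */
}
```

Both accessors return zero on the first call. An automatic variable such as double b gives no such guarantee.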
This is just a complement to your own analysis and @liliscent's answer. Variable a has external linkage because it is declared at file level with no static specifier. That means it can be accessed from a different translation unit, provided it is declared there as extern int a;. The other variables cannot be accessed from other translation units.
The concept of a segment can refer to two different things:
either a segment as seen by the CPU, which is a region of memory referenced through a segment register, or a logical segment, which is a name for some kind of data (as seen in assembler source code).
For example, the .bss segment has no real existence of its own. It simply denotes the part of the data segment that is initialized to zero and therefore doesn't need to be stored as data in the program file.
For the rest, one can assume there are three kinds of segments: code, data and stack, with a special case for the heap, which is dynamically allocated in the data segment. This is merely an implementation detail, which may vary between implementations.
However, to simplify, one can assume that all static variables are allocated in the data segment, with one special case for data initialized to 0, which goes in .bss (and thus is still in the data segment, but has no image in the program file).
The only difference between a global and a local static variable is its visibility and its "name space": you can have multiple static variables with the same name, local to different functions. Each is visible only in the function in which it is declared, yet all of them are initialized at the beginning of execution.
So, unlike automatic variables, which are allocated on the stack each time the function is called (and thus exist multiple times if the function is called recursively), a static variable is shared by all simultaneous instances of the function. That is, if a function calls itself and the callee changes the value of a static variable, the value is changed for the caller too.
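The sharing described above can be sketched with a recursive function that keeps one static and one automatic counter. The function name recurse and its variables are hypothetical.

```c
/* Returns the value of the shared counter as seen by the outermost call,
   or -1 if the automatic counter was disturbed by a nested call. */
int recurse(int levels)
{
    static int shared;  /* one object for every active call */
    int own = 0;        /* a fresh object for each call */

    shared++;
    own++;
    if (levels > 1)
        recurse(levels - 1);

    /* own is still 1 here, while shared reflects the callee's increments. */
    return (own == 1) ? shared : -1;
}
```

Calling recurse(3) on a fresh program returns 3, because every nested call incremented the same shared object, and a later recurse(2) continues from there and returns 5.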

How to share functions symbols and addresses between projects in C?

I have two distinct projects which are running on the same target.
I want my second project to use a few functions written in the first project, at specific addresses.
To do that I thought I could use the symbol table from the first project in the second, but it doesn't work. (I use the arm-none-eabi toolchain and nm on the .elf file to generate the symbol table.)
I know that it is possible, but how can I do it?
Well, the brute-force approach will very likely work:
int (*far_function)(int a, int b, int c) = (int(*)(int, int, int)) 0xfeedf00d;
far_function(1, 2, 3);
In other words, just make a function pointer and initialize it using the known address.
If the address isn't well-known (which it won't be if the other application is re-built and you haven't taken steps to "lock" the target function to a particular address), I would instead add meta-data at some fixed address, that contains the pointer. The other application would embed this data, thereby "exporting" the location of the interesting function.
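A rough sketch of the meta-data idea, under several assumptions: the struct layout, the EXPORT_MAGIC value and all names are hypothetical, and pinning the table to a fixed address is a linker-script matter that depends on the toolchain and is not shown. Here the table is an ordinary object so the lookup logic can be tried on a host.

```c
#include <stdint.h>

#define EXPORT_MAGIC 0x45585054u   /* hypothetical marker value */

typedef int (*add3_fn)(int, int, int);

/* Layout both projects agree on. */
struct export_table {
    uint32_t magic;   /* lets the consumer detect a missing or stale table */
    add3_fn  add3;    /* address of the exported function */
};

/* Project 1: the function being exported. */
static int add3(int a, int b, int c)
{
    return a + b + c;
}

/* Project 1: on a real target this object would be placed at a fixed address. */
const struct export_table exports = { EXPORT_MAGIC, add3 };

/* Project 2: validate the table, then call through it. Returns -1 on failure. */
int call_add3(const struct export_table *t, int a, int b, int c)
{
    if (!t || t->magic != EXPORT_MAGIC)
        return -1;
    return t->add3(a, b, c);
}
```

On a target, project 2 would pass the agreed fixed address cast to const struct export_table * instead of &exports. The function itself can then move between builds of project 1 without breaking project 2.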
The addresses yielded by nm are the locations of the symbols, but on Cortex-M, which uses the Thumb-2 instruction set, those addresses cannot be used directly for jump/call/branch execution: it is necessary to set the LSB of the address to 1 to indicate Thumb mode.
For example:
typedef void (*voidFn_void_t)(void) ;
uint32_t symbol_address = symbolLookup( "myfunction" ) ;
symbol_address |= 1 ; // Make Thumb mode address
((voidFn_void_t)symbol_address)() ; // Make call
Even then, the called function must have no dependencies on the execution environment, since it executes in the environment of the caller, not that of the project it was built in. You may get away with it if the execution environments are identical, but keeping them so may be a problem.
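The address adjustment on its own can be sketched as a pure helper that is testable on any host. The name to_thumb_address is hypothetical, and actually calling through the result only makes sense on a Thumb-capable target.

```c
#include <stdint.h>

/* Sets bit 0 so that a branch to this address selects Thumb state.
   Applying it to an address that already has the bit set is harmless. */
uintptr_t to_thumb_address(uintptr_t symbol_address)
{
    return symbol_address | (uintptr_t)1;
}
```

For instance, a symbol reported at 0x08000100 would be called through 0x08000101.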

First executable statement in C

Is main really the first function or first executable statement in a C program? What if there is a global variable int a=0;?
I have always been taught that main is the starting point of a program. But what about global variable which is assigned some value and is an executable statement in my opinion?
The global variable and in general objects of static storage duration are initialized conceptually before program execution.
C11 (N1570) 5.1.2/1 Execution environments:
All objects with static storage duration shall be initialized (set to
their initial values) before program startup.
In a hosted environment, the function main is designated as the required entry point, where program execution begins. It may take one of two forms:
int main(void)
int main(int argc, char* argv[])
where the parameter names do not need to be the same as above (that is just a convention).
In a freestanding environment the entry point is implementation-defined, which is why you can sometimes encounter void main() or some other form in C implementations for embedded devices.
C11 (N1570) 5.1.2.1/1 Freestanding environment:
In a freestanding environment (in which C program execution may take
place without any benefit of an operating system), the name and type
of the function called at program startup are implementation-defined.
main is not the starting point of the program. The starting point is the program's entry point, which in most cases is transparent to a C programmer. It is usually denoted by the _start symbol and defined in startup code written in assembly or precompiled into a C runtime initialization library (like crt0.o). It is responsible for the low-level initialization of things you take as given, such as zeroing the static variables that have no explicit initializer. Once that is done, it calls a predefined symbol, main, which is the main you know.
But what about global variable which is assigned some value and is an executable statement in my opinion
Your opinion is wrong.
In a global context, only a variable definition can exist, with an explicit initialization. All the executable statements (i.e, the assignment) have to reside inside a function.
To elaborate, in global context, you cannot have a statement like
int globalVar;
globalVar = 0; //error, an assignment statement must be inside a function
however, the above would be perfectly valid inside a function, like
int main()
{
int localVar;
localVar = 0; //assignment is valid here.
return 0;
}
Regarding the initialization, like
int globalVar = 0;
the initialization takes place before the start of main(), so it is not really part of execution, per se.
To elaborate the scenario of the initialization of a global variable, quoting the C11, chapter 6.2,
If the declarator or type specifier that declares the identifier
appears outside of any block or list of parameters, the identifier has file scope, which
terminates at the end of the translation unit.
and for file scope variables,
If
the declaration of an identifier for an object has file scope and no storage-class specifier,
its linkage is external.
and for objects with external linkage,
An object whose identifier is declared without the storage-class specifier
_Thread_local, and either with external or internal linkage or with the storage-class
specifier static, has static storage duration. Its lifetime is the entire execution of the
program and its stored value is initialized only once, prior to program startup.
In a theoretical, C-standards-only program, it is.
In practice, it's usually more involved.
On Linux, AFAIK, the kernel loads your linked image into a reserved address space and first calls the dynamic linker that the executable image specifies (unless the executable is statically linked, in which case there is no dynamic linking step).
The dynamic linker can load dependent libraries, such as the C library.
These libraries may register their own startup code, and so can you (on gcc mainly via __attribute__((constructor))).
(User-supplied init code is especially needed for C++ where you need to run some startup code on C++ globals that have constructors.)
Then the dynamic linker calls the entry point of your image, which is _start by default (linkers allow you to choose a different name if you want to dig that deep) and which is by default supplied by the C library. _start initializes the C library and continues by calling main.
In any case, simple global initializations such as int x = 42; should get compiled and linked into your executable and then get loaded by the OS (rather than your code) all at once, as part of loading the process image so there's no need for user-supplied initialization code for such variables.
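A short sketch of both mechanisms, assuming GCC or Clang, since __attribute__((constructor)) is a compiler extension and not standard C. The names early_init, early_init_ran and x_seen_by_init are hypothetical.

```c
int x = 42;                  /* value comes straight from the loaded image */

static int init_ran;         /* zero until the constructor runs */
static int x_at_init;        /* what the constructor observed in x */

/* Registered startup code: runs after loading but before main. */
__attribute__((constructor))
static void early_init(void)
{
    init_ran = 1;
    x_at_init = x;           /* x is already 42 here, no code set it */
}

/* Hypothetical accessors so the effect can be checked from main. */
int early_init_ran(void)
{
    return init_ran;
}

int x_seen_by_init(void)
{
    return x_at_init;
}
```

When main starts, early_init_ran() already returns 1, and x_seen_by_init() returns 42 even though no statement ever assigned to x.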
If you step through the program with the Turbo C watch window, you will find that the global is set up first and only then does execution of main start; that is, the data segment (which provides the memory for global and static variables) is laid out at compile time and initialized with 0.
So although an assignment is not possible at file scope, the declaration is processed at compile time.
Yes, when you declare a variable, memory is allocated to it at compile time, unless you use the heap (allocating memory through a pointer), i.e. dynamic allocation, which occurs at run time. Since a global gets its memory from the data segment, its memory is allocated at compile time.
Hope this helps.

global variables (memory binding)

Consider the following code:
#include<stdio.h>
int a=0;
int main()
{
//some code
}
I have learned that physical memory binding for static variables is done at loadtime.
When is the memory binding done for 'a'? And where is it stored, in the stack area or static area?
As has been pointed out, the general behavior is platform-dependent and thus there's no universally valid answer, but on most modern, "normal" systems, what happens is that the compiler generates a .data section in the resulting object file, containing the initialization values of the variables you define.
When you start the program, then, the program loader memory-maps that .data section directly from the executable file into the newly created process' virtual memory, available for your program to read from and write to (probably using some COW scheme to keep each process' copy private).
The term "memory binding" that you use is not part of the normal terminology, so I don't know exactly what you're asking, but perhaps this helps?
a is in static storage, since it is global. Only the local variables of a function are on the stack.
You can use the static keyword in a function to make the storage type of that variable static, too.
However, static on globals has a different meaning (since they are already of static storage type): the symbol for the variable is not exported to the object file, so that variable will not be directly accessible from other modules (.c files).
When compiling, the compiler knows that "a" is a global variable and puts it into the data section of the executable file, where the virtual address of "a" is recorded. When the executable is loaded by the operating system and "a" is used at run time, the OS maps a physical address to the virtual address of "a". The rest of the executable's code only needs to know the virtual address of "a" to access it; the OS does the mapping and reaches the physical memory for reading and writing. The virtual address of "a" is determined at build time, when the program is compiled and linked.
For more knowledge, the book "Computer Systems: A Programmer's Perspective" is a good source.

An Example of complicated define in C

#define _FUID1(x) __attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (x);
I am trying to make sense of the above define, the _FUID1(x) macro. Does it relate to program memory, with the section attribute placing the variable in the code memory area?
What is the above trying to accomplish?
The macro isn't doing anything interesting or complicated at all; it just outputs a declaration for int _FUID1, with its parameter as an initializer, and with an attributes list ahead of it.
As for what the attributes list means, look at the documentation for variable attributes in GCC. section puts the variable in a named section, which allows the linker to relocate it to a special address or do some other interesting thing to it, and space isn't documented, but space(prog) sounds like a directive to put a value into the program address space instead of the data address space on a Harvard-architecture machine.
I think this is hardware specific (some Microchip unit), it places a value, for example:
__attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (0xf1);
into unit ID register 1 (__FUID1.sec) in the program flash, to configure the hardware. See the PIC documentation (for references to FUID) and the MPLAB C30 manual (for a description of memory spaces).
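A host-side sketch of the same macro pattern, assuming GCC or Clang targeting ELF. The space(prog) attribute is left out because it belongs to the Microchip compiler, and the macro name PLACE_INT_IN, the section name demo_cfg and the variable demo_id are all hypothetical.

```c
/* Expands to a definition of an int placed in the named section. */
#define PLACE_INT_IN(section_name, name, value) \
    __attribute__((section(section_name))) int name = (value)

/* Same shape as _FUID1(0xf1), but with a made-up section and name. */
PLACE_INT_IN("demo_cfg", demo_id, 0xf1);

/* The variable behaves like any other int; only its placement differs. */
int read_demo_id(void)
{
    return demo_id;
}
```

On the PIC target, the linker script is what ties the named section to the address of the configuration register. On a desktop build, the section merely shows up under its own name in the object file.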
