How does a global pointer initialization in C get its value at compile/link time? - c

The background of this question is to understand how the compiler/linker deals with a pointer when it is initialized in global space.
For example:
#include <stdio.h>
int a = 8;
int *p = &a;
int main(void) {
printf("Address of 'a' = %p\n", (void *)p); /* %p with a void * cast is the portable way to print a pointer */
return 0;
}
Executing the above code prints the exact address of a.
My question is: at which stage (compile time or link time) does the pointer p get the address of a? It would be nice if your explanation included the equivalent assembly code of the above program and showed how the compiler and linker deal with the pointer initialization int *p = &a; in global space.
P.S.: I could find a lot of examples where the pointer is declared and initialized in local scope, but hardly any for global space.
Thanks in advance!

A startup module (often named crt0.o) is linked along with your program code and is responsible for setting up the environment for a C program. It performs any initialization of global and static variables that has to happen at run time, and it is executed before main is called.
The actual addresses of the global variables are determined by the operating system when it loads an executable and performs the necessary relocations so that the new process can be executed.

To run a program, the system has to load it into RAM. So it creates one huge memory block containing the actual compiled instructions. This block usually also contains a "data section" which contains strings etc. If you declare a global variable, what compilers usually do is reserve space for that variable in such a data section (there's usually several, non-writable ones for strings, and writable ones for globals etc.).
Whenever you reference the global, the compiler just records the offset from the current instruction to that global. So an instruction can just calculate [current instruction address] + [offset] to get at the global, wherever it ended up being loaded. Since space in the data section has been reserved in the file anyway, the compiler can write any (constant) value you want in there, and it will get loaded with the rest of the code.
This is how it works in C, and it is why C only allows constant expressions as initializers for globals. C++ works like Devolus wrote, where there is extra code that is run before main(). Effectively they rename the main function and give you a function that does the setup, then calls your main function. This allows C++ to call constructors.
There are also some optimizations like, if a global is initialized to zero, it usually just gets an offset in a "zero" section that doesn't exist in the file. The file just says: "After this code, I want 64 bytes of zeroes". That way, your file doesn't waste space on disk with hundreds of "empty" bytes.
It gets a tad more complicated if you have dynamically loaded libraries (dylibs or DLLs), where you have two segments loaded into separate memory blocks. Since neither knows where in RAM the other one ended up, the executable file contains a list of name -> offset mappings. When you load a library, the loader looks up the symbol in the (already loaded) other library and calculates the actual address at which e.g. the global is at, before main() is called (and before any of the constructors run).

Related

When variable will free in C program

In dynamic memory allocation, the memory which is used by malloc() and calloc() can be freed by using free().
In static memory allocation, when is the variable in main() freed from the program?
In a tutorial, I learned that when the whole programs is finished then after all variables are freed from RAM.
But someone tells me that when the program is long enough and variable is used early and after that if the variable has no use in the whole code then compiler will automatically free the variable before the end of program.
Can someone please clarify if both statements are correct or not?
The language guarantees that the lifetime of a static storage duration variable is the whole program. So it can be safely accessed at any time.
That being said, the standard only requires the code produced by a conformant compiler to behave as if all language rules were observed. That means that for optimization purposes a compiler is free to release the memory used by a static variable if it knows that the variable will not be used past a certain point. Said differently, it is neither required nor forbidden, and as a programmer you should not even worry about it except for very specific low-level optimization questions.
Example:
...
int main() {
static int arr[10240];
// uses arr
..
// no uses of arr past this point - called B
...
}
The program shall behave as if the array arr existed from the beginning of the program until its end. But as long as arr is not used past point B, the compiler may reuse its memory, because there will be no change in any observable state.
This is only a low-level optimization allowed (but not required) by the as-if rule.
As you can see in this question: Why does C not require a garbage collector?
Stephen C, the author of the answer says that:
The culture of these languages (C and C++) is to leave storage
management to the programmer.
Would the correct answer be that when the process is terminated, all memory it used is freed? I think yes. A C compiler does not look for garbage or unreachable variables; this is the programmer's job.
But I have read that garbage collectors for C or C++ exist, like Java's; they can be useful, but remember that the implementation will be slower.
Again, I recommend that you read the question I linked at the beginning for more information.
In a tutorial, I learned that ...
This tutorial is talking about what you as a programmer can see (unless you are debugging your program):
If you write a program, you can rely on the fact that you can read a static variable until the program has finished.
But someone tells me ...
This person is talking about what is really happening in the background (not visible to the programmer unless you are debugging) when you use an "optimizing" compiler.
... when is the variable ... freed from the program?
We have to distinguish between three types of variables:
Local variables
Local variables reside on the stack. When the function returns, all memory allocated on the stack during the execution of the function is automatically freed.
For this reason, local variables are freed when the function returns or even earlier.
In the following function:
void myFunction(void)
{
int a, b;
a = func1(); /* Line "1" */
func2();
func3(a); /* Line "2" */
b = func4(); /* Line "3" */
func5();
func6(b); /* Line "4" */
}
... an "optimizing" C compiler will detect that the variable a is not needed any longer when the variable b is set. For this reason, it may allocate only enough memory for one int variable. This is as if you only defined one variable (a_and_b) and the compiler replaced both a and b by a_and_b:
int a_and_b;
a_and_b = func1();
...
func6(a_and_b);
If you debug the program, the debugger will tell you that variable b does not exist when debugging lines "1"-"2"; and it will tell you that variable a does not exist when debugging lines "3"-"4".
For this reason, you might say that variable a is "freed" after line "2".
Global variables
Global variables reside in the .data or in the .bss section.
On modern desktop computers (on microcontrollers and on MS-DOS computers it is different) the operating system allocates this memory before the program is started and the operating system frees this memory after the program has finished.
Theoretically, it might be possible to optimize global variables the same way as local variables; this means: To use the same memory for two different global variables if one variable is set the first time after the other one is no longer needed.
However, this would be very complicated because a global variable can be accessed from any C source file in the project. For this reason, I doubt that many compilers and linkers have such a feature.
Static variables
Variables marked with the keyword static are normally also stored in .data and .bss sections - just like global ones.
However, because they can only be accessed from one C source file (or even from only one function), it is much easier to detect that such a variable is no longer used at a certain point in time.
For this reason, a compiler may optimize a static variable the same way as a local variable (so the memory is shared between two variables) or even replace a static variable by a local one (on the stack).
One Example:
int someFunction()
{
static int a;
int b;
a = func7();
b = func8(a); /* Line "5" */
return b + func9(); /* Line "6" */
}
In this case, the program behaves the same way if the variable a is not static. For this reason, we can replace static int by just int.
Now we see that a is no longer read after writing to b. We can replace the two variables a and b by one variable a_and_b.
If the "optimizing" compiler does this, you will see some message "variable does not exist" in the debugger if you stop your program in line "6" and want to read the value of a.
You may say that the variable a has been "freed" in line "5".

Organization of Virtual Memory in C

For each of the following, where does it appear to be stored in memory, and in what order: global variables, local variables, static local variables, function parameters, global constants, local constants, the functions themselves (and is main a special case?), dynamically allocated variables.
How can I evaluate this experimentally, i.e., using C code?
I know that
global variables -- data
static variables -- data
constant data types -- code
local variables(declared and defined in functions) -- stack
variables declared and defined in main function -- stack
pointers(ex: char *arr,int *arr) -- data or stack
dynamically allocated space(using malloc,calloc) -- heap
You could write some code to create all of the above, and then print out their addresses. For example:
void func(int a) {
int i = 0;
printf("local i address is %p\n", (void *)&i);
printf("parameter a address is %p\n", (void *)&a);
}
printf("func address is %p\n", (void *)&func);
Note that the function address is a bit tricky: you have to cast it to void * (a conversion most compilers accept, although ISO C does not guarantee it), and when you take the address of a function you omit the (). Compare memory addresses and you will start to get a picture of where things are. Normally text (instructions) is at the bottom (closest to 0x0000), the heap is in the middle, and the stack starts at the top and grows down.
In theory
Pointers are no different from other variables as far as memory location is concerned.
Local variables and parameters might be allocated on the stack or directly in registers.
constant strings will be stored in a special data section, but basically the same kind of location as data.
numerical constants themselves will not be stored anywhere, they will be put into other variables or translated directly into CPU instructions.
for instance int a = 5; will store the constant 5 into the variable a (the actual memory is tied to the variable, not the constant), but a *= 5 will generate the code necessary to multiply a by the constant 5.
main is just a function like any other as far as memory location is concerned. A local main variable is no different from any other local variable, main code is located somewhere in code section like any other function, argc and argv are just parameters like any others (they are provided by the startup code that calls the main), etc.
code generation
Now if you want to see where the compiler and runtime put all these things, a possibility is to write a small program that defines a few of each, and ask the compiler to produce an assembly listing. You will then see how each element is stored.
For heap data, you will see calls to malloc, which is responsible for interfacing with the dynamic memory allocator.
For stack data, you will see strange references to the stack and frame pointers (the esp and ebp registers on x86 architectures), which will be used both for parameters and (automatic) local variables.
For global/static data, you will see labels named after your variables.
Constant strings will probably be labelled with an awful name, but you will notice they all go into a section (usually named rodata) that will be linked next to data.
runtime addresses
Alternatively, you can run this program and ask it to print the addresses of each element. This, however, will not show you the register usage.
If you use a variable address, you will force the compiler to put it into memory, while it could have kept it into a register otherwise.
Note also that the memory organization is compiler and system dependent. The same code compiled with gcc and MSVC may have completely different addresses and elements in a completely different order.
Code optimizer is likely to do strange things too, so I advise to compile your sample code with all optimizations disabled first.
Looking at what the compiler does to gain size and/or speed might be interesting though.

Big array initializations issues in C

I need your help with three questions (which relate more or less to the same subject, I guess).
1) I have a LARGE array of ints which is initialized in the following manner:
int arr [] = {.....}; // a lot of values !!
Within the program there is only one function that "uses" this array, for "read only" operations.
We have two options regarding this array:
a) Declare the array as a local array in that function.
b) Declare it as a global array outside of this function.
How will the image file of the program be modified in each of these cases?
How will the program's speed of execution be affected?
2) Regarding the TI MSP430 micro controller:
I have in my program a very large global array of C style string as follows:
char *arr [] = {"string 1","string2",.......}; // a lot of strings
Usually, at the beginning of the main program I use a command to stop the "Watch Dog" timer.
As I see it, this is needed, for instance, in cases where there is a very large array that needs to be initialized, so my question is:
Is that the case here (having the large array of "strings")? When does the array get initialized?
Will it matter if I declare it in a different manner?
3) How (if at all) will the answers to questions 1 & 2 be different in C++?
Thanks a lot,
Guy.
"How will the image file of the program be modified for each of these cases?":
If you declare it as a local (automatic) variable, the initial values still have to be stored in the executable, so the image does not get smaller; in addition, every time you call the function, a large amount of data is copied onto the stack before the rest of the function code is executed.
If you declare it as a global variable, the values are hard-coded into the executable's data section and no additional data-copy operations take place at runtime (the executable loading-time will increase, if that makes any difference).
So option #1 gains nothing in terms of size, and option #2 is better in terms of performance.
HOWEVER, please note that in the first option, you will most likely have a stack-overflow during runtime, which will cause your program to perform a memory access violation and crash. In order to avoid it, you will have to increase the size of your stack, typically defined in your project's linker-command file (*.lcf). Increasing the size of the stack means increasing the size of the executable, hence option #1 is no better than option #2 in any aspect, leaving you with only one choice to take (declaring it as a global variable).
Another issue, is that you might wanna declare this array as const, for two reasons:
It will prevent runtime errors (and give you compilation errors instead), should you ever attempt to change values within this read-only array.
It will tell the linker to allocate this array in the RO section of your program, which is possibly mapped to an EPROM on the MSP430. If you choose not to use const, then the linker will allocate this array in the RW section of your program, which is probably mapped to the RAM. So this consideration is really a matter of - which memory you're shorter of, RAM or EPROM. If you're not sure, then you can check it in your project's linker-command file, or in the map file that is generated every time you build the project.
"When does the global array get initialized?":
It is initialized during compilation, and the actual values are hard-coded into the executable.
So there is no running-time involved here, and there is something else causing your watch-dog to perform a HW reset (my guess - some memory access violation which causes your program to crash).
NOTE:
An executable program is typically divided into three sections:
Code (read-only) section, which contains the code and all the constant variables.
Data (read-write) section, which contains all the non-constant global/static variables.
Stack (read-write) section, where all the other variables are allocated during runtime.
The size and base-address of each section, can be configured in the linker-settings of the project (or in the linker-command file of the project).
You have a third option; declare it within the function, but with the static keyword:
void func()
{
static int arr[] = {...};
...
}
This will set aside storage in a different memory segment (depends on the architecture and the executable file format; for ELF, it will use the .data segment), which will be initialized at program startup and held until the program terminates.
Advantages:
The array is allocated and initialized once, at program startup, rather than every time you enter the function;
The array name is still local to the function, so it's not visible to the rest of the program;
The array size can be quite a bit larger than if allocated on the stack;
Disadvantages:
If the array is not truly read-only, but is updated by the function, then the function is no longer re-entrant;
Note that if the array is truly meant to be read-only, you might want to declare it as:
static const int arr[] = {...};

Questions about the Memory Layout of a C program

I have some questions about memory layout of C programs.
Text Segment
Here is my first question:
When I searched for the text segment (or code segment), I read that "Text segment contain executable instructions". But what are executable instructions for any function? Could you give some different examples?
I also read that "Text segment is sharable so that only a single copy needs to be in memory for frequently executed programs such as text editors, the C compiler, etc.", but I couldn't make a connection between C programs and "text editors".
What should I understand from this statement?
Initialized Data Segment
It is said that the "Initialized Data Segment" contains the global variables and static variables, but I also read that const char* string = "hello world" makes the string literal "hello world" to be stored in initialized read-only area and the character pointer variable string in initialized read-write area. char* string is stored read-only area or read-write area? Since both are written here I'm a bit confused.
Stack
From what I understand, the stack contains the local variables. Is this right?
The text segment contains the actual code of your program, i.e. the machine code emitted by your compiler. The idea of the last statement is that your C program and, say, a text editor is exactly the same thing; it's just machine code instructions executing from memory.
For example, we'll take the following code, and a hypothetical architecture I've just thought up now because I can't remember x86 assembly.
while(i != 10)
{
x -= 5;
i++;
}
This would translate to the following instructions
LOOP_START:
CMP eax, 10 # EAX contains i. Is it 10?
JZ LOOP_END # If it's 10, exit the loop
SUB ebx, 5 # Otherwise, subtract 5 from EBX (x)
ADD eax, 1 # And add 1 to i
JMP LOOP_START # And then go to the top of the loop.
LOOP_END:
# Do something else
These are low-level operations that your processor can understand. These would then be translated into binary machine code, which is then stored in memory. The actual data stored might be 5, 2, 7, 6, 4, 9, for example, given a mapping between operation and opcode that I just thought up. For more information on how this actually happens, look up the relationship between assembler and machine code.
-- Ninja-edit - if you take RBK's comment above, you can view the actual instructions which make up your application using objdump or a similar disassembler. There's one in Visual Studio somewhere, or you could use OllyDbg or IDA on Windows.
Because the actual instructions of your program should be read-only, the text segment doesn't need to be replicated for multiple runs of your program since it should always be the same.
As for your question on the data segment, char* string will actually be stored in the .bss segment, since it doesn't have an initializer. This is an area of memory that is cleared before your program runs (by the crt0 or equivalent) unless you give GCC a flag that I can't remember off-hand. The .bss segment is read-write.
Yes, the stack segment contains your local variables. In reality, it stores what are called "stack frames". One of these is created for each function you call, and they stack on top of each other. It contains stuff like the local variables, as you said, and other useful bits like the address that the function was called from, and other useful data so that when the function exits, the previous state can be reinstated. For what is actually contained on a stack frame, you need to delve into your architecture's ABI (Application Binary Interface).
The text segment is often also called "code" ("text" tends to be the Unix/Linux name; other OSes don't necessarily use that name).
And it is shareable in the sense that if you run TWO processes that both execute the C-compiler, or you open the text editor in two different windows, both of those share the same "text" section - because it doesn't change during the running of the code (self-modifying code is not allowed in text-segment).
Initialized string value is stored in either "ro-data" or "text", depending on the compiler. And yes, it's not writeable.
If string is a global variable, it will end up in "initialized data", which will hold the address of the "hello world" message in the value of string. The const part is referring to the fact that the contents the pointer points at is constant, so we can actually change the pointer by string = "foo bar"; later in the code.
The stack is, indeed, used for local variables and, typically, the call stack (where the code returns to after it finishes the current function).
However, the actual layout of a program's in-memory image is left entirely up to the operating system, and often the program itself as well. Yet, conceptually we can think of two segments of memory for a running program[1].
Text or Code Segment - Contains compiled program code.
Data Segment - Contains data (global, static, and local) both initialized and uninitialized. Data segment can further be sub-categorized as follows:
2.1 Initialized Data Segments
2.2 Uninitialized Data Segments
2.3 Stack Segment
2.4 Heap Segment
Initialized data segment stores all global, static, constant, and external variables (declared with extern keyword) that are initialized beforehand.
Uninitialized data segment or .bss segment stores all uninitialized global, static, and external variables (declared with extern keyword).
Stack segment is used to store all local variables and is used for passing arguments to the functions along with the return address of the instruction which is to be executed after the function call is over.
Heap segment is also part of RAM where dynamically allocated variables are stored.
Coming to your first question - If you are aware of function pointers then you know that the function name returns the address of the function (which is the entry point for that function). These instructions are coded in assembly. Instruction set may vary from architecture to architecture.
Text or code section is shareable - If more than one running process belong to the same program then the common compiled code need not be loaded into memory separately. For example if you have opened two .doc documents then there will be two processes for them but definitely there will be some common code being used by both processes.
The stack segment is the area where local variables are stored. Local variables here means all variables declared inside any function of your C program, including main().
When we call a function, a stack frame is created, and when the function returns, the stack frame is destroyed, including all local variables of that particular function.
A stack frame contains data such as the return address, the arguments passed to the function, local variables, and any other information needed by the invoked function.
A “stack pointer (SP)” keeps track of the top of the stack: each push or pop operation adjusts the stack pointer to the next or previous address.
you can refer this link for practical info:- http://www.firmcodes.com/memory-layout-c-program-2/

Where are constant variables stored in C?

I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?
How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.
Depending on the data segmentation that a particular processor follows, we have five segments:
Code Segment - stores only code, ROM
Data segment - stores initialised global and static variables
Stack segment - stores all the local variables and other information such as function return addresses
Heap segment - all dynamic allocations happen here
BSS (Block Started by Symbol) segment - stores uninitialised global and static variables
Note that the difference between the data and BSS segments is that the former stores initialised global and static variables and the latter stores UNinitialised ones.
Now, Why am I talking about the data segmentation when I must be just telling where are the constant variables stored... there's a reason to it...
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.
Consider the code:
const int i = 0;
static const int k = 99;
int function(void)
{
const int j = 37;
totherfunc(&j);
totherfunc(&i);
//totherfunc(&k);
return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)
Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants on the .text section, unless instructed otherwise.
Usually they are stored in read-only data section (while global variables' section has write permissions). So, trying to modify constant by taking its address may result in access violation aka segfault.
But it depends on your hardware, OS and compiler really.
Of course not, because:
1) The bss segment stores uninitialized variables, and it has two types:
(I) Large static and global variables that are non-constant and uninitialized are stored in the .BSS section.
(II) Small static and global variables that are non-constant and uninitialized are stored in the .SBSS section, which is included in the .BSS segment.
2) The data segment stores initialized variables, and it has three types:
(I) Large static and global variables that are initialized and non-constant are stored in the .DATA section.
(II) Small static and global variables that are initialized and non-constant are stored in the .SDATA1 section.
(III) Small static and global variables that are constant, whether initialized or not, are stored in the .SDATA2 section.
"Small" and "large" above depend on the compiler; for example, small may mean less than 8 bytes and large 8 bytes or more.
My remaining doubt is where local constants are stored.
This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.
This is specific to Win32 systems.
It's compiler dependent, but please be aware that it may not even be stored at all, since the compiler may optimize it away and put its value directly into the expression that uses it.
I added this code to a program and compiled it with gcc for an ARM Cortex-M4; check the difference in memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};
Global and constant are two completely separate properties. A variable can have one or the other, neither, or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.
It may not be stored at all.
Consider some code like this:
#define PI 3.14159265358979323846 /* ISO C <math.h> does not define PI */
double toRadian(int degree){
return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.
Just as an add-on: as you know, the memory layout of the final executable is decided during the linking process. There is one more section called COMMON, in which the common symbols from different input files are placed. This common section actually falls under the .bss section.
Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.
I checked on an x86_64 GNU/Linux system. By writing through a pointer to the 'const' variable (after casting away const, which is undefined behaviour), the value can be changed. I used objdump and didn't find the 'const' variable in the text segment; the 'const' variable is stored on the stack.
'const' is a type qualifier in C that is enforced by the compiler. The compiler reports an error when it comes across a statement changing a 'const' variable.

Resources