Is main really the first function or first executable statement in a C program? What if there is a global variable int a=0;?
I have always been taught that main is the starting point of a program. But what about global variable which is assigned some value and is an executable statement in my opinion?
The global variable and in general objects of static storage duration are initialized conceptually before program execution.
C11 (N1570) 5.1.2/1 Execution environments:
All objects with static storage duration shall be initialized (set to
their initial values) before program startup.
Given a hosted environment, function main is designated to be an required entry point, where program execution begins. It may be in one of two forms:
int main(void)
int main(int argc, char* argv[])
where parameters' names does not need to be the same as above (it is just a convention).
For a freestanding environment entry point is implementation-defined, that's why you can sometimes encounter void main() or any different form in C implementations for embedded devices.
C11 (N1570) 5.1.2.1/1 Freestanding environment:
In a freestanding environment (in which C program execution may take
place without any benefit of an operating system), the name and type
of the function called at program startup are implementation-defined.
main is not a starting point of the program. The starting point of the program is the entry point of the program, which is in most cases is transparent for a C programmer. Usually it is denoted by _start symbol, and defined in a startup code written in assembly or precompiled into a C runtime initialization library (like crt0.o). It is responsible for low-level initialization of stuff you are taking as given, like initializing the uninitialized static variables to zeros. After it is done, it is calling to a predefined symbol main, which is the main you know.
But what about global variable which is assigned some value and is an execuatable statement in my opinion
Your opinion is wrong.
In a global context, only a variable definition can exist, with an explicit initialization. All the executable statements (i.e, the assignment) have to reside inside a function.
To elaborate, in global context, you cannot have a statement like
int globalVar;
globalVar = 0; //error, assignement statement should be inside a function
however, the above would be perfectly valid inside a function, like
int main()
{
int localVar;
localVar = 0; //assignment is valid here.
Regarding the initialization, like
int globalVar = 0;
the initialization takes place before start of main(), so that's not really the part of execution, per se.
To elaborate the scenario of the initialization of a global variable, quoting the C11, chapter 6.2,
If the declarator or type specifier that declares the identifier
appears outside of any block or list of parameters, the identifier has file scope, which
terminates at the end of the translation unit.
and for flie scope variables,
If
the declaration of an identifier for an object has file scope and no storage-class specifier,
its linkage is external.
and for objects with external linkage,
An object whose identifier is declared without the storage-class specifier
_Thread_local, and either with external or internal linkage or with the storage-class
specifier static, has static storage duration. Its lifetime is the entire execution of the
program and its stored value is initialized only once, prior to program startup.
In a theoretical, C-standards-only program, it is.
In practice, it's usually more involved.
On Linux, AFAIK, the kernel loads your linked image into the a reserved address space and first calls the dynamic linker that the executable image specifies (unless the executable is compiled statically in which case there's no dynammic linking part).
The dynamic linker can load dependent libraries, such as the C library.
These libraries may register their own startup code, and so can you (on gcc mainly via __attribute__((constructorr))).
(User-supplied init code is especially needed for C++ where you need to run some startup code on C++ globals that have constructors.)
Then the linker calls the entry point of your image, which is _start by default (linkers allow you to choose a different name if you want to dig that deep) which is by default supplied by the C library. _start initializes the C library an continues by calling main.
In any case, simple global initializations such as int x = 42; should get compiled and linked into your executable and then get loaded by the OS (rather than your code) all at once, as part of loading the process image so there's no need for user-supplied initialization code for such variables.
If you use turbo c watch you would find that first global is declared and then execution of main starts that is at compile time data segment (giving memory to global and static variable) is initialized with 0.
So though assignment is not possible but declaration occurs at compile time.
Yes, when you declare a variable memory is allocated to it at compile time until and unless you don't use heap segment (allocating memory to pointer)i.e dynamic allocation which occurs at run time. But since global got its memory from data segment section of RAM variable is allocated memory at compile time.
Hope this helps.
Related
I keep on seeing the same sentence that static variables are initialized only once, and I also saw a sentence stating that "when the block is entered for the first time".
Are local static variables initialized like other global variables - at the start of program execution? Or do local static variables differ from normal globals, and only get initialized once their function/block is called/reached?
C17 6.2.4 (3)
An object whose identifier is declared without the storage-class specifier _Thread_local, and either
with external or internal linkage or with the storage-class specifier static, has static storage duration.
Its lifetime is the entire execution of the program and its stored value is initialized only once, prior
to program startup.
However, remember the as-if rule. An implementation could wait to initialize the variable until the first call to the function, as a conforming program has no way to access its value before then, and so would not be able to tell the difference.
If you have an implementation with extensions or implementation-defined behavior that do provide a way to access the variable before the first call to the function, then such an implementation ought to document whether you would see the initialized value in such a case. In most cases I would expect the answer to be "yes".
The most common implementation I'm familiar with is to load the initial value from the executable, or to place it in a bss section that is zeroed at startup, just as is done for global or file-scope static variables.
Although implementation dependent, static variables - any scope - are initialized as the executable is loaded.
We all know the common example to how static variable work - a static variable is declared inside a function with some value (let's say 5), the function adds 1 to it, and in the next call to that function the variable will have the modified value (6 in my example).
How does that happen behind the scene? What makes the function ignore the variable declaration after the first call? How does the value persist in memory, given the stack frame of the function is "destroyed" after its call has finished?
static variables and other variables with static storage duration are stored in special segments outside the stack. Generally, the C standard doesn't mention how this is done other than that static storage duration variables are initialized before main() is called. However, the vast majority of real-world computers work as described below:
If you initialize a static storage duration variable with a value, then most systems store it in a segment called .data. If you don't initialize it, or explicitly initialize it to zero, it gets stored in another segment called .bss where everything is zero-initialized.
The tricky part to understand is that when we write code such as this:
void func (void)
{
static int foo = 5; // will get stored in .data
...
Then the line containing the initialization is not executed the first time the function is entered (as often taught in beginner classes) - it is not executed inside the function at all and it is always ignored during function execution.
Before main() is even called, the "C run-time libraries" (often called CRT) run various start-up code. This includes copying down values into .data and .bss. So the above line is actually executed before your program even starts.
So by the time func() is called for the first time, foo is already initialized. Any other changes to foo inside the function will happen in run-time, as with any other variable.
This example illustrates the various memory regions of a program. What gets allocated on the stack and the heap? gives a more generic explanation.
The variable isn't stored in the stack frame, it's stored in the same memory used for global variables. The only difference is that the scope of the variable name is the function where the variable is declared.
Quoting C11, chapter 6.2.4
An object whose identifier is declared [...] with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
In a typical implementation, the objects with static storage duration are stored either in the data segment or the BSS (based on whether initialized or not). So every function call does not create a new variable in the stack of the called function, as you might have expected. There's a single instance of the variable in memory which is accessed for each iteration.
#include<stdio.h>
static int a=5;
main()
{
static int a=15;
printf("%d\n",a);
}
So, how are both variables a stored in internal memory?
How are global and local variables with the same variable names stored internally in memory?
#include<stdio.h>
static int a=5;
int main()
{
printf("%p\n",(void *)&a);
static int a=15;
printf("%p\n",(void *)&a);
return 0;
}
Output for the upper program is
0x564e6b67a030
0x564e6b67a034
So you can see that both are stored in different addresses. As one is a global variable and other is local.
The names are only of interest to the human reader and the compiler/linker translating that code to machine executable code. The final object code resolves these to addresses and the names no longer exist.
The compiler distinguishes these the same way you do - by scope; when two identical symbols in the same namespace are in scope simultaneously, the symbol with the most restrictive scope is visible (i.e. may be accessed via the name).
For symbols with external linkage (in your example there are none other then main), the compiler retains the symbol name in order to resolve links between separately compiled modules. In the fully linked executable the symbol names cease to exist (except in debug build symbol meta-data).
The thing is the scope don't let them mess up. The first one has file scope and the other has block scope. (They are different variables - they are stored in separate memories.)
When you use it in the block - compiler checks whether this reference is resolved by anything in the same block. It gets one. And done.
And in case it is in some other function - if it doesn't find anything named a - the search ends in file scope where it finds the name a. That is where the story ends.
Both being static their storage duration is same. They live till the program exists. But their scope is different. If the scope was same too - compiler would have shown you error message.
Here if you compile with -Wshadow option - it will warn you about shadowing a variable. You shadowed the outer a with the inner on that block. That's it.
The facetious answer is that they are stored in different places.
Remember that the names of variables do not (normally) form part of the compiled program, so the compiler just follows the normal rules of variable shadowing. So in your case your print function (that's not a standard C function by the way - did you mean printf?) outputs the a declared in main. The fact that you've used the same name will not bother the compiler at all.
Finally C provides no way of accessing the global scoped a once the other declaration is encountered in main as it's static. (It is wasn't static you could use extern.) See How can I access a shadowed global variable in C?
Consider following C program (see live demo here).
const int main = 195;
I know that in the real world no programmer writes code like this, because it serves no useful purpose and doesn't make any sense. But when I remove the const keyword from above the program it immediately results in a segmentation fault. Why? I am eager to know the reason behind this.
GCC 4.8.2 gives following warning when compiling it.
warning: 'main' is usually a function [-Wmain]
const int main = 195;
^
Why does the presence and absence of const keyword make a difference here in the behavior of the program?
Observe how the value 195 corresponds to the ret (return from function) instruction on 8086 compatibles. This definition of main thus behaves as if you defined it as int main() {} when executed.
On some platforms, const data is loaded into an executable but not writeable memory region whereas mutable data (i.e. data not qualified const) is loaded into a writeable but not executable memory region. For this reason, the program “works” when you declare main as const but not when you leave off the const qualifier.
Traditionally, binaries contained three segments:
The text segment is (if supported by the architecture) write-protected and executable, and contains executable code, variables of static storage duration qualified const, and string literals
The data segment is writeable and cannot be executed. It contains variables not qualified const with static storage duration and (at runtime) objects with allocated storage duration
The bss segment is similar to the data segment but is initialized to all zeroes. It contains variables of static storage duration not qualified const that have been declared without an initializer
The stack segment is not present in the binary and contains variables with automatic storage duration
Removing the const qualifier from the variable main causes it to be moved from the text to the data segment, which isn't executable, causing the segmentation violation you observe.
Modern platforms often have further segments (e.g. a rodata segment for data that is neither writeable nor executable) so please don't take this as an accurate description of your platform without consulting platform-specific documentation.
Please understand that not making main a function is usually incorrect, although technically a platform could allow main to be declared as a variable, cf. ISO 9899:2011 §5.1.2.2.1 ¶1, emphasis mine:
1 The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters (...) or with two parameters (...) or equivalent; or in some other implementation-defined manner.
In C, main at global scope is almost always a function.
To use main as a variable at global scope makes the behaviour of the program undefined.
(It just might be the case that when you write const the compiler optimises out the variable to a constant and so your program behaviour is different. But the program behaviour is still undefined).
I know when a program is run, the main() function is executed first. But when does the initialization of global variables declared outside the main() happens? I mean if I declare a variable like this:
unsigned long current_time = millis();
void main() {
while () {
//some code using the current_time global variable
}
}
Here, the exact time when the global variable initializes is important. Please tell what happens in this context.
Since you didn't define the language you're talking about, I assumed it to be C++.
In computer programming, a global variable is a variable that is accessible in every scope (unless shadowed). Interaction mechanisms with global variables are called global environment (see also global state) mechanisms. The global environment paradigm is contrasted with the local environment paradigm, where all variables are local with no shared memory (and therefore all interactions can be reconducted to message passing). Wikipedia.
In principle, a variable defined outside any function (that is, global, namespace, and class static variables) is initialized before main() is invoked. Such nonlocal variables in a translation unit are initialized in their declaration order (§10.4.9). If such a variable has no explicit initializer, it is by default initialized to the default for its type (§10.4.2). The default initializer value for built-in types and enumerations is 0. [...] There is no guaranteed order of initialization of global variables in different translation units. Consequently, it is unwise to create order dependencies between initializers of global variables in different compilation units. In addition, it is not possible to catch an exception thrown by the initializer of a global variable (§14.7). It is generally best to minimize the use of global variables and in particular to limit the use of global variables requiring complicated initialization. See.
(Quick answer: The C standard doesn't support this kind of initialization; you'll have to consult your compiler's documentation.)
Now that we know the language is C, we can see what the standard has to say about it.
C99 6.7.8 paragraph 4:
All the expressions in an initializer for an object that has static
storage duration shall be constant expressions or string literals.
And the new 2011 standard (at least the draft I has) says:
All the expressions in an initializer for an object that has static
storage duration shall be constant expressions or string literals.
So initializing a static object (e.g., a global such as your current_time) with a function call is a constraint violation. A compiler can reject it, or it can accept it with a warning and do whatever it likes if it provides an language extension.
The C standard doesn't say when the initialization occurs, because it doesn't permit that kind of initialization. Basically none of your code can execute before the main() function starts executing.
Apparently your compiler permits this as an extension (assuming you've actually compiled this code). You'll have to consult your compiler's documentation to find out what the semantics are.
(Normally main is declared as int main(void) or int main(int argc, char *argv[]) or equivalent, or in some implementation-defined manner. In many cases void main() indicates a programmer who's learned C from a poorly written book, of which there are far too many. But this applies only to hosted implementations. Freestanding implementations, typically for embedded systems, can define the program's entry point any way they like. Since you're targeting the Arduino, you're probably using a freestanding implementation, and you should declare main() however the compiler's documentation tells you to.)