I wrote the following 3 C source files to test the extern keyword in C:
main.c
#include<stdio.h>
extern int var;
int main()
{
var = 10;
printf("%d %p",var,&var);
var = 20;
}
main2.c
#include<stdio.h>
extern int var;
int main()
{
printf("%d %p",var,&var);
}
other.c
int var;
I compiled main.c and main2.c separately, linking other.c with each.
On running the first program I got the following output:
10 0x8049660
But on running the second one after that, I got this:
0 0x8049660
It is evident that the two var's point to the same address which is the
point of using the extern keyword.
But why does it get initialized to 0 again?
Also, if I run the second program without running the first, I get
the same output.
Why is that?
The extern keyword is not for sharing variables between different programs, or even between consecutive runs of the same program. Declaring a variable as extern just means that it is defined in another translation unit of the same program; its value is shared only within a single run of that program.
Note that other.c and main.c are two translation units that get linked to the same program, let's say main.exe. Within one run of main.exe, the value of var will be shared among other.c and main.c, and the address will be the same. For a second run of main.exe, you may receive different addresses, i.e. a different value for &var compared to that of the first run.
I am trying to understand where variables declared outside of any function are stored when you have multiple C files, say main.c and process.c.
// this is main.c
#include <stdio.h>
#include "process.h"
int foo = 1;
int main(void) {
int count = get_counter();
}
// this is process.c
#include <stdio.h>
int counter = 0;
int get_counter() {
return counter;
}
So when you have two C files, main.c and process.c, you can call get_counter() in main.c and it will return the value from process.c. What I am trying to understand is where the compiler stores int foo from main.c and int counter from process.c. Is this part of some data storage section? It is not on the stack, right? It also seems that having a separate process.c file means counter is not a global variable.
I have been trying to understand how variable scope is handled, and it can get a little tricky for me. Does #include "process.h" essentially compile as if the function prototypes were written in main.c above the rest of the code? To me that would make int counter global, so I know I am confusing something.
Thank you for taking your time to read this.
That's a function of the executable file format, not the C language itself. For ELF (*nix and similar systems) and PE/COFF (Windows and similar), globals and other objects with static storage duration are stored in either the .data section (when explicitly initialized to a nonzero value) or the .bss section (when uninitialized or zero-initialized). This is space reserved as part of the program's image itself (not taken from the stack or heap).
Other executable file formats may use different section names.
Taking the following program as an example:
// myprogram.c
#include<stdio.h>
int a, b;
int main(void)
{
printf("A: %d\n", a);
printf("B: %d\n", b);
}
// friendsprogram.c
int a=1;
static int b=2;
$ gcc myprogram.c friendsprogram.c -o out; ./out
A: 1
B: 0
How would the translation units be classified in the above? And how would that be different from just the contents of the files "myprogram.c" and "friendsprogram.c"? Does the translation unit ever depend on the command that is issued to the compiler? For example, if I change the command to just:
$ gcc myprogram.c -o out; ./out
My output becomes:
A: 0
B: 0
A translation unit is a source file along with all of its included headers that is compiled as a single unit.
In this example, myprogram.c along with the header stdio.h is one translation unit. The file friendsprogram.c is another translation unit.
Note that this doesn't change when you compile like this:
gcc myprogram.c friendsprogram.c -o out
That is because this command line combines compiling and linking into a single step. A temporary object file is created for myprogram.c and another for friendsprogram.c, then those object files are linked to create the file "out".
What's happening is a side effect of old lenient compiler/linker behavior and a so-called "common" section.
int a;
The C spec says this global-scope variable is initialized to zero. You would expect this to go into the .bss (zero-initialized data) section of the executable.
But in GCC <10, the variable is put into the "common" section when that file (translation unit) is compiled.
int a=1;
Now you've provided an initialization, and this variable will go into the .data section.
But when the linker links these two object files together, rather than issue a "multiple definitions" (for the same name) error, it will do something controversial, and merge them into one variable, due to the common section semantics.
By passing -fno-common, or using GCC >= 10, the common section is not used, and the linker will issue an error and refuse to link your program.
So what should you do? Simple: provide only one definition for any name.
If you really want to use global variables (undesirable in general), and you want to put them in a separate translation unit (weird), use extern in your other files:
data.h
// Declaration: Tells everyone that 'a' exists somewhere
extern int a;
data.c
#include "data.h"
// Definition: defines the variable and its initial value
int a = 42;
main.c
#include <stdio.h>
#include "data.h"
int main(void)
{
printf("a = %d\n", a);
}
From what I saw across many many stackoverflow questions among other places, the way to define globals is to define them in exactly one .c file, then declare it as an extern in a header file which then gets included in the required .c files.
However, today I saw a global variable definition in a header file in a codebase and got into an argument about it, but the other person insisted it would work. I had no idea why, so I created a small project to test it out quickly:
a.c
#include <stdio.h>
#include "a.h"
int main()
{
p1.x = 5;
p1.x = 4;
com = 6;
change();
printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
return 0;
}
b.c
#include "a.h"
void change(void)
{
p1.x = 7;
p1.y = 9;
com = 1;
}
a.h
typedef struct coord{
int x;
int y;
} coord;
coord p1;
int com;
void change(void);
Makefile
all:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
gcc a.o b.o -o run.out
clean:
rm a.o b.o run.out
Output
p1 = 7, 9
com = 1
How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...
This relies on so-called "common symbols", which are an extension to standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2), except most UNIX linkers make it work across translation units too (and many even with shared dynamic libraries).
AFAIK, the feature has existed since pretty much forever and it had something to do with Fortran compatibility/similarity.
It works by the compiler giving uninitialized (tentative) globals a special "common" category (shown by the nm utility as "C", which stands for "common").
Example of data symbol categories:
#!/bin/sh -eu
(
cat <<EOF
int common_symbol; //C
int zero_init_symbol = 0; //B
int data_init_symbol = 4; //D
const int const_symbol = 4; //R
EOF
) | gcc -xc - -c -o data_symbol_types.o
nm data_symbol_types.o
Output:
0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol
Whenever a linker sees multiple definitions of a particular symbol, it usually generates a linker error.
But when those redefinitions are in the common category, the linker will merge them into one.
Also, if there are N-1 common definitions for a particular symbol and one non-tentative definition (in the R,D, or B category), then all the definitions are merged into the one nontentative definition and also no error is generated.
In other cases you get symbol redefinition errors.
Although common symbols are widely supported, they aren't technically standard C and relying on them is theoretically undefined behavior (even though in practice it often works).
clang and tinycc, as far as I've noticed, do not generate common symbols (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common.
(Ian Lance Taylor's series on linkers has more info on common symbols, and it also mentions how linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42 . I believe this weird trick was once used by libcs to some effect.)
That program should not build (well, it should compile, but you'll have double definition errors in your linking phase) because of how the variables are defined in your header file.
A header file informs the compiler about the external environment, which it normally cannot guess by itself, such as external variables defined in other modules.
As your question deals with this, I'll try to explain the correct way to define a global variable in one module, and how to inform the compiler about it in other modules.
Let's say you have a module A.c with some variable defined in it:
A.c
int I_am_a_global_variable; /* you can even initialize it */
Normally, to let the compiler know when compiling other modules that the variable is defined elsewhere, you need to write something like this (the trick is the extern keyword, which says that the variable is not defined here):
B.c
extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */
As this is a property of the module A.c, we can write a A.h file, stating that somewhere else in the program, there's a variable named I_am_a_global_variable of type int, in order to be able to access it.
A.h
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
Then, instead of declaring it in B.c, we can include A.h in B.c to ensure that the variable is declared the way the author of A.c intended.
So now B.c is:
B.c
#include "A.h"
void some_function() {
/* ... */
I_am_a_global_variable = /* some complicated expression */;
}
This ensures that if the author of A.c decides to change the type or the declaration of the variable, they can do so by changing A.h, and all the files that #include it will then need to be recompiled (you can arrange this in the Makefile, for example).
A.c
#include "A.h" /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 27;
To prevent errors, it is good for A.c to also #include A.h, so that the declaration
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and the final definition (the one in A.c):
int I_am_a_global_variable = 27; /* I have initialized it to a non-default value to show how to do it */
are checked against each other. Suppose the author changes the type of I_am_a_global_variable to double and forgets to change the declaration in A.h: the compiler will complain about the mismatch between declaration and definition when compiling A.c (which now includes A.h).
Why do I say that you will have double definition errors when linking?
Well, suppose you compile several modules that each contain the following lines:
#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
* compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */
then every one of those modules will have a global variable I_am_a_global_variable, initialized to 0, because the compiler defined it in every module (you are not saying that the variable is defined elsewhere; you are declaring and defining it in this compilation unit). When you link all the modules together you end up with several definitions of a variable with the same name in several places, and references from other modules will not know which one to use.
The compiler doesn't know anything of other compilations for an application when it is compiling module A, so you need some means to tell it what is happening around. The same as you use function prototypes to indicate it that there's a function somewhere that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there's a variable defined elsewhere that has type X, so all the accesses you do to it in this module will be compiled correctly.
I have these two different program where I want to access the static variable declared in program1 from program2.
Program1
/* file a.c */
#include<stdio.h>
static int a = 100; /* global static variable not visible outside this file.*/
int *b = &a; /* global int pointer, pointing to global static*/
Program2
#include<stdio.h>
/* file b.c */
extern int *b; /* only declaration, b is defined in other file.*/
int main()
{
printf("%d\n",*b); /* dereferencing b will give the value of variable a in file a.c */
return 0;
}
When I compile program1 (gcc a.c) there is no error, but when I compile program2 (gcc b.c) I get this error:
test_b.c:(.text+0x7): undefined reference to `b'
collect2: error: ld returned 1 exit status
Why is there a compile error?
Thanks in advance.
EDIT 1:
My intention was to use a static variable from another program. I thought every .c file had to have a main() function and that only .h files could hold declarations; I was wrong about that. So I removed the main() function from a.c, and instead of compiling the two programs separately, I now compile once using gcc a.c b.c, as Filip suggested. Now it's working fine. Thanks to all of you.
You have to link against a.c while compiling b.c:
gcc a.c b.c
You can't expect the linker to magically find the C file where b is defined. extern means it is defined elsewhere, and you have to say where. When you compile and link with a.c, the linker can find the definition of b.
Of course, you can't have 2 main() functions.
Well, your code already said it. b.c only has a declaration, not a definition, of the symbol in question.
Since these are clearly meant to be source files from two separate projects, I would suggest moving your definition to its own .c file, which may then be shared between the two projects.
$ gcc a.c myIntPointerIsHere.c
$ gcc b.c myIntPointerIsHere.c
However, there are clearer ways to share code between two different projects.
Both modules contain a definition of main. It seems that the first module was not included in your build; otherwise I think you would get an error saying that main was redefined.
// File: foo.c
static int var;
void foo()
{
var++;
}
// end of file foo.c
// File bar.c:
static int var;
void bar()
{
var++;
}
// end of file bar.c
// file main.c
static int var;
void main()
{
foo();
bar();
printf("%d", var);
}
// end of file main.c
Question: Will the above program compile? If so, what will be the result?
I tested the code and found that it couldn't be compiled. I tried using extern in main.c to declare the functions foo() and bar(), but it still couldn't be compiled.
main.c has a few minor problems - it should be something like this:
#include <stdio.h>
static int var;
extern void foo();
extern void bar();
int main(void)
{
foo();
bar();
printf("%d\n", var);
return 0;
}
It should build OK like this:
$ gcc -Wall main.c foo.c bar.c -o main
and the result should be:
$ ./main
0
I would expect it to compile and print 0 (though if you want to compile it as C++, you'll have to add declarations for foo() and bar(), and in either C or C++, you might get a warning that main() should really return an int).
Since var is defined as static in each of the three files, you really have three separate variables that all happen to have the same name. Perhaps it's easiest to think of each file as defining a struct that contains its static variables. What you've done is called foo(), which increments foo.var. Then you've called bar(), which increments bar.var. Then you've printed out main.var, which was initialized to zero, and never modified.
This compiles for me (although with a warning about the return type from main()).
The result, in terms of what main() will print, is well defined even though var in main.c has no explicit initializer: an object with static storage duration is initialized to zero before the program starts, so main() prints 0. (In practice this storage typically ends up in zero-filled memory that the OS supplies to the process.)
The static qualifier to the var variable definitions means that the variable is not visible outside the source file it is defined in. It also means that each of the three var variables gets its own storage location.
If we add a printf("[module name] *var=%p\n", (void *)&var) to foo(), bar() and main() respectively, to print the address of each of those three variables, you should get something like this:
foo() *var=0x8049620
bar() *var=0x8049624
main() *var=0x8049628
Note that each variable gets its own storage location. The code in each source file will access the version of var that is specific to that source file. Using static like this in .c files is a typical way to implement information hiding in C.
The code (as it is) will compile and the result will be 0 (as Jerry explains), because each static variable has internal linkage and is visible only within its own file.
But, if you include foo.c and bar.c in main.c, and compile as
gcc main.c
then the result will be 2 because there will only be one global variable var.