Why can non-extern variables be in .h files in C/C++?

Take this file as an example: there are many non-extern structures like:
struct list_head source_list;
How can it work when this header file is included by more than one compilation unit?
There should be an error reporting that the same symbol is defined twice, right?

Technically there should, but that usage has been around for years and is impossible to eradicate (it's been tried; every so often some vendor decides to make it an error, and reverts after the first hundred or so bug reports). Pedantically, the .h file should declare it extern and one .c/.cpp file should define it.
Briefly, when you don't specify the linkage (static, extern, etc.) of a top level variable, it's declared as "common". At link time, if all references to that variable are the same size (and type, when available) then it is allocated once and all references are made to point to it. If the linker finds different sizes / types / linkages for the same variable, it throws an error.
EDIT: this is clearly confounding people. Here:
jinx:1714 Z$ cat foo.h
int foo;
extern void bar();
jinx:1715 Z$ cat foo.c
#include "foo.h"
int
main(int argc, char **argv)
{
bar();
return 0;
}
jinx:1716 Z$ cat bar.c
#include "foo.h"
void
bar(void)
{
return;
}
jinx:1717 Z$ gcc -Wall foo.c bar.c -o foo
jinx:1718 Z$ ./foo
jinx:1719 Z$ _
Note the complete lack of errors about int foo being multiply defined. This is what I've been trying to say.

The term for this is "tentative definition":
A declaration of an identifier for an
object that has file scope without an
initializer, and without a
storage-class specifier or with the
storage-class specifier static,
constitutes a
tentative definition. If a translation unit contains one or more
tentative definitions for an
identifier, and the translation unit contains no external
definition for that identifier, then
the behavior is exactly as if the translation unit contains a
file scope declaration of that
identifier, with the composite type as of the end of the
translation unit, with an initializer
equal to 0.
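So within a single translation unit this is well defined in C (but often frowned upon). Whether the resulting definition may then appear in several translation units is a separate question: the standard allows at most one external definition per program, and merging duplicates is a common extension.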
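To make the quoted rule concrete, here is a minimal single-file sketch. The struct layout is an assumption (a stand-in for the list head type in the question), and the function names are made up for illustration. It only shows the single translation unit case that the quoted paragraph covers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the list head type used in the question. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Two tentative definitions of the same object: file scope, no initializer,
   no storage-class specifier. Repeating them in one translation unit is allowed. */
struct list_head source_list;
struct list_head source_list;

/* No external definition follows, so the object is defined as if it had
   an initializer equal to 0: both pointers are null. */
int source_list_is_zeroed(void)
{
    return source_list.next == NULL && source_list.prev == NULL;
}

/* Small self-check illustrating the expected state. */
void check_source_list(void)
{
    assert(source_list_is_zeroed());
}
```

Calling check_source_list() from main should pass silently, since objects with static storage duration start out zeroed.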

These struct list_head source_list; fields are declared inside other structures, so they are not symbols.
Declarations of other (top-level) structures have distinct names, so that's OK too.
Edit:
Note that all variables in this header are actually marked extern.

There should be an extern indeed. However, there's no explicit definition of that variable, so the compiler marks it as extern for you.
You would get a linker error if you had
struct list_head source_list = { 0 };
...since this does define the symbol once per translation unit (and hence the linker complains).

Related

How Linkers Resolve Multiply Defined Global Symbols in C

My Textbook says that:
"Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols. Given a strong symbol and multiple weak symbols, choose the strong symbol."
So I create two files to see:
file1.c:
#include <stdio.h>
int number;
int main(int argc, char *argv[])
{
    printf("%d",number);
    return 0;
}
file2.c (just one line):
int number = 2018;
I ran gcc -Wall -o program file1.c file2.c and the output is 0. Before I studied linkers I could understand that ('number' in file1.c has been initialized to 0), but now that I know how the linker works, I wonder why the output is not 2018. The 'number' in file2.c is a strong symbol (an initialized global variable) and the 'number' in file1.c is a weak symbol, so the linker should choose the strong one, whose value is 2018. Why does the linker choose the weak symbol?
The int number; in file1.c is not uninitialized. Note that it is declared at file scope, it is declared without an initializer, and it is declared without a storage-class specifier (particularly no extern or static). Then C 2018 6.9.2 2 says:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So, int number; in file1.c is the same as int number = 0;. It is initialized.
An issue with the text you quote is that it describes the linker using that linker's terminology, which differs from the terminology the C standard uses. The C standard does not have any “global” variables or “strong” or “weak” symbols.
The number in file2.c is global, but still locally scoped just to that file. If you want file1.c to use number from file2.c you need to mark it as extern like this:
#include <stdio.h>
extern int number;
int main(int argc, char *argv[])
{
    printf("%d",number);
    return 0;
}

C --> headers & variables

Can header files in C include variables?
I am a beginner in programming; I started with C, and I know the importance of precision, especially in the first steps of the learning process.
Including files is done by the preprocessor before even attempting to compile the code and it simply does text replacement – it puts the contents of the included file in the current unit that is going to be passed to the compiler. The compiler then sees the concatenated output and no #include directives at all.
With that said, technically you can include anything that is valid C code.
The good practice, however, is that only type definitions, #defines, function declarations (not definitions) and data declarations (not definitions) should be in a header. A function declaration is also called a prototype and merely specifies the function signature (its return type, name and parameters). Data declarations look very similar to data definitions, but have an extern storage class specifier and cannot be initialised:
extern int a; // declares "a" but does not define it
extern int a = 0; // defines "a" (initialisation requested), the extern is redundant
int a; // a tentative definition (no initialisation but "a" is zeroed)
Why is defining functions and data in a header file frowned upon? Because at link time, different units that have included the same header files will have the same symbols defined and the linker will see duplicate definitions of some symbols.
Also consider that a header is a kind of a "public" interface for the rest of the project (world?) and not every function that is defined in the source file needs to have a declaration there. It is perfectly fine to have internal types and static functions and data in the source file that never get exposed to the outside world.
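As a rough illustration of keeping data internal to a source file, here is a minimal sketch. All names (call_count, counter_add, counter_get) are hypothetical; the idea is that only the two non-static functions would get declarations in a header, while the static object and helper stay private to the file.

```c
#include <assert.h>

/* Internal linkage: this counter is invisible to other translation units. */
static int call_count;

/* Internal helper, also not exported. */
static int clamp_non_negative(int value)
{
    return value < 0 ? 0 : value;
}

/* The two functions below are what a header would declare. */
void counter_add(int delta)
{
    call_count = clamp_non_negative(call_count + delta);
}

int counter_get(void)
{
    return call_count;
}

/* Usage sketch: other code only ever goes through the two functions. */
void counter_demo(void)
{
    assert(counter_get() == 0);   /* static storage starts at zero */
    counter_add(3);
    assert(counter_get() == 3);
    counter_add(-10);             /* clamped, never goes negative */
    assert(counter_get() == 0);
}
```

The design choice here is that no other file can name call_count at all, so including the header in many translation units cannot produce duplicate definitions of it.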
Basically, header files can declare variables. The point to note is that only declarations belong there; do not define variables in a header.
To make that clear:
int a=10; // definition
extern int a; // declaration - it can be used in another file if you include this header file
You can also define macros and declare functions in a header file.
Yes, header files may include variable declarations, but you generally don't want to do that because it will introduce maintenance headaches over time, especially as your code gets larger and more complex. Ideally, functions should share information through parameters and return values, not by using such "global" data items.
There are times when you can't avoid it; I haven't done any embedded programming, but my understanding is that using globals is fairly common in that domain due to space and performance constraints.
Ideally, headers should be limited to the following:
Macro definitions
Type definitions
Function declarations
But suppose you do create a header file with a variable declaration, like so:
/**
* foo.h
*/
int foo;
and you have several source files that all include that header [1]:
/**
* bar.c
*/
#include <stdio.h>
#include "foo.h"
void bar( void )
{
    printf( "foo = %d\n", foo );
}
/**
* blurga.c
*/
#include "foo.h"
void blurga( void )
{
foo = 10;
}
/**
* main.c
*/
#include "foo.h"
void bar( void );    /* defined in bar.c */
void blurga( void ); /* defined in blurga.c */
int main( void )
{
    foo = 5;
    blurga();
    bar();
    return 0;
}
Each file will contain a declaration for foo at file scope (outside of any function). Now you compile each file separately:
gcc -c bar.c
gcc -c blurga.c
gcc -c main.c
giving you three object files - bar.o, blurga.o, and main.o. Each of these object files will have their own unique copy of the foo variable. However, when we build them into a single executable with
gcc -o foo main.o bar.o blurga.o
the linker is smart enough to realize that those separate declarations of foo are meant to refer to the same object (the identifier foo has external linkage across those translation units). So the foo that main initializes to 5 is the same foo that blurga sets to 10, which is the same foo that bar prints out.
However, if you change the declaration of foo to
static int foo;
in foo.h and rebuild your files, then those separate declarations will not refer to the same object; they will remain three separate and distinct objects, such that the foo that main initializes is not the same foo that blurga sets to 10, which is not the same foo that bar prints out (foo has internal linkage within each translation unit).
If you must use a global variable between several translation units, my preferred style is to declare the variable in the header file as extern [2]:
/**
* foo.h
*/
extern int foo;
and then define it in a corresponding .c file
/**
* foo.c
*/
int foo;
so only a single object file creates an instance of foo and it's crystal clear that you intend for other translation units to make use of it. The declaration in the header file isn't necessary for the variable to be shared (the foo identifier has external linkage by simple virtue of being declared in foo.c outside of any function and without the static keyword), but without it nobody else can be sure if you meant for it to be visible or if you just got sloppy.
Edit
Note that headers don't have to be included at the top of a file; you can be perverse and put an #include directive within a function body
void bar( void )
{
#include "foo.h"
// do stuff with foo
}
such that int foo; will be local to the function, although that will likely earn you a beating from your fellow programmers. I got to maintain code where somebody did that, and after 25 years it still gives me nightmares.
1. Please don't write code like this; it's only to illustrate the concept of linkage.
2. The extern keyword tells the compiler that the object the identifier refers to is defined somewhere else.
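To see what the #include inside a function body amounts to, here is a small sketch with the header line pasted in by hand, which is what the preprocessor would do. The file-scope initial value 5 and the function names are assumptions for illustration only.

```c
#include <assert.h>

/* File-scope object with external linkage. */
int foo = 5;

/* Stands in for a function whose body contains `#include "foo.h"`,
   where foo.h holds the single line `int foo;`. */
int bar_local(void)
{
    int foo;        /* block scope, no linkage: a new object that hides the outer foo */
    foo = 1;        /* must be assigned before use; automatic objects are not zeroed */
    return foo;
}

int read_global_foo(void)
{
    return foo;     /* still the file-scope object, untouched by bar_local */
}

void shadow_demo(void)
{
    assert(bar_local() == 1);
    assert(read_global_foo() == 5);
}
```

The local foo exists only for the duration of each call to bar_local, so nothing written to it is visible to other functions.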

Multiple declaration of same struct variable ok?

Here's the setup:
foo.h:
typedef struct my_struct {
int a;
} my_struct;
const my_struct my_struct1;
my_struct my_struct2;
foo.c:
#include "foo.h"
const my_struct my_struct1 = { .a = 1 };
my_struct my_struct2 = { .a = 2 };
main.c:
#include "foo.h"
#include <stdio.h>
int main() {
printf("%d %d\n", my_struct1.a, my_struct2.a);
return 0;
}
Which when compiled with gcc main.c foo.c prints 1 2. The question is, haven't I declared multiple variables with the same name (the two sets of structs)?
edit: Thanks for the reply all. I see I may have posed a slightly confusing question. Originally I thought const may have implied some sort of extern declaration (which makes no sense, I know), which is why I thought to create my_struct2. Much to my surprise, it still works.
According to the C Standard (6.9.2 External object definitions)
1 If the declaration of an identifier for an object has file scope and
an initializer, the declaration is an external definition for the
identifier.
2 A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
Thus, in your example, these declarations in the header foo.h, as included in module foo.c,
const my_struct my_struct1;
my_struct my_struct2;
are not their external definitions because they do not have initializers.
These objects are externally defined only in module foo.c itself
const my_struct my_struct1 = { .a = 1 };
my_struct my_struct2 = { .a = 2 };
where they are explicitly initialized.
In module main.c these declarations constitute tentative definitions, and the objects are zero-initialized.
According to the Appendix J
J.5.11 Multiple external definitions 1 There may be more than one
external definition for the identifier of an object, with or without
the explicit use of the keyword extern; if the definitions disagree,
or more than one is initialized, the behavior is undefined (6.9.2).
Thus the behaviour of the program is undefined unless your compiler supports the extension described in the Appendix J.
You should add the extern specifier to these identifiers in header foo.h so that the declarations in main.c do not constitute tentative definitions.
The one declaration rule is applied to identifiers that have no linkage. (6.7 Declarations)
3 If an identifier has no linkage, there shall be no more than one
declaration of the identifier (in a declarator or type specifier)
with the same scope and in the same name space, except that a typedef
name can be redefined to denote the same type as it currently does and
tags may be redeclared as specified in 6.7.2.3.
In your example all identifiers have external linkage. So they may be declared several times but defined only once.
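A minimal sketch of "declared several times but defined only once", written as one file so it can be compiled on its own. It assumes foo.h is changed to use extern; the comments mark which lines would live in the header and which in foo.c. The function names are hypothetical.

```c
#include <assert.h>

typedef struct my_struct {
    int a;
} my_struct;

/* What foo.h would contain: declarations only, no storage reserved. */
extern const my_struct my_struct1;
extern my_struct my_struct2;

/* Repeating a declaration is harmless because the identifiers have linkage. */
extern const my_struct my_struct1;

/* What foo.c would contain: the single external definition of each object. */
const my_struct my_struct1 = { .a = 1 };
my_struct my_struct2 = { .a = 2 };

int sum_members(void)
{
    return my_struct1.a + my_struct2.a;
}

void struct_demo(void)
{
    assert(sum_members() == 3);
}
```

With this layout main.c sees only the extern declarations, so it no longer contributes a tentative definition of its own.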
const my_struct my_struct1;
Here my_struct1 is a constant object of type my_struct. I hope you know what a constant variable is.
my_struct my_struct2;
Here my_struct2 is an object of type my_struct.
So, to sum up, these are two different objects with separate memory allocated for them. There are no multiple definitions of the same object; you are defining two different objects, which is totally fine.

Extern makes no difference

I am defining a global variable in test2.h
#ifndef TEST2_H
#define TEST2_H
int test_var;
void use_it(void);
#endif
and defining it again in two different files, test.c
#include <stdio.h>
#include "test2.h"
int test_var;
int main() {
printf("The test_var is: %d\n", ++test_var); // prints 1
use_it(); // prints 2
}
and test2.c
#include <stdio.h>
#include "test2.h"
int test_var;
void use_it() {
printf("The test_var is: %d", ++test_var);
}
I replaced the definition of test_var with extern int test_var and got the same result. That is, in both cases both files, test.c and test2.c have access to the global variable test_var. I was under the impression that without extern, each file would have their own copy of test_var. Observation suggests that this is not the case. So when does extern actually do something?
You end up with two copies of test_var and this is undefined behavior.
(C99, 6.9p5) "If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one."
In your case the linker may be nice to you and merge the two symbols, but this is still not portable and is undefined behavior. If you are using the GNU linker, you can use --warn-common to get the warning (and --fatal-warnings if you want an error).
To fix your issue, put the extern specifier in the declaration of test_var in the .h file and remove one of the definitions of test_var (for example the one in the test.c file).
This is undefined behavior, as others have noted, but what you are seeing here is the common extension described in appendix J.5.11 of the C99 spec, where multiple external definitions in different compilation units are allowed as long as none or only one of them are initialized and the types of all of them are the same.
In this case, with the extension, the definitions will be combined into a single definition at link time.
You also appear to be confused by the fact that the extern keyword, when used at the global scope, has nothing to do with extern linkage for declarations and definitions. ALL declarations at the global scope have extern linkage unless they have a static or inline keyword. The extern keyword serves to make such a declaration just a declaration. Without the extern keyword a global variable declaration is also a definition, and that is the only effect of the extern keyword in the global scope.
If you have the same variable declared in two different files as int test_var, for example:
file1.c
int test_var;
file2.c
int test_var;
both variables will have their own memory address, so they are two different variables with the same name.
If you have two variables declared in two different files as extern int test_var, for example:
file1.c
extern int test_var; //this is a mistake
file2.c
extern int test_var; //this is a mistake
the linker will report an error when you try to do something with those variables, because with the keyword extern you are not reserving any space for the variable; you only use that keyword to say that the variable is already defined (commonly in another file).
The point is to understand that a global variable is defined once with a statement like int test_var (when you define a variable, the compiler reserves space for it), and it is declared in every other file that needs access to it with extern int test_var (when you declare a variable with the extern keyword, you are telling the compiler that the variable is already defined and you want access to it in the file where you declare it).
So an example of how to use a global variable would be:
file1.c
int test_var; //definition
void useit(void);
int main () {
test_var=7;
useit();
return 0;
}
file2.c
#include <stdio.h>
void useit (void) {
extern int test_var; //declaration
printf ("the variable value is %d",test_var);
}
To answer your question:
extern int test_var; is a declaration. This announces that "Somewhere, there should exist test_var . We don't know where that is yet, but by the time we finish compiling and linking, we will find it in exactly one place".
So there has to be exactly one definition to match. A definition serves as a declaration, and also says "Here is the storage for test_var".
Also, test_var could either have internal linkage or external linkage. The default behaviour is external linkage. If you provide more than one definition for a variable of external linkage, it is undefined behaviour.
Internal linkage is indicated by including static in the declaration. You can have as many definitions as you want of a static variable (so long as only one per file has an initializer).
Summing up, we have:
extern int test_var; // declaration, external linkage
static int test_var; // declaration, definition, internal linkage
int test_var; // declaration, definition, external linkage
Note: the last two cases are actually tentative definitions. This is something that C has but C++ doesn't. A tentative definition behaves like a declaration at first; then, for each translation unit, if there is no later definition, it serves as a definition.
So you can write in C:
int test_var;
// stuff
int test_var = 5;
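Filling in the "// stuff" part gives a small compilable sketch. The function names are hypothetical; the point is that code between the two lines can already use test_var, and the later initializer is what defines its value.

```c
#include <assert.h>

int test_var;            /* tentative definition */

int read_test_var(void)
{
    return test_var;     /* usable here, before the initializer appears */
}

int test_var = 5;        /* the actual external definition for this translation unit */

void test_var_demo(void)
{
    assert(read_test_var() == 5);   /* the initializer wins, not an implicit 0 */
}
```

This is valid C; a C++ compiler would reject the second line as a redefinition, which matches the note above.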
If you are using gcc and possibly some other compilers, you just stumbled upon some Unix tradition. Namely that uninitialized global variables are placed in the common block where multiple definitions of the same variable are merged during linking.
gcc can be told to put uninitialized global variables into the data section with the option -fno-common (this has been the default since GCC 10). With this, the linker will report an error when there are multiple definitions of the same variable name.

C - Static variable masking global variable

Have a look at the following code snippet...
File1.h
void somefunc(int);
File1.c
#include "File1.h"
extern int var;
void somefunc(int x)
{
......
var ++;
etc, etc,
....
return;
}
File2.h
static int var;
void someotherfunc(int);
File2.c
#include "File2.h"
#include "File1.h"
int var;
void someotherfunc(int z)
{
z = etc etc;
var --;
......
somefunc(z);
.....
return;
}
The above four files compile without any problem.
The problem occurs when I try to initialize the variable 'var'.
If 'var' is initialized in File2.c, where it is a global variable, the code compiles without any problems. But when I try to initialize the static variable in File2.h, the compiler throws an error saying 'the variable 'var' in File1.c is undefined'. Can someone please tell me what is happening here?
I was just trying to understand the concept of static variables and came upon this confusion. Any help would be appreciated.
static int var;
This gives var internal linkage in the File2.c translation unit, whatever might follow (yes, even if the extern declaration follows).
So if the first declaration seen is static int var, in that translation unit var will forever be internal, thus inaccessible to other translation units.
6.2.2-4
For an identifier declared with the storage-class specifier extern
[File1.h] in a scope in which a prior declaration of that
identifier is visible [the one in File2.h] if the prior declaration
specifies internal or external linkage [it specifies internal], the linkage of the identifier at the later declaration is the same as the linkage specified at the
prior declaration.
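The quoted rule can be checked in a single file. This is only a sketch: the initial value 3 and the function names are assumptions, and the two declarations stand in for what the translation unit sees when a static declaration comes first and an extern declaration follows.

```c
#include <assert.h>

static int var = 3;   /* first declaration: internal linkage */
extern int var;       /* later extern declaration inherits internal linkage (6.2.2p4) */

int read_var(void)
{
    return var;       /* refers to the static object above */
}

void linkage_demo(void)
{
    assert(read_var() == 3);
}
```

Because var stays internal, another file that declares extern int var; finds no matching symbol at link time. In the reverse order (extern first, then static) the identifier would have both external and internal linkage in one translation unit, which the standard leaves undefined.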
It can't be static. Static means its "visibility" (not the official term but probably more understandable) is limited to the C source file it appears in (in this case, that's File2.c).
That means, when you try to link together File1 and File2, the linker will not be able to see var in File2, which is why you're getting the error.
If you want it accessible from File1.c, ditch the "static" bit. In fact, since you already have var defined in File2.c, ditch the entire line from File2.h.

Resources