My Textbook says that:
"Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.Given a strong symbol and multiple weak symbols, choose the strong symbol"
So I create two files to see:
file1.c:
int number;
int main(int argc, char *argv[])
{
printf("%d",number);
return 0;
}
file2.c (just one line):
int number = 2018;
and I ran gcc -Wall -o program file1.c file2.c and the output is 0, which I can understand before I study linker ('number' in file1.c has been initialized to 0), but after I study how linker works, I start to wonder why the output is not 2018, since the 'number' in file2 is strong symbol(initialized global variable) and the 'number' in file1 is weak symbol, so the linker will choose the strong one whose value is 2018, so why the linker choose the weak symbol?
The int number; in file1.c is not uninitialized. Note that it is declared at file scope, it is declared without an initializer, and it is declared without a storage-class specifier (particularly no extern or static). Then C 2018 6.9.2 2 says:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So, int number; in file1.c is the same as int number = 0;. It is initialized.
An issue with the text you quote is that it is describing the linker using terminology for that linker, and this is different terminology than the the C standard uses. The C standard does not have any “global” variables or “strong” or “weak” symbols.
The number in file2.c is global, but still locally scoped just to that file. If you want file1.c to use number from file2.c you need to mark it as extern like this:
extern int number;
int main(int argc, char *argv[])
{
printf("%d",number);
return 0;
}
Related
This question already has answers here:
Tentative definitions in C and linking
(3 answers)
Closed 8 years ago.
Let's say I have two source files: main.c and a.c:
main.c:
#include <stdio.h>
int a;
int i;
int i;
int main(void)
{
printf("a = %d\n", a);
printf("i = %d\n", i);
return 0;
}
a.c:
int a;
Then, according to latest C99 draft 6.9.2 External object definitions p. 2 (emphasis mine):
A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
Compilation (no warnings given):
gcc -g -std=c99 -pedantic-errors -Wall -Wextra main.c a.c
I understand that for i variable there two are tentative definitions and since main.c does not have "true" external definition they are merged into such one. What about a ? Do I state correctly, that tentative definitions are not "shared" between multiple source files (i.e. translation units) ?
Your program is erroneous: it defines the same external name more than once. The GNU tool chain follows a relaxed linkage model which does not flag this as an error; it merges the multiple definitions. However, that is effectively a language extension. Strictly conforming ISO C programs cannot define a name more than once.
The notion of a "tentative definition" is purely syntactic, within one translation unit. At the end of a translation unit, any definitions which are still tentative are "cemented" as definitions.
The reason int i; is called "tentative" is that it is "weak" in a sense. It can be overridden by a later definition. If, by the end of the translation unit, it isn't then it turns into int i = 0.
So for instance this is valid:
int i; /* might become int i = 0 */
int i = 42; /* i is now defined; the tentative definition is replaced */
In this situation, i is understood to be defined once, not twice. The translated unit contains a single definition of i.
I am defining a global variable in test2.h
#ifndef TEST2_H
#define TEST2_H
int test_var;
void use_it(void);
#endif
and defining it again in two different files, test.c
#include <stdio.h>
#include "test2.h"
int test_var;
int main() {
printf("The test_var is: %d\n", ++test_var); // prints 1
use_it(); // prints 2
}
and test2.c
#include <stdio.h>
#include "test2.h"
int test_var;
void use_it() {
printf("The test_var is: %d", ++test_var);
}
I replaced the definition of test_var with extern int test_var and got the same result. That is, in both cases both files, test.c and test2.c have access to the global variable test_var. I was under the impression that without extern, each file would have their own copy of test_var. Observation suggests that this is not the case. So when does extern actually do something?
You end up with two copies of test_var and this is undefined behavior.
(C99, 6.9p5) "If an identifier declared with external linkage is used in an expression (other than as part of the operandof a sizeof operator whose result is an integer constant),
somewhere in the entire program there shall be exactly one external
definition for the identifier; otherwise, there shall be no more than one"
In your case the linker may be nice with you and merges the two symbols but this is still not portable and is undefined behavior. If you are using the GNU linker, you can use --warn-common to get the warning (and --fatal-warnings if you want an error).
To fix your issue, put the extern specifier in the declaration of test_var in the .h file and remove one of the definition of test_var (for example the one in test.c file).
This is undefined behavior, as others have noted, but what you are seeing here is the common extension described in appendix J.5.11 of the C99 spec, where multiple external definitions in different compilation units are allowed as long as none or only one of them are initialized and the types of all of them are the same.
In this case, with the extension, the definitions will be combined into a single definition at link time.
You also appear to be confused by the fact that the extern keyword, when used at the global scope, has nothing to do with extern linkage for declarations and definitions. ALL declarations at the global scope have extern linkage unless they have a static or inline keyword. The extern keyword serves to make such a declaration just a declaration. Without the extern keyword a global variable declaration is also a definition, and that is the only effect of the extern keyword in the global scope.
If you have the same variable declared in 2 diferent files as int test_var for example:
file1.c
int test_var;
file2.c
int test_var;
both variables will have their own memory adress, so they are two diferent variables with the same name.
if you have, two variables declared in 2 diferent files declared as extern int test_var, for example:
file1.c
extern int test_var; //this is a mistake
file2.c
extern int test_var; //this is a mistake
the compiler will return an error when you try to do something with that variables because with the keyword externyou are not reserving any space for that variable, you only use that keyword to say that a variable is already defined (commonly in another file).
The point is to unsderstand that a global variable is defined once with a sentence like int test_var (when you define a variable the compiler reserve space for it) and it's declared in every other file that need access to it with extern int test_var (when you declare a variable with the extern keyword you saying the compiler that variable is already defined and you want to have access to it in the file you are declaring it).
So an example of how to use a global variable wil be:
file1.c
int test_var; //definition
void useit(void);
int main () {
test_var=7;
useit();
return 0;
}
file2.c
#include <stdio.h>
void useit (void) {
extern int test_var; //declaration
printf ("the variable value is %d",test_var);
}
To answer your question:
extern int test_var; is a declaration. This announces that "Somewhere, there should exist test_var . We don't know where that is yet, but by the time we finish compiling and linking, we will find it in exactly one place".
So there has to be exactly one definition to match. A definition serves as a declaration, and also says "Here is the storage for test_var".
Also, test_var could either have internal linkage or external linkage. The default behaviour is external linkage. If you provide more than one definition for a variable of external linkage, it is undefined behaviour.
Internal linkage is indicated by including static in the declaration. You can have as many definitions as you want of a static variable (so long as only one per file has an initializer).
Summing up, we have:
extern int test_var; // declaration, external linkage
static int test_var; // declaration, definition, internal linkage
int test_var; // declaration, definition, external linkage
Note: the last two cases are actually tentative definitions: this is a thing that C has but C++ doesn't; the way it works is that it behaves like a declaration at first; but then , for each unit, if there is no later definition then this actually serves as a definition.
So you can write in C:
int test_var;
// stuff
int test_var = 5;
If you are using gcc and possibly some other compilers, you just stumbled upon some Unix tradition. Namely that uninitialized global variables are placed in the common block where multiple definitions of the same variable are merged during linking.
gcc can be told to put uninitialized global variables into the data section with the option -fno-common. With this, the linker will report an error when there are multiple definitions of the same variable name.
I am reading this code from here(in Chinese). There is one piece of code about testing global variable in C. The variable a has been defined in the file t.h which has been included twice. In file foo.c defined a struct b with some value and a main function. In main.c file, defined two variables without initialized.
/* t.h */
#ifndef _H_
#define _H_
int a;
#endif
/* foo.c */
#include <stdio.h>
#include "t.h"
struct {
char a;
int b;
} b = { 2, 4 };
int main();
void foo()
{
printf("foo:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\tsizeof(b)=%d\n\tb.a=%d\n\tb.b=%d\n\tmain:0x%08x\n",
&a, &b, sizeof b, b.a, b.b, main);
}
/* main.c */
#include <stdio.h>
#include "t.h"
int b;
int c;
int main()
{
foo();
printf("main:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\t(&c)=0x%08x\n\tsize(b)=%d\n\tb=%d\n\tc=%d\n",
&a, &b, &c, sizeof b, b, c);
return 0;
}
After using Ubuntu GCC 4.4.3 compiling, the result is like this below:
foo: (&a)=0x0804a024
(&b)=0x0804a014
sizeof(b)=8
b.a=2
b.b=4
main:0x080483e4
main: (&a)=0x0804a024
(&b)=0x0804a014
(&c)=0x0804a028
size(b)=4
b=2
c=0
Variable a and b has the same address in two function, but the size of b has changed. I can't understand how it worked!
You are violating C's "one definition rule", and the result is undefined behavior. The "one definition rule" is not formally stated in the standard as such. We are looking at objects in different source files (aka, translation units), so we concerned with "external definitions". The "one external definition" semantic is spelled out (C11 6.9 p5):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
Which basically means you are only allowed to define an object at most once. (The otherwise clause allows you to not define an external object at all if it is never used anywhere in the program.)
Note that you have two external definitions for b. One is the structure that you initialize in foo.c, and the other is the tentative definition in main.c, (C11 6.9.2 p1-2):
If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So you have multiple definitions of b. However, there is another error, in that you have defined b with different types. First note that multiple declarations to the same object with external linkage is allowed. However, when the same name is used in two different source files, that name refers to the same object (C11 6.2.2 p2):
In the set of translation units and libraries that constitutes an entire program, each
declaration of a particular identifier with external linkage denotes the same object or
function.
C puts a strict limitation on declarations to the same object (C11 6.2.7 p2):
All declarations that refer to the same object or function shall have compatible type;
otherwise, the behavior is undefined.
Since the types for b in each of your source files do not actually match, the behavior is undefined. (What constitutes a compatible type is described in detail in all of C11 6.2.7, but it basically boils down to being that the types have to match.)
So you have two failings for b:
Multiple definitions.
Multiple declarations with incompatible types.
Technically, your declaration of int a in both of your source files also violates the "one definition rule". Note that a has external linkage (C11 6.2.2 p5):
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
But, from the quote from C11 6.9.2 earlier, those int a tentative definitions are external definitions, and you are only allowed one of those from the quote from C11 6.9 at the top.
The usual disclaimers apply for undefined behavior. Anything can happen, and that would include the behavior you observed.
A common extension to C is to allow multiple external definitions, and is described in the C standard in the informative Annex J.5 (C11 J.5.11):
There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
(Emphasis is mine.) Since the definitions for a agree, there is no harm there, but the definitions for b do not agree. This extension explains why your compiler does not complain about the presence of multiple definitions. From the quote of C11 6.2.2, the linker will attempt to reconcile the multiple references to the same object.
Linkers typically use one of two models for reconciling multiple definitions of the same symbol in multiple translation units. These are the "Common Model" and the "Ref/Def Model". In the "Common Model", multiple objects with the same name are folded into a single object in a union style manner so that the object takes on the size of the largest definition. In the "Ref/Def Model", each external name must have exactly one definition.
The GNU toolchain uses the "Common Model" by default, and a "Relaxed Ref/Def Model", where it enforces a strictly one definition rule for a single translation unit, but does not complain about violations across multiple translation units.
The "Common Model" can be suppressed in the GNU compiler by using the -fno-common option. When I tested this on my system, it caused "Strict Ref/Def Model" behavior for code similar to yours:
$ cat a.c
#include <stdio.h>
int a;
struct { char a; int b; } b = { 2, 4 };
void foo () { printf("%zu\n", sizeof(b)); }
$ cat b.c
#include <stdio.h>
extern void foo();
int a, b;
int main () { printf("%zu\n", sizeof(b)); foo(); }
$ gcc -fno-common a.c b.c
/tmp/ccd4fSOL.o:(.bss+0x0): multiple definition of `a'
/tmp/ccMoQ72v.o:(.bss+0x0): first defined here
/tmp/ccd4fSOL.o:(.bss+0x4): multiple definition of `b'
/tmp/ccMoQ72v.o:(.data+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `b' changed from 8 in /tmp/ccMoQ72v.o to 4 in /tmp/ccd4fSOL.o
collect2: ld returned 1 exit status
$
I personally feel the last warning issued by the linker should always be provided regardless of the resolution model for multiple object definitions, but that is neither here nor there.
References:
Unfortunately, I can't give you the link to my copy of the C11 Standard
What are extern variables in C?
The "Beginner's Guide to Linkers"
SAS Documentation on External Variable Models
Formally, it is illegal to define the same variable (or function) with external linkage more than once. So, from the formal point of view the behavior of your program is undefined.
Practically, allowing multiple definitions of the same variable with external linkage is a popular compiler extension (a common extension, mentioned as such in the language specification). However, in order to be used properly, each definition shall declare it with the same type. And no more than one definition shall include initializer.
Your case does not match the common extension description. Your code compiles as a side effect of that common extension, but its behavior is still undefined.
The piece of code seems to break the one-definition rule on purpose. It will invoke undefined behavior, don't do that.
About the global variable a: don't put definition of a global variable in a header file, since it will be included in multiple .c files, and leads to multiple definition. Just put declarations in the header and put the definition in one of the .c files.
In t.h:
extern int a;
In foo.c
int a;
About the global variable b: don't define it multiple times, use static to limit the variable in a file.
In foo.c:
static struct {
char a;
int b;
} b = { 2, 4 };
In main.c
static int b;
b has the same address because the linker decided to resolve the conflict for you.
sizeof shows different values because sizeof is evaluated at compile time. At this stage, the compiler only knows about one b (the one defined in the current file).
At the time foo is being compiled, the b that is in scope is the two ints vector {2, 4} or 8 bytes when an sizeof(int) is 4.
When main is compiled, b has just been redeclared as an int so a size of 4 makes sense. Also there is probably "padding bytes" added to the struct after "a" such that the next slot (the int) is aligned on 4 bytes boundary.
a and b have the same addresses because they occur at the same points in the file. The fact that b is a different size doesn't matter where the variable begins. If you added a variable c between a and b in one of the files, the address of the bs would differ.
a)The definition of an external variable is same as that of a local variable,ie, int i=2; (only outside all functions).
But why is extern int i=2; too working as the definition? Isn't extern used only in variable declaration in other files?
b)file 1
#include<stdio.h>
int i=3;
int main()
{
printf("%d",i);
fn();
}
file2
int i; // although the declaration should be: extern int i; So, why is this working?
void fn()
{
printf("%d",i);
}
OUTPUT: 3 in both cases
For historical reasons, the rules for determination of linkage and when a declaration provides a definition are a bit of a mess.
For your particular example, at file scope
extern int i = 2;
and
int i = 2;
are equivalent external definitions, ie extern is optional if you provide an initializer.
However, if you do not provide an initializer, extern is not optional:
int i;
is a tentative definition with external linkage, which becomes an external definition equivalent to
int i = 0;
if the translation unit doesn't contain another definition with explicit initializer.
This is different from
extern int i;
which is never a definition. If there already is another declaration of the same identifier visible, then the variable will get its linkage from that; if this is the first declaration, the variable will have external linkage.
This means that in your second example, both file1 and file2 provide an external definition of i, which is undefined behaviour, and the linker is free to choose the definition it likes best (it may also try to make demons fly out of your nose). There's a common extension to C (see C99 Annex J.5.11 and this question) which makes this particular case well-defined.
In C, an extern with an initialization results in a variable being allocated. That is the declaration will be considered a defining declaration. This is in contrast with the more common use of extern.
The C standard says:
6.9.2 External object definitions
.....
If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
As for the second part of your question, your declaration int i at file scope has external linkage. If you want to give it internal linkage you need to declare it static int i. The C standard says:
6.2.2 Linkages of identifiers
......
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
Take this file as example,there are many non-extern structures like:
struct list_head source_list;
How can it work when this header file is included by more than one compile units?
There should be error reporting that the same symbol is defined twice,right?
Technically there should, but that usage has been around for years and is impossible to eradicate (it's been tried; every so often some vendor decides to make it an error, and reverts after the first hundred or so bug reports). Pedantically, the .h file should declare it extern and one .c/.cpp file should define it.
Briefly, when you don't specify the linkage (static, extern, etc.) of a top level variable, it's declared as "common". At link time, if all references to that variable are the same size (and type, when available) then it is allocated once and all references are made to point to it. If the linker finds different sizes / types / linkages for the same variable, it throws an error.
EDIT: this is clearly confounding people. Here:
jinx:1714 Z$ cat foo.h
int foo;
extern void bar();
jinx:1715 Z$ cat foo.c
#include "foo.h"
int
main(int argc, char **argv)
{
bar();
return 0;
}
jinx:1716 Z$ cat bar.c
#include "foo.h"
void
bar(void)
{
return;
}
jinx:1717 Z$ gcc -Wall foo.c bar.c -o foo
jinx:1718 Z$ ./foo
jinx:1719 Z$ _
Note the complete lack of errors about int foo being multiply defined. This is what I've been trying to say.
The term for this is "tentative definition":
A declaration of an identifier for an
object that has file scope without an
initializer, and without a
storage-class specifier or with the
storage-class specifier static,
constitutes a
tentative definition. If a translation unit contains one or more
tentative definitions for an
identifier, and the translation unit contains no external
definition for that identifier, then
the behavior is exactly as if the translation unit contains a
file scope declaration of that
identifier, with the composite type as of the end of the
translation unit, with an initializer
equal to 0.
So this is well defined in C (but often frowned upon).
This struct list_head source_list; fields are declared inside other structures so they are not symbols.
Declarations of other (top level) structures have distinct names so it's ok too.
edit
Note that all variables it this header are really marked with extern.
There should be an extern indeed. However, there's no explicit definition of that variable, so the compiler marks it as extern for you.
You would get a linker error if you had
struct list_head source_list = { 0 };
...since this does define the symbol once per translation unit (and hence the linker complains).