Explanation of tentative definitions in C [duplicate] - c

This question already has answers here:
Tentative definitions in C and linking
(3 answers)
Closed 8 years ago.
Let's say I have two source files: main.c and a.c:
main.c:
#include <stdio.h>
int a;
int i;
int i;
int main(void)
{
printf("a = %d\n", a);
printf("i = %d\n", i);
return 0;
}
a.c:
int a;
Then, according to latest C99 draft 6.9.2 External object definitions p. 2 (emphasis mine):
A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
Compilation (no warnings given):
gcc -g -std=c99 -pedantic-errors -Wall -Wextra main.c a.c
I understand that for i variable there two are tentative definitions and since main.c does not have "true" external definition they are merged into such one. What about a ? Do I state correctly, that tentative definitions are not "shared" between multiple source files (i.e. translation units) ?

Your program is erroneous: it defines the same external name more than once. The GNU tool chain follows a relaxed linkage model which does not flag this as an error; it merges the multiple definitions. However, that is effectively a language extension. Strictly conforming ISO C programs cannot define a name more than once.
The notion of a "tentative definition" is purely syntactic, within one translation unit. At the end of a translation unit, any definitions which are still tentative are "cemented" as definitions.
The reason int i; is called "tentative" is that it is "weak" in a sense. It can be overridden by a later definition. If, by the end of the translation unit, it isn't then it turns into int i = 0.
So for instance this is valid:
int i; /* might become int i = 0 */
int i = 42; /* i is now defined; the tentative definition is replaced */
In this situation, i is understood to be defined once, not twice. The translated unit contains a single definition of i.

Related

Share a variable between two files in C

I'm trying to understand on how to link two files with a variable. I know that the standard way is through extern keyword as shown below.
File 1
#include<stdio.h>
int i;
void func1();
int main()
{
i = 10;
func1();
printf("i value with method 1: %d\n", i);
}
File 2
extern int i;
void func1()
{
i = i+1;
}
Compile and execute => gcc *.c ./a.out
Output=> i value with method 1: 11
One thing I found is through headers as shown below:
common.h
#ifndef __COMMON_H
#define __COMMON_H
int i;
#endif
File 1
#include<stdio.h>
#include"common.h"
int main()
{
i = 10;
func1();
printf("i value with method 2: %d\n", i);
}
File 2
#include"common.h"
void func1()
{
i = i+1;
}
Compile and execute => gcc *.c ./a.out
Output=> i value with method 2: 11
My doubt is, How method2 works?
At file scope, int i; is a special kind of declaration called a tentative definition. In spite of its name, it is not a definition. However, it can cause a definition to be created.
A clean way to declare and define an object that is used in multiple translation units is to declare it with extern in a header that is included in each unit that uses the object by name:
extern int i;
and to define the object in one translation unit:
int i = 0;
At file scope, int i = 0; is a definition; the initialization with = 0 makes it a regular definition instead of a tentative definition.
Ideally, all source code would use clean declarations and definitions. However, C was not completely planned and designed in advance. It developed through experiments and different people in different places implementing things differently. When the C committee standardized C, they had to deal with different practices and implementations. One common practice was that of declarations such as int i; in multiple units that were intended to create a single i. (This behavior was inherited from FORTRAN, which had common objects as a similar feature.)
To accommodate this, the committee described int i; at file scope as a special kind of declaration, a tentative definition. If there is a regular definition in the same translation unit that defines the same identifier, the tentative definition acts as a plain declaration, not a definition. If there is no regular definition, the compiler (or other part of C implementation) creates a definition for the identifier as if it had been initialized with zero.
The C standard leaves reconciling of multiple tentative definitions to each C implementation; it does not define the behavior when int i; is used in multiple translation units. Prior to version 10, the default behavior of GCC was to use the “common symbol” behavior; multiple tentative definitions would be reconciled to a single definition when linking. (To support this, the compiler marks tentative definitions differently from regular definitions when creating object modules, so the linker knows which is which.) In version 10, the default changed, and GCC now treats the definitions resulting from tentative definitions as regular symbols instead of common symbols.
This is why you will see some people report they get an error when linking sources with tentative definitions while you and others do not. It is simply a matter of which version of which compiler and linker they used.
You can explicitly request either behavior with the GCC switch -fcommon for the common symbol behavior or -fno-common for the regular symbol behavior.
Generally, you should use the clean method above; declare identifiers with extern in headers, and put exactly one definition in each identifier in one source file.

Multiple definitions gives no errors or warnings in either gcc or clang

If I did
#include <stdio.h>
int a; //definition
int a; //definition
int a; //definition
int a; //definition
int a; //definition
int main() {
return 0;
}
For example, I would get no errors or warnings from either gcc or clang despite defining a variable multiple times. Why? I thought I was allowed to declare a variable as many times as I wanted to, but could only define it once?
This is a tentative definition. That is each declaration of a file scope variable without an initializer is considered as a declaration not as a definition. The definition is implicitly generated at the end of the translation unit with an initializer equal to 0.
From the C Standard (6.9.2 External object definitions)
2 A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.

Multiple declaration of same struct variable ok?

Here's the setup:
foo.h:
typedef struct my_struct {
int a;
} my_struct;
const my_struct my_struct1;
my_struct my_struct2;
foo.c:
#include "foo.h"
const my_struct my_struct1 = { .a = 1 };
my_struct my_struct2 = { .a = 2 };
main.c:
#include "foo.h"
#include <stdio.h>
int main() {
printf("%d %d\n", my_struct1.a, my_struct2.a);
return 0;
}
Which when compiled with gcc main.c foo.c prints 1 2. The question is, haven't I declared multiple variables with the same name (the two sets of structs)?
edit: Thanks for the reply all. I see I may have posed a slightly confusing question. Originally I thought const may have implied some sort of extern declaration (which makes no sense, I know), which is why I thought to create my_struct2. Much to my surprise, it still works.
According to the C Standard (6.9.2 External object definitions)
1 If the declaration of an identifier for an object has file scope and
an initializer, the declaration is an external definition for the
identifier.
2 A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
Thus in your example these declarations of identifiers in header foo.h itself included in module foo.c
const my_struct my_struct1;
my_struct my_struct2;
are not their external definitions because they do not have initializers.
These objects are externally defined only in module foo.c itself
const my_struct my_struct1 = { .a = 1 };
my_struct my_struct2 = { .a = 2 };
where they are explicitly initialized.
In module main.c these external declarations constitute tentative definitions and zero initialized.
According to the Appendix J
J.5.11 Multiple external definitions 1 There may be more than one
external definition for the identifier of an object, with or without
the explicit use of the keyword extern; if the definitions disagree,
or more than one is initialized, the behavior is undefined (6.9.2).
Thus the behaviour of the program is undefined unless your compiler supports the extension described in the Appendix J.
You should set specifier extern for these identifiers in header foo.h that the declarations in main.c would not constitute tentative definitions.
The one declaration rule is applied to identifiers that have no linkage. (6.7 Declarations)
3 If an identifier has no linkage, there shall be no more than one
declaration of the identifier (in a declarator or type specifier)
with the same scope and in the same name space, except that a typedef
name can be redefined to denote the same type as it currently does and
tags may be redeclared as specified in 6.7.2.3.
In your example all identifiers have external linkage. So they may be declared several times but defined only once.
const my_struct my_struct1;
here my_struct1 is a constant object of type my_struct. I hope you know what is a constant variable.
my_struct my_struct2;
Here my_struct2 is a object of type my-struct.
So to sum it up these are 2 different objects and have separate memory allocated for them so there is no mutiple definitions for the same object you are defining 2 different objects which is totally fine.

Variable access across multiple files [duplicate]

This question already has answers here:
Use of 'extern' keyword while defining the variable
(2 answers)
Closed 8 years ago.
I was trying out programs based on extern and as I understand, this is helpful when accessing variables across multiple files having only one definition.
But I tried a simple program as below without extern and thing seem to work when I expected it would fail during linking process
file5.c:
#include <stdio.h>
#include "var.h"
int a = 20;
int main() {
printf("\n File5.c a = %d", a);
test();
return 0;
}
file6.c:
#include <stdio.h>
#include "var.h"
int test() {
printf("\n File6.c a = %d",a);
}
var.h
int a;
As I have included var.h in all header files without extern, int a would be included in both the .c file and during linking, compiler should have thrown a warning or error message but it compiles file without any issue.
Shouldn't var.h have the following extern int a?
It is generally best if the header uses extern int a;. See also How do I share a variable between source files in C?
The standard says:
ISO/IEC 9899:2011 §6.9.2 External object definitions
Semantics
¶1 If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
¶2 A declaration of an identifier for an object that has file scope without an initializer, and
without a storage-class specifier or with the storage-class specifier static, constitutes a
tentative definition. If a translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for that identifier, then
the behavior is exactly as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation unit, with an initializer
equal to 0.
Thus, what's in the header is a tentative definition of the variable. At the end of the translation unit (TU) for file5.c, you have no longer got a tentative definition; the 'external definition' specified by int a = 20; has specified that. At the end of the TU for file6.c, you have a definition equivalent to int a = 0;.
When you try to link file5.c and file6.c, you should run into multiple definitions of a. However, there is a common extension, documented in the standard in Annex J:
J.5.11 Multiple external definitions
¶1 There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than
one is initialized, the behavior is undefined (6.9.2).
Your compiler is providing the extension identified by §J.5.11, and therefore (legitimately) not complaining.
580

C the same global variable defined in different files

I am reading this code from here(in Chinese). There is one piece of code about testing global variable in C. The variable a has been defined in the file t.h which has been included twice. In file foo.c defined a struct b with some value and a main function. In main.c file, defined two variables without initialized.
/* t.h */
#ifndef _H_
#define _H_
int a;
#endif
/* foo.c */
#include <stdio.h>
#include "t.h"
struct {
char a;
int b;
} b = { 2, 4 };
int main();
void foo()
{
printf("foo:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\tsizeof(b)=%d\n\tb.a=%d\n\tb.b=%d\n\tmain:0x%08x\n",
&a, &b, sizeof b, b.a, b.b, main);
}
/* main.c */
#include <stdio.h>
#include "t.h"
int b;
int c;
int main()
{
foo();
printf("main:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\t(&c)=0x%08x\n\tsize(b)=%d\n\tb=%d\n\tc=%d\n",
&a, &b, &c, sizeof b, b, c);
return 0;
}
After using Ubuntu GCC 4.4.3 compiling, the result is like this below:
foo: (&a)=0x0804a024
(&b)=0x0804a014
sizeof(b)=8
b.a=2
b.b=4
main:0x080483e4
main: (&a)=0x0804a024
(&b)=0x0804a014
(&c)=0x0804a028
size(b)=4
b=2
c=0
Variable a and b has the same address in two function, but the size of b has changed. I can't understand how it worked!
You are violating C's "one definition rule", and the result is undefined behavior. The "one definition rule" is not formally stated in the standard as such. We are looking at objects in different source files (aka, translation units), so we concerned with "external definitions". The "one external definition" semantic is spelled out (C11 6.9 p5):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
Which basically means you are only allowed to define an object at most once. (The otherwise clause allows you to not define an external object at all if it is never used anywhere in the program.)
Note that you have two external definitions for b. One is the structure that you initialize in foo.c, and the other is the tentative definition in main.c, (C11 6.9.2 p1-2):
If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So you have multiple definitions of b. However, there is another error, in that you have defined b with different types. First note that multiple declarations to the same object with external linkage is allowed. However, when the same name is used in two different source files, that name refers to the same object (C11 6.2.2 p2):
In the set of translation units and libraries that constitutes an entire program, each
declaration of a particular identifier with external linkage denotes the same object or
function.
C puts a strict limitation on declarations to the same object (C11 6.2.7 p2):
All declarations that refer to the same object or function shall have compatible type;
otherwise, the behavior is undefined.
Since the types for b in each of your source files do not actually match, the behavior is undefined. (What constitutes a compatible type is described in detail in all of C11 6.2.7, but it basically boils down to being that the types have to match.)
So you have two failings for b:
Multiple definitions.
Multiple declarations with incompatible types.
Technically, your declaration of int a in both of your source files also violates the "one definition rule". Note that a has external linkage (C11 6.2.2 p5):
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
But, from the quote from C11 6.9.2 earlier, those int a tentative definitions are external definitions, and you are only allowed one of those from the quote from C11 6.9 at the top.
The usual disclaimers apply for undefined behavior. Anything can happen, and that would include the behavior you observed.
A common extension to C is to allow multiple external definitions, and is described in the C standard in the informative Annex J.5 (C11 J.5.11):
There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
(Emphasis is mine.) Since the definitions for a agree, there is no harm there, but the definitions for b do not agree. This extension explains why your compiler does not complain about the presence of multiple definitions. From the quote of C11 6.2.2, the linker will attempt to reconcile the multiple references to the same object.
Linkers typically use one of two models for reconciling multiple definitions of the same symbol in multiple translation units. These are the "Common Model" and the "Ref/Def Model". In the "Common Model", multiple objects with the same name are folded into a single object in a union style manner so that the object takes on the size of the largest definition. In the "Ref/Def Model", each external name must have exactly one definition.
The GNU toolchain uses the "Common Model" by default, and a "Relaxed Ref/Def Model", where it enforces a strictly one definition rule for a single translation unit, but does not complain about violations across multiple translation units.
The "Common Model" can be suppressed in the GNU compiler by using the -fno-common option. When I tested this on my system, it caused "Strict Ref/Def Model" behavior for code similar to yours:
$ cat a.c
#include <stdio.h>
int a;
struct { char a; int b; } b = { 2, 4 };
void foo () { printf("%zu\n", sizeof(b)); }
$ cat b.c
#include <stdio.h>
extern void foo();
int a, b;
int main () { printf("%zu\n", sizeof(b)); foo(); }
$ gcc -fno-common a.c b.c
/tmp/ccd4fSOL.o:(.bss+0x0): multiple definition of `a'
/tmp/ccMoQ72v.o:(.bss+0x0): first defined here
/tmp/ccd4fSOL.o:(.bss+0x4): multiple definition of `b'
/tmp/ccMoQ72v.o:(.data+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `b' changed from 8 in /tmp/ccMoQ72v.o to 4 in /tmp/ccd4fSOL.o
collect2: ld returned 1 exit status
$
I personally feel the last warning issued by the linker should always be provided regardless of the resolution model for multiple object definitions, but that is neither here nor there.
References:
Unfortunately, I can't give you the link to my copy of the C11 Standard
What are extern variables in C?
The "Beginner's Guide to Linkers"
SAS Documentation on External Variable Models
Formally, it is illegal to define the same variable (or function) with external linkage more than once. So, from the formal point of view the behavior of your program is undefined.
Practically, allowing multiple definitions of the same variable with external linkage is a popular compiler extension (a common extension, mentioned as such in the language specification). However, in order to be used properly, each definition shall declare it with the same type. And no more than one definition shall include initializer.
Your case does not match the common extension description. Your code compiles as a side effect of that common extension, but its behavior is still undefined.
The piece of code seems to break the one-definition rule on purpose. It will invoke undefined behavior, don't do that.
About the global variable a: don't put definition of a global variable in a header file, since it will be included in multiple .c files, and leads to multiple definition. Just put declarations in the header and put the definition in one of the .c files.
In t.h:
extern int a;
In foo.c
int a;
About the global variable b: don't define it multiple times, use static to limit the variable in a file.
In foo.c:
static struct {
char a;
int b;
} b = { 2, 4 };
In main.c
static int b;
b has the same address because the linker decided to resolve the conflict for you.
sizeof shows different values because sizeof is evaluated at compile time. At this stage, the compiler only knows about one b (the one defined in the current file).
At the time foo is being compiled, the b that is in scope is the two ints vector {2, 4} or 8 bytes when an sizeof(int) is 4.
When main is compiled, b has just been redeclared as an int so a size of 4 makes sense. Also there is probably "padding bytes" added to the struct after "a" such that the next slot (the int) is aligned on 4 bytes boundary.
a and b have the same addresses because they occur at the same points in the file. The fact that b is a different size doesn't matter where the variable begins. If you added a variable c between a and b in one of the files, the address of the bs would differ.

Resources