--- a.c ----
int i; // external definition
---- main.c ------
int i=0; // external definition
int main(void)
{
i=0;
}
In both files i has an external definition in its translation unit, and i is used in an expression. That should violate:
If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof operator
whose result is an integer constant), somewhere in the entire program
there shall be exactly one external definition for the identifier;
otherwise, there shall be no more than one.
This non-standard behavior is a common extension implemented in many C compilers.
This matter is discussed rather extensively in the Rationale for the C99 standard (see pp. 32-34). According to that document, this set of definitions would be legal under the Relaxed Ref/Def model typically implemented by the C compilers of pre-C89 UNIX systems. That is the reason for its popularity, and it is why we often see it implemented as an extension: it is supposed to simplify support of legacy code.
Nevertheless, standard C definition model is different: it is a combination of Strict Ref/Def model and Initialization model. Standard C does not allow this.
P.S. While it is true that definition of i in a.c is a tentative definition, it has nothing to do with the issue. By the end of the containing translation unit all tentative definitions of some object combine and give birth to an external definition of the object. Their "tentative" nature is not in any way visible at inter-module level. Tentative definitions do not allow one to create multiple definitions of the same object in different translation units.
I am trying to implement a global singleton variable in the header-only library in C (not C++). So after searching on this forum and elsewhere, I came across a variation of Meyer's singleton that I am adapting to C here:
/* File: sing.h */
#ifndef SING_H
#define SING_H
inline int * singleton()
{
static int foo = 0;
return &foo;
}
#endif
Notice that I am returning a pointer because C lacks & referencing available in C++, so I must work around it.
OK, now I want to test it, so here is a simple test code:
/* File: side.h */
#ifndef SIDE_H
#define SIDE_H
void side();
#endif
/*File: side.c*/
#include "sing.h"
#include <stdio.h>
void side()
{
printf("%d\n",*(singleton()));
}
/*File: main.c*/
#include "sing.h"
#include "side.h"
#include <stdio.h>
int main(int argc, char * argv[])
{
/* Output default value - expected output: 0 */
printf("%d\n",*(singleton()));
*(singleton()) = 5;
/* Output modified value - expected output: 5 */
printf("%d\n",*(singleton()));
/* Output the same value from another module - expected output: 5*/
side();
return 0;
}
Compiles and runs fine in MSVC in C mode (also in C++ mode too, but that's not the topic). However, in gcc it outputs two warnings (warning: ‘foo’ is static but declared in inline function ‘singleton’ which is not static), and produces an executable which then segfaults when I attempt to run it. The warning itself kind of makes sense to me (in fact, I am surprised I don't get it in MSVC), but segfault kind of hints at the possibility that gcc never compiles foo as a static variable, making it a local variable in stack and then returns expired stack address of that variable.
I tried declaring the singleton as extern inline, it compiles and runs fine in MSVC, results in linker error in gcc (again, I don't complain about linker error, it is logical).
I also tried static inline (compiles fine in both MSVC and gcc, but predictably runs with wrong output in the third line because the side.c translation unit now has its own copy of singleton).
So, what am I doing wrong in gcc? I have neither of these problems in C++, but I can't use C++ in this case, it must be straight C solution.
I could also accept any other form of singleton implementation that works from header-only library in straight C in both gcc and MSVC.
I am trying to implement a global singleton variable in the header-only library in C (not C++).
By "global", I take you to mean "having static storage duration and external linkage". At least, that's as close as C can come. That is also as close as C can come to a "singleton" of a built-in type, so in that sense, the term "global singleton" is redundant.
Notice that I am returning a pointer because C lacks & referencing available in C++, so I must work around it.
It is correct that C does not have references, but you would not need either pointer or reference if you were not using a function to wrap access to the object. I'm not really seeing what you are trying to gain by that. You would likely find it easier to get what you are looking for without. For example, when faced with duplicate external definitions of the same variable identifier, the default behavior of all but the most recent versions of GCC was to merge them into a single variable. Although current GCC reports this situation as an error, the old behavior is still available through the -fcommon command-line switch.
On the other hand, your inline function approach is unlikely to work in many C implementations. Note especially that inline semantics are rather different in C than in C++, and external inline functions in particular are rarely useful in C. Consider these provisions of the C standard:
paragraph 6.7.4/3 (a language constraint):
An inline definition of a function with external linkage shall not contain a definition of a modifiable object with static or thread storage duration, and shall not contain a reference to an identifier with internal linkage.
Your example code is therefore non-conforming, and conforming compilers are required to diagnose it. They may accept your code nonetheless, but they may do anything they choose with it. It seems unreasonably hopeful to expect that you could rely on a random conforming C implementation to both accept your code for the function and compile it such that callers in different translation units could obtain pointers to the same object by calling that function.
paragraph 6.9/5:
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression [...], somewhere in the entire program there shall be exactly one external definition for the identifier [...].
Note here that although an inline definition of a function identifier with external linkage -- such as yours -- provides an external declaration of that identifier, it does not provide an external definition of it. This means that a separate external definition is required somewhere in the program (unless the function goes altogether unused). Moreover, that external definition cannot be in a translation unit that includes the inline definition. This is large among the reasons that extern inline functions are rarely useful in C.
paragraph 6.7.4/7:
For a function with external linkage, the following restrictions apply: [...] If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.
In addition to echoing part of 6.9/5, that also warns you that if you do provide an external definition of your function to go with the inline definitions, you cannot be sure which will be used to serve any particular call.
Furthermore, you cannot work around those issues by declaring the function with internal linkage, for although that would allow you to declare a static variable within, each definition of the function would be a different function. Lest there be any doubt, Footnote 140 clarifies that in that case,
Since an inline definition is distinct from the corresponding external definition and from any other corresponding inline definitions in other translation units, all corresponding objects with static storage duration are also distinct in each of the definitions.
(Emphasis added.)
So again, the approach presented in your example cannot be relied upon to work in C, though you might find that in practice, it does work with certain compilers.
If you need this to be a header-only library, then you can achieve it in a portable manner by placing an extra requirement on your users: exactly one translation unit in any program using your header library must define a special macro before including the header. For example:
/* File: sing.h */
#ifndef SING_H
#define SING_H
#ifdef SING_MASTER
int singleton = 0;
#else
extern int singleton;
#endif
#endif
With that, the one translation unit that defines SING_MASTER before including sing.h (for the first time) will provide the needed definition of singleton, whereas all other translation units will have only a declaration. Moreover, the variable will be accessible directly, without either calling a function or dereferencing a pointer.
0.c
int i = 5;
int main(){
return i;
}
1.c
int i;
The above compiles fine with gcc 0.c 1.c without any link errors about multiple definitions. The reason is that i is emitted as a common symbol (-fcommon, which is the default behaviour in gcc).
The proper way to do this is using the extern keyword which is missing here.
I have been searching online to see if this is undefined behaviour or not. Some posts say it is, some say it isn't, and it's very confusing:
It is UB
Is having multiple tentative definitions in separate files undefined behaviour?
Why can I define a variable twice in C?
How do I use extern to share variables between source files?
http://port70.net/~nsz/c/c11/n1570.html#J.2
An identifier with external linkage is used, but in the program there does not exist exactly one external definition for the identifier, or the identifier is not used and there exist multiple external definitions for the identifier (6.9).
It is NOT UB
Global variables and the .data section
Defining an extern variable in multiple files in C
Does C have One Definition Rule like C++?
Look for -fno-common:
https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Code-Gen-Options.html
So which one is it? Is using -fcommon one of the few places where having multiple definitions is allowed and the compiler sorts it out for you? Or is it still UB?
Analysis of the code according to the C Standard
This is covered in section 6.9/5 of the latest C Standard:
Semantics
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
The term "external definition" should not be confused with "external linkage" or the extern keyword; those are entirely different concepts that happen to have similar spelling.
"external definition" means a definition that is not tentative, and not inside a function.
Regarding tentative definitions, this is covered by 6.9.2/2:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So in your file 1.c, as per 6.9.2/2 the behaviour is exactly as if it had said int i = 0; instead. Which would be an external definition. This means 0.c and 1.c both behave as if they had external definitions which violates the rule 6.9/5 saying there shall be no more than one external definition.
Violating a semantic rule means the behaviour is undefined with no diagnostic required.
Explanation of what "undefined behaviour" means
See also: Undefined, unspecified and implementation-defined behavior
In case it is unclear, the C Standard saying "the behaviour is undefined" means that the C Standard does not define the behaviour. The same code built on different conforming implementations (or rebuilt on the same conforming implementation) may behave differently, including rejecting the program, accepting it, or any other outcome you might imagine.
(Note - some programs can have the defined-ness of their behaviour depend on runtime conditions; those programs cannot be rejected at compile-time and must behave as specified unless the condition occurs that causes the behaviour to be undefined. But that does not apply to the program in this question since all possible executions would encounter the violation of 6.9/5).
Compiler vendors may or may not provide stable and/or documented behaviour for cases where the C Standard does not define the behaviour.
For the code in your question it is common (ha ha) for compiler vendors to provide reliable behaviour; this is documented in the non-normative Annex J.5.11 to the Standard:
J.5 Common extensions
J.5.11 Multiple external definitions
1 There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
It seems that gcc implements this extension when the -fcommon switch is in effect and disables it under -fno-common. The default varies between compiler versions: GCC 10 and later default to -fno-common, while earlier releases defaulted to -fcommon.
Footnote: I intentionally avoid using the word "defined" in relation to behaviour that is not defined by the C Standard, as it seems to me that this is one of the causes of confusion for the OP.
GCC and Clang behave differently here: Clang allows a static variable to be declared before it is defined, while GCC treats the declaration (or "tentative definition") as a definition.
I believe this is a bug in GCC, but complaining about it and opening a bug report won't solve the problem that I need the code to compile on GCC today (or yesterday)...
Here's a quick example:
#include <stdio.h>
static struct example_s { int i; } example[];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[] = {{1}, {2}, {3}};
With the Clang compiler, the program compiles and prints out:
Number: 1
However, with GCC the code won't compile and I get the following errors (ignore line numbers):
src/main2.c:26:36: error: array size missing in ‘example’
static struct example_s { int i; } example[];
^~~~~~~
src/main2.c:33:25: error: conflicting types for ‘example’
static struct example_s example[256] = {{1}, {2}, {3}};
^~~~~~~
src/main2.c:26:36: note: previous declaration of ‘example’ was here
static struct example_s { int i; } example[];
Is this a GCC bug or a Clang bug? who knows. Maybe if you're on one of the teams you can decide.
As for me, the static declaration coming before the static definition should be (AFAIK) valid C (a "tentative definition", according to section 6.9.2 of the C11 standard)... so I'm assuming there's some extension in GCC that's messing things up.
Any way to add a pragma or another directive to make sure GCC treats the declaration as a declaration?
The C11 draft has this in §6.9.2 External object definitions:
3 If the declaration of an identifier for an object is a tentative definition and has
internal linkage, the declared type shall not be an incomplete type
I read this as saying that the first line in your code, which has an array of unspecified length, fails to be a proper tentative definition. Not sure what it becomes then, but that would kind of explain GCC's first message.
TL;DR
The short answer is that this particular construct is not allowed by the C11 standard -- or any other C standard going back to ANSI C (1989) -- but it is accepted as a compiler extension by many, though not all, modern C compilers. In the particular case of GCC, you need to not use -pedantic (or -pedantic-errors), which would cause a strict interpretation of the C standard. (Another workaround is described below.)
Note: Although you can spell -pedantic with a W (-Wpedantic), it is unlike most -W options in that it does more than add warning messages. What it does is:
Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++.
Workarounds
It does not appear to be possible to suppress this error using a GCC #pragma, or at least the ones that I tried didn't have any effect. It is possible to suppress it for a single declaration using the __extension__ extension, but that seems to just be trading one incompatibility for another, since you would then need to find a way to remove (or macro expand away) __extension__ for other compilers.
Quoting the GCC manual:
-pedantic and other options cause warnings for many GNU C extensions. You can prevent such warnings within one expression by writing __extension__ before the expression. __extension__ has no effect aside from this.
On the GCC versions I had handy, the following worked without warnings even with -pedantic:
__extension__ static struct example_s { int i; } example[];
Probably your best bet is to just remove -pedantic from the build options. I don't believe that -pedantic is actually that useful; it's worth reading what the GCC manual has to say about it. In any event, it is doing its job here: the documented intent is to ban extensions, and that's what it is doing.
Language-lawyering
The language-lawyer justification for the above, taking into account some of the lengthy comment threads:
Definitions
An external declaration is a declaration at file scope, outside of any function definition. This shouldn't be confused with external linkage, which is a completely different usage of the word. The standard calls external declarations "external" precisely because they are outside any function definitions.
A translation unit is, thus, a sequence of external-declaration. See §6.9.
If an external declaration is also a definition -- that is, it is either a function declaration with a body or an object declaration with an initializer -- then it is referred to as an external definition.
A type is incomplete at a point in a program where there is not "sufficient information to determine the size of objects of that type" (§6.2.5p1), which includes "an array type of unknown size" (§6.2.5p22). (I'll return to this paragraph later.) (There are other ways for a type to be incomplete, but they're not relevant here.)
An external declaration of an object is a tentative definition (§6.9.2) if it is not a definition and is either marked static or has no storage-class specifier. (In other words, extern declarations are not tentative.)
What's interesting about tentative definitions is that they might become definitions. Multiple declarations can be combined with a single definition, and you can also have multiple declarations (in a translation unit) without any definition (in that translation unit) provided that the symbol has external linkage and that there is a definition in some other translation unit. But in the specific case where there is no definition and all declarations of a symbol are tentative, then the compiler will automatically insert a definition.
In short, if a symbol has any (external) declaration with an explicit extern, it cannot qualify for automatic definition (since the explicitly-marked declaration is not tentative).
A brief detour: the importance of the linkage of the first declaration
Another curious feature: if the first declaration for an object is not explicitly marked static, then no later declaration for that object can be marked static, because a file-scope object declaration without a storage-class specifier has external linkage (§6.2.2p5), and an identifier cannot be declared with both internal and external linkage in the same translation unit (§6.2.2p7). However, if the first declaration for an object is explicitly static, then a subsequent extern declaration has no effect on its linkage (§6.2.2p4).
What this all meant for early implementers
Suppose you're writing a compiler on an extremely resource-limited CPU (by modern standards), which was basically the case for all early compiler writers. When you see an external declaration for a symbol, you need to either give it an address within the current translation unit (for symbols with internal linkage) or you need to add it to the list of symbols you're going to let the linker handle (for symbols with external linkage). Since the linker will assign addresses to external symbols, you don't yet need to know what their size is. But for the symbols you're going to handle yourself, you will want to immediately give them an address (within the data segment) so that you can generate machine code referencing the data, and that means that you do need to know what size these objects are.
As noted above, you can tell whether a symbol is internally or externally linked when you first see a declaration for it, and it must be declared before it is used. So by the time you need to emit code using the symbol, you can know whether to emit code referencing a specific known offset within the data segment, or to emit a relocatable reference which will be filled in later by the linker.
But there's a small problem: What if the first declaration is incomplete? That's not a problem for externally linked symbols, but for internally-linked symbols it prevents you from allocating it to an address range since you don't know how big it is. And by the time you find out, you might have had to have emitted code using it. To avoid this problem, it's necessary that the first declaration of an internally-linked symbol be complete. In other words, there cannot be a tentative declaration of an incomplete symbol, which is what the standard says in §6.9.2p3:
If the declaration of an identifier for an object is a tentative definition and has internal linkage, the declared type shall not be an incomplete type.
A bit of paleocybernetics
That's not a new requirement. It was present, with precisely the same wording, in §3.7.2 of C89. And the issue has come up several times over the years in the comp.lang.c and comp.std.c Usenet groups, without ever attracting a definitive explanation. The one I provided above is my best guess, combined with hints from the following discussions:
in 1990: https://groups.google.com/forum/#!msg/comp.std.c/l3Ylvw-mrV0/xPS0dXfJtW4J
in 1993: https://groups.google.com/d/msg/comp.std.c/abG9x3R9-1U/Ib09BSo5EI0J
in 1996: https://groups.google.com/d/msg/comp.lang.c/j6Ru_EaJNkg/-O3jR5tDJMoJ
in 1998: https://groups.google.com/d/msg/comp.std.c/aZMaM1pYBHA/-YbmPnNI-lMJ
in 2003: https://groups.google.com/d/msg/comp.std.c/_0bk-xK9uA0/dAoULatJIKwJ (I got several links from Fergus Henderson's post in this thread.)
in 2011: https://groups.google.com/d/msg/comp.lang.c/aoUSLbUBs7I/7BdNQhAq5DgJ
And it's also come up a few times on Stack Overflow:
What is the meaning of statement below that the declared type shall not be incomplete type
Why is this statement producing a linker error with gcc?
A final doubt
Although no-one in any of the above debates has mentioned it, the actual wording of §6.2.5p22 is:
An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).
That definitely seems to contradict §6.9.2p3, since it contemplates a "later declaration with internal linkage", which would not be allowed by the prohibition on tentative definitions with internal linkage and incomplete type. This wording is also contained word-for-word in C89 (in §3.1.2.5), so if this is an internal contradiction, it's been in the standard for 30 years, and I was unable to find a Defect Report mentioning it (although DR010 and DR016 hover around the edges).
Note:
For C89, I relied on this file saved in the Wayback Machine, but I have no proof that it's correct. (There are other instances of this file in the archive, so there is some corroboration.) When the ISO actually released C90, the sections were renumbered. See this information bulletin, courtesy of Wikipedia.
Edit: Apparently gcc was throwing an error due to the -Wpedantic flag, which (for some obscure reason) added errors in addition to warnings (see: godbolt.org and remove the flag to compile).
¯\_(ツ)_/¯
A possible (though not DRY) answer is to add the array length to the initial declaration (making a complete type with a tentative declaration where C11 is concerned)... i.e.:
#include <stdio.h>
static struct example_s { int i; } example[3];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[3] = {{1}, {2}, {3}};
This is super annoying, as it introduces maintenance issues, but it's a temporary solution that works.
From Storage-class specifiers:
The storage-class specifiers determine two independent properties of the names they declare: storage duration and linkage.
So, for example, when the static keyword is used on global variables and functions (whose storage class is static anyway) it sets their linkage to internal linkage. When used on variables inside functions (which have no linkage), it sets their storage class to static.
My question is: why is the same specifier used for both things?
The reason is mostly historical: linkage came into the design of C language as an afterthought. In the early versions you could redeclare global variables as many times as you wish, and linker would merge all these declarations for you:
Ritchie's original intention had been to model C's rules on FORTRAN COMMON declarations, on the theory that any machine that could handle FORTRAN would be ready for C. In the common-block model, a public variable may be declared multiple times; identical declarations are merged by the linker. (source)
The current rule of a single declaration came later, along with extern keyword. At that point there was a body of C code significant enough to make backward compatibility important. That is probably the reason why language designers refrained from introducing a new keyword for handling linkage, reusing static instead.
Consider the following program. Will this give any compilation errors?
#include <stdio.h>
int s=5;
int s;
int main(void)
{
printf("%d",s);
}
At first glance it seems that the compiler will give a variable redefinition error, but the program is perfectly valid according to the C standard. (See the live demo here: http://ideone.com/Xyo5SY.)
A tentative definition is any external data declaration that has no storage class specifier and no initializer.
C99 6.9.2/2
A declaration of an identifier for an object that has file scope without
an initializer, and without a storage-class specifier or with the
storage-class specifier static, constitutes a tentative definition. If a
translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for
that identifier, then the behavior is exactly as if the translation
unit contains a file scope declaration of that identifier, with the
composite type as of the end of the translation unit, with an
initializer equal to 0.
My question is, what is rationale for allowing tentative definitions? Is there any use of this in C? Why does C allow tentative definitions?
Tentative definitions were created as a way to bridge incompatible models that existed pre-C89. This is covered in the C99 rationale, section 6.9.2 External object definitions, which says:
Prior to C90, implementations varied widely with regard to forward
referencing identifiers with internal linkage (see §6.2.2). The C89
committee invented the concept of tentative definition to handle this
situation. A tentative definition is a declaration that may or may not
act as a definition: If an actual definition is found later in the
translation unit, then the tentative definition just acts as a
declaration. If not, then the tentative definition acts as an actual
definition. For the sake of consistency, the same rules apply to
identifiers with external linkage, although they're not strictly
necessary.
and section 6.2.2 from the C99 rationale says:
The definition model to be used for objects with external linkage was
a major C89 standardization issue. The basic problem was to decide
which declarations of an object define storage for the object, and
which merely reference an existing object. A related problem was
whether multiple definitions of storage are allowed, or only one is
acceptable. Pre-C89 implementations exhibit at least four different
models, listed here in order of increasing restrictiveness:
Here's an example of a case where it's useful:
static void (*a)();   /* tentative definition; must be static to match the definition below */
void bar();
void foo()
{
a = bar;
}
static void (*a)() = foo;
/* ... code that uses a ... */
The key point is that the definition of foo has to refer to a, and the definition of a has to refer to foo. Similar examples with initialized structures should also be possible.