Is having global variables in common blocks an undefined behaviour? - c

0.c
int i = 5;
int main(){
return i;
}
1.c
int i;
Above compiles fine with gcc 0.c 1.c without any link errors about multiple definitions. The reason is i gets generated as common blocks (-fcommon which is the default behaviour in gcc).
The proper way to do this is using the extern keyword which is missing here.
I have been searching online to see if this is undefined behaviour or not, some post say it is, some say it isn't and it's very confusing:
It is UB
Is having multiple tentative definitions in separate files undefined behaviour?
Why can I define a variable twice in C?
How do I use extern to share variables between source files?
http://port70.net/~nsz/c/c11/n1570.html#J.2
An identifier with external linkage is used, but in the program there does not exist exactly one external definition for the identifier, or the identifier is not used and there exist multiple external definitions for the identifier (6.9).
It is NOT UB
Global variables and the .data section
Defining an extern variable in multiple files in C
Does C have One Definition Rule like C++?
Look for -fno-common:
https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Code-Gen-Options.html
So which one is it? is using -fcommon one of the few places where having multiple definition is allowed and the compiler sorts it out for you? or it is still UB?

Analysis of the code according to the C Standard
This is covered in section 6.9/5 of the latest C Standard:
Semantics
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
The term "external definition" should not be confused with "external linkage" or the extern keyword, those are are entirely different concepts that happen to have similar spelling.
"external definition" means a definition that is not tentative, and not inside a function.
Regarding tentative definition, ths is covered by 6.9.2/2:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static , constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So in your file 1.c, as per 6.9.2/2 the behaviour is exactly as if it had said int i = 0; instead. Which would be an external definition. This means 0.c and 1.c both behave as if they had external definitions which violates the rule 6.9/5 saying there shall be no more than one external definition.
Violating a semantic rule means the behaviour is undefined with no diagnostic required.
Explanation of what "undefined behaviour" means
See also: Undefined, unspecified and implementation-defined behavior
In case it is unclear, the C Standard saying "the behaviour is undefined" means that the C Standard does not define the behaviour. The same code built on different conforming implementations (or rebuilt on the same conforming implementation) may behave differently, including rejecting the program , accepting it, or any other outcome you might imagine.
(Note - some programs can have the defined-ness of their behaviour depend on runtime conditions; those programs cannot be rejected at compile-time and must behave as specified unless the condition occurs that causes the behaviour to be undefined. But that does not apply to the program in this question since all possible executions would encounter the violation of 6.9/5).
Compiler vendors may or may not provide stable and/or documented behaviour for cases where the C Standard does not define the behaviour.
For the code in your question it is common (ha ha) for compiler vendors to provide reliable behaviour ; this is documented in a non-normative Annex J.5.11 to the Standard:
J.5 Common extensions
J.5.11 Multiple external definitions
1 There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern ; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
It seems the gcc compiler implements this extension if -fcommon switch is provided, and disables it if -fno-common is provided (and the default setting may vary between compiler versions).
Footnote: I intentionally avoid using the word "defined" in relation to behaviour that is not defined by the C Standard as it seems to me that is one of the cause of confusion for OP.

Related

C - issue with linking libraries which implement the same names

i'm trying to use freetype-gl and cgml in the same project, but they both implement names such as 'vec2', but in totally different ways (one as struct, other - as array of floats) which means that redefinitions lead to errors
i don't really want to replace all the names in one of the modules to resolve this, is there other way?
C99 not allows 2 same definitions
The C99 standard mentions this:
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
2 same definitions
initializing definition
the 2 same initializing definition lead to compile error.
tentative definition
A definition at file scope that's not initialized is considered "tentative". Tentative definitions can be repeated. An initializing definition removes the tentative status.
If you force to do this.
use compile option.
-Wredundant-decls
Warn if anything is declared more than once in the same scope, even in cases where
multiple declaration is valid and changes nothing.

Inline function definition with external linkage containing reference to a static object

There is a constraint 6.7.4(p3):
An inline definition of a function with external linkage shall not
contain a definition of a modifiable object with static or thread
storage duration, and shall not contain a reference to an identifier
with internal linkage.
Consider the following example:
static const int i = 10;
void do_print(void);
inline void do_print(void){
printf("%d/n", i); //Reference to an identifier with internal linkage
//constraint violation
}
DEMO
Here the inline definition of a function with external linkage uses an identifier with internal linkage. So according to 5.1.1.3(p1):
A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
I expected the violation of this constraint is reported by the compiler somehow (some warning). But the code compiles just fine with no warnings or some other message produced.
The question is: Why is no diagnostic message produced in case of the constraint violation above?
The question is: Why is no diagnostic message produced in case of the constraint violation above?
Because your compiler is non-conforming in this regard.
That's really all there is to it. You have analyzed the text of the standard correctly, and applied it correctly to the code presented. A conforming implementation must emit a diagnostic about the reference to variable i by the inline implementation of do_print. An implementation that does not is, ergo, non-conforming.
I note at this point that some compilers are non-conforming in this general way -- omitting required diagnostics -- by default, while providing an option for turning on these mandatory diagnostics. This is the function of the -pedantic option in GCC, for example. However, I find that my (somewhat dated) version of GCC does not warn about your code even when -pedantic is specified.
cppreference has a paragraph that explains the rationale behind that:
If a function is declared inline in some translation units, it does not need to be declared inline everywhere: at most one translation unit may also provide a regular, non-inline non-static function, or a function declared extern inline. This one translation unit is said to provide the external definition. One external definition must exist in the program if the name of the function with external linkage is used in an expression, see one definition rule.
If the external definition exists in the program, the address of the function is always the address of the external function, but when this address is used to make a function call, it's unspecified whether the inline definition (if present in the translation unit) or the external definition is called.
A note also says (emphasize mine):
The inline keyword was adopted from C++, but in C++, if a function is declared inline, it must be declared inline in every translation unit, and also every definition of an inline function must be exactly the same (in C, the definitions may be different, as long as the behavior of the program does not depend on the differences). On the other hand, C++ allows non-const function-local statics and all function-local statics from different definitions of an inline function are the same in C++ but distinct in C.
That means that if a local inline function uses a static const value in one translation unit, a non inline function with same name could be defined in a different translation unit with a different value for the static const variable, leading to explicit UB because it is unspecified whether the compiler will use the local inline of the global non inline version.

Declaring a static array before defining it when compiling with GCC

The GCC compiler and the Clang compilers behave differently, where the Clang allows a static variable to be declared before it is defined, while the GCC compiler treats the declaration (or "tentative definition") as a definition.
I believe this is a bug in GCC, but complaining about it and opening a bug report won't solve the problem that I need the code to compile on GCC today (or yesterday)...
Heres a fast example:
static struct example_s { int i; } example[];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[] = {{1}, {2}, {3}};
With the Clang compiler, the program compiles and prints out:
Number: 1
However, with GCC the code won't compile and I get the following errors (ignore line numbers):
src/main2.c:26:36: error: array size missing in ‘example’
static struct example_s { int i; } example[];
^~~~~~~
src/main2.c:33:25: error: conflicting types for ‘example’
static struct example_s example[256] = {{1}, {2}, {3}};
^~~~~~~
src/main2.c:26:36: note: previous declaration of ‘example’ was here
static struct example_s { int i; } example[];
Is this a GCC bug or a Clang bug? who knows. Maybe if you're on one of the teams you can decide.
As for me, the static declaration coming before the static definition should be (AFAIK) valid C (a "tentative definition", according to section 6.9.2 of the C11 standard)... so I'm assuming there's some extension in GCC that's messing things up.
Any way to add a pragma or another directive to make sure GCC treats the declaration as a declaration?
The C11 draft has this in §6.9.2 External object definitions:
3 If the declaration of an identifier for an object is a tentative definition and has
internal linkage, the declared type shall not be an incomplete type
I read this as saying that the first line in your code, which has an array of unspecified length, fails to be a proper tentative definition. Not sure what it becomes then, but that would kind of explain GCC's first message.
TL;DR
The short answer is that this particular construct is not allowed by the C11 standard -- or any other C standard going back to ANSI C (1989) -- but it is accepted as a compiler extension by many, though not all, modern C compilers. In the particular case of GCC, you need to not use -pedantic (or -pedantic-errors), which would cause a strict interpretation of the C standard. (Another workaround is described below.)
Note: Although you can spell -pedantic with a W, it is not like many -W options, in that it does not only add warning messages: What it does is:
Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++.
Workarounds
It does not appear to be possible to suppress this error using a GCC #pragma, or at least the ones that I tried didn't have any effect. It is possible to suppress it for a single declaration using the __extension__ extension, but that seems to just be trading one incompatibility for another, since you would then need to find a way to remove (or macro expand away) __extension__ for other compilers.
Quoting the GCC manual:
-pedantic and other options cause warnings for many GNU C extensions. You can prevent such warnings within one expression by writing __extension__ before the expression. __extension__ has no effect aside from this.
On the GCC versions I had handy, the following worked without warnings even with -pedantic:
__extension__ static struct example_s { int i; } example[];
Probably your best bet it to just remove -pedantic from the build options. I don't believe that -pedantic is actually that useful; it's worth reading what the GCC manual has to say about it. In any event, it is doing its job here: the documented intent is to ban extensions, and that's what it is doing.
Language-lawyering
The language-lawyer justification for the above, taking into account some of the lengthy comment threads:
Definitions
An external declaration is a declaration at file scope, outside of any function definition. This shouldn't be confused with external linkage, which is a completely different usage of the word. The standard calls external declarations "external" precisely because they are outside any function definitions.
A translation unit is, thus, a sequence of external-declaration. See §6.9.
If an external declaration is also a definition -- that is, it is either a function declaration with a body or an object declaration with an initializer -- then it is referred to as an external definition.
A type is incomplete at a point in a program where there is not "sufficient information to determine the size of objects of that type" (§6.2.5p1), which includes "an array type of unknown size" (§6.2.5p22). (I'll return to this paragraph later.) (There are other ways for a type to be incomplete, but they're not relevant here.)
An external declaration of an object is a tentative definition (§6.9.2) if it is not a definition and is either marked static or has no storage-class specifier. (In other words, extern declarations are not tentative.)
What's interesting about tentative definitions is that they might become definitions. Multiple declarations can be combined with a single definition, and you can also have multiple declarations (in a translation unit) without any definition (in that translation unit) provided that the symbol has external linkage and that there is a definition in some other translation unit. But in the specific case where there is no definition and all declarations of a symbol are tentative, then the compiler will automatically insert a definition.
In short, if a symbol has any (external) declaration with an explicit extern, it cannot qualify for automatic definition (since the explicitly-marked declaration is not tentative).
A brief detour: the importance of the linkage of the first declaration
Another curious feature: if the first declaration for an object is not explicitly marked static, then no declaration for that object can be marked static, because a declaration without a storage class is considered to have external linkage unless the identifier has already been declared to have internal linkage (§6.2.2p5), and an identifier cannot be declared to have internal linkage if it has already been declared to have external linkage (§6.2.2p7). However, if the first declaration for an object is explicitly static, then subsequent declarations have no effect on its linkage. (§6.2.2p4).
What this all meant for early implementers
Suppose you're writing a compiler on an extremely resource-limited CPU (by modern standards), which was basically the case for all early compiler writers. When you see an external declaration for a symbol, you need to either give it an address within the current translation unit (for symbols with internal linkage) or you need to add it to the list of symbols you're going to let the linker handle (for symbols with external linkage). Since the linker will assign addresses to external symbols, you don't yet need to know what their size is. But for the symbols you're going to handle yourself, you will want to immediately give them an address (within the data segment) so that you can generate machine code referencing the data, and that means that you do need to know what size these objects are.
As noted above, you can tell whether a symbol is internally or externally linked when you first see a declaration for it, and it must be declared before it is used. So by the time you need to emit code using the symbol, you can know whether to emit code referencing a specific known offset within the data segment, or to emit a relocatable reference which will be filled in later by the linker.
But there's a small problem: What if the first declaration is incomplete? That's not a problem for externally linked symbols, but for internally-linked symbols it prevents you from allocating it to an address range since you don't know how big it is. And by the time you find out, you might have had to have emitted code using it. To avoid this problem, it's necessary that the first declaration of an internally-linked symbol be complete. In other words, there cannot be a tentative declaration of an incomplete symbol, which is what the standard says in §6.9.2p3:
If the declaration of an identifier for an object is a tentative definition and has internal linkage, the declared type shall not be an incomplete type.
A bit of paleocybernetics
That's not a new requirement. It was present, with precisely the same wording, in §3.7.2 of C89. And the issue has come up several times over the years in the comp.lang.c and comp.std.c Usenix groups, without ever attracting a definitive explanation. The one I provided above is my best guess, combined with hints from the following discussions:
in 1990: https://groups.google.com/forum/#!msg/comp.std.c/l3Ylvw-mrV0/xPS0dXfJtW4J
in 1993: https://groups.google.com/d/msg/comp.std.c/abG9x3R9-1U/Ib09BSo5EI0J
in 1996: https://groups.google.com/d/msg/comp.lang.c/j6Ru_EaJNkg/-O3jR5tDJMoJ
in 1998: https://groups.google.com/d/msg/comp.std.c/aZMaM1pYBHA/-YbmPnNI-lMJ
in 2003: https://groups.google.com/d/msg/comp.std.c/_0bk-xK9uA0/dAoULatJIKwJ (I got several links from Fergus Henderson's post in this thread.)
in 2011: https://groups.google.com/d/msg/comp.lang.c/aoUSLbUBs7I/7BdNQhAq5DgJ
And it's also come up a few times on Stackoverflow:
What is the meaning of statement below that the declared type shall not be incomplete type
Why is this statement producing a linker error with gcc?
A final doubt
Although no-one in any of the above debates has mentioned it, the actual wording of §6.2.5p22 is:
An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).
That definitely seems to contradict §6.9.2p3, since it contemplates a "later declaration with interal linkage", which would not be allowed by the prohibition on tentative definitions with internal linkage and incomplete type. This wording is also contained word-for-word in C89 (in §3.1.2.5), so if this is an internal contradiction, it's been in the standard for 30 years, and I was unable to find a Defect Report mentioning it (although DR010 and DR016 hover around the edges).
Note:
For C89, I relied on this file saved in the Wayback Machine but I have no proof that it's correct. (There are other instances of this file in the archive, so there is some corroboration.) When the ISO actually released C90, the sections were renumbered. See this information bulletin, courtesy wikipedia.
Edit: Apparently gcc was throwing an error due to the -Wpedantic flag, which (for some obscure reason) added errors in addition to warnings (see: godbolt.org and remove the flag to compile).
¯\_(ツ)_/¯
A possible (though not DRY) answer is to add the array length to the initial declaration (making a complete type with a tentative declaration where C11 is concerned)... i.e.:
static struct example_s { int i; } example[3];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[3] = {{1}, {2}, {3}};
This is super annoying, as it introduces maintenance issues, but it's a temporary solution that works.

Why there is no error in this program? [duplicate]

This question already has answers here:
Tentative definitions in C and linking
(3 answers)
Closed 6 years ago.
--- a.c ----
int i; // external definition
---- main.c ------
int i=0; // external definition
int main(void)
{
i=0;
}
In both files i is an external defnition in each translation unit and i is used in an expression. That should violate:
If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof operator
whose result is an integer constant), somewhere in the entire program
there shall be exactly one external definition for the identifier;
otherwise, there shall be no more than one.140)
This non-standard behavior is a common extension implemented in many C compilers.
This matter is discussed rather extensively in Rationale to C99 standard (see pp. 32-34). And, according to that document, this set of definitions would be legal under Relaxed Ref/Def model typically implemented in UNIX OS's C compilers of pre-C89 era. This is the reason for its popularity and this is why we often see it implemented as an extension. It is supposed to simplify support of legacy code.
Nevertheless, standard C definition model is different: it is a combination of Strict Ref/Def model and Initialization model. Standard C does not allow this.
P.S. While it is true that definition of i in a.c is a tentative definition, it has nothing to do with the issue. By the end of the containing translation unit all tentative definitions of some object combine and give birth to an external definition of the object. Their "tentative" nature is not in any way visible at inter-module level. Tentative definitions do not allow one to create multiple definitions of the same object in different translation units.

C declarations before usage

All identifiers in C need to be declared before they are used, but I can`t find where it denoted in C99 standard.
I think it refers to macro definitions too, but there is only macro expansion order defined.
C99:TC3 6.5.1 §2, with footnote 79 explicitly stating:
Thus, an undeclared identifier is a violation of the syntax.
in conjunction with 6.2.1 §5:
Unless explicitly stated otherwise, [...] it [ie an identifier] refers to the
entity in the relevant name space whose declaration is visible at the point the identifier
occurs.
and §7:
[...] Any other identifier has scope that begins just after the completion of its declarator.
There are a at least couple of exceptions to the rule that all identifiers need to be delcared before use:
while C99 removed implicit function declarations, you may still see C programs that rely, possibly unknowingly, on them. There is even the occasional question on SO that, for example, ask why functions that return double don't work (when the header that includes the declaration of the function is omitted). It seems that when compiling with pre-C99 semantics, warnings for undeclared functions are often not configured to be used or are ignored.
the identifier for a goto label may be used before it's 'declaration' - it is declared implicitly by its syntactic appearance (followed by a : and a statement).
The exception to the rule for goto labels is pretty much a useless nitpick, but the fact that function identifiers can be used without a declaration (pre-C99) is something that can be useful to know because you might once in a while run into a problem with it as a root cause.
Also, identifiers can be used before being 'declared' (strictly speaking, before being defined) in preprocessing, where they can be tested for being defined or not, or used in preprocessor expressions where they will evaluate to 0 if not otherwise defined.

Resources