External linkage of const in C - c

I was playing with extern keyword in C when I encountered this strange behaviour.
I have two files:
file1.c
#include<stdio.h>
int main()
{
extern int a;
a=10;
printf("%d",a);
return 0;
}
file2.c
const int a=100;
When I compile these files together, there is no error or warning and when I run them, output comes to be 10. I had expected that the compiler should report an error on line a=10;.
Moreover, if I change the contents of file2.c to
const int a;
that is, if I remove the initialization of global const variable a and then compile the files, there is still no error or warning but when I run them, Segmentation Fault occurs.
Why does this phenomenon happen? Is it classified under undefined behaviour? Is this compiler- or machine- dependent?
PS: I have seen many questions related to this one, but either they are for C++ or they discuss extern only.

Compilation and linking are two distinct phases. During compilation, individual files are being compiled into object files. Compiler will find both file1.c and file2.c being internally consistent. During linking phase, the linker will just point all the occurrence of the variable a to the same memory location. This is the reason you do not see any compilation or linker error.
To avoid exactly the problem which you have mentioned, it is suggested to put the extern in a header file and then include that header file in different C file. This way compiler can catch any inconsistency between the header and the C file
The following stackoverflow also speaks about linker not able to do type checking for extern variables.
Is there any type checking in C or C++ linkers?
Similarly, the types of global variables (and static members of classes and so on) aren't checked by the linker, so if you declare extern int test; in one translation unit and define float test; in another, you'll get bad results.

It is undefined behaviour but the compiler won't warn you. How could it? It has no idea how you declare a variable in another file.
Attempting to modify a variable declared const is undefined behaviour. It is possible (but not necessary) that the variable will be stored in read-only memory.

This is a known behavior of C compilers. It is one of the differences between C and C++ where strong compile time type checking is enforced.
The segmentation fault occurs when trying to assign a value to a const, because the linker puts the const values in a read-only elf segment and writing to this memory address is a runtime (segmentation) fault.
but during compile time, the compiler does not check any "externs", and the C linker, does not test types. therefore it passes compilation/linkage.

Your program causes undefined behaviour with no diagnostic required (whether or not const int a has an initializer). The relevant text in C11 is 6.2.7/2:
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
Also 6.2.2/2:
In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function.
In C, const int a = 100; means that a has external linkage. So it denotes the same object as extern int a;. However those two declarations have incompatible type (int is not compatible with const int, see 6.7.2 for the definition of "compatible type").

Related

Declaring a static array before defining it when compiling with GCC

The GCC compiler and the Clang compilers behave differently, where the Clang allows a static variable to be declared before it is defined, while the GCC compiler treats the declaration (or "tentative definition") as a definition.
I believe this is a bug in GCC, but complaining about it and opening a bug report won't solve the problem that I need the code to compile on GCC today (or yesterday)...
Heres a fast example:
static struct example_s { int i; } example[];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[] = {{1}, {2}, {3}};
With the Clang compiler, the program compiles and prints out:
Number: 1
However, with GCC the code won't compile and I get the following errors (ignore line numbers):
src/main2.c:26:36: error: array size missing in ‘example’
static struct example_s { int i; } example[];
^~~~~~~
src/main2.c:33:25: error: conflicting types for ‘example’
static struct example_s example[256] = {{1}, {2}, {3}};
^~~~~~~
src/main2.c:26:36: note: previous declaration of ‘example’ was here
static struct example_s { int i; } example[];
Is this a GCC bug or a Clang bug? who knows. Maybe if you're on one of the teams you can decide.
As for me, the static declaration coming before the static definition should be (AFAIK) valid C (a "tentative definition", according to section 6.9.2 of the C11 standard)... so I'm assuming there's some extension in GCC that's messing things up.
Any way to add a pragma or another directive to make sure GCC treats the declaration as a declaration?
The C11 draft has this in §6.9.2 External object definitions:
3 If the declaration of an identifier for an object is a tentative definition and has
internal linkage, the declared type shall not be an incomplete type
I read this as saying that the first line in your code, which has an array of unspecified length, fails to be a proper tentative definition. Not sure what it becomes then, but that would kind of explain GCC's first message.
TL;DR
The short answer is that this particular construct is not allowed by the C11 standard -- or any other C standard going back to ANSI C (1989) -- but it is accepted as a compiler extension by many, though not all, modern C compilers. In the particular case of GCC, you need to not use -pedantic (or -pedantic-errors), which would cause a strict interpretation of the C standard. (Another workaround is described below.)
Note: Although you can spell -pedantic with a W, it is not like many -W options, in that it does not only add warning messages: What it does is:
Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++.
Workarounds
It does not appear to be possible to suppress this error using a GCC #pragma, or at least the ones that I tried didn't have any effect. It is possible to suppress it for a single declaration using the __extension__ extension, but that seems to just be trading one incompatibility for another, since you would then need to find a way to remove (or macro expand away) __extension__ for other compilers.
Quoting the GCC manual:
-pedantic and other options cause warnings for many GNU C extensions. You can prevent such warnings within one expression by writing __extension__ before the expression. __extension__ has no effect aside from this.
On the GCC versions I had handy, the following worked without warnings even with -pedantic:
__extension__ static struct example_s { int i; } example[];
Probably your best bet it to just remove -pedantic from the build options. I don't believe that -pedantic is actually that useful; it's worth reading what the GCC manual has to say about it. In any event, it is doing its job here: the documented intent is to ban extensions, and that's what it is doing.
Language-lawyering
The language-lawyer justification for the above, taking into account some of the lengthy comment threads:
Definitions
An external declaration is a declaration at file scope, outside of any function definition. This shouldn't be confused with external linkage, which is a completely different usage of the word. The standard calls external declarations "external" precisely because they are outside any function definitions.
A translation unit is, thus, a sequence of external-declaration. See §6.9.
If an external declaration is also a definition -- that is, it is either a function declaration with a body or an object declaration with an initializer -- then it is referred to as an external definition.
A type is incomplete at a point in a program where there is not "sufficient information to determine the size of objects of that type" (§6.2.5p1), which includes "an array type of unknown size" (§6.2.5p22). (I'll return to this paragraph later.) (There are other ways for a type to be incomplete, but they're not relevant here.)
An external declaration of an object is a tentative definition (§6.9.2) if it is not a definition and is either marked static or has no storage-class specifier. (In other words, extern declarations are not tentative.)
What's interesting about tentative definitions is that they might become definitions. Multiple declarations can be combined with a single definition, and you can also have multiple declarations (in a translation unit) without any definition (in that translation unit) provided that the symbol has external linkage and that there is a definition in some other translation unit. But in the specific case where there is no definition and all declarations of a symbol are tentative, then the compiler will automatically insert a definition.
In short, if a symbol has any (external) declaration with an explicit extern, it cannot qualify for automatic definition (since the explicitly-marked declaration is not tentative).
A brief detour: the importance of the linkage of the first declaration
Another curious feature: if the first declaration for an object is not explicitly marked static, then no declaration for that object can be marked static, because a declaration without a storage class is considered to have external linkage unless the identifier has already been declared to have internal linkage (§6.2.2p5), and an identifier cannot be declared to have internal linkage if it has already been declared to have external linkage (§6.2.2p7). However, if the first declaration for an object is explicitly static, then subsequent declarations have no effect on its linkage. (§6.2.2p4).
What this all meant for early implementers
Suppose you're writing a compiler on an extremely resource-limited CPU (by modern standards), which was basically the case for all early compiler writers. When you see an external declaration for a symbol, you need to either give it an address within the current translation unit (for symbols with internal linkage) or you need to add it to the list of symbols you're going to let the linker handle (for symbols with external linkage). Since the linker will assign addresses to external symbols, you don't yet need to know what their size is. But for the symbols you're going to handle yourself, you will want to immediately give them an address (within the data segment) so that you can generate machine code referencing the data, and that means that you do need to know what size these objects are.
As noted above, you can tell whether a symbol is internally or externally linked when you first see a declaration for it, and it must be declared before it is used. So by the time you need to emit code using the symbol, you can know whether to emit code referencing a specific known offset within the data segment, or to emit a relocatable reference which will be filled in later by the linker.
But there's a small problem: What if the first declaration is incomplete? That's not a problem for externally linked symbols, but for internally-linked symbols it prevents you from allocating it to an address range since you don't know how big it is. And by the time you find out, you might have had to have emitted code using it. To avoid this problem, it's necessary that the first declaration of an internally-linked symbol be complete. In other words, there cannot be a tentative declaration of an incomplete symbol, which is what the standard says in §6.9.2p3:
If the declaration of an identifier for an object is a tentative definition and has internal linkage, the declared type shall not be an incomplete type.
A bit of paleocybernetics
That's not a new requirement. It was present, with precisely the same wording, in §3.7.2 of C89. And the issue has come up several times over the years in the comp.lang.c and comp.std.c Usenix groups, without ever attracting a definitive explanation. The one I provided above is my best guess, combined with hints from the following discussions:
in 1990: https://groups.google.com/forum/#!msg/comp.std.c/l3Ylvw-mrV0/xPS0dXfJtW4J
in 1993: https://groups.google.com/d/msg/comp.std.c/abG9x3R9-1U/Ib09BSo5EI0J
in 1996: https://groups.google.com/d/msg/comp.lang.c/j6Ru_EaJNkg/-O3jR5tDJMoJ
in 1998: https://groups.google.com/d/msg/comp.std.c/aZMaM1pYBHA/-YbmPnNI-lMJ
in 2003: https://groups.google.com/d/msg/comp.std.c/_0bk-xK9uA0/dAoULatJIKwJ (I got several links from Fergus Henderson's post in this thread.)
in 2011: https://groups.google.com/d/msg/comp.lang.c/aoUSLbUBs7I/7BdNQhAq5DgJ
And it's also come up a few times on Stackoverflow:
What is the meaning of statement below that the declared type shall not be incomplete type
Why is this statement producing a linker error with gcc?
A final doubt
Although no-one in any of the above debates has mentioned it, the actual wording of §6.2.5p22 is:
An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).
That definitely seems to contradict §6.9.2p3, since it contemplates a "later declaration with interal linkage", which would not be allowed by the prohibition on tentative definitions with internal linkage and incomplete type. This wording is also contained word-for-word in C89 (in §3.1.2.5), so if this is an internal contradiction, it's been in the standard for 30 years, and I was unable to find a Defect Report mentioning it (although DR010 and DR016 hover around the edges).
Note:
For C89, I relied on this file saved in the Wayback Machine but I have no proof that it's correct. (There are other instances of this file in the archive, so there is some corroboration.) When the ISO actually released C90, the sections were renumbered. See this information bulletin, courtesy wikipedia.
Edit: Apparently gcc was throwing an error due to the -Wpedantic flag, which (for some obscure reason) added errors in addition to warnings (see: godbolt.org and remove the flag to compile).
¯\_(ツ)_/¯
A possible (though not DRY) answer is to add the array length to the initial declaration (making a complete type with a tentative declaration where C11 is concerned)... i.e.:
static struct example_s { int i; } example[3];
int main(void) {
fprintf(stderr, "Number: %d\n", example[0].i);
return 0;
}
static struct example_s example[3] = {{1}, {2}, {3}};
This is super annoying, as it introduces maintenance issues, but it's a temporary solution that works.

No warning with uninitialized C string

I'm currently wondering why I don't get an error from GCC during compilation/linking of a small C program.
I declared in version.h the following string:
const char* const VERSION;
In version.c I have set the initialization of the variable:
const char* const VERSION = "0.8 rev 213";
No problem with that. I can use the string in the rest of the program.
If the c file is missing, no error occurs during compilation/linking but the program fails with SIGSEGV (of course) when it tries to access the variable.
Is my way of setting up the variable VERSION correct or is there a better way? Or is there a chance to get an error during compilation/linking?
You have defined (not just declared) a variable in the header.
If you ever include this header from more than one source file, the behaviour is undefined. Here's the relevant quote from the standard:
J.2 Undefined behavior
…
An identifier with external linkage is used, but in the program there does not exist exactly one external definition for the identifier, or the identifier is not used and there exist multiple external definitions for the identifier.
…
You are relying on a GCC-specific (actually common to many compilers, but still non-standard) behaviour here, which is merging of duplicate tentative data definitions. See help for -fcommon and -fno-common GCC compilation flags. Not all compilers behave this way. Historically this is the common behaviour for linkers, because that's how Fortran worked before there was C.
Assuming this language extension, one of the definitions (one that has an explicit initializer) initialises the variable to point to your string literal. But if you omit this definition, it will remain zero-initialised. That is, it will be a constant null pointer. Not very useful.
To make long story short, never ever do that. In order to declare (but not define) a global variable in a header, use extern. If you do, and try to omit a definition elsewhere, you will likely get a linker error (though the standard does not require diagnostic for this violation, all known implementation produce one).
Your example works because of a Fortran-inspired (mis)feature of C (but not C++) called tentative definitions (6.9.2p2) which is commonly but nonstandardly extended to multiple files.
Tentative definitions are variable declarations without extern and with
no initializer. In common implementations (pun intended), tentative definitions create a special kind of symbol which is called a common symbol. During linking, in the presence of a regular symbol of the same name, other common symbols become references to the regular symbol which means all the empty VERSIONs in your translation units created there due to inclusions will become references to the regular symbol const char* const VERSION = "0.8 rev 213";. If there's no such regular symbol, the common symbols will get merged into a single zero-initalized variable.
I don't know how to get a warning against this with a C compiler.
This
const char* const VERSION;
const char* const VERSION = "0.8 rev 213";
seems to work with gcc no matter what I've tried (g++ won't accept it -- C++ doesn't have the tentative definition feature and it doesn't like const variables that aren't explicitly initialized). But you can compile with -fno-common (which is a fairly "common" (and highly recommended) nonstandard option (gcc,clang, and tcc all have it)) and then you will get a linker error if the non-initialized and the initialized extern-less declarations are in different translation units.
Example:
v.c:
const char * VERSION;
main.c
const char* VERSION;
int main(){}
compilation and linking:
gcc main.c v.c #no error because of tentative definitions
g++ main.c v.c #linker error because C++ doesn't have tentative definitions
gcc main.c v.c -fno-common #linker error because tentative defs. are disabled
(I removed the second const in the example for the sake of the C++ example — C++ additionally makes const globals static or something like that which only complicates the demonstration of the tentative definition feature.)
With tentative definitions disabled or with C++, all your variable declarations in headers should have extern in them
version.h:
extern const char* const VERSION;
and you should have exactly one definition for each global and that definition should have an initializer or be without extern (some compilers will warn if you apply extern to an initialized variable).
version.c:
#include "version.h" //optional; for type checking
const char* const VERSION = "0.8 rev 213";
In version.h you should declare VERSION as a extern like
extern const char* const VERSION;
And in version.c you should define the extern variable like
const char* const VERSION = "0.8 rev 213";
EDIT :- Also you need to make sure that only one source file defined the variable VERSION while the other source files declared it extern and you can do it by defining it in a source file VERSION.c and put the extern declaration in a header file VERSION.h.

if global variable have default external linkage then why can't we access it directly in another file?

I have gone through following questions:
Global variable in C are static or not?
Are the global variables extern by default or it is equivalent to declaring variable with extern in global?
Above links describe that if we define global variable in one file and haven't specified extern keyword they will be accessible in another source file because of translation unit.
Now I have file1.c in that have defined following global variable and function:
int testVariable;
void testFunction()
{
printf ("Value of testVariable %d \n", testVariable);
}
In file2.c have following code
void main()
{
testVariable = 40;
testFunction();
}
Now I am getting error: 'testVariable' undeclared (first use in this function) -- why?
Note: both files are used in same program using makefile.
As per my understanding both function and global variable have default external linkage. So function we can use directly by it's name in another file but variable can't why?
Can any one have idea?
EDIT:
From the below answer i get idea that like in case of function old compiler will guess and add an implicit declaration but in case of variable it can't. Also C99 removed implicit declaration but still I am getting warning in C99 mode like:
warning: implicit declaration of function ‘testFunction’.
Now have gone through below link:
implicit int and implicit declaration of functions with gcc compiler
It said that compiler take it as diagnostic purpose and not give error. So compiler can process forward.
But why in case of variable it can't process further? Even in case of function if compiler proceed and if actual definition not there then at linking time it will fail. So what's benefit to move forward??
There are two things in play here: The first is that there is a difference between a definition and a declaration. The other thing is the concept of translation units.
A definition is what defines the variable, it's the actual place the variable exists, where the compiler reserves space for the variable.
A declaration is needed for the compiler to know that a symbol exists somewhere in the program. Without a declaration the compiler will not know that a symbol exists.
A translation unit is basically and very simplified the source file plus all its included header files. An object file is a single translation unit, and the linker takes all translation units to create the final program.
Now, a program can only have a single definition, for example a global variable may only exist in a single translation unit, or you will get multiple definition errors when linking. A declaration on the other hand can exist in any number of translation units, the compiler will use it to tell the linker that the translation references a definition in another (unknown at time of compilation) translation unit.
So what happens here is that you have a definition and a declaration in file1.c. This source file is used as input for one translation unit and the compiler generates a single object file for it, say file1.o. In the other source file, file2.c, there is no definition nor any declaration of the global variable testVariable, so the compiler doesn't know it exists and will give you an error for it. You need to declare it, for example by doing
extern int testVariable; // This is a declaration of the variable
It's a little more complicated for the function, because in older versions of the C standard one didn't have to declare functions being used, the compiler would guess and add an implicit declaration. If the definition and the implicit declaration doesn't match it will lead to undefined behavior, which is why implicit function declarations was removed in the C99 standard. So you should really declare the function too:
void testFunction(void); // Declare a function prototype
Note that the extern keyword is not needed here, because the compiler can automatically tell that it's a function prototype declaration.
The complete file2.c should then look like
extern int testVariable; // This is a declaration of the variable
void testFunction(void); // Declare a function prototype
void main()
{
testVariable = 40;
testFunction();
}
When compiler copes with file2.c it knows nothing about existence of testVariable and about its type. And as result it can't generate code to interact with such object. And purpose of line like
extern int testVariable;
is to let compiler to know that somewhere such object exists and has type of int.
With functions we have no such problem because of next rule - if function is not defined - compiler assumes that it is defined somewhere like
int testFunction() { ... }
So you can pass any number of any arguments to it and try to obtain int return value. But if real function signature differs - you'll get an undefined behafiour at runtime. Because of this weakness such approach is considered as bad practice, and you should declare proper function prototype before any call to that func.

initialized vs uninitialized global const variable in c

Pardon me, I am not very good in explaining questions. So I start with example directly
Look at following example
const int a=10;
int *ptr;
int main(){
ptr=&a;
*ptr=100; // program crashes
printf("%d",a);
}
But If I made a slightly change in above code as following
const int a; // uninitialized global variable
Then the above code works fine.
So my question is why compiler behaves differently for uninitialize and initialize global const variables?
I am using gcc for windows (mingw).
You are modifying a const object, and that is simply undefined behavior - so don't do it, and don't ignore compiler warnings.
Now, the actual reason for the different behavior in your particular case is that for const int a=10; the value 10 has to be stored somewhere. Since the variable is const, the linker places it in the .rodata or a similar read only section of the executable. When you're trying to write to a read-only location, you'll get a segmentation fault.
For the uninitialized case, const int a , the a needs to be initialized to zero since it's at file scope (or; a is a global variable). The linker then places the variable in the .bss section, together with other data that also is zero initialized at program startup. The .bss section is read/write and you get no segfault when you try to write to it.
All this is not something you can rely on , this could change with minor modification to the code, if you use another compiler or a newer/older version of your compiler etc.
Global and static variables are initialized implicitly if your code doesn't do it explicitly as mandated by the C standard.
From the doc:
const is a type qualifier. The other type qualifier is volatile. The
purpose of const is to announce objects that may be placed in
read-only memory, and perhaps to increase opportunities for
optimization.
In G++ you will receive the error for the second case ie, const int a;.
6.9.2 External object definitions
Semantics
1 If the declaration of an identifier for an object has file scope and
an initializer, the declaration is an external definition for the
identifier.
2 A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
declares a constant integer variable. It means it’s value can’t be modified. It’s value is initially assigned to 10.
If you try to change its value later, the compiler will issue a warning, or an error, depending on your compiler settings.

Function definition and extern declaration are different ; but GCC does not even warn and passes the compilation

I was reviewing a particular code snippet in which function is declared as
int fn_xyz()
but while referencing the function in another .c file, it is defined as :
extern void fn_xyz()
while fn_xyz is called, there is no check for the return values ; GCC-4.7.0 never warned on the above mismatch ; is this expected ?
Each source file (technically, each translation unit) is compiled completely independently of the others. So the compiler never knows that you've declared the same symbol in multiple places. At link time, all type information has been removed, so the linker can't complain either.
This is precisely why you should declare functions in a header that all source files then include. That way, type mismatches will trigger a compiler warning/error.
Since the linking stage happens after compilation (and the compiler doesn't know or care where you link to; like linking to a shared libary), it makes sense that such a test would not be expected of a compiler.

Resources