Does accesing .text segment via `extern` variables cause undefined-behaviour? - c

This is file "1.c"
#include <stdio.h>
char foo;
int bar(){
}
int main(){
printf("%d",foo);
return 0;
}
//--------------------------
This is file '2.c'
void foo(){
}
Compiler invoked as gcc 1.c 2.c
Does the above gives an undefined behaviour? My guess is, yes. Otherwise it's almost impossible to do optimization.
Multiple different definitions for the same entity (class, template, enumeration, inline function, static member function, etc.) [What are all the common undefined behaviour that a C++ programmer should know about?
But as far as I know char foo produce only a weak symbol that can be overridden by void foo(){} at linkage. Furthermore, if I change char foo into extern char foo, is that still a definition?

It'd cause undefined behaviour, yes. There are probably lots of quotes from the standard that are explicit for the various types of declarations and so forth, but this sums it up:
In the set of translation units and libraries that constitutes an entire program, each
declaration of a particular identifier with external linkage denotes the same object or
function. (6.2.2)
All declarations that refer to the same object or function shall have compatible type;
otherwise, the behavior is undefined. (6.2.7)
char foo; in your example is a tentative definition. If you use extern, it would not be a definition, but it'd still be a declaration, and the above would still apply.

Related

why no need to add `extern` for external functions?

below is my code:
//main.c
//I'm not using header file here,I know it is bad practice, it is just for demo purpose.
int main()
{
func();
return 0;
}
//test.c
void func()
{
...
}
we can see that above code compiles and can be linked by linker, but the same thing doesn't apply to variables as:
//main.c
int main()
{
sum += 1;
return 0;
}
//test.c
int sum = 2020;
then this code won't compile and can't be linked, and we have to add extern int sum; before main function in main.c.
But why we don't need to add extern in main.c as:
//main.c
extern void func(); //or `void func();` since functions are by default external
// without above line, it still compile
int main()
{
func();
return 0;
}
is it a little bit inconsistent here?
Note: by saying " Functions are by default external.",my understanding is: we can save some keystokes without typing extern , so void func(); == extern void func();, but we still need to add void func(); before main function in main.c, isn't it?
Both programs are incorrect since C99 and may be rejected by the compiler. An identifier may not be used in an expression without previously being declared.
In C89 there was a rule that if you write something that resembles a function call, and the function name has not previously been declared, then the compiler inserts a function declaration int f(); . There was not a similar rule for use of other identifiers that aren't followed by parentheses.
Some compilers (depending on compiler flags) will, even if set to C99 or later mode, issue a diagnostic and then perform the C89 behaviour anyway.
Note: your program still causes undefined behaviour in C89 because the implicit declaration is int func(); but the function definition has void func() which is incompatible type.
The compiler doesn't need to know anything about a function, in order to generate code to call it. In the absence of a prototype, it might generate the wrong code, but it can generate something (in principle, at least -- standards-compliance might forbid it by default). The compiler knows the calling convention for the platform -- it knows to put the function arguments onto the stack or into registers as required. It knows to write a symbol that the linker can later find and fix up, and so on.
But when you write "sum++", the compiler has no clue, lacking a declaration, how to generate code for that. It doesn't even know what kind of thing "sum" is. The code needed to increment a floating-point number will be completely different to that needed to increment an integer, and may be different from that needed to increment a pointer. The compiler doesn't need to know where "sum" is -- that's the linker's job -- but it needs to know what it is, to produce meaningful machine code.
But we don't need to add extern for the function in main.c as extern void func(); or void func();(as functions are implicitly extern prefixed) and the code still compile?
That's correct. Functions are by default external.
To make functions specific to a local source file (translation unit), you need to specific static for them.
Variables, on the other hand, are visible in the source file only. If you want to make some variable visible outside the source file where it is defined, you need extern for it.
There are two completely different topics - function prototypes and linkage.
void foo(void);
provides the extern function prototype needed by compiler to know the number and type of parameters and the type of the return value. Function has an external linkage - ie can be accessed by other compilation units
static void foo(void);
provides the static function prototype. Function has an no external linkage - ie it cannot be accessed by other compilation units
By default functions have an external linkage.
Objects (global scope).
int x;
Defines the object x having the external linkage and type int.
If you define another x object in another compilation unit the linker will complain and emit an error.
extern int x;
Only declares the object x without defining it. The object x has to be defined in other compilation unit.

Rationale of static declaration followed by non-static declaration allowed but not vice versa

This code will compile and is well defined under current C standards:
static int foo(int);
extern int foo(int);
The standard specifies that in this situation (C11: 6.2.2 Linkages of identifiers (p4)):
For an identifier declared with the storage-class specifier extern in
a scope in which a prior declaration of that identifier is visible,31)
if the prior declaration specifies internal or external linkage, the
linkage of the identifier at the later declaration is the same as the
linkage specified at the prior declaration. [...]
... which means that the int foo(int) function is declared static int foo(int).
Swapping these declarations around like this:
extern int foo(int);
static int foo(int);
...gives me a compiler error using GNU GCC:
static declaration of 'foo' follows non-static declaration
My question is: What is the design rationale behind the second case being an error and not handled in a similar way as the first case? I suspect it has something to do with the fact that separate translation units are easier to manage and #include? I feel as though without understanding this, I can open myself up to some mistakes in future C projects.
I think the idea of this confusing specification is that an extern declaration can be used inside a function to refer to a global function or object, e.g to disambiguate it from another identifier with the same name
static double a; // a declaration and definition
void func(void) {
unsigned a;
.....
if (something) {
extern double a; // refers to the file scope object
}
}
Whereas if you use static you declare something new:
extern double a; // just a declaration, not a definition
// may reside elsewhere
void func(void) {
unsigned a;
.....
if (something) {
static double a; // declares and defines a new object
}
}
I can imagine that the scenario leading to this asymmetry is that the default linkage of identifiers at global scope is extern.1 Failing to mark the definition of a previously static-declared function static as well would otherwise be an error because it is by default also an extern declaration.
Here is an illustration. In modern C, functions must be declared before use, but sometimes the implementation is at the end because it is secondary to the main purpose of the code in the file:
static void helper_func(); // typically not in a header
// code using helper_func()
// And eventually its definition, which by default
// declares an **external** function. Adding
// an explicit `extern` would not change a thing; it's redundant.
void helper_func() { /* ... */ }
This looks innocent enough, and the intent is clear. When C was standardized there was probably code around looking like this. That's why it is allowed.
Now consider the opposite:
extern void func(); // this could be in a header
// ... intervening code, perhaps a different file ...
static void func() { /* ... */ }
// code which uses func()
It's pretty clear that this should not be allowed. Defining a function static is a clear, explicit statement. A prior, contradicting extern declaration does not make sense.2 There is a good chance that this is an inadvertent name collision, for example with a function declared in a header. Probably not much code out there was looking like this at the time of formalization. That's why it is forbidden.
1 From the C17 draft, 6.2.2/5: "If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern."
2 One could argue that an explicit extern declaration using the keyword, followed by a static definition, should be forbidden while an implicit one, without the keyword, could still be allowed (and that, correspondingly, a later explicitly extern function definition after a static declaration should be forbidden, while implicit ones are still allowed). But there is a limit to the hair-splitting a standard should do (and it's gone pretty far already).

Extern makes no difference

I am defining a global variable in test2.h
#ifndef TEST2_H
#define TEST2_H
int test_var;
void use_it(void);
#endif
and defining it again in two different files, test.c
#include <stdio.h>
#include "test2.h"
int test_var;
int main() {
printf("The test_var is: %d\n", ++test_var); // prints 1
use_it(); // prints 2
}
and test2.c
#include <stdio.h>
#include "test2.h"
int test_var;
void use_it() {
printf("The test_var is: %d", ++test_var);
}
I replaced the definition of test_var with extern int test_var and got the same result. That is, in both cases both files, test.c and test2.c have access to the global variable test_var. I was under the impression that without extern, each file would have their own copy of test_var. Observation suggests that this is not the case. So when does extern actually do something?
You end up with two copies of test_var and this is undefined behavior.
(C99, 6.9p5) "If an identifier declared with external linkage is used in an expression (other than as part of the operandof a sizeof operator whose result is an integer constant),
somewhere in the entire program there shall be exactly one external
definition for the identifier; otherwise, there shall be no more than one"
In your case the linker may be nice with you and merges the two symbols but this is still not portable and is undefined behavior. If you are using the GNU linker, you can use --warn-common to get the warning (and --fatal-warnings if you want an error).
To fix your issue, put the extern specifier in the declaration of test_var in the .h file and remove one of the definition of test_var (for example the one in test.c file).
This is undefined behavior, as others have noted, but what you are seeing here is the common extension described in appendix J.5.11 of the C99 spec, where multiple external definitions in different compilation units are allowed as long as none or only one of them are initialized and the types of all of them are the same.
In this case, with the extension, the definitions will be combined into a single definition at link time.
You also appear to be confused by the fact that the extern keyword, when used at the global scope, has nothing to do with extern linkage for declarations and definitions. ALL declarations at the global scope have extern linkage unless they have a static or inline keyword. The extern keyword serves to make such a declaration just a declaration. Without the extern keyword a global variable declaration is also a definition, and that is the only effect of the extern keyword in the global scope.
If you have the same variable declared in 2 diferent files as int test_var for example:
file1.c
int test_var;
file2.c
int test_var;
both variables will have their own memory adress, so they are two diferent variables with the same name.
if you have, two variables declared in 2 diferent files declared as extern int test_var, for example:
file1.c
extern int test_var; //this is a mistake
file2.c
extern int test_var; //this is a mistake
the compiler will return an error when you try to do something with that variables because with the keyword externyou are not reserving any space for that variable, you only use that keyword to say that a variable is already defined (commonly in another file).
The point is to unsderstand that a global variable is defined once with a sentence like int test_var (when you define a variable the compiler reserve space for it) and it's declared in every other file that need access to it with extern int test_var (when you declare a variable with the extern keyword you saying the compiler that variable is already defined and you want to have access to it in the file you are declaring it).
So an example of how to use a global variable wil be:
file1.c
int test_var; //definition
void useit(void);
int main () {
test_var=7;
useit();
return 0;
}
file2.c
#include <stdio.h>
void useit (void) {
extern int test_var; //declaration
printf ("the variable value is %d",test_var);
}
To answer your question:
extern int test_var; is a declaration. This announces that "Somewhere, there should exist test_var . We don't know where that is yet, but by the time we finish compiling and linking, we will find it in exactly one place".
So there has to be exactly one definition to match. A definition serves as a declaration, and also says "Here is the storage for test_var".
Also, test_var could either have internal linkage or external linkage. The default behaviour is external linkage. If you provide more than one definition for a variable of external linkage, it is undefined behaviour.
Internal linkage is indicated by including static in the declaration. You can have as many definitions as you want of a static variable (so long as only one per file has an initializer).
Summing up, we have:
extern int test_var; // declaration, external linkage
static int test_var; // declaration, definition, internal linkage
int test_var; // declaration, definition, external linkage
Note: the last two cases are actually tentative definitions: this is a thing that C has but C++ doesn't; the way it works is that it behaves like a declaration at first; but then , for each unit, if there is no later definition then this actually serves as a definition.
So you can write in C:
int test_var;
// stuff
int test_var = 5;
If you are using gcc and possibly some other compilers, you just stumbled upon some Unix tradition. Namely that uninitialized global variables are placed in the common block where multiple definitions of the same variable are merged during linking.
gcc can be told to put uninitialized global variables into the data section with the option -fno-common. With this, the linker will report an error when there are multiple definitions of the same variable name.

C - In what circumstances does the external declaration become a definition?

From C99 standard 6.2.3:
If the declaration of an identifier of an object has file scope and no storage-class specifier, its linkage is external.
and 6.7
A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that:
— for an object, causes storage to be reserved for that object;
— for a function, includes the function body;99)
— for an enumeration constant or typedef name, is the (only) declaration of the identifier.
Unfortunately, I haven't found any further description on when the compiler shall regard the external declaration as a definition (which means the type must be complete and storage size is calculated).
So I did some experiments. First I noticed that:
struct A a;
int main() {
}
Is invalid, gcc says the type A is incomplete and it doesn't know how to allocate the storage for a.
However, interestingly, we have the following valid code:
struct A a;
int main() {
}
struct A {int x;};
It's also reasonable since type A is completed at the end of the file. From two examples above, we can deduce that external declaration is checked at the end of file scope. (Still don't know where does the standard say about this)
However, array declaration is exceptional. The modified code is not longer valid:
struct A a[1];
int main() {
}
struct A {int x;};
And C99 standard does talk about this, it says elements of an array must be of completed type. So question comes about: is struct A a[1] a definition or a declaration? Don't be hasty to answer it. Check the following examples.
Here we have two files: a.c and b.c. In a.c:
#include <stdio.h>
int arr[10];
void a_arr_info() {
printf("%lu at %lx\n", sizeof arr, (size_t)arr);
}
while in b.c:
#include <stdio.h>
int arr[20];
void b_arr_info() {
printf("%lu at %lx\n", sizeof arr, (size_t)arr);
}
int main() {
a_arr_info();
b_arr_info();
}
The result is astonishing. The output shows that arr in both files refers to the same address. Which can be understood because arr are both in file scope, thus they're external linkage. The problem is, they have different size. In what file did the compiler take the declaration as definition and allocate the memory?
Why do I ask about this? Because, um, I'm working on a simplified C compiler project (course homework). So it might be important for me to figure it out. Although the homework does not go as far as this, I'm quite curious and would like to know more. Thanks!
It is called a tentative definition
A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
So any compilation unit (.o file) that has such a tentative definition realizes the object. Linking two such units together has undefined behavior, you should usually encounter a "multiply defined symbol" error. Some compiler/linkers just do it, you have to ensure that such symbols have same size and type.

C the same global variable defined in different files

I am reading this code from here(in Chinese). There is one piece of code about testing global variable in C. The variable a has been defined in the file t.h which has been included twice. In file foo.c defined a struct b with some value and a main function. In main.c file, defined two variables without initialized.
/* t.h */
#ifndef _H_
#define _H_
int a;
#endif
/* foo.c */
#include <stdio.h>
#include "t.h"
struct {
char a;
int b;
} b = { 2, 4 };
int main();
void foo()
{
printf("foo:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\tsizeof(b)=%d\n\tb.a=%d\n\tb.b=%d\n\tmain:0x%08x\n",
&a, &b, sizeof b, b.a, b.b, main);
}
/* main.c */
#include <stdio.h>
#include "t.h"
int b;
int c;
int main()
{
foo();
printf("main:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\t(&c)=0x%08x\n\tsize(b)=%d\n\tb=%d\n\tc=%d\n",
&a, &b, &c, sizeof b, b, c);
return 0;
}
After using Ubuntu GCC 4.4.3 compiling, the result is like this below:
foo: (&a)=0x0804a024
(&b)=0x0804a014
sizeof(b)=8
b.a=2
b.b=4
main:0x080483e4
main: (&a)=0x0804a024
(&b)=0x0804a014
(&c)=0x0804a028
size(b)=4
b=2
c=0
Variable a and b has the same address in two function, but the size of b has changed. I can't understand how it worked!
You are violating C's "one definition rule", and the result is undefined behavior. The "one definition rule" is not formally stated in the standard as such. We are looking at objects in different source files (aka, translation units), so we concerned with "external definitions". The "one external definition" semantic is spelled out (C11 6.9 p5):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
Which basically means you are only allowed to define an object at most once. (The otherwise clause allows you to not define an external object at all if it is never used anywhere in the program.)
Note that you have two external definitions for b. One is the structure that you initialize in foo.c, and the other is the tentative definition in main.c, (C11 6.9.2 p1-2):
If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So you have multiple definitions of b. However, there is another error, in that you have defined b with different types. First note that multiple declarations to the same object with external linkage is allowed. However, when the same name is used in two different source files, that name refers to the same object (C11 6.2.2 p2):
In the set of translation units and libraries that constitutes an entire program, each
declaration of a particular identifier with external linkage denotes the same object or
function.
C puts a strict limitation on declarations to the same object (C11 6.2.7 p2):
All declarations that refer to the same object or function shall have compatible type;
otherwise, the behavior is undefined.
Since the types for b in each of your source files do not actually match, the behavior is undefined. (What constitutes a compatible type is described in detail in all of C11 6.2.7, but it basically boils down to being that the types have to match.)
So you have two failings for b:
Multiple definitions.
Multiple declarations with incompatible types.
Technically, your declaration of int a in both of your source files also violates the "one definition rule". Note that a has external linkage (C11 6.2.2 p5):
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
But, from the quote from C11 6.9.2 earlier, those int a tentative definitions are external definitions, and you are only allowed one of those from the quote from C11 6.9 at the top.
The usual disclaimers apply for undefined behavior. Anything can happen, and that would include the behavior you observed.
A common extension to C is to allow multiple external definitions, and is described in the C standard in the informative Annex J.5 (C11 J.5.11):
There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
(Emphasis is mine.) Since the definitions for a agree, there is no harm there, but the definitions for b do not agree. This extension explains why your compiler does not complain about the presence of multiple definitions. From the quote of C11 6.2.2, the linker will attempt to reconcile the multiple references to the same object.
Linkers typically use one of two models for reconciling multiple definitions of the same symbol in multiple translation units. These are the "Common Model" and the "Ref/Def Model". In the "Common Model", multiple objects with the same name are folded into a single object in a union style manner so that the object takes on the size of the largest definition. In the "Ref/Def Model", each external name must have exactly one definition.
The GNU toolchain uses the "Common Model" by default, and a "Relaxed Ref/Def Model", where it enforces a strictly one definition rule for a single translation unit, but does not complain about violations across multiple translation units.
The "Common Model" can be suppressed in the GNU compiler by using the -fno-common option. When I tested this on my system, it caused "Strict Ref/Def Model" behavior for code similar to yours:
$ cat a.c
#include <stdio.h>
int a;
struct { char a; int b; } b = { 2, 4 };
void foo () { printf("%zu\n", sizeof(b)); }
$ cat b.c
#include <stdio.h>
extern void foo();
int a, b;
int main () { printf("%zu\n", sizeof(b)); foo(); }
$ gcc -fno-common a.c b.c
/tmp/ccd4fSOL.o:(.bss+0x0): multiple definition of `a'
/tmp/ccMoQ72v.o:(.bss+0x0): first defined here
/tmp/ccd4fSOL.o:(.bss+0x4): multiple definition of `b'
/tmp/ccMoQ72v.o:(.data+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `b' changed from 8 in /tmp/ccMoQ72v.o to 4 in /tmp/ccd4fSOL.o
collect2: ld returned 1 exit status
$
I personally feel the last warning issued by the linker should always be provided regardless of the resolution model for multiple object definitions, but that is neither here nor there.
References:
Unfortunately, I can't give you the link to my copy of the C11 Standard
What are extern variables in C?
The "Beginner's Guide to Linkers"
SAS Documentation on External Variable Models
Formally, it is illegal to define the same variable (or function) with external linkage more than once. So, from the formal point of view the behavior of your program is undefined.
Practically, allowing multiple definitions of the same variable with external linkage is a popular compiler extension (a common extension, mentioned as such in the language specification). However, in order to be used properly, each definition shall declare it with the same type. And no more than one definition shall include initializer.
Your case does not match the common extension description. Your code compiles as a side effect of that common extension, but its behavior is still undefined.
The piece of code seems to break the one-definition rule on purpose. It will invoke undefined behavior, don't do that.
About the global variable a: don't put definition of a global variable in a header file, since it will be included in multiple .c files, and leads to multiple definition. Just put declarations in the header and put the definition in one of the .c files.
In t.h:
extern int a;
In foo.c
int a;
About the global variable b: don't define it multiple times, use static to limit the variable in a file.
In foo.c:
static struct {
char a;
int b;
} b = { 2, 4 };
In main.c
static int b;
b has the same address because the linker decided to resolve the conflict for you.
sizeof shows different values because sizeof is evaluated at compile time. At this stage, the compiler only knows about one b (the one defined in the current file).
At the time foo is being compiled, the b that is in scope is the two ints vector {2, 4} or 8 bytes when an sizeof(int) is 4.
When main is compiled, b has just been redeclared as an int so a size of 4 makes sense. Also there is probably "padding bytes" added to the struct after "a" such that the next slot (the int) is aligned on 4 bytes boundary.
a and b have the same addresses because they occur at the same points in the file. The fact that b is a different size doesn't matter where the variable begins. If you added a variable c between a and b in one of the files, the address of the bs would differ.

Resources