c++ Forward declaration, translation unit, linking


On this site I read:
class MyClass;
simply states that "there is such a class" and its full definition will be "coming later" (either in the current file, at compile time, or from some other file at link time)
I'm not sure I understand this process at link time, that is, how forward declaration works at link time. I wrote the code below, which should demonstrate it. Please correct me if I'm wrong.
//first.h
-----------
class Second;
class First{
public:
Second* ptr;
First();
};
//first.cpp
-----------
#include "first.h"
extern Second second;
First::First(){ptr = &second;}
//second.h
----------
class Second{
public:
Second(){};
};
//main.cpp
----------
#include "second.h"
Second second;
int main(int argc, char *argv[])
{
return 0;
}
This code compiles. If the line Second second; is commented out, the linker reports: undefined reference to 'second'.
A comment tying together 1) forward declaration, 2) the compilation unit and 3) linking would be helpful.

I think the documentation you've read has misled you by its laxity:
class MyClass;
doesn't exactly mean there is such a class, because the
only way to make a class exist is to define it, and a declaration is
not a definition. The declaration would be better read as: Assume there is such a class.
And it doesn't mean that the full definition of the class will, or will not, be coming later. Its
full definition might need to come later for successful compilation. Or
not. And if the full class definition does need to come later,
it will need to come for successful compilation; therefore at compile time, not link time.
The undefined reference linkage error that you are able to provoke
by commenting out Second second; in main.cpp is simply a
plain old undefined reference error, such as you'll always get
by trying to link a program in which a variable declared extern
is referenced somewhere and defined nowhere. It has no essential
connection with the extern variable being of class type (rather
than, say, int) or with the business of forward class declaration.
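Here is a minimal single-file sketch in C that plays the role of the question's code. The names `struct second`, `second_obj` and `ptr_to_second` are hypothetical. It tries to show the division of labour. The forward declaration is all the compiler needs to take the address of an extern object. The only thing the linker needs is the object's definition. In the question that definition lives in another translation unit; here it simply comes later in the same file.

```c
/* Hypothetical names, for illustration only. */

/* Forward declaration: struct second is an incomplete type from here on. */
struct second;

/* Declaring (not defining) an object of incomplete type is allowed,
   and taking its address does not need the struct's members. */
extern struct second second_obj;

struct second *ptr_to_second(void)
{
    return &second_obj;
}

/* Later (in the question: in main.cpp) the type is completed and the
   object is defined. Delete the definition of second_obj and this file
   still compiles, but linking a program that uses ptr_to_second fails
   with an undefined reference to second_obj. */
struct second {
    int id;
};

struct second second_obj = { 2 };
```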
Forward declaration of classes is only ever necessary to preempt
a deadlock when the compiler attempts to parse the definitions
of two classes that are interdependent and is unable to complete
either class definition before it completes the other one.
An elementary example: I naively write two classes first and second, of which
each has a method that uses an object of the other class and calls
one of its methods:
first.h
#ifndef FIRST_H
#define FIRST_H
#include <string>
#include <iostream>
#include "second.h"
struct first {
std::string get_type() const {
return "First";
}
void use_a_second(second const & second) const {
std::cout << second.get_type() << std::endl;
}
};
#endif
second.h
#ifndef SECOND_H
#define SECOND_H
#include <string>
#include <iostream>
#include "first.h"
struct second {
std::string get_type() const {
return "Second";
}
void use_a_first(first const & first) const {
std::cout << first.get_type() << std::endl;
}
};
#endif
main.cpp
#include "first.h"
#include "second.h"
int main()
{
first f;
second s;
f.use_a_second(s);
s.use_a_first(f);
return 0;
}
Try to compile main.cpp:
$ g++ -c -o main.o -Wall -Wextra -pedantic main.cpp
In file included from first.h:6:0,
from main.cpp:1:
second.h:13:19: error: ‘first’ has not been declared
void use_a_first(first const & first) const {
^~~~~
second.h: In member function ‘void second::use_a_first(const int&) const’:
second.h:14:22: error: request for member ‘get_type’ in ‘first’, which is of non-class type ‘const int’
std::cout << first.get_type() << std::endl;
^~~~~~~~
main.cpp: In function ‘int main()’:
main.cpp:9:8: error: expected unqualified-id before ‘.’ token
second.use_a_first(first);
The compiler is stymied, because first.h includes second.h, and
vice versa, so it can't get the definition of first before it
gets the definition of second, which requires the definition of first...
and vice versa.
A forward declaration of each class before the definition of the
other one, and a corresponding refactoring of each class into
a definition and an implementation, gets us out of this deadly embrace:
first.h (fixed)
#ifndef FIRST_H
#define FIRST_H
#include <string>
struct second; // Declaration
struct first{
std::string get_type() const {
return "first";
}
void use_a_second(second const & second) const;
};
#endif
second.h (fixed)
#ifndef SECOND_H
#define SECOND_H
#include <string>
struct first; //Declaration
struct second{
std::string get_type() const {
return "second";
}
void use_a_first(first const & first) const;
};
#endif
first.cpp (new)
#include <iostream>
#include "first.h"
#include "second.h"
void first::use_a_second(second const & second) const {
std::cout << second.get_type() << std::endl;
}
second.cpp (new)
#include <iostream>
#include "first.h"
#include "second.h"
void second::use_a_first(first const & first) const {
std::cout << first.get_type() << std::endl;
}
Compile:
$ g++ -c -o first.o -Wall -Wextra -pedantic first.cpp
$ g++ -c -o second.o -Wall -Wextra -pedantic second.cpp
$ g++ -c -o main.o -Wall -Wextra -pedantic main.cpp
Link:
$ g++ -o prog main.o first.o second.o
Run:
$ ./prog
second
first
This is the only scenario in which forward class declaration is
needed, although it can be used in wider circumstances: see "When can I use a forward declaration?". The need is only ever a need
for successful compilation, not linkage. Linkage can't be attempted until
compilation succeeds.
The documentation snippet is also misleadingly imprecise in the use of the word definition. The
definition of a class means one thing in the context of compilation and that's
what it should mean in the interest of clarity. It means something else, loosely,
in the context of linkage and it shouldn't mean that in the interest of clarity.
In the context of linkage, we'd better only talk about the implementation of
a class - and even that is a notion that begs for qualification.
As far as the compiler is concerned a class is defined if it gets from
the start to the end of:
class foo ... {
...
};
without error, and then the class definition is the contents of that span. A complete definition
does not mean, of course, that a class has a complete implementation. It
only has that if, in addition to a complete definition, all the methods and
static members that are declared in its definition are also themselves defined somewhere: either
in-line within the class definition, out-of-line in a containing translation
unit, or in other translation units (possibly compiled in external
libraries) with which the compiled containing translation unit gets linked.
If any of those member definitions that the program actually uses is not provided in one of those ways
at link time, an unresolved reference linkage error will result. That
is a deficit of the class implementation.
The linker's idea of definition is different from the C++
compiler's and more elementary. From the linker's point of view,
a C++ class doesn't actually exist. For the linker, the class implementation is boiled down, by the compiler,
to a bunch of symbols and symbol definitions not essentially different from what it gets
from any language compiler, whether or not the language deals in classes at all.
What matters to the linker, for success, is that all the symbols that are referenced in the output binary
have definitions either in the same binary or in dynamic libraries requested
in the linkage. A symbol (broadly) can identify some executable code or some data.
For a code symbol, definition means implementation to the linker: the definition is the represented code, if any.
For a data symbol, definition means value to the linker: it means the represented data, if any.
So when the snippet says:
.. and its full definition will be "coming later" (either in the current file, at compile time, or from some other file at link time)
this needs to be picked apart.
The full definition of class foo must come later in the compilation of
a translation unit, before type foo is required as the type of anything else:
specifically, the type of a base class, a function/method argument, or an object[1].
If this requirement is not satisfied, a compile error will result:
A class cannot be fully defined if any base class is not fully defined.
A function or method cannot be fully defined if it has an argument of a type
that is not fully defined.
An object cannot exist of any type that is not fully defined.
If foo is never required later to be the type of a base class, argument or object,
then the definition of class foo need never follow the declaration.
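In C the same rule shows up as the opaque-handle idiom. The sketch below is hypothetical: `struct counter` and the `counter_*` functions are invented for illustration. Everything above the struct definition works with only the declaration. That code never needs a `struct counter` object or its members, only pointers to one.

```c
#include <stdlib.h>

/* Hypothetical names, for illustration only. */

/* Declaration only: struct counter is an incomplete type from here on. */
struct counter;

/* Prototypes may use pointers to the incomplete type. */
struct counter *counter_new(void);
void counter_bump(struct counter *c);
int counter_get(const struct counter *c);
void counter_free(struct counter *c);

/* Code that only passes handles around never needs the definition.
   (In a real project this would sit in another .c file that sees
   only the declaration above.) */
int bump_twice(struct counter *c)
{
    counter_bump(c);
    counter_bump(c);
    return counter_get(c);
}

/* The full definition: only code below this point may create a
   struct counter or access its members. */
struct counter {
    int n;
};

struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->n = 0;
    return c;
}

void counter_bump(struct counter *c) { c->n++; }
int counter_get(const struct counter *c) { return c->n; }
void counter_free(struct counter *c) { free(c); }
```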
The full implementation of class foo may or may not be required, or
provided, by the linkage. Since the linker doesn't know about classes,
it cannot distinguish a full implementation of a class from an incomplete one.
You can change class first, above, by adding a method that has no implementation:
struct first{
std::string get_type() const {
return "first";
}
void use_a_second(second const & second) const;
void unused();
};
and the program will compile, link and run just the same. Since the
compiler emits no definition of void first::unused(), and since
the program does not attempt to invoke void first::unused() on
any object of type first, or to use its address, no mention of
void first::unused() appears in the linkage at all. If
we change main.cpp to:
#include "first.h"
#include "second.h"
int main()
{
first f;
second s;
f.use_a_second(s);
s.use_a_first(f);
f.unused();
return 0;
}
Then the linker will find a call to void first::unused() in main.o
and of course give an unresolved reference error. But this just
means that the linkage fails to provide an implementation that the
program needs. It doesn't mean that the class definition of
first is incomplete. If it were, compilation of main.cpp would have
failed, and no linkage would have been attempted.
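The C counterpart of `unused()` behaves the same way, as this minimal sketch shows. `answer` is a hypothetical name. A function that is declared but never defined costs nothing at link time as long as nothing refers to it.

```c
/* Hypothetical example, for illustration only. */

/* Declared, never defined anywhere in the program. */
void unused(void);

/* Compiles and links: nothing references unused(), so the object file
   contains no undefined symbol for it. Uncomment the call and linking
   fails with an undefined reference to unused instead. */
int answer(void)
{
    /* unused(); */
    return 42;
}
```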
Takeaway:
Forward class declaration can avert a compile-time deadlock of
mutually dependent class definitions, with consequential refactoring.
A forward class declaration can't avert an unresolved reference linkage
error. Such an error always means that the implementation of
a code symbol, or the value of a data symbol, is needed by the program
and not provided by the linkage. A class declaration cannot add either
one of those things to the linkage. It adds nothing to the linkage. It
just directs the compiler to tolerate foo in contexts
where it is necessary and sufficient for foo to be a class-name.
Linkage cannot provide any part of a class definition at link time
if, after a forward class declaration, the class definition becomes
required, because a complete class definition will be required at compile time or
not at all. Linkage cannot provide parts of a class definition at all;
only elements of the class implementation.
[1] To be clear:
class foo;
foo & bar();
...
foo * pfoo;
...
foo & rfoo = bar();
can compile with merely the declaration of class foo, because neither
foo * pfoo nor foo & rfoo requires an object of type foo to exist:
a pointer-to-foo, or reference-to-foo, is not a foo.
But:
class foo;
...
foo f; // Error
...
foo * pfoo;
...
pfoo->method(); // Error
can't compile, because f must be a foo, and the object addressed by pfoo
must exist, and therefore be a foo, if any method is invoked through that
object.

Related

At what point during compilation or linking of C code are extern variables implicitly defined?

If I have a project with the following 3 files in the same directory:
mylib.h:
int some_global;
void set_some_global(int value);
mylib.c:
#include "mylib.h"
void set_some_global(int value)
{
some_global = value;
}
main.c:
#include <stdio.h>
#include "mylib.h"
int main()
{
set_some_global(42);
printf("Some global: %d\n", some_global);
return 0;
}
and I compile with
gcc main.c mylib.c -o prog -Wall -Wpedantic
I get no errors or warnings, and the prog program prints 42 to the console.
When I first tried this, I expected there to be a "multiple definition" error or some kind of warning since some_global is not declared extern in the header file. Upon researching this issue, I discovered that in C the extern is implicit on variable declarations outside of functions (and also that the opposite is true for C++, which can be demonstrated by using g++ instead of gcc in the compilation line above). Also, if I change the line in mylib.h from a declaration to a definition (e.g. int some_global = 1;), I do get the "multiple definition" error that I expected (this is nothing shocking).
My main question is: where is the variable being defined? It appears to be implicitly defined somewhere, but at what point does either the compiler or linker realize it needs that variable defined and does so?
Also, why is it that if I explicitly declare the variable as extern in the mylib.h file, I get "undefined reference" errors unless I explicitly define the variable in one and only one *.c file? Given the reason why I thought the code above works (that extern is implicit), I would expect explicitly declaring it extern to make no difference. Why is there a difference in behavior?
Follow up
After the answer below corrected me that the code in mylib.h is a "tentative definition" rather than a declaration, I discovered this related answer with more details on such matters:
https://stackoverflow.com/a/3095957/7007605

Your code compiles and links without error only because you are using a gcc that applies the -fcommon option by default. From the GCC documentation: "The -fcommon places uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object, or to a non-tentative definition. (...) It is mainly useful to enable legacy code to link without errors." This was the default prior to version 10, but even now many toolchains are still built with this option enabled by default.
Never define data in the header files. Place only extern declarations of the variables in the header files.
It should be:
mylib.h:
extern int some_global;
void set_some_global(int value);
mylib.c:
#include "mylib.h"
int some_global;
void set_some_global(int value)
{
some_global = value;
}
main.c:
#include <stdio.h>
#include "mylib.h"
int main()
{
set_some_global(42);
printf("Some global: %d\n", some_global);
return 0;
}

int some_global; is a tentative definition. In GCC before version 10, GCC produced an object file treating this as a common symbol. (This behavior is still selectable by a switch, -fcommon.) The linker coalesces multiple definitions of a common symbol to a single definition.
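Within a single translation unit, standard C itself merges tentative definitions. This may help answer the question of where the variable is defined. If no definition with an initializer appears, the compiler treats the tentative definitions as one definition with initializer 0 at the end of the file. What -fcommon adds is the same merging across translation units, which standard C does not guarantee. A minimal sketch; the variable and function names are arbitrary:

```c
/* Illustrative names only. */

/* Two tentative definitions and no initializer anywhere in this file:
   at the end of the translation unit they become one definition
   initialised to 0 (C11 6.9.2). */
int some_global;
int some_global;

/* Tentative definition followed by a real one: both refer to the
   single object defined with the initializer. */
int other_global;
int other_global = 7;

int read_some_global(void) { return some_global; }
int read_other_global(void) { return other_global; }
```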

How can I force the source file to implement the definitions for a header file?

Below is the code:
//test.h
...
extern int globalVariable;
...
//test.c
#include "test.h"
...
int globalVariable = 2020;
...
//main.c
#include <stdio.h>
#include "test.h"
int main()
{
printf("Value is %d", globalVariable);
}
Let's say there are hundreds of variables declared in test.h, and globalVariable is just one of them.
Since there are so many variables, I can easily make a typo in test.c:
#include "test.h"
int globalVariables = 2020; //extra 's' in the name which contradicts the declaration of its counterpart in test.h
If I only compile (without linking) test.c and main.c, which both include test.h, they compile without error. The unresolved reference error only occurs once the linker is involved, in the linking stage.
But in a large application, I might write some modules without linking everything into an executable, so it would be better if the compiler reported the error at compile time so that I could correct it right away. How can I make the compiler force the source file to implement the definitions for a header file?

You could also use the preprocessor:
test.h:
#ifndef TEST_C_IMPLEMENTATION
/* every file except test.c: declaration only */
#define DEFINE_AND_INIT_VARIABLE(type, name, value) \
extern type name
#else
/* test.c: the one definition, with its initial value */
#define DEFINE_AND_INIT_VARIABLE(type, name, value) \
type name = value
#endif
DEFINE_AND_INIT_VARIABLE(int, globalVariable, 2020);
test.c:
#define TEST_C_IMPLEMENTATION
#include "test.h"
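Because each variable's name is written only once, in test.h, a typo can no longer make the declaration and the definition disagree.
This technique can be taken even further: there are small utility libraries that are shipped as a single include file; you just set a macro in one of the translation units to force the implementation to be compiled there.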
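A related variation is the so-called X-macro technique. It is swapped in here and is not taken from the answer above. Every variable is written exactly once in a single list, and that list is expanded twice, so the declarations and the definitions cannot drift apart. For brevity the sketch puts both expansions in one file. In a real project `DECLARE_GLOBAL` would be expanded in test.h and `DEFINE_GLOBAL` in test.c. `GLOBAL_VARIABLES`, `anotherVariable` and both helper macro names are hypothetical.

```c
/* Hypothetical list of globals, written exactly once. */
#define GLOBAL_VARIABLES(X)      \
    X(int, globalVariable, 2020) \
    X(int, anotherVariable, 7)

/* What the header would expand: one extern declaration per entry. */
#define DECLARE_GLOBAL(type, name, value) extern type name;
GLOBAL_VARIABLES(DECLARE_GLOBAL)

/* What test.c would expand: one definition per entry. Both expansions
   come from the same list, so a name cannot be misspelled in only one. */
#define DEFINE_GLOBAL(type, name, value) type name = value;
GLOBAL_VARIABLES(DEFINE_GLOBAL)
```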

The declaration extern int globalVariable; says that the variable exists somewhere, but not necessarily in the current translation unit. So any source file that includes the header containing this declaration will know that the variable exists without needing the full definition.
When you then get to the linking stage, you'll get the error about globalVariable being undefined. Since the variable is declared in test.h, convention would dictate that the definition would be in test.c. Upon inspecting that file, you would then find that no such variable exists and could then either add it or find the typo and fix it.

Linkage and static function confusion

I read that
A function with internal linkage is only visible to one compilation
unit. (...) A function declared static has internal linkage
For .c files it sorta makes sense, but I was wondering what happens with static functions in headers, which get included by multiple .c files but usually have an include guard.
I was reading this answer about static functions in headers, and the first item mentions that it doesn't create a symbol with external linkage and the second item mentions the function is available purely through the header file. Isn't that contradictory? How can the function be available and at the same time have no external symbol? So I did a little test:
/* 1.h */
#ifndef ONE_H
#define ONE_H
#include <stdio.h>
static void foo() {
printf("foo from 1.h %p\n", foo);
return;
}
void bar();
#endif
/* 1.c */
#include "1.h"
#include <stdio.h>
void bar() {
printf("foo,bar from 1.c %p,%p\n", foo, bar);
foo();
}
/* 2.c */
#include "1.h"
#include <stdio.h>
int main() {
printf("foo,bar from main %p,%p\n", foo, bar);
foo();
bar();
return 0;
}
...
debian#pc:~ gcc 2.c 1.c
debian#pc:~ ./a.out
foo,bar from main 0x400506,0x400574
foo from 1.h 0x400506
foo,bar from 1.c 0x400559,0x400574
foo from 1.h 0x400559
As expected bar is the same across all files, but shouldn't foo be too? Isn't 1.h included only once? Adding inline to foo resulted in the same behavior. I'm kinda lost.

Read here how a header file basically works. That should clarify your actual question.
Briefly: a header file is just inserted in place of the corresponding #include directive. So any declarations or definitions are treated by the compiler as if you had actually copy/pasted the text into your file.
Anyway, you should be careful with function definitions in a header. This is generally deemed bad style. It bloats the code, for instance, by creating redundant copies of that function. Also, pointers to "the" function cannot be meaningfully compared across files: as your test shows, each translation unit has its own copy at its own address. It is often better to bundle the functions into a library with just the declarations in a header (non-static then: external linkage). There are good justifications sometimes, however (no rule without exceptions). One of them is inline functions.

Static functions are functions that are only visible to other functions in the same file (more precisely the same translation unit).
Check this article for a detailed explanation on linkage: http://publications.gbdirect.co.uk/c_book/chapter4/linkage.html

Definition of a function in a C program [closed]

In a C program, where do you define a function?
Why?
I suppose that the function definition is generally written outside the main function and after the function declaration. Is that correct? Why?
Thank you all!

You have to define a function outside main(), because main() is a function itself and nested functions are not supported in standard C.
In modern C a separate function declaration is not strictly necessary, because a function definition also serves as a declaration. There are still two reasons to write one:
A function declaration can be exported in a header file and then used by other translation units that include the header file.
C is usually translated in one pass, which means that you cannot use a function before it is declared without getting a diagnostic. If you have a function a() calling a function b() and vice versa, you cannot define both functions without first declaring at least one of them.
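Here is a minimal sketch of the second point, using the textbook pair of mutually recursive functions. `is_even` and `is_odd` are illustrative names, not from the question. Whichever function is defined first needs a prior declaration of the other.

```c
#include <stdbool.h>

/* Illustrative names only. */

/* Forward declaration: without it, is_even could not call is_odd,
   because is_odd is defined further down. */
bool is_odd(unsigned n);

bool is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```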

The only real requirement is that a function be declared before it is first called in a statement, and that it be defined somewhere before everything is linked together (either in another source file that gets translated, or in a previously translated object file or library).
If your program is small and you have everything in a single source file, my recommended practice is to define the function before it is used, like so:
void foo( void )
{
// body of foo
}
void bar( void )
{
...
foo();
...
}
int main( void )
{
...
bar();
...
}
The function definition also serves as a declaration (specifying the return type as well as the number and types of parameters to the function). You could put the definitions after main, but you will still need to declare them before they're called:
int main( void )
{
void bar( void );
...
bar();
...
}
void bar( void )
{
void foo( void );
...
foo();
...
}
void foo ( void )
{
// body of foo
}
You don't have to declare foo within the body of bar, or bar within the body of main; you could declare them both before main:
void foo( void );
void bar( void );
int main( void )
{
...
bar();
...
}
void bar( void )
{
...
foo();
...
}
The only problem with this style is that if you change the return type of the function or change any of the parameters, you have to chase down any declarations and change them as well. The first way (defining before use) reads "backwards", but it's less of a maintenance headache.
If your program is divided up among multiple source files, the usual practice is to create a separate header file for each source file, and #include that header in any other source file that uses those functions, like so:
/**
* foo.h - Header file for function foo
*/
#ifndef FOO_H // Include guards; prevents the file from being processed
#define FOO_H // more than once within the same translation unit
void foo( void ); // declaration of foo
#endif
/**
* foo.c - Source file for function foo
*/
#include "foo.h"
...
void foo( void ) // definition of foo
{
// body of foo
}
/**
* bar.h - Header file for function bar
*/
#ifndef BAR_H
#define BAR_H
void bar( void ); // declaration of bar
#endif
/**
* bar.c - Source file for function bar
*/
#include "foo.h" // necessary because bar calls foo
void bar( void )
{
...
foo();
}
/**
* main.c - Source file for main
*/
#include "bar.h" // necessary because main calls bar
int main( void )
{
...
bar();
}
Note that the header files only contain the declarations of foo and bar, not their actual code. In order for this to work, both foo.c and bar.c must be compiled along with main.c, and the resulting object files must all be linked together. You could do them all at once, like:
gcc -o blah main.c foo.c bar.c
Or you could compile each separately and link the object files together:
gcc -c foo.c
gcc -c bar.c
gcc -c main.c
gcc -o blah main.o foo.o bar.o
Or you could build a library out of foo.c and bar.c and link against that (useful if you want to use foo and bar in other programs):
gcc -c foo.c
gcc -c bar.c
ar cr libblurga.a foo.o bar.o
gcc -o blah main.c -L. -lblurga
Standard C does not support nested functions (that is, defining a function within the body of another function). Some implementations such as gcc support nested functions as an extension, but it's not the usual practice.

Good question. Languages in the Pascal family usually do have the concept of scoped functions, like any other declaration/definition.
I think the answer lies in the origins of C as, heaven forgive me, a better macro assembler of sorts (with a standard library). Functions are mere jump addresses with a little stack magic for parameter and return value handling; function "scope" is just too abstract a concept in that world.
That said, a similar effect can be achieved by grouping "helper functions" together with a globally visible function which needs them in the same file; the helper functions would be declared static and could then only be used in that source file. The net effect is quite similar to scoped functions.

Private declaration goes on top of your .c file:
static int your_function();
The private declaration can be omitted if the function is defined above the point where you call it, although for maintainability it's always better to declare your private interface, just like your public one, in one place.
Public declaration in your .h file:
extern int your_function();
The extern keyword is implicit on every function declaration, in a header file or anywhere else, although I tend to write it explicitly for clarity.
Function definition works for both private and public declarations:
int your_function() {
return 5;
}
Or for private only:
static int your_function() {
return 5;
}
If you mark as static the definition of a function that was previously declared extern (for example, through the header), GCC will fail with the following:
error: static declaration of ‘your_function’ follows non-static declaration
When the compiler builds your code, it pretty much replaces all your #include directives with the content of the files you are including, and the parsing goes from top to bottom as one large file. Once you understand that, most of these things simply start to make sense.
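Here is a minimal sketch of the private-declaration pattern described above; `public_entry` is a hypothetical name. It also shows the converse of the GCC error quoted. Once a `static` declaration is visible, a later definition without `static` is accepted and keeps internal linkage.

```c
/* public_entry is a hypothetical name, for illustration only. */

/* Private declaration at the top of the .c file: internal linkage. */
static int your_function(void);

/* A public (external-linkage) function can call it before its definition. */
int public_entry(void)
{
    return your_function() + 1;
}

/* No 'static' here, yet your_function still has internal linkage,
   because the earlier static declaration is visible (C11 6.2.2). */
int your_function(void)
{
    return 5;
}
```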

Are static functions in C language really invisible?

I was told that a function defined as static in one .c file is not accessible from other files. But in the following program, I can access the static void show() function from another file. Is my understanding of static functions in C wrong?
a.h (first file):
static void show()
{
printf("I am in static show function in a.c");
}
b.c (another file):
#include "a.h"
int main(void)
{
show();
return 0;
}

Remember that #includes work by copy-and-pasting the content of the included file. So in your example, after the #include has been processed, you get this:
static void show()
{
printf("I am in static show function in a.c");
}
int main(void)
{
show();
return 0;
}
So clearly main can see show.[1]
The solution is to not #include .c files. In general, you should only #include header (.h) files. Your static functions shouldn't be declared or defined in the header file, so main will not be able to see them.
[1] However, you now actually have two definitions of the show function, one in a.c and one in b.c. For static functions, this isn't a problem, but for non-static functions you would get a linker error.

The static keyword changes the linkage to internal linkage.
A function marked as static will only be visible in that translation unit (TU).
Perhaps you have a symbol with the same name available in the TU where you access the function. How exactly can only be answered after you show us the code.
EDIT:
When you define a static function in a header file, a copy of that function gets created in every translation unit that includes it. Each instance of such a function is treated as a separate function (the address of each instance is different), and each instance has its own copies of static local variables and, potentially, string literals.
Clearly, this will work, but it might also increase the size of your generated binary.

The other answers are correct, but it's not quite accurate to say that the static function is not accessible from another file. It is possible to access the function through a function pointer. It would be more accurate to say that the name of the function is not accessible in another translation unit.
Remember that converting C source code to an executable program consists of conceptual stages, including:
preprocessing (in which #include directives are replaced with the contents of the included file)
compilation (which processes one translation unit at a time)
linking (in which the translation units are put together into the final program)
Suppose we have three files. foo.h:
typedef void (*void_function_p)(void);
extern void_function_p foo(void);
foo.c:
#include "foo.h"
#include <stdio.h>
static void baz(void) {
printf("worked!\n");
}
void_function_p foo(void) {
return baz;
}
bar.c:
#include "foo.h"
#include <stdio.h>
int main(void) {
(*foo())();
return 0;
}
This program compiles and prints "worked!" when it runs.
There are two translation units here. One is the code in the preprocessed foo.c (which, because of how #include works also includes the code in foo.h and stdio.h). The other is the code in the preprocessed bar.c (which, again, has its own copy of the code in foo.h and stdio.h).
By having the function foo return a pointer to the static function baz, we are able to call baz from the main function.
Now, consider what happens if we modify main to look like this:
int main(void) {
(*foo())();
baz();
return 0;
}
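This code will not build, because the name baz in this translation unit cannot be linked to the definition of baz in the other translation unit. (Since baz is not declared in bar.c, the compiler will also complain about it; compilers that merely warn about the implicit declaration leave it to the linker to report baz as an undefined reference.)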
This is the first advantage of static functions: another programmer cannot accidentally access our baz function from another translation unit.
Now, consider what happens if we modify bar.c to look like this:
#include "foo.h"
#include <stdio.h>
static void baz(void) {
printf("still works!");
}
int main() {
(*foo())();
baz();
return 0;
}
This code will compile, and print "worked!" followed by "still works!"
This is the second advantage of static functions: we've defined two functions (in different translation units) with the same name.
If you try to put both static definitions in the same translation unit, you will get a compiler error about defining baz twice.
As a final note, if you take the program as it now stands and remove all the statics, it will result in a linker error because baz has been defined twice (with external linkage), which is not permitted.
