Why is multiple variable definition across different source files a problem, but multiple class definition across different sources is not - static

I am currently learning C++ (more precisely C++03) at uni, and I came across the initialization of static members. Non-constant static members should be declared inside the class, but defined outside. Moreover, they also should be declared in source files, not header files. As far as I understand, that is because if you have a myClass.h header with a myClass class in it, and A.cpp and B.cpp which include it, then you can protect yourself from multiple definition inside the same source file with include guards, but you cannot protect from myClass.h being present once in A.cpp and once in B.cpp. If you define the static member inside myClass.h, but outside myClass, then after the preprocess you will have copied the definition of the same thing both inside the global scope of A.cpp and B.cpp. The linker will "see" the global scope of B.cpp from inside A.cpp and vicecersa, so you will have multiple definitions available in a given context, which is a problem.
So my question is, if this is a problem, then how come the definition of the class myClass both in the global scope of A.cpp and B.cpp isn't one?

No you misunderstand the reasons. static variables that are not constexpr needs to be initialized only once as it happens at runtime. Something is really wrong if same variable is initialized twice at run-time... Thus they have to be initialized in a .cpp so that compiler/linker knows which library carries it.
Definitions for classes, on the other hand, are compile-time definitions so linker throws away all duplicates. (This is in fact is very bad and results in poor compilation times and at times in ODR issues. Modules C++20 were developed to resolve these issues.).

You've got some concepts wrong, which is all right and happens with everyone. You see, there's no hard and fast rule that the definitions of non-static methods are to be written inside the class. Its perfectly fine even to write them outside the class, the difference being, you then will be able to mask their definitions if you want to, as the header files needs to be provided in cleartext. As against this, if you don't write the functions in a .cpp file, you can't make a library and provide the library, inherently masking the definitions. Also, it is usually recommended to do it the former way, so that the class size in clear text becomes small and the abstraction is better achieved. Also, please do read how to ask a good question on stack overflow.
RE: If you define the static member inside myClass.h, but outside myClass, then after the preprocess you will have copied the definition of the same thing both inside the global scope of A.cpp and B.cpp Please read boilerplate code
Hope this helps. :)
And if you don't understand anything, I'm always open for dicussions; don't hesitate to ask

Related

Benefits and drawbacks of making all functions in main.c static?

I have heard that, when you have just 1 (main.c) file (or use a "unity build"), there are benefits to be had if you make all your functions static.
I am kind of confused why this (allegedly) isn't optimized by default, since it's not probable that you will include main.c into another file where you will use one of its functions.
I would like to know the benefits and dangers of doing this before implementing it.
Example:
main.c
static int my_func(void){ /*stuff*/ }
int main(void) {
my_func();
return 0;
}
You have various chunks of wisdom in the comments, assembled here into a Community Wiki answer.
Jonathan Leffler noted:
The primary benefit of static functions is that the compiler can (and will) aggressively inline them when it knows there is no other code that can call the function. I've had error messages from four levels of inlined function calls (three qualifying “inlined from” lines) on occasion. It's staggering what a compiler will do!
and:
FWIW: my rule of thumb is that every function should be static until it is known that it will be called from code in another file. When it is known that it will be used elsewhere, it should be declared in a header file that is included both where the function is defined and where it is used. (Similar rules apply to file scope variables — aka 'global variables'; they should be static until there's a proven need for them elsewhere, and then they should be declared in a header too.)
The main() function is always called from the startup code, so it is never static. Any function defined in the same file as an unconditionally compiled main() function cannot be reused by other programs. (Library code might contain a conditionally compiled test program for the library function(s) defined in the source file — most of my library code has #ifdef TEST / …test program… / #endif at the end.)
Eirc Postpischil generalized on that:
General rule: Anytime you can write code that says the use of something is limited, do it. Value will not be modified? Make it const. Name only needs to be used in a certain section? Declare it in the innermost enclosing scope. Name does not need to be linked externally? Make it static. Every limitation both shrinks the window for a bug to be created and may remove complications that interfere with optimization.

What is and isn't available to linked files in C?

As of late, I've been trying to work through opaque pointers as a programming concept and one of the main things I've had difficulties with is figuring out what is or isn't available to other files. In a previous question, I failed in trying to create an opaque pointer to a struct and even though the answer explained how to fix that, I still don't quite understand where I went wrong.
I think that if a struct is defined in file2.c, file1.c can use it if both files include header.h which includes a declaration of the struct? That doesn't entirely make sense to me. header.h is used by both files, so I can see how they would access the stuff in it, but I don't understand how they would use it to access each other.
When I started programming, I thought it was pretty straight forwards, where you have program files, they can't access anything in each other, and those program files can #include header files with definitions and declarations in them (e.g. file1.c has access to variables/functions/etc. defined in header.h). Turns out I was wrong and things are quite a bit more complicated.
So from what I can tell, func() defined in header.h can be used by file1.c without being declared in file1.c, if file1.c includes header.h. As opposed to var defined in header.h which needs to be declared in file1.c with the extern keyword? And I think if var is defined in file2.c, file1.c can use it if it extern declares it, even if neither file1.c nor file2.c include header.h?
I apologize if the previous paragraphs makes no sense, I'm having quite a bit of difficulty with trying to describe something that confuses me. By all means, please edit this if you are able to fix mistakes or whatnot.
Books and webpages don't seem to help at all. They end up giving me misconceptions because I already don't understand something and draw the wrong conclusions, or they bring up concepts that throw me off even more.
What I'm looking what I'm looking for is an answer that lays this all down in front of me. For example 'this can access this under these circumstances', 'this cannot access this'.
Functions defined in one .c file can use anything defined in another .c file except for those things which are marked as static. Functions and global variables which are marked as static cannot be accessed from other translation units.
Whether something is declared in a header file or not doesn't really matter--you can declare functions locally in the same .c file which calls them if you want.
Your question asks about “access” at several points, but I do not think that is what you mean to use. Any object or function can be accessed (for an object: read or written, for a function: called) from anywhere as long as a pointer to it is provided in some way). I think what you mean to ask is what names are available.
Any declaration that is outside of a function is an external declaration. In this use of “external” in the C standard, it simply means outside of a function. (That includes a function declaration or definition; although it is declaring or defining a function, it is not inside itself or any other function declaration, so it is outside of any function.)
Any identifier for an object or function with an external declaration has either internal linkage or external linkage. If it is first declared with static, it has internal linkage (and may be later declared with extern, but that will not change the linkage). Otherwise, it has external linkage.
Any identifier with external linkage will refer to the same object or function in all translation units (provided other rules of the C standard are satisfied—a program can do various things that will result in behavior not defined by the C standard).
Thus your answer is: The name of any object or function that is (a) defined outside of any function and (b) not initially declared with static is available to be linked to from other translation units.
Some technicalities that may be of interested:
What people think of as a variable is two things: an identifier (the name) and an object (a region of memory that stores the value).
Identifiers have scope, which is where they are visible in the source code. Identifiers declared outside functions have file scope; they are visible for the rest of the translation unit. Identifiers declared inside functions have various other types of scope: function scope, function prototype scope, and block scope.
You may sometimes seem people refer to global scope or external scope, but these are misnomers; they are not terms used in the C standard.
Linkage is related to scope and is sometimes confused with it, but linkage is a different concept: Two identical identifiers declared in different places can be made to refer to the same thing. Those identifiers have different scopes, notably one having file scope in one translation unit and the other having file scope in a different translation unit. Since each translation unit is compiled separately, the compiler generates code regarding each identifier separately. When the object modules are linked, then the code is bounded together, causing the separate identifiers to refer to the same object or function.
Identifiers can be declared with extern inside functions, but these can only link to objects or functions defined elsewhere; external definitions cannot appear inside functions.

Why declaration of a pre defined function and code of that pre defined function are in different files? [duplicate]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
Why does C++ have header files and .cpp files?
C++ compilation
A compilation in C++ is done in 2 major phases:
The first is the compilation of "source" text files into binary "object" files: The CPP file is the compiled file and is compiled without any knowledge about the other CPP files (or even libraries), unless fed to it through raw declaration or header inclusion. The CPP file is usually compiled into a .OBJ or a .O "object" file.
The second is the linking together of all the "object" files, and thus, the creation of the final binary file (either a library or an executable).
Where does the HPP fit in all this process?
A poor lonesome CPP file...
The compilation of each CPP file is independent from all other CPP files, which means that if A.CPP needs a symbol defined in B.CPP, like:
// A.CPP
void doSomething()
{
doSomethingElse(); // Defined in B.CPP
}
// B.CPP
void doSomethingElse()
{
// Etc.
}
It won't compile because A.CPP has no way to know "doSomethingElse" exists... Unless there is a declaration in A.CPP, like:
// A.CPP
void doSomethingElse() ; // From B.CPP
void doSomething()
{
doSomethingElse() ; // Defined in B.CPP
}
Then, if you have C.CPP which uses the same symbol, you then copy/paste the declaration...
COPY/PASTE ALERT!
Yes, there is a problem. Copy/pastes are dangerous, and difficult to maintain. Which means that it would be cool if we had some way to NOT copy/paste, and still declare the symbol... How can we do it? By the include of some text file, which is commonly suffixed by .h, .hxx, .h++ or, my preferred for C++ files, .hpp:
// B.HPP (here, we decided to declare every symbol defined in B.CPP)
void doSomethingElse() ;
// A.CPP
#include "B.HPP"
void doSomething()
{
doSomethingElse() ; // Defined in B.CPP
}
// B.CPP
#include "B.HPP"
void doSomethingElse()
{
// Etc.
}
// C.CPP
#include "B.HPP"
void doSomethingAgain()
{
doSomethingElse() ; // Defined in B.CPP
}
How does include work?
Including a file will, in essence, parse and then copy-paste its content in the CPP file.
For example, in the following code, with the A.HPP header:
// A.HPP
void someFunction();
void someOtherFunction();
... the source B.CPP:
// B.CPP
#include "A.HPP"
void doSomething()
{
// Etc.
}
... will become after inclusion:
// B.CPP
void someFunction();
void someOtherFunction();
void doSomething()
{
// Etc.
}
One small thing - why include B.HPP in B.CPP?
In the current case, this is not needed, and B.HPP has the doSomethingElse function declaration, and B.CPP has the doSomethingElse function definition (which is, by itself a declaration). But in a more general case, where B.HPP is used for declarations (and inline code), there could be no corresponding definition (for example, enums, plain structs, etc.), so the include could be needed if B.CPP uses those declaration from B.HPP. All in all, it is "good taste" for a source to include by default its header.
Conclusion
The header file is thus necessary, because the C++ compiler is unable to search for symbol declarations alone, and thus, you must help it by including those declarations.
One last word: You should put header guards around the content of your HPP files, to be sure multiple inclusions won't break anything, but all in all, I believe the main reason for existence of HPP files is explained above.
#ifndef B_HPP_
#define B_HPP_
// The declarations in the B.hpp file
#endif // B_HPP_
or even simpler (although not standard)
#pragma once
// The declarations in the B.hpp file
Well, the main reason would be for separating the interface from the implementation. The header declares "what" a class (or whatever is being implemented) will do, while the cpp file defines "how" it will perform those features.
This reduces dependencies so that code that uses the header doesn't necessarily need to know all the details of the implementation and any other classes/headers needed only for that. This will reduce compilation times and also the amount of recompilation needed when something in the implementation changes.
It's not perfect, and you would usually resort to techniques like the Pimpl Idiom to properly separate interface and implementation, but it's a good start.
Because C, where the concept originated, is 30 years old, and back then, it was the only viable way to link together code from multiple files.
Today, it's an awful hack which totally destroys compilation time in C++, causes countless needless dependencies (because class definitions in a header file expose too much information about the implementation), and so on.
Because in C++, the final executable code does not carry any symbol information, it's more or less pure machine code.
Thus, you need a way to describe the interface of a piece of code, that is separate from the code itself. This description is in the header file.
Because C++ inherited them from C. Unfortunately.
Because the people who designed the library format didn't want to "waste" space for rarely used information like C preprocessor macros and function declarations.
Since you need that info to tell your compiler "this function is available later when the linker is doing its job", they had to come up with a second file where this shared information could be stored.
Most languages after C/C++ store this information in the output (Java bytecode, for example) or they don't use a precompiled format at all, get always distributed in source form and compile on the fly (Python, Perl).
It's the preprocessor way of declaring interfaces. You put the interface (method declarations) into the header file, and the implementation into the cpp. Applications using your library only need to know the interface, which they can access through #include.
Often you will want to have a definition of an interface without having to ship the entire code. For example, if you have a shared library, you would ship a header file with it which defines all the functions and symbols used in the shared library. Without header files, you would need to ship the source.
Within a single project, header files are used, IMHO, for at least two purposes:
Clarity, that is, by keeping the interfaces separate from the implementation, it is easier to read the code
Compile time. By using only the interface where possible, instead of the full implementation, the compile time can be reduced because the compiler can simply make a reference to the interface instead of having to parse the actual code (which, idealy, would only need to be done a single time).
Responding to MadKeithV's answer,
This reduces dependencies so that code that uses the header doesn't
necessarily need to know all the details of the implementation and any
other classes/headers needed only for that. This will reduce
compilation times, and also the amount of recompilation needed when
something in the implementation changes.
Another reason is that a header gives a unique id to each class.
So if we have something like
class A {..};
class B : public A {...};
class C {
include A.cpp;
include B.cpp;
.....
};
We will have errors, when we try to build the project, since A is part of B, with headers we would avoid this kind of headache...

Why won't C compile if two separate source files in the same workspace share function names?

I'm using eclipse indigo, gcc and cdt in a project. If two functions in separate source files share names (regardless of return type or parameters), eclipse flags a redefinition error. This isn't a huge issue regarding this project given I can easily rename these functions, and I'm well aware of wrappers if it were. Although this isn't a critical issue, it does make me think I'm not understanding the c build process. What occurs during the build process in which a program structure like this would cause issue?
Here's some more info. on the situation, and where my understanding is so far -- not necessary to answer the question, although there must be a hole in my understanding.
In this case, the two functions are intended to be used only locally, as such their prototypes are not given in the .h interface, and for the sake of my point, neither are defined 'static'.
Neither of these source files are being included anywhere in the project, so they shouldn't be sharing any compilation units. With that in consideration, I would have assumed that the neither source file is aware of the presence of the other, and the compiler would have no problem indexing the two functions, as the separate files would allow for proper distinguishing between the two during linking -- so long as they weren't included in the same compilation unit.
I noticed that statically defining either instance of the function declaration removes the error. I remember reading at some point that every function not declared static is global -- although given these functions are not a part of the .h interface, the practical example in which including the .h interface doesn't allow for the including program to reference all .c functions would indicate "hiding" these functions would be of no issue.
What am I overlooking?
Some insight would be greatly appreciated, thanks!
This is the concept of "linkage". Every function and variable in C has a linkage type, one of "external", "internal", and "none". (Only variables can have no linkage.)
Functions have external linkage by default, which means that they can be called by name from any compilation unit (where "compilation unit" roughly means one source file and all the headers it includes). This can be expressed explicitly by declaring them extern, or it can be overridden by declaring them static. Functions declared static have internal linkage, meaning they can be referenced by name only from other functions in the same compilation unit.
No two external functions anywhere in the same program can have the same name, regardless of header files, but static functions in different compilation units may have the same name. A static function may have the same name as an external function, too -- then the name resolves to the static function within its compilation unit, and to the external function elsewhere. These restrictions make sense, for otherwise it would be possible for a function call to be ambiguous.
Header files don't factor into the linkage equation at all. They are primarily a vehicle for sharing declarations, but a function's linkage depends only on how it is declared, not on where.
I leave discussion of variables' linkage for another time.
It doesn't matter whether one source module includes headers for another. Header files only contain declarations for the purpose of local functions being able to find functions in other modules. It doesn't mean that functions not declared don't exist from the perspective of that module.
When everything gets linked together, anything not specifically defined to be local to one source module (i.e. static) has to have a unique name across all linked components.
remember reading at some point that every function not declared static is global
Having understood this you got the main point and reason for the behaviour observed.
.h files are not known to the linker, after pre-processing there are only translation units left (typically a .c file with all includes merged in), from which .o files are compiled.
There are no interfaces on language level in C.
Neither of these source files are being included anywhere in the project, so they shouldn't be sharing any compilation units.
Declare those functions as static. This is the only way to "hide" a function from the linker "inside" a translation unit.
C doesn't "mangle" function names the way C++ or Java do (since C doesn't support function polymorphism).
For example, in C++, the functions
void foo( void );
void foo( int x );
void foo( int x, double y );
have their names "mangled" into the unique symbols1
_Z3fooid
_Z3fooi
_Z3foov
which is how overloaded function/method calls are disambiguated at the machine level.
C doesn't do that; instead, the linker sees two different function definitions using the same symbol and yaks because it has no way to disambiguate the two.
1. This is what happens on my system, anyway

Reasons to use Static functions and variables in C

I wonder about the use of the static keyword as scope limiting for variables in a file, in C.
The standard way to build a C program as I see it is to:
have a bunch of c files defining functions and variables, possibly scope limited with static.
have a bunch of h files declaring the functions and possibly variables of the corresponding c file, for other c files to use. Private functions and variables are not published in the h file.
every c file is compiled separately to an o file.
all o files are linked together to an application file.
I see two reasons for declaring a gobal as static, if the variable is not published in the h file anyway:
one is for readability. Inform future readers including myself that a variable is not accessed in any other file.
the second is to prevent another c file from redeclaring the variable as extern. I suppose that the linker would dislike a variable being both extern and static. (I dislike the idea of a file redeclaring a variable owned by someone else as extern, is it ok practice?)
Any other reason?
Same goes for static functions. If the prototype is not published in the h file, other files may not use the function anyway, so why define it static at all?
I can see the same two reasons, but no more.
When you talk about informing other readers, consider the compiler itself as a reader. If a variable is declared static, that can affect the degree to which optimizations kick in.
Redefining a static variable as extern is impossible, but the compiler will (as usual) give you enough rope to hang yourself.
If I write static int foo; in one file and int foo; in another, they are considered different variables, despite having the same name and type - the compiler will not complain but you will probably get very confused later trying to read and/or debug the code. (If I write extern int foo; in the second case, that will fail to link unless I declare a non-static int foo; somewhere else.)
Global variables rarely appear in header files, but when they do they should be declared extern. If not, depending on your compiler, you risk that every source file which includes that header will declare its own copy of the variable: at best this will cause a link failure (multiply-defined symbol) and at worst several confusing cases of overshadowing.
By declaring a variable static on file level (static within function has a different meaning) you forbid other units to access it, e.g. if you try to the variable use inside another unit (declared with extern), linker won't find this symbol.
When you declare a static function the call to the function is a "near call" and in theory it performs better than a "far call". You can google for more information. This is what I found with a simple google search.
If a global variable is declared static, the compiler can sometimes make better optimizations than if it were not. Because the compiler knows that the variable cannot be accessed from other source files, it can make better deductions about what your code is doing (such as "this function does not modify this variable"), which can sometimes cause it to generate faster code. Very few compilers/linkers can make these sorts of optimizations across different translation units.
If you declare a variable foo in file a.c without making it static, and a variable foo in file b.c without making it static, both are automatically extern which means the linker may complain if you initialise both, and assign the same memory location if it doesn't complain. Expect fun debugging your code.
If you write a function foo () in file a.c without making it static, and a function foo () in file b.c without making it static, the linker may complain, but if it doesn't, all calls to foo () will call the same function. Expect fun debugging your code.
My favorite usage of static is being able to store methods that I wont have to Inject or create an object to use, the way I see it is, Private Static Methods are always useful, where public static you have to put some more time in thinking of what it is your doing to avoid what crazyscot defined as, getting your self too much rope and accidentally hanging ones self!
I like to keep a folder for Helper classes for most of my projects that mainly consist of static methods to do things quickly and efficiently on the fly, no objects needed!

Resources