Multiple Include Optimization - c

I'm trying to understand how multiple-include optimization works with gcc.
Lately, I've been reading a lot of code that has include guards around standard header files, like so
#ifndef _STDIO_H_
#include <stdio.h>
#endif
and I'm trying to figure out if this construct has any benefits.
Here's an example I wrote to understand this a little better.
header1.h
#ifndef _HDR_H_
#define _HDR_H_
#define A (32)
#endif
header2.h
#ifndef _HDR_H_
#define _HDR_H_
#define A (64)
#endif
hdr.c
#include <stdio.h>
#include "header1.h"
#include "header2.h"
int main()
{
printf("%d\n", A);
return 0;
}
Note that both header1.h and header2.h use the same include guard. As expected this program outputs the value of A defined in header1.h; header2.h is skipped since it uses the same include guard.
Here's what I'm trying to understand
At what point when parsing header2.h does the preprocessor skip this file? My understanding is that it skips this file immediately after the #if directive on line 1, i.e. it does not have to wait for the matching #endif. Is this correct?
What can I add to the example above to demonstrate how this works?
EDIT: Thanks everyone for the answers. This is starting to make more sense now. A follow up question. The page linked to on the first line of this post has the following text
The preprocessor notices such header files, so that if the header file
appears in a subsequent #include directive and FOO is defined, then it
is ignored and it doesn't preprocess or even re-open the file a second
time. This is referred to as the multiple include optimization.
If I understand this correctly, this means that any header file is read only once even if it is included multiple times for a given compile process. And so, additional include guards in application code or header files provide no benefit.

At what point when parsing header2.h does the preprocessor skip this file?
As @Sean says, header2.h will never be skipped, but the content between the #ifndef ... #endif will be ignored in this case.
What can I add to the example above to demonstrate how this works?
Add something (for example, a #define B 123) after the #endif in header2.h. Now try to access it in the main. It will be accessible.
Now, try to add it before the #endif. You'll see that it's not accessible in main.
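The experiment above can also be tried in a single file. This is only a minimal sketch: the two guarded bodies are pasted where the #include lines would put them, and the names HDR_H_, B and INSIDE are made up for illustration (the leading underscore of _HDR_H_ is dropped because identifiers of that form are reserved for the implementation).

```c
/* Single-file sketch: the guarded bodies stand in for
 * #include "header1.h" and #include "header2.h". */

/* ---- header1.h ---- */
#ifndef HDR_H_
#define HDR_H_
#define A (32)
#endif

/* ---- header2.h (same guard name, so its guarded body is ignored) ---- */
#ifndef HDR_H_
#define HDR_H_
#define A (64)
#define INSIDE 1      /* before the #endif: never defined */
#endif
#define B 123         /* after the #endif: always processed */

/* Record at preprocessing time whether INSIDE became visible. */
#ifdef INSIDE
#define INSIDE_VISIBLE 1
#else
#define INSIDE_VISIBLE 0
#endif

int value_of_a(void) { return A; }   /* A comes from header1.h */
int value_of_b(void) { return B; }   /* B comes from the tail of header2.h */
```

Here value_of_a() yields 32 and value_of_b() yields 123, while INSIDE never becomes defined.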

At what point when parsing header2.h does the preprocessor skip this file?
The file is not skipped.
My understanding is that it skips this file immediately after the #if directive on line 1, i.e. it does not have to wait for the matching #endif. Is this correct?
Yes and no. Some compilers identify the sentry macro when they parse the first header file and, if they find it in a second file, immediately stop parsing. Other compilers will parse the header again (looking for the matching #endif).
What can I add to the example above to demonstrate how this works?
Add a print message inside and outside the sentry macro
#ifndef _HEADER_INCLUDED
#define _HEADER_INCLUDED
...
#pragma message ("inside sentry in " __FILE__ "\n")
#endif // _HEADER_INCLUDED
#pragma message ("outside sentry in " __FILE__ "\n")
Relevant material:
You can use #pragma once instead of the sentry macro. Faster compilation since very little of the file is parsed. No worries about macro name collisions.
You can wrap the #include directives themselves in checks of the sentry macro, so that the header file isn't loaded again. This is usually used in library headers that include the same headers many times. It can significantly speed up compilation at the expense of ugly code:
#ifndef __LIST_H_
#include "list.h"
#endif

The pre-processor will never skip header2.h. It will always include it, and when expanding it will ignore the stuff in the #ifndef block.
In your example A will be 32, as the #define in header2.h will never be reached. If it were reached, you'd get some sort of macro redefinition diagnostic (GCC reports it as a warning by default), as you'd have two different #defines for A. To fix this you'd need to #undef A first.
Most compilers support the #pragma once directive these days to save you having to write include guards in header files.

After a false #if[[n]def], the preprocessor stops passing the input that follows on to the subsequent compiler steps.
The preprocessor does, however, continue reading the input, to keep track of the nesting depth of all those conditional-compilation # directives.
When it finds the #endif that matches the directive where it started blocking input, it simply stops blocking.
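A small sketch of why the nesting depth has to be tracked. The macro names are hypothetical; the point is that the inner #endif must not end the skipping that the outer, false #if started.

```c
#define FEATURE_ENABLED 0

#if FEATURE_ENABLED            /* false: skipping starts here */
#  if 1
#    define INNER 1            /* skipped, even though "#if 1" would be true */
#  endif                       /* closes only the inner #if */
#  define STILL_SKIPPED 1      /* still inside the false outer block */
#endif                         /* the matching #endif: skipping stops */
#define AFTER_BLOCK 1          /* processed normally */

/* Record whether anything from the skipped region was defined. */
#if defined(INNER) || defined(STILL_SKIPPED)
#  define SKIPPED_TEXT_LEAKED 1
#else
#  define SKIPPED_TEXT_LEAKED 0
#endif

int after_block(void) { return AFTER_BLOCK; }
```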

If I understand this correctly, this means that any header file is read only once even if it is included multiple times for a given compile process. And so, additional include guards in application code or header files provide no benefit.
No. GCC only performs this optimization for files that it knows to be safe, according to the following rules:
There must be no tokens outside the controlling #if-#endif pair, but whitespace and comments are permitted.
There must be no directives outside the controlling directive pair, but the null directive (a line containing nothing other than a single ‘#’ and possibly whitespace) is permitted.
The opening directive must be of the form
#ifndef FOO
or
#if !defined FOO [equivalently, #if !defined(FOO)]
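Going by the rules quoted above, a header laid out like the following sketch would presumably qualify for the optimization. FOO_H, FOO_VERSION and foo_add are made-up names, and the comments only restate the listed rules; they are not taken from GCC's source.

```c
/* foo.h (hypothetical). Comments and whitespace before the opening
 * directive are allowed by the rules. */
#ifndef FOO_H
#define FOO_H

#define FOO_VERSION 1
int foo_add(int a, int b);

#endif /* FOO_H */
/* A comment after the #endif is still fine. A declaration or another
 * directive placed down here (other than a lone '#') would be a token
 * or directive outside the controlling pair. */
```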

Related

How Header files and macros are related?

Just a beginner question: what's going on with #ifndef SOME_HEADER_H? I understand that it is a preprocessor directive for conditional compilation: if some header is already included, move on; if it's not, include it (I might be wrong, correct me). But a blog I read put that latter sentence differently: if it's defined, move on, else #define it. I thought we could only include a header file, not define one. How can a header file be defined, and what's the relation here? Second question: the file name was foo.h, and to check whether it's defined the author wrote #ifndef FOO_H #define FOO_H. How has foo.h been translated to FOO_H? Does the C mechanism know that he's talking about that specific file, or did he do something beforehand? Thanks for your time!
There is no such thing as translating foo.h as FOO_H, nor such thing as "defining that a .h has already been included". Using preprocessor variables is just the standard way C developers ensure that .h are not included twice.
In the C preprocessor, you can use directives such as #if, #else and #endif to express logic. You can also #define variables (macros) to store information, and use the defined(...) operator to check whether a C-preprocessor variable is already defined. The #ifdef MY_VARIABLE directive is just shorthand for #if defined(MY_VARIABLE), and #ifndef is just the opposite of that.
On the other hand, you don't want a .h to be included twice, there are several ways to do this, but the standard way is:
/* Check if my variable has already been defined */
#ifndef MY_aWeSoMe_VARIABLE
/* If we are in here, it means that it has not */
/* So let's define it */
#define MY_aWeSoMe_VARIABLE
/* You can write some more code here, like your .h stuff */
/* And of course, it's time to close the if */
#endif /* This closes the MY_aWeSoMe_VARIABLE ifndef */
The 1st time your compiler includes the .h, MY_aWeSoMe_VARIABLE won't be defined yet, so the preprocessor will get inside the if, define the variable, and include all the .h's code. If your compiler comes to include the .h a 2nd time or more, the variable will already be defined, so the preprocessor won't get inside the if. Since all the .h's content is inside the if, it won't do anything.
Since naming a variable MY_aWeSoMe_VARIABLE is pretty stupid, people tend to name it like MY_FILE_NAME, or MY_FILE_NAME_H, but this is not mandatory, practices actually vary from one dev to another.
What you have here is a header guard:
File: some_header.h
#ifndef SOME_HEADER_H // if SOME_HEADER_H is not defined, enter the
// #ifndef ... #endif block
#define SOME_HEADER_H // and define SOME_HEADER_H
struct foo {
int x;
};
#endif
This protects the header from being included more than once in the same translation unit and thereby trying to define the same entities more than once. The macro SOME_HEADER_H will stay defined until the translation unit is done so no matter how many times this header is included in the translation unit (implicitly via other header files) its contents will only be parsed once for that translation unit.
You can now do this:
File: some_other_header.h
#ifndef SOME_OTHER_HEADER_H
#define SOME_OTHER_HEADER_H
#include "some_header.h" // this header uses some_header.h
struct bar {
struct foo x;
};
#endif
And a program can now include both header files without getting an error like redefinition of 'foo'.
File: main.cpp
#include "some_header.h"
#include "some_other_header.h"
int main() {}
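To see what the guard buys you without creating three files, here is a rough single-file sketch in which each #include is replaced by the text it would bring in. The helper bar_value is hypothetical and only there to use the types.

```c
/* ---- #include "some_header.h" (first time) ---- */
#ifndef SOME_HEADER_H
#define SOME_HEADER_H
struct foo {
    int x;
};
#endif

/* ---- #include "some_other_header.h" ---- */
#ifndef SOME_OTHER_HEADER_H
#define SOME_OTHER_HEADER_H
/* nested #include "some_header.h": the guard is already defined,
 * so this second copy of struct foo is ignored */
#ifndef SOME_HEADER_H
#define SOME_HEADER_H
struct foo {
    int x;
};
#endif
struct bar {
    struct foo x;
};
#endif

/* Hypothetical helper that uses both types. */
int bar_value(void)
{
    struct bar b = { { 7 } };
    return b.x.x;
}
```

Removing the two guard lines around the second copy of struct foo should bring back the redefinition error mentioned above.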
A non-standard but quite popular alternative to the classic header guards shown above is #pragma once which does the same thing (if your preprocessor supports it):
File: some_header.h
#pragma once
// no need for a named macro or #endif
struct foo { ... };

What happens when preprocessor lines are processed by the preprocessor? - the '.i' file

I am using the GNU C compiler (gcc) to compile my C programs. Consider a program,
#include <stdio.h>
int main(){
return 0;
}
Now, when I pre-process the above code, using
cpp sample.c > sample.i
I get a lot of contents in sample.i which I haven't included. Say, 'stdio.h' file is preprocessed. If that is the case,
Question 1:
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Question 2:
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file)
Compiler: gcc
OS: Ubuntu
Thanks
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Preprocessing is just one part of the compilation process. It's more or less a simple textual replacement and nothing more complex is involved at the preprocessing stage. The preprocessor does not know or care whether you have used any standard functions in your program or not. An optimizer (as part of the compilation process) might "remove" parts that are not needed. But the preprocessor doesn't do that.
It'll do preprocessing of all the header files you have included and other header files included via your header files and so on.
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file)
The preprocessing involves quite a few tasks: macro replacement, conditional compilation, stringification, string concatenation etc.
You can read more about cpp in detail here: https://gcc.gnu.org/onlinedocs/cpp/
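Two of the tasks named above, stringification and string concatenation, can be sketched as follows. STR, XSTR and VERSION are illustrative names, not anything from a particular library.

```c
#define STR(x)  #x          /* stringification: turns the argument into a string literal */
#define XSTR(x) STR(x)      /* extra level so the argument is macro-expanded first */
#define VERSION 3

/* Adjacent string literals are joined into one: "v" "3" becomes "v3". */
const char *version_text(void) { return "v" XSTR(VERSION); }

/* Without the extra level, the macro's name itself is stringified. */
const char *macro_name(void) { return STR(VERSION); }
```

version_text() returns "v3" and macro_name() returns "VERSION".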
The preprocessor command #include "aFile.h" will put the whole content of aFile.h into your cpp file, at exactly the place where the preprocessor directive stands. That is the reason why you can use the functions declared in aFile.h.
If you are interested in learning more about the preprocessor, there is a very good (and short) guide on cplusplus.com
The preprocessor does text substitution. The net effect of #include <stdio.h> is to replace the #include <stdio.h> line with the contents of <stdio.h>.
Practically, <stdio.h> contains several declarations of various functions (e.g. fprintf(), fscanf()), declarations of variables (e.g. stdout, stdin), and some macro definitions (which, when used in later code, cause text substitution).
The preprocessor is specified as a phase of compilation, which takes source code as input, substitutes text as required (e.g. the #include as I have described, macro expansions, etc), and outputs the resultant source code. That output is what you are directing into sample.i
The output of the preprocessor is then input to a later phase of compilation, which actually understands declarations, definitions, statements, etc.
The phases of compilation are sequential - they occur one after the other, not all at once. So the later phase of compilation feeds no information whatsoever back to the preprocessor. It is the later phase of compilation that detects if declarations etc are used. But, since it cannot feed such information back to the preprocessor (and the preprocessor is an ignorant program that couldn't use such information anyway) the preprocessor cannot know that declarations are unused, and filter them out.
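A tiny sketch of that substitution, with made-up names. The unused declaration stays in the preprocessed output because the preprocessor has no idea whether it is ever used.

```c
#define BUFFER_SIZE 16
#define SQUARE(x) ((x) * (x))

int unused_helper(int);   /* never called, but still present after preprocessing */

/* After preprocessing, the body reads: return ((16) * (16)); */
int buffer_area(void) { return SQUARE(BUFFER_SIZE); }
```

Running the preprocessor alone on such a file (for example with gcc -E) shows the substituted text, with the #define lines themselves gone.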
1) You may not use them, but you have included them in line 1
#include <stdio.h>
That's where what you see comes from. Try to remove it to see the difference.
2) The preprocessor reads your C file and processes all the preprocessor directives in it. All preprocessor directives start with a '#' symbol. The '#include' directive replaces its line with the content of the given file. You also have the classic '#ifndef' and '#define' directives. The former works like an 'if' statement that lets you activate a part of the code only if a symbol is not yet defined
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
#ifndef WIN32
#include <some_file.h>
#else
#include <some_other_file.h>
#endif
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
// this second function is ignored
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
When the preprocessor reads the above file, the symbol "_SOME_SYMBOL_" is unknown at first, so the preprocessor defines it. Next it includes one file or the other depending on whether WIN32 is defined. Usually this kind of symbol is passed through the command line, so part of your code is dynamically activated or deactivated.
Assuming WIN32 is defined and some_other_file.h contains a single function definition, the preprocessor will output this
void some_other_function_from_some_other_file(){}
int main() { return 0;}
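As a smaller, self-contained variation on the same idea, here is a sketch where a symbol chooses between two definitions. USE_WINDOWS_API is a hypothetical symbol; with gcc it could be supplied on the command line as -DUSE_WINDOWS_API.

```c
#ifdef USE_WINDOWS_API          /* hypothetical symbol, normally set by the build */
#  define PLATFORM_NAME "windows"
#else
#  define PLATFORM_NAME "other"
#endif

/* Only one of the two #define lines above survives preprocessing. */
const char *platform_name(void) { return PLATFORM_NAME; }
```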

What is the scope of a #define?

What is the scope of a #define?
I have a question regarding the scope of a #define for C/C++ and am trying to better understand the preprocessor.
Let's say I have a project containing multiple source and header files. Let's say I have a header file that has the following:
// header_file.h
#ifndef __HEADER_FILE
#define __HEADER_FILE
#define CONSTANT_1 1
#define CONSTANT_2 2
#endif
Let's then say I have two source files that are compiled in the following order:
// source1.c
#include "header_file.h"
void funct1(void)
{
int var = CONSTANT_1;
}
// source2.c
#include "header_file.h"
void funct2(void)
{
int var = CONSTANT_2;
}
Assuming I have included all the other necessary overhead, this code should compile fine. However, I'm curious as to what #defines are remembered between compilations. When I compile the above code, are the contents of each #include actually included, or are the include guards actually implemented?
TLDR: Do #defines carry over from one compilation unit to the next? Or do #define only exist within a single compilation unit?
As I type this out, I believe I'm answering my own question and I will state my believed answer. #defines are constrained to a single compilation unit (.c). The preprocessor essentially forgets any #defines when it goes from one compilation unit to the next. Thus in the above example I listed, the include guards do not come into play. Am I correct in this belief?
source1.c is compiled separately from source2.c; therefore your defines are processed for source1 as it is compiled and then, as an independent action, they are processed for source2 as it is compiled.
Hopefully this is a clear explanation.
Preprocessor macros do not have "scope" as such, they just define a piece of text that should replace the macro in the code.
This means that the compiler never sees the strings CONSTANT_1 and CONSTANT_2 but instead gets the source in a preprocessed form with these macros replaced with their expansions (1 and 2 respectively).
You may inspect this preprocessed source by calling gcc with the -E flag, or with whatever flag only does preprocessing on your particular compiler.
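Within a single compilation unit, a macro is simply in effect from its #define to the end of the file or a matching #undef, whatever braces happen to surround it. A minimal sketch with made-up names:

```c
int limit_in_first(void)
{
#define LIMIT 10            /* defined inside a function body... */
    return LIMIT;
}

int limit_in_second(void)
{
    return LIMIT;           /* ...yet still visible here: braces do not scope macros */
}

enum { OLD_LIMIT = LIMIT }; /* captures the current expansion, 10 */

#undef LIMIT                /* the macro ends here */
#define LIMIT 20            /* a fresh definition takes over */

enum { NEW_LIMIT = LIMIT }; /* captures 20 */
```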
Yes, you are right!!
Compilation of a file is, in itself, just a process under execution. One process cannot interfere with another unless that is done explicitly. The C preprocessor is just a literal substitution mechanism, applied in a dumb way. Whatever conditional checks are performed are confined to the ongoing instance of the preprocessor; nothing is carried forward once execution (compilation) comes to an end. Preprocessor directives do not "configure" the compiler; their scope is limited to their own compilation.

#ifdef #else #endif macro question

I am new to C, and I am maintaining someone's code. I came across this in a header file. As I understand it, if the source is compiled on Windows it will enter the #ifdef branch, and if it is compiled on Linux it will enter the #else branch. Correct me if I am wrong.
However, the question is why is # (hash) used in front of all the include headers?
Many thanks for any suggestions,
#ifdef WIN32
# include <conio.h>
# include <process.h>
# include <stdlib.h>
# include <string.h>
#else
# include <unistd.h>
# include <termio.h>
# include <sys/types.h>
# include <sys/stat.h>
# include <fcntl.h>
#endif
The hash (#) indicates a preprocessor directive. The preprocessor runs over the code before compilation and does things depending on all the lines beginning with "#". The "#include filename.h" directive essentially copies all the contents of filename.h and pastes it where the "#include filename.h" line was.
#include is the way you include files in C.
You might be confused by the spaces between the # and the include.
But they don't matter. These lines are still #include.
Because "#include" is the syntax for telling the preprocessor to include a header. The spaces after the pound are just there for formatting and are not strictly necessary.
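A quick sketch showing that the spacing is purely cosmetic. DEMO_MODE and INDENTED_TOO are made-up names.

```c
#ifndef DEMO_MODE
#  define DEMO_MODE 1          /* "#  define" is the same directive as "#define" */
#else
#  define DEMO_MODE 2
#endif
   #   define INDENTED_TOO 1   /* whitespace before the '#' is allowed as well */

int demo_sum(void) { return DEMO_MODE + INDENTED_TOO; }
```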
The # lines are actually handled not by the C compiler itself, but by a preprocessor that runs as an early stage in the compilation pipeline. The "#" is how it knows which lines it is responsible for.
That same preprocessor can be used in other contexts as well.
The preprocessor can not only do evaluation of expression, as in the #if and #ifdef clauses, but it can also open other files and insert them using #include and even do text substitution using #define clauses.
More information can be found in the Wikipedia entry on the C preprocessor.
#include is different from, say, the VB.Net Imports statement or the C# using statement. Those make references to other classes, but #include actually inserts the text of the included file at that location in the source file. And it can act recursively, so that included files may themselves #include still others.
The #include directive tells the preprocessor to treat the contents of a specified file as if those contents had appeared in the source program at the point where the directive appears.
http://msdn.microsoft.com/en-us/library/36k2cdd4(VS.80).aspx
#include, #ifdef, etc. are all preprocessor directives, so they must have the pound (or hash) character in front of them. The coder who wrote this code simply lined up all of those # characters on the left side to make the code look cleaner (in his opinion).
cplusplus.com has a good overview of preprocessor directives.

#include <> files in different files

If I have several header files, let's say 1.h, 2.h and 3.h.
Let's say all three of the header files have #include <stdlib.h> and one of the include files in them.
When I have to use all 3 header files in a C file main.c,
it will have 3 copies of #include <stdlib.h> after the preprocessor.
How does the compiler handle this kind of conflict?
Is this an error or does this create any overhead?
If there are no header guards, what will happen?
Most C headers are wrapped as follows:
#ifndef FOO_H
#define FOO_H
/* Header contents here */
#endif
The first time the preprocessor scans this, it will include the contents of the header because FOO_H is undefined; however, it also defines FOO_H preventing the header contents from being added a second time.
There is a small performance impact of having a header included multiple times: the preprocessor has to go to disk and read the header each time. This can be mitigated by adding guards in your C file to include:
#ifndef FOO_H
#include <foo.h>
#endif
This stuff is discussed in great detail in Large-Scale C++ Software Design (an excellent book).
This is usually solved with preprocessor statements:
#ifndef __STDLIB_H
#include <stdlib.h>
#define __STDLIB_H
#endif
Although I never saw it for common header files like stdlib.h, so it might just be necessary for your own header files.
The preprocessor will include all three copies, but header guards will prevent all but the first copy from being parsed.
Header guards will tell the preprocessor to convert subsequent copies of that header file to effectively nothing.
Response to edit:
Standard library headers will have the header guards. It would be very unusual and incorrect for them to not have the guards.
Similarly, it is your responsibility to use header guards on your own headers.
If header guards are missing, hypothetically, you will get a variety of errors relating to duplicate definitions.
Another point: You can redeclare a function (or extern variable) a bazillion times and the compiler will accept it:
int printf(const char*, ...);
int printf(const char*, ...);
is perfectly legal and has a small compilation overhead but no runtime overhead.
That's what happens when an unguarded include file is included more than once.
Note that it is not true for everything in an include file. You can't redeclare an enum, for example.
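Roughly what the compiler sees after an unguarded header has been pasted in twice. The names are illustrative, and the commented-out line marks the kind of repetition that compilers for C17 and earlier reject.

```c
/* Declarations may be repeated as long as they agree. */
int add(int a, int b);
int add(int a, int b);

extern int counter;
extern int counter;

/* Definitions, by contrast, appear once. */
int counter = 0;
int add(int a, int b) { return a + b; }

enum color { RED, GREEN };
/* enum color { RED, GREEN };   a second copy like this is rejected in C17 and earlier */
```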
This is done by one of the two popular techniques, both of which are under stdlib's responsibility.
One is defining a unique constant and checking for it, to #ifdef out all the contents of the file if it is already defined.
Another is the non-standard #pragma once, often associated with Microsoft's compiler but also supported by GCC and Clang. It has the advantage of not even having to read the file from the hard drive if it was already included (by remembering the exact path)
You must also do the same in all header files you produce. Or, headers that include yours will have a problem.
As far as I know, a regular include simply throws in the contents of another file. The standard library's stdlib.h surely uses include guards: http://en.wikipedia.org/wiki/Include_guard, so you end up including only one copy. However, you can break it (do try it!) if you do: #include A, #undef A_GUARD, #include A again.
Now ... why do you include a .h inside another .h? This can be ok, at least in C++, but it is best avoided. You can use forward declarations for that: http://en.wikipedia.org/wiki/Forward_declaration
Using those works for as long as your code does not need to know the size of an imported structure right in the header. You might want to turn some function arguments by value into the ones by reference / pointer to solve this issue.
Also, always utilize the include guards or #pragma once for your own header files!
As others have said, for standard library headers, the system must ensure that the effect of a header being included more than once is the same as the header being included once (they must be idempotent). An exception to that rule is assert.h, the effect of which can change depending upon whether NDEBUG is defined or not. To quote the C standard:
Standard headers may be included in any order; each may be included more than once in
a given scope, with no effect different from being included only once, except that the
effect of including <assert.h> depends on the definition of NDEBUG.
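The <assert.h> exception in that quote can be sketched like this. The function names are made up; the behaviour relied on is that each inclusion of <assert.h> redefines assert according to whether NDEBUG is defined at that point.

```c
#include <assert.h>

int checked(int x)
{
    assert(x > 0);        /* active: NDEBUG is not defined at this point */
    return x;
}

#define NDEBUG
#include <assert.h>       /* second inclusion: assert now expands to ((void)0) */

int unchecked(int x)
{
    assert(x > 0);        /* compiled out, so a negative x passes through */
    return x;
}

#undef NDEBUG
#include <assert.h>       /* third inclusion restores the checking assert */
```

Calling unchecked(-5) simply returns -5, whereas checked(-5) would abort the program.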
How this is done depends upon the compiler/library. A compiler system may know the names of all the standard headers, and thus not process them a second time (except assert.h as mentioned above). Or, a standard header may include compiler-specific magic (mostly #pragma statements), or "include guards".
But the effect of including any other header more than once need not be same, and then it is up to the header-writer to make sure there is no conflict.
For example, given a header:
int a;
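including it twice in one source file gives two tentative definitions of a (which C merges, but C++ rejects as a redefinition), and including it from several source files can produce multiple-definition errors at link time. This is a Bad Thing.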
The easiest way to avoid conflict like this is to use include guards as defined above:
#ifndef H_HEADER_NAME_
#define H_HEADER_NAME_
/* header contents */
#endif
This works for all the compilers, and doesn't rely on compiler-specific #pragmas. (Even with the above, it is a bad idea to define variables in a header file.)
Of course, in your code, you should ensure that the macro name for include guard satisfies this:
It doesn't start with E followed by an uppercase character,
It doesn't start with PRI followed by a lowercase character or X,
It doesn't start with LC_ followed by an uppercase character,
It doesn't start with SIG/SIG_ followed by an uppercase character,
..etc. (That is why I prefer the form H_NAME_.)
As a perverse example, if you want your users guessing about certain buffer sizes, you can have a header like this (warning: don't do this, it's supposed to be a joke).
#ifndef SZ
#define SZ 1024
#else
#if SZ == 1024
#undef SZ
#define SZ 128
#else
#error "You can include me no more than two times!"
#endif
#endif
