Is it right to simply include all header files? - c

Remembering the names of system header files is a pain...
Is there a way to include all existing header files at once?
Why doesn't anyone do that?

Including unneeded header files is a very bad practice. The issue of slowing down compilation might or might not matter; the bigger issue is that it hides dependencies. The set of header files you include in a source file is, effectively, the documentation of what functionality the module depends upon, and unlike external documentation or comments, it is automatically checked for completeness by the compiler (failing to include needed header files results in an error). Ensuring the absence of unwanted dependencies not only improves portability; it also helps you track down unneeded and potentially dangerous interactions, for instance cases where a module that should be purely computational or purely data-structure management is accessing the filesystem.
These principles apply whether the headers are standard system headers or headers for modules within your own program or third-party libraries.

Your source code files are preprocessed before the compiler looks at them, and #include is one of the directives that the preprocessor handles. During preprocessing, each #include line is replaced with the entire contents of the file being included. The result of including all of the system headers would be very large translation units that the compiler then needs to work through, which would cost a lot of time during compilation.
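As a small illustration of that textual replacement (file names are hypothetical), you can ask the compiler to stop after preprocessing and show you what it actually sees:

/* greet.h -- a tiny header used only for illustration (hypothetical names) */
#ifndef GREET_H
#define GREET_H
void greet(const char *name);   /* declaration only */
#endif

/* main.c */
#include "greet.h"   /* the preprocessor replaces this line with the text of greet.h */

int main(void)
{
    greet("world");
    return 0;
}

/* Running "gcc -E main.c" prints the preprocessed translation unit: the
   #include line is gone and the contents of greet.h appear in its place,
   which is what the compiler proper actually works through. */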

No one includes all the header files. There are too many, and a few of them are mutually exclusive with other files (like ncurses.h and curses.h).
It really is not that bad when writing a program even from scratch. A few are quite easy to remember: stdio.h for any FILE stuff; ctype.h for any character classification; stdlib.h for any use of malloc(); etc.
If you don't remember one:
leave the #include out
compile
examine the first few error messages for indications of a missing header file, such as some type not being declared, or a function being called with assumed parameter types
figure out which function call is the cause
look at the man page (or whatever documentation your compiler has) for that function
notice the #include shown by the documentation and add it
repeat until all errors fixed
It is quite a bit easier when adding to an existing code base. You can go hundreds or thousands of working hours without ever having to add a #include.
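As a rough, hypothetical illustration of that cycle (the exact wording of the diagnostic depends on your compiler):

/* report.c -- if you forget the #include <stdlib.h> line below, a modern
   GCC or Clang typically says something like
       warning: implicit declaration of function 'malloc'
   (or an outright error in stricter modes). The man page for malloc()
   then shows the header to add, and the diagnostic goes away. */
#include <stdio.h>
#include <stdlib.h>   /* the include you add after reading "man malloc" */

int main(void)
{
    char *buf = malloc(64);
    if (buf == NULL)
        return 1;
    snprintf(buf, 64, "hello");
    puts(buf);
    free(buf);
    return 0;
}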

No, it is a terrible idea: it will massively increase your compile times and possibly make your executable a lot larger by pulling in massive amounts of unused code.

I know what you're talking about, but I need to double-check the function prototypes for the functions I'm using (for ones I don't use daily, anyway) -- I'll just copy and paste the #includes straight out of the manpage for the associated functions. I'm already looking at the manpage (it's a simple K in vim(1)), so it doesn't feel like an extra burden.

You can create a "master" header into which you put all your includes, and then include it from everything else. Beware of conflicting definitions and circular references... So: master1.h, master2.h, ...
Not advocating it. Just saying.
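For what it's worth, a minimal sketch of such a "master" header might look like this (all names hypothetical); the include guard keeps repeated inclusion harmless, but every file that includes it now depends on everything listed:

/* master.h -- hypothetical "include everything" header */
#ifndef MASTER_H
#define MASTER_H

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "config.h"    /* hypothetical project headers */
#include "logging.h"
#include "network.h"

#endif /* MASTER_H */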

Related

Problems of including too many header files in C

Does including too many header files increase the size of the source file? Does it also increase the size of the executable? Do these header files increase the compilation time?
For example, if I add these header files to my program, do they increase the size of the source file, the executable file, or both?
#include <stdio.h>
#include "header1.h"
#include "header2.h"
What are the other problems of including too many header files?
Does including too many header files increase the size of the source file?
It increases by exactly as many characters as you type. So if one character is one byte on your system, then adding #include <stdio.h> increases the source code file size by at least 18 bytes. This shouldn't matter to you unless you are using a computer from the mid-1980s.
Does it also increase the size of executable?
No. Only used functions increase the size of the executable.
Do these header files increase the compilation time?
Generally yes, though compilers use various tricks such as "precompiled headers" for their own libraries. Again, this isn't a problem unless you are using 1980s stuff or worse (such as Eclipse).
What are the other problems of including too many header files?
Your main concern about including headers should be to not include stuff that you don't use. Every include creates a dependency, and also means more identifiers and symbols added to the global namespace.
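A hypothetical sketch of how an unneeded include can bite through the extra names it drags into scope (both files and all names are made up for illustration):

/* legacy_io.h -- hypothetical header pulled in "just in case" */
#define BUFFER_SIZE 512          /* an object-like macro with a very common name */

/* parser.c */
#include "legacy_io.h"           /* included out of habit; nothing from it is used */

#define BUFFER_SIZE 4096         /* diagnostic: macro "BUFFER_SIZE" redefined */

static char buffer[BUFFER_SIZE];

int main(void)
{
    buffer[0] = '\0';
    return 0;
}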
Does including too many header files increase the size of the source file?
Yes. Each additional character added to a source file, for example the statement "#include <stdio.h>", increases the physical size of the source file by precisely the number of characters in that statement, i.e. strlen("#include <stdio.h>") bytes. (And, depending on how the OS allocates file blocks, it could be seen by the OS as an extra kilobyte.) More importantly, though, at compile time the contents of each #included header file are effectively expanded into the source code that is fed to the compiler.
Does it also increase the size of executable?
Yes/No/Possibly. It depends on what is actually used from the header file. Optimizing compilers can exclude whatever is not needed from the executable. If nothing is used, there will be no additional size to the executable. There will, however, be additional work done at compile time, because even if there is nothing useful in the header file, the compiler does not know that until it has processed it.
Do these header files increase the compilation time?
Compared to what? If a header file contains components necessary for the build to occur -- function prototypes, #defines, etc. -- then compile time is just normal compile time. But if you have been compiling with, say, 3 necessary header files for a while, and decide that you want to add a new library (and its corresponding header file), then by all means, yes, the next compile will take a little longer than the previous ones.
What are the other problems of including too many header files?
Too many? If each and every header file is necessary, then there are not too many (with the caveat that implementations do impose a limit on #include nesting depth). However, if a header file is found to be unnecessary, it's code bloat: get rid of it. It adds to compile time, because each header file, regardless of whether there is anything useful in it, has to be read by the build process. Even worse, unnecessary header files add complexity and difficulty to the tasks of future maintainers.
Header files should not produce any extra code unless they are poorly designed, and if they do, you will probably run into redefinition problems when you include them more than once. By "code" here I mean machine code, that is, executable code.
About the source code: the compiler ultimately sees the source as one big file with the #include directives replaced by the contents of the included files, so adding more header files will increase the compile time (as the apparent source code will be longer). Including unnecessary files should therefore be avoided.
Adding header files will increase the size of the intermediate source file, taking into account the inclusions. Modern compilers may not even generate this intermediate file explicitly -- it may be absorbed into the overall compilation process. This is a matter of compiler design. As a developer, you probably won't ever see the fully-expanded file unless you ask for it (e.g., gcc -E).
Adding header files will not necessarily increase the size of the compiled code -- if all the headers contain is declarations and constant definitions, they won't increase the size much, if at all. If they contain actual code -- which isn't a particularly common practice -- they might have some small effect on the executable size.
Adding header files will probably have some effect on the compilation time but, really, this isn't a question anybody should be asking. If you need the headers, you need the headers. If it slows down compilation, what's the alternative? Don't compile?
If the question is really about how to distribute code between headers and source files, so as to improve some aspect of the build process, then that's a very complicated question to answer. If the question is about what harm is done by including a bunch of headers you don't use, the answer with modern compilers is: very little, from a functional perspective. However, including some header gives the reader the impression that the source actually uses the features it declares, and that's bad for readability. You should do your future self, or your colleagues, a favour and try not to include headers that aren't used. But if you need them, you need them, and there's little point worrying about the consequences too much.
Does including too many header files increase the size of the source file?
The more characters in the source file, the larger the source file. But that is only about the #include directives themselves, not the content of the header files -- the source file on disk doesn't get expanded with the content of the headers.
When you #include a header, the compiler is told to read that file at that point, but your source file itself isn't changed.
So, yes.
Does it also increase the size of executable?
Depends on the content of the headers. If they contain definitions then yes.
Do these header files increase the compilation time?
The more the compiler has to read and evaluate, the longer the compilation takes. So yes.
What are the other problems of including too many header files?
As said before, the evaluation takes longer, so the more header files, the slower the compilation. But there is nothing wrong with adding as many useful headers as you like. Just don't add unnecessary headers, which slow down the compilation.

Why use object files in C?

When I compile a C program, for ease I've been including the source file for a certain header at the end. So, if main.c includes util.h, util.h will have all the headers util.c will use, the types or structs it outlines, etc., and then at the very end it includes util.c. Then, when I compile, I only have to use gcc main.c -o main, and the rest is all taken care of.
I've been looking up C coding standards, trying to figure out what the best way to do things is, and there are just so many, and so many conflicting opinions, I don't know what to think. Why do so many places recommend compiling object files individually instead of including all of them in a web? util never touches anything but util.c, so the two are perfectly independent, and in theory (my theory) it would be fine, but I'm probably wrong since this is computer science and people are wrong even when they're right, so if I'm already wrong I'm probably wrong.
Some people say header files should ONLY be prototypes, and the source file should be the one that includes it and its necessary system headers. From a purely aesthetic point of view I much prefer having all the info (types, system headers used, prototypes) in the header (in this case util.h) and having ONLY function code in util.c (excluding one "#include "util.h"" at the very top).
I guess the point I'm getting at is, with all this stuff that works, selecting a method sounds arbitrary to someone who doesn't understand the background (me). Please tell me why and what.
While your program is small, this will work. At some point, however, your program will get large enough that recompiling the whole program every time you change one line is a pain in the rear.
This -- even more than avoiding editing huge files -- is the reason to split up your program. If main.c and util.c are separately compiled into object files, changing one line in a function in main.c will no longer require you to recompile all the code in util.c.
By the time your program is made up of a few dozen files, this will be a big win.
I think the point is that you want each file to include only what it needs to be independent. This reduces overall compilation time by letting the compiler read only the headers that are necessary, rather than repeatedly reading every header when it might not be needed. For example, if your util.c uses functions and/or types from <stdio.h> but your util.h doesn't, then you want to include <stdio.h> only in util.c, so that when the compiler compiles util.c it includes <stdio.h> just there; but if you include <stdio.h> in your util.h instead, then every source file that includes util.h also includes <stdio.h>, whether it needs it or not.
This is very negligible for small projects with only a handful of files, but proper header inclusion can affect compilation times for larger projects.
With regards to the question about "object files": when you compile a source file into an object file, you create a shortcut that allows a build system to only recompile the source files that have outdated object files. This is an effective way to significantly reduce compilation times especially for large projects.
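A minimal sketch of the layout being described (file names hypothetical): the header carries only declarations, util.c includes its own header plus the system headers its implementation actually needs, and each file is compiled to its own object file before linking.

/* util.h -- declarations only; no <stdio.h> here because the interface doesn't need it */
#ifndef UTIL_H
#define UTIL_H
void print_report(int total);
#endif

/* util.c -- includes its own header so the compiler checks declaration and definition agree */
#include "util.h"
#include <stdio.h>   /* needed only by the implementation */

void print_report(int total)
{
    printf("total = %d\n", total);
}

/* main.c */
#include "util.h"

int main(void)
{
    print_report(42);
    return 0;
}

/* Typical separate compilation, then linking:
     gcc -c util.c        (produces util.o)
     gcc -c main.c        (produces main.o)
     gcc util.o main.o -o main
   Editing main.c now only requires recompiling main.c and relinking. */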
First, including a .c file from a .h file is completely bass-ackwards.
The "standard" way of doing it follows a line of thought roughly like this:
You have a library, containing dozens of functions. Keeping everything in one big source file means that anyone using your library would have to link the whole library, even if he uses only a single function of it. (Imagine linking the whole C standard library for a puts( "Hello" ).)
So you split things across multiple source files, which are compiled individually. Whenever you make changes to one of your functions, you have to re-translate only one small source file and update the library archive (or executable) - instead of re-translating the whole thing every time. (This is still an issue, because code sizes have somewhat kept up with CPU improvements. Compiling something like the Boost lib can still take several minutes on not-too-fancy hardware...)
Now you are in a pinch, however. The function is defined inside the .c file, and the corresponding .o file can conveniently be linked (via a .a archive if need be). However, to actually address the function (provided by the .o file) properly from another source file (a.k.a. "translation unit"), your compiler needs to know the function name, its parameter list, and its return type. This is why the declaration of the function (i.e., the function head without its body) is put in a separate header (.h) file.
Other source files can now #include the header file, address the function properly (without the compiler being aware of what the function actually does), and when all parts of your library / program are compiled into .o files, then everything is linked together.
The source file includes its own header basically to make sure the two files agree on the function declaration. ;-)
That's about it, as far as I can be bothered to write it up right now. Putting everything into one monolithic source file is barely acceptable (actually, no, it isn't, not for anything beyond about 200 lines), but including the .c file at the end of the .h file either means you learned your C coding by looking at god-awful code instead of a good book, or whoever tutored you should never tutor another person on C coding in his life. No offense intended. ;-)
PS: Header files also provide a good summary / oversight of a piece of code. Languages that don't provide headers - Java, for example - need IDE's or documentation tools to extract this kind of information. Personally, I found header files to be a benefit, not a liability.
Please use *.h and *.c files as customary: *.h files are #included in *.c files; *.h files contain only macro definitions, data type declarations, function declarations, and extern data declarations. All definitions go in *.c files. That is how everybody else organizes C programs; do your fellow humans (who some day might need to understand your program) a favor. If something in file.c is used outside it, you write file.h containing the declarations of whatever in that file is to be used outside, and include that in file.c (to check that declarations and definitions agree) and in all using *.c files. If a bunch of *.h files are always included together, it might mean that the split-up into *.c files isn't right (or at least that of the *.h files; perhaps you should make one .h including all those declarations, and create *.h files for internal use where needed among the group of related *.c files).
[If a program written as you outline crosses my path, I can assure you I'll avoid it like the plague. The extra obfuscation might be welcome in the IOCCC, but not by me. It is a sure sign of somebody who doesn't know how to organize a program cleanly, and so the program probably isn't worth trying out.]
Re: Separate compilation: You break up a C program so the pieces are easier to understand, you can hide details of how things work in the C files (think static), this provides support for Parnas' modularity. It also means that if you change a file, you don't have to recompile everything.
Re: Differing C programming standards: Yes, there are lots of them around. Pick one you feel comfortable with, and stick to that. If you work on a project, adhere to their standards.
The "include in a single translation unit" approach becomes very inefficient for any significantly sized project, it is impractical for projects that are distributed amongst multiple developers.
Moreover, when creating static libraries, if everything in the library were from a single translation unit, any code linked to it would get all the library code regardless of whether it is referenced or not.
A project using a build manager such as make or the features available in most IDEs uses header file dependencies to allow an incremental build; only compiling those sources that are modified or dependent on modified files. The dependencies are determined by the file inclusions, so minimising redundant dependencies speeds build time.
A typical commercial project can comprise hundreds of thousands of lines of code and a few hundred source files; full rebuild times can vary from minutes to hours. If in your development cycle you have to wait that long between code changes and test, productivity would be very low!

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open-source library ("fast artificial neural network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (as I'm only including some headers I need, and I did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design and there is no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way how to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I can see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time they include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
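As a hedged illustration of that pattern (this is not the actual FANN source, just a sketch with made-up names): a shared implementation file is written against a type macro, and each thin wrapper .c defines the macro and then textually includes it, so that one code base yields several variants.

/* impl_common.c -- shared implementation, written in terms of NUM_TYPE */
NUM_TYPE scale(NUM_TYPE x, NUM_TYPE factor)
{
    return x * factor;
}

/* floatvariant.c -- compiled on its own into one binary/library */
#define NUM_TYPE float
#include "impl_common.c"   /* easy way to allow for build of multiple binaries */

/* doublevariant.c -- compiled separately into another */
#define NUM_TYPE double
#include "impl_common.c"

/* Linking floatvariant.o and doublevariant.o into the same binary would
   produce duplicate definitions of scale() -- essentially the flood of
   linker warnings described in the question. */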
They provide makefiles and MSVS projects which should already not link doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c) and so on, and either their files are screwed up or something similar has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al., and possibly just removing those from the project is enough -- then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in-depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
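A minimal sketch of the inline option mentioned above (hypothetical names): a small function defined right in the header, marked static inline so each translation unit gets its own private copy and no multiple-definition errors occur at link time.

/* mathutil.h -- hypothetical header that ships a small definition */
#ifndef MATHUTIL_H
#define MATHUTIL_H

/* static inline (C99): every .c file that includes this header gets its
   own internal copy, so including it from several translation units does
   not produce duplicate-symbol link errors. */
static inline int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif /* MATHUTIL_H */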
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

C project structure - header-per-module vs. one big header

I've worked with a number of C projects during my programming career and the header file structures usually fall into one of these two patterns:
One header file containing all function prototypes
One .h file for each .c file, containing prototypes for the functions defined in that module only.
The advantages of option 2 are obvious to me - it makes it cheaper to share the module between multiple projects and makes dependencies between modules easier to see.
But what are the advantages of option 1? It must have some advantages otherwise it would not be so popular.
This question would apply to C++ as well as C, but I have never seen #1 in a C++ project.
Placement of #defines, structs etc. also varies but for this question I would like to focus on function prototypes.
I think the prime motivation for #1 is ... laziness. People think it's either too hard to manage the dependencies that splitting things into separate files can make more obvious, and/or think it's somehow "overkill" to have separate files for everything.
It can also, of course, often be a case of "historical reasons", where the program or project grew from something small, and no-one took the time to refactor the header files.
Option 1 allows for having all the definitions in one place so that you have to include/search just one file instead of having to include/search many files. This advantage is more obvious if your system is shipped as a library to a third party - they don't care much about your library structure, they just want to be able to use it.
Another reason for using a different .h for every .c is compile time. If there is just one .h (or if there are several but you include them all in every .c file), every time you make a change in a .h file, you will have to recompile every .c file. In a large project this can mean a considerable amount of lost time, and it can also break your workflow.
1 is just unnecessary. I can't see a good reason to do it, and plenty of reasons to avoid it.
Three rules for following #2 and having no problems:
start EVERY header file with a
#ifndef _HEADER_Namefile_
#define _HEADER_Namefile_
end the file with
#endif
That will allow you to include the same header file multiple times in the same module (which may happen inadvertently) without causing any fuss.
you can't have definitions in your header files... and that's something everybody thinks he/she knows about function prototypes, but almost everyone ignores for global variables.
If you want a global variable, which by definition should be visible outside its defining C module, use the extern keyword:
extern unsigned long G_BEER_COUNTER;
which tells the compiler that the G_BEER_COUNTER symbol is actually an unsigned long (so it works like a declaration), and that some other module will have its proper definition/initialization. (This also allows the linker to keep its resolved/unresolved symbol table.) The actual definition (the same statement without extern) goes in that module's .c file; see the sketch after these rules.
only when proven absolutely necessary should you include other headers within a header file. #include statements should only appear in .c files (the modules). That allows you to better understand the dependencies, and to find/resolve issues.
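A minimal sketch of the extern rule above, using the same counter (file names hypothetical): the header carries the declaration, exactly one .c file owns the definition, and any other module just includes the header.

/* counters.h -- declaration only, visible to every module that needs it */
#ifndef COUNTERS_H
#define COUNTERS_H
extern unsigned long G_BEER_COUNTER;   /* declaration: "this exists somewhere" */
#endif

/* counters.c -- exactly one module owns the definition */
#include "counters.h"
unsigned long G_BEER_COUNTER = 0;      /* the actual definition/initialization */

/* bar.c -- any other module includes the header and uses the variable */
#include "counters.h"

void drink_one(void)
{
    G_BEER_COUNTER++;
}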
I would recommend a hybrid approach: making a separate header for each component of the program which could conceivably be used independently, then making a project header that includes all of them. That way, each source file only needs to include one header (no need to go updating all your source files if you refactor components), but you keep a logical organization to your declarations and make it easy to reuse your code.
There is also, I believe, a 3rd option: each .c has its own .h, but there is also one .h which includes all the other .h files. This brings the best of both worlds at the expense of keeping that one .h up to date, though that could be done automatically.
With this option, internally you use the individual .h files, but a 3rd party can just include the all-encompassing .h file.
When you have a very large project with hundreds/thousands of small header files, dependency checking and compilation can significantly slow down as lots of small files must be opened and read. This issue can be often solved by using precompiled headers.
In C++ you would definitely want one header file per class and use pre-compiled headers as mentioned above.
One header file for an entire project is unworkable unless the project is extremely small - like a school assignment
That depends on how much functionality is in one header/source file. If you need to include 10 files just to, say, sort something, it's bad.
For example, if I want to use STL vectors I just include <vector> and I don't care what internals are necessary for vector to be used. GCC's <vector> includes 8 other headers -- allocator, algobase, construct, uninitialized, vector and bvector. It would be painful to include all those 8 just to use vector, would you agree?
BUT library internal headers should be as sparse as possible. Compilers are happier if they don't include unnecessary stuff.

untangling .h dependencies

What do you do when you have a set of .h files that has fallen victim to the classic 'gordian knot' situation, where to #include one .h means you end up including almost the entire lot? Prevention is clearly the best medicine, but what do you do when this has happened before the vendor (!) has shipped the library?
Here's an extension to the question, and this is probably the more pertinent question -- should you even attempt to disentangle the dependencies in the first place?
I've done this on a C++ code base that was already split into many libraries (which was a good start).
I had to work out (or guess) which library was the most depended upon yet itself depended upon nothing else in the code base. I then processed each library in turn.
I looked at each module (*.cpp file) in turn and made sure that its own header was #included first and commented out the rest; then I commented out all the #includes in that header file and re-compiled just that module to let the compiler tell me what was needed. I would un-comment the first header that seemed to be needed, review that one, and recurse as necessary. It was interesting to see how many headers ended up not being needed.
Where only the name is needed (because you have a pointer or reference), use class name; or struct name; -- a forward declaration -- and avoid #including the header file.
The compiler is very helpful in telling you what the dependencies are when you comment out #includes (you need to recompile with ALL the compilers you have to maintain portability).
Sometimes I had to move modules between libraries so that no pairs or groups of libraries were mutually dependant.
As you have the opportunity, you should refactor the code to reduce includes that pull in too much; however, that assumes you can achieve some sort of package cohesion. If you disentangle things just to discover that every user of the code has to include all the elements anyway, the end result is the same.
Another option is to use #defines to configure sections on and off. Regardless, for an existing code base the solution is to move toward package cohesion.
Read: http://ivanov.files.wordpress.com/2007/02/sedpackages.pdf and research issues related to package cohesion.
I've untangled that knot a few times, and it generally helps a lot when maintaining a system to reduce the .h dependencies as much as possible. There are decent tools for generating dependency trees (I was using Klocwork at the time).
The downside I found was with conditional compilation. Someone might remove a header file because they think we don't need it, but it turns out that we only don't need it because VxWorks has some screwed up headers... on Solaris (or any reasonable Posix system) you do need it.
There is a balance to be struck between an enormous number of finely organized headers and a single header that includes everything. Consider the Standard C library; there are some biggish headers like <stdio.h>, which declares a lot of functions, but they are all related to I/O. There are other headers that are more of a miscellany - notably <stdlib.h>.
The Goddard Space Flight Center guidelines for C are worth hunting down.
The basic rule is that each header should declare the facilities provided by a suitable (usually small) set of source files. The facilities and header should be self-contained. That is, if someone needs the code in header "something.h", then that should be the only header that must be added to the compilation. If there are facilities needed by "something.h" that are not declared in the header, then it must include the relevant headers. That can mean that headers end up including <stddef.h> because one of the functions uses size_t, for example.
As #quamrana points out, you can use forward declarations for structures (not classes, since the question is tagged C and not C++) when appropriate - which primarily means when the interface takes pointers and does not need to know the size of the structures or any of the members.
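A minimal sketch of such a self-contained header (all names hypothetical): the struct is only handled through pointers in the interface, so a forward declaration replaces an #include of its full definition, while <stddef.h> is still included because the interface itself uses size_t.

/* engine.h -- hypothetical interface header */
#ifndef ENGINE_H
#define ENGINE_H

#include <stddef.h>        /* needed here: the interface uses size_t */

struct engine;             /* forward declaration; callers only hold pointers */

struct engine *engine_create(size_t capacity);
void engine_run(struct engine *e);
void engine_destroy(struct engine *e);

#endif /* ENGINE_H */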
