Say you #include <stdio.h> and never use anything from it. What are the overhead costs associated with this?
I've noticed that a lot of network code includes every networking-related header the author can think of, in case something from one of them ends up being used. So I was wondering: is this an ease-of-use vs. efficiency trade-off, or is there no loss of efficiency at all?
Compile-time overhead, mostly, as the compiler has to include and parse that file.
It should impact compile speed only, not execution speed. As for the compile time overhead, for large projects it can be noticeable but the only way to know how it impacts your projects is to measure the compile times with and without the includes.
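If you want to measure it yourself, here is a rough sketch (the file name is made up, and GCC on a Unix-like system is assumed):

/* empty.c - nothing from stdio.h is actually used */
#include <stdio.h>
int main(void) { return 0; }

Compare what the compiler has to chew through, and how long it takes, with and without the #include:

gcc -E empty.c | wc -l
time gcc -c empty.c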
There is no runtime efficiency loss. It's more of a maintenance issue as having superfluous includes makes it unclear as to what libraries you are actually using.
Generally speaking, there is only compile time overhead not runtime overhead. The linker will only link against parts of the library that are really used by your program. Bad linkers will include the loading code for the referenced libraries even when they are not used at all, so you pay a little on startup time.
Precompiled headers in most popular compilers nowadays make the inclusion costs by #include pretty negligible. And at runtime nothing of that remains anyway since linkers are smart enough not to include things that aren't needed.
Including files that are not needed will lead to slightly increased compilation times, but will have no effect on the generated code.
Any overhead will primarily be at compile time rather than run-time.
The #include tells the compiler that it needs to include code from the referenced file. So it has to load the file when it comes across the reference. This will take a finite amount of time depending on where the file actually is and how large it is.
The only overhead at run-time would be if the compiler included code that wasn't referenced thus making the executable larger than it needed to be - which would potentially increase start-up times.
It slows the compile down, of course. In the general case, it can also cause your .exe to contain global variables or even functions that you never actually use.
For the standard C-runtime headers, I'm not aware of any significant runtime cost. For other headers you need to be careful. Some of the Windows headers declare hundreds of UUIDs that can end up bloating your exe.
The way you find out if this is costing you anything at runtime is to look at the .map file that the linker generates. Are there any variables or functions there that you didn't expect?
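With the GNU toolchain you can ask the linker for such a map explicitly (a sketch; the object file names are placeholders, and MSVC's linker has an analogous /MAP switch):

gcc main.o net.o -Wl,-Map=output.map -o app

Then search output.map for variables or functions you didn't expect to find there.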
Related
I have been told that "unity builds" have a greater chance to inline everything if you make all the functions static, and thus make the binary more optimized and faster.
Personally I don't like them because the classic way is much more intuitive and modular, and you don't have to keep track of headers between branching .c files and main.c, and you don't have to have a master declaration header (basically emulating the normal way).
I don't care about compilation time, but I do care about efficiency of the program. So in my mind, the question is why wouldn't a compiler be able to do all these optimizations regardless of objects and whatnot, even if it had to compile twice or several times?
So how do I do that?
I write small header-only and static-inline-only libraries in C. Would this be a bad idea when applied to big libraries? Or is it likely that the running time will be faster with the header-only version? Well, without considering the obvious compilation time difference.
Yes, it is a bad idea -- especially when integrated with larger libraries.
The cost of all those inline definitions grows as the library becomes visible to more translation units and to more complex header-inclusion graphs -- which is quite common in larger projects. Builds become much more time consuming as the number of translation units and dependencies increases, and the growth is typically not linear.
There are reasons this flies in C++ but not in C: the inline export semantics differ. In short, in C you will end up producing tons of copies of the functions (as well as of the functions' variables). C++ deduplicates them; C does not.
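A small illustration of what that means in C (hypothetical names; a sketch, not code from any particular library):

/* util.h - header-only helper included by many .c files */
static inline int next_id(void)
{
    static int counter = 0;   /* every translation unit gets its own copy
                                 of this function AND of this counter     */
    return ++counter;
}

Two .c files that both include util.h each carry their own next_id and their own counter, so the ids they hand out overlap; in C++, a plain inline function (and its static locals) would be deduplicated by the linker.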
Also, inlining isn't a silver bullet for speed. The approach will often increase your code size and executable size. Large functions can create slower code. Copies of programs/functions can also make your program slower. Larger binaries take more time to link and initialize (=launch). Smaller is usually better.
It's better to consider alternatives, such as Link Time Optimizations, Whole Program Optimizations, Library design, using C++ -- and to avoid C definitions in headers.
Also keep in mind that the compiler can eliminate dead code, and the linker can eliminate unused functions.
I wrote a unit testing framework* as a single C89 header file. Essentially everything is a macro or marked static and link time optimisation (partly) deduplicates the result.
This is a win for ease of use as integration with build systems is trivial.
Compile times are OK, since this is C, but the resulting function duplication does bother me a little. The library can therefore be used as header + source instead, by setting a macro before #including it in a single source file, e.g.
#define MY_LIB_HEADER_IMPLEMENTATION
#include "my_lib.h"
I don't think I would take this approach for a larger project, but I think it's optimal for what is essentially a set of unit testing macros.
in the "don't call us, we'll call you" sense
I know that splitting code across multiple files makes it far easier to manage. But is there a performance difference between jamming it all into one file and splitting it up, or will a modern compiler like gcc create the same binaries for both? By performance difference I mean file size, compile time, and running time.
This is for C only.
Arguably, compile times improve with multiple files, as you only need to recompile files that have changed (assuming you have a decent dependency-tracking build system).
Linking would probably take longer, as there's just more to do.
Traditionally, compilers have been unable to perform optimizations across multiple source files (things like inlining functions across file boundaries are tricky). So the resulting executable is likely to be different, and potentially slower.
There are more opportunities for optimization when everything is in a single file. E.g. gcc, starting with -O2, will inline some functions if their body is available, even if they aren't declared inline (even more functions are eligible for inlining with -O3). So there are differences in run time, and sometimes you even have a chance to notice them. Even more so with -fwhole-program, telling GCC that you don't care about out-of-line versions of external functions except main() (GCC behaves as if all your external functions became static).
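A tiny sketch of the effect (hypothetical functions):

/* speed.c - everything in one translation unit */
static int square(int x) { return x * x; }   /* body visible to the caller */

int sum_of_squares(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += square(i);        /* gcc -O2 can inline this call and then
                                  simplify the loop body                 */
    return s;
}

If square() lived in another .c file, a traditional (non-LTO) build could not inline it here; compare the assembly from gcc -O2 -S speed.c in both layouts.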
Overall compile time may increase (because there is more stuff to analyze, and not all optimizer algorithms are linear) or decrease (when there's no need to parse the same headers multiple times). Binary size may increase (due to inlining, in exchange for running faster) or decrease (less likely; but sometimes, inlining simplifies caller's code to the point where code size decreases).
As for ease of development and maintenance, you can use sqlite's approach: it has multiple source files, but they are jammed into one (the "amalgamation") before compilation.
From some quick tests, compiling and linking took longer. The two builds produced different binaries, at least for me, although their sizes were within a byte of each other.
The all-in-one file version ran in 0.000764 ms.
The multiple-files version ran in 0.000769 ms.
Do take the benchmark with a grain of salt, as I did put it together in about 5 minutes, and it was a tiny program.
So really no differences overall.
My C headers usually resemble the following style to avoid multiple inclusion:
#ifndef <FILENAME>_H
#define <FILENAME>_H
// define public data structures / prototypes, macros etc.
#endif /* !<FILENAME>_H */
However, in his Notes on Programming in C, Rob Pike makes the following argument about header files:
There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.
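For reference, the "dance" Pike describes puts the guard at the point of inclusion instead of inside the header, roughly like this (foodefs.h is just an example name):

/* foo.c - the includer checks the guard, so the header is never even
   opened a second time */
#ifndef FOODEFS_H
#include <foodefs.h>
#endif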
On the one hand, Pike is the only programmer I actually admire. On the other hand, putting several #ifdefs in multiple source files instead of putting one #ifdef in a single header file feels needlessly awkward.
What is the best way to handle the problem of multiple inclusion?
In my opinion, use the method that requires less of your time (which likely means putting the #ifdefs in the header files). I don't really mind if the compiler has to work harder if my resulting code is cleaner. If, perhaps, you are working on a multi-million line code base that you constantly have to fully rebuild, maybe the extra savings is worth it. But in most cases, I suspect that the extra cost is not usually noticeable.
Keep doing what you do - it's clear, less bug-prone, and well known by compiler writers, so not as inefficient as it perhaps was a decade or two ago.
You could use the non-standard #pragma once - If you search, there's probably at least a bookshelf's worth of include guards vs pragma once discussion, so I'm not going to recommend one over the other.
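For completeness, the pragma version is just (non-standard, but widely supported):

/* my_header.h */
#pragma once

/* declarations, macros, etc. */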
Pike wrote some more about it in https://talks.golang.org/2012/splash.article:
In 1984, a compilation of ps.c, the source to the Unix ps command, was observed to #include <sys/stat.h> 37 times by the time all the preprocessing had been done. Even though the contents are discarded 36 times while doing so, most C implementations would open the file, read it, and scan it all 37 times. Without great cleverness, in fact, that behavior is required by the potentially complex macro semantics of the C preprocessor.
Compilers have become quite clever since: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html, so this is less of an issue now.
The construction of a single C++ binary at Google can open and read hundreds of individual header files tens of thousands of times. In 2007, build engineers at Google instrumented the compilation of a major Google binary. The file contained about two thousand files that, if simply concatenated together, totaled 4.2 megabytes. By the time the #includes had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of 2000 bytes for every C++ source byte.
As another data point, in 2003 Google's build system was moved from a single Makefile to a per-directory design with better-managed, more explicit dependencies. A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded. Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically, and today we still do not have an accurate understanding of the dependency requirements of large Google C++ binaries.
The point about binary sizes is still relevant. Compilers (linkers) are quite conservative about stripping unused symbols. See: How to remove unused C/C++ symbols with GCC and ld?
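One commonly used GNU-toolchain recipe for that (a sketch; the file names are placeholders): put each function and data object into its own section, then let the linker discard the sections nothing references.

gcc -ffunction-sections -fdata-sections -c foo.c bar.c
gcc foo.o bar.o -Wl,--gc-sections -o app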
In Plan 9, header files were forbidden from containing further #include clauses; all #includes were required to be in the top-level C file. This required some discipline, of course—the programmer was required to list the necessary dependencies exactly once, in the correct order—but documentation helped and in practice it worked very well.
This is a possible solution. Another possibility is to have a tool that manages the includes for you, for example MakeDeps.
There are also unity builds, sometimes called SCU (single compilation unit) builds. There are tools to help manage that, like https://github.com/sakra/cotire
Using a build system that optimizes for the speed of incremental compilation can be advantageous too. I am talking about Google's Bazel and similar. It does not protect you from a change in a header file that is included in a large number of other files, though.
Finally, there is a proposal for C++ modules in the works, great stuff https://groups.google.com/a/isocpp.org/forum/#!forum/modules. See also What exactly are C++ modules?
The way you're currently doing it is the common way. Pike's method cuts compilation time a bit, but with modern compilers probably not by much (when Pike wrote his notes, compilers weren't optimizer-bound); it also clutters modules and it's bug-prone.
You could still cut on multi-inclusion by not including headers from headers, but instead documenting them with "include <foodefs.h> before including this header."
I recommend you put them in the file itself. With modern PCs there is no need to worry about a few thousand needlessly parsed lines of code.
Additionally, it is far more work and more source code if you have to add the check for every single header in every source file that includes it.
And you would have to treat your own header files differently from the standard and other third-party headers.
He may have had a point at the time he wrote this. Nowadays, decent compilers are clever enough to handle this well.
I agree with your approach - as others have commented, it's clearer, self-documenting, and lower maintenance.
My theory on why Rob Pike might have suggested his approach: He's talking about C, not C++.
In C++, if you have a lot of classes and you are declaring each one in its own header file, then you'll have a lot of header files. C doesn't really provide this kind of fine-grained structure (I don't recall seeing many single-struct C header files), and .h/.c file-pairs tend to be larger and contain something like a module or a subsystem. So, fewer header files. In that scenario Rob Pike's approach might work. But I don't see it as suitable for non-trivial C++ programs.
I've been working for some time with an open-source library ("Fast Artificial Neural Network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including some headers I need, and I did not touch the code of the lib itself).
My question: is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design with no gain from this approach?
I'm aware of this related question, but it does not answer mine. I'm looking for reasons that might justify this.
A bonus question: is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
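For illustration, the "multiple binaries from one implementation" pattern looks roughly like this (a made-up reconstruction with placeholder names, not FANN's actual code):

/* double_build.c - one thin wrapper per binary */
#define REAL double      /* a different configuration in each wrapper file */
#include "impl.c"        /* the shared implementation, recompiled with the
                            configuration chosen above                      */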
They provide makefiles and MSVS projects which should already avoid linking doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c), and so on; either their build files are broken, or something similar has gone wrong on your side.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al., and possibly just removing those from the project is enough - then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever define a single global variable or function in that .c file, it cannot be included in two places which both compile into the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
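Concretely (a hypothetical example):

/* shared.c - included from two other .c files in the same binary */
int helper(void) { return 1; }               /* external linkage: two copies ->
                                                duplicate-symbol errors or
                                                warnings at link time          */
static int quiet_helper(void) { return 2; }  /* internal linkage: each includer
                                                gets its own private copy, so
                                                the linker stays quiet         */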
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files define no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
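For example, a definition that is safe to put in a header might look like this (a sketch with made-up names, combining both qualifiers):

/* mylib.h */
#ifndef MYLIB_H
#define MYLIB_H

static inline int mylib_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

#endif /* MYLIB_H */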
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)