Do C compilers remove duplicate common includes from included libraries? - c

I am new to C and am just learning the basics of modularising my code for neatness and maintainability. I keep reading that I should not include .c files directly, but should instead use .h files with associated .c files.
My question is, when writing a library which is exposed/included via its .h file - does the compiler dedupe common includes, or are they included each time they are referenced?
For instance, in my application I am using printf in my main and also in my foo library.
When running:
gcc -o app main.c foo/foo.c && ./app
I get the expected outputs printed to the console, however does the compiler remove duplicates of the <stdio.h> include or is it included once for main.c and once again for foo.c?

No, the compiler does not remove them. Nor should it, because sometimes (although it's rare) headers are written with the purpose of being included several times with different effects each time. So the compiler can't just omit these subsequent inclusions.
That's why people put include guards in headers (#ifndef FOO_H_ in this case.)
Each file, regardless of whether it is a .h or .c file, should include what it needs. It should not rely on a header having already been included somewhere else. If a header would otherwise end up in the current compilation unit more than once, its include guard ensures it is only expanded the first time, no matter how many files include it.
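For example, a minimal guarded header might look like this (the names are illustrative):

/* foo.h */
#ifndef FOO_H_
#define FOO_H_

void foo(void);   /* declaration only; the definition lives in foo.c */

#endif /* FOO_H_ */

The second and later times the file is pulled into the same compilation unit, FOO_H_ is already defined, so the preprocessor skips everything between #ifndef and #endif.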
As a side note, #pragma once, even though it's not in the C standard, is a de-facto standard compiler extension. So you can just do:
#pragma once
void foo();
It's one of those rare cases where a non-standard compiler extension is so widely supported that it's safe to use.

On the contrary, each compilation unit ("main.c" and "foo.c" in your case) needs that include. Otherwise the compiler would not know the prototype of printf() (see note). Each compilation unit (aka "module") is compiled on its own.
You might be mixing up headers and linkable files (object code files and libraries).
The contents of a header file replace the #include line during preprocessing. "stdio.h" contains the prototype of printf(), among a lot of other declarations, but not the implementation of the function.
When the compiler generates the object code for "main.c" and "foo.c", each of them contains an unresolved reference to printf().
Finally, the linker will include the object code for printf(), but just once. This single instance of the function is called by both callers. This is where the deduplication you are asking about actually happens.
You might wonder why you don't have to add the library to your command line. This is a convenience feature of most compiler drivers, as nearly all applications want the standard libraries. You might like to add "-v" to the command line to see what really happens. Other options can suppress this automation.
Note: Some compilers are quite smart and know a lot of standard functions. They will accept the source and produce a nice warning. But don't rely on this.
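To make that concrete, here is a minimal sketch of the layout the question describes (the file contents are assumed, since the original code isn't shown):

/* foo/foo.h */
#ifndef FOO_H_
#define FOO_H_
void foo(void);
#endif

/* foo/foo.c */
#include <stdio.h>      /* this unit needs its own view of printf's prototype */
#include "foo.h"
void foo(void) { printf("hello from foo\n"); }

/* main.c */
#include <stdio.h>      /* included again here: each unit is compiled on its own */
#include "foo/foo.h"
int main(void) { printf("hello from main\n"); foo(); return 0; }

Compiling with gcc -c main.c foo/foo.c yields main.o and foo.o, each containing an unresolved reference to printf; linking with gcc -o app main.o foo.o resolves both against the single copy in the C library.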

Related

Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file code get copied into the final executable, or is machine code generated only for the specific function? For example, if I call std::sort from the <algorithm> header in C++, is the machine code generated only for the sort() function or for the entire <algorithm> header file?
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code of sort (and functions it calls, if any) to the executable. The other algorithms' code doesn't get added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
In fact, the entire file is copied into the .cpp file, and it depends on the compiler/linker whether it picks up only the 'needed' functions or all of them.
In general, simplified summary:
a debug configuration compiles in all non-template functions,
a release configuration strips all unneeded functions.
Plus it depends on attributes: a function declared for export will never be stripped.
On the other hand, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

What is the difference between stdio.c and stdio.h?

Couldn't stdio functions and variables be defined in header files, without having to use .c files?
If not, what are .c files used for?
The functions declared in the header file have to be implemented somewhere. The .c files contain the implementation, though these have already been compiled into a static or shared library that your compiler can use.
The header file should contain a minimal description of the function to save time when compiling. If it included the entire source it'd force the compiler to rebuild it each and every time you compile which is really wasteful since that source never changes.
In effect, the header file serves as a cheat sheet on how to interact with the already compiled library.
The reason the .c files are provided is primarily for debugging, so your debugger can step through in your debug build and show you source instead of raw machine code. In rare cases you may want to look at the implementation of a particular function in order to better understand it, or in even more rare cases, identify a bug. They're not actually used to compile your program.
In your code you should only ever reference the header file version, the .h via an #include directive.
stdio.h is a standard header, required to be provided by every conforming hosted C implementation. It declares, but does not define, a number of entities, mostly library functions like putchar and scanf.
stdio.c, if it exists, is likely to be a C source file that defines the functions declared in stdio.h. There is no requirement that an implementation must make it available. It might not even exist; for example the implementations of the functions declared in stdio.h might appear in multiple *.c files.
The declaration of putchar is:
int putchar(int c);
and that's all the compiler needs to know when it sees a call to putchar in your program. The code that implements putchar is typically provided as machine code, and the linker's job is to resolve your putchar() call so it ends up invoking that code. putchar() might not even be written in C (though it probably is).
An executable program can be built from multiple *.c source files. One and only one copy of the code that implements putchar is needed for an entire program. If the implementation of putchar were in the header file, then it would be included in each separately compiled source file, creating conflicts and, at best, wasting space. The code that implements putchar() (and all the other functions in the library) only needs to be compiled once.
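A hedged sketch of what goes wrong otherwise (the file and function names are hypothetical):

/* bad.h -- a definition, not just a declaration, in a header */
#ifndef BAD_H_
#define BAD_H_
int twice(int x) { return 2 * x; }   /* function body in a header */
#endif

/* a.c */
#include "bad.h"

/* b.c */
#include "bad.h"
int main(void) { return twice(21); }

a.o and b.o each carry their own copy of twice, so linking them together fails with a multiple-definition error; the include guard only protects a single translation unit, not the program as a whole.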
Each .c file contains functions for a specific purpose. For example, stdio.c contains the standard input/output functions used within a C program, while the stdio.h header contains the function prototypes for all of stdio.c's functions, plus the defines, macros, etc. When you #include <stdio.h> in your main code.c file, your code can assume there is an int printf(const char *format, ...) function: it returns an int value and you can pass it arguments, etc. When you actually call printf(), you are using the code from stdio.c.
There are languages where if you want to make use of something someone else has written, you say something like
import module
and that takes care of everything.
C is not one of those languages.
You could put "library" source code in a file, and then use #include to pull it in wherever you needed it. But this wouldn't work at all, for two reasons:
If you used #include to pull it in from two different source files, and then linked the two resulting object files together, everything in the "library" would be defined twice.
You might not want to deliver your "library" code as source; you might prefer to deliver it in compiled, object form.
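A sketch of that delivery model (the file names here are hypothetical):

gcc -c mylib.c                   # compile the implementation once into mylib.o
ar rcs libmylib.a mylib.o        # bundle the object code into a static library
gcc -o app main.c -L. -lmylib    # consumers include mylib.h and link your binary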

How to correctly include own libraries in function files and project files

I got stuck trying to do Exercise 8-3 of K&R. The goal of the exercise is to rewrite some functions of stdio.h, such as fopen, fclose, fillbuf and flushbuf.
Here's how my source files are organized:
stdio.h: contains types and macro definitions, and the declarations of some functions proper to the library. All content of the file is enclosed between #ifndef/#endif lines as follows:
#ifndef STDIO_H
#define STDIO_H
/* content of stdio.h */
#endif
myfunction.c: I have a .c file per function; each file has a #include "stdio.h" line to load all needed type definitions.
main.c: where I have code to test my functions, the main.c also has a #include "stdio.h" line.
my problem is the following: when I try to compile all my files using gcc, I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included (_iob is a variable I only defined inside my stdio.h)... why is this happening? I thought the #ifndef line was specifically meant to prevent such errors.
more generally:
How would you go about making your own header files and library/function files and using them in your projects?
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions?
Please become aware of the difference between a library and its header files.
A library is a (collection of) binary machine code (with some additional meta-data, e.g. relocation directives to the linker).
For example, on my Linux system, dynamic libraries are generally shared objects (e.g. /usr/lib/x86_64-linux-gnu/libgmp.so) and it makes absolutely no sense to try some preprocessor directive like #include "libgmp.so" //wrong.
But a library has some API. That API is given by some documentation and by some header file(s), e.g. gmp.h and you should #include "gmp.h" in any C code (your C translation unit) which uses it.
myfunction.c: I have a .c file per function
Having one file per function is often poor taste. You generally can group related functions. For example, in your case, you probably want to define your myfopen and myfclose functions in the same myopenclose.c translation unit (even if you don't have to) because these two functions are intimately related. As a rule of thumb, I prefer having source files of one or a few thousand lines each (but that is really a matter of taste, and some people like having many small files).
Remember that what the compiler really sees is the preprocessed form of code. Consider asking your compiler to produce that form (e.g. from foo.c you can get its preprocessed form foo.i with gcc -C -E -Wall foo.c > foo.i on my Linux desktop) and look into it. Try that on your own files (e.g. your myopenclose.c if you have one).
If you have many small files, the compiler is probably including the same headers in each of them, and these included declarations get compiled every time. BTW, notice that gcc is only a driver program. Use it with the -v flag. You'll see that it runs cc1 (the C compiler proper), as (the assembler), ld (the linker), etc.
I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h).
You probably should declare extern your _iob global variable in your stdio.h and define a global _iob in only one implementation file (perhaps myopenclose.c, if it is relevant) of your library.
Don't confuse definition and declaration (of variables, functions, types, etc.). Spend some time reading the C11 standard n1570; these words are defined there. As a rule of thumb, declarations go into header .h files and definitions (of variables and functions) into implementation .c files (of course the details are more complex; you often, but not always, define types and structs in header files).
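A minimal sketch of that split, using a simplified stand-in for the real FILE structure (the MYFILE type and its fields are purely illustrative):

/* stdio.h (your version) */
#ifndef STDIO_H
#define STDIO_H
#define OPEN_MAX 20
typedef struct { int fd; int flag; } MYFILE;  /* simplified, illustrative stand-in */
extern MYFILE _iob[OPEN_MAX];  /* declaration: names the array, allocates no storage */
#endif

/* myopenclose.c -- exactly one implementation file holds the definition */
#include "stdio.h"
MYFILE _iob[OPEN_MAX];         /* definition: storage is allocated here, once */

Every translation unit that includes the header sees the same extern declaration, but the linker sees exactly one definition, so the multiple-definition error disappears.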
I strongly recommend using some Linux distribution (it is very developer- and student- friendly) and studying the source code of some existing free software C standard library (like musl-libc, whose code is quite readable). More generally, study the source code of existing free software projects (e.g. on github). They will inspire you.
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions?
This shows a lot of confusion (the above question does not make any sense). Read more about compilers (your cc1 program, started by gcc, translates a .c file into some object file .o) and about linkers (your ld, generally started by gcc, agglomerates several object files, processes relocations inside them, and produces an ELF library or an executable). The preprocessing (e.g. of the #include directive) is done at compile time by cc1. The linker cannot see any header files (it only deals with object files and libraries).
If you rewrite some of the system declarations and functions, while at the same time including the system declarations, you can expect some collisions.
Header files (.h) contain code (usually only declarations), and the mechanism you describe (#ifndef STDIO_H) is there to prevent multiple inclusions of the same header file - mainly because another include file (header) that has already been loaded might also include it. That would result in the same kind of collision as you had.
In C, you could, for instance
make a new header file that contains your own declarations + the stdio ones that don't collide with yours
use the stdio declarations, and only write new functions that use the same structures, defines, enums etc... as stdio
rewrite the necessary declarations and code so that you no longer need to include the system headers at all
use another naming convention, like my_iob in both your header file, and in your code.
The last two are probably the best in your case, since you still have some collisions coming from a header file.
For instance, your code might not include stdio.h, but another header file you include might do it, indirectly...

Do including header files make the program heavier in c

Does including header files such as stdio.h, conio.h or any other make our code or program heavier?
Including header files inserts all of their content into the translation unit during preprocessing.
If the header contains only declarations (which is usually the case) and the functions are implemented in library files, the code doesn't get heavier. If the header files include implementations, those get compiled too, making the file heavier.
You can read more about compilation steps here:
http://www.tenouk.com/ModuleW.html
Why don't you try including them and building an EXE, then not including them and building an EXE, and see what happens to the file size. I suspect you'll find that the linker is clever enough to only build what's necessary into the EXE.
For typical release builds, no. Header files typically only contain declarations, and unused declarations do not contribute to the size of release builds. Header files can contain inline function definitions, which might be emitted by your compiler if the definition is not optimized out. However, this will probably not apply to system headers like <stdio.h>.
For typical debug builds, yes. Debugging data often contains information about declarations even if those declarations aren't used in the program. This way you can use those declarations in the debugger. Modern debuggers include function definitions, enums, and even preprocessor definitions these days.
That said, you can put anything in a header file.
The primary effect of unnecessary header file inclusions is to make the build process take longer.
Normally a header file contains declarative statements only. Declarations in themselves do not add to code size. Referencing the declared symbols anywhere in the code will cause the linker to include the code defining the declared symbols.
Because a header file must be parsed by the compiler in the context of the code in which it is included, the build time can be extended by header file inclusion. Very large or deeply nested headers such as windows.h for example can have a considerable effect in this way.
Additionally if you are using an IDE with code navigation and comprehension facilities such as auto-complete and syntax checking, the IDE must parse the code in a similar manner to the compiler, and here again header files can slow that process down.
While you should avoid including unnecessary header files, those containing declarations used in the code are unavoidable - or at least avoiding them will likely lead to errors and repetition, and make the code hard to maintain.

Do I need to compile the header files in a C program?

Sometimes I see someone compile a C program like this:
gcc -o hello hello.c hello.h
As far as I know, we just need to include the header files in the C program like:
#include "somefile"
and compile the C program: gcc -o hello hello.c.
When do we need to compile the header files or why?
Firstly, in general:
If these .h files are indeed typical C-style header files (as opposed to being something completely different that just happens to be named with .h extension), then no, there's no reason to "compile" these header files independently. Header files are intended to be included into implementation files, not fed to the compiler as independent translation units.
Since a typical header file usually contains only declarations that can be safely repeated in each translation unit, it is perfectly expected that "compiling" a header file will have no harmful consequences. But at the same time it will not achieve anything useful.
Basically, compiling hello.h as a standalone translation unit is equivalent to creating a degenerate dummy.c file consisting only of the #include "hello.h" directive, and feeding that dummy.c file to the compiler. It will compile, but it will serve no meaningful purpose.
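That degenerate file would literally be just the following (hello.h taken from the question's command line):

/* dummy.c -- compiling this is the same as feeding hello.h to the compiler */
#include "hello.h"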
Secondly, specifically for GCC:
Many compilers will treat files differently depending on the file name extension. GCC has special treatment for files with .h extension when they are supplied to the compiler as command-line arguments. Instead of treating it as a regular translation unit, GCC creates a precompiled header file for that .h file.
You can read about it here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
So, this is the reason you might see .h files being fed directly to GCC.
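For instance, per the GCC documentation linked above:

gcc hello.h            # produces hello.h.gch, a precompiled header
gcc -o hello hello.c   # GCC uses hello.h.gch automatically wherever it
                       # would otherwise have read hello.h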
Okay, let's understand the difference between active and passive code.
The active code is the implementation of functions, procedures, methods, i.e. the pieces of code that should be compiled to executable machine code. We store it in .c files, and of course we need to compile it.
The passive code is not executed itself, but it is needed to tell the different modules how to communicate with each other. Usually, .h files contain only prototypes (function headers) and structures.
An exception is macros, which formally can contain active pieces, but you should understand that they are applied at a very early stage of building (preprocessing) by simple substitution. By compile time, macros have already been substituted into your .c file.
Another exception is C++ templates, which should be implemented in .h files. But here the story is similar to macros: they are expanded at an early stage (instantiation), and formally each instantiation is a distinct type.
In conclusion, I think that if the modules are formed properly, we should never compile the header files.
When we include a header file like this: #include <header.h> or #include "header.h", the preprocessor takes it as input and includes the entire file in the source code; the preprocessor replaces the #include directive with the contents of the specified file.
You can check this with the -E flag to GCC, which stops after preprocessing and lets you capture the .i (preprocessed source) file, or you can run cpp (the C preprocessor on Linux) directly, which the compiler driver invokes automatically when we execute GCC.
So it actually gets compiled along with your source code; there is no need to compile it separately.
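For example, assuming a main.c that includes <stdio.h>:

gcc -E main.c -o main.i   # stop after preprocessing; main.i holds the full
                          # translation unit with every #include expanded
wc -l main.i              # typically thousands of lines, mostly from stdio.h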
On some systems, attempts to speed up the translation of fully resolved '.c' files refer to the pre-processing of include files as "compiling header files". However, it is an optimization technique that is not necessary for actual C development.
Such a technique basically computed the include statements and kept a cache of the flattened includes. Normally the C toolchain will cut-and-paste in the included files recursively, and then pass the entire item off to the compiler. With a pre-compiled header cache, the tool chain will check to see if any of the inputs (defines, headers, etc) have changed. If not, then it will provide the already flattened text file snippets to the compiler.
Such systems were intended to speed up development; however, many such systems were quite brittle. As computers sped up, and source code management techniques changed, fewer of the header pre-compilers are actually used in the common project.
Until you actually need compilation optimization, I highly recommend you avoid pre-compiling headers.
I think we do need to preprocess (though maybe not "compile") the header file, because from my understanding, during the compile stage the header file is included into the .c file. For example, in test.h we have
typedef enum {
    a,
    b,
    c
} test_t;
and in test.c we have
void foo()
{
test_t test;
...
}
During compilation, I think the compiler puts the code in the header file and the .c file together: the code in the header file is preprocessed and substituted into the .c file. Meanwhile, we'd better define the include path in the makefile.
You don't need to compile header files. Compiling one doesn't actually produce anything useful, so there's no point. However, it can be a quick way to check for typos, mistakes, and bugs, which makes things easier later.
