Included files, all or nothing? - c

If I #include a file in C, do I get the entire contents of the file linked in, or just the parts I use?
If it has 10 functions in it, and I only use one of the functions, does the code for the other nine functions get included in my executable? This is especially relevant for me right now as I am working on a microcontroller and memory is precious.

Firstly, header files do not get "linked in". #include is basically a textual copy-paste feature. Everything from your include file gets pasted by the preprocessor into the final translation unit, which will later be seamlessly processed by the compiler proper. The compiler proper knows nothing about any header files or #include directives.
Secondly, it means that if in your code you declared or defined some function or variable that you do not use, it is completely irrelevant whether it came from a header file through #include or was written directly in source file. There's absolutely no difference.
Thirdly, the question is: what exactly do you have in your header file that you include? Typically, header files do not define objects and functions, they simply declare them. Declarations do not produce any code, regardless whether you use the function or not. Declarations simply tell the compiler that the code (generated from the function definition) already exists elsewhere. Thus, as long as we are talking about typical header files, #include directives and header files by themselves have no effect on final code size.
Fourthly, if your header file is of some unusual kind that contains function (or object) definitions, then see "firstly" and "secondly" above. The compiler proper can see only one translation unit at a time, for which reason a typical strategy for the compiler proper is to completely discard unused entities with internal linkage (i.e. static objects and functions) and keep all entities with external linkage. Entities with external linkage cannot be discarded by the compiler proper, since they might be needed in some other translation unit.
Fifthly, at the linking stage the linker can see the program in its entirety and, for that reason, can discard unused objects and functions, if it is advanced enough for that (and if you allow the linker to do it). Meanwhile, the inclusion-exclusion precision of a typical run-of-the-mill linker is limited to a single object file. Each object file is atomic to such a linker. This means that if you want to be able to exclude unused functions on a per-function basis, you might have to adopt a "one function per object file" strategy, i.e. write one and only one function per .c file. Of course, this is only possible when you write your own code. If some third-party library you want to use does not adhere to this convention, then you might not be able to exclude individual functions.
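A minimal sketch of the cases above, written as the single translation unit the compiler proper would see after preprocessing. All function names are hypothetical. The build commands in the closing comment use GCC and GNU ld options (-ffunction-sections, -fdata-sections, --gc-sections) as one common way to get per-function discarding; whether your microcontroller toolchain supports them is an assumption you would need to check.

```c
/* --- what a typical header contributes: declarations only --- */
int sensor_read(void);      /* declaration: emits no code             */
int unused_api(int x);      /* declared, defined below, never called  */

/* --- definitions, as if written in (or pasted into) this .c file --- */

/* Internal linkage and never referenced: the compiler may drop it,
 * since no other translation unit can possibly refer to it. */
static inline int unused_helper(int x) { return x * 2; }

/* Internal linkage and used: kept (or inlined into its caller). */
static int scale(int raw) { return raw * 3; }

/* External linkage and used by the program. */
int sensor_read(void) { return scale(14); }

/* External linkage but unused. The compiler must still emit it, because
 * some other translation unit might call it. Only the linker can decide
 * that it is dead. */
int unused_api(int x) { return x + 1; }

/* One possible build that lets GNU ld drop unused_api (file names are
 * placeholders):
 *   gcc -Os -ffunction-sections -fdata-sections -c sensors.c
 *   gcc -Wl,--gc-sections sensors.o main.o -o app
 */
```

With those options each function lands in its own section, so the linker can discard at function granularity without the source being split into one file per function.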

If you #include a file in C, the entire contents of that file are added to your source file and compiled by your compiler. A header file, though, usually only has declarations of functions and no definitions (so no actual code is compiled).
The linker, on the other hand, takes the functions from the libraries and compiled source code and merges them into the final output file. At this time, the linker can discard functions that you aren't using, although how finely it does so (per function or per object file) depends on the linker and its options.
So, to answer your question: with a linker that removes unused code, only the functions you use (and indirectly depend on) will be included in your final program file, and this is independent of what files you #include. Happy hacking!

You have to distinguish between different scenarios:
What does the included header file contain? Declarations of external functions only, or also static function definitions?
How are the implementations of the external functions that are declared in the header file you include distributed? Are they all implemented in one .c file, or distributed across several .c files?
Regarding point 1: Merely #including external declarations does not make any other code part of your object file. And definitions of static functions that are part of the header file, but which are not referenced by your code, may not become part of your object file - this is an optimization that is fairly common. It depends on your compiler, however.
Regarding point 2: Some linkers can only link whole object files, all or nothing. That means, if all the external functions declared in a header file are implemented in one .c file, and, if your code references at least one of these functions, chances are that you will get the whole object file, including all the other functions you don't use. Some linkers, however, can avoid this and remove unused parts when linking object files.
One brute-force approach to deal with non-optimizing linkers is to put every external function into a .c file of its own. You will, however, have to find a way to deal with the situation that some of these functions refer to a common static function that is part of the original .c file...
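One possible way around the shared static helper, sketched here under assumptions: move the helper into a private header as a static inline function, so each split-out .c file gets its own copy. The file names and functions (mylib_private.h, clamp_u8, brighten, darken) are hypothetical, and the three "files" are shown in one block only to keep the example compilable as written.

```c
/* ---- mylib_private.h (hypothetical; not given to users of the library) ---- */
#ifndef MYLIB_PRIVATE_H
#define MYLIB_PRIVATE_H
/* Formerly a plain static function shared inside the one big .c file. */
static inline int clamp_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }
#endif

/* ---- brighten.c: would start with #include "mylib_private.h" ---- */
int brighten(int pixel) { return clamp_u8(pixel + 40); }

/* ---- darken.c: would start with #include "mylib_private.h" ---- */
int darken(int pixel) { return clamp_u8(pixel - 40); }
```

The trade-off is that every object file using the helper may carry its own copy of it. The alternative is to give the helper external linkage, put it in a .c file of its own, and declare it only in the private header.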

Include simply presents the compiler ultimately with what looks like a single file (and if you use -save-temps with GCC you will see that exactly a single file is presented to the actual compiler). It is no more complicated than that. So if you have some function prototypes or defines in your .c file then having them come from an include makes no difference whatsoever; the end result is the same.
If the things you include include code, functions, and not just prototypes, then it is the same as if you had those in the .c file itself. Whether or not those show up in the final binary has to do with whether or not you declared them as global or not using static, and then whether or not you optimized, etc. The same goes for variables and structures and other things.
Not all linkers are the same, but a common way to do it is that whatever the compiler left in the object goes into the final binary. But if you take those objects and make a library out of them, then some/many(?) linkers don't suck everything into the binary, only the portions that are required to resolve the dependencies.

Related

Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file code get copied to the final executable or only the machine code for the specific function is generated. For example, if I call std::sort from the <algorithm> header in C++, is the machine code generated only for the sort() function or for the entire <algorithm> header file.
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code for sort (and the functions it calls, if any) to the executable. The other algorithms' code isn't getting added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
In fact, the entire file is copied into .cpp file, and it depends on compiler/linker, if it picks up only 'needed' functions, or all of them.
In general, simplified summary:
debug configuration means compiling in all of non-template functions,
release configuration strips all unneeded functions.
Plus it depends on attributes -> a function declared for export will never be stripped.
On the other side, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

How to correctly include own libraries in function files and project files

I got stuck trying to do Exercise 8-3 of K&R, the goal of the exercise is to rewrite some functions of stdio.h such as fopen, fclose, fillbuf and flushbuf
here's how my source files are organized:
stdio.h: contains types and macro definitions, and the declarations of some functions proper to the library. all content of the file is enclosed between #ifndef #endif lines as follows:
#ifndef STDIO_H
#define STDIO_H
/* content of stdio.h */
#endif
myfunction.c: I have a .c file per function, each file has a #include "stdio.h" line to load all needed types definitions.
main.c: where I have code to test my functions, the main.c also has a #include "stdio.h" line.
my problem is the following: when I try to compile all my files using gcc I run to the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h)...why is this happening ? I thought the #ifndef line was to specifically prevent such errors.
more generally:
How would you go about making your own header files and library/function files and using them in your projects ?
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?
Please become aware of the difference between a library and its header files.
A library is a (collection of) binary machine code (with some additional meta-data, e.g. relocation directives to the linker).
For example, on my Linux system, dynamic libraries are generally shared objects (e.g. /usr/lib/x86_64-linux-gnu/libgmp.so) and it makes absolutely no sense to try some preprocessor directive like #include "libgmp.so" //wrong.
But a library has some API. That API is given by some documentation and by some header file(s), e.g. gmp.h and you should #include "gmp.h" in any C code (your C translation unit) which uses it.
myfunction.c: I have a .c file per function
Having one file per function is often poor taste. You generally can group related functions. For example, in your case, you probably want to define your myfopen and myfclose functions in the same myopenclose.c translation unit (even if you don't have to) because these two functions are intimately related. As a rule of thumb, I prefer having source files of one or a few thousand lines each (but that is really a matter of taste, and some people like having many small files).
Remember that what the compiler really sees is the preprocessed form of code. Consider asking your compiler to produce that form (e.g. from foo.c you can get its preprocessed form foo.i with gcc -C -E -Wall foo.c > foo.i on my Linux desktop) and look into it. Try that on your own files (e.g. your myopenclose.c if you have one).
If you have many small files, the compiler is probably including the same headers in each of them, and these included declarations get compiled every time. BTW, notice that gcc is only a driver program. Use it with the -v flag. You'll see that it is running cc1 (the C compiler proper), as (the assembler), ld (the linker), etc.
I run to the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h).
You probably should declare extern your _iob global variable in your stdio.h and define a global _iob in only one implementation file (perhaps myopenclose.c, if it is relevant) of your library.
Don't confuse definition and declaration (of variables, functions, types, etc.). Spend some time reading the C11 standard n1570. These words are defined there. As a rule of thumb, declarations should go into header .h files, definitions (of variables and functions) in implementation .c files (of course details are much more complex, you often but not always define types and struct in header files).
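A minimal sketch of that declaration/definition split. The names (mystdio.h, MY_FILE, my_iob, MY_OPEN_MAX, my_slots_in_use) are made up for illustration and are not the K&R ones; the separate "files" are shown in one block so the example compiles as written.

```c
/* ---- mystdio.h (hypothetical name) ---- */
#ifndef MYSTDIO_H
#define MYSTDIO_H
#define MY_OPEN_MAX 20
typedef struct my_file { int fd; int flag; } MY_FILE;
extern MY_FILE my_iob[MY_OPEN_MAX];  /* declaration only: reserves no storage */
int my_slots_in_use(void);
#endif

/* ---- myiob.c: the ONE file that defines the array ---- */
MY_FILE my_iob[MY_OPEN_MAX] = {
    { 0, 1 },   /* stdin-like slot  */
    { 1, 1 },   /* stdout-like slot */
    { 2, 1 }    /* stderr-like slot */
};              /* remaining slots are zero-initialized */

/* ---- any other .c file: includes the header and simply uses the array ---- */
int my_slots_in_use(void) {
    int n = 0;
    for (int i = 0; i < MY_OPEN_MAX; i++)
        if (my_iob[i].flag != 0)
            n++;
    return n;
}
```

The include guard only stops a header from being pasted twice into the same translation unit. It cannot help across translation units: if the header held the definition, every .c file that included it would define its own array, and the linker would report the multiple definition.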
I strongly recommend using some Linux distribution (it is very developer- and student- friendly) and studying the source code of some existing free software C standard library (like musl-libc, whose code is quite readable). More generally, study the source code of existing free software projects (e.g. on github). They will inspire you.
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?
This shows a lot of confusion (the above question does not make any sense). Read more about compilers (your cc1 program -started by gcc- is translating a .c file into some object file .o) and about linkers (your ld, generally started by gcc, is agglomerating several object files, processing relocations inside them, and producing an ELF library or an executable). The preprocessing (e.g. of #include directive) is done at compile time by cc1. The linker cannot see any header files (it only deals with object files or libraries).
If you rewrite some of the system declarations and functions, while at the same time including the system declarations, you can expect some collisions.
Header files (.h) contain code (usually only declarations) and the mechanism you describe (#ifndef STDIO_H) is to prevent multiple inclusions of the same header file - mainly because another include file (header) that has already been loaded might also include it. That results in the same kind of collision as you had.
In C, you could, for instance
make a new header file that contain your own declarations + the stdio ones that don't collide with yours
use the stdio declarations, and only write new functions that use the same structures, defines, enums etc... as stdio
rewrite the necessary declarations and code that allows you not to include the system headers anymore
use another naming convention, like my_iob in both your header file, and in your code.
The two last ones are probably the best in your case, since you still have some collisions coming from a header file.
For instance, your code might not include stdio.h, but another header file you include might do it, indirectly...

#include and what actually compiles

This is just a general compiler question, directed at C based languages.
If I have some code that looks like this:
#include "header1.h"
#include "header2.h"
#include "header3.h"
#include "header4.h" //Header where #define BUILD_MODULE is located
#ifdef BUILD_MODULE
//module code to build
#endif //BUILD_MODULE
Will all of the code associated with those headers get built even if BUILD_MODULE is not defined? The compiler just "pastes" the contents of headers, correct? So this would essentially build a useless bunch of header code that just takes up space?
All of the text of the headers will be included in the compilation, but they will generally have little or no effect, as explained below.
C does not have any concept of “header code”. A compilation of the file in the question would be treated the same as if the contents of all the included files appeared in a single file. Then what matters is whether the contents define any objects or functions.
Most declarations in header files are (as header files are commonly used) just declarations, not definitions. They just tell the compiler about things; they do not actually cause objects or code to be created. For the most part, a compiler will not generate any data or code from declarations that are not definitions.
If the headers define external objects or functions, the compiler must generate data (or space) or code for them, because these objects or functions could be referred to from other source files to be compiled later and then linked with the object produced from the current compilation. (Some linkers can determine that external objects or functions are not used and discard them.)
If the headers define static objects or functions (to be precise, objects with internal or no linkage), then a compiler may generate data or code for these. However, the optimizer should see that these objects and functions are not referenced, and therefore generation may be suppressed. This is a simple optimization, because it does not require any complicated code or data analysis, simply an observation that nothing depends on the objects or functions.
So, the C standard does not guarantee that no data or code is generated for static objects or functions, but even moderate quality C implementations should avoid it, unless optimization is disabled.
Depends on the actual compiler. Optimizing compilers will not generate the output for unrequired code, whereas dumber compilers will.
gcc (a very common C compiler for open-source platforms) will optimize your code when given the -O option, and will then not generate code for unneeded expressions.
Code in #ifdef statements where the target is not defined will never generate output, as this would violate the language specifications.
Conceptually, at least, include/macro processing is a separate step from compilation. The main source file is read and a new temporary file is constructed containing all the included code. If anything is "#ifdefed out" then that code is not included in the temporary file. At the same time, the occurrences of macro names are replaced with the text they "expand" into. It is that resulting file, with all the includes included, etc, that is fed into the actual compiler.
Some compilers do this literally (and you can even "capture" the intermediate file) while others sort of simulate it (and actually require an entire separate step if you request that the intermediate file be produced). But most compilers have one means or another of producing the file for your examination.
The C/C++ standards lay out some rather arcane rules that must be followed to assure that any "simulated" implementation doesn't somehow change the behavior of the resulting code, vs the "literal" approach.
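A small sketch of the "#ifdefed out" case, assuming BUILD_MODULE would normally come from a header such as header4.h. The functions module_answer and module_is_built are hypothetical.

```c
/* Stand-in for header4.h: uncomment the next line to enable the module. */
/* #define BUILD_MODULE */

#ifdef BUILD_MODULE
/* Reaches the compiler proper only when BUILD_MODULE is defined. */
int module_answer(void) { return 42; }
#endif

/* Reports which branch the preprocessor kept. */
int module_is_built(void) {
#ifdef BUILD_MODULE
    return module_answer() == 42;
#else
    return 0;   /* module_answer does not exist in this build */
#endif
}
```

As written, the preprocessor removes the guarded function before compilation, so nothing is generated for it regardless of optimization level. The declarations pulled in from the headers are still parsed either way.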

Header per source file

I'm trying to understand the purpose behind one header per each source file method. As I see it, headers are meant for sharing function declarations, typedef's and macro's between several files that utilize them. When you make a header file for your .c file it has the disadvantage that each time you want to see a function declaration or macro you need to refer to the header file, and generally it is simpler that everything is in one source file (not the whole software, of course).
So why do programmers use this method?
The header files in C separate declarations (which must be available to each .c file that uses the functions) from the definitions (which must be in one place). Further, they provide a little modularity, since you can put only the public interface into a header file, and not mention functions and static variables that should be internal to the .c file. That uses the file system to provide a public interface and private implementation.
The practice of one .h file to one .c file is mostly convenience. That way, you know that the declarations are in the .h file, and the definitions in the corresponding .c file.
Logical, structured organisation and small source files enable:
faster, better programming - breaking the code into more manageable and understandable chunks makes it easier to find, understand and edit the relevant code.
code re-usability - different "modules" of code can be separated into groups of source/header files that you can more easily integrate into different programs.
better "encapsulation" - only the .c files that specifically include that header can use the features from it, which helps you to minimise the relationships between different parts of your code, which aids modularity. It doesn't stop you using things from anywhere, but it helps you to think about why a particular c file needs to access functions declared in a particular header.
Aids teamwork - two programmers trying to change the same code file concurrently usually cause problems (e.g. exclusive locks) or extra work (e.g. code merges) that slow each other down.
faster compiles - if you have one header then every time you make a change in it you must recompile everything. With many small headers, only the .c files that #include the changed header must be rebuilt.
easier maintainability & refactoring - for all the above reasons
In particular, "one header for each source file" makes it very easy to find the declarations relevant to the c file you are working in. As soon as you start to coalesce multiple headers into a single file, it starts to become difficult to relate the c and h files, and ultimately makes building a large application much more difficult. If you're only working on a small application then it's still a good idea to get into the habit of using a scalable approach.
Programmers use this method because it allows them to separate interface from implementation while guaranteeing that client code and implementation agree on the declarations of the functions. The .h file is the "single point of truth" (see Don't Repeat Yourself) about the prototype of each function.
(Client code is the code that #include's the .h file in order to use the exported functions, but does not implement any of the functions in the .h.)
Because, as you said yourself, it is not feasible to put the "whole software" into one source file.
If your program is very small, then yes, it is simpler just to put everything in one .c file. As your program gets larger, it becomes helpful to organize things by putting related functions together in different .c files. Further, in the .h files you can restrict the declarations you give to declarations of things that are supposed to be used by things in other .c files. If a .c file doesn't contain anything that should be accessible outside itself, it needs no header.
For example, if foo.c has functions foo() and fooHelper(), but nobody except foo() is supposed to call fooHelper() directly, then by putting foo() and fooHelper() into foo.c, only putting the declaration of foo() in foo.h, and declaring fooHelper() as static, it helps to enforce that other parts of your program should only access foo() and should not know or care about fooHelper(). Kind of a non object-oriented form of encapsulation.
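The foo()/fooHelper() arrangement might look like the sketch below. The function bodies are invented purely for illustration, and both files are shown in one block so the example compiles as written.

```c
/* ---- foo.h: the public interface, all that other .c files get to see ---- */
#ifndef FOO_H
#define FOO_H
int foo(int x);
#endif

/* ---- foo.c (would start with #include "foo.h") ---- */

/* static gives internal linkage: no other .c file can call this by name,
 * and it is deliberately absent from foo.h. */
static int fooHelper(int x) { return x * x; }

int foo(int x) { return fooHelper(x) + 1; }
```

If another .c file tried to call fooHelper(), it would fail at link time with an undefined reference (or earlier, with a missing-declaration diagnostic).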
Finally, make engines are generally smart enough to rebuild only those files which have changed since the last build, so splitting into multiple .c files (using .h files to share what needs to be shared) helps speed up builds.
You only put in your header file the bare minimum that other source files need to "see" in order to compile. I've seen some people that put everything non-code into the header file (all typedefs, all #define's, all structures, etc.) even if nothing else in the codebase will be using those. That makes the header file much harder to read for yourself and those who want to use your module.
You don't need one header per source file. One header per module, containing the public interface, and maybe an additional header containing private declarations etc shared between files in that module.
Generally a header for a source file method means that you declare only the functions from that compilation unit in that header.
That way you don't pollute other files with declarations they don't need (which can be a problem in a large software project).
As for separate compilation units, these speed up the compilation and can help you avoid collisions if private symbols are declared static.

Organization of C files

I'm used to doing all my coding in one C file. However, I'm working on a project large enough that it becomes impractical to do so. I've been #including them together but I've run into cases where I'm #including some files multiple times, etc. I've heard of .h files, but I'm not sure what their function is (or why having 2 files is better than 1).
What strategies should I use for organizing my code? Is it possible to separate "public" functions from "private" ones for a particular file?
This question precipitated my inquiry. The tea.h file makes no reference to the tea.c file. Does the compiler "know" that every .h file has a corresponding .c file?
You should regard .h files as interface files of your .c file. Every .c file represents a module with a certain amount of functionality. If functions in a .c file are used by other modules (i.e. other .c files) put the function prototype in the .h interface file. By including the interface file in your original modules .c file and every other .c file you need the function in, you make this function available to other modules.
If you only need a function in a certain .c file (not in any other module), declare its scope static. This means it can only be called from within the c file it is defined in.
Same goes for variables that are used across multiple modules. They should go in the header file and there they have to be marked with the keyword 'extern'. Note: For functions the keyword 'extern' is optional. Functions are always considered 'extern'.
The inclusion guards in header files help to not include the same header file multiple times.
For example:
Module1.c:
#include "Module1.h"
static void MyLocalFunction(void);
static unsigned int MyLocalVariable;
unsigned int MyExternVariable;
void MyExternFunction(void)
{
MyLocalVariable = 1u;
/* Do something */
MyLocalFunction();
}
static void MyLocalFunction(void)
{
/* Do something */
MyExternVariable = 2u;
}
Module1.h:
#ifndef MODULE1_H
#define MODULE1_H
extern unsigned int MyExternVariable;
void MyExternFunction(void);
#endif
Module2.c
#include "Module1.h"
static void MyLocalFunction(void);
static void MyLocalFunction(void)
{
MyExternVariable = 1u;
MyExternFunction();
}
Try to make each .c focus on a particular area of functionality. Use the corresponding .h file to declare those functions.
Each .h file should have a 'header' guard around its content. For example:
#ifndef ACCOUNTS_H
#define ACCOUNTS_H
....
#endif
That way you can include "accounts.h" as many times as you want, and the first time it's seen in a particular compilation unit will be the only one that actually pulls in its content.
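To see why this works, here is a sketch of what the compiler effectively receives when "accounts.h" ends up included twice in one compilation unit. The contents of the header (struct account, account_balance) are hypothetical.

```c
/* First #include "accounts.h": ACCOUNTS_H is not yet defined,
 * so the body is kept. */
#ifndef ACCOUNTS_H
#define ACCOUNTS_H
struct account { int id; long balance_cents; };
long account_balance(const struct account *a);
#endif

/* Second #include "accounts.h" (say, pulled in indirectly by another
 * header): ACCOUNTS_H is already defined, so everything up to #endif is
 * skipped. Without the guard, struct account would be defined twice,
 * which is a compile error. */
#ifndef ACCOUNTS_H
#define ACCOUNTS_H
struct account { int id; long balance_cents; };
long account_balance(const struct account *a);
#endif

/* ---- accounts.c ---- */
long account_balance(const struct account *a) { return a->balance_cents; }
```

Note that the guard macro starts fresh for every .c file, so each compilation unit still gets the header's content exactly once.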
Compiler
You can see an example of a C 'module' at this topic - Note that there are two files - the header tea.h, and the code tea.c. You declare all the public defines, variables, and function prototypes that you want other programs to access in the header. In your main project you'll #include "tea.h" and that code can now access the functions and variables of the tea module that are mentioned in the header.
It gets a little more complex after that. If you're using Visual Studio and many other IDEs that manage your build for you, then ignore this part - they take care of compiling and linking objects.
Linker
When you compile two separate C files the compiler produces individual object files - so main.c becomes main.o, and tea.c becomes tea.o. The linker's job is to look at all the object files (your main.o and tea.o), and match up the references - so when you call a tea function in main, the linker modifies that call so it actually does call the right function in tea. The linker produces the executable file.
There is a great tutorial that goes into more depth on this subject, including scope and other issues you'll run into.
Good luck!
-Adam
A couple of simple rules to start:
Put those declarations that you want to make "public" into the header file for the C implementation file you are creating.
Only #include header files in the C file that are needed to implement the C file.
Include header files in a header file only if required for the declarations within that header file.
Use the include guard method described by Andrew OR use #pragma once if the compiler supports it (which does the same thing -- sometimes more efficiently)
To answer your additional question:
This question precipitated my inquiry. The tea.h file makes no reference to the tea.c file. Does the compiler "know" that every .h file has a corresponding .c file?
The compiler is not primarily concerned with header files. Each invocation of the compiler compiles a source (.c) file into an object (.o) file. Behind the scenes (i.e. in the make file or project file) a command line equivalent to this is being generated:
compiler --options tea.c
The source file #includes all the header files for the resources it references, which is how the compiler finds header files.
(I'm glossing over some details here. There is a lot to learn about building C projects.)
As well as the answers supplied above, one small advantage of splitting up your code into modules (separate files) is that if you have to have any global variables, you can limit their scope to a single module by the use of the keyword 'static'. (You could also apply this to functions.) Note that this use of 'static' is different from its use inside a function.
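A short sketch of the two uses of 'static' just mentioned. The names (event_count, record_event, events_so_far, next_ticket) are made up for the example.

```c
/* File scope: static gives internal linkage. The name is invisible to
 * other .c files, but every function in this file shares the variable. */
static int event_count;

void record_event(void)  { event_count++; }
int  events_so_far(void) { return event_count; }

/* Inside a function: static gives the variable static storage duration.
 * It keeps its value between calls but is visible only in this function. */
int next_ticket(void) {
    static int last = 0;
    return ++last;
}
```

Another file could define its own event_count without any clash, because the one above never leaves this translation unit.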
Your question makes it clear that you haven't really done much serious development. The usual case is that your code will generally be far too large to fit into one file. A good rule is that you should split the functionality into logical units (.c files) and each file should contain no more than what you can easily hold in your head at one time.
A given software product then generally includes the output from many different .c files. How this is normally done is that the compiler produces a number of object files (in unix systems ".o" files, VC generates .obj files). It is the purpose of the "linker" to compose these object files into the output (either a shared library or executable).
Generally your implementation (.c) files contain actual executable code, while the header files (.h) have the declarations of the public functions in those implementation files. You can quite easily have more header files than there are implementation files, and sometimes header files can contain inline code as well.
It is generally quite unusual for implementation files to include each other. A good practice is to ensure that each implementation file separates its concerns from the other files.
I would recommend you download and look at the source for the linux kernel. It is quite massive for a C program, but well organised into separate areas of functionality.
The .h files should be used to declare the prototypes for your functions. This is necessary so you can include the prototypes that you need in your C file without declaring every function that you need all in one file.
For instance, when you #include <stdio.h>, this provides the prototypes for printf and other IO functions. The library that implements these functions is normally linked in by default. You can look at the system's .h files under /usr/include if you're interested in the normal idioms involved with these files.
If you're only writing trivial applications with not many functions, it's not really necessary to modularize everything out into logical groupings of procedures. However, if you have the need to develop a large system, then you'll need to pay some consideration as to where to define each of your functions.

Resources