Where are the header functions defined? [duplicate] - c

When I include some function from a header file in a C++ program, does the entire header file code get copied to the final executable or only the machine code for the specific function is generated. For example, if I call std::sort from the <algorithm> header in C++, is the machine code generated only for the sort() function or for the entire <algorithm> header file.
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.

You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linked analyzes this and only adds the source of sort (and functions it calls, if any) to the executable. The other algorithms' code isn't getting added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.

In fact, the entire file is copied into .cpp file, and it depends on compiler/linker, if it picks up only 'needed' functions, or all of them.
In general, simplified summary:
debug configuration means compiling in all of non-template functions,
release configuration strips all unneeded functions.
Plus it depends on attributes -> function declared for export will be never stripped.
On the other side, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.

If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.

It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

Related

What is the difference between stdio.c and stdio.h?

Couldn't stdio functions and variables be defined in header files without having to use .c files.
If not, what are .c files used for?
The functions defined in the header file have to be implemented. The .c file contains the implementation, though these have already been compiled into a static or shared library that your compiler can use.
The header file should contain a minimal description of the function to save time when compiling. If it included the entire source it'd force the compiler to rebuild it each and every time you compile which is really wasteful since that source never changes.
In effect, the header file serves as a cheat sheet on how to interact with the already compiled library.
The reason the .c files are provided is primarily for debugging, so your debugger can step through in your debug build and show you source instead of raw machine code. In rare cases you may want to look at the implementation of a particular function in order to better understand it, or in even more rare cases, identify a bug. They're not actually used to compile your program.
In your code you should only ever reference the header file version, the .h via an #include directive.
stdio.h is a standard header, required to be provided by every conforming hosted C implementation. It declares, but does not define, a number of entities, mostly library functions like putchar and scanf.
stdio.c, if it exists, is likely to be a C source file that defines the functions declared in stdio.h. There is no requirement that an implementation must make it available. It might not even exist; for example the implementations of the functions declared in stdio.h might appear in multiple *.c files.
The declaration of putchar is:
int putchar(int c);
and that's all the compiler needs to know when it sees a call to putchar in your program. The code that implements putchar is typically provided as machine code, and the linker's job is to resolve your putchar() call so it ends up invoking that code. putchar() might not even be written in C (though it probably is).
An executable program can be built from multiple *.c source files. One and only one copy of the code that implements putchar is needed for an entire program. If the implementation of putchar were in the header file, then it would be included in each separately compiled source file, creating conflicts and, at best, wasting space. The code that implements putchar() (and all the other functions in the library) only needs to be compiled once.
The .c files has specific function for any aim. For example stdio.c files has standart input-output functions to use within C program. In stdio.h header files has function prototypes for all stdio.c functions, all defines, all macros etc. When you #include <stdio.h> in your main code.c file your main code assumes there is a " int printf(const char *format, ...)" function. Returns int value and you can pass argument ..... etc. When you call printf() function actually you use stdio.c files..
There are languages where if you want to make use of something someone else has written, you say something like
import module
and that takes care of everything.
C is not one of those languages.
You could put "library" source code in a file, and then use #include to pull it in wherever you needed it. But this wouldn't work at all, for two reasons:
If you used #include to pull it in from two different source files, and then linked the two resulting object files together, everything in the "library" would be defined twice.
You might not want to deliver your "library" code as source; you might prefer to deliver it in compiled, object form.

How to correctly include own libraries in function files and project files

I got stuck trying to do Exercise 8-3 of K&R, the goal of the exercise is to rewrite some functions of stdio.h such as fopen, fclose, fillbuf and flushbuf
here's how my source files are organized:
stdio.h: contains types and macro definitions, and the declarations of some functions proper to the library. all content of the file is enclosed between #ifndef #endif lines as follows:
#ifndef STDIO_H
#define STDIO_H
/* content of stdio.h */
#endif
myfunction.c: I have a .c file per function, each file has a #include "stdio.h" line to load all needed types definitions.
main.c: where I have code to test my functions, the main.c also has a #include "stdio.h" line.
my problem is the following: when I try to compile all my files using gcc I run to the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h)...why is this happening ? I though the #ifndef line was to specifically prevent such errors.
more generally:
How would you go about making your own header files and library/function files and using them in your projects ?
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?
Please become aware of the difference between a library and its header files.
A library is a (collection of) binary machine code (with some additional meta-data, e.g. relocation directives to the linker).
For example, on my Linux system, dynamic libraries are generally shared objects (e.g. /usr/lib/x86_64-linux-gnu/libgmp.so) and it makes absolutely no sense to try some preprocessor directive like #include "libgmp.so" //wrong.
But a library has some API. That API is given by some documentation and by some header file(s), e.g. gmp.h and you should #include "gmp.h" in any C code (your C translation unit) which uses it.
myfunction.c: I have a .c file per function
Having one file per function is often poor taste. You generally can group related functions. For example, in your case, you probably want to define your myfopen and myfclose functions in the same myopenclose.c translation unit (even if you don't have to) because these two functions are intimately related. As a rule of thumb, I prefer having source files of one or a few thousand lines each (but that is really a matter of taste, and some people like having many small files).
Remember that what the compiler really sees is the preprocessed form of code. Consider asking your compiler to produce that form (e.g. from foo.c you can get its preprocessed form foo.i with gcc -C -E -Wall foo.c > foo.i on my Linux desktop) and look into it. Try that on your own files (e.g. your myopenclose.c if you have one).
If you have many small files, the compiler is probably including the same headers in each of them, and these included declarations gets compiled every time. BTW, notice that gcc is only a driver program. Use it with -v flag. You'll see that it is running cc1 (the C compiler proper), as (the assembler), ld (the linker), etc.
I run to the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h).
You probably should declare extern your _iob global variable in your stdio.h and define a global _iob in only one implementation file (perhaps myopenclose.c, if it is relevant) of your library.
Don't confuse definition and declaration (of variables, functions, types, etc.). Spend some time reading the C11 standard n1570. These words are defined there. As a rule of thumb, declarations should go into header .h files, definitions (of variables and functions) in implementation .c files (of course details are much more complex, you often but not always define types and struct in header files).
I strongly recommend using some Linux distribution (it is very developer- and student- friendly) and studying the source code of some existing free software C standard library (like musl-libc, whose code is quite readable). More generally, study the source code of existing free software projects (e.g. on github). They will inspire you.
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?
This shows a lot of confusion (the above question does not make any sense). Read more about compilers (your cc1 program -started by gcc- is translating a .c file into some object file .o) and about linkers (your ld, generally started by gcc, is agglomerating several object files, processing relocations inside them, and producing an ELF library or an executable). The preprocessing (e.g. of #include directive) is done at compile time by cc1. The linker cannot see any header files (it only deals with object files or libraries).
If you rewrite some of the system declarations and functions, while at the same time including the system declarations, you can expect some collisions.
Header files (.h) contain code (usually only declarations) and the mechanism you describe (#ifndef STDIO_H) is to prevent multiple inclusions of the same header file - mainly because another include file (header) that has already been loaded might also include it. That result in the same kind of collision as you had.
In C, you could, for instance
make a new header file that contain your own declarations + the stdio ones that don't collide with yours
use the stdio declarations, and only write new functions that use the same structures, defines, enums etc... as stdio
rewrite the necessary declarations and code that allows you not to include the system headers anymore
use another naming convention, like my_iob in both your header file, and in your code.
The two last ones are probably the best in your case, since you still have some collisions coming from a header file.
For instance, your code might not include stdio.h, but another header file you include might do it, indirectly...

Included files, all or nothing?

If I #include a file in C, do I get the entire contents of the file linked in, or just the parts I use?
If it has 10 functions in it, and I only use one of the functions, does the code for the other nine functions get included in my executable? This is especially relevant for me right now as I am working on a microcontroller and memory is precious.
Firstly, header files do not get "linked in". #include is basically a textual copy-paste feature. Everything from your include file gets pasted by preprocessor into the final translation unit, which will later be seamlessly processed by the compiler proper. The compiler proper knows nothing about any header files or #include directives.
Secondly, it means that if in your code you declared or defined some function or variable that you do not use, it is completely irrelevant whether it came from a header file through #include or was written directly in source file. There's absolutely no difference.
Thirdly, the question is: what exactly do you have in your header file that you include? Typically, header files do not define objects and functions, they simply declare them. Declarations do not produce any code, regardless whether you use the function or not. Declarations simply tell the compiler that the code (generated from the function definition) already exists elsewhere. Thus, as long as we are talking about typical header files, #include directives and header files by themselves have no effect on final code size.
Fourthly, if your header file is of some unusual kind that contains function (or object) definitions, then see "firstly" and "secondly" above. The compiler proper can see only one translation unit at a time, for which reason a typical strategy for the compiler proper is to completely discard unused entities with internal linkage (i.e. static objects and functions) and keep all entities with external linkage. Entities with external linkage cannot be discarded by compiler proper, since they might be needed in some other translation unit.
Fifthly, at linking stage linker can see the program in its entirety and, for that reason, can discard unused objects and functions, if it is advanced enough for that (and if you allow linker to do it). Meanwhile, inclusion-exclusion precision of a typical run-of-the-mill linker is limited to a single object file. Each object file is atomic to such linker. This means that if you want to be able to exclude unused functions on per-function basis, you might have to adopt "one function per object file" strategy, i.e. write one and only one function per .c file. Of course, this is only possible when you write your own code. If some third-party library you want to use does not adhere to this convention, then you might not be able to exclude individual functions.
If you #include a file in C, the entire contents of that file are added to your source file and compiled by your compiler. A header file, though, usually only has declarations of functions and no definitions (so no actual code is compiled).
The linker, on the other hand, takes all the functions from all the libraries and compiled source code and merges them into the final output file. At this time, the linker will discard any functions that you aren't using.
So, to answer your question: only the functions you use (and indirectly depend on) will be included in your final program file, and this is independent of what files you #include. Happy hacking!
You have to distinguish between different scenarios:
What does the included header file contain? Declarations of external functions only, or also static function definitions?
How are the implementations of the external functions distributed which are declared in that the header file you include? Are they all implemented in one .c file, or distributed across several .c files?
Regarding point 1: Only by #includeing external declarations, no other code will become part of your object file. And, definitions of static functions that are part of the header file, but which are not referenced by your code, may not become part of your object file - this is an optimization that is fairly common. It depends on your compiler, however.
Regarding point 2: Some linkers can only link whole object files, all or nothing. That means, if all the external functions declared in a header file are implemented in one .c file, and, if your code references at least one of these functions, chances are that you will get the whole object file, including all the other functions you don't use. Some linkers, however, can avoid this and remove unused parts when linking object files.
One brute-force approach to deal with non-optimizing linkers is, to put every external function into a .c file of its own. You will, however, have to find a way to deal with the situation that some of these functions refer to a common static function that is part of the original .c file...
Include simply presents the compiler ultimately with what looks like a single file (and if you do save-temps on GCC you will see that exactly a single file is presented to the actual compiler). It is no more complicated than that. So if you have some function prototypes or defines in your .c file then having them come from an include makes no difference whatsoever; the end result is the same.
If the things you include include code, functions, and not just prototypes, then it is the same as if you had those in the .c file itself. Whether or not those show up in the final binary has to do with whether or not you declared them as global or not using static, and then whether or not you optimized, etc. The same goes for variables and structures and other things.
Not all linkers are the same, but a common way to do it is whatever the compiler left in the object goes into the final binary. But if you take those objects and make a library out of them then some/many(?) linkers don’t suck everything into the binary on the portions that are required to resolve the dependencies.

How does #including standard library work?

My basic question is how the compilation process works to use standard library routines. When I #include <stdio.h> in C, does the preprocessor take the entire standard library and paste it into my source file?
If this is so, when I use a library routine, how is it that the linker is involved?
The preprocessor is, as its name implies, a program that runs before the compiler. All it does is simple text substitutions.
When a #include directive is found, it simply "pastes" the complete file into the place where the directive was. The same applies to macro expansions, when a macro "call" is detected, the body of the macro is "pasted" into its place.
The preprocessor has nothing to do with libraries. It's just that C (and C++) needs to have all its functions and variables declared before they are used, and so putting the declarations in a header file that is included by the preprocessor is a simple way to get these declarations from libraries.
There are basically two types of libraries: Header only libraries, and libraries you need to link with. The first type, header only libraries, are exactly what the name implies: They are fully contained in the header files you include. However, the vast majority of libraries are libraries you need to link with. This is done in a step after the compiler has done its work, by a special program. How this is used depends on the environment of course.
In general, compilation of a program can be divided into these steps:
Editing
Preprocessor
Compiler
Linker
The editing step is what you do to create your source.
The preprocessor and compilation steps are often put together into a single step, which is probably why there is some confusion among beginners as to what the preprocessor really does.
The final step, linking, is taking the input from the compiler, and uses that together with the libraries you specified to create the final executable.
When I pound include in C, does the preprocessor take the entire standard library and paste it into my source file?
Only the header files you #include.
If this is so, when I use a library routine, how is it that the linker is involved?
The standard library headers contain declarations only. The definition (implementation) of the functions is in a library file, most likely /usr/lib/libc.ext (ext being an OS-dependent extension).
When you #include something in your source code, the preprocessor pastes whatever you #include into your source file.
But specifically, if you include a header file from a library, you are just including function declarations like void a();, and the linker finds implementations of these functions in the library itself.

Determining the space used by library functions in C

I have a bunch of code I need to analyse that I don't know how to do. I have a pile of code that here and there are using math functions from a header file I have included called math.h that came with my IDE. I am being asked to see how much space is used to include this. Specifically is the compiler including all of the library functions or just the ones we use. There is no object file being created. So I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes because then the code wont complie and I won't know a size and if I comment out all the lines that use math functions it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
The code in the header file is being complied into the object file of the .c you are using if your header has the definitions of the functions and just being referenced to if they are simply the declarations. The linker will then find a definition for each symbol and place it in your executable if you are using dynamic linking the OS will pull in the definition at run time.

Resources