I'm trying to understand the purpose behind the one-header-per-source-file method. As I see it, headers are meant for sharing function declarations, typedefs, and macros between the several files that use them. When you make a header file for your .c file, it has the disadvantage that each time you want to see a function declaration or macro you need to refer to the header file, and it is generally simpler when everything is in one source file (not the whole software, of course).
So why do programmers use this method?
The header files in C separate declarations (which must be available to each .c file that uses the functions) from the definitions (which must be in one place). Further, they provide a little modularity, since you can put only the public interface into a header file, and not mention functions and static variables that should be internal to the .c file. That uses the file system to provide a public interface and private implementation.
The practice of one .h file to one .c file is mostly convenience. That way, you know that the declarations are in the .h file, and the definitions in the corresponding .c file.
Logical, structured organisation and small source files enable:
faster, better programming - breaking the code into more manageable and understandable chunks makes it easier to find, understand and edit the relevant code.
code re-usability - different "modules" of code can be separated into groups of source/header files that you can more easily integrate into different programs.
better "encapsulation" - only the .c files that specifically include that header can use the features from it, which helps you to minimise the relationships between different parts of your code, which aids modularity. It doesn't stop you using things from anywhere, but it helps you to think about why a particular c file needs to access functions declared in a particular header.
aids teamwork - two programmers trying to change the same code file concurrently usually cause problems (e.g. exclusive locks) or extra work (e.g. code merges) that slow each other down.
faster compiles - if you have one header then every time you make a change in it you must recompile everything. With many small headers, only the .c files that #include the changed header must be rebuilt.
easier maintainability & refactoring - for all the above reasons
In particular, "one header for each source file" makes it very easy to find the declarations relevant to the .c file you are working in. As soon as you start to coalesce multiple headers into a single file, it becomes difficult to relate the .c and .h files, and ultimately makes building a large application much more difficult. Even if you're only working on a small application, it's still a good idea to get into the habit of using a scalable approach.
Programmers use this method because it allows them to separate interface from implementation while guaranteeing that client code and implementation agree on the declarations of the functions. The .h file is the "single point of truth" (see Don't Repeat Yourself) about the prototype of each function.
(Client code is the code that #include's the .h file in order to use the exported functions, but does not implement any of the functions in the .h.)
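As a rough sketch (file and function names here are hypothetical, not from any particular project), the layout looks like this:

/* area.h - the single point of truth for area()'s prototype */
#ifndef AREA_H
#define AREA_H
int area(int w, int h);     /* declaration, shared with all clients */
#endif

/* area.c - the one place where area() is defined */
#include "area.h"           /* including our own header lets the
                               compiler check that the definition
                               matches the declaration */
int area(int w, int h)
{
    return w * h;
}

/* client.c - client code sees only the header */
#include "area.h"
int room_area(void)
{
    return area(4, 5);
}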
Because, as you said yourself, it is not feasible to put the "whole software" into one source file.
If your program is very small, then yes, it is simpler just to put everything in one .c file. As your program gets larger, it becomes helpful to organize things by putting related functions together in different .c files. Further, in the .h files you can restrict the declarations you give to declarations of things that are supposed to be used by things in other .c files. If a .c file doesn't contain anything that should be accessible outside itself, it needs no header.
For example, if foo.c has the functions foo() and fooHelper(), but nobody except foo() is supposed to call fooHelper() directly, then by putting both functions into foo.c, putting only the declaration of foo() in foo.h, and declaring fooHelper() as static, you help enforce that other parts of your program access only foo() and neither know nor care about fooHelper(). A kind of non-object-oriented encapsulation.
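A minimal sketch of that arrangement, with hypothetical bodies for the two functions:

/* foo.h - exports foo() only; fooHelper() is not mentioned */
#ifndef FOO_H
#define FOO_H
int foo(int x);
#endif

/* foo.c */
#include "foo.h"

static int fooHelper(int x)   /* static: invisible to other .c files */
{
    return x * 2;
}

int foo(int x)                /* the only entry point other files see */
{
    return fooHelper(x) + 1;
}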
Finally, make engines are generally smart enough to rebuild only those files which have changed since the last build, so splitting into multiple .c files (using .h files to share what needs to be shared) helps speed up builds.
You only put in your header file the bare minimum that other source files need to "see" in order to compile. I've seen some people who put everything non-code into the header file (all typedefs, all #define's, all structures, etc.) even if nothing else in the codebase will be using those. That makes the header file much harder to read for yourself and for those who want to use your module.
You don't need one header per source file. One header per module, containing the public interface, and maybe an additional header containing private declarations etc shared between files in that module.
Generally, the one-header-per-source-file method means that you declare only the functions from that compilation unit in that header.
That way you don't pollute other files with declarations they don't need (which can become a problem in a large software project).
As for separate compilation units: they speed up compilation, and they can help you avoid collisions as long as private symbols are declared static.
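For instance (hypothetical files and names), two modules can each keep a private helper with the same name without clashing at link time:

/* a.c */
static int helper(void) { return 1; }   /* private to a.c */
int a_value(void) { return helper(); }

/* b.c */
static int helper(void) { return 2; }   /* a different, unrelated helper;
                                           no linker collision, because
                                           both are static */
int b_value(void) { return helper(); }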
What if you have a minimal number of structures, functions and macros, but want to move them out of the source file to make the source code more concise and readable and to reduce the line count?
Are structures, functions, or data in general accessible/viewable by examining the binary, even if they are not called within the source code? And if so, how?
For the sake of readability, is it safe to move structures, functions and macros from a source file into a header file that is used by multiple source files, even if some source files don't use all of those structures, functions and macros (for small header files)?
Are structures, functions, or data in general accessible/viewable by examining the binary, even if they are not called within the source code?
Depends on what you build. If you build a library (.a, .so, .lib, .dll, whatever) they're probably in there and everything in that library is accessible in some way. If you build an executable the linker will most likely remove unused code.
Have a look at nm to inspect which symbols end up in a binary.
For the sake of readability, is it safe to move structures, functions and macros from a source file into a header file that is used by multiple source files
Yes and no. Put declarations of functions/structs in header files and their implementations in .c files. Don't put a lot of unrelated functions and structs in one header: you'd end up including all those declarations in every source file even though you use maybe 5% of them. That means extra work for your compiler, probably some extra work for your linker, and extra work for your brain and the brains of future programmers reading all this unnecessary stuff.
So, guessing at what's happening in your code base, you probably want to put them in separate header files.
Be careful when using macros, and even more so when putting them in header files. You should avoid this most of the time.
even if some source files don't use all of the structures, functions and macros
That is quite common. You include some standard C headers too and don't use all of the functions and structs in there, right? Just (as I said) put together what belongs together.
Why should I use another source code file to share code or a function between many programs, and use the linker, instead of using only a header file? (I read this in Head First C but I didn't understand the point of it.)
Generally, header files should only be used to declare your functions/structs/classes.
The actual implementation should be created in a separate .c file, which can then be built and shipped as a binary along with the header.
Keeping the implementation in the header has many drawbacks.
Bigger footprint - the header size will be bigger since you have more symbols in it.
You cannot hide the implementation from the end-user.
The compile time will be a lot longer, since all the code has to be processed by the compiler every time the header is included.
Just to name a few. There may be many more reasons.
However, there are some cases when it is okay/better to include some logic in the header files.
For example, inline functions may improve the runtime of the application while maintaining good code quality, and the same goes for templates in C++.
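In C, the usual form of this is a static inline function defined directly in the header - a sketch with a hypothetical utility function:

/* util.h - hypothetical header; an inline definition is one of the
   few kinds of code that reasonably lives in a header */
#ifndef UTIL_H
#define UTIL_H

static inline int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

#endif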
Generally, header files contain declarations, like function signatures, while the function definitions (the actual source code) are located in separate source files.
Multiple files can include the same header file and share function declarations at compile time. But they must also be linked together after compilation in order to have access to the function code at run time.
If a header file is just a piece of code that gets pasted into another file when I use #include, then what is stopping me from programming in C using only .h files? You can even #include into other header files!
I assume there must be some performance or workflow reason why this isn't more common, but if it exists I do not know what it is.
Therefore my question is: What is the reason people don't program entire applications with just header files?
Header files come with the concept of modularisation, i.e. of separating the source code of a large program into several independent parts, i.e. translation units. Thereby, implementation details are hidden from other translation units, and dependencies are dramatically reduced.

But: if a translation unit A needs to call a function from another translation unit B, it needs the function prototype of the respective function, i.e. the function without the body - something like int functionFromB(int x); - in order to tell the compiler how to call it. Translation unit A could simply write this prototype at its beginning; but usually the functions of translation unit B (e.g. B.cpp) are exposed in a header file B.h, which comprises all the "public" functions of B in the form of function prototypes. The same applies to type definitions and (global) variables. Then A simply includes B.h in order to have all the function prototypes et al. available, without needing to know all the implementation details (like the function bodies).
So you can write a large program completely in .h-files; yet you have to tell the compiler to treat them as translation units (usually only .cpp-files are treated as such), and you still have to provide function prototypes et al...
With translation units, you have separate / independent modules. This is in contrast to a large monolithic block, which you get when you "paste" all the chunks of your program spread over different .h-files by #include-ing them "together". You can compile / test / distribute translation units separately, whereas a monolithic block cannot be compiled / tested / distributed partly.
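A minimal C sketch of separate compilation, reusing the functionFromB prototype from above (file names and build commands are illustrative):

/* b.h - the public interface of translation unit b.c */
int functionFromB(int x);          /* prototype only, no body */

/* b.c */
#include "b.h"                     /* lets the compiler verify the prototype */
int functionFromB(int x)
{
    return x * x;
}

/* a.c */
#include "b.h"
int main(void)
{
    return functionFromB(6);
}

/*
 * Compile each translation unit separately, then link:
 *   cc -c b.c -o b.o
 *   cc -c a.c -o a.o
 *   cc a.o b.o -o app
 * If only a.c changes, only a.o has to be rebuilt before relinking.
 */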
Header files are a necessity of C and C++ when you need to reference code or data structures across different files. This is especially important for external programs that need to link against a library: the compiler has to understand how to use it.
It's advantageous to break up your application into a series of .c or .cpp files in order to make the compilation process more efficient. Most compiler environments, whether they're Makefile-driven or IDE-managed, have methods for detecting which files need to be recompiled when a change is made.
In larger applications building all files can take considerable time, but recompiling a single .cpp file is often fairly quick. So long as you change only the .cpp source and don't touch the headers you can do a quick recompile and relink, ready for testing right away.
If instead you put everything into a header file then you'd need to recompile everything, every time, which can be a painfully slow process.
Keep in mind some code bases can take hours to rebuild, so this is not a sustainable practice.
I thought of a strange analogy.
You go to a restaurant.
The waiter presents you with a menu. The menu is your interface to the kitchen.
You pick the dish(es) you want to order.
Then,
Option 1:
The waiter asks you to move to the kitchen and see for yourself how the dishes are prepared. You remain in the kitchen until the food is served to you.
Option 2:
The waiter brings the food to your table after it is prepared in a kitchen that you don't necessarily see.
Which one would you prefer?
Putting the implementation in the .h file is analogous to Option 1.
Putting the implementation in a .c/.cpp file somewhere else is analogous to Option 2.
Header files are different from source files by convention. Using the .h or .hpp extension communicates that the file is intended to be #included and is not meant to exist as a standalone source file. You can generally assume that .h/.hpp files are safe to include from multiple source files.
Meanwhile, the .c and .cpp extensions communicate that the file is likely intended to be a translation unit and is not suitable to be #included in other translation units.
You could very well write an entire codebase with every file having any arbitrary extension, or none at all, if you really want to make it hard on yourself and anybody else working in the codebase.
Assuming that I work on a big project in C with multiple .c files, is there any reason why I should prefer to have multiple header files instead of a single header file?
And another question:
Let's say that I have 3 files: header.h, main.c and other.c.
I have a function named func() that is defined and used only in the file other.c. Should I place the function prototype in the header file or in the file other.c?
Multiple headers vs a single header.
A primary reason for using multiple headers is that some of the code may be usable independently of the rest, and that code should probably have its own header. In the extreme, each source file (or small group of source files) that provides a service should have its own header that defines the interface to the service.
Also note that what goes in the header is the information needed to use the module — function declarations and type declarations needed by the function declarations (you don't have global variables, do you?). The header should not include headers only needed by the implementation of the module. It should not define types only needed by the implementation of the module. It should not define functions that are not part of the formal interface of the module (functions used internally by the module).
All functions in a module that can be static should be static.
You might still have an omnibus header for your current project that includes all, or most, of the separate headers, but if you think of headers as defining the interfaces to modules, you will find that most consumer modules don't need to know about all possible provider modules.
The function func() is only used in other.c so the function should be made static so that it is only visible in other.c. It should not go in a header unless some other file uses the function — and at that point, it is crucial that it does go into a header.
You may find useful information in these other questions, and there are, no doubt, a lot of other questions that would help too:
What are extern variables in C?
Where to document functions in C?
Design principles — Best practices and design patterns for C
Should I use #include in headers?
If it's a BIG project, you almost certainly HAVE to have multiple header files to make anything sensible out of your project.
I have worked on projects with several thousand source files and many hundred header files, totalling millions of lines. You couldn't put all those header files together into one file and do any meaningful work.
A header file should provide one "functionality". So, if you have a program dealing with customer accounts, stock, invoices, and such, you may have a "customer.h", a "stock.h" and an "invoice.h". You'll probably also have a "dateutils.h" for calculating things like "when does this invoice need to be paid by?" and "how long is it since the invoice was sent out?", so you can send out reminders.
In general, keeping header files SMALL is a good thing. If one header file needs something from another one, have it include that other header.
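A sketch of that (the struct and function here are hypothetical): invoice.h might include dateutils.h because its own interface needs a type from it.

/* invoice.h - includes only what its own interface requires */
#ifndef INVOICE_H
#define INVOICE_H
#include "dateutils.h"     /* for struct date, used in the interface */

struct invoice {
    struct date due;       /* hypothetical type from dateutils.h */
    unsigned long amount;
};

int invoice_days_overdue(const struct invoice *inv, struct date today);

#endif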
Of course, if a function is not used outside a particular file, it should not go in a header file, and to avoid "leaking names", it should be static. E.g.:
static int func(int x)
{
    return x * 2;
}
If, for some reason, you need to forward declare func (because some function before func needs to call func), then declare it at the beginning of the source file. There is no need to "spread it around" by adding it to a header file.
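A sketch, reusing func from above:

/* inside the .c file, not in any header */
static int func(int x);            /* forward declaration at the top */

int caller(int y)
{
    return func(y) + 1;            /* func can be called before its body */
}

static int func(int x)             /* the definition, further down */
{
    return x * 2;
}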
By marking it static, you are making it clear that "nobody else, outside this file, uses this function". If at a later stage, you find that "Hmm, this func is really useful in module B as well", then add it to a suitable header file (or make a new header file), and remove the static. Now, anyone reading the source file knows that they will need to check outside of this source file to make sure that any changes to func are OK in the rest of the code.
Commonly, there is a header file per module describing its interface for clean separation of concerns/readability/re-usability.
If the function in other.c is local, there is no need to include it in the header file.
I've worked with a number of C projects during my programming career and the header file structures usually fall into one of these two patterns:
One header file containing all function prototypes
One .h file for each .c file, containing prototypes for the functions defined in that module only.
The advantages of option 2 are obvious to me - it makes it cheaper to share the module between multiple projects and makes dependencies between modules easier to see.
But what are the advantages of option 1? It must have some advantages, otherwise it would not be so popular.
This question would apply to C++ as well as C, but I have never seen #1 in a C++ project.
Placement of #defines, structs etc. also varies but for this question I would like to focus on function prototypes.
I think the prime motivation for #1 is ... laziness. People think it's either too hard to manage the dependencies that splitting things into separate files can make more obvious, and/or think it's somehow "overkill" to have separate files for everything.
It can also, of course, often be a case of "historical reasons", where the program or project grew from something small, and no-one took the time to refactor the header files.
Option 1 allows you to have all the declarations in one place, so that you have to include/search just one file instead of having to include/search many files. This advantage is more obvious if your system is shipped as a library to a third party - they don't care much about your library structure, they just want to be able to use it.
Another reason for using a different .h for every .c is compile time. If there is just one .h (or if there are several but you include them all in every .c file), then every time you change a .h file you have to recompile every .c file. In a large project this can waste a significant amount of time, and it can also break your workflow.
Option 1 is just unnecessary. I can't see a good reason to do it, and plenty of reasons to avoid it.
Three rules for following option 2 without problems:
start EVERY header file with

#ifndef HEADER_NAMEFILE_H
#define HEADER_NAMEFILE_H

and end the file with

#endif

(The two guard names must match exactly, and identifiers beginning with an underscore followed by an uppercase letter are reserved in C, so avoid them in guard names.) That will allow you to include the same header file multiple times in the same module (which may happen inadvertently) without causing any fuss.
you can't have definitions in your header files... and that's something everybody thinks he/she knows about function prototypes, but almost everyone ignores for global variables.
If you want a global variable, which by definition should be visible outside its defining C module, use the extern keyword:
extern unsigned long G_BEER_COUNTER;
which instructs the compiler that the G_BEER_COUNTER symbol is actually an unsigned long (so it works like a declaration), and that some other module will provide its proper definition/initialization. (This also allows the linker to keep its resolved/unresolved symbol table.) The actual definition (the same statement without extern, sketched after these rules) goes in the module's .c file.
only on proven, absolute necessity do you include other headers within a header file. #include statements should only be visible in .c files (the modules). That allows you to better interpret the dependencies, and to find/resolve issues.
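The sketch promised above - the declaration in a header, the single definition in the owning module (file names hypothetical):

/* beer.h */
extern unsigned long G_BEER_COUNTER;   /* declaration: visible everywhere */

/* beer.c - the module that owns the variable */
#include "beer.h"
unsigned long G_BEER_COUNTER = 0;      /* the one definition/initialization */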
I would recommend a hybrid approach: making a separate header for each component of the program which could conceivably be used independently, then making a project header that includes all of them. That way, each source file only needs to include one header (no need to go updating all your source files if you refactor components), but you keep a logical organization to your declarations and make it easy to reuse your code.
There is also, I believe, a third option: each .c has its own .h, but there is also one .h which includes all the other .h files. This brings the best of both worlds at the expense of keeping an extra .h up to date, though that could be done automatically.
With this option, internally you use the individual .h files, but a 3rd party can just include the all-encompassing .h file.
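As a sketch, reusing the hypothetical module headers from an earlier answer, the all-encompassing header is just a list of #includes:

/* project.h - umbrella header for third parties */
#ifndef PROJECT_H
#define PROJECT_H
#include "customer.h"
#include "stock.h"
#include "invoice.h"
#endif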
When you have a very large project with hundreds/thousands of small header files, dependency checking and compilation can significantly slow down as lots of small files must be opened and read. This issue can be often solved by using precompiled headers.
In C++ you would definitely want one header file per class and use pre-compiled headers as mentioned above.
One header file for an entire project is unworkable unless the project is extremely small - like a school assignment
That depends on how much functionality is in one header/source file. If you need to include 10 files just to, say, sort something, it's bad.
For example, if I want to use STL vectors I just include <vector> and I don't care what internals are necessary for vector to be used. GCC's <vector> includes 8 other headers -- allocator, algobase, construct, uninitialized, vector and bvector. It would be painful to include all those 8 just to use vector, would you agree?
BUT a library's internal headers should be as sparse as possible. Compilers are happier if they don't include unnecessary stuff.