Single Source Code vs Multiple Files + Libraries - c

How much effect does having multiple files or compiled libraries, versus throwing everything (>10,000 LOC) into one source file, have on the final binary? For example, instead of linking a Boost library separately, I paste its code, along with my original source, into one giant file for compilation. Along the same lines, instead of feeding several files to gcc, I paste them all together and give it only that one file.
I'm interested in the optimization differences, rather than the problems (horrors) that would come with maintaining a single source file of gargantuan proportions.
Granted, with separate files there can only be link-time optimization (I may be wrong), but is there a big difference in the optimization possibilities?

If the compiler can see all of the source code, it can optimize better, provided it has some kind of Interprocedural Optimization (IPO) option turned on. IPO differs from other compiler optimizations because it analyzes the entire program; other optimizations look at only a single function, or even a single block of code.
Here are some of the interprocedural optimizations that can be done:
Constant propagation
mod/ref analysis
Alias analysis
Forward substitution
Routine key-attribute propagation
Partial dead call elimination
Symbol table data promotion
Dead function elimination
Whole program analysis
GCC supports this kind of optimization.
This interprocedural optimization can be used to analyze and optimize the function being called.
If the compiler cannot see the source code of the library function, it cannot perform this kind of optimization.
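To make a couple of the list items concrete, here is a minimal sketch of what interprocedural constant propagation can exploit. The function names are hypothetical, and whether a given compiler actually performs these transformations depends on its version and flags, so treat the comments as what an optimizer may do rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper. Its body is visible to the compiler because it lives
 * in the same translation unit as its only caller. */
static int scale(int value, int factor)
{
    if (factor == 0)   /* dead branch if every caller passes a non-zero constant */
        return 0;
    return value * factor;
}

/* Every call to scale() passes factor == 4. An optimizer that sees both
 * functions may propagate that constant, drop the factor == 0 branch, and
 * inline the multiplication into the loop. If scale() lived in a separately
 * compiled library, it would remain an opaque call. */
int sum_scaled(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += scale(values[i], 4);
    return total;
}
```

Calling sum_scaled() on an array holding 1, 2, 3 returns 24 either way; only the generated code differs, which you can check in the assembler output.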

Note that some modern compilers (clang/LLVM, icc and, more recently, gcc) now support link-time optimization (LTO) to minimize the effect of separate compilation. Thus you gain both the benefits of separate compilation (maintenance, faster compilation, etc.) and those of whole-program analysis.
By the way, gcc appears to have supported -fwhole-program and -combine since version 4.1 (as far as I know, -combine was later removed in GCC 4.6 in favor of LTO). You have to pass all source files together, though.
Finally, since Boost is mostly header files (templates) that are #included, you gain nothing by pasting them into your source code: the compiler already sees them.
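As a rough illustration of the LTO route, the listing below shows two hypothetical files, util.c and main.c, in one block, with the gcc commands in the leading comment. The file and function names are made up for the example; the -flto flag is the standard GCC option and has to be passed at both the compile and the link step.

```c
/*
 * Sketch of a two-file program, shown as one listing.
 * The file names util.c and main.c are hypothetical.
 *
 * Separate compilation with LTO:
 *   gcc -O2 -flto -c util.c
 *   gcc -O2 -flto -c main.c
 *   gcc -O2 -flto util.o main.o -o app
 */

/* ---- util.c ---- */
int clamp_to_byte(int x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}

/* ---- main.c (normally sees only this declaration, via a header) ---- */
int clamp_to_byte(int x);

int brighten(int pixel)
{
    /* Without LTO this is an opaque call into util.o. With -flto the
     * link-time optimizer can see the body and may inline it. */
    return clamp_to_byte(pixel + 40);
}
```

The source stays split across files; only the build flags change, which is the main attraction over pasting everything together.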


How to compile a normal .h-.c object build and get the same level of optimization as with a static "unity" build, in gcc?

I have been told that "unity builds" have a greater chance to inline everything if you make all the functions static, and thus make the binary more optimized and faster.
Personally I don't like them because the classic way is much more intuitive and modular, and you don't have to keep track of headers between branching .c files and main.c, and you don't have to have a master declaration header (basically emulating the normal way).
I don't care about compilation time, but I do care about efficiency of the program. So in my mind, the question is why wouldn't a compiler be able to do all these optimizations regardless of objects and whatnot, even if it had to compile twice or several times?
So how do I do that?

Size optimization options

I am trying to sort out an embedded project where the developers chose to #include all the .h and .c files into a single .c file, so that they can compile just that one file with the -fwhole-program option to get good size optimization.
I hate this and am determined to make this into a traditional program just using LTO to achieve the same.
The versions included with the dev kit are:
aps-gcc (GCC) 4.7.3 20130524 (Cortus)
GNU ld (GNU Binutils) 2.22
With one .o file, .text is 0x1c7ac; fractured into 67 .o files, .text comes out as 0x2f73c. I added the LTO options and reduced it to 0x20a44, which is good but nowhere near enough.
I have tried --gc-sections and the linker plugin option, but they made no further improvement.
Any suggestions? Am I seeing the right sort of improvement from LTO?
To get LTO to work perfectly you need to have the same information and optimisation algorithms available at link stage as you have at compile stage. The GNU tools cannot do this and I believe this was actually one of the motivating factors in the creation of LLVM/Clang.
If you want to inspect the difference in detail, I'd suggest you generate a map file (ld option -Map <filename>) for each build and see whether there are functions that haven't been inlined or functions that are larger. You can resolve a lack of inlining manually by moving the definition of the function into a header file and defining it as extern inline, which effectively turns it into a macro (this is a GNU extension).
Larger functions are likely not being subject to constant propagation and I don't think there's anything you can do about that. You can make some improvements by carefully declaring the function attributes such as const, leaf, noreturn, pure, and returns_nonnull. These effectively promise that the function will behave in a particular way that the compiler may otherwise detect if using a single compilation unit, and that allow additional optimisations.
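Here is a small sketch of how some of those attributes look in source. It uses the GCC __attribute__ syntax (clang accepts it too); the functions themselves are hypothetical. In a real multi-file project the attributes would go on the declarations in the header, so that callers in other object files see the promise.

```c
#include <stdlib.h>

/* const: the result depends only on the arguments and no memory is read,
 * so repeated calls with the same arguments may be merged. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: may read memory, but has no side effects. */
__attribute__((pure)) size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* noreturn: control never comes back, so the compiler need not generate
 * code for the path that follows a call. */
__attribute__((noreturn)) void fatal(void)
{
    abort();
}

int checked_square_sum(int a, int b)
{
    if (a < 0 || b < 0)
        fatal();
    return square(a) + square(b);
}
```

These are promises, not checks: if a function marked const actually reads global state, the compiler may silently produce wrong code.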
In contrast, Clang can compile your source code to a special kind of bytecode (LLVM calls it bitcode; the name LLVM originally stood for Low Level Virtual Machine, much as JVM stands for Java Virtual Machine, which also runs bytecode), and this bytecode can then be optimised at link time (or indeed at run time, which is cool). Since this bytecode is what gets optimised whether you do LTO or not, and the optimisation algorithms are common to the compiler and the linker, in theory Clang/LLVM should give exactly the same results whether you use LTO or not.
Unfortunately now that the C backend has been removed from LLVM I don't know of any way to use the LLVM LTO capabilities for the custom CPU you're targeting.
In my opinion, the method chosen by the previous developers is the correct one. It is the method that gives the compiler the most information and thus the most opportunities to perform the optimizations that you want. It is a terrible way to compile (any change will require the whole project to be compiled) so marking this as just an option is a good idea.
Of course, you would have to run all your integration tests against such a build, but that should be trivial to do. What is the downside of the chosen approach, other than compilation time? That shouldn't be an issue, because you don't need to build that way all the time, just for integration tests.

Strange compiler speed optimization results - IAR compiler

I'm experiencing a strange issue when I try to compile two source files that contain some important computing algorithms that need to be highly optimized for speed.
Initially, I have two source files, let's call them A.c and B.c, each containing multiple functions that call each other (functions from a file may call functions from the other file). I compile both files with full speed optimizations and then when I run the main algorithm in an application, it takes 900 ms to run.
Then I notice the functions from the two files are mixed up from a logical point of view, so I move some functions from A.c to B.c; let's call the new files A2.c and B2.c. I also update the two headers A.h and B.h by moving the corresponding declarations.
Moving function definitions from one file to the other is the only modification I make!
The strange result is that after I compile the two files again with the same optimizations, the algorithm now takes 1000 ms to run.
What is going on here?
What I suspect happens: when function f calls function g, being in the same file allows the compiler to replace the actual function call with inline code as an optimization. This is no longer possible when the definitions are not compiled at the same time.
Am I correct in my assumption?
Aside from regrouping the function definitions as it was before, is there anything I can do to obtain the same optimization as before? I researched and it seems it's not possible to compile two source files simultaneously into a single object file. Could the order of compilation matter?
As to whether your assumption is correct, the best way to tell is to examine the assembler output, such as by using gcc -S or gcc -save-temps. That will be the definitive way to see what your compiler has done.
As to compiling two C source files into a single object file, that's certainly doable. Just create an AB.c as follows:
#include "A.c"
#include "B.c"
and compile that.
Barring things that should be kept separate (such as static items which may exist in both C files), that should work (or at least work with a little modification).
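The "little modification" usually means fixing name clashes between file-local symbols. As a sketch, suppose both files originally defined a static function called helper; after the two #include lines the compiler would see two definitions of the same static name in one translation unit. The listing shows roughly what the combined unit has to look like after renaming. All function names here are hypothetical.

```c
/* ---- contents of A.c ---- */

/* Originally "static int helper(int n)". Renamed because B.c also had a
 * static helper(), and both now end up in one translation unit. */
static int a_round_up(int n)
{
    return (n + 7) & ~7;   /* round up to a multiple of 8 */
}

int a_buffer_size(int n)
{
    return a_round_up(n);
}

/* ---- contents of B.c ---- */

/* Also originally "static int helper(int n)". */
static int b_double(int n)
{
    return 2 * n;
}

int b_capacity(int n)
{
    /* This used to be a cross-file call. Now the compiler sees the body of
     * a_buffer_size() and may inline it. */
    return b_double(a_buffer_size(n));
}
```

The same applies to file-scope static variables, macros, and typedefs that happen to share a name across the two files.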
However, remember the optimisation mantra: Measure, don't guess! You're giving up a fair bit of encapsulation by combining them so make sure the benefits well outweigh the costs.

Is there a difference in a binary when using multiple files C as opposed to putting it all into a single file?

I know that multiple files make code far easier to work with. However, is there a performance difference compared to "jamming it all into one file", or will a modern compiler like gcc create the same binary for both? By performance difference I mean file size, compile time, and running time.
This is for C only.
Arguably, compile times improve with multiple files, as you only need to recompile files that have changed (assuming you have a decent dependency-tracking build system).
Linking would probably take longer, as there's just more to do.
Traditionally, compilers have been unable to perform optimizations across multiple source files (things like inlining functions are tricky). So the resulting executable from a multi-file build is likely to be different, and potentially slower.
There are more opportunities for optimization when everything is in a single file. E.g. gcc, starting with -O2, will inline some functions if their body is available, even if they aren't declared inline (even more functions are eligible for inlining with -O3). So there are differences in run time, and sometimes you even have a chance to notice them. Even more so with -fwhole-program, telling GCC that you don't care about out-of-line versions of external functions except main() (GCC behaves as if all your external functions became static).
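A small sketch of why linkage matters here, with hypothetical function names. Both helpers have their bodies available and neither is declared inline, so both are candidates for inlining at -O2; the difference lies in whether an out-of-line copy has to be kept.

```c
/* External linkage: some other file might call this, so the compiler has to
 * emit an out-of-line copy even if it inlines every call it can see here.
 * With -fwhole-program, GCC treats it as if it were static. */
int area_extern(int w, int h)
{
    return w * h;
}

/* Internal linkage: if every call is inlined, the out-of-line copy can be
 * dropped from the object file entirely. */
static int area_static(int w, int h)
{
    return w * h;
}

int total_area(int w, int h)
{
    return area_extern(w, h) + area_static(w, h);
}
```

Comparing the output of gcc -O2 -S with and without -fwhole-program (and a main() in the file) should show whether the extra copy of area_extern survives on your toolchain.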
Overall compile time may increase (because there is more stuff to analyze, and not all optimizer algorithms are linear) or decrease (when there's no need to parse the same headers multiple times). Binary size may increase (due to inlining, in exchange for running faster) or decrease (less likely; but sometimes, inlining simplifies caller's code to the point where code size decreases).
As for ease of development and maintenance, you can use SQLite's approach: it has multiple source files, but they are jammed into one (the "amalgamation") before compilation.
From some tests, compiling and linking multiple files takes longer. You will get a different binary (at least I did), although mine was within a byte of the other.
The all-in-one file ran in 0.000764 ms
The multiple-files version ran in 0.000769 ms
Do take the benchmark with a grain of salt, as I did put it together in about 5 minutes, and it was a tiny program.
So really no differences overall.
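For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch using the standard clock() function. It is not the code used for the numbers above; the workload, the function names, and the repeat count are all placeholders. Averaging over many repeats helps, because a single run of a tiny program is mostly noise.

```c
#include <time.h>

/* Hypothetical workload standing in for the code under test. */
static unsigned long work(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; ++i)
        acc += (i * i) % 7;
    return acc;
}

/* Returns the average CPU time per repeat in milliseconds. The results are
 * accumulated into *sink so the compiler cannot discard the work as unused. */
double time_work_ms(unsigned long n, int repeats, unsigned long *sink)
{
    if (repeats <= 0)
        return 0.0;

    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        *sink += work(n);
    clock_t end = clock();

    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / repeats;
}
```

Build the single-file and the multi-file variants with identical flags and compare the averages; differences smaller than the run-to-run variation should not be read as a result.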

Combining source code into a single file for optimization

I was aiming at reducing the size of the executable for my C project and I have tried all compiler/linker options, which have helped to some extent. My code consists of a lot of separate files. My question was whether combining all source code into a single file will help with optimization that I desire? I read somewhere that a compiler will optimize better if it finds all code in a single file in place of separate multiple files. Is that true?
A compiler can indeed optimize better when it finds the needed code in the same compilable (*.c) file. If your program is longer than 1000 lines or so, you'll probably regret putting all the code in one file, because doing so will make your program hard to maintain. If it is shorter than 500 lines, you might try the single file and see whether it helps.
The crucial consideration is how often code in one compilable file calls or otherwise uses objects (including functions) defined in another. If there are few transfers of control across this boundary, then erasing the boundary will not help performance appreciably. Therefore, when coding for performance, the key is to put tightly related code in the same file.
I like your question a great deal. It is the right kind of question to ask, in my view; and, though the complete answer is not simple enough to treat fully in a Stack Exchange answer, your pursuit of the answer will teach you much. Though you may not yet realize it, your question is really about linking, a subject every advancing programmer eventually has to learn. It touches on symbol tables, inlining, the in-place construction of return values, and several other subtle factors.
At any rate, if your program is shorter than 500 lines or so, then you have little to lose by trying the single-file approach. If longer than 1000 lines, then a single file is not recommended.
It depends on the compiler. The Intel C++ Composer XE, for example, can automatically optimize across multiple files (when building with icc -fast *.c *.cpp on Linux or icl /fast *.c *.cpp on Windows).
When you use Microsoft Visual Studio, or a derived product (like Atmel Studio for microcontrollers), every single source file is compiled on its own (i.e. one cl, icl, or gcc command is issued for every .c and .cpp file in the project). This means no optimization across files at the compile step.
For microcontroller projects I sometimes have to put everything in a single file in order to make it fit in the limited flash memory on the controller. If your compiler/IDE works like Visual Studio, you can use a trick: select all the source files and exclude them from the build process (but leave them in the project), then create a file (I always use whole_program.c) and #include every single source (i.e. non-header) file in it. Note that including .c files is frowned upon by many high-level programmers, but sometimes you have to do it the dirty way, and with microcontrollers that's actually more often than not.
My experience has been that with GNU gcc, optimization happens within a single file plus its includes, which together produce a single object. With clang/LLVM, whole-project optimization is quite easy, and I recommend this: do NOT optimize at the clang step. Use clang to get from C to bytecode, then use llvm-link to link all of your bytecode modules into one bytecode module. Then you can optimize the whole project, with all source files optimized together; llc then adds more optimization as it heads for the target. For best results, tell clang what your ultimate target is using its target-triple command-line option.
To do the same thing on the GNU path, either use includes to make one big file compiled to one object, or, if there is a machine-code-level optimizer beyond the few things the linker does, that is where it would have to happen. Maybe GNU has an exposed IR file format, optimizer, and IR-to-target tool, but I think I would have seen that by now.
A number of my projects, although very simple programs, have LLVM and GNU builds for the same source files. In the LLVM builds I make one binary from unoptimized bytecode and another from optimized bytecode. LLVM's optimizer has problems with small while loops and sometimes generates non-working code. A very quick check to see whether it is you or them is to try the non-optimized LLVM binary and the GNU binary: if they all behave the same, it is you; if only the optimized LLVM binary doesn't work, it is them.
