Combining source code into a single file for optimization - c

I am trying to reduce the size of the executable for my C project, and I have tried all the compiler/linker options I know of, which have helped to some extent. My code consists of a lot of separate files. My question is whether combining all the source code into a single file would help with the optimization I am after. I read somewhere that a compiler will optimize better if it finds all the code in a single file instead of spread across multiple files. Is that true?

A compiler can indeed optimize better when it finds all the needed code in the same compilable (*.c) file. If your program is longer than 1000 lines or so, you'll probably regret putting all the code in one file, because doing so will make your program hard to maintain; but if it is shorter than 500 lines, you might try the single file and see whether it helps.
The crucial consideration is how often code in one compilable file calls or otherwise uses objects (including functions) defined in another. If there are few transfers of control across this boundary, then erasing the boundary will not help performance appreciably. Therefore, when coding for performance, the key is to put tightly related code in the same file.
I like your question a great deal. It is the right kind of question to ask, in my view; and, though the complete answer is not simple enough to treat fully in a Stack Exchange answer, pursuing it will teach you much. Though you may not yet realize it, your question really regards linking, a subject every advancing programmer eventually has to learn. It also regards symbol tables, inlining, the in-place construction of return values, and several other subtle factors.
At any rate, if your program is shorter than 500 lines or so, then you have little to lose by trying the single-file approach. If longer than 1000 lines, then a single file is not recommended.

It depends on the compiler. The Intel C++ Composer XE, for example, can automatically optimize across multiple files (when building with icc -fast *.c *.cpp on Linux or icl /fast *.c *.cpp on Windows).
When you use Microsoft Visual Studio, or a derived product (like Atmel Studio for microcontrollers), every single source file is compiled on its own (i.e. one cl, icl, or gcc command is issued for every .c and .cpp file in the project). This means no optimization across source files.
For microcontroller projects I sometimes have to put everything in a single file just to make it fit in the controller's limited flash memory. If your compiler/IDE behaves like Visual Studio, you can use a trick: select all the source files and mark them as not participating in the build (but leave them in the project), then create one file (I always use whole_program.c) and #include every single source (i.e. non-header) file in it. Note that #including .c files is frowned upon by many high-level programmers, but sometimes you have to do it the dirty way, and with microcontrollers that's actually more often than not.
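A minimal sketch of that trick; the included file names are placeholders for whatever your project actually contains:

/* whole_program.c -- the only file that participates in the build   */
/* the names below are hypothetical; list your real .c files here    */
#include "main.c"
#include "uart.c"
#include "adc.c"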

My experience has been that with gnu/gcc the optimization happens within the single file plus its includes, producing a single object. With clang/llvm it is quite easy, and I recommend it: DO NOT optimize at the clang step; use clang to get from C to bytecode, then use llvm-link to link all of your bytecode modules into one bytecode module, and then you can optimize the whole project, all source files optimized together; llc adds more optimization as it heads for the target. You get the best results if you tell clang, via the target triple command line option, what your ultimate target is. For the gnu path to do the same thing, either use includes to make one big file compiled to one object, or, if there is a machine-code-level optimizer beyond the few things the linker does, that is where it would have to happen. Maybe gnu has an exposed IR file format, optimizer, and IR-to-target tool, but I think I would have seen that by now.
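A minimal sketch of that clang/llvm flow, assuming two hypothetical sources a.c and b.c (tool names and flags can vary between LLVM versions):

clang -emit-llvm -c a.c -o a.bc      # C to LLVM bitcode, no optimization yet
clang -emit-llvm -c b.c -o b.bc
llvm-link a.bc b.bc -o whole.bc      # merge every module into one bitcode module
opt -O2 whole.bc -o whole.opt.bc     # optimize the whole program in one pass
llc whole.opt.bc -o whole.s          # lower the optimized bitcode to target assembly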
At http://github.com/dwelch67 a number of my projects, although very simple programs, have both llvm and gnu builds for the same source files. For the llvm builds you can see that I make one binary from unoptimized bytecode and another from optimized bytecode. (llvm's optimizer has problems with small while loops and sometimes generates non-working code; a very quick check to see whether it is you or them is to try the non-optimized llvm binary and the gnu binary: if they all behave the same, it is you; if only the optimized llvm binary misbehaves, it is them.)

Related

How to get C source code from the compiled code

I have the compiled C code in text format. I need to extract the source code by decompiling the machine code. How to do that?
"True" decompiling is, basically, impossible. Foremost, you can't "decompile" local names (in functions and source code files / modules). For those, you'll get something like, for int local variables: i1, i2... Of course, unless you also have debug information, which is not often the case.
Decompiling to "something" (which might not be very readable) is possible, but it usually relies on heuristics that recognize the code patterns compilers generate, and it can be fooled into producing strange (possibly even incorrect) C code. In practice that means a decompiler usually works OK for a certain compiler with certain (default) compile options, but not so well with others.
Having said that, decompilers do exist and you can try your luck with, say, Snowman.
As Srdjan has said, in general decompilation of a C (or C++) program is not possible. Too much information is lost during the compilation process. For example, consider a declaration such as int x; it is 'lost' because it does not directly produce any machine-level instruction; the compiler needs the information only for type checking.
It is, however, possible to disassemble, which takes the compiled executable back up one level, to assembly language. Interpreting the assembly might (will?) be difficult and certainly time consuming. There are several disassemblers available; if you have money, IDA Pro is probably the industry standard, and if you are doing this type of work it is well worth the several thousand dollars per license. There are also a number of open source disassemblers, which Google can find for you.
That being said, there have been efforts to create decompilers; IDA Pro has one, and you can look at http://boomerang.sourceforge.net/ in addition to Snowman, linked above.
Lastly, other languages are friendlier towards decompilation than C or C++. For example, a C# program can be decompiled with tools like dotPeek or ilSpy. Similarly, for Java there are a number of tools that can convert Java bytecode back into Java source.
Please post a sample of the "compiled C code in text format."
Perhaps then it will be easier to see what you are trying to achieve.
Typically it is not practical to reverse engineer assembly language into C because much of the human-readable information, in the form of labels and variable names, is permanently lost in the compilation process.

What files need to be modified to compile for a custom architecture of an existing cpu with gcc?

I've been looking at examples of C code that is compiled for some lesser known processors (like ZPU) using the gcc cross compiler.
Most of the working examples I see assume a certain architecture (memory map and set of peripherals), simply give you a recipe to compile for it, and they work.
However, I can find very little information on what needs to be modified if you use the same CPU with a different memory map and set of peripherals.
From what I've read, there are two main files that I need to make sure are done "right": the linker script that is used, and crt0.o (which, if I need to modify it, means recompiling crt0.S, which is assembler). On this last one especially I find very little information on what it is actually supposed to do (other than setting up reset there is no clear information, and I'm talking conceptually, not about a specific processor, although something specific would also be useful).
Can anyone tell me what the relationship is between the C files for the program code (bare-metal development), crt0.S (especially why it is needed), and a working linker script?
PS: Answers of the form "read this book" are welcome, and I would love them.
PS: I realize this kind of question is usually vague and closed quickly, but I don't know where else to turn, so I ask for a bit of leniency.
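For illustration only, here is a rough sketch in C of what a bare-metal crt0 conceptually does before main() ever runs. The symbol names are assumptions standing in for names the linker script would normally define, and a real crt0.S is written in assembly because the stack pointer must be set up before any C code can execute:

/* hypothetical symbols provided by the linker script */
extern unsigned int __data_load, __data_start, __data_end;
extern unsigned int __bss_start, __bss_end;
extern int main(void);

void _start(void)
{
    unsigned int *src = &__data_load;
    unsigned int *dst = &__data_start;
    while (dst < &__data_end)        /* copy initialized data from flash to RAM */
        *dst++ = *src++;
    for (dst = &__bss_start; dst < &__bss_end; )
        *dst++ = 0;                  /* zero the .bss section, as C requires */
    main();
    for (;;)                         /* on bare metal there is nowhere to return to */
        ;
}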

How do I compile a C program without all the bloat?

I'm trying to learn x86. I thought this would be quite easy to start with: I'll just compile a very small program containing basically nothing and see what the compiler gives me. The problem is that it gives me a ton of bloat ("This program cannot be run in DOS mode" and so on): a 25 KB file containing an empty main() calling one empty function.
How do I compile my code without all this bloat? (and why is it there in the first place?)
Executable formats contain a bit more than just the raw machine code for the CPU to execute. If you want only that, then the only option is (I think) a DOS .com file, which essentially is just a bunch of code loaded into a page and then jumped into. Some software (e.g. Volkov Commander) made clever use of that format to deliver quite a lot in very little executable code.
Anyway, the PE format which Windows uses contains a few things that are specially laid out:
A DOS stub saying "This program cannot be run in DOS mode" which is what you stumbled over
several sections containing things like program code, global variables, etc. that are each handled differently by the executable loader in the operating system
some other things, like import tables
You may not need some of those, but a compiler usually doesn't know you're trying to create a tiny executable. Usually nowadays the overhead is negligible.
There is an article out there that strives to create the tiniest possible PE file, though.
You might get better results by digging up older compilers. If you want binaries that are bare to the bone, COM files really are that, so if you get hold of an old compiler that supports generating COM binaries instead of EXE, you should be set. There is a long list of free compilers at http://www.thefreecountry.com/compilers/cpp.shtml; I assume Borland's Turbo C would be a good starting point.
The bloat could be the loader (the interface the operating system requires) that the linker attaches. Try adding a module with only something like:
void foo(){}
and look at the disassembly (I assume that's the format the compiler "gives you"). Of course the details vary a lot between operating systems and compilers; there are so many!
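One way to do that with a gcc/binutils toolchain, assuming the function above lives in a hypothetical foo.c, is to compile without linking, so no startup or loader code gets attached:

gcc -c -Os foo.c -o foo.o   # compile only; nothing from the C runtime or OS loader is added
objdump -d foo.o            # disassemble just your own code
size foo.o                  # show how small the raw code section really is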

Single Source Code vs Multiple Files + Libraries

How much effect does having multiple files or compiled libraries, versus throwing everything (>10,000 LOC) into one source file, have on the final binary? For example, instead of linking a Boost library separately, I paste its code, along with my original source, into one giant file for compilation. Along the same lines, instead of feeding several files into gcc, I paste them all together and hand it only that one file.
I'm interested in the optimization differences, instead of problems (horror) that would come with maintaining a single source file of gargantuan proportions.
Granted, there can only be link-time optimization (I may be wrong), but is there a lot of difference between optimization possibilities?
If the compiler can see all the source code, it can optimize better, provided it has some kind of interprocedural optimization (IPO) option turned on. IPO differs from other compiler optimizations because it analyzes the entire program; other optimizations look at only a single function, or even a single block of code.
Here are some of the interprocedural optimizations that can be done (see here for more):
Inlining
Constant propagation
mod/ref analysis
Alias analysis
Forward substitution
Routine key-attribute propagation
Partial dead call elimination
Symbol table data promotion
Dead function elimination
Whole program analysis
GCC supports this kind of optimization.
This interprocedural optimization can be used to analyze and optimize the function being called.
If the compiler cannot see the source code of the library function, it cannot do such optimization.
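For example (util.c and main.c are invented names): compiled separately without IPO/LTO, the call below stays a real call; if the compiler can see both definitions at once (one big file, or LTO), it can inline add() and fold main() down to returning a constant.

/* util.c */
int add(int a, int b)
{
    return a + b;
}

/* main.c */
int add(int a, int b);

int main(void)
{
    return add(2, 3);   /* with whole-program visibility this can become "return 5;" */
}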
Note that some modern compilers (clang/LLVM, icc and recently even gcc) now support link-time optimization (LTO) to minimize the effect of separate compilation. Thus you gain the benefits of separate compilation (maintainability, faster compilation, etc.) and those of whole program analysis.
By the way, it seems like gcc has supported -fwhole-program and --combine since version 4.1. You have to pass all source files together, though.
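A sketch of both approaches with gcc, using two hypothetical files main.c and util.c (note that --combine was removed in later gcc releases, so on a modern gcc only the LTO form applies):

# modern route: compile separately, optimize the whole program at link time
gcc -O2 -flto -c main.c
gcc -O2 -flto -c util.c
gcc -O2 -flto main.o util.o -o prog

# older single-step route mentioned above (gcc 4.1 era)
gcc -O2 --combine -fwhole-program main.c util.c -o prog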
Finally, since BOOST is mostly header files (templates) that are #included, you cannot gain anything from adding these to your source code.

Good C header style

My C headers usually resemble the following style to avoid multiple inclusion:
#ifndef <FILENAME>_H
#define <FILENAME>_H
// define public data structures / prototypes, macros etc.
#endif /* !<FILENAME>_H */
However, in his Notes on Programming in C, Rob Pike makes the following argument about header files:
There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.
On the one hand, Pike is the only programmer I actually admire. On the other hand, putting several #ifdefs in multiple source files instead of putting one #ifdef in a single header file feels needlessly awkward.
What is the best way to handle the problem of multiple inclusion?
In my opinion, use the method that requires less of your time (which likely means putting the #ifdefs in the header files). I don't really mind if the compiler has to work harder if my resulting code is cleaner. If, perhaps, you are working on a multi-million line code base that you constantly have to fully rebuild, maybe the extra savings is worth it. But in most cases, I suspect that the extra cost is not usually noticeable.
Keep doing what you do - It's clear, less bug-prone, and well known by compiler writers, so not as inefficient as it maybe was a decade or two ago.
You could use the non-standard #pragma once - If you search, there's probably at least a bookshelf's worth of include guards vs pragma once discussion, so I'm not going to recommend one over the other.
Pike wrote some more about it in https://talks.golang.org/2012/splash.article:
In 1984, a compilation of ps.c, the source to the Unix ps command, was
observed to #include <sys/stat.h> 37 times by the time all the
preprocessing had been done. Even though the contents are discarded 36
times while doing so, most C implementations would open the file, read
it, and scan it all 37 times. Without great cleverness, in fact, that
behavior is required by the potentially complex macro semantics of the
C preprocessor.
Compilers have become quite clever since: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html, so this is less of an issue now.
The construction of a single C++ binary at Google can open and read
hundreds of individual header files tens of thousands of times. In
2007, build engineers at Google instrumented the compilation of a
major Google binary. The file contained about two thousand files that,
if simply concatenated together, totaled 4.2 megabytes. By the time
the #includes had been expanded, over 8 gigabytes were being delivered
to the input of the compiler, a blow-up of 2000 bytes for every C++
source byte.
As another data point, in 2003 Google's build system was moved from a
single Makefile to a per-directory design with better-managed, more
explicit dependencies. A typical binary shrank about 40% in file size,
just from having more accurate dependencies recorded. Even so, the
properties of C++ (or C for that matter) make it impractical to verify
those dependencies automatically, and today we still do not have an
accurate understanding of the dependency requirements of large Google
C++ binaries.
The point about binary sizes is still relevant. Compilers (linkers) are quite conservative about stripping unused symbols. See: How to remove unused C/C++ symbols with GCC and ld?
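One common gcc/binutils approach to that (foo.c is a placeholder): put each function and data item in its own section, then let the linker discard the sections nothing references.

gcc -Os -ffunction-sections -fdata-sections -c foo.c
gcc -Wl,--gc-sections foo.o -o prog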
In Plan 9, header files were forbidden from containing further
#include clauses; all #includes were required to be in the top-level C file. This required some discipline, of course—the programmer was
required to list the necessary dependencies exactly once, in the
correct order—but documentation helped and in practice it worked very
well.
This is a possible solution. Another possibility is to have a tool that manages the includes for you, for example MakeDeps.
There are also unity builds, sometimes called SCU (single compilation unit) builds. There are tools to help manage that, like https://github.com/sakra/cotire
Using a build system that optimizes for the speed of incremental compilation can be advantageous too. I am talking about Google's Bazel and similar. It does not protect you from a change in a header file that is included in a large number of other files, though.
Finally, there is a proposal for C++ modules in the works, great stuff https://groups.google.com/a/isocpp.org/forum/#!forum/modules. See also What exactly are C++ modules?
The way you're currently doing it is the common way. Pike's method cuts compilation time a bit, but with modern compilers probably not very much (when Pike wrote his notes, compilers weren't optimizer-bound); it clutters modules, and it's bug-prone.
You could still cut down on multiple inclusion by not including headers from headers, and instead documenting the requirement: "include <foodefs.h> before including this header."
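For reference, Pike's "the #ifdef's are in the file that includes it" variant looks like the sketch below; foo.h is a hypothetical header that still defines FOO_H at its top, and because the guard is tested in the including file, the preprocessor never even re-opens the header.

/* in every .c file that needs foo.h */
#ifndef FOO_H
#include "foo.h"
#endif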
I recommend you put the guards in the included file itself. There is no need to worry about a few thousand needlessly parsed lines of code on today's PCs.
Additionally, it is far more work and more source text if you have to check every single header in every source file that includes it.
And you would have to handle your own header files differently from the standard and other third-party headers.
He may have had a point at the time he was writing this. Nowadays decent compilers are clever enough to handle this well.
I agree with your approach - as others have commented, it's clearer, self-documenting, and lower maintenance.
My theory on why Rob Pike might have suggested his approach: He's talking about C, not C++.
In C++, if you have a lot of classes and you are declaring each one in its own header file, then you'll have a lot of header files. C doesn't really provide this kind of fine-grained structure (I don't recall seeing many single-struct C header files), and .h/.c file-pairs tend to be larger and contain something like a module or a subsystem. So, fewer header files. In that scenario Rob Pike's approach might work. But I don't see it as suitable for non-trivial C++ programs.

Resources