Compiling before linking prevents optimization - c

Consider something like the following scenario:
main.c contains something like this:
#include "sub.h"

int main(void) {
    int i = 0;
    while (i < 1000000000) {
        f();
        i++;
    }
    return 0;
}
while sub.h contains:
void f();
and sub.c contains something like this:
void f(){
    int a = 1;
}
Now, if this were all in one source file, the compiler (gcc in my case) would notice that f() doesn't actually do anything and optimize the loop away. But since compiling happens before linking, that optimization can't happen in this case.
This can be avoided for local include files by including the raw .c files rather than the headers, but when including headers from other libraries this becomes impossible. Is there any way around this?

If I'm understanding correctly, you would like to only link library functions that are being used by your program. Using the GCC tool chain this is possible with the optimization flags:
-O2 -fdata-sections -ffunction-sections
The first flag should optimize away loops that do nothing. The other two flags place each function or data item into its own section in the compiled output file. This allows the linker to perform optimizations. Note: it will take longer to compile and you won't be able to use gprof.
You will also need to then pass the linker the -gc-sections flag so that it won't include unused function and data sections.
All in all, you would execute:
gcc -O2 -fdata-sections -ffunction-sections main.c sub.c -Wl,-gc-sections
If you were to instead call GCC to produce assembly files you could inspect them to find that _main does not execute a loop or call the function f():
$ gcc -O2 -S -fdata-sections -ffunction-sections main.c sub.c -Wl,-gc-sections
$ cat main.s
Sources:
How to remove unused C/C++ symbols with GCC and ld?
http://linux.die.net/man/1/ld
http://linux.die.net/man/1/gcc

The compiler cannot guess and cannot make assumptions on what's outside the individual translation unit that it compiles. Some toolchains (end-to-end compiler+linker+supporting-utilities) may detect some such cases within a project being built from sources, depending on their sophistication of optimizations. It would not be common, and not guaranteed. It would most certainly not, and could not, apply to opaque 3rd party libraries being linked in.
In practice however, would you really use a 3rd party library that exported some no-op function, in hope that someone (your toolchain) would notice and safely optimize it away?

On Windows, Visual Studio has whole program optimization.
SQLite uses a script to combine its sources into a single C file, which is then compiled as one translation unit.
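As a rough sketch of the single-file approach mentioned above: here the body of f() sits in the same translation unit as the loop, so an optimizing compiler is free to inline it and drop the empty calls. The file name unity.c, the function run_loop and the static qualifier are illustrative assumptions, not part of the original question. GCC's -flto flag is another way to give the optimizer a whole-program view without merging the sources.

```c
/* unity.c: hypothetical single-translation-unit version of main.c + sub.c.
 * Possible builds (assumed, adjust to taste):
 *   gcc -O2 -c unity.c               one file, f() visible to the optimizer
 *   gcc -O2 -flto main.c sub.c       separate files, optimized at link time
 */

/* static: f() is private to this file, so the compiler sees every caller. */
static void f(void)
{
    int a = 1;
    (void)a; /* no observable effect */
}

/* Calls f() n times and returns how many iterations ran. */
long run_loop(long n)
{
    long i = 0;
    while (i < n) {
        f();
        i++;
    }
    return i;
}
```

Whether the loop really disappears depends on the compiler and flags, so it is worth checking the generated assembly (gcc -O2 -S unity.c) rather than assuming it.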

Related

Code::Blocks + MinGW: minimize the size of a static library

I've tried passing -ffunction-sections -fdata-sections for the compiler, but that doesn't seem to have the desired effect. As far as I understand, I also have to pass -Wl,--gc-sections to the linker, but I'm not linking the files at this point. I just want to have a .a library file as small as possible, with minimal redundant code/data.
The compiler performs optimization based on the knowledge it has of the program. Optimization levels -O2 and above, in particular, enable unit-at-a-time mode, which allows the compiler to consider information gained from later functions in the file when compiling a function. Compiling multiple files at once to a single output file in unit-at-a-time mode allows the compiler to use information gained from all of the files when compiling each of them.
Not all optimizations are controlled directly by a flag.
-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.
Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.
See the following link for more details:
http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Optimize-Options.html
The following will reduce the size of your compiled objects (and thus the static library)
-Os -g0 -fvisibility=hidden -fomit-frame-pointer -mno-accumulate-outgoing-args -finline-small-functions -fno-unwind-tables -fno-asynchronous-unwind-tables -s
The following will increase the size of objects (though they may make the binary smaller)
-ffunction-sections -fdata-sections -flto -g -g1 -g2 -g3
The only way to really make a static library smaller is to remove the unneeded code by compiling the static library with -ffunction-sections -fdata-sections and linking the final product with -Wl,--gc-sections,--print-gc-sections to find out what parts are cruft. Then go back and remove those functions/variables (this also works for making smaller shared libraries - see http://libraryopt.sf.net)
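To make the effect concrete, here is a small sketch of a library source file with one function the final program uses and one it does not. The file names (mylib.c, app.c, libmylib.a) and function names are made up for illustration; the flags in the comment are the ones discussed above.

```c
/* mylib.c: hypothetical library source with one used and one unused function.
 *   gcc -Os -ffunction-sections -fdata-sections -c mylib.c
 *   ar rcs libmylib.a mylib.o
 *   gcc app.c libmylib.a -Wl,--gc-sections,--print-gc-sections -o app
 * With -ffunction-sections each function lands in its own section
 * (.text.used_helper and .text.unused_helper on ELF targets), so the
 * linker can drop a function that nothing references.
 */

int lookup_table[4] = {1, 2, 4, 8};   /* gets its own data section */

/* Assumed to be called by app.c. */
int used_helper(int x)
{
    return x * 2;
}

/* Assumed to be called by nobody; a candidate for --gc-sections. */
int unused_helper(int x)
{
    return lookup_table[x & 3];
}
```

Note that the .a file itself does not get smaller this way; the saving shows up in the final linked binary, and --print-gc-sections reports which sections were discarded.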
Compare the size of the library built with -g in the compiler flags against one built without it. I can easily imagine a static library doubling in size if it includes debug information. If that is what you saw, chances are you have already stripped the library of debug symbols and hence cannot shrink the library file significantly further. This could explain your memory of cutting the size in half at some point.

shared library constructor not working

In my shared library I have to do certain initialization at the load time. If I define the function with the GCC attribute __attribute__ ((constructor)) it doesn't work, i.e. it doesn't get called when the program linking my shared library is loaded.
If I change the function name to _init(), it works. Apparently the use of the _init() and _fini() functions is not recommended now.
Any idea why __attribute__ ((constructor)) wouldn't work? This is with Linux 2.6.9, gcc version 3.4.6
Edit:
For example, let's say the library code is the following:
#include <stdio.h>

int smlib_count;

void __attribute__ ((constructor)) setup(void) {
    smlib_count = 100;
    printf("smlib_count starting at %d\n", smlib_count);
}

void smlib_count_incr() {
    smlib_count++;
    smlib_count++;
}

int smlib_count_get() {
    return smlib_count;
}
For building the .so I do the following:
gcc -fPIC -c smlib.c
ld -shared -soname libsmlib.so.1 -o libsmlib.so.1.0 -lc smlib.o
ldconfig -v -n .
ln -sf libsmlib.so.1 libsmlib.so
Since the .so is not in one of the standard locations, I update LD_LIBRARY_PATH and link the .so from another program. The constructor doesn't get called. If I change it to _init(), it works.
Okay, so I've taken a look at this, and it looks like what's happening is that your intermediate gcc step (using -c) is causing the issue. Here's my interpretation of what I'm seeing.
When you compile as a .o with setup(), gcc just treats it as a normal function (since you're not compiling as a .so, it doesn't care). Then, ld doesn't see any _init() or anything like a DT_INIT in the ELF's dynamic section, and assumes there are no constructors.
When you compile as a .o with _init(), gcc also treats it as a normal function. In fact, it looks to me like the object files are identical except for the names of the functions themselves! So once again, ld looks at the .o file, but this time sees a _init() function, which it knows it's looking for, and decides it's a constructor, and correspondingly creates a DT_INIT entry in the new .so.
Finally, if you do the compilation and linking in one step, like this:
gcc -Wall -shared -fPIC -o libsmlib.so smlib.c
Then what happens is that gcc sees and understands the __attribute__ ((constructor)) in the context of creating a shared object, and creates a DT_INIT entry accordingly.
Short version: use gcc to compile and link in one step. You can use -Wl (see the man page) for passing in extra options like -soname if required, like -Wl,-soname,libsmlib.so.1.
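A minimal sketch of the library with both a constructor and a matching destructor, built the way this answer suggests. The teardown() function and the static qualifiers are additions for illustration and are not in the original question; the build line in the comment combines the one-step command and the -Wl,-soname option mentioned above.

```c
/* smlib.c: sketch of a shared library with load/unload hooks.
 * Built in one step so that gcc drives the link:
 *   gcc -Wall -shared -fPIC -Wl,-soname,libsmlib.so.1 -o libsmlib.so.1.0 smlib.c
 */
#include <stdio.h>

static int smlib_count;

/* Runs when the library is loaded, before main() of the program using it. */
__attribute__((constructor))
static void setup(void)
{
    smlib_count = 100;
    printf("smlib_count starting at %d\n", smlib_count);
}

/* Hypothetical counterpart: runs at unload or normal program exit. */
__attribute__((destructor))
static void teardown(void)
{
    printf("smlib_count ending at %d\n", smlib_count);
}

void smlib_count_incr(void) { smlib_count++; }
int smlib_count_get(void) { return smlib_count; }
```

A program linked against this library should see smlib_count_get() return 100 on its first call, which is a quick way to confirm the constructor actually ran.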
From this link:
"Shared libraries must not be compiled with the gcc arguments '-nostartfiles' or '-nostdlib'. If those arguments are used, the constructor/destructor routines will not be executed (unless special measures are taken)."
gcc/ld doesn't set the DT_INIT entry in the ELF dynamic section when -nostdlib is used. You can check with objdump -p and look for the INIT entry in both cases. In the __attribute__ ((constructor)) case you won't find an INIT entry, but in the _init() case you will find one in the shared library.

Optimize C compiles: removed unreferenced parts on-the-fly

We're facing an interesting topic. Let's say we have a special-functions.c file, basically a library.
We need to optimize the code by getting rid of all unused/unreferenced functions on-the-fly during the build process.
I'm not searching for generally unused (dead) code: some parts will be "dead" when compiling for one architecture, but will be used in a build for another architecture.
Does anybody know of flags, tools, methods, or tricks to do this?
The compiler is standard gcc, and the code is C99.
EDIT
I know this is mainly the linker's job, but when using gcc the process is not really split into two parts.
From http://embeddedfreak.wordpress.com/2009/02/10/removing-unused-functionsdead-codes-with-gccgnu-ld/ :
Compile with -fdata-sections to keep the data in separate data
sections and -ffunction-sections to keep functions in separate
sections, so they (data and functions) can be discarded if unused.
Link with --gc-sections to remove unused sections.
For example:
gcc -Os -fdata-sections -ffunction-sections test.c -o test -Wl,--gc-sections
I think that a recent GCC (i.e. 4.6) should do that if you compile and link with the -flto flag (link time optimization). I would imagine that having hidden or internal visibility should be relevant (at least for non-static functions).
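As a hedged sketch of the visibility idea: the helper below is marked hidden so it is not exported from a shared object, which gives the optimizer more freedom to inline or discard it when link-time optimization is in use. The file name, the API macro and both function names are hypothetical; the attribute and the -fvisibility=hidden flag are standard GCC features on ELF targets.

```c
/* special.c: sketch of marking an internal helper as non-exported.
 * Possible build (assumed):
 *   gcc -O2 -flto -fvisibility=hidden -fPIC -c special.c
 */

/* Explicitly exported even when -fvisibility=hidden is the default. */
#define API __attribute__((visibility("default")))

/* Internal helper: callable from other files in the same library,
 * but not visible to users of the final shared object. */
__attribute__((visibility("hidden")))
int scale(int x)
{
    return x * 3;
}

/* Public entry point of the hypothetical library. */
API int special_entry(int x)
{
    return scale(x) + 1;
}
```

Whether an unused hidden function is actually removed still depends on the toolchain version and flags, so treat this as something to verify with nm or objdump on the result.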
To my knowledge, the GNU binary utils (ld, in this case) already remove unused references when linking statically.

using a function in different .c files (c programming 101)

/me/home/file1.c contains a function definition:
int mine(int i)
{
/* some stupidity by me */
}
I've declared this function in
/me/home/file1.h
int mine(int);
If I want to use this function mine() in /me/home/at/file2.c, all I need to do is:
file2.c
#include "../file1.h"
Is that enough? Probably not.
After doing this much, when I compile file2.c, I get undefined reference to 'mine'
You will also need to link the object file from file1. Example:
gcc -c file2.c
gcc -c ../file1.c
gcc -o program file2.o file1.o
Or you can also feed all files simultaneously and let GCC do the work (not suggested beyond a handful of files):
gcc -o program file1.c file2.c
Don't use ../ in a header. Instead, instruct gcc to use the parent directory as include path:
(in the at directory):
gcc -I../ -c file2.c
After doing this much, when I compile file2.c, I get undefined reference to 'mine'
No, you don't. It's not compiling that causes those errors. It's this other thing, called "linking".
The compiler compiles one "translation unit" - the result of running the preprocessor on one source file, possibly pulling in more stuff via #include - at a time, and then the linker sticks these together to make an executable. Typically the same program serves as both the compiler and linker, with different flags, and typically you can tell it to do everything at once (and not save any temporary files for the compiled translation units). But you do need to tell it what to link, and you do need to compile everything that will be linked.
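The split between the two steps can be sketched in a single listing that mimics what the compiler sees for file2.c after preprocessing. The function use_mine() and the body of mine() are placeholders invented for illustration; in the real layout the prototype comes from file1.h and the definition lives in file1.c.

```c
/* Single-file sketch of declaration versus definition. */

/* From file1.h: a declaration only. Enough to compile calls to mine(). */
int mine(int);

/* From file2.c: compiles with just the declaration in scope. */
int use_mine(int i)
{
    return mine(i) + 1;
}

/* From file1.c: the definition. If no object file supplies this, the
 * link step fails with "undefined reference to `mine'". */
int mine(int i)
{
    return i * 2; /* placeholder body */
}
```

Compiling file2.c with gcc -c succeeds without file1.c being present at all; the error only appears when the link step cannot find an object file that defines mine().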

Including source files in C

So I get the point of headers vs source files. What I don't get is how the compiler knows to compile all the source files. Example:
example.h
#ifndef EXAMPLE_H
#define EXAMPLE_H
int example(int argument); // prototype
#endif
example.c
#include "example.h"
int example(int argument)
{
    return argument + 1; // implementation
}
main.c
#include "example.h"

int main(void)
{
    int whatever = 0;
    whatever = example(whatever); // usage in program
    return 0;
}
How does the compiler, compiling main.c, know the implementation of example() when nothing includes example.c?
Is this some kind of an IDE thing, where you add files to projects and stuff? Is there any way to do it "manually" as I prefer a plain text editor to quirky IDEs?
Compiling in C or C++ is actually split up into 2 separate phases.
compiling
linking
The compiler doesn't know about the implementation of example(). It just knows that there's something called example() that will be defined at some point, so it generates code with placeholders for example().
The linker then comes along and resolves these placeholders.
To compile your code using gcc you'd do the following
gcc -c example.c -o example.o
gcc -c main.c -o main.o
gcc example.o main.o -o myProgram
The first 2 invocations of gcc are the compilation steps. The third invocation is the linker step.
Yes, you have to tell the compiler (usually through a makefile if you're not using an IDE) which source files to compile into object files, and the compiler compiles each one individually. Then you give the linker the list of object files to combine into the executable. If the linker is looking for a function or class definition and can't find it, you'll get a link error.
It doesn't ... you have to tell it to.
For example, when using gcc, first you would compile the files:
gcc file1.c -c -ofile1.o
gcc file2.c -c -ofile2.o
Then the compiler compiles those files, assuming that symbols that you've defined (like your example function) exist somewhere and will be linked in later.
Then you link the object files together:
gcc file1.o file2.o -oexecutable
At this point, the linker looks at those assumptions and "clarifies" them, i.e. checks whether the symbols are actually present. This is basically how it works.
As for your IDE question, Google "makefiles"
The compiler does not know the implementation of example() when compiling main.c - the compiler only knows the signature (how to call it) which was included from the header file. The compiler produces .o object files which are later linked by a linker to create the executable binary. The build process can be controlled by an IDE, or if you prefer a Makefile. Makefiles have a unique syntax which takes a bit of learning to understand but will make the build process much clearer. There are lots of good references on the web if you search for Makefile.
The compiler doesn't. But your build tool does. IDE or make tool. The manual way is hand-crafted Makefiles.

Resources