How do linkers decide what parts of libraries to include? - c

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler check whether any function in the program calls b() (perhaps a() calls b(), or another library calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final binary size if I'm linking to a big library but only using a minor feature?

Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend on how you build your binaries. MS compilers (2005 onwards at least) provide something called Function-Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on, but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.

It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for a module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a includes ab.o, which defines both a() and b(), then the linker will include ab.o in the link, satisfying the need for a() and also pulling in the unused function b().
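A quick way to see this granularity in action (a hedged sketch; a.c, b.c, ab.c and libA.a are just the hypothetical names from the question, and the commands assume GCC and GNU binutils):
$ cat a.c
int a(void) { return 1; }
$ cat b.c
int b(void) { return 2; }
$ cat main.c
int a(void);
int main(void) { return a(); }
# one object file per function: only a.o is pulled from the archive
$ gcc -c a.c b.c main.c
$ ar rcs libA.a a.o b.o
$ gcc -o prog main.o -L. -lA
$ nm prog | grep ' T b$'          # no output: b() was never pulled in
# both functions in one object file: b() comes along for the ride
$ cat a.c b.c > ab.c
$ gcc -c ab.c
$ ar rcs libA2.a ab.o
$ gcc -o prog2 main.o -L. -lA2
$ nm prog2 | grep ' T b$'         # prints a line: b() arrived together with ab.o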
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.
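For example (a sketch assuming GCC/GNU ld and the hypothetical libA.a from above; some linkers re-scan libraries and are more forgiving):
# library named before the object that needs it: when -lA is scanned,
# nothing is undefined yet, so nothing is extracted from it, and the
# link fails with an undefined reference to a()
$ gcc -o prog -L. -lA main.o
# library named after the object files: a() is still undefined when
# libA.a is scanned, so a.o is extracted and the link succeeds
$ gcc -o prog main.o -L. -lA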

It depends on the linker.
e.g., Microsoft Visual C++ has an option "Enable function-level linking", so you can enable it manually.
(I assume they have a reason for not just enabling it all the time; maybe linking is slower, or something similar.)
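For reference, the command-line spellings of that option (a sketch from memory, with made-up file names; check the documentation for your Visual C++ version):
rem /Gy compiles each function into its own COMDAT section;
rem /OPT:REF then lets the linker discard unreferenced functions and data
cl /c /Gy main.c helpers.c
link /OPT:REF main.obj helpers.obj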

Usually (static) libraries are composed of objects created from source files. What linkers usually do is include an object if a function provided by that object is referenced. If your source file contains only one function, then only that function will be brought in by the linker. There are more sophisticated linkers out there, but most C-based linkers still work as outlined. There are tools available that split C sources containing multiple functions into artificially smaller source files to make static linking more fine-grained.
If you are using shared libraries, then using more or less of them does not affect your compiled size. However, your runtime footprint will include them.

This lecture at Academic Earth gives a pretty good overview; linking is discussed in the latter half of the talk, IIRC.

Without any optimization, yes, it will be included. The linker, however, may be able to optimize it out by statically analyzing the code and removing unreachable code.

It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.

It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out

It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at the granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file, even if only one of the two functions was actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

Related

List "never linked against" source file in C project

I would like to know if someone is aware of a trick to retrieve the list of files that have been (or, ideally, will be) used by the linker to produce an executable.
Some kind of solution must exist: a static source analyzer, or a hack such as compiling with some special flags and analyzing the produced executable with another tool, or forcing the linker to output this information.
The goal is to provide a tool that strips unused source files from a list of source files.
The end goal is to ease the build process by letting the user give a list of potentially usable source files. My tool would then compile only the ones actually used by the linker instead of everything.
This would allow some unit tests to still be runnable even if others are broken and can't compile, without asking the user to manually list every test's dependencies in the CMake files.
I am targeting Linux for now, but will be interested in the future in doing the same trick on other OSes, so I would like a cross-platform solution, even though I doubt I will get one :)
Thanks for your help
Edit, because I see that it is confusing. What I mean by
allowing the user to give a list of usable source files
is that, in CMake for example, if you use add_executable(name, sources), then sources is treated as the set of sources to compile and link against.
I want to wrap add_executable so that sources is viewed instead as a set of source files usable if necessary.
I'm afraid the idea of detecting never linked source files is not a fruitful one.
To build a program, CMake will not compile a source file if it is not going to link the resulting object file into the program. I can understand how you might think that this happens, but it doesn't happen.
CMake already does what you would like it to do, and the same is true of every other build automation system going back to their invention in the 1970s. The fundamental purpose of all such systems is to ensure that the building of a program compiles a source file name.(c|cc|f|m|...) if and only if the object file name.o is going to be linked into the program and is out of date or does not exist. You can always defeat this purpose by egregiously bad coding of the project's build spec (CMakeLists.txt, Makefile, SConstruct, etc.), but with CMake you would need to be really trying to do it, and trying quite expertly.
If you do not want name.c to be compiled and the object file name.o linked into a target program, then you do not tell the build system that name.o or name.c is a prerequisite of the program. Don't tell it what you know is not true. It is elementary competence not to specify redundant prerequisites of a build system target.
The linker will link all its input object files into an output program without question. It does not ask whether or not they are "needed" by the program, because it cannot answer that question. Neither the linker nor any possible static analysis tool can know what program you intend to produce when you input some object files for linkage. It can only be assumed that you intend to produce the program that results from the linkage of those object files, assuming the linkage is successful.
If those object files cannot be linked into a program at all, the linker will tell you that, and why. Otherwise, if you have linked object files that you didn't intend to link, you can only discover that for yourself, by noticing the mistake in the build log, or failing that by testing the program and/or inspecting its contents and comparing your observations with your expectations.
Given your choice of object files for linkage, you can instruct the linker to detect any code sections or data sections it extracts from those object files in which no symbols are defined that can be referenced by the program, and to throw away all such unreferenced input sections instead of linking them into the program. This is called link-time "garbage collection". You tell the linker to do it by passing the option -Wl,-gc-sections in the gcc linkage command. See this question to learn how to maximise the collectible garbage. This is what you can do to remove redundant object code from the linkage.
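For concreteness, the usual recipe looks something like this (a sketch with made-up file names; the question linked above covers the compiler-side flags, which are what make the collection effective):
# put each function and data object into its own section...
$ gcc -ffunction-sections -fdata-sections -c main.c util.c
# ...so the linker can discard every section that nothing references
$ gcc -Wl,--gc-sections -o prog main.o util.o
# add -Wl,--print-gc-sections to see what was thrown away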
But you can only collect any garbage from a program in this way if the program is dynamically opaque, i.e. not linked with the option -rdynamic: then the global symbols defined in the program's static image are not visible to the OS loader and cannot be referenced from outside its static image by dynamic libraries in the same process. In this case the linker can determine by static analysis that a symbol whose definition is not referenced in the program's static image cannot be referenced at all, since it cannot be referenced dynamically, and if all symbols defined in an input section are statically unreferenced then it can garbage-collect the section.
If the program has been linked -rdynamic then -Wl,-gc-sections will collect no garbage, and this is quite right, because if the program is not dynamically opaque then it is impossible for static analysis to determine that anything defined in its linkage cannot be referenced.
It's noteworthy that although -rdynamic is not a default linkage option for GCC, it is a default linkage option for CMake projects using the GCC toolchain. So to use link-time garbage collection in CMake projects you would always have to override the -rdynamic default. And obviously it would only be valid to do this if you have determined that it is all right for the program to be dynamically opaque.

how to make shared library an executable

I was searching for the asked question and saw this link https://hev.cc/2512.html, which does exactly the thing I want, but there is no explanation of what is going on. I am also confused about whether a shared library without main() can be made executable; if yes, how? I guess I have to provide a global main(), but I know no details. Any further easy reference and guidance is much appreciated.
I am working on x86-64 64 bit Ubuntu with kernel 3.13
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and, on systems that operate that way, to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define (and generate code for) a task which can be started from a single entry point.
The code you linked to starts with the source code of the library and explicitly codes a main(), which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library (in the absence of security features to prevent this in any given OS), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possible be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
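For what it's worth, here is a minimal sketch of the same idea (my own reconstruction, not the code from the linked article), assuming x86-64 Linux, glibc and GCC/GNU ld; the names libtest.c, lib_entry and the_answer are made up:
$ cat libtest.c
#include <unistd.h>

/* Embed a PT_INTERP segment so the kernel knows which dynamic linker
   to use when the .so itself is executed. The path is platform-specific. */
const char my_interp[] __attribute__((section(".interp"))) =
    "/lib64/ld-linux-x86-64.so.2";

int the_answer(void) { return 42; }   /* ordinary library function */

/* Custom entry point: no CRT startup code has run, so do minimal work
   and exit explicitly instead of returning. */
void lib_entry(void)
{
    static const char msg[] = "running as an executable\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
    _exit(0);
}
$ gcc -shared -fPIC -Wl,-e,lib_entry -o libtest.so libtest.c
$ ./libtest.so              # runs lib_entry()
$ gcc call.c ./libtest.so   # or link against it as an ordinary library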
To create a shared (dynamic) library, with an example:
Suppose there are three object files: sum.o, mul.o and print.o, and the shared library is to be named "libmno.so".
cc -shared -o libmno.so sum.o mul.o print.o
and compile the program that uses it with
cc main.c ./libmno.so
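Note that on most platforms the object files going into a shared library must be compiled as position-independent code, so the fuller sequence looks something like this (sum.c, mul.c and print.c are the hypothetical sources behind the object files above):
cc -fPIC -c sum.c mul.c print.c
cc -shared -o libmno.so sum.o mul.o print.o
cc main.c ./libmno.so -o main
./main    # run from this directory, since the path ./libmno.so is what gets recorded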

Does the linker refer to the main code

Let's assume I have three source files: main.c, a.c and b.c. main.c calls some of the functions (not all) that are defined in a.c. None of the functions defined in b.c are called (used) by main.c. The main function is in main.c. Then we have a makefile that compiles all the source files (main.c, a.c and b.c) and links them to produce an executable file, in my case an Intel HEX file.
My question is: does the linker know in which file the main function resides, and use that knowledge to determine what parts of the object files to link together? I mean, if the linker produces the exe file based only on the recipe of the rule to make the target, then no matter how many functions are called in our application code, the size of the executable will be the same, because the recipe says to link all the object files. For example, we compile the three source files and get three object files: main.o, a.o and b.o (the bigger the object files are, the bigger the exe file is).
I know you would say: if you don't want anything from b.c, then do not include it in the build. But that means every time I want to change the application (include/exclude modules) I need to change the makefile too. And another thing: how does the linker know what part of the object file to take? Does it understand the C language? I hope you understand my question; excuse my bad English.
1) Does the linker know in which file the main function resides, and use that knowledge to determine what parts of the object files to link together?
Your toolchain (compiler/linker) may have options to enable this kind of optimization, i.e. removing unused functions from the link, but I have big doubts about it for global functions (it could be possible for static functions).
2) And another thing: how does the linker know what part of the object file to take? Does it understand the C language?
The linker may detect that a function or variable is not used by the application (once again, check the available options), but that is not really the objective of this tool. However, if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with the other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
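A quick way to convince yourself of this, using the main.c, a.c and b.c from the question (a sketch assuming GCC and GNU binutils): object files named directly on the link line always go in, while archive members are extracted only on demand.
$ gcc -c main.c a.c b.c
# named directly, every object file ends up in the executable
$ gcc -o prog1 main.o a.o b.o
# packed into an archive, only the members that resolve an undefined
# symbol are extracted
$ ar rcs libab.a a.o b.o
$ gcc -o prog2 main.o -L. -lab
$ size prog1 prog2    # prog2 should be smaller if nothing in b.o is referenced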
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

Determining the space used by library functions in C

I have a bunch of code I need to analyse and I don't know how to do it. Here and there the code uses math functions from a header file I have included, called math.h, that came with my IDE. I am being asked to see how much space including this takes. Specifically, is the compiler including all of the library functions or just the ones we use? There is no object file being created, so I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes, because then the code won't compile and I won't know a size, and if I comment out all the lines that use math functions it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
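If you want to put numbers on it, something along these lines can help (a sketch; "prog" stands for your built binary, and output formats vary between binutils versions):
$ objdump -t prog           # full symbol table, including each symbol's size field
$ nm --size-sort -S prog    # symbols sorted by size, largest last
$ size prog                 # overall section sizes, for a before/after comparison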
The code in the header file is compiled into the object file of the .c file you are using if the header contains the definitions of the functions, and is only referenced if it contains just the declarations. The linker will then find a definition for each symbol and place it in your executable; if you are using dynamic linking, the OS will pull in the definition at run time.

Linking two object files with a common symbol

If I have two object files, both defining a symbol (function) "foobar":
Is it possible to tell the linker to obey the object file order I give on the command line, and always take the symbol from the first file and never from the later one?
AFAIK the "weak" pragma works only on shared libraries but not on object files.
Please answer for all the C/C++ compiler/linker/operating system combinations you know of, because I'm flexible and use a lot of compilers (Sun Studio, Intel, MSVC, GCC, aCC).
I believe that you will need to create a static library from the second object file, and then link the first object file and then the library. If a symbol is resolved by an object file, the linker will not search the libraries for it.
Alternatively place both object files in separate static libraries, and then the link order will be determined by their occurrence in the command line.
Creating a static library from an object file will vary depending on the tool chain. In GCC use the ar utility, and for MSVC lib.exe (or use the static library project wizard).
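A rough sketch of that second suggestion (made-up file names; assuming GCC and GNU ar, but the idea carries over to other toolchains): foobar() is defined in both first.o and second.o, and main.o references it.
$ ar rcs libfirst.a first.o
$ ar rcs libsecond.a second.o
# the reference to foobar() is resolved from the first library on the
# command line that offers a definition, so swapping -lfirst and -lsecond
# changes which one wins
$ gcc -o prog main.o -L. -lfirst -lsecond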
There is a danger here; the key concept is called interpositioning-dependent code.
Let me give you an example here:
Suppose you have written a custom routine called malloc and you link in the standard libraries. What will happen is this: functions that require the standard malloc will use your custom version instead, and the end result is that the code may become unstable through unintended side effects, and something will appear 'broken'.
This is just something to bear in mind.
As in your case, you could 'override' (I use quotes for emphasis) the other function, but then how would you know which foobar is being used? This could lead to debugging grief when trying to figure out which foobar is called.
You can make it into a .a file... Then the linker gets the symbol it needs and doesn't complain later.
