GNU ld: how do I detect multiply-defined symbols?

GNU ld: how do I detect multiply-defined symbols? - linker

I'm aggregating two very similar sets of source code into a single library archive. There are maybe 5 or 6 functions which are defined with identical signatures in the two code sets, but with slightly different implementation. I need to find these functions, so that I can either change their names (if I need both of them), or to remove one of them.
I thought that ld would do the hard work for me, by reporting that the functions were multiply-defined, but it's not doing it. I've currently got a 2-stage link procedure:
1 - an incremental link of the two sets of source files, to produce an archive file. If I already know which functions are multiply defined, I can use nm to confirm that the symbol appears twice in the archive.
2 - a final link of this archive file with the object that calls the library code. 'ld' doesn't complain during this step, and presumably is just linking the first matching object that it finds in the archive, without reporting that a second object could also be used.
Any idea how I can get ld to scan the entire archive, and report the functions which are multiply-defined? Thanks.

Attempt a link of all the component .o files (rather than .a files), and you will get the multiply-defined messages.

Related

Duplicate symbols in Microsoft C library

I'm writing a linker for Windows PE format object files, and I've got to the stage where it can link together object files produced by the Microsoft compiler, but when I try to link with libcmt.lib I get a lot of duplicate symbols.
For example, cosl is defined by three different objects in the library. All three refer to definitions in different places, and all three look genuine, e.g. they point to text segments named .text$mn and have storage class IMAGE_SYM_CLASS_EXTERNAL.
Is it the case that these are alternate versions and the linker is supposed to pick one based on some criterion, or am I misunderstanding something about the semantics of the PE library format?

As referenced in the comments, the OP is not processing the COMDAT section properly.
http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/pecoff.doc

How to get all symbol conflict from 2 static libs in VC8

Say I have 2 static libs
ex1.a
ex2.a
In both libs I will define 10 same functions
When Compiling a sample test code say "test.c" , I link with both static libs ex1.a and ex2.a
In "test.c" I will call only 3 functions, then I will get the
linker error "same symbols deifned in both ex1.a and ex2.a libraries" This is Ok.
My Question here is :
1. Why this error only display 3 functions as multiple defined.. Why not it list all 10 functions
In VC8 How can I list all multiple defined symbols without actualy calling that function in test code ...
Thanks,

Thats because, linker tries to resovle a symbol name, when it compiles and links a code which has the function call. Only when the code has some function calls, linker would try to resolve it in either the test code or the libraries linked along and thats when it would find multiple definitions. If no function called, then I guess no problem.

What you experience is the optimizing part of the linker: By default it won't include code that isn't referenced. The compiler will create multiple object files with most likely unresolved dependencies (calls that couldn't be satisfied by the code included). So the linker takes all object files passed and tries to find solutions for the unresolved dependencies. If it fails, it will check the available library files. If there are multiple options with the same exact name/signature it will start complaining cause it won't be able to decide which one to pick (for identical code this won't matter but imagine different implementations using different "behind the scenes" work on memory, such as debug and release stuff).
The only (and possibly easiest way) I could think of to detect all these multiple definitions would be creating another static library project including all source files used in both static libs. When creating a library the linker will include everything called or exported - you won't need specific code calling the stuff for the linker to see/include everything as long as it's exported.
However I still don't understand what you're actually trying to accomplish as a whole. Trying to find code shared between two libraries?

What is the difference between a .o file and a .lib file?

What is the difference between a .o file and a .lib file?

Conceptually, a compilation unit (the unit of code in a source file/object file) is either linked entirely or not at all. While some implementations, with significant levels of cooperation between the compiler and linker, are able to remove unused code from object files at link time, it doesn't change the issue that including 2 compilation units with conflicting symbol names in a program is an error.
As a practical example, suppose your library has two functions foo and bar and they're in an object file together. If I want to use bar, but my program already has an external symbol named foo, I'm stuck with an error. Even if or how the implementation might be able to resolve this problem for me, the code is still incorrect.
On the other hand, if I have a library file containing two separate object files, one with foo and the other with bar, only the one containing bar will get pulled into my program.
When writing libraries, you should avoid including multiple functions in the same object file unless it's essential that they be used together. Doing so will bloat up applications which link your library (statically) and increase the likelihood of symbol conflicts. Personally I prefer erring on the side of separate files when there's a doubt - it's even useful to put foo_create and foo_free in separate files if the latter is nontrivial so that short one-off programs that don't need to call foo_free can avoid pulling in the code for deep freeing (and possibly even avoid pulling in the implementation of free itself).

A .LIB file is a collection of .OBJ
files concatenated together with an
index. There should be no difference
in how the linker treats either.
Quoted from here:
What is the difference between .LIB and .OBJ files? (Visual Studio C++)

They are actually quite different, specially with older linkers.
The .o (or .obj) files are object files, they contain the output of the compiler generated code. It is still in an intermediate format, for example, most references are still unresolved. Usually there is a one to one mapping between the source file and the object file.
The .a (or .lib) files are archives, also known as library, and are a set of object files.
All operating systems have tools that allow you to add/remove/list object files to library files.
Another difference, specially with older linkers is how the files are dealt with, when linking them. Some linked will place the complete object file into the final binary, regardless of what is actually being used, while they will only extract the useful information out of library files.
Nowadays most linkers are smart enough to remove all stuff that is not being used.

Distribute binary library on OSX

I'm planning to release some compiled code that shall be linked by client applications on MacOSX.
The distribution is some kind of code library and a set of header files defining the public interface for the library.The code is internally C++ but its public interface (i.e what's being shown in the headers) is completely C.
These are my requirements or atleast what I hope I can accomplish:
I want my library to be as agnostic
as possible for what version of OSX
and GCC the user is running. Having
separate libraries for 64 bit and 32
bit is okay though.
I want my library
to be loadable from languages that
supports loading C libraries such as
python or similar.
I want my
libraries internal symbols to be
isolated from the code it's being
linked into. I don't want to have
duplicate symbol errors because we
happen to name an internal function
in the same way. My C++ code is properly namespaced so this may not be as big of an issue though, but some of the libraries I depend on is C and can be an issue (see next point).
I want my library
dependencies to be safe. My library
depends on some libraries such as
libpng, boost and stl and I don't
want issues because some users don't
necessarily have all of them installed
or get problems because they have
been compiled with other flags or
have different versions than I have.
On Windows I use a DLL with an export library and link all my dependencies statically into the dll. It fulfills all the criteria above and if I can get the same result on OSX it would be great, however I've heard that dynamic libraries tend not to isolate symbols on mac in the same way.
Is there some kind of best practice for this on OSX?

A normal OS X .dylib pretty much satisfies your requirements, with the note that you will want to have an exports file that the linker uses to determine exactly which symbols are exported (to prevent leaking your internal symbols).
In order to make your own library dependencies safe, you will probably need to either include those libraries with yours or link them statically into your library.
edit: To answer your follow-up question of how to apply an exports file to a link command, the man page for ld has the following to say:
-exported_symbols_list filename
The specified filename contains a list of global symbol names
that will remain as global symbols in the output file. All
other global symbols will be treated as if they were marked
as __private_extern__ (aka visibility=hidden) and will not be
global in the output file. The symbol names listed in file-
name must be one per line. Leading and trailing white space
are not part of the symbol name. Lines starting with # are
ignored, as are lines with only white space. Some wildcards
(similar to shell file matching) are supported. The *
matches zero or more characters. The ? matches one charac-
ter. [abc] matches one character which must be an 'a', 'b',
or 'c'. [a-z] matches any single lower case letter from 'a'
to 'z'.
So, if your library had only two functions that you wanted to be public, lets call them foo and bar, and they were C functions (so the symbol names aren't mangled), your exports file (let's call it myLibrary.exports) would contain these two lines:
_foo
_bar
and maybe some comments, etc. When you do the final link step to build the library, you would pass the -exported_symbols_list myLibrary.exports flag to the linker. This has the additional benefit that the link will fail if you don't provide one of the exported symbols; this can catch a lot of "oops, I forgot to include that file in the build" mistakes.
You don't need to use the command-line tools to do all this, of course. In the build settings for a dynamic library in XCode, you will find Exported Symbols File (undefined by default); set it to the path to your exports file there and it will be passed to the linker.

The key term you need is 'framework'. You need to create a 'universal' framework that is self-contained. ('Universal' is Apple-ease for 'compile several times and package into one library.) It's not as straightforward as on Windows in terms of encapsulation, but the necessary linker options are there.

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?

Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.

It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are a simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.

It depends on the linker.
eg. Microsoft Visual C++ has an option "Enable function level linking" so you can enable it manually.
(I assume they have a reason for not just enabling it all the time...maybe linking is slower or something)

Usually (static) libraries are composed of objects created from source files. What linkers usually do is include the object if a function that is provided by that object is referenced. if your source file only contains one function than only that function will be brought in by the linker. There are more sophisticated linkers out there but most C based linkers still work like outlined. There are tools available that split C source that contain multiple functions into artificially smaller source files to make static linking more fine granular.
If you are using shared libraries then you don't impact you compiled size by using more or less of them. However your runtime size will include them.

This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.

Without any optimization, yes, it'll be included. The linker, however, might be able to optimize out by statically analyzing the code and trying to remove unreachable code.

It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.

It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out

It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at a granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file - even if only one of the 2 functions were actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight