How to get all symbol conflicts from 2 static libs in VC8 - c

Say I have 2 static libs:
ex1.a
ex2.a
In both libs I define the same 10 functions.
When compiling a sample test code, say "test.c", I link with both static libs ex1.a and ex2.a.
In "test.c" I call only 3 of the functions, and I get the
linker error "same symbols defined in both ex1.a and ex2.a libraries". This is OK.
My questions are:
1. Why does this error list only the 3 called functions as multiply defined? Why doesn't it list all 10?
2. In VC8, how can I list all multiply defined symbols without actually calling those functions in the test code?
Thanks,

That's because the linker only tries to resolve a symbol name when the code it is linking actually contains a call to that function. Only when the code references a symbol does the linker look for a definition, either in the test code itself or in the libraries linked along with it, and that is when it finds the multiple definitions. If a function is never called, no conflict is reported.
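A minimal sketch of that behaviour, assuming each library is built from a single source file (file and function names hypothetical): only the symbols test.c actually references are pulled in and compared, so only those three show up in the diagnostic, exactly as described in the question.

/* ex1.c -> ex1.a */
int f1(void) { return 1; }
int f2(void) { return 2; }
int f3(void) { return 3; }
/* ... f4() through f10() defined the same way ... */

/* ex2.c -> ex2.a : the same 10 function names again */
int f1(void) { return 10; }
int f2(void) { return 20; }
int f3(void) { return 30; }
/* ... f4() through f10() ... */

/* test.c : references only f1, f2, f3, so only those three symbols are
   resolved against the libraries and only they can be reported as
   multiply defined; f4..f10 are never even looked up. */
int f1(void); int f2(void); int f3(void);
int main(void) { return f1() + f2() + f3(); }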

What you are seeing is the selective behaviour of the linker: by default it won't include code that isn't referenced. The compiler creates the object files, most likely with unresolved dependencies (calls that couldn't be satisfied by the code in the same file). The linker then takes all the object files passed to it and tries to satisfy those unresolved dependencies; if it fails, it checks the available library files. If there are multiple candidates with the exact same name/signature, it starts complaining because it can't decide which one to pick (for identical code this wouldn't matter, but imagine different implementations doing different "behind the scenes" work on memory, such as debug vs. release variants).
The only (and possibly easiest) way I can think of to detect all of these multiple definitions would be to create another static library project that includes all the source files used in both static libs. When creating a library, the linker will include everything that is called or exported - you won't need specific code calling each function for the linker to see/include everything, as long as it's exported.
However, I still don't understand what you're actually trying to accomplish as a whole. Trying to find code shared between the two libraries?

Related

Undefined reference to error when .so libraries are involved while building the executable

I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file and I can see the undefined reference errors during the linking stage as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand dynamic linking.
Also, reading from here led to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file the executable depends on, it should crash?
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by the dynamic linker? Otherwise the dynamic linker could make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file the executable depends on, it should crash?
No, the dynamic linker will exit with a "No such file or directory" message.
Imagine it like this:
Your executable stores somewhere a list of the shared libraries it needs.
The dynamic linker - think of it as a normal program.
The linker opens your executable and reads this list. Then, for each file in the list:
It tries to find the file in the linker search paths.
If it finds the file, it "loads" it.
If it can't find the file, the open() call fails with errno set to "No such file or directory"; the linker then prints a message saying it can't find the library and terminates your executable.
When running the executable, the linker dynamically searches for each symbol in the shared libraries.
When it can't find a symbol, it prints a message and the executable terminates.
You can, for example, set LD_DEBUG=all to inspect what the linker is doing. You can also run your executable under strace to see all the open() calls.
What actually is dynamic linking if all external entity references are resolved while
building?
Dynamic linking means that the linker loads each shared library when you run the executable.
When building, your compiler is kind enough to check for you that all symbols you use in your program exist in the shared libraries. This is just a safety check. You can, for example, disable this check with --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Otherwise the dynamic linker could make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a colon-separated list of paths to search for shared libraries. Paths in LD_LIBRARY_PATH are simply processed before the standard paths; that's all. It doesn't get "additional libraries", it gets additional paths in which to search for the libraries - the libraries stay the same.
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(sec) ((unsigned long long)sec * 1000)
should be present, but is not.
The compiler then assumes that it is an external function and creates an (undefined) symbol for it, while it should have been resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
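To make that concrete, here is a minimal sketch (the scale factor is assumed, following the definition suggested above): with the macro in scope, the call expands at preprocessing time and no symbol is emitted; remove the #define and the compiler falls back to an implicit declaration, producing exactly the reported undefined reference.

/* with the macro defined, the call disappears at preprocessing time */
#define MICRO_TO_NANO_ULL(us) ((unsigned long long)(us) * 1000ULL)

unsigned long long to_nanos(unsigned long long usec)
{
    /* expands to plain arithmetic; without the #define above this would be
       treated as a call to an undefined external function named
       MICRO_TO_NANO_ULL, which only the final link against the .so detects */
    return MICRO_TO_NANO_ULL(usec);
}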
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file the executable depends on, it should crash?
Yes, if the .so file is not present at run time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by the dynamic linker? Otherwise the dynamic linker could make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or by extension any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires the executable can crash.
In your case, it seems as if a required macro definition has not been found and the compiler is using implicit declaration rules. You can easily fix this by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC).
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file the executable depends on, it should crash?
It will crash; the time of the crash depends on how you call a given exported function from the .so file.
You might retrieve all the exported functions yourself via function pointers, using dlopen, dlsym and co. In that case the program will crash at the first call if it does not find the exported function.
If the executable simply calls an exported function from the shared object (declared in its header), the dynamic linker uses the information about the function to be called that is stored in the executable (see the second answer) and crashes if it cannot find the lib or there is a mismatch in symbols.
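A minimal sketch of that dlopen/dlsym path (the library path and the function name are hypothetical); note that failure only shows up when the lookup or the call is attempted, not when the executable is linked:

/* build: cc prog.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./xy.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* hypothetical exported function */
    int (*fn)(void) = (int (*)(void))dlsym(handle, "some_exported_fn");
    if (!fn) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("result: %d\n", fn());
    dlclose(handle);
    return 0;
}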
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by the dynamic linker? Otherwise the dynamic linker could make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the build-time linking and the dynamic linking. Starting off with the build-time linking:
When linking against a static library, the build-time linking copies all the code of the functions to be called into the executable/library using them.
When linking against a dynamic library you do not copy code but symbols. The symbols contain offsets or other information pointing to the actual code in the dynamic library. If the executable invokes a function that is not exported by the dynamic library, though, it will already fail at the build-time linking step.
Now when you start your executable, the OS will at some point try to load the shared object into the memory where the code actually resides. If it does not find it, or if it is incompatible (i.e. the executable was linked against a library with different exports), it can still fail at runtime.

Two main functions in a C application

I've been put on a big project. My first step in understanding the code was to search for the main function so that I have a view of the architecture.
What I discovered is that there is more than one main function. It's true that they are in different folders, but I don't understand how this application manages to build. What I know is that the linker expects exactly one main function (entry point).
I believe it's too hard to explain the whole build process of the application here, so I'm asking because surely some of you have encountered this.
1 - Should I have theoretical background to understand this? If so, please suggest articles, books, whatever you want.
2 - When do we have to use several main functions in one application?
You can't have multiple main functions for a single executable. There are several possibilities.
If doing a build builds only a single application, then only one of the main functions will be compiled. (Or none, if there's an option to build a library rather than an executable.) There are probably options that determine which one to build, depending on which variant of the application you want, the target system, or something else.
Or perhaps the application consists of multiple executables, with one main function for each one.
If running the build doesn't take too long, a trick I've used to determine which of several source files is actually compiled is to temporarily add #error directives, like:
#error "TEMPORARY: This is /full/path/to/source.cpp"
The resulting error message will tell you which source file was actually compiled. (You can also use #warning directives if your compiler supports them.)
Should I have theoretical background to understand this? If so, please suggest articles, books, whatever you want.
You need to have some understanding of what a build means but, more importantly, you have to understand the build process in your specific environment. You need to have an understanding of:
The list of build targets (executables and shared libraries)
The compiler settings used by them (you could be building DLLs for VS 2010 as well as VS 2012).
When do we have to use several main functions in one application?
When your build system builds multiple executables, you will need multiple main functions.
When your build system chooses file1.c or file2.c for building an executable depending on some settings, you might find main functions in both file1.c and file2.c. This is a rather poor way to organize code, but it is possible.
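A minimal sketch of that second case, using the file1.c/file2.c names from above (the selection mechanism, e.g. a makefile variable, is assumed): only one of the two files is handed to the compiler for a given variant, so the linker only ever sees one main.

/* file1.c - built only when variant A is selected */
#include <stdio.h>
int main(void) { puts("variant A"); return 0; }

/* file2.c - built only when variant B is selected; compiling and linking
   both files into one executable would fail with a duplicate definition
   of main, as described in the next answer */
#include <stdio.h>
int main(void) { puts("variant B"); return 0; }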
Trying to link an executable with multiple definitions of the same identifier (main, in this case) will fail, as the linker cannot select one or the other.
There is one possibility, though, that allows you to have multiple main entry points while ensuring only one is selected in the build process.
I'll illustrate it with a simple example: suppose you have a library (a .a static library) that contains a module with a definition of main. Because library modules are selected at link time, based on which identifiers are still undefined at the point the library is linked, that module is pulled in when you don't supply a main of your own, and left out when you do. This is exploited by some standard libraries: -ll (flex) has a definition of main that calls yylex(), and -ly (bison) has a definition of main that calls yyparse(). Those modules are included only if you haven't provided main already.
But beware: if you link your own main after the library (-ll, for example), the library's main will already have been included (since main was unresolved at the time the library was processed), and your definition then arrives as a duplicate, making ld complain.
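A minimal sketch of such a fallback module, following the flex/bison pattern described above (library and function names hypothetical):

/* default_main.c - compiled into the static library, e.g. libfoo.a.
   This object is only pulled out of the archive if main is still
   undefined when the library is searched; an application that defines
   its own main simply never links this module. */
extern int run(void);   /* hypothetical real entry point provided by the library */

int main(void)
{
    return run();
}

Link order matters here, as the answer above warns: if your own object that also defines main is listed after the library on the link line, the library's fallback has already been pulled in and you get the duplicate-definition error.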

What is responsible for ensuring all symbols are known/defined?

Is it the C preprocessor, compiler, or linkage editor?
To tell you the truth, it is the programmer.
The answer you are looking for is... it depends. Sometimes it's the compiler, sometimes it's the linker, and sometimes it doesn't happen until the program is loaded.
The preprocessor:
handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
...
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other kinds of text files.
The linker:
takes one or more objects generated by a compiler and combines them into a single executable program.
...
Computer programs typically comprise several parts or modules; all
these parts/modules need not be contained within a single object file,
and in such case refer to each other by means of symbols. Typically,
an object file can contain three kinds of symbols:
defined symbols, which allow it to be called by other modules,
undefined symbols, which call the other modules where these symbols are defined, and
local symbols, used internally within the object file to facilitate relocation.
When a program comprises multiple object files, the linker combines
these files into a unified executable program, resolving the
symbols as it goes along.
In environments which allow dynamic linking, it is possible that
executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these.
The programmer must make sure everything is defined somewhere. The programmer is RESPONSIBLE for doing so.
Various tools will complain along the way if they notice anything missing:
The compiler will notice certain things missing, and will error out if it can realize that something's not there.
The linker will error out if it can't fix up a reference that's not in a library somewhere.
At run time there is a loader that pulls the relevant shared libraries into the process's memory space. The loader is the last thing that gets a crack at fixing up symbols before the program gets to run any code, and it will throw errors if it can't find a shared library/dll, or if the interface for the library that was used at link-time doesn't match up correctly with the available library.
None of these tools is RESPONSIBLE for making sure everything is defined. They are just the things that will notice if things are NOT defined, and will be the ones throwing the error message.
For symbols with internal linkage or no linkage: the compiler.
For symbols with external linkage: the linker, either the "traditional" one, or the runtime linker.
Note that the dynamic/runtime linker may choose to do its job lazily, resolving symbols only when they are used (e.g: when a function is called for the first time).
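As a minimal illustration of that split (function names hypothetical): symbols with internal or no linkage never leave the translation unit, so the compiler alone deals with them, while symbols with external linkage are merely recorded as undefined and left for the static or runtime linker.

/* one translation unit */
static int helper(void);              /* internal linkage: resolved (or reported
                                         missing) entirely within this file     */
extern int provided_elsewhere(void);  /* external linkage: the compiler just
                                         emits an undefined symbol; the linker,
                                         possibly at runtime, must find it      */

static int helper(void)
{
    return 42;
}

int use_both(void)
{
    return helper() + provided_elsewhere();
}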

How to catch unintentional function interpositioning?

Reading through my book Expert C Programming, I came across the chapter on function interpositioning and how it can lead to some serious, hard-to-find bugs when done unintentionally.
The example given in the book is the following:
my_source.c
mktemp() { ... }
main() {
mktemp();
getwd();
}
libc
mktemp(){ ... }
getwd(){ ...; mktemp(); ... }
According to the book, what happens in main() is that mktemp() (a standard C library function) is interposed by the implementation in my_source.c. Although having main() call my implementation of mktemp() is intended behavior, having getwd() (another C library function) also call my implementation of mktemp() is not.
Apparently, this example was a real life bug that existed in SunOS 4.0.3's version of lpr. The book goes on to explain the fix was to add the keyword static to the definition of mktemp() in my_source.c; although changing the name altogether should have fixed this problem as well.
This chapter leaves me with some unresolved questions that I hope you guys could answer:
Does GCC have a way to warn about function interposition? We certainly don't ever intend on this happening and I'd like to know about it if it does.
Should our software group adopt the practice of putting the keyword static in front of all functions that we don't want to be exposed?
Can interposition happen with functions introduced by static libraries?
Thanks for the help.
EDIT
I should note that my question is not just aimed at interposing over standard C library functions, but also functions contained in other libraries, perhaps 3rd party, perhaps ones created in-house. Essentially, I want to catch any instance of interpositioning regardless of where the interposed function resides.
This is really a linker issue.
When you compile a bunch of C source files, the compiler creates an object file for each one. Each .o file contains a list of the public functions defined in that module, plus a list of functions that are called by code in the module but not actually defined there, i.e. functions that this module expects some library to provide.
When you link a bunch of .o files together to make an executable, the linker must resolve all of these missing references. This is the point where interposing can happen. If there are unresolved references to a function called "mktemp" and several libraries provide a public function with that name, which version should it use? There's no easy answer to this, and yes, odd things can happen if the wrong one is chosen.
So yes, it's a good idea in C to make everything "static" unless you really do need to use it from other source files. In fact, in many other languages this is the default behavior, and you have to mark things "public" if you want them accessible from outside.
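A minimal sketch of that fix applied to the book's example above (prototype simplified; don't include a header that declares libc's mktemp in the same file, or the static declaration will conflict with the earlier non-static one):

/* my_source.c */
static char *mktemp(char *template_name)   /* internal linkage: visible only here */
{
    /* ... local implementation ... */
    return template_name;
}

int main(void)
{
    char name[] = "fooXXXXXX";
    mktemp(name);   /* calls the file-local version above; getwd() inside libc
                       keeps calling libc's own mktemp(), so no interpositioning */
    return 0;
}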
It sounds like what you want is for the tools to detect name conflicts between functions - i.e., you don't want your externally accessible function names to accidentally coincide with, and thereby "override" or hide, functions of the same name in a library.
There was a recent SO question related to this problem: Linking Libraries with Duplicate Class Names using GCC
Using the --whole-archive option on all the libraries you link against may help (but as I mentioned in the answer over there, I really don't know how well this works or how easy it is to convince builds to apply the option to all libraries)
Purely formally, the interpositioning you describe is a straightforward violation of C language definition rules (the ODR, in C++ parlance). Any decent compiler must either detect these situations or provide options for detecting them. It is simply illegal to define more than one function with the same name in a C program, regardless of where the functions are defined (standard library, other user library, etc.).
I understand that many platforms provide means to customize the [standard] library behavior by defining some standard functions as weak symbols. While this is indeed a useful feature, I believe compilers must still provide the user with means to enforce the standard diagnostics (preferably on a per-function or per-library basis).
So, again, you should not worry about interpositioning if you have no weak symbols in your libraries. If you do (or if you suspect that you do), you have to consult your compiler documentation to find out whether it offers a way to inspect the weak symbol resolution.
In GCC, for example, you can disable the weak symbol functionality by using -fno-weak, but this basically kills everything related to weak symbols, which is not always desirable.
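As a minimal illustration of the weak-symbol mechanism being discussed (GCC syntax; the function name is hypothetical): a strong definition elsewhere silently replaces the weak one at link time, with no diagnostic, which is exactly the kind of interpositioning you may want to audit.

/* in the library */
__attribute__((weak)) int get_buffer_size(void)
{
    return 4096;     /* default, used only if no strong definition exists */
}

/* in the application: an ordinary (strong) definition wins silently */
int get_buffer_size(void)
{
    return 65536;
}

A symbol dump makes such overridable definitions easy to spot: nm marks the library's weak definition with a 'W'.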
If the function does not need to be accessed outside of the C file it lives in then yes, I would recommend making the function static.
One thing you can do to help catch this is to use an editor that has configurable syntax highlighting. I personally use SciTE, and I have configured it to display all standard library function names in red. That way, it's easy to spot if I am re-using a name I shouldn't be using (nothing is enforced by the compiler, though).
It's relatively easy to write a script that runs nm -o on all your .o files and your libraries and checks whether an external name is defined both in your program and in a library. Just one of the many sane, sensible services that the Unix linker doesn't provide because it's stuck in 1974, looking at one file at a time. (Try putting libraries in the wrong order and see if you get a useful error message!)
Interpositioning occurs when the linker is linking separate modules.
It cannot occur within a module: if there are duplicate symbols in a module, the linker will report this as an error.
For *nix linkers, unintended interpositioning is a problem, and it is difficult for the linker to guard against it.
For the purposes of this answer, consider the two linking stages:
The linker links translation units into modules (basically applications or libraries).
The linker links any remaining unfound symbols by searching in other modules.
Consider the scenario described in Expert C Programming and in SiegeX's question.
The linker first tries to build the application module. It sees that the symbol mktemp() is external and tries to find a function definition for it. The linker finds the definition in the object code of the application module and marks the symbol as found.
At this stage the symbol mktemp() is completely resolved. It is not considered in any way tentative, so as to allow for the possibility that another module might define the symbol.
In many ways this makes sense, since the linker should first try to resolve external symbols within the module it is currently linking. It is only unfound symbols that it searches for when linking in other modules.
Furthermore, since the symbol has been marked as resolved, the linker will use the application's mktemp() in any other case where it needs to resolve this symbol.
Thus the application's version of mktemp() will be used by the library.
A simple way to guard against the problem is to make all external symbols in your application or library unique.
For modules that will only be shared on a limited basis, this can fairly easily be done by making sure all external symbols in your module are made unique by appending a unique identifier.
For modules that are widely shared, making up unique names is a problem.
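A minimal sketch of that "unique identifier" convention (the prefix and the helper macro are hypothetical): every external symbol of the module carries a project-specific prefix, so it cannot collide with a libc name or with another module's symbols.

/* token-pasting helper; the prefix is hypothetical */
#define ACME_NAME(sym) acme_ ## sym

char *ACME_NAME(mktemp)(char *tmpl);      /* declares acme_mktemp */

char *ACME_NAME(mktemp)(char *tmpl)       /* defines acme_mktemp  */
{
    /* ... implementation; only "acme_mktemp" appears in the symbol table,
       so it can never interpose on libc's mktemp ... */
    return tmpl;
}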

Linking two object files with a common symbol

If I have two object files both defining a symbol (function) "foobar",
is it possible to tell the linker to obey the object file order I give in the command line and always take the symbol from the first file and never the later one?
AFAIK the "weak" pragma works only on shared libraries but not on object files.
Please answer for all the C/C++ compiler/linker/operating system combinations you know, because I'm flexible and use a lot of compilers (Sun Studio, Intel, MSVC, GCC, aCC).
I believe that you will need to create a static library from the second object file, and then link the first object file and then the library. If a symbol is resolved by an object file, the linker will not search the libraries for it.
Alternatively place both object files in separate static libraries, and then the link order will be determined by their occurrence in the command line.
Creating a static library from an object file will vary depending on the tool chain. In GCC use the ar utility, and for MSVC lib.exe (or use the static library project wizard).
There is a danger here; the key term is "interpositioning" of dependent code.
Let me give you an example:
Suppose you have written a custom routine called malloc and you link in the standard libraries. What will happen is this: functions that require the use of malloc (the standard function) will use your custom version instead, and the end result is that the code may become unstable through unintended side effects, and something will appear "broken".
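A minimal sketch of that hazard (toy allocator, for illustration only): once this definition is linked into the program, libc routines that call malloc internally (fopen, strdup, stdio buffering, and so on) end up in it as well, and since free() is still libc's, things break in hard-to-trace ways.

#include <stddef.h>

/* custom malloc: a fixed-pool bump allocator */
void *malloc(size_t size)
{
    static unsigned char pool[1 << 20];   /* 1 MiB toy pool */
    static size_t used;

    if (size == 0 || size > sizeof(pool) - used)
        return NULL;                      /* the real malloc also sets errno */

    void *p = pool + used;
    used += (size + 15) & ~(size_t)15;    /* keep 16-byte alignment */
    return p;                             /* libc's free() on this pointer: undefined behaviour */
}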
This is just something to bear in mind.
As in your case, you could "overwrite" (I use quotes for emphasis) the other function, but then how would you know which foobar is being used? This could lead to debugging grief when trying to figure out which foobar is called.
Hope this helps,
Best regards,
Tom.
You can make the second object into a .a file... then the linker only pulls its symbol if it is still unresolved and doesn't complain later.
