Create non-PIC shared libraries with ld - c

I have a bunch of object files that have been compiled without the -fPIC option. So the calls to the functions do not use #PLT. (source code is C and is compiled with clang).
I want to link these object files into a shared library that I can load at runtime using dlopen. I need to do this because I have to do a lot of setup before the actual .so is loaded.
But every time I try to link with the -shared option, I get the error -
relocation R_X86_64_PC32 against symbol splay_tree_lookup can not be used when making a shared object; recompile with -fPIC
I have no issues recompiling from source. But I don't want to use -fPIC. This is part of a research project where we are working on a custom compiler. PIC wouldn't work for the type of guarantees we are trying to provide in the compiler.
Is there some flag I can use with ld so that it generate load time relocating libraries. In fact I am okay with no relocations. I can provide a base address for the library and dlopen can fail if the virtual address is not available.
The command I am using for compiling my c files are equivalent to -
clang -m64 -c foo.c
and for linking I am using
clang -m64 -shared *.o -o foo.so
I say equivalent because it is a custom compiler (forked off clang) and has some extra steps. But it is equivalent.

It is not possible to dynamically load your existing non PIC objects with the expectation of it working without problems.
If you cannot recompile the original code to create a proper shared library that supports PIC, then I suggest you create a service executable that links to a static library composed of those objects. The service executable can then provide IPC/RPC/REST API/shared memory/whatever to allow your object code to be used by your program.
Then, you can author a shared library which is compiled with PIC that provides wrapper APIs that launches and communicates with the service executable to perform the actual work.
On further thought, this wrapper API library may as well be static. The dynamic aspect of it is performed by launching the service executable.

Recompiling the library's object files with the -fpic -shared options would be the best option, if this is possible!
man ld says:
-i Perform an incremental link (same as option -r).
-r
--relocatable
Generate relocatable output---i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file’s magic number to "OMAGIC". If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur.
When an input file does not have the same format as the output file, partial linking is only supported if that input file does not contain any relocations. Different output formats can have further restrictions; for example some "a.out"-based formats do not support partial linking with input files in other formats at all.
I believe you can partially link your library object files into a relocatable (PIC) library, then link that library with your source code object file to make a shared library.
ld -r -o libfoo.so *.o
cp libfoo.so /foodir/libfoo.so
cd foodir
clang -m32 -fpic -c foo.c
clang -m32 -fpic -shared *.o -o foo.so
Regarding library base address:
(Again from man ld)
--section-start=sectionname=org
Locate a section in the output file at the absolute address given by org. You may use this option as many times as necessary to locate multiple sections in the command line. org must be a single hexadecimal integer; for compatibility with other linkers, you may omit the leading 0x usually associated with hexadecimal values. Note: there should be no white space between sectionname, the equals sign ("="), and org.
You could perhaps move your library's .text section?
--image-base value
Use value as the base address of your program or dll. This is the lowest memory location that will be used when your program or dll is loaded. To reduce the need to relocate and improve performance of your dlls, each should have a unique base address and not overlap any other dlls. The default is 0x400000 for executables, and 0x10000000 for dlls. [This option is specific to the i386 PE targeted port of the linker]

Related

On linking of shared libraries, are they really final, and if so, why?

I am trying to understand more about linking and shared library.
Ultimately, I wonder if it's possible to add a method to a shared library. For instance, suppose one has a source file a.c, and a library lib.so (without the source file). Let's furthermore assume, for simplicity, that a.c declares a single method, whose name is not present in lib.so. I thought maybe it might be possible to, at linking time, link a.o to lib.so while instructing to create newLib.so, and forcing the linker to export all methods/variable in lib.so to that the newLib.so is now basically lib.so with the added method from a.so.
More generally, if one has some source file depending on a shared library, can one create a single output file (library or executable) that is not dependent on the shared library anymore ? (That is, all the relevant methods/variable from the library would have been exported/linked/inlined to the new executable, hence making the dependency void). If that's not possible, what is technically preventing it ?
A somehow similar question has been asked here: Merge multiple .so shared libraries.
One of the reply includes the following text: "If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.: without explaining the technical details. Was it a mistake or does it hold ? If so, how to do it ?
Once you have a shared library libfoo.so the only ways you can use it
in the linkage of anything else are:-
Link a program that dynamically depends on it, e.g.
$ gcc -o prog bar.o ... -lfoo
Or, link another shared library that dynamically depends on it, e.g.
$ gcc -shared -o libbar.so bar.o ... -lfoo
In either case the product of the linkage, prog or libbar.so
acquires a dynamic dependency on libfoo.so. This means that prog|libfoo.so
has information inscribed in it by the linker that instructs the
OS loader, at runtime, to find libfoo.so, load it into the
address space of the current process and bind the program's references to libfoo's exported symbols to
the addresses of their definitions.
So libfoo.so must continue to exist as well as prog|libbar.so.
It is not possible to link libfoo.so with prog|libbar.so in
such a way that libfoo.so is physically merged into prog|libbar.so
and is no longer a runtime dependency.
It doesn't matter whether or not you have the source code of the
other linkage input files - bar.o ... - that depend on libfoo.so. The
only kind of linkage you can do with a shared library is dynamic linkage.
This is in complete contrast with the linkage of a static library
You wonder about the statement in this this answer where it says:
If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.
The author is just observing that if I have source files
foo_a.c foo_b.c... bar_a.c bar_b.c
which I compile to the corresponding object files:
foo_a.o foo_b.o... bar_a.o bar_b.o...
or if I simply have those object files. Then as well as - or instead of - linking them into two shared libraries:
$ gcc -shared -o libfoo.so foo_a.o foo_b.o...
$ gcc -shared -o libbar.so bar_a.o bar_b.o...
I could link them into one:
$ gcc -shared -o libfoobar.so foo_a.o foo_b.o... bar_a.o bar_b.o...
which would have no dependency on libfoo.so or libbar.so even if they exist.
And although that could be straightforward it could also be false. If there is
any symbol name that is globally defined in any of foo_a.o foo_b.o... and
also globally defined in any of bar_a.o bar_b.o... then it will not matter
to the linkage of either libfoo.so or libbar.so (and it need not be dynamically
exported by either of them). But the linkage of libfoobar.so will fail for
multiple definition of name.
If we build a shared library libbar.so that depends on libfoo.so and has
itself been linked with libfoo.so:
$ gcc -shared -o libbar.so bar.o ... -lfoo
and we then want to link a program with libbar.so, we can do that in such a way
that we don't need to mention its dependency libfoo.so:
$ gcc -o prog main.o ... -lbar -Wl,-rpath=<path/to/libfoo.so>
See this answer to follow that up. But
this doesn't change the fact that libbar.so has a runtime dependency on libfoo.so.
If that's not possible, what is technically preventing it?
What technically prevents linking a shared library with some program
or shared library targ in a way that physically merges it into targ is that a
shared library (like a program) is not the sort of thing that a linker knows
how to physically merge into its output file.
Input files that the linker can physically merge into targ need to
have structural properties that guide the linker in doing that merging. That is the structure of object files.
They consist of named input sections of object code or data that are tagged with various attributes.
Roughly speaking, the linker cuts up the object files into their sections and distributes them into
output sections of the output file according to their attributes, and makes
binary modifications to the merged result to resolve static symbol references
or enable the OS loader to resolve dynamic ones at runtime.
This is not a reversible process. The linker can't consume a program or
shared library and reconstruct the object files from which it was made to
merge them again into something else.
But that's really beside the point. When input files are physically
merged into targ, that is called static linkage.
When input files are just externally referenced in targ to
make the OS loader map them into a process it has launched for targ,
that is called dynamic linkage. Technical development has given us
a file-format solution to each of these needs: object files for static linkage, shared libraries
for dynamic linkage. Neither can be used for the purpose of the other.

Difference using avr-gcc and avr-ld in ELF (Executable and Linkable Format) file generation

Sorry if this might be off-topic.
In the process of generating .hex(Intel HEX format) files using avr-gcc or avr-ld the output(final result) is significantly different. As an minimal clarification I am talking about the step of generating ELF file just after generating the Object files.
On my first attempt, I used avr-ld to generate my ELF file. Process works smoothly but after generating HEX files and uploading to my board it did nothing (as in uploading an blank HEX file).
On my second try, I followed the advice found here:
It is important to specify the MCU type when linking. The compiler uses the -mmcu option to choose start-up files and run-time libraries that get linked together. If this option isn't specified, the compiler defaults to the 8515 processor environment, which is most certainly what you didn't want.
It did as I expected. Uploaded the HEX file and my board updated accordingly.
So my questions are as follows:
Why did the linker (avr-ld) lose information about the micro-controller I am using. I thought that the MCU information is stored in the Object files.
What is the logic behind this configuration? Is my way of thinking wrong (in using avr-gcc for compilation/generating .o files, avr-ld to link the .o files and generate the EFL files, and avr-objcopy to strip only usefull information and changing the format of the file ELF -> HEX)?
Is any way in achieving the same output using avr-ld as when using avr-gcc for generating my ELF file?
Why did the linker (avr-ld) lose information about the micro-controller I am using. I thought that the MCU information is stored in the Object files.
The linker doesn't llose that information, it was never supplied in the first place. Object files resp. ELF headers are on level of "emulation", i.e. granularity like -mmcu=arch where arch is one of avr2, avr25, avrxmega2, avrtiny etc.
using avr-gcc for compilation/generating .o files, avr-ld to link the .o files and generate the ELF files, and avr-objcopy to strip only usefull information and changing the format of the file ELF → HEX?
avr-gcc is not a compiler, it's just a driver program that calls other programs like compiler proper cc1 or cc1plus, assembler as, linker ld depending on file type and options provided. The driver will add options to these programs which greatly simplifies their usage, many of which are described in specs-attiny25 (since v5 onwards).
As an example, take a simple main.c with a main function returning 0, and process it with
avr-gcc main.c -o main.elf -mmcu=attiny25 -save-temps -v -Wl,-v
The -v makes the driver show the commands it is issuing, and -save-temps stores intermediate files like assembly. For avr-gcc v8.5, the link process starts with a call of collect2:
.../bin/../libexec/gcc/avr/8.5.0/collect2 -plugin .../bin/../libexec/gcc/avr/8.5.0/liblto_plugin.so -plugin-opt=.../bin/../libexec/gcc/avr/8.5.0/lto-wrapper -plugin-opt=-fresolution=main.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lm -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lattiny25 -mavr25 -o main.elf .../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack/crtattiny25.o -L.../bin/../lib/gcc/avr/8.5.0/avr25/tiny-stack -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib/avr25/tiny-stack -L.../bin/../lib/gcc/avr/8.5.0 -L.../bin/../lib/gcc -L.../bin/../lib/gcc/avr/8.5.0/../../../../avr/lib main.o -v --start-group -lgcc -lm -lc -lattiny25 --end-group
where ... stands for the absolute path where the tools are installed. As you can see, the driver adds some salt, for example it links against startup code crtattiny25.o, standard libs like libc.a, libm.a, libgcc.a. collect2 gathers some extra information needed to build startup code, and then calls back the compiler1 and finally ld.
The options provided to ld look very much like the ones provided to collect2. The only device-specific stuff is: startup code crtattiny25.o and device lib libattiny25.a. Many other device-specific stuff has already been compiler into the code, like SFR addresses, #ifdef __AVR_ATtiny25__ etc.
Is any way in achieving the same output using avr-ld as when using avr-gcc for generating my ELF file?
You could provide all that options by hand.
1Calling back the compiler is needed for LTO (link-time optimization) as of -flto. The linker calls a plugin which calls the compiler with LTO byte-code, compiles it to assemblwith the LTO-compiler lto1, then as, then ld. Newer versions of the tool are always using linker plugin when without lto compilation; one can -fno-use-linker-plugin which makes the call chain and options somewhat simpler.

Linking error in static lib unless used in main project

I'm creating a little static library for having thread pools, and it depends on 2 other homemade static libraries (a homemade printf and a homemade mini libc).
But sub-functions like ft_bzero are not linked in the project unless I use them on the root project, the one that needs to use thread pools library. So I have the linking error coming from my thpool lib.
Sample :
cc -Wall -Werror -Wextra -MD -I ./ -I ./jqueue -I ../libft/incs -I
../printf/incs -o .objs/thpool_create.o -c ./thpool_create.c
ar rc libthpool.a ./.objs/thpool_create.o etcetc
In the libraries, I compile every .o and use an ar rc libthpool.a *.o. Then I compile .o from main project (a single test.c actually), and then
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lft -lftprintf -lthpool -lpthread
How can I solve my errors?
Since the code in the ftpool library uses code from ft and ftprintf, you (almost certainly) need to list the libraries in the reverse order:
cc .objs/test.o -o test -L./libft -L./printf -L./thpool -lthpool -lftprintf -lft -lpthread
When scanning a static library, the linker looks for definitions of symbols that are currently undefined. If your test code only calls functions from thpool, then none of the symbols in ft are referenced when the ft library is scanned, so nothing is included from the library; if none of the symbols from ftprintf are referenced when the ftprintf library is scanned, nothing is included from ftprintf either. When it comes across the symbols in thpool that reference things from ft or ftprintf, it's too late; the linker doesn't rescan the libraries. Hence you need to list the libraries in an order such that all references from one library (A) to another (B) are found by linking (A) before (B). If the test code references some of the functions in ft or ftprintf, you may get lucky, or a bit lucky; some symbols may be linked in. But if there are functions in thpool that make the first reference to a function in ft, with the order in the question, you've lost the chance to link everything. Hence the suggested reordering.
Another (very grubby, but nonetheless effective) technique is to rescan the static libraries by listing them several times on the command line.
With shared libraries, the rules of linking are different. If a shared library satisfies any symbol, the whole library will be available, so the linker remembers all the defined symbols, and you might well get away with the original link order.
You might need to look up 'topological sort'. You should certainly aim to design your static libraries so that there are no loops in the dependencies; that leads to cycles of dependencies, and the only reliable solutions are either to rescan the libraries or combine the libraries.

Statically linking against LAPACK

I'm attempting to do a release of some software and am currently working through a script for the build process. I'm stuck on something I never thought I would be, statically linking LAPACK on x86_64 linux. During configuration AC_SEARCH_LIB([main],[lapack]) works, but compilation of the lapack units do not work, for example undefiend reference to 'dsyev_' --no lapack/blas routine goes unnoticed.
I've confirmed I have the libraries installed and even compiled them myself with the appropriate options to make them static with the same results.
Here is an example I had used in my first experience with LAPACK a few years ago that works dynamically, but not statically: http://pastebin.com/cMm3wcwF
The two methods I'm using to compile are the following,
gcc -llapack -o eigen eigen.c
gcc -static -llapack -o eigen eigen.c
Your linking order is wrong. Link libraries after the code that requires them, not before. Like this:
gcc -o eigen eigen.c -llapack
gcc -static -o eigen eigen.c -llapack
That should resolve the linkage problems.
To answer the subsequent question why this works, the GNU ld documentation say this:
It makes a difference where in the command you write this option; the
linker searches and processes libraries and object files in the order
they are specified. Thus, foo.o -lz bar.o' searches libraryz' after
file foo.o but before bar.o. If bar.o refers to functions in `z',
those functions may not be loaded.
........
Normally the files found this way are library files—archive files
whose members are object files. The linker handles an archive file by
scanning through it for members which define symbols that have so far
been referenced but not defined. But if the file that is found is an
ordinary object file, it is linked in the usual fashion.
ie. the linker is going to make one pass through a file looking for unresolved symbols, and it follows files in the order you provide them (ie. "left to right"). If you have not yet specified a dependency when a file is read, the linker will not be able to satisfy the dependency. Every object in the link list is parsed only once.
Note also that GNU ld can do reordering in cases where circular dependencies are detected when linking shared libraries or object files. But static libraries are only parsed for unknown symbols once.

Restricting symbols in a Linux static library

I'm looking for ways to restrict the number of C symbols exported to a Linux static library (archive). I'd like to limit these to only those symbols that are part of the official API for the library. I already use 'static' to declare most functions as static, but this restricts them to file scope. I'm looking for a way to restrict to scope to the library.
I can do this for shared libraries using the techniques in Ulrich Drepper's How to Write Shared Libraries, but I can't apply these techniques to static archives. In his earlier Good Practices in Library Design paper, he writes:
The only possibility is to combine all object files which need
certain internal resources into one using 'ld -r' and then restrict the symbols
which are exported by this combined object file. The GNU linker has options to
do just this.
Could anyone help me discover what these options might be? I've had some success with 'strip -w -K prefix_*', but this feels brutish. Ideally, I'd like a solution that will work with both GCC 3 and 4.
Thanks!
I don't believe GNU ld has any such options; Ulrich must have meant objcopy, which has many such options: --localize-hidden, --localize-symbol=symbolname, --localize-symbols=filename.
The --localize-hidden in particular allows one to have a very fine control over which symbols are exposed. Consider:
int foo() { return 42; }
int __attribute__((visibility("hidden"))) bar() { return 24; }
gcc -c foo.c
nm foo.o
000000000000000b T bar
0000000000000000 T foo
objcopy --localize-hidden foo.o bar.o
nm bar.o
000000000000000b t bar
0000000000000000 T foo
So bar() is no longer exported from the object (even though it is still present and usable for debugging). You could also remove bar() all together with objcopy --strip-unneeded.
Static libraries can not do what you want for code compiled with either GCC 3.x or 4.x.
If you can use shared objects (libraries), the GNU linker does what you need with a feature called a version script. This is usually used to provide version-specific entry points, but the degenerate case just distinguishes between public and private symbols without any versioning. A version script is specified with the --version-script= command line option to ld.
The contents of a version script that makes the entry points foo and bar public and hides all other interfaces:
{ global: foo; bar; local: *; };
See the ld doc at: http://sourceware.org/binutils/docs/ld/VERSION.html#VERSION
I'm a big advocate of shared libraries, and this ability to limit the visibility of globals is one their great virtues.
A document that provides more of the advantages of shared objects, but written for Solaris (by Greg Nakhimovsky of happy memory), is at http://developers.sun.com/solaris/articles/linker_mapfiles.html
I hope this helps.
The merits of this answer will depend on why you're using static libraries. If it's to allow the linker to drop unused objects later then I have little to add. If it's for the purpose of organisation - minimising the number of objects that have to be passed around to link applications - this extension of Employed Russian's answer may be of use.
At compile time, the visibility of all symbols within a compilation unit can be set using:
-fvisibility=hidden
-fvisibility=default
This implies one can compile a single file "interface.c" with default visibility and a larger number of implementation files with hidden visibility, without annotating the source. A relocatable link will then produce a single object file where the non-api functions are "hidden":
ld -r interface.o implementation0.o implementation1.o -o relocatable.o
The combined object file can now be subjected to objcopy:
objcopy --localize-hidden relocatable.o mylibrary.o
Thus we have a single object file "library" or "module" which exposes only the intended API.
The above strategy interacts moderately well with link time optimisation. Compile with -flto and perform the relocatable link by passing -r to the linker via the compiler:
gcc -fuse-linker-plugin -flto -nostdlib -Wl,-r {objects} -o relocatable.o
Use objcopy to localise the hidden symbols as before, then call the linker a final time to strip the local symbols and whatever other dead code it can find in the post-lto object. Sadly, relocatable.o is unlikely to have retained any lto related information:
gcc -nostdlib -Wl,-r,--discard-all relocatable.o mylibrary.o
Current implementations of lto appear to be active during the relocatable link stage. With lto on, the hidden=>local symbols were stripped by the final relocatable link. Without lto, the hidden=>local symbols survived the final relocatable link.
Future implementations of lto seem likely to preserve the required metadata through the relocatable link stage, but at present the outcome of the relocatable link appears to be a plain old object file.
This is a refinement of the answers from EmployedRussian and JonChesterfield, which may be helpful if you're generating both dynamic and static libraries.
Start with the standard mechanism for hiding symbols in DSOs (the dynamic version of your lib). Compile all files with -fvisibility=hidden. In the header file which defines your API, change the declarations of the classes and functions you want to make public:
#define DLL_PUBLIC __attribute__ ((visibility ("default")))
extern DLL_PUBLIC int my_api_func(int);
See here for details. This works for both C and C++. This is sufficient for DSOs, but you'll need to add these build steps for static libraries:
ld -r obj1.o obj2.o ... objn.o -o static1.o
objcopy --localize-hidden static1.o static2.o
ar -rcs mylib.a static2.o
The ar step is optional - you can just link against static2.o.
My way of doing it is to mark everything that is not to be exported with INTERNAL,
include guard all .h files, compile dev builds with -DINTERNAL= and compile release builds with a single .c file that includes all other library .c files with -DINTERNAL=static.

Resources