nokogori gem comes with its own version of libxml2. Moreover it warns about libxml2.so of a different version being loaded before it was required:
if compiled_parser_version != loaded_parser_version
["Nokogiri was built against LibXML version #{compiled_parser_version}, but has dynamically loaded #{loaded_parser_version}"]
It basically compares LIBXML_DOTTED_VERSION macro and xmlParserVersion global variable:
rb_const_set( mNokogiri,
rb_intern("LIBXML_VERSION"),
NOKOGIRI_STR_NEW2(LIBXML_DOTTED_VERSION)
);
rb_const_set( mNokogiri,
rb_intern("LIBXML_PARSER_VERSION"),
NOKOGIRI_STR_NEW2(xmlParserVersion)
);
And I'm experiencing it firsthand. When rmagick (which dynamically links to libxml2.so, ldd confirms that) is required before nokogiri, the latter complains.
From what I can see nokogiri is linked to libxml2 statically. First that is the default (supposedly). Then when rmagick is not required I can't see libxml2.so in /proc/PID/maps. I neither can see another version of libxml2.so. ldd doesn't list libxml2.so as a nokogiri.so's dependency. objdump lists xmlReadMemory (and friends) as a nokogori.so's symbol (probably a sign that it was linked statically).
So how come can nokogiri access libxml2.so's variables? Does that mean that loading libxml2.so overrides any statically linked versions? Can that happen in the middle of code execution?
So how come can nokogiri access libxml2.so's variables?
This is by design (and due to the fact that nokogiri was built incorrectly).
UNIX shared libraries are designed to emulate archive libraries. This means that the first ELF image to export a given symbol wins (there are some complications for libraries linked with -Bstatic flag, but we'll ignore them for now).
Does that mean that loading libxml2.so overrides any statically linked versions?
Yes, if statically linked version is also exported and calls to it go though PLT.
Can that happen in the middle of code execution?
With lazy symbol resolution (which is default, except when LD_BIND_NOW or -z now are in effect) it will always happen in the middle of code execution.
Now, the problem is that if nokogiri links in a static copy of libxml.a, it should hide that fact by localizing that copy within itself and not exporting any of its symbols. That would prevent end users from having to deal with symbol conflicts.
Your best bet is to either build your own nokogiri compiling and linking it against the same version of libxml, or to contact nokogiri maintainers and ask them to fix their builds.
Related
Is there a way to produce a C static library from Go code, but without Go runtime function definitions?
Rationale:
Project A creates a C static library with go build -buildmode=c-archive, libA.a .
Works well: project B uses pure C and is able to easily create an executable, statically linking with libA.a, all is fine.
Problem 1: project C happens to also use Go, but would like to use libA.a as a regular C library. Now it has a link problem: the Go runtime functions such as e.g. _cgo_panic are now defined both in project C runtime (as it uses Go) and in libA.a.
Problem 2: project D uses pure C, same as B. Yet it wants to use two different libraries from project A, e.g. libA.a and some libA2.a. Sadly, it does not link either, because Go runtime functions are now defined in both libA.a and libA2.a.
The problems faced by project C and project D could be easily resolved if project A could produce its libraries without the Go runtime definitions inside. Project C could just link with libA.a. Project D would link with libA.a, libA2.a and some libGo.a that would contain definitions of all the Go runtime stuff.
What I tried:
Using linker flags at 'project C' level, such as -Wl,--allow-multiple-definition. Now its build fails with a cryptic message 'function symbol table not sorted by program counter'.
Manually removing go.o from 'libA.a' (as it's just an "ar" archive): didn't work as 'go.o' also contained implementations of my exported functions, so I've removed too much.
Using go build -buildmode=c-shared. As expected, it produces a dynamic library which uses another format, so I could not directly use it as a static library.
Any solution at the client side (such as finding a proper way to ignore duplicate definitions at the link stage for project C) would be also considered a valid answer.
I can also accept a negative answer (no solution) if it provides enough evidence.
Update: see a related question Is there a way to include multiple c-archive packages in a single binary
With the current implementation it's not going to work to use -buildmode=c-archive multiple times and put the results into multiple shared libraries, as you've discovered. The essential problem is that there has to be only one Go runtime, but you have multiple runtimes. When using -buildmode=c-archive there's no way to isolate the different runtimes.
The -buildmode=c-shared libraries differ from buildmode=c-archive in that they are built with -Bsymbolic which forces all their local references to be local. The effect is that we have multiple Go runtimes, but they don't refer to each other so there is no confusion.
You could try adding -Wl,-Bsymbolic to build each shared library that includes Go code in c-archive if your C code doesn't mind being linked with -Bsymbolic.
I wish you luck.
How to link a binary in a manner to be compatible with two existing version of a library that have conflicting SONAME ?
Those two versions don't share same SONAME prefix. One is libcapi10.so.3 and the other is libcapi10.so.4.
I can't recompile them since i get them as binary, and since those are certified crypto libraries i can't request new one with right SONAME. Of course i would not have faced any problem if one was libcap10.so.3 and the other libcap10.so.3.1 since i would just need to link over the first to be compatible with the second.
Those two libraries are told to be binary compatible ( i should trust this info ).
I searched but didn't find any good way to do , either with linker options or using objcopy. I would like to avoid patching binary by hand to use it at compilation linking time.
So back to my initial question : How to specify SONAME to be ( in this case libcap10.so ) used for link ?
( I already searched, and my current findings are just that it is a no go, but unfortunately this is a requirement ... ).
Update:
i patched .so library using a binary sed-like tool replacing libcapi10.so.6\0 with libcapi10.so\0, what works since new name is shorter than previous and that elf structure for SONAME is a C-string ended with a 0 and that elf checksum is not used during gcc linking. i used that patched library only at compilation time, then i can use either one or the other original library on my target system with the same binary.
patchelf is your friend. You can do something like: patchelf --replace-needed libcapi10.so.3 libcapi10.so.4 <your_thing>.
Patchelf is useful for a variety of other things such as changing RPATH, too. Check out the manpage. Very nifty toy.
My current best answer allowing to fullfill my need.
I patched .so library using a binary sed-like tool replacing libcapi10.so.6\0 with libcapi10.so\0, what works since new name is shorter than previous and that elf structure for SONAME is a C-string ended with a 0 and that elf checksum is not used during gcc linking. i used that patched library only at compilation time, then i can use either one or the other original library on my target system with the same binary.
best hints in stackverflow found there :
How can I change the filename of a shared library after building a program that depends on it?
I have existing Linux shared object file (shared library) which has been stripped. I want to produce a new version of the library with some additional functions included. I had hoped that something like the following would work, but does not:
ld -o newlib.so newfuncs.o --whole-archive existinglib.so
I do not have the source to the existing library. I could get it but getting a full build environment with the necessary dependencies in place would be a lot of effort for what seems like a simple problem.
You might like to try coming at this from a slightly different angle by loading your object using preloading.
Set LD_PRELOAD to point to your new object
export LD_PRELOAD=/my/newfuncs/dir/newfuncs.o
and specify the existing library in the same way via your LD_LIBRARY_PATH.
This will then instruct the run time linker to search for needed symbols in your object before looking in objects located in your LD_LIBRARY_PATH.
BTW You can put calls in your object to then call the function that would've been called if you hadn't specified an LD_PRELOAD object or objects. This is why this is sometimes called interposing.
This is how many memory allocation analysis tools work. They interpose versions of malloc() and free() that records the calls to alloc() and free() before then calling the actual system alloc and free functions to perform the memory management.
There's plenty of tutorials on the interwebs on using LD_PRELOAD. One of the original and best is still "Building library interposers for fun and profit". Though written nine years ago and written for Solaris it is still an excellent resource.
HTH and good luck.
Completely untested idea:
# mv existinglib.so existinglib-real.so
# ld -o exlistinglib.so -shared newfuncs.o -lexistinglib-real
The dynamic linker, when loading a program that expects to load existinglib.so, will find your version, and also load existinglib-real.so on which it depends. It doesn't quite achieve the stated goal of your question, but should look as if it does to the program loading the library.
Shared libraries are not archives, they are really more like executables. You cannot therefore just stuff extra content into them in the same way you could for a .a static library.
Short answer: exactly what you asked for can't be done.
Longer answer: depending on why you want to do this, and exactly how existinglib.so has been linked, you may get close to desired behavior. In addition to LD_PRELOAD and renaming existinglib.so already mentioned, you could also use a linker script (cat /lib/libc.so to see what I mean).
On 'C', Linux,
Do I need static libraries to statically link, or the shared ones I have suffice?
If not, why not? (Don't they contain the same data?)
Yes, you need static libraries to build a statically linked executable.
Static libraries are bundles of compiled objects. When you statically link with to library, it is effectively the same as taking the compilation results of that library, unpacking them in your current project, and using them as if they were your own objects.
Dynamic libraries are already linked. This means that some information like relocations have already been fixed up and thrown out.
Additionally, dynamic libraries must be compiled as position-independent code. This is not a restriction on static libraries, and results in a significant difference in performance on some common platforms (like x86).
There exist tools like ELF Statifier which attempt to bundle dynamically-linked libraries into a dynamically-linked executable, but it is very difficult to generate a correctly-working result in all circumstances.
There is no such thing as static compilation, only static linking. And for that, you need static libraries. The difference between static and dynamic linking is that with the former, names are resolved at link-time (just after compile-time), wheras with the latter, they are resolved just as the program starts running.
Static and dynamic libraries may or may not contain the same information, depending on lots of factors. The decision on whether to statically or dynamically link your code is an important one, and will often influence application architecture.
All libraries you link into a statically linked program must be the static variant. While the dynamic (libfoo.so) and static (libfoo.a) libraries have the same functions in them, they are different format files and so you need the matching type for your program.
Another option is Ermine (http://magicErmine.com)
It's like statifier, but able to deal with memory randomization.
I'm planning to release some compiled code that shall be linked by client applications on MacOSX.
The distribution is some kind of code library and a set of header files defining the public interface for the library.The code is internally C++ but its public interface (i.e what's being shown in the headers) is completely C.
These are my requirements or atleast what I hope I can accomplish:
I want my library to be as agnostic
as possible for what version of OSX
and GCC the user is running. Having
separate libraries for 64 bit and 32
bit is okay though.
I want my library
to be loadable from languages that
supports loading C libraries such as
python or similar.
I want my
libraries internal symbols to be
isolated from the code it's being
linked into. I don't want to have
duplicate symbol errors because we
happen to name an internal function
in the same way. My C++ code is properly namespaced so this may not be as big of an issue though, but some of the libraries I depend on is C and can be an issue (see next point).
I want my library
dependencies to be safe. My library
depends on some libraries such as
libpng, boost and stl and I don't
want issues because some users don't
necessarily have all of them installed
or get problems because they have
been compiled with other flags or
have different versions than I have.
On Windows I use a DLL with an export library and link all my dependencies statically into the dll. It fulfills all the criteria above and if I can get the same result on OSX it would be great, however I've heard that dynamic libraries tend not to isolate symbols on mac in the same way.
Is there some kind of best practice for this on OSX?
A normal OS X .dylib pretty much satisfies your requirements, with the note that you will want to have an exports file that the linker uses to determine exactly which symbols are exported (to prevent leaking your internal symbols).
In order to make your own library dependencies safe, you will probably need to either include those libraries with yours or link them statically into your library.
edit: To answer your follow-up question of how to apply an exports file to a link command, the man page for ld has the following to say:
-exported_symbols_list filename
The specified filename contains a list of global symbol names
that will remain as global symbols in the output file. All
other global symbols will be treated as if they were marked
as __private_extern__ (aka visibility=hidden) and will not be
global in the output file. The symbol names listed in file-
name must be one per line. Leading and trailing white space
are not part of the symbol name. Lines starting with # are
ignored, as are lines with only white space. Some wildcards
(similar to shell file matching) are supported. The *
matches zero or more characters. The ? matches one charac-
ter. [abc] matches one character which must be an 'a', 'b',
or 'c'. [a-z] matches any single lower case letter from 'a'
to 'z'.
So, if your library had only two functions that you wanted to be public, lets call them foo and bar, and they were C functions (so the symbol names aren't mangled), your exports file (let's call it myLibrary.exports) would contain these two lines:
_foo
_bar
and maybe some comments, etc. When you do the final link step to build the library, you would pass the -exported_symbols_list myLibrary.exports flag to the linker. This has the additional benefit that the link will fail if you don't provide one of the exported symbols; this can catch a lot of "oops, I forgot to include that file in the build" mistakes.
You don't need to use the command-line tools to do all this, of course. In the build settings for a dynamic library in XCode, you will find Exported Symbols File (undefined by default); set it to the path to your exports file there and it will be passed to the linker.
The key term you need is 'framework'. You need to create a 'universal' framework that is self-contained. ('Universal' is Apple-ease for 'compile several times and package into one library.) It's not as straightforward as on Windows in terms of encapsulation, but the necessary linker options are there.