gcc: Reduce libc required version - c

I am trying to run a newly compiled binary on an oldish 32-bit RedHat distribution.
The binary is compiled C (not C++) on a 32-bit CentOS VM running libc v2.12.
RedHat complains about the libc version: error while loading shared libraries: requires glibc 2.5 or later dynamic linker
Since my program is rather simple, it is most likely not using anything new from libc.
Is there a way to reduce the libc version requirement?

An untested possible solution
What is "error while loading shared libraries: requires glibc 2.5 or later dynamic linker"?
The cause of this error is that the dynamic binary (or one of its dependent
shared libraries) you want to run only has a .gnu.hash section, but the
ld.so on the target machine is too old to recognize .gnu.hash; it only
recognizes the old-school .hash section.
This usually happens when the dynamic binary in question is built
using a newer version of GCC. The solution is to recompile the code with
either the -static compiler command-line option (to create a static
binary), or the following option:
-Wl,--hash-style=both
This tells the link editor ld to create both .gnu.hash and .hash
sections.
According to ld documentation here, the old-school .hash section
is the default, but the compiler can override it. For example, the GCC
(which is version 4.1.2) on RHEL (Red Hat Enterprise Linux) Server
release 5.5 has this line:
$ gcc -dumpspecs
....
*link:
%{!static:--eh-frame-hdr} %{!m32:-m elf_x86_64} %{m32:-m elf_i386} --hash-style=gnu %{shared:-shared} ....
^^^^^^^^^^^^^^^^
...
For more information, see here.
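To check which hash tables a build actually ended up with, one option is to look at the tags in the dynamic section. The following is a minimal sketch, assuming a dynamically linked program on a glibc system; the function name has_dynamic_tag is hypothetical, while _DYNAMIC, DT_HASH and DT_GNU_HASH come from <link.h> and <elf.h>.

```c
#include <link.h>    /* ElfW() and _DYNAMIC; also pulls in <elf.h> for the DT_* tags */

/* Return 1 if the running program's dynamic section contains an entry
   with the given tag (e.g. DT_HASH or DT_GNU_HASH), else 0.
   Assumption: the program is dynamically linked, so _DYNAMIC exists. */
int has_dynamic_tag(long tag)
{
    const ElfW(Dyn) *entry;

    for (entry = _DYNAMIC; entry->d_tag != DT_NULL; entry++)
        if ((long)entry->d_tag == tag)
            return 1;
    return 0;
}
```

A binary linked with -Wl,--hash-style=both would be expected to report both DT_HASH and DT_GNU_HASH. For a file on disk, readelf -d shows the same tags without running anything.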

I had the same problem when trying to compile a little tool I wrote for an old machine for which I had no compiler. I compiled it on an up-to-date machine, and the binary required at least GLIBC 2.14 in order to run.
By making a dump of the binary (with xxd), I found this:
....
5f64 736f 5f68 616e 646c 6500 6d65 6d63 _dso_handle.memc
7079 4040 474c 4942 435f 322e 3134 005f py@@GLIBC_2.14._
....
So I replaced the memcpy calls in my code with calls to a home-made memcpy, and the dependency on glibc 2.14 magically disappeared.
I'm sorry, I can't really explain why it worked, or why it didn't work before the modification.
Hope this helps!
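The dump contains the string memcpy@@GLIBC_2.14, which suggests the binary was bound to a memcpy symbol version that first appears in glibc 2.14; once the program no longer refers to memcpy, that versioned reference has no reason to be emitted. A minimal sketch of such a replacement is shown here. The name my_memcpy is hypothetical and not taken from the original tool.

```c
#include <stddef.h>

/* Hypothetical home-made replacement for memcpy: copies n bytes from src
   to dst one byte at a time. As with memcpy, the regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Some compilers recognise a copy loop like this at higher optimisation levels and turn it back into a call to memcpy, so it is worth re-checking the binary (for example with xxd, as above) after rebuilding.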

OK then, trying to find some balance between elegance and brute force, I downloaded a VM matching the target kernel version, which fixed the library issues.
The whole thing (download + yum install gcc) took less than 30 minutes.
References: Virtual machines, Kernel Version Mapping Table

Related

RV32E version of the soft-float methods such as __divdi3 and __mulsi3

I have managed to build an RV32E cross-compiler on my Intel Ubuntu machine by using the official riscv GitHub toolchain (github.com/riscv/riscv-gnu-toolchain) with the following configuration:
./configure --prefix=/home/riscv --with-arch=rv32i --with-abi=ilp32e
The ilp32e specifies soft float for RV32E. This generates a working compiler that works fine on my simple C source code. If I disassemble the created application, it does indeed stick to the RV32E specification: the assembly generated for my code uses only the first 16 registers.
I use static linking and it pulls in the expected set of soft float routines such as __divdi3 and __mulsi3. Unfortunately the pulled in routines use all 32 registers and not the restricted lower 16 for RV32E. Hence, not very useful!
I cannot find where this statically linked code is coming from, is it compiled from C source and therefore being compiled without the RV32E restriction? Or maybe it was written as hand coded assembly that has been written only for the full RV32I instead of RV32E? I tried to grep around the source but have had no luck finding anything like the actual code that is statically linked.
Any ideas?
EDIT: I just checked in more detail, and the compiler is not restricting itself to the first 16 registers. It turns out that with a simple test routine it manages to use only the first 16, but more complex code uses others as well. Maybe RV32E is not implemented yet?
The configure.ac file contains this code:
AS_IF([test "x$with_abi" == xdefault],
[AS_CASE([$with_arch],
[*rv64g* | *rv64*d*], [with_abi=lp64d],
[*rv64*f*], [with_abi=lp64f],
[*rv64*], [with_abi=lp64],
[*rv32g* | *rv32*d*], [with_abi=ilp32d],
[*rv32*f*], [with_abi=ilp32f],
[*rv32*], [with_abi=ilp32],
[AC_MSG_ERROR([Unknown arch])]
)])
This seems to map your input of rv32i to the ABI ilp32, ignoring the e. So yes, it seems support for the ...e ABIs is not fully implemented yet.

Port glibc 2.25 and test memory functions

I was investigating whether a few memory functions (memcpy, memset, memmove) in glibc-2.25, in their various versions (sse4, ssse3, avx2, avx512), could yield a performance gain for our server programs on Linux (glibc 2.12).
My first attempt was to download a tarball of glibc-2.25 and build/test it following the instructions at https://sourceware.org/glibc/wiki/Testing/Builds. I manually commented out the kernel version check and everything went well. Then a test program was linked against the newly built glibc using the procedure in the section "Compile against glibc build tree" of the glibc wiki, and 'ldd test' shows that it does depend on the expected libraries:
# $GLIBC is /data8/home/wentingli/temp/glibc/build
libm.so.6 => /data8/home/wentingli/temp/glibc/build/math/libm.so.6 (0x00007fe42f364000)
libc.so.6 => /data8/home/wentingli/temp/glibc/build/libc.so.6 (0x00007fe42efc4000)
/data8/home/wentingli/temp/glibc/build/elf/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fe42f787000)
libdl.so.2 => /data8/home/wentingli/temp/glibc/build/dlfcn/libdl.so.2 (0x00007fe42edc0000)
libpthread.so.0 => /data8/home/wentingli/temp/glibc/build/nptl/libpthread.so.0 (0x00007fe42eba2000)
I used gdb to verify which memset/memcpy was actually called, but it always shows that __memset_sse2_unaligned_erms is used, while I was expecting a more advanced version of the function (avx2, avx512) to be in use.
My questions are:
Does glibc-2.25 automatically select the most suitable version of the memory functions according to CPU/OS/memory address? If not, am I missing some configuration during the glibc build, or is something wrong with my setup?
Are there any other alternatives for porting memory functions from a newer glibc?
Any help or suggestion would be appreciated.
On x86, glibc automatically selects the implementation it considers most suitable for the system's CPU, usually based on guidance from Intel. (Whether this is the best choice for your scenario might not be clear, because the performance trade-offs for many of the vector instructions are extremely complex.) The only case where this does not happen is when you explicitly disable IFUNCs in the toolchain, but __memset_sse2_unaligned_erms isn't the default implementation, so this does not apply here. The ERMS feature is pretty recent, so selecting this variant is not completely unreasonable.
Building a new glibc is probably the right approach to test these string functions. Theoretically, you could also use LD_PRELOAD to override the glibc-provided functions, but it is a bit cumbersome to build the string functions outside the glibc build system.
If you want to run a program against a patched glibc without installing the latter, you need to use the testrun.sh script in the glibc build directory (or a similar approach).
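The idea of picking one implementation per CPU can be sketched with a plain function pointer. This is not glibc's selection logic (glibc uses IFUNC resolvers at dynamic-link time and its own tuning rules); it is only an illustration of the dispatch pattern. The names memset_generic, memset_avx2_stub and select_memset are hypothetical, and the "AVX2" variant is a stand-in that simply calls memset. __builtin_cpu_init and __builtin_cpu_supports are GCC/Clang builtins available on x86.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical baseline implementation. */
static void *memset_generic(void *p, int c, size_t n)
{
    unsigned char *b = p;

    while (n--)
        *b++ = (unsigned char)c;
    return p;
}

/* Hypothetical stand-in for a variant tuned for AVX2-capable CPUs. */
static void *memset_avx2_stub(void *p, int c, size_t n)
{
    return memset(p, c, n);
}

/* Choose an implementation based on what the CPU reports. */
memset_fn select_memset(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return memset_avx2_stub;
#endif
    return memset_generic;
}
```

A caller would store the result of select_memset() once and call through the pointer afterwards, which is roughly the effect an IFUNC resolver has on the symbol binding.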

How do I use the GNU linker instead of the Darwin Linker?

I'm running OS X 10.12 and I'm developing a basic text-based operating system. I have developed a boot loader, and that seems to be running fine. My only problem is that when I attempt to compile my kernel into a pure binary, the linker won't work. I have done some research, and I think this is because OS X uses the Darwin linker and not the GNU linker. Because of this, I have downloaded and installed GNU binutils. However, it still won't work...
Here is my kernel:
void main() {
    // Create pointer to a character and point it to the first cell of video
    // memory (i.e. the top-left)
    char* video_memory = (char*) 0xb8000;
    // At that address, put an x
    *video_memory = 'x';
}
And this is when I attempt to compile it:
Hazims-MacBook-Pro:32 bit root# gcc -ffreestanding -c kernel.c -o kernel.o
Hazims-MacBook-Pro:32 bit root# ld -o kernel.bin -T text 0x1000 kernel.o --oformat binary
ld: unknown option: -T
Hazims-MacBook-Pro:32 bit root#
I would love to know how to solve this issue. Thank you for your time.
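-T is a GNU ld option (gcc also accepts it and passes it on to the linker); the Darwin linker that ld resolves to on OS X does not support it, hence the error. Note also that GNU ld spells the option for setting the text address -Ttext 0x1000, with no space after -T; -T on its own expects a linker script file name. Have a look at this: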
With these components you can now actually build the final kernel. We use the compiler as the linker as it allows it greater control over the link process. Note that if your kernel is written in C++, you should use the C++ compiler instead.
You can then link your kernel using:
i686-elf-gcc -T linker.ld -o myos.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
Note: Some tutorials suggest linking with i686-elf-ld rather than the compiler, however this prevents the compiler from performing various tasks during linking.
The file myos.bin is now your kernel (all other files are no longer needed). Note that we are linking against libgcc, which implements various runtime routines that your cross-compiler depends on. Leaving it out will give you problems in the future. If you did not build and install libgcc as part of your cross-compiler, you should go back now and build a cross-compiler with libgcc. The compiler depends on this library and will use it regardless of whether you provide it or not.
This is all taken directly from OSDev, which documents the entire process, including a bare-bones kernel, very clearly.
You're correct that you probably want binutils for this, especially if you're coding bare metal. While clang as is purports to be a cross compiler, it's far from optimal or usable here, for various reasons. I infer that you're developing on ARM, in which case you want this:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm
Aside from the fact that gcc does this markedly better than clang, there's also the issue that ld from the binutils package does not build on OS X. In some configurations it fails silently, so you may in fact never have installed it despite watching libiberty etc. build; sometimes it will even go through the motions of compiling the source for that target and then just refuse to link it. To the fellow with the lousy tone blaming the OP: if you had relevant experience, i.e. had ever built this under these conditions, you would know that is patently obnoxious. It'd be nice if you'd refrain from discouraging people from asking legitimate questions.
In the CXXfilt package they mumble about apple-darwin not being a target; try changing FAKE_TARGET from mn10003000-whatever (or whatever they used) to apple-rhapsody some time.
You're still in way better shape just building them from current sources if you need, say, to strip relocations from something, or want to work on restoring static linkage to the system, which is missing by default from that clang installation as well. Anyhow, it's not really that ld couldn't work with Mach-O; the code is all there, of that I am sure.
Regarding locating things in memory, you may want to refer to a linker script
http://svn.screwjackllc.com/?p=noid.git;a=blob_plain;f=new_mbed_bs.link_script.ld
I have some code in there that places things directly in memory; doing that in the linker script rather than on the command line is more reproducible. It's a little complex, but what it does is set up a couple of regions of memory to be used with my memory allocators. You can use malloc, but you should prefer not to use actual malloc; dynamic memory is fine when it isn't dynamic... heh.
The script also sets flags for the stack and heap locations. They are just markers and are not loaded until run time; the stack and heap actually get placed by the startup code, which is in assembly and rather readable and well commented (hard to believe, I know). A neat trick: you have some persistence in volatile memory, so I set aside a very tiny bit to flip, and you can do things like have it control which bootloader runs on the next power cycle.
Again, you are 100% correct regarding the linker; it seems you are headed in the right direction. Incidentally, there are a ton of other ways to modify objects prior to loading them and to preload things in memory; check out objcopy and objdump. You can use gdb to dump S-records of structures in memory, note the address, and then, after assembly but before linking, use dd to insert the records you extracted with gdb back into the extracted sections. It's one of my favorite ways, just because it's the smartass route :D Also, if you are ever tight on memory and need to precalculate constants, it's one way to optimize things; it is actually closer to what ld is doing, just done by hand. The path of least resistance for now, though, is probably the linker script.

Is there a reliable way to know what libraries could be dlopen()ed in an elf binary?

Basically, I want to get a list of libraries a binary might load.
The unreliable way I came up with that seems to work (with possible false-positives):
comm -13 <(ldd elf_file | sed 's|\s*\([^ ]*\)\s.*|\1|'| sort -u) <(strings -a elf_file | egrep '^(|.*/)lib[^:/]*\.so(|\.[0-9]+)$' | sort -u)
This is not reliable. But it gives useful information, even if the binary was stripped.
Is there a reliable way to get this information without possible false-positives?
EDIT: More context.
Firefox is transitioning from using gstreamer to using ffmpeg.
I was wondering what versions of libavcodec.so will work.
libxul.so uses dlopen() for many optional features.
And the library names are hard-coded. So, the above command helps
in this case.
I also have a general interest in package management and binary dependencies.
I know you can get direct dependencies with readelf -d, dependencies of
dependencies with ldd. And I was wondering about optional dependencies, hence the question.
ldd tells you the libraries your binary has been linked against. These are not those that the program could open with dlopen.
The signature for dlopen is
void *dlopen(const char *filename, int flag);
So you could, still unreliably, run strings on the binary, but this could still fail if the library name is not a static string, but built or read from somewhere during program execution -- and this last situation means that the answer to your question is "no"... Not reliably. (The name of the library file could be read from the network, from a Unix socket, or even uncompressed on the fly, for example. Anything is possible! -- although I wouldn't recommend any of these ideas myself...)
edit: also, as John Bollinger mentioned, the library names could be read from a config file.
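To illustrate why scanning for literal strings can miss a library, here is a minimal sketch in which the file name only exists at run time. The helper names build_soname and open_any_version are hypothetical; dlopen and RTLD_LAZY are the standard <dlfcn.h> interface.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Build "lib<stem>.so.<major>" into buf.
   Returns 0 on success, -1 if the name does not fit. */
int build_soname(char *buf, size_t size, const char *stem, int major)
{
    int n = snprintf(buf, size, "lib%s.so.%d", stem, major);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Try a range of major versions, newest first.
   Returns the first handle that dlopen accepts, or NULL. */
void *open_any_version(const char *stem, int newest, int oldest)
{
    char name[256];
    int major;

    for (major = newest; major >= oldest; major--) {
        void *handle;

        if (build_soname(name, sizeof name, stem, major) != 0)
            continue;
        handle = dlopen(name, RTLD_LAZY);
        if (handle)
            return handle;
    }
    return NULL;
}
```

With code like this, strings on the binary would show only the format string lib%s.so.%d, not any complete library name.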
edit: you could also try replacing the dlopen library function with one of your own (the Boehm garbage collector does this with malloc, for example), so that it opens the library but also logs its name somewhere. But if the program didn't open a specific library during execution, you still won't know about it.
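A minimal sketch of such a logging replacement, assuming glibc and the usual LD_PRELOAD mechanism, might look like this. It defines a function with dlopen's name and signature, logs the requested file name, and forwards to the next dlopen found via dlsym(RTLD_NEXT, ...). The fallback definition of RTLD_NEXT uses glibc's value and is only there in case _GNU_SOURCE takes effect too late.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdio.h>

#ifndef RTLD_NEXT              /* assumption: glibc's value, used as a fallback */
#define RTLD_NEXT ((void *) -1l)
#endif

typedef void *(*dlopen_fn)(const char *, int);

/* Same name and signature as the real dlopen, so that this definition is
   found first when the object is preloaded. */
void *dlopen(const char *filename, int flags)
{
    static dlopen_fn real_dlopen;

    if (!real_dlopen)
        real_dlopen = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
    if (!real_dlopen)
        return NULL;

    /* Log the request; a NULL name means "the main program". */
    fprintf(stderr, "dlopen(\"%s\")\n", filename ? filename : "(main program)");
    return real_dlopen(filename, flags);
}
```

Built as a shared object (gcc -shared -fPIC, plus -ldl on older glibc) and listed in LD_PRELOAD, this would log each library the program asks for. One caveat: the real dlopen then sees the shim as its caller, which may affect lookups that depend on the calling object, such as its RPATH.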
(I am focusing on Linux; I guess that most of my answer applies to every POSIX system, but on MacOSX dlopen wants .dylib dynamic library files, not .so shared objects)
A program could even emit some C code in some temporary file /tmp/foo1234.c, fork a compilation of that /tmp/foo1234.c into a shared library /tmp/foo1234.so by some gcc -O -shared -fPIC /tmp/foo1234.c -o /tmp/foo1234.so command -generated and executed at runtime of your program-, perhaps remove the /tmp/foo1234.c file -since it is not needed any more-, and dlopen that /tmp/foo1234.so (and perhaps even remove /tmp/foo1234.so after dlopen), all that in the same process. My GCC MELT plugin for gcc does exactly this, and so does Bigloo, and the GCCJIT library is doing something close.
So in general, your quest is impossible and does not even make sense.
Is there a reliable way to get this information without possible false-positives?
No, there is no reliable way to get such information without false positives (you could prove that equivalent to the halting problem, or to some other undecidable problem). See also Rice's theorem.
In practice, most dlopen calls happen on plugins named by some configuration. They might not be named exactly as such in a configuration file (e.g. some Foo program might have a convention that a plugin named bar in some foo.conf configuration file is provided by a foo-bar.so plugin).
However, you might find some heuristic approximation. Most programs doing some dlopen have some plugin convention requesting some particular symbol names in the plugin. You could search for shared objects defining these names. Of course you'll get false positives.
For example, the zsh shell accepts plugins called zsh modules. The example module shows that enables_, boot_, features_ etc. functions are expected in zsh modules. You could use nm -D to find *.so files providing these (hence finding the plugins likely to be loadable by zsh).
(I am not convinced that such an approach is worthwhile; in fact you should usually know which plugins are useful on your system by which applications)
BTW, you could use strace(1) on the execution of some command to understand the syscalls it is doing, hence the plugins it is loading. You might also use ltrace(1), or pmap(1) (on some given process), or simply -for a process 1234- use cat /proc/1234/maps to understand its virtual address space, hence the plugins it has already loaded. See proc(5).
Notice that strace, ltrace, pmap exist on Linux, but many POSIX systems have similar programs.
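The /proc/1234/maps approach can also be scripted. The following is a minimal sketch, assuming the Linux maps format described in proc(5), that prints the shared objects mapped into a process; the function name list_mapped_objects is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Print each pathname containing ".so" from a maps file such as
   /proc/1234/maps, skipping repeats of the previous path.
   Returns the number of paths printed, or -1 if the file cannot be opened. */
int list_mapped_objects(const char *maps_path)
{
    char line[4096], last[4096] = "";
    int count = 0;
    FILE *f = fopen(maps_path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        char *path = strchr(line, '/');      /* the pathname is the last field */

        if (!path || !strstr(path, ".so"))
            continue;
        path[strcspn(path, "\n")] = '\0';
        if (strcmp(path, last) != 0) {       /* mappings of one file are usually adjacent */
            puts(path);
            strcpy(last, path);
            count++;
        }
    }
    fclose(f);
    return count;
}
```

Like the tools above, this only shows what has already been loaded at the moment the file is read, not what the program might load later.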
Also, a program could generate some machine code at runtime and execute it (SBCL does that at every REPL interaction!). Your program could also use some JIT techniques (e.g. with libjit, llvm, asmjit, GCCJIT or with hand-written code...) to do likewise. So plugin-like behavior can happen without dlopen (and you might mimic dlopen with mmap calls and some ELF relocation processing).
Addenda:
If you are installing firefox from its packaged version (e.g. the iceweasel package on Debian), its package is likely to handle the dependencies

dynamically loaded object loaded into a C program gives undefined symbol errors on x86_64

I have a C program that dynamically loads a .so file at runtime in order to connect to a MySQL database. On an x86 (32bit) kernel this works fine but when I recompile my program on an x86_64 (64 bit) kernel I get runtime errors like this:
dlerror: mysql-1.932-x86_64-freebsd7.2.so::plugin_tweak_products: Undefined symbol "plugin_filter_cart"
dlerror: mysql-1.932-x86_64-freebsd7.2.so::plugin_shutdown: Undefined symbol "plugin_post_action"
Obviously from the error message above you can see that this program is running on a FreeBSD 7.2 x86_64 machine. Both the C program and the .so file are compiled for 64 bit.
I am passing RTLD_LAZY to dlopen() when I load the .so file. I think the problem is that for some reason on x86_64 it is not dynamically loading parts of the library as needed but on 32 bit x86 it is. Is there some flag I can put in my Makefile.am to get this to work on x86_64? Any other ideas?
Here is what the file command lists for my C program
ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), for FreeBSD 7.2, dynamically linked (uses shared libs), FreeBSD-style, not stripped
and for the .so file
ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), not stripped
Just a wild guess. The prefix plugin seems to indicate there might be some callbacks with function pointers going on. Also probably your compiler versions are not the same for 32 and 64 bit? Do you use C99's or gcc's inline feature?
Such things can happen if one variant of your compiler is able to inline some function (static or inline) and the other isn't. Then an external symbol might or might not be produced. This depends a lot on your compiler version; gcc has had different strategies for handling such situations over time. Try to enforce the implementation of the function in at least one of your objects. And as roguenut indicates, check with nm for the missing symbols.
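Under C99 inline semantics, one way to enforce an external definition in a translation unit is to add an extern declaration next to the inline definition. A minimal sketch follows; the function name is borrowed from the error message above, but its signature and body are hypothetical.

```c
/* Inline definition: under C99 rules this alone does not emit an
   external symbol for plugin_filter_cart. */
inline int plugin_filter_cart(int quantity)
{
    return quantity > 0 ? quantity : 0;
}

/* In exactly one .c file: this declaration turns the definition above
   into an external one, so the symbol is emitted and can be resolved
   at load time. */
extern inline int plugin_filter_cart(int quantity);
```

Older gcc versions defaulted to GNU89 inline semantics, where the meaning of extern inline is essentially the reverse, so which form is needed depends on the compiler version and the -std setting in use.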
It looks like this was being caused by the same problem as
dlerror: Undefined symbol "_nss_cache_cycle_prevention_function" on FreeBSD 7.2
You need to call dlerror() once first and ignore its return value, to clear out any error left over from previous calls, before you check dlerror()'s return value.
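A minimal sketch of that pattern, with a hypothetical helper name lookup_symbol, could look like this:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Look up a symbol and report failure through *err, which is set to
   NULL on success and to the dlerror() message otherwise. */
void *lookup_symbol(void *handle, const char *name, const char **err)
{
    void *sym;

    (void)dlerror();            /* discard any stale error from earlier calls */
    sym = dlsym(handle, name);
    *err = dlerror();           /* non-NULL only if this dlsym call failed */
    return sym;
}
```

Checking the error string rather than the returned pointer also covers the case where a symbol legitimately has the value NULL.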

Resources