Which of the libraries a program is linked to provides a given function?

I have a program that calls a function foo defined in a library. How can I find out where that library is in the file system, and whether it is a static or a dynamically linked library?
Update: ldd shows that the program depends on many libraries. How can I tell which one contains foo?

You didn't say which OS you are on, and the answer is system-dependent.
On Linux and most UNIX systems, you can simply ask the linker to tell you. For example, suppose you want to know where printf comes from in this program:
#include <stdio.h>
int main()
{
    return printf("Hello\n");
}
$ gcc -c t.c
$ gcc t.o -Wl,-y,printf
t.o: reference to printf
/lib/libc.so.6: definition of printf
This tells you that printf is referenced in t.o and defined in libc.so.6. The above solution works for both static and shared libraries.
Since you tagged this question with gdb, here is what you can do in gdb:
gdb -q ./a.out
Reading symbols from /tmp/a.out...done.
(gdb) b main
Breakpoint 1 at 0x400528
(gdb) run
Breakpoint 1, 0x0000000000400528 in main ()
(gdb) info symbol &printf
printf in section .text of /lib/libc.so.6
If foo comes from a shared library, gdb will tell you which one. If it comes from a static library (in which case gdb will say in section .text of a.out), use the -Wl,-y,foo solution above. You could also do a "brute force" solution like this:
find / -name '*.a' -print0 | xargs -0 nm -A | grep ' foo$'

For shared libs, try the ldd command-line tool.
For static libs, the library code is in the program itself. There are no external dependencies, which is the whole point of using static libs.

You cannot list the static libraries that went into the final binary. To list the linked dynamic libraries: on Linux, use ldd [file]; on Mac OS X, use otool -L [file]. On Windows, I have no idea ;-)

Find all symbols in a directory

I am trying to figure out which C library to link when compiling a program that includes its header, in this case #include <pcre2.h>. The only way I've found to figure out which file I need is to check for a specific symbol that I know must be exported. For example:
$ ls
CMakeCache.txt Makefile install_manifest.txt libpcre2-posix.pc pcre2_grep_test.sh
CMakeFiles a.out libpcre2-8.a pcre2-config pcre2_test.sh
CTestCustom.ctest cmake_install.cmake libpcre2-8.pc pcre2.h pcre2grep
CTestTestfile.cmake config.h libpcre2-posix.a pcre2_chartables.c pcre2test
$ objdump -t libpcre2-8.a|grep pcre2_compile
pcre2_compile.c.o: file format elf64-x86-64
0000000000000000 l df *ABS* 0000000000000000 pcre2_compile.c
00000000000100bc g F .text 00000000000019dd pcre2_compile_8
0000000000000172 g F .text 00000000000000e3 pcre2_compile_context_create_8
0000000000000426 g F .text 0000000000000055 pcre2_compile_context_copy_8
0000000000000557 g F .text 0000000000000032 pcre2_compile_context_free_8
And because the symbol pcre2_compile_8 exists in that file (after trying every other file...) I know that the library I need to include is pcre2-8, that is, I compile my code with:
$ gcc myfile.c -lpcre2-8 -o myfile; ./myfile
Two questions related to this:
Is there a simpler way to find a symbol in a batch of files (some of which are not ELF files)? For example, something like objdump -t *? Or what's the closest thing to doing that?
Is there a better way to find out what the library value of -l<library> is? Or, when someone downloads a new C program, how do they usually know what to add to the command line so that the program works? (For me, I've just spent the last hour figuring out that it's -lpcre2-8 and not -lpcre or -lpcre2.)
Usually, the function you call from the library will be a symbol defined by that library. But in PCRE2, due to different code unit sizes, the function you call (e.g. pcre2_compile) actually becomes a different symbol through preprocessor macros (e.g. pcre2_compile_8). You can find the symbol you need from the library by compiling your program and checking the undefined symbols:
$ cat test.c
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
int main() {
    pcre2_compile("",0,0,NULL,NULL,NULL);
}
$ gcc -c test.c
$ nm -u test.o
U _GLOBAL_OFFSET_TABLE_
U pcre2_compile_8
Is there a simpler way to find a symbol in a batch of files?
You can search a directory (/usr/lib/ below) for library files (.a or .so extension below), run nm on each, and search for the symbol that was undefined in your object file (adapted from this question):
$ for lib in $(find /usr/lib/ -name \*.a -o -name \*.so)
> do
> nm -A --defined-only $lib 2>/dev/null| grep pcre2_compile_8
> done
/usr/lib/x86_64-linux-gnu/libpcre2-8.a:libpcre2_8_la-pcre2_compile.o:0000000000007f40 T pcre2_compile_8
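One caveat with the grep above is that it matches substrings, so a longer symbol that merely contains the name would also show up. As a rough sketch (not part of the original answer), the nm -A output could be post-processed in Python to keep only exact matches. The function name find_definitions and the second sample line are made up for illustration, and the parsing assumes file names without spaces or extra colons.

```python
def find_definitions(nm_output, symbol):
    """Return (library, member) pairs from `nm -A --defined-only` output
    whose symbol name is exactly `symbol` (member is "" for shared objects)."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Expected shape: "<file>[:<member>]:<address> <type> <name>"
        if len(fields) != 3 or fields[2] != symbol:
            continue
        location, _, _address = fields[0].rpartition(":")
        library, _, member = location.partition(":")
        hits.append((library, member))
    return hits


# The first line is copied from the output above; the second uses a made-up
# library and symbol name to show that substring matches are rejected.
sample = (
    "/usr/lib/x86_64-linux-gnu/libpcre2-8.a:libpcre2_8_la-pcre2_compile.o:"
    "0000000000007f40 T pcre2_compile_8\n"
    "/usr/lib/libexample.so:0000000000001234 T pcre2_compile_8_hypothetical\n"
)

for library, member in find_definitions(sample, "pcre2_compile_8"):
    print(library, member)
```

In practice the input would be the captured output of the nm loop rather than a hard-coded string.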
Is there a better way to find out what the library value of -l is?
It is usually conveyed through the library documentation. For PCRE2, the second page of the documentation talks about the pcre2-config tool that gives the appropriate flags:
pcre2-config returns the configuration of the installed PCRE2 libraries and the options required to compile a program to use them. Some of the options apply only to the 8-bit, or 16-bit, or 32-bit libraries, respectively, and are not available for libraries that have not been built.
[...]
--libs8 Writes to the standard output the command line options required to link with the 8-bit PCRE2 library (-lpcre2-8 on many systems).
[...]
--cflags Writes to the standard output the command line options required to compile files that use PCRE2 (this may include some -I options, but is blank on many systems).
So for this particular library, the recommended way to build and link is:
gcc -c $(pcre2-config --cflags) test.c -o test.o
gcc test.o -o test $(pcre2-config --libs8)

How to make a linux shared object (library) runnable on its own?

Noticing that gcc -shared creates an executable file, I just got the weird idea to check what happens when I try to run it ... well the result was a segfault for my own lib. So, being curious about that, I tried to "run" the glibc (/lib/x86_64-linux-gnu/libc.so.6 on my system). Sure enough, it didn't crash but provided me some output:
GNU C Library (Debian GLIBC 2.19-18) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.16.7 system on 2015-04-14.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
So my question here is: what is the magic behind this? I can't just define a main symbol in a library -- or can I?
I wrote a blog post on this subject where I go more in depth because I found it intriguing. You can find my original answer below.
You can specify a custom entry point to the linker with the -Wl,-e,entry_point option to gcc, where entry_point is the name of the library's "main" function.
#include <stdio.h>
void entry_point()
{
    printf("Hello, world!\n");
}
The linker doesn't expect something linked with -shared to be run as an executable, and must be given some more information for the program to be runnable. If you try to run the library now, you will encounter a segmentation fault.
The .interp section is a part of the resulting binary that is needed by the OS to run the application. It's set automatically by the linker if -shared is not used. You must set this section manually in the C code if building a shared library that you want to execute by itself. See this question.
The interpreter's job is to find and load the shared libraries needed by a program, prepare the program to run, and then run it. For the ELF format (ubiquitous on modern *nix) on Linux, the ld-linux.so program is used. See its man page for more info.
The line below puts a string in the .interp section using GCC attributes. Put this in the global scope of your library to explicitly tell the linker that you want to include a dynamic linker path in your binary.
const char interp_section[] __attribute__((section(".interp"))) = "/path/to/ld-linux";
The easiest way to find the path to ld-linux.so is to run ldd on any normal application. Sample output from my system:
jacwah@jacob-mint17 ~ $ ldd $(which gcc)
linux-vdso.so.1 => (0x00007fff259fe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faec5939000)
/lib64/ld-linux-x86-64.so.2 (0x00007faec5d23000)
Once you've specified the interpreter your library should be executable! There's just one slight flaw: it will segfault when entry_point returns.
When you compile a program with main, it's not the first function to be called when executing it. main is actually called by another function called _start. This function is responsible for setting up argv and argc and other initialisation. It then calls main. When main returns, _start calls exit with the return value of main.
There's no return address on the stack in _start, as it's the first function to be called. If it tries to return, an invalid read occurs (ultimately causing a segmentation fault). This is exactly what happens in our entry point function, which takes the place of _start. Add a call to exit as the last line of your entry function to properly clean up and not crash.
example.c
#include <stdio.h>
#include <stdlib.h>
const char interp_section[] __attribute__((section(".interp"))) = "/path/to/ld-linux";
void entry_point()
{
    printf("Hello, world!\n");
    exit(0);
}
Compile with gcc example.c -shared -fPIC -Wl,-e,entry_point.
When linking with -shared, gcc omits the start files, so some objects (like cout) will not be initialized. As a result, std::cout << "Abc" << std::endl will cause a segfault.
Approach 1
(simplest way to create executable library)
To fix this, change the linker options. The simplest way is to run gcc with the -v (verbose) option while building a normal executable and look at the linker command line it prints. In that command line, remove -z now and -pie (if present) and add -shared. The sources must still be compiled with -fPIC (not -fPIE).
Let's try. For example we have the following x.cpp:
#include <iostream>
// The next line is required, while building executable gcc will
// anyway include full path to ld-linux-x86-64.so.2:
extern "C" const char interp_section[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2";
// some "library" function
extern "C" __attribute__((visibility("default"))) int aaa() {
    std::cout << "AAA" << std::endl;
    return 1234;
}
// use main in a common way
int main() {
    std::cout << "Abc" << std::endl;
}
First compile this file with g++ -c x.cpp -fPIC. Then link it, dumping the linker command line, with g++ x.o -o x -v.
We get a working executable, but one that can't be dynamically loaded as a shared library. Check this with the Python script check_x.py:
import ctypes
d = ctypes.cdll.LoadLibrary('./x')
print(d.aaa())
Running $ ./x will be successful. Running $ python check_x.py will fail with OSError: ./x: cannot dynamically load position-independent executable.
While linking, g++ calls the collect2 linker wrapper, which calls ld. You can see the command line for collect2 in the output of the last g++ command, like this:
/usr/lib/gcc/x86_64-linux-gnu/11/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper -plugin-opt=-fresolution=/tmp/ccqDN9Df.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro -o x /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/11/../../.. x.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/11/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/crtn.o
Find -pie -z now in it and replace them with -shared. After running the modified command you will get a new x, which works both as an executable and as a shared library:
$ ./x
Abc
$ python3 check_x.py
AAA
1234
This approach has disadvantages: it is hard to automate the replacement. Also, before calling collect2, GCC creates a temporary file for the LTO (link-time optimization) plugin. This temporary file will be missing when you run the command manually.
Approach 2
(applicable way to create executable library)
The idea is to make GCC use our own wrapper as the linker, which corrects the arguments before passing them to collect2. We will use the following Python script, collect3.py, as the linker:
#!/usr/bin/python3
import subprocess, sys, os

marker = '--_wrapper_make_runnable_so'

def sublist_index(haystack, needle):
    # "+ 1" so that a match at the very end of the list is also found
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i+len(needle)] == needle: return i

def remove_sublist(haystack, needle):
    idx = sublist_index(haystack, needle)
    if idx is None: return haystack
    return haystack[:idx] + haystack[idx+len(needle):]

def fix_args(args):
    #print("!!BEFORE REPLACE ", *args)
    if marker not in args:
        return args
    args = remove_sublist(args, [marker])
    args = remove_sublist(args, ['-z', 'now'])
    args = remove_sublist(args, ['-pie'])
    args.append('-shared')
    #print("!!AFTER REPLACE ", *args)
    return args

# get search paths for linker directly from gcc
def findPaths(prefix = "programs: ="):
    for line in subprocess.run(['gcc', '-print-search-dirs'], stdout=subprocess.PIPE).stdout.decode('utf-8').split('\n'):
        if line.startswith(prefix): return line[len(prefix):].split(':')

# locate the real linker (collect2) in gcc's program search paths
def findLinker(linker_name = 'collect2'):
    for p in findPaths():
        candidate = os.path.join(p, linker_name)
        #print("!!CHECKING LINKER ", candidate)
        if os.path.exists(candidate) : return candidate

if __name__=='__main__':
    args = sys.argv[1:]
    args = fix_args(args)
    sys.exit(subprocess.call([findLinker(), *args]))
This script rewrites the arguments and calls the real linker. To switch linkers, create the file specs.txt with the following content:
*linker:
<full path to>/collect3.py
To tell our fake linker that we want the arguments corrected, we pass the additional argument --_wrapper_make_runnable_so. The complete command line is:
g++ -specs=specs.txt -Wl,--_wrapper_make_runnable_so x.o -o x
(we assume that you want to link an existing x.o).
After this you can both run the target x and use it as a dynamic library.

stdio.h - what's the name of the library file and where can I find it on Linux?

I have a question about stdio.h in the C language.
As far as I know, it contains only the function prototypes for the standard input and output streams.
But there must be a library file (object file) for standard input and output, right?
What is its name, and in which folder does it reside on Linux (Ubuntu)?
If I compile a simple hello world C program I get this:
% ldd easy
linux-vdso.so.1 => (0x00007fffcc9fe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4a90eb6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4a91299000)
Why does it need libc ?
% nm easy
...
000000000040052d T main
U printf@@GLIBC_2.2.5
The symbol printf is being provided by glibc. nm shows a printf symbol is being provided by that object:
% nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep printf
...
00000000000542f0 T printf
0000000000109dc0 T __printf_chk
000000000004f1d0 T __printf_fp
Alternatively, you can ask the dynamic loader to print debugging info by setting LD_DEBUG:
% LD_DEBUG=bindings ./easy 2>&1 | grep printf
17922: binding file ./easy [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]:\
normal symbol `printf' [GLIBC_2.2.5]
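When a program binds many symbols, the LD_DEBUG=bindings output gets long. A minimal Python sketch for turning it into a symbol-to-library map might look like the following. The helper name providers is hypothetical, and the sketch assumes each binding is printed on a single line (the backslash in the output above is taken to be a display wrap).

```python
import re

# Matches lines such as:
#   17922: binding file ./easy [0] to /lib/.../libc.so.6 [0]: normal symbol `printf' [GLIBC_2.2.5]
BINDING = re.compile(
    r"binding file (?P<user>\S+) \[\d+\] to (?P<provider>\S+) \[\d+\]:\s*"
    r"normal symbol `(?P<symbol>[^']+)'"
)


def providers(ld_debug_output):
    """Map each bound symbol to the first object reported as providing it."""
    result = {}
    for match in BINDING.finditer(ld_debug_output):
        result.setdefault(match.group("symbol"), match.group("provider"))
    return result


# Sample taken from the output above, joined onto one line.
sample = (
    "     17922:\tbinding file ./easy [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: "
    "normal symbol `printf' [GLIBC_2.2.5]\n"
)

for symbol, library in providers(sample).items():
    print(symbol, "->", library)
```

The input would normally be the captured stderr of running the program with LD_DEBUG=bindings set.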
It depends on which implementation of the standard library you use, but if you have a mainstream setup and compile with gcc, you can find the path of the library used for linking with
$ gcc -print-file-name=libc.so
/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/libc.so
Take into account that you can have more than one implementation installed in your system.

C program linking with shared library without setting LD_LIBRARY_PATH

I was reading Introduction to GCC and it says that if a package has both a .a and a .so, gcc prefers the shared library. By default the loader searches for shared libraries only in a predefined set of system directories, such as /usr/local/lib and /usr/lib. If the library is not located in one of these directories, it must be added to the load path, or you need to use the -static option to force gcc to use the .a library. However, I tried the following:
vim hello.c:
#include <gmp.h>
#include <stdio.h>
int main() {
    mpz_t x;
    mpz_init(x);
    return 0;
}
gcc hello.c -I/opt/include -L/opt/lib -lgmp (my gmp library is in opt)
./a.out
And it runs. The book says it should have the following error:
./a.out: error while loading shared libraries:
libgdbm.so.3: cannot open shared object file:
No such file or directory
(well, the book uses GDBM as example but I used GMP, but this won't matter right?)
However, I did not set LD_LIBRARY_PATH=/opt/lib, and as you can see I did not use -static option either, but a.out still runs.
Can you all tell me why and show me how to get the error described in the book? Yes I want the error so I will understand what I misunderstood.
From your response to my comment:
linux-gate.so.1 => (0xb7746000)
libgmp.so.10 => /usr/lib/i386-linux-gnu/libgmp.so.10 (0xb76c5000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7520000)
/lib/ld-linux.so.2 (0xb7747000)
So, your program is picking up the lib from /usr/lib.
What you can try to do is rename the lib in your /opt/lib, and link against the new name.
mv /opt/lib/libgmp.so /opt/lib/libgmp-test.so
gcc hello.c -I/opt/include -L/opt/lib -lgmp-test
Then try running the program. Also, compare the output of ldd on the new a.out with what you got before.
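To make that before/after comparison less error-prone, the ldd output can be reduced to a name-to-path map. This is only a sketch under the assumption that ldd prints lines in the "name => path (address)" form shown above; parse_ldd is a made-up helper name, and paths containing parentheses are not handled.

```python
def parse_ldd(ldd_output):
    """Map each library name to the path ldd resolved it to.

    Libraries reported as "not found" map to None; lines without "=>"
    (such as the dynamic loader itself) and entries with no path are skipped.
    """
    resolved = {}
    for line in ldd_output.splitlines():
        name, arrow, rest = line.strip().partition(" => ")
        if not arrow:
            continue
        path = rest.split("(")[0].strip()
        if path == "not found":
            resolved[name] = None
        elif path:
            resolved[name] = path
    return resolved


# Sample copied from the ldd output quoted above.
sample = """\
linux-gate.so.1 => (0xb7746000)
libgmp.so.10 => /usr/lib/i386-linux-gnu/libgmp.so.10 (0xb76c5000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7520000)
/lib/ld-linux.so.2 (0xb7747000)
"""

libs = parse_ldd(sample)
# False here: the loader found libgmp under /usr/lib, not /opt/lib.
print(libs["libgmp.so.10"].startswith("/opt/lib"))
```

Running it on the ldd output of both builds shows directly whether libgmp is resolved from /opt/lib, from a system directory, or not at all.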

Why does addr2line only work for functions?

I've got addr2line working for function addresses:
$ nm -S executable | grep main
08048742 000000a0 T main
$ addr2line -e executable 08048742
/home/blablabla/src/main.c:80
Unfortunately it only works if I supply the address of a function. When I pass the address of a data symbol (e.g. the address of a CRC table), it never resolves the file/line number:
$ nm -S executable | grep tableCRC
080491bc 00000200 r tableCRC
$ addr2line -e executable 080491bc
??:0
I guess that that kind of debug information just isn't included for data because this feature is probably intended for analyzing backtraces, but maybe there's a compiler/linker option to force this?
I want to use the output of addr2line to generate detailed information about how much memory size a file or module uses (instead of the global number reported by the 'size' tool).
The --print-size and --line-numbers options to nm are probably what you are looking for.
Please note that the ELF object needs to contain debugging information for the --line-numbers option to work.
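For the per-file memory report mentioned in the question, the output of nm --print-size --line-numbers could be aggregated with a short script. The sketch below is one possible approach, not a tested tool: it assumes each useful line has the shape "address size type name file:line", the helper name size_per_file is made up, and the crc.c path in the sample is hypothetical (the addresses and sizes are the ones shown in the question).

```python
from collections import defaultdict


def size_per_file(nm_output):
    """Sum symbol sizes (in bytes) per source file.

    Lines without a size or without a trailing file:line field are skipped.
    """
    totals = defaultdict(int)
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or ":" not in fields[-1]:
            continue
        source, _, _lineno = fields[-1].rpartition(":")
        totals[source] += int(fields[1], 16)  # the size column is hexadecimal
    return dict(totals)


# Addresses and sizes come from the question; the crc.c location is invented.
sample = (
    "08048742 000000a0 T main\t/home/blablabla/src/main.c:80\n"
    "080491bc 00000200 r tableCRC\t/home/blablabla/src/crc.c:12\n"
    "         U printf\n"
)

for source, size in sorted(size_per_file(sample).items()):
    print(size, source)
```

Symbols for which nm prints no source location are left out of the totals, so the sum can be smaller than what the size tool reports.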
