C undefined symbol at runtime

I have inherited a codebase that I am attempting to build and run. I have been able to do so before, but now on a different system I am getting some runtime errors, even though compilation goes fine. I am quite new to C and would appreciate any suggestions to move forward.
Specifically, I have a directory, common, of common utilities, which gets built into a shared library, and when the main program tries to use symbols from the shared library, I get errors like this:
some_file.so: undefined symbol: CUSTOM_LOG_LEVEL
The symbol is declared within a header file, custom_log.h, like so:
extern int CUSTOM_LOG_LEVEL;
The symbol is defined within the custom_log.c file, like so:
int CUSTOM_LOG_LEVEL = DEFAULT_CUSTOM_LOG_LEVEL;
These logger files are contained in a directory of other common utilities, which are all brought together in a file common.h. Here is how the logger is included in the common library:
#include "custom_log.h"
After I have run configure, make, and make install, the relevant contents of the lib directory look like this:
libsomething_common.so -> libsomething_common.so.0.0.0
libsomething_common.so.0 -> libsomething_common.so.0.0.0
libsomething_common.so.0.0.0
So there are a few symlinked .so files, and the one they end up pointing to is libsomething_common.so.0.0.0. Analyzing the file with the objdump command yields the following results:
$ objdump -t libsomething_common.so.0.0.0 | grep CUSTOM_LOG_LEVEL
0000000000209a20 l O .data 0000000000000004 CUSTOM_LOG_LEVEL
$ objdump -T libsomething_common.so.0.0.0 | grep CUSTOM_LOG_LEVEL
So we can see that the symbol is not dynamic, and I believe it should be. But how to make that happen?
Again, I am very new to this build process and appreciate any suggestions you might have.
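The build flags are not shown, so this is only a guess: the `l` in the objdump -t output means the symbol is local to the library. That can happen if the library is compiled with -fvisibility=hidden, or if libtool is given an export list (-export-symbols or -export-symbols-regex) that does not match CUSTOM_LOG_LEVEL. A minimal sketch of marking the variable as exported follows. The COMMON_EXPORT macro, the custom_log_enabled helper, and the value 2 are hypothetical and not taken from the code base.

```c
/* custom_log.c (sketch) */
#include <assert.h>

/* Hypothetical export macro; the real code base may already have its own. */
#if defined(__GNUC__)
#define COMMON_EXPORT __attribute__((visibility("default")))
#else
#define COMMON_EXPORT
#endif

/* Placeholder value; the real DEFAULT_CUSTOM_LOG_LEVEL lives elsewhere. */
#define DEFAULT_CUSTOM_LOG_LEVEL 2

/*
 * In custom_log.h the declaration would become:
 *     extern COMMON_EXPORT int CUSTOM_LOG_LEVEL;
 */
COMMON_EXPORT int CUSTOM_LOG_LEVEL = DEFAULT_CUSTOM_LOG_LEVEL;

/* Hypothetical helper: returns 1 if a message at `level` should be logged. */
COMMON_EXPORT int custom_log_enabled(int level)
{
    assert(level >= 0);
    return level <= CUSTOM_LOG_LEVEL;
}
```

If this guess is right, then after rebuilding, objdump -T on the library should list CUSTOM_LOG_LEVEL. If it still does not, the export list passed at link time is the next thing to check.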

Related

Removing symbols from `.a`s

I'm compiling a C++ static library using g++ via Cmake. I want to remove symbols relating to the internal implementation so they don't show up in nm. (See here and here for the same with shared libraries.)
This answer tells you how to do it on iOS, and I'm trying to understand what happens under the hood so I can replicate on Linux. They invoke ld with:
-r/--relocatable: Generate relocatable output, i.e., generate an output file that can in turn serve as input to ld.
-x/--discard-all: Delete all local symbols.
AFAICS the -r glues all the modules into one module, and then the -x removes symbols only used inside that module. Is that right?
It's not clear how the linker 'knows' which symbols will be exported externally? Does it rely on __attribute__((visibility("hidden/default"))) as in the .so case?
Edit: clearly I'm confused... I thought cmake invoked ld to link the .os into .a. Googled + clarified above.
Question still stands: how do I modify the build process to exclude most symbols?

symbol lookup error: symbol exists, I know where it is, how do I get my SO to "see it"?

I am writing a plugin module for a larger program, written in C++. I have never written a shared object (SO) library before. My module compiles and links correctly (I think), but the main program that loads the SO crashes with a symbol lookup error.
The module I am writing worked fine, until I started to try and use other libraries within it. (Specifically caffe)
There is a main program which is developed by another group
I am writing a plugin module for this program
My plugin module uses functions / code from Caffe (from the libcaffe.so file, Caffe itself is a compiled binary just to add to confusion)
The main program crashes with the following error
/path-to-binary/binary-name: symbol lookup error: ./build/libTestModule.so: undefined symbol: _ZN5caffe2db5GetDBERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I tried adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib to my .bashrc.
I did this because (after some internet searching - I don't actually understand what I am doing here) I ran
nm -g libcaffe.so | grep _ZN5caffe2db5GetDBERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
in the folder /usr/lib and that symbol exists in libcaffe.so.
00000000001cbb30 T _ZN5caffe2db5GetDBERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
libcaffe.so is in /usr/lib and contains the symbol that my program cannot find.
My understanding is that (for some reason which is not known to me) I have to set LD_LIBRARY_PATH to /usr/lib so that my program can find libcaffe.so and the symbols contained within it.
However, I would have assumed that since /usr/lib contains loads of "default" .so files, it would be searched regardless of whether LD_LIBRARY_PATH was set, i.e., shouldn't this directory be searched by default?
Regardless of the above question, I don't know what I should try next.
How can I get my program to find the symbol above in libcaffe.so?
CMakeLists.txt
cmake_minimum_required(VERSION 3.3)
project(TestModule)
find_package(Falaise REQUIRED)
add_library(TestModule SHARED TestModule.h TestModule.cpp)
set(Caffe_INCLUDE_DIRS "/usr/include/caffe")
set(Caffe_LIBRARIES "/usr/lib/libcaffe.so")
target_link_libraries(TestModule PUBLIC Falaise::FalaiseModule)
Your libTestModule.so is, you say, dependent on libcaffe.so but you are
not linking it. This:
cmake_minimum_required(VERSION 3.3)
project(TestModule)
find_package(Falaise REQUIRED)
add_library(TestModule SHARED TestModule.h TestModule.cpp)
target_link_libraries(TestModule PUBLIC Falaise::FalaiseModule caffe)
is how you would do so.

How to find out *.c and *.h files that were used to build a binary?

I am building a project that produces multiple shared libraries and executables. All the source files used to build these binaries are in a single /src directory, so it is not obvious which source files went into each binary (the relation is many-to-many).
My goal is to write a script that would parse a set of C files for each binary and make sure that only the right functions are called from them.
One option seems to be to try to extract this information from Makefile. But this does not work well with generated files and headers (due to dependence on Includes).
Another option could be to simply browse call graphs, but this would get complicated, because a lot of functions are called by using function pointers.
Any other ideas?
You can first compile your project with debug information (gcc -g) and use objdump to get which source files were included.
objdump -W <some_compiled_binary>
Dwarf format should contain the information you are looking for.
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
< c> DW_AT_producer : (indirect string, offset: 0x5f): GNU C 4.4.3
<10> DW_AT_language : 1 (ANSI C)
<11> DW_AT_name : (indirect string, offset: 0x28): test_3.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x36): /home/auselen/trials
<19> DW_AT_low_pc : 0x82f0
<1d> DW_AT_high_pc : 0x8408
<21> DW_AT_stmt_list : 0x0
In this example, I compiled an object file from test_3.c, which was located in the .../trials directory. You then of course need to write a script around this to collect the related source file names.
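As a rough sketch of the per-line work such a script would do, the C function below pulls the value out of a DW_AT_name line in the format shown above. The function name dw_at_name_value is made up for illustration. Note that DW_AT_name also appears on entries for functions and variables, so a real tool would only accept the DW_AT_name that follows a DW_TAG_compile_unit line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of objdump or any library): if `line` is a
 * DW_AT_name line from `objdump -W` output, copy its value into `out`
 * (truncating if needed) and return 1; otherwise return 0.
 */
int dw_at_name_value(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);

    const char *attr = strstr(line, "DW_AT_name");
    if (attr == NULL)
        return 0;

    /* Indirect strings: "... (indirect string, offset: 0x28): test_3.c" */
    const char *value = strstr(attr, "): ");
    if (value != NULL) {
        value += 3;
    } else {
        /* Inline strings: "DW_AT_name : test_3.c" */
        value = strchr(attr, ':');
        if (value == NULL)
            return 0;
        value++;
    }

    /* Skip padding between the separator and the value. */
    while (*value == ' ' || *value == '\t')
        value++;

    /* Copy up to the end of the line, leaving room for the terminator. */
    size_t len = strcspn(value, "\r\n");
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, value, len);
    out[len] = '\0';
    return 1;
}
```

Feeding each line of the objdump -W output through this function and joining the result with the matching DW_AT_comp_dir value would give the full source path for each compile unit.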
First you need to separate the debug symbols from the binary you just compiled. check this question on how to do so:
How to generate gcc debug symbol outside the build target?
Then you can try to parse this file on your own. I know how to do so for Visual Studio but as you are using GCC I won't be able to help you further.
Here is an idea, need to refine based on your specific build. Make a build, log it using script (for example script log.txt make clean all). The last (or one of the last) step should be the linking of object files. (Tip: look for cc -o <your_binary_name>). That line should link all .o files which should have corresponding .c files in your tree. Then grep those .c files for all the included header files.
If you have duplicate names in your .c files in your tree, then we'll need to look at the full path in the linker line or work from the Makefile.
What Mahmood suggests below should work too. If you have an image with symbols, strings <debug_image> | grep <full_path_of_src_directory> should give you a list of C files.
You can use the Unix nm tool. It shows all symbols that are defined or referenced in an object file. So you need to:
1. Run nm on your binary and grab all undefined symbols.
2. Run ldd on your binary to get the list of all its dynamic dependencies (the .so files your binary is linked to).
3. Run nm on each .so file you found in step 2.
That will give you the full list of dynamic symbols that your binary uses.
Example:
nm -C --dynamic /bin/ls
....skipping.....
00000000006186d0 A _edata
0000000000618c70 A _end
U _exit
0000000000410e34 T _fini
0000000000401d88 T _init
U _obstack_begin
U _obstack_newchunk
U _setjmp
U abort
U acl_extended_file
U bindtextdomain
U calloc
U clock_gettime
U closedir
U dcgettext
U dirfd
All those symbols with capital "U" are used by ls command.
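To collect those names programmatically, each line of nm output can be checked for the "U" type letter. The following is a minimal sketch in C. The function name nm_undefined_symbol is hypothetical, and it assumes the default nm line format shown above, where undefined symbols have no address column.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` is an nm output line for an undefined
 * symbol (type letter "U"), copy the symbol name into `name` (truncating
 * if needed) and return 1; otherwise return 0.
 */
int nm_undefined_symbol(const char *line, char *name, size_t name_size)
{
    assert(line != NULL && name != NULL && name_size > 0);

    /* Undefined symbols have a blank address column, so skip the padding. */
    while (isspace((unsigned char)*line))
        line++;

    /* The type letter must be a lone "U" followed by whitespace. */
    if (line[0] != 'U' || !isspace((unsigned char)line[1]))
        return 0;
    line++;
    while (*line == ' ' || *line == '\t')
        line++;

    /* Take the rest of the line: with nm -C, names may contain spaces. */
    size_t len = strcspn(line, "\r\n");
    if (len == 0)
        return 0;
    if (len >= name_size)
        len = name_size - 1;
    memcpy(name, line, len);
    name[len] = '\0';
    return 1;
}
```

Lines for defined symbols start with a hexadecimal address, so they are rejected by the type-letter check.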
If your goal is to analyze C source files, you can do that by customizing the GCC compiler. You could use MELT for that purpose (MELT is a high-level domain-specific language for extending GCC) by adding your own analysis passes, coded in MELT, inside GCC. However, you should first learn about GCC's middle-end internal representations (Gimple, Tree, ...).
Customizing GCC takes several days of work (mostly because GCC internals are quite complex in the details).
Feel free to ask me more about MELT.

Why is the shared library path hardcoded in the executable?

Recently I got a test binary. When I inspected it with objdump, I saw that it contains hardcoded library paths. Why does the path need to be hardcoded like that? Shouldn't the path be taken from shell environment variables or the -L parameter instead?
objdump -p testprog
The output includes the hardcoded path to shared libraries:
....
NEEDED /home/test/lib/liba.so
NEEDED /home/test/lib/libb.so
NEEDED /home/test/lib/libc.so
....
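This is probably because those three .so files had no SONAME on the host where your test program was built. When a shared library has no SONAME, the linker records the path it was given on the command line in the NEEDED entry. Tell the person who built it to rebuild liba.so with -Wl,-soname,liba.so and similar for the other two, then relink the main program.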

How to determine which object files are actually necessary for linking?

I had to modify some open source code to use in a C project. Instead of building a library from the modified code, I'd like to just compile and build an executable from my own source combined with the modified open source code. The goal is to have a stand-alone package that can be distributed. I can get this to work just fine using the GNU build tools and have successfully built my executable.
Now I'd like to pare down the amount of code I am building and linking. Is there an easy way to determine which of the open source files I actually need to compile? There are, say, 40 .c files in the open source package. I'm guessing my code only uses (or causes to be used) 20-ish of those files. Currently I'm compiling all of them and throwing everything at the linker. There has to be a smart (and easy?) way to determine which ones I actually need, right?
I'm happy to provide further details if it's helpful. Thanks in advance.
When faced with this, I've either simply taken the final link command, stripped out all of the objects, and then added them back in until it works, or processed the output of the nm command.
Worked example:
Looking at the output of nm:
$ nm *.o
a.o:
00000000 T a
U aa
b.o:
00000000 T b
t.o:
U a
U b
00000000 T main
ua.o:
00000000 T ua
ub.o:
00000000 T ub
So I create the following awk script
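# find-unused.awk
# main is referenced by the C runtime startup code, so seed it as required.
BEGIN { req["main"] = "crt" }

# A line such as "a.o:" starts the symbol list of a new object file.
/\.o:$/ {
    gsub(/:/, "");
    modulename = $0;
}

# Undefined symbol: this module requires it.
$1 == "U" {
    req[$2] = modulename;
}

# Defined text (code) symbol: this module provides it.
/^[0-9a-f]+ T / {
    def[$3] = modulename;
}

END {
    print "modules referenced:"
    for (i in req) {
        if (def[i] != "")
            print " " def[i];
    }
    print "functions not found"
    for (i in req) {
        if (def[i] == "")
            print " " i;
    }
}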
and then call it like this;
$ nm *.o|awk -f find-unused.awk
it tells me:
modules referenced:
t.o
a.o
b.o
functions not found
aa
Which is right - because the ua & ub functions in the above example aren't used.
See if you can get your dead-code stripper to tell you what functions/symbols it eliminated during the link. Then you will know what source code you can safely remove. The GNU linker's -Map option may be useful on that front. You could, for instance, link once without dead-code stripping, then link again with dead-code stripping and compare the output map files.
If there are only 40 source files maximum, is this optimization really worth your time?

Resources