I have a problem with wrong symbol resolution. My main program loads a shared library with dlopen and looks up symbols from it with dlsym. Both the program and the library are written in C.
Library code
int a(int b)
{
return b+1;
}
int c(int d)
{
return a(d)+1;
}
In order to make it work on a 64-bit machine, -fPIC is passed to gcc when compiling.
The program is:
#include <dlfcn.h>
#include <stdio.h>
int (*a)(int b);
int (*c)(int d);
int main()
{
void* lib=dlopen("./libtest.so",RTLD_LAZY);
a=dlsym(lib,"a");
c=dlsym(lib,"c");
int d = c(6);
int b = a(5);
printf("b is %d d is %d\n",b,d);
return 0;
}
Everything runs fine if the program is NOT compiled with -fPIC, but it crashes with a segmentation fault when the program is compiled with -fPIC. Investigation showed that the crash is due to the wrong resolution of symbol a. The crash occurs when a is called, whether from the library or from the main program (the latter can be observed by commenting out the line calling c() in the main program).
No problems occur when calling c() itself, probably because c() is not called internally by the library, while a() is both used internally by the library and an API function of the library.
A simple workaround is not to use -fPIC when compiling the program. But this is not always possible, for example when the code of the main program has to be in a shared library itself. Another workaround is to rename the function pointer a to something else. But I cannot find any real solution.
Replacing RTLD_LAZY with RTLD_NOW does not help.
I suspect that there is a clash between two global symbols: the function a in the library and the function pointer a in the main program. One solution is to declare a in the main program as static. Alternatively, the Linux man page mentions the RTLD_DEEPBIND flag, a Linux-only extension, which you can pass to dlopen and which causes the library to prefer its own symbols over global symbols.
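A minimal sketch of the first suggestion, assuming the library from the question is built as ./libtest.so. The names a_fn, c_fn and run_library are hypothetical and not from the question; the only point is that the pointers are static, so the executable exports no global symbol named a for the library's internal call to bind to.

```c
#include <dlfcn.h>
#include <stdio.h>

/* static: these pointers are not exported to the dynamic linker, so they
   cannot clash with the library's own functions a and c. */
static int (*a_fn)(int);
static int (*c_fn)(int);

/* Hypothetical helper: loads the library at `path` and calls both functions.
   Returns 0 on success, -1 if the library or a symbol cannot be found. */
int run_library(const char *path)
{
    void *lib = dlopen(path, RTLD_LAZY);
    if (lib == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* dlsym returns void *; the cast to a function pointer is the usual POSIX idiom. */
    a_fn = (int (*)(int))dlsym(lib, "a");
    c_fn = (int (*)(int))dlsym(lib, "c");
    if (a_fn == NULL || c_fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "unknown error");
        dlclose(lib);
        return -1;
    }

    int d = c_fn(6);
    int b = a_fn(5);
    printf("b is %d d is %d\n", b, d);
    dlclose(lib);
    return 0;
}
```

Calling run_library("./libtest.so") from main would stand in for the body of the original program. On glibc, the other option mentioned above would be to pass RTLD_LAZY | RTLD_DEEPBIND to dlopen instead.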
It seems this issue can occur in one more case, as it did for me. I have a program and a couple of dynamically linked libraries. When I added one more, I used in it a function from a static library (also mine) and forgot to add that static library to the link list. The linker did not warn me about this, but the program crashed with a segmentation fault.
Maybe this will help someone.
FWIW, I ran into a similar problem when compiling as C++ and forgetting about name mangling. A solution there is to use extern "C".
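As a sketch of that remedy: a declaration guard that is valid in both C and C++ keeps the exported name unmangled, so dlsym can find it under its plain name. The function plugin_add is a hypothetical example, not something from the posts above.

```c
/* Hypothetical header content: when compiled as C++, the guard makes the
   compiler emit the unmangled name "plugin_add", which is the name that
   dlsym(lib, "plugin_add") looks up. When compiled as C it has no effect. */
#ifdef __cplusplus
extern "C" {
#endif

int plugin_add(int x, int y);

#ifdef __cplusplus
}
#endif

/* Definition; built as C or as C++, it exports the same symbol name. */
int plugin_add(int x, int y)
{
    return x + y;
}
```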
I am making a program in C using GCC/Clang to compile, and I have the following issue: I am trying to make sure that the compiler doesn't let me leave any symbols in my program undefined. I understand that translation units should be able to compile with undefined symbols, but I don't want it to link any static libraries unless it can find all of its internal symbols. For example:
#include <stdio.h>
int add(int a, int b);
int main(void)
{
printf("Hello, World.\n");
}
I would expect this program to compile but not to link (unless, of course, I can find a way to state that add is part of a shared library). That is not the case: it compiles with no warnings or errors, even with -Werror -Wall --pedantic-errors. Is there any option in GCC/Clang that won't let a static library be built unless all of its internal symbols are defined? Otherwise, I think this is a disaster waiting to happen, especially in larger projects.
Thank you all in advance.
I have tested the simple program below:
/* a shared library */
#include <stdio.h>

/* global (non-static), so the call from print_hello can be interposed */
void
dispatch_write_hello(void)
{
    fprintf(stderr, "hello\n");
}
extern void
print_hello(void)
{
    dispatch_write_hello();
}
My main program is like this:
#include <stdio.h>

/* defined in the shared library */
extern void print_hello(void);

extern void
dispatch_write_hello(void)
{
    fprintf(stderr, "overridden\n");
}
int
main(int argc, char **argv)
{
    print_hello();
    return 0;
}
The result of the program is "overridden". To figure out why this happens, I used gdb. The call chain is like this:
_dl_runtime_resolve -> _dl_fixup ->_dl_lookup_symbol_x
I found the definition of _dl_lookup_symbol_x in glibc is
Search loaded objects' symbol tables for a definition of the symbol UNDEF_NAME, perhaps with a requested version for the symbol
So I think when trying to find the symbol dispatch_write_hello, it first of all looks up in the main object file, and then in the shared library. This is the cause of this problem. Is my understanding right? Many thanks for your time.
Given that you mention _dl_runtime_resolve, I assume that you're on a Linux system (thanks @Olaf for clarifying this).
The short answer to your question is yes: during symbol interposition, the dynamic linker first looks inside the executable and only then scans the shared libraries. So the executable's definition of dispatch_write_hello prevails.
EDIT
In case you wonder why the runtime linker needs to resolve the call to dispatch_write_hello in print_hello to anything besides the dispatch_write_hello in the same translation unit: this is caused by the so-called semantic interposition support in GCC. By default, the compiler treats any call to a global function inside library code (i.e. code compiled with -fPIC) as potentially interposable at runtime unless you specifically tell it not to, via -fvisibility=hidden, -Wl,-Bsymbolic, -fno-semantic-interposition or __attribute__((visibility("hidden"))). This has been discussed on the net many times, e.g. in the infamous Sorry state of dynamic libraries on Linux.
As a side note, this feature incurs a significant performance penalty compared to other compilers (Clang, Visual Studio).
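A minimal sketch of the attribute-based option, assuming GCC or Clang on an ELF platform. The helper name dispatch_message and its string return value are hypothetical changes made for illustration; the library in the question writes to stderr directly.

```c
#include <stdio.h>

/* Hidden visibility: the symbol is not exported from the shared library,
   so calls to it bind inside the library and a same-named function in the
   executable cannot interpose it. */
__attribute__((visibility("hidden")))
const char *dispatch_message(void)
{
    return "hello";
}

/* Exported entry point; its call to dispatch_message is resolved locally. */
void print_hello(void)
{
    fprintf(stderr, "%s\n", dispatch_message());
}
```

When the helper is only used in one translation unit, declaring it static has the same effect and needs no compiler-specific attribute.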
I'm trying to create a Lua dll extension on Windows. I'm using Lua 5.3. My compiler is from MinGW and is gcc 4.9.3.
My C code for the dll extension is something like this:
#include <stdio.h>
#include <lua.h>
static int dub(lua_State *L) {
const double a = lua_tonumber(L, 1);
lua_pushnumber(L, a*2);
return 1;
}
__declspec(dllexport) int __cdecl luaopen_mylib(lua_State *L){
printf("One\n");
lua_pushcfunction(L, dub);
printf("Two\n");
lua_setglobal(L, "dub");
printf("Three\n");
return 1;
}
I'm compiling my dll like this:
gcc mylib.c -shared -o mylib.dll -llua
The idea being that I can load it from Lua and use it like this:
require "mylib"
print (dub(5)) --should print 10
However, when I actually try to run the Lua code, it crashes on the require "mylib" line. The DLL is able to print "One" and "Two", but it does not get to print "Three" before it crashes. This tells me the problem may be with the 'lua_setglobal' call.
What's going wrong? How do I debug it further or fix it?
As a bonus question: what should the return value of luaopen_mylib be?
Thanks!
I compiled your code and it works fine. So your problem isn't from your code. My guess is that you're using different versions of Lua, or the linking is somehow going wrong.
There are three places a version mismatch could occur.
In your #include <lua.h> line
In your build line with -llua
When you run the lua executable
It is vital that all three of these be the same lua version. If one is mismatched, that could cause the problem you're having. I would guess that your lua executable version doesn't match the others.
One other thing to try: your dll should link against the Lua dll. There should be a file on your system called lua53.dll. Copy it to your build directory. When you compile with gcc, try this instead:
gcc mylib.c -shared -o mylib.dll lua53.dll
This means that your dll extension calls the exact same Lua code that the lua executable is using. The way you have it now, with the -llua option, it appears that you are linking Lua in statically. This is almost never what you want for a dll library, because the lua executable will be calling code in lua53.dll while your dll is calling separate code in the static library. I don't think this alone is causing your crash, but it's not good practice. If Lua used global state (it doesn't), this linking issue could certainly cause your crash. Also, if you compile against lua53.dll you'll find that mylib.dll is much smaller.
In summary, I think you have a lua version mismatch somewhere that is causing your crash. You should also link against the lua dll instead of the static library for good form.
As a bonus question: what should the return value of luaopen_mylib be?
In your code it should actually be 0 instead of 1 as you have. The return value is the number of things left on the stack that you want the require() call to return. Your library is just shoving itself into the global state and not returning any lua values. An alternative way to do things is to return a table with the library's functions in it. That way you don't pollute global state. Since your library has only one function, you could return it directly like this:
__declspec(dllexport) int __cdecl luaopen_mylib(lua_State *L){
lua_pushcfunction(L, dub);
return 1;
}
Then you use it from lua as:
local dub = require("mylib")
print (dub(5)) --should print 10
I prefer this way since the caller can decide what to name the imports and it doesn't pollute the global namespace.
So, I have TVZLib.h, TVZlib.dll, and TVZlib.lib, and I am using gcc to compile the following program (it's a simple test case). The compiler gives me the error:
"undefined reference to '_imp__TVZGetNavigationMatrix'"
Yet when I compile the program with a different parameter type in the function call, it complains that it's not the correct parameter (it requires float *). To me, that means that it has at least found the function, as it knows what it wants.
From my research, I can tell that people think it's to do with the linking of the library, or the order in which I link, but I've tried all of the gcc commands in all combinations, and all give me the same error, so I'm desperate for some help.
#include <stdlib.h>
#include <stdio.h>
#include "TVZLib.h"
int main() {
float floatie = 2;
float *ptr = &floatie;
TVZGetNavigationMatrix(ptr);
getchar();
return 0;
}
Thanks a lot in advance!
My compiler command:
gcc dlltest.c -L. TVZLib.lib
The header file (TVZLib.h).
And the direct output:
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\ccuDpoiE.o:dlltest.c:(.text+0x2c): undefined reference to `_imp__TVZGetNavigationMatrix'
collect2: ld returned 1 exit status
It's been a while since I've been compiling natively on Windows...
Did you intend to link statically against TVZlib.lib? That's not happening.
By default, gcc will pick the dynamic version of a library if it finds both a static and a dynamic lib. If you want to force gcc to link statically, you can use the -static option.
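If memory serves me right, the _imp__ prefix is a sign that the function is being imported from a DLL (the _imp__ symbol is the import-table entry through which the call into the DLL is made), so the linker is looking for a DLL import rather than a static definition.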
c and I use it to generate the x.so shared library.
In x.c I want to use a few functions that are in the main module (the directory containing the main files and the executable), a kind of recursive dependency.
Is there a way to do this (without copying those functions into x.c)?
I read about -rdynamic, but could not fully understand it.
When I compile, I get 'somefunc' undeclared (somefunc is in the main module; I declared extern somefunc in x.c, but that did not work).
Please let me know.
Thanks.
You could define the affected functions in your shared library to take callback function pointer arguments, and then at call time pass the main module's functions as arguments. E.g.
// Library
void dosomething (int arg, void (*callback)(void)) { ... }
// Main module
void called_from_lib(void) { ... }
dosomething(10, called_from_lib);
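A runnable sketch of the same pattern. The int-returning signatures and the function bodies are assumptions made so that the effect is observable; the outline above leaves them unspecified.

```c
#include <stdio.h>

/* Library side (would live in x.c): receives the function it needs as a
   parameter instead of referring to a symbol in the main module. */
int dosomething(int arg, int (*callback)(int))
{
    return callback(arg) + 1;
}

/* Main-module side: an ordinary function handed to the library at call time. */
int called_from_lib(int value)
{
    printf("called_from_lib(%d)\n", value);
    return value * 2;
}
```

With these definitions, dosomething(10, called_from_lib) returns 21, and the library never needs to see the name called_from_lib at compile or link time.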
This looks like Unix. There is a function, dlopen(), that (together with dlsym()) lets you dynamically call a function in a library, without referencing it at compile time and without linking it into the program. dlopen() is POSIX, and so should be on any modern Unix box.
Example here:
http://www.dwheeler.com/program-library/Program-Library-HOWTO/x172.html
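In the same spirit, a minimal sketch of looking up and calling a function at run time. The helper call_unary and its parameters are hypothetical, and it assumes the looked-up function has the signature double(double).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: finds the function named `symbol` in the library at
   `lib_path` and calls it with x. Returns 0 and stores the result on
   success, -1 if the library or the symbol cannot be found. */
int call_unary(const char *lib_path, const char *symbol, double x, double *result)
{
    void *handle = dlopen(lib_path, RTLD_LAZY);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* dlsym returns void *; the cast to a function pointer is the usual POSIX idiom. */
    double (*fn)(double) = (double (*)(double))dlsym(handle, symbol);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "unknown error");
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);
    return 0;
}
```

On a glibc system one would call it as call_unary("libm.so.6", "cos", 0.0, &r); the library file name is platform-specific, and older glibc versions need -ldl at link time.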
There is also LD_LIBRARY_PATH. This environment variable lets you use the same code, but allows you to substitute a library that was not there at compile time. This is not exactly what you are asking, but it can be made to do something along the lines of using ad hoc shared libraries without resorting to dlopen. Some systems like HP-UX also support SHLIB_PATH, which does the same thing.