How one can achieve late binding in C language ?
Late binding is not really a function of the C language itself, more something that your execution environment provides for you.
Many systems will provide deferred binding as a feature of the linker/loader and you can also use explicit calls such as dlopen (to open a shared library) and dlsym (to get the address of a symbol within that library so you can access it or call it).
The only semi-portable way of getting late binding with the C standard would be to use some trickery with system() and even that is at least partially implementation-specific.
If you're not so much talking about deferred binding but instead polymorphism, you can achieve that effect with function pointers. Basically, you create a struct which has all the data for a type along with function pointers for locating the methods for that type. Then, in the "constructor" (typically an init() function), you set the function pointers to the relevant functions for that type.
You still need to include all the code even if you don't use it but it is possible to get polymorphism that way.
Symbol binding in C is always done at compile time, never runtime.
Library binding, or dynamic linking as it's called, is done via dlopen() and dlsym() on *nix, and LoadLibrary() and GetProcAddress() on Windows.
How one can achieve late binding in C language ?
The closest would be through dynamic loading of library (DLL) such as with dlopen & dlsym on Linux. Otherwise, it is not directly available in C.
Use Objective-C or Lua. Both are late-bound languages which can easily interface with C.
Of course you could implement your own name resolution scheme, but why re-invent the wheel?
Unfortunately you did not specify an OS. For Unix you can use shared libraries, or create a configurable (plugin) module structure. For details you may find the source code of an apache 1.3 webserver useful. http://httpd.apache.org/download.cgi
cppdev seems to be the one and only to hit the spot with his/her remark.
Please, have a look at the definition itself. In a few words:
Late binding, or dynamic binding, is a computer programming mechanism
in which the method being called upon an object is looked up by name
at runtime.
All other answers just miss the main point, that is "look up by name".
The needed solution would be very similar to a lookup table of pointers to functions along with a function or two to select the right one by name (or even by signature). We call it a "hash table".
Related
Is there a reliable way to prevent external code from calling inner functions of a lib that was compiled from C code?
I would like to deliver a static library with an API header file. The library has different modules, consisting of .c and .h files. I would like to prevent the recepients from using functions declared in the inner .h files.
Is this possible?
Thanks!
Is there a reliable way to prevent external code from calling inner functions of a lib ?
No, there cannot be (read about Rice's theorem; detecting statically such non-trivial properties is undecidable). The library or the code might use function pointers. A malicious user could play with function pointers and pointer arithmetic to call some private function (perhaps after having reverse-engineered your code), even if it is static.
On Linux you might play with visibility tricks.
Or you could organize your library as a single translation unit (a bit like sqlite is doing its amalgamation) and have all internal functions be static ...
In general, the library should have naming conventions about its internal functions (e.g. suffix all of them with _). This could be practically helpful (but not against malicious users).
Most importantly, a library should be well documented (with naming conventions being also documented), and a serious user will only use documented functions the way they are documented to be useful.
(so I don't think you should care about internal functions being called; you do need to document which public functions can be called, and how, and when...; a user calling anything else should expect undefined behavior, that is very bad things)
I would like to deliver a static library with an APIheader file, and would like to prevent the recepients from using the structs I define and the inner functions.
I am not sure that (at least on Linux) delivering a static library is wise. I would recommend delivering a shared library, read Drepper's How to Write Shared Libraries.
And you can't prevent the recipient (assuming a malicious, clever, and determined one) to use inner functions and internal struct-s. You should just discourage them in the documentation, and document well your public functions and data types.
I would like to prevent the recepients from using functions declared in the inner .h files. Is this possible?
No, that is impossible.
It looks like you seek a technical solution to a social issue. You need to trust your users (and they need to trust you), so you should document what functions can be used (and you could even add in your documentation some sentence saying that using directly any undocumented function yields undefined behavior). You can't do much more. Perhaps (in particular if you are selling your library as a proprietary software) you need a lawyer to write a good contract.
You might consider writing your own GCC plugin (or GCC MELT extension) to detect such calls. That could take you weeks of work and is not worth the trouble (and will remain imperfect).
I am not able to guess your motivations and use case (is it some life-critical software driving a nuclear reactor, a medical drug injector, an autonomous vehicule, a missile?). Please explain what would happen to you if some (malicious but clever) user would call an internal undocumented function. And what could happen to that user?
According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the libray?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer, no refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem to not believe the usecase is actually valid. I am working on a pure functional language, that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I am compiling a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variabels (e.g. the pointer to the young generation) and that is, or rather should be, totally fine. The library exposes exactly two functions: An initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. when I load the library, its internal symbols are relocated as expected, the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except. Since they will expose the same global symbols and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).
If a library function like e.g. malloc is reimplemented, then there are two symbols with that name, one in a local object file and one in the system library. Which of the two is used when a function from e.g. stdio is used, which calls malloc (and why)?
The link behaviour is, in a general way:
Include all symbols defined in the object files.
Then resolve the undefined ones using the objects in the libs.
So, if malloc is reimplemented and linked as object file, the version in the object file will override the version in the standard libraries. However, if the new malloc is linked as a library, it depends on the libraries linking order.
Another way, considering the gnu binutils as scope, to override library functions is to wrap the function using the --wrap parameter: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html
By using the --wrap ld option we can get both functions linked and the wrapper function would be able to call the wrapped one.
The linking order also depends on the command line parameters order. So I'm considering here that libs are listed after objects because, in general, doesn't make sense to put libraries before objects as their objective is to supply the missing symbols demanded by those.
One answer is that you are getting yourself in an awful lot of trouble by trying to replace malloc. Don't go there. Don't even think about it. Especially if you need to ask questions on stackoverflow, don't even think about it.
Another answer is that you are invoking undefined behaviour, and the likely result is that the malloc function will be called that will hurt you most. If you are lucky, while developing. If you are less lucky, once your code is in the hand of customers. Don't do it.
Writing wrappers around malloc with new names is bad enough. Trying to replace malloc is madness.
I am trying to write a generic library in pure c , just some data structures like stack, queue...
In my stack.h when giving name to those functions. I have questions about that.
Can I use such name, for example "init" as the function name to init a stack. Will there be something wrong?
I know maybe there exist other functions which just do other things and have the same name as "init". Then would the program be confused, especially when i both include the different init's headers.
3.I know my worry may be unnecessary, but i still want to know the principle.
Any help is appreciated, thanks.
Can I use such name, for example "init" as the function name to init a
stack. Will there be something wrong?
Yes, if anyone else wants a function named init.
I know my worry may be unnecessary, but i still want to know the
principle
Your worry is necessary, this (the lack of namespaces) is a serious problem in C.
Export as few functions as possible. Make everything static if you can
Prefix function names with something. For instance, instead of init, try stack_init
You don't have namespaces in C so usually you prefix every identifier with the name or nickname of your library.
init();
becomes
fancy_lib_init();
There might be existing libraries doing what you want (e.g. Glib). At least, study them a little before writing your own.
If you claim to develop a generic reusable C library, I suggest having naming conventions. For instance, have all the identifiers (notably function names, typedef-s, struct names...) share some common prefix.
Be systematic in your naming conventions. For instance, initializers for stacks and for queues should have similar names & signatures, and end with _init. Document your naming conventions.
Define very clearly how should data be allocated and released. Who and when should call free?
init() might be okay (if you're including your library into something else as an actual library, rather than compiling its source in), but it's better practice to use something like stack_init(), and to prefix your library's functions with stack_ or queue_, etc.
A program using your library may get confused, depending on the order the libraries are included, see #1.
As far as the principles go, the linker (on Linux, anyway) will look for symbols, and there's an ordering to how those symbols will be found. For more information, you can check out the man page for dlsym(), and specifically for RTLD_NEXT.
Function names in C are global. If two functions in a program have the same name, the program should fail to compile. (Well, sometimes it fails at link time, but the idea still holds.)
Generally, you get around this problem by using some sort of prefix or suffix on the function names in your library. "apporc_stack_init()" is much less likely to collide with something than "init()" is.
How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");
AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.
There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting in Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking but only in a scope of dlopen()'ed modules. In your case you want to also edit your main (typically non-PIC) module. I didn't try it, but maybe its as simple as calling provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.
It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.
You can't change that. In general *NIX linking concept (or rather lack of concept) symbol is picked from first object where it is found. (Except for oddball AIX which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). man dlsym for more. Though it gets out of hand quite quickly. Why is rarely used.
there is an environment variable LD_LIBRARY_PATH where the linker searches for shred libraries, prepend your path to LD_LIBRARY_PATH, i hope that would work
Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef print and #define print to use your lookup table version.