Help with using LD_PRELOAD

Help with using LD_PRELOAD - c

I want to create a library with a modified version of printf and then call LD_PRELOAD so when my program calls printf it uses my version. Can someone explain to me how to use LD_PRELOAD and if there is a something special I need to do in my code or my library?

You just set the environment variable LD_PRELOAD to the full path to the replacement library. Since all programs you launch after that point will attempt to use this library, you may want to make a wrapper script that sets LD_PRELOAD then calls the program you want to run.

As far as I know first of all the program cannot have changed evective uid or gid (so called suid or guid programs).
It should be used only for specific purposes such as debugging. As far as I recall you may shadow functions in C (in elf?). However both techniques - LD_PRELOAD and shadowing should be deal with extream care. I remember discovering bug in shadowing g_malloc in gpgme code (or other related to gpg) as the GLib internals changed.
The simple answer is - don't do it. The more complicated - do it if and only if you have to - and usually you don't (unless you write some sort of debugging software).

That seems like a bad idea. Why not name your version of printf something else?

Related

how to catch calls with LD_PRELOAD when unknown programs may be calling execve without passing environment

I know how to intercept system calls with LD_PRELOAD, that occur in compiled programs I may not have source for. For example, if I want to know about the calls to int fsync(int) of some unknown program foobar, I compile a wrapper
int fsync(int)
for
(int (*) (int))dlsym(RTLD_NEXT,"fsync");
into a shared library and then I can set the environment variable LD_PRELOAD to that and run foobar. Assuming that foobar is dynamically linked, which most programs are, I will know about the calls to fsync.
But now suppose there is another unknown program foobar1 and in the source of that program was a statement like this:
execve("foobar", NULL, NULL)
that is, the environment was not passed. Now the whole LD_PRELOAD scheme breaks down?
I checked by compiling the statemet above into foobar1, when that is run, the calls from foobar are not reported.
While one can safely assume most modern programs are dynamically linked, one cannot at all assume how they may or may not be using execve?
So then, the whole LD_PRELOAD scheme, which everybody says is such a great thing, is not really working unless you have the source to the programs concerned, in which case you can check the calls to execve and edit them if necessary. But in that case, there is no need for LD_PRELOAD, if you have sources to everything. LD_PRELOAD is specifically, supposed to be, useful when you don't have sources to the programs you are inspecting.
Where am I wrong here - how can people say, that LD_PRELOAD is useful for inspecting what unknown programs are doing??

I guess I could also write a wrapper for execve. In the wrapper, I add to the original envp argument, one more string: "LD_PRELOAD=my library" . This "seems" to work, I checked on simple examples.
I am not sure if I should be posting an "answer" which may very easily exceed my level of C experience.
Can somebody more experienced than me comment if this is really going to work in the long run?

When I debug a C program with gdb, and key in 'p system', what exactly do I get?

Before I go deep into my questions, I need to confess that I am still fairly inexperienced to this subject, and am confused over quite a number of concepts, so please bear with me if my manner of asking those questions seems unorganized.
I recently learnt that as standard C library would be loaded into every C program we compiled (is this because we have #include at the beginning of the source file?[quesiton1]), we would have its functions loaded into the memory. So, I would know that the system() function had already been loaded and stored somewhere in the memory, and then I was made know that I could find the exact address of where the system() function was stored by debugging a random C program with gdb, and issuing the command 'p system', which would print out the address of the function. I understand that 'p' is used to print variable in gdb, and 'system' in this case probably indicates the address of the system() function, so it seems to make sense to do so, but then I think to myself, wait a second, it does not appear that I have used the system() function anywhere in my code, why would the inventor of gdb include such a variable for me to print out the address of some function that I don't even use? and does this imply that the address of every function in stand C library can be found out in the same fashion? and they all have a corresponding variable name in gdb? [question2]
One more question unrelated to stuff I talked above is whether functions like system(), execve() and many others are specific to Linux OS, or they are also used in Windows OS? [question3]
Hope that you guys can help me out. Thanks in advance!

The standard C library is linked with every program because it's necessary for it to be there to be able to run your program. There's a lot of things happening in your program before your main function gets called and after it returns, the standard library takes care of this. It also provides you with most of the standard functions you can call. You can compile things without a standard library, but that's an advanced topic. This is pretty much unrelated to #include.
Gdb can see system with p because it prints more than just variables. It prints anything that is in scope. system just happens to be a symbol that's visible to you in that scope. You could print any symbol that's visible to you, including all the globally visible variables and functions in libc and your program. Symbols in this context means "names of various things that need to be findable by programs and other libraries", this includes all functions, variables, section boundaries and many other things that the compiler/linker/runtime/debugger need to find to do its job.
Usually the standard library gets linked dynamically, which means that every program has the exact same copy of the library. In that case all symbols in it will be visible to your program because there's no reason to exclude them. If you link your program statically only the necessary parts of libc will be included and you would probably not see the system symbol unless you actually use that function.

removing unneeded code from gcc andd mingw

i noticed that mingw adds alot of code before calling main(), i assumed its for parsing command line parameters since one of those functions is called __getmainargs(), and also lots of strings are added to the final executable, such as mingwm.dll and some error strings (incase the app crashed) says mingw runtime error or something like that.
my question is: is there a way to remove all this stuff? i dont need all these things, i tried tcc (tiny c compiler) it did the job. but not cross platform like gcc (solaris/mac)
any ideas?
thanks.

Yes, you really do need all those things. They're the startup and teardown code for the C environment that your code runs in.
Other than non-hosted environments such as low-level embedded solutions, you'll find pretty much all C environments have something like that. Things like /lib/crt0.o under some UNIX-like operating systems or crt0.obj under Windows.
They are vital to successful running of your code. You can freely omit library functions that you don't use (printf, abs and so on) but the startup code is needed.
Some of the things that it may perform are initialisation of atexit structures, argument parsing, initialisation of structures for the C runtime library, initialisation of C/C++ pre-main values and so forth.
It's highly OS-specific and, if there are things you don't want to do, you'll probably have to get the source code for it and take them out, in essence providing your own cut-down replacement for the object file.

You can safely assume that your toolchain does not include code that is not needed and could safely be left out.
Make sure you compiled without debug information, and run strip on the resulting executable. Anything more intrusive than that requires intimate knowledge of your toolchain, and can result in rather strange behaviour that will be hard to debug - i.e., if you have to ask how it could be done, you shouldn't try to do it.

Is it possible to do hot code swapping in C?

this
en.wikipedia.org/wiki/Hot_swapping#cite_note-1
says that VS can do it with the help of its debugger. Does gdb provide a similar functionality ?
this is the closest i could find, but doesn't seem to be ready to be used:
http://www.aitdspace.gr/xmlui/handle/123456789/219
dlopen/dlsym/dlclose are also close, but will not work for -lmylib referenced libraries (reference count never gets to 0).
alternatives i've considered:
1) using -Wl,-wrap,foo and on __wrap_foo() { func = dlopen(); func(); }
2) making libfoo.so a shared library and when we need to hotswap we dlopen(RTLD_GLOBAL) to load the new code and provide updated symbols to the next call to foo();
1) doesn't work very well because it requires me to enumerate all the functions i want to hotswap, which are all of them.
2) doesn't work very well because when foo() is called, the new code is loaded, but foo has forever the reference to that symbol. calling dlopen multiple times make foo to be re evaluated.

You may be interested in Ksplice. It's a technology that came out of MIT that allows software patches to be applied to the Linux kernel without rebooting. This is most relevant for applying security updates:
http://www.ksplice.com/paper

You could certainly hack yourself a system where you store a list of function pointers and can change these pointers to point to whatever library you have dlopen()'d at the time.
You're right, there isn't any easy way to intercept calls to routines with fixed linkage. You can always clobber the start of the routine with an assembly jump to another routine, but that can be dangerous (and isn't C).
Maybe a symbol which is weak in your code and strong in a dlopen()'d library would work?
In any of these cases, you have to deal with the situation where the old code is currently running. That isn't easy either, unless you have points in your program where you know no thread is in the library you want to swap.

the closest i have found is solari dbx which comes with oracle developer studio,however dev studio uses dbx in both linux and solaris,only solaris version supports "edit-and-continue" or "hot code swap"

Change library load order at run time (like LD_PRELOAD but during execution)

How do I change the library a function loads from during run time?
For example, say I want to replace the standard printf function with something new, I can write my own version and compile it into a shared library, then put "LD_PRELOAD=/my/library.so" in the environment before running my executable.
But let's say that instead, I want to change that linkage from within the program itself. Surely that must be possible... right?
EDIT
And no, the following doesn't work (but if you can tell me how to MAKE it work, then that would be sufficient).
void* mylib = dlopen("/path/to/library.so",RTLD_NOW);
printf = dlsym(mylib,"printf");

AFAIK, that is not possible. The general rule is that if the same symbol appears in two libraries, ld.so will favor the library that was loaded first. LD_PRELOAD works by making sure the specified libraries are loaded before any implicitly loaded libraries.
So once execution has started, all implicitly loaded libraries will have been loaded and therefore it's too late to load your library before them.

There is no clean solution but it is possible. I see two options:
Overwrite printf function prolog with jump to your replacement function.
It is quite popular solution for function hooking in MS Windows. You can find examples of function hooking by code rewriting in Google.
Rewrite ELF relocation/linkage tables.
See this article on codeproject that does almost exactly what you are asking but only in a scope of dlopen()'ed modules. In your case you want to also edit your main (typically non-PIC) module. I didn't try it, but maybe its as simple as calling provided code with:
void* handle = dlopen(NULL, RTLD_LAZY);
void* original;
original = elf_hook(argv[0], LIBRARY_ADDRESS_BY_HANDLE(handle), printf, my_printf);
If that fails you'll have to read source of your dynamic linker to figure out what needs to be adapted.

It should be said that trying to replace functions from the libc in your application has undefined behavior as per ISO C/POSIX, regardless of whether you do it statically or dynamically. It may work (and largely will work on GNU/Linux), but it's unwise to rely on it working. If you just want to use the name "printf" but have it do something nonstandard in your program, the best way to do this is to #undef printf and #define printf my_printf AFTER including any system headers. This way you don't interfere with any internal use of the function by libraries you're using...and your implementation of my_printf can even call the system printf if/when it needs to.
On the other hand, if your goal is to interfere with what libraries are doing, somewhere down the line you're probably going to run into compatibility issues. A better approach would probably be figuring out why the library won't do what you want without redefining the functions it uses, patching it, and submitting patches upstream if they're appropriate.

You can't change that. In general *NIX linking concept (or rather lack of concept) symbol is picked from first object where it is found. (Except for oddball AIX which works more like OS/2 by default.)
Programmatically you can always try dlsym(RTLD_DEFAULT) and dlsym(RTLD_NEXT). man dlsym for more. Though it gets out of hand quite quickly. Why is rarely used.

there is an environment variable LD_LIBRARY_PATH where the linker searches for shred libraries, prepend your path to LD_LIBRARY_PATH, i hope that would work

Store the dlsym() result in a lookup table (array, hash table, etc). Then #undef print and #define print to use your lookup table version.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight