Why shared library is unloaded while another program still uses it?

Why shared library is unloaded while another program still uses it? - c

For what I understand, if there are more than one program using a shared library, the shared library won't get unloaded untill all program finishes.
I am reading The Linux Programming Interface:
42.4 Initialization and Finalization Functions It is possible to define one or more functions that are executed automatically when a
shared library is loaded and unloaded. This allows us to perform
initialization and finalization actions when working with shared
libraries. Initialization and finalization functions are executed
regardless of whether the library is loaded automatically or loaded
explicitly using the dlopen interface (Section 42.1).
Initialization and finalization functions are defined using the gcc
constructor and destructor attributes. Each function that is to be
executed when the library is loaded should be defined as follows:
void __attribute__ ((constructor)) some_name_load(void)
{
/* Initialization code */
}
Unload functions are similarly defined:
void __attribute__ ((destructor)) some_name_unload(void)
{
/* Finalization code */
} The function names `some_name_load()` and `some_name_unload()` can be replaced by any desired names. ....
Then I wrote 3 files to test:
foo.c
#include <stdio.h>
void __attribute__((constructor)) call_me_when_load(void){
printf("Loading....\n");
}
void __attribute__((destructor)) call_me_when_unload(void){
printf("Unloading...\n");
}
int xyz(int a ){
return a + 3;
}
main.c
#include <stdio.h>
#include <unistd.h>
int main(){
int xyz(int);
int b;
for(int i = 0;i < 1; i++){
b = xyz(i);
printf("xyz(i) is: %d\n", b);
}
}
main_while_sleep.c
#include <stdio.h>
#include <unistd.h>
int main(){
int xyz(int);
int b;
for(int i = 0;i < 10; i++){
b = xyz(i);
sleep(1);
printf("xyz(i) is: %d\n", b);
}
}
Then I compile a shared library and 2 executables:
gcc -g -Wall -fPIC -shared -o libdemo.so foo.c
gcc -g -Wall -o main main.c libdemo.so
gcc -g -Wall -o main_while_sleep main_while_sleep.c libdemo.so
finally run LD_LIBRARY_PATH=. ./main_while_sleep in a shell and run LD_LIBRARY_PATH=. ./main in another:
main_while_sleep output:
Loading....
xyz(i) is: 3
xyz(i) is: 4
xyz(i) is: 5
xyz(i) is: 6
xyz(i) is: 7
xyz(i) is: 8
xyz(i) is: 9
xyz(i) is: 10
xyz(i) is: 11
xyz(i) is: 12
Unloading...
main output:
Loading....
xyz(i) is: 3
Unloading...
My question is, while main_while_sleep is not finished, why Unloading is printed in main, which indicates the shared library has been unloaded? The shared library shouldn't be unloaded yet, main_while_sleep is still running!
Do I get something wrong?

My question is, while main_while_sleep is not finished, why Unloading is printed in main, which indicates the shared library has been unloaded? The shared library shouldn't be unloaded yet, main_while_sleep is still running!
You are confusing/conflating initialization/deinitialization with load/unload.
A constructor is an initialization function that is called after a shared library has been mapped into a given process's memory.
It does not affect any other process (which is in a separate, per-process address space).
Likewise, the mapping (or unmapping) of a shared library in a given process does not affect any other process.
When a process maps a library, nothing is "loaded". When the process tries to access a memory page that is part of the shared library, it receives a page fault and the given page is mapped, the page is marked resident, and the faulting instruction is restarted.
There is much more detail in my answers:
How does mmap improve file reading speed?
Which segments are affected by a copy-on-write?
read line by line in the most efficient way *platform specific*
Is Dynamic Linker part of Kernel or GCC Library on Linux Systems?
Malloc is using 10x the amount of memory necessary

Related

Where do Linux shells look for interpreters for ELF binaries? [duplicate]

So everyone probably knows that glibc's /lib/libc.so.6 can be executed in the shell like a normal executable in which cases it prints its version information and exits. This is done via defining an entry point in the .so. For some cases it could be interesting to use this for other projects too. Unfortunately, the low-level entry point you can set by ld's -e option is a bit too low-level: the dynamic loader is not available so you cannot call any proper library functions. glibc for this reason implements the write() system call via a naked system call in this entry point.
My question now is, can anyone think of a nice way how one could bootstrap a full dynamic linker from that entry point so that one could access functions from other .so's?

Update 2: see Andrew G Morgan's slightly more complicated solution which does work for any GLIBC (that solution is also used in libc.so.6 itself (since forever), which is why you can run it as ./libc.so.6 (it prints version info when invoked that way)).
Update 1: this no longer works with newer GLIBC versions:
./a.out: error while loading shared libraries: ./pie.so: cannot dynamically load position-independent executable
Original answer from 2009:
Building your shared library with -pie option appears to give you everything you want:
/* pie.c */
#include <stdio.h>
int foo()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return 42;
}
int main()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return foo();
}
/* main.c */
#include <stdio.h>
extern int foo(void);
int main()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return foo();
}
$ gcc -fPIC -pie -o pie.so pie.c -Wl,-E
$ gcc main.c ./pie.so
$ ./pie.so
in main pie.c:9
in foo pie.c:4
$ ./a.out
in main main.c:6
in foo pie.c:4
$
P.S. glibc implements write(3) via system call because it doesn't have anywhere else to call (it is the lowest level already). This has nothing to do with being able to execute libc.so.6.

I have been looking to add support for this to pam_cap.so, and found this question. As #EmployedRussian notes in a follow-up to their own post, the accepted answer stopped working at some point. It took a while to figure out how to make this work again, so here is a worked example.
This worked example involves 5 files to show how things work with some corresponding tests.
First, consider this trivial program (call it empty.c):
int main(int argc, char **argv) { return 0; }
Compiling it, we can see how it resolves the dynamic symbols on my system as follows:
$ gcc -o empty empty.c
$ objcopy --dump-section .interp=/dev/stdout empty ; echo
/lib64/ld-linux-x86-64.so.2
$ DL_LOADER=/lib64/ld-linux-x86-64.so.2
That last line sets a shell variable for use later.
Here are the two files that build my example shared library:
/* multi.h */
void multi_main(void);
void multi(const char *caller);
and
/* multi.c */
#include <stdio.h>
#include <stdlib.h>
#include "multi.h"
void multi(const char *caller) {
printf("called from %s\n", caller);
}
__attribute__((force_align_arg_pointer))
void multi_main(void) {
multi(__FILE__);
exit(42);
}
const char dl_loader[] __attribute__((section(".interp"))) =
DL_LOADER ;
(Update 2021-11-13: The forced alignment is to help __i386__ code be SSE compatible - without it we get hard to debug glibc SIGSEGV crashes.)
We can compile and run it as follows:
$ gcc -fPIC -shared -o multi.so -DDL_LOADER="\"${DL_LOADER}\"" multi.c -Wl,-e,multi_main
$ ./multi.so
called from multi.c
$ echo $?
42
So, this is a .so that can be executed as a stand alone binary. Next, we validate that it can be loaded as shared object.
/* opener.c */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
void *handle = dlopen("./multi.so", RTLD_NOW);
if (handle == NULL) {
perror("no multi.so load");
exit(1);
}
void (*multi)(const char *) = dlsym(handle, "multi");
multi(__FILE__);
}
That is we dynamically load the shared-object and run a function from it:
$ gcc -o opener opener.c -ldl
$ ./opener
called from opener.c
Finally, we link against this shared object:
/* main.c */
#include "multi.h"
int main(int argc, char **argv) {
multi(__FILE__);
}
Where we compile and run it as follows:
$ gcc main.c -o main multi.so
$ LD_LIBRARY_PATH=./ ./main
called from main.c
(Note, because multi.so isn't in a standard system library location, we need to override where the runtime looks for the shared object file with the LD_LIBRARY_PATH environment variable.)

I suppose you'd have your ld -e point to an entry point which would then use the dlopen() family of functions to find and bootstrap the rest of the dynamic linker. Of course you'd have to ensure that dlopen() itself was either statically linked or you might have to implement enough of your own linker stub to get at it (using system call interfaces such as mmap() just as libc itself is doing.
None of that sounds "nice" to me. In fact just the thought of reading the glibc sources (and the ld-linux source code, as one example) enough to assess the size of the job sounds pretty hoary to me. It might also be a portability nightmare. There may be major differences between how Linux implements ld-linux and how the linkages are done under OpenSolaris, FreeBSD, and so on. (I don't know).

what can be called from -fini function of shared library?

I am using a shared library with LD_PRELOAD, and it seems that I can't call some functions from the function set with -fini= ld option. I am running Linux Ubuntu 20.04 on a 64-bit machine.
Here is the SSCCE:
shared.sh:
#!/bin/bash
gcc -shared -fPIC -Wl,-init=init -Wl,-fini=fini shared.c -o shared.so
LD_PRELOAD=$PWD/shared.so whoami
shared.c:
#include <stdio.h>
#include <unistd.h>
void init() {
printf("%s\n", __func__);
fflush(stdout);
}
void fini() {
int printed;
printed = printf("%s\n", __func__);
if (printed < 0)
sleep(2);
fflush(stdout);
}
When I call ./shared.sh , I get
init
mark
and 2 second pause.
So it seems printf() fails in fini() but sleep() succeeds (errno values are not specified for printf, so I don't check it) Why and what kind of functions can I call from fini? ld manpage does not say anything about any restrictions.

The initialization functions of each dynamically linked component are executed in the order in which the components are loaded. In particular, if A depends on B but B does not depend on A, then B's initialization functions run before A's. The termination functions of each dynamically linked component are executed in the order in which the components are unloaded. In particular, if A depends on B but B does not depend on A, then B's initialization functions run after A's. Generally, termination functions run in reverse order from initialization functions, but I don't know if that's true in all cases (for example when there are circular dependencies). You can find the rules in the System V ABI specification which Linux and many other Unix variants follow. Note that the rules leave some cases unspecified; they might depend on the compiler and on the standard library (possibly on the kernel, but I think for this particular topic it doesn't matter).
A shared library loaded with LD_PRELOAD is loaded before the main executable, so its initialization functions run before the ones from libc and its termination functions run after the ones from libc. In particular, libc flushes standard streams and closes the file descriptors for the output streams. You can see this happening by tracing system calls:
$ strace env LD_PRELOAD=$PWD/shared.so whoami
…
write(1, "gilles\n", 6gilles
) = 6
close(1) = 0
close(2) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=2, tv_nsec=0}, 0x7ffc12bd2df0) = 0
exit_group(0) = ?
+++ exited with 0 +++
The call to clock_nanosleep is sleep(2). The calls to printf and fflush happen just before; since stdout has been closed, they do nothing and return -1. Check the return value or use a debugger to confirm this.
Contrast with what happens if shared.so is linked normally, rather than preloaded.
$ cat main.c
#include <stdio.h>
int main(void) {
puts("main");
return 0;
}
$ gcc -o main main.c -Wl,-rpath,. -Wl,--no-as-needed -L. -l:shared.so
$ ./main
init
main
fini
Here, since main loads shared.so, the shared library is initialized last and terminated first. So by the time the fini function in shared.so runs, libc hasn't run its termination functions and the standard streams are still available.

In gcc is there any way to dynamically add a function call to the start of main()?

I'm dynamically overriding malloc() with a fast_malloc() implementation of mine in a glibc benchmark malloc speed test (glibc/benchtests/bench-malloc-thread.c), by writing these functions in my fast_malloc.c file:
// Override malloc() and free(); see: https://stackoverflow.com/a/262481/4561887
inline void* malloc(size_t num_bytes)
{
static bool first_call = true;
if (first_call)
{
first_call = false;
fast_malloc_error_t error = fast_malloc_init();
assert(error == FAST_MALLOC_ERROR_OK);
}
return fast_malloc(num_bytes);
}
inline void free(void* ptr)
{
fast_free(ptr);
}
Notice that I have this inefficient addition to my malloc() wrapper to ensure fast_malloc_init() gets called first on just the first call, to initialize some memory pools. I'd like to get rid of that and dynamically insert that init call into the start of main(), without modifying the glibc code, if possible. Is this possible?
The downside of how I've written my malloc() wrapper so far is it skews my benchtest results making it look like my fast_malloc() is slower than it really is, because the init func gets timed by glibc/benchtests/bench-malloc-thread.c, and I have this extraneous if (first_call) which gets checked every malloc call.
Currently I dynamically override malloc() and free(), while calling the bench-malloc-thread executable, like this:
LD_PRELOAD='/home/gabriel/GS/dev/fast_malloc/build/libfast_malloc.so' \
glibc-build/benchtests/bench-malloc-thread 1
Plot I will be adding my fast_malloc() speed tests to (using this repo):
LinkedIn post I made about this: https://www.linkedin.com/posts/gabriel-staples_software-engineering-tradeoffs-activity-6815412255325339648-_c8L.
Related:
[my repo fork] https://github.com/ElectricRCAircraftGuy/malloc-benchmarks
[how I learned how to generate *.so dynamic libraries in gcc] https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
Create a wrapper function for malloc and free in C

Is this possible?
Yes. You are building and LD_PRELOADing a shared library, and shared libraries can have special initializer and finalizer functions, which are called by the dynamic loader when the library is loaded and unloaded respectively.
As kaylum commented, to create such a constructor, you would use __attribute__((constructor)), like so:
__attribute__((constructor))
void fast_malloc_init_ctor()
{
fast_malloc_error_t error = fast_malloc_init();
assert(error == FAST_MALLOC_ERROR_OK);
}
// ... the rest of implementation here.
P.S.
it skews my benchtest results making it look like my fast_malloc() is slower than it really is, because the init func gets timed
You are comparing with multi-threaded benchmarks. Note that your static bool fist_call is not thread-safe. In practice this will not matter, because malloc is normally called long before any threads (other than the main thread) exist.
I doubt that this single comparison actually makes your fast_malloc() slower. It probably is slower even after you remove the comparison -- writing a fast heap allocator takes a lot of effort, and smart people have spent many man-years optimizing GLIBC malloc, TCMalloc and jemalloc.

How to dynamically inject function calls before and after another executable's main() function.
Here is a full, runnable example for anyone wanting to test this on their own. Tested on Linux Ubuntu 20.04.
This code is all part of my eRCaGuy_hello_world repo.
hello_world_basic.c:
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
printf("This is the start of `main()`.\n");
printf(" Hello world.\n");
printf("This is the end of `main()`.\n");
return 0;
}
dynamic_func_call_before_and_after_main.c:
#include <assert.h>
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
#include <stdlib.h> // For `atexit()`
/// 3. This function gets attached as a post-main() callback (a sort of program "destructor")
/// via the C <stdlib.h> `atexit()` call below
void also_called_after_main()
{
printf("`atexit()`-registered callback functions are also called AFTER `main()`.\n");
}
/// 1. Functions with gcc function attribute, `constructor`, get automatically called **before**
/// `main()`; see:
/// https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
__attribute__((__constructor__))
void called_before_main()
{
printf("gcc constructors are called BEFORE `main()`.\n");
// 3. Optional way to register a function call for AFTER main(), although
// I prefer the simpler gcc `destructor` attribute technique below, instead.
int retcode = atexit(also_called_after_main);
assert(retcode == 0); // ensure the `atexit()` call to register the callback function succeeds
}
/// 2. Functions with gcc function attribute, `destructor`, get automatically called **after**
/// `main()`; see:
/// https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
__attribute__((__destructor__))
void called_after_main()
{
printf("gcc destructors are called AFTER `main()`.\n");
}
How to build and run the dynamic lib*.so shared-object library and dynamically load it with LD_PRELOAD as you run another program (see "dynamic_func_call_before_and_after_main__build_and_run.sh from my eRCaGuy_hello_world repo"):
# 1. Build the other program (hello_world_basic.c) that has `main()` in it which we want to use
mkdir -p bin && gcc -Wall -Wextra -Werror -O3 -std=c11 -save-temps=obj hello_world_basic.c \
-o bin/hello_world_basic
# 2. Create a .o object file of this program, compiling with Position Independent Code (PIC); see
# here: https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
gcc -Wall -Wextra -Werror -O3 -std=c11 -fpic -c dynamic_func_call_before_and_after_main.c \
-o bin/dynamic_func_call_before_and_after_main.o
# 3. Link the above PIC object file into a dynamic shared library (`lib*.so` file); link above shows
# we must use `-shared`
gcc -shared bin/dynamic_func_call_before_and_after_main.o -o \
bin/libdynamic_func_call_before_and_after_main.so
# 4. Call the other program with `main()` in it, dynamically injecting this code into that other
# program via this code's .so shared object file, and via Linux's `LD_PRELOAD` trick
LD_PRELOAD='bin/libdynamic_func_call_before_and_after_main.so' bin/hello_world_basic
Sample output. Notice that we have injected some special function calls both before AND after the main() function found in "hello_world_basic.c":
gcc constructors are called BEFORE `main()`.
This is the start of `main()`.
Hello world.
This is the end of `main()`.
gcc destructors are called AFTER `main()`.
`atexit()`-registered callback functions are also called AFTER `main()`.
References:
How to build dynamic lib*.so libraries in Linux: https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
#kaylum's comment
#Employed Russian's answer
#Lundin's comment
gcc constructor and destructor function attributes!:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
c atexit() func to register functions to be called AFTER main() returns or exits!:
https://en.cppreference.com/w/c/program/atexit

Threaded shared library for non threaded application

I have some application for which I need to write extension using shared library. In my shared library I need to use threads. And main application neither uses threads neither linked with threads library (libpthread.so, for example).
As first tests showed my library causes crashes of the main application. And if i use LD_PRELOAD hack crashes goes away:
LD_PRELOAD=/path/to/libpthread.so ./app
The only OS where i have no segfaults without LD_PRELOAD hack is OS X. On other it just crashes. I tested: Linux, FreeBSD, NetBSD.
My question is: is there a way to make my threaded shared library safe for non-threaded application without changing of the main application and LD_PRELOAD hacks?
To reproduce the problem i wrote simple example:
mylib.c
#include <pthread.h>
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void *_thread(void *arg) {
int i;
struct addrinfo *res;
for (i=0; i<10000; i++) {
if (getaddrinfo("localhost", NULL, NULL, &res) == 0) {
if (res) freeaddrinfo(res);
}
}
pthread_mutex_lock(&mutex);
printf("Just another thread message!\n");
pthread_mutex_unlock(&mutex);
return NULL;
}
void make_thread() {
pthread_t tid[10];
int i, rc;
for (i=0; i<10; i++) {
rc = pthread_create(&tid[i], NULL, _thread, NULL);
assert(rc == 0);
}
void *rv;
for (i=0; i<10; i++) {
rc = pthread_join(tid[i], &rv);
assert(rc == 0);
}
}
main.c
#include <stdio.h>
#include <dlfcn.h>
int main() {
void *mylib_hdl;
void (*make_thread)();
mylib_hdl = dlopen("./libmy.so", RTLD_NOW);
if (mylib_hdl == NULL) {
printf("dlopen: %s\n", dlerror());
return 1;
}
make_thread = (void (*)()) dlsym(mylib_hdl, "make_thread");
if (make_thread == NULL) {
printf("dlsym: %s\n", dlerror());
return 1;
}
(*make_thread)();
return 0;
}
Makefile
all:
cc -pthread -fPIC -c mylib.c
cc -pthread -shared -o libmy.so mylib.o
cc -o main main.c -ldl
clean:
rm *.o *.so main
And all together: https://github.com/olegwtf/sandbox/tree/bbbf76fdefe4bacef8a0de7a2475995719ae0436/threaded-so-for-non-threaded-app
$ make
cc -pthread -fPIC -c mylib.c
cc -pthread -shared -o libmy.so mylib.o
cc -o main main.c -ldl
$ ./main
*** glibc detected *** ./main: double free or corruption (fasttop): 0x0000000001614c40 ***
Segmentation fault
$ ldd libmy.so | grep thr
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe7e2591000)
$ LD_PRELOAD=/lib/x86_64-linux-gnu/libpthread.so.0 ./main
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!
Just another thread message!

My question is: is there a way to make my threaded shared library safe
for non-threaded application without changing of the main application
and LD_PRELOAD hacks?
No, those are the two ways you can make it work. With neither in place, your program is invalid.

dlopen is supposed to do the right thing, and to open all the libraries your own .so depends upon.
In fact, your code is working for me if I comment out the address lookup code that you placed inside your thread function. So loading the pthread library works perfectly.
And if I run the code including the lookup, valgrind shows me that the crash is below getaddrinfo.
So the problem is not that the libraries aren't loaded, somehow their initialization code is not executed or not in the right order.

gdb helped to understand what's goin on with this example.
After 3 tries gdb showed that app always crashed at rewind.c line 36 inside libc. Since tests were run on Debian 7, libc implementation is eglibc. And here you can see line 36 of rewind.c:
http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_13/libc/libio/rewind.c?annotate=12752
_IO_acquire_lock() is a macros and after grepping eglibc source I found 2 places where it is defined:
bits/stdio-lock.h line 49: http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_13/libc/bits/stdio-lock.h?annotate=12752
sysdeps/pthread/bits/stdio-lock.h line 91: http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_13/libc/nptl/sysdeps/pthread/bits/stdio-lock.h?annotate=12752
Comment for first says Generic version and for second NPTL version, where NTPL is Native POSIX Thread Library. So in few words first defines non-threaded implementation for this and several other macroses and second threaded implementation.
When our main application is not linked with pthreads it starts and loads this first non-threaded implementation of _IO_acquire_lock() and others macroses. Then it opens our threaded shared library and executes function from it. And this function uses already loaded and non thread safe version of _IO_acquire_lock(). However in fact should use threads compatible version defined by pthreads. This is where segfault occures.
This is how it works on Linux. On *BSD situation is even more sad. On FreeBSD your program will hang up immediately after your threaded library will try to create new thread. On NetBSD instead of hang up program will be terminated with SIGABRT.
So answering to the main question: is it possible to use threaded shared library from application not linked with pthreads?
In general -- no. And particularly this depends on libc implementation. For OS X, for example, this will work without any problems. For Linux this will work if you'll not use libc functions that uses such special macroses redefined by pthreads. But how to know which uses? Ok, you can make 1+1, this looks safe. On *BSD your program will crash or hang up immediately, no matter what your thread do.

Can gdb make a function pointer point to another location?

I'll explain:
Let's say I'm interested in replacing the rand() function used by a certain application.
So I attach gdb to this process and make it load my custom shared library (which has a customized rand() function):
call (int) dlopen("path_to_library/asdf.so")
This would place the customized rand() function inside the process' memory. However, at this point the symbol rand will still point to the default rand() function. Is there a way to make gdb point the symbol to the new rand() function, forcing the process to use my version?
I must say I'm also not allowed to use the LD_PRELOAD (linux) nor DYLD_INSERT_LIBRARIES (mac os x) methods for this, because they allow code injection only in the beginning of the program execution.
The application that I would like to replace rand(), starts several threads and some of them start new processes, and I'm interested in injecting code on one of these new processes. As I mentioned above, GDB is great for this purpose because it allows code injection into a specific process.

I followed this post and this presentation and came up with the following set of gdb commands for OSX with x86-64 executable, which can be loaded with -x option when attaching to the process:
set $s = dyld_stub_rand
set $p = ($s+6+*(int*)($s+2))
call (void*)dlsym((void*)dlopen("myrand.dylib"), "my_rand")
set *(void**)$p = my_rand
c
The magic is in set $p = ... command. dyld_stub_rand is a 6-byte jump instruction. Jump offset is at dyld_stub_rand+2 (4 bytes). This is a $rip-relative jump, so add offset to what $rip would be at this point (right after the instruction, dyld_stub_rand+6).
This points to a symbol table entry, which should be either real rand or dynamic linker routine to load it (if it was never called). It is then replaced by my_rand.
Sometimes gdb will pick up dyld_stub_rand from libSystem or another shared library, if that happens, unload them first with remove-symbol-file before running other commands.

This question intrigued me, so I did a little research. What you are looking for is a 'dll injection'. You write a function to replace some library function, put it in a .so, and tell ld to preload your dll. I just tried it out and it worked great! I realize this doesn't really answer your question in relation to gdb, but I think it offers a viable workaround.
For a gdb-only solution, see my other solution.
// -*- compile-command: "gcc -Wall -ggdb -o test test.c"; -*-
// test.c
#include "stdio.h"
#include "stdlib.h"
int main(int argc, char** argv)
{
//should print a fairly random number...
printf("Super random number: %d\n", rand());
return 0;
}
/ -*- compile-command: "gcc -Wall -fPIC -shared my_rand.c -o my_rand.so"; -*-
//my_rand.c
int rand(void)
{
return 42;
}
compile both files, then run:
LD_PRELOAD="./my_rand.so" ./test
Super random number: 42

I have a new solution, based on the new original constraints. (I am not deleting my first answer, as others may find it useful.)
I have been doing a bunch of research, and I think it would work with a bit more fiddling.
In your .so rename your replacement rand function, e.g my_rand
Compile everything and load up gdb
Use info functions to find the address of rand in the symbol table
Use dlopen then dlsym to load the function into memory and get its address
call (int) dlopen("my_rand.so", 1) -> -val-
call (unsigned int) dlsym(-val-, "my_rand") -> my_rand_addr
-the tricky part- Find the hex code of a jumpq 0x*my_rand_addr* instruction
Use set {int}*rand_addr* = *my_rand_addr* to change symbol table instruction
Continue execution: now whenever rand is called, it will jump to my_rand instead
This is a bit complicated, and very round-about, but I'm pretty sure it would work. The only thing I haven't accomplished yet is creating the jumpq instruction code. Everything up until that point works fine.

I'm not sure how to do this in a running program, but perhaps LD_PRELOAD will work for you. If you set this environment variable to a list of shared objects, the runtime loader will load the shared object early in the process and allow the functions in it to take precedence over others.
LD_PRELOAD=path_to_library/asdf.so path/to/prog
You do have to do this before you start the process but you don't have to rebuild the program.

Several of the answers here and the code injection article you linked to in your answer cover chunks of what I consider the optimal gdb-oriented solution, but none of them pull it all together or cover all the points. The code-expression of the solution is a bit long, so here's a summary of the important steps:
Load the code to inject. Most of the answers posted here use what I consider the best approach -- call dlopen() in the inferior process to link in a shared library containing the injected code. In the article you linked to the author instead loaded a relocatable object file and hand-linked it against the inferior. This is quite frankly insane -- relocatable objects are not "ready-to-run" and include relocations even for internal references. And hand-linking is tedious and error-prone -- far simpler to let the real runtime dynamic linker do the work. This does mean getting libdl into the process in the first place, but there are many options for doing that.
Create a detour. Most of the answers posted here so far have involved locating the PLT entry for the function of interest, using that to find the matching GOT entry, then modifying the GOT entry to point to your injected function. This is fine up to a point, but certain linker features -- e.g., use of dlsym -- can circumvent the GOT and provide direct access to the function of interest. The only way to be certain of intercepting all calls to a particular function is overwrite the initial instructions of that function's code in-memory to create a "detour" redirecting execution to your injected function.
Create a trampoline (optional). Frequently when doing this sort of injection you'll want to call the original function whose invocation you are intercepting. The way to allow this with a function detour is to create a small code "trampoline" which includes the overwritten instructions of the original function then a jump to the remainder of the original. This can be complex, because any IP-relative instructions in the copied set need to be modified to account for their new addresses.
Automate it all. These steps can be tedious, even if doing some of the simpler solutions posted in other answers. The best way to ensure that the steps are done correctly every time with variable parameters (injecting different functions, etc) is to automate their execution. Starting with the 7.0 series, gdb has included the ability to write new commands in Python. This support can be used to implement a turn-key solution for injecting and detouring code in/to the inferior process.
Here's an example. I have the same a and b executables as before and an inject2.so created from the following code:
#include <unistd.h>
#include <stdio.h>
int (*rand__)(void) = NULL;
int
rand(void)
{
int result = rand__();
printf("rand invoked! result = %d\n", result);
return result % 47;
}
I can then place my Python detour command in detour.py and have the following gdb session:
(gdb) source detour.py
(gdb) exec-file a
(gdb) set follow-fork-mode child
(gdb) catch exec
Catchpoint 1 (exec)
(gdb) run
Starting program: /home/llasram/ws/detour/a
a: 1933263113
a: 831502921
[New process 8500]
b: 918844931
process 8500 is executing new program: /home/llasram/ws/detour/b
[Switching to process 8500]
Catchpoint 1 (exec'd /home/llasram/ws/detour/b), 0x00007ffff7ddfaf0 in _start ()
from /lib64/ld-linux-x86-64.so.2
(gdb) break main
Breakpoint 2 at 0x4005d0: file b.c, line 7.
(gdb) cont
Continuing.
Breakpoint 2, main (argc=1, argv=0x7fffffffdd68) at b.c:7
7 {
(gdb) detour libc.so.6:rand inject2.so:rand inject2.so:rand__
(gdb) cont
Continuing.
rand invoked! result = 392103444
b: 22
Program exited normally.
In the child process, I create a detour from the rand() function in libc.so.6 to the rand() function in inject2.so and store a pointer to a trampoline for the original rand() in the rand__ variable of inject2.so. And as expected, the injected code calls the original, displays the full result, and returns that result modulo 47.
Due to length, I'm just linking to a pastie containing the code for my detour command. This is a fairly superficial implementation (especially in terms of the trampoline generation), but it should work well in a large percentage of cases. I've tested it with gdb 7.2 (most recently released version) on Linux with both 32-bit and 64-bit executables. I haven't tested it on OS X, but any differences should be relatively minor.

For executables you can easily find the address where the function pointer is stored by using objdump. For example:
objdump -R /bin/bash | grep write
00000000006db558 R_X86_64_JUMP_SLOT fwrite
00000000006db5a0 R_X86_64_JUMP_SLOT write
Therefore, 0x6db5a0 is the adress of the pointer for write. If you change it, calls to write will be redirected to your chosen function. Loading new libraries in gdb and getting function pointers has been covered in earlier posts. The executable and every library have their own pointers. Replacing affects only the module whose pointer was changed.
For libraries, you need to find the base address of the library and add it to the address given by objdump. In Linux, /proc/<pid>/maps gives it out. I don't know whether position-independent executables with address randomization would work. maps-information might be unavailable in such cases.

As long as the function you want to replace is in a shared library, you can redirect calls to that function at runtime (during debugging) by poking at the PLT. Here is an article that might be helpful:
Shared library call redirection using ELF PLT infection
It's written from the standpoint of malware modifying a program, but a much easier procedure is adaptable to live use in the debugger. Basically you just need to find the function's entry in the PLT and overwrite the address with the address of the function you want to replace it with.
Googling for "PLT" along with terms like "ELF", "shared library", "dynamic linking", "PIC", etc. might find you more details on the subject.

You can still us LD_PRELOAD if you make the preloaded function understand the situations it's getting used in. Here is an example that will use the rand() as normal, except inside a forked process when it will always return 42. I use the dl routines to load the standard library's rand() function into a function pointer for use by the hijacked rand().
// -*- compile-command: "gcc -Wall -fPIC -shared my_rand.c -o my_rand.so -ldl"; -*-
//my_rand.c
#include <sys/types.h>
#include <unistd.h>
#include <dlfcn.h>
int pid = 0;
int (*real_rand)(void) = NULL;
void f(void) __attribute__ ((constructor));
void f(void) {
pid = getpid();
void* dl = dlopen("libc.so.6", RTLD_LAZY);
if(dl) {
real_rand = dlsym(dl, "rand");
}
}
int rand(void)
{
if(pid == getpid() && real_rand)
return real_rand();
else
return 42;
}
//test.c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv)
{
printf("Super random number: %d\n", rand());
if(fork()) {
printf("original process rand: %d\n", rand());
} else {
printf("forked process rand: %d\n", rand());
}
return 0;
}
jdizzle#pudding:~$ ./test
Super random number: 1804289383
original process rand: 846930886
forked process rand: 846930886
jdizzle#pudding:~$ LD_PRELOAD="/lib/ld-linux.so.2 ./my_rand.so" ./test
Super random number: 1804289383
original process rand: 846930886
forked process rand: 42

I found this tutorial incredibly useful, and so far its the only way I managed to achieve what I was looking with GDB: Code Injection into Running Linux Application: http://www.codeproject.com/KB/DLL/code_injection.aspx
There is also a good Q&A on code injection for Mac here: http://www.mikeash.com/pyblog/friday-qa-2009-01-30-code-injection.html

I frequently use code injection as a method of mocking for automated testing of C code. If that's the sort of situation you're in -- if your use of GDB is simply because you're not interested in the parent processes, and not because you want to interactively select the processes which are of interest -- then you can still use LD_PRELOAD to achieve your solution. Your injected code just needs to determine whether it is in the parent or child processes. There are several ways you could do this, but on Linux, since your child processes exec(), the simplest is probably to look at the active executable image.
I produced two executables, one named a and the other b. Executable a prints the result of calling rand() twice, then fork()s and exec()s b twice. Executable b print the result of calling rand() once. I use LD_PRELOAD to inject the result of compiling the following code into the executables:
// -*- compile-command: "gcc -D_GNU_SOURCE=1 -Wall -std=gnu99 -O2 -pipe -fPIC -shared -o inject.so inject.c"; -*-
#include <sys/types.h>
#include <unistd.h>
#include <limits.h>
#include <stdio.h>
#include <dlfcn.h>
#define constructor __attribute__((__constructor__))
typedef int (*rand_t)(void);
typedef enum {
UNKNOWN,
PARENT,
CHILD
} state_t;
state_t state = UNKNOWN;
rand_t rand__ = NULL;
state_t
determine_state(void)
{
pid_t pid = getpid();
char linkpath[PATH_MAX] = { 0, };
char exepath[PATH_MAX] = { 0, };
ssize_t exesz = 0;
snprintf(linkpath, PATH_MAX, "/proc/%d/exe", pid);
exesz = readlink(linkpath, exepath, PATH_MAX);
if (exesz < 0)
return UNKNOWN;
switch (exepath[exesz - 1]) {
case 'a':
return PARENT;
case 'b':
return CHILD;
}
return UNKNOWN;
}
int
rand(void)
{
if (state == CHILD)
return 47;
return rand__();
}
constructor static void
inject_init(void)
{
rand__ = dlsym(RTLD_NEXT, "rand");
state = determine_state();
}
The result of running a with and without injection:
$ ./a
a: 644034683
a: 2011954203
b: 375870504
b: 1222326746
$ LD_PRELOAD=$PWD/inject.so ./a
a: 1023059566
a: 986551064
b: 47
b: 47
I'll post a gdb-oriented solution later.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight