I want to dynamically load a library without using functions from dlfcn.h. I have a folder full of .so files compiled with:
gcc -Wall -shared -fPIC -o filename.so filename.c
And all of them have an entry function named:
void __start(int size, char** cmd);
(I know, probably not the best name)
Then I open the .so, read the ELF header as an Elf64_Ehdr, load the file into memory with mmap (I know I should use mprotect, but I want to make it work first and add the security later), and finally add the ELF header's entry address to the pointer returned by mmap (plus an offset of 0x100).
The test code is the following:
#include <elf.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
typedef void (*ptr_t) (int, char**);
void* alloc_ex_memory(int fd) {
struct stat s;
fstat(fd, &s);
void * ptr = mmap(0, s.st_size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, fd, 0);
if (ptr == (void *)-1) {
perror("mmap");
return NULL;
}
return ptr;
}
void* load_ex_file(const char* elfFile) {
Elf64_Ehdr header;
void * ptr;
FILE* file = fopen(elfFile, "rb");
if(file) {
fread(&header, 1, sizeof(header), file);
if (memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
ptr = alloc_ex_memory(fileno(file));
printf("PTR AT -> %p\n", ptr);
printf("Entry at -> %lx\n", header.e_entry+256);
ptr = (header.e_entry + 256);
} else {
ptr = NULL;
}
fclose(file);
return ptr;
}
return NULL;
}
int main(int argc, char** argv) {
ptr_t func = load_ex_file(argv[1]);
printf("POINTER AT: %p\n", func);
getchar();
func(argc, argv);
return 0;
}
Let me explain a bit further:
When I run objdump -d filename.so, it shows __start at 0x6c0. The ELF header entry point is 0x5c0 (I compensate by adding 256 decimal).
Also, pmap shows an executable area being created at, let's say, 0x7fdf94d0c000,
so the function pointer address that I get and call is 0x7fdf94d0c6c0. This should be the entry point and callable via a function pointer. But what I get is, you guessed it, a segfault.
The last thing I would like to point out is that I have the same example running with dlopen, dlsym and dlclose, but I am required to use the mmap trick. I have seen my professor implement it successfully, but I cannot figure out how (perhaps there is a simpler way that I'm missing).
I based the code on this jit example
Thank you in advance!
Edit: This is the code of the .c that i compile into .so:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void __start(int size, char** cmd) {
if (size==2) {
if (!strcmp(cmd[1], "-l")) {
printf("******.********.*****#udc.es\n");
return;
} else if (!strcmp(cmd[1], "-n")) {
printf("****** ******** *****\n");
return;
}
} else if (size==1) {
printf("****** ******** ***** (******.********.*****#udc.es)\n");
return;
}
printf("Wrong command syntax: autores [-l | -n]\n");
}
its required that i use the mmap trick. I have seen my professor implement it successfully but i cannot figure out how.
For this to work, your __start must be completely stand-alone, and not call any other library (you violated that requirement by calling strcmp and printf).
The moment you call an unresolved symbol, you ask the dynamic loader to resolve that symbol, and the loader needs all kinds of info for that. That info is normally set up during dlopen call. Since you bypassed dlopen, it is not at all surprising that your program crashes.
Your first step should be to change your code such that __start doesn't do anything at all (is empty), and verify that you can then call __start. That would confirm that you are computing its address correctly.
Second, add a crash to that code:
void __start(int size, char **cmd)
{
    int *p = NULL;
    *p = 42; // should crash here
}
and verify that you are observing the crash now. That will confirm that your code is getting called.
Third, remove the crashing code and use only direct system calls (e.g. write(1, "Hello\n", 6) (but don't call write -- you must implement it directly in your library)) to implement whatever you need.
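A minimal sketch of that third step, assuming x86-64 Linux and GCC or Clang inline assembly. The helper name raw_write is hypothetical; the value 1 is the write system call number on x86-64 Linux. The helper is static so that the call to it does not go through the PLT, which would again need the dynamic loader.

```c
/* Assumes x86-64 Linux with GCC/Clang inline assembly. raw_write is a
 * hypothetical helper name, not part of any library. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;

    /* The system call number goes in rax (1 = write on x86-64 Linux) and
     * the arguments in rdi, rsi and rdx. The syscall instruction clobbers
     * rcx and r11 and leaves the result (or a negative errno) in rax. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

void __start(int size, char **cmd)
{
    (void)size;   /* unused in this sketch */
    (void)cmd;
    raw_write(1, "Hello\n", 6);
}
```

Compiled with the same gcc -Wall -shared -fPIC command, this version of __start refers to no external symbols, so nothing has to be resolved at call time.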
Related
I have been trying to intercept calls to malloc and free, following our textbook (CSAPP book).
I have followed their exact code, and nearly the same code that I found online and I keep getting a segmentation fault. I heard our professor saying something about printf that mallocs and frees memory so I think that this happens because I am intercepting a malloc and since I am using a printf function inside the intercepting function, it will call itself recursively.
However I can't seem to find a solution to solving this problem? Our professor demonstrated that intercepting worked ( he didn't show us the code) and prints our information every time a malloc occurs, so I do know that it's possible.
Can anyone suggest a working method??
Here is the code that I used and get nothing:
mymalloc.c
#ifdef RUNTIME
// Run-time interposition of malloc and free based on
// dynamic linker's (ld-linux.so) LD_PRELOAD mechanism
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

void *malloc(size_t size) {
    static void *(*mallocp)(size_t size) = NULL;
    char *error;
    void *ptr;

    // get address of libc malloc
    if (!mallocp) {
        mallocp = dlsym(RTLD_NEXT, "malloc");
        if ((error = dlerror()) != NULL) {
            fputs(error, stderr);
            exit(EXIT_FAILURE);
        }
    }
    ptr = mallocp(size);
    printf("malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}
#endif
test.c
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("main\n");
int* a = malloc(sizeof(int)*5);
a[0] = 1;
printf("end\n");
}
The result i'm getting:
$ gcc -o test test.c
$ gcc -DRUNTIME -shared -fPIC mymalloc.c -o mymalloc.so
$ LD_PRELOAD=./mymalloc.so ./test
Segmentation Fault
This is the code that I tried and got segmentation fault (from https://gist.github.com/iamben/4124829):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void* malloc(size_t size)
{
static void* (*rmalloc)(size_t) = NULL;
void* p = NULL;
// resolve next malloc
if(!rmalloc) rmalloc = dlsym(RTLD_NEXT, "malloc");
// do actual malloc
p = rmalloc(size);
// show statistic
fprintf(stderr, "[MEM | malloc] Allocated: %lu bytes\n", size);
return p;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define STR_LEN 128
int main(int argc, const char *argv[])
{
char *c;
char *str1 = "Hello ";
char *str2 = "World";
//allocate an empty string
c = malloc(STR_LEN * sizeof(char));
c[0] = 0x0;
//and concatenate str{1,2}
strcat(c, str1);
strcat(c, str2);
printf("New str: %s\n", c);
return 0;
}
The makefile from the git repo didn't work so I manually compiled the files and got:
$ gcc -shared -fPIC libint.c -o libint.so
$ gcc -o str str.c
$ LD_PRELOAD=./libint.so ./str
Segmentation fault
I have been doing this for hours and I still get the same incorrect result, despite the fact that I copied textbook code. I would really appreciate any help!!
One way to deal with this is to turn off the printf when your routine is called recursively:
static char ACallIsInProgress = 0;
if (!ACallIsInProgress)
{
ACallIsInProgress = 1;
printf("malloc(%d) = %p\n", (int)size, ptr);
ACallIsInProgress = 0;
}
return ptr;
With this, if printf calls malloc, your routine will merely call the actual malloc (via mallocp) and return without causing another printf. You will miss printing information about a call to malloc that the printf does, but that is generally tolerable when interposing is being used to study the general program, not the C library.
If you need to support multithreading, some additional work might be needed.
The printf implementation might allocate a buffer only once, the first time it is used. In that case, you can initialize a flag that turns off the printf similar to the above, call printf once in the main routine (maybe be sure it includes a nice formatting task that causes printf to allocate a buffer, not a plain string), and then set the flag to turn on the printf call and leave it set for the rest of the program.
Another option is for your malloc routine not to use printf at all but to cache data in a buffer to be written later by some other routine or to write raw data to a file using write, with that data interpreted and formatted by a separate program later. Or the raw data could be written by a pipe to a program that formats and prints it and that is not using your interposed malloc.
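The write-based option can be sketched as follows. This is only an illustration under assumptions: put_hex, format_malloc_record and log_malloc are hypothetical helper names, and the record format (hexadecimal size and pointer) is chosen here because it is easy to produce without calling printf or anything else that might allocate.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: writes value in hexadecimal at out and returns
 * the number of characters produced (no terminating NUL). */
static size_t put_hex(char *out, uintptr_t value)
{
    char tmp[2 * sizeof value];
    size_t n = 0, len;

    do {
        tmp[n++] = "0123456789abcdef"[value & 0xf];
        value >>= 4;
    } while (value != 0);

    len = n;
    while (n > 0)
        *out++ = tmp[--n];   /* digits were produced least significant first */
    return len;
}

/* Hypothetical helper: builds "malloc(0x<size>) = 0x<ptr>\n" in buf, which
 * must hold at least 64 bytes, and returns the record length. */
static size_t format_malloc_record(char *buf, size_t size, const void *ptr)
{
    static const char head[] = "malloc(0x";
    static const char mid[] = ") = 0x";
    size_t pos = 0, i;

    for (i = 0; i < sizeof head - 1; i++)
        buf[pos++] = head[i];
    pos += put_hex(buf + pos, (uintptr_t)size);
    for (i = 0; i < sizeof mid - 1; i++)
        buf[pos++] = mid[i];
    pos += put_hex(buf + pos, (uintptr_t)ptr);
    buf[pos++] = '\n';
    return pos;
}

/* Emits one record with write(2); only a stack buffer is used, so the
 * logging itself never calls malloc. */
static void log_malloc(size_t size, const void *ptr)
{
    char buf[64];
    size_t len = format_malloc_record(buf, size, ptr);
    ssize_t rc = write(STDERR_FILENO, buf, len);

    (void)rc;   /* a failed log write is deliberately ignored */
}
```

Inside the interposed malloc, log_malloc(size, ptr) would take the place of the printf call, so no recursion guard is needed for the logging path.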
I need to get base address of stack inside my running process. This would enable me to print raw stacktraces that will be understood by addr2line (running binary is stripped, but addr2line has access to symbols).
I managed to do this by examining the ELF header of argv[0]: I read the entry point and subtract it from &_start:
#include <stdio.h>
#include <execinfo.h>
#include <unistd.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>
void* entry_point = NULL;
void* base_addr = NULL;
extern char _start;
/// given argv[0], populates the globals entry_point and base_addr
void read_elf_header(const char* elfFile) {
// switch to Elf32_Ehdr for x86 architecture.
Elf64_Ehdr header;
FILE* file = fopen(elfFile, "rb");
if(file) {
fread(&header, 1, sizeof(header), file);
if (memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
printf("Entry point from file: %p\n", (void *) header.e_entry);
entry_point = (void*)header.e_entry;
base_addr = (void*) ((long)&_start - (long)entry_point);
}
fclose(file);
}
}
/// print stacktrace
void bt() {
    static const int MAX_STACK = 30;
    void *array[MAX_STACK];
    int size = backtrace(array, MAX_STACK);
    for (int i = 0; i < size; ++i) {
        // %p expects a pointer, so convert the relative offset back to void *
        printf("%p ", (void *)((long)array[i] - (long)base_addr));
    }
    printf("\n");
}
int main(int argc, char* argv[])
{
read_elf_header(argv[0]);
printf("&_start = %p\n",&_start);
printf("base address is: %p\n", base_addr);
bt();
// elf header is also in memory, but to find it I have to already have base address
Elf64_Ehdr * ehdr_addr = (Elf64_Ehdr *) base_addr;
printf("Entry from memory: %p\n", (void *) ehdr_addr->e_entry);
return 0;
}
Sample output:
Entry point from file: 0x10c0
&_start = 0x5648eeb150c0
base address is: 0x5648eeb14000
0x1321 0x13ee 0x29540f8ed09b 0x10ea
Entry from memory: 0x10c0
And then I can
$ addr2line -e a.out 0x1321 0x13ee 0x29540f8ed09b 0x10ea
/tmp/elf2.c:30
/tmp/elf2.c:45
??:0
??:?
How can I get the base address without access to argv? I may need to print traces before main() (initialization of globals). Turning off ASLR or PIE is not an option.
How can I get base address without access to argv? I may need to print traces before main()
There are a few ways:
If /proc is mounted (which it almost always is), you could read the ELF header from /proc/self/exe.
You could use dladdr1(), as Antti Haapala's answer shows.
You could use _r_debug.r_map, which points to the linked list of loaded ELF images. The first entry in that list corresponds to a.out, and its l_addr contains the relocation you are looking for. This solution is equivalent to dladdr1, but doesn't require linking against libdl.
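For the first option, a minimal sketch could look like the following. It assumes Linux with glibc (for ElfW() from link.h) and a mounted /proc; base_from_proc_self_exe is a hypothetical helper name. It uses open and read rather than stdio so that it does little work if called before main().

```c
#include <elf.h>
#include <fcntl.h>
#include <link.h>     /* ElfW() selects Elf32_* or Elf64_* for this build */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

extern char _start;   /* provided by the C runtime startup code */

/* Hypothetical helper: returns the load base of the main executable,
 * or (uintptr_t)-1 if the ELF header cannot be read. */
static uintptr_t base_from_proc_self_exe(void)
{
    ElfW(Ehdr) header;
    uintptr_t base = (uintptr_t)-1;
    int fd = open("/proc/self/exe", O_RDONLY);

    if (fd < 0)
        return base;
    if (read(fd, &header, sizeof header) == (ssize_t)sizeof header &&
        memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
        /* Same arithmetic as in the question: run-time address of _start
         * minus the entry point recorded in the file. */
        base = (uintptr_t)&_start - (uintptr_t)header.e_entry;
    }
    close(fd);
    return base;
}
```

For a non-PIE executable the result is 0, because the entry point in the file already equals the run-time address of _start.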
Could you provide sample code for 3?
Sure:
#include <link.h>
#include <stdio.h>
extern char _start;
int main()
{
uintptr_t relocation = _r_debug.r_map->l_addr;
printf("relocation: %p, &_start: %p, &_start - relocation: %p\n",
(void*)relocation, &_start, &_start - relocation);
return 0;
}
gcc -Wall -fPIE -pie t.c && ./a.out
relocation: 0x555d4995e000, &_start: 0x555d4995e5b0, &_start - relocation: 0x5b0
Are both 2 and 3 equally portable?
I think they are about equally portable: dladdr1 is a GLIBC extension that is also present on Solaris. _r_debug predates Linux and would also work on Solaris (I haven't actually checked, but I believe it will). It may work on other ELF platforms as well.
This piece of code produces the same value as your base_addr on Linux:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
Dl_info info;
void *extra = NULL;
dladdr1(&_start, &info, &extra, RTLD_DL_LINKMAP);
struct link_map *map = extra;
printf("%#llx", (unsigned long long)map->l_addr);
The dladdr1 manual page says the following of RTLD_DL_LINKMAP:
RTLD_DL_LINKMAP
Obtain a pointer to the link map for the matched file. The
extra_info argument points to a pointer to a link_map structure (i.e., struct link_map **), defined in <link.h> as:
struct link_map {
ElfW(Addr) l_addr; /* Difference between the
address in the ELF file and
the address in memory */
char *l_name; /* Absolute pathname where
object was found */
ElfW(Dyn) *l_ld; /* Dynamic section of the
shared object */
struct link_map *l_next, *l_prev;
/* Chain of loaded objects */
/* Plus additional fields private to the
implementation */
};
Notice that -ldl is required to link against the dynamic loading routines.
I have a funny problem that I thought I'd share with you.
I narrowed it down to the smallest program I could:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int cmd_left(char *name)
{
pid_t pid;
int f_d;
if ((pid = fork()) == -1)
{
perror("");
exit(1);
}
f_d = open(name);
printf("%d\n", f_d);
close(f_d);
}
int main(int ac, char **av, char **env)
{
char **dummy_env;
if (ac < 2)
return (0);
dummy_env = malloc(10);
cmd_left(av[1]);
}
Basically, if I remove the malloc, opening works just fine.
You just have to compile and give the program a (valid) file to see the magic.
open(2) takes at least two parameters. Since you are passing it only one argument, you are invoking Undefined Behavior. In this case, open() is just using some garbage as second argument.
You need #include <fcntl.h> to get a declaration for open() in scope, which would then tell you that you are not calling it with enough arguments:
int open(const char *filename, int flags, ...);
(The optional argument - singular - is the permissions for the file (mode_t perms) if you have O_CREAT amongst the options in the flags argument.)
The call to malloc() scribbles over enough stack to remove the zeroes on it initially, which leaves the 'extra arguments' to open() in a state where they are not zero and you run into problems.
Undefined behaviour - which you're invoking - can lead to any weird result.
Make sure you compile with at least 'gcc -Wall' and I recommend 'gcc -Wmissing-prototypes -Wstrict-prototypes -Wall -Wextra'.
The header file for open is missing and open expects at least a second parameter.
If you fix that it should be OK.
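A sketch of the corrected call, assuming the file only needs to be read (hence O_RDONLY); open_and_report is a hypothetical name standing in for the open/print/close part of cmd_left:

```c
#include <fcntl.h>    /* declares open() and the O_* flags */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens name read-only, prints the descriptor and
 * closes it again. Returns the descriptor number, or -1 on failure. */
static int open_and_report(const char *name)
{
    int f_d = open(name, O_RDONLY);   /* the flags argument is mandatory */

    if (f_d == -1) {
        perror("open");
        return -1;
    }
    printf("%d\n", f_d);
    close(f_d);
    return f_d;
}
```

With fcntl.h included, calling open(name) with a single argument is rejected at compile time instead of silently reading garbage as the flags.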
When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to have the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on windows is similar) that then jumps to the function.
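To make the points above concrete, here is a small sketch. The body of myfunction and the helper show_and_call are made up for illustration. Storing the address in a function pointer type avoids the code/data pointer warning; the cast to void * is only needed for printing with %p.

```c
#include <stdio.h>

/* Stand-in for the function from the question; the body is made up. */
static int myfunction(void)
{
    return 42;
}

/* Hypothetical helper: prints where myfunction lives, then calls it
 * through the pointer to show the value works as a branch target. */
static int show_and_call(void)
{
    int (*fp)(void) = myfunction;   /* function pointer type: no warning */

    /* Converting to void * for %p is not guaranteed by ISO C, but it
     * works on POSIX systems, where dlsym() relies on the same idea. */
    printf("myfunction is at %p\n", (void *)fp);
    return fp();
}
```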
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is given at run-time, I once wrote a function to find out the symbol address dynamically using bfd library. Here is the x86_64 code, you can get the address via find_symbol("a.out", "myfunction") in the example.
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <type.h>
#include <string.h>
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long value = -1;   /* returned on any error or if the symbol is missing */
    symbol_info syminfo;
    char **matching;

    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    nsize = bfd_get_symtab_upper_bound (ibfd);
    symtab = nsize > 0 ? malloc(nsize) : NULL;
    if (symtab == NULL) {
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            value = (long) syminfo.value;
            break;
        }
    }
    if (i >= nsyms)
        printf("cannot find symbol\n");
    free(symtab);
    bfd_close(ibfd);
    return value;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
compiled
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
About a comment in an answer (getting the address of an instruction), you can use this very ugly trick
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!
When a shared library is loaded via dlopen(), is there a way for it to call functions in the main program?
Code of dlo.c (the lib):
#include <stdio.h>
// function is defined in main program
void callb(void);
void test(void) {
printf("here, in lib\n");
callb();
}
Compile with
gcc -shared -olibdlo.so dlo.c
Here the code of the main program (copied from dlopen manpage, and adjusted):
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void callb(void) {
printf("here, i'm back\n");
}
int
main(int argc, char **argv)
{
void *handle;
void (*test)(void);
char *error;
handle = dlopen("libdlo.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
*(void **) (&test) = dlsym(handle, "test");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
(*test)();
dlclose(handle);
exit(EXIT_SUCCESS);
}
Build with
gcc -rdynamic main.c -ldl
Output:
[js@HOST2 dlopen]$ LD_LIBRARY_PATH=. ./a.out
here, in lib
here, i'm back
[js@HOST2 dlopen]$
The -rdynamic option puts all symbols in the dynamic symbol table (which is mapped into memory), not only the names of the used symbols. Read further about it here. Of course, you can also provide function pointers (or a struct of function pointers) that define the interface between the library and your main program. That is actually the method I would probably choose. I have heard from other people that -rdynamic is not so easy to do on Windows, and function pointers also make for cleaner communication between library and main program (you have precise control over what can be called and what cannot), but they require more housekeeping.
Yes. If you provide your library with a pointer to that function, the library will be able to call the function in the main program.
Here is an example; I haven't compiled it, so beware ;)
/* in main app */
/* define your function */
int do_it( char arg1, char arg2);
int do_it( char arg1, char arg2){
/* do it! */
return 1;
}
/* some where else in main app (init maybe?) provide the pointer */
LIB_set_do_it(&do_it);
/** END MAIN CODE ***/
/* in LIBRARY */
int (*LIB_do_it_ptr)(char, char) = NULL;
void LIB_set_do_it( int (*do_it_ptr)(char, char) ){
LIB_do_it_ptr = do_it_ptr;
}
int LIB_do_it(){
    char arg1 = 0, arg2 = 0;
    /* do something to the args
    ...
    ... */
    if (LIB_do_it_ptr == NULL)
        return -1; /* no callback registered yet */
    return LIB_do_it_ptr( arg1, arg2);
}
The dlopen() function, as discussed by @litb, is primarily provided on systems using ELF format object files. It is rather powerful and will let you control whether symbols referenced by the loaded library can be satisfied from the main program, and generally does let them be satisfied. Not all shared library loading systems are as flexible; be aware of this if it comes to porting your code.
The callback mechanism outlined by @hhafez works now that the kinks in that code are straightened out.
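The "struct of function pointers" variant mentioned earlier can be sketched in the same spirit. Everything here is hypothetical naming (struct host_api, LIB_init, LIB_run, host_callbacks), and both sides are shown in one file for brevity; in a real setup the library half would live in the .so and the main program would look up LIB_init with dlsym() after dlopen().

```c
#include <stddef.h>

/* Hypothetical interface shared by the main program and the library. */
struct host_api {
    void (*callb)(void);
    int (*do_it)(char arg1, char arg2);
};

/* ---- library side ---- */
static const struct host_api *LIB_host = NULL;

/* Stores the table of callbacks; rejects incomplete tables. */
int LIB_init(const struct host_api *api)
{
    if (api == NULL || api->callb == NULL || api->do_it == NULL)
        return -1;
    LIB_host = api;
    return 0;
}

/* Calls back into the main program; fails if LIB_init was never called. */
int LIB_run(void)
{
    if (LIB_host == NULL)
        return -1;
    LIB_host->callb();
    return LIB_host->do_it('a', 'b');
}

/* ---- main program side ---- */
static int callb_count = 0;

static void callb(void)
{
    callb_count++;
}

static int do_it(char arg1, char arg2)
{
    return arg1 < arg2;
}

static const struct host_api host_callbacks = { callb, do_it };
```

The main program would call LIB_init(&host_callbacks) once after loading the library. Compared with -rdynamic, the library can only reach what the table exposes.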