I'm writing a little program which trace all the syscall and calls of a binary file (elf) using ptrace (singlestep, getregs, pick_text, opcodes comparison, etc).
So far I've succeed to trace syscalls and simple calls like user defined functions.
But I failed to get the name of the printf symbol from the address I pick thanks to ptrace.
My question is: For dynamic linked function as printf, strlen, etc, how can I retrieve in the elf file the name of the symbol from the address ?
With simple calls it's kind of easy, I run through the .strtab section and when an address match I return the corresponding str.
But for printf, the symbol is known in the .strtab but has the address "0".
objdump -d somehow succeed to link a call to printf with its address.
Do you have any idea ?
I think you may need to read up a little more about dynamic linking. Let's take strlen as an example symbol as printf is a bit special (fortification stuff).
Your problem is (I think) that you want to take the address of a symbol and translate that back into an address. You're trying to do this by parsing the ELF file of the program you are debugging. This works with symbols that are in your program, but not with dynamically linked symbols such as strlen. And you want to know how to resolve that.
The reason for that is that the address of symbols such as strlen are not held within your ELF program. They are instead unresolved references that are resolved dynamically when the program loads. Indeed modern Linux will (I believe) load dynamic libraries (which contain relocatable aka position independent code) in a randomised order and at randomised addresses, so the location of those symbols won't be known until the program loads.
For libraries that you have opened with dlopen() (i.e. where you are doing the loading yourself in the program), you can retrieve the address of such symbols using dlsym(); that's not much good if they are linked into the program at compile/link time.
On gcc, to resolve the position of symbols in general, use the gcc extension dladdr(). From the man page:
The function dladdr() takes a function pointer and tries to
resolve name and file where it is located. Information is
stored in the Dl_info structure:
typedef struct {
const char *dli_fname; /* Pathname of shared object that
contains address */
void *dli_fbase; /* Address at which shared object
is loaded */
const char *dli_sname; /* Name of nearest symbol with address
lower than addr */
void *dli_saddr; /* Exact address of symbol named
in dli_sname */
} Dl_info;
If no symbol matching addr could be found, then dli_sname and
dli_saddr are set to NULL.
dladdr() returns 0 on error, and nonzero on success.
I believe that will work for you.
For further information, I suggest you look at the source to ltrace which traces library calls, and how backtrace_symbols (and here) works; note that particularly for non-global symbols this is going to be unreliable, and note the comment re adding -r dynamic to the link line.
You might also want to look at addr2line and its source.
Related
I'm writing some C code to hook some function of .so ELF (shared-library) loaded into memory.
My C code should be able to re-direct an export function of another .so library that was loaded into the app/program's memory.
Here's a bit of elaboration:
Android app will have multiple .so files loaded. My C code has to look through export function that belongs to another shared .so library (called target.so in this case)
This is not a regular dlsym approach because I don't just want address of a function but I want to replace it with my own fuction; in that: when another library makes the call to its own function then instead my hook_func gets called, and then from my hook_func I should call the original_func.
For import functions this can work. But for export functions I'm not sure how to do it.
Import functions have the entries in the symbol table that have corresponding entry in relocation table that eventually gives the address of entry in global offset table (GOT).
But for the export functions, the symbol's st_value element itself has address of the procedure and not GOT address (correct me if I'm wrong).
How do I perform the hooking for the export function?
Theoretically speaking, I should get the memory location of the st_value element of dynamic symbol table entry ( Elf32_Sym ) of export function. If I get that location then I should be able to replace the value in that location with my hook_func's address. However, I'm not able to write into this location so far. I have to assume the dynamic symbol table's memory is read-only. If that is true then what is the workaround in that case?
Thanks a lot for reading and helping me out.
Update: LD_PRELOAD can only replace the original functions with my own, but then I'm not sure if there any way to call the originals.
In my case for example:
App initializes the audio engine by calling Audio_System_Create and passes a reference of AUDIO_SYSTEM object to Audio_System_Create(AUDIO_SYSTEM **);
AUDIO API allocates this struct/object and function returns.
Now if only I could access that AUDIO_SYSTEM object, I would easily attach a callback to this object and start receiving audio data.
Hence, my ultimate goal is to get the reference to AUIOD_SYSTEM object; and in my understanding, I can only get that if I intercept the call where that object is first getting allocated through Audio_System_Create(AUIOD_SYSTEM **).
Currently there is no straight way to grab the output audio at android. (all examples talk about recording audio that comes from microphone only)
Update2:
As advised by Basile in his answer, I made use of dladdr() but strangely enough it gives me the same address as I pass to it.
void *pFunc=procedure_addr; //procedure address calculated from the st_value of symbol from symbol table in ELF file (not from loaded file)
int nRet;
// Lookup the name of the function given the function pointer
if ((nRet = dladdr(pFunc, &DlInfo)) != 0)
{
LOGE("Symbol Name is: %s", DlInfo.dli_sname);
if(DlInfo.dli_saddr==NULL)
LOGE("Symbol Address is: NULL");
else
LOGE("Symbol Address is: 0x%x", DlInfo.dli_saddr);
}
else
LOGE("dladdr failed");
Here's the result I get:
entry_addr =0x75a28cfc
entry_addr_through_dlysm =0x75a28cfc
Symbol Name is: AUDIO_System_Create
Symbol Address is: 0x75a28cfc
Here address obtained through dlysm or calculated through ELF file is the address of procedure; while I need the location where this address itself is; so that I can replace this address with my hook_func address. dladdr() didn't do what I thought it will do.
You should read in details Drepper's paper: how to write shared libraries - notably to understand why using LD_PRELOADis not enough. You may want to study the source code of the dynamic linker (ld-linux.so) inside your libc. You might try to change with mprotect(2) and/or mmap(2) and/or mremap(2) the relevant pages. You can query the memory mapping thru proc(5) using /proc/self/maps & /proc/self/smaps. Then you could, in an architecture-specific way, replace the starting bytes (perhaps using asmjit or GNU lightning) of the code of original_func by a jump to your hook_func function (which you might need to change its epilogue, to put the overwritten instructions -originally at original_func- there...)
Things might be slightly easier if original_func is well known and always the same. You could then study its source and assembler code, and write the patching function and your hook_func only for it.
Perhaps using dladdr(3) might be helpful too (but probably not).
Alternatively, hack your dynamic linker to change it for your needs. You might study the source code of musl-libc
Notice that you probably need to overwrite the machine code at the address of original_func (as given by dlsym on "original_func"). Alternatively, you'll need to relocate every occurrence of calls to that function in all the already loaded shared objects (I believe it is harder; if you insist see dl_iterate_phdr(3)).
If you want a generic solution (for an arbitrary original_func) you'll need to implement some binary code analyzer (or disassembler) to patch that function. If you just want to hack a particular original_func you should disassemble it, and patch its machine code, and have your hook_func do the part of original_func that you have overwritten.
Such horrible and time consuming hacks (you'll need weeks to make it work) make me prefer using free software (since then, it is much simpler to patch the source of the shared library and recompile it).
Of course, all this isn't easy. You need to understand in details what ELF shared objects are, see also elf(5) and read Levine's book: Linkers and Loaders
NB: Beware, if you are hacking against a proprietary library (e.g. unity3d), what you are trying to achieve might be illegal. Ask a lawyer. Technically, you are violating most abstractions provided by shared libraries. If possible, ask the author of the shared library to give help and perhaps implement some plugin machinery in it.
I modified the io-packet of qnx and calculating a timestamp in the recieve.c file at ip layer.
CODE:
uint64_t ipStart_time, IPLatency;
EXPORT_SYMBOL(IPLatency); //I am using this in Linux
void rtl_receive ()
{
ipStart_time = clock_cycles();
IPLatency = ipStart_time;
}
I want to read that timestamp in my user program:
So I did :
code:
extern uint64_t IPLatency;
But it is showing error: undefined reference to IPLatency
The extern keyword informs the compiler that it should expect a function or variable (symbol) to be defined in another of the linking objects. For a function this means that the compiler will not give an error if a function has not been implemented.
The error you get, undefined reference, indicates that the linker cannot find a exported symbol with that signature in any of the object files included in the linking.
I'm guessing a little bit here because there isn't a lot of information about where these respective files are. With that as a disclaimer here goes ....
You have edited a file that belongs to the Operating System Kernel and exported the symbol. That just makes the symbol visible to kernel modules that wish to link with the kernel and not to a userspace program. Your userspace program is not linked to the kernel and thus when you attempt to link the linker cannot find a reference to IPLatency and it exits.
The extern keyword tells the compiler that this symbol is external to this file, and so just assume it exists and let the linker worry about it. The purpose of the extern definition is so the compiler knows what the type of this variable is, and not to reserve memory for it. The linker needs to find all the symbols in order to turn a symbol in to an actual reference to the correct memory location. So an extern variable needs to be declared somewhere in the program that you are trying to link.
What you are attempting to do is transfer information from kernel space to userspace. This is going to be much more difficult, involving either a new system call or by making the information available via some other mechanism (e.g. sysfs). You will probably need to research some more and then ask another question about that (and the answer to that question will probably be beyond me).
The answer might be short: EXPORT_SYMBOL is used for exporting symbols inside the kernel, e.g. to other kernel modules. Your user space program would not have access to it, thus your linker would report a undefined reference error.
Suggestion: send the data via things like copy_to_user or write the information in /proc and let the userspace program read from it.
to get opcodes author here does following:
[bodo#bakawali testbed8]$ as testshell2.s -o testshell2.o
[bodo#bakawali testbed8]$ ld testshell2.o -o testshell2
[bodo#bakawali testbed8]$ objdump -d testshell2
and then he gets three sections (or mentions only these 3):
<_start>
< starter>
< ender>
I have tried to get hex opcodes the same way but cannot ld correctly. Of course I can produce .o and prog file for example with:
gcc main.o -o prog -g
however when
objdump --prefix-addresses --show-raw-insn -Srl prog
to see complete code with annotations and symbols, I have many additional sections there, for example:
.init
.plt
.text (yes, I know, main is here) [many parts here: _start(), call_gmon_start(), __do_global_dtors_aux(), frame_dummy(), main(), __libc_csu_init(), __libc_csu_fini(), __do_global_ctors_aux()]
.fini
I assume these are additions introduced by gcc linking to runtime libraries. I think i don't need these all sections to call opcode from c code (author uses only those 3 sections) however my problem is I don't know which exactly I might discard and which are necessary. I want to use it like this:
#include <unistd.h>
char code[] = "\x31\xed\x49\x89\x...x00\x00";
int main(int argc, char **argv)
{
/*creating a function pointer*/
int (*func)();
func = (int (*)()) code;
(int)(*func)();
return 0;
}
so I have created this :
#include <unistd.h>
/*
*
*/
int main() {
char *shell[2];
shell[0] = "/bin/sh";
shell[1] = NULL;
execve(shell[0], shell, NULL);
return 0;
}
and I did disassembly as I described. I tried to use opcode from .text main(), this gave me segmentation fault, then .text main() + additionally .text _start(), with same result.
So, what to choose from above sections, or how to generate only as minimized "prog" as with three sections?
char code[] = "\x31\xed\x49\x89\x...x00\x00";
This will not work.
Reason: The code definitely contains adresses. Mainly the address of the function execve() and the address of the string constant "/bin/sh".
The executable using the "code[]" approach will not contain a string constant "/bin/sh" at all and the address of the function execve() will be different (if the function will be linked into the executable at all).
Therefore the "call" instruction to the "execve()" function will jump to anywhere in the executable using the "code[]" approach.
Some theory about executables - just for your information:
There are two possibilities for executables:
Statically linked: These executables contain all necessary code. Therefore they do not access dynamic libraries like "libc.so"
Dynamically linked: These executables do not contain code that is frequently used. Such code is stored in files common to all executables: The dynamic libraries (e.g. "libc.so")
When the same C code is used then statically linked executables are much bigger than dynamically linked executables because all C functions (e.g. "printf", "execve", ...) must be bundled into the executable.
When not using any of these library functions the statically linked executables are simpler and therefore easier to understand.
Statically linked executable behaviour
A statically linked executable is loaded into the memory by the operating system (when it is started using execve()). The executable contains an entry point address. This address is stored in the file header of the executable. You can see it using "objdump -h ...".
The operating system performs a jump to that address so the program execution starts at this address. The address is typically the function "_start" however this can be changed using command line options when linking using "ld".
The code at "_start" will prepare the executable (e.g. initialize variables, calculate the values for "argc" and "argv", ...) and call the "main()" function. When "main()" returns the "_start" function will pass the value returned by "main()" to the "_exit()" function.
Dynamically linked executable behaviour
Such executables contain two additional sections. The first section contains the file name of the dynamic linker (maybe. "/lib/ld-linux.so.1"). The operating system will then load the executable and the dynamic linker and jump to the entry point of the dynamic linker (and not to that of the executable).
The dynamic linker will read the second additional section: It contains information about dynamic libraries (e.g. "libc.so") required by the executable. It will load all these libraries and initialize a lot of variables. Then it calls the initialization function ("_init()") of all libraries and of the executable.
Note that both the operating system and the dynamic linker ignore the function and section names! The address of the entry point is taken from the file header and the addresses of the "_init()" functions is taken from the additional section - the functions may be named differently!
When all this is done the dynamic linker will jump to the entry point ("_start") of the executable.
About the "GOT", "PLT", ... sections:
These sections contain information about the addresses where the dynamic libraries have been loaded by the linker. The "PLT" section contains wrapper code that will contain jumps to the dynamic libraries. This means: The section "PLT" will contain a function "printf()" that will actually do nothing but jump to the "printf()" function in "libc.so". This is done because directly calling a function in a dynamic library from C code would make linking much more difficult so C code will not call functions in a dynamic library directly. Another advantage of this implementation is that "lazy linking" is possible.
Some words about Windows
Windows only knows dynamically linked executables. Windows XP even refused to load an executable not requiring DLLs. The "dynamic linker" is integrated into the operating system and not a separate file. There is also an equivalent of the "PLT" section. However many compilers support "directly" calling DLL code from C code without calling the code in the PLT section first (theoretically this would also be possible under Linux). Lazy linking is not supported.
You should read this article: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.
It explains all you need to create really tiny program in great detail.
Is there good documentation of what happen when I run some executable in Linux. For example: I start ./a.out, so probably some bootloader assembly is run (come with c runtime?), and it finds start symbol in program, doing dynamic relocation, finally call main.
I know the above is not correct, but looking for detailed documentation of how this process happen. Can you please explain, or point to links or books that do?
For dynamic linked programs, the kernel detects the PT_INTERP header in the ELF file and first mmaps the dynamic linker (/lib/ld-linux.so.2 or similar), and starts execution at the e_entry address from the main ELF header of the dynamic linker. The initial state of the stack contains the information the dynamic linker needs to find the main program binary (already in memory). It's responsible for reading this and finding all the additional libraries that must be loaded, loading them, performing relocations, and jumping to the e_entry address of the main program.
For static linked programs, the kernel uses the e_entry address from the main program's ELF header directly.
In either case, the main program begins with a routine written in assembly traditionally called _start (but the name is not important as long as its address is in the e_entry field of the ELF header). It uses the initial stack contents to determine argc, argv, environ, etc. and calls the right implementation-internal functions (usually written in C) to run global constructors (if any) and perform any libc initialization needed prior to the entry to main. This usually ends with a call to exit(main(argc, argv)); or equivalent.
A book "Linker and Loader" gives a detail description about the loading process. Maybe it can give you some help on the problem.
I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.
Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
Is there any way to get a list of locations for all the library/system functions?
The headers are a good place to look.
Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers like gcc however, does allow you whether to create a non-relocatable binary or not. (For gcc you may check out exp files, dlltool etc.) Check with the appropriate documentation.
With dynamic linking,
1. your executable has a special place for all external calls (PLT table).
2. your executable has a list of libraries it depends on
These two things are independent. It is impossible to say which external function lives in which library.
When a program does an external function call, what actually happens it calls an entry in the PLT table, which does a jump into the dynamic loader. The dynamic loader looks which function was called (via PLT), looks its name (via symbol table in the executable) and looks up that name in ALL libraries that are mapped (all that given executable is dependant on). Once the name is found, the address of the corresponding function is written back to the PLT, so next time the call is made directly bypassing the dynamic linker.
To answer your question, you should do the same job as dynamic linker does: get a list of dependent libs, and lookup all names in them. This could be done using 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in given object file within libXXX.a get linked in. For example, static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked to your app - together with function foo() and all other data defined in it. This is the reason why for example C library functions are split per file.
If you want to dynamically link you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in. It is fairly simple.
void *handle;
int *iptr, (*fptr)(int);
/* open the needed object */
handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
/* find the address of function and data objects */
*(void **)(&fptr) = dlsym(handle, "my_function");
iptr = (int *)dlsym(handle, "my_object");
/* invoke function, passing value of integer as a parameter */
(*fptr)(*iptr);
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet. (objdump -t file.a if your looking for statically bound functions). Objdump is cross platform, part of binutils, so in a pinch, you can copy your binary files to another system and interrorgate them with objdump on a different platform.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifie's the search order for the ld.so.cache (so.cache right ;).