In Linux, is there a way to get the name of a shared library from within one of its functions (or from a function of a static library linked into it)?
Basically, I want to check whether there is an API or variable for shared libraries similar to program_invocation_short_name/program_invocation_name, which are currently available for processes.
If you want to know whether there is a dynamic symbol named "foo", use dlsym(RTLD_DEFAULT, "foo"): it returns the address of that symbol, or NULL if there is no such dynamic symbol.
I don't know why you'd care about the shared library name, though.
When you have the address of a symbol, you can always read the /proc/self/maps pseudofile to find out which binary the symbol originates from. (If the symbol is in an r-- mapping, it is an immutable constant, like a string literal for example. If it is in an r-x mapping, it is in code, probably a function. If it is in an rw- mapping, it is a global variable.) Do note that because it is a pseudofile, it is part of the kernel binary interface, and never localized.
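To make the /proc/self/maps approach concrete, here is a minimal sketch in C. The helper name find_mapping and its signature are made up for illustration; it assumes the usual "start-end perms offset dev inode pathname" line layout and simply reports the permissions and backing path of the mapping that contains a given address.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the /proc/self/maps entry that covers addr.
 * On success returns 0, stores the permission string (e.g. "r-xp") in perms
 * and the backing path (empty for anonymous mappings) in path.
 * Returns -1 if the file cannot be read or no mapping contains addr. */
int find_mapping(const void *addr, char perms[5], char *path, size_t path_len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[4096];
    int result = -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5];
        int name_off = 0;

        /* Assumed layout: start-end perms offset dev inode   pathname */
        if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
                   &start, &end, p, &name_off) < 3)
            continue;
        if (target < start || target >= end)
            continue;

        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
        memcpy(perms, p, sizeof p);
        snprintf(path, path_len, "%s", name_off > 0 ? line + name_off : "");
        result = 0;
        break;
    }

    fclose(maps);
    return result;
}
```

Passing the address of a function defined in the main program would typically report an r-x mapping backed by the executable's own path; passing an address returned by dlsym() would typically report the path of the shared library that provides the symbol.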
I understand that shared libraries are loaded into memory and used by various programs.
How can a program know where in memory the library is?
When a shared library is used, there are two parts to the linkage process. At link time, the linker program, ld in Linux, links against the shared library in order to learn which symbols are defined by it. However, none of the code or data initializers from the shared library are actually included in the final a.out file. Instead, ld just records which dynamic libraries were linked against, and this information is placed into an auxiliary section of the a.out file.
The second phase takes place at execution time, before main gets invoked. The kernel loads a small helper program, ld.so, into the address space and this gets executed. Therefore, the start address of the program is not main or even _start (if you have heard of it). Rather, it is actually the start address of the dynamic library loader.
In Linux, the kernel maps the ld.so loader code into a convenient place in the process address space and sets up the stack so that the list of required shared libraries (and other necessary info) is present. The dynamic loader finds each of the required libraries by looking at a sequence of directories, which are often listed in the LD_LIBRARY_PATH environment variable. There is also a pre-defined list which is hard-coded into ld.so (and additional search places can be hard-coded into the a.out during link time). For each of the libraries, the dynamic loader reads its header and then uses mmap to create memory regions for the library.
Now for the fun part.
Since the actual libraries used at run-time to satisfy the requirements are not known at link-time, we need to figure out a way to access functions defined in the shared library and global variables that are exported by the shared library (this practice is deprecated since exporting global variables is not thread-safe, but it is still something we try to handle).
Global variables are assigned a static address at link time and are then accessed by absolute memory address.
For functions exported by the library, the user of the library is going to emit a series of call assembly instructions, which reference an absolute memory address. But, the exact absolute memory address of the referenced function is not known at link time. How do we deal with this?
Well, the linker creates what is known as a Procedure Linkage Table, which is a series of jmp (assembly jump) instructions. The target of the jump is filled in at run time.
Now, when dealing with the dynamic portions of the code (i.e. the .o files that have been compiled with -fpic), there are no absolute memory references whatsoever. In order to access global variables which are also visible to the static portion of the code, another table called the Global Offset Table is used. This table is an array of pointers. At link time, since the absolute memory addresses of the global variables are known, the linker populates this table. Then, at run time, dynamic code is able to access the global variables by first finding the Global Offset Table, then loading the address of the correct variable from the appropriate slot in the table, and finally dereferencing the pointer.
I am learning about working with shared libraries in C/C++ on Linux. I encountered a little problem that I don't know how to solve.
Let's say I have a shared library and an executable. However I don't know the library's name or file location (so I can't dlopen it). I can only find the address range where the library is mapped into my executable's memory.
Is there a way to programmatically get either the handle of the library (something like handle = dlopen(library_address)) or offset of a symbol within the library (something like address = dlsym(library_address, symbol_name))?
If you knew the library's name, you could just call dlopen again.
From the man page:
If the same shared object is loaded again with dlopen(), the same object handle is returned.
To discover the loaded modules, you can use dl_iterate_phdr().
You can also use dladdr() to inquire about a specific address.
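As a rough illustration of the dl_iterate_phdr() route, the sketch below walks the loaded objects and reports the one whose PT_LOAD segments contain a given address. The struct module_query and the find_module helper are hypothetical names invented for this example; only dl_iterate_phdr() and the fields of struct dl_phdr_info come from glibc.

```c
#define _GNU_SOURCE   /* dl_iterate_phdr() is a GNU extension declared in <link.h> */
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query record passed through dl_iterate_phdr(). */
struct module_query {
    uintptr_t addr;     /* in:  address we are looking for */
    uintptr_t base;     /* out: load base of the containing object */
    char name[4096];    /* out: its path (usually "" for the main program) */
    int found;          /* out: non-zero if some object contains addr */
};

/* Callback invoked once per loaded object. */
static int match_module(struct dl_phdr_info *info, size_t size, void *data)
{
    struct module_query *q = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = (uintptr_t)info->dlpi_addr + ph->p_vaddr;
        if (q->addr >= start && q->addr < start + ph->p_memsz) {
            q->base = (uintptr_t)info->dlpi_addr;
            snprintf(q->name, sizeof q->name, "%s",
                     info->dlpi_name ? info->dlpi_name : "");
            q->found = 1;
            return 1;   /* a non-zero return stops the iteration */
        }
    }
    return 0;           /* keep iterating */
}

/* Returns 0 and fills *q if a loaded object contains addr, -1 otherwise. */
int find_module(const void *addr, struct module_query *q)
{
    memset(q, 0, sizeof *q);
    q->addr = (uintptr_t)addr;
    dl_iterate_phdr(match_module, q);
    return q->found ? 0 : -1;
}
```

Once the path in q.name is known, calling dlopen() on it again (possibly with the glibc-specific RTLD_NOLOAD flag) should give back the existing handle, which can then be passed to dlsym().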
I'm writing a little program which traces all the syscalls and calls of a binary file (ELF) using ptrace (singlestep, getregs, pick_text, opcode comparison, etc).
So far I've succeeded in tracing syscalls and simple calls like user-defined functions.
But I failed to get the name of the printf symbol from the address I pick up thanks to ptrace.
My question is: for dynamically linked functions such as printf, strlen, etc., how can I retrieve the name of the symbol from its address in the ELF file?
With simple calls it's kind of easy, I run through the .strtab section and when an address match I return the corresponding str.
But for printf, the symbol is known in the .strtab but has the address "0".
objdump -d somehow succeeds in linking a call to printf with its address.
Do you have any idea?
I think you may need to read up a little more about dynamic linking. Let's take strlen as an example symbol as printf is a bit special (fortification stuff).
Your problem is (I think) that you want to take the address of a symbol and translate that back into a name. You're trying to do this by parsing the ELF file of the program you are debugging. This works with symbols that are in your program, but not with dynamically linked symbols such as strlen. And you want to know how to resolve that.
The reason is that the addresses of symbols such as strlen are not held within your ELF program. They are instead unresolved references that are resolved dynamically when the program loads. Indeed modern Linux will (I believe) load dynamic libraries (which contain relocatable aka position independent code) in a randomised order and at randomised addresses, so the location of those symbols won't be known until the program loads.
For libraries that you have opened with dlopen() (i.e. where you are doing the loading yourself in the program), you can retrieve the address of such symbols using dlsym(); that's not much good if they are linked into the program at compile/link time.
With glibc, to resolve an address back to a symbol in general, use the GNU extension dladdr(). From the man page:
The function dladdr() takes a function pointer and tries to
resolve name and file where it is located. Information is
stored in the Dl_info structure:
typedef struct {
const char *dli_fname; /* Pathname of shared object that
contains address */
void *dli_fbase; /* Address at which shared object
is loaded */
const char *dli_sname; /* Name of nearest symbol with address
lower than addr */
void *dli_saddr; /* Exact address of symbol named
in dli_sname */
} Dl_info;
If no symbol matching addr could be found, then dli_sname and
dli_saddr are set to NULL.
dladdr() returns 0 on error, and nonzero on success.
I believe that will work for you.
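For what it's worth, a minimal sketch of calling dladdr() from within the process that owns the address might look like this. The helper name describe_address and its output format are invented for the example; for a ptrace'd process you would have to do the equivalent lookup against the tracee's own mappings instead.

```c
#define _GNU_SOURCE   /* dladdr() and Dl_info are GNU extensions in <dlfcn.h> */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: write "symbol (shared object path)" for addr into out.
 * Returns 0 on success, -1 if dladdr() cannot match the address to any
 * loaded object. A "?" stands in for a field dladdr() left as NULL. */
int describe_address(const void *addr, char *out, size_t out_len)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0)
        return -1;

    snprintf(out, out_len, "%s (%s)",
             info.dli_sname ? info.dli_sname : "?",
             info.dli_fname ? info.dli_fname : "?");
    return 0;
}
```

For example, passing the result of dlsym(RTLD_DEFAULT, "getpid") should produce a string naming getpid (or an alias of it) together with the path of the C library. On older glibc versions you may need to link with -ldl.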
For further information, I suggest you look at the source of ltrace, which traces library calls, and at how backtrace_symbols works; note that, particularly for non-global symbols, this is going to be unreliable, and note the comment about adding -rdynamic to the link line.
You might also want to look at addr2line and its source.
I'm writing some C code to hook some function of .so ELF (shared-library) loaded into memory.
My C code should be able to re-direct an export function of another .so library that was loaded into the app/program's memory.
Here's a bit of elaboration:
Android app will have multiple .so files loaded. My C code has to look through export function that belongs to another shared .so library (called target.so in this case)
This is not a regular dlsym approach because I don't just want the address of a function; I want to replace it with my own function, so that when another library makes the call to its own function, my hook_func gets called instead, and then from my hook_func I can call the original_func.
For import functions this can work. But for export functions I'm not sure how to do it.
Import functions have the entries in the symbol table that have corresponding entry in relocation table that eventually gives the address of entry in global offset table (GOT).
But for the export functions, the symbol's st_value element itself has address of the procedure and not GOT address (correct me if I'm wrong).
How do I perform the hooking for the export function?
Theoretically speaking, I should get the memory location of the st_value element of dynamic symbol table entry ( Elf32_Sym ) of export function. If I get that location then I should be able to replace the value in that location with my hook_func's address. However, I'm not able to write into this location so far. I have to assume the dynamic symbol table's memory is read-only. If that is true then what is the workaround in that case?
Thanks a lot for reading and helping me out.
Update: LD_PRELOAD can only replace the original functions with my own, but then I'm not sure if there is any way to call the originals.
In my case for example:
App initializes the audio engine by calling Audio_System_Create and passes a reference of AUDIO_SYSTEM object to Audio_System_Create(AUDIO_SYSTEM **);
AUDIO API allocates this struct/object and function returns.
Now if only I could access that AUDIO_SYSTEM object, I would easily attach a callback to this object and start receiving audio data.
Hence, my ultimate goal is to get the reference to the AUDIO_SYSTEM object; and in my understanding, I can only get that if I intercept the call where that object is first allocated, through Audio_System_Create(AUDIO_SYSTEM **).
Currently there is no straight way to grab the output audio at android. (all examples talk about recording audio that comes from microphone only)
Update2:
As advised by Basile in his answer, I made use of dladdr() but strangely enough it gives me the same address as I pass to it.
void *pFunc = procedure_addr; // procedure address calculated from the st_value of the symbol in the ELF file's symbol table (not from the loaded file)
Dl_info DlInfo;
int nRet;
// Look up the name of the function given the function pointer
if ((nRet = dladdr(pFunc, &DlInfo)) != 0)
{
    LOGE("Symbol Name is: %s", DlInfo.dli_sname);
    if (DlInfo.dli_saddr == NULL)
        LOGE("Symbol Address is: NULL");
    else
        LOGE("Symbol Address is: %p", DlInfo.dli_saddr); // %p is the correct format for pointers
}
else
    LOGE("dladdr failed");
Here's the result I get:
entry_addr =0x75a28cfc
entry_addr_through_dlysm =0x75a28cfc
Symbol Name is: AUDIO_System_Create
Symbol Address is: 0x75a28cfc
Here the address obtained through dlsym, or calculated from the ELF file, is the address of the procedure, while I need the location where this address itself is stored, so that I can replace it with my hook_func address. dladdr() didn't do what I thought it would do.
You should read Drepper's paper, How To Write Shared Libraries, in detail, notably to understand why using LD_PRELOAD is not enough. You may want to study the source code of the dynamic linker (ld-linux.so) inside your libc. You might try to change the protection of the relevant pages with mprotect(2) and/or mmap(2) and/or mremap(2). You can query the memory mapping through proc(5) using /proc/self/maps and /proc/self/smaps. Then you could, in an architecture-specific way, replace the starting bytes of the code of original_func (perhaps using asmjit or GNU lightning) with a jump to your hook_func function (whose epilogue you might need to change, so that the overwritten instructions, originally at original_func, are put there).
Things might be slightly easier if original_func is well known and always the same. You could then study its source and assembler code, and write the patching function and your hook_func only for it.
Perhaps using dladdr(3) might be helpful too (but probably not).
Alternatively, hack your dynamic linker to change it for your needs. You might study the source code of musl-libc.
Notice that you probably need to overwrite the machine code at the address of original_func (as given by dlsym on "original_func"). Alternatively, you'll need to relocate every occurrence of calls to that function in all the already loaded shared objects (I believe it is harder; if you insist see dl_iterate_phdr(3)).
If you want a generic solution (for an arbitrary original_func) you'll need to implement some binary code analyzer (or disassembler) to patch that function. If you just want to hack a particular original_func you should disassemble it, and patch its machine code, and have your hook_func do the part of original_func that you have overwritten.
Such horrible and time consuming hacks (you'll need weeks to make it work) make me prefer using free software (since then, it is much simpler to patch the source of the shared library and recompile it).
Of course, all this isn't easy. You need to understand in detail what ELF shared objects are; see also elf(5) and read Levine's book, Linkers and Loaders.
NB: Beware, if you are hacking against a proprietary library (e.g. unity3d), what you are trying to achieve might be illegal. Ask a lawyer. Technically, you are violating most abstractions provided by shared libraries. If possible, ask the author of the shared library to give help and perhaps implement some plugin machinery in it.
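On the side question in the update (how a replacement function can still call the original): a common pattern is to look the original up with dlsym(RTLD_NEXT, ...). The sketch below uses getpid purely as a stand-in for the function being hooked, and the counter getpid_hook_calls is made up for the example. The same definition could be compiled into a library loaded through LD_PRELOAD; here it is shown interposing from the main program.

```c
#define _GNU_SOURCE   /* RTLD_NEXT is a GNU extension in <dlfcn.h> */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

int getpid_hook_calls;   /* hypothetical counter: how many times the hook ran */

/* Replacement for getpid(). Objects searched before libc (the main program
 * or an LD_PRELOAD library) win dynamic symbol lookup, so calls resolved
 * through the dynamic linker land here first. */
pid_t getpid(void)
{
    static pid_t (*real_getpid)(void);

    /* RTLD_NEXT asks for the next definition of the symbol in search order
     * after the object containing this code, i.e. the original function. */
    if (real_getpid == NULL)
        real_getpid = (pid_t (*)(void))dlsym(RTLD_NEXT, "getpid");

    getpid_hook_calls++;
    return real_getpid != NULL ? real_getpid() : (pid_t)-1;
}
```

As noted above, this only intercepts calls that go through dynamic symbol lookup. Calls that a library binds to its own function internally are not redirected this way, which is why preloading alone may not be enough here.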
I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.
Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
Is there any way to get a list of locations for all the library/system functions?
The headers are a good place to look.
Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers, like gcc, do however let you choose whether or not to create a non-relocatable binary. (For gcc you may check out exp files, dlltool etc.) Check the appropriate documentation.
With dynamic linking,
1. your executable has a special place for all external calls (PLT table).
2. your executable has a list of libraries it depends on
These two things are independent. It is impossible to say which external function lives in which library.
When a program makes an external function call, what actually happens is that it calls an entry in the PLT, which jumps into the dynamic loader. The dynamic loader determines which function was called (via the PLT), looks up its name (via the symbol table in the executable), and searches for that name in ALL libraries that are mapped (all those that the given executable depends on). Once the name is found, the address of the corresponding function is written back to the slot used by that PLT entry, so the next time the call is made directly, bypassing the dynamic linker.
To answer your question, you should do the same job as dynamic linker does: get a list of dependent libs, and lookup all names in them. This could be done using 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in given object file within libXXX.a get linked in. For example, static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked to your app - together with function foo() and all other data defined in it. This is the reason why for example C library functions are split per file.
If you want to dynamically link you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in, it is fairly simple.
void *handle;
int *iptr, (*fptr)(int);
/* open the needed object */
handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
/* find the address of function and data objects */
*(void **)(&fptr) = dlsym(handle, "my_function");
iptr = (int *)dlsym(handle, "my_object");
/* invoke function, passing value of integer as a parameter */
(*fptr)(*iptr);
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet. (Use objdump -t file.a if you're looking for statically bound functions.) Objdump is cross-platform and part of binutils, so in a pinch you can copy your binary files to another system and interrogate them with objdump on a different platform.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifies the search order for the ld.so.cache (so.cache right ;).