How to programmatically list ELF shared library symbols

In my C shared library, I want to dlopen() another shared library and retrieve a list of the exported symbols this library has.
Is there a way I can do that programmatically, without running nm/objdump?
As a secondary question: How can I retrieve the base address where this second library got loaded after dlopen() - without knowing the names of any symbols (so I can't run dlsym!) and without reading /proc/self/maps?
I have tried the following:
struct link_map *imagehandle = (struct link_map*)dlopen(libraryname, RTLD_LOCAL | RTLD_LAZY);
void * fbase = (void*) imagehandle->l_addr;
printf("base addr is %p", fbase);
This prints
"base addr is 0x6862696c"
However, the library is not located there:
[ /proc/pid/maps output: ]
b6d27000-b6d28000 r-xp 00000000 1f:01 1581 mysecondlib.so
b6d28000-b6d29000 r--p 00000000 1f:01 1581 mysecondlib.so
b6d29000-b6d2a000 rw-p 00001000 1f:01 1581 mysecondlib.so
It has been suggested that l_addr is not the actual library base address, but the offset from the executable header - but I am not sure how to find that header address.

Yes, you definitely can get the symbol table programmatically. I would suggest that instead of running dlopen() on the SO, you open it yourself via mmap() or simply read the file into contiguous memory. Once it's in memory, you can iterate through each section of the SO by following the documentation here: http://linux.die.net/man/5/elf. Sections with sh_type equal to SHT_SYMTAB (the full symbol table) or SHT_DYNSYM (the dynamic symbol table, which holds the exported symbols and survives stripping) are what you are looking for.
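As a rough sketch of the mmap approach described above (not the answerer's own code): the function below maps the file read-only, walks the section headers, and prints every defined global or weak symbol it finds in SHT_DYNSYM and SHT_SYMTAB sections. The name list_elf_symbols is made up for illustration. It assumes the file has the same ELF class and byte order as the running process, and it does no bounds checking on offsets, so it should only be pointed at trusted files.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <link.h>      /* ElfW() selects Elf32_* or Elf64_* for this platform */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: prints defined global/weak symbols of the ELF file
 * at `path` to `out`. Returns the number printed, or -1 on error. */
long list_elf_symbols(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < sizeof(ElfW(Ehdr))) {
        close(fd);
        return -1;
    }

    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    long count = -1;
    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    int native_class = (sizeof(void *) == 8) ? ELFCLASS64 : ELFCLASS32;

    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == native_class) {
        const ElfW(Shdr) *sh = (const ElfW(Shdr) *)(base + eh->e_shoff);
        count = 0;
        for (size_t i = 0; i < eh->e_shnum; i++) {
            if (sh[i].sh_type != SHT_DYNSYM && sh[i].sh_type != SHT_SYMTAB)
                continue;
            const ElfW(Sym) *syms = (const ElfW(Sym) *)(base + sh[i].sh_offset);
            size_t nsyms = sh[i].sh_size / sizeof(ElfW(Sym));
            /* sh_link of a symbol table names its string table section */
            const char *strtab = (const char *)(base + sh[sh[i].sh_link].sh_offset);

            for (size_t j = 0; j < nsyms; j++) {
                /* The 32-bit and 64-bit ST_BIND macros are identical */
                unsigned bind = ELF64_ST_BIND(syms[j].st_info);
                if (syms[j].st_shndx == SHN_UNDEF || syms[j].st_name == 0)
                    continue;   /* skip imports and unnamed entries */
                if (bind != STB_GLOBAL && bind != STB_WEAK)
                    continue;   /* skip local symbols */
                fprintf(out, "%s\n", strtab + syms[j].st_name);
                count++;
            }
        }
    }

    munmap(base, (size_t)st.st_size);
    return count;
}
```

Calling list_elf_symbols("mysecondlib.so", stdout) would then print one symbol name per line. A symbol present in both tables is printed twice; deduplicating is left out to keep the sketch short.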
If you still need to find the base address of a loaded SO, I have not found a way to get it from dlopen() in my experience. The best way I have found is to call dladdr() on a known symbol in the SO, which populates a Dl_info structure whose dli_fbase member is the base address of the module the symbol came from. If you do not know any symbol from that SO, you can use dl_iterate_phdr() (http://linux.die.net/man/3/dl_iterate_phdr), which iterates over all the SOs loaded in your process and gives you a dl_phdr_info struct instance for each one. This struct contains the name, the base address (dlpi_addr), and the array of program headers.
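A minimal sketch of the dl_iterate_phdr() route, assuming glibc (or another libc that provides this call) and that a substring of the library's path is enough to identify it. The names module_query, match_module and find_module_base are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup state passed through dl_iterate_phdr() */
struct module_query {
    const char *needle;   /* substring to look for in the object's path */
    uintptr_t   base;     /* dlpi_addr of the first match */
    int         found;
};

/* Callback invoked once per loaded object */
static int match_module(struct dl_phdr_info *info, size_t size, void *data)
{
    struct module_query *q = data;
    (void)size;

    if (info->dlpi_name != NULL && strstr(info->dlpi_name, q->needle) != NULL) {
        q->base = (uintptr_t)info->dlpi_addr;
        q->found = 1;
        return 1;          /* a nonzero return stops the iteration */
    }
    return 0;              /* keep going */
}

/* Returns 1 and stores the base in *base_out if a loaded object whose
 * path contains `needle` exists, otherwise returns 0. */
int find_module_base(const char *needle, uintptr_t *base_out)
{
    assert(needle != NULL);

    struct module_query q = { needle, 0, 0 };
    dl_iterate_phdr(match_module, &q);
    if (q.found && base_out != NULL)
        *base_out = q.base;
    return q.found;
}
```

After the dlopen() call in the question, find_module_base("mysecondlib.so", &base) should report the address where the first mapping in /proc/self/maps starts, provided the library's first PT_LOAD segment has virtual address 0, which is the usual case for shared objects.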

Related

Get the shared library name from a function in the library

In linux, Is there a way to get the shared library name from one of its functions (or from any static library functions linked to it).
Basically, wanted to check if there is an API/variable available similar to program_invocation_short_name/program_invocation_name that is currently available for processes.

If you want to know if there is a dynamic symbol named "foo", use dlsym(RTLD_DEFAULT, "foo") to find out the address of such symbol, or NULL if there is no such dynamic symbol.
I don't know why you'd care about the shared library name, though.
When you have the address of a symbol, you can always read the /proc/self/maps pseudofile to find out which binary the symbol originates from. (If the symbol is in an r-- mapping, it is an immutable constant, like a string literal for example. If it is in an r-x mapping, it is in code, probably a function. If it is in an rw- mapping, it is a global variable.) Do note that because it is a pseudofile, it is part of the kernel binary interface, and never localized.
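To illustrate the /proc/self/maps lookup, here is a small sketch, assuming Linux and the usual "start-end perms offset dev inode path" line layout. The function name mapping_for_address and the buffer sizes are arbitrary choices for this example; very long paths are truncated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the mapping that contains `addr`.
 * On success returns 1, copies the permission string (e.g. "r-xp") into
 * `perms` and the backing file path (empty for anonymous mappings) into
 * `path`. Returns 0 if no mapping contains the address. */
int mapping_for_address(const void *addr, char perms[5],
                        char *path, size_t path_len)
{
    assert(perms != NULL && path != NULL && path_len > 0);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    uintptr_t target = (uintptr_t)addr;
    char line[1024];
    int found = 0;

    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5] = "";
        char name[512] = "";

        /* offset, device and inode are parsed but discarded */
        int n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %511[^\n]",
                       &start, &end, p, name);
        if (n < 3)
            continue;

        if (target >= start && target < end) {
            memcpy(perms, p, sizeof p);
            snprintf(path, path_len, "%s", name);
            found = 1;
        }
    }

    fclose(maps);
    return found;
}
```

Passing the address of a function in the library should yield the library's path in `path` and a permission string with the x bit set.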

How can I calculate the address of a dynamic symbol in a binary at runtime?

I am having difficulty with one of my programs. When a CALL is executed, I need to be able to tell which function symbol the call's target address belongs to.
I understand that for my own functions I need to parse the .symtab section of my ELF binary, but symbols that come from a shared library do not have any address there.
I also parsed the .dynsym, .rela.dyn, .rela.plt and .dynstr sections. With that information, I now know which function goes with which library.
I understand that the function address given in the shared object will not be the same as the function address in the binary. Also, all my programs (written in C) are compiled with the option "-fno-stack-protector".
My question is: is there a way to calculate the address of a dynamic symbol before executing the binary?

What does 'Segment type: Externs' mean in IDA?

I'm trying to analyse a dynamically linked 64-bit ELF file using IDA Pro, and I find a segment with an extern type, which is right after the .bss, as follows:
extern:00000000006021C0 ; Segment type: Externs
extern:00000000006021C0 ; extern
extern:00000000006021C0 ; void free(void *ptr)
extern:00000000006021C0 extrn free:near ; DATA XREF: .got.plt:off_602018o
However, when I debug it at runtime using gdb, I find that this 'extern' segment contains ONLY ZERO! There isn't any valid data other than zero in this segment. Also, there is no description of the permissions of this segment; it looks as if this segment doesn't even exist.
Since there is DATA XREF in GOT, maybe it has something to do with import functions? But I couldn't find relevant documents, I wonder how IDA recognizes it, and what it is exactly?
Thanks!

extern is not a real segment. It is a pseudo segment created by IDA to represent symbols with unknown addresses in other modules; the GOT usually contains pointers to those. During debugging it probably gets covered by .bss or stack area cleared by the OS loader, that's why you see zeroes there.

extern in the context of IDA is a bit different than in the context of C/C++.
In C/C++, the extern keyword is used to declare a variable/function/object that is not actually defined in the current object but will be available by the time the binary is linked. This is for when you define an array in one .c file and access it in multiple files, for example.
In the context of IDA, the externs section is used to describe a memory area defining APIs from .so/.dll files. This is usually the IAT in a PE and the GOT in an ELF file. When an object in an externs section has a name of a known API, IDA will automatically color it pink and add the prototype if available.

Printf Symbol Resolution

I'm writing a little program which traces all the syscalls and calls of a binary file (ELF) using ptrace (singlestep, getregs, peektext, opcode comparison, etc).
So far I've succeeded in tracing syscalls and simple calls like user-defined functions.
But I failed to get the name of the printf symbol from the address I read via ptrace.
My question is: for dynamically linked functions such as printf, strlen, etc, how can I retrieve the name of the symbol from the address using the ELF file?
With simple calls it's fairly easy: I run through the .strtab section and when an address matches I return the corresponding string.
But for printf, the symbol is known in the .strtab but has the address "0".
objdump -d somehow succeeds in linking a call to printf with its address.
Do you have any idea?

I think you may need to read up a little more about dynamic linking. Let's take strlen as an example symbol as printf is a bit special (fortification stuff).
Your problem is (I think) that you want to take the address of a symbol and translate that back into a name. You're trying to do this by parsing the ELF file of the program you are debugging. This works with symbols that are in your program, but not with dynamically linked symbols such as strlen. And you want to know how to resolve that.
The reason is that the addresses of symbols such as strlen are not held within your ELF program. They are instead unresolved references that are resolved dynamically when the program loads. Indeed modern Linux will (I believe) load dynamic libraries (which contain relocatable aka position independent code) in a randomised order and at randomised addresses, so the location of those symbols won't be known until the program loads.
For libraries that you have opened with dlopen() (i.e. where you are doing the loading yourself in the program), you can retrieve the address of such symbols using dlsym(); that's not much good if they are linked into the program at compile/link time.
With glibc, to resolve an address back to a symbol in general, use the GNU extension dladdr() (declared in <dlfcn.h> when _GNU_SOURCE is defined). From the man page:
The function dladdr() takes a function pointer and tries to
resolve name and file where it is located. Information is
stored in the Dl_info structure:
    typedef struct {
        const char *dli_fname;  /* Pathname of shared object that contains address */
        void       *dli_fbase;  /* Address at which shared object is loaded */
        const char *dli_sname;  /* Name of nearest symbol with address lower than addr */
        void       *dli_saddr;  /* Exact address of symbol named in dli_sname */
    } Dl_info;
If no symbol matching addr could be found, then dli_sname and
dli_saddr are set to NULL.
dladdr() returns 0 on error, and nonzero on success.
I believe that will work for you.
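For what it's worth, a minimal sketch of calling dladdr() from inside the process that owns the address, assuming glibc; on glibc versions before 2.34 this also needs -ldl on the link line. The helper name describe_address is made up for this example. Note that dladdr() only consults the dynamic symbol table, so dli_sname can be NULL for static functions or for executables linked without -rdynamic, and the sketch prints "?" in that case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: writes "symbol (object path)" for `addr` into `buf`.
 * Returns 1 if the address lies inside a loaded object, 0 otherwise. */
int describe_address(const void *addr, char *buf, size_t buf_len)
{
    assert(buf != NULL && buf_len > 0);

    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return 0;   /* not inside any loaded shared object or executable */

    snprintf(buf, buf_len, "%s (%s)",
             info.dli_sname != NULL ? info.dli_sname : "?",
             info.dli_fname);
    return 1;
}
```

This only works for addresses in the calling process. A ptrace-based tracer would have to do the equivalent lookup itself against the traced process's mappings and the libraries' own symbol tables.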
For further information, I suggest you look at the source of ltrace, which traces library calls, and at how backtrace_symbols works; note that this is going to be unreliable, particularly for non-global symbols, and note the comment about adding -rdynamic to the link line.
You might also want to look at addr2line and its source.

Finding Statically Allocated Data Blocks in a Static Library

I've got a small static library (.a). In the static library is a pointer that points to a large, statically allocated, 1D array.
When I link my code to this library, the pointer's address is hardcoded in various locations, easily found through the disassembly. The issue is, I'd like my code to be able to have access to this array (the library is faulting, and I want to know why).
Naturally, it would be trivial to get that pointer by disassembling, hardcoding that address into my code, and then recompiling. That wouldn't be a problem except the library can be configured in different ways with other modules, and the array's pointer changes depending on what modules are linked in.
What are my options for getting that pointer? Because the starting state of the array is predictable, I could walk through memory, catching segfaults with a signal handler, until I found something that looks reasonable. Is there a better way?

Since your library is a .a archive, I'll assume you are on some kind of UNIX.
The global array should have a symbolic name associated with it. Your job would be easier or harder depending on what kind of symbol describes it.
If there is a global symbol describing this array, then you can just reference it directly, e.g.
extern char some_array[];
for (int i = 0; i < 100; i++) printf("%2d: 0x%02x\n", i, (unsigned char)some_array[i]);
If the symbol is local, then you can first globalize it with objcopy --globalize-symbol=some_array, then proceed as above.
So how can you determine what is the symbol describing that array? Run objdump -dr foo.o, where foo.o contains instructions which you know reference that array. The relocation that will appear next to the referring instruction will tell you the name.
Finally, run nm foo.o | grep some_array. If you see 00000XX D some_array, you are done -- the array is globally visible (same for B). If you see 000XX d some_array, you need to globalize it first (likewise for b).
Update:
The -dr to objdump didn't work
Right, because the symbol turned out to be local, the relocation probably referred to .bss + 0xNNN.
00000000006b5ec0 b grid
00000000006c8620 b grid
00000000006da4a0 b grid
00000000006ec320 b grid
00000000006fe1a0 b grid
You must have run nm on the final linked executable, not on individual foo.o objects inside your archive. There are five separate static arrays called grid in your binary; only the first one is the one you apparently care about.
declaring "extern int grid[];" and using it gives an undefined reference
That's expected for local symbols: the code in the library was something like:
// foo.c
static char grid[1000];
and you can't reference this grid from outside foo.o without globalizing the symbol first.
I'm not allowed to run a changed binary of the library on our server for security reasons
I hope you understand that that argument is total BS: if you can link your own code into that binary, then you can do anything on the server (subject to user-id restrictions); you are already trusted. Modifying third-party library should be the least worry of the server's admin if he doesn't trust you.
