In the man page, the backtrace() function on Linux says:
Note that names of "static" functions
are not exposed, and won't be available in the backtrace.
However, with debugging symbols enabled (-g), programs like addr2line and gdb can still get the names of static functions. Is there a way to get the names of static functions programmatically from within the process itself?
Yes, by examining its own executable (/proc/self/exe) using e.g. libbfd or an ELF file parsing library, to parse the actual symbols themselves. Essentially, you'd write C code that does the equivalent of something like
env LANG=C LC_ALL=C readelf -s executable | awk '($5 == "LOCAL" && $8 ~ /^[^_]/ && $8 !~ /\./)'
As far as I know, the dynamic linker interface in Linux (<dlfcn.h>) does not return addresses for static (local) symbols.
A simple and pretty robust approach is to execute readelf or objdump from your program. Note that you cannot give the /proc/self/exe pseudo-file path to those, since it always refers to the process' own executable. Instead, you have to use eg. realpath("/proc/self/exe", NULL) to obtain a dynamically allocated absolute path to the current executable you can supply to the command. You also definitely want to ensure the environment contains LANG=C and LC_ALL=C, so that the output of the command is easily parseable (and not localized to whatever language the current user prefers). This may feel a bit kludgy, but it only requires the binutils package to be installed to work, and you don't need to update your program or library to keep up with the latest developments, so I think it is overall a pretty good approach.
Would you like an example?
One way to make it easier, is to generate separate arrays with the symbol information at compile time. Basically, after the object files are generated, a separate source file is dynamically generated by running objdump or readelf over the related object files, generating an array of names and pointers similar to
const struct {
const char *const name;
const void *const addr;
} local_symbol_names[] = {
/* Filled in using objdump or readelf and awk, for example */
{ NULL, NULL }
};
perhaps with a simple search function exported in a header file, so that when the final executable is linked, it can easily and efficiently access the array of local symbols.
It does duplicate some data, since the same information is already in the executable file, and if I remember correctly, you have to first link the final executable with a stub array to obtain the actual addresses for the symbols, and then relink with the symbol array, making it a bit of a hassle at a compile time.. But it avoids having a run-time dependence on binutils.
If your executable (and linked libraries) are compiled with debugging information (i.e. with -g flag to gcc or g++) then you could use Ian Taylor's libbacktrace (announced here) from inside GCC - see its code here
That library (BSD licensed free software) is using DWARF debug information from executables and shared libraries linked by the process. See its README file.
Beware that if you compile with optimizations, some functions could be inlined (even without being explicitly tagged inline in the source code, and static inlined functions might not have any proper own code). Then backtracing won't tell much about them.
Related
I would like to change the global structures in an ELF file.
I would first like to know whether a global variable is a structure or not?
But the only information I see is the name and the size of the variable.
You cannot find it. The ELF format has a very limited type information (and does not know if some global variable is a struct or an int; it knows mostly the sizeof the variable). See elf(5).
However, if you compile your program (and the libraries it is using) for debug support (e.g. with g++ -Wall -g). then the ELF file contains additional debug sections, often using the DWARF format. These information can be removed from an ELF file using the strip command. So see strip(1) & readelf(1) & objdump(1).
But you really want to use the source code. So get the source of the program (since Linux is mostly made of free software, this is generally possible) and recompile it (and of course study the source code of the program). Perhaps you want (but that does require weeks of work) to customize your compiler (e.g. by writing your own GCC plugin, or using GCC MELT, or customizing your Clang compiler) and use that to recompile the code (and process specifically all global variable declarations).
In the man page, the backtrace() function on Linux says:
Note that names of "static" functions
are not exposed, and won't be available in the backtrace.
However, with debugging symbols enabled (-g), programs like addr2line and gdb can still get the names of static functions. Is there a way to get the names of static functions programmatically from within the process itself?
Yes, by examining its own executable (/proc/self/exe) using e.g. libbfd or an ELF file parsing library, to parse the actual symbols themselves. Essentially, you'd write C code that does the equivalent of something like
env LANG=C LC_ALL=C readelf -s executable | awk '($5 == "LOCAL" && $8 ~ /^[^_]/ && $8 !~ /\./)'
As far as I know, the dynamic linker interface in Linux (<dlfcn.h>) does not return addresses for static (local) symbols.
A simple and pretty robust approach is to execute readelf or objdump from your program. Note that you cannot give the /proc/self/exe pseudo-file path to those, since it always refers to the process' own executable. Instead, you have to use eg. realpath("/proc/self/exe", NULL) to obtain a dynamically allocated absolute path to the current executable you can supply to the command. You also definitely want to ensure the environment contains LANG=C and LC_ALL=C, so that the output of the command is easily parseable (and not localized to whatever language the current user prefers). This may feel a bit kludgy, but it only requires the binutils package to be installed to work, and you don't need to update your program or library to keep up with the latest developments, so I think it is overall a pretty good approach.
Would you like an example?
One way to make it easier, is to generate separate arrays with the symbol information at compile time. Basically, after the object files are generated, a separate source file is dynamically generated by running objdump or readelf over the related object files, generating an array of names and pointers similar to
const struct {
const char *const name;
const void *const addr;
} local_symbol_names[] = {
/* Filled in using objdump or readelf and awk, for example */
{ NULL, NULL }
};
perhaps with a simple search function exported in a header file, so that when the final executable is linked, it can easily and efficiently access the array of local symbols.
It does duplicate some data, since the same information is already in the executable file, and if I remember correctly, you have to first link the final executable with a stub array to obtain the actual addresses for the symbols, and then relink with the symbol array, making it a bit of a hassle at a compile time.. But it avoids having a run-time dependence on binutils.
If your executable (and linked libraries) are compiled with debugging information (i.e. with -g flag to gcc or g++) then you could use Ian Taylor's libbacktrace (announced here) from inside GCC - see its code here
That library (BSD licensed free software) is using DWARF debug information from executables and shared libraries linked by the process. See its README file.
Beware that if you compile with optimizations, some functions could be inlined (even without being explicitly tagged inline in the source code, and static inlined functions might not have any proper own code). Then backtracing won't tell much about them.
to get opcodes author here does following:
[bodo#bakawali testbed8]$ as testshell2.s -o testshell2.o
[bodo#bakawali testbed8]$ ld testshell2.o -o testshell2
[bodo#bakawali testbed8]$ objdump -d testshell2
and then he gets three sections (or mentions only these 3):
<_start>
< starter>
< ender>
I have tried to get hex opcodes the same way but cannot ld correctly. Of course I can produce .o and prog file for example with:
gcc main.o -o prog -g
however when
objdump --prefix-addresses --show-raw-insn -Srl prog
to see complete code with annotations and symbols, I have many additional sections there, for example:
.init
.plt
.text (yes, I know, main is here) [many parts here: _start(), call_gmon_start(), __do_global_dtors_aux(), frame_dummy(), main(), __libc_csu_init(), __libc_csu_fini(), __do_global_ctors_aux()]
.fini
I assume these are additions introduced by gcc linking to runtime libraries. I think i don't need these all sections to call opcode from c code (author uses only those 3 sections) however my problem is I don't know which exactly I might discard and which are necessary. I want to use it like this:
#include <unistd.h>
char code[] = "\x31\xed\x49\x89\x...x00\x00";
int main(int argc, char **argv)
{
/*creating a function pointer*/
int (*func)();
func = (int (*)()) code;
(int)(*func)();
return 0;
}
so I have created this :
#include <unistd.h>
/*
*
*/
int main() {
char *shell[2];
shell[0] = "/bin/sh";
shell[1] = NULL;
execve(shell[0], shell, NULL);
return 0;
}
and I did disassembly as I described. I tried to use opcode from .text main(), this gave me segmentation fault, then .text main() + additionally .text _start(), with same result.
So, what to choose from above sections, or how to generate only as minimized "prog" as with three sections?
char code[] = "\x31\xed\x49\x89\x...x00\x00";
This will not work.
Reason: The code definitely contains adresses. Mainly the address of the function execve() and the address of the string constant "/bin/sh".
The executable using the "code[]" approach will not contain a string constant "/bin/sh" at all and the address of the function execve() will be different (if the function will be linked into the executable at all).
Therefore the "call" instruction to the "execve()" function will jump to anywhere in the executable using the "code[]" approach.
Some theory about executables - just for your information:
There are two possibilities for executables:
Statically linked: These executables contain all necessary code. Therefore they do not access dynamic libraries like "libc.so"
Dynamically linked: These executables do not contain code that is frequently used. Such code is stored in files common to all executables: The dynamic libraries (e.g. "libc.so")
When the same C code is used then statically linked executables are much bigger than dynamically linked executables because all C functions (e.g. "printf", "execve", ...) must be bundled into the executable.
When not using any of these library functions the statically linked executables are simpler and therefore easier to understand.
Statically linked executable behaviour
A statically linked executable is loaded into the memory by the operating system (when it is started using execve()). The executable contains an entry point address. This address is stored in the file header of the executable. You can see it using "objdump -h ...".
The operating system performs a jump to that address so the program execution starts at this address. The address is typically the function "_start" however this can be changed using command line options when linking using "ld".
The code at "_start" will prepare the executable (e.g. initialize variables, calculate the values for "argc" and "argv", ...) and call the "main()" function. When "main()" returns the "_start" function will pass the value returned by "main()" to the "_exit()" function.
Dynamically linked executable behaviour
Such executables contain two additional sections. The first section contains the file name of the dynamic linker (maybe. "/lib/ld-linux.so.1"). The operating system will then load the executable and the dynamic linker and jump to the entry point of the dynamic linker (and not to that of the executable).
The dynamic linker will read the second additional section: It contains information about dynamic libraries (e.g. "libc.so") required by the executable. It will load all these libraries and initialize a lot of variables. Then it calls the initialization function ("_init()") of all libraries and of the executable.
Note that both the operating system and the dynamic linker ignore the function and section names! The address of the entry point is taken from the file header and the addresses of the "_init()" functions is taken from the additional section - the functions may be named differently!
When all this is done the dynamic linker will jump to the entry point ("_start") of the executable.
About the "GOT", "PLT", ... sections:
These sections contain information about the addresses where the dynamic libraries have been loaded by the linker. The "PLT" section contains wrapper code that will contain jumps to the dynamic libraries. This means: The section "PLT" will contain a function "printf()" that will actually do nothing but jump to the "printf()" function in "libc.so". This is done because directly calling a function in a dynamic library from C code would make linking much more difficult so C code will not call functions in a dynamic library directly. Another advantage of this implementation is that "lazy linking" is possible.
Some words about Windows
Windows only knows dynamically linked executables. Windows XP even refused to load an executable not requiring DLLs. The "dynamic linker" is integrated into the operating system and not a separate file. There is also an equivalent of the "PLT" section. However many compilers support "directly" calling DLL code from C code without calling the code in the PLT section first (theoretically this would also be possible under Linux). Lazy linking is not supported.
You should read this article: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.
It explains all you need to create really tiny program in great detail.
The title is clear, we can loaded a library by dl_open etc..
But how can I get the signature of functions in it?
This answer cannot be answered in general. Technically if you compiled your executable with exhaustive debugging information (code may still be an optimized, release version), then the executable will contain extra sections, providing some kind of reflectivity of the binary. On *nix systems (you referred to dl_open) this is implemented through DWARF debugging data in extra sections of the ELF binary. Similar it works for Mach Universal Binaries on MacOS X.
Windows PEs however uses a completely different format, so unfortunately DWARF is not truley cross plattform (actually in the early development stages of my 3D engine I implemented an ELF/DWARF loader for Windows, so that I could use a common format for the engines various modules, so with some serious effort such can be done).
If you don't want to go into implementing your own loaders, or debugging information accessors, then you may embed the reflection information through some extra symbols exported (by some standard naming scheme) which refer to a table of function names, mapping to their signature. In the case of C source files writing a parser to extract the information from the source file itself is rather trivial. C++ OTOH is so notoriously difficult to parse correctly, that you need some fully fledged compiler to get it right. For this purpose GCCXML was developed, technically a GCC that emits the AST in XML form instead of an object binary. The emitted XML then is much easier to parse.
From the extracted information create a source file with some kind of linked list/array/etc. structure describing each function. If you don't directly export each function's symbol but instead initialize some field in the reflection structure with the function pointer you got a really nice and clean annotated exporting scheme. Technically you could place this information in a spearate section of the binary as well, but putting it in the read only data section does the job as well, too.
However if you're given a 3rd party binary – say worst case scenario it has been compiled from C source, no debugging information and all symbols not externally referenced stripped – you're pretty much screwed. The best you could do, was applying some binary analysis of the way the function accesses the various places in which parameters can be passed.
This will only tell you the number of parameters and the size of each parameter value, but not the type or name/meaning. When reverse engineering some program (e.g. malware analysis or security audit), identifying the type and meaning of the parameters passed to functions is one of the major efforts. Recently I came across some driver I had to reverse for debugging purposes, and you cannot believe how astounded I was by the fact that I found C++ symbols in a Linux kernel module (you can't use C++ in the Linux kernel in a sane way), but also relieved, because the C++ name mangling provided me with plenty information.
On Linux (or Mac) you can use a combination of "nm" and "c++filt" (for C++ libraries)
nm mylibrary.so | c++filt
or
nm mylibrary.a | c++filt
"nm" will give you the mangled form and "c++filt" attempts to put them in a more human-readable format. You might want to use some options in nm to filter down the results, especially if the library is large (or you can "grep" the final output to find a particular item)
No this is not possible. Signature of a function doesn't mean anything at runtime, its a piece of information useful at compile time for the compiler to validate your program.
You can't. Either the library publishes a public API in a header, or you need to know the signature by some other means.
The parameters of a function in the lower level depends on how many stack arguments in the stack frame you consider and how you interpret them. Therefore once the function is compiled into object code it is not possible to get the signature like that. One remote possibility is to disassemble the code and read how it function is working to know the number if parameters, but still the type would be difficult or impossible to determine. In a word, it is not possible.
This information is not available. Not even the debugger knows:
$ cat foo.c
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char foo[10] = { 0 };
char bar[10] = { 0 };
printf("%s\n", "foo");
memcpy(bar, foo, sizeof(foo));
return 0;
}
$ gcc -g -o foo foo.c
$ gdb foo
Reading symbols from foo...done.
(gdb) b main
Breakpoint 1 at 0x4005f3: file foo.c, line 5.
(gdb) r
Starting program: foo
Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at foo.c:5
5 {
(gdb) ptype printf
type = int ()
(gdb) ptype memcpy
type = int ()
(gdb)
I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.
Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
Is there any way to get a list of locations for all the library/system functions?
The headers are a good place to look.
Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers like gcc however, does allow you whether to create a non-relocatable binary or not. (For gcc you may check out exp files, dlltool etc.) Check with the appropriate documentation.
With dynamic linking,
1. your executable has a special place for all external calls (PLT table).
2. your executable has a list of libraries it depends on
These two things are independent. It is impossible to say which external function lives in which library.
When a program does an external function call, what actually happens it calls an entry in the PLT table, which does a jump into the dynamic loader. The dynamic loader looks which function was called (via PLT), looks its name (via symbol table in the executable) and looks up that name in ALL libraries that are mapped (all that given executable is dependant on). Once the name is found, the address of the corresponding function is written back to the PLT, so next time the call is made directly bypassing the dynamic linker.
To answer your question, you should do the same job as dynamic linker does: get a list of dependent libs, and lookup all names in them. This could be done using 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in given object file within libXXX.a get linked in. For example, static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked to your app - together with function foo() and all other data defined in it. This is the reason why for example C library functions are split per file.
If you want to dynamically link you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in. It is fairly simple.
void *handle;
int *iptr, (*fptr)(int);
/* open the needed object */
handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
/* find the address of function and data objects */
*(void **)(&fptr) = dlsym(handle, "my_function");
iptr = (int *)dlsym(handle, "my_object");
/* invoke function, passing value of integer as a parameter */
(*fptr)(*iptr);
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet. (objdump -t file.a if your looking for statically bound functions). Objdump is cross platform, part of binutils, so in a pinch, you can copy your binary files to another system and interrorgate them with objdump on a different platform.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifie's the search order for the ld.so.cache (so.cache right ;).