How to get memory locations of library functions? - c

I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.

Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
Is there any way to get a list of locations for all the library/system functions?
The headers are a good place to look.
Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers like gcc, however, do allow you to choose whether to create a non-relocatable binary or not. (For gcc you may check out exp files, dlltool etc.) Check with the appropriate documentation.

With dynamic linking,
1. your executable has a special place for all external calls (PLT table).
2. your executable has a list of libraries it depends on
These two things are independent. It is impossible to say which external function lives in which library.
When a program does an external function call, what actually happens is that it calls an entry in the PLT table, which does a jump into the dynamic loader. The dynamic loader looks at which function was called (via the PLT), looks up its name (via the symbol table in the executable) and looks up that name in ALL libraries that are mapped (all that the given executable is dependent on). Once the name is found, the address of the corresponding function is written back to the PLT, so next time the call is made directly, bypassing the dynamic linker.
To answer your question, you should do the same job as dynamic linker does: get a list of dependent libs, and lookup all names in them. This could be done using 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in a given object file within libXXX.a get linked in. For example, static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked to your app, together with function foo() and all other data defined in it. This is the reason why, for example, C library functions are split per file.

If you want to dynamically link, you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in, it is fairly simple.
#include <dlfcn.h>
#include <stdio.h>
int call_from_libfoo(void)
{
    void *handle;
    int *iptr, (*fptr)(int);
    /* open the needed object */
    handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
    if (handle == NULL) { fprintf(stderr, "dlopen: %s\n", dlerror()); return -1; }
    /* find the address of function and data objects (the cast is the POSIX idiom for storing a void * into a function pointer) */
    *(void **)(&fptr) = dlsym(handle, "my_function");
    iptr = (int *)dlsym(handle, "my_object");
    if (fptr == NULL || iptr == NULL) { fprintf(stderr, "dlsym: %s\n", dlerror()); dlclose(handle); return -1; }
    /* invoke function, passing value of integer as a parameter */
    (*fptr)(*iptr);
    dlclose(handle);
    return 0;
}
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet (objdump -t file.a if you're looking for statically bound functions). objdump is cross-platform, part of binutils, so in a pinch you can copy your binary files to another system and interrogate them with objdump on a different platform.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifies the search order for the ld.so.cache (so.cache, right ;).
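As an added example (not code from the answers above) of doing at run time what the two answers describe, here is a C program for Linux/glibc that asks the dynamic loader, via dl_iterate_phdr(), which mapped object contains a given address and what that object's load base is. The struct and function names are made up; it assumes glibc's convention that the main program is reported first with an empty dlpi_name.

```c
#define _GNU_SOURCE
#include <link.h>      /* dl_iterate_phdr, struct dl_phdr_info, ElfW() */
#include <stdint.h>
#include <stdio.h>

/* Added example, not from the answers above. Assumes Linux/glibc: the first
 * object dl_iterate_phdr reports is the main program and its name is "". */

struct where {
    uintptr_t addr;         /* address being located */
    const char *object;     /* path of the object that contains it ("" = main program) */
    uintptr_t base;         /* load base of that object (dlpi_addr) */
    uintptr_t link_vaddr;   /* addr - base: the value nm/readelf print for a PIE or .so */
    int found;
};

/* dl_iterate_phdr callback: check every PT_LOAD segment of one object. */
static int locate_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct where *w = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        uintptr_t end = start + ph->p_memsz;
        if (w->addr >= start && w->addr < end) {
            w->object = info->dlpi_name;
            w->base = info->dlpi_addr;
            w->link_vaddr = w->addr - info->dlpi_addr;
            w->found = 1;
            return 1;           /* non-zero stops the iteration */
        }
    }
    return 0;
}

/* Map a runtime address to the loaded object containing it. Returns 1 on success. */
int find_object(uintptr_t addr, struct where *out)
{
    out->addr = addr;
    out->object = NULL;
    out->base = 0;
    out->link_vaddr = 0;
    out->found = 0;
    dl_iterate_phdr(locate_cb, out);
    return out->found;
}

static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)info;
    (void)size;
    (*(int *)data)++;
    return 0;
}

/* Number of ELF objects currently mapped (main program, vdso, libc, ...). */
int count_loaded_objects(void)
{
    int n = 0;
    dl_iterate_phdr(count_cb, &n);
    return n;
}

int main(void)
{
    struct where w;
    printf("%d objects mapped\n", count_loaded_objects());
    if (find_object((uintptr_t)printf, &w))
        printf("printf at %#lx -> '%s' (base %#lx, link-time vaddr %#lx)\n",
               (unsigned long)w.addr, w.object[0] ? w.object : "<main program>",
               (unsigned long)w.base, (unsigned long)w.link_vaddr);
    if (find_object((uintptr_t)find_object, &w))
        printf("find_object at %#lx -> '%s' (base %#lx)\n", (unsigned long)w.addr,
               w.object[0] ? w.object : "<main program>", (unsigned long)w.base);
    return 0;
}
```

Subtracting the base gives the link-time address that nm or readelf -s prints for a PIE or a shared object, which is the relationship the question asks about. One caveat for the printf line: in a non-PIE executable the expression (uintptr_t)printf is the address of printf's PLT stub inside the executable itself (the canonical address the linker hands out), not the copy in libc; with a PIE it is the libc address read from the GOT.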

Related

How are shared libraries referenced by various programs?

I understand that shared libraries are loaded into memory and used by various programs.
How can a program know where in memory the library is?
When a shared library is used, there are two parts to the linkage process. At compile time, the linker program, ld in Linux, links against the shared library in order to learn which symbols are defined by it. However, none of the code or data initializers from the shared library are actually included in the ultimate a.out file. Instead, ld just records which dynamic libraries were linked against and the information is placed into an auxiliary section of the a.out file.
The second phase takes place at execution time, before main gets invoked. The kernel loads a small helper program, ld.so, into the address space and this gets executed. Therefore, the start address of the program is not main or even _start (if you have heard of it). Rather, it is actually the start address of the dynamic library loader.
In Linux, the kernel maps the ld.so loader code into a convenient place in the process address space and sets up the stack so that the list of required shared libraries (and other necessary info) is present. The dynamic loader finds each of the required libraries by looking at a sequence of directories which are often pointed to by the LD_LIBRARY_PATH environment variable. There is also a pre-defined list which is hard-coded into ld.so (and additional search places can be hard-coded into the a.out during link time). For each of the libraries, the dynamic loader reads its header and then uses mmap to create memory regions for the library.
Now for the fun part.
Since the actual libraries used at run-time to satisfy the requirements are not known at link-time, we need to figure out a way to access functions defined in the shared library and global variables that are exported by the shared library (this practice is deprecated since exporting global variables is not thread-safe, but it is still something we try to handle).
Global variables are assigned a static address at link time and are then accessed by absolute memory address.
For functions exported by the library, the user of the library is going to emit a series of call assembly instructions, which reference an absolute memory address. But, the exact absolute memory address of the referenced function is not known at link time. How do we deal with this?
Well, the linker creates what is known as a Procedure Linkage Table, which is a series of jmp (assembly jump) instructions. The target of the jump is filled in at run time.
Now, when dealing with the dynamic portions of the code (i.e. the .o files that have been compiled with -fpic), there are no absolute memory references whatsoever. In order to access global variables which are also visible to the static portion of the code, another table called the Global Offset Table is used. This table is an array of pointers. At link time, since the absolute memory addresses of the global variables are known, the linker populates this table. Then, at run time, dynamic code is able to access the global variables by first finding the Global Offset Table, then loading the address of the correct variable from the appropriate slot in the table, and finally dereferencing the pointer.
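An added example, not from the answer above, that makes the PLT/GOT mechanism visible from inside a running program: it walks the executable's own dynamic section (DT_JMPREL, DT_SYMTAB, DT_STRTAB) to enumerate the PLT relocations, finds the GOT slot for puts, and reads that slot before and after the first call. Function names are made up; it assumes a dynamically linked glibc executable on x86-64 or aarch64 whose external calls go through the PLT (the compiler default).

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Added example (not from the answer above). Assumes a dynamically linked
 * glibc/Linux executable on x86-64 or aarch64 whose external calls go through
 * the PLT (the compiler default; -fno-plt or -static change this). */

#if UINTPTR_MAX == 0xffffffffffffffffu
#define R_SYM(info) ELF64_R_SYM(info)
#else
#define R_SYM(info) ELF32_R_SYM(info)
#endif

static int base_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(uintptr_t *)data = info->dlpi_addr;   /* first object reported is the main program */
    return 1;
}

/* glibc rewrites the d_ptr entries of the in-memory dynamic section to absolute
 * addresses; other loaders may leave link-time vaddrs. Accept both. */
static uintptr_t absolute(uintptr_t base, uintptr_t p)
{
    return p < base ? p + base : p;
}

/* Walk this program's DT_JMPREL table (one entry per PLT stub). Returns the
 * GOT slot for the named symbol, or NULL; *count receives the number of entries. */
uintptr_t *find_got_slot(const char *name, int *count)
{
    uintptr_t base = 0;
    dl_iterate_phdr(base_cb, &base);

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL, *jmprel = NULL;
    size_t pltrelsz = 0, pltrel = DT_RELA;
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        switch (d->d_tag) {
        case DT_SYMTAB:   symtab = (const ElfW(Sym) *)absolute(base, d->d_un.d_ptr); break;
        case DT_STRTAB:   strtab = (const char *)absolute(base, d->d_un.d_ptr); break;
        case DT_JMPREL:   jmprel = (const char *)absolute(base, d->d_un.d_ptr); break;
        case DT_PLTRELSZ: pltrelsz = d->d_un.d_val; break;
        case DT_PLTREL:   pltrel = d->d_un.d_val; break;
        }
    }
    if (count) *count = 0;
    if (!symtab || !strtab || !jmprel)
        return NULL;

    /* Rel and Rela share their leading r_offset/r_info fields. */
    size_t entsz = (pltrel == DT_RELA) ? sizeof(ElfW(Rela)) : sizeof(ElfW(Rel));
    uintptr_t *hit = NULL;
    for (size_t off = 0; off < pltrelsz; off += entsz) {
        const ElfW(Rel) *r = (const ElfW(Rel) *)(jmprel + off);
        size_t symidx = R_SYM(r->r_info);
        if (symidx == 0)                 /* e.g. IRELATIVE entries carry no symbol */
            continue;
        if (count) (*count)++;
        const char *sym = strtab + symtab[symidx].st_name;
        if (!hit && strcmp(sym, name) == 0)
            hit = (uintptr_t *)(base + r->r_offset);   /* relocation tables keep link-time vaddrs */
    }
    return hit;
}

/* DF_BIND_NOW / DF_1_NOW: the linker asked ld.so to resolve every PLT entry at
 * startup (-z now, as used for full RELRO). */
int bind_now(void)
{
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_FLAGS && (d->d_un.d_val & DF_BIND_NOW)) return 1;
        if (d->d_tag == DT_FLAGS_1 && (d->d_un.d_val & DF_1_NOW)) return 1;
    }
    return 0;
}

/* Read the GOT slot of puts before and after its first call. With lazy binding
 * it changes from the PLT resolver stub to the libc address; under BIND_NOW it
 * is already final. Returns 1 if it changed, 0 if not, -1 if there is no PLT entry. */
int puts_slot_changed(uintptr_t *before, uintptr_t *after)
{
    uintptr_t *slot = find_got_slot("puts", NULL);
    if (!slot) return -1;
    *before = *slot;
    puts("calling puts through the PLT");
    *after = *slot;
    return *before != *after;
}

int main(void)
{
    int n = 0;
    find_got_slot("puts", &n);
    uintptr_t b = 0, a = 0;
    int changed = puts_slot_changed(&b, &a);
    printf("%d PLT entries, bind_now=%d, puts GOT slot %#lx -> %#lx (%s)\n",
           n, bind_now(), (unsigned long)b, (unsigned long)a,
           changed == 1 ? "resolved lazily on first call"
           : changed == 0 ? "already resolved" : "no PLT entry");
    return 0;
}
```

With lazy binding the slot first holds the address of the PLT's resolver stub and is overwritten with the libc address on the first call, exactly the fill-in-at-run-time step the answer describes; a binary linked with -z now has every slot resolved at startup and bind_now() reports it. That difference matters from the security side: a writable, lazily bound GOT is the classic target of GOT-overwrite attacks, which full RELRO removes by resolving everything up front and then making the table read-only.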

backtrace not complete stack trace [duplicate]

In the man page, the backtrace() function on Linux says:
Note that names of "static" functions
are not exposed, and won't be available in the backtrace.
However, with debugging symbols enabled (-g), programs like addr2line and gdb can still get the names of static functions. Is there a way to get the names of static functions programmatically from within the process itself?
Yes, by examining its own executable (/proc/self/exe) using e.g. libbfd or an ELF file parsing library, to parse the actual symbols themselves. Essentially, you'd write C code that does the equivalent of something like
env LANG=C LC_ALL=C readelf -s executable | awk '($5 == "LOCAL" && $8 ~ /^[^_]/ && $8 !~ /\./)'
As far as I know, the dynamic linker interface in Linux (<dlfcn.h>) does not return addresses for static (local) symbols.
A simple and pretty robust approach is to execute readelf or objdump from your program. Note that you cannot give the /proc/self/exe pseudo-file path to those, since it always refers to the process' own executable. Instead, you have to use e.g. realpath("/proc/self/exe", NULL) to obtain a dynamically allocated absolute path to the current executable you can supply to the command. You also definitely want to ensure the environment contains LANG=C and LC_ALL=C, so that the output of the command is easily parseable (and not localized to whatever language the current user prefers). This may feel a bit kludgy, but it only requires the binutils package to be installed to work, and you don't need to update your program or library to keep up with the latest developments, so I think it is overall a pretty good approach.
Would you like an example?
One way to make it easier, is to generate separate arrays with the symbol information at compile time. Basically, after the object files are generated, a separate source file is dynamically generated by running objdump or readelf over the related object files, generating an array of names and pointers similar to
const struct {
const char *const name;
const void *const addr;
} local_symbol_names[] = {
/* Filled in using objdump or readelf and awk, for example */
{ NULL, NULL }
};
perhaps with a simple search function exported in a header file, so that when the final executable is linked, it can easily and efficiently access the array of local symbols.
It does duplicate some data, since the same information is already in the executable file, and if I remember correctly, you have to first link the final executable with a stub array to obtain the actual addresses for the symbols, and then relink with the symbol array, making it a bit of a hassle at compile time. But it avoids having a run-time dependence on binutils.
If your executable (and linked libraries) are compiled with debugging information (i.e. with -g flag to gcc or g++) then you could use Ian Taylor's libbacktrace (announced here) from inside GCC - see its code here
That library (BSD licensed free software) is using DWARF debug information from executables and shared libraries linked by the process. See its README file.
Beware that if you compile with optimizations, some functions could be inlined (even without being explicitly tagged inline in the source code, and static inlined functions might not have any proper own code). Then backtracing won't tell much about them.
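The readelf pipeline from the first answer can be done in-process. This added example (not code from any of the answers) reads .symtab from /proc/self/exe the way that answer suggests and maps a code address back to its function name, including the static (STB_LOCAL) functions that backtrace_symbols() and dladdr() skip. The names are made up; it assumes Linux/glibc and an unstripped binary (.symtab present; -g is not required).

```c
#define _GNU_SOURCE
#include <elf.h>
#include <fcntl.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Added example, not from the answers above. Assumes Linux with glibc, an
 * unstripped executable, and that the first object dl_iterate_phdr reports is
 * the main program (glibc's documented behaviour). */

#if UINTPTR_MAX == 0xffffffffffffffffu
#define NATIVE_CLASS ELFCLASS64
#define ST_TYPE(i) ELF64_ST_TYPE(i)
#define ST_BIND(i) ELF64_ST_BIND(i)
#else
#define NATIVE_CLASS ELFCLASS32
#define ST_TYPE(i) ELF32_ST_TYPE(i)
#define ST_BIND(i) ELF32_ST_BIND(i)
#endif

static int base_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(uintptr_t *)data = info->dlpi_addr;   /* load base of the main program */
    return 1;
}

/* Name the function containing a runtime address by reading .symtab from
 * /proc/self/exe: the programmatic equivalent of `readelf -s`. Unlike the
 * dynamic symbol table, .symtab also lists STB_LOCAL (static) functions.
 * Writes the name into buf, sets *is_local, returns 1 on a hit, 0 otherwise. */
int symbol_for_address(uintptr_t addr, char *buf, size_t buflen, int *is_local)
{
    uintptr_t base = 0;
    dl_iterate_phdr(base_cb, &base);
    uintptr_t vaddr = addr - base;          /* link-time value, as readelf prints it */

    int fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return 0;
    }
    const unsigned char *img = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (img == MAP_FAILED)
        return 0;

    int found = 0;
    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)img;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 && eh->e_ident[EI_CLASS] == NATIVE_CLASS
        && eh->e_shoff != 0) {
        const ElfW(Shdr) *sh = (const ElfW(Shdr) *)(img + eh->e_shoff);
        for (int i = 0; i < eh->e_shnum && !found; i++) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue;
            const ElfW(Sym) *syms = (const ElfW(Sym) *)(img + sh[i].sh_offset);
            size_t nsyms = sh[i].sh_size / sizeof(ElfW(Sym));
            const char *strtab = (const char *)(img + sh[sh[i].sh_link].sh_offset);
            for (size_t j = 0; j < nsyms; j++) {
                if (ST_TYPE(syms[j].st_info) != STT_FUNC)
                    continue;
                uintptr_t lo = syms[j].st_value, hi = lo + syms[j].st_size;
                if (vaddr != lo && !(vaddr > lo && vaddr < hi))
                    continue;
                snprintf(buf, buflen, "%s", strtab + syms[j].st_name);
                *is_local = ST_BIND(syms[j].st_info) == STB_LOCAL;
                found = 1;
                break;
            }
        }
    }
    munmap((void *)img, (size_t)st.st_size);
    return found;
}

/* Two probes: backtrace_symbols() would name the second but not the first. */
static int local_probe(int x) { return 2 * x + 1; }
int global_probe(int x) { return 3 * x; }

int main(void)
{
    char name[128];
    int local;
    uintptr_t probes[] = { (uintptr_t)local_probe, (uintptr_t)global_probe };
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        if (symbol_for_address(probes[i], name, sizeof name, &local))
            printf("%#lx -> %s (%s)\n", (unsigned long)probes[i], name, local ? "LOCAL" : "GLOBAL");
        else
            printf("%#lx -> no .symtab entry (stripped binary?)\n", (unsigned long)probes[i]);
    }
    return 0;
}
```

The load base is subtracted so the lookup also works for PIE executables, where runtime addresses differ from the values readelf shows. If a static function was inlined at every call site and its address is never taken, no symbol exists for it and the lookup fails, which is the caveat in the last answer above.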

how to make shared library an executable

I was searching for the asked question. I saw this link https://hev.cc/2512.html which is doing exactly the same thing which I want. But there is no explanation of what's going on. I am also confused whether a shared library without main() can be made executable; if yes, how? I can guess I have to give a global main() but know no details. Any further easy reference and guidance is much appreciated.
I am working on x86-64 64 bit Ubuntu with kernel 3.13
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and on systems that operate that way to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define ( and generate code for ) a task which can be started from a single entry point.
The code you linked to is starting with the source code to the library and explicitly codes a main() which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library ( in the absence of security features to prevent this in any given OS ), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possible be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
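As an added example of the mechanism that answer describes (an entry point plus an interpreter embedded in the library), and not the code from the linked page: a library that runs as an ELF executable and still links as an ordinary shared object. The file name, function names and build commands are made up; the dynamic-linker paths are the glibc ones and are an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Added example, not the code from the linked page or the answers above.
 * Hypothetical file name: libdual.c
 *
 *   Build as the dual-use library (x86-64 or aarch64, glibc):
 *     gcc -shared -fPIC -DSELF_EXEC -Wl,-e,lib_entry -o libdual.so libdual.c
 *     ./libdual.so                           # kernel loads ld.so, ld.so jumps to lib_entry
 *     gcc -o client client.c ./libdual.so    # a client just calls lib_add()
 *   Built as a plain program (no -DSELF_EXEC) it only exercises lib_add(). */

int lib_add(int a, int b) { return a + b; }

const char *lib_banner(void) { return "libdual: running as an executable\n"; }

#ifdef SELF_EXEC
/* PT_INTERP comes from the .interp section: which ld.so the kernel must map.
 * Assumed glibc paths. */
#if defined(__x86_64__)
const char interp[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2";
#elif defined(__aarch64__)
const char interp[] __attribute__((section(".interp"))) = "/lib/ld-linux-aarch64.so.1";
#else
#error "set the dynamic linker path for this architecture"
#endif

/* ld.so jumps here with the raw process stack, not a C call frame, and none
 * of the usual CRT startup has run (no argc/argv/environ for us). So: on x86
 * re-align the stack, avoid stdio, and never return. */
#if defined(__x86_64__) || defined(__i386__)
__attribute__((noreturn, force_align_arg_pointer))
#else
__attribute__((noreturn))
#endif
void lib_entry(void)
{
    const char *msg = lib_banner();
    write(STDOUT_FILENO, msg, strlen(msg));
    _exit(0);
}
#endif

int main(void)
{
    printf("lib_add(2, 3) = %d\n", lib_add(2, 3));
    return 0;
}
```

The entry runs without the normal CRT startup: there is no argc/argv/environ setup and the stack is the raw initial process stack, so lib_entry realigns it on x86 (force_align_arg_pointer), uses write() rather than stdio, and leaves through _exit() instead of returning. A client that links against libdual.so never touches the .interp section or the entry point, which is why the two roles do not interfere.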
To create a shared (dynamic) library, an example:
Suppose there are three object files: sum.o, mul.o and print.o
Shared library name "libmno.so"
cc -shared -o libmno.so sum.o mul.o print.o
and compile the program with
cc main.c ./libmno.so

manually setting function address gcc

I've got a working binary used in an embedded system. Now I want to write some kind of patch for it. The patch will be loaded into RAM below the main program and then will be called from the main program. The question is how to tell gcc to use manually set addresses of some function which will be used from the patch. In other words:
The old code has a function sin() and I could use nm to find out the address of sin() in the old code. My patched code will use sin() (or something else from the main program) and I want to tell gcc (or maybe ld or maybe something else) to use the static address of the function sin() while linking the patched code. Is it possible?
The problem is that you would have to replace all references to the original sin() function for the patched code. That would require the runtime system to contain all the object code data used to resolve references, and for the original code to be modifiable (i.e. not in ROM for example).
Windriver's RTOS VxWorks can do something close to what you are suggesting; the way it does it is you use "partial linking" (GNU linker option -r) to generate an object file with links that will be resolved at runtime - this allows an object file to be created with unresolved links - i.e. an incomplete executable. VxWorks itself contains a loader and runtime "linker" that can dynamically load partially linked object files and resolve references. A loaded object file however must be resolvable entirely using already loaded object code - so no circular dependencies, and in your example you would have to reload/restart the system so that the object file containing the sin() were loaded before those that reference it, otherwise only those loaded after would use the new implementation.
So if you were to use VxWorks (or an OS with similar capabilities), the solution is perhaps simple, if not you would have to implement your own loader/linker, which is of course possible, but not trivial.
Another, perhaps simpler possibility is to have all your code call functions through pointers that you hold in variables, so that all calls (or at least all calls you might want to replace) are resolved at runtime. You would have to load the patch and then modify the sin() function's pointer so that all calls thereafter are made to the new function. The problem with this approach is that you would either have to know a priori which functions you might later want to replace, or have all functions called that way (which may be prohibitively expensive in memory terms). It would perhaps be useful for this solution to have some sort of preprocessor or code generator that would allow you to mark functions that would be "dynamic" in this way and could automatically generate the pointers and calling code. So for example you might write code thus:
__dynamic void myFunction( void ) ;
...
myFunction() ;
and your custom preprocessor would generate:
void myFunction( void ) ;
void (*__dynamic_myFunction)(void) = myFunction ;
...
__dynamic_myFunction() ;
then your patch/loader code would reassign __dynamic_myFunction with the address of the replacement function.
You could generate a "dynamic symbol table" containing just the names and addresses of the __dynamic_xxxxx symbols and include that in your application so that a loader could change the __dynamic_xxxxx variables by matching the xxxxx name with the symbols in the loaded object file. If you load a plain binary, however, you would have to provide the link information to the loader, i.e. which __dynamic_xxxxx variable is to be reassigned and the address to assign to it.
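An added example (not the answer's code) of the pointer-indirection scheme the answer proposes, written out in full with made-up names: the __dynamic_* slots and call_* wrappers stand in for what the answer's preprocessor would generate, dynamic_symbols[] is its "dynamic symbol table", and apply_patch() is the loader re-pointing slots by name. The firmware functions and the calibration fix are constructed for the example.

```c
#include <stdio.h>
#include <string.h>

/* Added example, not from the answer above. Names and the firmware
 * functions are made up for illustration. */

typedef void (*generic_fn)(void);   /* one slot type, so a table can hold all of them */

/* Original firmware functions: the targets a later patch may replace. */
int adc_to_millivolts(int raw) { return raw * 3300 / 1024; }   /* bug: full scale is 1023 */
int celsius_to_tenths(int mv) { return mv - 500; }              /* 10 mV per degree, 500 mV at 0 C */

/* Generated indirection slots, initialised to the built-in implementations. */
generic_fn __dynamic_adc_to_millivolts = (generic_fn)adc_to_millivolts;
generic_fn __dynamic_celsius_to_tenths = (generic_fn)celsius_to_tenths;

/* Generated calling code: every call site goes through the slot. */
static inline int call_adc_to_millivolts(int raw)
{
    return ((int (*)(int))__dynamic_adc_to_millivolts)(raw);
}
static inline int call_celsius_to_tenths(int mv)
{
    return ((int (*)(int))__dynamic_celsius_to_tenths)(mv);
}

/* The "dynamic symbol table" shipped in the application: name -> slot. */
struct dyn_symbol { const char *name; generic_fn *slot; };
static struct dyn_symbol dynamic_symbols[] = {
    { "adc_to_millivolts", &__dynamic_adc_to_millivolts },
    { "celsius_to_tenths", &__dynamic_celsius_to_tenths },
    { NULL, NULL }
};

/* What a loaded patch carries: the name it replaces and the new address. */
struct patch_entry { const char *name; generic_fn replacement; };

/* Re-point one slot. Returns 0 on success, -1 if the name was never made dynamic. */
int patch_bind(const char *name, generic_fn replacement)
{
    for (struct dyn_symbol *s = dynamic_symbols; s->name != NULL; s++) {
        if (strcmp(s->name, name) == 0) {
            *s->slot = replacement;
            return 0;
        }
    }
    return -1;
}

/* Loader side: bind every entry of a patch. Returns how many could not be bound. */
int apply_patch(const struct patch_entry *entries, int count)
{
    int failed = 0;
    for (int i = 0; i < count; i++)
        if (patch_bind(entries[i].name, entries[i].replacement) != 0)
            failed++;
    return failed;
}

/* The replacement a patch would carry: the calibration fix. */
int adc_to_millivolts_fixed(int raw) { return raw * 3300 / 1023; }

/* Application code: raw ADC sample -> tenths of a degree, all via the slots. */
int sample_to_tenths(int raw)
{
    return call_celsius_to_tenths(call_adc_to_millivolts(raw));
}

int main(void)
{
    printf("before patch: raw 1023 -> %d mV, %d tenths\n",
           call_adc_to_millivolts(1023), sample_to_tenths(1023));
    const struct patch_entry patch[] = {
        { "adc_to_millivolts", (generic_fn)adc_to_millivolts_fixed },
        { "not_a_dynamic_function", (generic_fn)adc_to_millivolts_fixed },
    };
    int unbound = apply_patch(patch, 2);
    printf("after patch: raw 1023 -> %d mV, %d tenths (%d entries unbound)\n",
           call_adc_to_millivolts(1023), sample_to_tenths(1023), unbound);
    return 0;
}
```

Calls made before apply_patch() land in the original code and calls made after it land in the replacement, which is the same ordering restriction the VxWorks paragraph mentions. A check a real patch loader should make is the return value of apply_patch(): an entry whose name is not in the table is a patch that targets something the firmware never made dynamic, and it should be rejected rather than silently ignored.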
