How to get minimum executable opcodes for a C program?

To get the opcodes, the author here does the following:
[bodo#bakawali testbed8]$ as testshell2.s -o testshell2.o
[bodo#bakawali testbed8]$ ld testshell2.o -o testshell2
[bodo#bakawali testbed8]$ objdump -d testshell2
and then he gets three sections (or mentions only these 3):
<_start>
< starter>
< ender>
I have tried to get the hex opcodes the same way, but I cannot get ld to link correctly. Of course I can produce the .o and prog files, for example with:
gcc main.o -o prog -g
however, when I run
objdump --prefix-addresses --show-raw-insn -Srl prog
to see the complete code with annotations and symbols, I get many additional sections, for example:
.init
.plt
.text (yes, I know, main is here) [many parts here: _start(), call_gmon_start(), __do_global_dtors_aux(), frame_dummy(), main(), __libc_csu_init(), __libc_csu_fini(), __do_global_ctors_aux()]
.fini
I assume these are additions introduced by gcc linking against the runtime libraries. I think I don't need all of these sections to call the opcodes from C code (the author uses only those 3 sections), but my problem is that I don't know exactly which ones I can discard and which are necessary. I want to use it like this:
#include <unistd.h>
char code[] = "\x31\xed\x49\x89\x...x00\x00";
int main(int argc, char **argv)
{
    /* creating a function pointer */
    int (*func)();
    func = (int (*)()) code;
    (int)(*func)();
    return 0;
}
so I have created this :
#include <unistd.h>
/*
*
*/
int main() {
    char *shell[2];
    shell[0] = "/bin/sh";
    shell[1] = NULL;
    execve(shell[0], shell, NULL);
    return 0;
}
and I disassembled it as described above. I tried using the opcodes from main() in .text, which gave me a segmentation fault, and then main() plus _start() from .text, with the same result.
So, which of the sections above should I choose, or how can I generate a "prog" that is as minimal as the one with three sections?

char code[] = "\x31\xed\x49\x89\x...x00\x00";
This will not work.
Reason: The code definitely contains addresses, mainly the address of the function execve() and the address of the string constant "/bin/sh".
The executable using the "code[]" approach will not contain a string constant "/bin/sh" at all, and the address of the function execve() will be different (if the function is linked into the executable at all).
Therefore the "call" instruction to the "execve()" function will jump to some arbitrary location in the executable using the "code[]" approach.
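By contrast, machine code that embeds no addresses and calls nothing can be copied somewhere else and still run. Here is a minimal sketch of that, assuming Linux on x86 or x86-64. The six bytes encode "mov eax, 42; ret", and run_tiny_code is a made-up name. The sketch copies the bytes into a mapping that is then made executable, because a plain data array is usually not executable on current systems.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <string.h>
#include <sys/mman.h>

/* x86 / x86-64 machine code for:  mov eax, 42 ; ret
 * It embeds no absolute addresses and calls nothing. */
static const unsigned char tiny_code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

/* Hypothetical helper: returns 42 on x86/x86-64, -1 elsewhere or on error. */
int run_tiny_code(void)
{
#if defined(__x86_64__) || defined(__i386__)
    const size_t size = 4096;
    int (*func)(void);
    int result;

    /* Get a writable page, copy the bytes in, then flip it to executable. */
    void *page = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, tiny_code, sizeof tiny_code);
    if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, size);
        return -1;
    }

    /* Same object-to-function pointer idiom as used with dlsym(). */
    *(void **)(&func) = page;
    result = func();
    munmap(page, size);
    return result;
#else
    return -1;
#endif
}
```

Code taken from a compiled main() does not have this property: its call and data references only make sense at the addresses the linker assigned.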
Some theory about executables - just for your information:
There are two possibilities for executables:
Statically linked: These executables contain all necessary code. Therefore they do not access dynamic libraries like "libc.so"
Dynamically linked: These executables do not contain code that is frequently used. Such code is stored in files common to all executables: The dynamic libraries (e.g. "libc.so")
When the same C code is used then statically linked executables are much bigger than dynamically linked executables because all C functions (e.g. "printf", "execve", ...) must be bundled into the executable.
When not using any of these library functions the statically linked executables are simpler and therefore easier to understand.
Statically linked executable behaviour
A statically linked executable is loaded into memory by the operating system (when it is started using execve()). The executable contains an entry point address, which is stored in the file header of the executable. You can see it using "objdump -f ..." (shown as the start address) or "readelf -h ...".
The operating system performs a jump to that address so the program execution starts at this address. The address is typically the function "_start" however this can be changed using command line options when linking using "ld".
The code at "_start" will prepare the executable (e.g. initialize variables, calculate the values for "argc" and "argv", ...) and call the "main()" function. When "main()" returns the "_start" function will pass the value returned by "main()" to the "_exit()" function.
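The entry point can also be read from the file header by a program. This is a minimal sketch, assuming Linux, the ElfW macro from <link.h>, and a file in the machine's native ELF class and byte order. read_elf_entry is a made-up name.

```c
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store the e_entry field of the ELF file at `path`
 * in *entry. Returns 0 on success, -1 if the file cannot be read or is
 * not an ELF file. Only the native ELF class (ElfW) is handled. */
int read_elf_entry(const char *path, unsigned long *entry)
{
    ElfW(Ehdr) header;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(&header, sizeof header, 1, f);
    fclose(f);
    if (got != 1 || memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    *entry = (unsigned long)header.e_entry;
    return 0;
}
```

Calling read_elf_entry("/proc/self/exe", &entry) reads the running program's own header. For a position-independent executable the value is relative to the load address rather than an absolute address.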
Dynamically linked executable behaviour
Such executables contain two additional sections. The first contains the file name of the dynamic linker (for example "/lib/ld-linux.so.1"). The operating system will then load the executable and the dynamic linker and jump to the entry point of the dynamic linker (and not to that of the executable).
The dynamic linker will read the second additional section: It contains information about dynamic libraries (e.g. "libc.so") required by the executable. It will load all these libraries and initialize a lot of variables. Then it calls the initialization function ("_init()") of all libraries and of the executable.
Note that both the operating system and the dynamic linker ignore the function and section names! The address of the entry point is taken from the file header and the addresses of the "_init()" functions are taken from the additional section, so the functions may be named differently!
When all this is done the dynamic linker will jump to the entry point ("_start") of the executable.
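The dynamic linker's file name is recorded in the PT_INTERP program header of the executable, so it can be read back out. This is a rough sketch, assuming Linux, the ElfW macro from <link.h>, and a file in the native ELF class and byte order. read_elf_interp is a made-up name.

```c
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the PT_INTERP string (the dynamic linker path)
 * of the ELF file at `path` into `out`. Returns 0 on success, -1 if the
 * file has no PT_INTERP header (e.g. it is statically linked) or on error. */
int read_elf_interp(const char *path, char *out, size_t out_size)
{
    ElfW(Ehdr) eh;
    int rc = -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        goto done;

    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        ElfW(Phdr) ph;
        long pos = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto done;
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_filesz > out_size)
            goto done;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(out, 1, (size_t)ph.p_filesz, f) != (size_t)ph.p_filesz)
            goto done;
        out[ph.p_filesz - 1] = '\0';   /* the stored string is NUL-terminated */
        rc = 0;
        break;
    }
done:
    fclose(f);
    return rc;
}
```

"readelf -l prog" shows the same string as the "Requesting program interpreter" line.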
About the "GOT", "PLT", ... sections:
These sections contain information about the addresses where the dynamic libraries have been loaded by the linker. The "PLT" section contains wrapper code that will contain jumps to the dynamic libraries. This means: The section "PLT" will contain a function "printf()" that will actually do nothing but jump to the "printf()" function in "libc.so". This is done because directly calling a function in a dynamic library from C code would make linking much more difficult so C code will not call functions in a dynamic library directly. Another advantage of this implementation is that "lazy linking" is possible.
Some words about Windows
Windows only knows dynamically linked executables. Windows XP even refused to load an executable not requiring DLLs. The "dynamic linker" is integrated into the operating system and not a separate file. There is also an equivalent of the "PLT" section. However many compilers support "directly" calling DLL code from C code without calling the code in the PLT section first (theoretically this would also be possible under Linux). Lazy linking is not supported.

You should read this article: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.
It explains in great detail everything you need to create a really tiny program.

Related

backtrace not complete stack trace [duplicate]

In the man page, the backtrace() function on Linux says:
Note that names of "static" functions
are not exposed, and won't be available in the backtrace.
However, with debugging symbols enabled (-g), programs like addr2line and gdb can still get the names of static functions. Is there a way to get the names of static functions programmatically from within the process itself?
Yes, by examining its own executable (/proc/self/exe) using e.g. libbfd or an ELF file parsing library, to parse the actual symbols themselves. Essentially, you'd write C code that does the equivalent of something like
env LANG=C LC_ALL=C readelf -s executable | awk '($5 == "LOCAL" && $8 ~ /^[^_]/ && $8 !~ /\./)'
As far as I know, the dynamic linker interface in Linux (<dlfcn.h>) does not return addresses for static (local) symbols.
A simple and pretty robust approach is to execute readelf or objdump from your program. Note that you cannot give the /proc/self/exe pseudo-file path to those, since it always refers to the executable of the process that opens it (which would be readelf or objdump itself). Instead, you have to use e.g. realpath("/proc/self/exe", NULL) to obtain a dynamically allocated absolute path to the current executable, which you can supply to the command. You also definitely want to ensure the environment contains LANG=C and LC_ALL=C, so that the output of the command is easily parseable (and not localized to whatever language the current user prefers). This may feel a bit kludgy, but it only requires the binutils package to be installed to work, and you don't need to update your program or library to keep up with the latest developments, so I think it is overall a pretty good approach.
Would you like an example?
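Here is a minimal sketch of that approach, assuming readelf from binutils is on the PATH and that the executable's path contains no single quote (it is pasted into a shell command). parse_local_func and print_local_functions are made-up names. The parsing assumes the usual "Num: Value Size Type Bind Vis Ndx Name" column layout of "readelf -s -W".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line of `readelf -s -W` output. Returns 1 and fills *addr and
 * name if the line describes a LOCAL FUNC symbol, 0 otherwise. */
int parse_local_func(const char *line, unsigned long *addr,
                     char *name, size_t name_size)
{
    char type[16], bind[16], sym[256];
    unsigned long value;

    /* Columns: Num: Value Size Type Bind Vis Ndx Name */
    if (sscanf(line, " %*d: %lx %*s %15s %15s %*s %*s %255s",
               &value, type, bind, sym) != 4)
        return 0;
    if (strcmp(type, "FUNC") != 0 || strcmp(bind, "LOCAL") != 0)
        return 0;
    *addr = value;
    snprintf(name, name_size, "%s", sym);
    return 1;
}

/* Run readelf on the current executable and print its local (static)
 * functions to `out`. Returns the number printed, or -1 on error. */
int print_local_functions(FILE *out)
{
    char cmd[4200], line[1024], name[256];
    unsigned long addr;
    int count = 0;

    /* /proc/self/exe must be resolved here, in our own process. */
    char *exe = realpath("/proc/self/exe", NULL);
    if (exe == NULL)
        return -1;
    snprintf(cmd, sizeof cmd,
             "LANG=C LC_ALL=C readelf -s -W '%s' 2>/dev/null", exe);
    free(exe);

    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    while (fgets(line, sizeof line, p) != NULL) {
        if (parse_local_func(line, &addr, name, sizeof name)) {
            fprintf(out, "%#lx %s\n", addr, name);
            count++;
        }
    }
    pclose(p);
    return count;
}
```

For a position-independent executable the printed values are offsets from the load address, so the base has to be added before comparing them with addresses from backtrace().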
One way to make it easier is to generate separate arrays with the symbol information at compile time. Basically, after the object files are generated, a separate source file is generated by running objdump or readelf over the related object files, producing an array of names and pointers similar to
const struct {
    const char *const name;
    const void *const addr;
} local_symbol_names[] = {
    /* Filled in using objdump or readelf and awk, for example */
    { NULL, NULL }
};
perhaps with a simple search function exported in a header file, so that when the final executable is linked, it can easily and efficiently access the array of local symbols.
It does duplicate some data, since the same information is already in the executable file, and if I remember correctly, you have to first link the final executable with a stub array to obtain the actual addresses for the symbols, and then relink with the symbol array, making it a bit of a hassle at compile time. But it avoids having a run-time dependence on binutils.
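Such a table and its search function could look roughly like the sketch below. It is filled in by hand here for two made-up static functions (helper_a, helper_b) instead of being generated, and local_symbol_lookup is a hypothetical name. It relies on the function-pointer-to-void-pointer conversion that GCC accepts as an extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Two static functions standing in for the program's local symbols. */
static int helper_a(int x) { return x + 1; }
static int helper_b(int x) { return x * 2; }

static const struct {
    const char *const name;
    const void *const addr;
} local_symbol_names[] = {
    /* In a real build these entries would be generated, not hand-written. */
    { "helper_a", (const void *)helper_a },
    { "helper_b", (const void *)helper_b },
    { NULL, NULL }
};

/* Return the name of the table entry with the highest start address that
 * is still <= addr, or NULL if no entry starts at or below addr. Symbol
 * sizes are not stored, so an address past the end of the last function
 * is still attributed to it. */
const char *local_symbol_lookup(const void *addr)
{
    const char *best = NULL;
    uintptr_t best_start = 0;

    for (size_t i = 0; local_symbol_names[i].name != NULL; i++) {
        uintptr_t start = (uintptr_t)local_symbol_names[i].addr;
        if (start <= (uintptr_t)addr && start >= best_start) {
            best = local_symbol_names[i].name;
            best_start = start;
        }
    }
    return best;
}
```

Matching on the nearest preceding start address, rather than on exact equality, is what lets a return address from backtrace() resolve to the function that contains it.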
If your executable (and linked libraries) are compiled with debugging information (i.e. with the -g flag to gcc or g++), then you could use Ian Taylor's libbacktrace, which is part of the GCC source tree.
That library (BSD licensed free software) is using DWARF debug information from executables and shared libraries linked by the process. See its README file.
Beware that if you compile with optimizations, some functions could be inlined (even without being explicitly tagged inline in the source code, and static inlined functions might not have any proper own code). Then backtracing won't tell much about them.

how to make shared library an executable

I was searching for an answer to this question and found this link, https://hev.cc/2512.html, which does exactly what I want, but there is no explanation of what is going on. I am also confused about whether a shared library without main() can be made executable, and if so, how. I can guess that I have to provide a global main(), but I know no details. Any easy further reference and guidance is much appreciated.
I am working on x86-64 64 bit Ubuntu with kernel 3.13
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and on systems that operate that way to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define ( and generate code for ) a task which can be started from a single entry point.
The code you linked to is starting with the source code to the library and explicitly codes a main() which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library ( in the absence of security features to prevent this in any given OS ), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possibly be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
To create a shared (dynamic) library, with an example:
Suppose there are three object files: sum.o, mul.o and print.o
Shared library name: "libmno.so"
cc -shared -o libmno.so sum.o mul.o print.o
and compile with
cc main.c ./libmno.so
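The answer does not show the sources behind those object files. As a purely illustrative sketch, they could contain something like the following (sum, mul and print_result are made-up names, shown together in one block for brevity):

```c
#include <stdio.h>

/* sum.c (hypothetical contents) */
int sum(int a, int b)
{
    return a + b;
}

/* mul.c (hypothetical contents) */
int mul(int a, int b)
{
    return a * b;
}

/* print.c (hypothetical contents) */
void print_result(const char *label, int value)
{
    printf("%s = %d\n", label, value);
}
```

main.c would then declare these functions and call them. On x86-64 the object files normally have to be compiled as position-independent code (cc -fPIC -c sum.c and so on) before cc -shared will accept them.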

process of running linux executable

Is there good documentation of what happens when I run an executable in Linux? For example: I start ./a.out, so probably some bootloader assembly is run (it comes with the C runtime?), which finds the start symbol in the program, does dynamic relocation, and finally calls main.
I know the above is not correct, but I am looking for detailed documentation of how this process happens. Can you please explain, or point to links or books that do?
For dynamically linked programs, the kernel detects the PT_INTERP header in the ELF file and first mmaps the dynamic linker (/lib/ld-linux.so.2 or similar), and starts execution at the e_entry address from the main ELF header of the dynamic linker. The initial state of the stack contains the information the dynamic linker needs to find the main program binary (already in memory). It's responsible for reading this and finding all the additional libraries that must be loaded, loading them, performing relocations, and jumping to the e_entry address of the main program.
For statically linked programs, the kernel uses the e_entry address from the main program's ELF header directly.
In either case, the main program begins with a routine written in assembly traditionally called _start (but the name is not important as long as its address is in the e_entry field of the ELF header). It uses the initial stack contents to determine argc, argv, environ, etc. and calls the right implementation-internal functions (usually written in C) to run global constructors (if any) and perform any libc initialization needed prior to the entry to main. This usually ends with a call to exit(main(argc, argv)); or equivalent.
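Part of that initial stack content is the auxiliary vector, which a running program can query for itself. This is a small sketch, assuming getauxval() from <sys/auxv.h> (available in glibc and musl). print_startup_info is a made-up name.

```c
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Hypothetical helper: print a few auxiliary vector entries that the kernel
 * passed to this process and return the main program's entry address. */
unsigned long print_startup_info(FILE *out)
{
    unsigned long entry = getauxval(AT_ENTRY);
    unsigned long phdr  = getauxval(AT_PHDR);
    unsigned long base  = getauxval(AT_BASE);

    fprintf(out, "AT_ENTRY (entry point of the main program): %#lx\n", entry);
    fprintf(out, "AT_PHDR  (its program headers in memory):   %#lx\n", phdr);
    fprintf(out, "AT_BASE  (load address of the interpreter): %#lx\n", base);
    return entry;
}
```

AT_ENTRY is the address the dynamic linker eventually jumps to. AT_BASE is usually 0 in a statically linked program, since no interpreter was loaded.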
The book "Linkers and Loaders" gives a detailed description of the loading process. Maybe it can help you with this problem.

How to get memory locations of library functions?

I am compiling a C program with the SPARC RTEMS C compiler.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
I have also tried using the RCC nm utility, which returns a slightly more readable symbol table. I assume that the location given by this utility for, say, printf, is the location where printf is in memory and that every program that calls printf will reach that location during execution. Is this a valid assumption?
Is there any way to get a list of locations for all the library/system functions? Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library? It seems to me to be the latter, given the number of things I found in the symbol table and memory map. Can I make it link only the required functions?
Thanks for your help.
Most often, when using a dynamic library, the nm utility will not be able to give you the exact answer. Binaries these days use what is known as relocatable addresses. These addresses change when they are mapped to the process' address space.
Using the Xlinker -M option, I am able to get a large memory map with a lot of things I don't recognize.
The linker map will usually have all symbols -- yours, the standard libraries, runtime hooks etc.
Is there any way to get a list of locations for all the library/system functions?
The headers are a good place to look.
Also, when the linking is done, does it link just the functions that the executable calls, or is it all functions in the library?
Linking does not necessarily mean that all symbols will be resolved (i.e. given an address). It depends on the type of binary you are creating.
Some compilers, gcc among them, do let you choose whether or not to create a non-relocatable binary. (For gcc you may check out exp files, dlltool etc.) Check the appropriate documentation.
With dynamic linking,
1. your executable has a special place for all external calls (PLT table).
2. your executable has a list of libraries it depends on
These two things are independent. From the executable alone it is impossible to say which external function lives in which library.
When a program makes an external function call, it actually calls an entry in the PLT, which jumps into the dynamic loader. The dynamic loader determines which function was called (via the PLT), finds its name (via the symbol table in the executable) and looks that name up in ALL libraries that are mapped (all that the given executable depends on). Once the name is found, the address of the corresponding function is written back to the PLT's slot in the GOT, so the next call goes directly to the function, bypassing the dynamic linker.
To answer your question, you should do the same job as dynamic linker does: get a list of dependent libs, and lookup all names in them. This could be done using 'nm' or 'readelf' utility.
As for static linkage, I think all symbols in given object file within libXXX.a get linked in. For example, static library libXXX.a consists of object files a.o, b.o and c.o. If you need a function foo(), and it resides in a.o, then a.o will be linked to your app - together with function foo() and all other data defined in it. This is the reason why for example C library functions are split per file.
If you want to dynamically link you use dlopen/dlsym to resolve UNIX .so shared library entry points.
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Assuming you know the names of the functions you want to call, and which .so they are in, it is fairly simple.
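#include <dlfcn.h>
int call_my_function(void)
{
    int *iptr, (*fptr)(int);
    /* open the needed object */
    void *handle = dlopen("/usr/home/me/libfoo.so", RTLD_LOCAL | RTLD_LAZY);
    if (handle == NULL)
        return -1; /* dlerror() describes the failure */
    /* find the address of function and data objects */
    *(void **)(&fptr) = dlsym(handle, "my_function");
    iptr = (int *)dlsym(handle, "my_object");
    if (fptr == NULL || iptr == NULL) {
        dlclose(handle);
        return -1;
    }
    /* invoke function, passing value of integer as a parameter */
    return (*fptr)(*iptr);
}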
If you want to get a list of all dynamic symbols, objdump -T file.so is your best bet (objdump -t file.a if you're looking for statically bound functions). Objdump is cross-platform and part of binutils, so in a pinch you can copy your binary files to another system and interrogate them with objdump there.
If you want dynamic linking to be optimal, you should take a look at your ld.so.conf, which specifies the search order for the ld.so.cache (so.cache, right ;).
