I would like to list all exported functions in a DLL and dump their bytes. It's pretty trivial to list all the exports using either dumpbin or rabin2 from the radare2 package. I also found a way to disassemble the whole DLL using dumpbin but there's no way to see function boundaries in the dump.
I'm looking for a way to disassemble (with bytes) or ideally just dump the bytes for for a specific or all functions inside a DLL. I don't mind parsing the output if it's got some other information in it. I've tried all kids of tools and so far I was not able to achieve what I need.
One of the possible directions would be to script radare2 to do that.
In order to dump a function's bytes, you will have to know where that function ends.
You could do some static analysis which might work or you could do one of the following:
For 64-bit executables, you can parse the .pdata section which contains a list of RUNTIME_FUNCTIONs. DUMPBIN can do that using either the /unwindinfo or /pdata option.
Note that this may not include every exported function, see reference.
The second option, which works for both 32 and 64-bit executables, is to make use of the DIA SDK
(see IDiaSymbol::get_length). This should cover all exported and non-exported functions but requires you to have access to the executable's .pdb file.
Related
In runtime I'm trying to recover an address of a function that is not exported but is available through shared library's symbols table and therefore is visible to the debugger.
I'm working on advanced debugging procedure that needs to capture certain events and manipulate runtime. One of the actions requires knowledge of an address of a private function (just the address) which is used as a key elsewhere.
My current solution calculates offset of that private function relative to a known exported function at build time using nm. This solution restricts debugging capabilities since it depends on a particular build of the shared library.
The preferable solution should be capable of recovering the address in runtime.
I was hoping to communicate with the attached debugger from within the app, but struggle to find any API for that.
What are my options?
In runtime I'm trying to recover an address of a function that is not exported but is available through shared library's symbols table and therefore is visible to the debugger.
Debugger is not a magical unicorn. If the symbol table is available to the debugger, it is also available to your application.
I need to recover its address by name using the debugger ...
That is entirely wrong approach.
Instead of using the debugger, read the symbol table for the library in your application, and use the info gained to call the target function.
Reading ELF symbol table is pretty easy. Example. If you are not on ELF platform, getting equivalent info should not be much harder.
In lldb you can quickly find the address by setting a symbolic breakpoint if it's known to the debugger by whatever means:
b symbolname
If you want call a non exported function from a library without a debugger attached there are couple of options but each will not be reliable in the long run:
Hardcode the offset from an exported library and call exportedSymbol+offset (this will work for a particular library binary version but will likely break for anything else)
Attempt to search for a binary signature of your nonexported function in the loaded library. (slightly less prone to break but the binary signature might always change)
Perhaps if you provide more detailed context what are you trying achieve better options can be considered.
Update:
Since lldb is somehow aware of the symbol I suspect it's defined in Mach-O LC_SYMTAB load command of your library. To verify that you could inspect your lib binary with tools like MachOView or MachOExplorer . Or Apple's otool or Jonathan Levin's jtool/jtool2 in console.
Here's an example from very 1st symbol entry yielded from LC_SYMTAB in MachOView. This is /usr/lib/dyld binary
In the example here 0x1000 is virtual address. Your library most likely will be 64bit so expect 0x10000000 and above. The actual base gets randomized by ASLR, but you can verify the current value with
sample yourProcess
yourProcess being an executable using the library you're after.
The output should contain:
Binary Images:
0x10566a000 - 0x105dc0fff com.apple.finder (10.14.5 - 1143.5.1) <3B0424E1-647C-3279-8F90-4D374AA4AC0D> /System/Library/CoreServices/Finder.app/Contents/MacOS/Finder
0x1080cb000 - 0x1081356ef dyld (655.1.1) <D3E77331-ACE5-349D-A7CC-433D626D4A5B> /usr/lib/dyld
...
These are the loaded addresses 0x100000000 shifted by ASLR. There might be more nuances how exactly those addresses are chosen for dylibs but you get the idea.
Tbh I've never needed to find such address programmatically but it's definitely doable (as /usr/bin/sample is able to do it).
From here to achieve something practically:
Parse Mach-o header of your lib binary (check this & this for starters)
Find LC_SYMTAB load command
Find your symbol text based entry and find the virtual address (the red box stuff)
Calculate ASLR and apply the shift
There is some C Apple API for parsing Mach-O. Also some Python code exists in the wild (being popular among reverse engineering folks).
Hope that helps.
I want to build a library which is relocatable (ie. nothing other than local variables. I also want to force the location of the library to be at a fixed location in memory. I think this has to be done in the makefile, but I am confused as to what I have to do to force the library to be loaded at a fixed location. This is using mb-gcc.
The reason I need this is I want to write a loader where I dont want to clobber over the code that is actually doing the copy of the other program. So I want the program that is doing the copying to be located somewhere else at a location that is not being used (ie. ddr).
If I have all the functions that do the compiled into a library, what special makefile arguments do I need to force this to be loaded at location 0x80000000 for example.
Any help would be greatly appreciated. Thanks in advance.
You write a linker script, and tell the compiler/linker to use it by using the -T script.ld option (to gcc and/or ld, depending on how you build your firmware files).
In your library C source files, you can use the __attribute__((section ("name"))) syntax to put your functions and variables into a specific section. The linker script can then decide where to put each section -- often at a fixed address for these kinds of devices. (You'll often see macro declarations like #define FIRMWARE __attribute__((section(".text.firmware"))) or similar, to make the code easier to read and understand.)
If you create a separate firmware file just for your library, then you don't need to add the attributes to your code, just write the linker script to put the .text (executable code), .rodata (read-only constants), and .bss (uninitialized variables) sections at suitable addresses.
A web search for microblaze "linker script" finds some useful examples, and even more guides. Some of them must be suitable for your tools.
For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want write it to a file, so that I can use the system-call mmap to load it with PROT_EXEC and I can call it from another program.What should I do for it?
If you know what signature you need and a static library or the location of a shared library at compile time, you probably just want to include the header and link against the output library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the libary dynamically and retrieving the address of the function by name.
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B); modular architectures are often better (and more securely) achieved through inter-process communication (IPC), since dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with interprocess-communication, each component would be a separate process, and individual components would only have access to information that was given to it explicitly by the calling process, which would make it more difficult for a malicious component to steal data from the caller or other components or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that and mmaped then called it... it should work.
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!
When I debug any program with debugger (for example OllyDbg), in disassembled assembly code, I can see function names, for example:
push 0
call msvcrt.exit
How does the debugger know the function names? Where do they come from? In machine code, it is represented as call address. So how debugger knows it?
Compilers generate "symbols" files, providing to debuggers a way to show the name of a symbol that corresponds to a particular address or an offset. This is highly system-dependent: for example, VS toolchain on Windows places these symbols in separate .pdb files, while on some UNIX flavors these debug symbols are embedded into the executable. EDIT : According to the comments, OllyDbg pulls symbols from the Import Address Table embedded in executable files.
When symbols are embedded into the executable, compiler vendors provide a tool to remove these symbols. For example, GNU provides the strip utility to work with their toolchain.
Don't get me wrong by looking at the question title - I know what they are (format for portable executable files). But my interest scope is slightly different
MY CONFUSION
I am involved in re-hosting/retargeting applications that are originally from third parties. The problem is that sometimes the formats for object codes are also in .elf, .COFF formats and still says, "Executable and linkable".
I am primarily a Windows user and know that when you compile and assemble your C/C++ code, you get something similar to .o or .obj. that are not executable (well, I never tried to execute them). But when you complete linking static and dynamic libraries and finish building, the executable appears. My understanding is that you can then go about and link that executable or "bash" test it with some form of script if necessary.
However, in Linux (or UNIX-like systems) there are .o files after you compile and assemble the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib. In my quick web search none of the sources mentioned anything about .o files as executables.
QUESTIONS
Therefore my question turns into the followings:
What is the true definitions for portable executables and object code?
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Am I misinterpreting "Linkable"? My interpretation of "Linkable" is something that is compiled object code and can then be "linked" to other static/dynamic link libraries. Is this a stupid thought?
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the executables/object files on a different machine.
Thanks in advance for the right nudges. Meanwhile, I will keep digging and update the question if necessary.
UPDATE
I have managed to dig this out from somewhere :( Seems like a lot to swallow to me.
I am primarily a Windows user and know that when you compile your C/C++ code, you get something similar to .o or .obj. that are not executable
Well, last time I compiled stuff on Windows, the result of the compilation was an .obj file, which is exactly what its name suggests: it's an object file. You're right in that it's not an executable in itself. It contains machine code which doesn't (yet) contain enough information to be directly run on the CPU.
However, in Linux (or UNIX-like systems) there are .o files after you compile the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib.
Living in the 90's, that is :P No modern compilers I am aware of target the a.out format as their default output format for object code. Maybe it's a misleading default of GCC to put the object code into a file called a.out when no explicit output file name is specified, but if you run the file command on a.out, you'll find out that it's an ELF file. The a.out format is ancient and it's kind of "de facto obsolete".
What is the true definitions for portable executables and object code?
You've already got the Wikipedia link to object files, here's the one to "Portable Executable".
How is it that Windows and UNIX platform covers both executables annd object code under the same file format (.COFF, .elf).
Because the ELF format (and apparently COFF too) has been designed like so. And why not? It's just the very same machine code after all, it seems quite logical to use one file format during all the compilation steps. Just like we don't like when dynamic libraries and stand-alone executables have a different format. (That's why ELF is called ELF - it's an "Executable and Linkable Format".)
Am I misinterpreting "Linkable"?
I don't know. From your question it's not clear to me what you think "linkable" is. In general, it means that it's a file that can be linked against, i. e. a library.
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the object files on a different machine.
I think this one is not related to the executable format used. If you want to debug, you have to generate debugging information no matter what. But if you don't need to debug, then you're free to omit them of course.