My embedded projects have a post-process step that replaces a value in the executable with the CRC of (some sections of) the flash. This step can only be done after linking since that is the first opportunity to CRC the image. In the past the file format was COFF, and I have created a custom tool to do the patching.
The development tool has switched to ELF, so I need to re-implement the CRC patcher. Before I do, I thought I'd look for an existing tool to do this. The compiler is based on gcc, but I can't see any combination of ld and nm and readelf that can do the job. A Google search has not been fruitful.
My present tool uses nm to find the address to patch, and calls the patcher with the address, the expected value (to prevent overwriting the wrong data), and the new CRC value. The CRC is calculated on a "hex" format of the executable (that I also patch) so fortunately I don't have to redo that part.
I can implement this with libelf and custom code again, but before I do, does it already exist?
Is there a better way to accomplish my goal of putting a CRC of the executable into the executable so it's available to the application?
If I've understood what you're trying to do correctly, I think the following would work:
nm gives you the runtime virtual address of the location you want to patch;
readelf -S gives you both the runtime virtual address and the offset within the file for the beginning of each section;
stitching the two together (e.g. with a few lines of your favourite scripting language) gives the offset within the file to patch.
I'm not sure if this would work, but you might be able to arrange it so that the CRC location within your object file were to be set to the address of an external symbol X. That external symbol might then be satisfied by a last linking step by linking in an elf file that did nothing but specify that X's address was the CRC that you have calculated.
This is still pretty hacky, and I'm not sure if it's easily do-able (since it is such an abuse of the tools).
Related
Foreword
There already exist questions like this one, but the ones I have found so far were either specific to a given toolchain (this solution works with GCC, but not with Clang), or specific to a given format (this one is specific to Mach-O). I tagged ELF in this question merely out of familiarity, but I'm trying to figure out as "portable" a solution as reasonably possible, for other platforms. Anything that can be done with GCC and GCC-compatible toolchains (like MinGW and Clang) would suffice for me.
Problem
I have an ELF section containing a collection of relocatable "gadgets" that are to be copied/injected into something that can execute them. They are completely relocatable in that the raw bytes of the section can be copied verbatim (e.g. by memcpy) to any [correctly aligned] memory location, with little to no relocation processing (done by the injector). The problem is that I don't have a portable way of determining the size of such a section. I could "cheat" a little by using --section-start, and outright determine a start address, but that feels a little hacky, and I still wouldn't have a way to get the section's end/size.
Overview
Most of the gadgets in the section are written in assembly, so the section itself is declared there. No languages were tagged because I'm trying to be portable with this and get it working for various architectures/platforms. I have separate assembly sources for each (e.g. ARM, AArch64, x86_64, etc).
; The injector won't be running this code, so no need to be executable.
; Relocations (if any) will be done by the injector.
.section gadgets, "a", #progbits
...
The more "heavy duty" code is written in C and compiled in via a section attribute.
__attribute__((section("gadgets")))
void do_totally_innocent_things();
Alternatives
I technically don't need to make use of sections like this at all. I could instead figure out the ends of each function in the gadget, and then copy those however I like. I figured using a section would be a more straightforward way to go about it, to keep everything in one modular relocatable bundle.
I'm not sure if you considered this or this is out of picture but you could read the elf headers.
This would be sort of 'universal' as you can do the same thing with Mach-O binaries
So for example:
Creating 3 integer variables inside the 'custom_sect' section
These would add up to 12 or 0xC bytes which if we read the headers we can confirm:
Size with readelf
and here is how a section is represented in the ELF executables: ELF section representation
So each section will have its own size property which you can just read out
In runtime I'm trying to recover an address of a function that is not exported but is available through shared library's symbols table and therefore is visible to the debugger.
I'm working on advanced debugging procedure that needs to capture certain events and manipulate runtime. One of the actions requires knowledge of an address of a private function (just the address) which is used as a key elsewhere.
My current solution calculates offset of that private function relative to a known exported function at build time using nm. This solution restricts debugging capabilities since it depends on a particular build of the shared library.
The preferable solution should be capable of recovering the address in runtime.
I was hoping to communicate with the attached debugger from within the app, but struggle to find any API for that.
What are my options?
In runtime I'm trying to recover an address of a function that is not exported but is available through shared library's symbols table and therefore is visible to the debugger.
Debugger is not a magical unicorn. If the symbol table is available to the debugger, it is also available to your application.
I need to recover its address by name using the debugger ...
That is entirely wrong approach.
Instead of using the debugger, read the symbol table for the library in your application, and use the info gained to call the target function.
Reading ELF symbol table is pretty easy. Example. If you are not on ELF platform, getting equivalent info should not be much harder.
In lldb you can quickly find the address by setting a symbolic breakpoint if it's known to the debugger by whatever means:
b symbolname
If you want call a non exported function from a library without a debugger attached there are couple of options but each will not be reliable in the long run:
Hardcode the offset from an exported library and call exportedSymbol+offset (this will work for a particular library binary version but will likely break for anything else)
Attempt to search for a binary signature of your nonexported function in the loaded library. (slightly less prone to break but the binary signature might always change)
Perhaps if you provide more detailed context what are you trying achieve better options can be considered.
Update:
Since lldb is somehow aware of the symbol I suspect it's defined in Mach-O LC_SYMTAB load command of your library. To verify that you could inspect your lib binary with tools like MachOView or MachOExplorer . Or Apple's otool or Jonathan Levin's jtool/jtool2 in console.
Here's an example from very 1st symbol entry yielded from LC_SYMTAB in MachOView. This is /usr/lib/dyld binary
In the example here 0x1000 is virtual address. Your library most likely will be 64bit so expect 0x10000000 and above. The actual base gets randomized by ASLR, but you can verify the current value with
sample yourProcess
yourProcess being an executable using the library you're after.
The output should contain:
Binary Images:
0x10566a000 - 0x105dc0fff com.apple.finder (10.14.5 - 1143.5.1) <3B0424E1-647C-3279-8F90-4D374AA4AC0D> /System/Library/CoreServices/Finder.app/Contents/MacOS/Finder
0x1080cb000 - 0x1081356ef dyld (655.1.1) <D3E77331-ACE5-349D-A7CC-433D626D4A5B> /usr/lib/dyld
...
These are the loaded addresses 0x100000000 shifted by ASLR. There might be more nuances how exactly those addresses are chosen for dylibs but you get the idea.
Tbh I've never needed to find such address programmatically but it's definitely doable (as /usr/bin/sample is able to do it).
From here to achieve something practically:
Parse Mach-o header of your lib binary (check this & this for starters)
Find LC_SYMTAB load command
Find your symbol text based entry and find the virtual address (the red box stuff)
Calculate ASLR and apply the shift
There is some C Apple API for parsing Mach-O. Also some Python code exists in the wild (being popular among reverse engineering folks).
Hope that helps.
As far as I know compiler convert source code to machine code. But this code do not have any OS-related sections and linker add them to file.
But is it's possible to make some executable without linker?
Answering your question very literally - yes, it is possible to make an executive file without a linker: you don't need a compiler or linker to generate machine code. Binaries are a series of opcodes and relevant information (offsets, addresses etc). If you open a binary editor then type out some opcodes and make a program. Save and run it.
Of course the binary will be processor specific, just as if you had compiled a binary (native) executive. Here's a reference to the Intel x86 opcodes.
http://ref.x86asm.net/coder32.html.
If you're however asking, "Can I compile a source file directly into an executive file without a linker?" then speaking purely: no - unless the compiler has aspects of a linker integrated within it. The compiler generates intermediate objects that are passed on to the linker to "link" them into a binary such as a library or executive. Without the link step the pipeline is not complete.
Let's first make a statement that is to be considered true, compilers do not generate machine code that can be immediately executed (JIT's do, but lets ignore that).
Instead they generate files (object, static, dynamic, executable) which describe what they contains as well as groups of symbols. Symbols can be global variables or functions.
But symbols just like the file itself contain metadata. This metadata is very important. See the machine code stored in a symbol is the raw instructions for the target architecture but it does not know where memory is stored.
While modern CPU's give each process its own address space, a symbol may not land and probably won't land in the same address twice. In very recent times this is a security measure, but in past its so that dynamic linking works correctly.
So when the OS loads up an executable or shared library it can place it wherever it wants and by doing so make it not repeatable. Otherwise we'd all have to start caring and saying "this file contains 100% of the code I intend to execute". Usually on load the raw binary in the symbol table get transformed by patching it with the symbol locations in RAM. Making everything just work.
In summary the compiler emits files that allow for dynamic patching of assembly
prior to execution. If it didn't, we would be living in a very restrictive and problematic world.
Linkers even have scripts to change how they operate. They are a very complex and delicate piece of software required to make our programs work.
Have a read of the PE-COFF and ELF standards if you want to get an idea of just how complex those formats really are.
I want to write c program (like backtrace) I am getting the address of functions but I don't know how to convert those address to symbols (function name ). Please help me
The first answer is that the symbol handling is an internal ABI hidden away. Some OS even perform this magic in kernel space. But you obviously want ARM + linux.
First information needed is to map addresses back to their origin. This mapping you can retrieve from here: /proc/self/stat
Next part is more tricky, reversing those offsets from those files into symbols. For that you will need to parse the ELF files. If you do not want to parse the binary data, you can cheat and use objdump and parse the ASCII formatted data instead
http://man7.org/linux/man-pages/man5/elf.5.html
objdump -t -T -r -R /lib/x86_64-linux-gnu/libnss_files-2.19.so
If you want even more details information than this, you will need to parse the section that contains the debug information if present - and it might be moved to a separate file to allow apt to have those nice -dbg packages, but that is probably way to much work, and easier to hack gdb instead or extract code from projects like valgrind.
PS: If your use-case is to perform debug/diagnostics when things go wrong, I will recommend to use valgrind
I am working on previously developed software and source code is compiled as linux shared libraries (.so) and source code is not present. Is there any tool which can extract source code from the linux shared libraries?
Thanks,
Ravi
There isn't. Once you compile your code there is no trace of it left in the binary, only machine code.
Some may mention decompilers but those don't extract the source, they analyze the executable and produce some source that should have the same effect as the original one did.
You can try disassembling the object code and get the machine code mnemonics.
objdump -D --disassembler-options intel sjt.o to get Intel syntax assembly
objdump -D --disassembler-options att sjt.o or objdump -D sjt.o to get AT&T syntax assembly
But the original source code could never be found. You might try to reverse the process by studying and reconstruct the sections. It would be hell pain.
Disclaimer: I work for Hex-Rays SA.
The Hex-Rays decompiler is the only commercially available decompiler I know of that works well with real-life x86 and ARM code. It's true that you don't get the original source, but you get something which is equivalent to it. If you didn't strip your binary, you might even get the function names, or, with some luck, even types and local variables. However, even if you don't have symbol info, you don't have to stick to the first round of decompilation. The Hex-Rays decompiler is interactive - you can rename any variable or function, change variable types, create structure types to represent the structures in the original code, add comments and so on. With a little work you can recover a lot. And quite often what you need is not the whole original file, but some critical algorithm or function - and this Hex-Rays can usually provide to you.
Have a look at the demo videos and the comparison pages. Still think "staring at the assembly" is the same thing?
No. In general, this is impossible. Source is not packaged in compiled objects or libraries.
You cannot. But you can open it as an archive in 7-Zip. You can see the file type and size of each file separately in that. You can replace the files in it with your custom files.