How can I extract an array from an executable file?

How can I extract an array from an executable file? - c

I want to do the inverse of this question.
I am embedding a file into an executable as an array, and I would later like to extract the embedded file from the executable.
It seems like objcopy might be useful here but I haven't figured out the proper incantation yet.
(Edit: clarify question, I somehow removed the crux of it in editing originally...)

If you place the embedded file within its own section you can use objcopy to extract that section into a raw output file, I think.
Look into gcc's __attribute__((section("embedded_file") ))
Or if you are getting the file into the exe some other way using the linker you should be able to get it into another section another way, but I'm not familiar with doing that.

Put a recognizable pattern at the beginning of the array to help you find the data in the file.
If you're creating a Windows executable, put the data into a binary resource in the executable instead of just encoding it into an array -- you can then use normal Windows resource functions (FindResource, LoadResource, etc.) to get the data (though this is a bit trickier to get working correctly than it initially seems like it should be).

Related

How to retrieve function name from function address using link register (like backtrace_symbol) in linux

I want to write c program (like backtrace) I am getting the address of functions but I don't know how to convert those address to symbols (function name ). Please help me

The first answer is that the symbol handling is an internal ABI hidden away. Some OS even perform this magic in kernel space. But you obviously want ARM + linux.
First information needed is to map addresses back to their origin. This mapping you can retrieve from here: /proc/self/stat
Next part is more tricky, reversing those offsets from those files into symbols. For that you will need to parse the ELF files. If you do not want to parse the binary data, you can cheat and use objdump and parse the ASCII formatted data instead
http://man7.org/linux/man-pages/man5/elf.5.html
objdump -t -T -r -R /lib/x86_64-linux-gnu/libnss_files-2.19.so
If you want even more details information than this, you will need to parse the section that contains the debug information if present - and it might be moved to a separate file to allow apt to have those nice -dbg packages, but that is probably way to much work, and easier to hack gdb instead or extract code from projects like valgrind.
PS: If your use-case is to perform debug/diagnostics when things go wrong, I will recommend to use valgrind

Add resources to ELF object, grouped in one section

A small program I made contains a lot of small bitmaps and sound clips that I would prefer to include into the binary itself (they need to be memory mapped anyway). In the MS PE/COFF standard, there is a specific description on how to include resources (the .rsrc section) that has a nice file system-like hierarchy. I have not found anything like that in the Linux ELF specification, thus I assume one is free to include these resources as seemed fit.
What I want to achieve is that I can include all resources in only one ELF section with a symbolic name on the start of each resource (so that I can address them from my C code). What I am doing now is using a small NASM file that has the following layout:
SECTION .rsrc
_resource_1:
incbin "../rsrc/file_name_1"
_resource_1_length:
dw $-resource_1
_resource_2:
incbin "../rsrc/file_name_2"
_resource_2_length:
dw $-resource_2
...
I can easily assemble this to an ELF object that can be linked with my C code. However, I dislike the use of assembly as that makes my code platform-dependent.
What would be a better way to achieve the same result?
This question has already been asked on stackoverflow, but the proposed solutions are not applicable to my case:
The solution proposed over here: C/C++ with GCC: Statically add resource files to executable/library
Including the resources as hex arrays in C code is not really useful, as that mixes the code and the data in one section. (Besides, it's not practical either, as I can't preview the resources once they are converted to arrays)
Using objcopy --add-section on every resource works, but then every resource gets its own section (including header and all that). That seems a little wasteful as I have around 120 files to include (each of +/- 4K).

You're wrong saying that using hexarrays mixes data and code, as ELF files will split them by default, in particular, if you define the hexarray as a constant array, it'll end up in .rodata. See an old post of mine for more details on .rodata.
Adding resources with objcopy should create multiple sections in the object file, but then they should all be merged in the output executable, although then you would have some extra padding almost certainly. Another post on a related topic.
Your other alternative if you want to go from the actual binary file (say a PNG) to ELF, you can use ldscripts, which allow you to build ELF files with arbitrary sections/symbols and reading the data from files. You'll still need custom rules to build your ELF file.
I'm actually surprised this kind of resource management is not used more often for ELF, especially since, for many small files, it'll improve filesystem performance quite quickly, as then you only have one file to map rather than many.

If your resource is not too large, you can translate them into C/C++ source code, for example, as a unsigned char array. Then you can access them as global variables, and compile & link them like normal source code.

How linkers work

I need to understand how linkers work and how a linker makes an .exe with all its different sections and segments. To be more precise, I wanna be atleast able to make an .exe which contains the following code, with my hands using a hex editor.
int main()
{
}
Please suggest me some necessary reading. I have Linkers & Loaders by John Levine.

Ian Lance Taylor wrote a new ELF linker, called gold. He described the general tasks that linkers do in a series of blog posts.
This is different in details from what a PE linker does, but the general ideas are all the same (and same as in "Linkers & Loaders".
I wanna be atleast able to make an .exe which contains the following code, with my hands using a hex editor
I am not sure why you want a linker in that case. Read the description of PE file format, construct the necessary headers, write a tiny .text section containing xor %eax,%eax; ret, set the entry point to that section, ..., profit?

How (static) linkers work:
Take all necessary object files (module), merging/combining/flattening the addresses so that inter-module calls are possible and correct, structuring them to fulfill the executable format requirements, and perhaps do link time optimizations.
To create an executable (note that here I'm not referring to any specific format), you have to know the executable format, usually consists of headers and sections, plus any required startup code (compilers usually already provided this as an object file, precompiled). For PE (.exe) files, MSDN has a good explanation.

How to extract C source code from .so file?

I am working on previously developed software and source code is compiled as linux shared libraries (.so) and source code is not present. Is there any tool which can extract source code from the linux shared libraries?
Thanks,
Ravi

There isn't. Once you compile your code there is no trace of it left in the binary, only machine code.
Some may mention decompilers but those don't extract the source, they analyze the executable and produce some source that should have the same effect as the original one did.

You can try disassembling the object code and get the machine code mnemonics.
objdump -D --disassembler-options intel sjt.o to get Intel syntax assembly
objdump -D --disassembler-options att sjt.o or objdump -D sjt.o to get AT&T syntax assembly
But the original source code could never be found. You might try to reverse the process by studying and reconstruct the sections. It would be hell pain.

Disclaimer: I work for Hex-Rays SA.
The Hex-Rays decompiler is the only commercially available decompiler I know of that works well with real-life x86 and ARM code. It's true that you don't get the original source, but you get something which is equivalent to it. If you didn't strip your binary, you might even get the function names, or, with some luck, even types and local variables. However, even if you don't have symbol info, you don't have to stick to the first round of decompilation. The Hex-Rays decompiler is interactive - you can rename any variable or function, change variable types, create structure types to represent the structures in the original code, add comments and so on. With a little work you can recover a lot. And quite often what you need is not the whole original file, but some critical algorithm or function - and this Hex-Rays can usually provide to you.
Have a look at the demo videos and the comparison pages. Still think "staring at the assembly" is the same thing?

No. In general, this is impossible. Source is not packaged in compiled objects or libraries.

You cannot. But you can open it as an archive in 7-Zip. You can see the file type and size of each file separately in that. You can replace the files in it with your custom files.

Tool for simple modification of elf file?

My embedded projects have a post-process step that replaces a value in the executable with the CRC of (some sections of) the flash. This step can only be done after linking since that is the first opportunity to CRC the image. In the past the file format was COFF, and I have created a custom tool to do the patching.
The development tool has switched to ELF, so I need to re-implement the CRC patcher. Before I do, I thought I'd look for an existing tool to do this. The compiler is based on gcc, but I can't see any combination of ld and nm and readelf that can do the job. A Google search has not been fruitful.
My present tool uses nm to find the address to patch, and calls the patcher with the address, the expected value (to prevent overwriting the wrong data), and the new CRC value. The CRC is calculated on a "hex" format of the executable (that I also patch) so fortunately I don't have to redo that part.
I can implement this with libelf and custom code again, but before I do, does it already exist?
Is there a better way to accomplish my goal of putting a CRC of the executable into the executable so it's available to the application?

If I've understood what you're trying to do correctly, I think the following would work:
nm gives you the runtime virtual address of the location you want to patch;
readelf -S gives you both the runtime virtual address and the offset within the file for the beginning of each section;
stitching the two together (e.g. with a few lines of your favourite scripting language) gives the offset within the file to patch.

I'm not sure if this would work, but you might be able to arrange it so that the CRC location within your object file were to be set to the address of an external symbol X. That external symbol might then be satisfied by a last linking step by linking in an elf file that did nothing but specify that X's address was the CRC that you have calculated.
This is still pretty hacky, and I'm not sure if it's easily do-able (since it is such an abuse of the tools).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight