I'm trying to write a function that copies a function (and ends up modify its assembly) and returns it. This works fine for one level of indirection, but at two I get a segfault.
Here is a minimum (not)working example:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I can memcpy into an mmap'ed area once to create a valid function that can be called (g()()). But if I try to apply it again (h()()()) it segfaults. I have confirmed that it correctly creates the copied version of g, but when I execute that version I get a segfault.
Is there some reason why I can't execute code in one mmap'ed area from another mmap'ed area? From exploratory gdb-ing with x/i checks it seems like I can call down successfully, but when I return the function I came from has been erased and replaced with 0s.
How can I get this behaviour to work? Is it even possible?
BIG EDIT:
Many have asked for my rationale as I am obviously doing an XY problem here. That is true and intentional. You see, a little under a month ago this question was posted on the code golf stack exchange. It also got itself a nice bounty for a C/Assembly solution. I gave some idle thought to the problem and realized that by copying a functions body while stubbing out an address with some unique value I could search its memory for that value and replace it with a valid address, thus allowing me to effectively create lambda functions that take a single pointer as an argument. Using this I could get single currying working, but I need the more general currying. Thus my current partial solution is linked here. This is the full code that exhibits the segfault I am trying to avoid. While this is pretty much the definition of a bad idea, I find it entertaining and would like to know if my approach is viable or not. The only thing I'm missing is ability to run a function created from a function, but I can't get that to work.
The code is using relative calls to invoke mmap and memcpy so the copied code ends up calling an invalid location.
You can invoke them through a pointer, e.g.:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
void* (*mmap_ptr)(void *addr, size_t length, int prot, int flags,
int fd, off_t offset) = mmap;
void* (*memcpy_ptr)(void *dest, const void *src, size_t n) = memcpy;
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I'm trying to write a function that copies a function
I think that is pragmatically not the right approach, unless you know very well machine code for your platform (and then you would not ask the question). Be aware of position independent code (useful because in general mmap(2) would use ASLR and give some "randomness" in the addresses). BTW, genuine self-modifying machine code (i.e. changing some bytes of some existing valid machine code) is today cache and branch-predictor unfriendly and should be avoided in practice.
I suggest two related approaches (choose one of them).
Generate some temporary C file (see also this), e.g. in /tmp/generated.c, then fork a compilation using gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so of it into a plugin, then dlopen(3) (for dynamic loading) that /tmp/generated.so shared object plugin (and probably use dlsym(3) to find function pointers in it...). For more about shared objects, read Drepper's How To Write Shared Libraries paper. Today, you can dlopen many hundreds of thousands of such shared libraries (see my manydl.c example) and C compilers (like recent GCC) are fast enough to compile a few thousand lines of code in a time compatible with interaction (e.g. less than a tenth of second). Generating C code is a widely used practice. In practice you would represent some AST in memory of the generated C code before emitting it.
Use some JIT compilation library, such as GCCJIT, or LLVM, or libjit, or asmjit, etc.... which would generate a function in memory, do the required relocations, and give you some pointer to it.
BTW, instead of coding in C, you might consider using some homoiconic language implementation (such as SBCL for Common Lisp, which compiles to machine code at every REPL interaction, or any dynamically contructed S-expr program representation).
The notions of closures and of callbacks are worthwhile to know. Read SICP and perhaps Lisp In Small Pieces (and of course the Dragon Book, for general compiler culture).
this question was posted on code golf.SE
I updated the 8086 16-bit code-golf answer on the sum-of-args currying question to include commented disassembly.
You might be able to use the same idea in 32-bit code with a stack-args calling convention to make a modified copy of a machine code function that tacks on a push imm32. It wouldn't be fixed-size anymore, though, so you'd need to update the function size in the copied machine code.
In normal calling conventions, the first arg is pushed last, so you can't just append another push imm32 before a fixed-size call target / leave / ret trailer. If writing a pure asm answer, you could use an alternate calling convention where args are pushed in the other order. Or you could have a fixed-size intro, then an ever-growing sequence of push imm32 + call / leave / ret.
The currying function itself could use a register-arg calling convention, even if you want the target function to use i386 System V for example (stack args).
You'd definitely want to simplify by not supporting args wider than 32 bit, so no structs by value, and no double. (Of course you could chain multiple calls to the currying function to build up a larger arg.)
Given the way the new code-golf challenge is written, I guess you'd compare the total number of curried args against the number of args the target "input" function takes.
I don't think there's any chance you can make this work in pure C with just memcpy; you have to modify the machine code.
Related
This question already has answers here:
How to write self-modifying code in x86 assembly
(7 answers)
Closed 6 years ago.
Is there any way to put processor instructions into array, make its memory segment executable and run it as a simple function:
int main()
{
char myarr[13] = {0x90, 0xc3};
(void (*)()) myfunc = (void (*)()) myarr;
myfunc();
return 0;
}
On Unix (these days, that means "everything except Windows and some embedded and mainframe stuff you've probably never heard of") you do this by allocating a whole number of pages with mmap, writing the code into them, and then making them executable with mprotect.
void execute_generated_machine_code(const uint8_t *code, size_t codelen)
{
// in order to manipulate memory protection, we must work with
// whole pages allocated directly from the operating system.
static size_t pagesize;
if (!pagesize) {
pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) fatal_perror("getpagesize");
}
// allocate at least enough space for the code + 1 byte
// (so that there will be at least one INT3 - see below),
// rounded up to a multiple of the system page size.
size_t rounded_codesize = ((codelen + 1 + pagesize - 1)
/ pagesize) * pagesize;
void *executable_area = mmap(0, rounded_codesize,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0);
if (!executable_area) fatal_perror("mmap");
// at this point, executable_area points to memory that is writable but
// *not* executable. load the code into it.
memcpy(executable_area, code, codelen);
// fill the space at the end with INT3 instructions, to guarantee
// a prompt crash if the generated code runs off the end.
// must change this if generating code for non-x86.
memset(executable_area + codelen, 0xCC, rounded_codesize - codelen);
// make executable_area actually executable (and unwritable)
if (mprotect(executable_area, rounded_codesize, PROT_READ|PROT_EXEC))
fatal_perror("mprotect");
// now we can call it. passing arguments / receiving return values
// is left as an exercise (consult libffi source code for clues).
((void (*)(void)) executable_area)();
munmap(executable_area, rounded_codesize);
}
You can probably see that this code is very nearly the same as the Windows code shown in cherrydt's answer. Only the names and arguments of the system calls are different.
When working with code like this, it is important to know that many modern operating systems will not allow you to have a page of RAM that is simultaneously writable and executable. If I'd written PROT_READ|PROT_WRITE|PROT_EXEC in the call to mmap or mprotect, it would fail. This is called the W^X policy; the acronym stands for Write XOR eXecute. It originates with OpenBSD, and the idea is to make it harder for a buffer-overflow exploit to write code into RAM and then execute it. (It's still possible, the exploit just has to find a way to make an appropriate call to mprotect first.)
Depends on the platform.
For Windows, you can use this code:
// Allocate some memory as readable+writable
// TODO: Check return value for error
LPVOID memPtr = VirtualAlloc(NULL, sizeof(myarr), MEM_COMMIT, PAGE_READWRITE);
// Copy data
memcpy(memPtr, myarr, sizeof(myarr);
// Change memory protection to readable+executable
// Again, TODO: Error checking
DWORD oldProtection; // Not used but required for the function
VirtualProtect(memPtr, sizeof(myarr), PAGE_EXECUTE_READ, &oldProtection);
// Assign and call the function
(void (*)()) myfunc = (void (*)()) memPtr;
myfunc();
// Free the memory
VirtualFree(memPtr, 0, MEM_RELEASE);
This codes assumes a myarr array as in your question's code, and it assumes that sizeof will work on it i.e. it has a directly defined size and is not just a pointer passed from elsewhere. If the latter is the case, you would have to specify the size in another way.
Note that here there are two "simplifications" possible, in case you wonder, but I would advise against them:
1) You could call VirtualAlloc with PAGE_EXECUTE_READWRITE, but this is in general bad practice because it would open an attack vector for unwanted code exeuction.
2) You could call VirtualProtect on &myarr directly, but this would just make a random page in your memory executable which happens to contain your array executable, which is even worse than #1 because there might be other data in this page as well which is now suddenly executable as well.
For Linux, I found this on Google but I don't know much about it.
Very OS-dependent: not all OSes will deliberately (read: without a bug) allow you to execute code in the data segment. DOS will because it runs in Real Mode, Linux can also with the appropriate privileges. I don't know about Windows.
Casting is often undefined and has its own caveats, so some elaboration on that topic here. From C11 standard draft N1570, §J.5.7/1:
A pointer to an object or to void may
be cast to a pointer to a function, allowing data to be invoked as a
function (6.5.4).
(Formatting added.)
So, it's perfectly fine and should work as expected. Of course, you would need to cohere to the ABI's calling convention.
Im trying to copy a function i have to an executable page and run it from there, but i seem to be having some problems.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
int foo()
{
return 4;
}
int goo()
{
return 5;
}
int main()
{
int foosize = (int)&goo-(int)&foo;
char* buf = VirtualAlloc(NULL, foosize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == NULL)
{
printf("Failed\n");
return 1;
}
printf("foo %x goo %x size foo %d\n", &foo, &goo, foosize);
memcpy (buf, (void*)&foo, foosize);
int(*f)() = &foo;
int ret1 = f();
printf("ret 1 %d\n", ret1);
int(*f2)() = (int(*)())&buf;
int ret2 = f2 (); // <-- crashes here
printf("ret2 %d\n", ret2);
return 0;
}
I know some of the code is technically UB ((int)&goo-(int)&foo), but it behaves fine in this case.
My question is why is this not working as expected?
It seems to me i mapped a page as executable and copied an existing function there and im just calling it.
What am i missing?
Would this behave differently on linux with mmap?
Thanks in advance
As everyone has already stated in comments, this is totally undefined behavior and should never really expect to work. However, I played with your code some with the debugger and realized the reason it's not working (at least in Cygwin gcc compiler) is you're creating f2 incorrectly to point to the the address of the pointer storing the allocated memory, namely buf. You want to point to the memory that buf points to. Therefore, your assignment should be
int(*f2)() = (int(*)())buf;
With that change, your code executes for me. But even if it works, it might break again as soon as you make any additional changes to the program.
Well I made a try of your code with MVSC 2008 in debug mode. Compiler happens to create a jmp table with relative offsets, and &foo and &goo are just entries in that table.
So even if you have successfully created an executable buffer and copied the code (much more than was useful...) the relative jump now points to a different location and (in my example) soon fell in a int 3 trap!
TL/DR: as compiler can arrange its code at will, and as many jump use relative offsets, you cannot rely on copying executable code. It is really Undefined Behaviour:
if compiler had been smart enough to just generate something like :
mov AX, 4
ret
it could have worked
if compiler has generated more complicated code with a relative jump it just breaks
Conclusion: you can only copy executable code if you have full control on the binary machine code for example if you used assembly code and know you will have no relocation problem
You need to declare foo and goo as static or will have to disable Incremental Linking.
Incremental linking is used to shorten the linking time when building your applications, the difference between normally and incrementally linked executables is that in incrementally linked ones each function call goes through an extra JMP instruction emitted by the linker.
These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference the function. But it's exactly this JMP that causes problems in your case. Declaring a function as static prevents the linker from creating this extra JMP instruction.
I want to take a piece of code, copy it into a global array and execute it from there.
In other words, I am trying to to copy a bunch of instructions from the code-section into the data-section, and then set the program-counter to continue the execution of the program from the data-section.
Here is my code:
#include <stdio.h>
#include <string.h>
typedef void(*func)();
static void code_section_func()
{
printf("hello");
}
#define CODE_SIZE 73
// I verified this size in the disassembly of 'code_section_func'
static long long data[(CODE_SIZE-1)/sizeof(long long)+1];
// I am using 'long long' in order to obtain the maximum alignment
int main()
{
func data_section_func = (func)data;
memcpy((void*)data_section_func,(void*)code_section_func,CODE_SIZE);
data_section_func();
return 0;
}
I might have been naive thinking it could work, so I'd be happy to get an explanation why it didn't.
For example, after a program is loaded into memory, does the MMU restrict instruction-fetching to a specific area within the memory address space of the process (i.e., the code-section of the program)?
For the protocol, I have tested this with VS2013 compiler over a 64-bit OS and an x64-based processor.
Thanks
Windows (and many other modern OSes) by default sets the data section as read/write/no-execute, so attempting to "call" a data object will fail.
Instead, you should VirtualAlloc a chunk of memory with the PAGE_EXECUTE_READWRITE protection. Note, it may be necessary to use FlushInstructionCache to ensure the newly-copied code is executed.
I typed this into Google, but I only found how-tos in C++.
How can I do it in C?
There are no exceptions in C. In C the errors are notified by the returned value of the function, the exit value of the process, signals to the process (Program Error Signals (GNU libc)) or the CPU hardware interruption (or other notification error form the CPU if there is)(How processor handles the case of division by zero).
Exceptions are defined in C++ and other languages though. Exception handling in C++ is specified in the C++ standard "S.15 Exception handling", there is no equivalent section in the C standard.
In C you could use the combination of the setjmp() and longjmp() functions, defined in setjmp.h. Example from Wikipedia
#include <stdio.h>
#include <setjmp.h>
static jmp_buf buf;
void second(void) {
printf("second\n"); // prints
longjmp(buf,1); // jumps back to where setjmp
// was called - making setjmp now return 1
}
void first(void) {
second();
printf("first\n"); // does not print
}
int main() {
if ( ! setjmp(buf) ) {
first(); // when executed, setjmp returns 0
} else { // when longjmp jumps back, setjmp returns 1
printf("main"); // prints
}
return 0;
}
Note: I would actually advise you not to use them as they work awful with C++ (destructors of local objects wouldn't get called) and it is really hard to understand what is going on. Return some kind of error instead.
There's no built-in exception mechanism in C; you need to simulate exceptions and their semantics. This is usually achieved by relying on setjmp and longjmp.
There are quite a few libraries around, and I'm implementing yet another one. It's called exceptions4c; it's portable and free. You may take a look at it, and compare it against other alternatives to see which fits you most.
Plain old C doesn't actually support exceptions natively.
You can use alternative error handling strategies, such as:
returning an error code
returning FALSE and using a last_error variable or function.
See http://en.wikibooks.org/wiki/C_Programming/Error_handling.
C is able to throw C++ exceptions. It is machine code anyway.
For example, in file bar.c:
#include <stdlib.h>
#include <stdint.h>
extern void *__cxa_allocate_exception(size_t thrown_size);
extern void __cxa_throw (void *thrown_exception, void* *tinfo, void (*dest) (void *) );
extern void * _ZTIl; // typeinfo of long
int bar1()
{
int64_t * p = (int64_t*)__cxa_allocate_exception(8);
*p = 1976;
__cxa_throw(p, &_ZTIl, 0);
return 10;
}
In file a.cc,
#include <stdint.h>
#include <cstdio>
extern "C" int bar1();
void foo()
{
try
{
bar1();
}
catch(int64_t x)
{
printf("good %ld", x);
}
}
int main(int argc, char *argv[])
{
foo();
return 0;
}
To compile it:
gcc -o bar.o -c bar.c && g++ a.cc bar.o && ./a.out
Output
good 1976
https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html has more detail info about __cxa_throw.
I am not sure whether it is portable or not, and I test it with 'gcc-4.8.2' on Linux.
This question is super old, but I just stumbled across it and thought I'd share a technique: divide by zero, or dereference a null pointer.
The question is simply "how to throw", not how to catch, or even how to throw a specific type of exception. I had a situation ages ago where we needed to trigger an exception from C to be caught in C++. Specifically, we had occasional reports of "pure virtual function call" errors, and needed to convince the C runtime's _purecall function to throw something. So we added our own _purecall function that divided by zero, and boom, we got an exception that we could catch on C++, and even use some stack fun to see where things went wrong.
On Windows with Microsoft Visual C++ (MSVC) there's __try ... __except ..., but it's really horrible and you don't want to use it if you can possibly avoid it. Better to say that there are no exceptions.
C doesn't have exceptions.
There are various hacky implementations that try to do it (one example at: http://adomas.org/excc/).
As mentioned in numerous threads, the "standard" way of doing this is using setjmp/longjmp. I posted yet another such solution to https://github.com/psevon/exceptions-and-raii-in-c
This is to my knowledge the only solution that relies on automatic cleanup of allocated resources. It implements unique and shared smartpointers, and allows intermediate functions to let exceptions pass through without catching and still have their locally allocated resources cleaned up properly.
C doesn't support exceptions. You can try compiling your C code as C++ with Visual Studio or G++ and see if it'll compile as-is. Most C applications will compile as C++ without major changes, and you can then use the try... catch syntax.
If you write code with the happy path design pattern (for example, for an embedded device) you may simulate exception error processing (AKA deffering or finally emulation) with operator "goto".
int process(int port)
{
int rc;
int fd1;
int fd2;
fd1 = open("/dev/...", ...);
if (fd1 == -1) {
rc = -1;
goto out;
}
fd2 = open("/dev/...", ...);
if (fd2 == -1) {
rc = -1;
goto out;
}
// Do some with fd1 and fd2 for example write(f2, read(fd1))
rc = 0;
out:
//if (rc != 0) {
(void)close(fd1);
(void)close(fd2);
//}
return rc;
}
It is not actually an exception handler, but it takes you a way to handle error at function exit.
P.S.: You should be careful to use goto only from the same or more deep scopes and never jump a variable declaration.
Implementing exceptions in C by Eric Roberts.
Chapter 4 of C Interfaces and Implementations by Hanson.
A Discipline of Error Handling by Doug Moen
Implementing Exceptions in C (details the article of E. Roberts)
In C we can't use try case to handle the error.
but if you can use Windows.h so you can:
#include <stdio.h>
#include <Windows.h>
#include <setjmp.h>
jmp_buf Buf;
NTAPI Error_Handler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
printf("co loi roi ban oi.!!!\r\n");
longjmp(Buf, 1);
}
void main()
{
AddVectoredExceptionHandler(1, Error_Handler);
int x = 0;
printf("start main\r\n");
if (setjmp(Buf) == 0)
{
int y = 1 / x;
}
printf("end main\r\n");
}
What is the best way for unit testing code paths involving a failed malloc()? In most instances, it probably doesn't matter because you're doing something like
thingy *my_thingy = malloc(sizeof(thingy));
if (my_thingy == NULL) {
fprintf(stderr, "We're so screwed!\n");
exit(EXIT_FAILURE);
}
but in some instances you have choices other than dying, because you've allocated some extra stuff for caching or whatever, and you can reclaim that memory.
However, in those instances where you can try to recover from a failed malloc() that you're doing something tricky and error prone in a code path that's pretty unusual, making testing especially important. How do you actually go about doing this?
I saw a cool solution to this problem which was presented to me by S. Paavolainen. The idea is to override the standard malloc(), which you can do just in the linker, by a custom allocator which
reads the current execution stack of the thread calling malloc()
checks if the stack exists in a database that is stored on hard disk
if the stack does not exist, adds the stack to the database and returns NULL
if the stack did exist already, allocates memory normally and returns
Then you just run your unit test many times: this system automatically enumerates through different control paths to malloc() failure and is much more efficient and reliable than e.g. random testing.
I suggest creating a specific function for your special malloc code that you expect could fail and you could handle gracefully. For example:
void* special_malloc(size_t bytes) {
void* ptr = malloc(bytes);
if(ptr == NULL) {
/* Do something crafty */
} else {
return ptr;
}
}
Then you could unit-test this crafty business in here by passing in some bad values for bytes. You could put this in a separate library and make a mock-library that does behaves special for your testing of the functions which call this one.
This is a kinda gross, but if you really want unit testing, you could do it with #ifdefs:
thingy *my_thingy = malloc(sizeof(thingy));
#ifdef MALLOC_UNIT_TEST_1
my_thingy = NULL;
#endif
if (my_thingy == NULL) {
fprintf(stderr, "We're so screwed!\n");
exit(EXIT_FAILURE);
}
Unfortunately, you'd have to recompile a lot with this solution.
If you're using linux, you could also consider running your code under memory pressure by using ulimit, but be careful.
write your own library that implements malloc by randomly failing or calling the real malloc (either staticly linked or explicitly dlopened)
then LD_PRELOAD it
In FreeBSD I once simply overloaded C library malloc.o module (symbols there were weak) and replaced malloc() implementation with one which had controlled probability to fail.
So I linked statically and started to perform testing. srandom() finished the picture with controlled pseudo-random sequence.
Also look here for a set of good tools that you seems to need by my opinion. At least they overload malloc() / free() to track leaks so it seems as usable point to add anything you want.
You could hijack malloc by using some defines and global parameter to control it... It's a bit hackish but seems to work.
#include <stdio.h>
#include <stdlib.h>
#define malloc(x) fake_malloc(x)
struct {
size_t last_request;
int should_fail;
void *(*real_malloc)(size_t);
} fake_malloc_params;
void *fake_malloc(size_t size) {
fake_malloc_params.last_request = size;
if (fake_malloc_params.should_fail) {
return NULL;
}
return (fake_malloc_params.real_malloc)(size);;
}
int main(void) {
fake_malloc_params.real_malloc = malloc;
void *ptr = NULL;
ptr = malloc(1);
printf("last: %d\n", (int) fake_malloc_params.last_request);
printf("ptr: 0x%p\n", ptr);
return 0;
}