This has been pending on my list for a long time now. In brief: I need to run mocked_dummy() in place of dummy() AT RUN TIME, without modifying factorial(). I do not care about the entry point of the software. I can add any number of additional functions (but cannot modify the code between the /*---- do not modify ----*/ markers).
Why do I need this?
To unit test some legacy C modules. I know there are a lot of tools available, but if run-time mocking is possible I can change my UT approach (add reusable components) and make my life easier :).
Platform / Environment?
Linux, ARM, gcc.
Approach that I'm trying?
I know GDB uses trap/illegal instructions to set breakpoints (gdb internals).
Make the code self-modifiable.
Replace the dummy() code segment with an illegal instruction, with a return as the immediately following instruction.
Control transfers to trap handler.
Trap handler is a reusable function that reads from a unix domain socket.
Address of mocked_dummy() function is passed (read from map file).
Mock function executes.
There are problems going ahead from here. I also found that the approach is tedious and requires a good amount of coding, some of it in assembly.
I also found that under gcc each function call can be hooked / instrumented, but that is not very useful either, since the function that is intended to be mocked will get executed anyway.
Is there any other approach that I could use?
#include <stdio.h>
#include <stdlib.h>
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
int (*fp)(int) = atoi(argv[1]);
printf("fp = %x\n",fp);
printf("factorial of 5 is = %d\n",fp(5));
printf("factorial of 5 is = %d\n",factorial(5));
return 1;
}
test-dept is a relatively recent C unit testing framework that allows you to do runtime stubbing of functions. I found it very easy to use - here's an example from their docs:
void test_stringify_cannot_malloc_returns_sane_result() {
replace_function(&malloc, &always_failing_malloc);
char *h = stringify('h');
assert_string_equals("cannot_stringify", h);
}
Although the downloads section is a little out of date, it seems fairly actively developed - the author fixed an issue I had very promptly. You can get the latest version (which I've been using without issues) with:
svn checkout http://test-dept.googlecode.com/svn/trunk/ test-dept-read-only
The version there was last updated in Oct 2011.
However, since the stubbing is achieved using assembler, it may need some effort to get it to support ARM.
This is a question I've been trying to answer myself. I also have the requirement that the mocking method/tools be written in the same language as my application. Unfortunately this cannot be done in C in a portable way, so I've resorted to what you might call a trampoline or detour. This falls under the "Make the code self modifiable." approach you mentioned above. This is where we change the actual bytes of a function at runtime to jump to our mock function.
#include <stdio.h>
#include <stdlib.h>
// Additional headers
#include <stdint.h> // for uint32_t
#include <sys/mman.h> // for mprotect
#include <errno.h> // for errno
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
typedef void (*dummy_fun)(void);
void set_run_mock()
{
dummy_fun run_ptr, mock_ptr;
uint32_t off;
unsigned char * ptr, * pg;
run_ptr = dummy;
mock_ptr = mocked_dummy;
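/* rel32 is measured from the end of the 5-byte JMP; unsigned wrap-around handles backward jumps */
off = (uint32_t)((uintptr_t)mock_ptr - (uintptr_t)run_ptr - 5);
ptr = (unsigned char *)run_ptr;
pg = (unsigned char *)(ptr - ((size_t)ptr % 4096));
/* cover every page touched by the 5 patched bytes, not just the first one */
if (mprotect(pg, (size_t)(ptr - pg) + 5, PROT_READ | PROT_WRITE | PROT_EXEC)) {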
perror("Couldn't mprotect");
exit(errno);
}
ptr[0] = 0xE9; //x86 JMP rel32
ptr[1] = off & 0x000000FF;
ptr[2] = (off & 0x0000FF00) >> 8;
ptr[3] = (off & 0x00FF0000) >> 16;
ptr[4] = (off & 0xFF000000) >> 24;
}
int main(int argc, char * argv[])
{
// Run for realz
factorial(5);
// Set jmp
set_run_mock();
// Run the mock dummy
factorial(5);
return 0;
}
Portability explanation...
mprotect() - This changes the memory page access permissions so that we can actually write to memory that holds the function code. This isn't very portable, and in a WINAPI env, you may need to use VirtualProtect() instead.
The memory parameter for mprotect is aligned to the previous 4k page, this also can change from system to system, 4k is appropriate for vanilla linux kernel.
The method that we use to jmp to the mock function is to actually put down our own opcodes, this is probably the biggest issue with portability because the opcode I've used will only work on a little endian x86 (most desktops). So this would need to be updated for each arch you plan to run on (which could be semi-easy to deal with in CPP macros.)
The function itself has to be at least five bytes long. This is usually the case because every function normally has at least 5 bytes in its prologue and epilogue.
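To make the two arithmetic steps above concrete, here is a minimal sketch in plain C that only computes values and patches nothing. The helper names encode_jmp_rel32 and page_span are made up for illustration; the first builds the five bytes of an x86 JMP rel32, the second works out the page-aligned range you would hand to mprotect() so that a patch straddling a page boundary is still covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 5 bytes of an x86 "JMP rel32" placed at address `from`
 * that lands on address `to`. The displacement is measured from the
 * end of the 5-byte instruction. (Hypothetical helper name.) */
void encode_jmp_rel32(uint8_t out[5], uintptr_t from, uintptr_t to)
{
    /* Unsigned wrap-around gives the right two's-complement value
     * for backward jumps as well. */
    uint32_t off = (uint32_t)(to - from - 5);

    out[0] = 0xE9;                           /* JMP rel32 opcode */
    out[1] = (uint8_t)(off & 0xFF);          /* little-endian displacement */
    out[2] = (uint8_t)((off >> 8) & 0xFF);
    out[3] = (uint8_t)((off >> 16) & 0xFF);
    out[4] = (uint8_t)((off >> 24) & 0xFF);
}

/* Compute the page-aligned range covering [addr, addr + len).
 * (Hypothetical helper name.) */
void page_span(uintptr_t addr, size_t len, size_t page_size,
               uintptr_t *start, size_t *span)
{
    uintptr_t end = addr + len + (page_size - 1);

    end -= end % page_size;                  /* round the end up */
    *start = addr - (addr % page_size);      /* round the start down */
    *span = (size_t)(end - *start);
}
```

In real use you would pass sysconf(_SC_PAGESIZE) as page_size instead of assuming 4096, and call mprotect() with the resulting start and span. Note that rel32 can only reach targets within about 2 GiB of the patched function, which matters on x86-64.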
Potential Improvements...
The set_run_mock() call could easily be set up to accept parameters for reuse. Also, you could save the five overwritten bytes from the original function and restore them later if you desire.
I'm unable to test, but from what I've read, on ARM you'd do something similar with the branch opcode. For an unconditional branch in ARM (A32) state, the top byte of the 32-bit instruction word is 0xEA and the remaining 24 bits hold a signed word offset relative to the instruction's address plus 8, so as on x86 you encode an offset rather than an absolute address.
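For the ARM case, here is an untested-on-hardware sketch of how the branch word could be computed. As far as I know, the A32 B instruction takes a PC-relative word offset rather than an absolute address, and the PC reads as the instruction address plus 8. The function name encode_arm_b is made up, and the sketch assumes ARM state (not Thumb), 4-byte-aligned addresses, and a target within roughly 32 MiB.

```c
#include <stdint.h>

/* Encode an unconditional A32 branch located at `from` that jumps to `to`.
 * Assumes ARM state, 4-byte alignment, and a target within about +/-32 MiB.
 * (Hypothetical helper name.) */
uint32_t encode_arm_b(uint32_t from, uint32_t to)
{
    /* The CPU adds the offset to PC, which reads as from + 8. */
    uint32_t byte_off = to - from - 8;
    /* The instruction stores the offset in words, truncated to 24 bits. */
    uint32_t imm24 = (byte_off >> 2) & 0x00FFFFFFu;

    return 0xEA000000u | imm24;   /* condition AL, opcode B */
}
```

A branch to itself comes out as 0xEAFFFFFE, which is the familiar "b ." encoding and a handy sanity check. On a little-endian ARM the 0xEA byte ends up last in memory, not first.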
Chenz
An approach that I have used in the past that has worked well is the following.
For each C module, publish an 'interface' that other modules can use. These interfaces are structs that contain function pointers.
struct Module1
{
int (*getTemperature)(void);
int (*setKp)(int Kp);
};
During initialization, each module initializes these function pointers with its implementation functions.
When you write the module tests, you can dynamically change these function pointers to point at mock implementations and, after testing, restore the original implementations.
Example:
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummyFn(void)
{
printf("__%s__()\n",__func__);
}
static void (*dummy)(void) = dummyFn;
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
void (*oldDummy)(void) = dummy;
/* with the original dummy function */
printf("factorial of 5 is = %d\n",factorial(5));
/* with the mocked dummy */
oldDummy = dummy; /* save the old dummy */
dummy = mocked_dummy; /* put in the mocked dummy */
printf("factorial of 5 is = %d\n",factorial(5));
dummy = oldDummy; /* restore the old dummy */
return 1;
}
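The struct-based interface described earlier in this answer could be sketched like this. Everything except the shape of struct Module1 is invented for illustration: the real_ and mock_ functions, the temperature values, and is_overheated() as the code under test.

```c
/* Interface published by a hypothetical temperature module. */
struct Module1 {
    int (*getTemperature)(void);
    int (*setKp)(int Kp);
};

static int kp_value;

/* Production implementations (stand-ins for real hardware access). */
static int real_getTemperature(void) { return 21; }
static int real_setKp(int Kp)
{
    int old = kp_value;

    kp_value = Kp;
    return old;                 /* returns the previous gain */
}

/* Called once at start-up to install the production implementations. */
void Module1_init(struct Module1 *m)
{
    m->getTemperature = real_getTemperature;
    m->setKp = real_setKp;
}

/* Code under test: it only talks to the module through the interface. */
int is_overheated(const struct Module1 *m, int limit)
{
    return m->getTemperature() > limit;
}

/* Mock implementation that a test can install. */
int mock_getTemperature(void) { return 95; }
```

A test saves m.getTemperature, assigns mock_getTemperature, checks that is_overheated(&m, 80) now reports 1, and then puts the saved pointer back. The catch, as with the example above, is that callers have to go through the struct, so it only helps if the legacy code can be changed.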
You can replace every function by the use of LD_PRELOAD. You have to create a shared library, which gets loaded by LD_PRELOAD. This is a standard function used to turn programs without support for SOCKS into SOCKS aware programs. Here is a tutorial which explains it.
I have some software which uses the documented API for RSA's Authentication Agent. This is a product which runs as a service on the client machines in a domain, and authenticates users locally by communicating with an "RSA Authentication Manager" installed centrally.
The Authentication Agent's API is publicly documented here: Authentication Agent API 8.1.1 for C Developers Guide. However, the docs seem to be incorrect, and I do not have access to the RSA header files - they are not public; only the PDF documentation is available for download without paying $$ to RSA. If anyone here has access to up to date header files, would you be able to confirm for me whether the documentation is out of date?
The function signatures given in the API docs seem incorrect - in fact, I'm absolutely convinced that they are wrong on x64 machines. For example, the latest PDF documentation shows the following:
int WINAPI AceSetUserData(SDI_HANDLE hdl, unsigned int userData)
int WINAPI AceGetUserData(SDI_HANDLE hdl, unsigned int *pUserData)
The documentation states several times that the "userData" value is a 32-bit quantity, for example in the documentation for AceInit, AceSetUserData, and AceGetUserData. A relevant excerpt from the docs for AceGetUserData:
This function is synchronous and the caller must supply, as the second argument, a pointer to a 32-bit storage area (that is, an unsigned int) into which to copy the user data value.
This is clearly false - from some experimentation, if you pass in a pointer to the center of a buffer filled with 0xff, AceGetUserData is definitely writing out a 64-bit value, not a 32-bit quantity.
My version of aceclnt.dll is 8.1.3.563; the corresponding documentation is labelled "Authentication Agent API 8.1 SP1", and this corresponds to version 7.3.1 of the Authentication Agent itself.
Test code
Full test code given, even though it's not relevant to the problem at all... It's no use to me if someone else runs the test code (I know what it does!), what I need is someone with access to the RSA header files who can confirm the function signatures.
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef WIN32
#include <Windows.h>
#include <tchar.h>
#define SDAPI WINAPI
#else
#define SDAPI
#endif
typedef int SDI_HANDLE;
typedef uint32_t SD_BOOL;
typedef void (SDAPI* AceCallback)(SDI_HANDLE);
#define ACE_SUCCESS 1
#define ACE_PROCESSING 150
typedef SD_BOOL (SDAPI* AceInitializeEx_proto)(const char*, char*, uint32_t);
typedef int (SDAPI* AceInit_proto)(SDI_HANDLE*, void*, AceCallback);
typedef int (SDAPI* AceClose_proto)(SDI_HANDLE, AceCallback);
typedef int (SDAPI* AceGetUserData_proto)(SDI_HANDLE, void*);
typedef int (SDAPI* AceSetUserData_proto)(SDI_HANDLE, void*);
struct Api {
AceInitializeEx_proto AceInitializeEx;
AceInit_proto AceInit;
AceClose_proto AceClose;
AceGetUserData_proto AceGetUserData;
AceSetUserData_proto AceSetUserData;
} api;
static void api_init(struct Api* api) {
// All error-checking stripped...
HMODULE dll = LoadLibrary(_T("aceclnt.dll")); // leak this for the demo
api->AceInitializeEx = (AceInitializeEx_proto)GetProcAddress(dll, "AceInitializeEx");
api->AceInit = (AceInit_proto)GetProcAddress(dll, "AceInit");
api->AceClose = (AceClose_proto)GetProcAddress(dll, "AceClose");
api->AceGetUserData = (AceGetUserData_proto)GetProcAddress(dll, "AceGetUserData");
api->AceSetUserData = (AceSetUserData_proto)GetProcAddress(dll, "AceSetUserData");
int success = api->AceInitializeEx("C:\\my\\conf\\directory", 0, 0);
assert(success);
}
static void demoFunction(SDI_HANDLE handle) {
union {
unsigned char testBuffer[sizeof(void *) * 3];
void *forceAlignment;
} u;
memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
int err = api.AceGetUserData(handle, (void*)(u.testBuffer + sizeof(void*)));
assert(err == ACE_SUCCESS);
fputs("DEBUG: testBuffer =", stderr);
for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
if (i % 4 == 0)
putc(' ', stderr);
fprintf(stderr, "%02x", u.testBuffer[i]);
}
fputc('\n', stderr);
// Prints:
// DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
// According to the docs, this should only write out a 32-bit value
}
static void SDAPI demoCallback(SDI_HANDLE h) {
fprintf(stderr, "Callback invoked, handle = %d\n", h);
}
int main(int argc, const char** argv)
{
api_init(&api);
SDI_HANDLE h;
int err = api.AceInit(&h, /* contentious argument */ 0, &demoCallback);
assert(err == ACE_PROCESSING);
demoFunction(h);
api.AceClose(h, 0);
return 0;
}
As you've copied the function/type definitions out of the documentation, you basically don't have and never will have the correct definition for the version of the .dll you're using and could always end up in crashes or worse, undefined behavior.
What you could do is to debug the corresponding .dll:
Do you use Visual Studio? I remember that VS could step into a function call in debug mode and show the assembly, though I'm not sure how it is today. Any disassembler should do the trick. Under the x64 ABI, register rcx holds the first argument and rdx the second. If the function internally works with the 32-bit register names or clears the upper 32 bits, then you can assume a 32-bit integer. If it uses the register to load an address (e.g. with a lea instruction), you can assume a pointer. But as you can see, that's probably not a road you want to go down...
So what else do you have left?
The document you've linked mentions a 32-bit and a 64-bit library, depending on the platform you use. I guess you use the 64-bit library and that RSA did not update the documentation when the developers ported the library to 64-bit.
So think about it this way: if you were the API developer, what could be migrated to 64-bit and what could not? Everything that needs to work across 32/64-bit implementations (data that gets sent over the network or stored and shared on disk) cannot be touched. But everything that's local to the instance can be migrated. As userData seems to be a runtime-only value, it makes sense to support whatever the platform provides: a 64-bit type on 64-bit builds and unsigned int on 32-bit builds. (Note that unsigned long is still 32 bits on 64-bit Windows, so the actual type would have to be a pointer-sized one.)
You've figured out that userData must be 64 bits wide. The stronger evidence is not that the function writes out a 64-bit integer, but that the function must have received a 64-bit value to start with. As integers are passed by value (I guess in general, but definitely in WINAPI), there's no way the function could see the full 64-bit value if the parameter were a 32-bit type. So most likely, the API developers changed the parameter to a 64-bit type.
PS: If you end up putting a pointer into userData, cast the pointer to uintptr_t and store/read that type.
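A minimal sketch of that round trip, assuming the user-data slot really is pointer-sized (which is exactly the hypothesis being discussed). struct session and both helper names are invented, and no Ace* functions are called here.

```c
#include <stdint.h>

/* Hypothetical per-handle context that the application wants to attach. */
struct session {
    int id;
};

/* Convert a context pointer into an integer suitable for a user-data slot. */
uintptr_t context_to_user_data(struct session *s)
{
    return (uintptr_t)(void *)s;
}

/* Recover the context pointer from the stored integer. */
struct session *user_data_to_context(uintptr_t value)
{
    return (struct session *)(void *)value;
}
```

uintptr_t is wide enough to hold any object pointer on platforms that provide it, so the value survives the round trip unchanged. If the slot turns out to be only 32 bits wide, storing an index into a table of contexts would be the safer choice.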
To avoid questions of undefined behavior, please replace your test function with this one, and report what it prints. Please also show us the complete test program, so that people who have access to this library can compile and run it for themselves and tinker with it. I would especially like to see the declarations of the api global and its type, and the code that initializes api, and to know where the type came from (did you make it up as part of this reverse engineering exercise or did you get it from somewhere?)
static void demoFunction(SDI_HANDLE handle) {
int err = api.AceSetUserData(handle, 0);
assert(err == ACE_SUCCESS);
union {
unsigned char testBuffer[sizeof(void *) * 3];
void *forceAlignment;
} u;
memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
err = api.AceGetUserData(handle, (void *)(u.testBuffer + sizeof(void*)));
assert (err == ACE_SUCCESS);
fputs("DEBUG: testBuffer =", stderr);
for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
if (i % 4 == 0)
putc(' ', stderr);
fprintf(stderr, "%02x", u.testBuffer[i]);
}
fputc('\n', stderr);
}
(If your hypothesis is correct, the output will be
DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
.)
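The same sentinel trick can be wrapped in a small helper that reports how far a callee wrote. This is only a sketch: writer_fn, probe_write_size and the two stand-in writers are invented names, and nothing here touches the RSA library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENTINEL 0xA5

/* Hypothetical callee under test: writes some bytes at dst. */
typedef void (*writer_fn)(void *dst);

/* Returns the distance from dst to just past the last byte that fn changed. */
size_t probe_write_size(writer_fn fn)
{
    union {
        unsigned char bytes[3 * sizeof(uint64_t)];
        uint64_t force_alignment;
    } u;
    const size_t start = sizeof(uint64_t);   /* write into the middle third */
    size_t written = 0;

    memset(u.bytes, SENTINEL, sizeof u.bytes);
    fn(u.bytes + start);

    for (size_t i = start; i < sizeof u.bytes; i++) {
        if (u.bytes[i] != SENTINEL)
            written = i - start + 1;
    }
    return written;
}

/* Stand-in writers, used only to exercise the probe. */
void write_u32(void *dst) { uint32_t v = 0; memcpy(dst, &v, sizeof v); }
void write_u64(void *dst) { uint64_t v = 0; memcpy(dst, &v, sizeof v); }
```

A written byte that happens to equal the sentinel is invisible to this check, so it is worth running the probe twice with different sentinel values or different stored data before drawing conclusions.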
This is a bit of a strange use case, so searching for existing discussion is difficult. I'm programming for embedded systems (Microchip PIC24 using the XC16 compiler) and am currently implementing a communication protocol identically across 3 separate UART channels (each UART will grab data from a master data table).
The way I started out writing the project was to have each UART handled by a separate module, with a lot of code duplication, along the lines of the following pseudocode:
UART1.c:
static unsigned char buffer[128];
static unsigned char pointer = 0;
static unsigned char packet_received = 0;
void interrupt UART1Receive (void) {
buffer[pointer++] = UART1RX_REG;
if (end of packet condition) packet_received = 1;
}
void processUART1(void) { // This is called regularly from main loop
if (packet_received) {
// Process packet
}
}
UART2.c:
static unsigned char buffer[128];
static unsigned char pointer = 0;
static unsigned char packet_received = 0;
void interrupt UART2Receive (void) {
buffer[pointer++] = UART2RX_REG;
if (end of packet condition) packet_received = 1;
}
void processUART2(void) { // This is called regularly from main loop
if (packet_received) {
// Process packet
}
}
While the above is neat and works well, in practice the communication protocol itself is quite complex, so having it duplicated three times (simply with changes to references to the UART registers) is increasing the opportunity for bugs to be introduced. Having a single function and passing pointers to it is not an option, since this will have too great an impact on speed. The code needs to be physically duplicated in memory for each UART.
I gave it a lot of thought and despite knowing the rules of never putting functions in a header file, decided to try a specific header file that included the duplicate code, with references as #defined values:
protocol.h:
// UART_RECEIVE_NAME and UART_RX_REG are just macros to be defined
// in calling file
void interrupt UART_RECEIVE_NAME (void) {
buffer[pointer++] = UART_RX_REG;
if (end of packet condition) packet_received = 1;
}
UART1.c:
static unsigned char buffer[128];
static unsigned char pointer = 0;
static unsigned char packet_received = 0;
#define UART_RECEIVE_NAME UART1Receive
#define UART_RX_REG UART1RX_REG
#include "protocol.h"
void processUART1(void) { // This is called regularly from main loop
if (packet_received) {
// Process packet
}
}
UART2.c:
static unsigned char buffer[128];
static unsigned char pointer = 0;
static unsigned char packet_received = 0;
#define UART_RECEIVE_NAME UART2Receive
#define UART_RX_REG UART2RX_REG
#include "protocol.h"
void processUART2(void) { // This is called regularly from main loop
if (packet_received) {
// Process packet
}
}
I was slightly surprised when the code compiled without any errors! It does seem to work though, and after compilation MPLAB X can even work out all of the symbol references, so that none of the macro references in UART1.c and UART2.c get flagged as unresolvable identifiers. I did then realise I should probably rename the protocol.h file to protocol.c (and update the #includes accordingly), but that's not practically a big deal.
There is only one downside: the IDE has no idea what to do while stepping through code included from protocol.h while simulating or debugging. It just stays at the calling instruction while the code executes, so debugging will be a little more difficult.
So how hacky is this solution? Will the C gods smite me for even considering this? Are there any better alternatives that I've overlooked?
An alternative is to define a function macro that contains the body of code. Some token pasting operators can automatically generate the symbol names required. Multi-line macros can be generated by using \ at the end of all but the last line.
#define UART_RECEIVE(n) \
void interrupt UART##n##Receive (void) { \
buffer[pointer++] = UART##n##RX_REG; \
if (end of packet condition) packet_received = 1; \
}
UART_RECEIVE(1)
UART_RECEIVE(2)
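Here is a version of the same idea that compiles on a desktop machine, in case it helps to experiment with it. The register names are replaced by plain variables, the interrupt qualifier is dropped, the end-of-packet condition is assumed to be a newline, and the macro also generates the per-channel state; all of those are assumptions for the sake of the sketch, not part of the original code.

```c
#include <stdint.h>

/* Stand-ins for the hardware receive registers (assumed names). */
volatile uint8_t UART1RX_REG, UART2RX_REG;

#define END_OF_PACKET '\n'   /* assumed end-of-packet condition */

/* Generates per-channel state plus a receive routine for UART n. */
#define DEFINE_UART(n)                                              \
    unsigned char uart##n##_buffer[128];                            \
    unsigned char uart##n##_pointer = 0;                            \
    unsigned char uart##n##_packet_received = 0;                    \
    void UART##n##Receive(void)                                     \
    {                                                               \
        unsigned char c = UART##n##RX_REG;                          \
        if (uart##n##_pointer < sizeof uart##n##_buffer)            \
            uart##n##_buffer[uart##n##_pointer++] = c;              \
        if (c == END_OF_PACKET)                                     \
            uart##n##_packet_received = 1;                          \
    }

DEFINE_UART(1)
DEFINE_UART(2)
```

Each expansion produces its own copy of the code and data, which is the physical duplication the question asks for. Running the file through the preprocessor only (for example with gcc -E) shows exactly what was generated, which partly makes up for the debugger not being able to step through macro bodies.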
Using macros for this purpose seems to me to be a bad idea. Making debugging impossible is just one disadvantage. It also makes the code difficult to understand by hiding the real meaning of symbols. And interrupt routines should really be kept independent and short, with the common code moved into handler functions.
The first thing I would do is define a common buffer struct for each UART. This makes simultaneous communication possible. If each UART needs a separate handler function for its messages, it can be included as a function pointer. The syntax is a bit complicated, but it results in efficient code.
typedef struct uart_buf uart_buf_t;
struct uart_buf {
uint8_t* buffer;
int16_t inptr;
bool packet_received;
void (*handler_func)(uart_buf_t*);
};
uart_buf_t uart_buf_1;
uart_buf_t uart_buf_2;
Then each interrupt handler will be like this:
void interrupt UART1Receive (void) {
handle_input(UART1RX_REG, &uart_buf_1);
}
void interrupt UART2Receive (void) {
handle_input(UART2RX_REG, &uart_buf_2);
}
And the common handler will be:
void handle_input(uint8_t in_char, uart_buf_t *buf) {
buf->buffer[buf->inptr++] = in_char;
if (in_char == LF) { /* == compares; a single = would assign */
buf->packet_received = true;
buf->handler_func(buf);
}
}
And the message handler is:
void handle_packet(uart_buf_t* buf) {
/* ... code to handle message ... */
buf->packet_received = false;
}
And the function pointers must be initialized:
void init(void) {
uart_buf_1.handler_func = handle_packet;
uart_buf_2.handler_func = handle_packet;
}
The resulting code is very flexible and can be easily changed. Single-stepping through the code is no problem.
I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).
Actually, I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code. Store the instructions somewhere in memory. Do an unconditional jump to the first instruction. Voila!
So, with that in mind, I have the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main() {
int *m = malloc(sizeof(int));
*m = 0x90; // NOP instruction code
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "eax");
return 42;
}
Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return to main).
Question: Am I on the right path?
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily do everything the right way.
Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
unsigned char *m;
int main() {
unsigned int pagesize = getpagesize();
printf("pagesize: %u\n", pagesize);
m = malloc(1023+pagesize+1);
if(m==NULL) return(1);
printf("%p\n", m);
m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
printf("%p\n", m);
if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
printf("mprotect fail...\n");
return 0;
}
m[0] = 0xc9; //leave
m[1] = 0xc3; //ret
m[2] = 0x90; //nop
printf("%p\n", m);
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "ebx");
return 21;
}
Question: Am I on the right path?
I would say yes.
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
Question: Other issues I should beware of?
Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.
On Linux this is done using mprotect() with PROT_EXEC.
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:
typedef void (*generated_function)(void);
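void *func = malloc(1024); /* must be made executable first, e.g. with mprotect() as described above */
unsigned char *o = (unsigned char *)func;
generated_function func_exec = (generated_function)func;
*o++ = 0x90; // NOP
*o++ = 0xc3; // RET (near return; 0xcb would be a far return)
func_exec();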
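Putting the pieces together, a complete version might look like the sketch below. It uses mmap() instead of malloc() so the buffer is page-aligned, writes the code while the page is writable, and only then switches it to executable. The name run_generated and the return_42 byte sequence are my own; the machine code is x86 / x86-64 only, and the cast from a data pointer to a function pointer is something POSIX systems support but ISO C does not guarantee.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS with strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*generated_function)(void);

/* Copies `code` into fresh memory, makes it executable, calls it and
 * returns its result; returns -1 if the memory could not be set up.
 * (Hypothetical helper name.) */
int run_generated(const unsigned char *code, size_t len)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Drop write permission before executing. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    generated_function fn = (generated_function)mem;
    int result = fn();           /* call/ret take care of the return address */

    munmap(mem, len);
    return result;
}

/* x86 / x86-64 only: mov eax, 42 ; ret */
const unsigned char return_42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
```

Calling run_generated(return_42, sizeof return_42) on an x86 Linux machine should hand back the value the generated code left in eax. Because the generated code is entered with a normal call, a plain ret is enough to get back into the caller, which answers the "how do I return to main" part.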
I want to write a piece of code that changes itself continuously, even if the change is insignificant.
For example maybe something like
for i in 1 to 100, do
begin
x := 200
for j in 200 downto 1, do
begin
do something
end
end
Suppose I want that my code should after first iteration change the line x := 200 to some other line x := 199 and then after next iteration change it to x := 198 and so on.
Is writing such code possible? Would I need to use inline assembly for that?
EDIT :
Here is why I want to do it in C:
This program will be run on an experimental operating system, and I can't / don't know how to use programs compiled from other languages. The real reason I need such code is that it will run on a guest operating system in a virtual machine. The hypervisor is a binary translator that translates chunks of code and does some optimizations. It translates each chunk only once; the next time the same chunk is used in the guest, the translator reuses the previously translated result. If the code gets modified on the fly, the translator notices, marks its previous translation as stale, and is forced to re-translate the same code. This is what I want to achieve: to force the translator to do many translations. Typically these chunks are the instructions between two branch instructions (such as jump instructions). I think self-modifying code would be a fantastic way to achieve this.
You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.
If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might manipulate the operating system protection on this memory range, allowing you to modify the bytes without encountering an Access Violation or SIGSEGV. Finally, you would use pointers (perhaps unsigned char * pointers, possibly unsigned long * as on RISC machines) to modify the opcodes of the compiled program.
A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.
Sorry, I am answering a bit late, but I think I found exactly what you are looking for: https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/
In this article, the author changes the value of a constant by overwriting the immediate operand of an instruction inside a function's machine code. Then they execute shellcode by overwriting the code of that function in memory.
Below is the first example:
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
void foo(void);
int change_page_permissions_of_address(void *addr);
int main(void) {
void *foo_addr = (void*)foo;
// Change the permissions of the page that contains foo() to read, write, and execute
// This assumes that foo() is fully contained by a single page
if(change_page_permissions_of_address(foo_addr) == -1) {
fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
return 1;
}
// Call the unmodified foo()
puts("Calling foo...");
foo();
// Change the immediate value in the addl instruction in foo() to 42
// (offset 18 depends on the code the compiler generated for foo(); verify it with a disassembler)
unsigned char *instruction = (unsigned char*)foo_addr + 18;
*instruction = 0x2A;
// Call the modified foo()
puts("Calling foo...");
foo();
return 0;
}
void foo(void) {
int i=0;
i++;
printf("i: %d\n", i);
}
int change_page_permissions_of_address(void *addr) {
// Move the pointer to the page boundary
int page_size = getpagesize();
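addr = (void *)((unsigned long)addr - (unsigned long)addr % page_size); // arithmetic on void * is a GNU extension, so go through an integer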
if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
return -1;
}
return 0;
}
It is possible, but it's most probably not portably possible and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.
This would be a good start. Essentially Lisp functionality in C:
http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/
Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:
#include <stdio.h>
void multiply_x (int * x, int multiplier)
{
*x *= multiplier;
}
void add_to_x (int * x, int increment)
{
*x += increment;
}
int main (void)
{
int x = 0;
int i;
void (*fp)(int *, int);
for (i = 1; i < 6; ++i) {
fp = (i % 2) ? add_to_x : multiply_x;
fp(&x, i);
printf("%d\n", x);
}
return 0;
}
The output, when we compile and run the program, is:
1
2
5
20
25
Obviously, this will only work if you have a finite number of things you want to do with x on each run through. In order to make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I can really recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.
A self-interpreting language (not hard-compiled and linked like C) might be better for that. Perl, javascript, PHP have the evil eval() function that might be suited to your purpose. By it, you could have a string of code that you constantly modify and then execute via eval().
The suggestion about implementing LISP in C and then using that is solid, due to portability concerns. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's bytecode into memory and then returning to it.
There's a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use mprotect() to make the code section writable, and then modify compiler-created functions.
Techniques like this are fun for programming challenges and obfuscated competitions, but given how unreadable your code would be combined with the fact you're exploiting what C considers undefined behavior, they're best avoided in production environments.
In standard C11 (read n1570), you cannot write self modifying code (at least without undefined behavior). Conceptually at least, the code segment is read-only.
You might consider extending the code of your program with plugins using your dynamic linker. This requires operating-system-specific functions. On POSIX, use dlopen (and probably dlsym to get newly loaded function pointers). You could then overwrite function pointers with the addresses of new ones.
Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.
Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.
My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.
This just requires leveraging two libraries -- an assembler, and a disassembler:
FASM assembler: https://github.com/ZenLulz/Fasm.NET
Udis86 disassembler: https://github.com/vmt/udis86
We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using VirtualProtect on windows to change page permissions to allow editing the code. On Unix you have to use mprotect instead.
I posted an article on how we did it, as well as the sample code.
These examples are on Windows using C++, but it should be very easy to make cross-platform and C only.
This is how to do it on Windows with C++. You'll have to VirtualAlloc a byte array with read/write protection, copy your code there, and VirtualProtect it with read/execute protection. Here's how you dynamically create a function that returns 15 (the caller then prints a message that many times).
#include <cstdio>
#include <windows.h> // declares VirtualAlloc and VirtualProtect
using namespace std;
typedef unsigned char byte;
int main(int argc, char** argv){
byte bytes [] = { 0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3 }; //put code here
//xor rax, rax
//add rax, 15
//ret
int size = sizeof(bytes);
DWORD protect = PAGE_READWRITE;
void* meth = VirtualAlloc(NULL, size, MEM_COMMIT, protect);
byte* write = (byte*) meth;
for(int i = 0; i < size; i++){
write[i] = bytes[i];
}
if(VirtualProtect(meth, size, PAGE_EXECUTE_READ, &protect)){
typedef int (*fptr)();
fptr my_fptr = reinterpret_cast<fptr>(meth); // no detour through long, which is only 32 bits on 64-bit Windows
int number = my_fptr();
for(int i = 0; i < number; i++){
printf("I will say this 15 times!\n");
}
return 0;
} else{
printf("Unable to VirtualProtect code with execute protection!\n");
return 1;
}
}
You assemble the code using this tool.
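A POSIX counterpart of the same idea could look like the sketch below. It assumes Linux on x86-64 or AArch64 and GCC or Clang (for __builtin___clear_cache). run_generated_code is a made-up name, and the byte sequences are hand-encoded for "return 15".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Machine code for a function that returns 15. */
#if defined(__x86_64__)
static const unsigned char code[] = {
    0xB8, 0x0F, 0x00, 0x00, 0x00,   /* mov eax, 15 */
    0xC3                            /* ret */
};
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0xE0, 0x01, 0x80, 0x52,         /* mov w0, #15 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret */
};
#else
#error "add an encoding of 'return 15' for this architecture"
#endif

/* Copy the bytes into fresh memory, make it executable and call it.
 * Stores the function's return value in *result.
 * Returns 0 on success, -1 if the memory could not be set up. */
int run_generated_code(int *result)
{
    assert(result != NULL);

    size_t len = sizeof code;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Drop write access before granting execute access. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Needed on ARM so the instruction cache sees the new bytes. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    int (*fn)(void) = (int (*)(void))mem;
    *result = fn();

    munmap(mem, len);
    return 0;
}
```

mmap is used instead of malloc here because it returns page-aligned memory, which mprotect requires.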
While "true" self-modifying code in C is impossible (the assembly approach feels like a slight cheat, because at that point we're writing self-modifying code in assembly rather than in C, which was the original question), there might be a pure C way to get a similar effect: statements that paradoxically do not do what you think they are supposed to do. I say paradoxically because both the assembly self-modifying code and the following C snippet may not make sense superficially or intuitively, but they are logical once you put intuition aside and analyze them carefully; that discrepancy is what makes a paradox a paradox.
#include <stdio.h>
#include <string.h>
int main()
{
struct Foo
{
char a;
char b[4];
} foo;
foo.a = 42;
strncpy(foo.b, "foo", sizeof foo.b); /* copies "foo" and the terminating '\0' */
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
*(int*)&foo.a = 1918984746;
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
return 0;
}
$ gcc -o foo foo.c && ./foo
foo.a=42, foo.b="foo"
foo.a=42, foo.b="bar"
First, we change the value of foo.a and foo.b and print the struct. Then we change only the value of foo.a, but observe the output.
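To see why the second line prints "bar", it may help to look at the individual bytes of 1918984746 (0x7261622A). The int store writes four bytes starting at foo.a, so it also covers the first three bytes of foo.b. The helper below is a hypothetical illustration; it assumes a 4-byte int and a little-endian machine, which is what the output above implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the four bytes of `value` in little-endian order (least significant
 * byte first) into out[0..3], followed by a terminating '\0' in out[4]. */
void little_endian_bytes(uint32_t value, unsigned char out[5])
{
    assert(out != NULL);

    for (size_t i = 0; i < 4; i++)
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
    out[4] = '\0';
}
```

For 1918984746 this yields 0x2A (42), 'b', 'a', 'r'. The first byte lands in foo.a, which therefore still reads 42, and the other three overwrite "foo" in foo.b.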
I would like to know how, in C, I can copy the content of a function into memory and then execute it.
I'm trying to do something like this:
typedef void(*FUN)(int *);
char * myNewFunc;
char *allocExecutablePages (int pages)
{
template = (char *) valloc (getpagesize () * pages);
if (mprotect (template, getpagesize (),
PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
}
void f1 (int *v) {
*v = 10;
}
// allocate enough space, but how much?
myNewFunc = allocExecutablePages(...)
/* Copy f1 somewhere else
* (how? assume that I know the size of f1 having done a (nm -S foo.o))
*/
((FUN)template)(&val);
printf("%i",val);
Thanks for your answers
You seem to have figured out the part about protection flags. If you know the size of the function, now you can just do memcpy() and pass the address of f1 as the source address.
One big caveat is that, on many platforms, you will not be able to call any other functions from the one you're copying (f1), because relative addresses are hardcoded into the binary code of the function, and moving it to a different location in memory makes those relative addresses point to the wrong place.
This happens to work because function1 and function2 are exactly the same size in memory.
We need the length of function2 for our memcpy, so what should be done is:
int diff = (&main - &function2);
You'll notice you can edit function2 to your liking and it keeps working just fine!
By the way, neat trick. Unfortunately the g++ compiler complains about an invalid conversion from void* to int... but with gcc it compiles perfectly ;)
Modified sources:
//Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
//fixed the diff address to also work when function2 is variable size
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include <sys/mman.h>
#include <unistd.h> /* getpagesize */
int function1(int x){
return x-5;
}
int function2(int x){
//printf("hello world");
int k=32;
int l=40;
return x+5+k+l;
}
int main(){
int diff = (&main - &function2);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
Output:
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ gcc memoryFun.c
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ ./a.out
pagesize: 4096, diff: 35
native: 1
memory: 83
native: 1
Another thing to note is that calling printf from the copied function will segfault, because printf will most likely not be found once its relative address is wrong...
Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include <sys/mman.h>
#include <unistd.h> /* getpagesize */
int function1(int x){
return x-5;
}
int function2(int x){
return x+5;
}
int main(){
int diff = (&function2 - &function1);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
I have attempted this many times in C and came to the conclusion that it cannot be accomplished using only the C language. My main stumbling block was finding the length of the function to copy.
The Standard C language does not provide any methods to obtain the length of a function. However, one can use assembly language and "sections" to find the length. Once the length is found, copying and executing is easy.
The easiest solution is to create or define a linker segment that contains the function. Write an assembly language module to calculate and publicly declare the length of this segment. Use this constant for the size of the function.
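With a GNU toolchain on an ELF platform there may be a way to get the segment length without a separate assembly module. This is a swapped-in technique, not the assembly approach described above: for a section whose name is a valid C identifier, GNU ld (and lld) provide __start_<name> and __stop_<name> symbols. The section name copyme and the functions below are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Put the function to be copied into its own section (hypothetical name). */
__attribute__((section("copyme"), used, noinline))
int add_five(int x)
{
    return x + 5;
}

/* Provided by the linker, not defined anywhere in the C sources. */
extern const char __start_copyme[];
extern const char __stop_copyme[];

/* Size in bytes of everything placed in the "copyme" section. */
size_t copyme_size(void)
{
    return (size_t)((uintptr_t)__stop_copyme - (uintptr_t)__start_copyme);
}

/* Copy the section's bytes into dst.
 * Returns the number of bytes copied, or 0 if dst is too small. */
size_t copy_copyme_section(void *dst, size_t capacity)
{
    assert(dst != NULL);

    size_t len = copyme_size();
    if (len > capacity)
        return 0;
    memcpy(dst, __start_copyme, len);
    return len;
}
```

The size covers the whole section, so it is only the function's length if add_five is the sole occupant. The copied bytes still need executable memory, and any relative addresses inside them will not survive the move.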
There are other methods that involve setting up the linker, such as predefined areas or fixed locations and copying those locations.
In embedded systems land, most of the code that copies executable stuff into RAM is written in assembly.
This might be a hack of a solution. Could you place a dummy variable or function directly after the function to be copied, take that dummy's address, and subtract the function's address from it to obtain the function's size? This might be possible if memory is laid out linearly and in order (rather than randomly). It would also keep the function copying within portable ANSI C rather than delving into system-specific assembly code. I find C to be rather flexible; one just needs to think things out.