Related
I'm trying to write a function that copies a function (and ends up modify its assembly) and returns it. This works fine for one level of indirection, but at two I get a segfault.
Here is a minimum (not)working example:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I can memcpy into an mmap'ed area once to create a valid function that can be called (g()()). But if I try to apply it again (h()()()) it segfaults. I have confirmed that it correctly creates the copied version of g, but when I execute that version I get a segfault.
Is there some reason why I can't execute code in one mmap'ed area from another mmap'ed area? From exploratory gdb-ing with x/i checks it seems like I can call down successfully, but when I return the function I came from has been erased and replaced with 0s.
How can I get this behaviour to work? Is it even possible?
BIG EDIT:
Many have asked for my rationale as I am obviously doing an XY problem here. That is true and intentional. You see, a little under a month ago this question was posted on the code golf stack exchange. It also got itself a nice bounty for a C/Assembly solution. I gave some idle thought to the problem and realized that by copying a functions body while stubbing out an address with some unique value I could search its memory for that value and replace it with a valid address, thus allowing me to effectively create lambda functions that take a single pointer as an argument. Using this I could get single currying working, but I need the more general currying. Thus my current partial solution is linked here. This is the full code that exhibits the segfault I am trying to avoid. While this is pretty much the definition of a bad idea, I find it entertaining and would like to know if my approach is viable or not. The only thing I'm missing is ability to run a function created from a function, but I can't get that to work.
The code is using relative calls to invoke mmap and memcpy so the copied code ends up calling an invalid location.
You can invoke them through a pointer, e.g.:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#define BODY_SIZE 100
void* (*mmap_ptr)(void *addr, size_t length, int prot, int flags,
int fd, off_t offset) = mmap;
void* (*memcpy_ptr)(void *dest, const void *src, size_t n) = memcpy;
int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }
int (*g(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, f, BODY_SIZE);
return r;
}
int (*(*h(void))(void))(void) {
void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
memcpy_ptr(r, g, BODY_SIZE);
return r;
}
int main() {
printf("%d\n", f());
printf("%d\n", G()());
printf("%d\n", g()());
printf("%d\n", H()()());
printf("%d\n", h()()()); // This one fails - why?
return 0;
}
I'm trying to write a function that copies a function
I think that is pragmatically not the right approach, unless you know very well machine code for your platform (and then you would not ask the question). Be aware of position independent code (useful because in general mmap(2) would use ASLR and give some "randomness" in the addresses). BTW, genuine self-modifying machine code (i.e. changing some bytes of some existing valid machine code) is today cache and branch-predictor unfriendly and should be avoided in practice.
I suggest two related approaches (choose one of them).
Generate some temporary C file (see also this), e.g. in /tmp/generated.c, then fork a compilation using gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so of it into a plugin, then dlopen(3) (for dynamic loading) that /tmp/generated.so shared object plugin (and probably use dlsym(3) to find function pointers in it...). For more about shared objects, read Drepper's How To Write Shared Libraries paper. Today, you can dlopen many hundreds of thousands of such shared libraries (see my manydl.c example) and C compilers (like recent GCC) are fast enough to compile a few thousand lines of code in a time compatible with interaction (e.g. less than a tenth of second). Generating C code is a widely used practice. In practice you would represent some AST in memory of the generated C code before emitting it.
Use some JIT compilation library, such as GCCJIT, or LLVM, or libjit, or asmjit, etc.... which would generate a function in memory, do the required relocations, and give you some pointer to it.
BTW, instead of coding in C, you might consider using some homoiconic language implementation (such as SBCL for Common Lisp, which compiles to machine code at every REPL interaction, or any dynamically contructed S-expr program representation).
The notions of closures and of callbacks are worthwhile to know. Read SICP and perhaps Lisp In Small Pieces (and of course the Dragon Book, for general compiler culture).
this question was posted on code golf.SE
I updated the 8086 16-bit code-golf answer on the sum-of-args currying question to include commented disassembly.
You might be able to use the same idea in 32-bit code with a stack-args calling convention to make a modified copy of a machine code function that tacks on a push imm32. It wouldn't be fixed-size anymore, though, so you'd need to update the function size in the copied machine code.
In normal calling conventions, the first arg is pushed last, so you can't just append another push imm32 before a fixed-size call target / leave / ret trailer. If writing a pure asm answer, you could use an alternate calling convention where args are pushed in the other order. Or you could have a fixed-size intro, then an ever-growing sequence of push imm32 + call / leave / ret.
The currying function itself could use a register-arg calling convention, even if you want the target function to use i386 System V for example (stack args).
You'd definitely want to simplify by not supporting args wider than 32 bit, so no structs by value, and no double. (Of course you could chain multiple calls to the currying function to build up a larger arg.)
Given the way the new code-golf challenge is written, I guess you'd compare the total number of curried args against the number of args the target "input" function takes.
I don't think there's any chance you can make this work in pure C with just memcpy; you have to modify the machine code.
This question already has answers here:
How to write self-modifying code in x86 assembly
(7 answers)
Closed 6 years ago.
Is there any way to put processor instructions into array, make its memory segment executable and run it as a simple function:
int main()
{
char myarr[13] = {0x90, 0xc3};
(void (*)()) myfunc = (void (*)()) myarr;
myfunc();
return 0;
}
On Unix (these days, that means "everything except Windows and some embedded and mainframe stuff you've probably never heard of") you do this by allocating a whole number of pages with mmap, writing the code into them, and then making them executable with mprotect.
void execute_generated_machine_code(const uint8_t *code, size_t codelen)
{
// in order to manipulate memory protection, we must work with
// whole pages allocated directly from the operating system.
static size_t pagesize;
if (!pagesize) {
pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) fatal_perror("getpagesize");
}
// allocate at least enough space for the code + 1 byte
// (so that there will be at least one INT3 - see below),
// rounded up to a multiple of the system page size.
size_t rounded_codesize = ((codelen + 1 + pagesize - 1)
/ pagesize) * pagesize;
void *executable_area = mmap(0, rounded_codesize,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0);
if (!executable_area) fatal_perror("mmap");
// at this point, executable_area points to memory that is writable but
// *not* executable. load the code into it.
memcpy(executable_area, code, codelen);
// fill the space at the end with INT3 instructions, to guarantee
// a prompt crash if the generated code runs off the end.
// must change this if generating code for non-x86.
memset(executable_area + codelen, 0xCC, rounded_codesize - codelen);
// make executable_area actually executable (and unwritable)
if (mprotect(executable_area, rounded_codesize, PROT_READ|PROT_EXEC))
fatal_perror("mprotect");
// now we can call it. passing arguments / receiving return values
// is left as an exercise (consult libffi source code for clues).
((void (*)(void)) executable_area)();
munmap(executable_area, rounded_codesize);
}
You can probably see that this code is very nearly the same as the Windows code shown in cherrydt's answer. Only the names and arguments of the system calls are different.
When working with code like this, it is important to know that many modern operating systems will not allow you to have a page of RAM that is simultaneously writable and executable. If I'd written PROT_READ|PROT_WRITE|PROT_EXEC in the call to mmap or mprotect, it would fail. This is called the W^X policy; the acronym stands for Write XOR eXecute. It originates with OpenBSD, and the idea is to make it harder for a buffer-overflow exploit to write code into RAM and then execute it. (It's still possible, the exploit just has to find a way to make an appropriate call to mprotect first.)
Depends on the platform.
For Windows, you can use this code:
// Allocate some memory as readable+writable
// TODO: Check return value for error
LPVOID memPtr = VirtualAlloc(NULL, sizeof(myarr), MEM_COMMIT, PAGE_READWRITE);
// Copy data
memcpy(memPtr, myarr, sizeof(myarr);
// Change memory protection to readable+executable
// Again, TODO: Error checking
DWORD oldProtection; // Not used but required for the function
VirtualProtect(memPtr, sizeof(myarr), PAGE_EXECUTE_READ, &oldProtection);
// Assign and call the function
(void (*)()) myfunc = (void (*)()) memPtr;
myfunc();
// Free the memory
VirtualFree(memPtr, 0, MEM_RELEASE);
This codes assumes a myarr array as in your question's code, and it assumes that sizeof will work on it i.e. it has a directly defined size and is not just a pointer passed from elsewhere. If the latter is the case, you would have to specify the size in another way.
Note that here there are two "simplifications" possible, in case you wonder, but I would advise against them:
1) You could call VirtualAlloc with PAGE_EXECUTE_READWRITE, but this is in general bad practice because it would open an attack vector for unwanted code exeuction.
2) You could call VirtualProtect on &myarr directly, but this would just make a random page in your memory executable which happens to contain your array executable, which is even worse than #1 because there might be other data in this page as well which is now suddenly executable as well.
For Linux, I found this on Google but I don't know much about it.
Very OS-dependent: not all OSes will deliberately (read: without a bug) allow you to execute code in the data segment. DOS will because it runs in Real Mode, Linux can also with the appropriate privileges. I don't know about Windows.
Casting is often undefined and has its own caveats, so some elaboration on that topic here. From C11 standard draft N1570, §J.5.7/1:
A pointer to an object or to void may
be cast to a pointer to a function, allowing data to be invoked as a
function (6.5.4).
(Formatting added.)
So, it's perfectly fine and should work as expected. Of course, you would need to cohere to the ABI's calling convention.
Im trying to copy a function i have to an executable page and run it from there, but i seem to be having some problems.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
int foo()
{
return 4;
}
int goo()
{
return 5;
}
int main()
{
int foosize = (int)&goo-(int)&foo;
char* buf = VirtualAlloc(NULL, foosize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == NULL)
{
printf("Failed\n");
return 1;
}
printf("foo %x goo %x size foo %d\n", &foo, &goo, foosize);
memcpy (buf, (void*)&foo, foosize);
int(*f)() = &foo;
int ret1 = f();
printf("ret 1 %d\n", ret1);
int(*f2)() = (int(*)())&buf;
int ret2 = f2 (); // <-- crashes here
printf("ret2 %d\n", ret2);
return 0;
}
I know some of the code is technically UB ((int)&goo-(int)&foo), but it behaves fine in this case.
My question is why is this not working as expected?
It seems to me i mapped a page as executable and copied an existing function there and im just calling it.
What am i missing?
Would this behave differently on linux with mmap?
Thanks in advance
As everyone has already stated in comments, this is totally undefined behavior and should never really expect to work. However, I played with your code some with the debugger and realized the reason it's not working (at least in Cygwin gcc compiler) is you're creating f2 incorrectly to point to the the address of the pointer storing the allocated memory, namely buf. You want to point to the memory that buf points to. Therefore, your assignment should be
int(*f2)() = (int(*)())buf;
With that change, your code executes for me. But even if it works, it might break again as soon as you make any additional changes to the program.
Well I made a try of your code with MVSC 2008 in debug mode. Compiler happens to create a jmp table with relative offsets, and &foo and &goo are just entries in that table.
So even if you have successfully created an executable buffer and copied the code (much more than was useful...) the relative jump now points to a different location and (in my example) soon fell in a int 3 trap!
TL/DR: as compiler can arrange its code at will, and as many jump use relative offsets, you cannot rely on copying executable code. It is really Undefined Behaviour:
if compiler had been smart enough to just generate something like :
mov AX, 4
ret
it could have worked
if compiler has generated more complicated code with a relative jump it just breaks
Conclusion: you can only copy executable code if you have full control on the binary machine code for example if you used assembly code and know you will have no relocation problem
You need to declare foo and goo as static or will have to disable Incremental Linking.
Incremental linking is used to shorten the linking time when building your applications, the difference between normally and incrementally linked executables is that in incrementally linked ones each function call goes through an extra JMP instruction emitted by the linker.
These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference the function. But it's exactly this JMP that causes problems in your case. Declaring a function as static prevents the linker from creating this extra JMP instruction.
I'm trying to hack another program by changing the EIP of it. There are two programs running, one is the target, that tells where the function that is the "core-function"(e.g. a function that receive a password string as a parameter and returns true or false) is in memory.
Then now that I know where the core-function is I wanna modify the EIP with the other program so the target program can call my function and simply get a true out of it and print out a beautiful "access granted".
My code is now like this:
Target Program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int checkPwd(char *pwd)
{
printf("\nstill in the function\n");
if(strcmp(pwd, "patrick") == 0) return true;
else return false;
}
int main()
{
char pwd[16];
printf("%d", checkPwd);
scanf("%s", &pwd);
system("pause");
if(checkPwd(pwd)) printf("Granted!\n");
else printf("Not granted\n");
system("pause");
}
Attacker Program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <memory.h>
int returnTrue()
{
return true;
}
int main()
{
int hex;
scanf("%d", &hex);
memcpy((void*)hex, (void*)returnTrue, sizeof(char)*8);
system("pause");
}
I wanna add that I tried to put the hex code directly(without the scanf part) in the attacker program and did not work, it crashed.
So I think I'm missing some part of the theory in here. I'd be glad to know what is it.
Thanks in advance.
This won't work—the processes occupy different memory spaces!
Modern operating systems are designed to protect user programs from exactly this kind of attack. One process doesn't have access to the memory of another—and indeed, the addresses of data are only valid inside that process.
When a program is running, it has its own view of memory, and only can "see" memory that the kernel has instructed the memory management unit (MMU) to map for it.
Some references:
Mapping of Virtual Address to Physical Address
Printing same physical address in a c program
Why are these two addresses not the same?
It is possible to inject a function into another process but it is a little more involved than you think. The first thing is you need the proper length of the function you can do this by creating two functions.
static int realFunction() { ... }
static void realFunctionEnd() {}
Now when you copy the function over you do the length of:
realFunctionEnd - realFunction
This will give you the size. Now you cannot just call the other functions because as stated they are not guranteed to be at the same address in the other process, but you can assume that , I will assume windows, that kernal32.dll is at the same address so you can actually pass that to the realFunction when you create a remote thread.
Now, as to your real issue. What you need to do is to either inject a dll or copy a function over into the other process and then hook the function that you need to change. You can do this by copying another function over and making that code executable and then overwriting the first five bytes of the target function with a jump to your injected code, or you can do a proper detour type hook. In either case it should work. Or, you can find the offset into the function and patch it yourself by writing the proper op codes in place of the real code, such as a return of true.
Some kind of injection or patching is required to complete this, you have the basic idea, but there is a little more to it than you think at the moment. I have working code for windows to copy a function into another process, but I believe it is a good learning experience.
I would like to write a program to consume all the memory available to understand the outcome. I've heard that linux starts killing the processes once it is unable to allocate the memory.
Can anyone help me with such a program.
I have written the following, but the memory doesn't seem to get exhausted:
#include <stdlib.h>
int main()
{
while(1)
{
malloc(1024*1024);
}
return 0;
}
You should write to the allocated blocks. If you just ask for memory, linux might just hand out a reservation for memory, but nothing will be allocated until the memory is accessed.
int main()
{
while(1)
{
void *m = malloc(1024*1024);
memset(m,0,1024*1024);
}
return 0;
}
You really only need to write 1 byte on every page (4096 bytes on x86 normally) though.
Linux "over commits" memory. This means that physical memory is only given to a process when the process first tries to access it, not when the malloc is first executed. To disable this behavior, do the following (as root):
echo 2 > /proc/sys/vm/overcommit_memory
Then try running your program.
Linux uses, by default, what I like to call "opportunistic allocation". This is based on the observation that a number of real programs allocate more memory than they actually use. Linux uses this to fit a bit more stuff into memory: it only allocates a memory page when it is used, not when it's allocated with malloc (or mmap or sbrk).
You may have more success if you do something like this inside your loop:
memset(malloc(1024*1024L), 'w', 1024*1024L);
In my machine, with an appropriate gb value, the following code used 100% of the memory, and even got memory into the swap.
You can see that you need to write only one byte in each page: memset(m, 0, 1);,
If you change the page size: #define PAGE_SZ (1<<12) to a bigger page size: #define PAGE_SZ (1<<13) then you won't be writing to all the pages you allocated, thus you can see in top that the memory consumption of the program goes down.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGE_SZ (1<<12)
int main() {
int i;
int gb = 2; // memory to consume in GB
for (i = 0; i < ((unsigned long)gb<<30)/PAGE_SZ ; ++i) {
void *m = malloc(PAGE_SZ);
if (!m)
break;
memset(m, 0, 1);
}
printf("allocated %lu MB\n", ((unsigned long)i*PAGE_SZ)>>20);
getchar();
return 0;
}
A little known fact (though it is well documented) - you can (as root) prevent the OOM killer from claiming your process (or any other process) as one of its victims. Here is a snippet from something directly out of my editor, where I am (based on configuration data) locking all allocated memory to avoid being paged out and (optionally) telling the OOM killer not to bother me:
static int set_priority(nex_payload_t *p)
{
struct sched_param sched;
int maxpri, minpri;
FILE *fp;
int no_oom = -17;
if (p->cfg.lock_memory)
mlockall(MCL_CURRENT | MCL_FUTURE);
if (p->cfg.prevent_oom) {
fp = fopen("/proc/self/oom_adj", "w");
if (fp) {
/* Don't OOM me, Bro! */
fprintf(fp, "%d", no_oom);
fclose(fp);
}
}
I'm not showing what I'm doing with scheduler parameters as its not relevant to the question.
This will prevent the OOM killer from getting your process before it has a chance to produce the (in this case) desired effect. You will also, in effect, force most other processes to disk.
So, in short, to see fireworks really quickly...
Tell the OOM killer not to bother you
Lock your memory
Allocate and initialize (zero out) blocks in a never ending loop, or until malloc() fails
Be sure to look at ulimit as well, and run your tests as root.
The code I showed is part of a daemon that simply can not fail, it runs at a very high weight (selectively using the RR or FIFO scheduler) and can not (ever) be paged out.
Have a look at this program.
When there is no longer enough memory malloc starts returning 0
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(1)
{
printf("malloc %d\n", (int)malloc(1024*1024));
}
return 0;
}
On a 32-bit Linux system, the maximum that a single process can allocate in its address space is approximately 3Gb.
This means that it is unlikely that you'll exhaust the memory with a single process.
On the other hand, on 64-bit machine you can allocate as much as you like.
As others have noted, it is also necessary to initialise the memory otherwise it does not actually consume pages.
malloc will start giving an error if EITHER the OS has no virtual memory left OR the process is out of address space (or has insufficient to satisfy the requested allocation).
Linux's VM overcommit also affects exactly when this is and what happens, as others have noted.
I just exexuted #John La Rooy's snippet:
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(1)
{
printf("malloc %d\n", (int)malloc(1024*1024));
}
return 0;
}
but it exhausted my memory very fast and cause the system hanged so that I had to restart it.
So I recommend you change some code.
For example:
On my ubuntu 16.04 LTS the code below takes about 1.5 GB ram, physical memory consumption raised from 1.8 GB to 3.3 GB after executing and go down back to 1.8GiB after finishing execution.Though it looks like I have allocate 300GiB ram in the code.
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(int i<300000)
{
printf("malloc %p\n", malloc(1024*1024));
i += 1;
}
return 0;
}
When index i is less then 100000(ie, allocate less than 100 GB), either physical or virtual memory are just very slightly used(less then 100MB), I don't know why, may be there is something to do with virtual memory.
One thing interesting is that when the physical memory begins to shrink, the addresses malloc() returns definitely changes, see picture link below.
I used malloc() and calloc(), seems that they behave similarily in occupying physical memory.
memory address number changes from 48 bits to 28 bits when physical memory begins shrinking
I was bored once and did this. Got this to eat up all memory and needed to force a reboot to get it working again.
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv)
{
while(1)
{
malloc(1024 * 4);
fork();
}
}
If all you need is to stress the system, then there is stress tool, which does exactly what you want. It's available as a package for most distros.