Dereferencing function pointers in C to access CODE memory - c

We are dealing with C here. I just had this idea and am wondering whether it is possible to access the point in memory where a function, say foo, is stored and copy the contents of the function to another point in memory. Specifically, I'm trying to get the following to work:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void foo(){
printf("Hello World");
}
int main(){
void (*bar)(void) = malloc(sizeof foo);
memcpy(&bar, &foo, sizeof foo);
bar();
return 0;
}
But running it gives a bus error: Bus error: 10. I'm trying to copy over the contents of function foo into a space of memory bar and then executing the newly created function bar.
This is for no other reason than to see if such a thing is possible, to reveal the intricacies of the C language. I'm not thinking about what practical uses this has.
I'm looking for guidance on getting this to work, or otherwise to be told, with a reason, why this won't work.
EDIT Looking at some of the answers and learning about read, write, and executable memory, it just dawned upon me that it would be possible to create functions on the fly in C by writing to executable memory.

In standard C, what you are trying to do has no defined behaviour and won't work portably. On a given platform, you might be able to make it work.
The memory malloc gives you is typically not executable. Jumping there causes a bus error (SIGBUS). Assuming you are on a POSIX-like system, either allocate the memory for the function with mmap and flags that make the region executable, or use mprotect to mark the region as executable.
You also need to be more careful with the amount of memory you copy. You cannot take sizeof of a function and expect it to be the length of the function's code; sizeof is not designed to provide this. You need to find out the function's length using some other approach. Note also that memcpy(&bar, &foo, ...) writes over the pointer variable bar itself rather than into the memory it points to; the destination should be bar.
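As a minimal sketch of the mmap/mprotect route described above, the following writes a few hand-assembled machine-code bytes into a fresh mapping and calls them. It deliberately does not copy a compiled C function, since the function's length is not known. Several things here are assumptions that do not come from the original post: the name run_generated_code, the x86-64 and AArch64 encodings of "return 42", a POSIX system that allows a page to be switched from writable to executable, and the GCC/Clang builtin __builtin___clear_cache.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Machine code equivalent to "return 42;".
   Assumption: only x86-64 and AArch64 encodings are provided. */
#if defined(__x86_64__)
static const unsigned char code[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xC3                          /* ret */
};
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6  /* ret */
};
#else
#error "this sketch only knows x86-64 and AArch64 encodings"
#endif

/* Hypothetical helper: returns the value produced by the generated
   code, or -1 if the mapping could not be created or made executable. */
int run_generated_code(void)
{
    size_t len = 4096; /* one page on most systems; mmap rounds up anyway */

    /* Step 1: get writable (not yet executable) memory from the OS. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* Step 2: copy the instructions in while the page is writable. */
    memcpy(mem, code, sizeof code);

    /* Step 3: drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Needed on architectures with separate instruction caches;
       effectively a no-op on x86-64. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    /* Converting void * to a function pointer is not ISO C,
       but POSIX systems support it. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, len);
    return result;
}
```

Calling run_generated_code() from main and printing the result is enough to try it out. Keeping the page writable first and executable afterwards, rather than both at once, avoids mappings that some hardened systems refuse.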

On modern desktops, the virtual memory manager is going to get in your way. Memory regions have three types of access: read, write, and execute. On systems where code segments have only execute permission, the memcpy will fail with a bus error. In the more typical case, where only code segments have execute permission, you can copy the function but not run it, because the memory region that bar points to will not have execute permission.
Also, determining the size of the function is problematic. Consider the following program:
#include <stdio.h>
void foo( int *x )
{
printf( "x:(%zu %zu) ", sizeof x, sizeof *x );
}
int main( void )
{
int x = 0;
foo( &x );
printf( "foo:(%zu %zu)\n", sizeof foo, sizeof *foo );
}
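On my system, the output is x:(8 4) foo:(1 1), indicating that sizeof applied to a function does not yield the size of its code. Standard C does not permit sizeof on a function at all; GCC and compatible compilers accept it as an extension and return 1.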

Related

Why does setting a value at an arbitrary memory location not work?

I have this code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
int main (int argc, char** argv) {
*(volatile uint8_t*)0x12345678u = 1;
int var = *(volatile uint8_t*)0x12345678;
printf("%i", var);
printf("%i", &var);
return (EXIT_SUCCESS);
}
I want to see a 1 and the address of that int, which I specified previously. But when the program is compiled with gcc and run in bash, only "command terminated" is shown, without any error. Does anyone know why?
PS: I am a newbie to C, so just experimenting.
What you are doing:
*(volatile uint8_t*)0x12345678u = 1;
int var = *(volatile uint8_t*)0x12345678;
is totally wrong.
You have no guarantee whatsoever that an arbitrary address like 0x12345678 will be accessible, not to mention writable by your program. In other words, you cannot set a value to an arbitrary address and expect it to work. It's undefined behavior to say the least, and will most likely crash your program due to the operating system stopping you from touching memory you don't own.
The "command terminated" that you get when trying to run your program happens exactly because the operating system is preventing your program from accessing a memory location it is not allowed to access. Your program gets killed before it can do anything.
If you are on Linux, you can use the mmap function to request a memory page at an (almost) arbitrary address before accessing it (see man mmap). Here's an example program which achieves what you want:
#include <sys/mman.h>
#include <stdio.h>
#define WANTED_ADDRESS (void *)0x12345000
#define WANTED_OFFSET 0x678 // 0x12345000 + 0x678 = 0x12345678
int main(void) {
// Request a memory page starting at 0x12345000 of 0x1000 (4096) bytes.
unsigned char *mem = mmap(WANTED_ADDRESS, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
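// mmap returns MAP_FAILED on error and sets errno.
if (mem == MAP_FAILED) {
perror("mmap failed");
return 1;
}
// Without MAP_FIXED the address is only a hint, so also check that
// the OS granted the page at the requested address.
if (mem != WANTED_ADDRESS) {
fprintf(stderr, "mmap returned a different address\n");
return 1;
}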
// Get a pointer inside that page.
int *ptr = (int *)(mem + WANTED_OFFSET); // 0x12345678
// Write to it.
*ptr = 123;
// Inspect the results.
printf("Value : %d\n", *ptr);
printf("Address: %p\n", ptr);
return 0;
}
The operating system and loader do not automatically make every possible address available to your program. The virtual address space of your process is constructed on demand by various operations of the program loader and of services inside the process. Although every address “exists” in the sense of being a potential address of memory, what happens when a process attempts to access an address is controlled by special data structures in the system. Those data structures control whether a process can read, write, or execute various portions of memory, whether the virtual addresses are currently mapped to physical memory, and whether the virtual addresses are not currently mapped to memory but will be provided with physical memory when needed. Initially, much of a process's address space is marked not in use (or at least implicitly marked, in that none of the explicit records for the address space apply to it).
In the executions of your program you have attempted so far, the address 0x12345678 has not been mapped and marked available to your process, so, when your process attempted to use it, the system detected a fault and terminated your process.
(Some systems randomize the layout of the address space when a program is being loaded, to make it harder for an attacker to exploit bugs in a program. Because of this, it is possible that 0x12345678 will be accessible in some executions of your program and not others.)
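To make the idea of a mapped address concrete, here is a small sketch that asks the system which address ranges the current process has mapped and checks whether a given address falls inside one of them. It assumes Linux with procfs mounted, where /proc/self/maps lists one mapping per line beginning with "start-end" in hexadecimal; the function name address_is_mapped is made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper. Returns 1 if addr lies inside a mapping listed
   in /proc/self/maps, 0 if it does not, and -1 if the file cannot be
   read. Assumption: Linux with procfs mounted. */
int address_is_mapped(uintptr_t addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        /* Each line begins with "start-end" in hexadecimal;
           the end address is exclusive. */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            addr >= start && addr < end) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Passing the address of a local variable should report a mapping (the stack), while an address such as 0x12345678 will normally report none unless the program has mapped it explicitly.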
The quote from the C11 standard, 6.5.3.2p4:
4 The unary * operator denotes indirection. [...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
You use the * operator on the pointer (volatile uint8_t*)0x12345678u. Is this a valid pointer? Is it an invalid pointer? What is an "invalid value" of a pointer?
The C language provides no check for finding out which particular pointer values are valid and which aren't. A random pointer may just happen to be valid, but most probably it is invalid, in which case the behavior is undefined.
Dereferencing an invalid pointer is undefined behavior. But, stepping outside the scope of C and into the operating system, on Unix-like systems trying to access memory that you are not allowed to access should raise the SIGSEGV signal and terminate your program. Most probably this is what happens here. Your program is not allowed to access the memory location at address 0x12345678; the operating system specifically protects against that.
Also note that systems use ASLR, so pointer values within your program are to some degree random. They are also not physical addresses, i.e. *(char*)0x01 will not access the first byte of your RAM. The operating system (or, more exactly, the underlying hardware as configured by the operating system) translates pointer values in your program to physical locations in RAM using what is called virtual memory. The same pointer value may just happen to be valid on the second run of your program. But because pointers can have so many values, most probably it isn't a valid pointer, and your operating system kills your program when it detects the invalid memory access.

Memcpy with function pointers leads to a segfault

I know I can just copy the function by reference, but I want to understand what's going on in the following code that produces a segfault.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy(r0c, return0, 100);
printf("Address of r0c is: %x\n", r0c);
printf("copied is: %d\n", (*r0c)());
return 0;
}
Here's my mental model of what I thought should work.
The process owns the memory allocated to r0c. We are copying the data from the data segment corresponding to return0, and the copy is successful.
I thought that dereferencing a function pointer is the same as calling the data segment that the function pointer points to. If that's the case, then the instruction pointer should move to the data segment corresponding to r0c, which will contain the instructions for function return0. The binary code corresponding to return0 doesn't contain any jumps or function calls that would depend on the address of return0, so it should just return 0 and restore ip... 100 bytes is certainly enough for the function pointer, and 0xc3 is well within the bounds of r0c (it is at byte 11).
So why the segmentation fault? Is this a misunderstanding of the semantics of C's function pointers or is there some security feature that prevents self-modifying code that I'm unaware of?
The memory pages used by malloc to allocate memory are not marked as executable. You can't copy code to the heap and expect it to run.
If you want to do something like that you have to go deeper into the operating system and allocate pages yourself, then mark them as executable. On typical Linux and Windows configurations an ordinary process can do this for its own pages (for example with mprotect or VirtualProtect), although hardened systems may refuse memory that is both writable and executable.
And it's really dangerous. If you do this in a program you distribute and have some kind of bug that lets an attacker use your program to write to those allocated memory pages, then the attacker can run arbitrary code with the privileges of your program.
There are also other problems with your code. Pointers to functions might not convert well to general pointers on all platforms. It's very hard (not to mention non-standard) to predict or otherwise get the size of a function. You also print pointers incorrectly in your code example: use the "%p" format, which expects a void *, so the pointer needs to be cast to void *.
Also when you declare a function like int fun() that's not the same as declaring a function that takes no arguments. If you want to declare a function that takes no arguments you should explicitly use void as in int fun(void).
The standard says:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.
[C2011, 7.24.2.1/2; emphasis added]
In the standard's terminology, functions are not "objects". The standard does not define behavior for the case where the source pointer points to a function, therefore such a memcpy() call produces undefined behavior.
Additionally, the pointer returned by malloc() is an object pointer. C does not provide for direct conversion of object pointers to function pointers, and it does not provide for objects to be called as functions. It is possible to convert between object pointer and function pointer by means of an intermediate integer value, but the effect of doing so is at minimum doubly implementation-defined. Under some circumstances it is undefined.
As in other cases, UB can turn out to be precisely the behavior you hoped for, but it is not safe to rely on that. In this particular case, other answers present good reasons to not expect to get the behavior you hoped for.
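For contrast, a function pointer is itself an object, so copying the pointer (rather than the function's code) with memcpy is well defined. The sketch below illustrates that distinction; return42 and call_through_copied_pointer are hypothetical names chosen for this example.

```c
#include <string.h>

/* Hypothetical function used only for this illustration. */
static int return42(void)
{
    return 42;
}

/* Copies the pointer value, not the machine code it points to. */
int call_through_copied_pointer(void)
{
    int (*src)(void) = return42;
    int (*dst)(void) = NULL;

    /* Both &src and &dst point to objects (pointer variables),
       so this memcpy has defined behavior. */
    memcpy(&dst, &src, sizeof dst);

    /* dst now holds the same address as src and calls return42. */
    return dst();
}
```

This is equivalent to the plain assignment dst = src; it is shown with memcpy only to make clear which kind of copy the standard actually covers.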
As was said in some comments, you need to make the data executable. This requires communicating with the OS to change protections on the data. On Linux, this is the system call int mprotect(void* addr, size_t len, int prot) (see http://man7.org/linux/man-pages/man2/mprotect.2.html).
Here is a Windows solution using VirtualProtect.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#endif
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy((void*) r0c, (void*) return0, 100);
printf("Address of r0c is: %p\n", (void*) r0c);
#ifdef _WIN32
long unsigned int out_protect;
if(!VirtualProtect((void*) r0c, 100, PAGE_EXECUTE_READWRITE, &out_protect)){
puts("Failed to mark r0c as executable");
exit(1);
}
#endif
printf("copied is: %d\n", (*r0c)());
return 0;
}
And it works.
Malloc returns a pointer to an allocated memory (100 bytes in your case). This memory area is uninitialized; assuming that memory could be executed by the CPU, for your code to work, you would have to fill those 100 bytes with the executable instructions that the function implements (if indeed it can be held in 100 bytes). But as has been pointed out, your allocation is on the heap, not in the text (program) segment and I don't think it can be executed as instructions. Perhaps this would achieve what it is you want:
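#include <stdio.h>
int return0(void)
{
return 0;
}
typedef int (*r0c)(void);
int main(void)
{
/* pf refers to the original function; no code is copied. */
r0c pf = return0;
/* %p expects a void *. Converting a function pointer to void * is a
common extension (POSIX requires it), not strictly ISO C. */
printf("Address of r0c is: %p\n", (void *) pf);
printf("copied is: %d\n", pf());
return 0;
}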

Pointer to function that was copied from code

I am studying about memory handling and I came across this code:
void print(const char * str){
printf(str);
}
void (*print_ptr)(const char *)=print;
void foo2(void){
print("goo\n");
return;
}
void baz(void){
print("foo\n");
return;
}
int main()
{
char buf[256];
void (*func_ptr)(void)=(void (*)(void))buf;
memcpy(buf,foo2,((void *)baz)-((void *) foo2));
func_ptr();
return 0;
}
This code will cause seg fault reaching
func_ptr();
I can't understand why. If I change the pointer to point to a static function (like func_ptr=&baz), it works properly, but dynamically copied code does not.
The code itself, as I understand it, will be copied to the stack, where it should be.
What is wrong with this code?
What you are trying to do is copy the object code of foo2() into your buffer and execute it. This won't work for a number of reasons:
1. Your code is copied to buf, which is allocated in data memory (here, the stack). That memory is non-executable, i.e. the memory manager will not have execute permission set on that area of memory.
2. The code is unlikely to be relocatable in the general case. It may contain either absolute references to itself or relative references to the rest of the code, both of which will break on copying.
3. You have no guarantee that the code will be compiled with the functions in the order given, so there is no guarantee you are copying just foo2(). In fact there is no guarantee the compiler will produce foo2() as a single contiguous binary blob. Part of it might (for instance) be after baz(). Or (a relatively common case) parts of the function might be before the entry point.
If you really want to understand why it's breaking, fix (1) by allocating the memory for buf with mmap() and MAP_ANON, using PROT_READ|PROT_WRITE|PROT_EXEC, then run it under gdb. I'd suggest compiling with -O0 (disable optimisation) to maximise chances of something working, but I would repeat you have no guarantees.
The larger question is why on earth you want to copy bits of your code around.

How to prevent a malicious function from accessing the heap memory that it is not expected to access?

For a given process, the heap space is shared among all the functions, which can result in some security concerns. In the following code, for example, the function main() is expecting the function foo() to only access the 1024 bytes allocated to p. But the function foo(), if it is malicious, can access the memory out of that range.
One of the approaches to prevent this from happening is to restrict foo() to only access the 1024 bytes allocated to p. Is it necessary to do this? If yes, how can we implement this capability, or is there any other approach to achieve this goal?
void foo(char *p){
printf("value of (p + 2048) is %x\n", *(p + 2048));
}
void main(void){
size_t size = 1024;
char *p = malloc(size);
foo(p);
}
No, there is no way to restrict foo() to accessing only the 1024 bytes allocated for it. A malicious function can always attempt to access past the buffer allocated for it. Some damage it may cause (such as denial-of-service attack by crashing the main service) can be limited by ensuring that foo() is called in a separate process as a user with limited privilege (see POLA).
However, the restriction to access only a limited amount of buffer is usually implemented with the following measures where we assume that foo() is ready to cooperate with its caller. These measures are pointless if foo() is intentionally trying to do harm.
1. Pass the size around: Pass the size of the allocated buffer to the function foo().
2. Code carefully: Write the function foo() carefully such that it respects the size of the allocated buffer passed to it and never accesses memory beyond this allocated buffer.
3. Link to trusted code only: If foo() belongs to third-party code, ensure that the third-party code can be trusted.
For example,
void foo(char *p, size_t size){
/* printf("value of (p + 2048) is %x\n", *(p + 2048)); */
/* Ensure that this function accesses memory only between
p[0] and p[size - 1], inclusive. */
}
void main(void){
size_t size = 1024;
char *p = malloc(size);
foo(p, size);
}
There are only two situations when the function foo() may act maliciously.
Function foo() relies on external input to decide the memory location to be accessed and that external input can somehow cause foo() to access memory beyond p[size - 1]. This is buffer overflow and needs to be avoided by input validation (code carefully; see point 2 above).
The function foo() comes from another library that you do not trust. Using a library that you do not trust is a bad idea anyway. That library could do things far worse than a buffer overflow. So one has to ensure that the library is good and behaves well before using it. One must never link to an untrusted library (link to trusted code only; see point 3 above).
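As an illustration of "pass the size around" and "code carefully", a cooperative function can validate every index against the size it was given before touching the buffer. This is only a sketch for well-behaved code, and read_checked is a hypothetical name; as noted above, nothing here constrains a function that is deliberately malicious.

```c
#include <stddef.h>

/* Hypothetical helper: stores p[index] in *out only if index is
   within the buffer. Returns 0 on success and -1 if the access
   would fall outside p[0] .. p[size - 1]. */
int read_checked(const char *p, size_t size, size_t index, char *out)
{
    if (p == NULL || out == NULL || index >= size)
        return -1;
    *out = p[index];
    return 0;
}
```

With a 1024-byte buffer, a request for index 2048 is rejected instead of reading past the allocation.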

memcpy a void pointer to a union

Code:
union foo
{
char c;
int i;
};
void func(void * src)
{
union foo dest;
memcpy(&dest, src, sizeof(union foo)); //here
}
If I call func() like this:
int main()
{
char c;
int i;
func(&c);
func(&i);
return 0;
}
In the call func(&c), the size of c is less than sizeof(union foo), which may be dangerous, right?
Is the line with memcpy correct? And if not, how to fix it?
What I want is a safe call to memcpy that copy a void * pointer to a union.
A little background: this is extracted from a very complicated function, and the signature of func() including the void * parameter is out of my control. Of course the example does nothing useful, that's because I removed all the code that isn't relevant to provide an example with minimum code.
In the call func(&c), the size of c is less than sizeof(union foo), which may be dangerous, right?
Right, this will lead to undefined behaviour. dest will likely contain some bytes from memory areas surrounding c, and which these are depends on the internal workings of the compiler. Of course, as long as you only access dest.c, that shouldn't cause any problems in most cases.
But let me be more specific. According to the C standard, writing dest.c but reading dest.i will always yield undefined behaviour. But most compilers on most platforms will have some well-defined behaviour for those cases as well. So often writing dest.c but reading dest.i makes sense despite what the standard says. In this case, however, reading from dest.i will still be affected by unknown surrounding variables, so it is undefined not only from the standard's point of view, but also in a very practical sense.
There also is a rare scenario you should consider: c might be located at the very end of allocated memory pages. (This refers to memory pages allocated from the operating system and eventually the memory management unit (MMU) hardware, not to the block-wise user space allocation done by malloc and friends.) In this case, reading more than that single byte might cause access to unmapped memory, and hence cause a severe error, most likely a program crash. Given the location of your c as an automatic variable in main, this seems unlikely, but I take it that this code snippet is only an example.
Is the line with memcpy correct? And if not, how to fix it?
Depends on what you want to do. As it stands, the code doesn't make too much sense, so I don't know what correct, reasonable application you might have in mind. Perhaps you should pass the size of the src object to func.
Is the line with memcpy correct? And if not, how to fix it?
You should pass the size of the memory pointed to by the void pointer, so that func knows how large src is and copies only that much data.
Furthermore, to be safe you should also take the size of the destination into account, so that illegal accesses are avoided in both reading and writing.
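A minimal sketch of that suggestion follows. It assumes the signature of func() can be changed after all, which the question says is not the case, so treat it as an illustration of the idea rather than a drop-in fix; func_sized is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

union foo {
    char c;
    int i;
};

/* Hypothetical variant of func() that receives the source size.
   It never reads more than src_size bytes from src and never writes
   more than sizeof(union foo) bytes to dest. */
union foo func_sized(const void *src, size_t src_size)
{
    union foo dest;

    /* Zero the whole union so that bytes not covered by the copy
       have a known value. */
    memset(&dest, 0, sizeof dest);

    /* Copy the smaller of the two sizes. */
    size_t n = src_size < sizeof dest ? src_size : sizeof dest;
    memcpy(&dest, src, n);
    return dest;
}
```

The calls from the question would then become func_sized(&c, sizeof c) and func_sized(&i, sizeof i).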
The memcpy is fine. By passing the address of the smallest member of the union, you will end up with garbage in the larger member. A way to avoid the garbage is to make all calls to func, which I assume you do control, use only pointers to the larger member. This can be achieved by assigning the smaller value to the larger type, i = c, and then calling func(&i).
func itself is ok.
The problem lies in whether the caller really makes sure that the memory referenced when calling func() is at least sizeof(union foo) bytes.
If that is always the case, everything is fine. It is not the case for the call func(&c) in the OP's example.
If the memory referenced when calling func() is smaller than sizeof(union foo), then memcpy() provokes undefined behaviour.
Since you know what to copy and how large it is, why not make the function more explicit and tell it how many bytes of the memory pointed to by the void pointer to copy?
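#include <string.h>
union foo
{
char c;
int i;
};
void func(void * src, const char * type)
{
union foo dest = {0};
/* Copy only as many bytes as the named source type occupies. */
if(strcmp(type, "char") == 0){
memcpy(&dest, src, sizeof(char));
}else if(strcmp(type, "int") == 0){
/* Assumed second case, matching the other union member. */
memcpy(&dest, src, sizeof(int));
}
}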

Resources