Memcpy with function pointers leads to a segfault - c

I know I can just copy the function by reference, but I want to understand what's going on in the following code that produces a segfault.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy(r0c, return0, 100);
printf("Address of r0c is: %x\n", r0c);
printf("copied is: %d\n", (*r0c)());
return 0;
}
Here's my mental model of what I thought should work.
The process owns the memory allocated to r0c. We are copying the data from the data segment corresponding to return0, and the copy is successful.
I thought that dereferencing a function pointer is the same as calling the data segment that the function pointer points to. If that's the case, then the instruction pointer should move to the data segment corresponding to r0c, which will contain the instructions for function return0. The binary code corresponding to return0 doesn't contain any jumps or function calls that would depend on the address of return0, so it should just return 0 and restore ip... 100 bytes is certainly enough for the function pointer, and 0xc3 is well within the bounds of r0c (it is at byte 11).
So why the segmentation fault? Is this a misunderstanding of the semantics of C's function pointers or is there some security feature that prevents self-modifying code that I'm unaware of?

The memory pages that malloc uses are normally not marked as executable: on most modern systems the CPU and operating system enforce no-execute protection on data such as the heap. You can't copy code to the heap and expect it to run.
If you want to do something like that, you have to go deeper into the operating system and allocate pages yourself, then mark them as executable. An ordinary process can usually do this without administrator rights, although hardened systems may restrict or forbid memory that is both writable and executable.
And it's really dangerous. If you do this in a program you distribute and it has a bug that lets an attacker write to those pages, the attacker can run arbitrary code with your program's privileges and may be able to take control of the computer.
There are also other problems with your code. Pointers to functions might not convert cleanly to object pointers (such as void *) on all platforms. It's very hard (not to mention non-standard) to predict or otherwise obtain the size of a function. You also print pointers incorrectly in your code example: use the "%p" format, which expects a void *, and cast the pointer to void *.
Also, declaring a function as int fun() is not the same as declaring a function that takes no arguments (before C23, empty parentheses leave the parameters unspecified). If you want to declare a function that takes no arguments, explicitly use void, as in int fun(void).

The standard says:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.
[C2011, 7.24.2.1/2; emphasis added]
In the standard's terminology, functions are not "objects". The standard does not define behavior for the case where the source pointer points to a function, therefore such a memcpy() call produces undefined behavior.
Additionally, the pointer returned by malloc() is an object pointer. C does not provide for direct conversion of object pointers to function pointers, and it does not provide for objects to be called as functions. It is possible to convert between object pointer and function pointer by means of an intermediate integer value, but the effect of doing so is at minimum doubly implementation-defined. Under some circumstances it is undefined.
As in other cases, UB can turn out to be precisely the behavior you hoped for, but it is not safe to rely on that. In this particular case, other answers present good reasons to not expect to get the behavior you hoped for.

As was said in some comments, you need to make the data executable. This requires communicating with the OS to change protections on the data. On Linux, this is the system call int mprotect(void* addr, size_t len, int prot) (see http://man7.org/linux/man-pages/man2/mprotect.2.html).
Here is a Windows solution using VirtualProtect.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#endif
int return0()
{
return 0;
}
int main()
{
int (*r0c)(void) = malloc(100);
memcpy((void*) r0c, (void*) return0, 100);
printf("Address of r0c is: %p\n", (void*) r0c);
#ifdef _WIN32
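/* VirtualProtect works on whole pages, so this also changes the protection
   of any other heap data that shares those pages. */
DWORD out_protect;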
if(!VirtualProtect((void*) r0c, 100, PAGE_EXECUTE_READWRITE, &out_protect)){
puts("Failed to mark r0c as executable");
exit(1);
}
#endif
printf("copied is: %d\n", (*r0c)());
return 0;
}
And it works.

Malloc returns a pointer to allocated memory (100 bytes in your case). This memory area is uninitialized; assuming that memory could be executed by the CPU, for your code to work you would have to fill those 100 bytes with the executable instructions that the function implements (if indeed they fit in 100 bytes). But as has been pointed out, your allocation is on the heap, not in the text (program) segment, and heap memory normally cannot be executed as instructions. Perhaps this would achieve what you want:
#include <stdio.h>

int return0(void)
{
    return 0;
}

typedef int (*r0c)(void);   /* r0c names the pointer-to-function type */

int main(void)
{
    r0c pf = return0;   /* copy the pointer, not the code */
    /* %p expects a void *; converting a function pointer to void * is a common (POSIX) extension */
    printf("Address of return0 is: %p\n", (void *)pf);
    printf("copied is: %d\n", pf());
    return 0;
}

Related

Weird behavior of getpwnam

#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
printf("%s %s\n", getpwnam("steve")->pw_name, getpwnam("root")->pw_name);
printf("%d %d\n", getpwnam("steve")->pw_uid, getpwnam("root")->pw_uid);
return EXIT_SUCCESS;
}
$ gcc main.c && ./a.out
steve steve
1000 0
In the first printf (line 8), we try to print the user names of steve and root, but it prints steve twice. In the second printf (line 9), we try to print the UIDs of steve and root, and it prints them correctly.
I want to find the root cause of that bizarre behavior in line 8.
I know the pointer returned by getpwnam points to statically allocated memory, and the memory pointed to by fields like pw_name/pw_passwd/pw_gecos/pw_dir/pw_shell is also static, which means these values can be overwritten by subsequent calls. But I'm still confused by this strange result.
This is exercise 8-1 of The Linux Programming Interface. I'm adding this so that someone like me can find it through a search engine in the future :). Note that the question in the book is wrong; go here to see the revised version.
The code calls getpwnam() twice in succession; each call returns a pointer into the same static storage, so printf() receives two pointers to the same name buffer. The order in which the compiler makes the calls determines whether it shows “steve” or “root”.
Instead, call getpwnam_r() with two separate buffers, one for each user, so that each result lives in its own storage when printf() runs.
The getpwnam function can return a pointer to static data, so each time it's called it returns the same pointer value. And because you're calling this function multiple times as an argument to printf, you'll only see the result of whichever one of those function calls happens last.
The key point here is that the order in which a function's arguments are evaluated is unspecified, so there's no guarantee whether getpwnam("steve") or getpwnam("root") is called first.
The result from getpwnam() may be overwritten by another call to getpwnam() or getpwuid() or getpwent(). Your code is demonstrating that.
See the POSIX specifications of:
getpwnam()
getpwuid()
getpwent()
You have no control over the order of evaluation of the calls. If you saved the pointers returned, you'd probably get different results printed.
POSIX also says:
The application shall not modify the structure to which the return value points, nor any storage areas pointed to by pointers within the structure. The returned pointer, and pointers within the structure, might be invalidated or the structure or the storage areas might be overwritten by a subsequent call to getpwent(), getpwnam(), or getpwuid(). The returned pointer, and pointers within the structure, might also be invalidated if the calling thread is terminated.
You should treat the return values as if they were const-qualified, in other words.
Note that the code in the library functions need not overwrite previous data. For example, on macOS Big Sur 11.6.8, the following code, compiled with -DUSER1=\"daemon\" as one of the compiler options, yields the result:
daemon root
1 0
U1A = 0x7fecfa405fd0, U2A = 0x7fecfa405c60
U1B = 0x7fecfa405fd0, U2B = 0x7fecfa405c60
Modified code:
/* SO 7345-2740 */
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#ifndef USER1
#define USER1 "steve"
#endif
#ifndef USER2
#define USER2 "root"
#endif
int main(void)
{
    const struct passwd *user1a = getpwnam(USER1);
    const struct passwd *user2a = getpwnam(USER2);
    if (user1a == NULL || user2a == NULL) {   /* getpwnam returns NULL for unknown users */
        fprintf(stderr, "unknown user: %s or %s\n", USER1, USER2);
        return EXIT_FAILURE;
    }
    const char *user1_name = user1a->pw_name;
    const char *user2_name = user2a->pw_name;
    printf("%s %s\n", user1_name, user2_name);
    const struct passwd *user1b = getpwnam(USER1);
    const struct passwd *user2b = getpwnam(USER2);
    if (user1b == NULL || user2b == NULL) {
        fprintf(stderr, "unknown user: %s or %s\n", USER1, USER2);
        return EXIT_FAILURE;
    }
    int user1_uid = user1b->pw_uid;
    int user2_uid = user2b->pw_uid;
    printf("%d %d\n", user1_uid, user2_uid);
    printf("U1A = %p, U2A = %p\n", (void *)user1a, (void *)user2a);
    printf("U1B = %p, U2B = %p\n", (void *)user1b, (void *)user2b);
    return EXIT_SUCCESS;
}
It is moderately likely that the library functions read the whole file into memory and then pass pointers to relevant sections of that memory. Certainly, in this example, the pointers to the entry for user daemon and user root are stable.
YMWV — Your Mileage Will Vary!

Is kmalloc allocation not virtually contiguous?

I found that kmalloc returns physically and virtually contiguous memory.
I wrote some code to observe the behavior, but only the physical memory seems to be contiguous and not the virtual. Am I making any mistake?
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/moduleparam.h>
MODULE_LICENSE("GPL");
static char *ptr;
int alloc_size = 1024;
module_param(alloc_size, int, 0);
static int test_hello_init(void)
{
ptr = kmalloc(alloc_size,GFP_ATOMIC);
if(!ptr) {
/* handle error */
pr_err("memory allocation failed\n");
return -ENOMEM;
} else {
pr_info("Memory allocated successfully:%p\t%p\n", ptr, ptr+100);
pr_info("Physical address:%llx\t %llx\n", (unsigned long long)virt_to_phys(ptr), (unsigned long long)virt_to_phys(ptr+100));
}
return 0;
}
static void test_hello_exit(void)
{
kfree(ptr);
pr_info("Memory freed\n");
}
module_init(test_hello_init);
module_exit(test_hello_exit);
dmesg output:
Memory allocated successfully:0000000083318b28 000000001fba1614
Physical address:1d5d09c00 1d5d09c64
Printing kernel pointers is in general a bad idea, because it basically means leaking kernel addresses to user space, so when using %p in printk() (or similar macros like pr_info() etc.), the kernel tries to protect itself and does not print the real address. Instead, it prints a different hashed unique identifier for that address.
If you really want to print that address, you can use %px.
From Documentation/core-api/printk-formats.rst:
Pointer Types
Pointers printed without a specifier extension (i.e unadorned %p) are
hashed to give a unique identifier without leaking kernel addresses to user
space. On 64 bit machines the first 32 bits are zeroed. If you really
want the address see %px below.
%p abcdef12 or 00000000abcdef12
Then, later below:
Unmodified Addresses
%px 01234567 or 0123456789abcdef
For printing pointers when you really want to print the address. Please
consider whether or not you are leaking sensitive information about the
Kernel layout in memory before printing pointers with %px. %px is
functionally equivalent to %lx. %px is preferred to %lx because it is more
uniquely grep'able. If, in the future, we need to modify the way the Kernel
handles printing pointers it will be nice to be able to find the call
sites.

Function to get the Size of allocated Memory from pointer only

Is there a way in C to find out the size of dynamically allocated memory?
For example, after
char* p = malloc (100);
Is there a way to find out the size of memory associated with p?
There is no standard way to find this information. However, some implementations provide functions like msize to do this. For example:
_msize on Windows
malloc_size on MacOS
malloc_usable_size on systems with glibc
Keep in mind, though, that malloc may allocate more than the size requested, so check whether the msize variant for your implementation returns the size you requested or the size of the block actually allocated on the heap.
comp.lang.c FAQ list · Question 7.27 -
Q. So can I query the malloc package to find out how big an
allocated block is?
A. Unfortunately, there is no standard or portable way. (Some
compilers provide nonstandard extensions.) If you need to know, you'll
have to keep track of it yourself. (See also question 7.28.)
The C mentality is to provide the programmer with tools to help him with his job, not to provide abstractions which change the nature of his job. C also tries to avoid making things easier or safer if doing so would cost performance.
Certain things you might like to do with a region of memory only require the location of the start of the region. Such things include working with null-terminated strings, manipulating the first n bytes of the region (if the region is known to be at least this large), and so forth.
Basically, keeping track of the length of a region is extra work, and if C did it automatically, it would sometimes be doing it unnecessarily.
Many library functions (for instance fread()) require a pointer to the start of a region, and also the size of this region. If you need the size of a region, you must keep track of it.
Yes, malloc() implementations usually keep track of a region's size, but they may do this indirectly, or round it up to some value, or not keep it at all. Even if they support it, finding the size this way might be slow compared with keeping track of it yourself.
If you need a data structure that knows how big each region is, C can do that for you. Just use a struct that keeps track of how large the region is as well as a pointer to the region.
Here's the best way I've seen to create a tagged pointer to store the size with the address. All pointer functions would still work as expected:
Stolen from: https://stackoverflow.com/a/35326444/638848
You could also implement a wrapper for malloc and free to add tags
(like allocated size and other meta information) before the pointer
returned by malloc. This is in fact the method that a c++ compiler
tags objects with references to virtual classes. Here is one working
example:
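#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

/* The union pads the header to max_align_t so the memory handed to the
   caller is suitably aligned for any type (a bare size_t may not be). */
typedef union { size_t size; max_align_t align; } header;

void * my_malloc(size_t s)
{
    header * ret = malloc(sizeof(header) + s);
    if (ret == NULL)
        return NULL;
    ret->size = s;      /* store the requested size in front of the block */
    return &ret[1];     /* return the memory just past the header */
}
void my_free(void * ptr)
{
    if (ptr != NULL)
        free((header*)ptr - 1);
}
size_t allocated_size(void * ptr)
{
    return ((header*)ptr)[-1].size;
}
int main(void) {
    int * array = my_malloc(sizeof(int) * 3);
    if (array == NULL)
        return 1;
    printf("%zu\n", allocated_size(array));   /* %zu is the format for size_t */
    my_free(array);
    return 0;
}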
The advantage of this method over a structure with size and pointer
struct pointer
{
size_t size;
void *p;
};
is that you only need to replace the malloc and free calls. All
other pointer operations require no refactoring.
No, the C runtime library does not provide such a function.
Some libraries may provide platform- or compiler-specific functions that can get this information, but generally the way to keep track of this information is in another integer variable.
Everyone telling you it's impossible is technically correct (the best kind of correct).
For engineering reasons, it is a bad idea to rely on the malloc subsystem to tell you the size of an allocated block accurately. To convince yourself of this, imagine that you were writing a large application, with several different memory allocators — maybe you use raw libc malloc in one part, but C++ operator new in another part, and then some specific Windows API in yet another part. So you've got all kinds of void* flying around. Writing a function that can work on any of these void*s is impossible, unless you can somehow tell from the pointer's value which of your heaps it came from.
So you might want to wrap up each pointer in your program with some convention that indicates where the pointer came from (and where it needs to be returned to). For example, in C++ we call that std::unique_ptr<T> (for pointers that need to be operator delete'd) or std::unique_ptr<void, D> (for pointers that need to be returned via some other mechanism D). You could do the same kind of thing in C if you wanted to. And once you're wrapping up pointers in bigger safer objects anyway, it's just a small step to struct SizedPtr { void *ptr; size_t size; } and then you never need to worry about the size of an allocation again.
However.
There are also good reasons why you might legitimately want to know the actual underlying size of an allocation. For example, maybe you're writing a profiling tool for your app that will report the actual amount of memory used by each subsystem, not just the amount of memory that the programmer thought he was using. If each of your 10-byte allocations is secretly using 16 bytes under the hood, that's good to know! (Of course there will be other overhead as well, which you're not measuring this way. But there are yet other tools for that job.) Or maybe you're just investigating the behavior of realloc on your platform. Or maybe you'd like to "round up" the capacity of a growing allocation to avoid premature reallocations in the future. Example:
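#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { void *ptr; size_t size; } SizedPtr;
typedef struct { SizedPtr sizedptr; } VectorOfChar;   /* minimal stand-in for illustration */
size_t portable_ish_malloced_size(const void *p);      /* defined below */

SizedPtr round_up(void *p) {
    size_t sz = portable_ish_malloced_size(p);
    void *q = realloc(p, sz); // for sanitizer-cleanliness
    assert(q != NULL && portable_ish_malloced_size(q) == sz);
    return (SizedPtr){q, sz};
}
bool reserve(VectorOfChar *v, size_t newcap) {
    if (v->sizedptr.size >= newcap) return true;
    char *newdata = realloc(v->sizedptr.ptr, newcap);
    if (newdata == NULL) return false;
    v->sizedptr = round_up(newdata);
    return true;
}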
To get the size of the allocation behind a non-null pointer which has been returned directly from libc malloc — not from a custom heap, and not pointing into the middle of an object — you can use the following OS-specific APIs, which I have bundled up into a "portable-ish" wrapper function for convenience. If you find a common system where this code doesn't work, please leave a comment and I'll try to fix it!
#if defined(__linux__)
// https://linux.die.net/man/3/malloc_usable_size
#include <malloc.h>
size_t portable_ish_malloced_size(const void *p) {
return malloc_usable_size((void*)p);
}
#elif defined(__APPLE__)
// https://www.unix.com/man-page/osx/3/malloc_size/
#include <malloc/malloc.h>
size_t portable_ish_malloced_size(const void *p) {
return malloc_size(p);
}
#elif defined(_WIN32)
// https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/msize
#include <malloc.h>
size_t portable_ish_malloced_size(const void *p) {
return _msize((void *)p);
}
#else
#error "oops, I don't know this system"
#endif
#include <stdio.h>
#include <stdlib.h> // for malloc itself
int main(void) {
    void *p = malloc(42);
    if (p == NULL) return 1;
    size_t true_length = portable_ish_malloced_size(p);
    printf("%zu\n", true_length);   // at least 42
    free(p);
    return 0;
}
Tested on:
Visual Studio, Win64 — _msize
GCC/Clang, glibc, Linux — malloc_usable_size
Clang, libc, Mac OS X — malloc_size
Clang, jemalloc, Mac OS X — works in practice but I wouldn't trust it (silently mixes jemalloc's malloc and the native libc's malloc_size)
Should work fine with jemalloc on Linux
Should work fine with dlmalloc on Linux if compiled without USE_DL_PREFIX
Should work fine with tcmalloc everywhere
Like everyone else already said: No there isn't.
Also, I would always avoid all the vendor-specific functions here, because when you find that you really need to use them, that's generally a sign that you're doing it wrong. You should either store the size separately, or not have to know it at all. Using vendor functions is the quickest way to lose one of the main benefits of writing in C, portability.
I would expect this to be implementation dependent.
If you know the layout of the allocator's header data structure, you could step back from the pointer to that header and read the size.
If you use malloc, you cannot portably get the size.
On the other hand, if you use an OS API to allocate memory dynamically, like the Windows heap functions, then it's possible to do that (for example with HeapSize).
Well, I know this is not answering your specific question, but thinking outside the box: you probably do not need to know. I don't mean that you have a bad or unorthodox implementation. I mean that (without seeing your code, I am only guessing) you probably just want to know whether your data fits in the allocated memory. If that is the case, this solution might be better; it should not add much overhead and will solve your "fitting" problem:
tmp = realloc(p, required_size);
if (tmp != NULL) p = tmp;   /* on failure, p is unchanged and still valid */
There is no need to copy the old contents yourself: realloc preserves them up to the smaller of the old and new sizes, and frees the old block if it moves the data.
Avoid writing just p = realloc(p, required_size); because if realloc fails it returns NULL, and the original block is leaked.
Quuxplusone wrote: "Writing a function that can work on any of these void*s impossible, unless you can somehow tell from the pointer's value which of your heaps it came from."
(from "Determine size of dynamically allocated memory in C")
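Actually, on Windows _msize gives you the size of the allocated block from the value of the pointer. If the pointer does not point to the start of an allocated block, an error is raised.
#include <malloc.h>   /* _msize */
#include <stdlib.h>

int main(void)
{
    char *ptr1 = NULL, *ptr2 = NULL;
    size_t bsz;
    ptr1 = (char*)malloc(10);
    if (ptr1 == NULL) return 1;
    ptr2 = ptr1;
    bsz = _msize(ptr2);   /* size of the block ptr2 points to */
    (void)bsz;
    ptr1++;
    //bsz = _msize(ptr1); /* error: ptr1 no longer points to the start of a block */
    free(ptr2);
    return 0;
}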
Thanks for the #define collection. Here is the macro version.
#define MALLOC(bsz) malloc(bsz)
#define FREE(ptr) do { free(ptr); (ptr) = NULL; } while(0)
#ifdef __linux__
#include <malloc.h>
#define MSIZE(ptr) malloc_usable_size((void*)(ptr))
#elif defined __APPLE__
#include <malloc/malloc.h>
#define MSIZE(ptr) malloc_size((const void*)(ptr))
#elif defined _WIN32
#include <malloc.h>
#define MSIZE(ptr) _msize((void*)(ptr))
#else
#error "unknown system"
#endif
Note: _msize only works for memory allocated with calloc, malloc, or realloc. As stated in the Microsoft documentation:
The _msize function returns the size, in bytes, of the memory block
allocated by a call to calloc, malloc, or realloc.
And will throw an exception otherwise.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/msize?view=vs-2019
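This code relies on undocumented heap internals (it reads the int stored 16 bytes before the block), so it may work on some Windows configurations but should not be relied on: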
template <class T>
int get_allocated_bytes(T* ptr)
{
return *((int*)ptr-4);
}
template <class T>
int get_allocated_elements(T* ptr)
{
return get_allocated_bytes(ptr)/sizeof(T);
}
I was struggling recently with visualizing the memory that was available to write to (i.e using strcat or strcpy type functions immediately after malloc).
This is not meant to be a very technical answer, but it could help you while debugging, as much as it helped me.
You can use the size you malloc'd in a memset, set an arbitrary value for the second parameter (so you can recognize it), and use the pointer that you obtained from malloc.
Like so:
char* my_string = (char*) malloc(custom_size * sizeof(char));
if(my_string) { memset(my_string, 1, custom_size); }
You can then inspect in the debugger what your allocated memory looks like.
This may work, with a small update to your code:
size_t size = (char *)(p + 1) - (char *)p;   /* equals sizeof *p; p itself is left unchanged */
But this only gives the size of the object p points to, not the memory associated with p: 1 if p is a char *, and sizeof(int) (typically 4) if it is an int *.
There is no way to find out the total allocation.

Dereferencing function pointers in C to access CODE memory

We are dealing with C here. I just had this idea, and I'm wondering whether it is possible to access the point in memory where a function, say foo, is stored and copy the contents of the function to another point in memory. Specifically, I'm trying to get the following to work:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void foo(){
printf("Hello World");
}
int main(){
void (*bar)(void) = malloc(sizeof foo);
memcpy(&bar, &foo, sizeof foo);
bar();
return 0;
}
But running it gives a bus error: Bus error: 10. I'm trying to copy over the contents of function foo into a space of memory bar and then executing the newly created function bar.
This is for no other reason than to see if such a thing is possible, to reveal the intricacies of the C language. I'm not thinking about what practical uses this has.
I'm looking for guidance on getting this to work, or otherwise to be told, with a reason, why it won't work.
EDIT: Looking at some of the answers and learning about read, write, and execute permissions on memory, it dawned on me that it would be possible to create functions on the fly in C by writing to executable memory.
With standard C, what you are trying to do is undefined behaviour and won't work portably. On a given platform, you might be able to make this work.
The memory malloc gives you is typically not executable, so jumping there crashes the program (here with a bus error, SIGBUS). Assuming you are on a POSIX-like system, either allocate the memory for the function with mmap and flags that make the region executable, or use mprotect to mark the region as executable.
You also need to be more careful with the amount of memory you provide: you cannot simply take the size of a function and expect that to be the length of its code. Applying sizeof to a function is not valid ISO C (GCC accepts it as an extension and yields 1), so you need to find out the function length using some other approach. Note also that memcpy(&bar, &foo, sizeof foo) copies into the pointer variable bar itself rather than into the memory it points to; the destination should be bar, not &bar.
On modern desktops, the virtual memory manager is going to get in your way. Memory regions have three kinds of access permission: read, write, and execute. On systems where code segments have only execute permission (no read), the memcpy itself will fail with a bus error. In the more typical case, where code segments are readable and only code segments are executable, you can copy the function but not run it, because the memory that bar points to will not have execute permission.
Also, determining the size of the function is problematic. Consider the following program
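#include <stdio.h>

void foo( int *x )
{
printf( "x:(%zu %zu) ", sizeof x, sizeof *x );
}
int main( void )
{
int x = 0;
foo( &x );
/* sizeof applied to a function: not valid ISO C, accepted by GCC as an extension */
printf( "foo:(%zu %zu)\n", sizeof foo, sizeof *foo );
}
On my system, the output is x:(8 4) foo:(1 1). Both foo and *foo designate the function itself, and applying sizeof to a function is not valid ISO C; GCC accepts it as an extension and simply returns 1, so the result says nothing about the length of the function's code.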

Is there any way to get the size of a c function?

I want to know if there is a way to get the size of c function in memory at runtime.
I've used this code but it's not working:
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
int main(void)
{
int t[10];
char c;
offsetof(t, p);
p:
return 0;
}
The answer is generally no, you can't. One reason is that functions are not necessarily contiguous in memory, so they don't have a "size". Sometimes compilers (notably ICC) will jump out of the function to a remote part of the binary and jump back in; GCC can similarly move rarely executed code into a separate foo.cold part.
See a related question here:
how to find function boundaries in binary code
You are confusing data with code, the variable t is not related to the function main in terms of memory address. t is stored on the stack, main is in the code section.
As for getting the size of the function, there is no standard way to get the size. If you're willing to write a disassembler and static code analysis you might get a rough idea of the size, but even that is not trivial as the final ret instruction may not be the last instruction of the function, say you return from inside a loop.
You could analyse the compiler / linker output data (PDBs, map files, etc).
But then, why do you need to know?
You can examine a disassembly. Disassemblers are available for most environments, and often the compiler itself can generate them from the source for you.
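I have seen it done, but I can't really imagine why you would need it. Subtracting two object pointers yields a ptrdiff_t, but standard C defines no arithmetic on function pointers (GCC allows it as an extension). Converting the addresses to integers makes the idea explicit, although the result only means something if the compiler happens to place metamain right after main:
#include <stdint.h>
#include <stdio.h>

void metamain(void);   /* declare before use */

int main (void) {
    /* Implementation-specific: function-to-integer conversion and function layout are up to the compiler. */
    printf("main is %ld bytes long\n", (long) ((intptr_t) metamain - (intptr_t) main));
    return 0;
}
void metamain (void) {
}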
