I am studying memory handling and came across this code:
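#include <stdio.h>
#include <string.h>

void print(const char * str){
    printf("%s", str);   /* pass str as an argument, never as the format string */
}
void (*print_ptr)(const char *)=print;
void foo2(void){
    print("goo\n");
    return;
}
void baz(void){
    print("foo\n");
    return;
}
int main()
{
    char buf[256];
    void (*func_ptr)(void)=(void (*)(void))buf;
    /* Assumes baz() immediately follows foo2() in memory (not guaranteed) and
       that the distance fits in buf; casting function pointers to object
       pointers is a compiler extension, not ISO C. */
    memcpy(buf,(void *)foo2,(size_t)((char *)baz-(char *)foo2));
    func_ptr();   /* jumps into the bytes copied onto the stack */
    return 0;
}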
This code causes a segmentation fault when it reaches
func_ptr();
and I can't understand why. If I change the pointer to point at a static function (like func_ptr = &baz), it works properly, but the dynamically copied code does not.
The code itself, as I understand it, will be copied to the stack, where it should be.
What is wrong with this code?
What you are trying to do is copy the object code of foo2() into your buffer and execute it. This won't work, for a number of reasons:
1. Your code is copied to buf, which is allocated on the stack, a data area that is normally non-executable (i.e. the memory manager will not have execute permission set on that area of memory).
2. The code is unlikely to be relocatable in the general case. It may contain absolute references to itself, or relative references to the rest of the code (on x86, for example, the call to print() is encoded relative to the calling instruction), both of which break on copying.
3. You have no guarantee that the compiler will emit the functions in the order given, so there is no guarantee you are copying just foo2(). In fact there is no guarantee the compiler will produce foo2() as a single contiguous binary blob. Part of it might (for instance) be after baz(). Or (a relatively common case) parts of the function might be before the entry point.
If you really want to understand why it's breaking, fix (1) by allocating the memory for buf with mmap() and MAP_ANON, using PROT_READ|PROT_WRITE|PROT_EXEC, then run it under gdb. I'd suggest compiling with -O0 (disable optimisation) to maximise the chances of something working, but I repeat: you have no guarantees.
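A minimal sketch of that first fix, assuming a POSIX system (Linux or a BSD) that still allows anonymous mappings that are both writable and executable; some hardened systems refuse PROT_WRITE|PROT_EXEC outright. The helper name copy_to_exec_memory is hypothetical. It only removes the permission problem: the copied bytes can still crash for the other two reasons listed above (relocation and function layout).

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of read/write/execute memory and
   copy src into it. Returns NULL on failure; release with munmap(). */
void *copy_to_exec_memory(const void *src, size_t len)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;             /* e.g. W+X mappings forbidden by policy */
    memcpy(mem, src, len);
    return mem;
}
```

In the question's main(), buf would be replaced by the result of copy_to_exec_memory((void *)foo2, len) and func_ptr by a cast of that pointer; note that the (void *) cast of a function pointer is itself a compiler extension rather than ISO C.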
The larger question is why on earth you want to copy bits of your code around.
A program that accesses an illegal pointer-to-pointer does not crash with SIGSEGV. This is not a good thing, but I'm wondering how this could be and how the process survived for many days in production. It is bewildering to me.
I have tried this program on Windows, Linux, OpenVMS, and Mac OS, and they have never complained.
#include <stdio.h>
#include <string.h>
void printx(void *rec) { // I know this should have been a **
char str[1000];
memcpy(str, rec, 1000);
printf("%*.s\n", 1000, str);
printf("Whoa..!! I have not crashed yet :-P");
}
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.
Since the pointer is a stack variable, the program is printing memory starting near the top of the stack and continuing some way beyond it. That memory contains the stack frame of main(): possibly some saved register values, the argument count, a pointer to the program arguments, a pointer to the list of environment variables, and the saved return address that main() returns to, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that contain copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using), the program argument strings will be below that.
All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.
The chief potential problem is whether enough stack memory is allocated and mapped to prevent a SIGSEGV during the access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming this by clearing out the environment variables and re-running the program.
This code would not be so harmless if any of the C runtime conventions are not true:
The architecture uses a stack
A local variable (void *x) is allocated on the stack
The stack grows toward lower numbered memory
Parameters are passed on the stack
main() is called with arguments. (Some light-duty environments, such as embedded processors, invoke main() without parameters.)
In mainstream modern implementations, these are generally true (with the caveat that on x86-64 and ARM64 the first few parameters are passed in registers rather than on the stack).
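For comparison, here is a sketch of what the comment in printx() says was intended: treat rec as a pointer to a pointer and read only that one object, sizeof(void *) bytes, instead of 1000 bytes starting at it. The name print_pointer_bytes is hypothetical, and the byte order printed depends on the machine's endianness.

```c
#include <stdio.h>
#include <string.h>

/* Print the bytes of the pointer object *rec and nothing beyond it,
   then the pointer value itself. Returns the number of bytes read. */
size_t print_pointer_bytes(void **rec)
{
    unsigned char bytes[sizeof *rec];

    memcpy(bytes, rec, sizeof bytes);   /* exactly sizeof(void *) bytes */
    for (size_t i = 0; i < sizeof bytes; i++)
        printf("%02x ", (unsigned)bytes[i]);
    printf("(value %p)\n", *rec);
    return sizeof bytes;
}
```

Called as print_pointer_bytes(&x) from the original main(), every access stays inside x, so the behaviour is well defined on every platform.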
Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.
(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )
Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):
The code posted in the question does not have well-defined behaviour, but that just means you can't rely on what it does, not that it must crash.
If you dereference an invalid pointer, you are invoking undefined behaviour. This means the program can crash, it can work, it could brew some coffee, whatever.
When you have
int main(int argc, char **argv) {
void *x = 0; // you could also say void *x = (void *)10;
printx(&x);
}
You are declaring x as a pointer with value 0, and that pointer lives on the stack since it's a local variable. Now, you are passing printx the address of x, which means that with
memcpy(str, rec, 1000);
you are copying data from above the stack (or in fact from the stack itself) to the stack (the stack grows downward, so the stack pointer address decreases on each push). Since you are copying just 1000 bytes, the source data most likely lies within pages that are already mapped, so you get no segmentation fault. However, as already written, we are ultimately talking about undefined behavior.
It would very probably crash if you wrote to an inaccessible area. But you are only reading, so it can be OK. The behaviour is still undefined, though.
We are dealing with C here. I just had this idea and am wondering whether it is possible to access the point in memory where a function, say foo, is stored and copy the contents of the function to another point in memory. Specifically, I'm trying to get the following to work:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void foo(){
printf("Hello World");
}
int main(){
void (*bar)(void) = malloc(sizeof foo);
memcpy(&bar, &foo, sizeof foo);
bar();
return 0;
}
But running it gives a bus error (Bus error: 10). I'm trying to copy the contents of function foo into the memory at bar and then execute the newly created function bar.
This is for no other reason than to see if such a thing is possible, to reveal the intricacies of the C language. I'm not thinking about what practical uses this has.
I'm looking for guidance on getting this to work, or otherwise an explanation of why it won't work.
EDIT: Looking at some of the answers and learning about readable, writable, and executable memory, it just dawned on me that it would be possible to create functions on the fly in C by writing to executable memory.
In standard C, what you are trying to do is not defined and won't work portably. On a given platform, you might be able to make it work.
The memory malloc gives you is typically not executable. Jumping there causes a bus error (SIGBUS). Assuming you are on a POSIX-like system, either allocate the memory for the function with mmap and flags that cause the memory region to be executable or use mprotect to mark the region as executable.
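A minimal sketch of the mmap + mprotect route, assuming x86-64 and a POSIX system. Rather than copying foo (whose length is unknown and whose code may not be relocatable), it writes six hand-assembled bytes (mov eax, 42; ret) that contain no address references, flips the page to read+execute, and calls it. The function name run_generated_code is hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS in glibc */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: build a tiny function at run time and call it.
   Returns 42 on success, -1 if unsupported or the OS refuses. */
int run_generated_code(void)
{
#if defined(__x86_64__)
    /* x86-64 machine code for: mov eax, 42 ; ret
       It contains no absolute or relative addresses, so it runs
       correctly wherever it is placed. */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* 1. Writable, not yet executable, anonymous memory. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* 2. Switch the page to read+execute before jumping into it. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Object-to-function pointer conversion: required by POSIX, not ISO C. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, len);
    return result;
#else
    return -1;   /* the bytes above are x86-64 only */
#endif
}
```

This is the kind of on-the-fly function creation the EDIT in the question describes; JIT compilers work the same way, but generate the bytes themselves instead of copying an existing function. On CPUs other than x86 you would also need to flush the instruction cache (for example with GCC's __builtin___clear_cache) after writing the code.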
You also need to be more careful with the amount of memory you provide. You cannot simply take the size of a function and expect that to be the length of the function; sizeof is not designed to provide this kind of information (applying it to a function is not even valid C). You need to find out the function length using some other approach.
On modern desktops, the virtual memory manager is going to get in your way. Memory regions have three types of access: read, write, and execute. On systems where code segments have only execute permission, the memcpy will fail with a bus error. In the more typical case, where only code segments have execute permission, you can copy the function, but not run it, because the memory region that contains bar will not have execute permission.
Also, determining the size of a function is problematic. Consider the following program:
#include <stdio.h>
void foo( int *x )
{
printf( "x:(%zu %zu) ", sizeof x, sizeof *x );
}
int main( void )
{
int x = 0;
foo( &x );
printf( "foo:(%zu %zu)\n", sizeof foo, sizeof *foo );
}
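On my system, the output is x:(8 4) foo:(1 1). Since foo and *foo both designate the function itself (not a pointer to it), this indicates that applying sizeof to a function is not a supported operation: ISO C forbids it, and GCC accepts it only as an extension that yields 1.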
I am trying to debug a piece of code written by someone else that results in a segfault sometimes, but not all the time, during a memcpy operation.
Also, I would really appreciate it if anyone could help me understand what's going on in a piece of code that runs before the memcpy.
First off, we have a function into which is being passed a void pointer and a pointer to a struct, like so:
void ExampleFunction(void *dest, StuffStruct *buf)
The struct looks something like this:
typedef struct {
char *stuff;
unsigned int totalStuff;
unsigned int stuffSize;
unsigned int validStuff;
} StuffStruct;
Back to ExampleFunction. Inside ExampleFunction, this is happening:
void *src;
int numStuff;
numStuff = buf->validStuff;
src = (void *)(buf->stuff);
I'm confused by the above line. What happens exactly when the char array in buf->stuff gets cast to a void pointer, then set as the value of src? I can't follow what is supposed to happen with that step.
Right after this, the memcpy happens:
memcpy(dest, src, buf->bufSize*numStuff);
And that's where the segfault often happens. I've checked whether dest or src is null; neither ever is.
Additionally, in the function that calls ExampleFunction, the array for dest is declared with a size of 5000, if that matters. However, when I printf the value of buf->bufSize*numStuff in the above code, it is often well above 5000 (it can go as high as 80,000) WITHOUT segfaulting. That is, it often runs fine with the length (buf->bufSize*numStuff) much larger than the length dest was declared with. Maybe that doesn't matter since it was cast to a void pointer?
For various reasons I'm unable to use gdb or install an IDE, so I'm just using basic printf debugging. Does anyone have any ideas I could explore? Thank you in advance.
First of all, the cast and assignment just copy the address stored in buf->stuff into the pointer src. There is no magic there.
numStuff = buf->validStuff;
src = (void *)(buf->stuff);
If dest has storage for only 5000 bytes and you are trying to write beyond that length, then you are corrupting your program stack, which can lead to a segfault either during the copy or sometimes a little later. Whether you cast to a void pointer or not makes no difference at all.
memcpy(dest, src, buf->bufSize*numStuff);
I think you need to figure out exactly what buf->bufSize*numStuff is supposed to compute, and either fix it if it is incorrect (not what was intended), truncate the copy to the size of the destination, or increase the size of the destination array.
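A sketch of the first two options, under the assumption that the question's bufSize is the struct's stuffSize field (the names don't match in the question) and that the caller can pass the sizes of both buffers, which the original ExampleFunction does not receive. The name copy_stuff_checked and its extra parameters are hypothetical. It refuses an oversized copy instead of truncating it; truncating to dest_size would be the other reasonable choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char *stuff;
    unsigned int totalStuff;
    unsigned int stuffSize;
    unsigned int validStuff;
} StuffStruct;

/* Copy validStuff items of stuffSize bytes each from buf->stuff to dest,
   but only if both buffers are big enough. Returns the number of bytes
   copied, or 0 if the copy was refused. */
size_t copy_stuff_checked(void *dest, size_t dest_size,
                          const StuffStruct *buf, size_t src_size)
{
    size_t item = buf->stuffSize;    /* assumed to be the question's bufSize */
    size_t count = buf->validStuff;

    /* Refuse if item * count would overflow size_t. */
    if (item != 0 && count > SIZE_MAX / item)
        return 0;
    size_t need = item * count;

    /* Refuse rather than overrun either buffer. */
    if (need > dest_size || need > src_size)
        return 0;

    memcpy(dest, buf->stuff, need);
    return need;
}
```

A zero return is a cheap place for a printf while debugging: it marks exactly the calls that would have overrun a buffer.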
A null-pointer dereference is not the only thing that can cause a segfault. A segfault can also be triggered when you access memory past the end of the regions you have allocated.
Your code looks like it intends to copy the contents of the buffer pointed to by buf->stuff into a destination buffer. If either of those buffers is smaller than the size of the memcpy operation, the memcpy can overrun the bounds of allocated memory and trigger a segfault.
Because the memory allocator allocates memory in large chunks, and then divvies it up to various calls to malloc, your code won't consistently fail every time you run past the end of a malloc'ed buffer. You will get exactly the sporadic failure behavior you described.
The assumption that is baked into this code is that both the buffer pointed to by buf->stuff and by the dest pointer are at least "buf->bufSize * numStuff" bytes in length. One of those two assumptions is false.
I would suggest a couple of approaches:
Check the code that allocates both the buffer pointed to by dest and the buffer pointed to by buf->stuff, and ensure that each is always at least buf->bufSize * numStuff bytes long.
Failing that, there are a bunch of tools that can help you get better diagnostic information from your program. The simplest to use is efence ("Electric Fence") that will help identify places in your code where you overrun any of your buffers. (http://linux.die.net/man/3/efence). A more thorough analysis can be done using valgrind (http://valgrind.org/) -- but Valgrind is a bit more involved to use.
Good luck!
PS. There's nothing special about casting a char* pointer to a void* pointer -- it's still just an address to an allocated block of memory.
I am learning function pointers. I understand that we can point to functions using function pointers, so I assume that functions stay in memory. Do they live on the stack or the heap? Can we calculate their size?
The space for code is statically allocated by the linker when you build the code. In the case where your code is loaded by an operating system, the OS loader requests that memory from the OS and the code is loaded into it. Similarly static data as its name suggests is allocated at this time, as is an initial stack (though further stacks may be created if additional threads are created).
With respect to determining the size of a function, this information is known to the linker, and in most tool-chains the linker can create a map file that includes the size and location of all static memory objects (i.e. those not instantiated at run-time on the stack or heap).
There is no guaranteed way of determining the size of a function at run-time (and little reason to do so) however if you assume that the linker located functions that are adjacent in the source code sequentially in memory, then the following may give an indication of the size of a function:
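#include <stdint.h>
#include <stdio.h>

int first_function( void )
{
    /* ... */
}

void second_function( int arg )
{
    /* ... */
}

int main( void )
{
    /* Converting a function pointer to an integer is implementation-defined;
       uintptr_t at least avoids the truncation that (int) causes on 64-bit
       systems. If the linker reorders the functions, the difference wraps. */
    uintptr_t first_function_length  = (uintptr_t)second_function - (uintptr_t)first_function ;
    uintptr_t second_function_length = (uintptr_t)main - (uintptr_t)second_function ;
    printf( "%ju %ju\n", (uintmax_t)first_function_length, (uintmax_t)second_function_length );
}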
However YMMV; I tried this in VC++ and it only gave valid results in a "Release" build; the results for a "Debug" build made no real sense. I suggest that the exercise is for interest only and has no practical use.
Another way of observing the size of your code of course is to look at the disassembly of the code in your debugger for example.
Functions are part of the text segment, or its equivalent for the architecture you use. The language keeps no information about their size after compilation; at most you can get their entry point from the symbol table (which doesn't have to be available). So in practice you can't calculate their size in most C environments you'll encounter.
They're (normally) separate from either the stack or heap.
There are ways to find their size, but none of them is even close to portable. If you think you need/want to know the size, chances are pretty good that you're doing something you probably ought to avoid.
There's an interesting way to discover the size of a function on x86:
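#include <stddef.h>

#define RETN_empty 0xc3   /* x86 RET */
#define RETN_var   0xc2   /* x86 RET imm16, 3 bytes long */
typedef unsigned char BYTE;

/* Instruction-length decoder from the answer linked below; not defined here. */
size_t InstructionLength(BYTE *addr);

size_t FunctionSize(void *Func_addr) {
    BYTE *Addr = (BYTE *)Func_addr;
    size_t function_sz = 0;
    /* Step instruction by instruction until the first RET opcode. Functions
       with several exits, prefixed RETs (e.g. F3 C3) or tail jumps will be
       measured wrongly. */
    while (*Addr != (BYTE)RETN_empty && *Addr != (BYTE)RETN_var) {
        size_t inst_sz = InstructionLength(Addr);
        function_sz += inst_sz;
        Addr += inst_sz;
    }
    /* Include the RET itself: 1 byte for C3, 3 bytes for C2 iw. */
    return function_sz + (*Addr == (BYTE)RETN_var ? 3 : 1);
}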
But you need a function that returns the size of the instruction. You can find a function that finds the Instruction Length here: Get size of assembly instructions.
This function keeps stepping through the instructions of the function until it finds a return instruction (RETN, opcode 0xC3 or 0xC2), and returns the size of the function up to and including that instruction. It only finds the first return, so a function whose code continues past a RET will be mis-measured.
To put it simply, functions usually don't go on the stack or the heap because they are meant to be read-only, whereas the stack and heap are read-write memory.
Do you really need to know its size at run time? If not, you can get it with a simple objdump -t -j .text a.out, where a.out is the name of your binary. The .text section is where the linker puts the code, and the loader may choose to make this memory read-only (or even execute-only). If you do, then as previous posts have said, there are ways to do it, but they are tricky and non-portable. Clifford gave the most straightforward solution, but the linker rarely places functions in such a sequential manner in the final binary. Another solution is to define sections in your linker script (placing functions in them with pragmas), and reserve storage for a global variable that the linker fills with the SIZEOF(...) of the section containing your function. This is linker-dependent, and not all linkers provide the feature.
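A related trick, swapped in here instead of the linker-script SIZEOF(...) approach: with GNU ld or LLVM lld on ELF targets (Linux, most BSDs), putting a function in its own section whose name is a valid C identifier makes the linker define __start_<name> and __stop_<name> symbols around that section. This is a GCC/Clang extension and will not link on, for example, macOS or MSVC. The section name demo_text and the function names are hypothetical, and the measured size may include alignment padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Place this function alone in a custom section named "demo_text". */
__attribute__((section("demo_text"), noinline, used))
int lone_function(int x)
{
    return x * 3 + 1;
}

/* Defined by the linker, not in C: the start and end of "demo_text". */
extern const char __start_demo_text[];
extern const char __stop_demo_text[];

/* Size of the section, i.e. of lone_function plus any padding. */
size_t lone_function_size(void)
{
    return (size_t)((uintptr_t)__stop_demo_text - (uintptr_t)__start_demo_text);
}
```

Because the linker computes the bounds, this avoids assuming anything about the order in which functions are laid out, but it only measures what you explicitly isolate in its own section.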
As has been said above, function sizes are determined by the compiler at compile time, and all sizes are known to the linker at link time. If you absolutely have to, you can make the linker emit a map file containing the starting address, the size, and of course the name of each function. You can then parse this at run time in your code. But I don't think there's a portable, reliable way to calculate them at run time without overstepping the bounds of C.
The Linux kernel makes similar use of this for run-time profiling.
C has no garbage collector. Having a pointer to something doesn't make it stay in memory.
Functions are always in memory, whether or not you use them, whether or not you keep a pointer to them.
Dynamically allocated memory can be freed, but that has nothing to do with keeping a pointer to it. You shouldn't keep a pointer to memory you have freed, and you should free it before losing the pointer to it, but the language doesn't do either automatically.
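A small sketch of the distinction, with hypothetical names: a pointer to a function stays valid for the whole run because the function's code never goes away, whereas heap memory lives exactly until free() is called, regardless of how many pointers still refer to it.

```c
#include <stdlib.h>

static int answer(void) { return 42; }

/* Returning a function pointer is always safe: the code of answer()
   exists for the whole lifetime of the program. */
int (*get_function(void))(void)
{
    return answer;
}

/* Heap memory stays allocated until someone calls free(); having (or
   losing) pointers to it changes nothing, since C has no garbage collector. */
int *make_value(int v)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = v;
    return p;
}
```

After free(p), p still holds the old address, but using it is undefined; the language neither clears nor tracks it for you.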
If there is anything like the size of a function, it would be its stack frame size. Or better, consider what exactly you think the size of a function should be. Do you mean its static size, that is, the size of all its opcodes when it is loaded into memory? If so, I don't think there is any language-provided feature to find that out. Maybe you are looking for a hack; there could be plenty, but I haven't tried any.
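#include <stdio.h>

void demo(void);
int demo2(void);

int main(void){
    void (*fun)(void);
    fun = demo;
    fun();
    /* sizeof applied to a function is a constraint violation in ISO C;
       GCC accepts it as an extension and yields 1. */
    printf("\n%zu", sizeof(demo));
    printf("\n%zu", sizeof(*fun));
    printf("\n%zu", sizeof(fun));     /* size of the pointer itself */
    printf("\n%zu", sizeof(demo2));
    return 0;
}
void demo(void){
    printf("tired");
}
int demo2(void){
    printf("int type function\n");
    return 1;
}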
Hope this answers your question: every function is stored somewhere.