Log whether a global variable has been read or written - c

Requirement:
Given a C program I have to identify whether the functions accessing global variables are reading them or writing them.
Example code:
#include <stdio.h>
/* global variable declaration */
int g = 20;
int main()
{
/* writing the global variable */
g = 10;
/* reading the global variable */
printf ("value of g = %d\n", g);
return 0;
}
Executing the above code I want to generate a log file in the below format:
1- Global variable g written in function main() "TIME_STAMP"
2- Global variable g read in function main() "TIME_STAMP"
Research:
I am certainly able to achieve this by doing a static analysis of the source code with the logic below:
Go through the C code and identify the statements where the global variable is accessed.
Then analyze each such statement to identify whether it is a read or a write (check whether the ++ or -- operator is used with the global variable, or whether an assignment has been made to the global variable).
Add a log statement above the identified statement, which will execute along with that statement.
This is not a proper implementation.
Some studies:
I have gone through how debuggers are able to capture information.
Some links on the internet:
How to catch a memory write and call function with address of write

Not completely answering your question, but to just log access you could do:
#include <stdio.h>
int g = 0;
#define g (*(fprintf(stderr, "accessing g from %s. g = %d\n", __FUNCTION__, g), &g))
void foo(void)
{
g = 2;
printf("g=%d\n", g);
}
void bar(void)
{
g = 3;
printf("g=%d\n", g);
}
int main(void)
{
printf("g=%d\n", g);
g = 1;
foo();
bar();
printf("g=%d\n", g);
}
Which would print:
accessing g from main. g = 0
g=0
accessing g from main. g = 0
accessing g from foo. g = 1
accessing g from foo. g = 2
g=2
accessing g from bar. g = 2
accessing g from bar. g = 3
g=3
accessing g from main. g = 3
g=3

Below is the way I solved this problem:
I created a utility (in Java) which works as below (the C program source file is the input to my utility):
Parse the file line by line, identifying the variables and functions.
Store the global variables in a separate container and look for lines using them.
For every line which accesses a global variable, analyze it to identify whether it is a read operation or a write operation (e.g. =, +=, -= etc. are write operations).
For every such operation, instrument the code as suggested by @alk (https://stackoverflow.com/a/41158928/6160431); that in turn generates the log file when I execute the modified source file.
I am certainly able to achieve what I want, but I am still looking for a better implementation if anyone has one.
For further discussion, if anybody wants, we can have a chat.
I referred to the source code and algorithms from the below tools:
http://www.dyninst.org/
https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool

Related

how to include several lines of C code in one line of source code (AVR-GCC)

Is there a way to add an identifier that the compiler would replace with multiple lines of code?
I read up on macros and inline functions but am getting no where.
I need to write an Interrupt Service Routine and not call any functions for speed.
Trouble is I have several cases where I need to use a function so currently I just repeat all several lines in many places.
for example:
void ISR()
{
int a = 1;
int b = 2;
int c = 3;
// do some stuff here ...
a = 1;
b = 2;
c = 3;
// do more stuff here ...
a = 1;
b = 2;
c = 3;
}
The function is many pages and I need the code to be more readable.
I basically agree with everyone else's reservations about using macros for this. But, to answer your question, multiline macros can be created with a backslash.
#define INIT_VARS \
int a = 1; \
int b = 2; \
int c = 3;
#define RESET_VARS \
a = 1; \
b = 2; \
c = 3;
void ISR()
{
INIT_VARS
// do some stuff here ...
RESET_VARS
// do more stuff here ...
RESET_VARS
}
You can use an inline function, whose body is integrated at the place where it is called instead of actually being called (note that this behavior depends on several things, such as compiler support, optimization settings, or use of the -fno-inline flag). See the GCC documentation on inline functions.
For completeness: the other way would be to define // do some stuff here... as a preprocessor macro, which again gets inserted at the place where it is used, this time by the preprocessor; so there is no type safety, and it is harder to debug and to read. The usual rule of thumb is to not write a macro for something that can be done with a function.
You are correct - it is recommended that you not place function calls in an ISR. It's not that you cannot do it, but it can be a memory burden depending on the type of call. The primary reason is for timing. ISRs should be quick in and out. You shouldn't be doing a lot of extended work inside them.
That said, here's how you can actually use inline functions.
// In main.c
#include "static_defs.h"
//...
void ISR() {
inline_func();
// ...
inline_func();
}
// In static_defs.h
static inline void inline_func(void) __attribute__((always_inline));
// ... Further down in file
static inline void inline_func(void) {
// do stuff
}
The compiler will basically just paste the "do stuff" code into the ISR multiple times, but as I said before, if it's a complex function, it's probably not a good idea to do it multiple times in a single ISR, inlined or not. It might be better to set a flag of some sort and do it in your main loop so that other interrupts can do their job, too. Then, you can use a normal function to save program memory space. That depends on what you are really doing and when/why it needs done.
If you are actually setting variables and returning values, that's fine too, although, setting multiple variables would be done by passing/returning a structure or using a pointer to a structure that describes all of the relevant variables.
If you'd prefer to use macros (I wouldn't, because function-like macros should be avoided), here's an example of that:
#define RESET_VARS() do { \
a = 1; \
b = 2; \
c = 3; \
} while (0)
//...
void ISR() {
uint8_t a=1, b=2, c=3;
RESET_VARS();
// ...
RESET_VARS();
}
Also, you said it was a hypothetical, but it's recommended to use the bit-width typedefs found in <stdint.h> (automatically included when you include <avr/io.h>), such as uint8_t, rather than int. On an 8-bit MCU with AVR-GCC, an int is a 16-bit signed variable, which will require (at least) 2 clock cycles for every operation that would have taken one with an 8-bit variable.

Why some of the local variables are not listed in the corresponding stack frame when inspected using GDB?

I have a piece of code in C as shown below-
In a .c file-
1 custom_data_type2 myFunction1(custom_data_type1 a, custom_data_type2 b)
2 {
3 int c=foo();
4 custom_data_type3 t;
5 check_for_ir_path();
6 ...
7 ...
8 }
9
10 custom_data_type4 myFunction2(custom_data_type3 c, const void* d)
11 {
12 custom_data_type4 e;
13 struct custom_data_type5 f;
14 check_for_ir_path();
15 ...
16 temp = myFunction1(...);
17 return temp;
18 }
In a header file-
1 void CRASH_DUMP(int *i)
2 __attribute__((noinline));
3
4 #define INTRPT_FORCE_DUMMY_STACK 3
5
6 #define check_for_ir_path() { \
7 if (checkfunc1() && !checkfunc2()) { \
8 int sv = INTRPT_FORCE_DUMMY_STACK; \
9 ...
10 CRASH_DUMP(&sv);\
11 }\
12 }\
In an unknown scenario, there is a crash.
After processing the core dump using GDB, we get the call stack like -
#0 0x00007ffa589d9619 in myFunction1 [...]
(custom_data_type1=0x8080808080808080, custom_data_type2=0x7ff9d77f76b8) at ../xxx/yyy/zzz.c:5
sv = 32761
t = <optimized out>
#1 0x00007ffa589d8f91 in myFunction2 [...]
(custom_data_type3=<optimized out>, d=0x7ff9d77f7748) at ../xxx/yyy/zzz.c:16
sv = 167937677
f = {
...
}
If you look at the function myFunction1, there are three local variables: c, t, and sv (defined as part of the macro definition). However, in the backtrace, in frame 0, we see only two local variables, t and sv; the variable c is not listed.
The same is the case in the function myFunction2: there are three local variables, e, f, and sv (defined as part of the macro definition). However, in the backtrace, in frame 1, we see only two local variables, f and sv; the variable e is not listed.
Why is the behavior like this?
Any non-static variable declared inside the function should be put on the call stack during execution and should have been listed in the backtrace full, shouldn't it? However, some of the local variables are missing in the backtrace. Could someone provide an explanation?
Objects local to a C function often do not appear on the stack because optimization during compilation often makes it unnecessary to store objects on the stack. In general, while an implementation of the C abstract machine may be viewed as storing objects local to a function on the stack, the actual implementation on a real processor after compilation and optimization may be very different. In particular:
An object local to a function may be created and used only inside a processor register. When there are enough processor registers to hold a function’s local objects, or some of them, there is no point in writing them to memory, so optimized code will not do so.
Optimization may eliminate a local object completely or fold it into other values. For example, given void foo(int x) { int t = 10; bar(x+2*t); … }, the compiler may merely generate code that adds an immediate value of 20 to x, with the result that neither 10 nor any other instantiation of t ever appears on stack, in a register, or even in the immediate operand of an instruction. It simply does not exist in the generated code because there was no need for it.
An object local to a function may appear on the stack at one point during a function’s code but not at others. And the places it appears may differ from place to place in the code. For example, with { int t = x*x; … bar(t); … t = x/3; … bar(t); … }, the compiler may decide to stash the first value of t in one place on the stack. But the second value assigned to t is effectively a separate lifetime, and the compiler may stash it in another place on the stack (or not at all, per the above). In a good implementation, the debugger may be aware of these different places and display the stored value of t while the program counter is in a matching section of code. And, while the program counter is not in a matching section of code, t may effectively not exist, and the debugger could report it is optimized out at that point.

How to share variables between two functions without declaring it a global variable in C language

I've been reading through a lot of answers, and there are a lot of opinions about this but I wasn't able to find a code that answers my question (I found a lot of code that answers "how to share variables by declaring")
Here's the situation:
Working with embedded systems
Using IAR workbench systems
STM32F4xx HAL drivers
Declaring global variables is not an option (Edit: Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them)
C language
in case this is important: 2 .c files, and 1 .h is included in both
Now that's out of the way, let me write an example.
file1.c - Monitoring
void function(){
uint8_t varFlag[10]; // 10 devices
for (uint8_t i = 0; i < 10; i++)
{
while (timeout <= 0){
varFlag[i] = 1;
// wait for response. We'll know by the ack() function
// if response back from RX,
// then varFlag[i] = 0;
} // end of while
} // end of for
} // end of function()
file2.c - RX side
// listening... once indicated, this function is called
// ack is not called in function(), it is called when
// there's notification that there is a received message
// (otherwise I would be able to use a pointer to change
// the value of varFlag[])
void ack(uint8_t indexDevice)
{
// indexDevice = which device was acknowledged? we have 10 devices
// goal here is to somehow do varFlag[indexDevice] = 0
// where varFlag[] is declared in the function()
}
You share values or data, not variables. Stricto sensu, variables do not exist at runtime; only the compiler knows them (at most, with -g, it might put some metadata, such as the offset and type of locals, in the debugging section of the executable, which is usually stripped in production code). The linker symbol table (for global variables) can be, and often is, stripped in an embedded released ELF binary. At runtime you have some data segment, and probably a call stack made of call frames (which hold some local variables, i.e. their values, in some slots). At runtime only locations are relevant.
(some embedded processors have severe restrictions on their call stack; other have limited RAM, or scratchpad memory; so it would be helpful to know what actual processor & ISA you are targeting, and have an idea of how much RAM you have)
So have some global variables keeping these shared values (perhaps indirectly thru some pointers and data structures), or pass these values (perhaps indirectly, likewise...) thru arguments.
So if you want to share the ten bytes varFlag[10] array:
it looks like you don't want to declare uint8_t varFlag[10]; as a global (or static) variable. Are you sure you really should not (these ten bytes have to sit somewhere, and they do consume some RAM anyway, perhaps in your call stack....)?
pass the varFlag (array, decayed to pointer when passed as argument) as an argument, so perhaps declare:
void ack(uint8_t indexDevice, uint8_t*flags);
and call ack(3,varFlag) from function...
or declare a global pointer:
uint8_t*globflags;
and set it (using globflags = varFlag;) at the start of the function declaring varFlag as a local variable, and clear it (using globflags = NULL;) at the end of that function.
I would suggest you to look at the assembler code produced by your compiler (with GCC you might compile with gcc -S -Os -fverbose-asm -fstack-usage ....). I also strongly suggest you to get your code reviewed by a colleague...
PS. Perhaps you should use GCC or Clang/LLVM as a cross-compiler, and perhaps your IAR is actually using such a compiler...
Your argument for not using global variables:
Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them
confuses lifetime with scope. Variables with static lifetime occupy memory permanently regardless of scope (or visibility). A variable with global scope happens to also be statically allocated, but then so is any other static variable.
In order to share a variable across contexts it must necessarily be static, so there is no memory saving by avoiding global variables. There are however plenty of other stronger arguments for avoiding global variables and you should read A Pox on Globals by Jack Ganssle.
C supports three-levels of scope:
function (inside a function)
translation-unit (internal linkage, declared static outside a function)
global (external linkage)
The second of these allows variable to be directly visible amongst functions in the same source file, while external linkage allows direct visibility between multiple source files. However you want to avoid direct access in most cases since that is the root of the fundamental problem with global variables. You can do this using accessor functions; to use your example you might add a file3.c containing:
#include "file3.h"
static uint8_t varFlag[10];
void setFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 1 ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 0 ;
}
}
uint8_t getFlag( size_t n )
{
/* out-of-range indices read as 0 instead of indexing past the array */
return ( n < sizeof(varFlag) && varFlag[n] != 0 ) ? 1 : 0 ;
}
With an associated header file3.h
#if !defined FILE3_INCLUDE
#define FILE3_INCLUDE
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint8_t */
void setFlag( size_t n ) ;
void clrFlag( size_t n ) ;
uint8_t getFlag( size_t n ) ;
#endif
which file1.c and file2.c include so they can access varFlag[] via the accessor functions. The benefits include:
varFlag[] is not directly accessible
the functions can enforce valid values
in a debugger you can set a breakpoint to catch specifically set, clear or read access from anywhere in the code.
the internal data representation is hidden
Critically, the avoidance of a global variable does not save you memory: the data is still statically allocated, because you cannot get something for nothing; varFlag[] has to exist, even if it is not visible. That said, the last point about internal representation does provide a potential for storage efficiency, because you could change your flag representation from uint8_t to single bit-flags without having to change the interface to the data or the accessing code:
#include <limits.h>
#include "file3.h"
static uint16_t varFlags ;
void setFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags |= 0x0001 << n ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags &= ~(0x0001 << n) ;
}
}
uint8_t getFlag( size_t n )
{
return (varFlags & (0x0001 << n)) == 0 ? 0 : 1 ;
}
There are further opportunities to produce robust code; for example, you might make only the read accessor (getter) publicly visible and hide the write accessors, so that all but one translation unit has read-only access.
Put the functions into a separate translation unit and use a static variable:
static type var_to_share = ...;
void function() {
...
}
void ack() {
...
}
Note that I said translation unit, not file. You can do some #include magic (in the cleanest way possible) to keep both function definitions apart.
Unfortunately you can't in C.
The only way to do such a thing is with assembly.

How to write self modifying code in C?

I want to write a piece of code that changes itself continuously, even if the change is insignificant.
For example maybe something like
for i in 1 to 100, do
begin
x := 200
for j in 200 downto 1, do
begin
do something
end
end
Suppose I want that my code should after first iteration change the line x := 200 to some other line x := 199 and then after next iteration change it to x := 198 and so on.
Is writing such a code possible ? Would I need to use inline assembly for that ?
EDIT :
Here is why I want to do it in C:
This program will be run on an experimental operating system and I can't / don't know how to use programs compiled from other languages. The real reason I need such code is that it is being run on a guest operating system on a virtual machine. The hypervisor is a binary translator that is translating chunks of code. The translator does some optimizations. It only translates the chunks of code once. The next time the same chunk is used in the guest, the translator will use the previously translated result. Now, if the code gets modified on the fly, then the translator notices that, and marks its previous translation as stale, thus forcing a re-translation of the same code. This is what I want to achieve: to force the translator to do many translations. Typically these chunks are instructions between two branch instructions (such as jump instructions). I just think that self-modifying code would be a fantastic way to achieve this.
You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.
If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might manipulate the operating system protection on this memory range, allowing you to modify the bytes without encountering an Access Violation or SIGSEGV. Finally, you would use pointers (perhaps unsigned char * pointers, possibly unsigned long * as on RISC machines) to modify the opcodes of the compiled program.
A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.
Sorry, I am answering a bit late, but I think I found exactly what you are looking for : https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/
In this article, they change the value of a constant by injecting assembly in the stack. Then they execute a shellcode by modifying the memory of a function on the stack.
Below is the first code :
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
void foo(void);
int change_page_permissions_of_address(void *addr);
int main(void) {
void *foo_addr = (void*)foo;
// Change the permissions of the page that contains foo() to read, write, and execute
// This assumes that foo() is fully contained by a single page
if(change_page_permissions_of_address(foo_addr) == -1) {
fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
return 1;
}
// Call the unmodified foo()
puts("Calling foo...");
foo();
// Change the immediate value in the addl instruction in foo() to 42
unsigned char *instruction = (unsigned char*)foo_addr + 18;
*instruction = 0x2A;
// Call the modified foo()
puts("Calling foo...");
foo();
return 0;
}
void foo(void) {
int i=0;
i++;
printf("i: %d\n", i);
}
int change_page_permissions_of_address(void *addr) {
// Move the pointer to the page boundary
int page_size = getpagesize();
// Arithmetic on void* is a GNU extension, so step back through a char*
addr = (char *)addr - ((unsigned long)addr % page_size);
if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
return -1;
}
return 0;
}
It is possible, but it's most probably not portably possible and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.
This would be a good start. Essentially Lisp functionality in C:
http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/
Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:
#include <stdio.h>
void multiply_x (int * x, int multiplier)
{
*x *= multiplier;
}
void add_to_x (int * x, int increment)
{
*x += increment;
}
int main (void)
{
int x = 0;
int i;
void (*fp)(int *, int);
for (i = 1; i < 6; ++i) {
fp = (i % 2) ? add_to_x : multiply_x;
fp(&x, i);
printf("%d\n", x);
}
return 0;
}
The output, when we compile and run the program, is:
1
2
5
20
25
Obviously, this will only work if you have a finite number of things you want to do with x on each run through. In order to make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I really can recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.
A self-interpreting language (not hard-compiled and linked like C) might be better for that. Perl, javascript, PHP have the evil eval() function that might be suited to your purpose. By it, you could have a string of code that you constantly modify and then execute via eval().
The suggestion about implementing LISP in C and then using that is solid, due to portability concerns. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's bytecode into memory and then returning to it.
There's a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use mprotect() to make the code section writable, and then modify compiler-created functions.
Techniques like this are fun for programming challenges and obfuscated competitions, but given how unreadable your code would be combined with the fact you're exploiting what C considers undefined behavior, they're best avoided in production environments.
In standard C11 (read n1570), you cannot write self modifying code (at least without undefined behavior). Conceptually at least, the code segment is read-only.
You might consider extending the code of your program with plugins using your dynamic linker. This requires operating-system-specific functions. On POSIX, use dlopen (and probably dlsym to get newly loaded function pointers). You could then overwrite function pointers with the address of new ones.
Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.
Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.
My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.
This just requires leveraging two libraries -- an assembler, and a disassembler:
FASM assembler: https://github.com/ZenLulz/Fasm.NET
Udis86 disassembler: https://github.com/vmt/udis86
We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using VirtualProtect on Windows to change page permissions to allow editing the code. On Unix you have to use mprotect instead.
I posted an article on how we did it, as well as the sample code.
These examples are on Windows using C++, but it should be very easy to make cross-platform and C only.
This is how to do it on Windows with C++. You'll have to VirtualAlloc a byte array with read/write protections, copy your code there, and VirtualProtect it with read/execute protections. Here's how you dynamically create a function that sets rax to 15 and returns.
#include <cstdio>
#include <windows.h> // VirtualAlloc and VirtualProtect (memoryapi.h is pulled in by windows.h)
using namespace std;
typedef unsigned char byte;
int main(int argc, char** argv){
byte bytes [] = { 0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3 }; //put code here
// xor rax, rax
// add rax, 15
// ret
int size = sizeof(bytes);
DWORD protect = PAGE_READWRITE;
void* meth = VirtualAlloc(NULL, size, MEM_COMMIT, protect);
byte* write = (byte*) meth;
for(int i = 0; i < size; i++){
write[i] = bytes[i];
}
if(VirtualProtect(meth, size, PAGE_EXECUTE_READ, &protect)){
typedef int (*fptr)();
fptr my_fptr = reinterpret_cast<fptr>(meth); // no detour through long, which is only 32 bits on 64-bit Windows
int number = my_fptr();
for(int i = 0; i < number; i++){
printf("I will say this 15 times!\n");
}
return 0;
} else{
printf("Unable to VirtualProtect code with execute protection!\n");
return 1;
}
}
You assemble the code using this tool.
While "true" self-modifying code in C is impossible (the assembly way feels like a slight cheat, because at that point we're writing self-modifying code in assembly and not in C, which was the original question), there might be a pure C way to get the similar effect of statements paradoxically not doing what you think they are supposed to do. I say paradoxically because both the ASM self-modifying code and the following C snippet might not superficially or intuitively make sense, but are logical if you put intuition aside and do a logical analysis; that discrepancy is what makes a paradox a paradox.
#include <stdio.h>
#include <string.h>
int main()
{
struct Foo
{
char a;
char b[4];
} foo;
foo.a = 42;
strncpy(foo.b, "foo", sizeof foo.b); /* copies "foo" and pads with '\0' so %s is safe */
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
*(int*)&foo.a = 1918984746;
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
return 0;
}
$ gcc -o foo foo.c && ./foo
foo.a=42, foo.b="foo"
foo.a=42, foo.b="bar"
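First, we set the values of foo.a and foo.b and print the struct. Then we appear to change only the value of foo.a, but observe the output: the int-sized store also overwrites the first three bytes of foo.b (assuming a 4-byte, little-endian int).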

Copy a function in memory and execute it

I would like to know how in C I can copy the content of a function into memory and then execute it.
I'm trying to do something like this:
typedef void(*FUN)(int *);
char * myNewFunc;
char *allocExecutablePages (int pages)
{
template = (char *) valloc (getpagesize () * pages);
if (mprotect (template, getpagesize (),
PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
}
void f1 (int *v) {
*v = 10;
}
// allocate enough space but how much ??
myNewFunc = allocExecutablePages(...)
/* Copy f1 somewhere else
* (how? assume that i know the size of f1 having done a (nm -S foo.o))
*/
((FUN)template)(&val);
printf("%i",val);
Thanks for your answers
You seem to have figured out the part about protection flags. If you know the size of the function, now you can just do memcpy() and pass the address of f1 as the source address.
One big caveat is that, on many platforms, you will not be able to call any other functions from the one you're copying (f1), because relative addresses are hardcoded into the binary code of the function, and moving it to a different location in memory can make those relative addresses turn bad.
This happens to work because function1 and function2 are exactly the same size in memory.
We need the length of function2 for our memcpy, so what should be done is:
int diff = (&main - &function2);
You'll notice you can edit function2 to your liking and it keeps working just fine!
By the way, neat trick. Unfortunately the g++ compiler does complain about an invalid conversion from void* to int... but with gcc it compiles perfectly ;)
Modified sources:
//Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
//fixed the diff address to also work when function2 is variable size
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h> /* getpagesize() */
#include <sys/mman.h>
int function1(int x){
return x-5;
}
int function2(int x){
//printf("hello world");
int k=32;
int l=40;
return x+5+k+l;
}
int main(){
int diff = (&main - &function2);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
Output:
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ gcc memoryFun.c
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ ./a.out
pagesize: 4096, diff: 35
native: 1
memory: 83
native: 1
Another thing to note is that calling printf from the copied function will segfault, because printf is most likely not found due to the relative address going wrong...
Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h> /* getpagesize() */
#include <sys/mman.h>
int function1(int x){
return x-5;
}
int function2(int x){
return x+5;
}
int main(){
int diff = (&function2 - &function1);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
I have tried this issue many times in C and came to the conclusion that it cannot be accomplished using only the C language. My main thorn was finding the length of the function to copy.
The Standard C language does not provide any methods to obtain the length of a function. However, one can use assembly language and "sections" to find the length. Once the length is found, copying and executing is easy.
The easiest solution is to create or define a linker segment that contains the function. Write an assembly language module to calculate and publicly declare the length of this segment. Use this constant for the size of the function.
There are other methods that involve setting up the linker, such as predefined areas or fixed locations and copying those locations.
In embedded systems land, most of the code that copies executable stuff into RAM is written in assembly.
This might be a hack solution here. Could you make a dummy variable or function directly after the function (to be copied), obtain that dummy variable's or function's address, and then do some sort of arithmetic with the function's address to obtain the function size? This might be possible since memory is allocated linearly and in order (rather than randomly). This would also keep function copying within an ANSI C portable nature rather than delving into system-specific assembly code. I find C to be rather flexible; one just needs to think things out.
