Custom handling of memory reads and writes in C

I am working on writing my own malloc and using the LD_PRELOAD trick to use it. I need to be able to perform custom functionality for every memory access to the heap, both reads and writes (performance is not a concern, functionality is the goal).
For example, for some code like
int x = A[5];
I would like to be able to trap the read from (A + 5) and instead of reading from that memory location, return my own custom value to store in x.
The ideas I have as of now are:
mprotect away, handling the resulting SIGSEGVs and doing what I need to in the handler. As far as I know, I can access the faulting address via the si_addr field of the siginfo_t passed to the handler, but I'm not sure how to distinguish between a read and a write; and even if I did manage to do so, I'm not sure how to handle writes, since I wouldn't know the value to be written within the handler.
Tweak gcc to handle memory accesses specially. From what I have read, understanding gcc code takes a while, and unless its IR/abstract assembly conveniently isolates memory loads/stores, I'm not sure how practical this is.
Any suggestions are appreciated.

The simplest is via malloc (you might want to own mmap, munmap, mprotect, sigaction, signal, etc. for full coverage). Your allocator returns addresses which do not correspond to valid mappings; you capture SIGBUS and SIGSEGV and interpret the siginfo structure to fix up your process. But this is somewhat limited to operating on the heap, a program can readily escape from it, and if you are trying to catch a misbehaving program, the program might corrupt your lookup tables.
For fuller coverage, you might want to take a look at gvisor, which is billed as a container runtime sandbox. Its technology is closer to a debugger's, as it takes full control over the target, capturing its faults, system calls, etc., and manages its address space. It would likely be minor surgery to adapt it to your needs.
In either situation, when you take a fault, you have to either install the memory and restart the program or emulate the instruction. If you are dealing with a clean architecture like riscv or ARM, emulation isn’t too bad, but for an over-indulgent one like x86, you pretty much need to integrate qemu. If you take the gvisor-like approach, you can install the page and set the single-step flag, then remove the page on the single-step trap, which is a bit less cumbersome. There was a precursor to dtrace, called atrace, that used this approach to analyze cache and tlb access patterns.
Sounds like a fun project; I hope it goes well.

Related

What are the good implementation practices to minimize RAM consumption

I run C code on an ARM-based Linux device that has very little RAM (16 MB). My program is often killed (SIGKILL) by the kernel with an 'out of memory' message. I ran the program under Valgrind, and it does not look like there is a memory leak. I ran the code under gdb as well but could not identify any mistake in the code. I will try to optimize my code by going through it several more times.
In general, what would be the good implementation practices on a code to minimize the memory usage?
one might be to use functions as much as possible(?), but I guess gcc already optimizes the code to decrease resource usage.
to avoid dynamic memory allocations
what else?
Be careful about the scope of objects, and make sure you deallocate memory once an object is no longer needed.

I'm not sure I understand your "use functions as much as possible(?)". Functions require overhead: every call takes up a little extra memory, because a few pointers and some bookkeeping information about the function have to be stored on the call stack. So while that may help keep your source code clean, it won't lower your memory usage (it'll probably increase it). One way to get the best of both worlds in C is to use inline functions, which suggest to the compiler that it should not emit an actual function call, but rather insert that block of code wherever it is used.

Keep in mind that efficient code usually has a more machine-level look to it (meaning repetition, pointers, and often developer-managed array indices) rather than relying on broad-purpose, function-heavy abstractions. Thank goodness for smart compilers, so you don't have to know every optimization. However, in a lower-level language like C, which gives you so much ability to manipulate everything, you need to be careful not to make costly mistakes.
If you have this kind of problem on Linux, you can disable memory overcommit. This ensures that all allocated memory is backed by physical memory, and the kernel will be less likely to kill your program. Then be sure to test the result of every malloc, because allocations really will fail once memory runs out. You can find more information here: http://www.etalabs.net/overcommit.html
You can also disable some programs on your embedded system to free memory. Maybe you don't use cron, or don't need six TTYs at startup.

Porting user space code to kernel space

I have a big system written mostly in C that has been running in user space up till now. Now I need to compile the code as a kernel module. For that, AFAIK, I should at least rewrite the code and replace functions such as malloc, calloc, free, and printf with their kernel equivalents, because those are solely user-space functions. The problem, however, is that I don't have the source code of some custom-made libraries used in the system, and those libraries call malloc etc. inside their functions. So, basically, I might need to reimplement the whole library.
Now the question: will it be a really dirty hack, if I'd write my own implementation of malloc as a wrapper around kmalloc, something like this:
void *malloc(size_t size) {
    return kmalloc(size, GFP_USER);
}
Then link this implementation to the system code, which will eliminate all the Unknown symbol in module errors.
Actually I thought that this would be a common problem and someone would have already written such a kmalloc wrapper, but I've been googling for a couple of days now and found nothing useful.
EDIT: The reason for doing this is that the system I'm talking about was a realtime application running on VxWorks realtime OS and now we want to port it to be used on Linux RTAI, where the apps mostly run in kernel space. But I guess there is a possibility to have real-time in user space as well, so, I should probably do as Mike suggested and separate the code into kernel and user-space parts and communicate between them with shared memory.
I've never seen this done before. I did have to do something similar at a previous job (in our phones, for power-saving reasons, we had to port a portion of code from user space to the kernel), so here's how I did it: I took a portion of the code and moved it, and a small portion at that.
When I did it, I changed the user-space calls to kernel calls for a number of reasons, the two primary ones being:
It was less confusing that way (others looking at the code didn't have to wonder why I was calling "malloc" from the kernel)
malloc and kmalloc don't work exactly the same. What I mean by that is
2a. kmalloc takes a flags parameter; in your example above you hardcoded it. What if you decide later that you want to change it in some places and not others? (Assuming you have a number of different places where you get dynamic memory.)
2b. kmalloc doesn't give you memory in the same way as malloc. malloc() gives you the number of bytes you pass in as size_t size. kmalloc(), on the other hand, is in the kernel and thus deals with the physical memory of the system, which is available only in page-sized chunks; when you call kmalloc() you get only certain predefined, fixed-size allocations. If you're not aware of this, you might ask for just over a particular chunk size and thus get much more memory than you need; a direct port of your code won't protect you from that.
2c. The header files have to change too. Obviously you can't include <stdlib.h> in the kernel, so even though you've "wrapped" the malloc call, you still have to go around replacing header files.
quick example of my point in 2b above:
void *stuff;
stuff = kmalloc(1, GFP_KERNEL);
printk("I got: %zu bytes of memory\n", ksize(stuff));
kfree(stuff);
To show the actual amount of memory allocated:
[90144.702588] I got: 32 bytes of memory
Anyway... technically, what you describe should work fine. Both take a size_t and return a void *, so it should work; but be aware that the more code you move into the kernel, the less deterministic things become, and that malloc() <=> kmalloc() isn't as 1:1 as it seems.
Trying to make my RTAI code compilable in both user and kernel space (as well as working with POSIX), I have developed URT, which essentially does what you are asking. It's a lightweight abstraction layer over real-time systems (and even over the inconsistent user-space vs. kernel-space RTAI functions).

Can a C program modify its executable file?

I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
char* str = "Hello World\n";
int main(int argc, char* argv[]) {
    printf(str);
    FILE* file = fopen(argv[0], "r+");
    fseek(file, 0x1000, SEEK_SET);
    fputs("Goodbyewrld\n", file);
    fclose(file);
    return 0;
}
This doesn't work; I'm assuming there's something preventing it from opening itself, since if I split this into two separate programs (a "Hello World" and something to modify it) it works fine.
EDIT: My understanding is that when the program is run, it's loaded completely into ram. So the executable on the hard drive is, for all intents and purposes a copy. Why would it be a problem for it to modify itself?
Is there a workaround?
Thanks
On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.
When the file is mapped in this way, another application (including itself) can't write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can on Linux and other Unix systems with inode-based filesystems).
It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g. VirtualProtect).
At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.
If you are using Windows, you can do the following:
Step-by-Step Example:
Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
Modify the code pages.
Call VirtualProtect() on the modified code pages, with the PAGE_EXECUTE protection.
Call FlushInstructionCache().
For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
It is very operating-system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but then you're just running another copy of the program.
Other operating systems do security checks on the file (e.g. iPhone), so writing to it will be a lot of work; besides, it resides as a read-only file.
With other systems you might not even know where the file is.
All present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that that is basically true for today's PCs.
However, if you really want to see your own self-modifying code in action, you have some possibilities available:
Try out microcontrollers; the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB stick.
If emulation is OK for you, you can run an emulator for an older non-pipelined platform.
If you wanted self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly enemy-destroying) at Corewars.
If you are willing to move from C to say a Lisp dialect, code that writes code is very natural there. I would suggest Scheme which is intentionally kept small.
If we're talking about doing this in an x86 environment, it shouldn't be impossible. It should be used with caution, though, because x86 instructions are variable-length. A long instruction may overwrite the following instruction(s), and a shorter one will leave residual data from the overwritten instruction, which should be NOPed out (with NOP instructions).
When the x86 first gained protected mode, the Intel reference manuals recommended the following method for debugging access to XO (execute-only) areas:
create a new, empty selector ("high" part of far pointers)
set its attributes to that of the XO area
the new selector's access properties must be set to RO DATA if you only want to look at what's in it
if you want to modify the data the access properties must be set to RW DATA
So the answer to the problem is in the last step. The RW is necessary if you want to be able to insert the breakpoint instruction which is what debuggers do. More modern processors than the 80286 have internal debug registers to enable non-intrusive monitoring functionality which could result in a breakpoint being issued.
Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."
I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available but there was no PL/M-32 available (or Win32 for that matter).
To solve the problem, my program used thunking in the following way:
defined 32-bit far pointers (sel_16:offs_32) using structures
allocated 32-bit data areas (i.e. larger than 64 KB) using global memory and received them in 16-bit far pointer (sel_16:offs_16) format
filled in the data in the structures by copying the selector, then calculating the offset using 16-bit multiplication with 32-bit results.
loaded the pointer/structure into es:ebx using the instruction size override prefix
accessed the data using a combination of the instruction size and operand size prefixes
Once the mechanism was bug free it worked without a hitch. The largest memory areas my program used were 2304*2304 double precision which comes out to around 40MB. Even today, I would call this a "large" block of memory. In 1995 it was 30% of a typical SDRAM stick (128 MB PC100).
There are non-portable ways to do this on many platforms. In Windows you can do this with WriteProcessMemory(), for example. However, in 2010 it's usually a very bad idea to do this. This isn't the days of DOS where you code in assembly and do this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger I would say don't bother with this, the problems you will introduce are not worth whatever gain you might have.
Self-modifying code is used for modifications in memory, not in a file (as run-time unpackers such as UPX do). Also, the file representation of a program is more difficult to operate on because of relative virtual addresses, possible relocations, and modifications to the headers needed for most updates (e.g. by changing Hello world! to the longer Hello World you'll need to extend the data segment in the file).
I suggest that you first learn to do it in memory. For file updates, the simplest and most generic approach would be running a copy of the program so that it can modify the original.
EDIT: And don't forget about the main reasons the self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple static analysis of the file.
2) Performance, something like JIT.
None of them benefits from modifying the executable.
If you're operating on Windows, I believe it locks the file to prevent it from being modified while it's being run. That's why you often need to exit a program in order to install an update. The same is not true on a Linux system.
On newer versions of Windows CE (at least 5.x and newer), where apps run in user space (compared to earlier versions, where all apps ran in supervisor mode), an app cannot even read its own executable file.

What is the need of randomizing memory addresses for loading libraries?

ldd displays the memory addresses where the shared libraries are linked at runtime
$ cat one.c
#include <stdio.h>
int main() {
printf ("%d", 45);
}
$ gcc one.c -o one -O3
$ ldd one
linux-gate.so.1 => (0x00331000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x00bc2000)
/lib/ld-linux.so.2 (0x006dc000)
$
From this answer to another question,
... The addresses are basically random numbers. Before secure implementations were devised, ldd would consistently indicate the memory addresses where the program sections were loaded. Since about five years ago, many flavors of Linux now intentionally randomize load addresses to frustrate would-be virus writers, etc.
I do not fully understand how these memory addresses can be used for exploitations.
Is the problem something like "If the addresses are fixed, one can put some undesirable code at that address which would be linked as if it was a library" or is it something more than this?
"If the addresses are fixed, one can put some undesirable code at that address which would be linked as if it was a library"
Yes.
Also. Buffer overflow exploits require a consistent memory model so that the bytes that overflow the buffer do known things to known parts of the code.
http://www.corewars.org/ A great illustration of the principle.
Some vulnerabilities allow overwriting some address (stack overflows allow overwriting return addresses, exploit for heap overflows typically overwrite SEH pointers on Win32 and addresses (GOT entries) of dynamically called functions on Linux, ...). So the attacker needs to make the overwritten address point to something interesting. To make this more difficult, several counter-measures have been adopted:
Non-executable stacks prevents exploits from just jumping to some code the attacker has put on the stack.
W^X segments (segments which can never be writable and executable at the same time) prevents the same for other memory areas.
Randomized load addresses for libraries and position-independent executables decrease the probability of successful exploitation via return-into-libc and return-oriented-programming techniques, ...
Randomized load addresses also prevent attackers from knowing in advance where to find some interesting function (e.g: imagine an attacker that can overwrite the GOT entry and part of the message for the next logging call, knowing the address of system would be "interesting").
So, you have to view load address randomization as another counter-measure among many (several layers of defense and all that).
Also note that exploits aren't restricted to arbitrary code execution. Getting a program to print some sensitive information instead of (or in addition to, think of string truncation bugs) some non-sensitive information also counts as an exploit; it would not be difficult to write some proof-of-concept program with this kind of vulnerability where knowing absolute addresses would make reliable exploits possible.
You should definitely take a look at return-into-libc and return-oriented-programming. These techniques make heavy use of knowledge of addresses in the executable and libraries.
And finally, I'll note there are two ways to randomize library load addresses:
Do it on every load: this makes (some) exploits less reliable even if an attacker can obtain info about addresses on one run and try to use that info on another run.
Do it once per system: this is what prelink -R does. It avoids attackers using generic information for e.g: all Redhat 7.2 boxes. Obviously, its advantage is that it doesn't interfere with prelink :).
A simple example:
If, on a popular operating system, the standard C library was always loaded at address 0x00100000, and a recent version of that library had the system function at offset 0x00000100, then anyone able to exploit a flaw in a program running on a computer with this operating system (such as a web server), causing it to write some data to the stack (via a buffer overrun), would know that if they wrote 0x00100100 to the place on the stack where the current function expected its return address to be, then upon returning from the current function, the system function would very likely be called. While they still haven't done everything needed to make system execute something they want, they are close, and there are tricks for writing more stuff to the stack after the address mentioned above that have a high likelihood of producing a valid string pointer and a command (or series of commands) to be run by this forced call to system.
By randomizing the addresses at which libraries are loaded the attacker is more likely to just crash the web server than gain control of the system.
The typical method is by a buffer overrun, where you put a particular address on the stack, and then return to it. You typically pick an address in the kernel where it assumes the parameters you've passed it on the stack have already been checked, so it just uses them without any further checking, allowing you to do things that normally wouldn't be allowed.

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first-class objects in C, which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer, though, satisfies all of these and is a first-class object. A function pointer is just a memory address, and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier: pass data, not code, back and forth through a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module; you can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you wanted to pass memcpy() itself to the kernel. You have to remember that it is a very special function: it is defined in the C standard, it is quite simple, and it has quite a broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there were a way to get the sizeof() a function, it might still fail when you try to call a version that has been copied to another area of memory. What if the compiler emitted short or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (a.k.a. local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If a linker was involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is generally possible in C.
Assume you have a Harvard-architecture microcontroller, where code (in other words, "functions") is located in ROM. In this case you cannot do it at all.
Also, I know several compilers and linkers which optimize at the file level (not only at the function level). This results in opcode where parts of C functions are interleaved with each other.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: Did you consider a redesign of your software, so that you do not need to copy a functions content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and your function doesn't do anything fancy (no other function calls, no access to data outside the function), you might even get away with it. I'd say the safest possibility is to limit the maximum size of a supported function to, say, 1 kilobyte, just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that I wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture, please: WHY are you trying to do that? Then we can see whether we can figure out an entirely different approach.
A similar discussion took place here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function right after the function to be copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asychronous case) is handled.
If you need to provide additional information about what the event that occured, that can be made available to the userspace process when it calls read on the readable file descriptor.
A function isn't just an object you can copy. What about cross-references/symbols and so on? Of course, you can take something like the standard Linux binutils package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look into the LD_PRELOAD mechanism.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be placed in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions being copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (16-bit access, slowish memory) to the high-speed work RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int)(NextFuncPtr - SourceFuncPtr). This worked well but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution can be as below.
For example, if you want to know the size of func() in program a.c, put indicators before and after the function.
Try writing a Perl script which compiles this file into object format (cc -c); make sure that the indicators are not removed, since you need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.
