Call assembly in buffer in C - c

Is this possible?
I want to place intel assembly code into a char buffer, and then execute that code from within a C program
If I placed the assembly code into a buffer, could I cast it into a function pointer and call that?
I am using GCC and linux

Do you want to execute Intel assembly code or machine code? If you want to execute machine code, then yes, you can, provided that the memory page the character buffer is on is not marked NX (no execute).
If you're talking about assembly code, then no, you would first need to run the code through an assembler (on Un*x systems the standard one is typically called as; on Linux, this should be the same as gas) and then run the resulting machine code.

Yes you could. Infact that is how a buffer overflow attack could work. For more information google buffer overflow attacks. Breaking execution into direct assembly will always work (so long as the assembly is correct).

Perhaps Google can help you write a buffer overflow exploit?

Maybe -- the syntax is:
char buff[/* enough space */];
/* fill in buff with the right opcodes that conform to the Linux ABI */
((void (*)()) buff) ();
The problem is the modern x64's have a mode called "W^X" or "NX bit" which prevents the above code from executing from data pointers. There are APIs for dealing with this, but I am not familiar with the Linux one; a little googling seems to indicate that you actually mark your .o files at link time wanting to disable the NX bit. That seems like a bad idea to me (it seems instead that you should be able to, at run time, promote a data region to be executable, or allocate a writable region from a runnable region of memory; but hey, that's just my opinion -- maybe its really hard to do that.)
Assuming you don't have a NX bit or W^X issue, then just do that cast above and have a ball.

This actually works about the way you'd expect, as long as you get the function pointer syntax right. Other than security exploits, you can use this technique for performance optimization.
I should know better than to type code in with my phone, but...
unsigned char buffer[]={blah, blah, blah ...};
void (*p)() = (void (*))buffer;
p();

If you want to execute something like "pop %[register] push %[register]" as you write in your comment, yes, this is possible, but it isn't easy.
You need to either write an assembler or embed an open source assembler in your application. You feed your assembler with your char array, create the machine code (preferably PIC code, so you can omit the linking and relocating) in an other buffer and execute code in this buffer via a function pointer.
If you can guarantee there is an "as" or "gas" on the platform you run the code, you might get away with a quick and dirty hack to call "as" with your code piped in and the object code piped out.

Related

How to instrument/profile memory(heap, pointers) reads and writes in C?

I know this might be a bit vague and far-fetched (sorry, stackoverflow police!).
Is there a way, without external forces, to instrument (track basically) each pointer access and track reads and writes - either general reads/writes or quantity of reads/writes per access. Bonus if it can be done for all variables and differentiate between stack and heap ones.
Is there a way to wrap pointers in general or should this be done via custom heap? Even with custom heap I can't think of a way.
Ultimately I'd like to see a visual representation of said logs that would show me variables represented as blocks (of bytes or multiples of) and heatmap over them for reads and writes.
Ultra simple example:
int i = 5;
int *j = &i;
printf("%d", *j); /* Log would write *j was accessed for read and read sizeof(int) bytes
Attempt of rephrasing in more concise manner:
(How) can I intercept (and log) access to a pointer in C without external instrumentation of binary? - bonus if I can distinguish between read and write and get name of the pointer and size of read/write in bytes.
I guess (or hope for you) that you are developing on Linux/x86-64 with a recent GCC (5.2 in october 2015) or perhaps Clang/LLVM compiler (3.7).
I also guess that you are tracking a naughty bug, and not asking this (too broad) question from a purely theoretical point of view.
(Notice that practically there is no simple answer to your question, because in practice C compilers produce machine code close to the hardware, and most hardware do not have sophisticated instrumentations like the one you dream of)
Of course, compile with all warnings and debug info (gcc -Wall -Wextra -g). Use the debugger (gdb), notably its watchpoint facilities which are related to your issue. Use also valgrind.
Notice also that GDB (recent versions like 7.10) is scriptable in Python (or Guile), and you could code some scripts for GDB to assist you.
Notice also that recent GCC & Clang/LLVM have several sanitizers. Use some of the -fsanitize= debugging options, notably the address sanitizer with -fsanitize=address; they are instrumenting the code to help in detecting pointer accesses, so they are sort-of doing what you want. Of course, the performance of the instrumented generated code is decreasing (depending on the sanitizer, can be 10 or 20% or a factor of 50x).
At last, you might even consider adding your own instrumentation by customizing your compiler, e.g. with MELT -a high level domain specific language designed for such customization tasks for GCC. This would take months of work, unless you are already familiar with GCC internals (then, only several weeks). You could add an "optimization" pass inside GCC which would instrument (by changing the Gimple code) whatever accesses or stores you want.
Read more about aspect-oriented programming.
Notice also that if your C code is generated, that is if you are meta-programming, then changing the C code generator might be very relevant. Read more about reflection and homoiconicity. Dynamic software updating is also related to your issues.
Look also into profiling tools like oprofile and into sound static source analyzers like Frama-C.
You could also run your program inside some (instrumenting) emulator (like Qemu, Unisim, etc...).
You might also compile for a fictitious architecture like MMIX and instrument its emulator.

Assigning (const char *) to function pointer executing a hex code

I found a C code that looks like this:
#include <stdio.h>
char code[] =
"\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x46\x61\x74\x61\x75\xf2\x81\x7e"
"\x08\x45\x78\x69\x74\x75\xe9\x8b\x7a\x24\x01\xc7\x66\x8b\x2c"
"\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf\xfc\x01\xc7\x68\x72\x6c"
"\x64\x01\x68\x6c\x6f\x57\x6f\x68\x20\x48\x65\x6c\x89\xe1\xfe"
"\x49\x0b\x31\xc0\x51\x50\xff\xd7";
int main(void)
{
int (*func)();
func = (int(*)()) code;
(int)(*func)();
return 0;
}
For the given HEX CODE this program runs well and printing ("HelloWorld"). I was thinking that the HEX CODE is some machine instructions and by calling a function pointer that's pointing to that CODE we are executing that CODE.
Was my thought right? is there something to improve it?
How this HEX CODE gets generated?
Tanks for advance.
You are correct that by forcing a function pointer like this you are calling into machine instructions written as a hexadecimal string variable.
I doubt that a program like this would work on any CPU since about 2005.
On most RISC CPUs (like ARM) and on all Intel and AMD CPUs that support 64-bit, memory pages have a No Execute bit. Or in reverse an Execute bit.
On memory pages that do not have an Execute bit, the CPU will not run code. Compilers do not put variables into executable memory pages.
In order to run injected shell codes, attackers now have to use "return into libc" or function pointer overwrite attacks which set things up to call mprotect or VirtualProtect to set the execute bit on their shell code. Either that or get it injected into a executable space such as the Java, .NET, or Javascript JIT compiler uses.
Security hardened kernels will deny the ability to call mprotect. Once the program's address space is set by the dynamic library loader, it sets a security flag and no new executable pages can be created.
In order to make it always work you could assign some executable_readwrite space with malloc or the like and put the code in there and then execute it. Then there won't be any access violation faults.
void main(int argc, char** argv)
{
void* PointerToNewMemoryRegion=0;
void (*FunctionPointer) ();
PointerToNewMemoryRegion=VirtualAlloc(RandomPointer,113,MEM_COMMIT | MEM_RESERVE,PAGE_EXECUTE_READWRITE);
if (PointerToNewMemoryRegion == NULL)
{
std::cout<<"Failed to Allocate Memory region Error code: "<<GetLastError();
return;
}
memcpy(PointerToNewMemoryRegion, code,113);
FunctionPointer = (void(*)()) PointerToNewMemoryRegion;
(void)(*FunctionPointer) ();
VirtualFree(PointerToNewMemoryRegion,113,MEM_DECOMMIT)
}
but the code never returns to my code to execute so my last line is pointless. So my code has a memory leak.
To ask this question from a "general C" point of view isn't all that meaningful.
First of all, your code has many major problems:
The literal "\xFF\xFF\xFF" equals 0xFFFFFF00, not 0x00FFFFFF as may or may not have been the intention.
What this hex code means and if it is at all meaningful, is endian-dependent and also depends on the address bus width of the given CPU.
As others have mentioned, casts between function pointers and regular pointers isn't supported or well-defined by C, the C standard lists it as a "common extension".
That being said, code like this has about one single purpose, and that is various forms of boot loaders and self-updating software used in embedded systems.
Suppose for example that you have a boot loader program that is tasked with re-programming something in the very same segment of flash memory where said program itself is executed from. That is impossible because of the way the memory hardware works. So in order to do so, you would have to execute the actual flash programming routine from RAM. Since the array of hex gibberish is stored in RAM, the program can execute from there with the function pointer trick, assuming that the C compiler has a non-standard extension that allows the cast.
As for how to generate the code, you either write it all in assembler and then translate the assembler instructions to op codes manually (very tedious). Or more likely, you write the function in C and then disassemble it and copy/paste the op codes from the disassembly.
The latter is more dangerous though, as the critical part of getting code like this to work is calling convention: you must be absolutely sure that the function stacks/unstacks things properly when it is called and when it is done, restoring the contents of any CPU registers used etc. Which may force you to write part of the function in assembler anyhow. Needless to say, the code will be completely non-portable.

Can a C program modify its executable file?

I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
char* str = "Hello World\n";
int main(int argc, char* argv) {
printf(str);
FILE * file = fopen(argv, "r+");
fseek(file, 0x1000, SEEK_SET);
fputs("Goodbyewrld\n", file);
fclose(file);
return 0;
}
This doesn't work, I'm assuming there's something preventing it from opening itself since I can split this into two separate programs (A "Hello World" and something to modify it) and it works fine.
EDIT: My understanding is that when the program is run, it's loaded completely into ram. So the executable on the hard drive is, for all intents and purposes a copy. Why would it be a problem for it to modify itself?
Is there a workaround?
Thanks
On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.
When the file is mapped in this way, another application (including itself) can't write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can on Linux and other Unix systems with inode-based filesystems).
It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g. VirtualProtect).
At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.
If you are using Windows, you can do the following:
Step-by-Step Example:
Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
Modify the code pages.
Call VirtualProtect() on the modified code pages, with the PAGE_EXECUTE protection.
Call FlushInstructionCache().
For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
It is very operating system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but the you're just running another compy of the program.
Other operating systems do security checks on the file, e.g. iPhone, so writing it will be a lot of work, plus it resides as a readonly file.
With other systems you might not even know where the file is.
All present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that that is basically true for today's PCs.
However, if you really want to see own self-modifying code in action, you have some possibilities available:
Try out microcontrollers, the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB-Stick
If an emulation is ok for you, you can run an emulator for an older non-pipelined platform.
If you wanted self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly enemy-destroying) at Corewars.
If you are willing to move from C to say a Lisp dialect, code that writes code is very natural there. I would suggest Scheme which is intentionally kept small.
If we're talking about doing this in an x86 environment it shouldn't be impossible. It should be used with caution though because x86 instructions are variable-length. A long instruction may overwrite the following instruction(s) and a shorter one will leave residual data from the overwritten instruction which should be noped (NOP instruction).
When the x86 first became protected the intel reference manuals recommended the following method for debugging access to XO (execute only) areas:
create a new, empty selector ("high" part of far pointers)
set its attributes to that of the XO area
the new selector's access properties must be set RO DATA if you only want to look at what's in it
if you want to modify the data the access properties must be set to RW DATA
So the answer to the problem is in the last step. The RW is necessary if you want to be able to insert the breakpoint instruction which is what debuggers do. More modern processors than the 80286 have internal debug registers to enable non-intrusive monitoring functionality which could result in a breakpoint being issued.
Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."
I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available but there was no PL/M-32 available (or Win32 for that matter).
to solve the problem my program used thunking in the following way
defined 32-bit far pointers (sel_16:offs_32) using structures
allocated 32-bit data areas (<=> >64KB size) using global memory and received them in 16-bit far pointer (sel_16:offs_16) format
filled in the data in the structures by copying the selector, then calculating the offset using 16-bit multiplication with 32-bit results.
loaded the pointer/structure into es:ebx using the instruction size override prefix
accessed the data using a combination of the instruction size and operand size prefixes
Once the mechanism was bug free it worked without a hitch. The largest memory areas my program used were 2304*2304 double precision which comes out to around 40MB. Even today, I would call this a "large" block of memory. In 1995 it was 30% of a typical SDRAM stick (128 MB PC100).
There are non-portable ways to do this on many platforms. In Windows you can do this with WriteProcessMemory(), for example. However, in 2010 it's usually a very bad idea to do this. This isn't the days of DOS where you code in assembly and do this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger I would say don't bother with this, the problems you will introduce are not worth whatever gain you might have.
Self-modifying code is used for modifications in memory, not in file (like run-time unpackers as UPX do). Also, the file representation of a program is more difficult to operate because of relative virtual addresses, possible relocations and modifications to the headers needed for most updates (eg. by changing the Hello world! to longer Hello World you'll need to extend the data segment in file).
I'll suggest that you first learn to do it in memory. For file updates the simplest and more generic approach would be running a copy of the program so that it would modify the original.
EDIT: And don't forget about the main reasons the self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple statical analysis of the file.
2) Performance, something like JIT.
None of them benefits from modifying the executable.
If you operating on Windows, I believe it locks the file to prevent it from being modified while its being run. Thats why you often needs to exit a program in order to install an update. The same is not true on a linux system.
On newer versions of Windows CE (atleast 5.x an newer) where apps run in user space, (compared to earlier versions where all apps ran in supervisor mode), apps cannot even read it's own executable file.

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to represent the user/kernel barrier like a inter-process barrier. Pass data, not code, back and forth between a well defined protocol through a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you did want to pass memcpy() to the kernel. You have to remind that it is a very special function. It is defined in the C standard, quite simple, and of a quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that is a built-in does not impede you ability to take a pointer to it.
Even if there was a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has local or long jumps to specific memory locations. You can't just move a function in memory and expect it to run. The OS can do that but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be
re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also I know several compilers and linkers, which do optimization on file (not only function level). This results in opcode, where parts of C functions are mixed into each other.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into an C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: Did you consider a redesign of your software, so that you do not need to copy a functions content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and don't have your function do anything fancy, no other function calls, not accessing data from outside function, you might even get away with doing it once. I'd say the safest possibility is to limit the maximum size of supported function to, say, 1 kilobyte and just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. Epilogue alone might be enough, but all of this can bomb if searched sequence pops up too early, e.g. in the data embedded by the function. Searching for the next prologue might also get you into trouble, i think.
Now please ignore everything that i wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please, WHY are you trying to do that, and see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asychronous case) is handled.
If you need to provide additional information about what the event that occured, that can be made available to the userspace process when it calls read on the readable file descriptor.
Function isn't just object you can copy. What about cross-references / symbols and so on? Of course you can take something like standard linux "binutils" package and torture your binaries but is it what you want?
By the way if you simply are trying to replace memcpy() implementation, look around LD_PRELOAD mechanics.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler and linker dependant, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this, its a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what the purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA where I've copied some low level render functions from flash (16 bit access slowish memory) to the high speed workspace ram (32 bit access, at least twice as fast). This was done by taking the address of the function immdiately after the function I wanted to copy, size = (int) (NextFuncPtr - SourceFuncPtr). This did work well but obviously cant be garunteed on all platforms (does not work on Windows for sure).
I think one solution can be as below.
For ex: if you want to know func() size in program a.c, and have indicators before and after the function.
Try writing a perl script which will compile this file into object format(cc -o) make sure that pre-processor statements are not removed. You need them later on to calculate the size from object file.
Now search for your two indicators and find out the code size in between.

Checking stack usage at compile time

Is there a way to know and output the stack size needed by a function at compile time in C ?
Here is what I would like to know :
Let's take some function :
void foo(int a) {
char c[5];
char * s;
//do something
return;
}
When compiling this function, I would like to know how much stack space it will consume whent it is called. This might be useful to detect the on stack declaration of a structure hiding a big buffer.
I am looking for something that would print something like this :
file foo.c : function foo stack usage is n bytes
Is there a way not to look at the generated assembly to know that ? Or a limit that can be set for the compiler ?
Update : I am not trying to avoid runtime stack overflow for a given process, I am looking for a way to find before runtime, if a function stack usage, as determined by the compiler, is available as an output of the compilation process.
Let's put it another way : is it possible to know the size of all the objects local to a function ? I guess compiler optimization won't be my friend, because some variable will disappear but a superior limit is fine.
Linux kernel code runs on a 4K stack on x86. Hence they care. What they use to check that, is a perl script they wrote, which you may find as scripts/checkstack.pl in a recent kernel tarball (2.6.25 has got it). It runs on the output of objdump, usage documentation is in the initial comment.
I think I already used it for user-space binaries ages ago, and if you know a bit of perl programming, it's easy to fix that if it is broken.
Anyway, what it basically does is to look automatically at GCC's output. And the fact that kernel hackers wrote such a tool means that there is no static way to do it with GCC (or maybe that it was added very recently, but I doubt so).
Btw, with objdump from the mingw project and ActivePerl, or with Cygwin, you should be able to do that also on Windows and also on binaries obtained with other compilers.
StackAnlyser seems to examinate the executable code itself plus some debugging info.
What is described by this reply, is what I am looking for, stack analyzer looks like overkill to me.
Something similar to what exists for ADA would be fine. Look at this manual page from the gnat manual :
22.2 Static Stack Usage Analysis
A unit compiled with -fstack-usage will generate an extra file that specifies the maximum amount of stack used, on a per-function basis. The file has the same basename as the target object file with a .su extension. Each line of this file is made up of three fields:
* The name of the function.
* A number of bytes.
* One or more qualifiers: static, dynamic, bounded.
The second field corresponds to the size of the known part of the function frame.
The qualifier static means that the function frame size is purely static. It usually means that all local variables have a static size. In this case, the second field is a reliable measure of the function stack utilization.
The qualifier dynamic means that the function frame size is not static. It happens mainly when some local variables have a dynamic size. When this qualifier appears alone, the second field is not a reliable measure of the function stack analysis. When it is qualified with bounded, it means that the second field is a reliable maximum of the function stack utilization.
I don't see why a static code analysis couldn't give a good enough figure for this.
It's trivial to find all the local variables in any given function, and the size for each variable can be found either through the C standard (for built in types) or by calculating it (for complex types like structs and unions).
Sure, the answer can't be guaranteed to be 100% accurate, since the compiler can do various sorts of optimizations like padding, putting variables in registers or completely remove unnecessary variables. But any answer it gives should be a good estimate at least.
I did a quick google search and found StackAnalyzer but my guess is that other static code analysis tools have similar capabilities.
If you want a 100% accurate figure, then you'd have to look at the output from the compiler or check it during runtime (like Ralph suggested in his reply)
Only the compiler would really know, since it is the guy that puts all your stuff together. You'd have to look at the generated assembly and see how much space is reserved in the preamble, but that doesn't really account for things like alloca which do their thing at runtime.
Assuming you're on an embedded platform, you might find that your toolchain has a go at this. Good commercial embedded compilers (like, for example the Arm/Keil compiler) often produce reports of stack usage.
Of course, interrupts and recursion are usually a bit beyond them, but it gives you a rough idea if someone has committed some terrible screw-up with a multi megabyte buffer on the stack somewhere.
Not exactly "compile time", but I would do this as a post-build step:
let the linker create a map file for you
for each function in the map file read the corresponding part of the executable, and analyse the function prologue.
This is similar to what StackAnalyzer does, but a lot simpler. I think analysing the executable or the disassembly is the easiest way you can get to the compiler output. While the compiler knows those things internally, I am afraid you will not be able to get it from it (you might ask the compiler vendor to implement the functionality, or if using open source compiler, you could do it yourself or let someone do it for you).
To implement this you need to:
be able to parse map file
understand format of the executable
know what a function prologue can look like and be able to "decode" it
How easy or difficult this would be depends on your target platform. (Embedded? Which CPU architecture? What compiler?)
All of this definitely can be done in x86/Win32, but if you never did anything like this and have to create all of this from the scratch, it can take a few days before you are done and have something working.
Not in general. The Halting Problem in theoretical computer science suggests that you can't even predict if a general program halts on a given input. Calculating the stack used for a program run in general would be even more complicated. So: no. Maybe in special cases.
Let's say you have a recursive function whose recursion level depends on the input which can be of arbitrary length and you are already out of luck.

Resources