I am trying to estimate how much stack to allocate on a per-thread basis.
I found hints suggesting that the program should scribble a known pattern (e.g. 0xEF) into memory to find the upper/lower bounds of stack usage.
Can someone provide a quick C program to do so? Is this truly the way to go?
Any other suggestions?
Thank you for assisting with this doubt.
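For reference, the pattern-scribbling idea described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a POSIX system with pthreads and a stack that grows toward lower addresses, and the names measure_stack_high_water, worker and PAINT_BYTE are made up for illustration.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAINT_BYTE 0xEF   /* hypothetical fill pattern */

/* Example workload: touches about 16 KiB of stack. */
static void *worker(void *arg)
{
    volatile char buf[16 * 1024];
    size_t i;
    (void)arg;
    for (i = 0; i < sizeof buf; i++)
        buf[i] = (char)i;
    return NULL;
}

/* Runs fn on a thread whose stack was pre-filled with PAINT_BYTE and
 * returns the number of bytes between the deepest overwritten byte and
 * the top of the stack (the high-water mark), or 0 on error.
 * Assumes the stack grows toward lower addresses. */
size_t measure_stack_high_water(void *(*fn)(void *), size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *mem = NULL;
    unsigned char *stack;
    size_t untouched = 0;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || posix_memalign(&mem, (size_t)page, stack_size) != 0)
        return 0;
    stack = mem;
    memset(stack, PAINT_BYTE, stack_size);

    if (pthread_attr_init(&attr) != 0) {
        free(mem);
        return 0;
    }
    if (pthread_attr_setstack(&attr, mem, stack_size) != 0 ||
        pthread_create(&tid, &attr, fn, NULL) != 0) {
        pthread_attr_destroy(&attr);
        free(mem);
        return 0;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);

    /* Scan upward from the lowest address; the first byte that no longer
     * holds the pattern marks the deepest point the stack reached. */
    while (untouched < stack_size && stack[untouched] == PAINT_BYTE)
        untouched++;
    free(mem);
    return stack_size - untouched;
}
```

Calling measure_stack_high_water(worker, 256 * 1024) gives an upper estimate for that one run. The figure includes whatever the thread library itself places on the supplied stack, and it can be slightly too low if the deepest bytes written happen to equal the pattern.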
If you have complete control of your program's code, there is little point in trying to find the total stack size, because you are the one who tells the OS how much stack to allocate when you create a thread with CreateThread or pthread_create. If you don't, then depending on your OS you can call pthread_attr_getstack (on Unix; on Linux, pthread_getattr_np first fills in the attributes of a running thread) or VirtualQuery (on Windows), declare a stack-based variable, and calculate the distance between the base address of the stack and the address of your variable.
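A minimal sketch of the Unix variant, assuming Linux with glibc (pthread_getattr_np is a non-portable extension) and a stack that grows toward lower addresses. The function name stack_bytes_used is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes pthread_getattr_np in glibc headers */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Declared again here so the sketch also compiles when _GNU_SOURCE
 * was defined too late to take effect. */
int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);

/* Returns an estimate of the bytes of stack the calling thread is using,
 * or 0 on failure. If total is non-NULL, the full stack size is stored
 * there. Assumes the stack grows toward lower addresses. */
size_t stack_bytes_used(size_t *total)
{
    pthread_attr_t attr;
    void *base = NULL;       /* lowest address of the stack region */
    size_t size = 0;
    volatile char marker;    /* lives near the current top of the stack */
    int rc;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    rc = pthread_attr_getstack(&attr, &base, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return 0;
    if (total != NULL)
        *total = size;

    /* The stack starts at base + size and grows down toward base. */
    return (size_t)(((uintptr_t)base + size) - (uintptr_t)&marker);
}
```

Calling stack_bytes_used from the deepest point of interest gives the usage at that moment; comparing it with the value stored in *total shows how much headroom is left.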
An alternative way to get an estimate of the stack usage is to read the stack pointer value in every function and update the minimum and maximum stack pointer variables. At the end of the program the difference between the two values will give you the estimate.
In order to read the value of the stack pointer you can:
Implement an assembly function (doing mov r/eax, r/esp + ret for the x86 CPU)
Do the same (w/o ret, of course) using inline assembly if supported by your compiler
Implement something like the below (which may not work always/everywhere due to code optimization)
Code:
#include <stdint.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
// uintptr_t is an unsigned integer type from stdint.h
// that is capable of holding a pointer.
// If you don't have it in your compiler, use an
// equivalent, which may be size_t (stddef.h) or
// UINT_PTR (windows.h) or something else.
uintptr_t StackPointerMin = (uintptr_t)-1;
uintptr_t StackPointerMax = 0;
void UpdateStackUsageInner(int dummy, ...)
{
va_list ap;
volatile char* p;
uintptr_t StackPointer;
va_start(ap, dummy);
p = va_arg(ap, volatile char*);
StackPointer = (uintptr_t)p;
if (StackPointer < StackPointerMin) StackPointerMin = StackPointer;
if (StackPointer > StackPointerMax) StackPointerMax = StackPointer;
va_end(ap);
}
void UpdateStackUsage()
{
volatile char c = 'a';
UpdateStackUsageInner(0, &c);
}
void DoSomething(void)
{
char c[1024+1];
UpdateStackUsage();
memset(c, '*', sizeof(c));
c[sizeof(c)-1] = '\0';
printf("%s\n", c);
}
int main(void)
{
UpdateStackUsage();
DoSomething();
printf("Approximate stack usage: %lu\n",
(unsigned long)(StackPointerMax - StackPointerMin));
return 0;
}
Output:
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
****************************************************************
Approximate stack usage: 1040
I also know that some compilers support hooking function entry (and probably exit), which can simplify the task because with that you won't need to insert UpdateStackUsage(); into all/many of your functions. That's been discussed here.
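As one example of such hooks, GCC and Clang call __cyg_profile_func_enter and __cyg_profile_func_exit around every function when the program is built with -finstrument-functions. The sketch below combines that with the min/max idea above; it assumes GCC or Clang (it uses __builtin_frame_address and the no_instrument_function attribute), and sp_min, sp_max and approx_stack_usage are illustrative names.

```c
#include <stdint.h>

static uintptr_t sp_min = UINTPTR_MAX;  /* lowest frame address seen  */
static uintptr_t sp_max = 0;            /* highest frame address seen */

/* Called by instrumented code on every function entry. The attribute
 * keeps the hook itself from being instrumented, which would otherwise
 * recurse forever. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    uintptr_t sp = (uintptr_t)__builtin_frame_address(0);
    (void)this_fn;
    (void)call_site;
    if (sp < sp_min) sp_min = sp;
    if (sp > sp_max) sp_max = sp;
}

/* Called on every function exit; nothing to record here. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}

/* Difference between the extreme frame addresses observed so far,
 * or 0 if no function entry has been recorded yet. */
__attribute__((no_instrument_function))
unsigned long approx_stack_usage(void)
{
    return sp_max >= sp_min ? (unsigned long)(sp_max - sp_min) : 0;
}
```

Build the whole program with gcc -finstrument-functions and print approx_stack_usage() at the end of main. Without that flag the hooks are simply never called.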
Related
I have a rather huge recursive function (also, I write in C), and while I have no doubt that the scenario where a stack overflow happens is extremely unlikely, it is still possible. What I wonder is whether you can detect that the stack is going to overflow within a few iterations, so you can do an emergency stop without crashing the program.
In the C programming language itself, that is not possible. In general, you can't easily know that you are about to run out of stack before you actually do. I recommend instead placing a configurable hard limit on the recursion depth in your implementation, so you can simply abort when the depth is exceeded. You could also rewrite your algorithm to use an auxiliary data structure instead of using the stack through recursion; this gives you greater flexibility to detect an out-of-memory condition, since malloc() tells you when it fails.
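A minimal sketch of the depth-limit idea, using a hypothetical binary tree as the recursive workload. MAX_DEPTH, struct node and count_nodes are names invented for this example, and the limit value is arbitrary.

```c
#include <stddef.h>

#define MAX_DEPTH 10000  /* hypothetical limit; tune it for your stack size */

struct node {
    struct node *left;
    struct node *right;
};

/* Returns the number of nodes below and including n, or -1 if the tree
 * is deeper than MAX_DEPTH. depth is the level of n (0 for the root). */
long count_nodes(const struct node *n, int depth)
{
    long l, r;

    if (n == NULL)
        return 0;
    if (depth >= MAX_DEPTH)
        return -1;                      /* emergency stop */

    l = count_nodes(n->left, depth + 1);
    if (l < 0)
        return -1;                      /* propagate the abort upward */
    r = count_nodes(n->right, depth + 1);
    if (r < 0)
        return -1;
    return l + r + 1;
}
```

The caller starts with count_nodes(root, 0) and treats a negative result as "too deep" rather than as a crash.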
However, you can get something similar with a procedure like this on UNIX-like systems:
Use setrlimit to set a soft stack limit lower than the hard stack limit
Establish signal handlers for both SIGSEGV and SIGBUS to get notified of stack overflows. Some operating systems produce SIGSEGV for these, others SIGBUS.
If you get such a signal and determine that it comes from a stack overflow, raise the soft stack limit with setrlimit and set a global variable to record that this occurred. Make the variable volatile so the optimizer doesn't foil your plans.
In your code, at each recursion step, check if this variable is set. If it is, abort.
This may not work everywhere, and it requires platform-specific code to find out that the signal came from a stack overflow. Not all systems (notably, early 68000 systems) can continue normal processing after getting a SIGSEGV or SIGBUS.
A similar approach was used by the Bourne shell for memory allocation.
Here's a simple solution that works for Win32. It actually resembles what Wossname already posted, but is less icky :)
unsigned int get_stack_address( void )
{
unsigned int r = 0;
__asm mov dword ptr [r], esp;
return r;
}
void rec( int x, const unsigned int begin_address )
{
// stop once roughly 100,000 bytes of stack have been consumed
if ( begin_address - get_stack_address() > 100000 )
{
//std::cout << "Recursion level " << x << " stack too high" << std::endl;
return;
}
rec( x + 1, begin_address );
}
int main( void )
{
int x = 0;
rec(x,get_stack_address());
}
Here's a naive method, but it's a bit icky...
When you enter the function for the first time you could store the address of one of your variables declared in that function. Store that value outside your function (e.g. in a global). In subsequent calls compare the current address of that variable with the cached copy. The deeper you recurse the further apart these two values will be.
This will most likely cause compiler warnings (storing addresses of temporary variables) but it does have the benefit of giving you a fairly accurate way of knowing exactly how much stack you're using.
Can't say I really recommend this but it would work.
#include <stdio.h>
#include <stdlib.h>
char* start = NULL;
void recurse()
{
    char marker = '#';
    if(start == NULL)
        start = &marker;   /* remember where the first call's local lives */
    printf("depth: %ld\n", labs((long)(&marker - start)));
    if(labs((long)(&marker - start)) < 1000)
        recurse();
    else
        start = NULL;
}
int main()
{
recurse();
return 0;
}
An alternative method is to learn the stack limit at the start of the program, and each time in your recursive function to check whether this limit has been approached (within some safety margin, say 64 kb). If so, abort; if not, continue.
The stack limit on POSIX systems can be learned by using getrlimit system call.
Example code that is thread-safe (note: the code assumes that the stack grows downward, as on x86!):
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
void *stack_limit;
#define SAFETY_MARGIN (64 * 1024) // 64 kb
void recurse(int level)
{
void *stack_top = &stack_top;
if (stack_top <= stack_limit) {
printf("stack limit reached at recursion level %d\n", level);
return;
}
recurse(level + 1);
}
size_t get_max_stack_size(void)
{
    struct rlimit rl;
    int ret = getrlimit(RLIMIT_STACK, &rl);
    if (ret != 0 || rl.rlim_cur == RLIM_INFINITY) {
        return 1024 * 1024 * 8; // fall back to 8 MB, the default on many platforms
    }
    printf("max stack size: %lu\n", (unsigned long)rl.rlim_cur);
    return (size_t)rl.rlim_cur;
}
int main (int argc, char *argv[])
{
int x;
stack_limit = (char *)&x - get_max_stack_size() + SAFETY_MARGIN;
recurse(0);
return 0;
}
Output:
max stack size: 8388608
stack limit reached at recursion level 174549
Okay we are given the following code:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include "callstack.h"
#include "tweetIt.h"
#include "badguy2.c"
static char *correctPassword = "ceriaslyserious";
char *message = NULL;
int validateSanity(char *password) {
for(int i=0;i<strlen(password);i++)
if(!isalpha(password[i]))
return 0;
unsigned int magic = 0x12345678;
return badguy(password);
}
int validate(char *password) {
printf("--Validating something\n", password);
if (strlen(password) > 128) return 0;
char *passwordCopy = malloc(strlen(password) + 1);
strcpy(passwordCopy, password);
return validateSanity(passwordCopy);
}
int check(char *password, char *expectedPassword) {
return (strcmp(password, expectedPassword) == 0);
}
int main() {
char *password = "wrongpassword";
unsigned int magic = 0xABCDE;
char *expectedPassword = correctPassword;
if (!validate(password)) {
printf("--Invalid password!\n");
return 1;
}
if (check(password, expectedPassword)) {
if (message == NULL) {
printf("--No message!\n");
return 1;
} else {
tweetIt(message, strlen(message));
printf("--Message sent.\n");
}
} else {
printf("--Incorrect password!\n");
}
return 0;
}
We are supposed to trick main into sending a tweet using the function badguy. In badguy we have an offset from a previous problem which is the difference between the declaration of password in main and the argument passed to badguy. We have been instructed to use this offset to find the addresses of the correctPassword and password in main and manipulate the value in password to correctPassword so when the password check occurs, it is believed to be legitimate. I am having some trouble figuring out how to use this offset to find the addresses and continuing from there.
First of all, make sure you have good control over your compiler's behavior. That is: make sure you know the calling conventions and that they're being respected (not optimized away or altered in any manner). This usually boils down to turning off optimization settings, at least for testing under more controlled conditions until a robust method is devised. Pay special attention to variables such as expectedPassword, since it is highly likely they'll be optimized away (expectedPassword might never be created on the stack, being substituted with the equivalent of correctPassword, leaving you with no stack reference to the correct password at all).
Secondly, note that "wrongpassword" is shorter than "ceriaslyserious"; in other words, if I got it straight, attempting to crack into the buffer pointed to by passwordCopy (whose size is the length of "wrongpassword" plus one) in order to copy "ceriaslyserious" into it could result in a segmentation violation. Nonetheless, it should be relatively simple to track the address of expectedPassword in the call stack, if it exists (see above), especially if you already have an offset from main()'s stack frame.
Considering an x86 32-bit target under controlled circumstances, expectedPassword will reside 8 bytes below password (4 for password, 4 for magic if it is not optimized away). Having an offset from password to a parameter as you said, it should suffice to subtract the offset from the address of that parameter, and then add 8. The resulting pointer should be expectedPassword, which then points to the static area containing the password. Again, double check your environment. Check this for an explanation on the stack layout in x64 (the layout in the 32-bit case is similar).
Lastly, if expectedPassword does not exist in the call stack, then, since correctPassword is a global static, it will reside in a data segment, rendering the method useless. To achieve the goal in this situation, you would need to carefully scan the data segment with a more intelligent algorithm. It would probably be easier, though, to simply attempt to find the test for check()'s return value in the program text and replace with nops (after properly manipulating the page permissions to allow writing to the text segment).
If you're having problems, inspecting the resulting assembly code is the way to go. If you're using GCC, gcc -S halts the compilation just before assembling (that is, producing an assembly source code file as output). objdump -d could also help. gdb can step between instructions, show the disassembly of a frame and display register contents; check the documentation.
These exercises are especially useful for understanding how security breaches occur in common programs, and for providing some basic notions of defensive programming.
I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).
Actually I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code. Store the instructions somewhere in memory. Do an unconditional jump to the first instruction. Voila!
So, with that in mind, I have the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main() {
int *m = malloc(sizeof(int));
*m = 0x90; // NOP instruction code
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "eax");
return 42;
}
Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return back to main).
Question: Am I on the right path?
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily do everything the right way.
Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
unsigned char *m;
int main() {
unsigned int pagesize = getpagesize();
printf("pagesize: %u\n", pagesize);
m = malloc(1023+pagesize+1);
if(m==NULL) return(1);
printf("%p\n", m);
m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
printf("%p\n", m);
if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
printf("mprotect fail...\n");
return 0;
}
m[0] = 0xc9; //leave
m[1] = 0xc3; //ret
m[2] = 0x90; //nop
printf("%p\n", m);
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "ebx");
return 21;
}
Question: Am I on the right path?
I would say yes.
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
Question: Other issues I should beware of?
Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.
On Linux this is done using mprotect() with PROT_EXEC.
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way (the buffer must first be made executable, as described above):
typedef void (*generated_function)(void);
void *func = malloc(1024);
unsigned char *o = (unsigned char *)func;
generated_function func_exec = (generated_function)func;
*o++ = 0x90; // NOP
*o++ = 0xc3; // RET (near return; 0xcb would be a far return)
func_exec();
I would like to know how, in C, I can copy the content of a function into memory and then execute it.
I'm trying to do something like this:
typedef void(*FUN)(int *);
char * myNewFunc;
char *allocExecutablePages (int pages)
{
template = (char *) valloc (getpagesize () * pages);
if (mprotect (template, getpagesize (),
PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
}
void f1 (int *v) {
*v = 10;
}
// allocate enough space, but how much?
myNewFunc = allocExecutablePages(...)
/* Copy f1 somewhere else
* (how? assume that I know the size of f1, having done a (nm -S foo.o))
*/
((FUN)template)(&val);
printf("%i",val);
Thanks for your answers
You seem to have figured out the part about protection flags. If you know the size of the function, now you can just do memcpy() and pass the address of f1 as the source address.
One big caveat is that, on many platforms, you will not be able to call any other functions from the one you're copying (f1), because relative addresses are hardcoded into the binary code of the function, and moving it to a different location in memory can make those relative addresses point to the wrong place.
This happens to work because function1 and function2 are exactly the same size in memory.
We need the length of function2 for our memcpy, so what should be done is:
int diff = (&main - &function2);
As long as the compiler places main directly after function2 (which is not guaranteed), you can edit function2 to your liking and it keeps working just fine!
By the way, neat trick. Unfortunately the g++ compiler reports an invalid conversion from void* to int... but with gcc it compiles perfectly ;)
Modified sources:
//Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
//fixed the diff address to also work when function2 is variable size
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include <sys/mman.h>
int function1(int x){
return x-5;
}
int function2(int x){
//printf("hello world");
int k=32;
int l=40;
return x+5+k+l;
}
int main(){
int diff = (&main - &function2);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
Output:
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ gcc memoryFun.c
Walter-Schrepperss-MacBook-Pro:cppWork wschrep$ ./a.out
pagesize: 4096, diff: 35
native: 1
memory: 83
native: 1
Another thing to note is that calling printf will segfault, because printf is most likely not found due to the relative address going wrong...
Hacky solution and simple proof of concept that works for me (and compiles without warning on Mac OS X/GCC 4.2.1):
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include <sys/mman.h>
int function1(int x){
return x-5;
}
int function2(int x){
return x+5;
}
int main(){
int diff = (&function2 - &function1);
printf("pagesize: %d, diff: %d\n",getpagesize(),diff);
int (*fptr)(int);
void *memfun = malloc(4096);
if (mprotect(memfun, 4096, PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
memcpy(memfun, (const void*)&function2, diff);
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
fptr = memfun;
printf("memory: %d\n",(*fptr)(6) );
fptr = &function1;
printf("native: %d\n",(*fptr)(6));
free(memfun);
return 0;
}
I have tried this issue many times in C and came to the conclusion that it cannot be accomplished using only the C language. My main thorn was finding the length of the function to copy.
The Standard C language does not provide any methods to obtain the length of a function. However, one can use assembly language and "sections" to find the length. Once the length is found, copying and executing is easy.
The easiest solution is to create or define a linker segment that contains the function. Write an assembly language module to calculate and publicly declare the length of this segment. Use this constant for the size of the function.
There are other methods that involve setting up the linker, such as predefined areas or fixed locations and copying those locations.
In embedded systems land, most of the code that copies executable stuff into RAM is written in assembly.
This might be a hack solution. Could you make a dummy variable or function directly after the function to be copied, obtain that dummy's address, and then subtract the function's address from it to obtain the function size? This might work if the compiler and linker lay out functions in source order, which is common but not guaranteed. It would also keep function copying closer to portable ANSI C rather than delving into system-specific assembly code. I find C to be rather flexible; one just needs to think things out.
GNU libc's backtrace and In-circuit emulators/debuggers are not always available when porting code to a new platform, especially when the target is a micro C compiler such as for the Z80. (Typically a program bug would "just hang" somewhere, or crash the gadget.)
Is there an alternative to the classic "wolf fencing" method of manually inserting printf? Something simple and portable (using no C extensions) that a coder can do while developing a program that includes tracing and backtracing into a C program?
BTW: Here are a couple of other questions on Stack Overflow that are related, but both use GNU glibc's backtrace, and backtrace is often compiler/implementation specific:
Is there a function to invoke a stack dump in C?
How to generate a stacktrace when my gcc C++ app crashes
Here is the kernel of the kernel of my answer: write some code.
The kernel of my answer is: If your compiler allocates locals on the stack always, then...
Add blobs to the stack at every function entry that record the name of the function, throw in some magic numbers to maybe catch stack smashes.
typedef struct stack_debug_blob_ {
int magic1;
const char * function_name;
int magic2;
struct stack_debug_blob_ * called_by;
int magic3;
} stack_debug_blob;
stack_debug_blob * top_of_stack_debug_blobs = 0;
Create a macro ENTER(f) taking the name of the function. The macro should be about the first line of code in every function after the opening {. It adds a struct with a pointer to the (const) char * function name, a pointer to the previous struct on the stack, and maybe some magic numbers to check sanity. Make the top of blob stack pointer point at this new struct.
#define ENTER(f) \
stack_debug_blob new_stack_debug_blob = { \
MAGIC1, (f), MAGIC2, top_of_stack_debug_blobs, MAGIC3}; \
stack_debug_blob * evil_hack = (top_of_stack_debug_blobs = (&new_stack_debug_blob))
To keep things as portable as possible, all ENTER can do is declare and initialize variables. Hence the evil_hack, which does a little more computation than just initializing a variable.
Create a function to walk down the list of blobs checking pointers and magic numbers. It should signal an error (maybe print to stderr, maybe lock up the CPU with while (1) { /* nada */ }, maybe enter the debugger... depends on your hardware) if it finds things messed up.
Create a macro EXIT() that checks your stack of blobs, then de-links the topmost from the linked list. It needs to be put at the exit points of all your functions.
#define EXIT() do { \
check_debug_blobs(); \
top_of_stack_debug_blobs = new_stack_debug_blob.called_by; \
new_stack_debug_blob.magic1 -= 1; /* paranoia */ \
} while (0)
You will probably also need to replace all returns with RETURN macro calls; the RETURN macro is just like EXIT, but has a return before the } while (0).
Create a function to walk down the list of blobs printing out the function names, call it something like stacktrace or backtrace maybe.
Write a program to instrument your C code with calls to ENTER(f) and EXIT() and RETURN(x).
Left out a few details to let you have fun with it...
See also Any porting available of backtrace for uclibc?
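To make the ENTER/EXIT idea above concrete, here is a small self-contained sketch. It is a simplified variant, not the code from the answer: it uses a single magic number, and the names frame, top_frame, backtrace_print, inner, middle and last_depth are made up for illustration.

```c
#include <stdio.h>

#define MAGIC 0x5AFEC0DE   /* arbitrary sanity marker */

typedef struct frame_ {
    int magic;
    const char *function_name;
    struct frame_ *called_by;
} frame;

static frame *top_frame = NULL;   /* innermost active function */

/* Declarations only, so ENTER can be the first line of a function body. */
#define ENTER(f) \
    frame this_frame = { MAGIC, (f), top_frame }; \
    frame *enter_hack_ = (top_frame = &this_frame)

#define EXIT() do { top_frame = this_frame.called_by; } while (0)

#define RETURN(x) do { \
    top_frame = this_frame.called_by; \
    return (x); \
} while (0)

/* Walks the list from the innermost call outward, printing each name.
 * Returns the number of frames, or -1 if a corrupted blob is found. */
int backtrace_print(FILE *out)
{
    int depth = 0;
    const frame *f;
    for (f = top_frame; f != NULL; f = f->called_by) {
        if (f->magic != MAGIC)
            return -1;
        fprintf(out, "#%d %s\n", depth, f->function_name);
        depth++;
    }
    return depth;
}

static int last_depth;   /* depth seen by the most recent backtrace */

static int inner(int k)
{
    ENTER("inner");
    last_depth = backtrace_print(stderr);
    RETURN(k);
}

static int middle(int x)
{
    ENTER("middle");
    int r = inner(x) + 1;   /* evaluate before RETURN unlinks this frame */
    RETURN(r);
}
```

Calling middle(1) prints "inner" followed by "middle" on stderr. Note that RETURN unlinks the frame before evaluating its argument, so any call whose trace should include the current function has to be made before RETURN, as in middle above.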
There is an implementation at RosettaCode.org which uses the same basic idea as #jsl4tv's suggestion.
Example, given the following classic C code with built in "hang":
#include <stdio.h>
#include <stdlib.h>
void inner(int k)
{
for(;;){} /* hang */
}
void middle(int x, int y)
{
inner(x*y);
}
void outer(int a, int b, int c)
{
middle(a+b, b+c);
}
int main()
{
outer(2,3,5);
return(EXIT_SUCCESS);
}
#define STACK_TRACE_ON and #include "stack_trace.h" from RosettaCode.org then insert BEGIN(f)/ENDs where required:
#include <stdio.h>
#include <stdlib.h>
#define STACK_TRACE_ON /* compile in these "stack_trace" routines */
#include "stack_trace.h"
void inner(int k)
BEGIN(inner)
print_indent(); printf("*** Now dump the stack ***\n");
print_stack_trace();
for(;;){} /* hang */
END
void middle(int x, int y)
BEGIN(middle)
inner(x*y);
END
void outer(int a, int b, int c)
BEGIN(outer)
middle(a+b, b+c);
END
int main()
BEGIN(main)
stack_trace.on = TRUE; /* turn on runtime tracing */
outer(2,3,5);
stack_trace.on = FALSE;
RETURN(EXIT_SUCCESS);
END
Produces:
stack_trace_test.c:19: BEGIN outer[0x80487b4], stack(depth:1, size:60)
stack_trace_test.c:14: BEGIN middle[0x8048749], stack(depth:2, size:108)
stack_trace_test.c:8: BEGIN inner[0x80486d8], stack(depth:3, size:156)
stack_trace_test.c:8: *** Now dump the stack ***
stack_trace_test.c:8: inner[0x80486d8] --- stack(depth:4, size:156) ---
stack_trace_test.c:14: middle[0x8048749] --- stack(depth:3, size:108) ---
stack_trace_test.c:19: outer[0x80487b4] --- stack(depth:2, size:60) ---
stack_trace_test.c:24: main[0x804882a] --- stack(depth:1, size:0) ---
stack_trace_test.c:8: --- (depth 4) ---
A well polished [open source] version of this BEGIN ~ END method would be perfect. (Esp if it has a "FINALLY" clause for exception handling).
Hints/URLs appreciated.
On Symbian there were some scripts that went over the registers and stack looking for things that looked like code addresses.
This is not portable, but it doesn't depend on decorating the code either. This was a necessary tradeoff on a platform where byte counts mattered... and it wasn't nearly as limited as Z80! But limited enough to compile without frame-pointers and such.
To calculate a backtrace from a stack without frame-pointers you have to work up the stack not down it.