Why not trap stack writes above ebp to avoid overflow exploits?

I've been looking at StackGuard and similar, and also Intel's new technology preview on "Control flow enforcement" (basically a shadow stack), here: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf.
Obviously there is a reason why what I'm wondering about will either break everything or not protect against buffer overflows, but it's simple, so I'm sure someone can explain why I'm barking up the wrong tree.
Why not implement in CPU hardware an optional feature to abort/trap when writing to a stack address higher than or equal to ebp? This would protect the return address and function parameters from being overwritten via a buffer overflow.

Use of ebp as a frame pointer is optional, but of course that could be changed. A worse problem is that you may legally write outside of your stack frame, for example when you are given a pointer to a variable belonging to a caller:
int foo;
scanf("%d", &foo);
Obviously &foo points outside of the frame of scanf.
Function parameters don't need to be protected, they can be legally modified too. This could also be changed, however.
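A minimal sketch of that objection, using the hypothetical names read_answer and caller_demo (they are not from the question). The callee stores through a pointer into its caller's frame, which on a downward-growing stack is an address above the callee's own frame, so a hardware trap on every write at or above ebp would fire on this perfectly legal store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: writes through a pointer it was handed.
   The target lives in the caller's frame, not in this one. */
static void read_answer(int *out)
{
    assert(out != NULL);
    *out = 42;               /* legal store outside read_answer's own frame */
}

/* Hypothetical caller: owns the local that the callee fills in. */
int caller_demo(void)
{
    int foo = 0;
    read_answer(&foo);       /* same pattern as scanf("%d", &foo) */
    return foo;
}
```

Any scheme that traps such writes would need a way to tell this store apart from an overflow, which is roughly what a shadow stack sidesteps by protecting only the return address.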


Can I find how much stack memory is currently used in a Mac thread?

When I try to google this, all I find is stuff about getting and setting the stack limit, such as -[NSThread stackSize], but that's NOT what I want. I want to know how much memory is actually in use on the stack in the current thread, or equivalently how much stack space remains available.
I'm hoping to figure out a stack overflow in a crash report submitted by a user. In my previous experience, a stack overflow has usually been caused by an infinite recursion, but not this time. So I'm wondering if some of my C++ functions are really using a heck of a lot more stack space than they should.
A comment suggested that I get the stack pointer at the start of the thread, and compare its value later. I happened across the question Print out value of stack pointer. It has several answers:
1. (The accepted answer) Take the address of a local variable.
2. Use a little assembly language to get the value of the stack pointer register.
3. Use the function __builtin_frame_address(0) in GCC or Clang.
I tried those techniques (Apple Clang, macOS 11.2). Methods 2 and 3 produced similar results, but method 1 produced absurdly different results. For one thing, method 1 gives values that increase as you go deeper into a call chain, while the others give values that decrease. What's up with this, are there two different kinds of stacks?
If you are trying to do that, I guess you want to know how much memory you are using so that you can estimate how many threads of a given kind you can create.
The answer is not easy, as you normally don't have direct access to the stack pointer. But I'll try to devise a solution that does not require access to the stack pointer, although it does require one global variable per thread.
The idea is to force a parameter onto the stack. Even if the ABI on your system passes parameters in registers, taking the address of a parameter (the actual parameter variable) forces it into memory. Save that address when the thread starts, then later call a second function that also takes a parameter and compare the two addresses (the parameter's value doesn't matter, because only its address is used):
#include <stddef.h>
static char *initial_stack_pseudo_addr;
void save_initial_stack(char dumb)
{
    /* the & operator forces dumb to be implemented in the stack */
    initial_stack_pseudo_addr = &dumb;
}
size_t how_much_stack(char dumb)
{
    /* assumes the stack grows toward lower addresses */
    return (size_t)(initial_stack_pseudo_addr - &dumb);
}
So when you start the thread, you call save_initial_stack(0);. When you want to know how much stack you have consumed, you can do the following:
size_t stack_size = how_much_stack(0);
printf("at this point I have used %zu bytes of stack\n", stack_size);
Basically, you are calculating how many bytes lie between the address of the parameter in the call to save_initial_stack() and the address of the parameter in the call you make now to get the stack size. This is approximate, but the stack changes too quickly for a precise figure to be meaningful.
The following example will illustrate the thing. A recursive function is called after setting the initial pointer value, then at each recursive call the current size of the stack (approximate) is computed and printed, and a new recursive call is made. The program should run until the process gets a stack overflow.
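#include <stddef.h>
#include <stdio.h>
char *stack_at_start;
void save_stack_pointer(char dumb)
{
    stack_at_start = &dumb;   /* taking the address forces dumb into memory */
}
size_t get_stack_size(char dumb)
{
    /* assumes the stack grows toward lower addresses */
    return (size_t)(stack_at_start - &dumb);
}
void recursive(void)
{
    printf("Stack size: %zu\n", get_stack_size(0));
    recursive();   /* no base case: runs until the stack overflows */
}
int main(void)
{
    save_stack_pointer(0);
    recursive();
}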
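As a variation on the same idea, here is a sketch that uses __builtin_frame_address(0), the third technique listed in the question, instead of the address of a parameter. It assumes GCC or Clang; the names mark_stack_base, stack_bytes_used and usage_at_depth are hypothetical. It takes the absolute difference, so it does not depend on the growth direction, and the result is only an approximation of the bytes between the two frames.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;   /* frame address recorded near thread start */

/* Record a reference point; call once near the top of the thread. */
void mark_stack_base(void)
{
    stack_base = (uintptr_t)__builtin_frame_address(0);
}

/* Approximate distance in bytes between the reference point and the
   current frame, whichever way the stack grows. */
size_t stack_bytes_used(void)
{
    uintptr_t here = (uintptr_t)__builtin_frame_address(0);
    return here > stack_base ? here - stack_base : stack_base - here;
}

/* Hypothetical helper: recurse `depth` levels, report usage at the bottom. */
__attribute__((noinline))
size_t usage_at_depth(int depth)
{
    volatile char pad[64];     /* gives every frame some real size */
    size_t used;
    pad[0] = (char)depth;
    used = (depth == 0) ? stack_bytes_used() : usage_at_depth(depth - 1);
    pad[1] = pad[0];           /* volatile access after the call blocks tail-call elimination */
    return used;
}
```

Calling mark_stack_base() first and then usage_at_depth() with increasing depths should report increasing values, which is a quick way to see how much each frame of a suspect function costs.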

Where, and why, is the x64 frame pointer supposed to point? (Windows x64 ABI)

I've been reading a long catalog of very good articles on the Windows x64 ABI. A very minor aspect of these articles is the description of the frame pointer. The general gist is that, because the Windows x64 call stack rules are so rigid, a dedicated frame pointer is typically not needed, although it is optional.
The one exception I have seen consistently noted is when alloca() is used to dynamically allocate memory on the stack. Functions doing so apparently require a frame pointer. For example, to quote from Microsoft's documentation on "Stack Allocation" (italics and bold added by me):
If space is dynamically allocated (alloca) in a function, then a nonvolatile register must be used as a frame pointer to mark the base of the fixed part of the stack and that register must be saved and initialized in the prolog. Note that when alloca is used, calls to the same callee from the same caller may have different home addresses for their register parameters.
To this, Microsoft's x64 ABI alloca() documentation cryptically adds:
_alloca is required to be 16-byte aligned and additionally required to use a frame pointer.
First of all, why must it be used? I assume for call stack unwinding on exception but I haven't yet found a satisfactory explanation.
Next question: where must it point? In the first of the two above quotations, it says it "must" be used to mark the base of the "fixed part of the stack". What's the "fixed part of the stack"? I get the impression this term denotes, in a given frame, the range of addresses that comprises (higher addresses to lower ones):
the caller return address (if you consider it part of the current function's frame);
the addresses to which non-volatile registers were saved by the function prologue; and
the addresses where local variables are being stored.
Again, I haven't found a satisfactory definition for this "fixed part". The "Stack Allocation" page I linked to above contains the diagram below along with the words "if used, the frame pointer will generally point here":
This very nifty blog post is equally vague, including a diagram stating the frame pointer "points somewhere in here", where "here" is the addresses for the saved non-volatile registers and the locals.
One last bit of crypticness, from Microsoft's MSDN article entitled "Dynamic Parameter Stack Area Construction", which contains only this:
If a frame pointer is used, the option exists to dynamically create the parameter stack area. This is not currently done in the x64 compiler.
What does "generally" mean? Where is "somewhere in here"? What's the option that exists? Is there a rule? Who cares?
Or, tl;dr: What the title asks. Any answer containing annotated assembly gratefully accepted.
The diagram makes it quite clear that the frame pointer points to the bottom of the fixed portion of the local stack frame. The "fixed portion" is the part whose size does not change and whose location is fixed relative to the initial stack pointer. In the diagram it is labelled "Local variables and saved nonvolatile registers."[1]
The precise location of the frame pointer doesn't matter to the operating system because from an information-theoretical point of view, local variables are indistinguishable from memory allocated by alloca immediately upon entry to a function.
void function1()
{
int a;
int *b = (int*)alloca(sizeof(int));
...
}
void function2()
{
int& a = *(int*)alloca(sizeof(int));
int *b = (int*)alloca(sizeof(int));
...
}
The operating system has no way of distinguishing between these two functions. They both store a on the stack directly below the nonvolatile registers.
This equivalence is why the diagram says "generally". In practice, compilers point it where indicated, but in theory they could point it anywhere inside the local frame, as long as the distance from the frame pointer to the return address is a constant.
The function needs to inform the operating system where the frame pointer is so that the stack can be unwound during exception handling. Without this information, it would not be possible to walk the stack because the frame is variable-sized.
[1] You can infer this from the fact that the text says that the frame pointer points to "the base of the fixed part of the stack" and the diagram says "The frame pointer will generally point here", and it's pointing at the base of the local variables and saved nonvolatile registers. Assuming the text and diagram are in agreement, this implies that the fixed part of the stack is the same as the local variables and saved nonvolatile registers. This is the same sort of inference you make every day without even realizing it. For example, if a story says
Sally called out to her brother. "Billy, where are you?"
You can infer that Billy is Sally's brother.
alloca is intended to be used with a size available only at runtime. As such, it will change the stack pointer by an amount that's not known at compilation time. You can normally address your local variables and arguments on the stack relative to the stack pointer due to the fixed layout, but alloca messes that up, hence the need for another register that is stable. This frame pointer may point anywhere you want as long as you know its relation to the fixed area.
The frame pointer is also handy when the time comes to free the alloca memory because you can simply restore the stack pointer to a known location without having to worry about how much the stack pointer changed.
I don't think the ABI requires the frame pointer as such, or that it must be rbp, or that it must point at any particular place (disclaimer: I don't use Windows).
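To make the point about run-time sizes concrete, here is a small sketch with a hypothetical function sum_of_filled. It assumes a platform that provides alloca through <alloca.h> (glibc and macOS do; MSVC spells it _alloca in <malloc.h>). Because n is only known at run time, the distance from the stack pointer to total changes from call to call, which is why compilers typically address the fixed locals through a frame pointer in such functions.

```c
#include <alloca.h>   /* assumption: glibc or macOS; MSVC uses <malloc.h> and _alloca */
#include <stddef.h>
#include <string.h>

/* Hypothetical example: a fixed local plus a run-time-sized stack buffer. */
unsigned sum_of_filled(size_t n, unsigned char value)
{
    unsigned total = 0;                /* fixed part of the frame */
    unsigned char *buf = alloca(n);    /* variable part, size known only now */
    memset(buf, value, n);
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;                      /* the alloca storage is released on return */
}
```

Compiling this with and without the alloca call and comparing the generated assembly is a reasonable way to see the extra frame-pointer setup on a given compiler.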

How do pointers work "under the hood" in C?

Take a simple program like this:
int main(void)
{
char p;
char *q;
q = &p;
return 0;
}
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime? If at runtime, is there some table of variables or something where it looks these things up? Does the OS keep track of them and it just asks the OS?
My question may not even make sense in the context of the correct explanation, so feel free to set me straight.
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime?
This is an implementation detail of the compiler. Different compilers can choose different techniques depending on the kind of operating system they are generating code for and the whims of the compiler writer.
Let me describe for you how this is typically done on a modern operating system like Windows.
When the process starts up, the operating system gives the process a virtual address space, of, let's say 2GB. Of that 2GB, a 1MB section of it is set aside as "the stack" for the main thread. The stack is a region of memory where everything "below" the current stack pointer is "in use", and everything in that 1MB section "above" it is "free". How the operating system chooses which 1MB chunk of virtual address space is the stack is an implementation detail of Windows.
(Aside: whether the free space is at the "top" or "bottom" of the stack, whether the "valid" space grows "up" or "down" is also an implementation detail. Different operating systems on different chips do it differently. Let's suppose the stack grows from high addresses to low addresses.)
The operating system ensures that when main is invoked, the register ESP contains the address of the dividing line between the valid and free portions of the stack.
(Aside: again, whether the ESP is the address of the first valid point or the first free point is an implementation detail.)
The compiler generates code for main that moves the stack pointer by, let's say, five bytes, by subtracting from it if the stack is growing "down". It decreases by five because it needs one byte for p and four for q. So the stack pointer changes; there are now five more "valid" bytes and five fewer "free" bytes.
Let's say that q is the memory that is now in ESP through ESP+3 and p is the memory now in ESP+4. To assign the address of p to q, the compiler generates code that copies the four byte value ESP+4 into the locations ESP through ESP+3.
(Aside: Note that it is highly likely that the compiler lays out the stack so that everything that has its address taken is on an ESP+offset value that is divisible by four. Some chips have requirements that addresses be divisible by pointer size. Again, this is an implementation detail.)
If you do not understand the difference between an address used as a value and an address used as a storage location, figure that out. Without understanding that key difference you will not be successful in C.
That's one way it could work but like I said, different compilers can choose to do it differently as they see fit.
The compiler cannot know the full address of p at compile-time because a function can be called multiple times by different callers, and p can have a different address on each call.
Of course, the compiler has to know how to calculate the address of p at run-time, not only for the address-of operator, but simply in order to generate code that works with the p variable. On a regular architecture, local variables like p are allocated on the stack, i.e. in a position with fixed offset relative to the address of the current stack frame.
Thus, the line q = &p simply stores into q (another local variable allocated on the stack) the address p has in the current stack frame.
Note that in general, what the compiler does or doesn't know is implementation-dependent. For example, an optimizing compiler might very well optimize away your entire main after analyzing that its actions have no observable effect. The above is written under the assumption of a mainstream architecture and compiler, and a non-static function (other than main) that may be invoked by multiple callers.
This is actually an extraordinarily difficult question to answer in full generality because it's massively complicated by virtual memory, address space layout randomization and relocation.
The short answer is that the compiler basically deals in terms of offsets from some “base”, which is decided by the runtime loader when you execute your program. Your variables, p and q, will appear very close to the “bottom” of the stack (although the stack base is usually very high in VM and it grows “down”).
The address of a local variable cannot be completely calculated at compile time. Local variables are typically allocated on the stack. When called, each function allocates a stack frame: a single contiguous block of memory in which it stores all its local variables. The location of the stack frame in memory cannot be predicted at compile time. It only becomes known at run-time. The beginning of each stack frame is typically stored at run-time in a dedicated processor register, such as ebp on x86.
Meanwhile, the internal memory layout of a stack frame is pre-determined by the compiler at compile-time, i.e. it is the compiler who decides how local variables will be laid out inside the stack frame. This means that the compiler knows the local offset of each local variable inside the stack frame.
Put this all together and we get that the exact absolute address of a local variable is the sum of the address of the stack frame itself (the run-time component) and the offset of this variable inside that frame (the compile-time component).
This is basically exactly what the compiled code for
q = &p;
will do. It will take the current value of the stack frame register, add some compile-time constant to it (offset of p) and store the result in q.
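The "frame address plus compile-time offset" rule can be observed from C itself. The sketch below uses a hypothetical function record_addresses and assumes GCC or Clang for the noinline attribute. The same local p has the same offset inside every frame, yet each nested call sees a different absolute address because the frame itself sits somewhere else.

```c
#include <stdint.h>

/* Hypothetical helper: stores the address of its local p for each depth. */
__attribute__((noinline))
void record_addresses(uintptr_t *out, int depth)
{
    volatile char p = 0;             /* compiler-chosen offset within the frame */
    out[depth] = (uintptr_t)&p;      /* run-time frame location plus that offset */
    if (depth > 0)
        record_addresses(out, depth - 1);
    p = 1;                           /* volatile write after the call keeps this frame live */
}
```

Calling record_addresses(out, 2) on a three-element array should yield three distinct addresses, one per live frame.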
In any function, the function arguments and the local variables are allocated on the stack, after the return address (the caller's program counter at the point where it calls the current function). How these variables get allocated on the stack, and then deallocated when returning from the function, is worked out by the compiler at compile time.
For example, in this case p (1 byte) could be allocated first on the stack, followed by q (4 bytes on a 32-bit architecture). The code assigns the address of p to q. The address of p is then naturally 5 added to or subtracted from the last value of the stack pointer. Well, something like that; it depends on how the value of the stack pointer is updated and whether the stack grows upwards or downwards.
How the return value is passed back to the calling function is something that I'm not certain of, but I'm guessing that it is passed through the registers and not the stack. So, when the return is called, the underlying assembly code should deallocate p and q, place zero into the register, then return to the last position of the caller function. Of course, in this case, it is the main function, so it is more complicated in that, it causes the OS to terminate the process. But in other cases, it just goes back to the calling function.
In ANSI C (C89), local variables must be declared at the start of a block; in a typical implementation they are allocated on the stack once when entering the function and deallocated when returning from it. In C++ and in C99 or later, local variables can also be declared in the middle of a block (for example inside if-else or while statement blocks). Conceptually, such a variable is allocated when entering its block and deallocated when leaving it, although many compilers simply reserve space for all of them once at function entry.
In all cases, the address of a local variable is always a fixed number added or subtracted from the stack pointer (as calculated by the compiler, relative to the containing block) and the size of the variable is determined from the variable type.
However, static local variables and global variables are different in C. These are allocated in fixed locations in the memory, and thus there's a fixed address for them (or a fixed offset relative to the process' boundary), which is calculated by the linker.
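For contrast with automatic variables, here is a sketch with a hypothetical function static_counter_address (noinline assumes GCC or Clang). A static local has a single instance for the whole program, so its address is the same no matter how deep the call that asks for it.

```c
#include <stdint.h>

/* Hypothetical helper: returns the address of a static local,
   taken at the bottom of `depth` nested calls. */
__attribute__((noinline))
uintptr_t static_counter_address(int depth)
{
    static int counter;      /* one instance, placed at link/load time */
    counter++;
    if (depth > 0)
        return static_counter_address(depth - 1);
    return (uintptr_t)&counter;
}
```

static_counter_address(0) and static_counter_address(5) return the same value, whereas the address of an automatic variable would differ between those two depths.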
Yet a third variety is memory allocated on the heap using malloc/new and free/delete. I think this discussion would be too lengthy if we include that as well.
That said, my description is only for a typical hardware architecture and OS. All of these are also dependent on a wide variety of things, as mentioned by Emmet.
p is a variable with automatic storage. It lives only as long as the function it is in lives. Every time its function is called memory for it is taken from the stack, therefore, its address can change and is not known until runtime.

how to find out which variable malloc() is being assigned to?

I'm trying to track, in a profiler, the usage of malloc'ed areas through the variables that point to them. For example, take the following assignment inside function func():
uint64_t * dictionary = (uint64_t *) malloc(sizeof(uint64_t)*128);
I need to figure out the variable name (which is 'dictionary' in the above example) that points to the malloc'ed memory region. I instrumented malloc() to record the start address and size of the allocation. However, that still gives no knowledge of the variable 'dictionary'. What I'm thinking is to examine the stack frame of function func() and find the local pointer variable whose pointed-to type matches that of the malloc'ed data. This approach would need the instrumented malloc() to go back one frame to func() to find the candidate local variables, and then fuzzy match by type. I'm wondering whether there are any other neat ways to implement this.
In general, I would expect this to be impossible. :)
You can't, of course, assume that the variable name is available; the best bet in general would be (I guess) a stack offset in the calling function's frame. If debugging symbols are available you might perhaps be able to map that to a name, though.
I guess it's possible that there is no name: the returned address may be put in a register and perhaps manipulated there before (if ever) being written to memory. If this means your code needs to start analyzing the calling code to track what it does with the return value, that sounds difficult.
What do you want to do with the variable reference once you've isolated it? I assume you're instrumenting malloc() for debugging purposes, so probably you're going to store it somewhere.
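If the source of the profiled program can be changed, a different and much simpler technique than frame inspection is to capture the name at compile time with the preprocessor. This is a swapped-in approach, not what the question proposes, and every name below (alloc_record, tracked_malloc, TRACKED_MALLOC, last_alloc_named, demo_dictionary) is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one allocation; not part of any real profiler API. */
struct alloc_record {
    const char *var_name;
    void *start;
    size_t size;
};

static struct alloc_record last_alloc;

/* Hypothetical wrapper: records the name string supplied by the macro. */
static void *tracked_malloc(const char *var_name, size_t size)
{
    void *p = malloc(size);
    last_alloc.var_name = var_name;
    last_alloc.start = p;
    last_alloc.size = size;
    return p;
}

/* #var turns the destination's spelling into a string literal at compile time. */
#define TRACKED_MALLOC(var, size) ((var) = tracked_malloc(#var, (size)))

/* Hypothetical query: does the most recent record carry this name? */
int last_alloc_named(const char *name)
{
    return last_alloc.var_name != NULL && strcmp(last_alloc.var_name, name) == 0;
}

/* Mirrors the assignment from the question. */
void demo_dictionary(void)
{
    uint64_t *dictionary;
    TRACKED_MALLOC(dictionary, sizeof(uint64_t) * 128);
    free(dictionary);
}
```

The obvious limitation is that it only sees allocations written through the macro, and it records the name of the first destination only, not later copies of the pointer.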

how to find if stack increases upwards or downwards?

This is very platform-dependent, and even application-dependent.
The code posted by Vino only works in targets where parameters are passed on the stack AND local variables are allocated from the stack, in that order. Many compilers will assign fixed memory addresses to parameters, or pass parameters in registers. While common, passing parameters on the stack is one of the least efficient ways to get data into and out of a function.
Look at the disassembly for your compiled app and see what code the compiler is generating. If your target has native stack manipulation commands (like PUSH and POP) that the compiler is using, then the CPU datasheet/reference manual will tell you which direction the stack is growing. However, the compiler may choose to implement its own stack, in which case you'll have to do some digging.
Or, read the stack pointer, push something on the stack, and read the stack pointer again. Compare the results of the first and second read to determine the direction in which the pointer moves.
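A rough C approximation of that read-push-read experiment, assuming GCC or Clang for __builtin_frame_address and the noinline attribute; callee_frame and stack_grows_down are hypothetical names. A function call stands in for the push: the frame address is read in the caller, then in a callee that is guaranteed its own frame, and the two are compared as integers.

```c
#include <stdint.h>

/* Hypothetical helper: reports its own frame address.
   noinline guarantees it runs in a separate, deeper frame. */
__attribute__((noinline))
static uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the deeper frame has the lower address, 0 otherwise. */
__attribute__((noinline))
int stack_grows_down(void)
{
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t deeper = callee_frame();
    return deeper < mine;
}
```

This only tells you what the compiler and ABI in use do for ordinary calls; a compiler that implements its own software stack, as mentioned above, would still need the disassembly approach.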
For future reference: if you include some details about your target architecture (embedded? PC? Linux, Windows? GCC? VC? Watcom? blah blah blah) you'll get more meaningful answers.
One possible way is...
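#include <stdint.h>
#include <stdio.h>
void call(int *a)
{
    int b;
    /* compare as integers: relational comparison of pointers to
       unrelated objects is undefined behavior in C */
    if ((uintptr_t)&b > (uintptr_t)a)
        printf("Stack grows up.\n");
    else
        printf("Stack grows down.\n");
}
int main(void)
{
    int a;
    call(&a);
    return 0;
}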
Brute force approach is to fill your memory with a known value say 0xFF. Push some items on the stack. Do a memory dump. Push some more items on the stack. Do another memory dump.
Create function with many local variables.
Turn off optimizations.
Either print the assembly language listing,
or, when debugging, display mixed source and assembly language.
Note the stack pointer (or register) before the function is executed.
Single-step through the function and watch the stack pointer.
In general, whether a compiler uses incrementing or decrementing stack pointers is a very minor issue as long as the behavior is consistent and works. This is one issue that rarely occupies my mind. I tend to concentrate on more important topics, such as quality, correctness and robustness.
I'll trust the compiler to correctly handle stack manipulation. I don't trust recursive functions, especially on embedded or restricted platforms.
