I was wondering if there would be a convenient way to copy the current stack frame, move it somewhere else, and then 'return' from the function, from the new location?
I have been playing around with setjmp and longjmp while allocating large arrays on the stack to force the stack pointer away. I am familiar with the calling conventions and where arguments to functions end up etc, but I am not extremely experienced with pointer arithmetic.
To describe the end goal in general terms: the ambition is to be able to allocate stack frames and to jump to another stack frame when I call a function (we can call this function switch). Before I jump to the new stack frame, however, I'd like to be able to grab the return address from switch so that when I've (presumably) longjmpd to the new frame, I'd be able to return to the position that initiated the context switch.
I've already gotten some inspiration of how to imitate coroutines using longjmp and setjmp from this post.
If this is possible, it would be a component of my current research, where I am trying to implement a (very rough) proof of concept extension in a compiler. I'd appreciate answers and comments that address the question posed in my first paragraph, only.
Update
To try and make my intention clearer, I wrote up this example in C. It needs to be compiled with -fno-stack-protector. What I want is for the local variables a and b in main to not be next to each other on the stack (1), but rather be separated by a distance specified by the buffer in call. Furthermore, this code currently returns to main twice, while I only want it to do so once (2). I suggest you read the procedures in this order: main, call and change.
If anyone could answer either of the two questions posed in the paragraph above, I would be immensely grateful. It does not have to be pretty or portable.
Again, I'd prefer answers to my questions rather than suggestions of better ways to go about things.
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
jmp_buf* buf;
long* retaddr;
int change(void) {
// local variable to use when computing offsets
long a[0];
for(int i = 0; i < 5; i++) a[i]; // same as below, not sure why I need to read this
// save this context
if(setjmp(*buf) == 0) {
return 1;
}
// the following code runs when longjmp was called with *buf
// overwrite this contexts return address with the one used by call
a[2] = *retaddr;
// return, hopefully now to main
return 1;
}
static void* retain;
int call() {
buf = (jmp_buf*)malloc(sizeof(jmp_buf));
retaddr = (long*) malloc(sizeof(long));
long a[0];
for(int i = 0; i < 5; i++) a[i]; // not sure why I need to do this. a[2] reads (nil) otherwise
// store return address
*retaddr = a[2];
// allocate local variables to move the stackpointer
char n[1024];
retain = n; // maybe cheat the optimiser?
// get a jmp_buf from another context
change();
// jump there
longjmp(*buf, 1);
}
// It returns to main twice, I am not sure why
int main(void) {
char a;
call(); // this function should move stackpointer (in this case, 1024 bytes)
char b;
printf("address of a: %p\n", &a);
printf("address of b: %p\n", &b);
return 1;
}
This is possible, it is what multi-tasking schedulers do, e.g. in embedded environments.
It is, however, extremely environment-specific, and you would have to dig into the specifics of the processor it is running on.
Basically, the possible steps are:
Determine the registers which contain the needed information. Pick them by what you need, they are probably different from what the compiler uses on the stack for implementing function calls.
Find out how their content can be stored (most likely specific assembler instructions for each register).
Use them to store all contents contiguously.
The place to do so is probably allocated already, inside the object describing and administrating the current task.
Consider not using a return address. Instead, when done with the "inserted" task, decide among the multiple task datasets which describe potential tasks to return to. That is the core of scheduling. If the return address is known in advance, then it is very similar to normal function calling. I.e. the idea is to potentially return to a different task than the last one left. That is also the reason why tasks need their own stack in many cases.
By the way, I don't think that pointer arithmetic is the most relevant tool here.
The values that make up the current stack frame state are held in registers, not anywhere in memory that a pointer can point to. (At least on most current systems; the C64 is staying out of this...)
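On POSIX systems that still ship the ucontext API (assumed here: Linux with glibc), the register-saving and stack-switching steps above are already packaged as getcontext, makecontext and swapcontext. The following is a minimal sketch of switching to a separately allocated stack and coming back, not a scheduler; the names task and run_task_on_private_stack and the 64 KiB stack size are made up for illustration.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx;  // saved state of the caller
static ucontext_t task_ctx;  // saved state of the task
static int steps;            // counts how far the task has run

// Runs entirely on the heap-allocated stack set up below.
static void task(void)
{
    steps++;
    swapcontext(&task_ctx, &main_ctx);  // yield back to the caller
    steps++;
    // returning from here resumes the context named by uc_link
}

// Returns the number of steps the task completed, or -1 on failure.
int run_task_on_private_stack(void)
{
    const size_t stack_size = 64 * 1024;  // arbitrary size for this sketch
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    steps = 0;
    if (getcontext(&task_ctx) != 0) {
        free(stack);
        return -1;
    }
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = stack_size;
    task_ctx.uc_link = &main_ctx;       // where to go when task() returns
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  // run task() until it yields
    swapcontext(&main_ctx, &task_ctx);  // resume task() until it returns

    free(stack);
    return steps;
}
```

Calling run_task_on_private_stack() enters the private stack twice and comes back each time, so it should return 2. No return address is patched by hand; the saved contexts carry that information.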
tl;dr - no.
(On every compiler worth considering): The compiler knows the address of local variables by their offset from either the sp, or a designated saved stack pointer, the frame or base pointer. a might have an address of (sp+1), and b might have an address of (sp+0). If you manage to successfully return to main with the stack pointer lowered by 1024; these will still be known as (sp+1), (sp+0); although they are technically now (sp+1-1024), (sp+0-1024), which means they are no longer a & b.
You could design a language which fixed the local allocation in the way you consider, and that might have some interesting expressiveness, but it isn't C. I doubt any existing compiler could come up with a consistent handling of this. To do so, when it encountered:
char a;
it would have to make an alias of this address at the point it encountered it; say:
add %sp, $0, %r1
sub %sp, $1, %sp
and when it encountered
char b;
add %sp, $0, %r2
sub %sp, $1, %sp
and so on, but once it runs out of free registers, it needs to spill them on the stack; and because it considers the stack to change without notice, it would have to allocate a pointer to this spill area, and keep that stored in a register.
Btw, this is not far removed from the concept of a split (segmented) stack (early versions of Go used these), but generally the granularity is at a function or method boundary, not between two variable definitions.
Interesting idea though.
Related
Our company bought a proprietary C function: we have a compiled library ProcessData.a and an interface file to call it:
// ProcessData.h
void ProcessData(char* pointer_to_data, int data_len);
We want to use this function on an ARM embedded CPU and we want to know how much stack space it might use.
Question: how to measure the stack usage of an arbitrary function?
What I tried so far is to implement the following helper functions:
static int* stackPointerBeforeCall;
void StartStackMeasurement(void) {
asm ("mov %0, sp" : "=r"(stackPointerBeforeCall));
// For some reason I can't overwrite values immediately below the
// stack pointer. I suspect a return address is placed there.
static int* pointer;
pointer = stackPointerBeforeCall - 4;
// Filling all unused stack space with a fixed constant
while (pointer != &_sstack) {
*pointer = 0xEEEEEEEE;
pointer--;
}
*pointer = 0xEEEEEEEE;
}
void FinishStackMeasurement(void) {
int* lastUnusedAddress = &_sstack;
while (*lastUnusedAddress == 0xEEEEEEEE) {
lastUnusedAddress++;
}
// Printing how many stack bytes a function has used
printf("STACK: %u\n", (unsigned)((stackPointerBeforeCall - lastUnusedAddress) * sizeof(int)));
}
And then use them just before and after the function call:
StartStackMeasurement();
ProcessData(array, sizeof(array));
FinishStackMeasurement();
But this seems like a dangerous hack - especially the part where I am subtracting 4 from the stackPointerBeforeCall and overwriting everything below. Is there a better way?
Compile the program and analyze the assembly or machine code for the function in question. Many functions use the stack in a static manner, and this static size can be determined by analysis of the compiled code. Some functions dynamically allocate stack space based on some computation, usually associated with some input parameter. In those cases, you'll see different instructions being used to allocate stack space, and will have to work backwards to determine how the dynamic stack size is derived.
Of course, this analysis would have to be redone with updates to the function (library).
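Where disassembling the library is impractical, the paint-and-scan idea from the question can at least be separated from the live stack-pointer arithmetic. The sketch below assumes a downward-growing stack and a region whose lowest address and length are already known (for example from linker symbols); paint_region, words_touched and STACK_FILL are hypothetical names. It shows only the painting and scanning; choosing a region that is safely below the current stack pointer remains platform-specific.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xEEEEEEEEu  // same marker value as in the question

// Fill a region with the marker, starting at its lowest address.
void paint_region(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++)
        lowest[i] = STACK_FILL;
}

// Count the words that no longer hold the marker. With a downward-growing
// stack, usage starts at the highest address, so the untouched words form
// a run at the low end of the region.
size_t words_touched(const uint32_t *lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && lowest[untouched] == STACK_FILL)
        untouched++;
    return words - untouched;
}
```

Bounding the scan by the region length avoids running past the end when the whole region was used. A callee that happens to store the marker value itself would be under-counted, so treat the result as an estimate.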
You can use getrusage which is a function that gets you the resource usage of your software, in particular ru_isrss which is
An integral value expressed the same way, which is the amount of unshared memory used for stack space
(source)
You can then compare it to the stack usage of your program with a mocked call to the library.
However, this only works if your system implements ru_isrss; Linux does not, and leaves the field set to 0.
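A minimal sketch of reading that field, assuming a POSIX system with <sys/resource.h>; the wrapper name unshared_stack_size is made up here:

```c
#include <sys/resource.h>

// Returns ru_isrss for the calling process, or -1 if getrusage fails.
// On Linux the field is not maintained and reads as 0.
long unshared_stack_size(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_isrss;
}
```

Taking one reading with the mocked call and one with the real library, then comparing them, is the measurement described above.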
I have a struct variable which is passed as follows:
//function definition
void function1(const Node* aVAR1)
{
Node* value=NULL;
.....
}
int main()
{
Node* aVAR=NULL;
aVAR=x.value;
function1(aVAR);
}
Here, when I run this in gdb and step into function1(), I see that a temporary memory address is created for the variable aVAR.
GDB:
21 aVAR=x.value;
(gdb) p aVAR
$5 = (Node *) 0x654321
(gdb) n
Breakpoint 1, function1(aVAR1=0x7ffffffffebcdf ) at debug/../abc.c:12
12 {
(gdb) p aVAR1
$6 = (const Node *) 0x7ffffffffebcdf
For example,
Initially, the address of aVAR is 0x654321
Then, for a short while, until the first instruction in function1() has been executed, aVAR1 is kept at some temporary address like 0x7ffffffffebcdf.
After executing Node* value=NULL;, which is the first instruction in function1(), aVAR1's address is 0x654321 again.
But this temporary address (0x7ffffffffebcdf) is not cleaned up: even after the function exits, 0x7ffffffffebcdf is not cleared.
I want 0x7ffffffffebcdf to be cleared after function exits, but I have no pointer through which I can access this memory. Is there any option while linking in GCC through which I can prevent this?
If I add a malloc for aVAR and clear it later using memset and free, the problem is resolved, BUT as far as I can see, I lose the reference to the memory block allocated by malloc() and won't be able to free() it (causing a memory leak).
In what you presented, you have two pointer variables: aVAR, a local variable in main, and aVAR1, function1's parameter. Both are in automatic storage (or "temporary" storage, as you call it), and thus will cease to exist when the function containing them exits. Nothing special needs to be done to free them.
Only the pointed structure needs to be freed (assuming it was malloc'ed), and that only needs to be done once, no matter how many pointers you had to it in its lifetime.
In short, all you need is one free per malloc/calloc. (Though keep in mind that strdup will call malloc, and passing NULL to realloc is effectively a malloc.)
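As a minimal sketch of that rule: the Node layout below is a hypothetical stand-in (the question does not show it), and one_free_per_malloc is a made-up name.

```c
#include <stdlib.h>

// Hypothetical stand-in for the Node type in the question.
typedef struct Node {
    int value;
} Node;

// Takes a copy of the pointer; the copy itself lives in automatic storage.
static int read_value(const Node *node)
{
    return node->value;
}

// Returns the stored value, or -1 if allocation fails.
int one_free_per_malloc(void)
{
    Node *owner = malloc(sizeof *owner);  // the only allocation
    if (owner == NULL)
        return -1;
    owner->value = 42;

    const Node *alias = owner;            // second pointer, same object
    int seen = read_value(alias);         // third pointer (the parameter)

    free(owner);                          // the only free that is needed
    return seen;
}
```

The pointer variables owner, alias and node disappear on their own when their functions return; only the pointed-to block needed the explicit free.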
I want 0x7ffffffffebcdf to be cleared after function exits...
I have a limited imagination, but among the reasons I can imagine you want this are:
You think that this is still in use; it isn't, it is out of scope and unreachable.
If it happens to be reachable, because you have stored its address somewhere, you have made a mistake that no amount of zeroing will cure.
You have a security issue, and you want to make sure temporary memory is scrubbed.
So, given [3], there are two choices: change your code to zero it before main returns, or change your main() to be mymain():
int mymain() {
Node* aVAR=NULL;
aVAR=x.value;
function1(aVAR);
return something;
}
#include <fcntl.h> // open, O_RDONLY
#include <unistd.h> // read, close
void clearstack() {
int data[1000];
int fd;
if ((fd = open("/dev/zero", O_RDONLY)) != -1) {
read(fd, data, sizeof data);
close(fd);
}
}
int main() {
int r = mymain();
clearstack();
return r;
}
This works because the stack addresses will overlap between the two function calls, so your 0x7f-febcdf will land in the middle of data[]. The choir of implementation-defined behaviour should be warming up now.
But really, you would be better off with:
int mymain() {
Node* aVAR=NULL;
aVAR=x.value;
function1(aVAR);
aVAR = 0;
dummyfunction(&aVAR);
return aVAR == 0;
}
Note that by providing the address of aVAR to dummyfunction, you perturb the compiler's ability to remove what it might consider useless. This sort of behavior is difficult to predict, however, because it binds your program source to whatever version of whatever compiler is at your disposal; not a great prospect.
If volatile had any sort of rigor in its definition, it would be useful here, but it hasn't.
A little better would be to use malloc() to acquire the variable, then you are bound by a contract that this is memory [ whereas a local variable could be register only ], and you can scrub it before freeing it. It would be at the outer reaches of unacceptable behavior for the compiler to optimize out the scrub. It still might leave data sitting in some registers, which might leak out.
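A minimal sketch of that scrub-then-free pattern follows. It writes through a volatile-qualified pointer, a commonly used idiom for discouraging the compiler from dropping the stores; as noted above, the guarantees of volatile are debatable, so treat this as an assumption rather than a promise. The names scrub and scrub_and_free are made up for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

// Overwrite len bytes with zeros, one volatile store per byte.
void scrub(void *mem, size_t len)
{
    volatile unsigned char *p = mem;
    while (len--)
        *p++ = 0;
}

// Scrub a heap block before handing it back to the allocator.
void scrub_and_free(void *mem, size_t len)
{
    if (mem != NULL) {
        scrub(mem, len);
        free(mem);
    }
}
```

Copies of the secret that the compiler left in registers or spilled elsewhere are not touched by this, which is the leak described above.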
All this said; if an attacker is really out for uncovering secrets that are plaintext in your program, you might not be able to stop them. They could start your program under a debugger or hypervisor, and inspect the data at will.
There are concepts in some modern processors where the CPU can construct a sort of enclave where secrets can be safely unwrapped, but there are many flaws. The question "ARM TrustZone's Secure/Normal world vs. OS's kernel/user mode or x86's Ring0/1/2/3?" has more info.
I'm trying to test 2 of my functions that sort of mimic setjmp and longjmp for a homework - which is pretty difficult since we're not allowed to use built in functions or assembly asm() to implement the longjmp and setjmp functions. (Yes, that's really the assignment.)
Problem: I keep getting wrong return values. In short, when main() calls foo(), foo() calls bar(), and bar() calls longjmp(), then bar() should not return to foo(); instead, setjmp() should return to main with a return value of 1, which should print "error" (see main() below).
Instead, my output comes out as:
start foo
start bar
segmentation fault
I tried fixing the segmentation fault by initializing the pointer p with malloc, but that didn't seem to do anything. Would the segmentation fault be the reason why I'm not getting the correct return values?
code:
#include <stdio.h>
#include <stdlib.h>
int setjmp(int v);
int longjmp(int v);
int foo(void);
int bar(void);
int *add;
int main(void) {
int r;
r = setjmp(r);
if(r == 0) {
foo();
return(0);
} else {
printf("error\n");
return(2);
}
}
int _main(void) {
return(0);
}
int setjmp(int v)
{
add = &v;
return(0);
}
int longjmp(int v)
{
int *p;
p = &v;
*(p - 1) = *add;
return(1);
}
int foo(void) {
printf("start foo\n");
bar();
return(0);
}
int bar(void) {
int d;
printf("start bar\n");
longjmp(d);
return(0);
}
Implementing setjmp() and longjmp() requires access to the stack pointer. Unfortunately, the assignment you're working from has explicitly banned you from using every sensible method to do this (i.e, using assembly, or using compiler builtins to access the stack pointer).
What's worse is, they've mangled the definition of setjmp() and longjmp() in their sample code. The argument needs to be a type that resolves to an array (e.g, typedef int jmp_buf[1]), not an int…
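For comparison, this is the control flow the assignment is imitating, written with the standard <setjmp.h> interface and a jmp_buf. It is only a reference for the expected behaviour, not a solution to the assignment (which forbids the built-ins); run_demo and reached_after_bar are names invented for the sketch.

```c
#include <setjmp.h>

static jmp_buf env;            // array type, so it is passed by address
static int reached_after_bar;  // set only if bar() returns normally

static void bar(void)
{
    longjmp(env, 1);           // unwinds straight back to the setjmp call
}

static void foo(void)
{
    bar();
    reached_after_bar = 1;     // skipped because bar() never returns
}

// Returns 0 on the normal path and 2 on the longjmp path,
// mirroring the return codes of main() in the question.
int run_demo(void)
{
    reached_after_bar = 0;
    if (setjmp(env) == 0) {
        foo();
        return 0;
    }
    return 2;
}
```

Here run_demo() should return 2 with reached_after_bar still 0, which is the "error" branch the question expects to see.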
Anyways. You need some way to reliably find the old stack pointer from a stack frame in C. Probably the best way of doing this will be to define an array on the stack, then look "behind" it…
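static int sp; // value read from the frame; int-sized is an assumption (32-bit target)
void get_sp(void) {
int x[1];
sp = x[-1]; // or -2 or -3, etc…; deliberately out of bounds, so undefined behaviour
}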
The exact offset will depend on what compiler you're using, as well as possibly on what arguments your function takes and what other local variables the function has. You will need to experiment a bit to get this right. Run your application in the simulator, and/or look at the generated assembly, to make sure you're picking up the right value.
The same trick will probably work to set the stack pointer when "returning" from longjmp(). However, certain compiler optimizations may make this difficult, especially on architectures with a link register -- such as MIPS. Make sure compiler optimizations are disabled. If all else fails, you may need to call a dummy function in longjmp() to force the compiler to save the link register on the stack, rather than leaving it in a register (where it can't be overwritten).
You are going to need to deal with the link register, the stack pointer, and the frame pointer (you would normally also have to save and restore all of the save registers, but I don't think we need to in order to make this example work).
Take a look at the arg3caller function here. Upon entry, it stores the link register and the frame pointer on the stack, and sets the frame pointer to point to the new stack frame. It then calls args3, sets the return value, and, most importantly, copies the frame pointer back into the stack pointer. It then pops the link register and the original frame pointer from where the stack pointer is now located, and jumps to the link register. If you look at args3, it saves the frame pointer into the stack and then restores it from the stack.
So, arg3caller can be longjmp, but if you want it to return with a different stack pointer than it entered with, you are going to have to change the frame pointer, because the frame pointer gets copied into the stack pointer at the end. The frame pointer can be modified by having args3 (a dummy function called by longjmp) modify the copy of the frame pointer that it saved in the stack.
You will need to make setjmp also call a dummy function in order to get the link register and frame pointer stored on the stack in the same way. You can then copy the link register and frame pointer out of setjmp's stack frame into globals (normally, setjmp would copy stuff into the provided jmpbuf, but here, the arguments to setjmp and longjmp are useless, so you have to use globals), as well as the address of the frame. Then, longjmp must copy the saved link register and frame pointer back into the same address, and have the dummy leaf function change the saved frame pointer to that same address. Thus, the dummy leaf function will copy that address into the frame pointer and return to longjmp, which will copy it into the stack pointer. It will then restore the frame pointer and the link register from that stack frame (that you populated), thus returning with everything in the state that it was when setjmp originally returned (except the return value will be different).
Note that you can access these fields by using the negative-indexing-of-a-local-array trick described by @duskwuff. You should initially compile with the -S flag so that you can see what asm gcc is generating and where the important registers are being saved on the stack (and how your code might perturb all of that).
Edit:
I don't have immediate access to a MIPS gcc, but I found this, and put it in MIPS gcc 5.4 mode. Playing around, I found that non-leaf functions store lr and fp immediately below where the argument would be placed on the stack (the argument is actually passed in a0, but gcc leaves room for it on the stack in case the callee needs to store it). By having setjmp call a leaf function, we can ensure that setjmp is a non-leaf so that its lr is saved on the stack. We can then save the address of the arg, and the lr and fp that are stored immediately below it (using negative indexing), and return 0. Then, in longjmp, we can call a leaf function to ensure the lr is saved on the stack, but also have the leaf change its stacked fp to the saved sp. Upon return to longjmp, the fp will be pointing at the original frame, which we can re-populate with the saved lr and fp. Returning from longjmp will copy the fp back into the sp and restore the lr and fp from our re-populated frame, making it appear that we are returning from setjmp. This time however, we return 1 so the caller can differentiate the true return from setjmp to the fake one engineered by longjmp.
Note that I have only eyeballed this code, and have not actually executed it!! Also, it must be compiled with optimization disabled (-O0). If you enable any kind of optimization, the compiler inlines the leaf functions and turns both setjmp and longjmp into empty functions. You should see what your compiler does with this to understand how the stack frames are constructed. Again, we're well and truly in the land of undefined behavior, and even changes in gcc version could upset everything. You should also single step the program (using gdb or spim) to make sure you understand what's going on.
struct jmpbuf {
int lr;
int fp;
int *sp;
};
static struct jmpbuf ctx;
static void setjmp_leaf(void) { }
int setjmp(int arg)
{
// call the leaf so that our lr is saved
setjmp_leaf();
// the address of our arg should be immediately
// above the lr and fp
ctx.sp = &arg;
// lr is immediately below arg
ctx.lr = (&arg)[-1];
// fp is below that
ctx.fp = (&arg)[-2];
return 0;
}
static void longjmp_leaf(int arg)
{
// overwrite the caller's frame pointer
(&arg)[-1] = (int)ctx.sp;
}
int longjmp(int arg)
{
// call the leaf so that our lr is saved
// but also to change our fp to the saved sp
longjmp_leaf(arg);
// repopulate the new stack frame with the saved
// lr and fp. &arg is calculated relative to fp,
// which was modified by longjmp_leaf. &arg isn't
// where it used to be!
(&arg)[-1] = ctx.lr;
(&arg)[-2] = ctx.fp;
// this should restore the saved fp and lr
// from the new frame, so it looks like we're
// returning from setjmp
return 1;
}
Good luck!
How does a compiler know if something is allocated on the heap or stack, for instance if I made a variable in a function and returned the address of the variable, the compiler warns me that "function returns address of a local variable":
#include <stdio.h>
int* something() {
int z = 21;
return &z;
}
int main() {
int *d = something();
return 0;
}
I understand why this is a warning: when the function exits, the stack frame is no more, and if you have a pointer to that memory and you change its value you will cause a segmentation fault. What I wonder is how the compiler will know if that variable is allocating memory via. malloc, or how it can tell if it's a local variable on the stack?
A compiler builds a syntax tree from which it is able to analyze each part of the source code.
It builds a symbol table which associates to each symbol defined some information. This is required for many aspects:
finding undeclared identifiers
checking that types are convertible
and so on
Once you have this symbol table it is quite easy to know if you are trying to return the address of a local variable since you end up having a structure like
ReturnStatement
+ UnaryOperator (&)
+ Identifier (z)
So the compiler can easily check if the identifier is a local stack variable or not.
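As a toy illustration of that check (not how any particular compiler represents its tree), the sketch below walks the three-node shape shown above and consults a storage tag that a symbol table would have filled in. All type and function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum node_kind { NODE_IDENTIFIER, NODE_ADDRESS_OF, NODE_RETURN };
enum storage { STORAGE_LOCAL, STORAGE_STATIC };

struct node {
    enum node_kind kind;
    const struct node *operand;  // child node; NULL for identifiers
    enum storage storage;        // meaningful only for identifiers
};

// True when the statement has the shape: return &<local identifier>;
bool returns_local_address(const struct node *stmt)
{
    if (stmt == NULL || stmt->kind != NODE_RETURN)
        return false;
    const struct node *expr = stmt->operand;
    if (expr == NULL || expr->kind != NODE_ADDRESS_OF)
        return false;
    const struct node *id = expr->operand;
    return id != NULL
        && id->kind == NODE_IDENTIFIER
        && id->storage == STORAGE_LOCAL;
}
```

Because the check looks only at the direct operand of the return statement, it is purely syntactic: an address that reaches the return statement through an intermediate variable has a different tree shape and is not matched.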
Mind that this information could in theory propagate along assignments but in practice I don't think many compilers do it, for example if you do
int* something() {
int z = 21;
int* pz = &z;
return pz;
}
The warning goes away. With static code flow analysis you could prove that pz can only refer to a local variable, but in practice that doesn't happen.
The example in your question is really easy to figure out.
int* something() {
int z = 21;
return &z;
}
Look at the expression in the return statement. It takes the address of the identifier z.
Find out where z is declared. Oh, it is a local variable.
Not all cases will be as easy as this one and it's likely that you can trick the compiler into giving false positives or negatives if you write sufficiently weird code.
If you're interested in this kind of stuff, you might enjoy watching some of the talks given at CppCon'15 where static analysis of C++ code was a big deal. Some remarkable talks:
Bjarne Stroustrup: “Writing Good C++14”
Herb Sutter: “Writing Good C++14… By Default”
Neil MacIntosh: “Static Analysis and C++: More Than Lint”
The compiler knows what chunk of memory is holding the current stack. Every time a function is called, a new stack frame is created and the frame and stack pointers are moved accordingly, which effectively gives a beginning and endpoint for the current frame in memory. Checking to see if you're trying to return a pointer to memory that's about to get freed is relatively simple given that setup.
What I wonder is how the compiler will know if that variable is
allocating memory via. malloc, or how it can tell if it's a local
variable on the stack?
The compiler has to analyse all the code and generate machine code from it.
When functions need to be called, the compiler has to push the parameters on the stack (or reserve registers for them), update the stack pointer, look if there are local variables, initialize those on the stack too and update the stack pointer again.
So obviously the compiler knows about local variables being pushed on the stack.
I want to know exactly when the stack memory allocated for a function call is cleared. I have seen in a video tutorial that when the called function returns to main, the memory allocated for it is cleared. I have a few questions about the program below; please explain.
#include<stdio.h>
void print(){
printf("testing \n");
}
int* sum(int* a, int* b){
int c = *a + *b;
return &c;
}
int main(){
int a=3,b=2;
int *ptr = sum(&a,&b);
print();
printf("sum is: %d",*ptr);
return 0;
}
When I run the above program, it prints a garbage value, which is expected. But if I comment out the print() call in main and then run the program, it prints the correct value of the sum.
Does this mean that even though the execution of the local function has completed, the previously allocated memory is not cleared until there is another function call that uses the stack?
If I remove the printf statement in print() and keep the print() call in main, then I see the result of sum as normal. Why didn't it overwrite the memory on the stack?
C has no stack, the word stack is not even mentioned in the standard (C89, C99 or C11). An implementation may use a stack to provide the behavioural aspects of the C abstract machine but it's the abstract machine itself which the standard specifies.
So, as to when the stack is cleared (assuming it even exists), that's something that's totally up to the implementation. What you are doing is basically undefined behaviour, accessing an object after its lifetime has ended, so the results can be anything the implementation chooses.
As to why you can access the items after their lifetime has ended for a specific implementation, most likely it's because entering and exiting a function doesn't clear the stack, it simply adjusts the stack pointer (a lot more efficient than having to clear the memory as well).
So, unless something overwrites what's at that memory location (such as a subsequent call to printf), it'll probably remain at whatever it was last set to.
By way of example, here's a sample prolog code for a function:
push ebp ; Save the frame pointer.
mov ebp, esp ; Set frame pointer to current stack pointer.
sub esp, XX ; Allocate XX space for this frame.
and its equivalent epilog:
mov esp, ebp ; Restore stack pointer.
pop ebp ; Get previous frame pointer.
ret ; Return.
Note that neither the allocation of space (sub in the prolog) nor the deallocation of it (mov in the epilog) actually clears the memory it's using.
However, as stated, it's not something you should rely on.
The answer to your question is operating-system specific. In a system that creates processes from scratch (VMS/NT), the stack gets cleared only when the process is created. The stack is created from demand-zero pages: when a stack page is accessed for the first time, the operating system creates a new zeroed page.
In forking systems, the stack gets cleared out whenever a new executable is loaded. Usually the process is the same as above.
After the stack is created, whatever is put there stays there until overwritten.
The stack is managed by the operating system; not the programming languages.