I'm trying to figure out how alloca() actually works on a memory level. From the linux man page:
The alloca() function allocates size bytes of space in the stack
frame of the caller. This temporary space is automatically freed
when the function that called alloca() returns to its caller.
Does this mean alloca() will forward the stack pointer by n bytes? Or where exactly is the newly created memory allocated?
And isn't this exactly the same as variable length arrays?
I know the implementation details are probably left to the OS and stuff. But I want to know how in general this is accomplished.
Yes, alloca is functionally equivalent to a local variable length array, i.e. this:
int arr[n];
and this:
int *arr = alloca(n * sizeof(int));
both allocate space for n elements of type int on the stack. The only differences between arr in each case is that 1) one is an actual array and the other is a pointer to the first element of an array, and 2) the array's lifetime ends with its enclosing scope, while the alloca memory's lifetime ends when the function returns. In both cases the array resides on the stack.
As an example, given the following code:
#include <stdio.h>
#include <alloca.h>
void foo(int n)
{
int a[n];
int *b=alloca(n*sizeof(int));
int c[n];
printf("&a=%p, b=%p, &c=%p\n", (void *)a, (void *)b, (void *)c);
}
int main()
{
foo(5);
return 0;
}
When I run this I get:
&a=0x7ffc03af4370, b=0x7ffc03af4340, &c=0x7ffc03af4320
Which shows that the the memory returned from alloca sits between the memory for the two VLAs.
VLAs first appeared in the C standard in C99, but alloca was around well before that. The Linux man page states:
CONFORMING TO
This function is not in POSIX.1-2001.
There is evidence that the alloca() function appeared in 32V, PWB,
PWB.2, 3BSD, and 4BSD. There is a man page for it in 4.3BSD. Linux
uses the GNU version.
BSD 3 dates back to the late 70's, so alloca was an early nonstandardized attempt at VLAs before they were added to the standard.
Today, unless you're using a compiler that doesn't support VLAs (such as MSVC), there's really no reason to use this function since VLAs are now a standardized way to get the same functionality.
The other answer precisely describes mechanics of VLAs and alloca().
However, there is significant functional difference between alloca() and automatic VLA. The lifetime of the objects.
In case of alloca() the lifetime ends when the function returns.
For VLAs the object is released when the containing block ends.
char *a;
int n = 10;
{
char A[n];
a = A;
}
// a is no longer valid
{
a = alloca(n);
}
// is still valid
As result, it is possible to easily exhaust the stack in the loop while it is not possible to do it with VLAs.
for (...) {
char *x = alloca(1000);
// x is leaking with each iteration consuming stack
}
vs
for (...) {
int n = 1000;
char x[n];
// x is released
}
Although alloca looks like a function from a syntax point of view, it can't be implemented as a normal function in a modern programming environment*. It must be regarded as a compiler feature with a function-like interface.
Traditionally C compilers maintained two pointer registers, a "stack pointer" and a "frame pointer" (or base pointer). The stack pointer delimits the current extent of the stack. The frame pointer saved the value of the stack pointer on entry to the function and is used to access local variables and to restore the stack pointer on function exit.
Nowadays most compilers do not use a frame pointer by default in normal functions. Modern debug/exception information formats have rendered it unnessacery, but they still understand what it is and can use it where needed.
In particular for functions with alloca or variable length arrays using a frame pointer allows the function to keep track of the location of it's stack frame while dynamically modifying the stack pointer to accomodate the variable length array.
For example I built the following code at O1 for arm
#include <alloca.h>
int bar(void * baz);
void foo(int a) {
bar(alloca(a));
}
and got (comments mine)
foo(int):
push {fp, lr} # save existing link register and frame pointer
add fp, sp, #4 # establish frame pointer for this function
add r0, r0, #7 # add 7 to a ...
bic r0, r0, #7 # ... and clear the bottom 3 bits, thus rounding a up to the next multiple of 8 for stack alignment
sub sp, sp, r0 # allocate the space on the stack
mov r0, sp # make r0 point to the newly allocated space
bl bar # call bar with the allocated space
sub sp, fp, #4 # restore stack pointer and frame pointer
pop {fp, pc} # restore frame pointer to value at function entry and return.
And yes alloca and variable length arrays are very similar (though as another answer points out not exactly the same). alloca seems to be the older of the two constructoins.
* With a sufficiently dumb/predictable compiler it is posible to implement alloca as a function in assembler. Specifically the compiler needs to.
Consistently create a frame pointer for all functions.
Consistently use the frame pointer rather than the stack pointer to reference local varaibles.
Consistently use the stack pointer rather than the frame pointer when setting up parameters for calls to functions.
This is apparently how it was first implemented ( https://www.tuhs.org/cgi-bin/utree.pl?file=32V/usr/src/libc/sys/alloca.s ).
I guess it's possible one could also have the actual implementation as an assembler function, but have a special case in the compiler that made it go into dumb/predictable mode when it saw alloca, I don't know if any compiler vendors did that.
The most important difference between alloca and VLAs is the failure case. The following code:
int f(int n) {
int array[n];
return array == 0;
}
int g(int n) {
int *array = alloca(n);
return array == 0;
}
The VLA has no possibility of detecting an allocation failure; which is a very un-C thing to impose on a language construct. Alloca() is thus much better designed.
Related
I was wondering if there would be a convenient way to copy the current stack frame, move it somewhere else, and then 'return' from the function, from the new location?
I have been playing around with setjmp and longjmp while allocating large arrays on the stack to force the stack pointer away. I am familiar with the calling conventions and where arguments to functions end up etc, but I am not extremely experienced with pointer arithmetic.
To describe the end goal in general terms; The ambition is to be able to allocate stack frames and to jump to another stack frame when I call a function (we can call this function switch). Before I jump to the new stack frame, however, I'd like to be able to grab the return address from switch so when I've (presumably) longjmpd to the new frame, I'd be able to return to the position that initiated the context switch.
I've already gotten some inspiration of how to imitate coroutines using longjmp an setjmp from this post.
If this is possible, it would be a component of my current research, where I am trying to implement a (very rough) proof of concept extension in a compiler. I'd appreciate answers and comments that address the question posed in my first paragraph, only.
Update
To try and make my intention clearer, I wrote up this example in C. It needs to be compiled with -fno-stack-protector. What i want is for the local variables a and b in main to not be next to each other on the stack (1), but rather be separated by a distance specified by the buffer in call. Furthermore, currently this code will return to main twice, while I only want it to do so once (2). I suggest you read the procedures in this order: main, call and change.
If anyone could answer any of the two question posed in the paragraph above, I would be immensely grateful. It does not have to be pretty or portable.
Again, I'd prefer answers to my questions rather than suggestions of better ways to go about things.
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
jmp_buf* buf;
long* retaddr;
int change(void) {
// local variable to use when computing offsets
long a[0];
for(int i = 0; i < 5; i++) a[i]; // same as below, not sure why I need to read this
// save this context
if(setjmp(*buf) == 0) {
return 1;
}
// the following code runs when longjmp was called with *buf
// overwrite this contexts return address with the one used by call
a[2] = *retaddr;
// return, hopefully now to main
return 1;
}
static void* retain;
int call() {
buf = (jmp_buf*)malloc(sizeof(jmp_buf));
retaddr = (long*) malloc(sizeof(long));
long a[0];
for(int i = 0; i < 5; i++) a[i]; // not sure why I need to do this. a[2] reads (nil) otherwise
// store return address
*retaddr = a[2];
// allocate local variables to move the stackpointer
char n[1024];
retain = n; // maybe cheat the optimiser?
// get a jmp_buf from another context
change();
// jump there
longjmp(*buf, 1);
}
// It returns to main twice, I am not sure why
int main(void) {
char a;
call(); // this function should move stackpointer (in this case, 1024 bytes)
char b;
printf("address of a: %p\n", &a);
printf("address of b: %p\n", &b);
return 1;
}
This is possible, it is what multi-tasking schedulers do, e.g. in embedded environments.
It is however extremely environment-specific and would have to dig into the the specifics of the processor it is running on.
Basically, the possible steps are:
Determine the registers which contain the needed information. Pick them by what you need, they are probably different from what the compiler uses on the stack for implementing function calls.
Find out how their content can be stored (most likely specific assembler instructions for each register).
Use them to store all contents contiguosly.
The place to do so is probably allocated already, inside the object describing and administrating the current task.
Consider not using a return address. Instead, when done with the "inserted" task, decide among the multiple task datasets which describe potential tasks to return to. That is the core of scheduling. If the return address is known in advance, then it is very similar to normal function calling. I.e. the idea is to potentially return to a different task than the last one left. That is also the reason why tasks need their own stack in many cases.
By the way, I don't think that pointer arithmetic is the most relevant tool here.
The content of the registers which make the stack frame are in registers, not anywhere in memory which a pointer can point to. (At least in most current systems, C64 staying out of this....).
tl;dr - no.
(On every compiler worth considering): The compiler knows the address of local variables by their offset from either the sp, or a designated saved stack pointer, the frame or base pointer. a might have an address of (sp+1), and b might have an address of (sp+0). If you manage to successfully return to main with the stack pointer lowered by 1024; these will still be known as (sp+1), (sp+0); although they are technically now (sp+1-1024), (sp+0-1024), which means they are no longer a & b.
You could design a language which fixed the local allocation in the way you consider, and that might have some interesting expressiveness, but it isn't C. I doubt any existing compiler could come up with a consistent handling of this. To do so, when it encountered:
char a;
it would have to make an alias of this address at the point it encountered it; say:
add %sp, $0, %r1
sub %sp, $1, %sp
and when it encountered
char b;
add %sp, $0, %r2
sub %sp, $1, %sp
and so on, but one it runs out of free registers, it needs to spill them on the stack; and because it considers the stack to change without notice, it would have to allocate a pointer to this spill area, and keep that stored in a register.
Btw, this is not far removed from the concept of a splayed stack (golang uses these), but generally the granularity is at a function or method boundary, not between two variable definitions.
Interesting idea though.
I would like to provoke a stack underflow in a C function to test security measures in my system. I could do this using inline assembler. But C would be more portable. However I can not think of a way to provoke a stack underflow using C since stack memory is safely handled by the language in that regard.
So, is there a way to provoke a stack underflow using C (without using inline assembler)?
As stated in the comments: Stack underflow means having the stack pointer to point to an address below the beginning of the stack ("below" for architectures where the stack grows from low to high).
There's a good reason why it's hard to provoke a stack underflow in C.The reason is that standards compliant C does not have a stack.
Have a read of the C11 standard, you'll find out that it talks about scopes but it does not talk about stacks. The reason for this is that the standard tries, as far as possible, to avoid forcing any design decisions on implementations. You may be able to find a way to cause stack underflow in pure C for a particular implementation but it will rely on undefined behaviour or implementation specific extensions and won't be portable.
You can't do this in C, simply because C leaves stack handling to the implementation (compiler). Similarly, you cannot write a bug in C where you push something on the stack but forget to pop it, or vice versa.
Therefore, it is impossible to produce a "stack underflow" in pure C. You cannot pop from the stack in C, nor can you set the stack pointer from C. The concept of a stack is something on an even lower level than the C language. In order to directly access and control the stack pointer, you must write assembler.
What you can do in C is to purposely write out of bounds of the stack. Suppose we know that the stack starts at 0x1000 and grows upwards. Then we can do this:
volatile uint8_t* const STACK_BEGIN = (volatile uint8_t*)0x1000;
for(volatile uint8_t* p = STACK_BEGIN; p<STACK_BEGIN+n; p++)
{
*p = garbage; // write outside the stack area, at whatever memory comes next
}
Why you would need to test this in a pure C program that doesn't use assembler, I have no idea.
In case someone incorrectly got the idea that the above code invokes undefined behavior, this is what the C standard actually says, normative text C11 6.5.3.2/4 (emphasis mine):
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined 102)
The question is then what's the definition of an "invalid value", as this is no formal term defined by the standard. Foot note 102 (informative, not normative) provides some examples:
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.
In the above example we are clearly not dealing with a null pointer, nor with an object that has passed the end of its lifetime. The code may indeed cause a misaligned access - whether this is an issue or not is determined by the implementation, not by the C standard.
And the final case of "invalid value" would be an address that is not supported by the specific system. This is obviously not something that the C standard mentions, because memory layouts of specific systems are not coverted by the C standard.
It is not possible to provoke stack underflow in C. In order to provoke underflow the generated code should have more pop instructions than push instructions, and this would mean the compiler/interpreter is not sound.
In the 1980s there were implementations of C that ran C by interpretation, not by compilation. Really some of them used dynamic vectors instead of the stack provided by the architecture.
stack memory is safely handled by by the language
Stack memory is not handled by the language, but by the implementation. It is possible to run C code and not to use stack at all.
Neither the ISO 9899 nor K&R specifies anything about the existence of a stack in the language.
It is possible to make tricks and smash the stack, but it will not work on any implementation, only on some implementations. The return address is kept on the stack and you have write-permissions to modify it, but this is neither underflow nor portable.
Regarding already existing answers: I don't think that talking about undefined behaviour in the context of exploitation mitigation techniques is appropriate.
Clearly, if an implementation provides a mitigation against stack underflows, a stack is provided. In practice, void foo(void) { char crap[100]; ... } will end up having the array on the stack.
A note prompted by comments to this answer: undefined behaviour is a thing and in principle any code exercising it can end up being compiled to absolutely anything, including something not resembling the original code in the slightest. However, the subject of exploit mitigation techniques is closely tied to the target environment and what happens in practice. In practice, the code below should "work" just fine. When dealing with this kind of stuff you always have to verify generated assembly to be sure.
Which brings me to what in practice will give an underflow (volatile added to prevent the compiler from optimising it away):
static void
underflow(void)
{
volatile char crap[8];
int i;
for (i = 0; i != -256; i--)
crap[i] = 'A';
}
int
main(void)
{
underflow();
}
Valgrind nicely reports the problem.
By definition, a stack underflow is a type of undefined behaviour, and thus any code which triggers such a condition must be UB. Therefore, you can't reliably cause a stack underflow.
That said, the following abuse of variable-length arrays (VLAs) will cause a controllable stack underflow in many environments (tested with x86, x86-64, ARM and AArch64 with Clang and GCC), actually setting the stack pointer to point above its initial value:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
uintptr_t size = -((argc+1) * 0x10000);
char oops[size];
strcpy(oops, argv[0]);
printf("oops: %s\n", oops);
}
This allocates a VLA with a "negative" (very very large) size, which will wrap the stack pointer around and result in the stack pointer moving upwards. argc and argv are used to prevent optimizations from taking out the array. Assuming that the stack grows down (default on the listed architectures), this will be a stack underflow.
strcpy will either trigger a write to an underflowed address when the call is made, or when the string is written if strcpy is inlined. The final printf should not be reachable.
Of course, this all assumes a compiler which doesn't just make the VLA some kind of temporary heap allocation - which a compiler is completely free to do. You should check the generated assembly to verify that the above code does what you actually expect it to do. For example, on ARM (gcc -O):
8428: e92d4800 push {fp, lr}
842c: e28db004 add fp, sp, #4, 0
8430: e1e00000 mvn r0, r0 ; -argc
8434: e1a0300d mov r3, sp
8438: e0433800 sub r3, r3, r0, lsl #16 ; r3 = sp - (-argc) * 0x10000
843c: e1a0d003 mov sp, r3 ; sp = r3
8440: e1a0000d mov r0, sp
8444: e5911004 ldr r1, [r1]
8448: ebffffc6 bl 8368 <strcpy#plt> ; strcpy(sp, argv[0])
This assumption:
C would be more portable
is not true. C doesn't tell anything about a stack and how it is used by the implementation. On your typical x86 platform, the following (horribly invalid) code would access the stack outside of the valid stack frame (until it is stopped by the OS), but it would not actually "pop" from it:
#include <stdarg.h>
#include <stdio.h>
int underflow(int dummy, ...)
{
va_list ap;
va_start(ap, dummy);
int sum = 0;
for(;;)
{
int x = va_arg(ap, int);
fprintf(stderr, "%d\n", x);
sum += x;
}
return sum;
}
int main(void)
{
return underflow(42);
}
So, depending on what exactly you mean with "stack underflow", this code does what you want on some platform. But as from a C point of view, this just exposes undefined behavior, I wouldn't suggest to use it. It's not "portable" at all.
Is it possible to do it reliably in standard compliant C? No
Is it possible to do it on at least one practical C compiler without resorting to inline assembler? Yes
void * foo(char * a) {
return __builtin_return_address(0);
}
void * bar(void) {
char a[100000];
return foo(a);
}
typedef void (*baz)(void);
int main() {
void * a = bar();
((baz)a)();
}
Build that on gcc with "-O2 -fomit-frame-pointer -fno-inline"
https://godbolt.org/g/GSErDA
Basically the flow in this program goes as follows
main calls bar.
bar allocates a bunch of space on the stack (thanks to the big array),
bar calls foo.
foo takes a copy of the return address (using a gcc extension). This address points into the middle of bar, between the "allocation" and the "cleanup".
foo returns the address to bar.
bar cleans up it's stack allocation.
bar returns the return address captured by foo to main.
main calls the return address, jumping into the middle of bar.
the stack cleanup code from bar runs, but bar doesn't currently have a stack frame (because we jumped into the middle of it). So the stack cleanup code underflows the stack.
We need -fno-inline to stop the optimiser inlining stuff and breaking our carefully laid-down strcture. We also need the compiler to free the space on the stack by calculation rather than by use of a frame pointer, -fomit-frame-pointer is the default on most gcc builds nowadays but it doesn't hurt to specify it explicitly.
I belive this tehcnique should work for gcc on pretty much any CPU architecture.
There is a way to underflow the stack, but it is very complicated. The only way that I can think of is define a pointer to the bottom element then decrement its address value. I.e. *(ptr)--. My parentheses may be off, but you want to decrement the value of the pointer, then dereference the pointer.
Generally the OS is just going to see the error and crash. I am not sure what you are testing. I hope this helps. C allows you to do bad things, but it tries to look after the programmer. Most ways to get around this protection is through manipulation of pointers.
Do you mean stack overflow? Putting more things into the stack than the stack can accomodate? If so, recursion is the easiest way to accomplish that.
void foo();
{foo();};
If you mean attempting to remove things from an empty stack, then please post your question to the stackunderflow web site, and let me know where you've found that! :-)
So there are older library functions in C which are not protected. strcpy is a good example of this. It copies one string to another until it reaches a null terminator. One funny thing to do is pass a program that uses this a string with the null terminator removed. It will run amuck until it reaches a null terminator somewhere. Or have a string copy to itself. So back to what I was saying before is C supports pointers to just about anything. You can make a pointer to an element in the stack at the last element. Then you can use the pointer iterator built into C to decrement the value of the address, change the address value to a location preceding the last element in the stack. Then pass that element to the pop. Now if you are doing this to the Operating system process stack that would get very dependent on the compiler and operating system implementation. In most cases a function pointer to the main and a decrement should work to underflow the stack. I have not tried this in C. I have only done this in Assembly Language, great care has to be taken in working like this. Most operating systems have gotten good at stopping this since it was for a long time an attack vector.
My question is if i have some function
void func1(){
char * s = "hello";
char * c;
int b;
c = (char *) malloc(15);
strcpy(c,s);
}
I think the s pointer is allocated on the stack but where is the data "hello" stored does that go in the data segment of the program? As for c and b they are unitialized and since 'c = some memory address' and it doesnt have one yet how does that work? and b also has no contents so it cant stored on the stack?
Then when we allocate memory for c on the heap with malloc c now has some memory address, how is this unitialized c variable given the address of the first byte for that string on the heap?
We need to consider what memory location a variable has and what its contents are. Keep this in mind.
For an int, the variable has a memory address and has a number as its contents.
For a char pointer, the variable has a memory address and its contents is a pointer to a string--the actual string data is at another memory location.
To understand this, we need to consider two things:(1) the memory layout of a program
(2) the memory layout of a function when it's been called
Program layout [typical]. Lower memory address to higher memory address:code segment -- where instructions go:
...
machine instructions for func1
...
data segment -- where initialized global variables and constants go:
...
int myglobal_inited = 23;
...
"hello"
...
bss segment -- for unitialized globals:
...
int myglobal_tbd;
...
heap segment -- where malloc data is stored (grows upward towards higher memory
addresses):
...
stack segment -- starts at top memory address and grows downward toward end
of heap
Now here's a stack frame for a function. It will be within the stack segment somewhere. Note, this is higher memory address to lower:function arguments [if any]:
arg2
arg1
arg0
function's return address [where it will go when it returns]
function's stack/local variables:
char *s
char *c
int b
char buf[20]
Note that I've added a "buf". If we changed func1 to return a string pointer (e.g. "char *func1(arg0,arg1,arg2)" and we added "strcpy(buf,c)" or "strcpy(buf,c)" buf would be usable by func1. func1 could return either c or s, but not buf.
That's because with "c" the data is stored in the data segment and persists after func1 returns. Likewise, s can be returned because the data is in the heap segment.
But, buf would not work (e.g. return buf) because the data is stored in func1's stack frame and that is popped off the stack when func1 returns [meaning it would appear as garbage to caller]. In other words, data in the stack frame of a given function is available to it and any function that it may call [and so on ...]. But, this stack frame is not available to a caller of that function. That is, the stack frame data only "persists" for the lifetime of the called function.
Here's the fully adjusted sample program:
int myglobal_initialized = 23;
int myglobal_tbd;
char *
func1(int arg0,int arg1,int arg2)
{
char *s = "hello";
char *c;
int b;
char buf[20];
char *ret;
c = malloc(15);
strcpy(c,s);
strcpy(buf,s);
// ret can be c, s, but _not_ buf
ret = ...;
return ret;
}
Let's divide this answer in two points of view of the same stuff, because the standards only complicate understanding of this topic, but they're standards anyway :).
Subject common to both parts
void func1() {
char *s = "hello";
char *c;
int b;
c = (char*)malloc(15);
strcpy(c, s);
}
Part I: From a standardese point of view
According to the standards, there's this useful concept known as automatic variable duration, in which a variable's space is reserved automatically upon entering a given scope (with unitialized values, a.k.a: garbage!), it may be set/accessed or not during such a scope, and such a space is freed for future use. Note: In C++, this also involves construction and destruction of objects.
So, in your example, you have three automatic variables:
char *s, which gets initialized to whatever the address of "hello" happens to be.
char *c, which holds garbage until it's initialized by a later assignment.
int b, which holds garbage all of its lifetime.
BTW, how storage works with functions is unspecified by the standards.
Part II: From a real-world point of view
On any decent computer architecture you will find a data structure known as the stack. The stack's purpose is to hold space that can be used and recycled by automatic variables, as well as some space for some stuff needed for recursion/function calling, and can serve as a place to hold temporary values (for optimization purposes) if the compiler decides to.
The stack works in a PUSH/POP fashion, that is, the stack grows downwards. Let my explain it a little better. Imagine an empty stack like this:
[Top of the Stack]
[Bottom of the Stack]
If you, for example, PUSH an int of value 5, you get:
[Top of the Stack]
5
[Bottom of the Stack]
Then, if you PUSH -2:
[Top of the Stack]
5
-2
[Bottom of the Stack]
And, if you POP, you retrieve -2, and the stack looks as before -2 was PUSHed.
The bottom of the stack is a barrier that can be moved uppon PUSHing and POPing. On most architectures, the bottom of the stack is recorded by a processor register known as the stack pointer. Think of it as a unsigned char*. You can decrease it, increase it, do pointer arithmetic on it, etcetera. Everything with the sole purpose to do black magic on the stack's contents.
Reserving (space for) automatic variables in the stack is done by decreasing it (remember, it grows downwards), and releasing them is done by increasing it. Basing us on this, the previous theoretical PUSH -2 is shorthand to something like this in pseudo-assembly:
SUB %SP, $4 # Subtract sizeof(int) from the stack pointer
MOV $-2, (%SP) # Copy the value `-2` to the address pointed by the stack pointer
POP whereToPop is merely the inverse
MOV (%SP), whereToPop # Get the value
ADD %SP, $4 # Free the space
Now, compiling func1() may yield the following pseudo-assembly (Note: you are not expected to understand this at its fullest):
.rodata # Read-only data goes here!
.STR0 = "hello" # The string literal goes here
.text # Code goes here!
func1:
SUB %SP, $12 # sizeof(char*) + sizeof(char*) + sizeof(int)
LEA .STR0, (%SP) # Copy the address (LEA, load effective address) of `.STR0` (the string literal) into the first 4-byte space in the stack (a.k.a `char *s`)
PUSH $15 # Pass argument to `malloc()` (note: arguments are pushed last to first)
CALL malloc
ADD %SP, 4 # The caller cleans up the stack/pops arguments
MOV %RV, 4(%SP) # Move the return value of `malloc()` (%RV) to the second 4-byte variable allocated (`4(%SP)`, a.k.a `char *c`)
PUSH (%SP) # Second argument to `strcpy()`
PUSH 4(%SP) # First argument to `strcpy()`
CALL strcpy
RET # Return with no value
I hope this has led some light on you!
I want to know when exactly the memory is cleared in stack which is allocated for local function calls. I have seen in some video tutorial when the function call is returned to main the memory which is allocated for local function is cleared. I have few questions on below program, please explain.
#include<stdio.h>
void print(){
printf("testing \n");
}
int* sum(int* a, int* b){
int c = *a + *b;
return &c;
}
int main(){
int a=3,b=2;
int *ptr = sum(&a,&b);
print();
printf("sum is: %d",*ptr);
return 0;
}
when I run the above program, it is printing garbage value which is expected. But if I comment "print()" function in main and then run the program it is printing correct value of sum.
Is this mean that even though the execution of local function is completed in stack, until there is another function call to the stack, the previous allocated memory is not cleared ?
If I remove the "printf" statement in "print()" and keep the "print()" call in main, then I could see the result of sum as normal. Why it didn't overwrite the memory in stack?
C has no stack, the word stack is not even mentioned in the standard (C89, C99 or C11). An implementation may use a stack to provide the behavioural aspects of the C abstract machine but it's the abstract machine itself which the standard specifies.
So, as to when the stack is cleared (assuming it even exists), that's something that's totally up to the implementation. What you are doing is basically undefined behaviour, accessing an object after its lifetime has ended, so the results can be anything the implementation chooses.
As to why you can access the items after their lifetime has ended for a specific implementation, most likely it's because entering and exiting a function doesn't clear the stack, it simply adjusts the stack pointer (a lot more efficient than having to clear the memory as well).
So, unless something overwrites what's at that memory location (such as a subsequent call to printf), it'll probably remain at whatever it was last set to.
By way of example, here's a sample prolog code for a function:
push ebp ; Save the frame pointer.
mov ebp, esp ; Set frame pointer to current stack pointer.
sub esp, XX ; Allocate XX space for this frame.
and its equivalent epilog:
mov esp, ebp ; Restore stack pointer.
pop ebp ; Get previous frame pointer.
ret ; Return.
Note that neither the allocation of space (sub in the prolog) nor the deallocation of it (mov in the epilog) actually clears the memory it's using.
However, as stated, it's not something you should rely on.
The answer to your question is operating system specific. In a system that creates process from scratch (VMS/NT) the stack gets cleared only when the process is created. The stack is created from demand-zero pages. When a stack page is accessed for the first time, the operating system creates a new zero pages.
In forking systems the stack gets cleared out whenever a new executable is loaded. Usually the process is the same as above.
After the stack is created, whatever is put there stays there until overwritten.
The stack is managed by the operating system; not the programming languages.
The stack frame of a caller function can be easily obtained via __builtin_frame_address(1), but what about the stack frame size?
Is there a function that will let me know how big is the stack frame of the caller function?
My first reaction would have been, why would anybody want this? It should be considered bad practice for a C function to dynamically determine the size of the stack frame. The whole point of cdecl (the classic C calling convention) is that the function itself (the 'callee') has no knowledge of the size of the stack frame. Any diversion from that philosophy may cause your code to break when switching over to a different platform, a different address size (e.g. from 32-bit to 64-bit), a different compiler or even different compiler settings (in particular optimizations).
On the other hand, since gcc already offers this function __builtin_frame_address, it will be interesting to see how much information can be derived from there.
From the documentation:
The frame address is normally the address of the first word pushed on to the stack by the function.
On x86, a function typically starts with:
push ebp ; bp for 16-bit, ebp for 32-bit, rbp for 64-bit
In other words, __builtin_frame_address returns the base pointer of the caller's stack frame.
Unfortunately, the base pointer says little or nothing about where any stack frame starts or ends;
the base pointer points to a location that is somewhere in the middle of the stack frame (between the parameters and the local variables).
If you are only interested in the part of the stack frame that holds the local variables, then the function itself has all the knowledge. The size of that part is the difference between the stack pointer and the base pointer.
register char * const basepointer asm("ebp");
register char * const stackpointer asm("esp");
size_localvars = basepointer - stackpointer;
Please keep in mind that gcc seems to allocate space on the stack right from the beginning that is used to hold parameters for other functions called from inside the callee. Strictly speaking, that space belongs to the stack frames of those other functions, but the boundary is unclear. Whether this is a problem, depends on your purpose; what you are going to do with the calculated stack frame size?
As for the other part (the parameters), that depends. If your function has a fixed number of parameters, then you could simply measure the size of the (formal) parameters. It does not guarantee that the caller actually pushed the same amount of parameters on the stack, but assuming the caller compiled without warnings against callee's prototype, it should be OK.
void callee(int a, int b, int c, int d)
{
size_params = sizeof d + (char *)&d - (char *)&a;
}
You can combine the two techniques to get the full stackframe (including return address and saved base pointer):
register char * const stackpointer asm("esp");
void callee(int a, int b, int c, int d)
{
total_size = sizeof d + (char *)&d - stackpointer;
}
If however, your function has a variable number of parameter (an 'ellipsis', like printf has), then the size of the parameters is known only to the caller. Unless the callee has a way to derive the size and number of parameters (in case of a printf-style function, by analyzing the format string), you would have to let the caller pass that information on to the callee.
EDIT:
Please note, this only works to let a function measure his own stack frame. A callee cannot calculate his caller's stack frame size; callee will have to ask caller for that information.
However, callee can make an educated guess about the size of caller's local variables. This block starts where callee's parameters end (sizeof d + (char *)&d), and ends at caller's base pointer (__builtin_frame_address(1)). The start address may be slightly inaccurate due to address alignment imposed by the compiler; the calculated size may include a piece of unused stack space.
void callee(int a, int b, int c, int d)
{
size_localvars_of_caller = __builtin_frame_address(1) - sizeof d - (char *)&d;
}