Here is a simple recursive C program.
void func(void)
{
func();
}
int main()
{
func();
return 0;
}
Does this program use the stack on every call of func()?
If yes, what does it store on the stack?
It's not a terribly big program; I'd suggest compiling it, running it, and checking whether you get a stack overflow.
I just tested it:
with /Od it resulted in a stack overflow
with /O2 it also resulted in a stack overflow, returning 3221225725 (0xC00000FD, STATUS_STACK_OVERFLOW)
Tested on Compiler Explorer, MSVC v19 (latest).
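If you want to see roughly how deep the recursion gets before the stack runs out, here is a quick sketch of my own (compile without optimisation, e.g. /Od or -O0, so the tail call is not turned into a loop):
#include <stdio.h>

static unsigned long depth;

void func(void)
{
    ++depth;                               /* each level consumes at least a return address */
    if (depth % 100000 == 0)
        fprintf(stderr, "%lu\n", depth);   /* the last value printed ~ maximum depth */
    func();
}

int main(void)
{
    func();                                /* eventually crashes with a stack overflow */
    return 0;
}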
-- what does it store on the stack?
The instruction pointer / program counter.
How else would the program know where to return to?
-- whether optimized or not, the only change was the size of the function
It won't optimize the function away. Or at least MSVC 19.latest won't.
If you find a compiler that does, that would be great for those virtual function calls to nothing, which I believe is one of the critiques against them.
The C standard does not say anything about a stack, so strictly speaking this question has no answer.
This function calls itself at the end of its execution. That is called tail recursion (a very specific case), and most modern optimising compilers will optimise it into an infinite loop (assuming optimisations are enabled).
Most implementations do use a stack. Recursive functions (except tail-recursive ones) will create a new stack frame on every call.
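As a rough illustration (mine, not from the answer), a tail-call-optimising compiler may effectively rewrite the func() from the question like this, so no new stack frame is created per iteration:
void func_as_loop(void)
{
    for (;;) {
        /* the self-call at the end of func() becomes a jump back to the top,
           so the same stack frame is reused forever */
    }
}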
Related
Does the function below need any stack for execution?
int a;
void func(void)
{
a = 10;
}
As long as a C compiler can see the definition of func, it can¹ implement func without using any stack space. For example, where it sees a call to func, it can implement that by emitting an instruction or two to move 10 into a. That would achieve the same result as calling func as a subroutine, so the C rules permit a C implementation to implement a call to func in that way, and it does not use any stack space.
Generally, if the compiler could not see the definition of func, as when compiling another source file that calls func but does not define it, the compiler would have to issue a call instruction or something similar, and that would, at the least, push the return address onto the stack.
Additionally, if the routine being called were more complicated, the compiler might choose not to implement it inline or might not be able to do so. (For example, if func contained calls to itself, it is generally not possible for the compiler to implement it with inline code in all situations; the compiler will need to implement it with actual subroutine call instructions, which do use stack space.)
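A minimal sketch of the idea (mine, with hypothetical caller names): when the definition of func is visible, the compiler may generate the same code for both callers below, neither of which needs any stack:
int a;

void func(void)
{
    a = 10;
}

void caller(void)             /* written as a call to func()...               */
{
    func();
}

void caller_inlined(void)     /* ...but may compile to the same code as this  */
{
    a = 10;
}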
Footnote
¹ Whether any particular compiler will implement func without using stack space is another matter, dependent on the compiler, the switches used to compile, and other factors.
After reading the following question, I understand that no such thing exists (at least not portably).
However, I am staring at the following piece of code from the mono code base, which returns a pointer into the stack:
static void *
return_stack_ptr ()
{
gpointer i;
return &i;
}
I am surprised that the above code can even work on an arch such as PowerPC; I would have assumed this would only work on x86 (and maybe only with gcc).
Is this going to work on PowerPC?
The purpose of the stack is to support function calls and local variables. If your system has a stack, it's going to use it and allocate the local variable there. So it's very reasonable to assume that the address of the local variable points somewhere into the stack. This is not specific to x86 or gcc - it's a fairly general idea.
However, using a pointer to a variable that doesn't exist (i.e. after it goes out of scope) is Undefined Behavior. So this function cannot be guaranteed to do anything meaningful. In fact, a "clever" compiler could detect that your program uses undefined behavior, and replace your code by a no-op (and call it a "performance optimization").
Alternatively, a "wise" compiler could recognize that your function returns a pointer to the stack, and inline it by using a hardware stack pointer instead.
Neither option is guaranteed - this code is not portable.
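For what it's worth, here is a minimal standalone sketch of the same trick (mine, hypothetical names; still formally undefined behaviour in ISO C, and most compilers will warn about it). The point is only to obtain an approximate stack address, never to dereference it later:
#include <stdio.h>

static void *approx_stack_ptr(void)
{
    char local;
    return (void *)&local;    /* the address of a local ~ the current stack area */
}

int main(void)
{
    printf("stack is roughly at %p\n", approx_stack_ptr());
    return 0;
}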
I've been fiddling around with LLVM and wrote a simple compiler. It uses libc as its standard library. Naturally I have to declare the functions in my IR somehow.
I noticed that the following seems to work:
declare void @puts(i8*)
In C the function is defined like this:
int puts(const char *s);
so it should really be
declare i32 @puts(i8*)
This is a really simple case but I am sure that somewhere along the road I will make mistakes declaring these functions. For instance I was not aware that puts returned an int before I read the manpage.
How grave are these mistakes? Does it mess with the stack or does LLVM handle it somehow? What are the security implications of such mistakes?
Note: I was not able to produce any errors with the void declaration of puts.
The answer to this depends on the calling convention used by your C compiler's ABI. In the conventions used by most C compilers on x86 and x86-64, the return value is passed in a register. Mis-declaring an int-returning function as void will cause the value of the return register to be ignored (which it would be anyway if you're not using it). This doesn't cause any harm because the caller is responsible for saving the eax register anyway.
For example, the following code:
void callee(int, int, int);
void caller(void)
{
callee(1, 2, 3);
}
...will be compiled into the exact same assembly if you declare callee to return int instead of void.
This applies to "small" return types, i.e. those that consist of an integer, a double-precision floating-point, or a 64-bit integer (which x86 returns in two integer registers). Large return types are handled differently - if you change the declaration of callee to something like:
struct { char x[100]; } callee(int, int, int);
...the calling code will change drastically, despite the passed-in types not having changed. The return structure will now be allocated on the caller's stack, and its address will be passed as a hidden first argument to the callee (this is on x86, things are slightly different on x86-64), which is expected to write the return value to that area.
In other words, as long as you understand the calling convention, and you are careful not to mis-declare functions that return large types by value (which AFAIK don't exist in the standard C and POSIX libraries), the erroneous declaration will work.
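To make the difference concrete, here is a small sketch of my own (not from the answer) that simulates the mis-declaration with a wrongly typed function pointer. Formally this is undefined behaviour, but under the common x86/x86-64 conventions the small-return mismatch is harmless while the big-return one is not:
#include <stdio.h>

struct big { char x[100]; };

int small_ret(void)      { return 42; }
struct big big_ret(void) { struct big b = {0}; return b; }

int main(void)
{
    void (*wrong_small)(void) = (void (*)(void))small_ret;
    wrong_small();            /* value left in the return register is simply ignored */

    /* void (*wrong_big)(void) = (void (*)(void))big_ret;                            */
    /* wrong_big();     likely crash: the hidden return-slot pointer the ABI expects */
    /*                  as a first argument is never passed                          */

    puts("small-return mismatch survived");
    return 0;
}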
Small return values are usually placed in a return-value register, so ignoring those won't fatally crash. For larger values, some ABIs require the caller to allocate stack space and pass its address as an invisible first parameter to the function; in that case your program would probably crash quickly, since you wouldn't be allocating or passing it. If you're using an ABI that doesn't store previous frame pointers (i.e. each function must know how big its own stack frame is) and that allows callees to adjust the stack pointer, this would be fatal as well.
Basically it might work until it doesn't.
Richard
The answers so far are good, but one big implication to consider is ignoring the return values of C functions that, as part of their functionality, allocate memory, open or create files, and so on, and then return some kind of pointer or handle.
Ignoring these will, of course, orphan the memory (which will only be freed when the program exits, if it makes it that far), leave files open, and so on.
Basically, if the function you are calling returns anything but register values or stack instance values, the implications of ignoring it may be significant.
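A tiny illustration of my own (hypothetical file name):
#include <stdio.h>
#include <stdlib.h>

void leaky(void)
{
    fopen("log.txt", "w");    /* FILE * discarded: the file stays open        */
    malloc(1024);             /* pointer discarded: the memory is orphaned    */
                              /* until the program exits                      */
}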
I am building one of my projects and looking at the generated listing file (target: x86-64). My code looks like this:
int func_1(var1,var2){
asm_inline_(
)
func_2(var1,var2);
return_1;
}
void func_2(var_1,var_2){
asm __inline__(
)
func_3();
}
/* Jump to kernel ---> system call stub in assembly. This function is in a .S file */
void func_3(){
}
When I look at the assembly code, I find that a "jmp" instruction is used instead of a "call"/"ret" pair when calling func_2 and func_3. I am sure it is one of the compiler optimizations, and I have not explored how to disable it (GCC).
The moment I add some volatile variables to func_2 and func_3 and increment them, the "jmp" gets replaced by a "call"/"ret" pair.
I am bemused by this behavior, because those variables are useless and don't serve any purpose.
Can someone please explain the behavior?
Thanks
If code jumps to the start of another function rather than calling it, when the jumped-to function returns, it will return back to the point where the outer function was called from, ignoring any more of the first function after that point. Assuming the behaviour is correct (the first function contributed nothing else to the execution after that point anyway), this is an optimisation because it reduces the number of instructions and stack manipulations by one level.
In the given example, the behaviour is correct; there's no local stack to pop and no value to return, so there is no code that needs to run after the call. (return_1, assuming it's not a macro for something, is a pure expression and therefore does nothing no matter its value.) So there's no reason to keep the stack frame around for the future when it has nothing more to contribute to events.
If you add volatile variables to the function bodies, you aren't just adding variables whose flow the compiler can analyse - you're adding slots that you've explicitly told the compiler could be accessed outside the normal control flow it can predict. The volatile qualifier warns the compiler that even though there's no obvious way for the variables to escape, something outside has a way to get their address and write to it at any time. So it can't reduce their lifetime, because it's been told that code outside the function might still try to write to that stack space; and obviously that means the stack frame needs to continue to exist for its entire declared lifespan.
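A minimal sketch of the same effect (mine, hypothetical names) that you can try with gcc -O2: g() is in tail position inside f(), so gcc typically emits "jmp g" rather than "call g" followed by "ret":
void g(void);              /* defined elsewhere */

void f(void)
{
    g();                   /* nothing runs after this, so f's frame can be torn
                              down and the call becomes a plain jump (a sibling call) */
}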
When I disassembled my program, I saw that gcc was using jmp for the second pthread_barrier_wait call when compiled with -O3. Why is that?
What advantage does it get by using jmp instead of call? What tricks is the compiler playing here? I guess it is performing tail call optimization here.
By the way, I am using static linking here.
__attribute__ ((noinline)) void my_pthread_barrier_wait(
volatile int tid, pthread_barrier_t *pbar )
{
pthread_barrier_wait( pbar );
if ( tid == 0 )
{
if ( !rollbacked )
{
take_checkpoint_or_rollback( ++iter == 4 );
}
}
//getcontext( &context[tid] );
SETJMP( tid );
asm("addr2jmp:");
pthread_barrier_wait( pbar );
// My suspicion was right, gcc was performing tail call optimization,
// which was messing up with my SETJMP/LONGJMP implementation, so here I
// put a dummy function to avoid that.
dummy_var = dummy_func();
}
As you don't show an example, I can only guess: the called function has the same return type as the calling one, and this works like
return func2(...)
or has no return type at all (void).
In this case, "we" leave "our" return address on the stack, leaving it to "them" to use it to return to "our" caller.
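In code, the two shapes being described look something like this (sketch of mine, hypothetical names):
int  func2(int x);
void func3(void);

int  f_a(int x) { return func2(x); }   /* returns the callee's result unmodified  */
void f_b(void)  { func3(); }           /* void: nothing left to do after the call */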
Perhaps it was a tail call. GCC has an optimization pass that handles tail (and sibling) calls.
But why should you bother? If the called function is an extern function, then it is public, and GCC should call it following the ABI conventions (which means that it follows the calling convention).
You should not care if the function was called by a jmp.
And it might also be a call to a dynamic library function (i.e. one going through the PLT for dynamic linking).
jmp has less overhead than call: jmp just jumps, while call pushes a return address onto the stack and then jumps.
I'm assuming that this is a tail call, meaning either the current function returns the result of the called function unmodified, or (for a function that returns void) returns immediately after the function call. In either case, it is not necessary to use call.
The call instruction performs two functions. First, it pushes the address of the instruction after the call onto the stack as a return address. Then it jumps to the destination of the call. ret pops the return address off of the stack and jumps to that location.
Since the calling function returns the result of the called function, there is no reason for operation to return to it after the called function returns. Therefore, whenever possible and if the optimization level permits it, GCC will destroy its stack frame before the function call, so that the top of the stack contains the return address for the function that called it, and then simply jump to the called function. The result is that, when the called function returns, it returns directly to the first function instead of the calling function.
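Conversely, a call that is not in tail position cannot be turned into a jmp; here is a small counter-example of mine (hypothetical names):
int compute(int x);          /* defined elsewhere */

int wrapper(int x)
{
    /* the "+ 1" runs after compute() returns, so the compiler must keep
       wrapper's frame alive and use a real call/ret here rather than a jmp */
    return compute(x) + 1;
}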
You will never know, but one of the likely reasons is "cache" (among other reasons such as the already mentioned tail call optimization).
Inlining can make code faster and it can make code slower, because more code means less of it will be in the L1 cache at one time.
A JMP allows the compiler to reuse the same piece of code at little or no cost at all. Modern processors are deeply pipelined, and pipelines go over a JMP without problems (there is no possibility of a misprediction here!).
In the average case, it will cost as little as 1-2 cycles, in the best cases zero cycles, because the CPU would have to wait on a previous instruction to retire anyway. This obviously depends totally on the respective, individual code.
The compiler could in principle even do that with several functions that have common parts.