Examining local variables returned function - c

I have a coredump of a process that has crashed (hard to reproduce).
I have figured out that something goes wrong in a function that has just returned (it returned a NULL pointer rather than a non-NULL pointer).
It would be of great help for me to know the contents of the stack variables in that function. I think on most architectures, returning from a function just means changing the stack pointer. In other words, those values are still there (below the stack pointer then if we take x86 as an example).
Can anyone confirm my reasoning is correct and maybe provide an example how do this with gdb?
Does my reasoning also hold for MIPS ?

Local variables might have been stored on stack, but not necessarily. If there is only a small number of variables that fit into registers and code is optimized, then local variables were never saved on stack.
Depending on calling convention used, final values of local variables may still persist in registers.
Disassemble the function in question (you can use objdump -dS to do this, so you can easily correlate source). See how local variables were accessed . Were they stored in memory or registers? Were registers already restored to their value relevant for caller?
If original register value was not restored, you can just examine the register that was used to store local. If it was already restored, then it's probably lost.
If local values were stored to stack, then function prologue (first instructions) should tell you how stack and frame pointer were manipulated. Taking into account that call also saved to stack (PC saved) you can calculate the value of stack/frame pointer used in that function. Then use x to examine memory locations.
Depending on called function, you could also be able to examine its arguments (when called) and recalculate the value of local variables.

You may see local variable that hasn't be optimised using:
info locals
It may not work in a function that already return, though. If you can run that program again, try to put a breakpoint just before the function return.
Otherwise, you can manually investigate the stack using x/x and info register to know the stack pointer address.
You may then browse the stack using up and down.

Related

Does a C program push a local variable onto the stack by itself or does it have to call the OS to do that?

I mean does the program use the hardware directly to push a local variable onto a stack or does it has to call a OS library to do that?
BTW for linux, In the code below, when compiling, is '6' stored in any ELF section similiar to .data/.text section? And when 'func' is invoked, is CPU fetching '6' from any section and push it onto the stack?
func(){
int a = 6;
}
Stacks are unique to each process running on a CPU, and the OS does not manage them. A program is free to use its stack however it likes.
So the answer to your first question is: no, the program pushes to the stack without OS intervention. In fact, pushing and popping the stack is such a common operation that many CPU architectures have dedicated instructions to do just that. You would not want to have to make a call into the OS every time you wanted to use the stack, that would add a mind-boggling amount of overhead to execution.
As to the second part of your question: All local variables are stored on the stack, not in an ELF section or other global memory. In fact, the entire purpose of the stack is to store local variables. When a local variable, one that is on the stack, is used, its value is fetched from memory just like any other. The stack is not some special memory or anything, it's just a piece of normal RAM that a program happens to be using as the stack. Now, that being said, the memory used for the stack itself is assigned to the program by the OS. But the memory itself? Nothing special about it.
You should take a look at this answer for further reading:
How does the stack work in assembly language?

Are a functions local variables always stored in the same set of memory locations every time it executes?

I'm assuming no but not positive. Not sure If other variables can take up the same spot in the stack.
No. A function's local variables are not always at the same address.
Consider a recursive function. If the local variables were supposed to be in the same place, all their values would have to be copied in and out each time you went in and out of recursion.
The normal way of doing it is that each function call has a "block" on the stack. If you call the same function twice in a row the local variable addresses will probably be the same. If you call it recursively the second call will be in a different area of stack and so the local variable addresses will be different.
The compiler will generate code to assign memory addresses based on a "stack pointer" address + offset. So, the actual physical address for each local will vary on each invocation of the function. The offset may well be the same each time because the compiler code gen logic will be the same. The stack pointer address is likely to be different based on what else gets executed before the next invocation of the function.

How do called functions return to their caller, after being called?

I read that when a function call is made by a program, the called function must know how to return to its caller.
My question is: How does the called function know how to return to its caller? Is there a mechanism working behind the scenes through the compiler?
The compiler obeys a particular "calling convention", defined as part of the ABI you're targeting. That calling convention will include a way for the system to know what address to return to. The calling convention usually takes advantage of the hardware's support for procedure calls. On Intel, for example, the return address is pushed to the stack:
...the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer).
Returning from a function is done via the ret instruction:
... the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer.
To contrast, on ARM, the return address is put in the link register:
The BL and BLX instructions copy the address of the next instruction into lr (r14, the link register).
Returns are commonly done by executing movs pc, lr to copy the address from the link register back into the program counter register.
References:
Intel Software Developers Manual
ARM Information Center
The compiler knows how to call a function and which calling convention is used. For example in C the arguments for a function are pushed on the stack. The caller is repsonsible for clearing the stack, so the called function doesn't have to remove the arguments. Other calling conventions can include pushing the arguments on the stack and the called function has to clean it. In this case, the generated code is such, that the function corrects the stack before it can return. Ohter calling conventions may pass the arguments in registers, so in such a case the called function also doesn't have to take care.
The CPU has a mechanism to call a subroutine. This will store the current execution address on the stack and then transfer processing to the new address. When the function is done it executes a return statement, which will fetch the caller address and resume execution there.
If the return address is destroyed, because the stack is not properly cleaned uo, or the memory is overwritten, then you get undefined behaviour. Of course the exact implementation details vary depending on the platform which is used.
This is made possible by the stack(especially on Intel-like systems). Let's say that we have a method caller that includes, say, an int that it keeps locally.
When caller( calls target( that int must be saved. It is placed on the stack, along with the address from which the call is made. target( can perform its logic, create its own local variables, and call other methods. Its local variables will be placed onto the stack along with the call's address.
When target( ends, the stack is "unrolled". The top of the stack containing target('s local variables is removed.
When methods recurse too far, the stack may grow too large, and a "stack overflow" may occur.
It requires cooperation between the callee and the caller.
The caller agrees to give the address that the callee should return to to the callee (usually by pushing it on the stack, or by passing it in a register), and the callee agrees to return to that address when it's finished executing.

Function calls, the stack

So I don't know why but I learned that when you call a function and pass an argument to it, it deals with it on the stack(processor?).
Can someone please explain it?
then how does it change values of variables, blocks of memory and so on?
There is no guarantee that parameters are passed on the stack, it's architecture and compiler dependent.
As to how values and memory get changed -- when you call a function that must make changes that are seen by the caller, it's normal that what you provide is not the actual value, but rather the address of (pointer to) that value. As long as the function knows the proper memory location it can make these changes.
Stack is used in most cases to pass arguments to function. The reason for using it is that you are not bound to fixed memory places (for arguments) to have your function functional. If you had function that could take arguments from fixed memory you would probably only be able to run it if the memory was free and you would be able to run just one instance of it. Stack gives you the possibility to store your arguments to current context of your program at any time. On x86 processors there is register that points to end of the stack and other register that points to the begining. Those are actualy just addresses to main memory where you want your stack to reside.
There is PUSH instruction that moves the stack-end register to the next place and stores specified data (could be value from other register or at some address or direct value) to address pointed by stack-end resgister. The other instruction is POP and it works the same just the other way around. This way, if you stick to the plan and keep track of what you pushed to stack, you can have your functions work from any context.
There are some other less used options to pass arguments like via registers, which are used for example by bios interrupts. If you want to know more about this I suggest you read something on "Calling conventions".
Lets start with this suppose you have a function
int foo(int value) {
int a = 10;
return a;
}
So whenever a function call is made OS needs some memory space to allocate the local variables of the function int a in this case and arguments to the function passed int value in this case. This memory requirement is fulfilled by allocating memory on stack. A stack is nothing but a memory region allocated to each process and it actually behaves as a stack data structure(LIFO).
Now the question arises what all things are stored on stack when a function call is made. The first thing pushed on the stack are the arguments passed to the function in reverse order(if more then one).
2. Then the return address of the function which called this function (because once this function foo completes execution it should return back to the place in the code from where it was called)
3. Finally local variables of the function called are pushed on the stack.
Once the called function completes executing the code it returns back to the return address previously stored on the stack and thus we say function call completes or returns.
In this case the function has a return value which it passes back to the callee function.
The space is then free to use and can be overwritten in the subsequent function calls.
(Now if you connect the dotes you can realize why local variables(automatic variables) in a function have scope limited to the life of the function call (you asked a SO question related to scope which was closed) because once a call returns the memory space allocated for these locale variable is gone(it is still there but you cant access them once a function returns) so life of these automatic variable int a in this case limits till foo() returns to the callee function.
Side Note:: I have read many questions that you have posted in SO. I guess you are trying to learn C and basic working of the underlying hardware and OS in general and the confusion in between them is killing you.
I would suggest you some pointers apart from the answer to this question to read and understand which will give you lots of insight into the questions you are facing.
For C refer K&R it is the best book.
In the starting read little bit about OS concepts(Memory handling, Virtual Memory in particular)
Try imagine the working of a system in broad sense as in how different components are interacting.
Some good links for understanding memory related stuff and system internals http://duartes.org/gustavo/blog/best-of
and if you want to dive into stack space for a function call try this link http://www.binarypirates.in/2011/02/17/understanding-function-stack-in-c/
Hope this helps

In C if a variable is not assigned a value then why does it take garbage value?

Why do the variables take garbage values?
I guess the rationale for this is that your program will be faster.
If compiler automatically reset (ie: initialize to 0 or to NaN for float/doubles etc etc) your variables, it would take some time doing that (it'd have to write to memory).
In many cases initializing variables could be unneeded: maybe you will never access your variable, or will write on it the first time you access it.
Today this optimization is arguable: the overhead due to initializing variables is maybe not worth the problems caused by variables uninitialized by mistake, but when C has been defined things were different.
Unassigned variables has so-called indeterminate state that can be implemented in whatever way, usually by just keeping unchanged whatever data was in memory now occupied by the variable.
It just takes whatever is in memory at the address the variable is pointing to.
When you allocate a variable you are allocating some memory. if you dont overwrite it, memory will contain whatever "random" information was there before and that is called garbage value.
Why would it not? A better question might be "Can you explain how it comes about that a member variable in C# which is not initialised has a known default value?"
When variable is declared in C, it involves only assigning memory to variable and no implicit assignment. Thus when you get value from it, it has what is stored in memory cast to your variable datatype. That value we call as garbage value. It remains so, because C language implementations have memory management which does not handle this issue.
This happens with local variables and memory allocated from the heap with malloc(). Local variables are the more typical mishap. They are stored in the stack frame of the function. Which is created simply by adjusting the stack pointer by the amount of storage required for the local variables.
The values those variables will have upon entry of the function is essentially random, whatever happened to be stored in those memory locations from a previous function call that happened to use the same stack area.
It is a nasty source of hard to diagnose bugs. Not in the least because the values aren't really random. As long as the program has predictable call patterns, it is likely that the initial value repeats well. A compiler often has a debug feature that lets it inject code in the preamble of the function that initializes all local variables. A value that's likely to produce bizarre calculation results or a protected mode access violation.
Notable perhaps as well is that managed environments initialize local variables automatically. That isn't done to help the programmer fall into the pit of success, it's done because not initializing them is a security hazard. It lets code that runs in a sandbox access memory that was written by privileged code.

Resources