Is there any underlying reason behind this undefined behaviour?

Is there any underlying reason behind this undefined behaviour? - c

I just tried this as an experiment to see what values are held in d at each call.
I used gcc on an x86-64 machine.
Is there any reason why the old value of d persists after the function returns? From what I understand the call stack frame is popped off once the function returns each time, correct?
#include<stdio.h>
void fun(char ch){
char d;
printf("-- %c\n", d);
d = ch;
}
int main(){
fun('a');
fun('b');
fun('c');
return 0;
}
OUTPUT:
--
-- a
-- b

When you return from a function, the memory that was part of the stack frame of the function typically won't be zero'ed out explicitly, as that would just take up unnecessary cycles. So the memory won't be overwritten until it needs to be.
In your main function you call fun multiple times with no other statements in between. That means that nothing in main touches the memory that was used by a prior invocation of fun before calling it again. As a result, the stack frame of the second call happens to coincide with the stack frame of the first call. And because uninitialized local variables don't get a default value, d takes on whatever value happens to be in that memory location, namely what d contained when the last call to the function returned.
If you added a call to another function such as printf in between each call to fun, then that other function call would write its own stack frame on top of the stack frame of the last call to fun so you would see different results in that case.

Related

Implementing setjmp and longjmp in C without built in functions or assembly (getting incorrect return values)

I'm trying to test 2 of my functions that sort of mimic setjmp and longjmp for a homework - which is pretty difficult since we're not allowed to use built in functions or assembly asm() to implement the longjmp and setjmp functions. (Yes, that's really the assignment.)
Problem: I keep getting wrong return values. So, in short, when main() calls foo() and foo() calls bar(), and bar() calls longjump(), then bar() should not return to foo() but instead setjmp() should return to main with return value of 1 which should print "error" (see main() below).
Instead, my output comes out as:
start foo
start bar
segmentation fault
The segmentation fault, i tried fixing by initializing the pointer *p with malloc, but that didn't seem to do anything. Although, would the segmentation fault, be the reason why im not getting the correct return values?
code:
#include <stdio.h>
#include <stdlib.h>
int setjmp(int v);
int longjmp(int v);
int foo(void);
int bar(void);
int *add;
int main(void) {
int r;
r = setjmp(r);
if(r == 0) {
foo();
return(0);
} else {
printf("error\n");
return(2);
}
}
int _main(void) {
return(0);
}
int setjmp(int v)
{
add = &v;
return(0);
}
int longjmp(int v)
{
int *p;
p = &v;
*(p - 1) = *add;
return(1);
}
int foo(void) {
printf("start foo\n");
bar();
return(0);
}
int bar(void) {
int d;
printf("start bar\n");
longjmp(d);
return(0);
}

Implementing setjmp() and longjmp() requires access to the stack pointer. Unfortunately, the assignment you're working from has explicitly banned you from using every sensible method to do this (i.e, using assembly, or using compiler builtins to access the stack pointer).
What's worse is, they've mangled the definition of setjmp() and longjmp() in their sample code. The argument needs to be a type that resolves to an array (e.g, typedef int jmp_buf[1]), not an int…
Anyways. You need some way to reliably find the old stack pointer from a stack frame in C. Probably the best way of doing this will be to define an array on the stack, then look "behind" it…
void get_sp(void) {
int x[1];
sp = x[-1]; // or -2 or -3, etc…
The exact offset will depend on what compiler you're using, as well as possibly on what arguments your function takes and what other local variables the function has. You will need to experiment a bit to get this right. Run your application in the simulator, and/or look at the generated assembly, to make sure you're picking up the right value.
The same trick will probably work to set the stack pointer when "returning" from longjmp(). However, certain compiler optimizations may make this difficult, especially on architectures with a link register -- such as MIPS. Make sure compiler optimizations are disabled. If all else fails, you may need to call a dummy function in longjmp() to force the compiler to save the link register on the stack, rather than leaving it in a register (where it can't be overwritten).

You are going to need to deal with the link register, the stack pointer, and the frame pointer (you would normally also have to save and restore all of the save registers, but I don't think we need to in order to make this example work).
Take a look at the arg3caller function here. Upon entry, it stores the link register and the frame pointer on the stack, and sets the frame pointer to point to the new stack frame. It then calls args3, sets the return value, and, most importantly, copies the frame pointer back into the stack pointer. It then pops the link register and the original frame pointer from where the stack pointer is now located, and jumps to the link register. If you look at args3, it saves the frame pointer into the stack and then restores it from the stack.
So, arg3caller can be longjmp, but if you want it to return with a different stack pointer than it entered with, you are going to have to change the frame pointer, because the frame pointer gets copied into the stack pointer at then end. The frame pointer can be modified by having args3 (a dummy function called by longjmp) modify the copy of the frame pointer that it saved in the stack.
You will need to make setjmp also call a dummy function in order to get the link register and frame pointer stored on the stack in the same way. You can then copy the link register and frame pointer out of setjmp's stack frame into globals (normally, setjmp would copy stuff into the provided jmpbuf, but here, the arguments to setjmp and longjmp are useless, so you have to use globals), as well as the address of the frame. Then, longjmp must copy the saved link register and frame pointer back into the same address, and have the dummy leaf function change the saved frame pointer to that same address. Thus, the dummy leaf function will copy that address into the frame pointer and return to longjmp, which will copy it into the stack pointer. It will then restore the frame pointer and the link register from that stack frame (that you populated), thus returning with everything in the state that it was when setjmp originally returned (except the return value will be different).
Note that you can access these fields by using the negative indexing of a local array trick described by #duskwuff. You should initially compile with the -S flag so that you can see what asm gcc is generating so that you can see where the important registers are being saved in the stack (and how your code might perturb all of that).
Edit:
I don't have immediate access to a MIPS gcc, but I found this, and put it in MIPS gcc 5.4 mode. Playing around, I found that non-leaf functions store lr and fp immediately below where the argument would be placed on the stack (the argument is actually passed in a0, but gcc leaves room for it on the stack in case the callee needs to store it). By having setjmp call a leaf function, we can ensure that setjmp is a non-leaf so that its lr is saved on the stack. We can then save the address of the arg, and the lr and fp that are stored immediately below it (using negative indexing), and return 0. Then, in longjmp, we can call a leaf function to ensure the lr is saved on the stack, but also have the leaf change its stacked fp to the saved sp. Upon return to longjmp, the fp will be pointing at the original frame, which we can re-populate with the saved lr and fp. Returning from longjmp will copy the fp back into the sp and restore the lr and fp from our re-populated frame, making it appear that we are returning from setjmp. This time however, we return 1 so the caller can differentiate the true return from setjmp to the fake one engineered by longjmp.
Note that I have only eyeballed this code, and have not actually executed it!! Also, it must be compiled with optimization disabled (-O0). If you enable any kind of optimization, the compiler inlines the leaf functions and turns both setjmp and longjmp into empty functions. You should see what your compiler does with this to understand how the stack frames are constructed. Again, we're well and truly in the land of undefined behavior, and even changes in gcc version could upset everything. You should also single step the program (using gdb or spim) to make sure you understand what's going on.
struct jmpbuf {
int lr;
int fp;
int *sp;
};
static struct jmpbuf ctx;
static void setjmp_leaf(void) { }
int setjmp(int arg)
{
// call the leaf so that our lr is saved
setjmp_leaf();
// the address of our arg should be immediately
// above the lr and fp
ctx.sp = &arg;
// lr is immediately below arg
ctx.lr = (&arg)[-1];
// fp is below that
ctx.fp = (&arg)[-2];
return 0;
}
static void longjmp_leaf(int arg)
{
// overwrite the caller's frame pointer
(&arg)[-1] = (int)ctx.sp;
}
int longjmp(int arg)
{
// call the leaf so that our lr is saved
// but also to change our fp to the save sp
longjmp_leaf(arg);
// repopulate the new stack frame with the saved
// lr and fp. &arg is calculated relative to fp,
// which was modified by longjmp_leaf. &arg isn't
// where it used to be!
(&arg)[-1] = ctx.lr;
(&arg)[-2] = ctx.fp;
// this should restore the saved fp and lr
// from the new frame, so it looks like we're
// returning from setjmp
return 1;
}
Good luck!

What does fflush() do in terms of dangling pointers?

I came across this page that illustrates common ways in which dangling pointes are created.
The code below is used to illustrate dangling pointers by returning address of a local variable:
// The pointer pointing to local variable becomes
// dangling when local variable is static.
#include<stdio.h>
int *fun()
{
// x is local variable and goes out of scope
// after an execution of fun() is over.
int x = 5;
return &x;
}
// Driver Code
int main()
{
int *p = fun();
fflush(stdout);
// p points to something which is not valid anymore
printf("%d", *p);
return 0;
}
On running this, this is the compiler warning I get (as expected):
In function 'fun':
12:2: warning: function returns address of local variable [-Wreturn-local-addr]
return &x;
^
And this is the output I get (good so far):
32743
However, when I comment out the fflush(stdout) line, this is the output I get (with the same compiler warning):
5
What is the reason for this behaviour? How exactly is the presence/absence of the fflush command causing this behaviour change?

Returning a pointer to an object on the stack is bad, as you've mentioned. The reason you only see a problem with your fflush() call in place is that the stack is unmodified if it's not there. That is, the 5 is still in place, so the pointer dereference still gives that 5 to you. If you call a function (almost any function, probably) in between fun and printf, it will almost certainly overwrite that stack location, making the later dereference return whatever junk that function happened to leave there.

This is because calling fflush(stdout) writes onto the stack where x was.
Let me explain. The stack in assembly language (which is what all programming languages eventually run as in one way or another) is commonly used to store local variables, return addresses, and function parameters. When a function is called, it pushes these things onto the stack:
the address of where to continue executing code once the function completes.
the parameters to the function, in an order determined by the calling convention used.
the local variables that the function uses.
These things are then popped off of the stack, one by one, simply by changing where the CPU thinks the top of the stack is. This means the data still exists, but it's not guaranteed to continue to exist.
Calling another function after fun() overwrites the previous values above the top of the stack, in this case with the value of stdout, and so the pointer's referenced value changes.
Without calling another function, the data stays there and is still valid when the pointer is dereferenced.

(nil) value returned by int pointer in c

when we declare a pointer it points to some random location or address in memory unless we explicitly assign a particular value(address of any variable) to it.
Here is code:
int *p;
printf("int is %p\n",p);
float *j;
printf("float is %p\n",j);
double *dp;
printf("double is %p\n",dp);
char *ch ;
printf("char is %p\n",ch);
j=(float *)p;
printf("cast int to float %p\n",j);
output:
int is (nil)
float is 0x400460
double is 0x7fff9f0f1a20
char is (nil)
cast int to float (nil)
Rather than printing the random location it prints (nil)
what is (nil) here ?? I don't understand the behaviour of pointers here??

Uninitialized pointer variables don't point to random address. Where they point is undefined.
In practice, they have the value what's left in the stack, which you can't know for sure. In your example, those several pointers happens to have 0 values, so they happen to be null pointer, and they print as nil.
Don't rely on such behavior, ever.

If by the "gnu" tag you mean that you're using glibc, then the reason is that the printf implementation in glibc will print "(nil)" when encountering a NULL pointer.
In other words, several of your pointers happen to have the value NULL (0), because that's what happened to be on the stack at that particular location.

Elaborating on what #yu hao has said:
Whenever a subroutine is invoked, a stack frame is allocated to the subroutine. This frame exists until return statement is encountered.
A subroutine frequently needs memory space for storing the values of local variables, the variables that are known only within the active subroutine and do not retain values after it returns. For doing so , the compiler allocate space for this use by simply moving the top of the stack by enough to provide the space. This is very fast when compared to dynamic memory allocation, which uses the heap space. Note that each separate activation of a subroutine gets its own separate space in the stack for locals known as Stack Frames.
The main reason for doing this is to keep track of the point to which each active subroutine should return control when it finishes executing. An active subroutine is one that has been called but is yet to complete execution after which control should be handed back to the point of call. Such activations of subroutines may be nested to any level (recursive as a special case), hence the stack structure. If, for example, a subroutine DrawSquare calls a subroutine DrawLine from four different places, DrawLine must know where to return when its execution completes. To accomplish this, the address following the call instruction, the return address, is also pushed onto the call stack with each call.
Coming back to your question, during its execution , the function can make changes to its stack frame and When a function 'returns', its frame is 'popped' from stack.
But the contents of stack remains unchanged in this process. Only the stack pointer gets modified to point to previous frame. So when a new subroutine gets called, new frame is allocated on top of previous one,and if the subroutine has uninitialized variables, they will print value that is stored in the memory allocated to them . Which will depend on the state of stack at that point of time.

All of your pointers are nil/ undefiend or the are just random values from the stack!
See: http://ideone.com/wwiy8F

why do i get the same value of the address?

#include<stdio.h>
#include<conio.h>
void vaibhav()
{
int a;
printf("%u\n",&a);
}
int main()
{
vaibhav();
vaibhav();
vaibhav();
getch();
return 0;
}
Every time I get the same value for the address of variable a.
Is this compiler dependent? I am using dev c++ ide.

Try to call it from within different stack depths, and you'll get different addresses:
void func_which_calls_vaibhav()
{
vaibhav();
}
int main()
{
vaibhav();
func_which_calls_vaibhav();
return 0;
}
The address of a local variable in a function depends on the state of the stack (the value of the SP register) at the point in execution when the function is called.
In your example, the state of the stack is simply identical each time function vaibhav is called.

It is not necessary. You may or may not get the same value of the address. And use %p instead.
printf("%p\n", (void *)&a);

You should use %p format specifier to print address of a variable. %u & %d are used to display integer values. And the address on stack can be same or not every time you call function vaibhav().

"a" in the function vaibhav() is an automatic variable, which means it's auto-created in the stack once this fun is called, and auto-released(invalid to the process) once the fun is returned.
When funA(here is main) calls another funB(here is vaibhav), a stack frame of funB will be allocated for funB. When funB returns, stack frame for funB is released.
In this case, the stack for funB(vaibhav) is called 3 times sequentially, exactly one by one. In each call, the stack frame for funB is allocated and released. Then re-allocated and recycled for times.
The same memory block in stack memory is re-used for 3 times. In that block, the exact memory for "a" is also re-used for 3 times. Thus, the same memory and you get the same address of it.
This is definitely compiler dependent. It depends on the compiler implementation. But I believe almost every C compiler will produce the same result, though I bet there is no specific requirement in C standard to define the expected output for this case.

Explain the output

#include<stdio.h>
int * fun(int a1,int b)
{
int a[2];
a[0]=a1;
a[1]=b;
return a;
}
int main()
{
int *r=fun(3,5);
printf("%d\n",*r);
printf("%d\n",*r);
}
Output after running the code:
3
-1073855580
I understand that a[2] is local to fun() but why value is getting changed of same pointer?

The variable a is indeed local to fun. When you return from that function, the stack is popped. The memory itself remains unchanged (for the moment). When you dereference r the first time, the memory is what you'd expect it to be. And since the dereference happens before the call to printf, nothing bad happens. When printf executes, it modifies the stack and the value is wiped out. The second time through you're seeing whatever value happened to be put there by printf the first time through.
The order of events for a "normal" calling convention (I know, I know -- no such thing):
Dereference r (the first time through, this is what it should be)
Push value onto stack (notice this is making a copy of the value) (may wipe out a)
Push other parameters on to stack (order is usually right to left, IIRC) (may wipe out a)
Allocate room for return value on stack (may wipe out a)
Call printf
Push local printf variables onto stack (may wipe out a)
Do your thang
Return from function
If you change int a[2]; to static int a[2]; this will alleviate the problem.

Because r points to a location on the stack that is likely to be overwritten by a function call.
In this case, it's the first call to printf itself which is changing that location.
In detail, the return from fun has that particular location being preserved simply because nothing has overwritten it yet.
The *r is then evaluated (as 3) and passed to printf to be printed. The actual call to printf changes the contents of that location (since it uses the memory for its own stack frame), but the value has already been extracted at that point so it's safe.
On the subsequent call, *r has the different value, changed by the first call. That's why it's different in this case.
Of course, this is just the likely explanation. In reality, anything could be happening since what you've coded up there is undefined behaviour. Once you do that, all bets are off.

As you've mentioned, a[2] is local to fun(); meaning it is created on the stack right before the code within fun() starts executing. When fun exits the stack is popped, meaning it is unwound so that the stack pointer is pointing to where it was before fun started executing.
The compiler is now free to stick whatever it wants into those locations that were unwound. So, it is possible that the first location of a was skipped for a variety of reasons. Maybe it now represents an uninitialized variable. Maybe it was for memory alignment of another variable. Simple answer is, by returning a pointer to a local variable from a function, and then de-referencing that pointer, you're invoking undefined behavior and anything can happen, including demons flying out of your nose.

When you compile you code with the following command:
$ gcc -Wall yourProgram.c
It will yield a warning, which says.
In function ‘fun’:
warning: function returns address of local variable
When r is dereferenced in first printf statement, it's okay as the memory is preserved. However, the second printf statement overwrites the stack and so we get an undesired result.

Because printf is using the stack location and changes it after printing the first value.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Is there any underlying reason behind this undefined behaviour? - c

Related

Implementing setjmp and longjmp in C without built in functions or assembly (getting incorrect return values)

What does fflush() do in terms of dangling pointers?

(nil) value returned by int pointer in c

why do i get the same value of the address?

Explain the output

Categories

Resources