Why volatile works for setjmp/longjmp - c

After invoking longjmp(), non-volatile-qualified local objects should not be accessed if their values could have changed since the invocation of setjmp(). Their value in this case is considered indeterminate, and accessing them is undefined behavior.
Now my question is why volatile works in this situation? Wouldn't change in that volatile variable still fail the longjmp? For example, how longjmp will work correctly in the example given below? When the code get backs to setjmp after longjmp, wouldn't the value of local_var be 2 instead of 1?
void some_function()
{
volatile int local_var = 1;
setjmp( buf );
local_var = 2;
longjmp( buf, 1 );
}

setjmp and longjmp clobber registers. If a variable is stored in a register, its value gets lost after a longjmp.
Conversely, if it's declared as volatile, then every time it gets written to, it gets stored back to memory, and every time it gets read from, it gets read back from memory every time. This hurts performance, because the compiler has to do more memory accesses instead of using a register, but it makes the variable's usage safe in the face of longjmping.

The crux is in the optimization in this scenario: The optimizer would naturally expect that a call to a function like setjmp() does not change any local variables, and optimize away read accesses to the variable. Example:
int foo;
foo = 5;
if ( setjmp(buf) != 2 ) {
if ( foo != 5 ) { optimize_me(); longjmp(buf, 2); }
foo = 6;
longjmp( buf, 1 );
return 1;
}
return 0;
An optimizer can optimize away the optimize_me line because foo has been written in line 2, does not need to be read in line 4 and can be assumed to be 5. Additionally, the assignment in line 5 can be removed because foo would be never read again if longjmp was a normal C functon. However, setjmp() and longjmp() disturb the code flow in a way the optimizer cannot account for, breaking this scheme. The correct result of this code would be a termination; with the line optimized away, we have an endless loop.

The most common reason for problems in the absence of a 'volatile' qualifier is that compilers will often place local variables into registers. These registers will almost certainly be used for other things between the setjmp and longjmp. The most practical way to ensure that the use of these registers for other purposes won't cause the variables to hold the wrong values after the longjmp is to cache the values of those registers in the jmp_buf. This works, but has the side effect that there is no way for the compiler to update the contents of the jmp_buf to reflect changes made to the variables after the registers are cached.
If that were the only problem, the result of accessing local variables not declared volatile would be indeterminate, but not Undefined Behavior. There's a problem even with memory variables, though, which thiton alludes to: even if a local variable happens to be allocated on the stack, a compiler would be free to overwrite that variable with something else any time it determines that its value is no longer needed. For example, a compiler could identify that some variables are never 'live' when a routine calls other routines, place those variables shallowest in its stack frame, and pop them before calling other routines. In such a scenario, even though the variables existed in memory when setjmp() is called, that memory might have been reused for something else like holding return address. As such, after the longjmp() is performed, the memory would be considered uninitialized.
Adding a 'volatile' qualifier to a variable's definition causes storage to be reserved exclusively for the use of that variable, for as long as it is within scope. No matter what happens between the setjmp and longjmp, provided control has not left the scope where the variable was declared, nothing is allowed to use that location for any other purpose.

Related

Pointer allocation and Memory Location [duplicate]

This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 12 months ago.
In an Microprocessor it is said that the local variables are stored in stack. In my case if func1() is called by main function then the local variable (int a = 12;)will be created in stack. Once the Called Function is executed the and return back to main function the stack memory will be deleted. So the pointer address still holds (*b) the value 12. At stack if this 'a = 12' is deleted then 'b' should be a dangling pointer no?? Can anyone explain this ? If you have detailed explanation on what happens in memory when this code is being executed it would be helpful.
#include <stdio.h>
int* func1(void);
int main()
{
int* b = func1();
printf("%d\n",*b);
}
int* func1(void)
{
int a = 12;
int* b = &a;
return b;
}
The pointer is dangling. The memory may still hold the previous value, but dereferencing the pointer invokes undefined behaviour.
GCC will give you a warning about this, if you pass -Wall option.
From the C standard (6.2.4):
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address,25) and retains its last-stored value
throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes
indeterminate when the object it points to reaches the end of its
lifetime.
There are multiple layers here.
First, is the C programming language. It is a language. You say stuff in it and it has meaning. There are sentences that have meaning, but you can also construct grammatically valid sentences that are gibberish.
The code you posted, grammatically valid, is gibberish. The object a inside func1 stops existing when the function returns. *b tries to access an object that does not exists anymore. It is not defined what should happen when you access an object after its lifetime ended. You can read about undefined behavior.
Memory exists. It's not like it is disintegrated when a function returns. It's not like RAM chips are falling out of your PC. They are still there.
So your compiler will produce some machine instructions. These machine instructions will execute. Depending solely on your compiler decisions (the code is undefined behavior, compiler can do what it wants) the actual code can behavior in any way the compiler decides to. But, most probably, *b will generate machine instructions that will read the memory region where object a resided. That memory region may still hold the value 12 or it may have been overwritten somewhere between func1 returning and calling printf, resulting in reading some other value.
At stack if this 'a = 12' is deleted then 'b' should be a dangling pointer no?
Yes.
what happens in memory when this code
This depends on the actual machine instructions generated by the compiler. Compile your code and inspect the assembly.
Strictly speaking, the behaviour is undefined, but repeat the printf or otherwise inspect (in your debugger for example) *b after the printf. It is likely that it will no longer be 12.
It is not accurate to say that the stack memory is "deleted". Rather the stack pointer is restored to the address it had before the call was invoked. The memory is otherwise untouched but becomes available for use by subsequent calls (or for passing arguments to such calls), so after calling printf it is likely to have been reused and modified.
So yes, the pointer is "dangling" since it points to memory that is no longer "valid" in the sense that it does not belong to the context in which the pointer exists.

C - Why variables created in a loop have the same memory address?

Just a simple example of my problem:
while(condition){
int number = 0;
printf("%p", &number);
}
That variable will always be in the same memory address. Why?
And what's the real difference between declaring it inside or outside the loop then?
Would I need to malloc the variable every iteration to get different addresses?
That variable will always be in the same memory address. Why?
It's not required to, but your code is so simple that it probably will be across all platforms. Specifically, because it's stored on the stack, it's always in the same place relative to your stack pointer. Keep in mind you're not allocating memory here (no new or malloc), you're just naming existing (stack-relative) memory.
And what's the real difference between declaring it inside or outside the loop then?
In this case, scope. The variable doesn't exist outside the braces where it lives in. Also outside of the braces, another variable can take its place if it fits in memory and the compiler chooses to do this.
Would I need to malloc the variable every iteration to get different addresses?
Yes, but I have yet to see a good use of malloc to allocate space for an int that a simple stack variable or a standard collection wouldn't do better.
That variable will always be in the same memory address. Why?
The compiler decides where the variable should be, given the operating system constraints, it's much more efficient to maintain the variable at the same address than having it relocated at every iteration, but this could, theoretically, happen.
You can't rely on it being in the same address every time.
And what's the real difference between declaring it inside or outside the loop then?
The difference is lifetime of the variable, if declared within the loop it will only exist inside the loop, you can't access it after the loop ends.
When execution of the block ends the lifetime of the object ends and it can no longer be accessed.
Would I need to malloc the variable every iteration to get different addresses?
malloc is an expensive operation, it does not make much sense to malloc the variable at every iteration, that said, again, the compiler decides where the memory for it is allocated, it may very well be at the same address or not.
Once again you can't rely on the variable location in the previous iteration to assert where it will be on the next one.
There is a difference in the the variables are stored, allocated variables will be on the heap, as opposed to the stack like in the previous case.
It is being put into the same memory address to save memory.
The only real difference between declaring it within and without the loop is that the variable will no longer be within scope outside the loop if it was declared within the loop.
You would have to use malloc to get a different address each time. Also, you would have to leave the frees until after all the mallocs to get this guarantee.
That variable will always be in the same memory address. Why?
The object that number designates has auto storage duration and only exists for the lifetime of the loop body, so logically speaking a new instance is created and destroyed on each loop iteration.
Practically speaking, it's easier to just re-use the same memory location for each loop iteration, which is what most (if not all) C compilers do. It's just not guaranteed to retain its last value from one iteration to the next (especially if you initialize it each iteration).
And what's the real difference between declaring it inside or outside the loop then?
The lifetime of the object (the period of program execution where storage is guaranteed to be reserved for it) changes from the body of the loop to the body of the function. The scope of the identifier (the region of program text where the identifier is visible) changes from the body of the loop to the body of the entire function.
Again, practically speaking, most compilers will allocate stack space for auto objects that are in blocks at function entry - for example, given the code
void foo( void )
{
int bar;
while ( bar = 0; bar < 10; bar++ )
{
int bletch = 2 * bar;
...
}
}
most compilers will generate instructions to reserve stack space for both bar and bletch at function entry, rather than waiting until loop entry to reserve space for bletch. It's just easier to set the stack pointer once and get it over with. Storage is guaranteed to be reserved for bletch over the lifetime of the loop body, but there's nothing in the language definition that says you can't reserve it before then.
However, if you have a situation like this:
void foo( void )
{
int bar;
while ( bar = 0; bar < 10; bar++ )
{
if ( bar % 2 == 0 ) // bar is even
{
int bletch = 2 * bar;
...
}
else
{
int blurga = 3 * bar + 1;
...
}
}
bletch and blurga cannot exist at the same time, so the compiler may only allocate space for one additional int object, and that same space will be used for either bletch or blurga depending on the value of bar.
There are compilers that, despite you declaring the variable in the inner loop, just allocate them at the entry to the function block.
Modern compilers tend to allocate all memory for local variables in a single shot at function entry, so that only represents a single stack pointer move, against several push pop instructions to get the same result.
Despite of that, there's another issue you have not considered. The variable in the inner loop is not visible outside the loop, and the memory used by it can be used for a different variable outside. You know that the memory address is always the same... but you don't know when you are out of scope if any of the other variables you use for a different thing are given the same address by the compiler (that's perfectly legal, as your variable is automatic, and so, it ceases to exist as soon as you get out of the block (the pair of curly brackets you put around the loop)

What does fflush() do in terms of dangling pointers?

I came across this page that illustrates common ways in which dangling pointes are created.
The code below is used to illustrate dangling pointers by returning address of a local variable:
// The pointer pointing to local variable becomes
// dangling when local variable is static.
#include<stdio.h>
int *fun()
{
// x is local variable and goes out of scope
// after an execution of fun() is over.
int x = 5;
return &x;
}
// Driver Code
int main()
{
int *p = fun();
fflush(stdout);
// p points to something which is not valid anymore
printf("%d", *p);
return 0;
}
On running this, this is the compiler warning I get (as expected):
In function 'fun':
12:2: warning: function returns address of local variable [-Wreturn-local-addr]
return &x;
^
And this is the output I get (good so far):
32743
However, when I comment out the fflush(stdout) line, this is the output I get (with the same compiler warning):
5
What is the reason for this behaviour? How exactly is the presence/absence of the fflush command causing this behaviour change?
Returning a pointer to an object on the stack is bad, as you've mentioned. The reason you only see a problem with your fflush() call in place is that the stack is unmodified if it's not there. That is, the 5 is still in place, so the pointer dereference still gives that 5 to you. If you call a function (almost any function, probably) in between fun and printf, it will almost certainly overwrite that stack location, making the later dereference return whatever junk that function happened to leave there.
This is because calling fflush(stdout) writes onto the stack where x was.
Let me explain. The stack in assembly language (which is what all programming languages eventually run as in one way or another) is commonly used to store local variables, return addresses, and function parameters. When a function is called, it pushes these things onto the stack:
the address of where to continue executing code once the function completes.
the parameters to the function, in an order determined by the calling convention used.
the local variables that the function uses.
These things are then popped off of the stack, one by one, simply by changing where the CPU thinks the top of the stack is. This means the data still exists, but it's not guaranteed to continue to exist.
Calling another function after fun() overwrites the previous values above the top of the stack, in this case with the value of stdout, and so the pointer's referenced value changes.
Without calling another function, the data stays there and is still valid when the pointer is dereferenced.

Volatile keyword in C [duplicate]

This question already has answers here:
Why is volatile needed in C?
(18 answers)
Closed 9 years ago.
I am writing program for ARM with Linux environment. its not a low level program, say app level
Can you clarify me what is the difference between,
int iData;
vs
volatile int iData;
Does it have hardware specific impact ?
Basically, volatile tells the compiler "the value here might be changed by something external to this program".
It's useful when you're (for instance) dealing with hardware registers, that often change "on their own", or when passing data to/from interrupts.
The point is that it tells the compiler that each access of the variable in the C code must generate a "real" access to the relevant address, it can't be buffered or held in a register since then you wouldn't "see" changes done by external parties.
For regular application-level code, volatile should never be needed unless (of course) you're interacting with something a lot lower-level.
The volatile keyword specifies that variable can be modified at any moment not by a program.
If we are talking about embedded, then it can be e.g. hardware state register. The value that it contains may be modified by the hardware at any unpredictable moment.
That is why, from the compiler point of view that means that compiler is forbidden to apply optimizations on this variable, as any kind of assumption is wrong and can cause unpredictable result on the program execution.
By making a variable volatile, every time you access the variable, you force the CPU to fetch it from memory rather than from a cache. This is helpful in multithreaded programs where many threads could reuse the value of a variable in a cache. To prevent such reuse ( in multithreaded program) volatile keyword is used. This ensures that any read or write to an volatile variable is stable (not cached)
Generally speaking, the volatile keyword is intended to prevent the compiler from applying any optimizations on the code that assume values of variables cannot change "on their own."
(from Wikipedia)
Now, what does this mean?
If you have a variable that could have its contents changed at any time, usually due to another thread acting on it while you are possibly referencing this variable in a main thread, then you may wish to mark it as volatile. This is because typically a variable's contents can be "depended on" with certainty for the scope and nature of the context in which the variable is used. But if code outside your scope or thread is also affecting the variable, your program needs to know to expect this and query the variable's true contents whenever necessary, more than the normal.
This is a simplification of what is going on, of course, but I doubt you will need to use volatile in most programming tasks.
In the following example, global_data is not explicitly modified. so when the optimization is done, compiler thinks that, it is not going to modified anyway. so it assigns global_data with 0. And uses 0, whereever global_data is used.
But actually global_data updated through some other process/method(say through ptrace ). by using volatile you can force it to read always from memory. so you can get updated result.
#include <stdio.h>
volatile int global_data = 0;
int main()
{
FILE *fp = NULL;
int data = 0;
printf("\n Address of global_data:%x \n", &global_data);
while(1)
{
if(global_data == 0)
{
continue;
}
else if(global_data == 2)
{
;
break;
}
}
return 0;
}
volatile keyword can be used,
when the object is a memory mapped io port.
An 8 bit memory mapped io port at physical address 0x15 can be declared as
char const ptr = (char *) 0x15;
Suppose that we want to change the value at that port at periodic intervals.
*ptr = 0 ;
while(*ptr){
*ptr = 4;//Setting a value
*ptr = 0; // Clearing after setting
}
It may get optimized as
*ptr = 0 ;
while(0){
}
Volatile supress the compiler optimization and compiler assumes that tha value can
be changed at any time even if no explicit code modify it.
Volatile char *const ptr = (volatile char * )0x15;
Used when the object is a modified by ISR.
Sometimes ISR may change tha values used in the mainline codes
static int num;
void interrupt(void){
++num;
}
int main(){
int val;
val = num;
while(val != num)
val = num;
return val;
}
Here the compiler do some optimizations to the while statement.ie the compiler
produce the code it such a way that the value of num will always read form the cpu
registers instead of reading from the memory.The while statement will always be
false.But in actual scenario the valu of num may get changed in the ISR and it will
reflect in the memory.So if the variable is declared as volatile the compiler will know
that the value should always read from the memory
volatile means that variables value could be change any time by any external source. in GCC if we dont use volatile than it optimize the code which is sometimes gives unwanted behavior.
For example if we try to get real time from an external real time clock and if we don't use volatile there then what compiler do is it will always display the value which is stored in cpu register so it will not work the way we want. if we use volatile keyword there then every time it will read from the real time clock so it will serve our purpose....
But as u said you are not dealing with any low level hardware programming then i don't think you need to use volatile anywhere
thanks

Ampersand bug and lifetime in c

As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
int m;
return(&m);
}
void main()
{
int* p=abc();
*p=32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local veriable m is deallocated once abc() completes. So we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
int m;
return(&m);
int p=9;
}
void main()
{
int* p=abc();
*p=32;
}
Here I am getting the same warning. But I guess that m will still retain its lifetime when returning. What is happening? Please explain the error. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreacheable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as return keyword is encountered, control passes back to the caller and the called function goes out of scope. Hence, all local variables are popped off the stack. So the last statement in your second example is inconsequential and the warning is justified
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address,25) and retains
its last-stored value throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
25) The term ‘‘constant address’’ means that two pointers to the object constructed at possibly different
times will compare equal. The address may be different during two different executions of the same
program.
26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way you can make m still valid memory (keeping the maximum resemblance with your code) when you exit the function, is to prepend it with the static keyword
int* abc()
{
static int m;
m = 42;
return &m;
}
Anything after a return is a "dead branch" that won't be ever executed.
int m should be locally visible. You should create it as int* m and return it directly.

Resources