What is an automatic variable in this setjmp/longjmp context? - c

Advanced Programming in the UNIX Environment by W. Richard Stevens states:
"What are the states of the automatic variables and register
variables in the main function?"
with regard to what happens when you longjmp back to main (or another function) from somewhere lower down the stack.
It goes on to say:
"It depends. Most implementations do not try to roll back these
automatic variables and register variables, but all that the standards
say is that their values are indeterminate. If you have an automatic
variable that you don't want rolled back, define it with the
volatile attribute. Variables that are declared global or static
are left alone when longjmp is executed."
It seems like he's saying that normal stack variables will not have their values set back to what they were at the time of the setjmp, but then the rest of the function couldn't rely on its stack variables after the longjmp back to it which seems crazy, so I'm guessing I'm wrong.
Can someone define "automatic variables" for me and explain what specifically isn't set back to its original value and why that is?

All it's saying is that if
you have an automatic (function-local non-static) variable that's not declared volatile; and
you change the value of the variable between setjmp and longjmp
then after the longjmp the value of that variable becomes indeterminate.
I believe this has to do with the possibility of such variables residing in CPU registers rather than in RAM, and the associated difficulty of preserving the values of such variables across the longjmp.
Here is a quote from the gcc manual:
If you use longjmp, beware of automatic variables. ISO C says that
automatic variables that are not declared volatile have undefined
values after a longjmp. And this is all GCC promises to do, because
it is very difficult to restore register variables correctly, and one
of GCC's features is that it can put variables in registers without
your asking it to.
If the potential loss of variable values is a problem in your use case, declare the relevant variables as volatile.

"Automatic variables" is an old term for ordinary (not declared with register or static) local variables, which goes back to terminology used in the C standard and the original meaning of the auto keyword. See sections 6.2.4 and 6.7.1 of the standard.
As for this:
but then the rest of the function couldn't rely on its stack variables after the longjmp back to it which seems crazy
The idea is that you're not supposed to modify them in the first place if you're going to longjmp because then you can't know what's going to happen.
The reason is that longjmp may restore state such as processor registers, which automatic variables may have been mapped to. (There is no guarantee that they will be on "the stack" or in memory at all. And even if they do exist in memory, some operations may not directly access the memory [unless the variable is declared volatile] but may instead access a processor register into which the value has already been loaded.)
Your question is kind of odd because it implies you would want them to be restored [i.e. your modifications in intervening functions to be erased]; in general this caveat is warning that they may be restored by accident when that's not expected. "Not restored" doesn't mean "unusable" [though the standard DOES declare them unusable, because an implementation might restore a cached register but not the memory, so you'd get inconsistent results]; it means "has the value a later function wrote to it (because you passed the address intending for it to be written to)".

Automatic variables are regular function-local variables: since they are allocated on the stack and you do not have to take care of their memory, they are called automatic.
See http://en.wikipedia.org/wiki/Automatic_variable for a more in-depth description.

I can't define an "automatic variable", but maybe this can help you understand what happens during a setjmp:
(some of) the CPU registers are saved into the jmp_buf.
and during a longjmp:
the CPU registers are reset to the saved values.
nothing else!
So during the longjmp, you just come back higher in the stack, with all your variables in memory left untouched, and some (not all) of the registers, in particular the stack pointer and the instruction pointer, reset to the values they had at the time of the setjmp.

auto means anything that's local to the function and hasn't specifically been defined as static. It's probably worth noting that the standard specifies pretty much the behavior he states (§7.13.2.1/3):
All accessible objects have values, and all other components of the abstract machine
have state, as of the time the longjmp function was called, except that the values of
objects of automatic storage duration that are local to the function containing the
invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.

Related

How can there be so many register variables, with such a limited number of registers?

I was fooling around with C and I realized, by rights, if I declared a bunch of register variables, wouldn't the values be overwritten? From what I can tell from assembly, there aren't a ton of registers in the microprocessor, not enough to satisfy the demand I created. How does C keep all the values?
There's no requirement that all variables declared with register must be kept in CPU registers.
Here's what the C standard says:
A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast as
possible. The extent to which such suggestions are effective is
implementation-defined.
Reference: ISO C11 N1570 draft, 6.7.1 paragraph 6. Note that it doesn't even mention CPU registers.
A conforming compiler could simply ignore all register keywords (aside from imposing some restrictions on taking the address of register objects).
In practice, most compilers will simply place as many register variables in CPU registers as they can.
And in fact a modern optimizing compiler is likely to be better at register allocation than most programmers are -- especially since they can recompute register mappings every time you recompile your program after modifying it.
The common wisdom these days is that the register keyword doesn't provide much benefit.
Old compilers would allocate as many registers to register variables as they could (in some cases, this number was 0) and allocate the remaining variables on the stack.
Modern compilers generally ignore the register keyword. They employ sophisticated register allocators that automatically keep as many variables in registers as possible.
The only effect of register you can rely on is that you get a diagnostic message if you try to take the address of a register variable. Otherwise, register variables behave just like automatic variables do.
register is a hint that the compiler can keep the variable in registers. You can't force the compiler to use more registers than exist on the target architecture, for the obvious reason that it's impossible.
In C, the register keyword simply means that the variable's address can't be taken. This stops you from doing anything that would prevent the compiler keeping it in a register, but doesn't require it to be in a register.
From https://en.cppreference.com/w/c/language/storage_duration
The register specifier is only allowed for objects declared at block scope, including function parameter lists. It indicates automatic storage duration and no linkage (which is the default for these kinds of declarations), but additionally hints the optimizer to store the value of this variable in a CPU register if possible. Regardless of whether this optimization takes place or not, variables declared register cannot be used as arguments to the address-of operator, cannot use alignas (since C11), and register arrays are not convertible to pointers.
It hasn't really done anything for years: Optimizing compilers already keep vars in regs whenever possible. For vars that are global or have had their address taken, then maybe only for part of a function, storing the result back to memory if the variable can't be optimized away.
BTW, register was officially deprecated in C++, and C++17 actually removed it from the language. https://en.cppreference.com/w/cpp/language/storage_duration.
Related: GNU C has register int foo asm("eax"); (or whatever other register), but for local variables even that is only guaranteed to have an effect when the variable is used as an operand to an inline-asm statement. In current GCC versions, it does cause the compiler to use that register for the variable, unless it needs to spill / reload it to stack memory across function calls or whatever.
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
But in GNU C, you can use global register variables, where a register is dedicated to a global for the entire life of your program, hurting optimization of code that isn't using that variable. It's an interesting option but not one you should use.
C was designed to allow a compiler to generate assembly code for a function while it was being parsed, rather than having to read an entire function, examine it, and then produce code afterward. A compiler that has parsed a program as far as:
int test(void)
{
    int x=0,y=0;
    int *p = &y;
    while(x < 10)
    {
        x++;
        foo();
        x++;
        *p *= 3;
        x++;
        bar();
        ...
would have no way of knowing whether the value of x could be safely kept in a register across the call to foo and/or the operation on *p or whether it might be possible for foo to alter the value of x.
The purpose of the register keyword was effectively to tell the compiler that it would be safe to keep the value of an object in a register across function calls or operations that write to pointers, even if it hadn't seen everything that code might do with the object. Such a meaning could be useful even today if passing the object's address to a nested function weren't a constraint violation, but a compiler was allowed to assume that in any context where a named-object lvalue was used, all operations would involve that named-object lvalue. If an object's address is never taken, no qualifier would be needed to invite such an assumption, but in cases where an object's address is taken but not persisted across conflicting operations involving the object, such a qualifier could give a compiler information it would otherwise not have.
Variables are normally stored on the stack, that is, in a block of memory. The value of a variable is normally loaded into a register for manipulation and moved back to the stack (saved) if another variable is to be manipulated. Often the variable isn't even loaded into a register; it is manipulated directly on the stack.

Optimization for global and static variables

I read some topics on optimization, and they mention that global variables cannot be stored in registers, and hence if we need to optimize, we use a register variable to hold the global data and modify that register variable instead. Does this apply to static variables too?
For auto storage, what if we store auto variables in register variables? Won't access from a register be faster than access from the stack?
Both global variables and static variables exist in the data segment, which includes the data, BSS, and heap sections. If the static variable is initialized to 0 or not initialized to anything, it goes in the BSS section. If it is given a non-zero initialization value, then it is in the "data" section. See:
http://en.wikipedia.org/wiki/Data_segment
As for auto vs. register variables: register does not guarantee that the variable will be stored in a register, it is more providing a hint from the programmer. See:
http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/CONCEPT/storage_class.html
Yes, it is (much) faster to access a register than to access the stack memory, but this optimization nowadays is left up to the compiler (the problem of register allocation) as well as the CPU architecture (which has a great many optimizations too complex to explain here).
Unless you're programming for a really simple or old architecture and/or using a really outdated compiler, you probably should not worry about this kind of optimization.
Global variables' values can be held in registers for as long as the compiler can prove there is no other access to the stored value. For values that can't themselves be held in a register, declaring a pointer with the restrict keyword declares that a value isn't being accessed via any other means for that pointer's lifetime; just don't give away any copies and the compiler will take care of the rest. For scalars, declaring thistype localval=globalval; works at least as well, or even better, if you're not changing the value or you've got good control over scope exits.
You can only use the restrict declaration if the value really won't be accessed otherwise. Optimizers these days can for instance deduce from your declaring the object won't be accessed in one function that a code path that does access it in another won't be executed, and from that deduce the content of the expression used to take that code path, and so on. "If you lie to the compiler, it will have its revenge" is more true today than ever.

Why C variables stored in specific memory locations?

Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer was:
Local variables are stored on the stack.
Register variables are stored in registers.
Global & static variables are stored in the data segment.
Dynamically allocated memory is stored on the heap.
The next question he asked me was: why are they getting stored in those specific memory area? Why is the Local variable not getting stored in register (though I need an auto variable getting used very frequently in my program)? Or why global or static variables are not getting stored in stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope (areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not define where any variables are stored, actually. It does, however, define three storage classes: static, automatic, and dynamic.
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when they go out of scope; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and registers during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
It is actually just an implementation detail that is convenient.
The compiler could, if it wanted to, create local variables on the heap.
It is just easier to create them on the stack since when leaving a function you can adjust the frame pointer with a simple add/subtract depending on the growth direction of the stack and so automatically free the used space for the next function. Creating locals on the heap however would mean more house-keeping work.
Another point is that local variables need not be created on the stack; they can be stored and used just in a register if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are stored in registers in most cases; because registers are pushed to and popped from the stack when you make function calls, it looks like they are on the stack.
There is actually no such thing as register variables, because register is just a rarely used keyword in C that tells the compiler to try to put the variable in a register. I think that most compilers just ignore this keyword.
That's why he asked you more: he was not sure whether you understood the topic deeply. The fact is that register variables are virtually on the stack.
In embedded systems we have different types of memory to use (read-only non-volatile: ROM; read-write non-volatile: EEPROM, PROM, SRAM, NVRAM, flash; volatile: RAM), and we also have different requirements on our data (cannot change and must persist across power cycles; can change and must persist across power cycles; can change at any time). We have different sections because we have to map the requirements of our data onto the available types of memory as efficiently as possible.

scope of variables

Why do local variables use Stack in C/C++?
Technically, C does not use a stack. If you look at the C99 standard, you'll find no reference to the stack. It's probably the same for the C++ standard, although I haven't checked it.
Stacks are just implementation details used by most compilers to implement the C automatic storage semantics.
The question you're actually asking is, "why do C and C++ compilers use the hardware stack to store variables with auto extent?"
As others have mentioned, neither the C nor C++ language definitions explicitly say that variables must be stored on a stack. They simply define the behavior of variables with different storage durations:
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage
durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address,25) and retains
its last-stored value throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
3 An object whose identifier is declared with external or internal linkage, or with the
storage-class specifier static has static storage duration. Its lifetime is the entire
execution of the program and its stored value is initialized only once, prior to program
startup.
4 An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration.
5 For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
C language standard, draft n1256.
No doubt that paragraph 5 was written with hardware stacks in mind, but there are oddball architectures out there that don't use a hardware stack, at least not in the same way as something like x86. The hardware stack simply makes the behavior specified in paragraph 5 easy to implement.
Local data storage – A subroutine frequently needs memory space for storing the values of local variables, the variables that are known only within the active subroutine and do not retain values after it returns. It is often convenient to allocate space for this use by simply moving the top of the stack by enough to provide the space. This is very fast compared to heap allocation. Note that each separate activation of a subroutine gets its own separate space in the stack for locals.
Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools you can get comparable performance out of heap allocation, but that comes with slight added complexity and its own headaches.
In heaps there is another layer of indirection, since you have to go from the stack to the heap before you get to the correct object. Also, the stack is local to each thread and is inherently thread safe, whereas the heap is free-for-all memory.
It depends on the implementation where variables are stored.
Some computers might not even have a "stack" :D
Other than that, it is usual to do some house keeping when calling functions for keeping track of the return address and maybe a few other things. Instead of creating another house keeping method for local variables, many compiler implementations choose to use the already existing method, which implements the stack, with only minimal changes.
Local variables are local to frames in the call stack.
Using a stack allows recursion.
Because the stack is part of the memory that is automatically discarded when the scope ends. This is why local variables are sometimes called "automatic". Local variables in a call are "insulated" from recursive or multithreaded calls to the same function.
Local variables are limited to the scope in which they can be accessed.
Using a stack enables control to jump from one scope to another and, on returning, to continue with the local variables that were present initially.
When there is a jump, the local variables are pushed and the jump is executed. On returning to the scope, the local variables are popped.

In C if a variable is not assigned a value then why does it take garbage value?

Why do the variables take garbage values?
I guess the rationale for this is that your program will be faster.
If the compiler automatically reset your variables (i.e. initialized them to 0, or to NaN for floats/doubles, etc.), it would take some time doing that (it would have to write to memory).
In many cases initializing variables could be unneeded: maybe you will never access your variable, or will write on it the first time you access it.
Today this optimization is arguable: the overhead of initializing variables may not be worth the problems caused by variables left uninitialized by mistake, but when C was defined, things were different.
Unassigned variables have a so-called indeterminate state, which can be implemented in whatever way, usually by just leaving unchanged whatever data was in the memory now occupied by the variable.
It just takes whatever is in memory at the address the variable occupies.
When you allocate a variable you are allocating some memory. If you don't overwrite it, the memory will contain whatever "random" information was there before, and that is called a garbage value.
Why would it not? A better question might be "Can you explain how it comes about that a member variable in C# which is not initialised has a known default value?"
When a variable is declared in C, it only involves assigning memory to the variable, with no implicit assignment of a value. Thus when you read from it, you get whatever is stored in that memory, interpreted as your variable's data type. That value is what we call a garbage value. It remains so because C language implementations have memory management that does not handle this issue.
This happens with local variables and memory allocated from the heap with malloc(). Local variables are the more typical mishap. They are stored in the stack frame of the function, which is created simply by adjusting the stack pointer by the amount of storage required for the local variables.
The values those variables will have upon entry of the function are essentially random, whatever happened to be stored in those memory locations from a previous function call that happened to use the same stack area.
It is a nasty source of hard-to-diagnose bugs, not least because the values aren't really random. As long as the program has predictable call patterns, it is likely that the initial value repeats well. A compiler often has a debug feature that injects code into the function prologue to initialize all local variables, using a value that's likely to produce bizarre calculation results or a protected-mode access violation.
Notable perhaps as well is that managed environments initialize local variables automatically. That isn't done to help the programmer fall into the pit of success, it's done because not initializing them is a security hazard. It lets code that runs in a sandbox access memory that was written by privileged code.

Resources