From Programming Language Pragmatics, by Scott
Object lifetimes generally correspond to one of three principal
storage allocation mechanisms, used to manage the object’s space:
Static objects are given an absolute address that is retained throughout the program’s execution.
Stack objects are allocated and deallocated in last-in, first-out order, usually in conjunction with subroutine calls and returns.
Heap objects may be allocated and deallocated at arbitrary times. They require a more general (and expensive) storage management
algorithm.
For example, in C, static objects must be initialized with constant expressions (expressions which can be evaluated at compile time).
I am not sure whether this is the case in other languages, or even which other languages have static objects.
In general, must static objects be initialized? When initialized, must they be initialized with expressions which can be evaluated at compile time?
By initialization, I mean either explicit or implicit (i.e. automatically done by the language implementation), as opposed to leaving the object uninitialized.
So to rephrase my question: generally, can static objects be left uninitialized by either programs or compilers?
Thanks.
A static variable will be initialized to "zero" automatically, unless you explicitly initialize it.
Other than that and the lifetime or linkage part, it's no different from any other variable. One caveat: in C, an explicit initializer for a static object must be a constant expression, whereas C++ also allows initializers that are evaluated at run time.
Related
I noticed something that I'd like an answer to regarding pointers and variables, it's not really a problem but I'd like to understand the details. To put it simply, to bypass using global variables, I've been starting to have functions return pointers to the data inside the function. And sometimes, that data needs to be static, other times it doesn't.
For example, let's say I have a function that creates a 2d char array, populates it with 'a' and returns a pointer pointing to that array. When the caller tries to use that pointer to access and modify the memory where the 2d array was, random data is there instead of the 2d array's contents. I found out that declaring the 2d array as static fixes this, because the array's contents are saved outside of its scope.
Now on the other hand, let's say I have a function that declares a struct, and initializes it with values. The function then returns a pointer to that struct variable. The caller should have nonsense data when trying to access the struct's values through the pointer but interestingly, it doesn't. I would think that like in the case of the 2d array, after the function call, since the struct is not static, the data at that memory should be freed. However, it's not like that, instead I can access all the struct's elements through the pointer despite it not being static.
Overall, after a function is finished, the caller can access and modify the data of variables created in that function as long as that data was static (in the 2d array case) but sometimes the data of variables is retained outside of scope despite not being labeled static (struct case). Why?
Your observations about data in a two-dimensional array of char or a struct appearing to persist or not after a function returns are the results of happenstance, not behavior defined by the C standard, and you may not rely on them. They were merely artifacts of how your C compiler behaved in particular circumstances and may change in other circumstances.
Whenever you provide an object to be used after a function returns (as by returning a pointer to the object), it must not have automatic storage duration. It may have static storage duration, allocated storage duration, or thread storage duration.
First, let us clarify some terminology. This is important for understanding the concepts. The C standard does not use the term “global variables”. It generally does not use “variable” to describe objects and does not use “global” for them at all. What you think of as a variable is, in the C standard, an identifier and an object. The identifier is the name of the object, and the object is a region of data storage that can represent values.
Whether the C standard guarantees an object can be used at a certain time depends on the object’s lifetime. Lifetime is when during program execution the object exists in the C model of computing. Lifetime is determined in part by its storage duration. The storage duration depends on how and where the object was defined or created.
Lifetime is also affected by the scope of an object’s identifier. Scope is where in the source code an identifier is visible (can be used). There are relationships between scope and lifetime, but they are distinctly different things.
There are four storage durations (and a special temporary storage duration, which I will not discuss here):
If an identifier is declared with _Thread_local, its object has thread storage duration. Its lifetime starts when the thread is created and ends when execution of the thread ends. So it can be used after a function returns as long as its thread is still executing.
Otherwise, if an identifier is declared with static or with extern or outside of any block or list of function parameters (effectively outside of any function), its object has static storage duration. It exists for the entire execution of the program, so it can be used any time during execution.
Otherwise, for any identifier for an object (rather than a type definition, function, or such), its object has automatic storage duration. It is associated with the statement block it is declared in. (A block is a sequence of statements inside braces, { ... }. This can be the main block that defines a function or a block nested within it.) Its lifetime ends when execution of the associated block ends. The C standard makes no guarantee about what happens when you attempt to use an automatic object after execution of its block ends.
Note that when a function calls a subroutine, execution of the function, including the blocks within it, is suspended temporarily, but it is not ended. (Execution ends when the function returns, or special routines like abort, exit, or longjmp are called.) This means that the object still exists while the subroutines are executing. This is true even though the source code in the subroutines has a different scope than the calling function.
Because of the above, statements that say you cannot use objects outside of their scope are false. Scope is not the determining factor in whether an object may be accessed. Lifetime is.
For the fourth storage duration:
Objects with allocated storage duration are created by malloc, calloc, realloc, and aligned_alloc and do not have names (identifiers). The lifetime of an allocated object extends from when it is allocated to when it is deallocated. So, if a function allocates an object and returns a pointer to it, that pointer may be used to access the object until the object is deallocated.
In C/C++, why are globals and static variables initialized to default values?
Why not leave it with just garbage values? Are there any special
reasons for this?
Security: leaving memory alone would leak information from other processes or the kernel.
Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops. The OS can even zero freelist pages when the system is otherwise idle, rather than when some client or user is waiting for the program to start.
Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.
Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.
One might then wonder why the auto storage class does start as garbage. The answer is two-fold:
It doesn't, in a sense. The very first stack frame page at each level (i.e., every new page added to the stack) does receive zero values. The "garbage", or "uninitialized", values that subsequent function instances at the same stack level see are really the previous values left by other function instances of your own program and its library.
There might be a quadratic (or whatever) runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large array, say, on any given call, and it could be invoked thousands or millions of times. The initialization of statics and globals, OTOH, only needs to happen once.
Because with the proper cooperation of the OS, 0 initializing statics and globals can be implemented with no runtime overhead.
Section 6.7.8 Initialization of C99 standard (n1256) answers this question:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
Think about it: in the static realm you can't always tell for sure that something has indeed been initialized, or that main has started. There are also a static init phase and a dynamic init phase: the static one comes first, followed by the dynamic one, where order matters.
If you didn't have zeroing out of statics, then you would be completely unable to tell in this phase whether anything was initialized AT ALL. In short, the C++ world would fly apart, and basic things like singletons (or any sort of dynamic static init) would simply cease to work.
The answer with the bulletpoints is enthusiastic but a bit silly. Those could all apply to nonstatic allocation but that isn't done (well, sometimes but not usually).
In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard). Hence, the bss section typically includes all uninitialized variables declared at file scope (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword.
Source: Wikipedia
Higher-level languages such as JavaScript don't give the programmer a
choice as to where variables are stored. But C does. My question is:
are there any guidelines as to where to store variables, e.g. depending
on size, usage, etc.?
As far as I understand, there are three possible locations to store
data (excluding code segment used for actual code):
DATA segment
Stack
Heap
So transient small data items should be stored on the stack?
What about data items which must be shared between functions. These
items could be stored on the heap or in the data segment. How do you
decide which to choose?
You're looking through the wrong end of the telescope. You don't specify particular memory segments in which to store a variable (particularly since the very concept of a "memory segment" is highly platform-dependent).
In C code, you decide a variable's lifetime, visibility, and modifiability based on what makes sense for the code, and based on that the compiler will generate the machine code to store the object in the appropriate segment (if applicable).
For example, any variables declared at file scope (outside of any function) or with the keyword static will have static storage duration, meaning they are allocated at program startup and held until the program terminates; these objects may be allocated in a data segment or bss segment. Variables declared within a function or block without the static keyword have automatic storage duration, and are (typically) allocated on the stack.
String literals and other compile-time constant objects are often (but not always!) allocated in a readonly segment. Numeric literals like 3.14159 and character constants like 'A' are not objects, and do not (typically) have memory allocated for them; rather, those values are embedded directly in the machine code instructions.
The heap is reserved for dynamic storage, and variables as such are not stored there; instead, you use a library call like malloc to grab a chunk of the heap at runtime, and assign the resulting pointer value to a variable allocated as described above. The variable will live in either the stack or a data segment, while the memory it points to lives on the heap.
Ideally, functions should communicate solely through parameters, return values, and exceptions (where applicable); functions should not share data through an external variable (i.e., a global). Function parameters are usually allocated on the stack, although some platforms may pass parameters via registers.
You should prefer local/stack variables to global or heap variables when those variables are small, used often and in a relatively small/limited scope. That will give the compiler more opportunities to optimize the code using them as it'll know they aren't going to change between function calls unless you pass around pointers to them.
Also, the stack is usually relatively small and allocating large structures or arrays on it may lead to stack overflows, especially so in recursive code.
Another thing to consider is the use of global variables in multithreaded programs. You want to minimize the chances of race conditions, and one strategy for that is making functions thread-safe and reentrant by not using any global resources in them directly (if malloc() is thread-safe, if errno is per-thread, etc., you can use them, of course).
Btw, using local variables instead of global variables also improves code readability as the variables are located close to the place where they're used and you can quickly find out their type and where and how they're used.
Other than that, if your code is correct, there shouldn't be much practical difference between making variables local or global or in the heap (of course, malloc() can fail and you should remember about it:).
C only allows you to specify where data is stored indirectly... via the scope of the variable and/or allocation. i.e., a local variable to a function is typically a stack variable unless it is declared static in which case it will likely be DATA/BSS. Variables created dynamically via new/malloc will typically be heap.
However, there's no guarantee of any of that... only the implication of it.
That said, the one thing that is guaranteed to be a bad idea is to declare large local variables in functions... common source of strange errors and stack overflows. Very large arrays and structures are best suited to dynamic allocation and keep the pointers in local/global as required.
Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer was:
Local variables are stored on the stack.
Register variables are stored in registers.
Global and static variables are stored in the data segment.
Memory allocated dynamically is stored on the heap.
The next question he asked me was: why are they stored in those specific memory areas? Why is a local variable not stored in a register (even though I have an auto variable that is used very frequently in my program)? And why are global or static variables not stored on the stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope(areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not define where any variables are stored, actually. It does, however, define three storage durations: static, automatic, and allocated (often called dynamic).
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when execution of their enclosing block ends; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and registers during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
It is actually just an implementation detail that is convenient.
The compiler could, if it wanted to, place local variables on the heap.
It is just easier to create them on the stack since when leaving a function you can adjust the frame pointer with a simple add/subtract depending on the growth direction of the stack and so automatically free the used space for the next function. Creating locals on the heap however would mean more house-keeping work.
Another point is that local variables need not be created on the stack; they can be stored and used just in a register if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are stored in registers in most cases. Because registers are pushed to and popped from the stack when you make function calls, it looks like they are on the stack.
There is actually no such thing as a register variable, because register is just a rarely used keyword in C that tells the compiler to try to put the variable in a register. I think that most compilers just ignore this keyword.
That is why the interviewer asked you more: he was not sure whether you deeply understand the topic. The fact is that register variables are virtually on the stack.
In embedded systems we have different types of memory (read-only non-volatile (ROM), read-write non-volatile (EEPROM, PROM, SRAM, NVRAM, flash), volatile (RAM)) and different requirements for our data (cannot change and must persist after power cycling, can change and must persist after power cycling, can change at any time). We have different sections because we have to map our data requirements onto the available memory types optimally.
Why do local variables use Stack in C/C++?
Technically, C does not use a stack. If you look at the C99 standard, you'll find no reference to the stack. It's probably the same for the C++ standard, although I haven't checked it.
Stacks are just implementation details used by most compilers to implement the C automatic storage semantics.
The question you're actually asking is, "why do C and C++ compilers use the hardware stack to store variables with auto extent?"
As others have mentioned, neither the C nor C++ language definitions explicitly say that variables must be stored on a stack. They simply define the behavior of variables with different storage durations:
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage
durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
3 An object whose identifier is declared with external or internal linkage, or with the
storage-class specifier static has static storage duration. Its lifetime is the entire
execution of the program and its stored value is initialized only once, prior to program
startup.
4 An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration.
5 For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
C language standard, draft n1256.
No doubt that paragraph 5 was written with hardware stacks in mind, but there are oddball architectures out there that don't use a hardware stack, at least not in the same way as something like x86. The hardware stack simply makes the behavior specified in paragraph 5 easy to implement.
Local data storage – A subroutine frequently needs memory space for storing the values of local variables, the variables that are known only within the active subroutine and do not retain values after it returns. It is often convenient to allocate space for this use by simply moving the top of the stack by enough to provide the space. This is very fast compared to heap allocation. Note that each separate activation of a subroutine gets its own separate space in the stack for locals.
Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools you can get comparable performance out of heap allocation, but that comes with some added complexity and its own headaches.
With the heap there is another layer of indirection, since you have to go from
stack -> heap before you reach the object. Also, the stack is local to
each thread and is inherently thread-safe, whereas the heap is free-for-all
memory.
It depends on the implementation where variables are stored.
Some computers might not even have a "stack" :D
Other than that, it is usual to do some house keeping when calling functions for keeping track of the return address and maybe a few other things. Instead of creating another house keeping method for local variables, many compiler implementations choose to use the already existing method, which implements the stack, with only minimal changes.
Local variables are local to frames in the call stack.
Using a stack allows recursion.
Because the stack is the part of memory that is automatically discarded when the scope ends. This is why local variables are sometimes called "automatic". Local variables in one call are "insulated" from recursive or multithreaded calls to the same function.
Local variables are limited to the scope in which they can be accessed.
Using a stack lets control jump from one scope to another and, on returning, continue with the local variables that were present initially.
When there is a jump, the local variables are pushed and the jump is executed. On returning to the scope, the local variables are popped.