Rationale for global variables stored in "data section" instead of stack? - static

Global variables could be stored at the top of the stack, analogous to a local variable that never gets deleted until the program ends. But this requires incrementing the stack pointer for each global variable, even though they will never use the stack pointer (the stack never grows or shrinks in that region, since those variables persist for the whole program runtime). Is the rationale for placing these variables in a different section instead (one that has to be added separately in the CPU architecture, adding some complexity since it is another part) that they simply do not need the mechanisms built into the stack, specifically the stack pointer? Historically, was a separate "global variable section" added early on, or did computers use the top of the stack (or bottom, depending on the design and the direction in which it grows) for that?

Global variables go in the data section almost by definition. The data section provides all the features you would want: initialized data such as string literals, other constants, and arrays (pre-initialized by the loader from the program file's data section, rather than by executing program code); data that holds the address of other data within that initialized data (data initialized as a pointer to other data); code that can refer to data in the same way; labels in data that are constants; "common" data; and so on.
Historically, was a separate "global variable section" added in early-on, or did computers use the top of the stack (or bottom, depending on design and what way it is incremented) for that?
Historically, the data sections came first, and computers did not have stacks. Stacks were only added later, as recursion gained acceptance in software and hardware design followed software evolution. (In much older systems, code and data went together in a combined program "core" image.)
As mentioned above, these data sections support initialized data, a necessity for even the earliest programming. There has never been a reason for the stack to support these capabilities, given that the (global) data section already does.
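As a rough illustration of the initialized-data features described above, here is a minimal C sketch. The names (greeting, table, third_entry, zeroed and the helper functions) are hypothetical, and where each object ends up (.data, .rodata, .bss) depends on the toolchain; the comments describe only what is typical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Initialized static data: the values come from the program image,
   not from code executed at startup. */
static const char greeting[] = "hello";
static int table[4] = {10, 20, 30, 40};

/* Data initialized as a pointer to other data: the address is resolved
   by the linker/loader, again without running any program code. */
static int *const third_entry = &table[2];

/* Static data without an initializer is zero-initialized
   (typically placed in a bss-style section). */
static int zeroed[8];

int read_third_entry(void) {
    assert(third_entry != NULL);
    return *third_entry;
}

size_t greeting_length(void) {
    return strlen(greeting);
}

int sum_zeroed(void) {
    int sum = 0;
    for (size_t i = 0; i < sizeof zeroed / sizeof zeroed[0]; i++) {
        sum += zeroed[i];
    }
    return sum;
}
```

Calling read_third_entry() yields 30 and sum_zeroed() yields 0 without any explicit initialization code having run, which is exactly what a stack-allocated variable could not offer.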
I assumed nested functions could access the above function.
If by nested functions you mean statically nested (e.g. enclosed/embedded; functions that are defined within the scope of another function), that can be done provided the runtime function call mechanism provides for a "static link" that refers to the enclosing scope.  Languages like Pascal have this.  (The static link is passed as a hidden parameter, much like this in OOP languages.)
In Pascal, this mechanism combines access to the dynamic call chain with access to the enclosing function's local variables. It allows for a sort of poor man's object model: during the lifetime of the function, it is almost as if an "object" is created on the stack from its local variables, and functions nested within that scope can access the enclosing scope's local variables, much as methods access an object's fields. This "object", however, disappears when the outer scope completes. (The more general object mechanism of OOP languages doesn't mandate that the object be removed when the outer scope leaves, and also allows those objects to be first class: referenced explicitly rather than only implicitly.)
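Standard C has no nested functions, so the static link can only be emulated there. The sketch below passes it as an explicit pointer parameter, which is roughly what a Pascal compiler would do implicitly. The names outer_frame, add_step and outer are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The locals of the enclosing function that the "nested" function needs. */
struct outer_frame {
    int total;
    int step;
};

/* Stand-in for a nested function: the static link arrives as an explicit
   first parameter instead of a hidden one. */
static void add_step(struct outer_frame *static_link, int times) {
    assert(static_link != NULL);
    for (int i = 0; i < times; i++) {
        static_link->total += static_link->step;
    }
}

/* Enclosing function: its frame exists only for the duration of the call. */
int outer(int step, int times) {
    struct outer_frame frame = { .total = 0, .step = step };
    add_step(&frame, times);
    return frame.total;
}
```

Here outer(3, 4) returns 12. The frame struct plays the role of the short-lived "object" described above: it is gone as soon as outer returns.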
If by nested function you are referring to the dynamic call chain alone, access to caller's variables would usually be supplied as reference parameters of some sort — this, in particular, since in this scenario, the callee doesn't know which caller, so letting the caller provide appropriate parameters that relate to the intent of that particular caller makes sense there.
Why do stack frames not tend to be able to read stack frames one scope up
They can access those memory locations; it just isn't supported by a number of popular languages.  In order to make sense of how to access variables belonging to the caller, we have to provide extra runtime support.  As I mentioned, Pascal provides a model for doing this; such variables are broadly referred to as "non-local" (see: https://en.wikipedia.org/wiki/Non-local_variable).

Related

Which memory locations to use for variable storage

Higher-level languages such as JavaScript don't give the programmer a choice as to where variables are stored. But C does. My question is: are there any guidelines as to where to store variables, e.g. depending on size, usage, etc.?
As far as I understand, there are three possible locations to store data (excluding the code segment used for actual code):
DATA segment
Stack
Heap
So transient small data items should be stored on the stack?
What about data items which must be shared between functions? These items could be stored on the heap or in the data segment. How do you decide which to choose?
You're looking through the wrong end of the telescope. You don't specify particular memory segments in which to store a variable (particularly since the very concept of a "memory segment" is highly platform-dependent).
In C code, you decide a variable's lifetime, visibility, and modifiability based on what makes sense for the code, and based on that the compiler will generate the machine code to store the object in the appropriate segment (if applicable).
For example, any variables declared at file scope (outside of any function) or with the keyword static will have static storage duration, meaning they are allocated at program startup and held until the program terminates; these objects may be allocated in a data segment or bss segment. Variables declared within a function or block without the static keyword have automatic storage duration, and are (typically) allocated on the stack.
String literals and other compile-time constant objects are often (but not always!) allocated in a readonly segment. Numeric literals like 3.14159 and character constants like 'A' are not objects, and do not (typically) have memory allocated for them; rather, those values are embedded directly in the machine code instructions.
The heap is reserved for dynamic storage, and variables as such are not stored there; instead, you use a library call like malloc to grab a chunk of the heap at runtime, and assign the resulting pointer value to a variable allocated as described above. The variable will live in either the stack or a data segment, while the memory it points to lives on the heap.
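The distinctions above can be put side by side in a few lines of C. This is only a sketch: the identifiers are hypothetical, and the placement noted in the comments is what is typical rather than anything the language guarantees.

```c
#include <assert.h>
#include <stdlib.h>

int file_scope_counter = 0;     /* static storage duration; typically data or bss */

int next_id(void) {
    static int last_id = 100;   /* static storage duration, block scope */
    int increment = 1;          /* automatic; typically stack or a register */
    last_id += increment;
    file_scope_counter++;
    return last_id;
}

/* The pointer p is an automatic variable; the int it points to is
   allocated storage, typically taken from the heap. */
int *make_heap_int(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* Reads the allocated object and releases it. */
int read_and_release(int *heap_value) {
    assert(heap_value != NULL);
    int value = *heap_value;
    free(heap_value);
    return value;
}
```

Two successive calls to next_id() return 101 and then 102, because last_id survives between calls even though it is only visible inside the function.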
Ideally, functions should communicate solely through parameters, return values, and exceptions (where applicable); functions should not share data through an external variable (i.e., a global). Function parameters are usually allocated on the stack, although some platforms may pass parameters via registers.
You should prefer local/stack variables to global or heap variables when those variables are small, used often and in a relatively small/limited scope. That will give the compiler more opportunities to optimize the code using them as it'll know they aren't going to change between function calls unless you pass around pointers to them.
Also, the stack is usually relatively small and allocating large structures or arrays on it may lead to stack overflows, especially so in recursive code.
Another thing to consider is the use of global variables in multithreaded programs. You want to minimize the chances of race conditions, and one strategy for that is making functions thread-safe and re-entrant by not using any global resources in them directly (if malloc() is thread-safe, if errno is per-thread, etc., you can use them, of course).
Btw, using local variables instead of global variables also improves code readability as the variables are located close to the place where they're used and you can quickly find out their type and where and how they're used.
Other than that, if your code is correct, there shouldn't be much practical difference between making variables local or global or in the heap (of course, malloc() can fail and you should remember about it:).
C only allows you to specify where data is stored indirectly... via the scope of the variable and/or allocation. i.e., a local variable to a function is typically a stack variable unless it is declared static in which case it will likely be DATA/BSS. Variables created dynamically via new/malloc will typically be heap.
However, there's no guarantee of any of that... only the implication of it.
That said, the one thing that is guaranteed to be a bad idea is to declare large local variables in functions... common source of strange errors and stack overflows. Very large arrays and structures are best suited to dynamic allocation and keep the pointers in local/global as required.
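A minimal sketch of that advice, assuming a hypothetical function sum_of_indices that needs a large scratch buffer: the buffer is obtained with malloc and only the pointer lives in a local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 + 1 + ... + (count - 1), or -1.0 if the allocation fails. */
double sum_of_indices(size_t count) {
    if (count == 0) {
        return 0.0;
    }
    /* Guard against overflow in the size computation below. */
    assert(count <= SIZE_MAX / sizeof(double));

    /* A local array such as "double samples[1024 * 1024];" could exceed a
       typical stack limit; the buffer is allocated dynamically instead. */
    double *samples = malloc(count * sizeof *samples);
    if (samples == NULL) {
        return -1.0;
    }

    for (size_t i = 0; i < count; i++) {
        samples[i] = (double)i;
    }

    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }

    free(samples);
    return sum;
}
```

The cost is that the allocation can fail and must be released explicitly, which is why the function checks the result of malloc and frees the buffer before returning.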

Can I say that in languages with Dynamic Type Binding all variable are allocated on a heap?

I am studying about the binding process and the classification of variable based on storage binding. So, I faced with four kind of variable:
Static variables: these are bound to memory before execution (i.e., during compilation) and remain bound throughout execution.
Stack-dynamic variables: these variables are statically bound to a type at compilation time, but they are not bound to a memory location until execution of the code reaches the declaration.
Explicit heap-dynamic variables: these variables are allocated and deallocated via explicit run-time, programmer-specified instructions. The heap, not the stack, is used to provide the required memory cells.
Implicit heap-dynamic variables: All the attributes for these variables, including memory cells, are bound when they are assigned a value.
My question is about types 2 and 4. In programming languages whose type binding is dynamic (PHP, Ruby, Python, ...), all variables appear to be of type 4.
Is that true? Are all variables, even local variables, put on the heap? Is this an implementation matter, or is it impossible to implement a language with dynamic type binding whose local variables are put on the stack and the others on the heap?
No. There is no correlation between typing and allocation. The first is a language feature, the second (usually) a detail of a specific implementation that may depend on specific optimisations and other factors. Some variables will not be "allocated" at all. In more high-level languages, it is even wrong to assume that there is any connection between variables and allocation at all -- they just name certain values in the program text.
The only relation types have with all of this is that they enable more interesting optimisations, or at least make them much easier.

Why C variables stored in specific memory locations?

Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer was:
Local Variables are stored in Stack.
Register variables are stored in Register
Global & static variables are stored in data segment.
The memory created dynamically are stored in Heap.
The next question he asked me was: why are they getting stored in those specific memory area? Why is the Local variable not getting stored in register (though I need an auto variable getting used very frequently in my program)? Or why global or static variables are not getting stored in stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope(areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not define where any variables are stored, actually. It does, however, define storage durations, of which three matter here: static, automatic, and allocated (dynamic).
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when they go out of scope; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and registers during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
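To make the point about recursion concrete, here is a small sketch with hypothetical names (depth_sum, calls_made, total_calls). Each invocation gets its own instance of the automatic variable, while the static variable exists exactly once for the whole program.

```c
#include <assert.h>

/* Static storage duration: a single instance shared by every invocation. */
static int calls_made = 0;

/* Returns depth + (depth - 1) + ... + 0. */
int depth_sum(int depth) {
    assert(depth >= 0);
    int local = depth;      /* automatic: a fresh instance per invocation */
    calls_made++;
    if (depth > 0) {
        int below = depth_sum(depth - 1);
        /* local still holds this invocation's value, untouched by the
           deeper invocations that had their own instances. */
        return local + below;
    }
    return local;
}

int total_calls(void) {
    return calls_made;
}
```

After depth_sum(3), which returns 6, total_calls() reports 4. Whether each instance of local sits in a stack slot or a register is up to the implementation; only the lifetimes are fixed by the language.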
It is actually just an implementation detail that is convenient.
The compiler could, if it wanted to, generate local variables on the heap.
It is just easier to create them on the stack since when leaving a function you can adjust the frame pointer with a simple add/subtract depending on the growth direction of the stack and so automatically free the used space for the next function. Creating locals on the heap however would mean more house-keeping work.
Another point is that local variables need not be created on the stack at all; they can be stored and used in a register alone if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are kept in registers in most cases; because registers are pushed to and popped from the stack when you make function calls, it looks as though they are on the stack.
There is actually no such thing as a register variable; register is just a rarely used C keyword that asks the compiler to try to keep the variable in a register. I think that most compilers just ignore this keyword.
That is why he asked you more: he was not sure whether you understood the topic deeply. The fact is that register variables are virtually on the stack.
In embedded systems we have different types of memory to use: read-only non-volatile (ROM, PROM), read-write non-volatile (EEPROM, NVRAM, flash, battery-backed SRAM), and volatile (RAM). We also have different requirements for our data: cannot change but must persist across power cycles, can change and must persist across power cycles, or can change at any time. We have different sections because we have to map those data requirements onto the available types of memory as well as possible.

static objects vs. stack- & heap- based objects

I came across the following definition:
A static object is one that exists from the time it is constructed and created until the end of the program. Stack- and Heap- based objects are thus excluded. Static objects are destroyed when the program exits, i.e. their destructors are called when main finishes executing.
Why are stack- and heap- based objects excluded???
Here is what I know about stacks and heaps: The stack is the part of the system memory where all the variables are stored before run-time. The heap is the part of the system memory where all the variables are stored during run-time, e.g. dynamically allocated memory. This means that if I declare an integer variable i in my code and assign the value of say 123 to it, then that will be stored in my stack, because the compiler knows the value during the compile time (before run-time). But if I define a pointer variable and want to initialize it somewhere else, then that will be stored in my heap, since it is unknown to the compiler at the compile time.
There are several storage durations:
Static → whole program lifetime
Automatic (stack) → until the end of the current function
Dynamic (heap) → until it gets explicitly ended (via delete)
"A static object is one that exists from the time it is constructed and created until the end of the program. Stack- and Heap- based objects are thus excluded."
Why are stack- and heap- based objects excluded???
They are "excluded" because they do not exist from the time they are constructed until the end of the program: a stack-based object ends with its function, and a heap-based object ends when it is explicitly destroyed.
None of this contradicts what you wrote / understand in your 2nd paragraph, though there may be nuances depending on the programming language that you are talking about.
What you've found is a poorly worded definition of static. Nothing more, nothing less.
In general, a static object is "created" by the compiler at compile time. Its behavior as to program exit is likely to be different across languages. For example, in C, there is no special handling at all (and AFAIK that's also true for Objective-C). Often these objects "live" in a read-only memory area that the compiler created and "attached" to the program. When the program is loaded into memory this read-only area is mapped into the program's memory which is a very fast operation. For example, all the static strings (as in printf("I'm a static string.");) in C are treated that way.
Then there's the stack, aka the call stack, as opposed to a stack. A stack in general is just a data structure, aka LIFO (last-in-first-out). The call stack is indeed created by the OS and is normally limited in size. It stores all the information that is necessary for a function call. That means that for each function call, its arguments and other info are "pushed" onto the stack (put on top of the stack) and a little space for the function's variables is reserved. Once the function returns, all this stuff is removed and only the return value is left (though even this is not always true; often the return value is passed in a CPU register).
You can store values on the stack, and languages like C++ even allow you to store objects on the stack. They "automatically" get cleaned up once their enclosing function returns.
You can also store a pointer to such an object living on the stack in another variable. But what you probably mean is that normally you create an object on the heap (e.g. via new in Java, C++, C#, etc. or alloc in Objective-C) and you get a pointer to that object in return.
Back to the start: static objects are known to the compiler at compile time, but everything that has to do with heap and stack is by definition only known at run time.

What's the difference between Pointers and Global Variables in C?

I'm reading The C Book to try and get a better foundation in C. While I think I'm generally getting the concept of pointers, one thing sticks out to me is that it seems like it's generalizing whatever it's pointing to into a global variable (e.g. the ability to use pointers to return values from void functions), which naturally carries with it all the attendant dangers, I assume.
Aside from the fact that a pointer references a specific variable or index in an array, what is the difference between a pointer and a global variable?
They're quite different beasts. To better explain, let me define both.
Pointers:
A variable holds some piece of data. A pointer is a type of data that refers to another piece of memory. Think of it as a sign that says "Over there ---->" pointing at an object of some sort. For example, strings in C are just a pointer to a character, and by convention, you know there's more characters following it until a \0 character. C uses pointers extensively, since there's no other mechanism for sharing common information between parts of the program, except for....
Global Variables:
In a program, you have variables in each function. These can be the parameters to the function, and ones defined inside. As well, you have what are known as global variables. These variables store information that all the functions in a file can access. This can be useful to pass things like a global state around, or configuration. For example, you might have one called debug that your code checks before printing some messages, or to store a global state object, like the score in a video game.
What I think is confusing you: Both can be used to share information between parts of code. Because function arguments are passed by value in C, a function can't modify the variables of what calls it. There are two ways to "fix" that problem. The first (and correct) way is to pass a pointer to the variable into the function. That way, the function knows where to modify the parent's variable.
Another approach is to just use a global variable. That way, instead of passing around pointers, they just edit the global variables directly.
So you can use both of them to accomplish the same thing, but how they work is quite separate. In fact, a global variable can be a pointer.
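The two approaches can be compared directly in a short sketch. The names score, add_points_global, add_points and play_round are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

int score = 0;  /* global: every function in the program can modify it */

/* Global-variable approach: the function edits the global directly. */
void add_points_global(int points) {
    score += points;
}

/* Pointer approach: the caller decides which variable gets modified. */
void add_points(int *target, int points) {
    assert(target != NULL);
    *target += points;
}

int play_round(void) {
    int round_score = 0;            /* local to this caller */
    add_points(&round_score, 5);    /* callee modifies the caller's variable */
    add_points(&round_score, 7);
    return round_score;
}
```

play_round() returns 12 and leaves score untouched, whereas add_points_global can only ever change that one global. Note that add_points(&score, 2) also works, which shows that a pointer may just as well refer to a global.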
A global variable is any variable that is accessible in any scope. A pointer is a variable that contains the address where something lives.
They aren't directly related to each other in any way.
A pointer variable can be in global or local scope and can also point to a variable that is in global, local, or no scope (as if it were coming off of the heap or addressing some DIO lines).
There's a huge difference. Aside from the "other" uses of pointers (which include dealing with strings and arrays, and building dynamic data structures like trees and linked lists), using a pointer to give another function access to a local variable is much more flexible and controlled than sharing a global variable between these two functions.
Firstly, it allows the called function to be provided access to different variables at different times. Think how much more laborious it would be to use scanf() if it always saved its results into the same global variables.
Secondly, passing a pointer to another function makes you much more aware of the fact that that function will be able to modify the object. If you use a global variable for the same purpose, it is easy to forget which functions modify the global and which do not.
Thirdly, global variables consume memory for the life of your program. Local variables are released when their containing function ends, and dynamically-allocated data is released when it is freed. So global variables can at times be a considerable waste of memory.
Using pointers leads to the danger of referring to variables that no longer exist, so care has to be taken. But this is most often a problem when there are complicated global or long-lived data structures which in itself is often a design weakness.
Globals just get in the way of good, modular program design and pointers often provide a better way to achieve the same things.
"Pointer" is a variable that tells you how to get to a value: it's the address of the value you care about. You dereference it (with *) to get to the value.
"Global" defines the scope of the variable: anywhere in the program can say the name and get the value.
You can have local pointers, or global non-pointers. The concepts are completely orthogonal.
The term pointer refers to a variable's type; it is a variable used to refer to another. The term global refers to a variable's scope, i.e. its visibility from any part of a program. Therefore the question is somewhat nonsensical, since the terms refer to different kinds of variable attribute; a pointer variable may in fact have global scope, and so have both attributes simultaneously.
While a pointer may indeed refer to an object that is not directly in scope (which is what I think you are referring to), it still allows restricted control of scope, because the pointer itself has scope (unless of course it is a global pointer!).
Moreover a global variable always has static storage class. Whereas a pointer may refer to a static, dynamic, or automatic variable, and because it is a variable, the pointer itself may be static, or auto, or in the case of a dynamically allocated array of pointers - dynamic also.
I think perhaps that you are considering only a very specific use of pointers when in fact they have far greater utility and can be used in many ways. For example, you would almost invariably use pointers to implement the links in a linked list data structure; a global variable will not help you do that.
Clifford
Completely different concepts. You can have pointers to both global and local variables. There's nothing associating the two.
Also, from a function, you can certainly return a pointer to a variable scoped within that function. But that's a bad idea since the variable existed on the function's stack and now that's gone.
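For illustration, the problematic pattern is shown below only as a comment, next to two common alternatives. The function names are hypothetical, and this is a sketch of the usual workarounds rather than the only options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Broken, so left as a comment: the local's lifetime ends when the
   function returns, and the returned pointer dangles.

   int *broken_answer(void) {
       int answer = 42;
       return &answer;
   }
*/

/* Alternative 1: allocate the object so that it outlives the call.
   The caller is responsible for calling free(). */
int *make_answer(void) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 42;
    }
    return p;
}

/* Alternative 2: let the caller own the storage and pass its address. */
void fill_answer(int *out) {
    assert(out != NULL);
    *out = 42;
}
```

With fill_answer, the variable lives in the caller's scope, so it is still valid after the call returns; with make_answer, the object stays valid until it is freed.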

Resources