Why did C never implement "stack extension"?

Why did C never implement "stack extension"? - c

Why did C never implement "stack extension" to allow (dynamically-sized) stack variables of a callee function to be referenced from the caller?
This could work by extending the caller's stack frame to include the "dynamically-returned" variables from the callee's stack frame. (You could, but shouldn't, implement this with alloca from the caller - it may not survive optimisation.)
e.g. If I wanted to return the dynamically-size string "e", the implementation could be:
--+---+-----+
| a | b |
--+---+-----+
callee(d);
--+---+-----+---------+---+
| a | b | junk | d |
--+---+-----+---------+---+
char e[calculated_size];
--+---+-----+---------+---+---------+
| a | b | junk | d | e |
--+---+-----+---------+---+---------+
dynamic_return e;
--+---+-----+-------------+---------+
| a | b | waste | e |
--+---+-----+-------------+---------+
("Junk" contains the return address and other system-specific metadata which is invisible to the program.)
This would waste a little stack space, when used.
The up-side is a simplification of string processing, and any other functions which have to currently malloc ram, return pointers and hope that the caller remembers to free at the right time.
Obviously, there is no point in added such a feature to C at this stage of its life, I'm just interested in why this wasn't a good idea.

A new object may be returned through many layers of software. So the wasted space may be that from dozens or even hundreds of function calls.
Consider also a routine that performs some iterative task. In each iteration, it gets some newly allocated object from a subroutine, which it inserts into a linked list or other data structure. Such iterative tasks may repeat for hundreds, thousands, or millions of iterations. The stack will overflow with wasted space.

Some objections to your idea. Some have been mentioned already in comments. Some come from the top of my head.
C doesn't have stacks or stack frames. C simply defines scopes and their life times and it is left to implementations as to how to implement the standard. Stacks and stack frames are really just the most popular way to implement some C semantics.
C doesn't have strings. C doesn't really have arrays as such. Well, it does have arrays, but as soon as you mention an array in an expression (e.g. a return expression), the array decays to a pointer to its first element. Returning a "string" or an array on the stack would involve significant impact on well established areas of the language.
C does have structs. However, you can already return a struct. I can't tell you how its done, because it is an implementation detail.
A problem with your specific implementation is that the caller has to know how big the "waste" is. Don't forget that the waste will include the stack frame of the callee but also the waste from any functions the callee calls either directly or indirectly. The returning convention will have to include information on the size of the waste and a pointer to the return value.
Stacks, as a rule, are quite limited compared to heap memory, particularly in applications that use threading. At some point the caller will need to move the returned array down into its own stack frame. If the array was merely a pointer to storage in the heap, this would be much more efficient, but then you've got the existing model.

You have to realize, that the implementation of the stack is strongly dictated by the CPU and the OS kernel. The language does not have much say in this. Limitiations are, for instance:
The ret instruction of the X86 architecture expects the return address at the memory location stored in the stack pointer. Thus, there cannot be anything else on top (semantical top - usually this is the lowest address, as stacks tend to grow down). You could work around this, of course, but that would likely incur additional overheads which C programmers are not going to be willing to pay.
The stack pointer defines what part of the allocated stack memory is actually used. When control flow is changed asynchronously (hardware interrupt), the current CPU's registers are generally immediately stored to memory addresses below the stack pointer by the interrupt handler. This can happen at any time, even throughout most of the kernel code. Any data stored below the place where the stack pointer point to would be clobbered by this. (Well, technically, that's not fully correct, there is generally a "red zone" below the stack pointer to which the interrupt handlers may not write any data. But here we are getting very firmly into architectural design peculiarities.)
Destroying a stack frame is generally a single addition of a constant to the stack pointer. This is the fastest kind of instruction you can get, it will generally not require a single cycle to execute (it will execute in parallel to some memory access). If the stack frame has a dynamic size, the stack frame must be destroyed by loading the stack pointer from memory, and for that a base pointer must have been retained. That's a memory access with a significant latency, and another register that must be saved to be used. Again, this is overhead that's generally unnecessary.
Your proposal would definitely be implementable, but it would require some workarounds. And these workarounds would generally cost performance. Small bits of performance, but definitely measurable amounts. That's not what compiler/kernel developers want, and for good reason.

Related

When to really use heap memory [duplicate]

What are the stack and heap?
Where are they located physically in a computer's memory?
To what extent are they controlled by the OS or language run-time?
What is their scope?
What determines their sizes?
What makes one faster?

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.
The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).
To answer your questions directly:
To what extent are they controlled by the OS or language runtime?
The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
What is their scope?
The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.
What determines the size of each of them?
The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
What makes one faster?
The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be - typically - synchronized with "all" other heap accesses in the program.
A clear demonstration:
Image source: vikashazrati.wordpress.com

Stack:
Stored in computer RAM just like the heap.
Variables created on the stack will go out of scope and are automatically deallocated.
Much faster to allocate in comparison to variables on the heap.
Implemented with an actual stack data structure.
Stores local data, return addresses, used for parameter passing.
Can have a stack overflow when too much of the stack is used (mostly from infinite or too deep recursion, very large allocations).
Data created on the stack can be used without pointers.
You would use the stack if you know exactly how much data you need to allocate before compile time and it is not too big.
Usually has a maximum size already determined when your program starts.
Heap:
Stored in computer RAM just like the stack.
In C++, variables on the heap must be destroyed manually and never fall out of scope. The data is freed with delete, delete[], or free.
Slower to allocate in comparison to variables on the stack.
Used on demand to allocate a block of data for use by the program.
Can have fragmentation when there are a lot of allocations and deallocations.
In C++ or C, data created on the heap will be pointed to by pointers and allocated with new or malloc respectively.
Can have allocation failures if too big of a buffer is requested to be allocated.
You would use the heap if you don't know exactly how much data you will need at run time or if you need to allocate a lot of data.
Responsible for memory leaks.
Example:
int foo()
{
char *pBuffer; //<--nothing allocated yet (excluding the pointer itself, which is allocated here on the stack).
bool b = true; // Allocated on the stack.
if(b)
{
//Create 500 bytes on the stack
char buffer[500];
//Create 500 bytes on the heap
pBuffer = new char[500];
}//<-- buffer is deallocated here, pBuffer is not
}//<--- oops there's a memory leak, I should have called delete[] pBuffer;

The most important point is that heap and stack are generic terms for ways in which memory can be allocated. They can be implemented in many different ways, and the terms apply to the basic concepts.
In a stack of items, items sit one on top of the other in the order they were placed there, and you can only remove the top one (without toppling the whole thing over).
The simplicity of a stack is that you do not need to maintain a table containing a record of each section of allocated memory; the only state information you need is a single pointer to the end of the stack. To allocate and de-allocate, you just increment and decrement that single pointer. Note: a stack can sometimes be implemented to start at the top of a section of memory and extend downwards rather than growing upwards.
In a heap, there is no particular order to the way items are placed. You can reach in and remove items in any order because there is no clear 'top' item.
Heap allocation requires maintaining a full record of what memory is allocated and what isn't, as well as some overhead maintenance to reduce fragmentation, find contiguous memory segments big enough to fit the requested size, and so on. Memory can be deallocated at any time leaving free space. Sometimes a memory allocator will perform maintenance tasks such as defragmenting memory by moving allocated memory around, or garbage collecting - identifying at runtime when memory is no longer in scope and deallocating it.
These images should do a fairly good job of describing the two ways of allocating and freeing memory in a stack and a heap. Yum!
To what extent are they controlled by the OS or language runtime?
As mentioned, heap and stack are general terms, and can be implemented in many ways. Computer programs typically have a stack called a call stack which stores information relevant to the current function such as a pointer to whichever function it was called from, and any local variables. Because functions call other functions and then return, the stack grows and shrinks to hold information from the functions further down the call stack. A program doesn't really have runtime control over it; it's determined by the programming language, OS and even the system architecture.
A heap is a general term used for any memory that is allocated dynamically and randomly; i.e. out of order. The memory is typically allocated by the OS, with the application calling API functions to do this allocation. There is a fair bit of overhead required in managing dynamically allocated memory, which is usually handled by the runtime code of the programming language or environment used.
What is their scope?
The call stack is such a low level concept that it doesn't relate to 'scope' in the sense of programming. If you disassemble some code you'll see relative pointer style references to portions of the stack, but as far as a higher level language is concerned, the language imposes its own rules of scope. One important aspect of a stack, however, is that once a function returns, anything local to that function is immediately freed from the stack. That works the way you'd expect it to work given how your programming languages work. In a heap, it's also difficult to define. The scope is whatever is exposed by the OS, but your programming language probably adds its rules about what a "scope" is in your application. The processor architecture and the OS use virtual addressing, which the processor translates to physical addresses and there are page faults, etc. They keep track of what pages belong to which applications. You never really need to worry about this, though, because you just use whatever method your programming language uses to allocate and free memory, and check for errors (if the allocation/freeing fails for any reason).
What determines the size of each of them?
Again, it depends on the language, compiler, operating system and architecture. A stack is usually pre-allocated, because by definition it must be contiguous memory. The language compiler or the OS determine its size. You don't store huge chunks of data on the stack, so it'll be big enough that it should never be fully used, except in cases of unwanted endless recursion (hence, "stack overflow") or other unusual programming decisions.
A heap is a general term for anything that can be dynamically allocated. Depending on which way you look at it, it is constantly changing size. In modern processors and operating systems the exact way it works is very abstracted anyway, so you don't normally need to worry much about how it works deep down, except that (in languages where it lets you) you mustn't use memory that you haven't allocated yet or memory that you have freed.
What makes one faster?
The stack is faster because all free memory is always contiguous. No list needs to be maintained of all the segments of free memory, just a single pointer to the current top of the stack. Compilers usually store this pointer in a special, fast register for this purpose. What's more, subsequent operations on a stack are usually concentrated within very nearby areas of memory, which at a very low level is good for optimization by the processor on-die caches.

(I have moved this answer from another question that was more or less a dupe of this one.)
The answer to your question is implementation specific and may vary across compilers and processor architectures. However, here is a simplified explanation.
Both the stack and the heap are memory areas allocated from the underlying operating system (often virtual memory that is mapped to physical memory on demand).
In a multi-threaded environment each thread will have its own completely independent stack but they will share the heap. Concurrent access has to be controlled on the heap and is not possible on the stack.
The heap
The heap contains a linked list of used and free blocks. New allocations on the heap (by new or malloc) are satisfied by creating a suitable block from one of the free blocks. This requires updating the list of blocks on the heap. This meta information about the blocks on the heap is also stored on the heap often in a small area just in front of every block.
As the heap grows new blocks are often allocated from lower addresses towards higher addresses. Thus you can think of the heap as a heap of memory blocks that grows in size as memory is allocated. If the heap is too small for an allocation the size can often be increased by acquiring more memory from the underlying operating system.
Allocating and deallocating many small blocks may leave the heap in a state where there are a lot of small free blocks interspersed between the used blocks. A request to allocate a large block may fail because none of the free blocks are large enough to satisfy the allocation request even though the combined size of the free blocks may be large enough. This is called heap fragmentation.
When a used block that is adjacent to a free block is deallocated the new free block may be merged with the adjacent free block to create a larger free block effectively reducing the fragmentation of the heap.
The stack
The stack often works in close tandem with a special register on the CPU named the stack pointer. Initially the stack pointer points to the top of the stack (the highest address on the stack).
The CPU has special instructions for pushing values onto the stack and popping them off the stack. Each push stores the value at the current location of the stack pointer and decreases the stack pointer. A pop retrieves the value pointed to by the stack pointer and then increases the stack pointer (don't be confused by the fact that adding a value to the stack decreases the stack pointer and removing a value increases it. Remember that the stack grows to the bottom). The values stored and retrieved are the values of the CPU registers.
If a function has parameters, these are pushed onto the stack before the call to the function. The code in the function is then able to navigate up the stack from the current stack pointer to locate these values.
When a function is called the CPU uses special instructions that push the current instruction pointer onto the stack, i.e. the address of the code executing on the stack. The CPU then jumps to the function by setting the instruction pointer to the address of the function called. Later, when the function returns, the old instruction pointer is popped off the stack and execution resumes at the code just after the call to the function.
When a function is entered, the stack pointer is decreased to allocate more space on the stack for local (automatic) variables. If the function has one local 32 bit variable four bytes are set aside on the stack. When the function returns, the stack pointer is moved back to free the allocated area.
Nesting function calls work like a charm. Each new call will allocate function parameters, the return address and space for local variables and these activation records can be stacked for nested calls and will unwind in the correct way when the functions return.
As the stack is a limited block of memory, you can cause a stack overflow by calling too many nested functions and/or allocating too much space for local variables. Often the memory area used for the stack is set up in such a way that writing below the bottom (the lowest address) of the stack will trigger a trap or exception in the CPU. This exceptional condition can then be caught by the runtime and converted into some kind of stack overflow exception.
Can a function be allocated on the heap instead of a stack?
No, activation records for functions (i.e. local or automatic variables) are allocated on the stack that is used not only to store these variables, but also to keep track of nested function calls.
How the heap is managed is really up to the runtime environment. C uses malloc and C++ uses new, but many other languages have garbage collection.
However, the stack is a more low-level feature closely tied to the processor architecture. Growing the heap when there is not enough space isn't too hard since it can be implemented in the library call that handles the heap. However, growing the stack is often impossible as the stack overflow only is discovered when it is too late; and shutting down the thread of execution is the only viable option.

In the following C# code
public void Method1()
{
int i = 4;
int y = 2;
class1 cls1 = new class1();
}
Here's how the memory is managed
Local Variables that only need to last as long as the function invocation go in the stack. The heap is used for variables whose lifetime we don't really know up front but we expect them to last a while. In most languages it's critical that we know at compile time how large a variable is if we want to store it on the stack.
Objects (which vary in size as we update them) go on the heap because we don't know at creation time how long they are going to last. In many languages the heap is garbage collected to find objects (such as the cls1 object) that no longer have any references.
In Java, most objects go directly into the heap. In languages like C / C++, structs and classes can often remain on the stack when you're not dealing with pointers.
More information can be found here:
The difference between stack and heap memory allocation « timmurphy.org
and here:
Creating Objects on the Stack and Heap
This article is the source of picture above: Six important .NET concepts: Stack, heap, value types, reference types, boxing, and unboxing - CodeProject
but be aware it may contain some inaccuracies.

Other answers just avoid explaining what static allocation means. So I will explain the three main forms of allocation and how they usually relate to the heap, stack, and data segment below. I also will show some examples in both C/C++ and Python to help people understand.
"Static" (AKA statically allocated) variables are not allocated on the stack. Do not assume so - many people do only because "static" sounds a lot like "stack". They actually exist in neither the stack nor the heap. They are part of what's called the data segment.
However, it is generally better to consider "scope" and "lifetime" rather than "stack" and "heap".
Scope refers to what parts of the code can access a variable. Generally we think of local scope (can only be accessed by the current function) versus global scope (can be accessed anywhere) although scope can get much more complex.
Lifetime refers to when a variable is allocated and deallocated during program execution. Usually we think of static allocation (variable will persist through the entire duration of the program, making it useful for storing the same information across several function calls) versus automatic allocation (variable only persists during a single call to a function, making it useful for storing information that is only used during your function and can be discarded once you are done) versus dynamic allocation (variables whose duration is defined at runtime, instead of compile time like static or automatic).
Although most compilers and interpreters implement this behavior similarly in terms of using stacks, heaps, etc, a compiler may sometimes break these conventions if it wants as long as behavior is correct. For instance, due to optimization a local variable may only exist in a register or be removed entirely, even though most local variables exist in the stack. As has been pointed out in a few comments, you are free to implement a compiler that doesn't even use a stack or a heap, but instead some other storage mechanisms (rarely done, since stacks and heaps are great for this).
I will provide some simple annotated C code to illustrate all of this. The best way to learn is to run a program under a debugger and watch the behavior. If you prefer to read python, skip to the end of the answer :)
// Statically allocated in the data segment when the program/DLL is first loaded
// Deallocated when the program/DLL exits
// scope - can be accessed from anywhere in the code
int someGlobalVariable;
// Statically allocated in the data segment when the program is first loaded
// Deallocated when the program/DLL exits
// scope - can be accessed from anywhere in this particular code file
static int someStaticVariable;
// "someArgument" is allocated on the stack each time MyFunction is called
// "someArgument" is deallocated when MyFunction returns
// scope - can be accessed only within MyFunction()
void MyFunction(int someArgument) {
// Statically allocated in the data segment when the program is first loaded
// Deallocated when the program/DLL exits
// scope - can be accessed only within MyFunction()
static int someLocalStaticVariable;
// Allocated on the stack each time MyFunction is called
// Deallocated when MyFunction returns
// scope - can be accessed only within MyFunction()
int someLocalVariable;
// A *pointer* is allocated on the stack each time MyFunction is called
// This pointer is deallocated when MyFunction returns
// scope - the pointer can be accessed only within MyFunction()
int* someDynamicVariable;
// This line causes space for an integer to be allocated in the heap
// when this line is executed. Note this is not at the beginning of
// the call to MyFunction(), like the automatic variables
// scope - only code within MyFunction() can access this space
// *through this particular variable*.
// However, if you pass the address somewhere else, that code
// can access it too
someDynamicVariable = new int;
// This line deallocates the space for the integer in the heap.
// If we did not write it, the memory would be "leaked".
// Note a fundamental difference between the stack and heap
// the heap must be managed. The stack is managed for us.
delete someDynamicVariable;
// In other cases, instead of deallocating this heap space you
// might store the address somewhere more permanent to use later.
// Some languages even take care of deallocation for you... but
// always it needs to be taken care of at runtime by some mechanism.
// When the function returns, someArgument, someLocalVariable
// and the pointer someDynamicVariable are deallocated.
// The space pointed to by someDynamicVariable was already
// deallocated prior to returning.
return;
}
// Note that someGlobalVariable, someStaticVariable and
// someLocalStaticVariable continue to exist, and are not
// deallocated until the program exits.
A particularly poignant example of why it's important to distinguish between lifetime and scope is that a variable can have local scope but static lifetime - for instance, "someLocalStaticVariable" in the code sample above. Such variables can make our common but informal naming habits very confusing. For instance when we say "local" we usually mean "locally scoped automatically allocated variable" and when we say global we usually mean "globally scoped statically allocated variable". Unfortunately when it comes to things like "file scoped statically allocated variables" many people just say... "huh???".
Some of the syntax choices in C/C++ exacerbate this problem - for instance many people think global variables are not "static" because of the syntax shown below.
int var1; // Has global scope and static allocation
static int var2; // Has file scope and static allocation
int main() {return 0;}
Note that putting the keyword "static" in the declaration above prevents var2 from having global scope. Nevertheless, the global var1 has static allocation. This is not intuitive! For this reason, I try to never use the word "static" when describing scope, and instead say something like "file" or "file limited" scope. However many people use the phrase "static" or "static scope" to describe a variable that can only be accessed from one code file. In the context of lifetime, "static" always means the variable is allocated at program start and deallocated when program exits.
Some people think of these concepts as C/C++ specific. They are not. For instance, the Python sample below illustrates all three types of allocation (there are some subtle differences possible in interpreted languages that I won't get into here).
from datetime import datetime
class Animal:
_FavoriteFood = 'Undefined' # _FavoriteFood is statically allocated
def PetAnimal(self):
curTime = datetime.time(datetime.now()) # curTime is automatically allocatedion
print("Thank you for petting me. But it's " + str(curTime) + ", you should feed me. My favorite food is " + self._FavoriteFood)
class Cat(Animal):
_FavoriteFood = 'tuna' # Note since we override, Cat class has its own statically allocated _FavoriteFood variable, different from Animal's
class Dog(Animal):
_FavoriteFood = 'steak' # Likewise, the Dog class gets its own static variable. Important to note - this one static variable is shared among all instances of Dog, hence it is not dynamic!
if __name__ == "__main__":
whiskers = Cat() # Dynamically allocated
fido = Dog() # Dynamically allocated
rinTinTin = Dog() # Dynamically allocated
whiskers.PetAnimal()
fido.PetAnimal()
rinTinTin.PetAnimal()
Dog._FavoriteFood = 'milkbones'
whiskers.PetAnimal()
fido.PetAnimal()
rinTinTin.PetAnimal()
# Output is:
# Thank you for petting me. But it's 13:05:02.255000, you should feed me. My favorite food is tuna
# Thank you for petting me. But it's 13:05:02.255000, you should feed me. My favorite food is steak
# Thank you for petting me. But it's 13:05:02.255000, you should feed me. My favorite food is steak
# Thank you for petting me. But it's 13:05:02.255000, you should feed me. My favorite food is tuna
# Thank you for petting me. But it's 13:05:02.255000, you should feed me. My favorite food is milkbones
# Thank you for petting me. But it's 13:05:02.256000, you should feed me. My favorite food is milkbones

The Stack
When you call a function the arguments to that function plus some other overhead is put on the stack. Some info (such as where to go on return) is also stored there.
When you declare a variable inside your function, that variable is also allocated on the stack.
Deallocating the stack is pretty simple because you always deallocate in the reverse order in which you allocate. Stack stuff is added as you enter functions, the corresponding data is removed as you exit them. This means that you tend to stay within a small region of the stack unless you call lots of functions that call lots of other functions (or create a recursive solution).
The Heap
The heap is a generic name for where you put the data that you create on the fly. If you don't know how many spaceships your program is going to create, you are likely to use the new (or malloc or equivalent) operator to create each spaceship. This allocation is going to stick around for a while, so it is likely we will free things in a different order than we created them.
Thus, the heap is far more complex, because there end up being regions of memory that are unused interleaved with chunks that are - memory gets fragmented. Finding free memory of the size you need is a difficult problem. This is why the heap should be avoided (though it is still often used).
Implementation
Implementation of both the stack and heap is usually down to the runtime / OS. Often games and other applications that are performance critical create their own memory solutions that grab a large chunk of memory from the heap and then dish it out internally to avoid relying on the OS for memory.
This is only practical if your memory usage is quite different from the norm - i.e for games where you load a level in one huge operation and can chuck the whole lot away in another huge operation.
Physical location in memory
This is less relevant than you think because of a technology called Virtual Memory which makes your program think that you have access to a certain address where the physical data is somewhere else (even on the hard disc!). The addresses you get for the stack are in increasing order as your call tree gets deeper. The addresses for the heap are un-predictable (i.e implimentation specific) and frankly not important.

Others have answered the broad strokes pretty well, so I'll throw in a few details.
Stack and heap need not be singular. A common situation in which you have more than one stack is if you have more than one thread in a process. In this case each thread has its own stack. You can also have more than one heap, for example some DLL configurations can result in different DLLs allocating from different heaps, which is why it's generally a bad idea to release memory allocated by a different library.
In C you can get the benefit of variable length allocation through the use of alloca, which allocates on the stack, as opposed to alloc, which allocates on the heap. This memory won't survive your return statement, but it's useful for a scratch buffer.
Making a huge temporary buffer on Windows that you don't use much of is not free. This is because the compiler will generate a stack probe loop that is called every time your function is entered to make sure the stack exists (because Windows uses a single guard page at the end of your stack to detect when it needs to grow the stack. If you access memory more than one page off the end of the stack you will crash). Example:
void myfunction()
{
char big[10000000];
// Do something that only uses for first 1K of big 99% of the time.
}

Others have directly answered your question, but when trying to understand the stack and the heap, I think it is helpful to consider the memory layout of a traditional UNIX process (without threads and mmap()-based allocators). The Memory Management Glossary web page has a diagram of this memory layout.
The stack and heap are traditionally located at opposite ends of the process's virtual address space. The stack grows automatically when accessed, up to a size set by the kernel (which can be adjusted with setrlimit(RLIMIT_STACK, ...)). The heap grows when the memory allocator invokes the brk() or sbrk() system call, mapping more pages of physical memory into the process's virtual address space.
In systems without virtual memory, such as some embedded systems, the same basic layout often applies, except the stack and heap are fixed in size. However, in other embedded systems (such as those based on Microchip PIC microcontrollers), the program stack is a separate block of memory that is not addressable by data movement instructions, and can only be modified or read indirectly through program flow instructions (call, return, etc.). Other architectures, such as Intel Itanium processors, have multiple stacks. In this sense, the stack is an element of the CPU architecture.

What is a stack?
A stack is a pile of objects, typically one that is neatly arranged.
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out manner.
In a multi-threaded application, each thread will have its own stack.
What is a heap?
A heap is an untidy collection of things piled up haphazardly.
In computing architectures the heap is an area of dynamically-allocated memory that is managed automatically by the operating system or the memory manager library.
Memory on the heap is allocated, deallocated, and resized regularly during program execution, and this can lead to a problem called fragmentation.
Fragmentation occurs when memory objects are allocated with small spaces in between that are too small to hold additional memory objects.
The net result is a percentage of the heap space that is not usable for further memory allocations.
Both together
In a multi-threaded application, each thread will have its own stack. But, all the different threads will share the heap.
Because the different threads share the heap in a multi-threaded application, this also means that there has to be some coordination between the threads so that they don’t try to access and manipulate the same piece(s) of memory in the heap at the same time.
Which is faster – the stack or the heap? And why?
The stack is much faster than the heap.
This is because of the way that memory is allocated on the stack.
Allocating memory on the stack is as simple as moving the stack pointer up.
For people new to programming, it’s probably a good idea to use the stack since it’s easier.
Because the stack is small, you would want to use it when you know exactly how much memory you will need for your data, or if you know the size of your data is very small.
It’s better to use the heap when you know that you will need a lot of memory for your data, or you just are not sure how much memory you will need (like with a dynamic array).
Java Memory Model
The stack is the area of memory where local variables (including method parameters) are stored. When it comes to object variables, these are merely references (pointers) to the actual objects on the heap.
Every time an object is instantiated, a chunk of heap memory is set aside to hold the data (state) of that object. Since objects can contain other objects, some of this data can in fact hold references to those nested objects.

The stack is a portion of memory that can be manipulated via several key assembly language instructions, such as 'pop' (remove and return a value from the stack) and 'push' (push a value to the stack), but also call (call a subroutine - this pushes the address to return to the stack) and return (return from a subroutine - this pops the address off of the stack and jumps to it). It's the region of memory below the stack pointer register, which can be set as needed. The stack is also used for passing arguments to subroutines, and also for preserving the values in registers before calling subroutines.
The heap is a portion of memory that is given to an application by the operating system, typically through a syscall like malloc. On modern OSes this memory is a set of pages that only the calling process has access to.
The size of the stack is determined at runtime, and generally does not grow after the program launches. In a C program, the stack needs to be large enough to hold every variable declared within each function. The heap will grow dynamically as needed, but the OS is ultimately making the call (it will often grow the heap by more than the value requested by malloc, so that at least some future mallocs won't need to go back to the kernel to get more memory. This behavior is often customizable)
Because you've allocated the stack before launching the program, you never need to malloc before you can use the stack, so that's a slight advantage there. In practice, it's very hard to predict what will be fast and what will be slow in modern operating systems that have virtual memory subsystems, because how the pages are implemented and where they are stored is an implementation detail.

I think many other people have given you mostly correct answers on this matter.
One detail that has been missed, however, is that the "heap" should in fact probably be called the "free store". The reason for this distinction is that the original free store was implemented with a data structure known as a "binomial heap." For that reason, allocating from early implementations of malloc()/free() was allocation from a heap. However, in this modern day, most free stores are implemented with very elaborate data structures that are not binomial heaps.

You can do some interesting things with the stack. For instance, you have functions like alloca (assuming you can get past the copious warnings concerning its use), which is a form of malloc that specifically uses the stack, not the heap, for memory.
That said, stack-based memory errors are some of the worst I've experienced. If you use heap memory, and you overstep the bounds of your allocated block, you have a decent chance of triggering a segment fault. (Not 100%: your block may be incidentally contiguous with another that you have previously allocated.) But since variables created on the stack are always contiguous with each other, writing out of bounds can change the value of another variable. I have learned that whenever I feel that my program has stopped obeying the laws of logic, it is probably buffer overflow.

Simply, the stack is where local variables get created. Also, every time you call a subroutine the program counter (pointer to the next machine instruction) and any important registers, and sometimes the parameters get pushed on the stack. Then any local variables inside the subroutine are pushed onto the stack (and used from there). When the subroutine finishes, that stuff all gets popped back off the stack. The PC and register data gets and put back where it was as it is popped, so your program can go on its merry way.
The heap is the area of memory dynamic memory allocations are made out of (explicit "new" or "allocate" calls). It is a special data structure that can keep track of blocks of memory of varying sizes and their allocation status.
In "classic" systems RAM was laid out such that the stack pointer started out at the bottom of memory, the heap pointer started out at the top, and they grew towards each other. If they overlap, you are out of RAM. That doesn't work with modern multi-threaded OSes though. Every thread has to have its own stack, and those can get created dynamicly.

From WikiAnwser.
Stack
When a function or a method calls another function which in turns calls another function, etc., the execution of all those functions remains suspended until the very last function returns its value.
This chain of suspended function calls is the stack, because elements in the stack (function calls) depend on each other.
The stack is important to consider in exception handling and thread executions.
Heap
The heap is simply the memory used by programs to store variables.
Element of the heap (variables) have no dependencies with each other and can always be accessed randomly at any time.

Stack
Very fast access
Don't have to explicitly de-allocate variables
Space is managed efficiently by CPU, memory will not become fragmented
Local variables only
Limit on stack size (OS-dependent)
Variables cannot be resized
Heap
Variables can be accessed globally
No limit on memory size
(Relatively) slower access
No guaranteed efficient use of space, memory may become fragmented over time as blocks of memory are allocated, then freed
You must manage memory (you're in charge of allocating and freeing variables)
Variables can be resized using realloc()

In Short
A stack is used for static memory allocation and a heap for dynamic memory allocation, both stored in the computer's RAM.
In Detail
The Stack
The stack is a "LIFO" (last in, first out) data structure, that is managed and optimized by the CPU quite closely. Every time a function declares a new variable, it is "pushed" onto the stack. Then every time a function exits, all of the variables pushed onto the stack by that function, are freed (that is to say, they are deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables.
The advantage of using the stack to store variables, is that memory is managed for you. You don't have to allocate memory by hand, or free it once you don't need it any more. What's more, because the CPU organizes stack memory so efficiently, reading from and writing to stack variables is very fast.
More can be found here.
The Heap
The heap is a region of your computer's memory that is not managed automatically for you, and is not as tightly managed by the CPU. It is a more free-floating region of memory (and is larger). To allocate memory on the heap, you must use malloc() or calloc(), which are built-in C functions. Once you have allocated memory on the heap, you are responsible for using free() to deallocate that memory once you don't need it any more.
If you fail to do this, your program will have what is known as a memory leak. That is, memory on the heap will still be set aside (and won't be available to other processes). As we will see in the debugging section, there is a tool called Valgrind that can help you detect memory leaks.
Unlike the stack, the heap does not have size restrictions on variable size (apart from the obvious physical limitations of your computer). Heap memory is slightly slower to be read from and written to, because one has to use pointers to access memory on the heap. We will talk about pointers shortly.
Unlike the stack, variables created on the heap are accessible by any function, anywhere in your program. Heap variables are essentially global in scope.
More can be found here.
Variables allocated on the stack are stored directly to the memory and access to this memory is very fast, and its allocation is dealt with when the program is compiled. When a function or a method calls another function which in turns calls another function, etc., the execution of all those functions remains suspended until the very last function returns its value. The stack is always reserved in a LIFO order, the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack, freeing a block from the stack is nothing more than adjusting one pointer.
Variables allocated on the heap have their memory allocated at run time and accessing this memory is a bit slower, but the heap size is only limited by the size of virtual memory. Elements of the heap have no dependencies with each other and can always be accessed randomly at any time. You can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time.
You can use the stack if you know exactly how much data you need to allocate before compile time, and it is not too big. You can use the heap if you don't know exactly how much data you will need at runtime or if you need to allocate a lot of data.
In a multi-threaded situation each thread will have its own completely independent stack, but they will share the heap. The stack is thread specific and the heap is application specific. The stack is important to consider in exception handling and thread executions.
Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).
At run-time, if the application needs more heap, it can allocate memory from free memory and if the stack needs memory, it can allocate memory from free memory allocated memory for the application.
Even, more detail is given here and here.
Now come to your question's answers.
To what extent are they controlled by the OS or language runtime?
The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
More can be found here.
What is their scope?
Already given in top.
"You can use the stack if you know exactly how much data you need to allocate before compile time, and it is not too big. You can use the heap if you don't know exactly how much data you will need at runtime or if you need to allocate a lot of data."
More can be found in here.
What determines the size of each of them?
The size of the stack is set by OS when a thread is created. The size of the heap is set on application startup, but it can grow as space is needed (the allocator requests more memory from the operating system).
What makes one faster?
Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools, you can get comparable performance out of heap allocation, but that comes with a slight added complexity and its own headaches.
Also, stack vs. heap is not only a performance consideration; it also tells you a lot about the expected lifetime of objects.
Details can be found from here.

OK, simply and in short words, they mean ordered and not ordered...!
Stack: In stack items, things get on the top of each-other, means gonna be faster and more efficient to be processed!...
So there is always an index to point the specific item, also processing gonna be faster, there is relationship between the items as well!...
Heap: No order, processing gonna be slower and values are messed up together with no specific order or index... there are random and there is no relationship between them... so execution and usage time could be vary...
I also create the image below to show how they may look like:

stack, heap and data of each process in virtual memory:

In the 1980s, UNIX propagated like bunnies with big companies rolling their own.
Exxon had one as did dozens of brand names lost to history.
How memory was laid out was at the discretion of the many implementors.
A typical C program was laid out flat in memory with
an opportunity to increase by changing the brk() value.
Typically, the HEAP was just below this brk value
and increasing brk increased the amount of available heap.
The single STACK was typically an area below HEAP which was a tract of memory
containing nothing of value until the top of the next fixed block of memory.
This next block was often CODE which could be overwritten by stack data
in one of the famous hacks of its era.
One typical memory block was BSS (a block of zero values)
which was accidentally not zeroed in one manufacturer's offering.
Another was DATA containing initialized values, including strings and numbers.
A third was CODE containing CRT (C runtime), main, functions, and libraries.
The advent of virtual memory in UNIX changes many of the constraints.
There is no objective reason why these blocks need be contiguous,
or fixed in size, or ordered a particular way now.
Of course, before UNIX was Multics which didn't suffer from these constraints.
Here is a schematic showing one of the memory layouts of that era.

A couple of cents: I think, it will be good to draw memory graphical and more simple:
Arrows - show where grow stack and heap, process stack size have limit, defined in OS, thread stack size limits by parameters in thread create API usually. Heap usually limiting by process maximum virtual memory size, for 32 bit 2-4 GB for example.
So simple way: process heap is general for process and all threads inside, using for memory allocation in common case with something like malloc().
Stack is quick memory for store in common case function return pointers and variables, processed as parameters in function call, local function variables.

Since some answers went nitpicking, I'm going to contribute my mite.
Surprisingly, no one has mentioned that multiple (i.e. not related to the number of running OS-level threads) call stacks are to be found not only in exotic languages (PostScript) or platforms (Intel Itanium), but also in fibers, green threads and some implementations of coroutines.
Fibers, green threads and coroutines are in many ways similar, which leads to much confusion. The difference between fibers and green threads is that the former use cooperative multitasking, while the latter may feature either cooperative or preemptive one (or even both). For the distinction between fibers and coroutines, see here.
In any case, the purpose of both fibers, green threads and coroutines is having multiple functions executing concurrently, but not in parallel (see this SO question for the distinction) within a single OS-level thread, transferring control back and forth from one another in an organized fashion.
When using fibers, green threads or coroutines, you usually have a separate stack per function. (Technically, not just a stack but a whole context of execution is per function. Most importantly, CPU registers.) For every thread there're as many stacks as there're concurrently running functions, and the thread is switching between executing each function according to the logic of your program. When a function runs to its end, its stack is destroyed. So, the number and lifetimes of stacks are dynamic and are not determined by the number of OS-level threads!
Note that I said "usually have a separate stack per function". There're both stackful and stackless implementations of couroutines. Most notable stackful C++ implementations are Boost.Coroutine and Microsoft PPL's async/await. (However, C++'s resumable functions (a.k.a. "async and await"), which were proposed to C++17, are likely to use stackless coroutines.)
Fibers proposal to the C++ standard library is forthcoming. Also, there're some third-party libraries. Green threads are extremely popular in languages like Python and Ruby.

I have something to share, although the major points are already covered.
Stack
Very fast access.
Stored in RAM.
Function calls are loaded here along with the local variables and function parameters passed.
Space is freed automatically when program goes out of a scope.
Stored in sequential memory.
Heap
Slow access comparatively to Stack.
Stored in RAM.
Dynamically created variables are stored here, which later requires freeing the allocated memory after use.
Stored wherever memory allocation is done, accessed by pointer always.
Interesting note:
Should the function calls had been stored in heap, it would had resulted in 2 messy points:
Due to sequential storage in stack, execution is faster. Storage in heap would have resulted in huge time consumption thus making the whole program execute slower.
If functions were stored in heap (messy storage pointed by pointer), there would have been no way to return to the caller address back (which stack gives due to sequential storage in memory).

Wow! So many answers and I don't think one of them got it right...
1) Where and what are they (physically in a real computer's memory)?
The stack is memory that begins as the highest memory address allocated to your program image, and it then decrease in value from there. It is reserved for called function parameters and for all temporary variables used in functions.
There are two heaps: public and private.
The private heap begins on a 16-byte boundary (for 64-bit programs) or a 8-byte boundary (for 32-bit programs) after the last byte of code in your program, and then increases in value from there. It is also called the default heap.
If the private heap gets too large it will overlap the stack area, as will the stack overlap the heap if it gets too big. Because the stack starts at a higher address and works its way down to lower address, with proper hacking you can get make the stack so large that it will overrun the private heap area and overlap the code area. The trick then is to overlap enough of the code area that you can hook into the code. It's a little tricky to do and you risk a program crash, but it's easy and very effective.
The public heap resides in it's own memory space outside of your program image space. It is this memory that will be siphoned off onto the hard disk if memory resources get scarce.
2) To what extent are they controlled by the OS or language runtime?
The stack is controlled by the programmer, the private heap is managed by the OS, and the public heap is not controlled by anyone because it is an OS service -- you make requests and either they are granted or denied.
2b) What is their scope?
They are all global to the program, but their contents can be private, public, or global.
2c) What determines the size of each of them?
The size of the stack and the private heap are determined by your compiler runtime options. The public heap is initialized at runtime using a size parameter.
2d) What makes one faster?
They are not designed to be fast, they are designed to be useful. How the programmer utilizes them determines whether they are "fast" or "slow"
REF:
https://norasandler.com/2019/02/18/Write-a-Compiler-10.html
https://learn.microsoft.com/en-us/windows/desktop/api/heapapi/nf-heapapi-getprocessheap
https://learn.microsoft.com/en-us/windows/desktop/api/heapapi/nf-heapapi-heapcreate

A lot of answers are correct as concepts, but we must note that a stack is needed by the hardware (i.e. microprocessor) to allow calling subroutines (CALL in assembly language..). (OOP guys will call it methods)
On the stack you save return addresses and call → push / ret → pop is managed directly in hardware.
You can use the stack to pass parameters.. even if it is slower than using registers (would a microprocessor guru say or a good 1980s BIOS book...)
Without stack no microprocessor can work. (we can't imagine a program, even in assembly language, without subroutines/functions)
Without the heap it can. (An assembly language program can work without, as the heap is a OS concept, as malloc, that is a OS/Lib call.
Stack usage is faster as:
Is hardware, and even push/pop are very efficient.
malloc requires entering kernel mode, use lock/semaphore (or other synchronization primitives) executing some code and manage some structures needed to keep track of allocation.

Where and what are they (physically in a real computer's memory)?
ANSWER: Both are in RAM.
ASIDE:
RAM is like a desk and HDDs/SSDs (permanent storage) are like bookshelves. To read anything, you must have a book open on your desk, and you can only have as many books open as fit on your desk. To get a book, you pull it from your bookshelf and open it on your desk. To return a book, you close the book on your desk and return it to its bookshelf.
Stack and heap are names we give to two ways compilers store different kinds of data in the same place (i.e. in RAM).
What is their scope?
What determines the size of each of them?
What makes one faster?
ANSWER:
The stack is for static (fixed size) data
a. At compile time, the compiler reads the variable types used in your code.
i. It allocates a fixed amount of memory for these variables.
ii. This size of this memory cannot grow.
b. The memory is contiguous (a single block), so access is sometimes faster than the heap
c. An object placed on the stack that grows in memory during runtime beyond the size of the stack causes a stack overflow error
The heap is for dynamic (changing size) data
a. The amount of memory is limited only by the amount of empty space available in RAM
i. The amount used can grow or shrink as needed at runtime
b. Since items are allocated on the heap by finding empty space wherever it exists in RAM, data is not always in a contiguous section, which sometimes makes access slower than the stack
c. Programmers manually put items on the heap with the new keyword and MUST manually deallocate this memory when they are finished using it.
i. Code that repeatedly allocates new memory without deallocating it when it is no longer needed leads to a memory leak.
ASIDE:
The stack and heap were not primarily introduced to improve speed; they were introduced to handle memory overflow. The first concern regarding use of the stack vs. the heap should be whether memory overflow will occur. If an object is intended to grow in size to an unknown amount (like a linked list or an object whose members can hold an arbitrary amount of data), place it on the heap. As far as possible, use the C++ standard library (STL) containers vector, map, and list as they are memory and speed efficient and added to make your life easier (you don't need to worry about memory allocation/deallocation).
After getting your code to run, if you find it is running unacceptably slow, then go back and refactor your code and see if it can be programmed more efficiently. It may turn out the problem has nothing to do with the stack or heap directly at all (e.g. use an iterative algorithm instead of a recursive one, look at I/O vs. CPU-bound tasks, perhaps add multithreading or multiprocessing).
I say sometimes slower/faster above because the speed of the program might not have anything to do with items being allocated on the stack or heap.
To what extent are they controlled by the OS or language run-time?
ANSWER:
The stack size is determined at compile time by the compiler.
The heap size varies during runtime. (The heap works with the OS during runtime to allocate memory.)
ASIDE:
Below is a little more about control and compile-time vs. runtime operations.
Each computer has a unique instruction set architecture (ISA), which are its hardware commands (e.g. "MOVE", "JUMP", "ADD", etc.).
An OS is nothing more than a resource manager (controls how/when/ and where to use memory, processors, devices, and information).
The ISA of the OS is called the bare machine and the remaining commands are called the extended machine. The kernel is the first layer of the extended machine. It controls things like
determining what tasks get to use a processor (the scheduler),
how much memory or how many hardware registers to allocate to a task (the dispatcher), and
the order in which tasks should be performed (the traffic controller).
When we say "compiler", we generally mean the compiler, assembler, and linker together
The compiler turns source code into assembly language and passes it to the assembler,
The assembler turns the assembly language into machine code (ISA commands), and passes it to the linker
The linker takes all machine code (possibly generated from multiple source files) and combines it into one program.
The machine code gets passed to the kernel when executed, which determines when it should run and take control, but the machine code itself contains ISA commands for requesting files, requesting memory, etc. So the code issues ISA commands, but everything has to pass by the kernel.

The stack is essentially an easy-to-access memory that simply manages its items
as a - well - stack. Only items for which the size is known in advance can go onto the stack. This is the case for numbers, strings, booleans.
The heap is a memory for items of which you can’t predetermine the
exact size and structure. Since objects and arrays can be mutated and
change at runtime, they have to go into the heap.
Source: Academind

I feel most answers are very convoluted and technical, while I didn't find one that could explain simply the reasoning behind those two concepts (i.e. why people created them in the first place?) and why you should care. Here is my attempt at one:
Data on the Stack is temporary and auto-cleaning
Data on the Heap is permanent until manually deleted
That's it.
Still, for more explanations :
The stack is meant to be used as the ephemeral or working memory, a memory space that we know will be entirely deleted regularly no matter what mess we put in there during the lifetime of our program. That's like the memo on your desk that you scribble on with anything going through your mind that you barely feel may be important, which you know you will just throw away at the end of the day because you will have filtered and organized the actual important notes in another medium, like a document or a book. We don't care for presentation, crossing-outs or unintelligible text, this is just for our work of the day and will remember what we meant an hour or two ago, it's just our quick and dirty way to store ideas we want to remember later without hurting our current stream of thoughts. That's what people mean by "the stack is the scratchpad".
The heap however is the long-term memory, the actual important document that will we stored, consulted and depended on for a very long time after its creation. It consequently needs to have perfect form and strictly contain the important data. That why it costs a lot to make and can't be used for the use-case of our precedent memo. It wouldn't be worthwhile, or even simply useless, to take all my notes in an academic paper presentation, writing the text as calligraphy. However this presentation is extremely useful for well curated data. That's what the heap is meant to be. Well known data, important for the lifetime application, which is well controlled and needed at many places in your code. The system will thus never delete this precious data without you explicitly asking for it, because it knows "that's where the important data is!".
This is why you need to manage and take care of memory allocation on the heap, but don't need to bother with it for the stack.
Most top answers are merely technical details of the actual implementations of that concept in real computers.
So what to take away from this is that:
Unimportant, working, temporary, data just needed to make our functions and objects work is (generally) more relevant to be stored on the stack.
Important, permanent and foundational application data is (generally) more relevant to be stored on the heap.
This of course needs to be thought of only in the context of the lifetime of your program. Actual humanly important data generated by your program will need to be stored on an external file evidently. (Since whether it is the heap or the stack, they are both cleared entirely when your program terminates.)
PS: Those are just general rules, you can always find edge cases and each language comes with its own implementation and resulting quirks, this is meant to be taken as a guidance to the concept and a rule of thumb.

CPU stack and heap are physically related to how CPU and registers works with memory, how machine-assembly language works, not high-level languages themselves, even if these languages can decide little things.
All modern CPUs work with the "same" microprocessor theory: they are all based on what's called "registers" and some are for "stack" to gain performance. All CPUs have stack registers since the beginning and they had been always here, way of talking, as I know. Assembly languages are the same since the beginning, despite variations... up to Microsoft and its Intermediate Language (IL) that changed the paradigm to have a OO virtual machine assembly language. So we'll be able to have some CLI/CIL CPU in the future (one project of MS).
CPUs have stack registers to speed up memories access, but they are limited compared to the use of others registers to get full access to all the available memory for the processus. It why we talked about stack and heap allocations.
In summary, and in general, the heap is hudge and slow and is for "global" instances and objects content, as the stack is little and fast and for "local" variables and references (hidden pointers to forget to manage them).
So when we use the new keyword in a method, the reference (an int) is created in the stack, but the object and all its content (value-types as well as objects) is created in the heap, if I remember. But local elementary value-types and arrays are created in the stack.
The difference in memory access is at the cells referencing level: addressing the heap, the overall memory of the process, requires more complexity in terms of handling CPU registers, than the stack which is "more" locally in terms of addressing because the CPU stack register is used as base address, if I remember.
It is why when we have very long or infinite recurse calls or loops, we got stack overflow quickly, without freezing the system on modern computers...
C# Heap(ing) Vs Stack(ing) In .NET
Stack vs Heap: Know the Difference
Static class memory allocation where it is stored C#
What and where are the stack and heap?
https://en.wikipedia.org/wiki/Memory_management
https://en.wikipedia.org/wiki/Stack_register
Assembly language resources:
Assembly Programming Tutorial
Intel® 64 and IA-32 Architectures Software Developer Manuals

When a process is created then after loading code and data OS setup heap start just after data ends and stack to top of address space based on architecture
When more heap is required OS will allocate dynamically and heap chunk is always virtually contiguous
Please see brk(), sbrk() and alloca() system call in linux

How to undeclare (delete) variable in C?

Like we do with macros:
#undef SOMEMACRO
Can we also undeclare or delete the variables in C, so that we can save a lot of memory?
I know about malloc() and free(), but I want to delete the variables completely so that if I use printf("%d", a); I should get error
test.c:4:14: error: ‘a’ undeclared (first use in this function)

No, but you can create small minimum scopes to achieve this since all scope local variables are destroyed when the scope is exit. Something like this:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}

It's not a direct answer to the question, but it might bring some order and understanding on why this question has no proper answer and why "deleting" variables is impossible in C.
Point #1 What are variables?
Variables are a way for a programmer to assign a name to a memory space. This is important, because this means that a variable doesn't have to occupy any actual space! As long as the compiler has a way to keep track of the memory in question, a defined variable could be translated in many ways to occupy no space at all.
Consider: const int i = 10; A compiler could easily choose to substitute all instances of i into an immediate value. i would occupy 0 data memory in this case (depending on architecture it could increase code size). Alternatively, the compiler could store the value in a register and again, no stack nor heap space will be used. There's no point in "undefining" a label that exists mostly in the code and not necessarily in runtime.
Point #2 Where are variables stored?
After point #1 you already understand that this is not an easy question to answer as the compiler could do anything it wants without breaking your logic, but generally speaking, variables are stored on the stack. How the stack works is quite important for your question.
When a function is being called the machine takes the current location of the CPU's instruction pointer and the current stack pointer and pushes them into the stack, replacing the stack pointer to the next location on stack. It then jumps into the code of the function being called.
That function knows how many variables it has and how much space they need, so it moves the frame pointer to capture a frame that could occupy all the function's variables and then just uses stack. To simplify things, the function captures enough space for all it's variables right from the start and each variable has a well defined offset from the beginning of the function's stack frame*. The variables are also stored one after the other.
While you could manipulate the frame pointer after this action, it'll be too costly and mostly pointless - The running code only uses the last stack frame and could occupy all remaining stack if needed (stack is allocated at thread start) so "releasing" variables gives little benefit. Releasing a variable from the middle of the stack frame would require a defrag operation which would be very CPU costly and pointless to recover few bytes of memory.
Point #3: Let the compiler do its job
The last issue here is the simple fact that a compiler could do a much better job at optimizing your program than you probably could. Given the need, the compiler could detect variable scopes and overlap memory which can't be accessed simultaneously to reduce the programs memory consumption (-O3 compile flag).
There's no need for you to "release" variables since the compiler could do that without your knowledge anyway.
This is to complement all said before me about the variables being too small to matter and the fact that there's no mechanism to achieve what you asked.
* Languages that support dynamic-sized arrays could alter the stack frame to allocate space for that array only after the size of the array was calculated.

There is no way to do that in C nor in the vast majority of programming languages, certainly in all programming languages that I know.
And you would not save "a lot of memory". The amount of memory you would save if you did such a thing would be minuscule. Tiny. Not worth talking about.
The mechanism that would facilitate the purging of variables in such a way would probably occupy more memory than the variables you would purge.
The invocation of the code that would reclaim the code of individual variables would also occupy more space than the variables themselves.
So if there was a magic method purge() that purges variables, not only the implementation of purge() would be larger than any amount of memory you would ever hope to reclaim by purging variables in your program, but also, in int a; purge(a); the call to purge() would occupy more space than a itself.
That's because the variables that you are talking about are very small. The printf("%d", a); example that you provided shows that you are thinking of somehow reclaiming the memory occupied by individual int variables. Even if there was a way to do that, you would be saving something of the order of 4 bytes. The total amount of memory occupied by such variables is extremely small, because it is a direct function of how many variables you, as a programmer, declare by hand-typing their declarations. It would take years of typing on a keyboard doing nothing but mindlessly declaring variables before you would declare a number of int variables occupying an amount of memory worth speaking of.

Well, you can use blocks ({ }) and defining a variable as late as possible to limit the scope where it exists.
But unless the variable's address is taken, doing so has no influence on the generated code at all, as the compiler's determination of the scope where it has to keep the variable's value is not significantly impacted.
If the variable's address is taken, failure of escape-analysis, mostly due to inlining-barriers like separate compilation or allowing semantic interpositioning, can make the compiler assume it has to keep it alive till later in the block than strictly neccessary. That's rarely significant (don't worry about a handful of ints, and most often a few lines of code longer keeping it alive are insignificant), but best to keep it in mind for the rare case where it might matter.

If you are that concerned about the tiny amount of memory that is on the stack, then you're probably going to be interested in understanding the specifics of your compiler as well. You'll need to find out what it does when it compiles. The actual shape of the stack-frame is not specified by the C language. It is left to the compiler to figure out. To take an example from the currently accepted answer:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}
This may or may not affect the memory usage of the function. If you were to do this in a mainstream compiler like gcc or Visual Studio, you would find that they optimize for speed rather than stack size, so they pre-allocate all of the stack space they need at the start of the function. They will do analysis to figure out the minimum pre-allocation needed, using your scoping and variable-usage analysis, but those algorithms literally wont' be affected by extra scoping. They're already smarter than that.
Other compilers, especially those for embedded platforms, may allocate the stack frame differently. On these platforms, such scoping may be the trick you needed. How do you tell the difference? The only options are:
Read the documentation
Try it, and see what works
Also, make sure you understand the exact nature of your problem. I worked on a particular embedded project which eschewed the stack for everything except return values and a few ints. When I pressed the senior developers about this silliness, they explained that on this particular application, stack space was at more of a premium than space for globally allocated variables. They had a process they had to go through to prove that the system would operate as intended, and this process was much easier for them if they allocated everything up front and avoided recursion. I guarantee you would never arrive at such a convoluted solution unless you first knew the exact nature of what you were solving.
As another solution you could look at, you could always build your own stack frames. Make a union of structs, where each struct contains the variables for one stack frame. Then keep track of them yourself. You could also look at functions like alloca, which can allow for growing the stack frame during the function call, if your compiler supports it.
Would a union of structs work? Try it. The answer is compiler dependent. If all variables are stored in memory on your particular device, then this approach will likely minimize stack usage. However, it could also substantially confuse register coloring algorithms, and result in an increase in stack usage! Try and see how it goes for you!

Are activation records created on stack or heap in C?

I am reading about memory allocation and activation records. I am having some doubts. Can anyone make the following crystal clear ?
A). My first doubt is that "Are activation records created on stack or heap in C" ?
B). These are few lines from an abstract which i am referring :-->
Even though memory on stack area is created during run time- the
amount of memory (activation record size) is determined at compile
time. Static and global memory area is compile time determined and
this is part of the binary. At run time, we cannot change this. Only
memory area freely available for the process to change during runtime
is heap.At compile time compiler only reserves the stack space for
activation record. This gets used (allocated on actual memory) only
during program run. Only DATA segment part of the program like static
variables, string literals etc. are allocated during compile time. For
heap area, how much memory to be allocated is also determined at run
time.
Can anyone please elaborate these lines as i am unable to understand anything ?
I am sure the explaination would be of great need to me.

As a quick answer, I don't even really know what an activation record is. The rest of the quote has very poor English and is quite misleading.
Honestly, the abstract is talking about absolutes when in reality, there really are not at all absolute. You do define a main stack at compile time, yes (though you can create many stacks at runtime as well).
Yes, when you want to allocate memory, one usually creates a pointer to store that information, but where you place that is completely up to you. It can be stack, it can be global memory, it can be in the heap from another allocation, or you can just leak memory and not store it anywhere it all if you wish. Perhaps this is what is meant by an activation record?
Or perhaps, it means that when dynamic memory is created, somewhere in memory, there has to be some sort of information that keeps track of used and unused memory. For many allocators, this is a list of pointers stored somewhere in the allocated memory, though others store it in a different piece of memory and some could even place that on the stack. It all depends on the needs of the memory system.
Finally, where dynamic memory is allocated from can vary as well. It can come from a call to the OS, though in some cases, it can also just be overlayed onto existing global (or even stack) memory - which is not uncommon in embedded programming.
As you can see, this abstract is not even close to what dynamic memory represents.
Additional info:
Many are jumping all over me stating that 'C' has no stack in the standard. Correct. That said, how many people have truly coded in C without one? I'll leave that alone for now.
Defined memory, as you call it, is anything declared with the 'static' keyword within a function or any variable declared outside of a function without the 'extern' keyword in front of it. This is memory that the compiler knows about and can reserve space for without any additional help.
Allocated memory - is not a good term as defined memory can also be considered allocated. Instead, use the term dynamic memory. This is memory that you allocate from a heap at run-time. An example:
char *foo;
int my_value;
int main(void)
{
foo = malloc(10 * sizeof(char));
// Do stuff with foo
free(foo);
return 0;
}
foo is "defined" as you say as a pointer. If nothing else were done, it would only reserve that much memory, but when the malloc is reached in main(), it now points to at least 10 bytes of dynamic memory as well. Once the free is reached, that memory is now made available to the program for other uses. It's allocated size is 'dynamic'. Compare that to my_value which will always be the size of an int and nothing else.

In C (given how it is almost universally implemented*) An activation record is exactly the same thing as a stack frame which is the same thing as a call frame. They are always created on the stack.
The stack segment is a memory area the process gets "for free" from the OS when it created. It does not need to malloc or free it. On x86, a machine register (e.g RSP) points to the end of the segment and stack frames/activation records/call frames are "allocated" by decrementing the pointer in that register by how many byte to allocate. E.g:
int my_func() {
int x = 123;
int y = 234;
int z = 345;
...
return 1;
}
An unoptimizing C compiler could generate assembly code for keeping those three variables in the stack frame like this:
my_func:
; "allocate" 24 bytes of stack space
sub rsp, 24
; Initialize the allocated stack memory
mov [rsp], 345 ; z = 345
mov [rsp+8], 234 ; y = 234
mov [rsp+16], 134 ; x = 123
...
; "free" the allocated stack space
add rsp, 24
; return 1
mov rax, 1
ret
In other contexts and languages activation records can be implemented differently. For example using linked lists. But as the language is C and the context is low-level programming I don't think it is useful to discuss that.

In theory, a C99 (or C11) compatible implementation (e.g. a C compiler & C standard library implementation) do not even need (in all cases) a call stack. For example, one could imagine a whole program compiler (notably for freestanding C implementation) which would analyze the entire program and decide that stack frames are unneeded (e.g. each local variable could be allocated statically, or fit in a register). Or one could imagine an implementation allocating the call frames as continuation frames (perhaps after CPS transformation by the compiler) elsewhere (e.g. in some "heap"), using techniques similar to those described in Appel old book Compiling with Continuations (describing an SML/NJ compiler).
(remember that a programming language is a specification -not some software-, often written in English, perhaps with additional formalization, in some technical report or standard document. AFAIK, the C99 or C11 standards do not even mention any stack or activation record. But in practice, most C implementations are made of a compiler and a standard library implementation.)
In practice, allocation records are call frames (for C, they are synonyms; things are more complex with nested functions) and are allocated on a hardware assisted call stack on all reasonable C implementations I know. on Z/Architecture there is no hardware stack pointer register, so it is a convention (dedicating some register to play the role of the stack pointer).
So look first at call stack wikipage. It has a nice picture worth many words.
Are activation records created on stack or heap
In practice, they (activation records) are call frames on the call stack (allocated following calling conventions and ABIs). Of course the layout, slot usage, and size of a call frame is computed at compile-time by the compiler.
In practice, a local variable may correspond to some slot inside the call frame. But sometimes, the compiler would keep it only in a register, or reuse the same slot (which has a fixed offset in the call frame) for various usages, e.g. for several local variables in different blocks, etc.
But most C compilers are optimizing compilers. They are able to inline a function, or sometimes make a tail call to it (then the caller's call frame is reused as or overwritten by the callee call frame), so details are more complex.
See also this How was C ported to architectures that had no hardware stack? question on retro.

Finding roots for garbage collection in C

I'm trying to implement a simple mark and sweep garbage collector in C. The first step of the algorithm is finding the roots. So my question is how can I find the roots in a C program?
In the programs using malloc, I'll be using the custom allocator. This custom allocator is all that will be called from the C program, and may be a custom init().
How does garbage collector knows what all the pointers(roots) are in the program? Also, given a pointer of a custom type how does it get all pointers inside that?
For example, if there's a pointer p pointing to a class list, which has another pointer inside it.. say q. How does garbage collector knows about it, so that it can mark it?
Update: How about if I send all the pointer names and types to GC when I init it? Similarly, the structure of different types can also be sent so that GC can traverse the tree. Is this even a sane idea or am I just going crazy?

First off, garbage collectors in C, without extensive compiler and OS support, have to be conservative, because you cannot distinguish between a legitimate pointer and an integer that happens to have a value that looks like a pointer. And even conservative garbage collectors are hard to implement. Like, really hard. And often, you will need to constrain the language in order to get something acceptable: for instance, it might be impossible to correctly collect memory if pointers are hidden or obfuscated. If you allocate 100 bytes and only keep a pointer to the tenth byte of the allocation, your GC is unlikely to figure out that you still need the block since it will see no reference to the beginning. Another very important constraint to control is the memory alignment: if pointers can be on unaligned memory, your collector can be slowed down by a factor of 10x or worse.
To find roots, you need to know where your stacks start, and where your stacks end. Notice the plural form: each thread has its own stack, and you might need to account for that, depending on your objectives. To know where a stack starts, without entering into platform-specific details (that I probably wouldn't be able to provide anyways), you can use assembly code inside the main function of the current thread (just main in a non-threaded executable) to query the stack register (esp on x86, rsp on x86_64 to name those two only). Gcc and clang support a language extension that lets you assign a variable permanently to a register, which should make it easy for you:
register void* stack asm("esp"); // replace esp with the name of your stack reg
(register is a standard language keyword that is most of the time ignored by today's compilers, but coupled with asm("register_name"), it lets you do some nasty stuff.)
To ensure you don't forget important roots, you should defer the actual work of the main function to another one. (On x86 platforms, you can also query ebp/rbp, the stack frame base pointers, instead, and still do your actual work in the main function.)
int main(int argc, const char** argv, const char** envp)
{
register void* stack asm("esp");
// put stack somewhere
return do_main(argc, argv, envp);
}
Once you enter your GC to do collection, you need to query the current stack pointer for the thread you've interrupted. You will need design-specific and/or platform-specific calls for that (though if you get something to execute on the same thread, the technique above will still work).
The actual hunt for roots starts now. Good news: most ABIs will require stack frames to be aligned on a boundary greater than the size of a pointer, which means that if you trust every pointer to be on aligned memory, you can treat your whole stack as a intptr_t* and check if any pattern inside looks like any of your managed pointers.
Obviously, there are other roots. Global variables can (theoretically) be roots, and fields inside structures can be roots too. Registers can also have pointers to objects. You need to separately account for global variables that can be roots (or forbid that altogether, which isn't a bad idea in my opinion) because automatic discovery of those would be hard (at least, I wouldn't know how to do it on any platform).
These roots can lead to references on the heap, where things can go awry if you don't take care.
Since not all platforms provide malloc introspection (as far as I know), you need to implement the concept of scanned memory--that is, memory that your GC knows about. It needs to know at least the address and the size of each of such allocation. When you get a reference to one of these, you simply scan them for pointers, just like you did for the stack. (This means that you should take care that your pointers are aligned. This is normally the case if you let your compiler do its job, but you still need to be careful when you use third-party APIs).
This also means that you cannot put references to collectable memory to places where the GC can't reach it. And this is where it hurts the most and where you need to be extra-careful. Otherwise, if your platform supports malloc introspection, you can easily tell the size of each allocation you get a pointer to and make sure you don't overrun them.
This just scratches the surface of the topic. Garbage collectors are extremely complex, even when single-threaded. When you add threads to the mix, you enter a whole new world of hurt.
Apple has implemented such a conservative GC for the Objective-C language and dubbed it libauto. They have open-sourced it, along with a good part of the low-level technologies of Mac OS X, and you can find the source here.
I can only quote Hot Licks here: good luck!
Okay, before I go even further, I forgot something very important: compiler optimizations can break the GC. If your compiler is not aware of your GC, it can very well never put certain roots on the stack (only dealing with them in registers), and you're going to miss them. This is not too problematic for single-threaded programs if you can inspect registers, but again, a huge mess for multithreaded programs.
Also be very careful about the interruptibility of allocations: you must make sure that your GC cannot kick in while you're returning a new pointer because it could collect it right before it is assigned to a root, and when your program resumes it would assign that new dangling pointer to your program.
And here's an update to address the edit:
Update: How about if I send all the pointer names and types to GC when
I init it? Similarly, the structure of different types can also be
sent so that GC can traverse the tree. Is this even a sane idea or am
I just going crazy?
I guess you could allocate our memory then register it with the GC to tell it that it should be a managed resource. That would solve the interruptability problem. But then, be careful about what you send to third-party libraries, because if they keep a reference to it, your GC might not be able to detect it since they won't register their data structures with your GC.
And you likely won't be able to do that with roots on the stack.

The roots are basically all static and automatic object pointers. Static pointers would be linked inside the load modules. Automatic pointers must be found by scanning stack frames. Of course, you have no idea where in the stack frames the automatic pointers are.
Once you have the roots you need to scan objects and find all the pointers inside them. (This would include pointer arrays.) For that you need to identify the class object and somehow extract from it information about pointer locations. Of course, in C many objects are not virtual and do not have a class pointer within them.
Good luck!!
Added: One technique that could vaguely make your quest possible is "conservative" garbage collection. Since you intend to have your own allocator, you can (somehow) keep track of allocation sizes and locations, so you can pick any pointer-sized chunk out of storage and ask "Might this possibly be a pointer to one of my objects?" You can, of course, never know for sure, since random data might "look like" a pointer to one of your objects, but still you can, through this mechanism, scan a chunk of storage (like a frame in the call stack, or an individual object) and identify all the possible objects it might address.
With a conservative collector you cannot safely do object relocation/compaction (where you modify pointers to objects as you move them) since you might accidentally modify "random" data that looks like an object pointer but is in fact meaningful data to some application. But you can identify unused objects and free up the space they occupy for reuse. With proper design it's possible to have a very effective non-compacting GC.
(However, if your version of C allows unaligned pointers scanning could be very slow, since you'd have to try every variation on byte alignment.)

Is making smaller functions generally more efficient memory-wise since variables get deallocated more frequently?

Is dividing the work into 5 functions as opposed to one big function more memory efficient in C since at a given time there are fewer variables in memory, as the stack-frame gets deallocated more often? Does it depend on the compiler, and optimization? if so in what compilers is it faster?
Answer given there are a lot of local variables and the stack frames comes from a centralized main and not created on the top of each other.
I know other advantages of breaking out the function into smaller functions. Please answer this question, only in respect to memory usage.

It might reduce "high water mark" of stack usage for your program, and if so that might reduce the overall memory requirement of the program.
Yes, it depends on optimization. If the optimizer inlines the function calls, you might well find that all the variables of all the functions inlined are wrapped into one big stack frame. Any compiler worth using is capable of inlining[*], so the fact that it can happen doesn't depend on compiler. Exactly when it happens, will differ.
If your local variables are small, though, then it's fairly rare for your program to use more stack than has been automatically allocated to you at startup. Unless you go past what you're given initially, how much you use makes no difference to overall memory requirements.
If you're putting great big structures on the stack (multiple kilobytes), or if you're on a machine where a kilobyte is a lot of memory, then it might make a difference to overall memory usage. So, if by "a lot of local variables" you mean few dozen ints and pointers then no, nothing you do makes any significant difference. If by "a lot of local variables" you mean a few dozen 10k buffers, or if your function recurses very deep so that you have hundreds of levels of your few dozen ints, then it's a least possible it could make a difference, depending on the OS and configuration.
The model that stack and heap grow towards each other through general RAM, and the free memory in the middle can be used equally by either one of them, is obsolete. With the exception of a very few, very restricted systems, memory models are not designed that way any more. In modern OSes, we have so-called "virtual memory", and stack space is allocated to your program one page at a time. Most of them automatically allocate more pages of stack as it is used, up to a configured limit that's usually very large. A few don't automatically extend stack (Symbian last I used it, which was some years ago, didn't, although arguably Symbian is not a "modern" OS). If you're using an embedded OS, check what the manual says about stack.
Either way, the only thing that affects total memory use is how many pages of stack you need at any one time. If your system automatically extends stack, you won't even notice how much you're using. If it doesn't, you'll need to ensure that the program is given sufficient stack for its high-water mark, and that's when you might notice excessive stack use.
In short, this is one of those things that in theory makes a difference, but in practice that difference is almost always insignificant. It only matters if your program uses massive amounts of stack relative to the resources of the environment it runs in.
[*] People programming in C for PICs or something, using a C compiler that is basically a non-optimizing assembler, are allowed to be offended that I've called their compiler "not worth using". The stack on such devices is so different from "typical" systems that the answer is different anyway.

I think in most cases the area of memory allocated for the stack (for the entire program) remains constant. The amount in use will change based on the depth of call stack and that amount would be less when fewer variables are used (but note that function calls push the return address and stack pointer also).
Also it depends on how the functions are called. If two functions are called in series, for example, and the stack of the first is popped before the call to the second, then you'll be using less of the stack..but if the first function calls the second then you're back to where you were with one big function (plus the function call overhead).

There's no memory allocation on stack - just moving the stack pointer towards next value. While stack size itself is predefined. So there's no difference in memory usage (apart of situations when you get stack overflow).

Yes, in the same vein that using a finer coat of paint on a jet plane increases its aerodynamic properties. Ok, that's a bad analogy, but the point is that if there is ever a question of making things clear and telegraphic or trying to use more functions, go with telegraphic. In most cases these are not mutually exclusive anyway as the beginners tend to give subroutines or functions too much to do.
In terms of memory I think that if you are truly splitting up up work (f, then g, then h) then you will see some minute available memory increases but if these are interdependent then you will not.
As #Joel Burget says, memory management is not really a consideration in code structuring.
Just my take.

Splitting a huge function into smaller ones does have its benefits, among them is potentially more optimized memory usage.
Say, you have this function.
void huge_func(int input) {
char a[1024];
char b[1024];
// do something with input and a
// do something with input and b
}
And you split it to two.
void func_a(int input) {
char a[1024];
// do something with input and a
}
void func_b(int input) {
char b[1024];
// do something with input and b
}
Calling huge_func will take at least 2048 bytes of memory, and calling func_a then func_b achieves the same outcome with about half less memory. However, if inside func_a you call func_b, the amount of memory used is about the same as huge_func. Essentially, as what #sje397 wrote.
I might be wrong to say this but I do not think there is any compiler optimization that could help you reduce the usage of stack memory. I believe the layout of stack memory must ensure that sufficient memory is reserved for all declared variables, whether used or not.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight