Beginner's confusion about x86 stack

Beginner's confusion about x86 stack - c

First of all, I'd like to know if this model is an accurate representation of the stack "framing" process.
I've been told that conceptually, the stack is like a Coke bottle. The sugar is at the bottom and you fill it up to the top. With this in mind, how does the Call tell the EIP register to "target" the called function if the EIP is in another bottle (it's in the code segment, not the stack segment)? I watched a video on YouTube saying that the "Code Segment of RAM" (the place where functions are kept) is the place where the EIP register is.

Typically, a computer program uses four kinds of memory areas (also called sections or segments):
The text section: This contains the program code. It is reserved when the program is loaded by the operating system. This area is fixed and does not change while the program is running. This would better be called "code" section, but the name has historical reasons.
The data section: This contains variables of the program. It is reserved when the program is loaded and initialized to values defined by the programmer. These values can be altered by the program while it executes.
The stack: This is a dynamic area of memory. It is used to store data for function calls. It basically works by "pushing" values onto the stack and popping from the stack. This is also called "LIFO": last in first out. This is where local variables of a function reside. If a function complets, the data is removed from the stack and is lost (basically).
The heap: This is also a dynamic memory region. There are special function in the programming language which "allocate" (reserve) a piece of this area on request of the program. Another function is available to return this area to the heap if it is not required anymore. As the data is released explicitly, it can be used to store data which lives longer than just a function call (different from the stack).
The data for text and data section are stored in the program file (they can be found in Linux for example using objdump (add a . to the names). stack and heap are not stored anywhere in the file as they are allocated dynamically (on-demand) by the program itself.
Normally, after the program has been loaded, the memory area reamining is treated as a single large block where both, stack and heap are located. They start from opposite end of that area and grow towards each other. For most architectures the heap grows from low to high memory addresses (ascending) and the stack downwards (decending). If they ever intersect, the program has run out of memory. As this may happen undetected, the stack might corrupt (change foreign data) the heap or vice versa. This may result in any kind of errors, depending how/what data has changed. If the stack gets corrupted, this may result in the program going wild (this is actually one way a trojan might work). Modern operating systems, however should take measures to detect this situation before it becomes critical.
This is not only for x86, but also for most other CPU families and operating system, notably: ARM, x86, MIPS, MSP430 (microcontroller), AVR (microcontroller), Linux, Windows, OS-X, iOS, Android (which uses Linux OS), DOS. For microcontrollers, there is often no heap (all memory is allocated at run-time) and the stack may be organized a bit differently; this is also true for the ARM-based Cortex-M microcontrollers. But anyway, this is quite a special subject.
Disclaimer: This is very simplified, so please no comments like "how about bss, const, myspecialarea";-) . There also is not requirement from the C standard for these areas, specifically to use a heap or a stack. Indeed there are implementations which don't use either. Those are most times embedded systems with small (8 or 16 bit) MCUs or DSPs. Also modern architectures use CPU registers instead of the stack to pass parameters and keep local variables. Those are defined in the Application Binary Interface of the target platform.
For the stack, you might read the wikipedia article. Note the difference in implementation between the datatstructure "stack" and the "hardware stack" as implemented in a typical (micro)processor.

Related

How does a JIT compiler like v8 structure its memory (i.e. the stack, heap, code, and data)?

I am trying to figure out how Linux stores the stack, or C even, since each process can have its own stack. Every diagram shows the assembly instructions and registers used in managing an extremely primitive stack (only 1 stack as well), so I'm in doubt that this is how higher-level systems do it (i.e. systems with multiple threads, processes, fibers, workers, coroutines, generators, etc.). Which is why this question is about v8....
How does v8 (at a high level, maybe linking to some related code if it's useful) structure the stack? How does it structure the heap? And code and data. You always see diagrams like this:
But (a) is that how a production JIT system like v8 does it? And (b) what is the actual implementation of the key structs? What data structures are used? Is the stack actually a linked list of some sort? Or is it an array? Is the heap broken down into a doubly-linked list of blocks of a certain size? Are they allocated in physically contiguous blocks of memory (in v8), or are they using OS-specific syscalls like mmap on Linux to allocate memory, which it then structures in its own way using some sort of block architecture (which is what I would like to know about)? Or is it just relying on malloc and such?
Basically, I am interested in simulating how a production JIT compiler works and am stuck at the beginning part of how the memory is structured (with structs and such). I am not sure if the stack is actually a linked-list in reality, or something else. Am very much interested to know.
The reason I think these diagrams are a lie is because we are usually dealing with virtual memory here, the physical memory is abstracted away. And even then, the virtual memory is divided into small pages, which might be scattered around or something, I don't know. So I feel these visuals are totally incorrect and misleading, and would like to know what it actually looks like.

How does v8 structure the stack?
Just like any other program: rbp points at the base of the current function's stack frame, and rsp points at the "top" (growing downwards, though); the push and pop instructions are used to write/read from where rsp points to while decrementing/incrementing rsp accordingly. There's no data structure, just raw memory.
Is the heap [...] using OS-specific syscalls like mmap on Linux to allocate memory
Yes. V8's managed heap is allocated one page at a time using mmap, or equivalents on other operating systems.
Note that what such diagrams call "heap" is heap-allocated memory in the C++ sense, not the managed heap in a VM sense. One could argue that the latter is just a special part/case of the former though, and whether you use mmap or malloc is an implementation detail that doesn't really matter (beyond possibly certain performance differences, depending on usage patterns).
I think these diagrams are a lie is because we are usually dealing with virtual memory here
I'd put it the opposite way: virtual memory is why these diagrams are (slightly simplified but) mostly accurate. Every process thinks that it is alone in the world, it doesn't see any other processes in its memory address space (just a bit of the kernel, for interacting with it). The memory that a process sees does look like raw memory; it's just that the addresses aren't the physical addresses, there is a translation layer in between (which is managed by kernel + CPU).

What is the main origin of heap and stack memory division?

I read a lot of explanation of heap and stack memory, and all of them obscure anyway in terms of origin. First of all I understand how this memories works with software, but I don't understand the main source of this division. I assume that they are the same unspecialized physical memory, but...
For example say we have PC without any OS, and we want create some bootable program with assembly language for x86. I assume we can do this (Personally I don't know assembly, but some people write OS anyway). So the main question is Can we already operate with heap and stack, or we must create some memory managment machinery for this? If yes, so how it can be possible in terms of bare metal programming?

Adding something to the other answer, fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
1^ thing. Heap is totally software, while stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instruction) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separated stack area - totally different from other ram areas where the program could create a "heap".
2^ thing. Whean speaking about "programs", one can/should think that the operating system (the OS) is a program, specialized in managing resources (memory included), and extendable with "applications" (which are programs). In such scenario, stack and heap are managed in cooperation from both OS and the applications.
So, to reply to your main question, the 90% correct answer is: in bare metal you have already a stack - perhaps you have to issue some short instruction to set it up, but it is straightforward. But you don't have a heap, you must implement it in your program. First you set aside some memory to be used as a stack; and then you can set aside some more memory to be used as a heap, not forgetting that you must preserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.

Way to detect that stack area is not overlapping RAM area during runtime

Is there any way to check or prevent stack area from crossing the RAM data (.data or .bss) area in the limited memory (RAM/ROM) embedded systems comprising microcontrollers? There are tools to do that, but they come with very costly license fees like C-STAT and C-RUN in IAR.

You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is to open up the system-specific linker file in which all memory segments have been pre-defined to a default for the given microcontroller. You should have the various RAM segments listed there, de facto standard names are: .stack .data .bss and .heap.
Each such segment will have an address range specified. Change the addresses and you will move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky it uses GNU linker scripts, which is a well-documented, though rather complex standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: you need to check the CPU core manual and see if the stack pointer moves upwards or downwards on your given system. Most common is downwards, but the alternative exists. You should ensure that in the direction that the stack grows, there is no other read/write data segment which it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory where access would cause a CPU hardware interrupt/exception.
Here is an article describing how to do this.

In small micros that do not have the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check the 'threshold' RAM address which you must have initialized to some 'magic' pattern, like 0xAA55
Once the periodic task sees this memory address change contents, you have a problem!

In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this are constant strings in your code, like the ones used on printf() format strings, for example. This can free a lot of memory for your stack to grow. Check you compiler documentation about how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls - pass them as const references whenever possible (e.g. for structs or classes).
Minimize unnecessary usage of local variables. Look particularly for the large ones, like local buffers for example. Often you can find ways to just remove them, or to use a shared resource instead without compromising your code.

Stack and heap confusion for embedded 8051

I am trying to understand a few basics concepts regarding the memory layout for a 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for a 8051 MCU we have several types of memories:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata) but it's quite small (30-7Fh)- 79 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?

The 8051 has its origin in the 1970ies/early 80ies. As such, it has very limited ressources. The original version did (for instance) not even have XRAM, that was "patched" aside later and requires special (ans slow) accesses.
The IRAM is the "main memory". It really includes the stack (yes, there are only few bytes). The rest is used for global variables ("data" and "bss" section: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same reason.
Note that with these small MCUs you do not use many local variables (and if, only 8bit types). A clever compiler/linker (I actually used some of these) can allocate local variables statically overlapping - unless there is recursion used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merged blocks.
Note that the IRAM includes some special registers which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these functions. Very likely some features require special assembler functions (these might be provided in a header as C-functions just generating the corresponding machine code instruction), called intrinsics.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended harvard architecture (code and data seperated with XRAM as 3rd party).
Regarding your 2nd link: This is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more generalized CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. using an external von Neumann architecture (internally most (if not all) 32+ bitters use a harvard architecture).

I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set within a very early startup routine. This will set a register to tell it where the stack should start. Typically, you want this in memory that is very fast to access because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses in calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set wherever your startup routine set it to.
There is no particular area that a heap needs to live. If you want it to live in external memory, then it can be set there. You want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From here, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.

Concerning the heap:
There is also a malloc/free couple
You have to call init_mempool(), which is indicated in compiler documentation but it is somewhat uncommon.
The pseudo-code below to illustrate this.
However I used it only this way and did not try heavy used of malloc/free like you may find in dynamic linked list management, so I have no idea of the performance you get out of this.
//A "large" place in xdata to be used as heap
static char xdata heap_mem_pool [1000];
//A pointer located in data and pointing to something in xdata
//The size of the pointer is then 2 bytes instead of 3 ( the 3rd byte
//store the area specification data, idata, xdata )
//specifier not mandatory but welcome
char xdata * data shared_memory;
//...
u16 mem_size_needed;
init_mempool (heap_mem_pool, sizeof(heap_mem_pool));
//..
mem_size_needed = calcute_needed_memory();
shared_memory = malloc(mem_size_needed);
if ( 0 == shared_memory ) return -1;
//...use shared_memory pointer
//free if not needed anymore
free(shared_memory);

Some additionnal consequences about the fact that in general no function is reentrant ( or with some effort ) due to this stackless microcontroller.
I will call "my system" the systemI am working on at the present time: C8051F040 (Silab) with Keil C51 compiler ( I have no specific interest in these 2 companies )
The (function return address) stack is located low in the iram (idata on my system).
If it start at 30(dec) it means you have either global or local variables in your code that you requested to be in data RAM ( either because you choose a "small" memory model or because you use the keyword data in the variable declaration ).
Whenever you call a function the return 2 bytes address of the caller function will be save in this stack ( 16 bits code space ) and that's all: no registers saving, no arguments pushed onto the (non-existing)(data) stack. Your compiler may also limit the functions call depth.
Necessary arguments and local variables ( and certainly saved registers ) are placed somewhere in the RAM ( data RAM or XRAM )
So now imagine that you want to use the same innocent function ( like memcpy() ) both in your interrupt and in you normal infinite loop, it will cause sporadic bugs. Why ?
Due to the lack of stack, the compiler must share RAM memory places for arguments, local variables ... between several functions THAT DO NOT BELONG to the same call tree branch
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you were executing e.g the memcpy() in your "normal task", you may corrupt the execution of memcpy() because when going out of the interrupt execution, the pointers dedicated to the copy performed in the normal task will have the (end) value of the copy performed in the interrupt.
On my system I get a L15 linker error when the compiler detects that a function is called by more than one independant "branch"
You may make a function reentrant with the addition of the reentrant keyword and requiring the creation of an emulated stack on the top of the XRAM for example. I did not test on my system because I am already lacking of XRAM memory which is only 4kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS

In the standard 8051 uC, the stack occupies the same address space as register bank 1(08H to 0FH) by default at start up. That means, the stack pointer(SP register) will be having a value of 07H at startup(incremented to 08H when stack is PUSHed). This probably limits the stack memory to 8 bytes, if register bank 2(starting from 10H) is occuppied. If register banks 2 and 3 are not used, even that can be taken up by the stack(08H to 1FH).
If in a given program we need more than 24 bytes (08 to 1FH = 24 bytes) of stack, we can change the SP to point to RAM locations 30 – 7FH. This is done with the instruction “MOV SP, #xx”. This should clarify doubts surrounding the 8051 stack usage.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by C program in a limited resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is Hitachi SH2, I don't have an IDE that lets me put breakpoints into the program.

Using the IDE options make the proper actions (mark a checkobx, probably) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: The memory segments, their sizes, how much memory is utilzed in each one, program memory, data memory, etc.. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint in the start of you program. When the application stops there zero the entire stack. Resume the application and let it work for a while. Finally stop it and inspect the stack memory. You will see non-zero values instead of zeros. The used stack goes until the zeros part starts again.

Generally you will have different sections in mmap generated file, where data goes, like :
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text .... and so on!!!
with other attributes like Base,Size(hex),Size(dec) etc for each section.

While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part id dynamically allocated data, but even then sich data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight