Where are threads located in memory? - C

Hey,
I have a question about the location of threads in memory.
Where is a thread's stack located? And is there a way to display it (using gdb, readelf, or something similar)?

is there a way to display it...using gdb...?
Sure, GDB can show you the stack of any thread. The commands are right there in the manual: info threads lists all of the threads, thread N tells GDB which thread you want to look at, and bt then prints that thread's stack. Everything else works just like it does for a single-threaded program.
There's also a way to tell GDB to iterate over the threads, i.e., perform a single command once for each thread in the program: thread apply all bt dumps the stack of every thread.
Where is a thread's stack located?
Um, it's located in memory.
Seriously. Why do you want to know? In most programming environments I have heard of, the entire stack for a thread gets allocated all at once, and it cannot grow. There's usually some way for the program to say how big the stack of a new thread needs to be if the default size is not big enough.
On Linux, the program typically obtains space for a new thread's stack by calling mmap(...) with arguments that allow the OS to choose the virtual address. But there's no reason why it has to work that way. The program could allocate the stack from the heap, if that made any sense.
In other operating systems, there's probably some mechanism similar to mmap that lets the OS choose the address.
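To make that concrete, here is a minimal sketch, assuming Linux with glibc and compilation with -pthread, of allocating a new thread's stack with mmap() and handing it to pthread_create() via pthread_attr_setstack(); the stack size and the worker name are just illustrative:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define STACK_SIZE (8 * 1024 * 1024)   /* 8 MiB, a common default */

static void *worker(void *arg)
{
    (void)arg;
    int local;                                        /* lives on the mmap'd stack */
    printf("this thread's stack is near %p\n", (void *)&local);
    return NULL;
}

int main(void)
{
    /* Let the kernel choose the virtual address; MAP_STACK is only a hint on Linux. */
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, STACK_SIZE);  /* use our mapping as the thread's stack */

    pthread_t tid;
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    munmap(stack, STACK_SIZE);
    return 0;
}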
If you want an exact answer: it is in the memory between the heap and the stack.
If you go back thirty or more years, a process in a Unix-like OS would be given one contiguous block of virtual memory, typically starting at page one (page zero would be left unallocated, because that's how you get a segfault if the program follows a NULL pointer). The lowest addresses would contain the program's "text" segment (e.g., its code and immutable strings), then the "data" segment (initialized static variables), then the "bss" segment (uninitialized static variables).
Everything from the top of the BSS to the top of the given VM region was "wilderness" (i.e., untouched). The program's heap would grow up into the wilderness from the bottom, and its one and only call stack would grow down into the wilderness from the top. If the heap and the stack ever met, then you'd get a "stack overflow" or a malloc() error.
Things are more complicated these days, when a program can have dozens or even hundreds of call stacks. Instead of relying on that "wilderness", Linux programs today can use mmap(...) to create additional VM regions: either for a new thread's stack, or to add to the heap, or to map a file into memory for random access.

Related

How is memory layout shared with other processes/threads?

I'm currently learning memory layout in C. For now I know there exist several sections in a C program's memory: text, data, bss, heap and stack. They also say the heap is shared with other things beyond the program.
My questions are these.
What exactly is the heap shared with? One source states that "Heap must always be freed in order to make it available for other processes" whereas another says "The heap area is shared by all threads, shared libraries, and dynamically loaded modules in a process." If it is not shared with other processes, do I really have to free it while my program is running (not just at the end of it)?
Some sources also single out the high addresses (a sixth section) for command-line arguments and environment variables. Should this be considered another section and part of a program's memory?
Are the other sections shared with anything else beyond the program?
The heap is per-process memory: each process has its own heap, which is shared only within the same process space (for example between the process's threads, as you said). Why should you free it? Not really to give space to other processes (at least on modern OSes, where the process memory is reclaimed by the OS when the process dies), but to prevent heap exhaustion within your own process's memory: in C, if you don't deallocate the heap memory regions you used, they will always be considered busy even when they are not used anymore. Thus, to prevent undesired errors, it's good practice to free memory in the heap as soon as you don't need it anymore.
In a C program the command-line arguments are stored on the stack, as parameters of main. What happens is that usually the stack is allocated in the highest portion of a process's memory, which is mapped to the high addresses (this is probably why some sources point out what you wrote). But, generally speaking, there isn't any sixth memory area.
As said by the others, the text area can be shared by processes. This area usually contains the binary code, which would be the same for different processes sharing the same binary. For performance reasons, the OS can share such a memory area between processes (think, for example, of when you fork a child process).
The heap is shared with other processes in the sense that all processes use RAM. The more of it you use, the less is available to other programs. Heap sharing with other threads in your own program means that all your threads actually see and access the same heap (same virtual address space, same actual RAM, and with some luck also the same cache).
No.
text can be shared with other processes. These days it is marked as read-only, so having several processes share text makes sense. In practice this means that if you are already running top and you run another instance, it makes no sense to load the text part again. This would waste time and physical RAM. If the OS is smart enough, it can map those RAM pages into the virtual address space of both top instances, saving time and space.
On the official aspect:
The terms thread, process, text section, data section, bss, heap and stack are not even defined by the C language standard, and every platform is free to implement these components however it likes.
Threads and processes are typically implemented at the operating-system layer, while all the different memory sections are typically implemented at the compiler layer.
On the practical aspect:
For every given process, all these memory sections (text section, data section, bss, heap and stack) are shared by all the threads of that process.
Hence, it is under the responsibility of the programmer to ensure mutual-exclusion when accessing these memory sections from different threads.
Typically, this is achieved via synchronization utilities such as semaphores, mutexes and message queues (a minimal mutex example is sketched after this answer).
In between processes, it is under the responsibility of the operating system to ensure mutual-exclusion.
Typically, this is achieved via virtual-memory abstraction, where each process runs inside its own logical address space, and each logical address space is mapped to a different physical address space.
Disclaimer: each thread does get its own stack, but technically speaking those stacks are just regions within the same process address space, and there's usually nothing to prevent a thread from accessing the stacks of other threads, whether intentionally or by mistake.
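As a minimal sketch of the thread-level mutual exclusion described above (POSIX threads assumed, compiled with -pthread): two threads increment one heap-allocated counter, serialized with a mutex. The names counter and worker are just for illustration.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long *counter;                    /* heap memory, visible to every thread in the process */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);       /* mutual exclusion around the shared data */
        (*counter)++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    counter = malloc(sizeof *counter);
    if (counter == NULL) return EXIT_FAILURE;
    *counter = 0;

    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    printf("final count: %ld\n", *counter);   /* 200000; without the mutex the result would be unpredictable */
    free(counter);
    return 0;
}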

heap overflow affecting other programs

I was trying to create the condition for malloc to return a NULL pointer. In the below program, though I can see malloc returning NULL, once the program is forcibly terminated, I see that all other programs become slow and finally I had to reboot the system. So my question is whether the memory for the heap is shared with other programs? If not, other programs should not have been affected. Is the OS not allocating a certain amount of memory at the time of execution? I am using Windows 10 and MinGW.
#include <stdio.h>
#include <stdlib.h>

void mallocInFunction(void)
{
    int *ptr = malloc(500);        /* deliberately never freed */
    if (ptr == NULL)
    {
        printf("Memory could not be allocated\n");
    }
    else
    {
        printf("Allocated memory successfully\n");
    }
}

int main(void)
{
    while (1)
    {
        mallocInFunction();
    }
    return 0;
}
So my question is whether the memory for the heap is shared with other programs?
Physical memory (RAM) is a resource that is shared by all processes. The operating system makes decisions about how much RAM to allocate to each process and adjusts that over time.
If not, other programs should not have been affected. Is the OS not allocating a certain amount of memory at the time of execution?
At the time the program starts executing, the operating system has no idea how much memory the program will want or need. Instead, it deals with allocations as they happen. Unless configured otherwise, it will typically do everything it possibly can to allow the program's allocation to succeed because presumably there's a reason the program is doing what it's doing and the operating system won't try to second guess it.
... whether the memory for the heap is shared with other programs?
Well, the C standard doesn't exactly require a heap, but in the context of a task-switching, multi-user and multi-threaded OS, of course memory is shared between processes! The C standard doesn't require any of this, but this is all pretty common stuff:
CPU cache memory tends to be preferred for code that's executed often, though this might get swapped around quite a bit; that may or may not be swapped to a heap.
Task switching causes registers to be swapped to other forms of memory; that may or may not be swapped to a heap.
Entire pages are swapped to and from disk, so that other programs can make use of them when your OS switches execution away from your program and to the other programs, and when it's your program's turn to execute again, among other reasons. This may or may not involve manipulating the heap.
FWIW, you're referring to memory that has allocated storage duration. It's best to avoid using terms like heap and stack, as they're virtually meaningless. The memory you're referring to is on a silicon chip, regardless of whether it uses a heap or a stack.
... Is the OS not allocating a certain amount of memory at the time of execution?
Speaking of silicon chips and execution, your OS likely only has control of one processor (a silicon chip which contains some logic circuits and memory, among other things I'm sure) with which to execute many programs! To summarise this post, yes, your program is most likely sharing those silicon chips with other programs!
On a tangential note, I don't think heap overflow means what you think it means.
Your question cannot be answered in the context of C, the language. For C, there's no such thing as a heap, a process, ...
But it can be answered in the context of operating systems. Even a bit generically because many modern multitasking OSes do similar things.
Given a modern multitasking OS, it will use a virtual address space for each process. The OS manages a fixed amount of physical RAM and divides it into pages; when a process needs memory, such pages are mapped into the process's virtual address space (typically at a different virtual address than the physical one). When all memory pages are claimed by the OS itself and by the running processes, the OS will typically save some pages that are not in active use to disk, in a swap area, in order to serve a fresh page to the next process requesting one. But when the original page is touched again (and this is typically the case with free(), see below), it must first be loaded from disk again, and to have a free page for that, yet another page must be saved to swap space.
This is, like all disk I/O, slow, and it's probably what you see happening here.
Now, to fully understand this: what does malloc() do? It typically asks the operating system to increase the memory of its own process (and if necessary, the OS does this by mapping another page), and it uses this new memory by writing some bookkeeping information there about the block of memory requested (so free() can work correctly later), and ultimately returns a pointer to a block that's free for the program to use. free() uses the information written by malloc(), modifies it to indicate that this block is free again, and typically can't give any memory back to the OS because there are other malloc()'d blocks in the same page. It will give memory back when possible, but that's the exception in a typical scenario where dynamic allocations are heavily used.
So, the answer to your question is: Yes, the RAM is shared because there is only one set of physical RAM. The OS does the best it can to hide that fact and virtualize RAM, but if a process consumes all that is there, this will have visible effects.
malloc() is not a system call but a libc library function. So when a program asks for memory via malloc(), the library uses the brk()/sbrk() or mmap() system calls to allocate page(s); more details here.
Please keep in mind that the memory you get is all virtual in nature; that means that even if you have 3 GB of physical RAM, you can actually allocate much more than that. How does this happen? It happens via a concept called paging, where the system moves data between secondary storage (HDD/SSD) and main memory (RAM); more details here.
So with this in place, running out of memory is usually quite rare, but with a program like the above, which pushes the system's limits, it can happen. This is nicely explained here.
Now, why are the other programs hanging or slow? Because they all share the same operating system and the system is starving for resources. In fact, at some point the system may crash and need to be rebooted.
Hope this helps.

Programming languages without garbage collector: where does the garbage go?

I'm currently studying algorithms and advanced data structures: since I'm familiar with C and it provides a great level of control over the implementation and pointer usage, I'm using it to test my understanding of the subject so far.
When testing structures that need dynamic things like lists and trees I asked myself: since C doesn't have a garbage collector, if I don't call the free() function in order to deallocate all the variables I dynamically allocate, where does that memory go?
Other related questions include (sorry for misusing some terms, I don't have much experience in low-level abstraction):
Does the compiler use actual hard-drive resources (like a variable living in record x of my drive), or does it "instantiate" a portion of virtual memory to compile and run my programs?
Do I have lots of lists, trees and graphs on my hard drive, all of them involving counts from 0 to 100 or the strings "abcd" and "qwerty"?
Does the OS recognize said data as garbage, or am I stuck with this junk forever until I format the drive?
I'm really curious about it; I've never gone below the C level of abstraction.
since C doesn't have a garbage collector, if I don't call the free() function in order to deallocate all the variables I dynamically allocate, where does that memory go?
This is not (and cannot really be) defined by the C11 standard (read n1570).
However, let's pretend you run an executable produced by some C compiler on some familiar operating system (like Linux or Windows or MacOSX). Actually you are running some process which has some virtual address space.
The virtual memory and paging subsystem of the operating system kernel would put most useful pages - the resident set size - of that virtual address space in RAM and configure the MMU; read about demand paging & thrashing & page cache & page faults.
When that process terminates (either by nicely exiting or because of some abnormal situation, like a segmentation fault), the operating system releases every resource used by your process (including the virtual address space of that process).
Read Operating Systems: Three Easy Pieces for much more.
However, if you don't have any operating system, or if your OS or processor don't support virtual memory (think of some Arduino board) things can be widely different. Read about undefined behavior and runtime systems.
On Linux, you can query the address space of a process of pid 1234 by using proc(5). Run in a terminal cat /proc/1234/maps or use pmap(1). From inside the process, read /proc/self/maps as a sequential file. See this.
You could also study the source code of open source standard libraries like GNU libc (above syscalls(2) on Linux) or of musl-libc, or use strace(1) to understand what system calls are done.
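For example, here is a tiny sketch (Linux assumed) that dumps the calling process's own address-space layout by reading /proc/self/maps as a sequential file:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) { perror("fopen"); return 1; }

    char line[512];
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);   /* one mapping per line: address range, permissions, offset, backing file */

    fclose(f);
    return 0;
}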
C (and other non-garbage-collecting languages) has no concept of garbage at all, and thus no need to collect it somehow: either you hold a valid pointer to some allocated memory, in which case it's considered "valuable memory", or you don't, in which case your program is simply wrong. It's as simple as that.
The latter case is something C doesn't even evaluate any further: there's no point in researching what happens in a program "that's wrong" other than fixing it.
Languages like C and C++ use dynamic heap allocation through dedicated functions/operators like malloc and new. This allocates memory on the heap, in RAM. If such a program fails to free the memory once it is done using it, then the programmer has managed to create a certain kind of bug called a memory leak. It means that the program now consumes heap memory that cannot be used, since there is nothing in the program pointing at it any longer.
However, all memory allocated by a process is freed by the OS when the process is done executing. If the process failed to clean up its own heap allocations, the OS will do it. It is still good practice to manually clean up the memory, though, for other reasons (it exposes latent bugs).
Therefore the only concern with memory leaks is that they cause programs to consume too much RAM while they execute. Once the process is done executing, all memory - including leaked memory - is freed.
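A tiny sketch of such a leak (the names are illustrative): the pointer to the allocation is lost when the function returns, so the program can never free the block, and the OS reclaims it only when the process exits.

#include <stdlib.h>
#include <string.h>

static void leak(void)
{
    char *p = malloc(1024);      /* allocated on the heap */
    if (p != NULL)
        memset(p, 'x', 1024);    /* use it ... */
}                                /* ... and return without free(p): the block is now unreachable */

int main(void)
{
    for (int i = 0; i < 1000; i++)
        leak();                  /* roughly 1 MB leaked in total; the OS reclaims it at process exit */
    return 0;
}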
There is no relation between the heap and your hard drive, just as there is no relation between the stack and your hard drive. The hard drive is used for storing the executable part of your program, nothing else. The heap, stack and other such memory areas are for storing data when your program is executing. Since they are allocated in RAM, all info in such areas is lost when the program is done executing.
The reason why some languages introduced garbage collection was to remove the problem of memory leaks. A garbage collector is a background process of sorts that goes through a program's heap memory, looks for segments of data which no part of the program is pointing at, and then frees those segments. Since the garbage collector does this, there is no need for free()/delete.
This comes at the expense of execution speed, since the garbage collector needs to be executed now and then. This is one of many reasons why languages like Java and C# are slower than C and C++ by design. And it is also the reason why C and C++ don't have and never will have a garbage collector, since those languages prioritize execution speed.
If you don't free a resource, it stays allocated. The C compiler knows nothing about hard drives. You can read and write files with C and the appropriate IO libraries. So yes, your hard disk might be littered with stuff from running your software, but the C language or compiler isn't responsible to clean it up. You are. You can clean up your files manually, or code your C programs to clean up after themselves. Get a good book on C.
In languages like C, which have no native garbage collection, any instantiated variables are held in volatile memory (RAM, not HDD) until either the application releases them or the application closes. This can cause major issues on machines with limited memory, as the application's memory allocation keeps growing when objects are not 'disposed' of throughout its lifecycle, until there is no memory left and the application crashes and burns.
In answer to points 2 and 3 (objects on the HDD, e.g. trees and graphs): no, they will not be littering your HDD, as the objects are only created in memory (RAM) and only live while the application is running; closing the app releases the memory back (mostly) for use by other applications.
See this link for reference to hopefully help understand C Variables a little more.
That memory you are talking about doesn't go anywhere.
It just remains there allocated and unable to be used by any other program until the program that allocated it completes its execution. Then, roughly, the operating system comes and "cleans" all the remains of that application from memory.
It is advised to free the memory yourself, since the OS does that more slowly than any application would (it has to cross-check with every other application running to make sure it doesn't free something it shouldn't).

About sbrk() and malloc()

I've read the linux manual about sbrk() thoroughly:
sbrk() changes the location of the program break, which defines the end
of the process's data segment (i.e., the program break is the first
location after the end of the uninitialized data segment).
And I do know that user-space memory is organized roughly like this: text, then data and bss, then the heap growing upward, and the stack at the top growing downward.
The problem is:
When I call sbrk(1), why do people say I am increasing the size of the heap? As the manual says, I am changing the end position of the data segment & bss. So what should increase is the size of the data segment & bss, right?
The data and bss segments are a fixed size. The space allocated to the process after the end of those segments is therefore not a part of those segments; it is merely contiguous with them. And that space is called the heap space and is used for dynamic memory allocation.
If you want to regard it as 'extending the data/bss segment', that's fine too. It won't make any difference to the behaviour of the program, or the space that's allocated, or anything.
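Here is a minimal sketch of watching the program break move, assuming Linux with glibc (small requests are typically served from the brk heap, large ones via mmap, so the exact behaviour depends on the allocator):

#define _DEFAULT_SOURCE          /* for sbrk() with glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);                        /* current program break */
    void *p = malloc(100);                         /* small request: usually served from the brk heap */
    void *after = sbrk(0);

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);    /* typically higher than before */

    free(p);
    printf("break after  free  : %p\n", sbrk(0));  /* usually unchanged: free() rarely returns pages to the kernel */
    return 0;
}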
The manual page on Mac OS X indicates you really shouldn't be using them very much:
The brk and sbrk functions are historical curiosities left over from earlier days before the advent of virtual memory management. The brk() function sets the break or lowest address of a process's data segment (uninitialized data) to addr (immediately above bss). Data addressing is restricted between addr and the lowest stack pointer to the stack segment. Memory is allocated by brk in page size pieces; if addr is not evenly divisible by the system page size, it is increased to the next page boundary.
The current value of the program break is reliably returned by sbrk(0) (see also end(3)). The getrlimit(2) system call may be used to determine the maximum permissible size of the data segment; it will not be possible to set the break beyond the rlim_max value returned from a call to getrlimit, e.g. etext + rlp->rlim_max (see end(3) for the definition of etext).
It is mildly exasperating that I can't find a manual page for end(3), despite the pointers to look at it. Even this (slightly old) manual page for sbrk() does not have a link for it.
Notice that today sbrk(2) is rarely used. Most malloc implementations use mmap(2), at least for large allocations, to acquire a memory segment (and munmap to release it). Quite often, free simply marks a memory zone as reusable by some future malloc (and does not release any memory to the Linux kernel).
(So practically, the heap of a modern Linux process is made of several segments and is more subtle than your picture; and multi-threaded processes have one stack per thread.)
Use proc(5), notably /proc/self/maps and /proc/$pid/maps, to understand the virtual address space of some process. Try first to understand the output of cat /proc/self/maps (showing the address space of that cat command) and of cat /proc/$$/maps (showing the address space of your shell). Try also to look at the maps pseudo-file for your web browser (e.g. cat /proc/$(pidof firefox)/maps or cat /proc/$(pidof iceweasel)/maps etc...); I have more than a thousand lines (so process segments) in it.
Use strace(1) to understand the system calls done by a given command or process.
Take advantage that on Linux most (and probably all) C standard library implementations are free software, so you can study their source code. The source code of musl-libc is quite easy to read.
Read also about ELF, ASLR, dynamic linking & ld-linux(8), and the Advanced Linux Programming book, then syscalls(2).

How programmatically get Linux process's stack start and end address?

For a single-threaded program, I want to check whether or not a given virtual address is in the process's stack. I want to do that inside the process, which is written in C.
I am thinking of reading /proc/self/maps to find the line labelled [stack] to get the start and end address of my process's stack. Thinking about this solution led me to the following questions:
/proc/self/maps shows a stack of 132k for my particular process and the maximum size for the stack (ulimit -s) is 8 MB on my system. How does Linux know that a given page fault, occurring because we have gone beyond the current stack, belongs to the stack (and that the stack must be made larger) rather than that we are reaching another memory area of the process?
Does Linux shrink back the stack ? In other words, when returning from deep function calls for example, does the OS reduce the virtual memory area corresponding to the stack ?
How much virtual space is initially allocated for the stack by the OS ?
Is my solution correct and is there any other cleaner way to do that ?
Lots of the stack setup details depend on which architecture you're running on, executable format, and various kernel configuration options (stack pointer randomization, 4GB address space for i386, etc).
At the time the process is exec'd, the kernel picks a default stack top (for example, on the traditional i386 arch it's 0xc0000000, i.e. the end of the user-mode area of the virtual address space).
The type of executable format (ELF vs a.out, etc) can in theory change the initial stack top. Any additional stack randomization and any other fixups are then done (for example, the vdso [system call springboard] area generally is put here, when used). Now you have an actual initial top of stack.
The kernel now allocates whatever space is needed to construct argument and environment vectors and so forth for the process, initializes the stack pointer, creates initial register values, and initiates the process. I believe this provides the answer for (3): i.e. the kernel allocates only enough space to contain the argument and environment vectors, other pages are allocated on demand.
Other answers, as best as I can tell:
(1) When a process attempts to store data in the area below the current bottom of the stack region, a page fault is generated. The kernel fault handler determines where the next populated virtual memory region within the process' virtual address space begins. It then looks at what type of area that is. If it's a "grows down" area (at least on x86, all stack regions should be marked grows-down), and if the process' stack pointer (ESP/RSP) value at the time of the fault is less than the bottom of that region and if the process hasn't exceeded the ulimit -s setting, and the new size of the region wouldn't collide with another region, then it's assumed to be a valid attempt to grow the stack and additional pages are allocated to satisfy the process.
(2) Not 100% sure, but I don't think there's any attempt to shrink stack areas. Presumably normal LRU page sweeping would be performed making now-unused areas candidates for paging out to the swap area if they're really not being re-used.
(4) Your plan seems reasonable to me: the /proc/NN/maps should get start and end addresses for the stack region as a whole. This would be the largest your stack has ever been, I think. The current actual working stack area OTOH should reside between your current stack pointer and the end of the region (ordinarily nothing should be using the area of the stack below the stack pointer).
My answer is for Linux on x64 with kernel 3.12.23 only. It might or might not apply to other versions or architectures.
(1)+(2) I'm not sure here, but I believe it is as Gil Hamilton said before.
(3) You can see the amount in /proc/pid/maps (or /proc/self/maps if you target the calling process). However, not all of that is actually usable as stack for your application. The argument vector (argv[]) and environment vector (__environ[]) usually consume quite a bit of space at the bottom (highest address) of that area.
To find the area the kernel actually designated as "stack" for your application, you can have a look at /proc/self/stat. Its values are documented here. As you can see, there is a field for "startstack". Together with the size of the mapped area, you can compute the current amount of stack reserved. Along with "kstkesp", you can determine the amount of free stack space or the amount of stack space actually used (keep in mind that any operation done by your thread will most likely change those values).
Also note that this works only for the process's main thread! Other threads won't get a mapping labelled "[stack]", but will either use anonymous mappings or might even end up on the heap. (Use the pthreads API to find those values, or remember the stack start in the thread's main function.)
(4) As explained in (3), your solution is mostly OK, but not entirely accurate.
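For completeness, here is a minimal sketch of the approach from the question (Linux, main thread only; the helper name addr_on_stack is just illustrative): scan /proc/self/maps for the line tagged [stack] and test whether a given address falls inside that range.

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if addr lies inside the [stack] mapping, 0 if not, -1 on error. */
static int addr_on_stack(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) return -1;

    int found = 0;
    char line[512];
    while (fgets(line, sizeof line, f)) {
        if (strstr(line, "[stack]") == NULL)
            continue;                              /* only the main thread's stack carries this label */
        uintptr_t lo, hi;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2)
            found = (uintptr_t)addr >= lo && (uintptr_t)addr < hi;
        break;
    }
    fclose(f);
    return found;
}

static int global;                                 /* lives in the data/bss segment, not on the stack */

int main(void)
{
    int local;                                     /* lives on the main thread's stack */
    printf("&local  on stack? %d\n", addr_on_stack(&local));   /* expect 1 */
    printf("&global on stack? %d\n", addr_on_stack(&global));  /* expect 0 */
    return 0;
}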

Resources