Ever since I was introduced to C, I have been told that dynamic memory allocation in C is done using the functions in the malloc family. I also learned that memory dynamically allocated using malloc is allocated on the heap section of the process.
Various OS textbooks say that malloc involves a system call (not always, but at times) to allocate structures on the heap for the process. Now, supposing that malloc returns a pointer to a chunk of bytes allocated on the heap, why should it need a system call? The activation records of a function are placed in the stack section of the process, and since the "stack section" is already part of the process's virtual address space, pushing and popping activation records and manipulating the stack pointer simply start from the highest possible address of the virtual address space. None of that requires a system call.
Now, on the same grounds, since the "heap section" is also part of the process's virtual address space, why should a system call be necessary for allocating a chunk of bytes in this section? A routine like malloc could manage the "free" list and "allocated" list on its own; all it needs to know is where the "data section" ends. Certain texts say that system calls are necessary to "attach memory to the process for dynamic memory allocation", but if malloc allocates memory on the "heap section", why is it required to attach memory to the process at all during malloc? It could simply be taken from the portion already belonging to the process.
While going through the text "The C Programming Language" [2e] by Kernighan and Ritchie, I came across their implementation of the malloc function [section 8.7, pages 185-189]. The authors say:
malloc calls upon the operating system to obtain more memory as necessary.
Which is what the OS texts say, but counterintuitive to my thought above (if malloc allocates space on the heap).
Since asking the system for memory is a comparatively expensive operation, the authors do not do that on every call to malloc; instead they create a function morecore which requests at least NALLOC units, and this larger block is chopped up as needed. Basic free-list management is done by free.
But the thing is that the authors use sbrk() to ask the operating system for memory in morecore. Now Wikipedia says:
brk and sbrk are basic memory management system calls used in Unix and Unix-like operating systems to control the amount of memory allocated to the data segment of the process.
Where
a data segment (often denoted .data) is a portion of an object file or the corresponding address space of a program that contains initialized static variables, that is, global variables and static local variables.
Which I guess is not the "heap section". [The data section is the second section from the bottom in the picture above, while the heap is the third.]
I am totally confused. I want to know what really happens and how both concepts can be correct. Please help me understand by joining the scattered pieces together...
In your diagram, the section labeled "data" is more precisely called "static data"; the compiler pre-allocates this memory for all the global variables when the process starts.
The heap that malloc() uses is the rest of the process's data segment. This initially has very little memory assigned to it in the process. If malloc() needs more memory, it can use sbrk() to extend the size of the data segment, or it can use mmap() to create additional memory segments elsewhere in the address space.
Why does malloc() need to do this? Why not simply make the entire address space available for it to use? There are historical and practical reasons for this.
The historical reason is that early computers didn't have virtual memory. All the memory assigned to a process was swapped in bulk to disk when switching between processes. So it was important to only assign memory pages that were actually needed.
The practical reason is that this is useful for detecting various kinds of errors. If you've ever gotten a segmentation violation error because you dereferenced an uninitialized pointer, you've benefited from this. Much of the process's virtual address space is not allocated to the process, which makes it likely that uninitialized pointers point to unavailable memory, and you get an error trying to use it.
There's also an unallocated gap between the heap (growing upwards) and the stack (growing downward). This is used to detect stack overflow -- when the stack tries to use memory in that gap, it gets a fault that's translated to the stack overflow signal.
This is the Standard C Library specification for malloc(), in its entirety:
7.22.3.4 The malloc function
Synopsis
#include <stdlib.h>
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is
specified by size and whose value is indeterminate.
Returns
The malloc function returns either a null pointer or a pointer to the
allocated space.
That's it. There's no mention of the Heap, the Stack or any other memory location, which means that the underlying mechanisms for obtaining the requested memory are implementation details.
In other words, you don't care where the memory comes from, from a C perspective. A conforming implementation is free to implement malloc() in any way it sees fit, so long as it conforms to the above specification.
I was told that in C dynamic memory allocation is done using the functions in the malloc family. I also learned that memory dynamically allocated using malloc is allocated on the heap section of the process.
Correct on both points.
Now supposing that malloc returns pointer to chunk of bytes allocated on the heap, why should it need a system call.
It needs to request an adjustment to the size of the heap, to make it bigger.
...the "stack section" is already a part of the virtual address space of the process, pushing and popping of activation records, manipulation of stack pointers, [...] does not even require a system call.
The stack segment is grown implicitly, yes, but that's a special feature of the stack segment. There's typically no such implicit growing of the data segment. (Note, too, that the implicit growing of the stack segment isn't perfect, as witness the number of people who post questions to SO asking why their programs crash when they allocate huge arrays as local variables.)
Now on the same grounds since the "heap section" is also a part of the virtual address space of the process, why should a system call be necessary for allocating a chunk of bytes in this section.
Answer 1: because it's always been that way.
Answer 2: because you want accidental stray pointer references to crash, not to implicitly allocate memory.
malloc calls upon the operating system to obtain more memory as necessary.
Which is what the OS texts say, but counterintuitive to my thought above (if malloc allocates space on the heap).
Again, malloc does request space on the heap, but it must use an explicit system call to do so.
But the thing is that the authors use sbrk() to ask the operating system for memory in morecore. Now Wikipedia says:
brk and sbrk are basic memory management system calls used in Unix and Unix-like operating systems to control the amount of memory allocated to the data segment of the process.
Different people use different nomenclatures for the different segments. There's not much of a distinction between the "data" and "heap" segments. You can think of the heap as a separate segment, or you can think of those system calls -- the ones that "allocate space on the heap" -- as simply making the data segment bigger. That's the nomenclature the Wikipedia article is using.
Some updates:
I said that "There's not much of a distinction between the 'data' and 'heap' segments." I suggested that you could think of them as subparts of a single, more generic data segment. And actually there are three subparts: initialized data, uninitialized data or "bss", and the heap. Initialized data has initial values that are explicitly copied out of the program file. Uninitialized data starts out as all bits zero, and so does not need to be stored in the program file; all the program file says is how many bytes of uninitialized data it needs. And then there's the heap, which can be thought of as a dynamic extension of the data segment, which starts out with a size of 0 but may be dynamically adjusted at runtime via calls to brk and sbrk.
I said, "you want accidental stray pointer references to crash, not to implicitly allocate memory", and you asked about this. This was in response to your supposition that explicit calls to brk or sbrk ought not to be required to adjust the size of the heap, and your suggestion that the heap could grow automatically, implicitly, just like the stack does. But how would that work, really?
The way automatic stack allocation works is that as the stack pointer grows (typically "downward"), it eventually reaches a point that it points to unallocated memory -- that blue section in the middle of the picture you posted. At that point, your program literally gets the equivalent of a "segmentation violation". But the operating system notices that the violation involves an address just below the existing stack, so instead of killing your program on an actual segmentation violation, it quick-quick makes the stack segment a little bigger, and lets your program proceed as if nothing had happened.
So I think your question was, why not have the upward-growing heap segment work the same way? And I suppose an operating system could be written that worked that way, but most people would say it was a bad idea.
I said that in the stack-growing case, the operating system notices that the violation involves an address "just below" the existing stack, and decides to grow the stack at that point. There's a definition of "just below", and I'm not sure what it is, but these days I think it's typically a few tens or hundreds of kilobytes. You can find out by writing a program that allocates a local variable
char big_stack_array[100000];
and seeing if your program crashes.
Now, sometimes a stray pointer reference -- that would otherwise cause a segmentation violation style crash -- is just the result of the stack normally growing. But sometimes it's a result of a program doing something stupid, like the common error of writing
char *retbuf;
printf("type something:\n");
fgets(retbuf, 100, stdin);
And the conventional wisdom is that you do not want to (that is, the operating system does not want to) coddle a broken program like this by automatically allocating memory for it (at whatever random spot in the address space the uninitialized retbuf pointer seems to point) to make it seem to work.
If the heap were set up to grow automatically, the OS would presumably define an analogous threshold of "close enough" to the existing heap segment. Apparently stray pointer references within that region would cause the heap to automatically grow, while references beyond that (farther into the blue region) would crash as before. That threshold would probably have to be bigger than the threshold governing automatic stack growth. malloc would have to be written to make sure not to try to grow the heap by more than that amount. And true, stray pointer references -- that is, program bugs -- that happened to reference unallocated memory in that zone would not be caught. (Which is, it's true, what can happen for buggy, stray pointer references just off the end of the stack today.)
But, really, it's not hard for malloc to keep track of things, and explicitly call sbrk when it needs to. The cost of requiring explicit allocation is small, and the cost of allowing automatic allocation -- that is, the cost of the stray pointer bugs not caught -- would be larger. This is a different set of tradeoffs than for the stack growth case, where an explicit test to see if the stack needed growing -- a test which would have to occur on every function call -- would be significantly expensive.
Finally, one more complication. The picture of the virtual memory layout that you posted -- with its nice little stack, heap, data, and text segments -- is a simple and perhaps outdated one. These days I believe things can be a lot more complicated. As @chux wrote in a comment, "your malloc() understanding is only one of many ways allocation is handled. A clear understanding of one model may hinder (or help) understanding of the many possibilities." Among those complicating possibilities are:
A program may have multiple stack segments maintaining multiple stacks, if it supports coroutines or multithreading.
The mmap and shm_open system calls may cause additional memory segments to be allocated, scattered anywhere within that blue region between the heap and the stack.
For large allocations, malloc may use mmap rather than sbrk to get memory from the OS, since it turns out this can be advantageous.
See also Why does malloc() call mmap() and brk() interchangeably?
As the bard said, "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy." :-)
Not all virtual addresses are available at the beginning of a process.
The OS does maintain a virtual-to-physical map, but at any given time only some of the virtual addresses are in the map. Reading or writing a virtual address that isn't in the map causes an instruction-level exception. sbrk puts more addresses in the map.
The stack is just like the data section, but it has a fixed size, and there is no sbrk-like system call to extend it. We could say there is no heap section at all: only a fixed-size stack section and a data section that can be grown upward by sbrk.
The heap section you mention is actually a part of the data section managed by malloc and free. Note that the code doing this heap management is not in the OS kernel but in the C library, executing in CPU user mode.
I want to make sure the return value of sbrk is within a certain specific range. I read somewhere that sbrk allocates from an area set up at program initialization. So I'm wondering if there's any way I can force program initialization to allocate from a specific address. For example, with mmap, I can do so with MAP_FIXED_NOREPLACE. Is it possible to have something similar?
No, this is not possible. brk and sbrk refer to the data segment of the program, and that can be loaded at any valid address that meets the needs of the dynamic linker. Different architectures can and do use different addresses, and even machines of the same architecture can use different ranges depending on the configuration of the kernel. Using a fixed address or address range is extremely nonportable and will make your program very brittle to future changes. I fully expect that doing this will cause your program to break in the future simply by upgrading libc.
In addition, modern programs are typically compiled as position-independent executables so that ASLR can be used to improve security. Therefore, even if you knew the address range that was used for one invocation of your program, the very next invocation of your program might use a totally different address range.
In addition, you almost never want to invoke brk or sbrk by hand. In almost all cases, you will want to use the system memory allocator (or a replacement like jemalloc), which will handle this case for you. For example, glibc's malloc implementation, like most others, will allocate large chunks of memory using mmap, which can significantly reduce memory usage in long-running programs, since these large chunks can be freed independently. The memory allocator also may not appreciate you changing the size of the data segment without consulting it.
Finally, in case you care about portability to other Unix systems, not all systems even have brk and sbrk. OpenBSD allocates all memory using mmap which improves security by expanding the use of ASLR (at the cost of performance).
If you absolutely must use a fixed address or address range and there is no alternative, you'll need to use mmap to allocate that range of memory.
When a C program starts, how does it ask the operating system for enough memory space for static variables?
And while it is running, how does it ask the operating system for memory space for automatic variables?
I would also like to know how these memory spaces are released after execution.
Please try to be as accurate as possible. If operating systems vary in their behavior, please give preference to UNIX-like ones.
Global variables and those with static lifetime are usually stored in a data segment, which is set up by the operating system's executable loader.
This loader probably does what @John Zwinck described on Unix. On Windows there is VirtualAlloc, for example, which can also be used to allocate memory in the address space of another program.
Local variables are usually stored on the so-called stack. Allocations on the stack are fast, as they usually consist of nothing more than a modification of the stack pointer register (sp, esp, or rsp in the x86 processor family). So when you have an int (size: 4 bytes), that register is simply decremented by 4, since the stack grows downwards. At the end of the scope, the old state of the stack register is restored.
This also makes stack overflows dangerous, since you can overwrite other variables on the stack that should not be modified, such as return addresses of function calls.
Dynamic variables are variables allocated using malloc (C) or new (C++) or any of the operating-system-specific allocation functions. These are placed on the so-called heap. They live until they are cleaned up using free/delete/an OS-specific deallocator, or until the program exits (in which case a sane operating system takes care of the cleanup).
Dynamic allocation is also the slowest of the three, as it may require a call to the operating system.
Memory allocation on Unix-like systems is done via calls to the operating system using the sbrk() and mmap() APIs.
sbrk() is used to enlarge the "data segment" which is a contiguous range of (virtual) addresses. mmap() is used in many modern systems as a sort of supplement to this, because it can allocate chunks which can later be deallocated independently (meaning no "holes" will remain as can happen with sbrk()).
In C you have malloc() as the user-facing API for memory allocation. You can read more about how that maps to the low-level functions I mentioned earlier, here: How are malloc and free implemented?
Static variables are found in the data and BSS segments behind the code (initialized statics in data, zero-initialized ones in BSS). Auto variables are located on the stack at the end of the virtual memory of the process. Both are defined at compile time. The memory layout is then created at program startup.
brk(), sbrk() and mmap() can manipulate the virtual memory (the heap in particular) at run time (e.g. with malloc()/free()) but these functions are not related to static and auto variables!
When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to know where stack variables are. My question is: how does malloc() know at what virtual address the heap starts? Does the heap always exist at the end of the data segment, and if so, how does malloc() know where that is? Or is it even one contiguous area of memory, or is it randomly interspersed with other global variables in the data section?
malloc implementations are dependent on the operating system; so is the process that they use to get the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.
Note that you can implement malloc without knowing the location of the heap. You can initialize the free list to NULL, and call sbrk or a similar function with the allocation size each time a free element of the appropriate size is not found.
This is only about Linux implementations of malloc.
Many malloc implementations on Linux or POSIX systems use the mmap(2) syscall to obtain a fairly large range of memory. Then they can use munmap(2) to release it.
(It looks like sbrk(2) might not be used much any more; in particular, it is not ASLR-friendly and might not be multithread-friendly.)
Both of these syscalls may be quite expensive, so some implementations ask for memory (using mmap) in quite large chunks (e.g. chunks of one or a few megabytes). Then they manage the free space themselves, e.g. as linked lists of blocks. They handle small mallocs and large mallocs differently.
The mmap syscall usually does not start giving memory ranges at fixed addresses (notably because of ASLR).
Try on your system to run a simple program printing the result of a single malloc (of e.g. 128 ints). You will probably observe different addresses from one run to the next (because of ASLR). And strace(1)-ing it is very instructive. Try also cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).
So there is no need to "start" the heap at some address, and on many systems that does not even make sense. The kernel hands out ranges of virtual address space at more or less random pages.
BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementation. I find that source code of musl libc is very readable.
On Windows, you use the Heap functions to get the process heap memory. The C runtime will allocate memory blocks on the heap using HeapAlloc and then use that to fulfil malloc requests.
I am trying to understand how memory space is allocated for a C program. For that, I want to determine the stack and data segment boundaries. Is there any library call or system call which does this job? I found that the stack bottom can be determined by reading /proc/self/stat, but I could not figure out how. Please help. :)
Processes don't have a single "data segment" anymore. They have a bunch of mappings of memory into their address space. Common cases are:
Shared library or executable code or rodata, mapped shared, without write access.
Glibc heap segments, anonymous segments mapped with rw permissions.
Thread stack areas. They look a lot like heap segments, but are usually separated from each other with some unmapped guard pages.
As Nikolai points out, you can look at the list of these with the pmap tool.
Look into /proc/<pid>/maps and /proc/<pid>/smaps (assuming Linux). Also pmap <pid>.
There is no general method for doing this. In fact, some of the secure computing environments randomize the exact address space allocations and order so that code injection attacks are more challenging to engineer.
However, every C runtime library has to arrange the contributions of data and stack segments so the program works correctly. Reading the runtime startup code is the most direct way of finding the answer.
Which C compiler are you interested in?