Why is malloc() considered a library call and not a system call? - c

Why is malloc() considered a standard C library function and not a system call? It seems like the OS is responsible for handling all memory allocation requests.

It would certainly be possible to implement malloc and free as system calls, but it's rarely if ever done that way.
System calls are calls into the OS kernel. For example, on POSIX systems (Linux, UNIX, ...), read and write are system calls. When a C program calls read, it's probably calling a wrapper that does whatever is needed to make a request to the kernel and then return the result to the caller.
It turns out that the most efficient way to do memory management is to use lower-level system calls (see brk and sbrk) to expand the current process's data segment, and then use library calls (malloc, free, etc.) to manage memory within that segment. That management doesn't require any interaction with the kernel; it's all just pointer manipulation performed within the current process. The malloc function will invoke a system call such as brk or sbrk if it needs more memory than is currently available, but many malloc calls won't require any interaction with the kernel at all.
The above is fairly specific to Linux/POSIX/UNIX systems. The details will be a bit different for Windows for example, but the overall design is likely to be similar.
Note that some C standard library functions are typically implemented directly as system calls. time is one example (but as Nick ODell points out in a comment, a time call can often be performed without interacting with the kernel).

It seems like the OS is responsible for handling all memory allocation requests.
Well, both yes and no.
It actually depends more on your specific system than it does on C.
Most OSes allocate memory in chunks of some size, typically called a page. The page size may differ between systems, and a given system may support several page sizes. 4K is a typical page size on many systems, but much larger "huge pages" may also be supported.
But yes... at the end of the day there is only one entity that can allocate memory: the OS. Unless you are on bare metal, where other code can handle it, if that's even supported.
Why is malloc() considered a standard C library function and not a system call?
The short answer is: because malloc isn't an OS/system call. Period.
To elaborate a bit more: one malloc call may lead to a system call, but the next malloc may not.
For instance: you request 100 bytes using malloc. malloc may decide to call the OS. The OS gives you 4K. In your next malloc you request 500 bytes. Then the "layer in between" can just hand out the 500 bytes from the chunk already provided by the previous syscall.
So no... memory allocation via malloc may not lead to any syscall for allocation of more memory.
It's all very dependent on your specific system. And the C standard doesn't care.
But malloc is not a syscall. malloc uses other syscalls when needed.

It seems like the OS is responsible for handling all memory allocation requests.
For performance reasons, it's not a good idea to ask the OS for memory every time the program needs memory. There are a few reasons for this:
The OS manages memory in units called pages. Pages are typically 4096 bytes long. (But some architectures or operating systems use larger pages.) The OS can't allocate memory to a process in a chunk smaller than a page.
Imagine you need 10 bytes to store a string. It would be very wasteful to allocate 4096 bytes and only use the first 10. A memory allocator can ask the OS for a page, and slice that page into smaller allocations.
A system call requires a context switch. A context switch is expensive, (~100 ns on x86 systems) relative to calling a function in the same program. Again, it is better to ask for a larger chunk of memory, and re-use it for many allocations.
Why is malloc() considered a library call and not a system call?
For some library calls, like read() the implementation in the library is very simple: it calls the system call of the same name. One call to the library function read() produces one system call to read(). It's reasonable to describe read() as a system call, because all the work is being done in the kernel.
The story with malloc() is more complicated. There's no system call called malloc(), and the library call malloc() will actually use the system calls sbrk(), brk(), or mmap(), depending on the size of your allocation and the implementation you're using. Much of the time, it makes no system call at all!
There are many different choices in how to implement malloc(). For that reason, you'll see many different competing implementations, such as jemalloc, or tcmalloc.

Why is malloc() considered a standard C library function and not a system call?
Because it's part of the C standard library.
It seems like the OS is responsible for handling all memory allocation requests.
It's not. An operating system typically allocates some memory space for a given process, but how the memory is used after that is up to the process. Using the standard library for things like memory allocation insulates your code from the details of any given operating system, which makes your code a lot more portable. A given implementation of malloc might ultimately make a system call to obtain memory, but whether it does or doesn't or does some of the time is an implementation detail.

Related

How can I get a guarantee that when a memory is freed, the OS will reclaim that memory for its use?

I noticed that this program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main() {
    const size_t alloc_size = 1*1024*1024;
    for (size_t i = 0; i < 3; i++) {
        printf("1\n");
        usleep(1000*1000);
        void *p[3];
        for (size_t j = 3; j--; )
            memset(p[j] = malloc(alloc_size), 0, alloc_size); // memset to fault the pages in ("de-virtualize" the memory)
        usleep(1000*1000);
        printf("2\n");
        free(p[i]);
        p[i] = NULL;
        usleep(1000*1000*4);
        printf("3\n");
        for (size_t j = 3; j--; )
            free(p[j]);
    }
}
which allocates 3 blocks of memory, 3 times, and each time frees a different block, does free the memory according to watch free -m. That means the OS reclaimed the memory on every free regardless of the block's position inside the program's address space. Can I somehow get a guarantee of this effect? Or is there already anything like that (like a rule for >64KB allocations)?
The short answer is: In general, you cannot guarantee that the OS will reclaim the freed memory, but there may be an OS specific way to do it or a better way to ensure such behavior.
The long answer:
Your code has undefined behavior: there is an extra free(p[i]); after the printf("2\n"); which accesses beyond the end of the p array.
You allocate large blocks (1 MB) for which your library makes individual system calls (for example mmap on Linux systems), and free releases these blocks to the OS, hence the observed behavior.
Various OSes are likely to implement such behavior above a system-specific threshold (typically 128KB), but the C standard gives no guarantee about this, so relying on such behavior is system specific.
Read the manual page for malloc() on your system to see if this behavior can be controlled. For example, the C library on Linux uses an environment variable MMAP_THRESHOLD to override the default setting for this threshold.
If you program to a Posix target, you might want to use mmap() directly instead of malloc to guarantee that the memory is returned to the system once deallocated with munmap(). Note that the block returned by mmap() will have been initialized to all bits zero before the first access, so you may avoid explicit initialization to take advantage of on-demand paging, or perform explicit initialization to ensure the memory is mapped, to try and minimize latency in later operations.
On the OSes I know, and especially on linux:
no, you cannot guarantee reuse. Why would you want that? Reuse only happens when someone needs more memory pages, and Linux will then have to pick pages that aren't currently mapped to a process; if these run out, you'll get into swapping. And: you can't make your OS do something that is none of your processes' business. How it internally manages memory allocations is none of the freeing process' business. In fact, that's security-wise a good thing.
What you can do is not only freeing the memory (which might leave it allocated to your process, handled by your libc, for later mallocs), but actually giving it back (man sbrk, man munmap, have fun). That's not something you'd usually do.
Also: this is yet another instantiation of "help, linux ate my RAM"... you misinterpret what free tells you.
For glibc malloc(), read the man 3 malloc man page.
In short, smaller allocations use memory provided by sbrk() to extend the data segment; this is not returned to the OS. Larger allocations (typically 132 KiB or more; you can use MMAP_THRESHOLD on glibc to change the limit) use mmap() to allocate anonymous memory pages (but also include memory allocation bookkeeping on those pages), and when freed, these are usually immediately returned to the OS.
The only case when you should worry about the process returning memory to the OS in a timely manner is if you have a long-running process that temporarily does a very large allocation, running on an embedded or otherwise memory-constrained device. Why? Because this stuff has been done in C successfully for decades, and the C library and the OS kernel handle these cases just fine. It just isn't a practical problem in normal circumstances; you only need to worry about it if you know it is, and that happens only in very specific circumstances.
I personally do routinely use mmap(2) in Linux to map pages for huge data sets. Here, "huge" means "too large to fit in RAM and swap".
Most common case is when I have a truly huge binary data set. Then, I create a (sparse) backing file of suitable size, and memory-map that file. Years ago, in another forum, I showed an example of how to do this with a terabyte data set -- yes, 1,099,511,627,776 bytes -- of which only 250 megabytes or so was actually manipulated in that example, to keep the data file small. The key in this approach is to use MAP_SHARED | MAP_NORESERVE to ensure the kernel does not use swap memory for this dataset (because it would be insufficient, and fail), but uses the file backing directly. We can use madvise() to inform the kernel of our probable access patterns as an optimization, but in most cases it does not have that big of an effect (as the kernel heuristics do a pretty good job of it anyway). We can also use msync() to ensure certain parts are written to storage. (This has certain effects wrt. other processes that read the file backing the mapping, especially depending on whether they read it normally or use options like O_DIRECT; and if shared over NFS or similar, wrt. processes reading the file remotely. It all gets quite complicated very quickly.)
If you do decide to use mmap() to acquire anonymous memory pages, do note that you need to keep track of both the pointer and the length (length being a multiple of page size, sysconf(_SC_PAGESIZE)), so that you can release the mapping later using munmap(). Obviously, this is then completely separate from normal memory allocation (malloc(), calloc(), free()); but unless you try to use specific addresses, the two will not interfere with each other.
If you want memory to be reclaimed by the operating system, you need to use operating system services to allocate the memory (which will be allocated in pages). To deallocate the memory, you need to call the operating system services that remove pages from your process.
Unless you write your own malloc/free that does this, you are never going to be able to accomplish your goal with off-the-shelf library functions.

How to know the full size of memory allocated for a single program in C?

How can I check the total memory allocated by a program at the end of the program? I used the free() function for deallocating an array.
There is no standard way to know that, and the notion of "full size of memory" is not well defined (and its "allocation" could happen outside and independently of malloc, e.g. on Linux by direct calls to mmap(2) etc...)
In practice (assuming your code is running in a process on some common operating system on a desktop or laptop), think instead in terms of virtual address space.
Read Operating Systems: Three Easy Pieces (freely downloadable).
On Linux (but this is Linux specific) you could use /proc/ (see proc(5) for details) to query the kernel about the virtual address space and the status of some process. For a process of pid 1234, see /proc/1234/maps and /proc/1234/status etc.
You could (and probably should) use valgrind to hunt memory leaks.
With GNU glibc, you also have mallinfo(3) & malloc_stats(3) (but they are non-standard) etc...
Be aware that malloc and free use lower-level system calls such as mmap(2) & munmap (or the older sbrk(2), etc...) to change the virtual address space, but that free usually doesn't release memory to the kernel with munmap and prefers to keep and mark the freed memory zone for future usage by malloc.
You could use other implementations of malloc if you really wanted to (or even provide your own one). But you generally should not.

What is program break? Where does it start from,0x00?

int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
What is program break? Where does it start from,0x00?
Oversimplifying:
A process has several segments of memory:
Code (text) segment, which contains the code to be executed.
Data segment, which contains data the compiler knows about (globals and statics).
Stack segment, which contains (drumroll...) the stack.
(Of course, nowadays it's much more complex. There is a rodata segment, a uninitialized data segment, mappings allocated via mmap, a vdso, ...)
One traditional way a program can request more memory in a Unix-like OS is to increment the size of the data segment, and use a memory allocator (i.e. malloc() implementation) to manage the resulting space. This is done via the brk() system call, which changes the point where the data segment "breaks"/ends.
A program break is the end of the process's data segment. AKA...
the program break is the first location after the end of the uninitialized data segment
As to where it starts from, it's system dependent but probably not 0x00.
These days, sbrk(2) (and brk) are nearly obsolete system calls (and you can nearly forget about them and ignore the old notion of break; focus on understanding mmap(2)). Notice that the sbrk(2) man page says in its NOTES :
Avoid using brk() and sbrk(): the malloc(3) memory allocation package
is the portable and comfortable way of allocating memory.
(emphasis mine)
Most implementations of malloc(3) (notably the one in musl-libc) are rather using mmap(2) to require memory - and increase their virtual address space - from the kernel (look at that virtual address space wikipage, it has a nice picture). Some malloc-s use sbrk for small allocations, mmap for large ones.
Use strace(1) to find out the system calls (listed in syscalls(2)) done by some given process or command. BTW you'll then find that bash and ls (and probably many other programs) don't make a single call to sbrk.
Explore the virtual address space of some process by using proc(5). Try cat /proc/$$/maps and cat /proc/self/maps and even cat /proc/$$/smaps and read a bit to understand the output.
Be aware of ASLR & vdso(7).
And sbrk is not very thread friendly.
(my answer focuses on Linux)
You are saying that sbrk() is an obsolete system call and that we should use malloc(), but according to its documentation, malloc() uses sbrk() when allocating less than 128 KiB (32 pages). So we shouldn't use sbrk() directly, but malloc() does use it; if the allocation is bigger than 128 KiB, then malloc() uses mmap(), which allocates private pages for userspace.
Finally, it's a good idea to understand sbrk(), at least for understanding the "program break" concept.
Based on the widely used diagram of a process's memory layout (text, data, heap, stack):
program break, which is also known as brk in many articles, points to the address of heap segment's end.
When you call malloc, it changes the address of program break.

malloc()/free() behavior differs between Debian and Redhat

I have a Linux app (written in C) that allocates a large amount of memory (~60M) in small chunks through malloc() and then frees it (the app then continues to run). This memory is not returned to the OS but stays allocated to the process.
Now, the interesting thing here is that this behavior happens only on RedHat Linux and clones (Fedora, Centos, etc.) while on Debian systems the memory is returned back to the OS after all freeing is done.
Any ideas why there could be the difference between the two or which setting may control it, etc.?
I'm not certain why the two systems would behave differently (probably different implementations of malloc from different glibc's). However, you should be able to exert some control over the global policy for your process with a call like:
mallopt(M_TRIM_THRESHOLD, bytes)
(See this linuxjournal article for details).
You may also be able to request an immediate release with a call like
malloc_trim(bytes)
(See malloc.h). I believe that both of these calls can fail, so I don't think you can rely on them working 100% of the time. But my guess is that if you try them out you will find that they make a difference.
Some memory handlers don't present the memory as free before it is needed. Instead they let the CPU do other things and finalize the cleanup later. If you wish to confirm that this is true, just do a simple test: allocate and free memory in a loop, more times than you have memory available.

Too many calls to mprotect

I am working on a parallel app (C, pthread). I traced the system calls because at some point I have bad parallel performances. My traces shown that my program calls mprotect() many many times ... enough to significantly slow down my program.
I do allocate a lot of memory (with malloc()) but there is only a reasonable number of calls to brk() to increase the heap size. So why so many calls to mprotect() ?!
Are you creating and destroying lots of threads?
Most pthread implementations will add a "guard page" when allocating a threads stack. It's an access protected memory page used to detect stack overflows. I'd expect at least one call to mprotect each time a thread is created or terminated to (un)protect the guard page. If this is the case, there are several obvious strategies:
Set the guard page size to zero using pthread_attr_setguardsize() before creating threads.
Use a thread-pool (of as many threads as processors say). Once a thread is done with a task, return it to the pool to get a new task rather than terminate and create a new thread.
Another explanation might be that you're on a platform where a thread's stack will be grown if overflow is detected. I don't think this is implemented on Linux with GCC/Glibc as yet, but there have been some proposals along these lines recently. If you use a lot of stack space whilst processing, you might explicitly increase the initial/minimum stack size using pthread_attr_setstacksize.
Or it might be something else entirely!
If you can, run your program under a debug libc and break on mprotect(). Look at the call stack, see what your code is doing that's leading to the mprotect() calls.
The glibc library, which uses ptmalloc2 for its malloc, uses mprotect() internally for micromanagement of the heap for threads other than the main thread (for the main thread, sbrk() is used instead). malloc() first allocates a large chunk of memory with mmap() for the thread if a heap area seems to have contention, and then it changes the protection bits of the unneeded portion to make it inaccessible with mprotect(). Later, when it needs to grow the heap, it changes the protection back to read/writable with mprotect() again. Those mprotect() calls are for heap growth and shrinkage in multithreaded applications.
http://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf
explains this in a bit more detailed way.
The 'valgrind' suite has a tool called 'callgrind' that will tell you what is calling what. If you run the application under 'callgrind', you can then view the resulting profile data with 'kcachegrind' (it can analyze profiles made by 'cachegrind' or 'callgrind'). Then just double-click on 'mprotect' in the left pane and it will show you what code is calling it and how many times.