I run
calloc(1024 * 1024 * 1024, sizeof(int));
I check my program's memory usage and it's zero, but I never made a call to free.
Edit:
Running Debian Jessie
Edit 2:
I am using top as the system monitor
Linux does lazy memory allocation. Only when a page fault occurs on a page marked as allocated does Linux actually consider it as being used. Try writing to a byte inside the allocated data and checking the memory usage again. For more information on memory allocation in Linux, check http://www.tldp.org/LDP/tlk/mm/memory.html.
Additionally, even though calloc zeroes the allocated memory, it can still be done in a lazy way that leads to the behavior you described. See How to lazy allocate zeroed memory?.
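A minimal sketch of that experiment (my own illustration, not from the answers above; it assumes a 4 KiB page size, so use sysconf(_SC_PAGESIZE) to be exact):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        size_t size = (size_t)1024 * 1024 * 1024;   /* 1 GiB */
        char *p = calloc(size, 1);
        if (p == NULL)
            return 1;

        printf("allocated (pid %d): check RES in top now\n", (int)getpid());
        getchar();                    /* pause: resident usage is still tiny */

        for (size_t i = 0; i < size; i += 4096)
            p[i] = 1;                 /* fault in one byte per assumed 4 KiB page */

        printf("touched every page: check RES again\n");
        getchar();

        free(p);
        return 0;
    }

While the program waits at the first getchar(), top's RES column should stay near zero; after the loop it should be close to 1 GiB.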
However much memory your example asks for, you won't see any usage until you actually use the memory.
Linux by default does not actually allocate memory pages until you touch them somehow. Although calloc is supposed to initialize the memory to zero, this does not count as touching it, since the kernel internally keeps track of untouched pages and returns zeros when they are read.
Related
Here's my question: does calling free or delete ever release memory back to the "system"? By system I mean: does it ever reduce the data segment of the process?
Let's consider the memory allocator on Linux, i.e. ptmalloc.
From what I know (please correct me if I am wrong), ptmalloc maintains a free list of memory blocks, and when a request for memory allocation comes in, it tries to satisfy it from this free list (I know the allocator is much more complex than that, but I am putting it in simple words). If, however, it fails, it gets the memory from the system using, say, the sbrk or brk system calls. When memory is freed, the block is placed back on the free list.
Now consider this scenario: at peak load, a lot of objects have been allocated on the heap. When the load decreases, the objects are freed. So my question is: once an object is freed, will the allocator do some calculation to decide whether to just keep this block in the free list, or, depending on the current size of the free list, give that memory back to the system, i.e. decrease the data segment of the process using sbrk or brk?
The glibc documentation tells me that if the allocation request is much larger than the page size, it will be allocated using mmap and will be released directly back to the system once freed. Cool. But let's say I never ask for an allocation larger than, say, 50 bytes, and I allocate a lot of such 50-byte objects at peak load. Then what?
From what I know (please correct me), memory allocated with malloc is never released back to the system until the process ends; the allocator simply keeps it on the free list when I free it. But then a question troubles me: if I use a tool to see the memory usage of my process (I am using pmap on Linux; what do you use?), it should always show the memory used at peak load, since the memory is never given back to the system except when allocated using mmap. That is, the memory used by the process should never decrease (except for stack memory). Is that right?
I know I am missing something, so please shed some light on all this.
Experts, please clarify these concepts for me. I will be grateful. I hope I was able to explain my question.
There isn't much overhead for malloc, so you are unlikely to achieve any run-time savings. There is, however, a good reason to implement an allocator on top of malloc, and that is to be able to trace memory leaks. For example, you can free all memory allocated by the program when it exits, and then check to see if your memory allocator calls balance (i.e. same number of calls to allocate/deallocate).
For your specific implementation, there is no reason to call free(), since malloc won't release the memory back to the system anyway; freeing would only return memory to your own allocator.
Another reason for using a custom allocator is that you may be allocating many objects of the same size (i.e. you have some data structure that you allocate a lot of). You may want to maintain a separate free list for this type of object and allocate/free only from this special list. The advantage is that it avoids memory fragmentation.
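A minimal sketch of such a per-type free list (my own illustration; the node type and names are hypothetical):

    #include <stdlib.h>

    /* hypothetical fixed-size object */
    typedef struct node {
        struct node *next_free;   /* reused as the free-list link while the node is free */
        int payload[15];
    } node;

    static node *free_list = NULL;

    node *node_alloc(void)
    {
        if (free_list) {              /* reuse a previously freed node */
            node *n = free_list;
            free_list = n->next_free;
            return n;
        }
        return malloc(sizeof(node));  /* fall back to malloc when the list is empty */
    }

    void node_free(node *n)
    {
        n->next_free = free_list;     /* push onto our own free list; */
        free_list = n;                /* nothing goes back to malloc */
    }

Since every node is the same size, any freed node can satisfy any later allocation, which is what keeps this scheme fragmentation-free.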
No.
It's actually a bad strategy for a number of reasons, so it doesn't happen, except, as you note, for large allocations that can be made directly in whole pages.
It increases internal fragmentation and therefore can actually waste memory. (You can only return aligned pages to the OS, so pulling aligned pages out of a block will usually create two guaranteed-to-be-small blocks, smaller than a page anyway, on either side of the block. If this happens a lot, you end up with the same total amount of usefully-allocated memory plus lots of useless small blocks.)
A kernel call is required, and kernel calls are slow, so it would slow down the program. It's much faster to just throw the block back into the heap.
Almost every program will either converge on a steady-state memory footprint or it will have an increasing footprint until exit. (Or, until near-exit.) Therefore, all the extra processing needed by a page-return mechanism would be completely wasted.
It is entirely implementation-dependent. On Windows, VC++ programs can return memory back to the system if the corresponding memory pages contain only freed blocks.
I think that you have all the information you need to answer your own question. pmap shows the memory that is currently being used by the process. So, if you call pmap before the process reaches peak memory, then no, it will not show peak memory. If you call pmap just before the process exits, it will show peak memory for a process that does not use mmap. If the process uses mmap, then calling pmap at the point where maximum memory is in use will show peak memory usage, but that point may not be at the end of the process (it could occur anywhere).
This applies only to your current system (i.e. based on the documentation you have provided for free and mmap and malloc), but as the previous poster has stated, the behavior of these is implementation-dependent.
This varies a bit from implementation to implementation.
Think of your memory as one massive long block. When you allocate from it, you take a piece out of it (labeled '1' below):
111
If I allocate more memory with malloc, it gets some from the system:
1112222
If I now free '1':
___2222
It won't be returned to the system, because '2' is in front of it (and memory is handed out as one contiguous block). However, if the end of the memory is freed, that memory can be returned to the system. If I freed '2' instead of '1', I would get:
111
The piece where '2' was would be returned to the system.
The main benefit of freeing memory is that the freed piece can then be reallocated instead of getting more memory from the system, e.g.:
33_2222
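You can watch this end-of-heap behavior on Linux/glibc with the (deprecated but still available) sbrk(0), which reports the current program break. This is a sketch under glibc-specific assumptions: the two mallopt() calls force the allocator to use the break rather than mmap for the big block and to trim eagerly, and the numbers will be perturbed slightly by stdio's own allocations:

    #include <malloc.h>   /* mallopt, glibc-specific */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        mallopt(M_MMAP_MAX, 0);        /* don't use mmap for large blocks */
        mallopt(M_TRIM_THRESHOLD, 0);  /* return the top of the heap eagerly */

        char *start = sbrk(0);
        void *p = malloc(8 * 1024 * 1024);
        printf("break after malloc: +%ld bytes\n", (long)((char *)sbrk(0) - start));

        free(p);  /* the freed block is at the top of the heap, so it can be trimmed */
        printf("break after free:   +%ld bytes\n", (long)((char *)sbrk(0) - start));
        return 0;
    }

The break should grow by roughly 8 MB after the malloc and drop back close to its starting point after the free.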
I believe that the memory allocator in glibc can return memory back to the system, but whether it will or not depends on your memory allocation patterns.
Let's say you do something like this:
#include <stdlib.h>
int main(void)
{
    void *pointers[10000];
    for (int i = 0; i < 10000; i++)
        pointers[i] = malloc(1024);
    for (int i = 0; i < 9999; i++)
        free(pointers[i]);
}
The only part of the heap that can be safely returned to the system is the "wilderness chunk", which is at the end of the heap. This can be returned to the system using another sbrk system call, and the glibc memory allocator will do that when the size of this last chunk exceeds some threshold.
The above program would make 10000 small allocations, but only free the first 9999 of them. The last one should (assuming nothing else has called malloc, which is unlikely) be sitting right at the end of the heap. This would prevent the allocator from returning any memory to the system at all.
If you were to free the remaining allocation, glibc's malloc implementation should be able to return most of the pages allocated back to the system.
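glibc also lets you request that trimming explicitly with malloc_trim(). A sketch (glibc-specific) that frees all 10000 allocations, including the last one, and then asks for the trim:

    #include <malloc.h>   /* malloc_trim, glibc-specific */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        void *pointers[10000];
        for (int i = 0; i < 10000; i++)
            pointers[i] = malloc(1024);
        for (int i = 0; i < 10000; i++)   /* free everything, including the last block */
            free(pointers[i]);

        /* ask glibc to give the free top of the heap back to the kernel;
           returns 1 if any memory was actually released */
        printf("malloc_trim released memory: %d\n", malloc_trim(0));
        return 0;
    }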
If you're allocating and freeing small chunks of memory, a few of which are long-lived, you could end up in a situation where you have a large chunk of memory allocated from the system, but you're only using a tiny fraction of it.
Here are some "advantages" to never releasing memory back to the system:
Having already used a lot of memory makes it very likely you will do so again, and
when you release memory the OS has to do quite a bit of paperwork
when you need it again, your memory allocator has to re-initialise all its data structures in the region it just received
Freed memory that isn't needed gets paged out to disk where it doesn't actually make that much difference
Often, even if you free 90% of your memory, fragmentation means that very few pages can actually be released, so the effort required to look for empty pages isn't terribly well spent
Many memory managers can perform TRIM operations where they return entirely unused blocks of memory to the OS. However, as several posts here have mentioned, it's entirely implementation dependent.
But let's say I never ask for an allocation larger than, say, 50 bytes, and I allocate a lot of such 50-byte objects at peak load. Then what?
This depends on your allocation pattern. Do you free ALL of the small allocations? If so, and if the memory manager has special handling for small block allocations, then this may be possible. However, if you allocate many small items and then free all but a few scattered ones, you may fragment memory and make it impossible to TRIM blocks, since each block will have only a few straggling allocations. In this case, you may want to use a different allocation scheme for the temporary allocations and the persistent ones, so you can return the temporary allocations back to the OS.
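One way to implement that "different allocation scheme" is to carve the temporaries out of a private mmap region and hand the whole region back with munmap when the peak passes. A sketch using mmap with MAP_ANONYMOUS (available on Linux and the BSDs) and a trivial bump allocator; a real implementation would also handle alignment:

    #include <stddef.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t arena_size = 1024 * 1024;
        /* one region holds all the temporary 50-byte objects */
        char *arena = mmap(NULL, arena_size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (arena == MAP_FAILED)
            return 1;

        size_t used = 0;
        while (used + 50 <= arena_size) {
            char *obj = arena + used;   /* bump-allocate a 50-byte temporary */
            used += 50;
            (void)obj;                  /* ... use the temporary ... */
        }

        /* when the peak load passes, the whole arena goes back to the OS at once */
        munmap(arena, arena_size);
        return 0;
    }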
I used the C library's malloc to allocate 8 MB of memory; after using that memory, I called free to release it.
But when I use malloc again to allocate 8 MB of memory, it allocates the same location as before.
How can I avoid this, and why does it occur?
EDIT: I'm implementing a tool to test main memory; if malloc allocates the same block of memory, it is not possible to check the whole memory.
This is not a problem per se, and is by design. Typical implementations of malloc will recycle blocks of memory for reasons of performance. In any case, since malloc returns addresses from a limited pool of values, there's no way it could guarantee not to recycle blocks.
The only sure-fire way to stop malloc returning blocks that it has returned before is to stop freeing them. Of course, that's not really very practical.
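There is a practical variation on that idea for the 8 MB case: allocate the new block while the old one is still live. Two live allocations can never overlap, so the second address is guaranteed to differ. A minimal sketch:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *old_block = malloc(8 * 1024 * 1024);
        /* allocate the replacement before freeing the old block */
        char *new_block = malloc(8 * 1024 * 1024);
        printf("old=%p new=%p\n", (void *)old_block, (void *)new_block);

        free(old_block);   /* only now release the old block */
        free(new_block);
        return 0;
    }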
I'm implementing a tool to test main memory. If malloc allocates the same block of memory it is not possible to check the whole memory.
Your tool to test main memory cannot be implemented with malloc, or indeed by any user-mode program. Modern operating systems don't give you access to physical memory. Rather, they present a virtualized view of memory. The addresses in your program are not physical addresses; they are virtual addresses. Testing physical memory requires you to go in at a much lower level than is possible from a user-mode program.
This should help you
How malloc works?
To prevent this, allocate a few bytes using malloc/calloc and then free the bigger chunk of memory.
BTW, getting the same memory address back is not wrong behavior.
You might want to call system() to run a few Linux commands (that provide detailed memory management options) from your code. Main memory management cannot be done/tested using malloc/free; they are limited to operating on the memory allocated to your program while it is running.
In Linux, is calloc exactly the same as malloc + memset, or does this depend on the exact Linux/kernel version?
I am particularly interested in the question of whether you can calloc more RAM than you physically have (as you can certainly malloc more RAM than you physically have, you just can't write to it). In other words, does calloc always actually write to the memory you have been allocated as the specs suggest it should.
Of course, that depends on the implementation, but on modern-day Linux, you probably can. The easiest way is to try it, but I'm saying this based on the following logic.
You can malloc more than the memory you have (physical + virtual) because the kernel delays allocation of your memory until you actually use it. I believe that's to increase the chances of your program not failing due to memory limits, but that's not the question.
calloc is the same as malloc but zero initializes the memory. When you ask Linux for a page of memory, Linux already zero-initializes it. So if calloc can tell that the memory it asked for was just requested from the kernel, it doesn't actually have to zero initialize it! Since it doesn't, there is no access to that memory and therefore it should be able to request more memory than there actually is.
As mentioned in the comments this answer provides a very good explanation.
Whether calloc needs to write to the memory depends on whether it got the allocation from heap pages that are already assigned to the process, or it had to request more memory be assigned to the process by the kernel (using a system call such as sbrk() or mmap()). When the kernel assigns new memory to a process, it always zeroes it first (typically using a VM optimization, so it doesn't actually have to write to the page). But if it's reusing memory that was assigned previously, it has to use memset() to zero it.
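You can see the reuse case for yourself: dirty a small block, free it, and then calloc() the same size. Most allocators will hand back the same block, which calloc() must then zero itself. A sketch (whether the block is actually reused is an implementation detail):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *a = malloc(64);
        memset(a, 0xAA, 64);               /* dirty the block */
        uintptr_t old_addr = (uintptr_t)a; /* remember where it was */
        free(a);

        char *b = calloc(1, 64);           /* likely gets the same dirty block back */
        printf("same block reused: %s\n",
               (uintptr_t)b == old_addr ? "yes" : "no (implementation detail)");
        printf("b[0] == %d (calloc zeroed it)\n", b[0]);
        free(b);
        return 0;
    }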
One thing is not mentioned in the cited duplicate or here: Linux uses virtual memory and can allocate more memory than is physically available in the system. A naive implementation of calloc() that simply does a malloc() plus memset() in user space will touch every page.
As Linux typically allocates in 4k chunks, all of the calloc() blocks are the same and initially read as zero. That is, the same 4k chunk of memory can be mapped read-only over the whole range, and the entire calloc() space initially takes up only approximately size/4k * pointer_size + 4k bytes (the page-table entries plus one shared zero page). As the program writes to the calloc() space, a page fault happens and Linux allocates a new page (4k) and resumes the program.
This is called copy-on-write, or COW for short. malloc() will generally behave the same way. For small sizes, the C library will use binning and share 4k pages with other small-sized allocations.
So, there are typically two layers involved.
Linux kernel's process memory management.
glibc heap management.
If the memory size requested is large and requires new memory to be allocated to the process, then most of the above applies (via the Linux kernel's process memory management). However, if the memory requested is small, it will be like a malloc() plus memset(). For large allocation sizes, the memset() is damaging, as it touches the memory and forces the kernel to allocate a new page for every page touched.
You can't malloc(3) more RAM than the kernel gives the process doing the malloc(3)-ing. malloc(3) returns NULL if it can't allocate the amount of memory you asked for. In addition, malloc(3) and memset(3) are defined by your C library (libc.so), not by your kernel. The Linux kernel defines mmap(2) and other low-level memory allocation functions, not the *alloc(3) family (excluding the kernel-internal kmalloc()).
Is the maximum heap size of a program in C fixed, or, if I keep malloc-ing, will it at some point start to overflow?
Code:
while (connectionOK) /* connectionOK is the connection with the server, which might be up forever */
{
    if (userlookup_IDNotFound(userID))
    {
        user_struct *newuser = malloc(sizeof(user_struct));
        setupUserAccount(newuser);
    }
}
I am using gcc in ubuntu/ linux if that matters.
I know about getrlimit, but I'm not sure it gives the heap size, although it does give the default stack size for one of the options in its input argument.
Also, valgrind is probably a good tool, as suggested here: how to get Heap size of a program. But I want to dynamically print an error message if there is a heap overflow.
My understanding was that the process address space is allocated by the OS at process creation (and the process is in principle allowed to use the whole of it), but I am not sure whether the process is dynamically given more physical memory once it requests additional memory.
The heap never overflows; it just runs out of memory at a certain point (usually when malloc() returns NULL). So to detect out-of-memory, just check the return value of the malloc() call:
if (newuser == NULL)
{
    printf("OOM\n");
    exit(1); /* exit if you want or can't handle being OOM */
}
malloc() internally requests more memory from the OS, so it expands dynamically; the heap is not really fixed-size, as the allocator will give pages it no longer needs back to the OS, as well as request more whenever it requires them.
Technically, what malloc allocates on most systems is not memory, but address space. On a modern system you can easily allocate several petabytes of address space with malloc, and malloc will probably still return a non-null pointer. The reason is that most OSes actually perform memory allocation only when a piece of address space is first modified. As long as it sits there untouched, the OS will just make a note that a certain area of the process address space has been validly reserved for future use.
This kind of behavior is called "memory overcommitment" and is important when maintaining Linux systems. It can happen that, for some time, more memory is allocated than is available, and then some program actually writes to some of the overcommitted memory. What happens then is that the so-called "Out Of Memory Killer" (OOM killer) goes on a rampage and kills the processes it deems most appropriate; unfortunately, these are usually exactly the processes you don't want to lose under any circumstances. Databases are known to be among the prime targets of the OOM killer.
Because of this, it's strongly recommended to switch off memory overcommitment on high-availability Linux boxes. With memory overcommitment disabled, each request for address space must be backed by memory. In that case, malloc will actually return NULL if the request cannot be fulfilled.
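You can observe overcommitment directly. On a 64-bit Linux box with the default overcommit heuristic, the following sketch usually prints a non-null pointer even though no physical pages back the allocation; with overcommitment disabled, it should print (nil) instead:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 1 TiB of address space; no physical pages are assigned yet */
        size_t one_tib = (size_t)1024 * 1024 * 1024 * 1024;
        void *p = malloc(one_tib);
        printf("malloc(1 TiB) returned %p\n", p);
        free(p);
        return 0;
    }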
At some point, malloc() will return NULL, when the system runs out of memory. Then, when you try to dereference that NULL pointer, your program will crash.
See what happens when you do malloc(SIZE_MAX) a few times :-)
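For the impatient, a sketch of that experiment. A request of SIZE_MAX bytes exceeds the entire address space (plus allocator overhead), so malloc should return NULL every time:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        for (int i = 0; i < 3; i++)
            printf("malloc(SIZE_MAX) -> %p\n", malloc(SIZE_MAX));
        return 0;
    }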
It's known that calloc is different than malloc in that it initializes the memory allocated. With calloc, the memory is set to zero. With malloc, the memory is not cleared.
So in everyday work, I regard calloc as malloc+memset.
Incidentally, for fun, I wrote the following code for a benchmark.
The result is confusing.
Code 1:
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (1024*1024*256)

int main()
{
    int i = 0;
    char *buf[10];
    while (i < 10)
    {
        buf[i] = (char *)calloc(1, BLOCK_SIZE);
        i++;
    }
}
Output of Code 1:
time ./a.out
real 0m0.287s
user 0m0.095s
sys 0m0.192s
Code 2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (1024*1024*256)

int main()
{
    int i = 0;
    char *buf[10];
    while (i < 10)
    {
        buf[i] = (char *)malloc(BLOCK_SIZE);
        memset(buf[i], '\0', BLOCK_SIZE);
        i++;
    }
}
Output of Code 2:
time ./a.out
real 0m2.693s
user 0m0.973s
sys 0m1.721s
Replacing memset with bzero(buf[i],BLOCK_SIZE) in Code 2 produces the same result.
My question is: Why is malloc+memset so much slower than calloc? How can calloc do that?
The short version: Always use calloc() instead of malloc()+memset(). In most cases, they will be the same. In some cases, calloc() will do less work because it can skip memset() entirely. In other cases, calloc() can even cheat and not allocate any memory! However, malloc()+memset() will always do the full amount of work.
Understanding this requires a short tour of the memory system.
Quick tour of memory
There are four main parts here: your program, the standard library, the kernel, and the page tables. You already know your program, so...
Memory allocators like malloc() and calloc() are mostly there to take small allocations (anything from 1 byte to hundreds of KB) and group them into larger pools of memory. For example, if you allocate 16 bytes, malloc() will first try to get 16 bytes out of one of its pools, and then ask for more memory from the kernel when the pool runs dry. However, since the program you're asking about is allocating a large amount of memory at once, malloc() and calloc() will just ask for that memory directly from the kernel. The threshold for this behavior depends on your system, but I've seen 1 MiB used as the threshold.
The kernel is responsible for allocating actual RAM to each process and making sure that processes don't interfere with the memory of other processes. This is called memory protection; it has been dirt common since the 1990s, and it's the reason why one program can crash without bringing down the whole system. So when a program needs more memory, it can't just take the memory, but instead it asks for the memory from the kernel using a system call like mmap() or sbrk(). The kernel will give RAM to each process by modifying the page table.
The page table maps memory addresses to actual physical RAM. Your process's addresses, 0x00000000 to 0xFFFFFFFF on a 32-bit system, aren't real memory but instead are addresses in virtual memory. The processor divides these addresses into 4 KiB pages, and each page can be assigned to a different piece of physical RAM by modifying the page table. Only the kernel is permitted to modify the page table.
How it doesn't work
Here's how allocating 256 MiB does not work:
Your process calls calloc() and asks for 256 MiB.
The standard library calls mmap() and asks for 256 MiB.
The kernel finds 256 MiB of unused RAM and gives it to your process by modifying the page table.
The standard library zeroes the RAM with memset() and returns from calloc().
Your process eventually exits, and the kernel reclaims the RAM so it can be used by another process.
How it actually works
The above process would work, but it just doesn't happen this way. There are three major differences.
When your process gets new memory from the kernel, that memory was probably used by some other process previously. This is a security risk. What if that memory has passwords, encryption keys, or secret salsa recipes? To keep sensitive data from leaking, the kernel always scrubs memory before giving it to a process. We might as well scrub the memory by zeroing it, and if new memory is zeroed we might as well make it a guarantee, so mmap() guarantees that the new memory it returns is always zeroed.
There are a lot of programs out there that allocate memory but don't use the memory right away. Sometimes memory is allocated but never used. The kernel knows this and is lazy. When you allocate new memory, the kernel doesn't touch the page table at all and doesn't give any RAM to your process. Instead, it finds some address space in your process, makes a note of what is supposed to go there, and makes a promise that it will put RAM there if your program ever actually uses it. When your program tries to read or write from those addresses, the processor triggers a page fault and the kernel steps in to assign RAM to those addresses and resumes your program. If you never use the memory, the page fault never happens and your program never actually gets the RAM.
Some processes allocate memory and then read from it without modifying it. This means that a lot of pages in memory across different processes may be filled with pristine zeroes returned from mmap(). Since these pages are all the same, the kernel makes all these virtual addresses point to a single shared 4 KiB page of memory filled with zeroes. If you try to write to that memory, the processor triggers another page fault and the kernel steps in to give you a fresh page of zeroes that isn't shared with any other programs.
The final process looks more like this:
Your process calls calloc() and asks for 256 MiB.
The standard library calls mmap() and asks for 256 MiB.
The kernel finds 256 MiB of unused address space, makes a note about what that address space is now used for, and returns.
The standard library knows that the result of mmap() is always filled with zeroes (or will be once it actually gets some RAM), so it doesn't touch the memory, so there is no page fault, and the RAM is never given to your process.
Your process eventually exits, and the kernel doesn't need to reclaim the RAM because it was never allocated in the first place.
If you use memset() to zero the page, memset() will trigger the page fault, cause the RAM to get allocated, and then zero it even though it is already filled with zeroes. This is an enormous amount of extra work, and explains why calloc() is faster than malloc() and memset(). If you end up using the memory anyway, calloc() is still faster than malloc() and memset() but the difference is not quite so ridiculous.
This doesn't always work
Not all systems have paged virtual memory, so not all systems can use these optimizations. This applies to very old processors like the 80286 as well as embedded processors which are just too small for a sophisticated memory management unit.
This also won't always work with smaller allocations. With smaller allocations, calloc() gets memory from a shared pool instead of going directly to the kernel. In general, the shared pool might have junk data stored in it from old memory that was used and freed with free(), so calloc() could take that memory and call memset() to clear it out. Common implementations will track which parts of the shared pool are pristine and still filled with zeroes, but not all implementations do this.
Dispelling some wrong answers
Depending on the operating system, the kernel may or may not zero memory in its free time, in case you need to get some zeroed memory later. Linux does not zero memory ahead of time, and Dragonfly BSD recently also removed this feature from their kernel. Some other kernels do zero memory ahead of time, however. Zeroing pages during idle isn't enough to explain the large performance differences anyway.
The calloc() function is not using some special memory-aligned version of memset(), and that wouldn't make it much faster anyway. Most memset() implementations for modern processors look kind of like this:
function memset(dest, c, len)
    // one byte at a time, until the dest is aligned...
    while (len > 0 && ((unsigned int)dest & 15))
        *dest++ = c
        len -= 1
    // now write big chunks at a time (processor-specific)...
    // block size might not be 16, it's just pseudocode
    while (len >= 16)
        // some optimized vector code goes here
        // glibc uses SSE2 when available
        dest += 16
        len -= 16
    // the end is not aligned, so one byte at a time
    while (len > 0)
        *dest++ = c
        len -= 1
So you can see, memset() is very fast and you're not really going to get anything better for large blocks of memory.
The fact that memset() is zeroing memory that is already zeroed does mean that the memory gets zeroed twice, but that only explains a 2x performance difference. The performance difference here is much larger (I measured more than three orders of magnitude on my system between malloc()+memset() and calloc()).
Party trick
Instead of looping 10 times, write a program that allocates memory until malloc() or calloc() returns NULL.
What happens if you add memset()?
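A sketch of that party trick (be careful running it: the memset variant can drive the machine into swap or wake the OOM killer):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t chunk = 256 * 1024 * 1024;   /* 256 MiB per request */
        size_t total = 0;
        void *p;
        while ((p = malloc(chunk)) != NULL) {
            /* uncomment to commit real RAM: the loop will then stop near
               RAM + swap instead of near the address-space limit */
            /* memset(p, 0, chunk); */
            total += chunk;
        }
        printf("gave up after %zu MiB of address space\n", total >> 20);
        return 0;
    }

On a 64-bit system with overcommit, the plain version can claim terabytes of address space before failing; adding the memset() changes the answer dramatically, which is the point of the trick.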
Because on many systems, in spare processing time, the OS goes around setting free memory to zero on its own and marking it safe for calloc(), so when you call calloc(), it may already have free, zeroed memory to give you.
On some platforms, in some modes, malloc initialises the memory to some typically non-zero value before returning it, so the second version could well initialize the memory twice.