So I've got an interesting OS based problem for you. I've spent the last few hours conversing with anyone I know who's experienced with C programming, and nobody seems to be able to come up with a definitive answer as to why this behaviour is occurring.
I have a program that is intentionally designed to cause an extreme memory leak (as an example of what happens when you don't free memory after allocating it). On 64-bit operating systems (Windows, Linux, etc.), it does what it should do: it fills physical RAM, then fills the swap space of the OS. On Linux, the process is then terminated by the OS. On Windows, however, it is not, and it continues running. The eventual result is a system crash.
Here's the code:
#include <stdlib.h>
#include <stdio.h>
void main()
{
while(1)
{
int *a;
a = (int*)calloc(65536, 4);
}
}
However, if you compile and run this code on a 32 bit Linux distribution, it has no effect on physical memory usage at all. It uses approximately 1% of my 4 GB of allocated RAM, and it never rises after that. I don't have a legitimate copy of 32 Bit Windows to test on, so I can't be certain this occurs on 32 bit Windows as well.
Can somebody please explain why the use of calloc will fill the physical ram of a 64 bit Linux OS, but not a 32 bit Linux OS?
The malloc and calloc functions do not technically allocate memory, despite their name. They actually allocate portions of your program's address space with OS-level read/write permissions. This is a subtle difference and is not relevant most of the time.
This program, as written, only consumes address space. Eventually, calloc will start returning NULL but the program will continue running.
#include <stdlib.h>
// Note main should be int.
int main() {
while (1) {
// Note calloc should not be cast.
int *a = calloc(65536, sizeof(int));
}
}
If you write to the addresses returned from calloc, it will force the kernel to allocate memory to back those addresses.
#include <stdlib.h>
#include <string.h>
int main() {
    size_t size = 65536 * 4;
    while (1) {
        // Allocates address space.
        void *p = calloc(size, 1);
        // Stop once calloc fails; passing NULL to memset is undefined behaviour.
        if (p == NULL) break;
        // Forces the address space to have allocated memory behind it.
        memset(p, 0, size);
    }
}
It's not enough to write to a single location in the block returned from calloc because the granularity for allocating actual memory is 4 KiB (the page size... 4 KiB is the most common). So you can get by with just writing to each page.
What about the 64-bit case?
There is some bookkeeping overhead for allocating address space. On a 64-bit system, you get something like 40 or 48 bits of address space, of which about half can be allocated to the program, which comes to at least 8 TiB. On a 32-bit system this comes to 2 GiB or so (depending on kernel configuration).
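So on a 64-bit system you can allocate ~8 TiB of address space, and on a 32-bit system ~2 GiB. On 64-bit it is the overhead that causes the problems: there is typically a small amount of overhead for each call to malloc or calloc, and across that much address space it adds up to enough real memory to fill RAM and swap. On 32-bit, the address space runs out long before the overhead becomes significant.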
See also Why malloc+memset is slower than calloc?
Related
I'm not sure if I'm asking a noob question here, but here I go. I also searched a lot for a similar question, but I got nothing.
So, I know how mmap and brk work and that, regardless of the length you enter, it will round it up to the nearest page boundary. I also know malloc uses brk/sbrk or mmap (At least on Linux/Unix systems) but this raises the question: does malloc also round up to the nearest page size? For me, a page size is 4096 bytes, so if I want to allocate 16 bytes with malloc, 4096 bytes is... a lot more than I asked for.
The basic job of malloc and friends is to manage the fact that the OS can generally only (efficiently) deal with large allocations (whole pages and extents of pages), while programs often need smaller chunks and finer-grained management.
So what malloc (generally) does, is that the first time it is called, it allocates a larger amount of memory from the system (via mmap or sbrk -- maybe one page or maybe many pages), and uses a small amount of that for some data structures to track the heap use (where the heap is, what parts are in use and what parts are free) and then marks the rest of that space as free. It then allocates the memory you requested from that free space and keeps the rest available for subsequent malloc calls.
So the first time you call malloc for, e.g., 16 bytes, it will use mmap or sbrk to allocate a large chunk (maybe 4K or maybe 64K or maybe 16MB or even more), initialize that as mostly free, and return you a pointer to 16 bytes somewhere. A second call to malloc for another 16 bytes will just return you another 16 bytes from that pool -- no need to go back to the OS for more.
As your program goes ahead mallocing more memory it will just come from this pool, and free calls will return memory to the free pool. If it generally allocates more than it frees, eventually that free pool will run out, and at that point, malloc will call the system (mmap or sbrk) to get more memory to add to the free pool.
This is why if you monitor a process that is allocating and freeing memory with malloc/free with some sort of process monitor, you will generally just see the memory use go up (as the free pool runs out and more memory is requested from the system), and generally will not see it go down -- even though memory is being freed, it generally just goes back to the free pool and is not unmapped or returned to the system. There are some exceptions -- particularly if very large blocks are involved -- but generally you can't rely on any memory being returned to the system until the process exits.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
int main(void) {
void *a = malloc(1);
void *b = malloc(1);
uintptr_t ua = (uintptr_t)a;
uintptr_t ub = (uintptr_t)b;
size_t page_size = getpagesize();
printf("page size: %zu\n", page_size);
printf("difference: %zd\n", (ssize_t)(ub - ua));
printf("offsets from start of page: %zu, %zu\n",
(size_t)ua % page_size, (size_t)ub % page_size);
}
prints
page size: 4096
difference: 32
offsets from start of page: 672, 704
So clearly it is not rounded to page size in this case, which proves that it is not always rounded to page size.
It will hit mmap if you change allocation to some arbitrary large size. For example:
void *a = malloc(10000001);
void *b = malloc(10000003);
and I get:
page size: 4096
difference: -10002432
offsets from start of page: 16, 16
And clearly the starting address is still not page aligned. The bookkeeping must be stored below the pointer, and the pointer needs to be sufficiently aligned for the largest alignment generally needed. You can reason about this from free: it is given only a pointer but needs to figure out the size of the allocation, so where could it look? Only two choices are feasible: a separate data structure that lists all base pointers and their allocation sizes, or some offset below the pointer itself. And only one of them is sane.
So, I have this piece of code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *p;
long n = 1;
while(1) {
p = malloc(n * sizeof(char));
//p = calloc(n, sizeof(char));
if(p) {
printf("[%ld] Memory allocation successful! Address: %p\n", n , (void *)p);
n++;
} else {
printf("No more memory! Sorry...");
break;
}
}
free(p);
getchar();
return 0;
}
And I run it on Windows. Interesting thing:
if we use malloc, the program allocates about 430 MB of memory and then stops (photo here => http://i.imgur.com/woswThG.png)
if we use calloc, the program allocates about 2 GB of memory and then stops (photo here => http://i.imgur.com/3JKy5pA.png)
(strange test): if we use both of them at the same time, it uses a maximum of (~400MB + ~2GB) / 2 => ~1.2GB
However, if I run the same code on Linux, the allocation goes on and on (after 600k allocations and many GB used it still continues until eventually it is killed) and approximately the same amount of memory is used.
So my question is: shouldn't they allocate the same amount of memory? I thought the only difference was that calloc initializes the memory with zeros (malloc returns uninitialized memory). And why does it only happen on Windows? It's strange and interesting at the same time.
Hope you can help me with an explanation for this. Thanks!
Edit:
Code::Blocks 13.12 with GNU GCC Compiler
Windows 10 (x64)
Linux Mint 17.2 "Rafaela" - Cinnamon (64-bit) (for Linux testing)
Looking at the program output, you actually allocate the same number of blocks, 65188 for malloc, 65189 for calloc. Ignoring overhead, that's slightly less than 2GB of memory.
My guess is that you compile in 32 bit mode (pointers are dumped as 32 bits), which limits the amount of memory available to a single user process to less than 2GB. The difference on the process map display comes from how your program uses the memory it allocates.
The malloc version does not touch the allocated pages: more than 3 quarters of them are not actually mapped, hence only 430MB.
The calloc version shows 2GB of mapped memory: chances are your C library function calloc clears the allocated memory, even for pages obtained from the OS. This is not optimal, but only visible if you do not touch the allocated memory, a special case anyway. Yet it would be faster to not clear pages obtained from the OS as they are specified to be zero filled anyway.
In Linux, you may be compiling to 64 bits, getting access to much more than 2GB of virtual process space. Since you do not touch the memory, it is not mapped, and the same seems to happen in the calloc case as well. The C runtime is different (64 bit glibc on Linux, 32 bit Microsoft Library on Windows). You should use top or ps in a different terminal in Linux to check how much memory is actually mapped to your process in both cases.
I was reading in a book:
The virtual address space of a process on a 32-bit machine is 2^32 bytes, i.e. 4 GB of space. And every address seen in the program is a virtual address. The 4 GB of space further goes through a 3 GB/1 GB user/kernel split.
To better understand this, I did a malloc() of 5 GB and tried to print all the addresses. If I print the addresses, how is the application going to print the whole 5 GB of addresses when it has only 3 GB of virtual address space? Am I missing something here?
malloc() takes size_t as an argument. On a 32-bit system it's an alias for some unsigned 32-bit integer type. This means that you just cannot pass any value bigger than 2^32-1 as an argument to malloc(), making it impossible to request an allocation of more than 4GB of memory using this function.
The same is true for all other functions that can be used to allocate memory. Ultimately they all end up as either a brk() or an mmap() syscall. The length argument of mmap() is also of type size_t, and in the case of brk() you have to provide a pointer to the new end of your allocated space. The pointer is again 32 bits.
So there is absolutely no way to tell the kernel you would like to get more than 4GB of memory allocated with one call. And it's not an accident: this just wouldn't make any sense anyway.
Now it's true that you could do several calls to malloc or other function that allocates memory, requesting more than 4GB in total. If you try this, the subsequent call (that would cause extending allocated memory to more than 3GB) will fail as there is just no address space available.
So I guess that you either didn't check the malloc return value or you did try to run code like this (or something similar):
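#include <assert.h>
#include <stdlib.h>
int main() {
    // Deliberately wrong: 5*1<<30 is evaluated as int and overflows.
    assert(malloc(5*1<<30));
}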
and assumed that you succeeded in allocating 5GB without verifying that your argument overflowed and instead of requesting 5368709120 bytes, you requested 1073741824. One example to verify this on Linux is to use:
$ ltrace ./a.out
__libc_start_main(0x804844c, 1, 0xbfbcea74, 0x80484a0, 0x8048490 <unfinished ...>
malloc(1073741824) = 0x77746008
$
There's already a good answer. Just in case, the size of your virtual address space is easily verifiable like this:
#include <stdlib.h>
#include <stdio.h>
int main()
{
size_t size = (size_t)-1L;
void *foo;
printf("trying to allocate %zu bytes\n", size);
if (!(foo = malloc(size)))
{
perror("malloc()");
}
else
{
free(foo);
}
}
> gcc -m32 -omalloc malloc.c && ./malloc
trying to allocate 4294967295 bytes
malloc(): Cannot allocate memory
This must fail because parts of your address space are already occupied: by the mapped part of the kernel, by mapped shared libraries and by your program, of course.
You cannot do this because there is no function that lets you allocate 5 GB of memory in a 32-bit process.
Is there a limit to the amount of memory that can be allocated from a program? By that I mean, is there any protection from a program, for example, that allocates memory in an infinite loop?
When would the call to malloc() return a NULL pointer?
Yes, there is a limit. What that limit is depends on many factors, including (but not limited to):
The instruction set of the program (32-bit binaries have a smaller address space than 64-bit binaries, for example).
How much memory the system has free. ("Memory" here includes virtual memory.)
Any artificial restrictions set by the system administrator or a privileged process (see, for example, setrlimit() and the (obsolete) ulimit() function).
When memory cannot be allocated, malloc() will return NULL. If the system is completely out of memory, your process may be terminated forcefully.
From Wikipedia,
The largest possible memory block malloc can allocate depends on the
host system, particularly the size of physical memory and the
operating system implementation. Theoretically, the largest number
should be the maximum value that can be held in a size_t type, which
is an implementation-dependent unsigned integer representing the size
of an area of memory. The maximum value is 2^(CHAR_BIT × sizeof(size_t))
− 1, or the constant SIZE_MAX in the C99 standard.
It depends on the operating system and the standard library.
On Linux,
When you run out of address space, malloc() will return NULL.
When you run out of both physical memory and swap space, the OOM killer will run and kill a process to free memory.
I am tackling this issue in Reverse.
See, a pointer stores addresses of memory blocks. If we are able to find the maximum address it can store, then we have an upper bound on the memory our program can address.
Code
#include <stdio.h>
int main()
{
void *p;
printf("%zu",sizeof(p));
return 0;
}
Output
8
Understanding:
pointer size is 8 bytes.
8 bytes -> 64 bits
Max address it can store/ last memory block address: 2^64-1
Memory block addresses: 0, 1, 2, 3, ... 2^64-1
Upper bound on memory addressable by the program: 2^64 bytes (the OS and hardware allow far less in practice)
I have a doubt regarding heap in program execution layout diagram of a C program.
I know that all the dynamically allocated memory is allotted in heap which grows dynamically. But I would like to know what is the max heap size for a C program ??
I am just attaching a sample C program ... here I am trying to allocate 1GB memory to string and even doing the memset ...
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    char *mybuffer;
    char *temp;
    mybuffer=malloc(1024*1024*1024*1);
    if (mybuffer == NULL) {
        /* memset on a NULL pointer would be undefined, so bail out first */
        printf("Wrong\n");
        return 1;
    }
    temp = memset(mybuffer,0,(1024*1024*1024*1));
    if (mybuffer == temp)
        printf("%p - %p\n", (void *)mybuffer, (void *)&mybuffer[((1024*1024*1024*1)-1)]);
    else
        printf("Wrong\n");
    sleep(20);
    free(mybuffer);
    return 0;
}
If I run the above program in 3 instances at once then malloc should fail in at least one instance [I feel so] ... but still malloc is successful.
If it is successful can I know how the OS takes care of 3GB of dynamically allocated memory.
Your machine is very probably overcommitting on RAM, and not using the memory until you actually write to it. Try writing to each block after allocating it, thus forcing the operating system to ensure there's real RAM mapped to the address malloc() returned.
From the linux malloc page,
BUGS
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available. This is a really bad bug. In
case it turns out that the system is out of memory, one or more
processes will be killed by the infamous OOM killer. In case Linux is
employed under circumstances where it would be less desirable to
suddenly lose some randomly picked processes, and moreover the kernel
version is sufficiently recent, one can switch off this overcommitting
behavior using a command like:
# echo 2 > /proc/sys/vm/overcommit_memory
See also the kernel Documentation directory, files
vm/overcommit-accounting and sysctl/vm.txt.
You're mixing up physical memory and virtual memory.
http://apollo.lsc.vsc.edu/metadmin/references/sag/x1752.html
http://en.wikipedia.org/wiki/Virtual_memory
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
malloc will allocate the memory but it does not write to any of it. So if the virtual memory is available, it will succeed. It is only when you write something to it that real memory is needed, which may in turn force pages out to the page file.
calloc, if memory serves me correctly(!), writes zeros to each byte of the allocated memory before returning, so it will need to allocate the pages there and then.