Shortage of Heap memory on FreeRTOS - heap-memory

I am running my application on a Marvell MW300 board, using FreeRTOS V9.0.0.
In my application, when I try to connect to an HTTPS server, mbedtls reports an error:
[wm_mbedtls] ssl_tls.c:5431: |1| 0x00121188: alloc(4429 in bytes) (4429 out bytes) failed.
While debugging, I observed that this is due to a shortage of heap memory. Here are the heap memory statistics:
Heap size ---------------------- : 305536
Free size ---------------------- : 17888
Peak Heap Usage since bootup --- : 291048
Total allocations -------------- : 136
Failed allocations ------------- : 0
Min overhead per allocation ---- : 16
Biggest free block available now : 8040
I print this heap memory info before trying to connect to my HTTPS server.
When the device tries to connect to the HTTPS server, mbedtls wants to allocate two 4429-byte buffers (in and out), but the allocation fails because the biggest free block available is only 8040 bytes, which is less than the 8858 bytes the two buffers need together.
Here is the mbedtls code:
/*
 * Prepare base structures
 */
if( ( ssl->in_buf = mbedtls_calloc( 1, MBEDTLS_SSL_IN_BUFFER_LEN( ssl->conf ) ) ) == NULL ||
    ( ssl->out_buf = mbedtls_calloc( 1, MBEDTLS_SSL_OUT_BUFFER_LEN( ssl->conf ) ) ) == NULL )
{
    MBEDTLS_SSL_DEBUG_MSG( 1, ( "alloc(%d in bytes) (%d out bytes) failed", MBEDTLS_SSL_IN_BUFFER_LEN( ssl->conf ),
                                MBEDTLS_SSL_OUT_BUFFER_LEN( ssl->conf ) ) );
    mbedtls_free( ssl->in_buf );
    ssl->in_buf = NULL;
    return( MBEDTLS_ERR_SSL_ALLOC_FAILED );
}
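The arithmetic behind this failure can be sketched as below. This is a simplified worst-case model, not FreeRTOS or mbedtls code: it assumes both buffers have to be carved out of one contiguous free block and that each allocation pays the 16-byte minimum overhead shown in the statistics. The function name `both_fit_in_block` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if two allocations of in_len and out_len
 * bytes, each paying `overhead` bytes of allocator bookkeeping, fit together
 * inside one contiguous free block of `block` bytes, and 0 otherwise. */
int both_fit_in_block(size_t block, size_t in_len, size_t out_len, size_t overhead)
{
    size_t needed = (in_len + overhead) + (out_len + overhead);
    return needed <= block;
}
```

With the numbers from the log, `both_fit_in_block(8040, 4429, 4429, 16)` is 0 because 8890 bytes are needed, even though 17888 bytes are free in total. The free memory exists, but not as one block that is large enough.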
The free memory available on the board is 17888 bytes.
Is it possible to add some “Free memory” into “free blocks available”?
Or do you have any suggestions on how to handle this issue?
I am using the heap_4 scheme.
Thanks in advance.

According to the log you've posted, you have a total of 305536 bytes available for heap, out of which you use 287648 bytes, leaving you with only 17888 bytes of free memory. In other words, you use over 94% of the available "dynamic" memory.
For applications that connect over TLS (you've mentioned HTTPS which is HTTP over TLS), having ~18KB of free memory is likely not going to be enough. TLS connections in most TCP/IP stack implementations I've encountered tend to be very "memory-heavy", as they tend to allocate internal buffers used for encryption and decryption of packets. Certificate validation step during TLS handshake also tends not to be very "lightweight", as the server may decide to send you a whole chain of certificates for validation, which may in total be kilobytes in size just by itself. All that needs to be at least temporarily in memory.
To answer your question, "Is it possible to add some 'Free memory' into 'free blocks available'?": you have two options:
Get hardware with more RAM: either an MCU with more internal RAM or an external RAM chip,
Optimize your application.
The first option is self-explanatory, but I'd recommend trying the second option first. I assume that the numbers you've given (17888 bytes free) are for an application that's idle / "does nothing". If so, having 287648 bytes of heap in use while idle is a LOT, and you should look into what causes so much RAM to be used in this case. This is going to involve debugging all parts of your application which may at some point dynamically allocate memory.
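One way to start that kind of debugging is to route allocations through a thin wrapper that counts bytes per subsystem. The sketch below is only an illustration: `tracked_malloc`, `tracked_free`, `tracked_bytes` and the tag numbers are hypothetical names, and it wraps plain `malloc`/`free` so it runs on a desktop. On the target the same idea would wrap FreeRTOS's `pvPortMalloc` and `vPortFree` instead.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TAGS 8

/* Bytes currently allocated per subsystem tag (hypothetical bookkeeping). */
static size_t bytes_in_use[MAX_TAGS];

/* Header stored in front of every block; the union keeps the user pointer
 * aligned for any object type. */
typedef union {
    struct { size_t size; int tag; } info;
    max_align_t align;
} alloc_header;

/* Allocates `size` bytes on behalf of subsystem `tag` and records them. */
void *tracked_malloc(size_t size, int tag)
{
    alloc_header *h;
    if (tag < 0 || tag >= MAX_TAGS) return NULL;
    h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->info.size = size;
    h->info.tag = tag;
    bytes_in_use[tag] += size;
    return h + 1;               /* user data starts right after the header */
}

/* Frees a block from tracked_malloc and subtracts it from its tag's total. */
void tracked_free(void *ptr)
{
    alloc_header *h;
    if (ptr == NULL) return;
    h = (alloc_header *)ptr - 1;
    bytes_in_use[h->info.tag] -= h->info.size;
    free(h);
}

/* Returns the bytes currently held by subsystem `tag` (0 for a bad tag). */
size_t tracked_bytes(int tag)
{
    return (tag >= 0 && tag < MAX_TAGS) ? bytes_in_use[tag] : 0;
}
```

Printing `tracked_bytes()` for each tag while the application is idle shows which subsystem holds most of the heap, which is usually a quicker starting point than reading every allocation site.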

Related

About Dynamic Memory Allocation in C [duplicate]

I was trying to figure out the maximum amount of memory I can malloc on my machine
(1 GB RAM, 160 GB HD, Windows platform).
I read that the maximum memory malloc can allocate is limited to physical memory (on heap).
Also when a program exceeds consumption of memory to a certain level, the computer stops working because other applications do not get enough memory that they require.
So to confirm, I wrote a small program in C:
#include <stdlib.h>
int main(){
    int *p;
    while(1){
        p=(int *)malloc(4);
        if(!p)break;
    }
}
I was hoping that there would be a time when memory allocation would fail and the loop would break, but my computer hung as it was an infinite loop.
I waited for about an hour and finally I had to force shut down my computer.
Some questions:
Does malloc allocate memory from HD also?
What was the reason for above behaviour?
Why didn't loop break at any point of time?
Why wasn't there any allocation failure?
I read that the maximum memory malloc can allocate is limited to physical memory (on heap).
Wrong: most computers/OSs support virtual memory, backed by disk space.
Some questions: does malloc allocate memory from HDD also?
malloc asks the OS, which in turn may well use some disk space.
What was the reason for above behavior? Why didn't the loop break at any time?
Why wasn't there any allocation failure?
You just asked for too little at a time: the loop would have broken eventually (well after your machine slowed to a crawl due to the large excess of virtual vs physical memory and the consequent super-frequent disk access, an issue known as "thrashing") but it exhausted your patience well before then. Try getting e.g. a megabyte at a time instead.
When a program exceeds consumption of memory to a certain level, the
computer stops working because other applications do not get enough
memory that they require.
A total stop is unlikely, but when an operation that normally would take a few microseconds ends up taking (e.g.) tens of milliseconds, those four orders of magnitude may certainly make it feel as if the computer had basically stopped, and what would normally take a minute could take a week.
I know this thread is old, but for anyone willing to give it a try oneself, use this code snippet:
#include <stdlib.h>
int main() {
    int *p;
    while(1) {
        int inc=1024*1024*sizeof(char);
        p=(int*) calloc(1,inc);
        if(!p) break;
    }
}
Run:
$ gcc memtest.c
$ ./a.out
Upon running, this code fills up one's RAM until it is killed by the kernel. It uses calloc instead of malloc to prevent "lazy evaluation". Ideas taken from this thread:
Malloc Memory Questions
This code quickly filled my RAM (4 GB) and then, in about 2 minutes, my 20 GB swap partition before it died. 64-bit Linux of course.
/proc/sys/vm/overcommit_memory controls the maximum on Linux
On Ubuntu 19.04 for example, we can easily see with strace that malloc (for large requests) is implemented with mmap(MAP_ANONYMOUS).
man proc then describes how /proc/sys/vm/overcommit_memory controls the maximum allocation:
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed".
In mode 1, the kernel pretends there is always enough memory, until memory actually runs out. One use case for this mode is scientific computing applications that employ large sparse arrays. In Linux kernel versions before 2.6.0, any nonzero value implies mode 1.
In mode 2 (available since Linux 2.6), the total virtual address space that can be allocated (CommitLimit in /proc/meminfo) is calculated as
CommitLimit = (total_RAM - total_huge_TLB) * overcommit_ratio / 100 + total_swap
where:
total_RAM is the total amount of RAM on the system;
total_huge_TLB is the amount of memory set aside for huge pages;
overcommit_ratio is the value in /proc/sys/vm/overcommit_ratio; and
total_swap is the amount of swap space.
For example, on a system with 16GB of physical RAM, 16GB of swap, no space dedicated to huge pages, and an overcommit_ratio of 50, this formula yields a CommitLimit of 24GB.
Since Linux 3.14, if the value in /proc/sys/vm/overcommit_kbytes is nonzero, then CommitLimit is instead calculated as:
CommitLimit = overcommit_kbytes + total_swap
See also the description of /proc/sys/vm/admin_reserve_kbytes and /proc/sys/vm/user_reserve_kbytes.
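To make the two CommitLimit formulas quoted above concrete, here is a small sketch that evaluates them. The function name `commit_limit_kb` and the choice of kilobytes for every size are my own; the kernel does not expose such a function.

```c
#include <stdint.h>

/* Evaluates the mode 2 CommitLimit formulas from proc(5).
 * All sizes are in kilobytes. If overcommit_kbytes is nonzero it replaces
 * the ratio-based term, as described for Linux 3.14 and later. */
uint64_t commit_limit_kb(uint64_t total_ram_kb,
                         uint64_t total_huge_tlb_kb,
                         uint64_t overcommit_ratio,
                         uint64_t overcommit_kbytes,
                         uint64_t total_swap_kb)
{
    if (overcommit_kbytes != 0)
        return overcommit_kbytes + total_swap_kb;
    return (total_ram_kb - total_huge_tlb_kb) * overcommit_ratio / 100
           + total_swap_kb;
}
```

For the man page's example (16GB RAM, 16GB swap, no huge pages, ratio 50), `commit_limit_kb(16777216, 0, 50, 0, 16777216)` gives 25165824 kB, which is the 24GB quoted above.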
Documentation/vm/overcommit-accounting.rst in the 5.2.1 kernel tree also gives some information, although lol a bit less:
The Linux kernel supports the following overcommit handling modes
0 Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a
seriously wild allocation fails while allowing overcommit to
reduce swap usage. root is allowed to allocate slightly more
memory in this mode. This is the default.
1 Always overcommit. Appropriate for some scientific
applications. Classic example is code using sparse arrays and
just relying on the virtual memory consisting almost entirely
of zero pages.
2 Don't overcommit. The total address space commit for the
system is not permitted to exceed swap + a configurable amount
(default is 50%) of physical RAM. Depending on the amount you
use, in most situations this means a process will not be
killed while accessing pages but will receive errors on memory
allocation as appropriate.
Useful for applications that want to guarantee their memory
allocations will be available in the future without having to
initialize every page.
Minimal experiment
We can easily see the maximum allowed value with:
main.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv) {
    char *chars;
    size_t nbytes;
    /* Decide how many bytes to allocate. */
    if (argc < 2) {
        nbytes = 2;
    } else {
        nbytes = strtoull(argv[1], NULL, 0);
    }
    /* Allocate the bytes. */
    chars = mmap(
        NULL,
        nbytes,
        PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS,
        -1,
        0
    );
    /* This can happen for example if we ask for too much memory. */
    if (chars == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    /* Free the allocated memory. */
    munmap(chars, nbytes);
    return EXIT_SUCCESS;
}
GitHub upstream.
Compile and run to allocate 1GiB and 1TiB:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out 0x40000000
./main.out 0x10000000000
We can then play around with the allocation value to see what the system allows.
I can't find precise documentation for mode 0 (the default), but on my 32GiB RAM machine it does not allow the 1TiB allocation:
mmap: Cannot allocate memory
If I enable unlimited overcommit however:
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
then the 1TiB allocation works fine.
Mode 2 is well documented, but I'm too lazy to carry out precise calculations to verify it. I will just point out that in practice we are allowed to allocate about:
overcommit_ratio / 100
of total RAM, and overcommit_ratio is 50 by default, so we can allocate about half of total RAM.
VSZ vs RSS and the out-of-memory killer
So far, we have just allocated virtual memory.
However, at some point of course, if you use enough of those pages, Linux will have to start killing some processes.
I have illustrated that in detail at: What is RSS and VSZ in Linux memory management
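Try this
#include <stdlib.h>
#include <stdio.h>
int main(void) {
    int Mb = 0;
    /* Request 1 MB at a time until malloc fails; the blocks are deliberately never freed. */
    while (malloc(1<<20)) ++Mb;
    printf("Allocated %d Mb total\n", Mb);
    return 0;
}
Include stdlib and stdio for it.
This extract is taken from Deep C Secrets.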
malloc does its own memory management, managing small memory blocks itself, but ultimately it uses the Win32 Heap functions to allocate memory. You can think of malloc as a "memory reseller".
The windows memory subsystem comprises physical memory (RAM) and virtual memory (HD). When physical memory becomes scarce, some of the pages can be copied from physical memory to virtual memory on the hard drive. Windows does this transparently.
By default, Virtual Memory is enabled and will consume the available space on the HD. So, your test will continue running until it has either allocated the full amount of virtual memory for the process (2GB on 32-bit windows) or filled the hard disk.
The C90 standard guarantees that you can get at least one object 32 kBytes in size, and this may be static, dynamic, or automatic memory. C99 guarantees at least 64 kBytes. For any higher limit, refer to your compiler's documentation.
Also, malloc's argument is a size_t and the range of that type is [0,SIZE_MAX], so the maximum you can request is SIZE_MAX, whose value varies by implementation and is defined in <stdint.h>.
I don't actually know why that failed, but one thing to note is that malloc(4) may not actually give you 4 bytes, so this technique is not really an accurate way to find your maximum heap size.
I found this out from my question here.
For instance, when you allocate 4 bytes of memory, the space directly before your memory could contain the integer 4, as an indication to the allocator of how much memory you asked for.
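You can see that a 4-byte request gets more than 4 bytes with the sketch below. It relies on `malloc_usable_size`, which is a glibc extension declared in `<malloc.h>` and not part of standard C, so treat it as Linux/glibc-specific. The helper name `usable_size_for` is made up.

```c
#include <malloc.h>   /* malloc_usable_size: glibc extension, not standard C */
#include <stdlib.h>

/* Returns how many bytes the allocator actually made usable for a request
 * of `requested` bytes, or 0 if the allocation failed. */
size_t usable_size_for(size_t requested)
{
    void *p = malloc(requested);
    size_t usable;
    if (p == NULL) return 0;
    usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

On 64-bit glibc `usable_size_for(4)` commonly reports 24, and that figure still excludes the allocator's own header, so counting 4 bytes per loop iteration underestimates real consumption several times over.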
Does malloc allocate memory from HD also?
Implementation of malloc() depends on libc implementation and operating system (OS). Typically malloc() doesn't always request RAM from the OS but returns a pointer to previously allocated memory block "owned" by libc.
In case of POSIX compatible systems, this libc controlled memory area is usually increased using syscall brk(). That doesn't allow releasing any memory between two still existing allocations, which makes the process look as if it is still using all the RAM after allocating areas A, B, C in sequence and releasing B. This is because areas A and C around the area B are still in use so the memory allocated from the OS cannot be returned.
Many modern malloc() implementations have some kind of heuristic where small allocations use the memory area reserved via brk() and "big" allocations use anonymous virtual memory blocks reserved via mmap() using MAP_ANONYMOUS flag. This allows immediately returning these big allocations when free() is later called. Typically the runtime performance of mmap() is slightly slower than using previously reserved memory which is the reason malloc() implements this heuristic.
Both brk() and mmap() allocate virtual memory from the OS. And virtual memory can be always backed up by swap which may be stored in any storage that the OS supports, including HDD.
In case you run Windows, the syscalls have different names but the underlying behavior is probably about the same.
What was the reason for above behaviour?
Since your example code never touched the memory, I'd guess you're seeing behavior where OS implements copy-on-write for virtual RAM and the memory is mapped to shared page with whole page filled with zeroes by default. Modern operating systems do this because many programs allocate more RAM than they actually need and using shared zero page by default for all memory allocations avoids needing to use real RAM for these allocations.
If you want to test how OS handles your loop and actually reserve true storage, you need to write something to the memory you allocated. For x86 compatible hardware you only need to write one byte per each 4096 byte segment because page size is 4096 and the hardware cannot implement copy-on-write behavior for smaller segments; once one byte is modified, the whole 4096 byte segment called page must be reserved for your process. I'm not aware of any modern CPU that would support smaller than 4096 byte pages. Modern Intel CPUs support 2 MB and 1 GB pages in addition to 4096 byte pages but the 1 GB pages are rarely used because the overhead of using 2 MB pages is small enough for any sensible RAM amounts. 1 GB pages might make sense if your system has hundreds of terabytes of RAM.
So basically your program only tested reserving virtual memory without ever using said virtual memory. Your OS probably has special optimization for this which avoids needing more than 4 KB of RAM to support this.
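A minimal sketch of the "write one byte per page" idea from above. The function name `touch_pages` is made up, and the page size is passed in by the caller (4096 on x86, or whatever `sysconf(_SC_PAGESIZE)` reports). If the buffer does not start on a page boundary, the last partially covered page may be missed.

```c
#include <stddef.h>

/* Writes one byte at every page_size step of buf so that the OS has to back
 * those pages with real memory. Returns the number of writes performed.
 * The volatile pointer keeps the compiler from dropping the stores. */
size_t touch_pages(void *buf, size_t len, size_t page_size)
{
    volatile unsigned char *bytes = buf;
    size_t touched = 0;
    size_t off;
    if (page_size == 0) return 0;
    for (off = 0; off < len; off += page_size) {
        bytes[off] = 1;
        touched++;
    }
    return touched;
}
```

Calling this on each block right after malloc() in the original loop makes the allocation count against real RAM and swap, so the loop runs into memory pressure much sooner (and can make the machine very slow, so use with care).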
Unless your objective is to try to measure the overhead caused by your malloc() implementation, you should avoid trying to allocate memory blocks smaller than 16-32 bytes. For mmap() allocations the minimum possible overhead is 8 bytes per allocation on x86-64 hardware due to the data needed to return the memory to the operating system, so it really doesn't make sense for malloc() to use the mmap() syscall for a single 4 byte allocation.
The overhead is needed to keep track of memory allocations because the memory is freed using void free(void*) so memory allocation routines must keep track of the allocated memory segment size somewhere. Many malloc() implementations also need additional metadata and if they need to keep track of any memory addresses, those need 8 bytes per address.
If you truly want to search for the limits of your system, you should probably do binary search for the limit where malloc() fails. In practice, you try to allocate ..., 1KB, 2KB, 4KB, 8KB, ..., 32 GB which then fails and you know that the real world limit is between 16 GB and 32 GB. You can then split this size in half and figure out the exact limit with additional testing. If you do this kind of search, it may be easier to always release any successful allocation and reserve the test block with a single malloc() call. That should also avoid accidentally accounting for malloc() overhead so much because you need only one allocation at any time at max.
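The search described above could look roughly like this. It is a sketch under two assumptions: that success is monotonic (if N bytes can be allocated, so can anything smaller), which overcommit heuristics do not strictly guarantee, and that the caller supplies an upper bound instead of finding it by doubling. `can_alloc` and `largest_single_alloc` are made-up names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if a single block of n bytes (n >= 1) can be allocated right now.
 * The block is freed again immediately. The volatile write stops the compiler
 * from optimizing the malloc/free pair away. */
static int can_alloc(size_t n)
{
    volatile unsigned char *p = malloc(n);
    if (p == NULL) return 0;
    p[0] = 0;
    free((void *)p);
    return 1;
}

/* Binary search for the largest single allocation <= limit that succeeds.
 * Only one test block is alive at any time. */
size_t largest_single_alloc(size_t limit)
{
    size_t lo = 0;        /* largest size known to work (0 = none tried) */
    size_t hi = limit;    /* smallest size known to fail, once limit failed */
    if (limit == 0) return 0;
    if (can_alloc(limit)) return limit;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (can_alloc(mid)) lo = mid;
        else hi = mid;
    }
    return lo;
}
```

Keep in mind that this measures how much virtual memory a single malloc() call will hand out, not how much of it could be written to without swapping or an OOM kill.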
Update: As pointed out by Peter Cordes in the comments, your malloc() implementation may be writing bookkeeping data about your allocations in the reserved RAM which causes real memory to be used and that can cause system to start swapping so heavily that you cannot recover it in any sensible timescale without shutting down the computer. In case you're running Linux and have enabled "Magic SysRq" keys, you could just press Alt+SysRq+f to kill the offending process taking all the RAM and system would run just fine again. It is possible to write malloc() implementation that doesn't usually touch the RAM allocated via brk() and I assumed you would be using one. (This kind of implementation would allocate memory in 2^n sized segments and all similarly sized segments are reserved in the same range of addresses. When free() is later called, the malloc() implementation knows the size of the allocation from the address and bookkeeping about free memory segments are kept in separate bitmap in single location.) In case of Linux, malloc() implementation touching the reserved pages for internal bookkeeping is called dirtying the memory, which prevents sharing memory pages because of copy-on-write handling.
Why didn't loop break at any point of time?
If your OS implements the special behavior described above and you're running 64-bit system, you're not going to run out of virtual memory in any sensible timescale so your loop seems infinite.
Why wasn't there any allocation failure?
You didn't actually use the memory so you're allocating virtual memory only. You're basically increasing the maximum pointer value allowed for your process but since you never access the memory, the OS never bothers to reserve any physical memory for your process.
In case you're running Linux and want the system to enforce virtual memory usage to match actually available memory, you have to write 2 to kernel setting /proc/sys/vm/overcommit_memory and maybe adjust overcommit_ratio, too. See https://unix.stackexchange.com/q/441364/20336 for details about memory overcommit on Linux. As far as I know, Windows implements overcommit, too, but I don't know how to adjust its behavior.
The first time you allocate memory you assign it to p, and on every later iteration you overwrite p and leave the previous block unreferenced. That means your program only ever references 4 bytes at a time. So how can you think you have used the entire RAM? That's why the swap device (temporary space on the HDD) is out of the discussion. I know of a memory management algorithm in which a block that no program references becomes eligible to satisfy another program's memory request. So you are just keeping the RAM driver busy, and that's why it can't get a chance to service other programs. This is also a dangling reference problem.
Ans: You can allocate at most as much memory as your RAM size, because no program has access to the swap device.
I hope all your questions have received satisfactory answers.

Malloc is using 10x the amount of memory necessary

I have a network application which allocates predictable 65k chunks as part of the IO subsystem. The memory usage is tracked atomically within the system so I know how much memory I'm actually using. This number can also be checked against malloc_stats()
Result of malloc_stats()
Arena 0:
system bytes = 1617920
in use bytes = 1007840
Arena 1:
system bytes = 2391826432
in use bytes = 247265696
Arena 2:
system bytes = 2696175616
in use bytes = 279997648
Arena 3:
system bytes = 6180864
in use bytes = 6113920
Arena 4:
system bytes = 16199680
in use bytes = 699552
Arena 5:
system bytes = 22151168
in use bytes = 899440
Arena 6:
system bytes = 8765440
in use bytes = 910736
Arena 7:
system bytes = 16445440
in use bytes = 11785872
Total (incl. mmap):
system bytes = 935473152
in use bytes = 619758592
max mmap regions = 32
max mmap bytes = 72957952
Items to note:
The total in use bytes matches my internal counter exactly. However, the application has a RES (from top/htop) of 5.2GB. The allocations are almost always 65k; I don't understand the huge amount of fragmentation/waste I am seeing, even more so when mmap comes into play.
The total system bytes does not equal the sum of system bytes in each arena.
I'm on Ubuntu 16.04 using glibc 2.23-0ubuntu3
Arena 1 and 2 account for the large RES value the kernel is reporting.
Arena 1 and 2 are holding on to 10x the amount of memory that is used.
The vast majority of allocations are ALWAYS 65k (an explicit multiple of the page size)
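On the point that the total does not equal the sum of the arenas, the numbers in the first capture line up in a suggestive way. The sketch below only does arithmetic on values copied from the post. The one assumption is that the bytes currently mmap'd at that moment were 71077888, which is the hblkhd value from the later mallinfo capture. With that assumption the in-use figures add up exactly (548680704 + 71077888 = 619758592), and the reported system total is exactly what you get if the true sum is truncated to 32 bits. That the total is kept in a 32-bit unsigned counter is an inference from the numbers, not something stated in the post.

```c
#include <stddef.h>
#include <stdint.h>

/* "system bytes" of arenas 0..7, copied from the first malloc_stats() output. */
static const uint64_t arena_system_bytes[8] = {
    1617920ULL, 2391826432ULL, 2696175616ULL, 6180864ULL,
    16199680ULL, 22151168ULL, 8765440ULL, 16445440ULL
};

/* Adds up `count` values in a 64-bit accumulator. */
uint64_t sum_u64(const uint64_t *values, size_t count)
{
    uint64_t total = 0;
    size_t i;
    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Keeps only the low 32 bits, as a 32-bit unsigned counter would. */
uint32_t wrap_to_u32(uint64_t value)
{
    return (uint32_t)value;
}
```

`sum_u64(arena_system_bytes, 8)` is 5159362560, about 5.2GB, which is in line with the RES value from top. Adding the assumed 71077888 mmap'd bytes and applying `wrap_to_u32` gives 935473152, the "Total (incl. mmap)" system bytes shown above.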
How do I keep malloc from allocating an absurd amount of memory?
I think this version of malloc has a huge bug. Eventually (after an hour) a little more than half of the memory will be released. This isn't a fatal bug but it is definitely a problem.
UPDATE - I added mallinfo and re-ran the test - the app is no longer processing anything at the time this was captured. No network connections are attached. It is idle.
Arena 2:
system bytes = 2548473856
in use bytes = 3088112
Arena 3:
system bytes = 3288600576
in use bytes = 6706544
Arena 4:
system bytes = 16183296
in use bytes = 914672
Arena 5:
system bytes = 24027136
in use bytes = 911760
Arena 6:
system bytes = 15110144
in use bytes = 643168
Arena 7:
system bytes = 16621568
in use bytes = 11968016
Total (incl. mmap):
system bytes = 1688858624
in use bytes = 98154448
max mmap regions = 32
max mmap bytes = 73338880
arena (total amount of memory allocated other than mmap) = 1617780736
ordblks (number of ordinary non-fastbin free blocks) = 1854
smblks (number of fastbin free blocks) = 21
hblks (number of blocks currently allocated using mmap) = 31
hblkhd (number of bytes in blocks currently allocated using mmap) = 71077888
usmblks (highwater mark for allocated space) = 0
fsmblks (total number of bytes in fastbin free blocks) = 1280
uordblks (total number of bytes used by in-use allocations) = 27076560
fordblks (total number of bytes in free blocks) = 1590704176
keepcost (total amount of releaseable free space at the top of the heap) = 439216
My hypothesis is as follows: the total system bytes reported by malloc is much less than the sum of the amounts reported for each arena (1.6GB vs 6.1GB). This could mean that (A) malloc is actually releasing blocks but the arena doesn't, or (B) malloc is not compacting memory allocations at all and is creating a huge amount of fragmentation.
UPDATE Ubuntu released a kernel update which basically fixed everything as described in this post. That said, there is a lot of good information in here on how malloc works with the kernel.
The full details can be a bit complex, so I'll try to simplify things as much as I can. Also, this is a rough outline and may be slightly inaccurate in places.
Requesting memory from the kernel
malloc uses either sbrk or anonymous mmap to request a contiguous memory area from the kernel. Each area will be a multiple of the machine's page size, typically 4096 bytes. Such a memory area is called an arena in malloc terminology. More on that below.
Any pages so mapped become part of the process's virtual address space. However, even though they have been mapped in, they may not be backed up by a physical RAM page [yet]. They are mapped [many-to-one] to the single "zero" page in R/O mode.
When the process tries to write to such a page, it incurs a protection fault, the kernel breaks the mapping to the zero page, allocates a real physical page, remaps to it, and the process is restarted at the fault point. This time the write succeeds. This is similar to demand paging to/from the paging disk.
In other words, page mapping in a process's virtual address space is different than page residency in a physical RAM page/slot. More on this later.
RSS (resident set size)
RSS is not really a measure of how much memory a process allocates or frees, but how many pages in its virtual address space have a physical page in RAM at the present time.
If the system has a paging disk of 128GB, but only had (e.g.) 4GB of RAM, a process RSS could never exceed 4GB. The process's RSS goes up/down based upon paging in or paging out pages in its virtual address space.
So, because of the zero page mapping at start, a process RSS may be much lower than the amount of virtual memory it has requested from the system. Also, if another process B "steals" a page slot from a given process A, the RSS for A goes down and goes up for B.
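The gap between "mapped" and "resident" can be observed directly on Linux. The sketch below reads the resident page count from `/proc/self/statm` (second field) and measures how much it grows when a freshly malloc'd buffer is written to. It is Linux-only, and the function names `resident_pages` and `rss_growth_when_touched` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Linux-only: resident set size of this process in pages, taken from the
 * second field of /proc/self/statm. Returns -1 if it cannot be read. */
long resident_pages(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    long size_pages, rss_pages;
    if (f == NULL) return -1;
    if (fscanf(f, "%ld %ld", &size_pages, &rss_pages) != 2)
        rss_pages = -1;
    fclose(f);
    return rss_pages;
}

/* Allocates len bytes, then reports by how many pages the RSS grew while one
 * byte in every 4096 was written. Returns -1 if the allocation failed or the
 * RSS could not be read. */
long rss_growth_when_touched(size_t len)
{
    volatile unsigned char *buf = malloc(len);
    long before, after;
    size_t off;
    if (buf == NULL) return -1;
    before = resident_pages();      /* after malloc, before any write */
    for (off = 0; off < len; off += 4096)
        buf[off] = 1;
    after = resident_pages();
    free((void *)buf);
    if (before < 0 || after < 0) return -1;
    return after - before;
}
```

For a 64MB buffer the growth should be on the order of thousands of pages: the malloc() itself leaves the RSS almost unchanged, and the pages only become resident when they are written.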
The process "working set" is the minimum number of pages the kernel must keep resident for the process to prevent the process from excessively page faulting to get a physical memory page, based on some measure of "excessively". Each OS has its own ideas about this and it's usually a tunable parameter on a system-wide or per-process basis.
If a process allocates a 3GB array, but only accesses the first 10MB of it, it will have a lower working set than if it randomly/scattershot accessed all parts of the array.
That is, if the RSS is higher [or can be higher] than the working set, the process will run well. If the RSS is below the working set, the process will page fault excessively. This can be either because it has poor "locality of reference" or because other events in the system conspire to "steal" the process's page slots.
malloc and arenas
To cut down on fragmentation, malloc uses multiple arenas. Each arena has a "preferred" allocation size (aka "chunk" size). That is, smaller requests like malloc(32) come from (e.g.) arena A, but larger requests like malloc(1024 * 1024) come from a different arena (e.g.) arena B.
This prevents a small allocation from "burning" the first 32 bytes of the last available chunk in arena B, making it too short to satisfy the next malloc(1M)
Of course, we can't have a separate arena for each requested size, so the "preferred" chunk sizes are typically some power of 2.
When creating a new arena for a given chunk size, malloc doesn't just request an area of the chunk size, but some multiple of it. It does this so it can quickly satisfy subsequent requests of the same size without having to do an mmap for each one. Since the minimum size is 4096, arena A will have 4096/32 chunks or 128 chunks available.
free and munmap
When an application does a free(ptr) [ptr represents a chunk], the chunk is marked as available. free could choose to combine contiguous chunks that are free/available at that time or not.
If the chunk is small enough, it does nothing more (i.e.) the chunk is available for reallocation, but, free does not try to release the chunk back to the kernel. For larger allocations, free will [try to] do munmap immediately.
munmap can unmap a single page [or even a small number of bytes], even if comes in the middle of an area that was multiple pages long. If so, the application now has a "hole" in the mapping.
malloc_trim and madvise
If free is called, it probably calls munmap. If an entire page has been unmapped, the RSS of the process (e.g. A) goes down.
But, consider chunks that are still allocated, or chunks that were marked as free/available but were not unmapped.
They are still part of the process A's RSS. If another process (e.g. B) starts doing lots of allocations, the system may have to page out some of process A's slots to the paging disk [reducing A's RSS] to make room for B [whose RSS goes up].
But, if there is no process B to steal A's page slots, process A's RSS can remain high. Say process A allocated 100MB, used it a while back, but is only actively using 1MB now, the RSS will remain at 100MB.
That's because without the "interference" from process B, the kernel had no reason to steal any page slots from A, so they "remain on the books" in the RSS.
To tell the kernel that a memory area is not likely to be used soon, we need the madvise syscall with MADV_DONTNEED. This tells the kernel that the pages in that area can be reclaimed, thereby reducing the process's RSS. (On Linux, for private anonymous memory such as the malloc heap, the kernel simply drops the pages rather than writing them to the paging disk.)
The pages remain mapped in the process's virtual address space, but no longer occupy physical RAM. Remember, page mapping is different than page residency.
If the process accesses the page again, it incurs a page fault and the kernel will attach a physical RAM slot again (zero-filled for anonymous memory, or re-read from the backing file for a file mapping) and remap. The RSS goes back up. Classical demand paging.
madvise is what malloc_trim uses to reduce the RSS of the process.
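For reference, the flag is spelled MADV_DONTNEED in `<sys/mman.h>`. The sketch below shows what it does to private anonymous memory on Linux: the range stays mapped, but the old contents are dropped and the next access sees a zero-filled page. This is Linux behaviour and other systems may treat the flag differently. The helper name `first_byte_after_dontneed` is made up.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS and madvise on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Maps len bytes of private anonymous memory, writes a marker byte, applies
 * MADV_DONTNEED to the range, then returns the first byte as read back.
 * Returns -1 if mmap or madvise failed. */
int first_byte_after_dontneed(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int result;
    if (p == MAP_FAILED) return -1;
    p[0] = 0xAB;                            /* page becomes resident */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    result = p[0];                          /* faults in a fresh zero page */
    munmap(p, len);
    return result;
}
```

Because the contents are discarded, an allocator can only apply this to memory that is already free, which is the situation malloc_trim deals with.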
free does not promise to return the freed memory to the OS.
What you observe is the freed memory is kept in the process for possible reuse. More than that, free releasing memory to the OS can pose a performance problem when allocation and deallocation of large chunks happen frequently. This is why there is an option to return the memory to the OS explicitly with malloc_trim.
Try malloc_trim(0) and see if that reduces the RSS. This function is non-standard, so its behaviour is implementation specific, it might not do anything at all. You mentioned in the comments that calling it did reduce RSS.
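A small sketch of how malloc_trim could be tried out, assuming glibc (the function is declared in `<malloc.h>` and returns 1 if memory was released to the system, 0 otherwise). The helper `churn_then_trim` is a made-up name. It allocates and frees a batch of blocks and then asks for a trim.

```c
#include <malloc.h>   /* malloc_trim: glibc extension */
#include <stdlib.h>

/* Allocates up to `count` blocks of `size` bytes, frees them all, then calls
 * malloc_trim(0). Returns malloc_trim's result (1 = memory was released,
 * 0 = nothing could be released) or -1 if the pointer table could not be
 * allocated. */
int churn_then_trim(size_t count, size_t size)
{
    void **blocks = malloc(count * sizeof *blocks);
    size_t i, got = 0;
    if (blocks == NULL) return -1;
    for (i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL) break;
        got++;
    }
    for (i = 0; i < got; i++)
        free(blocks[i]);
    free(blocks);
    return malloc_trim(0);
}
```

In a real application the interesting part is comparing the process's RSS before and after the call, for example with top or /proc/self/statm.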
You may like to make sure that there are no memory leaks and memory corruption before you start digging deeper.
With regards to keepcost member, see man mallinfo:
BUGS
Information is returned for only the main memory allocation area.
Allocations in other arenas are excluded. See malloc_stats(3) and malloc_info(3) for alternatives that include information about other arenas.

Exhaust memory usage with malloc and sleep in C [duplicate]

This code snippet will allocate 2 GB every time it reads the letter 'u' from stdin, and will initialize all the allocated chars once it reads 'a'.
#include <cstdlib>
#include <iostream>
#include <vector>
#define bytes 2147483648
using namespace std;
int main()
{
    char input = 0;
    vector<char *> activate;
    // Read one command character at a time; stop on 'q' or end of input.
    while (cin >> input && input != 'q')
    {
        if (input == 'u')
        {
            char *m = (char *)malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else
            {
                cout << "ok" << endl;
                activate.push_back(m);   // keep only blocks that were actually allocated
            }
        }
        else if (input == 'a')
        {
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                for (size_t x = 0; x < bytes; x++)
                {
                    m[x] = 'a';
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage using the htop tool, I have realized that the malloc operation is not reflected in the resource usage.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM and swap memory) and malloc would allow it (i.e. NULL is not returned by malloc). But when I try to initialize the allocated memory, I can see the memory and swap memory filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3) and mmap(2) and proc(5)
NB: echo 0 instead of echo 2 often -but not always- works also. Read the docs (in particular proc man page that I just linked to).
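Before changing the setting it can help to check what it currently is. A minimal sketch that reads the mode from the proc file follows. `read_overcommit_mode` is a made-up helper name, and the path is passed in so the function can be pointed at a different file when testing.

```c
#include <stdio.h>

/* Reads the integer overcommit mode (0, 1 or 2) from the given file, normally
 * /proc/sys/vm/overcommit_memory. Returns -1 if the file cannot be opened or
 * does not start with a number. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode;
    if (f == NULL) return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

`read_overcommit_mode("/proc/sys/vm/overcommit_memory")` returning 0 means the default heuristic mode, under which the oversized malloc calls in the question are likely to succeed.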
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you just want to allocate too much, it "lies" to you, when you want to use the allocated memory, it will try to find enough memory for you and it might crash if it can't find enough memory.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately, through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned, memory overcommit. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the ram (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside - but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles, which is nifty.
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also you could use calloc that not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1.Is this a kernel bug?
No.
2.Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate need to know eventual memory requirement - it's often convenient for an application to be able to request an amount of memory that it considers an upper limit on the need it might actually have. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or a realloc() of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries could be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or allocate a hash table with generous capacity() to size() ratio, can lead to a very high performance system. Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit's exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. Summarily, with this memory overcommit behaviour enabled, simply checking malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.

Memory Leak Using malloc fails

I am writing a program to leak memory( main memory ) to test how the system behaves with low system memory and swap memory. We are using the following loop which runs periodically and leaks memory
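#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char* argv[] )
{
    if( argc < 3 )
        return 1;
    /* argv[1]: megabytes to leak per pass; argv[2]: seconds to sleep (assumed, the original did not show where arg_time comes from) */
    size_t arg_mem = strtoul(argv[1], NULL, 10);
    unsigned int arg_time = (unsigned int) strtoul(argv[2], NULL, 10);
    unsigned int *u_int_ptr;
    while(1)
    {
        /* leak on purpose: the pointer is overwritten on the next pass */
        u_int_ptr =(unsigned int*) malloc(arg_mem * 1024 * 1024);
        if( u_int_ptr == NULL )
            printf("\n leakyapp Daemon FAILED due to insufficient available memory....");
        sleep( arg_time );
    }
}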
Above loop runs for sometime and prints the message "leakyapp Daemon FAILED due to insufficient available memory...." . But when I run the command "free" I can see that running this program has no effect either on Main memory or Swap.
Am I doing something wrong ?
Physical memory is not committed to your allocations until you actually write into it.
If you have a kernel version after 2.6.23, use mmap() with the MAP_POPULATE flag instead of malloc():
u_int_ptr = mmap(NULL, arg_mem * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
if (u_int_ptr == MAP_FAILED)
/* ... */
If you have an older kernel, you'll have to touch each page in the allocation.
There might be some sort of copy-on-write optimization. I would suggest actually writing something to the memory you are allocating.
What is happening is that malloc requests argmem * 256 pages from the heap (assuming a 4 Kbyte page size). The heap in turn requests the memory from the operating system. However, all that does is create entries in the page table for the newly allocated memory block. No actual physical RAM is allocated to the process, except that required by the heap to track the malloc request.
As soon as the process tries to access one of those pages by reading or writing, a page fault is generated because the entry in the page table is effectively a dangling pointer. The operating system will then allocate a physical page to the process. It's only then that you'll see the available physical memory go down.
Since all new pages start completely zeroed out, Linux might employ a "copy on write" strategy to optimise page allocation. i.e. it might keep a single page totally zeroed and always allocate that one when a process tries to read from a previously unused page. Only when the process tries to write to that new page would it actually allocate a completely fresh page from physical RAM. I don't know if Linux actually does this, but if it does, merely reading from a new page is not going to be enough to increase physical memory usage.
So, your best strategy is to allocate your large block of RAM and then write something at 4096 byte intervals throughout it.
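A minimal sketch of that strategy, assuming a POSIX system. The function name touch_pages is made up for illustration; the page size is queried with sysconf() rather than hard-coded to 4096, and the last byte is written as well because a malloc'd block is usually not page-aligned:

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte per page-sized stride of buf so
   that the kernel has to back every page with real memory.
   Returns the number of stride writes performed. */
size_t touch_pages(char *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fall back to 4 KiB */
    volatile char *p = buf;   /* volatile so the writes are not optimised away */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        p[off] = 1;
        touched++;
    }
    /* An unaligned block can end in a page the stride never reached. */
    if (len > 0)
        p[len - 1] = 1;
    return touched;
}
```

In the leak loop from the question this would be called right after a successful malloc(), e.g. touch_pages((char *)u_int_ptr, arg_mem * 1024 * 1024).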
What does ulimit -m -v print?
Explanation: On any server OS, you can limit the amount of resources a process can allocate to make sure that a single runaway process can't bring down the whole machine.
I'm guessing (based on the command line argument) that you're using a desktop/server OS and not an embedded system.
Allocating memory like this is probably not consuming much RAM. Your memory allocation might not have even succeeded - on some OSs (e.g. Linux), malloc() can return non-NULL even when you ask for more memory than is available.
Without knowing what your OS is and exactly what you're trying to test, it's difficult to suggest anything specific, but you might want to look at more low level ways of allocating memory than malloc(), or ways of controlling the virtual memory system. On Linux you might want to look at mlock().
I think caf already explained it. Linux is usually configured to allow overcommitting memory. You allocate huge chunks of memory, but internally nothing happens beyond making a note that your process wants this huge chunk of memory. It is not until you try to access that chunk that the kernel tries to find free physical memory to satisfy the read/write access. This is a bit like flight booking: airlines usually overbook flights, because there's always a percentage of passengers who do not show up.
You can force the memory to be committed by writing to the chunk with memset() after allocation. calloc should work too.
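To see the effect for yourself, one option is to compare the resident set size before and after the memset(). A rough sketch, assuming Linux (it reads the second field of /proc/self/statm, which is the resident size in pages); the function names resident_pages and measure_commit are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resident pages of this process, taken from the
   second field of /proc/self/statm. Returns -1 if it cannot be read. */
long resident_pages(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    long size, resident;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Hypothetical helper: allocates len bytes and records the resident page
   count right after malloc() and again after memset().
   Returns 0 on success, -1 if malloc() failed. */
int measure_commit(size_t len, long *after_malloc, long *after_memset)
{
    /* volatile pointer so the compiler cannot drop the malloc/memset pair */
    char *volatile p = malloc(len);

    if (p == NULL)
        return -1;
    *after_malloc = resident_pages();
    memset(p, 1, len);              /* this is what commits the pages */
    *after_memset = resident_pages();
    free(p);
    return 0;
}
```

With a block of a few tens of megabytes, the second number should be noticeably larger than the first, while the first barely moves compared with the value before the malloc().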

Maximum memory which malloc can allocate

I was trying to figure out how much memory I can malloc to maximum extent on my machine
(1 Gb RAM 160 Gb HD Windows platform).
I read that the maximum memory malloc can allocate is limited to physical memory (on heap).
Also when a program exceeds consumption of memory to a certain level, the computer stops working because other applications do not get enough memory that they require.
So to confirm, I wrote a small program in C:
#include <stdlib.h>
int main(){
    int *p;
    while(1){
        p=(int *)malloc(4);
        if(!p)break;
    }
    return 0;
}
I was hoping that there would be a time when memory allocation would fail and the loop would break, but my computer hung as it was an infinite loop.
I waited for about an hour and finally I had to force shut down my computer.
Some questions:
Does malloc allocate memory from HD also?
What was the reason for above behaviour?
Why didn't loop break at any point of time?
Why wasn't there any allocation failure?
I read that the maximum memory malloc can allocate is limited to physical memory (on heap).
Wrong: most computers/OSs support virtual memory, backed by disk space.
Some questions: does malloc allocate memory from HDD also?
malloc asks the OS, which in turn may well use some disk space.
What was the reason for above behavior? Why didn't the loop break at any time?
Why wasn't there any allocation failure?
You just asked for too little at a time: the loop would have broken eventually (well after your machine slowed to a crawl due to the large excess of virtual vs physical memory and the consequent super-frequent disk access, an issue known as "thrashing") but it exhausted your patience well before then. Try getting e.g. a megabyte at a time instead.
When a program exceeds consumption of memory to a certain level, the
computer stops working because other applications do not get enough
memory that they require.
A total stop is unlikely, but when an operation that normally would take a few microseconds ends up taking (e.g.) tens of milliseconds, those four orders of magnitude may certainly make it feel as if the computer had basically stopped, and what would normally take a minute could take a week.
I know this thread is old, but for anyone willing to give it a try themselves, use this code snippet:
#include <stdlib.h>
int main() {
int *p;
while(1) {
int inc=1024*1024*sizeof(char);
p=(int*) calloc(1,inc);
if(!p) break;
}
}
run
$ gcc memtest.c
$ ./a.out
Upon running, this code fills up one's RAM until killed by the kernel. It uses calloc instead of malloc to prevent "lazy evaluation". Ideas taken from this thread:
Malloc Memory Questions
This code quickly filled my RAM (4 GB) and then in about 2 minutes my 20 GB swap partition before it died. 64-bit Linux, of course.
/proc/sys/vm/overcommit_memory controls the maximum on Linux
On Ubuntu 19.04, for example, strace readily shows that malloc is implemented with mmap using MAP_ANONYMOUS.
man proc then describes how /proc/sys/vm/overcommit_memory controls the maximum allocation:
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed".
In mode 1, the kernel pretends there is always enough memory, until memory actually runs out. One use case for this mode is scientific computing applications that employ large sparse arrays. In Linux kernel versions before 2.6.0, any nonzero value implies mode 1.
In mode 2 (available since Linux 2.6), the total virtual address space that can be allocated (CommitLimit in /proc/meminfo) is calculated as
CommitLimit = (total_RAM - total_huge_TLB) * overcommit_ratio / 100 + total_swap
where:
total_RAM is the total amount of RAM on the system;
total_huge_TLB is the amount of memory set aside for huge pages;
overcommit_ratio is the value in /proc/sys/vm/overcommit_ratio; and
total_swap is the amount of swap space.
For example, on a system with 16GB of physical RAM, 16GB of swap, no space dedicated to huge pages, and an overcommit_ratio of 50, this formula yields a CommitLimit of 24GB.
Since Linux 3.14, if the value in /proc/sys/vm/overcommit_kbytes is nonzero, then CommitLimit is instead calculated as:
CommitLimit = overcommit_kbytes + total_swap
See also the description of /proc/sys/vm/admin_reserve_kbytes and /proc/sys/vm/user_reserve_kbytes.
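The mode 2 formula quoted above is easy to check by hand. A small sketch that evaluates it, with all sizes in kB as in /proc/meminfo; the function name commit_limit_kb is made up, and it only does the arithmetic, it does not read anything from the system:

```c
#include <stdint.h>

/* Hypothetical helper implementing the quoted formula:
   CommitLimit = (total_RAM - total_huge_TLB) * overcommit_ratio / 100 + total_swap
   All sizes are in kB; overcommit_ratio is a percentage. */
uint64_t commit_limit_kb(uint64_t total_ram_kb, uint64_t total_huge_tlb_kb,
                         uint64_t overcommit_ratio, uint64_t total_swap_kb)
{
    return (total_ram_kb - total_huge_tlb_kb) * overcommit_ratio / 100
           + total_swap_kb;
}
```

Plugging in the man page example (16GB RAM, 16GB swap, no huge pages, ratio 50) gives 8GB + 16GB = 24GB, matching the quoted figure. The real value the kernel uses can be read from the CommitLimit line of /proc/meminfo.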
Documentation/vm/overcommit-accounting.rst in the 5.2.1 kernel tree also gives some information, although lol a bit less:
The Linux kernel supports the following overcommit handling modes
0 Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a
seriously wild allocation fails while allowing overcommit to
reduce swap usage. root is allowed to allocate slightly more
memory in this mode. This is the default.
1 Always overcommit. Appropriate for some scientific
applications. Classic example is code using sparse arrays and
just relying on the virtual memory consisting almost entirely
of zero pages.
2 Don't overcommit. The total address space commit for the
system is not permitted to exceed swap + a configurable amount
(default is 50%) of physical RAM. Depending on the amount you
use, in most situations this means a process will not be
killed while accessing pages but will receive errors on memory
allocation as appropriate.
Useful for applications that want to guarantee their memory
allocations will be available in the future without having to
initialize every page.
Minimal experiment
We can easily see the maximum allowed value with:
main.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv) {
char *chars;
size_t nbytes;
/* Decide how many bytes to allocate. */
if (argc < 2) {
nbytes = 2;
} else {
nbytes = strtoull(argv[1], NULL, 0);
}
/* Allocate the bytes. */
chars = mmap(
NULL,
nbytes,
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
/* This can happen for example if we ask for too much memory. */
if (chars == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
/* Free the allocated memory. */
munmap(chars, nbytes);
return EXIT_SUCCESS;
}
GitHub upstream.
Compile and run to allocate 1GiB and 1TiB:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out 0x40000000
./main.out 0x10000000000
We can then play around with the allocation value to see what the system allows.
I can't find a precise documentation for 0 (the default), but on my 32GiB RAM machine it does not allow the 1TiB allocation:
mmap: Cannot allocate memory
If I enable unlimited overcommit however:
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
then the 1TiB allocation works fine.
Mode 2 is well documented, but I'm too lazy to carry out precise calculations to verify it. I will just point out that in practice we are allowed to allocate about:
overcommit_ratio / 100
of total RAM, and overcommit_ratio is 50 by default, so we can allocate about half of total RAM.
VSZ vs RSS and the out-of-memory killer
So far, we have just allocated virtual memory.
However, at some point of course, if you use enough of those pages, Linux will have to start killing some processes.
I have illustrated that in detail at: What is RSS and VSZ in Linux memory management
Try this
#include <stdlib.h>
#include <stdio.h>
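int main(void) {
    int Mb = 0;
    /* grab 1 MiB at a time until malloc refuses; the blocks are leaked on purpose */
    while (malloc(1<<20)) ++Mb;
    printf("Allocated %d Mb total\n", Mb);
    return 0;
}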
Include stdlib and stdio for it.
This extract is taken from Deep C Secrets.
malloc does its own memory management, managing small memory blocks itself, but ultimately it uses the Win32 Heap functions to allocate memory. You can think of malloc as a "memory reseller".
The windows memory subsystem comprises physical memory (RAM) and virtual memory (HD). When physical memory becomes scarce, some of the pages can be copied from physical memory to virtual memory on the hard drive. Windows does this transparently.
By default, Virtual Memory is enabled and will consume the available space on the HD. So, your test will continue running until it has either allocated the full amount of virtual memory for the process (2GB on 32-bit windows) or filled the hard disk.
The C90 standard guarantees that you can get at least one object 32 kBytes in size, and this may be static, dynamic, or automatic memory. C99 guarantees at least 64 kBytes. For any higher limit, refer to your compiler's documentation.
Also, malloc's argument is a size_t and the range of that type is [0,SIZE_MAX], so the maximum you can request is SIZE_MAX, whose value varies between implementations and is defined in <stdint.h>.
I don't actually know why that failed, but one thing to note is that malloc(4) may not actually give you 4 bytes, so this technique is not really an accurate way to find your maximum heap size.
I found this out from my question here.
For instance, when you declare 4 bytes of memory, the space directly before your memory could contain the integer 4, as an indication to the kernel of how much memory you asked for.
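On glibc you can get a feel for this rounding with malloc_usable_size(), which is a GNU extension declared in <malloc.h> and not part of standard C. A small sketch, assuming glibc; the wrapper name usable_size_for is made up:

```c
#include <malloc.h>   /* malloc_usable_size(): glibc extension, not standard C */
#include <stdlib.h>

/* Hypothetical helper: allocates `request` bytes and reports how many
   bytes the allocator says are actually usable in that block.
   Returns 0 if the allocation failed. */
size_t usable_size_for(size_t request)
{
    void *p = malloc(request);
    size_t usable;

    if (p == NULL)
        return 0;
    usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

usable_size_for(4) will typically report more than 4, and that still does not include the allocator's own per-block bookkeeping, so counting successful malloc(4) calls underestimates how much memory was really consumed.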
Does malloc allocate memory from HD also?
Implementation of malloc() depends on libc implementation and operating system (OS). Typically malloc() doesn't always request RAM from the OS but returns a pointer to previously allocated memory block "owned" by libc.
In case of POSIX compatible systems, this libc controlled memory area is usually increased using syscall brk(). That doesn't allow releasing any memory between two still existing allocations, so after allocating areas A, B, C in sequence and releasing B, the process still appears to be using all the RAM. This is because areas A and C around the area B are still in use so the memory allocated from the OS cannot be returned.
Many modern malloc() implementations have some kind of heuristic where small allocations use the memory area reserved via brk() and "big" allocations use anonymous virtual memory blocks reserved via mmap() using MAP_ANONYMOUS flag. This allows immediately returning these big allocations when free() is later called. Typically the runtime performance of mmap() is slightly slower than using previously reserved memory which is the reason malloc() implements this heuristic.
Both brk() and mmap() allocate virtual memory from the OS. And virtual memory can be always backed up by swap which may be stored in any storage that the OS supports, including HDD.
In case you run Windows, the syscalls have different names but the underlying behavior is probably about the same.
What was the reason for above behaviour?
Since your example code never touched the memory, I'd guess you're seeing behavior where OS implements copy-on-write for virtual RAM and the memory is mapped to shared page with whole page filled with zeroes by default. Modern operating systems do this because many programs allocate more RAM than they actually need and using shared zero page by default for all memory allocations avoids needing to use real RAM for these allocations.
If you want to test how OS handles your loop and actually reserve true storage, you need to write something to the memory you allocated. For x86 compatible hardware you only need to write one byte per each 4096 byte segment because page size is 4096 and the hardware cannot implement copy-on-write behavior for smaller segments; once one byte is modified, the whole 4096 byte segment called page must be reserved for your process. I'm not aware of any modern CPU that would support smaller than 4096 byte pages. Modern Intel CPUs support 2 MB and 1 GB pages in addition to 4096 byte pages but the 1 GB pages are rarely used because the overhead of using 2 MB pages is small enough for any sensible RAM amounts. 1 GB pages might make sense if your system has hundreds of terabytes of RAM.
So basically your program only tested reserving virtual memory without ever using said virtual memory. Your OS probably has special optimization for this which avoids needing more than 4 KB of RAM to support this.
Unless your objective is to try to measure the overhead caused by your malloc() implementation, you should avoid trying to allocate memory block smaller than 16-32 bytes. For mmap() allocations the minimum possible overhead is 8 bytes per allocation on x86-64 hardware due the data needed to return the memory to the operating system so it really doesn't make sense for malloc() to use mmap() syscall for a single 4 byte allocation.
The overhead is needed to keep track of memory allocations because the memory is freed using void free(void*) so memory allocation routines must keep track of the allocated memory segment size somewhere. Many malloc() implementations also need additional metadata and if they need to keep track of any memory addresses, those need 8 bytes per address.
If you truly want to search for the limits of your system, you should probably do binary search for the limit where malloc() fails. In practice, you try to allocate ..., 1KB, 2KB, 4KB, 8KB, ..., 32 GB which then fails and you know that the real world limit is between 16 GB and 32 GB. You can then split this size in half and figure out the exact limit with additional testing. If you do this kind of search, it may be easier to always release any successful allocation and reserve the test block with a single malloc() call. That should also avoid accidentally accounting for malloc() overhead so much because you need only one allocation at any time at max.
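A sketch of that search, with the caveat that it assumes success is roughly monotonic in the size (if n bytes fail, larger sizes fail too), which real allocators do not strictly guarantee. The names can_malloc and find_malloc_limit are made up. Note that with overcommit enabled this measures what the system is willing to promise, not what it can actually back with RAM:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a single malloc(n) succeeds. The block is freed at once,
   so only one test allocation exists at any time. */
static int can_malloc(size_t n)
{
    /* volatile pointer so the compiler cannot optimise the pair away */
    void *volatile p = malloc(n);

    if (p == NULL)
        return 0;
    free(p);
    return 1;
}

/* Largest size, to within `tolerance` bytes, that one malloc() accepts. */
size_t find_malloc_limit(size_t tolerance)
{
    size_t lo = 0;       /* largest size known to succeed */
    size_t hi = 1024;    /* current candidate for a failing size */

    if (tolerance == 0)
        tolerance = 1;

    /* Phase 1: 1KB, 2KB, 4KB, ... until an allocation fails. */
    while (can_malloc(hi)) {
        lo = hi;
        if (hi > SIZE_MAX / 2)
            return lo;   /* cannot double any further */
        hi *= 2;
    }

    /* Phase 2: bisect between lo (succeeds) and hi (fails). */
    while (hi - lo > tolerance) {
        size_t mid = lo + (hi - lo) / 2;
        if (can_malloc(mid))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Something like find_malloc_limit(4096) gives the answer to page granularity in well under a hundred malloc() calls, without ever touching the memory and therefore without pushing the machine into swap.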
Update: As pointed out by Peter Cordes in the comments, your malloc() implementation may be writing bookkeeping data about your allocations in the reserved RAM which causes real memory to be used and that can cause system to start swapping so heavily that you cannot recover it in any sensible timescale without shutting down the computer. In case you're running Linux and have enabled "Magic SysRq" keys, you could just press Alt+SysRq+f to kill the offending process taking all the RAM and system would run just fine again. It is possible to write malloc() implementation that doesn't usually touch the RAM allocated via brk() and I assumed you would be using one. (This kind of implementation would allocate memory in 2^n sized segments and all similarly sized segments are reserved in the same range of addresses. When free() is later called, the malloc() implementation knows the size of the allocation from the address and bookkeeping about free memory segments are kept in separate bitmap in single location.) In case of Linux, malloc() implementation touching the reserved pages for internal bookkeeping is called dirtying the memory, which prevents sharing memory pages because of copy-on-write handling.
Why didn't loop break at any point of time?
If your OS implements the special behavior described above and you're running 64-bit system, you're not going to run out of virtual memory in any sensible timescale so your loop seems infinite.
Why wasn't there any allocation failure?
You didn't actually use the memory so you're allocating virtual memory only. You're basically increasing the maximum pointer value allowed for your process but since you never access the memory, the OS never bothers to reserve any physical memory for your process.
In case you're running Linux and want the system to enforce virtual memory usage to match actually available memory, you have to write 2 to kernel setting /proc/sys/vm/overcommit_memory and maybe adjust overcommit_ratio, too. See https://unix.stackexchange.com/q/441364/20336 for details about memory overcommit on Linux. As far as I know, Windows implements overcommit, too, but I don't know how to adjust its behavior.
After the first allocation is assigned to p, every later iteration leaves the previous block unreferenced. That means your program only ever references 4 bytes at a time, so how can you think you have used the entire RAM? That is why the swap device (temporary space on the HDD) is out of the discussion. I know of a memory management algorithm in which a block that no program references becomes eligible for reuse by other allocation requests. So you are just keeping the RAM driver busy, which is why it cannot service other programs. This is also a dangling reference problem.
Ans: You can allocate at most as much memory as your RAM size, because no program has access to the swap device.
I hope all your questions have got satisfactory answers.

Resources