Get maximum available heap memory - c

I'm currently trying to figure out the maximum memory that is able to be allocated through the malloc() command in C.
Until now I´ve tried a simple algorithm that increments a counter that will subsequently be allocated. If the malloc command returns "NULL" I know, that there is not enough memory available.
ULONG ulMaxSize = 0;
for (ULONG ulSize = /*0x40036FF0*/ 0x40A00000; ulSize <= 0xffffffff; ulSize++)
{
void* pBuffer = malloc(ulSize);
if (pBuffer == NULL)
{
ulMaxSize = ulSize - 1;
break;
}
free(pBuffer);
}
void* pMaxBuffer = malloc(ulMaxSize);
However, this algorithm gets executed very long since the malloc() command has turned out to be a time consuming task.
My question is now, if there is a more efficient algorithm to find the maximum memory able to be allocated?

The maximum memory that can be allocated depends mostly on few factors:
Address space limits on the process (max memory, virtual memory and friends).
Virtual space available
Physical space available
Fragmentation, which will limit the size of continuous memory blocks.
... Other limits ...
From your description (extreme slowness) looks like the process start using swap, which is VERY slow vs. real memory.
Consider the following alternative
For address space limit, look at ulimit -a (or use getrlimit to access the same data from C program) - look for 'max memory size', and 'virtual memory'
For swap space, physical memory - top
ulimit -a (filtered)
data seg size (kbytes, -d) unlimited
max memory size (kbytes, -m) 2048
stack size (kbytes, -s) 8192
virtual memory (kbytes, -v) unlimited
From a practical point, given that a program does not have control over system resources, you should be focused on 'max memory size'.

Other than using OS specific API to get such number :
sysinfo on linux or reading it from /proc/meminfo )
GlobalMemoryStatusEx for win32
You can also do a binary search, not recommended, as the state of the system might be in flux and the result could vary over time:
ULONG getMax() {
ULONG min = 0x0;
ULONG max = 0xffffffff;
void* t = malloc(max);
if(t!=NULL) {
free(t);
return max;
}
while(max-min > 1) {
ULONG mid = min + (max - min) / 2;
t = malloc(mid);
if(t == NULL) {
max = mid;
continue;
}
free(t);
min = mid;
}
return min;
}

Related

AF_XDP: Relationship between `FRAME_SIZE` and actual size of packet

My AF-XDP userspace program is based on this tutorial: https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
I am currently trying to parse ~360.000 RTP-packets per second (checking for continuous sequence numbers) but I loose around 25 per second (this means that for 25 packets the statement previous_rtp_sqnz_nmbr + 1 == current_rtp_sqnz_nmbr doesn't hold true).
So I tried to increase the number of allocated packets NUM_FRAMES from 228.000 to 328.000. With default FRAME_SIZE of XSK_UMEM__DEFAULT_FRAME_SIZE = 4096 this results in 1281Mbyte being allocated (no problem because I have 32GB of RAM) but for whatever reason, this function call:
static struct xsk_umem_info *configure_xsk_umem(void *buffer, uint64_t size)
{
printf("Try to allocate %lu\n", size);
struct xsk_umem_info *umem;
int ret;
umem = calloc(1, sizeof(*umem));
if (!umem)
return NULL;
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
NULL);
if (ret) {
errno = -ret;
return NULL;
}
umem->buffer = buffer;
return umem;
}
fails with
Try to allocate 1343488000
ERROR: Can't create umem "Cannot allocate memory"
I don't know why? But because I know that my RTP-packets are not larger than 1500bytes, I set FRAME_SIZE 3072 so I am now at around 960Mbyte (which works without an error).
However, I am now loosing half of the received packets (this means that for 180.000 packets the previous sequence number doesn't line up with the current sequence number).
Because of this I ask the question: What is the relationship between FRAME_SIZE and the actual size of a packet? Because obviously it can not be the same.
Edit: I am using 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux and just copied the libbpf-repository from here into my code-base: https://github.com/libbpf/libbpf
So I don't know whether the error mentioned here: https://github.com/xdp-project/xdp-tutorial/issues/76 is still valid?

Why my program occupies 32KB and not 11200B (Valgrind's massif)

The n-body program does this at the begining:
real4 *pin = (real4*)malloc(n * sizeof(real4));
real4 *pout = (real4*)malloc(n * sizeof(real4));
real3 *v = (real3*)malloc(n * sizeof(real3));
real3 *f = (real3*)malloc(n * sizeof(real3));
the total size of this should be (if n = 100): 100*32 + 100*32 + 100*24 + 100*24 = 11200B but with Valgrind's massif I have this:
I am not fammilliar with massif, but when talking about heap memory, there are two numbers that are interesting, how much memory has the allocator requested from the OS, and how much has the allocator given to your program through malloc(). If your program has requested ~10K of bytes, it is reasonable to think that the allocator may have requested a round number like 32K from the OS. The allocator typically request memory in large blocks from the OS since kernel calls are slow. (and a few other reasons)
So I would guess that the 32K that you are seeing, is what the allocator has aquired from the OS, ready to be given to your program through any additional malloc() that may happen.

Memory allocation threshold (mmap vs malloc)

Id like to point out I'm new to this so I'm trying to understand / explain it best i can.
I am basically trying to figure out if its possible to keep memory allocation under a threshold due to memory limitation of my project.
Here is how memory is allocated currently using third party libsodium:
alloc_region(escrypt_region_t *region, size_t size)
{
uint8_t *base, *aligned;
#if defined(MAP_ANON) && defined(HAVE_MMAP)
if ((base = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE,
#ifdef MAP_NOCORE
MAP_ANON | MAP_PRIVATE | MAP_NOCORE,
#else
MAP_ANON | MAP_PRIVATE,
#endif
-1, 0)) == MAP_FAILED)
base = NULL; /* LCOV_EXCL_LINE */
aligned = base;
#elif defined(HAVE_POSIX_MEMALIGN)
if ((errno = posix_memalign((void **) &base, 64, size)) != 0) {
base = NULL;
}
aligned = base;
#else
base = aligned = NULL;
if (size + 63 < size)
errno = ENOMEM;
else if ((base = (uint8_t *) malloc(size + 63)) != NULL) {
aligned = base + 63;
aligned -= (uintptr_t) aligned & 63;
}
#endif
region->base = base;
region->aligned = aligned;
region->size = base ? size : 0;
return aligned;
}
So for example, this currently calls posix_memalign to allocate (e.g.) 32mb of memory.
32mb exceeds my 'memory cap' given to me (but does not throw memory warnings as the memory capacity is far greater, its just what I'm 'allowed' to use)
From some googling, I'm under the impression i can either use mmap and virtual memory.
I can see that the function above already has some mmap implemented but is never called.
Is it possible to convert the above code so that i never exceed my 30mb memory limit?
From my understanding, if this allocation would exceed my free memory, it would automatically allocate in virtual memory? So can i force this to happen and pretend that my free space is lower than available?
Any help is appreciated
UPDATE
/* Allocate memory. */
B_size = (size_t) 128 * r * p;
V_size = (size_t) 128 * r * N;
need = B_size + V_size;
if (need < V_size) {
errno = ENOMEM;
return -1;
}
XY_size = (size_t) 256 * r + 64;
need += XY_size;
if (need < XY_size) {
errno = ENOMEM;
return -1;
}
if (local->size < need) {
if (free_region(local)) {
return -1;
}
if (!alloc_region(local, need)) {
return -1;
}
}
B = (uint8_t *) local->aligned;
V = (uint32_t *) ((uint8_t *) B + B_size);
XY = (uint32_t *) ((uint8_t *) V + V_size);
I am basically trying to figure out if its possible to keep memory allocation under a threshold due to memory limitation of my project.
On Linux or POSIX systems, you might consider using setrlimit(2) with RLIMIT_AS:
This is the maximum size of the process's virtual memory
(address space) in bytes. This limit affects calls to brk(2),
mmap(2), and mremap(2), which fail with the error ENOMEM upon
exceeding this limit.
Above this limit, the mmap would fail, and so would fail for instance the call to malloc(3) that triggered that particular use of mmap.
I'm under the impression i can either use mmap
Notice that malloc(3) will call mmap(2) (or sometimes sbrk(2)...) to retrieve (virtual) memory from the kernel, thus growing your virtual address space. However, malloc often prefer to reused previously free-d memory (when available). And free usually won't call munmap(2) to release memory chunks but prefer to keep it for future malloc-s. Actually most C standard libraries segregate between "small" and "large" allocation (in practice a malloc for a gigabyte will use mmap, and the corresponding free would mmap immediately).
See also mallopt(3) and madvise(2). In case you need to lock some pages (obtained by mmap) into physical RAM, consider mlock(2).
Look also into this answer (explaining that the notion of RAM used by a particular process is not that easy).
For malloc related bugs (including memory leaks) use valgrind.

Freertos + STM32 - thread memory overflow with malloc

I'm working with stm32+rtos to implement a file system based on spi flash. For freertos, I adopted heap_1 implementation. This is how i create my task.
osThreadDef(Task_Embedded, Task_VATEmbedded, osPriorityNormal, 0, 2500);
VATEmbeddedTaskHandle = osThreadCreate(osThread(Task_Embedded), NULL);
I allocated 10000 bytes of memory to this thread.
and in this thread. I tried to write data into flash. In the first few called it worked successfully. but somehow it crash when i tried more time of write.
VATAPI_RESULT STM32SPIWriteSector(void *writebuf, uint8_t* SectorAddr, uint32_t buff_size){
if(STM32SPIEraseSector(SectorAddr) == VAT_SUCCESS){
DBGSTR("ERASE SECTOR - 0x%2x %2x %2x", SectorAddr[0], SectorAddr[1], SectorAddr[2]);
}else return VAT_UNKNOWN;
if(STM32SPIProgram_multiPage(writebuf, SectorAddr, buff_size) == VAT_SUCCESS){
DBGSTR("WRTIE SECTOR SUCCESSFUL");
return VAT_SUCCESS;
}else return VAT_UNKNOWN;
return VAT_UNKNOWN;
}
.
VATAPI_RESULT STM32SPIProgram_multiPage(uint8_t *writebuf, uint8_t *writeAddr, uint32_t buff_size){
VATAPI_RESULT nres;
uint8_t tmpaddr[3] = {writeAddr[0], writeAddr[1], writeAddr[2]};
uint8_t* sectorBuf = malloc(4096 * sizeof(uint8_t));
uint8_t* pagebuf = malloc(255* sizeof(uint8_t));
memset(&sectorBuf[0],0,4096);
memset(&pagebuf[0],0,255);
uint32_t i = 0, tmp_convert1, times = 0;
if(buff_size < Page_bufferSize)
times = 1;
else{
times = buff_size / (Page_bufferSize-1);
if((times%(Page_bufferSize-1))!=0)
times++;
}
/* Note : According to winbond flash feature, the last bytes of every 256 bytes should be 0, so we need to plus one byte on every 256 bytes*/
i = 0;
while(i < times){
memset(&pagebuf[0], 0, Page_bufferSize - 1);
memcpy(&pagebuf[0], &writebuf[i*255], Page_bufferSize - 1);
memcpy(&sectorBuf[i*Page_bufferSize], &pagebuf[0], Page_bufferSize - 1);
sectorBuf[((i+1)*Page_bufferSize)-1] = 0;
i++;
}
i = 0;
while(i < times){
if((nres=STM32SPIPageProgram(&sectorBuf[Page_bufferSize*i], &tmpaddr[0], Page_bufferSize)) != VAT_SUCCESS){
DBGSTR("STM32SPIProgram_allData write data fail on %d times!",i);
free(sectorBuf);
free(pagebuf);
return nres;
}
tmp_convert1 = (tmpaddr[0]<<16 | tmpaddr[1]<<8 | tmpaddr[2]) + Page_bufferSize;
tmpaddr[0] = (tmp_convert1&0xFF0000) >> 16;
tmpaddr[1] = (tmp_convert1&0xFF00) >>8;
tmpaddr[2] = 0x00;
i++;
}
free(sectorBuf);
free(pagebuf);
return nres;
}
I open the debugger and it seems like it crash when i malloced "sectorbuf" in function "STM32SPIProgram_multiPage", what Im confused is that i did free the memory after "malloc". anyone has idea about it?
arm-none-eabi-size "RTOS.elf"
text data bss dec hex filename
77564 988 100756 179308 2bc6c RTOS.elf
Reading the man
Memory Management
[...]
If RTOS objects are created dynamically then the standard C library malloc() and free() functions can sometimes be used for the purpose, but ...
they are not always available on embedded systems,
they take up valuable code space,
they are not thread safe, and
they are not deterministic (the amount of time taken to execute the function will differ from call to call)
... so more often than not an alternative memory allocation implementation is required.
One embedded / real time system can have very different RAM and timing requirements to another - so a single RAM allocation algorithm will only ever be appropriate for a subset of applications.
To get around this problem, FreeRTOS keeps the memory allocation API in its portable layer. The portable layer is outside of the source files that implement the core RTOS functionality, allowing an application specific implementation appropriate for the real time system being developed to be provided. When the RTOS kernel requires RAM, instead of calling malloc(), it instead calls pvPortMalloc(). When RAM is being freed, instead of calling free(), the RTOS kernel calls vPortFree().
[...]
(Emphasis mine.)
So the meaning is that if you use directly malloc, FreeRTOS is not able to handle the heap consumed by the system function. Same if you choose heap_3 management that is a simple malloc wrapper.
Take also note that the memory management you choose has no free capability.
heap_1.c
This is the simplest implementation of all. It does not permit memory to be freed once it has been allocated. Despite this, heap_1.c is appropriate for a large number of embedded applications. This is because many small and deeply embedded applications create all the tasks, queues, semaphores, etc. required when the system boots, and then use all of these objects for the lifetime of program (until the application is switched off again, or is rebooted). Nothing ever gets deleted.
The implementation simply subdivides a single array into smaller blocks as RAM is requested. The total size of the array (the total size of the heap) is set by configTOTAL_HEAP_SIZE - which is defined in FreeRTOSConfig.h. The configAPPLICATION_ALLOCATED_HEAP FreeRTOSConfig.h configuration constant is provided to allow the heap to be placed at a specific address in memory.
The xPortGetFreeHeapSize() API function returns the total amount of heap space that remains unallocated, allowing the configTOTAL_HEAP_SIZE setting to be optimised.
The heap_1 implementation:
Can be used if your application never deletes a task, queue, semaphore, mutex, etc. (which actually covers the majority of applications in which FreeRTOS gets used).
Is always deterministic (always takes the same amount of time to execute) and cannot result in memory fragmentation.
Is very simple and allocated memory from a statically allocated array, meaning it is often suitable for use in applications that do not permit true dynamic memory allocation.
(Emphasis mine.)
Side note: You have always to check malloc return value != NULL.

Why do very large stack allocations fail despite unlimited ulimit?

The following static allocation gives segmentation fault
double U[100][2048][2048];
But the following dynamic allocation goes fine
double ***U = (double ***)malloc(100 * sizeof(double **));
for(i=0;i<100;i++)
{
U[i] = (double **)malloc(2048 * sizeof(double *));
for(j=0;j<2048;j++)
{
U[i][j] = (double *)malloc(2048*sizeof(double));
}
}
The ulimit is set to unlimited in linux.
Can anyone give me some hint on whats happening?
When you say the ulimit is set to unlimited, are you using the -s option? As otherwise this doesn't change the stack limit, only the file size limit.
There appear to be stack limits regardless, though. I can allocate:
double *u = malloc(200*2048*2048*(sizeof(double))); // 6gb contiguous memory
And running the binary I get:
VmData: 6553660 kB
However, if I allocate on the stack, it's:
double u[200][2048][2048];
VmStk: 2359308 kB
Which is clearly not correct (suggesting overflow). With the original allocations, the two give the same results:
Array: VmStk: 3276820 kB
malloc: VmData: 3276860 kB
However, running the stack version, I cannot generate a segfault no matter what the size of the array -- even if it's more than the total memory actually on the system, if -s unlimited is set.
EDIT:
I did a test with malloc in a loop until it failed:
VmData: 137435723384 kB // my system doesn't quite have 131068gb RAM
Stack usage never gets above 4gb, however.
Assuming your machine actually has enough free memory to allocate 3.125 GiB of data, the difference most likely lies in the fact that the static allocation needs all of this memory to be contiguous (it's actually a 3-dimensional array), while the dynamic allocation only needs contiguous blocks of about 2048*8 = 16 KiB (it's an array of pointers to arrays of pointers to quite small actual arrays).
It is also possible that your operating system uses swap files for heap memory when it runs out, but not for stack memory.
There is a very good discussion of Linux memory management - and specifically the stack - here: 9.7 Stack overflow, it is worth the read.
You can use this command to find out what your current stack soft limit is
ulimit -s
On Mac OS X the hard limit is 64MB, see How to change the stack size using ulimit or per process on Mac OS X for a C or Ruby program?
You can modify the stack limit at run-time from your program, see Change stack size for a C++ application in Linux during compilation with GNU compiler
I combined your code with the sample there, here's a working program
#include <stdio.h>
#include <sys/resource.h>
unsigned myrand() {
static unsigned x = 1;
return (x = x * 1664525 + 1013904223);
}
void increase_stack( rlim_t stack_size )
{
rlim_t MIN_STACK = 1024 * 1024;
stack_size += MIN_STACK;
struct rlimit rl;
int result;
result = getrlimit(RLIMIT_STACK, &rl);
if (result == 0)
{
if (rl.rlim_cur < stack_size)
{
rl.rlim_cur = stack_size;
result = setrlimit(RLIMIT_STACK, &rl);
if (result != 0)
{
fprintf(stderr, "setrlimit returned result = %d\n", result);
}
}
}
}
void my_func() {
double U[100][2048][2048];
int i,j,k;
for(i=0;i<100;++i)
for(j=0;j<2048;++j)
for(k=0;k<2048;++k)
U[i][j][k] = myrand();
double sum = 0;
int n;
for(n=0;n<1000;++n)
sum += U[myrand()%100][myrand()%2048][myrand()%2048];
printf("sum=%g\n",sum);
}
int main() {
increase_stack( sizeof(double) * 100 * 2048 * 2048 );
my_func();
return 0;
}
You are hitting a limit of the stack. By default on Windows, the stack is 1M but can grow more if there is enough memory.
On many *nix systems default stack size is 512K.
You are trying to allocate 2048 * 2048 * 100 * 8 bytes, which is over 2^25 (over 2G for stack). If you have a lot of virtual memory available and still want to allocate this on stack, use a different stack limit while linking the application.
Linux:
How to increase the gcc executable stack size?
Change stack size for a C++ application in Linux during compilation with GNU compiler
Windows:
http://msdn.microsoft.com/en-us/library/tdkhxaks%28v=vs.110%29.aspx

Resources