Is there a limit on memory allocated using huge pages? - c

I am allocating memory using "huge pages(1MB size)" and using mmap. After allocating 4 GB of memory ,mmap returns fail.
mmap(NULL, memsize, PROT_READ | PROT_WRITE,MAP_PRIVATE | MAP_ANONYMOUS |MAP_POPULATE | MAP_HUGETLB, -1, 0);
here memsize = 1GB
I am calling above statement in a loop. Upto 4 iterations it is fine. In 5th iteration mmap is failed.
mmap(NULL, memsize, PROT_READ | PROT_WRITE,MAP_PRIVATE | MAP_ANONYMOUS |MAP_POPULATE , -1, 0);
Above statement (without hugepages) works perfectly for any number of iterations. Am I missing any information related to hugepages?
I tried "MAP_NORESERVE" flag also as mentioned in mmap fail after 4GB.
Any sort of information will be greatly appreciated. Thank you.

Change the allocated "number of huge pages" in file
/proc/sys/vm/nr_hugepages
according to the amount of memory you want to allocate.
Earlier it says:
>cat /proc/meminfo | grep HugePages
HugePages_Total = 2500
4GB => it has 2048*2Mb= 4Gb
2048 huge pages already consumed.
one more GB of memory need (1GB/2MB= 512) 512 more huge pages. But 2500 - 2048 =452 only left. But you need 512. Thats the problem why mmap failed. If you edit the above mentioned file(/proc/sys/vm/nr_hugepages) contents to 2560, it allows 5GB. Change it according to the amount of memory you need. Thanks to # Klas Lindbäck. I referred back the link, small research exposed the working

Related

mallinfo doesn't show mmap allocation's information

In mallinfo structure there are two fields hblks and hblkhd. The man documentation says that they are responsible for the number of blocks allocated by mmap and the total number of bytes. But when I run next code
void * ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
*(int *) ptr = 10;
Fields hblks and hblkhd are also zero. While the total number of free bytes in the blocks decreases. Could you please explain why this behavior is observed?
I also tried to allocate all free space and use mmap after it. But in this situation fields also equal to zero
Compiler: gcc 9.4.0
OS: Ubuntu 20.04.1
I did some experiments and they led me to the conclusion that this field is filled only when mmap occurred when calling malloc. A normal mmap call doesn't show up in this statistic, which is logical, because this is a system call, and the statistic is collected in user-space

mmap file with larger fixed length with zero padding?

I want to use mmap() to read a file with fixed length (eg. 64MB), but there also some files < 64MB.
I mmap this files (<64MB, eg.30MB) with length = 64MB, when read file data beyond file size(30MB - 64MB), the program got a bus-error.
I want mmap these files with fixed length, and read 0x00 when pointer beyond file size. How to do that?
One method I can think is ftruncate file first, and ftruncate back to ori size, but I don't think this method is perfect.
This is one of the few reasonable use cases for MAP_FIXED, to remap part of an existing mapping to use a new backing file.
A simple solution here is to unconditionally mmap 64 MB of anonymous memory (or explicitly mmap /dev/zero), without MAP_FIXED and store the resulting pointer.
Next, mmap 64 MB or your actual file size (whichever is less) of your actual file, passing in the result of the anonymous/zero mmap and passing the MAP_FIXED flag. The pages corresponding to your file will no longer be anonymous/zero mapped, and instead will be backed by your file's data; the remaining pages will be backed by the anonymous/zero pages.
When you're done, a single munmap call will unmap all 64 MB at once (you don't need to separately unmap the real file pages and the zero backed pages).
Extremely simple example (no error checking, please add it yourself):
// Reserve 64 MB of contiguous addresses; anonymous mappings are always zero backed
void *mapping = mmap(NULL, 64 * 1024 * 1024, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Open file and check size
struct stat sb;
int fd = open(myfilename, O_RDONLY);
fstat(fd, &sb);
// Use smaller of file size or 64 MB
size_t filemapsize = sb.st_size > 64 * 1024 * 1024 ? 64 * 1024 * 1024 : sb.st_size;
// Remap up to 64 MB of pages, replacing some or all of original anonymous pages
mapping = mmap(mapping, filemapsize, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
close(fd);
// ... do stuff with mapping ...
munmap(mapping, 64 * 1024 * 1024);

Predict malloc block sizes grid in C

I'm trying to optimize my dynamic memory usage. The thing is that I initially allocate some amount of memory for the data I get from a socket. Then, on the new data arrival I'm reallocating memory so the newly arrived part will fit into the local buffer. After some poking around I've found that malloc actually allocates a greater block than requested. In some cases significantly greater; here comes some debug info from malloc_usable_size(ptr):
requested 284 bytes, allocated 320 bytes
requested 644 bytes, reallocated 1024 bytes
It's well known that malloc/realloc are expensive operations. In most cases newly arrived data will fit into a previously allocated block (at least when I requested 644 byes and get 1024 instead), but I have no idea how I can figure that out.
The trouble is that malloc_usable_size should not be relied upon (as described in manual) and if the program requested 644 bytes and malloc allocated 1024, the excess 644 bytes may be overwritten and can not be used safely. So, using malloc for a given amount of data and then use malloc_usable_size to figure out how many bytes were really allocated isn't the way to go.
What I want is to know the block grid before calling malloc, so I will request exactly the maximum amount of bytes greater then I need, store allocated size and on the realloc check if I really need to realloc, or if the previously allocated block is fine just because it's greater.
In other words, if I were to request 644 bytes, and malloc actually gave me 1024, I want to have predicted that and requested 1024 instead.
Depending on your particular implementation of libc you will have different behaviour. I have found in most cases two approaches to do the trick:
Use the stack, this is not always feasible, but C allows VLAs on the stack and is the most effective if you don't intend to pass your buffer to an external thread
while (1) {
char buffer[known_buffer_size];
read(fd, buffer, known_buffer_size);
// use buffer
// released at the end of scope
}
In Linux you can make excellent use of mremap which can enlarge/shrink memory with zero-copy guaranteed. It may move your VM mapping though. Only problem here is that it only works in chunks of system page size sysconf(_SC_PAGESIZE) which is usually 0x1000.
void * buffer = mmap(NULL, init_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
while(1) {
// if needs remapping
{
// zero copy, but involves a system call
buffer = mremap(buffer, new_size, MREMAP_MAYMOVE);
}
// use buffer
}
munmap(buffer, current_size);
OS X has similar semantics to Linux's mremap through the Mach vm_remap, it's a little more compilcated though.
void * buffer = mmap(NULL, init_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mach_port_t this_task = mach_task_self();
while(1) {
// if needs remapping
{
// zero copy, but involves a system call
void * new_address = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
vm_prot_t cur_prot, max_prot;
munmap(new_address, current_size); // vm needs to be empty for remap
// there is a race condition between these two calls
vm_remap(this_task,
&new_address, // new address
current_size, // has to be page-aligned
0, // auto alignment
0, // remap fixed
this_task, // same task
buffer, // source address
0, // MAP READ-WRITE, NOT COPY
&cur_prot, // unused protection struct
&max_prot, // unused protection struct
VM_INHERIT_DEFAULT);
munmap(buffer, current_size); // remove old mapping
buffer = new_address;
}
// use buffer
}
The short answer is that the standard malloc interface does not provide the information you are looking for. To use the information breaks the abstraction provided.
Some alternatives are:
Rethink your usage model. Perhaps pre-allocate a pool of buffers at start, filling them as you go. Unfortunately this could complicate your program more than you would like.
Use a different memory allocation library that does provide the needed interface. Different libraries provide different tradeoffs in terms of fragmentation, max run time, average run time, etc.
Use your OS memory allocation API. These are often written to be efficient, but will generally require a system call (unlike a user-space library).
In my professional code, I often take advantage of the actual size allocated by malloc()[etc], rather than the requested size. This is my function for determining the actual allocation size0:
int MM_MEM_Stat(
void *I__ptr_A,
size_t *_O_allocationSize
)
{
int rCode = GAPI_SUCCESS;
size_t size = 0;
/*-----------------------------------------------------------------
** Validate caller arg(s).
*/
#ifdef __linux__ // Not required for __APPLE__, as alloc_size() will
// return 0 for non-malloc'ed refs.
if(NULL == I__ptr_A)
{
rCode=EINVAL;
goto CLEANUP;
}
#endif
/*-----------------------------------------------------------------
** Calculate the size.
*/
#if defined(__APPLE__)
size=malloc_size(I__ptr_A);
#elif defined(__linux__)
size=malloc_usable_size(I__ptr_A);
#else
!##$%
#endif
if(0 == size)
{
rCode=EFAULT;
goto CLEANUP;
}
/*-----------------------------------------------------------------
** Return requested values to caller.
*/
if(_O_allocationSize)
*_O_allocationSize = size;
CLEANUP:
return(rCode);
}
I did some sore research and found two interesting things about malloc realization in Linux and FreeBSD:
1) in Linux malloc increment blocks linearly in 16 byte steps, at least up to 8K, so no optimization needed at all, it's just not reasonable;
2) in FreeBSD situation is different, steps are bigger and tend to grow up with requested block size.
So, any kind of optimization is needed only for FreeBSD as Linux allocates blocks with a very tiny steps and it's very unlikely to receive less then 16 bytes of data from socket.

mmap() fail with ENOMEM on a 1TB ANONYMOUS file?

I'm trying to mmap an 1TB anonymous file under Fedora Linux x86_64 (4G RAM plus 16G swap). But I got ENOMEM "cannot allocate memory" and even for 32G as the following code. Am I missing anything? Appreciate any clue.
#define HEAP_SIZE (1UL << 35)
int main()
{
void *addr = mmap(0, HEAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED)
{
perror(NULL);
return 1;
}
printf("mmap %d gbytes succeed\n", HEAP_SIZE/(1UL << 30));
return 0;
}
The default Linux overcommit policy prevents you from allocating this much memory. You don't have anywhere near 1TB of RAM, and the kernel will give you ENOMEM now rather than running the OOM killer later... but you can change this policy.
$ /sbin/sysctl vm.overcommit_memory
vm.overcommit_memory = 0
$ sudo /sbin/sysctl vm.overcommit_memory=1
vm.overcommit_memory = 1
Policy 1 is "always overcommit", which is useful for certain applications. Policy 2 is "never overcommit". The default policy, 0, allows some overcommit but uses heuristics to reject large allocations, like the one which failed on your computer.
Alternative
You could also use the MAP_NORESERVE flag. Note that the kernel will ignore this flag if its policy is to "never overcommit".

Mmap() an entire large file

I am trying to "mmap" a binary file (~ 8Gb) using the following code (test.c).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
int main(int argc, char *argv[])
{
const char *memblock;
int fd;
struct stat sb;
fd = open(argv[1], O_RDONLY);
fstat(fd, &sb);
printf("Size: %lu\n", (uint64_t)sb.st_size);
memblock = mmap(NULL, sb.st_size, PROT_WRITE, MAP_PRIVATE, fd, 0);
if (memblock == MAP_FAILED) handle_error("mmap");
for(uint64_t i = 0; i < 10; i++)
{
printf("[%lu]=%X ", i, memblock[i]);
}
printf("\n");
return 0;
}
test.c is compiled using gcc -std=c99 test.c -o test and file of test returns: test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
Although this works fine for small files, I get a segmentation fault when I try to load a big one. The program actually returns:
Size: 8274324021
mmap: Cannot allocate memory
I managed to map the whole file using boost::iostreams::mapped_file but I want to do it using C and system calls. What is wrong with my code?
MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something too much larger than your physical ram + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk - as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.
I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY - this itself may be another problem for you; you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.
As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE - or, if you only need to read, PROT_READ.
On my system (VPS with 676MB RAM, 256MB swap), I reproduced your problem; changing to MAP_SHARED results in an EPERM error (since I'm not allowed to write to the backing file opened with O_RDONLY). Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.
If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to write to. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.
Alternately, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.
Linux (and apparently a few other UNIX systems) have the MAP_NORESERVE flag for mmap(2), which can be used to explicitly enable swap space overcommitting. This can be useful when you wish to map a file larger than the amount of free memory available on your system.
This is particularly handy when used with MAP_PRIVATE and only intend to write to a small portion of the memory mapped range, since this would otherwise trigger swap space reservation of the entire file (or cause the system to return ENOMEM, if system wide overcommitting hasn't been enabled and you exceed the free memory of the system).
The issue to watch out for is that if you do write to a large portion of this memory, the lazy swap space reservation may cause your application to consume all the free RAM and swap on the system, eventually triggering the OOM killer (Linux) or causing your app to receive a SIGSEGV.
You don't have enough virtual memory to handle that mapping.
As an example, I have a machine here with 8G RAM, and ~8G swap (so 16G total virtual memory available).
If I run your code on a VirtualBox snapshot that is ~8G, it works fine:
$ ls -lh /media/vms/.../snap.vdi
-rw------- 1 me users 9.2G Aug 6 16:02 /media/vms/.../snap.vdi
$ ./a.out /media/vms/.../snap.vdi
Size: 9820000256
[0]=3C [1]=3C [2]=3C [3]=20 [4]=4F [5]=72 [6]=61 [7]=63 [8]=6C [9]=65
Now, if I drop the swap, I'm left with 8G total memory. (Don't run this on an active server.) And the result is:
$ sudo swapoff -a
$ ./a.out /media/vms/.../snap.vdi
Size: 9820000256
mmap: Cannot allocate memory
So make sure you have enough virtual memory to hold that mapping (even if you only touch a few pages in that file).

Resources