How to find holes in the address space? (C)

I have a set of files whose lengths are all multiples of the page-size of my operating system (FreeBSD 10). I would like to mmap() these files to consecutive pages of RAM, giving me the ability to treat a collection of files as one large array of data.
Preferably using portable functions, how can I find a sufficiently large region of unmapped address space so I can be sure that a series of mmap() calls to this region is going to be successful?

Follow these steps:
1. Compute the total size needed by enumerating your files and summing their sizes.
2. Map a single area of anonymous memory of this size with mmap. If this fails, you lose.
3. Save the pointer and unmap the area (the unmap may not even be necessary if your system's mmap with a fixed address implicitly unmaps any previously overlapping region).
4. Map the first file at this address with the MAP_FIXED flag.
5. Increment the address by the file's size.
6. Loop back to step 4 until all files have been mapped.
This should be fully portable to any POSIX system, but some OSes might have quirks that prevent this method. Try it. A sketch of these steps follows.
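A minimal sketch of those steps, assuming the file names and their (page-aligned) sizes are already known; map_files_contiguously is an invented helper name:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    void *map_files_contiguously(const char **names, const size_t *sizes, int nfiles)
    {
        size_t total = 0;
        for (int i = 0; i < nfiles; i++)
            total += sizes[i];                        /* step 1: total size */

        /* step 2: claim one contiguous span of address space */
        void *base = mmap(NULL, total, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (base == MAP_FAILED)
            return NULL;                              /* you lose */

        /* steps 4-6: overlay each file at its offset with MAP_FIXED */
        char *addr = base;
        for (int i = 0; i < nfiles; i++) {
            int fd = open(names[i], O_RDONLY);
            if (fd == -1 || mmap(addr, sizes[i], PROT_READ,
                                 MAP_PRIVATE | MAP_FIXED, fd, 0) == MAP_FAILED) {
                munmap(base, total);
                return NULL;
            }
            close(fd);                                /* the mapping keeps the file referenced */
            addr += sizes[i];                         /* step 5: advance by the file size */
        }
        return base;
    }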

You could mmap a large region whose size is the sum of the sizes of all files, using MAP_PRIVATE | MAP_ANON and protection PROT_NONE, which prevents the OS from unnecessarily committing memory charges.
This will reserve but not commit memory.
You could then map file filename1 at [baseAddr, baseAddr + size1) and filename2 at [baseAddr + size1, baseAddr + size1 + size2), and so on.
I believe the flags for this are MAP_FIXED | MAP_PRIVATE. A sketch follows.
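A minimal sketch of those two steps (baseAddr, size1, size2, fd1, and fd2 are placeholders):

    /* Reserve the whole span without committing it. */
    void *baseAddr = mmap(NULL, size1 + size2, PROT_NONE,
                          MAP_PRIVATE | MAP_ANON, -1, 0);

    /* Overlay each file at its offset within the reservation. */
    mmap(baseAddr, size1, PROT_READ, MAP_FIXED | MAP_PRIVATE, fd1, 0);
    mmap((char *)baseAddr + size1, size2, PROT_READ, MAP_FIXED | MAP_PRIVATE, fd2, 0);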

Related

Fail to allocate a large amount of virtual memory

I read that when you try to allocate more bytes than are available in RAM using malloc(), it allocates virtual memory. At least on Linux.
I want to allocate a huge amount of virtual memory, let's say 100 GB. So, I wrote something like this:

    void *virtual_memory = malloc(100ULL * 1024 * 1024 * 1024);   /* 100 GB */
But the returned pointer is NULL.
I execute this code on a 64-bit Ubuntu virtual machine.
What am I doing wrong?
EDIT
What I'm trying to achieve is to make the htop tool display 100 GB in the VIRT column for my process.
UPDATE
I CAN call malloc to allocate 2 GB at a time, 50 times.
I read that when you try to allocate more bytes than are available in RAM using malloc(), it allocates virtual memory
To start with, this is not correct. You always allocate virtual memory. This virtual memory is mapped to some area of physical memory (RAM) or swap space. If swap space plus physical memory amounts to less than 100 GB, your allocation will fail. Also, the libc implementation might fail to allocate such a large amount if it has some (configurable) limit set.
but I have a strange task: to show 100 GB of virtual memory for the process in the htop tool. And it's claimed to be achievable via a single line of code.
Yes, if you just need that much virtual memory, you can reserve memory without committing it. Read up on how mmap (*NIX) or VirtualAlloc (Windows) can be used for this.
When you reserve a particular virtual address range, you tell the operating system that you intend to use this range, so other code can't take it. But that doesn't mean you can actually use it yet. It also means the range doesn't need any RAM/swap backing, so you will be able to reserve an arbitrarily large amount (less than 2^48 bytes on your 64-bit system, of course).
I am not sure whether htop will include that in the value it shows; you will have to try it out. A sketch of such a reservation follows.
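A minimal sketch of such a reservation on Linux (the 100 GB figure matches the question; MAP_NORESERVE makes the no-commit intent explicit):

    #include <sys/mman.h>

    void *p = mmap(NULL, 100ULL << 30, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    /* p counts toward VIRT, but consumes neither RAM nor swap until touched. */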
If that doesn't in fact add to your virtual memory count, you can back the mapping with a file instead of mapping it anonymously. This may create a 100 GB file on your system (assuming you have that much space), and you will even be able to read from and write to it.
The following code can be used on Linux:
    int fd = open("temp.txt", O_RDWR | O_CREAT, 0644);   /* O_CREAT requires a mode argument */
    ftruncate(fd, 100ULL << 30);                         /* grow the file; accesses past EOF would fault */
    void *addr = mmap(NULL, 100ULL << 30, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
The following code did the trick for me:

    for (int i = 0; i < 50; ++i) {
        malloc(int_pow(2, 31));
    }

where int_pow is just a custom pow implementation that operates on integers. After running this app, htop shows that it uses exactly 100 GB of virtual memory.

File Holes and Shared Memory in Linux?

Since reading The Linux Programming Interface (a very good read) I have known about file holes in Linux. On a filesystem like ext4, one can open a new file, write 1000 bytes, seek to position 1,000,000,000, write another 1000 bytes, and, depending on the filesystem's block size, end up with a file consuming just 2048 bytes of real disk space for a block size of 1024 or 512.
So basically the file is created as a file 1 GB + 1000 bytes long, where only a couple of blocks of real drive space are used.
Can I erase the middle of a file, forcing the system to deallocate those blocks on the drive?
Is there an equivalent for (shared) memory, with or without a file backing it, where the allocation also has holes that are filled in only as the memory pages are written?
It would be nice to allocate 1 GB of shared memory but never utilize it fully until necessary, just to avoid remapping if the shared memory block should grow.
Can I erase the middle of a file forcing the system to deallocate those blocks on the drive?
You probably want the Linux-specific fallocate(2); beware, it might not work on some filesystems (e.g. NFS, VFAT, ...), because some filesystems don't support holes. See also lseek(2) with SEEK_HOLE, posix_fadvise(2), madvise(2), memfd_create(2), etc.
Raw block devices (like a disk partition, a USB key, or an SSD) don't have any holes (but you could mmap into them). Holes are a filesystem software artifact. A sketch of hole punching follows.
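A minimal sketch of deallocating blocks in the middle of a file with fallocate(2) (Linux-specific, and the filesystem must support hole punching; punch_hole is an invented helper name):

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Deallocate len bytes starting at offset, keeping the file size unchanged. */
    int punch_hole(int fd, off_t offset, off_t len)
    {
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len);
    }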
Would be nice to allocate 1GB shared memory but never utilize it
This is contradictory: if the memory is shared, it is used (by the thing, generally another process, with which you share that memory). Read shm_overview(7) if you really want shared memory (and read mmap(2) carefully). Read more about virtual memory, address space, paging, MMUs, page faults, operating systems, kernels, mmap, demand paging, copy-on-write, memory maps, ELF, and sparse files. Try also the cat /proc/$$/maps command in a terminal, and understand the output (see proc(5)).
Perhaps you want to pre-allocate some address space range and only later really allocate the virtual memory. This is possible using the Linux version of mmap(2).
To pre-allocate a one-gigabyte memory range, first call mmap with MAP_NORESERVE:
    size_t onegiga = 1L << 30;
    void *startad = mmap(NULL, onegiga, PROT_NONE,
                         MAP_ANONYMOUS | MAP_NORESERVE | MAP_SHARED,
                         -1, 0);
    if (startad == MAP_FAILED) { perror("mmap MAP_NORESERVE"); exit(EXIT_FAILURE); }
    void *endad = (char *)startad + onegiga;
The MAP_NORESERVE mapping does not consume a lot of resources (i.e. it does not eat swap space, which is not reserved, hence the name of the flag). It pre-allocates address space, in the sense that further mmap calls (without MAP_FIXED) won't return an address inside that range (unless you munmap some of it).
Later on, you can really allocate subsegments of it, in multiples of the page size (generally 4 Kbytes), using MAP_FIXED inside the previous segment, e.g.:
    size_t segoff = 1024 * 1024;   // or something else such that ...
    assert(segoff < onegiga && segoff % sysconf(_SC_PAGESIZE) == 0);
    size_t segsize = 65536;        // or something else such that ...
    assert(segsize > 0 && segsize % sysconf(_SC_PAGESIZE) == 0
           && (char *)startad + segoff + segsize < (char *)endad);
    void *segmentad = mmap((char *)startad + segoff, segsize,
                           PROT_READ | PROT_WRITE,
                           MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
                           -1, 0);
    if (segmentad == MAP_FAILED) { perror("mmap MAP_FIXED"); exit(EXIT_FAILURE); }
This re-allocation with MAP_FIXED will use some resources (e.g. consume some swap space).
IIRC, the SBCL runtime and garbage collector use such tricks.
Read also Advanced Linux Programming and carefully syscalls(2) and the particular man pages of relevant system calls.
Read also about memory overcommitment. This is a Linux feature that I dislike and generally disable (e.g. through proc(5)).
BTW, the Linux kernel is free software. You can download its source code from kernel.org and study it, and you can write some experimental code too. Then ask another, more focused question showing your code and the results of your experiments.

How do I implement dynamic shared memory resizing?

Currently I use shm_open to get a file descriptor and then use ftruncate and mmap whenever I want to add a new buffer to the shared memory. Each buffer is used individually for its own purposes.
Now what I need to do is arbitrarily resize buffers.
And also munmap buffers and reuse the free space again later.
The only solution I can come up with for the first problem is: ftruncate(file_size + old_buffer_size + extra_size), mmap, copy the data across into the new buffer, and then munmap the original data. This looks very expensive to me and there is probably a better way. It also entails removing the original buffer every time.
For the second problem I don't even have a bad solution; I clearly can't move memory around every time a buffer is removed. And if I keep track of free memory and reuse it whenever possible, it will slow down the allocation process as well as leave me with unused bits and pieces in between.
I hope this is not too confusing.
Thanks
As best I understand it, you need to grow (or shrink) an existing memory mapping.
Under Linux, shared memory is implemented as a file located in the /dev/shm in-memory filesystem. All operations on this file (and its descriptor) are the same as on regular files.
If you want to grow an existing mapping, first expand the file with ftruncate (as you wrote), then use mremap to expand the mapping to the requested size.
If you store pointers into this region, you may have to update them; but first try calling mremap with flags set to 0. In this case the system tries to grow the existing mapping in place (if there is no collision with another reserved memory region) and your pointers remain valid.
If that option is not available, use the MREMAP_MAYMOVE flag. In this case the system remaps to another location, but mostly this is done efficiently (no copying is performed by the system); then update your pointers.
Shrinking works the same way, but in the reverse order. A sketch of the grow path follows.
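A minimal sketch of that grow path, assuming the mapping is backed by a shm file descriptor (grow_mapping is an invented helper name):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>

    /* Grow a mapping at addr from old_size to new_size bytes.
       Returns the (possibly moved) mapping address, or MAP_FAILED. */
    void *grow_mapping(int fd, void *addr, size_t old_size, size_t new_size)
    {
        if (ftruncate(fd, new_size) == -1)        /* expand the backing object first */
            return MAP_FAILED;
        /* flags == 0: try to grow in place so existing pointers stay valid */
        void *p = mremap(addr, old_size, new_size, 0);
        if (p == MAP_FAILED)                      /* fall back: allow the mapping to move */
            p = mremap(addr, old_size, new_size, MREMAP_MAYMOVE);
        return p;
    }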
I wrote an open source library for just this purpose:
rszshm - resizable pointer-safe shared memory
To quote from the description page:
To accommodate resizing, rszshm first maps a large, private, noreserve map. This serves to claim a span of addresses. The shared file mapping then overlays the beginning of the span. Later calls to extend the mapping overlay more of the span. Attempts to extend beyond the end of the span return an error.
I extend a mapping by calling mmap with MAP_FIXED at the original address, and with the new size.
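A hedged sketch of that extension step, assuming base, fd, and new_size are known and the backing file has already been grown with ftruncate:

    /* Overlay the grown file over the start of the reserved span; since the
       address is unchanged, existing pointers into the mapping stay valid. */
    void *p = mmap(base, new_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    if (p == MAP_FAILED) { /* handle the error */ }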

Does mmap or malloc allocate RAM?

I know this is probably a stupid question, but I've been looking for a while and can't find a definitive answer. If I use mmap or malloc (in C, on a Linux machine), does either one allocate space in RAM? For example, if I have 2 GB of RAM and wanted to use all available RAM, could I just use a malloc/memset combo, mmap, or is there another option I don't know of?
I want to write a series of simple programs that can run simultaneously and keep all RAM in use, to force swap to be used and pages to be swapped in and out frequently. I tried this already with the program below, but it's not exactly what I want. It does allocate memory (RAM?) and force swap to be used (if enough instances are running), but when I call sleep, doesn't that just lock the memory from being used (so nothing is actually being swapped in or out by other processes)? Or am I misunderstanding something?
For example, if I ran this three times, would I be using 2 GB (all) of RAM from the first two instances, and would the third instance then swap one of the previous two instances out (of RAM) and itself in? Or would instance #3 just run using disk or virtual memory?
This brings up another point: would I need to allocate enough memory to use all available virtual memory as well for the swap partition to be used?
Lastly, would mmap (or any other C function, or even another language if applicable) be better for doing this?
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>   /* for sleep() */

    #define MB(size) ((size) * 1024UL * 1024)
    #define GB(size) ((size) * 1024UL * 1024 * 1024)

    int main(){
        char *p = (char *)malloc(MB(512));
        memset(p, 'T', MB(512));
        printf(".5 GB allocated...\n");

        char *q = (char *)malloc(MB(512));
        memset(q, 'T', MB(512));
        printf("1 GB allocated...\n");

        printf("Sleeping...\n");
        sleep(300);
        return 0;
    }
Edit: I am using CentOS 6.4 (with a 3.6.0 kernel) for my OS, if that helps any.
This is very OS/machine dependent.
In most OSes neither allocates RAM. They both allocate VM space: they make a certain range of your process's virtual memory valid for use. RAM is normally allocated later by the OS on first write. Until then, those allocations do not use RAM (aside from the page table that lists them as valid VM space).
If you want to allocate physical RAM, you have to make each page dirty (sysconf(_SC_PAGESIZE) gives you the system page size).
In Linux you can see your VM mappings with all details in /proc/self/smaps. Rss is the resident set of a mapping (how much of it is resident in RAM); everything else that is dirty will have been swapped out. All non-dirty memory will be available for use, but won't exist until then.
You can make all pages dirty with something like this:

    size_t mem_length = 1UL << 30;   /* how much to dirty, e.g. 1 GB */
    char (*my_memory)[sysconf(_SC_PAGESIZE)] = mmap(
          NULL
        , mem_length
        , PROT_READ | PROT_WRITE
        , MAP_PRIVATE | MAP_ANONYMOUS
        , -1
        , 0
    );

    int i;
    for (i = 0; i * sizeof(*my_memory) < mem_length; i++) {
        my_memory[i][0] = 1;   /* touch one byte per page to dirty it */
    }
On some implementations this can also be achieved by passing the MAP_POPULATE flag to mmap, but (depending on your system) it may just fail with ENOMEM if you try to map more than you have RAM available.
Theory and practice differ greatly here. In theory, neither mmap nor malloc allocate actual RAM, but in practice they do.
mmap will allocate RAM to store a virtual memory area data structure (VMA). If mmap is used with an actual file to be mapped, it will (unless explicitly told otherwise) further allocate several pages of RAM to prefetch the mapped file's contents.
Other than that, it only reserves address space, and RAM will be allocated as it is accessed for the first time.
malloc, similarly, only logically reserves amounts of address space within the virtual address space of your process by telling the operating system either via sbrk or mmap that it wants to manage some (usually much larger than you request) area of address space. It then subdivides this huge area via some more or less complicated algorithm and finally reserves a portion of this address space (properly aligned and rounded) for your use and returns a pointer to it.
But: malloc also needs to store some additional information somewhere, or it would be impossible for free to do its job at a later time. At the very least, free needs to know the size of an allocated block in addition to its start address. Usually, malloc therefore secretly allocates a few extra bytes immediately preceding the address that you get; you don't know about that, and it doesn't tell you.
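As a toy illustration of that hidden bookkeeping (real allocators such as glibc's are far more elaborate, and this sketch ignores alignment subtleties; toy_malloc and toy_free are invented names):

    #include <stdlib.h>

    void *toy_malloc(size_t n)
    {
        size_t *p = malloc(sizeof(size_t) + n);   /* hidden header + payload */
        if (p == NULL) return NULL;
        *p = n;                                   /* record the size for toy_free */
        return p + 1;                             /* hand out only the payload */
    }

    void toy_free(void *q)
    {
        if (q != NULL) free((size_t *)q - 1);     /* step back to the real block */
    }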
Now the crux of the matter is that while in theory malloc does not touch the memory that it manages and does not allocate physical RAM, in practice it does. And this does indeed cause page faults and memory pages to be created (i.e. RAM being used).
You can verify this under Linux by repeatedly calling malloc and watching the OOM killer blast your process out of existence, because the system runs out of physical RAM when in fact there should be plenty left.

When should I use mmap for file access?

POSIX environments provide at least two ways of accessing files. There's the standard system calls open(), read(), write(), and friends, but there's also the option of using mmap() to map the file into virtual memory.
When is it preferable to use one over the other? What are their individual advantages that merit including two interfaces?
mmap is great if you have multiple processes accessing data in a read-only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.
mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1 MB file into a buffer created with malloc, and program B, which mmaps the 1 MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read-only and catching seg faults, similar to a copy-on-write strategy.)
mmap is also useful for inter-process communication. You can mmap a file as read/write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).
One place mmap can be awkward is if you need to work with very large files on a 32-bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented, where you might have 2 GB of address space free but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.
Another potential awkwardness with mmap as a replacement for read/write is that you have to start your mapping on page-size-aligned offsets. If you just want to get some data at offset X, you will need to fix up that offset so it's compatible with mmap (see the sketch below).
And finally, read/write are the only way you can work with some types of files; mmap can't be used on things like pipes and ttys.
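A minimal sketch of that offset fixup, reading one byte at an arbitrary offset ("data.bin" and the offset are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        off_t want = 12345;                        /* arbitrary offset we care about */
        long page = sysconf(_SC_PAGESIZE);
        off_t aligned = want & ~(off_t)(page - 1); /* round down to a page boundary */

        /* map one page starting at the aligned offset, then index past the slack */
        char *map = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, aligned);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        printf("byte at %lld: %d\n", (long long)want, map[want - aligned]);
        munmap(map, page);
        close(fd);
        return 0;
    }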
One area where I found mmap() to not be an advantage was when reading small files (under 16K). The overhead of page faulting to read the whole file was very high compared with just doing a single read() system call. This is because the kernel can sometimes satisfy a read entirely in your time slice, meaning your code doesn't switch away. With a page fault, it seemed more likely that another program would be scheduled, making the file operation have a higher latency.
mmap has the advantage when you have random access to big files. Another advantage is that you access the file with memory operations (memcpy, pointer arithmetic), without bothering with buffering. Normal I/O can sometimes be quite difficult when using buffers and you have structures bigger than your buffer; the code to handle that is often hard to get right, whereas mmap is generally easier. This said, there are certain traps when working with mmap.
As people have already mentioned, mmap is quite costly to set up, so it is generally worth using only above a certain size (which varies from machine to machine).
For purely sequential access to a file, it is also not always the better solution, though an appropriate call to madvise can mitigate the problem.
You have to be careful with the alignment restrictions of your architecture (SPARC, Itanium); with read/write I/O, the buffers are often properly aligned and do not trap when dereferencing a cast pointer.
You also have to be careful not to access outside of the map. This can easily happen if you use string functions on your map and your file does not contain a \0 at the end. It will work most of the time when your file size is not a multiple of the page size, as the last page is filled with zeros (the mapped area is always a multiple of your page size).
In addition to other nice answers, a quote from Linux System Programming, written by Robert Love:
Advantages of mmap()
Manipulating files via mmap() has a handful of advantages over the standard read() and write() system calls. Among them are:
- Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using the read() or write() system calls, where the data must be copied to and from a user-space buffer.
- Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
- When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
- Seeking around the mapping involves trivial pointer manipulations. There is no need for the lseek() system call.
For these reasons, mmap() is a smart choice for many applications.
Disadvantages of mmap()
There are a few points to keep in mind when using mmap():
- Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is "wasted" as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7 byte mapping wastes 4,089 bytes.
- The memory mappings must fit into the process's address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
- There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.
For these reasons, the benefits of mmap() are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).
Memory mapping has the potential for a huge speed advantage compared to traditional I/O. It lets the operating system read the data from the source file as the pages in the memory-mapped file are touched. This works by creating faulting pages, which the OS detects, and then the OS loads the corresponding data from the file automatically.
This works the same way as the paging mechanism and is usually optimized for high-speed I/O by reading data on system page boundaries and sizes (usually 4K), a size for which most filesystem caches are optimized.
An advantage that isn't listed yet is the ability of mmap() to keep a read-only mapping as clean pages. If one allocates a buffer in the process's address space and then uses read() to fill the buffer from a file, the memory pages corresponding to that buffer are now dirty, since they have been written to.
Dirty pages cannot be dropped from RAM by the kernel. If there is swap space, they can be paged out to swap. But this is costly, and on some systems, such as small embedded devices with only flash memory, there is no swap at all. In that case the buffer will be stuck in RAM until the process exits, or perhaps gives it back with madvise().
Pages of an mmap() mapping that have not been written to are clean. If the kernel needs RAM, it can simply drop them and reuse the RAM the pages were in. If the process that had the mapping accesses it again, that causes a page fault and the kernel reloads the pages from the file they came from originally, the same way they were populated in the first place.
This doesn't require more than one process using the mapped file for it to be an advantage.
