File Holes and Shared Memory in Linux? [c]

Since I read The Linux Programming Interface (a very good read) I have known about file holes in Linux. On a Unix filesystem like ext4, one can open a new file, write 1000 bytes, seek to offset 1,000,000,000, write another 1000 bytes, and, depending on the block size of the filesystem, end up with a file consuming just 2048 bytes of real disk space (for a block size of 1024 or 512).
So basically the file is created as a file roughly 1 GB + 1000 bytes long where only a couple of blocks of real drive space are used.
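For illustration, here is a minimal sketch of that experiment (the file name sparse.bin is arbitrary; st_blocks counts 512-byte units on Linux):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    char buf[1000] = {0};
    int fd = open("sparse.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, buf, sizeof buf);            // first 1000 bytes
    lseek(fd, 1000000000L, SEEK_SET);      // seek ~1 GB forward, leaving a hole
    write(fd, buf, sizeof buf);            // another 1000 bytes
    struct stat st;
    fstat(fd, &st);
    printf("size = %lld bytes, on disk = %lld bytes\n",
           (long long)st.st_size, (long long)st.st_blocks * 512LL);
    close(fd);
    return 0;
}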
Can I erase the middle of a file forcing the system to deallocate those blocks on the drive?
Is there an equivalent for shared memory, with or without a file backing it, where the memory also has holes that are only filled in as pages are written?
It would be nice to allocate 1 GB of shared memory but never utilize it fully until necessary, just to avoid remapping if the shared memory block has to grow.

Can I erase the middle of a file forcing the system to deallocate those blocks on the drive?
You probably want the Linux-specific fallocate(2); beware, it might not work on some filesystems (e.g. NFS, VFAT, ...), because some filesystems don't have holes. See also lseek(2) with SEEK_HOLE, posix_fadvise(2), madvise(2), memfd_create(2), etc.
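For example, a hedged sketch of punching a hole (Linux-specific; the offset and length here are arbitrary, and only whole filesystem blocks are actually freed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd = open("sparse.bin", O_RDWR);   // hypothetical existing file
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }
    // deallocate 4096 bytes starting at offset 4096, keeping the file size
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  4096, 4096) == -1) {
        perror("fallocate PUNCH_HOLE");
        return EXIT_FAILURE;
    }
    close(fd);
    return 0;
}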
Raw block devices (like a disk partition, a USB key, or an SSD) don't have any holes (but you could mmap them). Holes are a filesystem software artifact.
It would be nice to allocate 1 GB of shared memory but never utilize it
This is contradictory: if the memory is shared, it is used (by the thing, generally another process, with which you share that memory). Read shm_overview(7) if you really want shared memory (and read mmap(2) carefully). Read more about virtual memory, address space, paging, MMUs, page faults, operating systems, kernels, mmap, demand paging, copy-on-write, memory maps, ELF, sparse files... Try also the cat /proc/$$/maps command in a terminal, and understand the output (see proc(5)).
Perhaps you want to pre-allocate some address space range, and later really allocate the virtual memory. This is possible using Linux version of mmap(2).
To pre-allocate a gigabyte memory range, you'll first call mmap with MAP_NORESERVE:
size_t onegiga = 1L << 30;
void *startad = mmap(NULL, onegiga, PROT_NONE,
                     MAP_ANONYMOUS | MAP_NORESERVE | MAP_SHARED,
                     -1, 0);
if (startad == MAP_FAILED) { perror("mmap MAP_NORESERVE"); exit(EXIT_FAILURE); }
void *endad = (char*)startad + onegiga;
The MAP_NORESERVE does not consume a lot of resources (i.e. does not eat swap space, which is not reserved, hence the name of the flag). It is pre-allocating address space, in the sense that further mmap calls (without MAP_FIXED) won't give an address inside the returned range (unless you munmap some of it).
Later on, you can allocate some subsegment of that, in multiples of the page size (generally 4Kbytes), using MAP_FIXED inside the previous segment, e.g.
size_t segoff = 1024*1024; // or something else such that ....
assert (segoff < onegiga && segoff % sysconf(_SC_PAGESIZE) == 0);
size_t segsize = 65536; // or something else such that ....
assert (segsize > 0 && segsize % sysconf(_SC_PAGESIZE) == 0
        && (char*)startad + segoff + segsize < (char*)endad);
void *segmentad = mmap((char*)startad + segoff, segsize,
                       PROT_READ | PROT_WRITE,
                       MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
                       -1, 0);
if (segmentad == MAP_FAILED) { perror("mmap MAP_FIXED"); exit(EXIT_FAILURE); }
This re-allocation with MAP_FIXED will use some resources (e.g. consume some swap space).
IIRC, the SBCL runtime and garbage collector use such tricks.
Read also Advanced Linux Programming, and read syscalls(2) and the man pages of the relevant system calls carefully.
Read also about memory overcommitment. This is a Linux feature that I dislike and generally disable (e.g. through proc(5)).
BTW, the Linux kernel is free software. You can download its source code from kernel.org and study the source code. And you can write some experimental code also. Then ask another more focused question showing your code and the results of your experiments.

Related

Reserved Memory Equals Shared Memory but Memory is Never Reserved

I am currently editing a program I inherited to be able to work with 23 GB files. As such, to maintain low memory, I am using mmap to load arrays which I had created in a previous program. However, I load these arrays, and then enter into a function and the shared and reserved memory spikes, even though I do not believe I ever allocate anything. When running, the memory starts at 0 and then quickly increases to 90% (~36GB as I have 40GB of ram) and stays there. Eventually, I start needing memory (less than 30GB) and the program then gets killed.
Usually, I would suspect that this issue would be due to allocation, or that I was somehow allocating memory. However, I am not allocating any memory (although I am reading in mmaped files).
The curious thing is that the memory reserved is equal to the amount of memory shared (see attached screenshot).
The functions that I wrote to access mmaped arrays:
double* loadArrayDouble(ssize_t size, char* backupFile, int *filedestination) {
    *filedestination = open(backupFile, O_RDWR | O_CREAT, 0644);
    if (*filedestination < 0) {
        perror("open failed");
        exit(1);
    }
    // make sure file is big enough
    if (lseek(*filedestination, size*sizeof(double), SEEK_SET) == -1) {
        perror("seek to len failed");
        exit(1);
    }
    if (lseek(*filedestination, 0, SEEK_SET) == -1) {
        perror("seek to 0 failed");
        exit(1);
    }
    double *array1 = mmap(NULL, size*sizeof(double), PROT_READ | PROT_WRITE, MAP_SHARED, *filedestination, 0);
    if (array1 == MAP_FAILED) {
        perror("mmap failed");
        exit(1);
    }
    return array1;
}
Please let me know if there is any other code to include. It appears that the memory increases significantly even though double* file1 = loadArrayDouble(SeqSize, "/home/HonoredTarget/file1", &fileIT); is called multiple times (once for each of the 6 arrays).
"Res" is short for "resident", not "reserved". Resident memory refers to the process memory which the kernel happens to have resident at the moment; the virtual memory system might drop a resident page at any moment, so it's not in any way a limitation. However, the kernel attempts not to swap out pages which seem to be active. The OOM killer will act if your process is churning too many pages in and out of memory. If you use data sequentially, then it usually doesn't matter how much you have mmap'ed, because only the recent pages will be resident. But if you skip around in the memory, reading a bit here and writing a bit there, then you'll create more churn. That seems like what is happening.
"shr" (shared) memory does in fact refer to memory which could be shared with another process (whether or not it actually is shared with another process). The fact that you use MAP_SHARED means that it is not surprising that all of your mmap'ed pages are shared. You need MAP_SHARED if your program modifies the data in the file, which I guess it does.
The "virt" (virtual) column measures how much of your address spaced you've actually mapped (including memory mapped to anonymous backing storage by whatever dynamic allocation library you're using.) 170G seems a bit high to me. If you have six 23GB files mapped simultaneously, that would be 138GB. But perhaps those numbers were just estimates. Anyway, it doesn't matter that much, as long as you're within the virtual memory limits you've set. (Although page tables do occupy real memory, so there is some effect.)
Memory mapping does not save you memory, really. When you mmap a file, the contents of the file still need to be read into memory in order for your program to use the data. The big advantage to mmap is that you don't have to futz around with allocating buffers and issuing read calls. Also, there is no need to copy data from the kernel buffer into which the file is read. So it can be a lot easier and more efficient, but not always; it depends a lot on the precise access pattern.
One thing to note: the following snippet does not do what the comment says it does:
// make sure file is big enough
if (lseek(*filedestination, size*sizeof(double), SEEK_SET) == -1) {
    perror("seek to len failed");
    exit(1);
}
lseek only sets the file position for the next read or write operation. If the file does not extend to that point, you'll get an EOF indication when you read, or the file will be extended (sparsely) if you write. So there's really not much point. If you want to check the file size, use stat. Or make sure you read at least one byte after doing the seek.
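If the goal really is to make the file big enough to back the whole mapping, a hedged sketch (ensureFileSize is a hypothetical helper; it extends the file sparsely with ftruncate(2), and could be called as ensureFileSize(*filedestination, size * sizeof(double)) in place of the two lseek calls above):

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

// Extend fd to at least `wanted` bytes; on filesystems with holes
// the extension is sparse, so no disk blocks are allocated yet.
static void ensureFileSize(int fd, off_t wanted) {
    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); exit(1); }
    if (st.st_size < wanted && ftruncate(fd, wanted) == -1) {
        perror("ftruncate");
        exit(1);
    }
}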
There's also not a lot of point using O_CREAT in the open call, since if the file doesn't exist and thus gets created, it will have size 0, which is presumably an error. Leaving O_CREAT off means the open call will fail if the file doesn't exist, which is likely what you want.
Finally, if you are not actually modifying the file contents, don't mmap with PROT_WRITE. PROT_READ pages are a lot easier for the kernel to deal with, because they can just be dropped and read back in later. (For writable pages, the kernel keeps track of the fact that the page has been modified, but if you aren't planning on writing and you don't allow modification, that makes the kernel's task a bit easier.)
Since you are (apparently) getting your process killed by the OOM killer, even though the memory you are using is MAP_SHARED (so it never requires backing store; it is automatically backed by the file), it would appear you are running your Linux with no swap space, which is a bad idea if you have large mapped files like this, as it will cause processes to get killed whenever your resident memory approaches your physical memory. So the obvious solution would be to add a swap file; even a small amount (1-2GB) will avoid the OOM killer problem. There are lots of tutorials online about how to add a swap file in Linux.
If for some reason you don't want to add a swap file, you may be able to reduce the frequency of getting killed by increasing the "swappiness" of your system; this will cause the kernel to drop pages of your mmapped files more readily, thus reducing the likelihood of getting into an OOM situation. You do this by increasing the vm.swappiness parameter, either in your sysctl.conf file (for boot time) or by writing a new value to your /proc/sys/vm/swappiness file.
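As a sketch, the value can also be raised at runtime from C (needs root; 80 is just an example value, the default is usually 60):

#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/sys/vm/swappiness", "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "80\n");   // higher values mean pages are dropped more readily
    fclose(f);
    return 0;
}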

Segmentation fault when trying to access an element in a big array in C [duplicate]

I'm making a game where the world is divided into chunks of data describing the world. I keep the chunks in a dynamically allocated array so I have to use malloc() when initializing the world's data structures.
Reading the malloc() man page, there is a Note as follows:
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available. In case it turns out that the
system is out of memory, one or more processes will be killed by the
OOM killer. For more information, see the description of
/proc/sys/vm/overcommit_memory and /proc/sys/vm/oom_adj in proc(5), and the Linux kernel source file
Documentation/vm/overcommit-accounting.
If Linux is set to use optimistic memory allocation then does this mean it doesn't always return the full amount of memory I requested in the call to malloc()?
I read that optimistic memory allocation can be disabled by modifying the kernel, but I don't want to do that.
So is there a way to check whether the program has allocated the requested amount?
This is not something you need to deal with from an application perspective. Users who don't want random processes killed by the "OOM killer" will disable overcommit themselves via
echo "2" > /proc/sys/vm/overcommit_memory
This is their choice, not yours.
But from another standpoint, it doesn't matter. Typical "recommended" amounts of swap are so ridiculous that no reasonable amount of malloc is going to fail to have physical storage to back it. However, you could easily allocate so much (even with forced MAP_POPULATE or manually touching it all) to keep the system thrashing swap for hours/days/weeks. There is no canonical way to ask the system to notify you and give an error if the amount of memory you want is going to bog down the system swapping.
The whole situation is a mess, but as an application developer, your role in the fix is just to use malloc correctly and check for a null return value. The rest of the responsibility is on distributions and the kernel maintainers.
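A minimal sketch of that application-side discipline (xmalloc is a hypothetical wrapper name):

#include <stdio.h>
#include <stdlib.h>

// Allocate or die: the only thing an application can reliably do
// is check malloc's return value.
void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes)\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}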
Instead of malloc you can allocate the necessary memory directly with mmap, using MAP_POPULATE, which advises the kernel to map the pages immediately.
#include <sys/mman.h>

// allocate length bytes and prefault the memory so
// that it surely is mapped
void *block = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
                   -1, 0);

// free the block allocated previously
// note, you need to know the size
munmap(block, length);
But the better alternative is that usually the world is saved to a file, so you would mmap the contents directly from a file:
int fd = open("world.bin", O_RDWR);
void *block = mmap(NULL, <filesize>, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
The file world.bin is mapped into memory starting at address block; all changes to the memory are also written transparently to the file. There is no need to worry whether there is enough RAM, as Linux will take care of mapping the pages in and out automatically.
Do note that some of these flags are not defined unless you have a certain feature test macro defined:
Certain flags constants are defined only if either _BSD_SOURCE or
_SVID_SOURCE is defined. (Requiring _GNU_SOURCE also suffices, and requiring that macro specifically would have been more logical, since
these flags are all Linux-specific.) The relevant flags are:
MAP_32BIT, MAP_ANONYMOUS (and the synonym MAP_ANON),
MAP_DENYWRITE, MAP_EXECUTABLE, MAP_FILE, MAP_GROWSDOWN, MAP_HUGETLB,
MAP_LOCKED, MAP_NONBLOCK, MAP_NORESERVE, MAP_POPULATE, and MAP_STACK.
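In practice that means defining one of these macros before the first #include, e.g.:

// Must come before any #include for the MAP_* extras to be visible (glibc).
#define _GNU_SOURCE
#include <sys/mman.h>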

Fail to allocate a large amount of virtual memory

I read that when you try to allocate more bytes than are available in RAM using malloc(), it allocates virtual memory. At least on Linux.
I want to allocate a huge amount of virtual memory, let's say 100 GB. So, I wrote something like this:
void *virtual_memory = malloc(100ULL * 1024 * 1024 * 1024); // 100 GB
But the returned pointer is NULL.
I execute this code on a 64-bit Ubuntu virtual machine.
What am I doing wrong?
EDIT
What I'm trying to achieve is to make htop tool displaying 100GB in VIRT column for my process.
UPDATE
I CAN call malloc to allocate 2 GB at once 50 times
I read that when you try to allocate more bytes than are available in RAM using malloc(), it allocates virtual memory
To start with, this is not correct. You always allocate virtual memory; that virtual memory is mapped to some area of physical memory (RAM) or swap space. If the swap space plus physical memory is less than 100 GB, your allocation will fail. Also, the libc implementation might fail to allocate such a large amount if it has some (programmable) limit set.
but I have a strange task to show up 100 GB of virtual memory for the process in the htop tool. And it's claimed to be achievable via a single line of code.
Yes, if you just need this much virtual memory, you can reserve memory but not commit it. You can read up on how mmap (*NIX) or VirtualAlloc (Windows) can be used for this.
When you reserve a particular virtual address range, you tell the operating system that you intend to use this range, so other code can't use it. But that doesn't mean you can actually use it yet. It also means the range doesn't need a RAM/swap backing, so you will be able to reserve an arbitrarily large amount (less than 2^48 bytes on your 64-bit system, of course).
Although I am not sure if htop will include that in the value it shows, you will have to try that out.
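A minimal sketch of such a reservation (assuming a 64-bit system; whether htop counts it toward VIRT you would have to try, as noted):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t len = 100UL << 30;   // 100 GiB
    // Reserve address space only; no RAM or swap is committed.
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("reserved %zu bytes at %p\n", len, p);
    pause();                    // keep the process alive so htop can be checked
}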
If this doesn't indeed add to your virtual memory count, you can map it to a file, instead of mapping it anonymously. This might create a 100 GB file on your system (assuming you have that much space), but you should even be able to read/write to it.
The following code can be used on Linux:
int fd = open("temp.txt", O_RDWR | O_CREAT, 0644);
void *addr = mmap(NULL, 100UL << 30 /* 100 GiB */, PROT_WRITE | PROT_READ,
                  MAP_PRIVATE, fd, 0);
The following code did the trick for me:
for (int i = 0; i < 50; ++i) {
    malloc(int_pow(2, 31));
}
Where int_pow is just a custom pow implementation which operates on integers. After running this app, the htop tool shows that it uses exactly 100 GB of virtual memory.

How to find holes in the address space?

I have a set of files whose lengths are all multiples of the page-size of my operating system (FreeBSD 10). I would like to mmap() these files to consecutive pages of RAM, giving me the ability to treat a collection of files as one large array of data.
Preferably using portable functions, how can I find a sufficiently large region of unmapped address space so I can be sure that a series of mmap() calls to this region is going to be successful?
Follow these steps:
1. Compute the total size needed by enumerating your files and summing their sizes.
2. Map a single area of anonymous memory of this size with mmap. If this fails, you lose.
3. Save the pointer and unmap the area (the unmap may not even be necessary, if your system's mmap with a fixed address implicitly unmaps any previous overlapping region).
4. Map the next file at this address with the appropriate MAP_FIXED flag.
5. Increment the address by the file size.
6. Loop back to step 4 until all files have been mapped.
This should be fully portable to any POSIX system, but some OSes might have quirks that prevent this method. Try it.
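A hedged sketch of these steps in C (the name map_files_contiguously is made up; error handling is minimal, and file sizes are assumed to be page-size multiples, as in the question):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the files named in paths[] back to back; returns the base
// address, or NULL on failure.
void *map_files_contiguously(const char *paths[], int nfiles) {
    size_t total = 0;
    size_t sizes[nfiles];
    for (int i = 0; i < nfiles; i++) {         // step 1: sum the sizes
        struct stat st;
        if (stat(paths[i], &st) != 0) return NULL;
        sizes[i] = (size_t)st.st_size;
        total += sizes[i];
    }
    // step 2: reserve one contiguous region of address space
    char *base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED) return NULL;
    // steps 4-6: MAP_FIXED over the reservation implicitly unmaps it
    char *addr = base;
    for (int i = 0; i < nfiles; i++) {
        int fd = open(paths[i], O_RDWR);
        if (fd < 0) return NULL;
        void *m = mmap(addr, sizes[i], PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, 0);
        close(fd);                             // the mapping persists after close
        if (m == MAP_FAILED) return NULL;
        addr += sizes[i];
    }
    return base;
}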
You could mmap a large region where the size is the sum of the sizes of all files, using MAP_PRIVATE | MAP_ANON, and protection PROT_NONE which would prevent the OS from unnecessarily committing the memory charges.
This will reserve but not commit memory.
You could then open file filename1 at [baseAddr, baseAddr + size1) and filename2 at [baseAddr + size1, baseAddr + size1 + size2), and so on.
I believe the flags for this are MAP_FIXED | MAP_PRIVATE.

Does mmap or malloc allocate RAM?

I know this is probably a stupid question but I've been looking for a while and can't find a definitive answer. If I use mmap or malloc (in C, on a Linux machine) does either one allocate space in RAM? For example, if I have 2 GB of RAM and wanted to use all available RAM, could I just use a malloc/memset combo, mmap, or is there another option I don't know of?
I want to write a series of simple programs that can run simultaneously and keep all RAM used in the process, to force swap to be used and pages to be swapped in/out frequently. I tried this already with the program below, but it's not exactly what I want. It does allocate memory (RAM?), and force swap to be used (if enough instances are running), but when I call sleep doesn't that just lock the memory from being used (so nothing is actually being swapped in or out by other processes), or am I misunderstanding something?
For example, if I ran this 3 times would I be using 2GB (all) of RAM from the first two instances, and the third instance would then swap one of the previous two instances out (of RAM) and the current instance into RAM? Or would instance #3 just run using disk or virtual memory?
This brings up another point, would I need to allocate enough memory to use all available virtual memory as well for the swap partition to be used?
Lastly, would mmap (or any other C function. Hell, even another language if applicable) be better for doing this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MB(size) ( (size) * 1024 * 1024)
#define GB(size) ( (size) * 1024L * 1024 * 1024) /* long arithmetic, to avoid int overflow */

int main(){
    char *p;
    p = (char *)malloc(MB(512));
    memset(p, 'T', MB(512));
    printf(".5 GB allocated...\n");

    char *q;
    q = (char *)malloc(MB(512));
    memset(q, 'T', MB(512));
    printf("1 GB allocated...\n");

    printf("Sleeping...\n");
    sleep(300);
}
** Edit: I am using CentOS 6.4 (with a 3.6.0 kernel) as my OS, if that helps any.
This is very OS/machine dependent.
In most OSes neither allocates RAM. They both allocate VM space. They make a certain range of your processes virtual memory valid for use. RAM is normally allocated later by the OS on first write. Until then those allocations do not use RAM (aside from the page table that lists them as valid VM space).
If you want to allocate physical RAM then you have to make each page (sysconf(_SC_PAGESIZE) gives you the system pagesize) dirty.
In Linux you can see your VM mappings with all details in /proc/self/smaps. Rss is your resident set of that mapping (how much is resident in RAM), everything else that is dirty will have been swapped out. All non-dirty memory will be available for use, but won't exist until then.
You can make all pages dirty with something like
size_t mem_length = 1UL << 30; /* e.g. 1 GiB */
char (*my_memory)[sysconf(_SC_PAGESIZE)] = mmap(
      NULL
    , mem_length
    , PROT_READ | PROT_WRITE
    , MAP_PRIVATE | MAP_ANONYMOUS
    , -1
    , 0
    );
if (my_memory == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

int i;
for (i = 0; i * sizeof(*my_memory) < mem_length; i++) {
    my_memory[i][0] = 1;   /* write one byte per page to make it dirty */
}
On some implementations this can also be achieved by passing the MAP_POPULATE flag to mmap, but (depending on your system) it may just fail mmap with ENOMEM if you try to map more than you have RAM available.
Theory and practice differ greatly here. In theory, neither mmap nor malloc allocate actual RAM, but in practice they do.
mmap will allocate RAM to store a virtual memory area data structure (VMA). If mmap is used with an actual file to be mapped, it will (unless explicitly told differently) further allocate several pages of RAM to prefetch the mapped file's contents.
Other than that, it only reserves address space, and RAM will be allocated as it is accessed for the first time.
malloc, similarly, only logically reserves amounts of address space within the virtual address space of your process by telling the operating system either via sbrk or mmap that it wants to manage some (usually much larger than you request) area of address space. It then subdivides this huge area via some more or less complicated algorithm and finally reserves a portion of this address space (properly aligned and rounded) for your use and returns a pointer to it.
But: malloc also needs to store some additional information somewhere, or it would be impossible for free to do its job at a later time. At the very least free needs to know the size of an allocated block in addition to the start address. Usually, malloc therefore secretly allocates a few extra bytes which are immediately preceding the address that you get -- you don't know about that, it doesn't tell you.
Now the crux of the matter is that while in theory malloc does not touch the memory that it manages and does not allocate physical RAM, in practice it does. And this does indeed cause page faults and memory pages to be created (i.e. RAM being used).
You can verify this under Linux by calling malloc repeatedly and watching the OOM killer blast your process out of existence, because the system runs out of physical RAM when in fact there should be plenty left.
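A sketch of that experiment (it deliberately drives the machine out of memory, so run it only somewhere you can afford to wedge):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const size_t chunk = 64UL << 20;   // 64 MiB per step
    for (size_t total = 0; ; total += chunk) {
        void *p = malloc(chunk);
        if (p == NULL) {               // rarely reached with overcommit enabled
            printf("malloc failed after %zu MiB\n", total >> 20);
            break;
        }
        memset(p, 1, chunk);           // touching the pages makes them real RAM
        printf("%zu MiB\n", (total + chunk) >> 20);
    }
    return 0;
}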
