On Mac, my use of munmap results in higher page reclaims.
The return value of my munmap is 0, which indicates that the requested pages were successfully unmapped.
Why do I see higher page reclaims when I test programs using memory I have mapped and unmapped in this way?
Is there a way to debug munmap and see whether my calls to it are actually doing anything to the mapped memory passed to it?
I used "/usr/bin/time -l" to see the number of page reclaims I get from running my program. Whenever I use munmap, my page reclaims are higher than when I don't.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int i = 0;
    char *addr;

    while (i < 1024)
    {
        addr = mmap(0, getpagesize(), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        addr[0] = 23;
        if (!munmap(addr, getpagesize()))
            printf("Success\n");
        i++;
    }
    return (0);
}
When I call munmap:
I pass it the same pointer mmap gave me.
I check the return value; 0 is what I get most of the time.
I made a test program where I call mmap 1024 times and munmap the same number of times.
When I don't call munmap, the page reclaims are in the region of 1478, and the value is the same when I do call munmap.
How can I check if my use of that memory is correct?
The important thing to remember about mmap is that MAP_ANONYMOUS memory must be zeroed. So what usually happens is that the kernel maps a page frame containing only zeroes, and only when a write hits the page is a read-write zero-filled page mapped in its place.
However, this is also the reason why the kernel cannot reuse the just-unmapped page frame right away: it does not know that only the first byte of the page is dirty, so it must zero the entire 4 KiB page before it can be handed back to the process in a new anonymous mapping. Hence, in both cases (with and without munmap) there are at least 1024 page faults occurring.
If the memory did not need to be zeroed, Linux for example has an extra flag called MAP_UNINITIALIZED that tells the kernel the pages need not be zeroed, but it is normally only honored on embedded devices:
MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to improve
performance on embedded devices. This flag is honored only if
the kernel was configured with the
CONFIG_MMAP_ALLOW_UNINITIALIZED
option. Because of the security implications, that option
is normally enabled only on embedded devices (i.e., devices
where one has complete control of the contents of user memory).
I guess the reason for its unavailability in generic Linux kernels is that the kernel does not keep track of which process previously had the page frame mapped, so the page could leak information from a sensitive process.
Zeroing the page yourself with bzero would not help performance either: the kernel would not know that it was zeroed, because no architecture supports tracking that in hardware, and it is cheaper to write zeroes over the whole page than to check whether the page is already all zeroes and then, in the overwhelming majority of cases, write zeroes over it anyway.
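To observe the reclaim counter from inside the program rather than through /usr/bin/time, one can read the minor-fault counter with getrusage before and after the loop; ru_minflt is the same "page reclaims" number that /usr/bin/time -l reports. A minimal sketch:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    struct rusage before, after;

    getrusage(RUSAGE_SELF, &before);
    for (int i = 0; i < 1024; i++) {
        char *p = mmap(0, getpagesize(), PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        p[0] = 23;                 /* first write faults in a fresh zeroed page */
        munmap(p, getpagesize());
    }
    getrusage(RUSAGE_SELF, &after);

    /* each first write to a new anonymous page costs one minor fault */
    printf("page reclaims: %ld\n", after.ru_minflt - before.ru_minflt);
    return 0;
}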
I'm making a game where the world is divided into chunks of data describing the world. I keep the chunks in a dynamically allocated array so I have to use malloc() when initializing the world's data structures.
Reading the malloc() man page, there is a Note as follows:
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available. In case it turns out that the
system is out of memory, one or more processes will be killed by the
OOM killer. For more information, see the description of
/proc/sys/vm/overcommit_memory and /proc/sys/vm/oom_adj in proc(5),
and the Linux kernel source file Documentation/vm/overcommit-accounting.
If Linux is set to use optimistic memory allocation, does this mean it doesn't always return the full amount of memory I requested in the call to malloc()?
I read that optimistic memory allocation can be disabled by modifying kernel settings, but I don't want to do that.
So is there a way to check whether the program has allocated the requested amount?
This is not something you need to deal with from an application perspective. Users who don't want random processes killed by the "OOM killer" will disable overcommit themselves via
echo "2" > /proc/sys/vm/overcommit_memory
This is their choice, not yours.
But from another standpoint, it doesn't matter. Typical "recommended" amounts of swap are so ridiculously large that no reasonable amount of malloc is going to fail to have physical storage to back it. However, you could easily allocate so much (even with forced MAP_POPULATE or by manually touching it all) that you keep the system thrashing swap for hours, days, or weeks. There is no canonical way to ask the system to notify you and return an error if the amount of memory you want is going to bog the system down in swapping.
The whole situation is a mess, but as an application developer, your role in the fix is just to use malloc correctly and check for a null return value. The rest of the responsibility is on distributions and the kernel maintainers.
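In other words, the only check that belongs in the application is the usual one. A minimal sketch, where the chunk type and world size are made-up stand-ins for the game's data:

#include <stdio.h>
#include <stdlib.h>

/* hypothetical chunk type standing in for the game's world data */
struct chunk { unsigned char blocks[4096]; };

int main(void)
{
    size_t num_chunks = 1024;   /* assumed world size */
    struct chunk *world = malloc(num_chunks * sizeof *world);
    if (world == NULL) {
        /* the only failure malloc is specified to report */
        perror("malloc");
        return EXIT_FAILURE;
    }
    /* ... use world ... */
    free(world);
    return 0;
}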
Instead of malloc you can allocate the necessary memory directly with mmap, using MAP_POPULATE, which tells the kernel to prefault the pages immediately:
#include <sys/mman.h>

// allocate length bytes and prefault the memory so
// that it surely is mapped
void *block = mmap(NULL, length, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE,
                   -1, 0);
if (block == MAP_FAILED)
    /* handle the error */;

// free the block allocated previously
// note, you need to know the size
munmap(block, length);
But the better alternative is that usually the world is saved to a file, so you would mmap the contents directly from that file:

#include <fcntl.h>

int fd = open("world.bin", O_RDWR);
void *block = mmap(NULL, <filesize>, PROT_READ|PROT_WRITE,
                   MAP_SHARED, fd, 0);
The file world.bin is then mapped into memory starting at address block; all changes to the memory are also written transparently to the file, so there is no need to worry about whether there is enough RAM, as Linux will take care of mapping the pages in and out automatically.
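If at some point you want a guarantee that the changes have reached the disk (say, when saving the game), msync(2) can flush the mapped range explicitly. A small sketch, assuming block and the file size from the snippet above:

#include <stdio.h>
#include <sys/mman.h>

/* flush all modified pages of the mapping back to world.bin;
   MS_SYNC blocks until the write-out has completed */
if (msync(block, filesize, MS_SYNC) == -1)
    perror("msync");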
Do note that some of these flags are not defined unless you have a certain feature test macro defined:
Certain flag constants are defined only if either _BSD_SOURCE or
_SVID_SOURCE is defined. (Requiring _GNU_SOURCE also suffices, and
requiring that macro specifically would have been more logical, since
these flags are all Linux-specific.) The relevant flags are:
MAP_32BIT, MAP_ANONYMOUS (and the synonym MAP_ANON),
MAP_DENYWRITE, MAP_EXECUTABLE, MAP_FILE, MAP_GROWSDOWN, MAP_HUGETLB,
MAP_LOCKED, MAP_NONBLOCK, MAP_NORESERVE, MAP_POPULATE, and MAP_STACK.
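So on Linux with glibc, a sketch of what the top of the source file would need, before any include:

#define _GNU_SOURCE          /* exposes MAP_ANONYMOUS, MAP_POPULATE, etc. */
#include <sys/mman.h>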
I have a call to mmap() with which I try to map 64MB using MAP_ANONYMOUS, as follows:
void *block = mmap(0, 67108864, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (block == MAP_FAILED)
exit(1);
I understand that to actually own the memory, I need to hit that block of memory. I want to write some sort of 0's or empty strings to actually own the memory. How would I do that? I tried the following, but it obviously segfaults (I know why it does):
char *temp = block;
for (int i = 0; i < 67108864; i++) {
    *temp = '0';   /* faults: the mapping is PROT_READ only */
    temp++;
}
How would I actually gain ownership of that block by assigning something in that block?
Thanks!
Your process already owns the memory, but what I think you want is to make it resident. That is, you want the kernel to allocate physical memory for the mmaped region.
The kernel allocates a virtual memory area (VMA) for the process, but this just specifies a valid region and doesn't actually allocate physical pages (or frames as they are also sometimes called). To make the kernel allocate entries in the page table, all you need to do is force a page fault.
The easiest way to force a page fault is to touch the memory just like you're doing. Though, because your page size is almost certainly 4096 bytes, you really only need to read one byte every 4096 bytes thereby reducing the amount of work you actually need to do.
Finally, because you are setting the pages PROT_READ, you will actually want to read from each page rather than try to write.
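A minimal sketch of that read-only touch loop, assuming the 64 MB block from the question:

#include <sys/mman.h>
#include <unistd.h>

volatile char sink;                     /* keeps the reads from being optimized away */
long pagesize = sysconf(_SC_PAGESIZE);  /* almost certainly 4096 */
const char *temp = block;

for (size_t i = 0; i < 67108864; i += pagesize)
    sink = temp[i];                     /* one read per page faults it in */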
Your question is not very well formulated. I don't understand why you think the process does not own the memory it obtained through mmap.
Your newly mmap-ed memory zone has only PROT_READ (so you can just read the zeros inside) and you need that to be PROT_READ|PROT_WRITE to be able to write inside.
But your process already "owns" the memory once the mmap returned.
If the process has pid 1234, you could read its memory map through /proc/1234/maps (perhaps with cat /proc/1234/maps in a different terminal); from inside your process, use /proc/self/maps.
Maybe you are interested in memory overcommit; there is a way to disable that.
Perhaps the mincore(2), msync(2), and mlock(2) syscalls are of interest to you.
Maybe you want the MAP_POPULATE or MAP_LOCKED flag of mmap(2).
I really don't understand what you mean by "own the memory" in your question. If you just want to disable memory overcommit, please say so.
And you might also mmap some file segment. I believe there is no overcommit possible in that case. But I would simply suggest disabling memory overcommit on your entire system, through /proc/sys/vm/overcommit_memory.
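As an example of the mincore(2) suggestion above, it reports which pages of a mapping are currently resident in RAM. A hedged sketch, using the Linux prototype where the vector is unsigned char:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* report how many pages of [addr, addr+length) are resident in RAM */
void count_resident(void *addr, size_t length)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t pages = (length + pagesize - 1) / pagesize;
    unsigned char *vec = malloc(pages);

    if (vec != NULL && mincore(addr, length, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;     /* low bit set = page is in core */
        printf("%zu of %zu pages resident\n", resident, pages);
    }
    free(vec);
}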
The man page for mlockall on my kernel 3.0 says
mlockall() locks all pages mapped into the address space of the
calling process. This includes the pages of the code, data and stack
segment, as well as shared libraries, user space kernel data, shared
memory, and memory-mapped files. All mapped pages are guaranteed
to be resident in RAM when the call returns successfully; the pages
are guaranteed to stay in RAM until later unlocked.
and later says
Real-time processes that are using mlockall() to prevent delays on
page faults should reserve enough locked stack pages before entering
the time-critical section, so that no page fault can be caused by
function calls. This can be achieved by calling a function that
allocates a sufficiently large automatic variable (an array) and
writes to the memory occupied by this array in order to touch these
stack pages. This way, enough pages will be mapped for the stack and
can be locked into RAM. The dummy writes ensure that not even
copy-on-write page faults can occur in the critical section.
I understand that this system call can't guess the maximum stack size that will be reached, and is thus unable to lock pages for future stack growth. But why does the first part of the man page quoted above say that locking is also done for the stack? Is there an error in this man page, or does it just mean that the locking is done for the initial stack size?
Yes, locking is done for the current stack pages, but not for all possible future stack pages.
It's explained by that first sentence:
mlockall() locks all pages mapped into the address space of the calling process.
So if a page is mapped, it will be locked. If not, it won't.
It mentions the stack in the original sentence only because stack memory is mapped separately from heap memory. There's no special treatment for the stack: if it's mapped it'll be locked, otherwise it won't be. So, as the second section you quote says, it's important to grow the stack to the maximum size it will reach while your code is running before you call mlockall, as in the sketch below.
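A sketch along the lines the man page describes; the 64 KiB reserve is an assumed worst-case figure, not something the man page prescribes:

#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE (64 * 1024)   /* assumed worst-case stack depth */

static void prefault_stack(void)
{
    unsigned char dummy[STACK_RESERVE];
    memset(dummy, 0, sizeof dummy); /* dummy writes map (and un-COW) the stack pages */
}

int main(void)
{
    prefault_stack();               /* grow the stack first... */
    mlockall(MCL_CURRENT);          /* ...then lock what is now mapped */
    /* time-critical section runs here without stack page faults */
    return 0;
}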
Actually, from a quick reading of the mm/mlock.c source code, I'd say it simply locks everything: all currently mapped pages.
static int do_mlockall(int flags)
{
    struct vm_area_struct * vma, * prev = NULL;
    unsigned int def_flags = 0;

    if (flags & MCL_FUTURE)
        def_flags = VM_LOCKED;
    current->mm->def_flags = def_flags;
    if (flags == MCL_FUTURE)
        goto out;

    for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
        vm_flags_t newflags;

        newflags = vma->vm_flags | VM_LOCKED;
        if (!(flags & MCL_CURRENT))
            newflags &= ~VM_LOCKED;

        /* Ignore errors */
        mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
    }
out:
    return 0;
}
Despite what larsmans said, I do think it also applies to all future pages if MCL_FUTURE is also specified.
In that case current->mm->def_flags is updated to include VM_LOCKED.
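So to lock future mappings as well, one would pass both flags; a minimal sketch:

#include <stdio.h>
#include <sys/mman.h>

/* lock everything mapped now and everything mapped from now on */
if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
    perror("mlockall");   /* typically EPERM or ENOMEM (RLIMIT_MEMLOCK) */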
I am writing a program to leak memory (main memory) to test how the system behaves with low system memory and swap. We are using the following loop, which runs periodically and leaks memory:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int arg_mem = atoi(argv[1]);    /* megabytes to leak per iteration */
    int arg_time = atoi(argv[2]);   /* seconds between allocations (assumed second argument) */
    unsigned int *u_int_ptr;

    while (1)
    {
        u_int_ptr = (unsigned int *) malloc(arg_mem * 1024 * 1024);
        if (u_int_ptr == NULL)
            printf("\n leakyapp Daemon FAILED due to insufficient available memory....");
        sleep(arg_time);
    }
}
The above loop runs for some time and then prints the message "leakyapp Daemon FAILED due to insufficient available memory....". But when I run the command "free" I can see that running this program has no effect on either main memory or swap.
Am I doing something wrong ?
Physical memory is not committed to your allocations until you actually write into it.
If you have a kernel version after 2.6.23, use mmap() with the MAP_POPULATE flag instead of malloc():
u_int_ptr = mmap(NULL, arg_mem * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
if (u_int_ptr == MAP_FAILED)
/* ... */
If you have an older kernel, you'll have to touch each page in the allocation.
There might be some sort of copy-on-write optimization. I would suggest actually writing something to the memory you are allocating.
What is happening is that malloc requests arg_mem * 256 pages from the heap (assuming a 4 KByte page size). The heap in turn requests the memory from the operating system. However, all that does is create entries in the page table for the newly allocated memory block. No actual physical RAM is allocated to the process, except for what the heap needs to track the malloc request.
As soon as the process tries to access one of those pages by reading or writing, a page fault is generated because the entry in the page table is effectively a dangling pointer. The operating system will then allocate a physical page to the process. It's only then that you'll see the available physical memory go down.
Since all new pages start completely zeroed out, Linux might employ a "copy on write" strategy to optimise page allocation; i.e., it might keep a single page totally zeroed and always map that one in when a process tries to read from a previously unused page. Only when the process tries to write to that new page would it actually allocate a completely fresh page of physical RAM. I don't know if Linux actually does this, but if it does, merely reading from a new page is not going to be enough to increase physical memory usage.
So your best strategy is to allocate your large block of RAM and then write something at 4096-byte intervals throughout it, as sketched below.
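A hedged sketch of that strategy, reusing arg_mem from the question's program:

#include <stdlib.h>

size_t size = (size_t)arg_mem * 1024 * 1024;
char *p = malloc(size);

if (p != NULL) {
    /* one write per 4096-byte page forces a physical frame for each page */
    for (size_t off = 0; off < size; off += 4096)
        p[off] = 1;
}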
What does ulimit -m -v print?
Explanation: On any server OS, you can limit the amount of resources a process can allocate to make sure that a single runaway process can't bring down the whole machine.
I'm guessing (based on the command line argument) that you're using a desktop/server OS and not an embedded system.
Allocating memory like this is probably not consuming much RAM. Your memory allocation might not have even succeeded - on some OSs (e.g. Linux), malloc() can return non-NULL even when you ask for more memory than is available.
Without knowing what your OS is and exactly what you're trying to test, it's difficult to suggest anything specific, but you might want to look at more low level ways of allocating memory than malloc(), or ways of controlling the virtual memory system. On Linux you might want to look at mlock().
I think caf already explained it. Linux is usually configured to allow overcommitting memory. You allocate huge chunks of memory, but internally nothing happens except that a note is made that your process wants this huge chunk. It's not until you try to write to that chunk that the kernel tries to find free physical memory to satisfy the access. This is a bit like flight booking: airlines usually overbook flights because there is always a percentage of passengers who do not show up.
You can force the memory to be committed by writing to the chunk with memset() after allocation. calloc should work too.
Does the protection flag affect sharing between processes? If I have a PROT_READ|PROT_WRITE-protected mmapped memory region, is it still fully shared as long as I haven't written into it?
int prot = PROT_READ|PROT_EXEC;
image = mmap(NULL, filesize, prot, MAP_PRIVATE, fildes, 0);
vs:
int prot = PROT_READ|PROT_WRITE|PROT_EXEC;
image = mmap(...)
I want to make a small modification to a small portion of the memory region after I've mapped it, then re-mprotect all of it, because that's simpler than mprotecting the small portions individually as I need to.
The question is whether this ends up forcing the whole file to be copied per process, or just the portions I modified per process.
According to the mmap(2) man page on a recent Linux system, MAP_PRIVATE allocates the memory using copy-on-write (COW). This means your memory will not be duplicated unless you make changes to it. As COW is an efficient way to implement this, I assume it is also done this way on other *NIX systems.
The memory for mmap is organized in equal-sized chunks, so called pages. Memory will always be mapped in multiples of the page size, i.e. whole pages. Each page can be swapped independently. So if you write something to this mmap'ed memory range, only at least one page has to be copied.
The page size depends on your system, on x86 it is usually 4096 bytes. If you are interested in the page size of your system, you can use sysconf(3).
#include <unistd.h>
long pagesize = sysconf(_SC_PAGESIZE);
The pointer you get from mmap() will already point to a multiple of the page size, and you should pass mprotect() an address aligned to a page boundary.
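Putting it together, a sketch of the map-patch-reprotect sequence from the question; the one-byte patch and its offset are made up for illustration:

#include <sys/mman.h>

/* map writable first, patch, then drop the write permission again */
char *image = mmap(NULL, filesize, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE, fildes, 0);
if (image != MAP_FAILED) {
    image[123] = 0x90;   /* hypothetical patch: only this page gets COW-copied */
    mprotect(image, filesize, PROT_READ | PROT_EXEC);
}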