Is it possible to unpage all memory in Windows?

I have plenty of RAM, however, after starting and finishing a large number of processes, it seems that most of the applications' virtual memory has been paged to disk, and switching to any of the older processes requires a very long time to load the memory back into RAM.
Is there a way, either via the Windows API or via a kernel call, to get Windows to unpage all (or as much as possible) memory? Maybe by stepping through the list of running processes and getting the memory manager to unpage each process's memory?

Update 3: I've uploaded my complete program to GitHub.
OK, based on the replies so far, here's a naive suggestion for a tool that tries to get all applications back into physical memory:
Allocate a small chunk of memory X, maybe 4MB. (Should it be non-pageable?)
Iterate over all processes:
For each process, copy chunks of its memory to X.
(Possibly suspending the process first?)
Suppose you have 2GB of RAM, and only 1GB is actually required by processes. If everything is in physical memory, you'd only copy 256 chunks, not the end of the world. At the end of the day, there's a good chance that all processes are now entirely in the physical memory.
Possible convenience and optimisation options:
Check first that the total required space is no more than, say, 50% of the total physical space.
Optionally only run on processes owned by the current user, or on a user-specified list.
Check first whether each chunk of memory is actually paged to disk or not.
I can iterate over all processes using EnumProcesses(); I'd be grateful for any suggestions how to copy an entire process's memory chunk-wise.
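For illustration, the enumeration loop might look roughly like this (my sketch, untested; it assumes the UnpageProcessByID() function from the update below, and needs psapi for EnumProcesses()):

#include <windows.h>
#include <psapi.h>
#include <stdio.h>

void UnpageAllProcesses(void)
{
    DWORD pids[1024], bytesReturned;
    SYSTEM_INFO si;

    GetSystemInfo(&si);  // provides the page size and maximum application address

    if (!EnumProcesses(pids, sizeof(pids), &bytesReturned)) {
        fprintf(stderr, "EnumProcesses() failed, error %lu.\n", GetLastError());
        return;
    }

    DWORD count = bytesReturned / sizeof(DWORD);
    for (DWORD i = 0; i < count; i++) {
        if (pids[i] != 0)  // skip the System Idle Process
            UnpageProcessByID(pids[i], si.lpMaximumApplicationAddress, si.dwPageSize);
    }
}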
Update: Here is my sample function. It takes the process ID as its first argument and reads one byte from each committed page of the process. (The second and third arguments are the maximum application address and the page size, both obtainable via GetSystemInfo().)
void UnpageProcessByID(DWORD processID, LPVOID MaximumApplicationAddress, DWORD PageSize)
{
    MEMORY_BASIC_INFORMATION meminfo;
    LPVOID lpMem = NULL;

    // Get a handle to the process.
    HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, processID);
    if (NULL == hProcess)
    {
        fprintf(stderr, "Could not get process handle, skipping requested process ID %lu.\n", processID);
        return;
    }

    // Do the work
    SIZE_T nbytes;
    unsigned char buf;
    while (lpMem < MaximumApplicationAddress)
    {
        SIZE_T stepsize = PageSize;
        if (!VirtualQueryEx(hProcess, lpMem, &meminfo, sizeof(meminfo)))
        {
            fprintf(stderr, "Error during VirtualQueryEx(), skipping process ID (error code %lu, PID %lu).\n", GetLastError(), processID);
            break;
        }
        if (meminfo.RegionSize < stepsize) stepsize = meminfo.RegionSize;
        switch (meminfo.State)
        {
        case MEM_COMMIT:
            // This next line should be disabled in the final code
            fprintf(stderr, "Page at %p: Good, unpaging.\n", lpMem);
            if (0 == ReadProcessMemory(hProcess, lpMem, (LPVOID)&buf, 1, &nbytes))
                fprintf(stderr, "Failed to read one byte from %p, error %lu (%Iu bytes read).\n", lpMem, GetLastError(), nbytes);
            else
                // This next line should be disabled in the final code
                fprintf(stderr, "Read %Iu byte(s) successfully from %p (byte was: 0x%X).\n", nbytes, lpMem, buf);
            break;
        case MEM_FREE:
            fprintf(stderr, "Page at %p: Free (unused), skipping.\n", lpMem);
            stepsize = meminfo.RegionSize;
            break;
        case MEM_RESERVE:
            fprintf(stderr, "Page at %p: Reserved, skipping.\n", lpMem);
            stepsize = meminfo.RegionSize;
            break;
        default:
            fprintf(stderr, "Page at %p: Unknown state, panic!\n", lpMem);
        }
        // Pointer arithmetic on LPVOID (void *) is not legal C, so cast first.
        lpMem = (LPVOID)((ULONG_PTR)lpMem + stepsize);
    }
    CloseHandle(hProcess);
}
Question: Does the region by whose size I increment consist of at most one page, or am I missing pages? Should I find out the page size as well and only increment by the minimum of region size and page size?
Update 2: The page size is only 4 KiB! I changed the above code to increment only in 4 KiB steps within committed regions. In the final code we'd get rid of the fprintf's inside the loop.

Well, it isn't hard to implement yourself. Use VirtualQueryEx() to discover the virtual addresses used by a process, ReadProcessMemory() to force the pages to get reloaded.
It isn't likely to make any difference at all, it will just be your program that takes forever to do its job. The common diagnostic for slow reloading of pages is a fragmented paging file. Common on Windows XP, for example, when the disk hasn't been defragged in a long time and was frequently allowed to fill close to capacity. The SysInternals PageDefrag utility can help fix the problem.

No, Windows provides no such feature natively. Programs such as Cacheman and RAM IDLE work by simply allocating a large chunk of RAM, forcing other things to page to disk, which effectively accomplishes what you want.
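The trick those tools use amounts to something like the following sketch (my reconstruction of the idea, not their actual code): commit a large buffer, touch every page so it gets physical backing, then release it.

#include <windows.h>
#include <string.h>

/* Allocate and touch a large buffer so the OS evicts other processes'
 * pages to make room, then release the buffer again. */
void ForcePageout(SIZE_T bytes)
{
    char *p = VirtualAlloc(NULL, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (p == NULL)
        return;
    memset(p, 0, bytes);  // touch every page to force physical backing
    VirtualFree(p, 0, MEM_RELEASE);
}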

Related

VirtualAlloc() working on certain memory addresses but not others

I'm running a program called VMdriver5.exe (left side of the attached image), which creates a process of another program called VMmapper.exe. Inside the VMdriver5.c code, it gets its PID (using GetCurrentProcessId()) and passes that PID to VMmapper.exe upon creation, so that VMmapper shows the memory layout of VMdriver5.exe and I can see the virtual memory options.
I was wondering if I could get some help because, when I call VirtualAlloc() on certain memory addresses, it works fine. However, as you can see in the image, there is a region of memory with 503,808 bytes free, and I attempt to reserve 65,536 bytes of that space using VirtualAlloc(), but it fails for some reason. I pass the memory address 00185000 as one of the parameters.
The code I'm using is this:
lpvAddr = VirtualAlloc(vmAddress, units << 16, MEM_RESERVE, flProtect); // only works with PAGE_READONLY
if (lpvAddr == NULL) {
    printf("Case 1: Reserve a region, VirtualAlloc() failed. Error: %ld\n", GetLastError());
}
else {
    printf("Committed %lu bytes at address %p\n", dwPageSize, lpvAddr); // %lu = long unsigned decimal integer
}
break;
I would appreciate any help and can provide more code or info if needed. Thanks guys, and happy Easter.
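One thing worth checking (my note, not from the original thread): for MEM_RESERVE, VirtualAlloc() rounds the requested address down to the system allocation granularity (typically 64 KiB), not just to a page boundary, so a page-aligned address like 00185000 can fail even when the surrounding region is free. A sketch of aligning the request first:

// My note, not from the original thread: align the requested address down
// to the allocation granularity (usually 64 KiB) before reserving.
SYSTEM_INFO si;
GetSystemInfo(&si);

ULONG_PTR aligned = (ULONG_PTR)vmAddress;
aligned -= aligned % si.dwAllocationGranularity;

lpvAddr = VirtualAlloc((LPVOID)aligned, units << 16, MEM_RESERVE, flProtect);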

Is it possible to "punch holes" through mmap'ed anonymous memory?

Consider a program which uses a large number of roughly page-sized memory regions (say 64 kB or so), each of which is rather short-lived. (In my particular case, these are alternate stacks for green threads.)
How would one best allocate these regions, such that their pages can be returned to the kernel once the region isn't in use anymore? The naïve solution would clearly be to simply mmap each of the regions individually, and munmap them again as soon as I'm done with them. I feel this is a bad idea, though, since there are so many of them. I suspect that the VMM may start scaling badly after a while; but even if it doesn't, I'm still interested in the theoretical case.
If I instead just mmap myself a huge anonymous mapping from which I allocate the regions on demand, is there a way to "punch holes" through that mapping for a region that I'm done with? Kind of like madvise(MADV_DONTNEED), but with the difference that the pages should be considered deleted, so that the kernel doesn't actually need to keep their contents anywhere but can just reuse zeroed pages whenever they are faulted again.
I'm using Linux, and in this case I'm not bothered by using Linux-specific calls.
I did a lot of research into this topic (for a different use) at some point. In my case I needed a large hashmap that was very sparsely populated + the ability to zero it every now and then.
mmap solution:
The easiest solution to zero a mapping like this (and the portable one; madvise(MADV_DONTNEED) is Linux-specific) is to mmap a new anonymous mapping over it:

/* initial mapping */
void *mapping = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

/* ... use the mapping ... */

/* zero certain pages by mapping fresh anonymous memory over them */
mmap((char *)mapping + page_aligned_offset, length_to_zero,
     PROT_READ | PROT_WRITE, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

The last call is performance-wise equivalent to munmap followed by a fresh mmap with MAP_FIXED, but is atomic and therefore thread-safe.
Performance-wise, the problem with this solution is that the pages have to be faulted in again on subsequent write accesses, each of which raises a page fault and a switch into the kernel. This is only efficient if very few pages were faulted in in the first place.
memset solution:
After getting such crap performance when most of the mapping had to be unmapped, I decided to zero the memory manually with memset. If roughly over 70% of the pages are already faulted in (and if not, they are after the first round of memset), then this is faster than remapping those pages.
mincore solution:
My next idea was to memset only those pages that had been faulted in before. This solution is NOT thread-safe. Calling mincore to determine whether a page is faulted in, and then selectively memsetting those pages to zero, was a significant performance improvement up to the point where over 50% of the mapping was faulted in, at which point memsetting the entire mapping became simpler (mincore is a system call and requires a context switch).
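In outline, that selective memset looks something like this (my sketch, untested; it assumes the mapping is page-aligned and its length is a multiple of the page size):

#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Zero only those pages of [mapping, mapping + length) that mincore(2)
 * reports as resident. Not thread-safe: pages can be faulted in between
 * the mincore() call and the memset(). */
int zero_resident_pages(void *mapping, size_t length)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = length / page_size;
    unsigned char vec[npages];  /* one status byte per page */

    if (mincore(mapping, length, vec) == -1)
        return -1;

    for (size_t i = 0; i < npages; i++) {
        if (vec[i] & 1)  /* low bit set => page is resident */
            memset((char *)mapping + i * page_size, 0, page_size);
    }
    return 0;
}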
incore table solution:
My final approach was to keep my own in-core table (one bit per page) that says whether the page has been used since the last wipe. This is by far the most efficient way, since in each round you only zero the pages you actually used. It obviously isn't thread-safe either, and it requires you to track in user space which pages have been written to, but if you need the performance, this is the way to go; a bare-bones version is sketched below.
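A bare-bones version of such a table might look like this (my sketch; the caller must call mark_written() on every write, and nothing here is thread-safe):

#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12  /* assuming 4 KiB pages */

typedef struct {
    char    *base;    /* start of the big mapping */
    size_t   npages;
    uint8_t *used;    /* (npages + 7) / 8 bytes, one bit per page */
} incore_table;

/* Record that the page containing this offset has been written. */
static void mark_written(incore_table *t, size_t offset)
{
    size_t page = offset >> PAGE_SHIFT;
    t->used[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Zero exactly the pages used since the last wipe, then reset the table. */
static void wipe_used_pages(incore_table *t)
{
    for (size_t i = 0; i < t->npages; i++) {
        if (t->used[i / 8] & (1u << (i % 8)))
            memset(t->base + (i << PAGE_SHIFT), 0, (size_t)1 << PAGE_SHIFT);
    }
    memset(t->used, 0, (t->npages + 7) / 8);
}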
I don't see why doing lots of calls to mmap/munmap should be that bad. The lookup performance in the kernel for mappings should be O(log n).
As Linux seems to implement things right now, your only option for punching holes in a mapping is mprotect(PROT_NONE), and that still fragments the mappings in the kernel, so it's mostly equivalent to mmap/munmap, except that something else won't be able to steal that VM range from you. What you'd really want is madvise(MADV_REMOVE), or, as it's called in BSD, madvise(MADV_FREE). That is explicitly designed to do exactly what you want: the cheapest way to reclaim pages without fragmenting the mappings. But at least according to the man page on my two flavors of Linux, it's not fully implemented for all kinds of mappings.
Disclaimer: I'm mostly familiar with the internals of BSD VM systems, but this should be quite similar on Linux.
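(For reference, a hedged sketch of what such a hole-punching helper could look like, falling back to MADV_DONTNEED where MADV_FREE isn't available; note that MADV_FREE frees pages lazily, while MADV_DONTNEED on anonymous private mappings yields zero-filled pages on the next fault:)

#include <sys/mman.h>

/* Tell the kernel the pages' contents no longer matter. */
static int punch_hole(void *addr, size_t length)
{
#ifdef MADV_FREE
    if (madvise(addr, length, MADV_FREE) == 0)
        return 0;
#endif
    return madvise(addr, length, MADV_DONTNEED);
}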
As in the discussion in comments below, surprisingly enough MADV_DONTNEED seems to do the trick:
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>

int
main(int argc, char **argv)
{
    int ps = getpagesize();
    struct rusage ru = {0};
    char *map;
    int n = 15;
    int i;

    if ((map = mmap(NULL, ps * n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)) == MAP_FAILED)
        err(1, "mmap");

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }
    printf("unnecessary printf to fault stuff in: %d %ld\n", map[0], ru.ru_minflt);

    /* Unnecessary call to madvise to fault in that part of libc. */
    if (madvise(&map[ps], ps, MADV_NORMAL) == -1)
        err(1, "madvise");
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_NORMAL, before touching pages: %d %ld\n", map[0], ru.ru_minflt);

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_NORMAL, after touching pages: %d %ld\n", map[0], ru.ru_minflt);

    if (madvise(map, ps * n, MADV_DONTNEED) == -1)
        err(1, "madvise");
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_DONTNEED, before touching pages: %d %ld\n", map[0], ru.ru_minflt);

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_DONTNEED, after touching pages: %d %ld\n", map[0], ru.ru_minflt);

    return 0;
}
I'm measuring ru_minflt as a proxy for how many pages we needed to allocate (this isn't exactly true, but the next sentence makes it more plausible). We can see that we get new pages in the third printf, because the contents of map[0] read back as 0.

Strategy for recovering from NULL == malloc() due to memory exhaustion

Reading Martin Sustrik's blog on the challenges of preventing "undefined behavior" in C++, vs. C, in particular the problem of malloc() failing due to memory exhaustion, I was reminded of the many, many times I have been frustrated, not knowing what to do in such cases.
With virtual memory systems such conditions are rare, but on embedded platforms, or where the performance degradation of hitting the virtual memory system equates to failure, as in Martin's case with ZeroMQ, I resolved to find a workable solution, and did.
I wanted to ask the readers of StackOverflow if they've tried this approach, and what their experience with it was.
The solution is to allocate a chunk of spare memory off the heap with a call to malloc() at the start of the program, and then use that pool of spare memory to stave off memory exhaustion when and if it occurs. The idea is to prevent capitulation in favor of an orderly retreat (I was reading the accounts of Kesselring's defense of Italy last night) where error messages and IP sockets and such will work long enough to (hopefully) at least tell the user what happened.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <tchar.h>

#define SPARE_MEM_SIZE (1<<20)  // reserve a megabyte
static void *gSpareMem;

// ------------------------------------------------------------------------------------------------
void *tenacious_malloc(int requested_allocation_size) {
    static int remaining_spare_size = 0;  // SPARE_MEM_SIZE;
    char err_msg[512];
    void *rtn = NULL;

    // attempt to re-establish the full size of spare memory, if it needs it
    if (SPARE_MEM_SIZE != remaining_spare_size) {
        if (NULL != (gSpareMem = realloc(gSpareMem, SPARE_MEM_SIZE))) {
            remaining_spare_size = SPARE_MEM_SIZE;
            // "touch" the memory so the O/S will allocate physical memory
            memset(gSpareMem, 0, SPARE_MEM_SIZE);
            printf("\nSize of spare memory pool restored successfully in %s:%s at line %i :)\n",
                __FILE__, __FUNCTION__, __LINE__);
        } else {
            printf("\nUnable to restore size of spare memory buffer.\n");
        }
    }
    // attempt a plain, old vanilla malloc() and test for failure
    if (NULL != (rtn = malloc(requested_allocation_size))) {
        return rtn;
    } else {
        sprintf(err_msg, "\nInitial call to malloc() failed in %s:%s at line %i",
            __FILE__, __FUNCTION__, __LINE__);
        if (remaining_spare_size < requested_allocation_size) {
            // not enough spare storage to satisfy the request, so no point in trying
            printf("%s\nRequested allocation larger than remaining pool. :(\n\t --- ABORTING --- \n", err_msg);
            return NULL;
        } else {
            // take the needed storage from spare memory
            printf("%s\nRetrying memory allocation....\n", err_msg);
            remaining_spare_size -= requested_allocation_size;
            if (NULL != (gSpareMem = realloc(gSpareMem, remaining_spare_size))) {
                if (NULL != (rtn = malloc(requested_allocation_size))) {
                    printf("Allocation from spare pool succeeded in %s:%s at line %i :)\n",
                        __FILE__, __FUNCTION__, __LINE__);
                    return rtn;
                } else {
                    remaining_spare_size += requested_allocation_size;
                    printf("\nRetry of malloc() after realloc() of spare memory pool "
                        "failed in %s:%s at line %i :(\n", __FILE__, __FUNCTION__, __LINE__);
                    return NULL;
                }
            } else {
                printf("\nRetry failed.\nUnable to allocate requested memory from spare pool. :(\n");
                return NULL;
            }
        }
    }
}

// ------------------------------------------------------------------------------------------------
int _tmain(int argc, _TCHAR* argv[]) {
    int *IntVec = NULL;
    double *DblVec = NULL;
    char *pString = NULL;
    char String[] = "Every good boy does fine!";

    IntVec = (int *) tenacious_malloc(100 * sizeof(int));
    DblVec = (double *) tenacious_malloc(100 * sizeof(double));
    pString = (char *) tenacious_malloc(100 * sizeof(String));

    strcpy(pString, String);
    printf("\n%s", pString);

    printf("\nHit Enter to end program.");
    getchar();
    return 0;
}
The best strategy is to aim for code that works without allocations. In particular, for a correct, robust program, all failure paths must be failure-case-free, which means you can't use allocation in failure paths.
My preference, whenever possible, is to avoid any allocations once an operation has started, instead determining the storage needed and allocating it all prior to the start of the operation. This can greatly simplify program logic and makes testing much easier (since there's a single point of possible failure you have to test). Of course it can also be more expensive in other ways; for example, you might have to make two passes over input data to determine how much storage you will need and then process it using the storage.
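As a toy illustration of that pattern (mine, not from the answer): measure first, allocate once, then do work that cannot fail.

#include <stdlib.h>
#include <string.h>

/* Concatenate n strings; all sizing happens up front, so the single
 * malloc() is the only possible failure point. */
char *concat_all(const char **parts, size_t n)
{
    size_t total = 1;                 /* pass 1: measure */
    for (size_t i = 0; i < n; i++)
        total += strlen(parts[i]);

    char *out = malloc(total);        /* single point of failure */
    if (out == NULL)
        return NULL;

    char *p = out;                    /* pass 2: no allocations, cannot fail */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(p, parts[i], len);
        p += len;
    }
    *p = '\0';
    return out;
}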
In regards to your solution of pre-allocating some emergency storage to use once malloc fails, there are basically two versions of this:
Simply calling free on the emergency storage then hoping malloc works again afterwards.
Going through your own wrapper layer for everything where the wrapper layer can directly use the emergency storage without ever freeing it.
The first approach has the advantage that even standard library and third-party library code can utilize the emergency space, but it has the disadvantage that the freed storage could be stolen by other processes, or threads in your own process, racing for it. If you're sure the memory exhaustion will come from exhausting virtual address space (or process resource limits) rather than system resources, and your process is single-threaded, you don't have to worry about the race, and you can fairly safely assume this approach will work. However, in general, the second approach is much safer, because you have an absolute guarantee that you can obtain the desired amount of emergency storage.
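A bare-bones sketch of the first version (mine; subject to the races discussed above, and single-threaded by assumption):

#include <stdlib.h>

static void *emergency_reserve;  /* allocated once at program start */

void reserve_init(size_t size)
{
    emergency_reserve = malloc(size);
}

/* malloc() wrapper: on failure, release the reserve and retry once. */
void *malloc_with_reserve(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && emergency_reserve != NULL) {
        free(emergency_reserve);   /* hand the reserve back to the heap */
        emergency_reserve = NULL;
        p = malloc(size);          /* hope the allocator reuses that space */
    }
    return p;
}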
I don't really like either of these approaches, but they may be the best you can do.
On a modern 64-bit computer, you can malloc significantly more memory than you have RAM. In practice, malloc doesn't fail. What happens instead is that your application starts thrashing: once you have, say, 4 GB of RAM and your allocations exceed that, your performance drops to zero because you are swapping like mad. Performance degrades so badly that you never get to the point where malloc can't return memory.

Linux is not allowing me to access a fixed region of memory

I have some data stored in a FLASH memory that I need to access with C pointers to be able to make a non-Linux graphics driver work (I think this requirement is DMA related, not sure). Calling read works, but I don't want to have intermediate RAM buffers between the FLASH and the non-Linux driver.
However, just creating a pointer and storing the address I want in it makes Linux raise an invalid access exception:
void *ptr = (void *)0xdeadbeef;
int a = *(int *)ptr; // invalid access!
What am I missing here? And could someone point me to some material that makes these concepts clear for me?
I'm reading about mmap but I'm not sure that this is what I need.
The problem you have is that Linux runs your program in a virtual address space. Every address you use directly in the code (like 0xdeadbeef) is a virtual address that gets translated by the memory management unit into a physical address, which is not necessarily the same as your virtual address. This allows easy separation of multiple independent processes and enables other features like paging.
The problem now is that in your case no physical address is mapped to the virtual address 0xdeadbeef, causing the kernel to abort execution.
The mmap call you already found asks the kernel to map a specific file (from a specific offset) into a virtual address of your process. Note that the address mmap returns could be completely different, so don't make any assumptions about the virtual address you get.
That's why there are examples out there using mmap with /dev/mem, where the offset into the memory device is the physical address. After the kernel has mapped the file from the offset you gave to a virtual address of your process, you can access that memory area as if it were a direct access.
Once you don't need the area anymore, don't forget to munmap it. Otherwise you'll cause something similar to a memory leak.
One problem with the /dev/mem method is that the user running the process needs access to this device, which could introduce a security issue (e.g. Samsung recently introduced such a security hole in their handheld devices).
A more secure way is the one described in an article I found (The Userspace I/O HOWTO), as you still have control over the memory areas accessible by the user's process.
You need to access the memory differently. Basically, you need to open /dev/mem and use mmap(), as you suggested. Simple example:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

static unsigned int mask;
static void *mmap_base_address;

int openMem(unsigned int address, unsigned int size)
{
    int mmapFD;
    int page_size;
    unsigned int page_start_address;

    /* Offset mask for the mmapped region; assumes size is a power of two. */
    mask = size - 1;

    /* Get the page size. */
    page_size = (int) sysconf(_SC_PAGE_SIZE);

    /* We have to map shared memory to the beginning of a memory page, so
     * adjust the memory address accordingly. */
    page_start_address = address - (address % page_size);

    /* Open the file that will be mapped. */
    if ((mmapFD = open("/dev/mem", (O_RDWR | O_SYNC))) == -1)
    {
        printf("Opening shared memory device failed\n");
        return -1;
    }

    mmap_base_address = mmap(0, size, (PROT_READ|PROT_WRITE), MAP_SHARED,
                             mmapFD, (off_t)page_start_address & ~mask);
    if (mmap_base_address == MAP_FAILED)
    {
        printf("Mapping memory failed\n");
        return -1;
    }
    return 0;
}

unsigned int *getAddress(unsigned int address)
{
    /* Offset within the mapped window, added to the mapping base. */
    return (unsigned int *)((char *)mmap_base_address + (address & mask));
}
...
result = openMem(address, 0x10000);
if (result < 0)
    return result;

target_address = getAddress(address);
*target_address = value;
This would write "value" to the physical address "address".
You need to call ioremap - something like:
void *myaddr = ioremap(0xdeadbeef, size);
where size is the size of your memory region. You probably want to use a page-aligned address for the first argument, e.g. 0xdeadb000 - but I expect your actual device isn't at "0xdeadbeef" anyway.
Edit: The call to ioremap must be done from a driver!
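Inside a driver, the usage would look roughly like this (a sketch; the physical base address and size here are placeholders):

#include <linux/io.h>

/* Map the physical range, access it through the __iomem accessors,
 * and unmap when done. 0xdeadb000 is a placeholder base address. */
void __iomem *regs = ioremap(0xdeadb000, size);
if (regs) {
    u32 v = ioread32(regs);  /* read the first 32-bit word */
    iowrite32(v, regs);      /* example write-back */
    iounmap(regs);
}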

How to find if the underlying Linux Kernel supports Copy on Write?

For example, I am working on an ancient kernel and want to know whether it really implements Copy on Write. Is there a way (preferably programmatically, in C) to find out?
No, there isn't a reliable programmatic way to find that out from within a userland process.
The idea behind COW is that it should be fully transparent to the user code. Your code touches the individual pages, a page fault is invoked, the kernel copies the corresponding page and your process is resumed as if nothing had happened.
I casually stumbled upon this rather old question, and I see that other people have already pointed out that it does not make much sense to "detect CoW", since Linux always employs CoW.
However, I find this question pretty interesting, and while technically one should not be able to detect this kind of kernel mechanism, which should be completely transparent to userspace processes, there actually are architecture-specific ways (i.e. side channels) that can be exploited to determine whether Copy on Write happens or not.
On x86 processors that support Restricted Transactional Memory, you can leverage the fact that memory transactions are aborted when an exception such as a page fault occurs. Given a valid address, this information can be used to detect whether a page is resident in memory (similarly to the use of mincore(2)), or even to detect Copy on Write.
Here's a working example. Note: check that your processor supports RTM by looking at /proc/cpuinfo for the rtm flag, and compile using GCC without optimizations and with the -mrtm flag.
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <immintrin.h>

/* Use x86 transactional memory to detect a page fault when trying to write
 * at the specified address, assuming it's a valid address.
 */
static int page_dirty(void *page) {
    unsigned char *p = page;

    if (_xbegin() == _XBEGIN_STARTED) {
        *p = 0;
        _xend();
        /* Transaction successfully ended => no context switch happened to
         * copy page into virtual memory of the process => page was dirty.
         */
        return 1;
    } else {
        /* Transaction aborted => page fault happened and context was switched
         * to copy page into virtual memory of the process => page wasn't dirty.
         */
        return 0;
    }
    /* Should not happen! */
    return -1;
}

int main(void) {
    unsigned char *addr;

    addr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    // Write to trigger initial page fault and actually reserve memory
    *addr = 123;

    fprintf(stderr, "Initial state : %d\n", page_dirty(addr));
    fputs("----- fork -----\n", stderr);

    if (fork()) {
        fprintf(stderr, "Parent before : %d\n", page_dirty(addr));
        // Read (should NOT trigger Copy on Write)
        *addr;
        fprintf(stderr, "Parent after R: %d\n", page_dirty(addr));
        // Write (should trigger Copy on Write)
        *addr = 123;
        fprintf(stderr, "Parent after W: %d\n", page_dirty(addr));
    } else {
        fprintf(stderr, "Child before : %d\n", page_dirty(addr));
        // Read (should NOT trigger Copy on Write)
        *addr;
        fprintf(stderr, "Child after R : %d\n", page_dirty(addr));
        // Write (should trigger Copy on Write)
        *addr = 123;
        fprintf(stderr, "Child after W : %d\n", page_dirty(addr));
    }
    return 0;
}
Output on my machine:
Initial state : 1
----- fork -----
Parent before : 0
Parent after R: 0
Parent after W: 1
Child before : 0
Child after R : 0
Child after W : 1
As you can see, writing to pages marked as CoW (in this case, after fork) causes the transaction to fail, because a page fault exception is triggered and aborts the transaction. The changes are reverted by hardware before the transaction aborts. After writing to the page, trying the same thing again results in the transaction completing correctly and the function returning 1.
Of course, this should not really be used seriously, but merely be taken as a fun and interesting exercise. Since RTM transactions are aborted for any kind of exception and also for context switch, false negatives are possible (for example if the process is preempted by the kernel right in the middle of the transaction). Keeping the transaction code really short (in the above case just a branch and an assignment *p = 0) is essential. Multiple tests could also be made to avoid false negatives.
