Strategy for recovering from NULL == malloc() due to memory exhaustion - c

Reading Martin Sústrik's blog on the challenges of preventing "undefined behavior" in C++ versus C, in particular the problem of malloc() failing due to memory exhaustion, I was reminded of the many, many times I have been frustrated not knowing what to do in such cases.
With virtual memory systems such conditions are rare, but on embedded platforms, or where the performance degradation of hitting the virtual memory system amounts to failure (as in Martin's case with ZeroMQ), I resolved to find a workable solution, and did.
I wanted to ask the readers of StackOverflow whether they've tried this approach, and what their experience with it was.
The solution is to allocate a chunk of spare memory from the heap with a call to malloc() at the start of the program, and then draw on that pool of spare memory to stave off memory exhaustion when and if it occurs. The idea is to prevent capitulation in favor of an orderly retreat (I was reading accounts of Kesselring's defense of Italy last night), in which error messages, IP sockets, and the like keep working long enough to (hopefully) at least tell the user what happened.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <tchar.h>   // Visual C++ specific: _tmain / _TCHAR

#define SPARE_MEM_SIZE (1<<20)   // reserve a megabyte
static void *gSpareMem;

// ------------------------------------------------------------------------------------------------
void *tenacious_malloc(size_t requested_allocation_size) {
    static size_t remaining_spare_size = 0;   // SPARE_MEM_SIZE;
    char err_msg[512];
    void *rtn = NULL;

    // attempt to re-establish the full size of spare memory, if it needs it
    if (SPARE_MEM_SIZE != remaining_spare_size) {
        if (NULL != (gSpareMem = realloc(gSpareMem, SPARE_MEM_SIZE))) {
            remaining_spare_size = SPARE_MEM_SIZE;
            // "touch" the memory so the O/S will allocate physical memory
            memset(gSpareMem, 0, SPARE_MEM_SIZE);
            printf("\nSize of spare memory pool restored successfully in %s:%s at line %i :)\n",
                   __FILE__, __FUNCTION__, __LINE__);
        } else {
            printf("\nUnable to restore size of spare memory buffer.\n");
        }
    }
    // attempt a plain, old vanilla malloc() and test for failure
    if (NULL != (rtn = malloc(requested_allocation_size))) {
        return rtn;
    } else {
        sprintf(err_msg, "\nInitial call to malloc() failed in %s:%s at line %i",
                __FILE__, __FUNCTION__, __LINE__);
        if (remaining_spare_size < requested_allocation_size) {
            // not enough spare storage to satisfy the request, so no point in trying
            printf("%s\nRequested allocation larger than remaining pool. :(\n\t --- ABORTING --- \n", err_msg);
            return NULL;
        } else {
            // take the needed storage from spare memory
            printf("%s\nRetrying memory allocation....\n", err_msg);
            remaining_spare_size -= requested_allocation_size;
            if (NULL != (gSpareMem = realloc(gSpareMem, remaining_spare_size))) {
                if (NULL != (rtn = malloc(requested_allocation_size))) {
                    printf("Allocation from spare pool succeeded in %s:%s at line %i :)\n",
                           __FILE__, __FUNCTION__, __LINE__);
                    return rtn;
                } else {
                    remaining_spare_size += requested_allocation_size;
                    printf("\nRetry of malloc() after realloc() of spare memory pool "
                           "failed in %s:%s at line %i :(\n", __FILE__, __FUNCTION__, __LINE__);
                    return NULL;
                }
            } else {
                printf("\nRetry failed.\nUnable to allocate requested memory from spare pool. :(\n");
                return NULL;
            }
        }
    }
}
// ------------------------------------------------------------------------------------------------
int _tmain(int argc, _TCHAR* argv[]) {
    int *IntVec = NULL;
    double *DblVec = NULL;
    char *pString = NULL;
    char String[] = "Every good boy does fine!";

    IntVec = (int *) tenacious_malloc(100 * sizeof(int));
    DblVec = (double *) tenacious_malloc(100 * sizeof(double));
    pString = (char *) tenacious_malloc(100 * sizeof(String));

    strcpy(pString, String);
    printf("\n%s", pString);

    printf("\nHit Enter to end program.");
    getchar();
    return 0;
}

The best strategy is to aim for code that works without allocations. In particular, for a correct, robust program, all failure paths must themselves be allocation-free, which means you can't allocate memory while handling a failure.
My preference, whenever possible, is to avoid any allocations once an operation has started, instead determining the storage needed and allocating it all prior to the start of the operation. This can greatly simplify program logic and makes testing much easier (since there's a single point of possible failure you have to test). Of course it can also be more expensive in other ways; for example, you might have to make two passes over the input data: one to determine how much storage you will need, and one to process the data using that storage.
As for your solution of pre-allocating some emergency storage to use once malloc fails, there are basically two versions of this:
Simply calling free on the emergency storage then hoping malloc works again afterwards.
Going through your own wrapper layer for everything where the wrapper layer can directly use the emergency storage without ever freeing it.
The first approach has the advantage that even standard library and third-party library code can use the emergency space, but the disadvantage that the freed storage could be stolen by other processes, or by other threads in your own process, racing for it. If you're sure the memory exhaustion will come from exhausting virtual address space (or process resource limits) rather than system resources, and your process is single-threaded, you don't have to worry about the race and can fairly safely assume this approach will work. In general, though, the second approach is much safer, because you have an absolute guarantee that you can obtain the desired amount of emergency storage.
I don't really like either of these approaches, but they may be the best you can do.
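To make the first version concrete, here is a minimal sketch of a reserve-releasing wrapper (the names reserve_init and reserve_malloc are made up for illustration, and this deliberately ignores the cross-thread/cross-process race described above):

#include <stdlib.h>

#define RESERVE_SIZE (1 << 20)      /* 1 MiB emergency block (arbitrary) */
static void *g_reserve;             /* hypothetical emergency storage */

void reserve_init(void) {
    g_reserve = malloc(RESERVE_SIZE);
}

void *reserve_malloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL && g_reserve != NULL) {
        /* hand the reserve back to the allocator and hope the retry
           can reuse it; racy with other threads and processes */
        free(g_reserve);
        g_reserve = NULL;
        p = malloc(n);
    }
    return p;
}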

On a modern 64-bit computer, you can malloc significantly more memory than you have RAM. In practice, malloc doesn't fail. What happens instead is that your application starts thrashing: once your allocations exceed your physical RAM (say, 4 GB), performance drops to zero because you are swapping like mad. Performance degrades so badly that you never reach the point where malloc can't return memory.
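For what it's worth, this is easy to observe on a 64-bit Linux machine with overcommit enabled (the 64 GiB figure here is arbitrary, and the outcome depends on the vm.overcommit_memory setting):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t huge = (size_t)64 << 30;   /* 64 GiB, far more than typical RAM */
    void *p = malloc(huge);
    /* under overcommit this often succeeds; the pages only become real
       (and the thrashing or OOM killing starts) once they are touched */
    printf("malloc(64 GiB) %s\n", p ? "succeeded" : "failed");
    free(p);
    return 0;
}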

Related

Physical memory management in Userspace?

I am working on an embedded device with only 512 MB of RAM, running the Linux kernel. I want to do the memory management of all the processes running in userspace with my own library. Is it possible to do so? From my understanding, memory management is done by the kernel; is it possible to have that functionality in user space?
If your embedded device runs Linux, it has an MMU. Controlling the MMU is normally a privileged operation, so only an operating system kernel has access to it. Therefore the answer is: no, you can't.
Of course you can write software running directly on the device, without operating system, but I guess that's not what you wanted. You should probably take one step back, ask yourself what gave you the idea about the memory management and what could be a better way to solve this original problem.
You can consider using setrlimit; see the related Q&A.
I wrote the test code below and ran it on my PC. I can see that memory usage is limited; the exact relationship between the units requires further analysis.
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(int argc, char *argv[])
{
    long limitSize = 1;
    long testSize = 140000;

    // 1. BEFORE: getrlimit
    {
        struct rlimit asLimit;
        getrlimit(RLIMIT_AS, &asLimit);
        printf("BEFORE: rlimit(RLIMIT_AS) = %ld,%ld\n",
               (long)asLimit.rlim_cur, (long)asLimit.rlim_max);
    }

    // 2. BEFORE: test malloc
    {
        char *xx = malloc(testSize);
        if (xx == NULL)
            perror("malloc FAIL");
        else
            printf("malloc(%ld) OK\n", testSize);
        free(xx);
    }

    // 3. setrlimit
    {
        struct rlimit newLimit;
        newLimit.rlim_cur = limitSize;
        newLimit.rlim_max = limitSize;
        setrlimit(RLIMIT_AS, &newLimit);
    }

    // 4. AFTER: getrlimit
    {
        struct rlimit asLimit;
        getrlimit(RLIMIT_AS, &asLimit);
        printf("AFTER: rlimit(RLIMIT_AS) = %ld,%ld\n",
               (long)asLimit.rlim_cur, (long)asLimit.rlim_max);
    }

    // 5. AFTER: test malloc
    {
        char *xx = malloc(testSize);
        if (xx == NULL)
            perror("malloc FAIL");
        else
            printf("malloc(%ld) OK\n", testSize);
        free(xx);
    }
    return 0;
}
Result:
BEFORE: rlimit(RLIMIT_AS) = -1,-1
malloc(140000) OK
AFTER: rlimit(RLIMIT_AS) = 1,1
malloc FAIL: Cannot allocate memory
From what I understand of your question, you want to somehow use your own library for handling the memory of userspace processes. I presume you are doing this to make sure that rogue processes don't use too much memory, which allows your process to use as much memory as is available. I believe this idea is flawed.
For example, imagine this scenario:
Total memory 512MB
Process 1 limit of 128MB - Uses 64MB
Process 2 limit of 128MB - Uses 64MB
Process 3 limit of 256MB - Uses 256MB then runs out of memory, when in fact 128MB is still available.
I know you THINK this is the answer to your problem, and on 'normal' embedded systems, this would probably work, but you are using a complex kernel, running processes you don't have total control over. You should write YOUR software to be robust when memory gets tight because that is all you can control.

How do userspace programs pass memory back to the kernel after free()?

I've been reading a lot about memory allocation on the heap and how certain heap management allocators do it.
Say I have the following program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    // allocate 4 gigabytes of RAM
    void *much_mems = malloc(4294967296);
    // sleep for 10 minutes
    sleep(600);
    // free teh ram
    free(much_mems);
    // sleep some moar
    sleep(600);
    return 0;
}
Let's say for the sake of argument that my compiler doesn't optimize out anything above, that I can actually allocate 4 GiB of RAM, that the malloc() call returns an actual pointer and not NULL, that size_t can hold an integer as big as 4294967296 on my given platform, and that the allocator behind the malloc call actually does allocate that amount of RAM on the heap. Pretend that the above code does exactly what it looks like it will do.
After the call to free executes, how does the kernel know that those 4 GiB of RAM are now eligible for use for other processes and for the kernel itself? I'm not assuming the kernel is Linux, but that would be a good example. Before the call to free, this process has a heap size of at least 4GiB, and afterward, does it still have that heap size?
How do modern operating systems allow userspace programs to return memory back to kernel space? Do free implementations execute a syscall to the kernel (or many syscalls) to tell it which areas of memory are now available? And is it possible that my 4 GiB allocation will be non-contiguous?
Do free implementations execute a syscall to the kernel (or many syscalls) to tell it which areas of memory are now available?
Yes.
A modern implementation of malloc on Linux will call mmap to allocate a large amount of memory. The kernel will find an unused virtual address, mark it as allocated, and return it. (The kernel may also return an error if there isn't enough free memory.)
free would then call munmap to deallocate the memory, passing the address and size of the allocation.
On Windows, malloc will call VirtualAlloc and free will call VirtualFree.
On GNU/Linux with glibc, large memory allocations, of more than a few hundred kilobytes, are handled by calling mmap. When the free function is invoked on such a block, the library knows that the memory was allocated this way (thanks to metadata stored in a header), and simply calls munmap on it to release it. That's how the kernel knows; its mmap and munmap API is being used.
You can see these calls if you run strace on the program.
The kernel keeps track of all mmap-ed regions using a red-black tree. Given an arbitrary virtual address, it can quickly determine whether it lands in the mmap area, and which mapping, by performing a tree walk.
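To make that concrete, here is a minimal sketch of the pair of calls glibc makes on behalf of a large malloc()/free() (the 4 MiB size is arbitrary, chosen to be above the mmap threshold); running it under strace shows the mmap and munmap calls directly:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = (size_t)4 << 20;   /* 4 MiB */
    /* what malloc() does for a large request */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(p, 0xAB, len);           /* touch the pages */
    /* what free() does for such a block: hand it straight back */
    if (munmap(p, len) == -1)
        perror("munmap");
    return 0;
}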
Before the call to free, this process has a heap size of at least 4GiB...
The C language does not define either "heap" or "stack". Before the call to free, this process has a chunk of 4 GiB of dynamically allocated memory...
and afterward, does it still have that heap size?
...and after the free(), access to that memory would be undefined behaviour, so for practical purposes, that dynamically allocated memory is no longer "there".
What the library does "under the hood" (e.g. caching, see below) is up to the library, and is subject to change without further notice. This could change with the amount of available physical memory, system load, runtime parameters, ...
How do modern operating systems allow userspace programs to return memory back to kernel space?
It's up to the standard library's implementation to decide (which, of course, has to talk to the operating system to actually, physically allocate / free memory).
Others have pointed out how certain, existing implementations do it. Other libraries, operating systems, and environments exist.
Do free implementations execute a syscall to the kernel (or many syscalls) to tell it which areas of memory are now available?
Possibly. A common optimization done by library implementations is to "cache" free()d memory, so subsequent malloc() calls can be served without talking to the kernel (which is a costly operation). When, how much, and how long memory is cached this way is, you guessed it, implementation-defined.
And is it possible that my 4 GiB allocation will be non-contiguous?
The process will always "see" contiguous memory. On a system supporting virtual memory (i.e. "modern" desktop OSes like Linux or Windows), the physical memory might be non-contiguous, but the virtual addresses your process gets to see will be contiguous (or the malloc() would have failed if this requirement could not be serviced).
Again, other systems exist. You might be looking at a system that doesn't virtualize addresses (i.e. gives physical addresses to the process). You might be looking at a system that assigns a given amount of memory to a process on startup, serves any malloc() requests from that, and doesn't support the allocation of additional memory. And so on.
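One concrete illustration of the caching point above: glibc provides the nonstandard malloc_trim() (declared in <malloc.h>) to explicitly hand cached free()d heap memory back to the kernel. A minimal sketch, assuming glibc:

#include <malloc.h>   /* glibc-specific */
#include <stdlib.h>

int main(void) {
    void *p = malloc(512 * 1024);
    free(p);          /* likely cached by the allocator, not returned to the OS */
    malloc_trim(0);   /* ask glibc to release free heap pages to the kernel */
    return 0;
}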
If we're using Linux as an example, it uses mmap to allocate large chunks of memory. This means that when you free such a chunk, it gets unmapped, i.e. the kernel is told that it can now unmap this memory. Read up on the brk and sbrk system calls. A good place to start would be here...
What does brk( ) system call do?
and here. The following post discusses how malloc is implemented, which will give you a good idea of what's happening under the covers...
How is malloc() implemented internally?
Doug Lea's malloc can be found here. It's well commented and public domain...
ftp://g.oswego.edu/pub/misc/malloc.c
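Since brk and sbrk came up, a tiny sketch showing the program break moving (sbrk is obsolete in POSIX but still available on Linux, and handy for illustration):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);   /* current program break */
    sbrk(4096);               /* grow the heap by one page */
    void *after = sbrk(0);
    printf("break moved from %p to %p\n", before, after);
    sbrk(-4096);              /* shrink it again */
    return 0;
}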
malloc() and free() sit on top of kernel facilities (system calls); the application calls them to allocate and free memory on the heap.
The application itself is not the one handing out physical memory; the whole mechanism is ultimately backed at kernel level.
See the heap implementation code below:
void *heap_alloc(uint32_t nbytes) {
    heap_header *p, *prev_p;  // used to keep track of the current unit
    unsigned int nunits;      // the number of "allocation units" needed for nbytes of memory

    nunits = (nbytes + sizeof(heap_header) - 1) / sizeof(heap_header) + 1;  // how much we need for this call

    // check to see if the list has been created yet; start it if not
    if ((prev_p = _heap_free) == NULL) {
        _heap_base.s.next = _heap_free = prev_p = &_heap_base;  // point at the base of the memory
        _heap_base.s.alloc_sz = 0;                              // and set its allocation size to zero
    }

    // now enter a loop to find a block of memory
    for (p = prev_p->s.next;; prev_p = p, p = p->s.next) {
        // did we find a big enough block?
        if (p->s.alloc_sz >= nunits) {
            // the block is exactly the right length
            if (p->s.alloc_sz == nunits)
                prev_p->s.next = p->s.next;
            // the block needs to be cut
            else {
                p->s.alloc_sz -= nunits;
                p += p->s.alloc_sz;
                p->s.alloc_sz = nunits;
            }
            _heap_free = prev_p;
            return (void *)(p + 1);
        }
        // not enough space!! Try to get more from the kernel
        if (p == _heap_free) {
            // if the kernel has no more memory, return an error!
            if ((p = morecore()) == NULL)
                return NULL;
        }
    }
}
This heap_alloc function uses a morecore function, which is implemented as below:
heap_header *morecore() {
    char *cp;
    heap_header *up;

    cp = (char *)pmmngr_alloc_block();  // allocate more memory for the heap

    // if cp is null we have no memory left
    if (cp == NULL)
        return NULL;

    // map the block's virtual address to its physical address
    vmmngr_mapPhysicalAddress(vmmngr_get_directory(), _virt_addr, (uint32_t)cp,
                              I86_PTE_PRESENT | I86_PTE_WRITABLE);
    _virt_addr += BLOCK_SIZE;  // advance by BLOCK_SIZE; this will be our next allocation address

    up = (heap_header *)cp;
    up->s.alloc_sz = BLOCK_SIZE;
    heap_free((void *)(up + 1));
    return _heap_free;
}
As you can see, this function asks the physical memory manager to allocate a block:
cp = (char *)pmmngr_alloc_block();
and then maps the allocated block into virtual memory:
vmmngr_mapPhysicalAddress(vmmngr_get_directory(), _virt_addr, (uint32_t)cp, I86_PTE_PRESENT | I86_PTE_WRITABLE);
As you can see, the whole story is controlled by the heap manager at kernel level.

c program to calculate the amount of memory usage by the system?

How do I know the amount of memory used, i.e. RAM usage?
#include <stdio.h>

int main()
{
    int i = 0;
    for (i = 0; i < 100; i++)
    {
        printf("%d\n", i);
    }
    return 0;
}
I want to write code which calculates the amount of memory used by this program, maybe like:
#include <stdio.h>

int main()
{
    int i = 0;
    for (i = 0; i < 100; i++)
    {
        printf("%d\n", i);
    }
    printf("Amount of memory consumed=%f", SOME_FUNCTION());
    return 0;
}
The getrusage system call will return a handful of information for the current process, among which is the "resident set size":
struct rusage usage;
if (!getrusage(RUSAGE_SELF, &usage)) {
    printf("Maximum resident set size (KB): %ld\n", usage.ru_maxrss);
} else {
    perror("getrusage");
}
This size equates to the amount of memory that is physically wired to the process, not the entire size of the virtual address space, parts of which might be paged out or never loaded.
It's not easy to check how much memory your program uses on a Linux system, but most likely what you want to check is the value of VmRSS in /proc/[pid]/status (or the second column of /proc/[pid]/statm). VmRSS ("resident set size") is the amount of memory your process currently uses.
Other than that, you may be interested in VmSize from /proc/[pid]/status (or the first column of /proc/[pid]/statm). This is the total memory your process uses, including memory swapped out, memory used by shared libraries, and memory-mapped resources (which, in general, don't consume real RAM).
To get the PID of your process, use getpid(). From within your process you could also check /proc/self/status, as in the sketch below.
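A minimal sketch of reading VmRSS from /proc/self/status (Linux-specific; the helper name get_vm_rss_kb is made up, and the field is reported in kB):

#include <stdio.h>
#include <string.h>

/* returns resident set size in kB, or -1 on failure */
long get_vm_rss_kb(void) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long rss = -1;
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &rss);
            break;
        }
    }
    fclose(f);
    return rss;
}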
A simple approach is to create a wrapper function for memory allocation and freeing, call the wrapper, and keep the memory-usage accounting in it. This can only be used for dynamic memory allocation, e.g.:
#define ALLOC 1
#define FREE  2

void mem_op(void *pointer, int size, int operation)
{
    static int mem_used;  // running total of allocated bytes
    (void)pointer;
    switch (operation)
    {
    case ALLOC:
        // call malloc() or calloc() here
        mem_used = mem_used + size;
        break;
    case FREE:
        // call free() here
        mem_used = mem_used - size;
        break;
    }
}
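Call sites would then look something like this (hypothetical usage, with the wrapper only doing the accounting):

char *p = malloc(100);
mem_op(p, 100, ALLOC);   /* record the allocation */
/* ... use p ... */
mem_op(p, 100, FREE);    /* record the release */
free(p);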

Is it possible to "punch holes" through mmap'ed anonymous memory?

Consider a program which uses a large number of roughly page-sized memory regions (say 64 kB or so), each of which is rather short-lived. (In my particular case, these are alternate stacks for green threads.)
How would one best allocate these regions, such that their pages can be returned to the kernel once a region isn't in use anymore? The naïve solution would clearly be to simply mmap each of the regions individually, and munmap them again as soon as I'm done with them. I feel this is a bad idea, though, since there are so many of them. I suspect that the VMM may start scaling badly after a while; but even if it doesn't, I'm still interested in the theoretical case.
If I instead just mmap myself a huge anonymous mapping from which I allocate the regions on demand, is there a way to "punch holes" through that mapping for a region that I'm done with? Kind of like madvise(MADV_DONTNEED), but with the difference that the pages should be considered deleted, so that the kernel doesn't actually need to keep their contents anywhere but can just reuse zeroed pages whenever they are faulted again.
I'm using Linux, and in this case I'm not bothered by using Linux-specific calls.
I did a lot of research into this topic (for a different use) at some point. In my case I needed a large hashmap that was very sparsely populated + the ability to zero it every now and then.
mmap solution:
The easiest solution to zero a mapping like this, and one that is portable (madvise(MADV_DONTNEED) is Linux-specific), is to mmap a new mapping over it:
// map a large anonymous region
void *mapping = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// use the mapping
// zero certain pages by mapping fresh anonymous memory over them
mmap((char *)mapping + page_aligned_offset, length, PROT_READ | PROT_WRITE,
     MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
The last call is performance-wise equivalent to a subsequent munmap/mmap with MAP_FIXED, but is thread-safe.
Performance-wise the problem with this solution is that the pages have to be faulted in again on subsequent write access, which triggers a fault and a context switch. This is only efficient if very few pages were faulted in in the first place.
memset solution:
After getting such poor performance when most of the mapping had to be unmapped, I decided to zero the memory manually with memset. If roughly over 70% of the pages are already faulted in (and if not, they are after the first round of memset), then this is faster than remapping those pages.
mincore solution:
My next idea was to memset only those pages that had been faulted in before. This solution is NOT thread-safe. Calling mincore to determine whether a page is faulted in, and then selectively memsetting those pages to zero, was a significant performance improvement until over 50% of the mapping was faulted in, at which point memsetting the entire mapping became simpler (mincore is a system call and requires a context switch).
incore table solution:
My final approach was to keep my own in-core table (one bit per page) that says whether a page has been used since the last wipe. This is by far the most efficient way, since in each round you only zero the pages you actually used. It obviously also is not thread-safe and requires you to track in user space which pages have been written to, but if you need this performance then it is by far the most efficient approach.
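A rough sketch of the bookkeeping for that last approach (one bit per page; NPAGES, mark_used, and wipe are made-up names, and, as noted, none of this is thread-safe):

#include <stdint.h>
#include <string.h>

#define NPAGES 4096
static uint8_t used_bitmap[NPAGES / 8];   /* one bit per page */

/* record a write to the given page; call this from your own write paths */
static void mark_used(size_t page) {
    used_bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* zero only the pages written since the last wipe, then reset the table */
static void wipe(char *base, size_t page_size) {
    size_t i;
    for (i = 0; i < NPAGES; i++) {
        if (used_bitmap[i / 8] & (1u << (i % 8)))
            memset(base + i * page_size, 0, page_size);
    }
    memset(used_bitmap, 0, sizeof(used_bitmap));
}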
I don't see why doing lots of calls to mmap/munmap should be that bad. The lookup performance in the kernel for mappings should be O(log n).
Your only option, as mappings seem to be implemented in Linux right now, is to punch holes in them with mprotect(PROT_NONE), and that still fragments the mappings in the kernel, so it's mostly equivalent to mmap/munmap, except that something else won't be able to steal that VM range from you. What you'd really want is madvise(MADV_REMOVE), or as it's called in BSD, madvise(MADV_FREE). That is explicitly designed to do exactly what you want: the cheapest way to reclaim pages without fragmenting the mappings. But at least according to the man page on my two flavors of Linux, it's not fully implemented for all kinds of mappings.
Disclaimer: I'm mostly familiar with the internals of BSD VM systems, but this should be quite similar on Linux.
As in the discussion in comments below, surprisingly enough MADV_DONTNEED seems to do the trick:
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>

int
main(int argc, char **argv)
{
    int ps = getpagesize();
    struct rusage ru = {0};
    char *map;
    int n = 15;
    int i;

    if ((map = mmap(NULL, ps * n, PROT_READ|PROT_WRITE,
        MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)) == MAP_FAILED)
        err(1, "mmap");

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }

    printf("unnecessary printf to fault stuff in: %d %ld\n", map[0], ru.ru_minflt);

    /* Unnecessary call to madvise to fault in that part of libc. */
    if (madvise(&map[ps], ps, MADV_NORMAL) == -1)
        err(1, "madvise");

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_NORMAL, before touching pages: %d %ld\n", map[0], ru.ru_minflt);

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_NORMAL, after touching pages: %d %ld\n", map[0], ru.ru_minflt);

    if (madvise(map, ps * n, MADV_DONTNEED) == -1)
        err(1, "madvise");

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_DONTNEED, before touching pages: %d %ld\n", map[0], ru.ru_minflt);

    for (i = 0; i < n; i++) {
        map[ps * i] = i + 10;
    }

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        err(1, "getrusage");
    printf("after MADV_DONTNEED, after touching pages: %d %ld\n", map[0], ru.ru_minflt);

    return 0;
}
I'm measuring ru_minflt as a proxy to see how many pages we needed to allocate (this isn't exactly true, but the next sentence makes it more likely). We can see that we get new pages in the "after MADV_DONTNEED, before touching pages" printf, because the contents of map[0] are 0 there.

Is it possible to unpage all memory in Windows?

I have plenty of RAM, however, after starting and finishing a large number of processes, it seems that most of the applications' virtual memory has been paged to disk, and switching to any of the older processes requires a very long time to load the memory back into RAM.
Is there a way, either via Windows API or via kernel call, to get Windows to unpage all (or as much as possible) memory? Maybe by stepping through the list of running processes and get the memory manager to unpage each process's memory?
Update 3: I've uploaded my complete program to github.
OK, based on the replies so far, here's a naive suggestion for a tool that tries to get all applications back into physical memory:
Allocate a small chunk of memory X, maybe 4MB. (Should it be non-pageable?)
Iterate over all processes:
For each process, copy chunks of its memory to X.
(Possibly suspending the process first?)
Suppose you have 2GB of RAM, and only 1GB is actually required by processes. If everything is in physical memory, you'd only copy 256 chunks, not the end of the world. At the end of the day, there's a good chance that all processes are now entirely in the physical memory.
Possible convenience and optimisation options:
Check first that the total required space is no more than, say, 50% of the total physical space.
Optionally only run on processes owned by the current user, or on a user-specified list.
Check first whether each chunk of memory is actually paged to disk or not.
I can iterate over all processes using EnumProcesses(); I'd be grateful for any suggestions how to copy an entire process's memory chunk-wise.
Update: Here is my sample function. It takes the process ID as its argument and copies one byte from each good page of the process. (The second argument is the maximum application address, obtainable via GetSystemInfo().)
void UnpageProcessByID(DWORD processID, LPVOID MaximumApplicationAddress, DWORD PageSize)
{
    MEMORY_BASIC_INFORMATION meminfo;
    LPVOID lpMem = NULL;

    // Get a handle to the process.
    HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, processID);

    // Do the work
    if (NULL == hProcess)
    {
        fprintf(stderr, "Could not get process handle, skipping requested process ID %u.\n", processID);
    }
    else
    {
        SIZE_T nbytes;
        unsigned char buf;
        while (lpMem < MaximumApplicationAddress)
        {
            SIZE_T stepsize = PageSize;
            if (!VirtualQueryEx(hProcess, lpMem, &meminfo, sizeof(meminfo)))
            {
                fprintf(stderr, "Error during VirtualQueryEx(), skipping process ID (error code %u, PID %u).\n",
                        GetLastError(), processID);
                break;
            }
            if (meminfo.RegionSize < stepsize) stepsize = meminfo.RegionSize;
            switch (meminfo.State)
            {
            case MEM_COMMIT:
                // This next line should be disabled in the final code
                fprintf(stderr, "Page at %p: Good, unpaging.\n", lpMem);
                if (0 == ReadProcessMemory(hProcess, lpMem, (LPVOID)&buf, 1, &nbytes))
                    fprintf(stderr, "Failed to read one byte from %p, error %u (%u bytes read).\n",
                            lpMem, GetLastError(), (unsigned)nbytes);
                else
                    // This next line should be disabled in the final code
                    fprintf(stderr, "Read %u byte(s) successfully from %p (byte was: 0x%X).\n",
                            (unsigned)nbytes, lpMem, buf);
                break;
            case MEM_FREE:
                fprintf(stderr, "Page at %p: Free (unused), skipping.\n", lpMem);
                stepsize = meminfo.RegionSize;
                break;
            case MEM_RESERVE:
                fprintf(stderr, "Page at %p: Reserved, skipping.\n", lpMem);
                stepsize = meminfo.RegionSize;
                break;
            default:
                fprintf(stderr, "Page at %p: Unknown state, panic!\n", lpMem);
            }
            // LPVOID is void*, so advance via a char* cast
            lpMem = (LPVOID)((char *)lpMem + stepsize);
        }
    }
    CloseHandle(hProcess);
}
Question: Does the region whose size I increment by consist of at most one page, or am I missing pages? Should I try to find out the page size as well and only increment by the minimum of region size and page size? Update 2: The page size is only 4 kiB! I changed the above code to increment only in 4 kiB steps. In the final code, we'd get rid of the fprintf calls inside the loop.
Well, it isn't hard to implement yourself. Use VirtualQueryEx() to discover the virtual addresses used by a process, and ReadProcessMemory() to force the pages to be reloaded.
It isn't likely to make any difference at all; it will just be your program that takes forever to do its job. The common diagnostic for slow reloading of pages is a fragmented paging file, common on Windows XP, for example, when the disk hasn't been defragged in a long time and was allowed to fill close to capacity frequently. SysInternals' PageDefrag utility can help fix the problem.
No, Windows provides no such feature natively. Programs such as Cacheman and RAM IDLE accomplish this by simply allocating a large chunk of RAM, forcing other things to page to disk, which effectively accomplishes what you want.
