How to deal with large RegionSize in VirtualQueryEx? - C

I'm writing a simple memory scanner in C, where I'm using VirtualQueryEx to scan an arbitrary process's memory.
VirtualQueryEx(hProc, addr, &meminfo, sizeof(meminfo));
I loop through all of the memory blocks in the process like this:
addr = (unsigned char*)meminfo.BaseAddress + meminfo.RegionSize;
But the problem is that one block of memory is much larger than SIZE_T can hold, so it doesn't fit into meminfo.RegionSize.
This is what it looks like in process hacker:
[screenshot from Process Hacker 2]
As you can see, it jumps from 0x7ffe2000 to 0x19a1e00000, which gives a RegionSize of 0x1921e1e000, much larger than 2^32.
I tested with processes other than notepad.exe and they had the same huge jump after about 3 blocks of memory that are always 4K in size. I tried starting at an address after this huge jump and it worked fine, but the jumps are located differently in each process, so that's not a portable solution to the problem.

I found the answer.
I was compiling the C program with MinGW, which is 32-bit, but I'm using a 64-bit system. That's why the RegionSize couldn't fit in SIZE_T: in a 32-bit build SIZE_T is only 4 bytes, so the 64-bit region size was truncated.

Related

Why does a malloc of 2GB or larger of memory in the Heap in C fail?

I want to allocate a specific size of memory on the C heap. As a test, I used the following piece of code to check the maximum allowable value:
#include <stdlib.h>

int main(void) {
    int *val;
    val = (int*)malloc(4096);
    if (!val)
        return 1;   /* allocation failed */
    free(val);
    return 0;
}
The problem is that when trying a range of values with malloc(), it fails at about a size of 1900MB. Yet I have 16GB of installed RAM with about 12GB free.
So I can't even think about allocating a larger amount of memory.
Is there something that I'm doing wrong?
Is there something that I should know about malloc()?
I see many programs (like virtual machines) that use large amounts of memory, so I've already ruled out the idea that it's a security feature of the OS.
The pages allocated to your process don't constitute the whole RAM. A user program is not allowed to address all of physical memory; the OS decides how much virtual address space your process gets before the program starts. That is why you can't use the full RAM of your machine.
Long story short: your program is not given the whole RAM to use. If it were, that would cause far bigger problems than the one you're seeing now. Also, your use of malloc is unclear: when malloc returns NULL, that signals an error condition, but your code just tests for it in a dummy if block without actually handling the error.
As a user program you request memory; if the request fails, malloc returns NULL, and you should handle that condition accordingly.
malloc() is a C and C++ runtime function that is part of the Standard Library. Its memory addressing capabilities depend on the compiler being used as well as the build settings for that compiler.
I did the following test on Windows 7 x64 using Microsoft Visual Studio 2013 with C++: when using the 32-bit build target settings (x86), a call to malloc() to allocate a 4GB block of memory returns a NULL pointer, indicating the allocation failed.
This is expected: on a 32-bit Windows OS, a 32-bit pointer limits addressable memory to 4GB, and in practice to less than that, somewhere around 3.5GB of usable RAM, due to the way 32-bit Windows manages physical memory.
Testing with two other sizes, 2GB (failed, NULL pointer returned) and 1GB (succeeded, valid pointer returned), indicates that the maximum the 32-bit C++ runtime allows is somewhere between 1GB and 2GB.
Then I changed the build settings from x86 to x64 to generate a 64 bit executable with 64 bit pointers.
With the change in build settings a call to malloc() with 4GB succeeded with a valid pointer returned.
Task Manager shows the following: [screenshot omitted]
See also the Stack Overflow question How can I allocate all the available memory in visual studio for my application?
with the accepted answer that mentions the following:
Getting a larger virtual memory address space requires a pretty fundamental overhaul. Albeit that it is easy today, just target x64 as the platform target. A 64-bit process has massive amounts of address space available, limited only by the maximum size of the paging file. You could limp along in 32-bit mode, as long as you can count on actually running on a 64-bit operating system, by using the /LARGEADDRESSAWARE linker option. Which increases the VM size from 2 GB to 4 GB on a 64-bit operating system.

Why does a bigger malloc cause an exception? [duplicate]

First of all, I noticed that when I malloc memory vs. calloc it, the memory footprint is different. I am working with datasets of several GB. It is OK for this data to be random.
I expected that I could just malloc a large amount of memory and read whatever random data was in it cast to a float. However, looking at the memory footprint in the process viewer the memory is obviously not being claimed (vs. calloc where I see a large foot print). I ran a loop to write data into the memory and then I saw the memory footprint climb. Am I correct in saying that the memory isn't actually claimed until I initialize it?
Finally after I passed 1024*1024*128 bytes (1024 MB in the process viewer) I started getting segfaults. Calloc however seems to initialize the full amount up to 1 GB. Why do I get segfaults when initializing memory in a for loop with malloc at this number 128MB and why does the memory footprint show 1024MB?
If malloc a large amount from memory and then read from it what am I getting (since the process viewer shows almost no footprint until I initialize it)?
Finally is there any way for me to alloc more than 4GB? I am testing memory hierarchy performance.
Code for #2:
long long int i;
long long int *test = (long long int*)malloc(1024*1024*1024);
for (i = 0; i < 1024*1024*128; i++)
    test[i] = i;
sleep(15);
Some notes:
As the comments note, Linux doesn't actually allocate your memory until you use it.
When you use calloc instead of malloc, it zeroes out all the memory you requested. This is equivalent to using it.
1- If you are working on a 32-bit machine, you can't have a single variable with more than 2GB allocated to it.
2- If you are working on a 64-bit machine, you can allocate as much as RAM + swap in total; however, allocating it all for one variable requires a large contiguous chunk of memory, which might not be available. Try it with a linked list, where each element has only 1MB assigned, and you can achieve a higher total memory allocation.
3- As noted by you and Sharth, unless you use your memory, linux won't allocate it.
Your #2 is failing with a segfault either because sizeof(long long int) > 8 on your platform or because your malloc returned NULL, which is very possible if you are requesting 1 GB of RAM.
More info on #2: from your 128 MB comment I get the idea that you may not realize what's happening. Because you declare the array pointer as long long int, each array element is 8 bytes, and 1024/8 == 128, so that is why your loop works. It did when I tried it, anyway.
Your for loop in your example code is actually touching 1GB of memory, since it is indexing 128*1024*1024 long longs, and each long long is 8 bytes.

Why does a C program crash if a large variable is declared?

I have the following C program compiled in Microsoft Visual Studio Express 2012:
int main() {
int a[300000];
return 0;
}
This crashes with a stack overflow in msvcr110d.dll!__crtFlsGetValue().
If I change the array size from 300,000 to 200,000 it works fine (in so much as this simple program can be said to 'work' since it doesn't do anything).
I'm running on Windows 7 and have also tried this with gcc under Cygwin and it produces the same behaviour (in this case a seg fault).
What the heck?
There are platform-specific limits on the size of the space used by automatic objects in C (the "stack size"). Objects that are larger than that size (which may be a few kilobytes on an embedded platform and a few megabytes on a desktop machine) cannot be declared as automatic objects. Make them static or dynamic instead.
In a similar vein, there are limits on the depth of function calls, and in particular on recursion.
Check your compiler and/or platform documentation for details on what the actual size is, and on how you might be able to change it. (E.g. on Linux check out ulimit.)
Because it's being allocated on the stack and the stack has a limited size, obviously not large enough to hold 300000 ints.
Use heap allocation a la malloc:
int* a = malloc(sizeof(int) * 300000);
// ...
free(a);
The heap can hold a lot more than the stack.
The size of thread stacks is traditionally limited by operating systems because of a finite limit to the amount of virtual address space available to each process.
As the virtual address space allocated to a thread stack can't be changed once it is allocated, there is no strategy other than to allocate a fairly large, but limited, chunk to each thread - even when most threads will use very little of it.
Similarly, there is a finite limit also on the number of threads a process is allowed to spawn.
At a guess, the limit here is 1MB, and if Windows then limits the number of threads to, say, 256, this means that 256MB of the 3GB virtual address space available to a 32-bit process is allocated to thread stacks, or, put another way, one twelfth of it.
On 64-bit systems, there is obviously a lot more virtual space to play with, but having a limit is still sensible in order to quickly detect - and terminate - infinite recursion.
Local variables claim space from the stack. So, if you allocate something large enough, the stack will inevitably overflow.

Limit on memory allocation in windows + am I calculating this properly?

I'm writing a program that requires a lot of memory (large graph analysis).
Currently there are two main data structures in my program (taking up most of the memory). These are:
an n*n matrix of type int **
an array of length n, of type Node *
Node, in this case, is a struct containing two ints (sizeof(Node) == 8)
The biggest value for n that I can run my code on is 22900, doing a bit of calculation I get:
22900*22900 * sizeof(int) * 8 + 22900 * sizeof(Node) = 16782591360 bits
This is 1.95375077 Gigabytes.
So question 1: am I calculating the memory usage for these two data structures properly?
and 2: is there a 2GB memory allocation limit on Windows? If so, how can I get around it?
For further information, I am on a 64-bit Windows 7 machine compiling with GCC, with 4GB of RAM and ~3GB free at the time of running.
Thanks.
You aren't calculating it correctly. First, there is no reason to multiply anything by 8: the quantum of allocation in C is the byte, not the bit. Second, you neglected the pointer array that implements the first dimension of your matrix. So:
22900 * sizeof(int*) + 22900*22900*sizeof(int) + 22900*sizeof(Node) = 2097914800 bytes
As for useful advice, I'll leave that to the (already posted) other answer.
You are most likely compiling for 32-bits; on windows, 32-bit processes are limited to 2G of addressable space (with a 64-bit OS and the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set, 4GB). Compile for 64-bit and you should see your memory limit rise substantially. However, you will likely want more physical RAM before doing so; you're using half of it already and hitting swap will kill your performance.
32-bit processes are limited to 2GB of user-addressable memory (on most releases of Windows with default settings). 64-bit processes have much larger address spaces. See the note Performance and Memory Consumption Under WOW64 for a way to give your 32-bit app a 4GB address space (not sure if/how GCC can build executable images with that flag set, though).
Compile your code as a 64bit application and that limit should vanish (try MinGW-w64).
To get around the memory limitation, you have to compile the program in 64-bit mode; note that pointers are then 8 bytes in size, so the row-pointer array of the matrix doubles in size.

Why should I use malloc() when "char bigchar[ 1u << 31 - 1 ];" works just fine?

What's the advantage of using malloc (besides the NULL return on failure) over static arrays? The following program will eat up all my ram and start filling swap only if the loops are uncommented. It does not crash.
#include <stdio.h>
unsigned int bigint[ 1u << 29 - 1 ];
unsigned char bigchar[ 1u << 31 - 1 ];
int main (int argc, char **argv) {
int i;
/* for (i = 0; i < 1u << 29 - 1; i++) bigint[i] = i; */
/* for (i = 0; i < 1u << 31 - 1; i++) bigchar[i] = i & 0xFF; */
getchar();
return 0;
}
After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit? Apparently I can have as many of them as I want. It will segfault, but only if I ask for (and try to use) more than malloc would give me anyway.
Is there a way to determine whether a static array was actually allocated and is safe to use?
EDIT: I'm interested in why malloc is used to manage the heap instead of letting the virtual memory system handle it. Apparently I can size an array to many times the size I think I'll need and the virtual memory system will only keep in ram what is needed. If I never write to e.g. the end (or beginning) of these huge arrays then the program doesn't use the physical memory. Furthermore, if I can write to every location then what does malloc do besides increment a pointer in the heap or search around previous allocations in the same process?
Editor's note: 1 << 31 causes undefined behaviour if int is 32-bit, so I have modified the question to read 1u. The intent of the question is to ask about allocating large static buffers.
Well, for two reasons really:
Because of portability, since some systems won't do the virtual memory management for you.
You'll inevitably need to divide this array into smaller chunks for it to be useful, then to keep track of all the chunks, then eventually as you start "freeing" some of the chunks of the array you no longer require you'll hit the problem of memory fragmentation.
All in all you'll end up implementing a lot of memory management functionality (pretty much reimplementing malloc, actually) without the benefit of portability.
Hence the reasons:
Code portability via memory management encapsulation and standardisation.
Personal productivity enhancement by the way of code re-use.
Please see:
malloc() and the C/C++ heap
Should a list of objects be stored on the heap or stack?
C++ Which is faster: Stack allocation or Heap allocation
Proper stack and heap usage in C++?
About C/C++ stack allocation
Stack,Static and Heap in C++
Of Memory Management, Heap Corruption, and C++
new on stack instead of heap (like alloca vs malloc)
With malloc you can grow and shrink your array: it becomes dynamic, so you can allocate exactly what you need.
This is called custom memory management, I guess.
You can do that, but you'll have to manage that chunk of memory yourself.
You'd end up writing your own malloc() working over this chunk.
Regarding:
After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit?
One upper bound will depend on how the 4GB (32-bit) virtual address space is partitioned between user-space and kernel-space. For Linux, I believe the most common partitioning scheme has a 3 GB range of addresses for user-space and a 1 GB range of addresses for kernel-space. The partitioning is configurable at kernel build-time, 2GB/2GB and 1GB/3GB splits are also in use. When the executable is loaded, virtual address space must be allocated for every object regardless of whether real memory is allocated to back it up.
You may be able to allocate that gigantic array in one context, but not others. For example, if your array is a member of a struct and you wish to pass the struct around. Some environments have a 32K limit on struct size.
As previously mentioned, you can also resize your memory to use exactly what you need. It's important in performance-critical contexts to not be paging out to virtual memory if it can be avoided.
There is no way to free a stack allocation other than going out of scope. So when you actually use a global allocation and the VM has to give you real physical memory for it, it is allocated and stays there until your program exits. This means that such a process can only grow its virtual memory use (functions have local stack allocations, and those will be freed).
You cannot keep stack memory once it goes out of the function's scope; it is always freed. So you must know at compile time how much memory you will use.
Which then boils down to how many int foo[1<<29]'s you can have. Since the first one takes up a huge chunk of the 32-bit address space, say it starts at 0x00000000, the second will end up around 0xffffffff or thereabouts, and a third would need addresses that 32-bit pointers cannot express. (Remember that stack reservations are resolved partly at compile time and partly at run time, via offsets that say how far the stack pointer moves when this or that variable is allocated.)
So the answer is pretty much that once you have int foo[1<<29], you can't have any reasonable depth of function calls with other local stack variables anymore.
You really should avoid doing this unless you know what you're doing. Try to request only as much memory as you need. Even if the excess isn't being used or getting in the way of other programs, it can mess up the process itself. There are two reasons for this. First, on certain systems, particularly 32-bit ones, it can cause address space to be exhausted prematurely in rare circumstances. Second, many kernels have some kind of per-process limit on reserved/virtual/not-in-use memory; if your program asks to reserve memory at run time beyond that limit, the kernel can kill the process. I've seen programs crash or exit due to a failed malloc because they reserved GBs of memory while only using a few MB.
