I was writing some code which requires a large int array to be allocated (of size 10^9).
While doing so I faced several issues, and after reading up on Google I came to the following conclusions of my own. Can someone look at this, point out if I am missing something, and also suggest a better way to do this?
(Machine config: Ubuntu 10.04 VM, gcc 4.4.3, 32-bit, 2 GB RAM (though my host machine has 6 GB))
1. I declared the array as unsigned long int with size 1*10^9. It didn't work: on compiling the code I got the error 'array size too long'.
So I searched for this and finally realized that I can't allocate that much memory on the stack, as my physical memory is only 2 GB. (I had already tried declaring the array as a global variable, which would place it in the global area instead of on the stack, but got the same error.)
2. So I tried allocating the same amount of memory using malloc, but again got an error, this time 'Cannot allocate memory' from malloc.
So after doing all this, my understanding/problems are as follows:
3- I can't allocate that much memory, be it stack or heap, as my physical memory is only 2 GB. (Is this the actual problem, or do other factors also govern this memory allocation?)
4- Is there any possible workaround where I can allocate memory of size 10^9 on a 2 GB machine? (I know allocating an array or memory area this big is neither good algorithm design nor efficient, but I just want to know the limits.)
5- Any better solution for allocating this much memory? (I mean, should I use two small arrays/heap allocations instead of one big chunk?)
(NOTE: Points 4 and 5 are two different approaches; I would appreciate suggestions for both.)
Many thanks
P.S. Forgive me if I am being a novice.
You are compiling a 32-bit process, and there is simply not enough address space for your huge data block. A 32-bit pointer can hold 2^32 distinct values, i.e. 4 GB. You can't allocate more than that because you would have no way to refer to the memory: each byte of memory that is mapped into your process must have a unique address.
So nothing is going to make your data fit in a 4 GB address space: 10^9 ints at 4 bytes each is already 4 GB, before the code, stack, and kernel take their share. Even if your array were less than 4 GB, you might have problems allocating a single contiguous block of memory.
You could use a 64-bit process, but you'd need to make sure you had enough physical memory to avoid disk thrashing when your array is swapped. Or you could find a different algorithm that does not require such a huge block of memory.
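On the asker's points 4 and 5: splitting the allocation into smaller chunks removes the need for one contiguous block, although 10^9 ints is still about 4 GB and so cannot fit in a 32-bit address space however it is divided. A minimal sketch of the chunking idea (the chunk size and names are arbitrary), useful when the total fits but one contiguous block does not:

#include <stdio.h>
#include <stdlib.h>

#define TOTAL_ELEMS 1000000000u               /* 10^9 ints */
#define CHUNK_ELEMS (16u * 1024u * 1024u)     /* 16M ints = 64 MB per chunk */

int main(void) {
    size_t nchunks = (TOTAL_ELEMS + CHUNK_ELEMS - 1) / CHUNK_ELEMS;
    int **chunk = malloc(nchunks * sizeof *chunk);
    size_t c;
    if (chunk == NULL)
        return 1;
    for (c = 0; c < nchunks; c++) {
        chunk[c] = malloc(CHUNK_ELEMS * sizeof **chunk);
        if (chunk[c] == NULL) {               /* expected to happen at some point on 32-bit */
            fprintf(stderr, "gave up after %lu chunks (%lu MB)\n",
                    (unsigned long)c, (unsigned long)c * 64);
            return 1;
        }
    }
    /* Logical element i lives at chunk[i / CHUNK_ELEMS][i % CHUNK_ELEMS]. */
    return 0;
}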
Related
Good morning. I hope someone can help me understand how one aspect of virtual memory works and how C behaves.
From what I understand, whenever we call malloc, C will add the allocation to the heap, with the heap growing upwards. If the stack and the heap bump into each other, malloc will return NULL, since there is no more memory to work with.
What I do not understand is that the virtual memory of each program is sized when we launch it, and the low and high addresses of the running program itself are determined then. This way, the program has a fixed amount of memory to use. Does the heap grow with the data in it, or is the heap actually just a set of pointers to the actual data? If the program has a fixed amount of memory at the start (because it can't have all the memory), it makes no sense to me to store the raw data in the heap, or else we would easily run out of available memory. What am I missing?
You are making several incorrect assumptions, the most important one being that your program has one chunk of memory assigned to it that starts at address x and goes to address x + program size. This is not so: your program is divided into chunks (different platforms give them different names). The stack will be one, the heap may be several, the code will be in several, etc.
When the heap manager runs out of its current chunk, it can simply get another one.
Also note that this has nothing to do with 'virtual memory'.
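A quick way to see these separate chunks for yourself (an illustrative sketch; the exact addresses vary by platform and from run to run):

#include <stdio.h>
#include <stdlib.h>

int global;                        /* lives in the data segment */

int main(void) {
    int local;                     /* lives on the stack */
    void *small = malloc(16);      /* heap */
    void *big = malloc(1 << 20);   /* a large request may come from a separate chunk (e.g. mmap'd) */
    printf("global: %p\n", (void *)&global);
    printf("local:  %p\n", (void *)&local);
    printf("small:  %p\n", small);
    printf("big:    %p\n", big);
    free(small);
    free(big);
    return 0;
}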
First of all, I noticed that when I malloc memory vs. calloc it, the memory footprint is different. I am working with datasets of several GB. It is OK for this data to be random.
I expected that I could just malloc a large amount of memory and read whatever random data was in it, cast to a float. However, looking at the memory footprint in the process viewer, the memory is obviously not being claimed (vs. calloc, where I see a large footprint). I ran a loop to write data into the memory and then saw the memory footprint climb. Am I correct in saying that the memory isn't actually claimed until I initialize it?
Finally, after I passed 1024*1024*128 bytes (1024 MB in the process viewer) I started getting segfaults. calloc, however, seems to initialize the full amount up to 1 GB. Why do I get segfaults when initializing memory in a for loop with malloc at this number, 128 MB, and why does the memory footprint show 1024 MB?
If I malloc a large amount of memory and then read from it, what am I getting (since the process viewer shows almost no footprint until I initialize it)?
Finally, is there any way for me to allocate more than 4 GB? I am testing memory hierarchy performance.
Code for #2:
long long int i;
long long int *test = (long long int *)malloc(1024*1024*1024);
for (i = 0; i < 1024*1024*128; i++)
    test[i] = i;
sleep(15);
Some notes:
As the comments note, Linux doesn't actually allocate your memory until you use it.
When you use calloc instead of malloc, it zeroes out all the memory you requested. This is equivalent to using it.
1- If you are working on a 32-bit machine, you can't have a single variable with more than 2 GB allocated to it.
2- If you are working on a 64-bit machine, you can allocate as much as RAM + swap in total; however, allocating it all to one variable requires a big contiguous chunk of memory, which might not be available. Try it with a linked list where each element is assigned only 1 MB, and you can reach a higher total of allocated memory.
3- As noted by you and Sharth, unless you use your memory, Linux won't allocate it.
Your #2 is failing with a segfault either because sizeof(long long int) > 8 on your platform or because your malloc returned NULL. That is very possible if you are requesting 1 GB of RAM.
More info on #2. From your 128 MB comment I get the idea that you may not realize what's happening. Because you declare the array pointer as long long int, each array element is 8 bytes. 1024 MB / 8 == 128M elements, so that is why your loop works. It did when I tried it, anyway.
Your for loop in your example code actually touches 1 GB of memory, since it indexes 128*1024*1024 long longs, and each long long is 8 bytes.
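For what it's worth, here is a defensively written version of the snippet (a sketch) that makes both points visible: the malloc failure is reported instead of segfaulting, and the loop bound is derived from the byte count instead of hard-coded:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    size_t bytes = (size_t)1024 * 1024 * 1024;       /* 1 GB */
    size_t elems = bytes / sizeof(long long int);    /* 128M elements */
    long long int *test = malloc(bytes);
    size_t i;
    if (test == NULL) {                              /* the unchecked case segfaults on test[0] */
        perror("malloc");
        return 1;
    }
    for (i = 0; i < elems; i++)
        test[i] = (long long int)i;                  /* writing commits the pages */
    sleep(15);                                       /* time to watch the process viewer */
    free(test);
    return 0;
}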
I (not a professional software engineer) am currently extending a rather large piece of scientific software.
At runtime I get an error stating "insufficient virtual memory".
At this point during runtime, the working memory in use is about 550 MB, and the error occurs when a rather big three-dimensional array is dynamically allocated. The array, if it could be allocated, would be about 170 MB in size. Adding this to the 550 MB already in use, the program would still be well below the 2 GB boundary that is set for 32-bit applications. Also, there is more than enough working memory available on the system.
Visual Studio is currently set to allocate arrays on the stack. Allocating them on the heap does not make any difference anyway.
Splitting the array into smaller arrays (together the size of the one big array) results in the program running just fine. So I guess the dynamically allocated memory has to be available as one contiguous block.
So there I am, and I have no clue how to solve this. I cannot deallocate some of the 550 MB already in use, as the data is still required. I also cannot change very much of the configuration (e.g. the compiler).
Is there a solution for my problem?
Thank you so much in advance and best regards
phroth248
The virtual memory is the memory your program can address. It is usually the sum of the physical memory and the swap space. For example, if you have 16 GB of physical memory and 4 GB of swap space, the virtual memory will be 20 GB. If your Fortran program tries to allocate more than those 20 addressable GB, you will get an "insufficient virtual memory" error.
To get an idea of the required memory of your 3D array:
allocate (A(nx,ny,nz))
You have nx*ny*nz elements, and each element takes 8 bytes in double precision or 4 bytes in single precision. I'll let you do the math.
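As an illustrative calculation (the dimensions here are invented): allocate (A(300,300,300)) in double precision requests 300*300*300*8 bytes, about 216 MB, which is already more than the 170 MB array in the question.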
Some things:
1. It is usually preferable to allocate huge arrays using operating system services rather than language facilities. That will circumvent any underlying library problems.
2. You may have a problem with 550 MB in a 32-bit system. Usually there is some division of the 4 GB address space into dedicated regions.
3. You need to make sure you have enough virtual memory.
a) Make sure your page file space is large enough.
b) Make sure that your system is not configured to limit process address space sizes to smaller than what you need.
c) Make sure that your account's settings are not limiting your process address space to smaller than allowed by the system.
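On Unix-like systems, the limits in (b) and (c) can be inspected from C with getrlimit (a sketch; on Windows, which this question appears to involve, the equivalent settings live in the system and account configuration):

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0)   /* per-process address space limit */
        printf("soft limit: %llu, hard limit: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    return 0;
}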
Apologies if this is a stupid question, but it's been kinda bothering me for a long time.
I'd like to know some details on how the memory manager knows what memory is in use.
Imagine a one-chip microcomputer with 1024 bytes of RAM - not much to spare.
Now you allocate 100 ints - each int is 4 bytes, and each pointer is 4 bytes too (yes, an 8-bit one-chip will most likely have smaller pointers, but whatever).
So you've just used 800 bytes of RAM for 100 ints? But it's worse - the allocation system must somehow take note of where memory is malloc'd and where it's free - 200 extra bytes or something? Or some bit marks?
If this is true, why is C favoured over assembler so often?
Is this really how it works? So super inefficient?
(Or am I having a totally incorrect idea about it?)
It may surprise younger developers to learn that greying old ones like myself used to write in C on systems with 1 or 2k of RAM.
In systems this size, dynamic memory allocation would have been a luxury that we could not afford. It's not just the pointer overhead of managing the free store, but also the effects of free store fragmentation making memory allocation inefficient and very probably leading to a fatal out-of-memory condition (virtual memory was not an option).
So we used to use static memory allocation (i.e. global variables), kept a very tight control on the depth of function call nesting, and an even tighter control over nested interrupt handling.
When writing on these systems, I didn't even link the standard library. I wrote my own C startup routines and provided custom minimal I/O routines.
One program I wrote in a 2k ram system used the lower part of RAM as the data area and the upper part as the stack. In the final cut, I proved that the maximal use of stack reached so far down in memory that it was 1 byte away from the last variable in the data area.
Ah, the good old days...
EDIT:
To answer your question specifically, the original K&R free store manager used to add a header block to the beginning of each block of memory allocated via malloc.
The header block looked something like this:
union header {
    struct {
        union header *ptr;
        unsigned size;
    } s;
};
Where ptr is the address of the next header block and size is the size of the memory allocated (in blocks). The malloc function would actually return the address computed as &header + sizeof(header). The free function would subtract the size of the header from the pointer you provided in order to re-link the block back into the free list.
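As a rough illustration of that pointer arithmetic (a sketch, not the actual K&R library code):

#include <stddef.h>

union header {
    struct {
        union header *ptr;   /* next block on the free list */
        unsigned size;       /* size of this block, in header-sized units */
    } s;
};

/* malloc hands out the address just past the header. */
static void *user_pointer(union header *hp) {
    return (void *)(hp + 1);             /* i.e. &header + sizeof(header) */
}

/* free recovers the header by stepping back over it. */
static union header *header_of(void *p) {
    return (union header *)p - 1;
}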
There are several approaches to this:
As you write, malloc() one memory block for every int you have. Completely inefficient, so I'll rule it out.
malloc() an array of 100 ints. That needs in total 100*sizeof(int) + 1*sizeof(int*) + whatever malloc() internally needs. Much better.
Statically allocate the array. Here you just need 100*sizeof(int).
Allocate the array on the stack. That needs the same, but only for the duration of the current function call.
Which of these you need depends on how long you need the memory and other criteria.
If you have that little RAM, it is even questionable whether it is useful to use malloc() at all. It can be an option if several code blocks need a lot of RAM, but not at the same time.
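A compact sketch of the last three options side by side (the names are invented for illustration):

#include <stdlib.h>

static int static_array[100];       /* statically allocated: lives in the data segment all run long */

void demo(void) {
    int stack_array[100];           /* stack allocated: gone when demo() returns */
    int *heap_array = malloc(100 * sizeof *heap_array);   /* malloc'd: one block plus one pointer */
    if (heap_array != NULL) {
        heap_array[0] = stack_array[0] = static_array[0] = 42;
        free(heap_array);           /* only the heap block must be released explicitly */
    }
}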
As for how the memory addresses are tracked, that also depends:
For malloc(), you have to keep the returned pointer somewhere so you don't lose it.
For an array on the stack, it is expressed relative to the current frame pointer. The code sets up the frame and thus knows the offset, so it normally doesn't need to be stored anywhere.
For an array in the data segment, the compiler and linker know the address and statically put it wherever it is needed.
If this is true, why is C favoured over assembler so often?
You're simplifying the problem too much. C or assembler - it doesn't matter, you still need to manage the memory chunks. The main issue is fragmentation, not the actual management overhead. In a system like the one you described, you would probably just allocate the memory and never release it, so there is no need to track what's free - whatever is below the watermark is free, and that's it.
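That watermark scheme takes only a few lines (a sketch with invented names; note there is no free() at all):

#include <stddef.h>

static unsigned char pool[1024];   /* all the RAM we manage, as in the 1024-byte example */
static size_t watermark = 0;       /* everything below this offset is in use */

void *bump_alloc(size_t n) {
    void *p;
    if (n > sizeof pool - watermark)
        return NULL;               /* out of memory; nothing is ever given back */
    p = &pool[watermark];
    watermark += n;
    return p;
}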
Is this really how it works? So super inefficient?
There are many algorithms around this problem, but if you're simplifying - yes, that's basically it. In reality it's a much more complicated problem, and there are much more complicated systems revolving around servicing memory, dealing with fragmentation, garbage collection (at the OS level), and so on.
What's the advantage of using malloc (besides the NULL return on failure) over static arrays? The following program will eat up all my RAM and start filling swap only if the loops are uncommented. It does not crash.
#include <stdio.h>

unsigned int bigint[ 1u << 29 - 1 ];
unsigned char bigchar[ 1u << 31 - 1 ];

int main (int argc, char **argv) {
    int i;
    /* for (i = 0; i < 1u << 29 - 1; i++) bigint[i] = i; */
    /* for (i = 0; i < 1u << 31 - 1; i++) bigchar[i] = i & 0xFF; */
    getchar();
    return 0;
}
After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit? Apparently I can have as many of them as I want. It will segfault, but only if I ask for (and try to use) more than malloc would give me anyway.
Is there a way to determine if a static array was actually allocated and safe to use?
EDIT: I'm interested in why malloc is used to manage the heap instead of letting the virtual memory system handle it. Apparently I can size an array to many times what I think I'll need, and the virtual memory system will keep in RAM only what is actually used. If I never write to e.g. the end (or beginning) of these huge arrays, then the program doesn't use the physical memory. Furthermore, if I can write to every location, then what does malloc do besides increment a pointer in the heap or search around previous allocations in the same process?
Editor's note: 1 << 31 causes undefined behaviour if int is 32-bit, so I have modified the question to read 1u. The intent of the question is to ask about allocating large static buffers.
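One more observation on the sizes as written (easy to verify with a quick sketch): in C, - binds tighter than <<, so 1u << 31 - 1 is actually 1u << 30 (a 1 GB char array) and 1u << 29 - 1 is 1u << 28 (256M ints, which is also 1 GB):

#include <stdio.h>

int main(void) {
    /* '-' has higher precedence than '<<' */
    printf("%u\n", 1u << 31 - 1);   /* prints 1073741824, i.e. 1u << 30 */
    printf("%u\n", 1u << 29 - 1);   /* prints 268435456,  i.e. 1u << 28 */
    return 0;
}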
Well, for two reasons really:
Because of portability, since some systems won't do the virtual memory management for you.
You'll inevitably need to divide this array into smaller chunks for it to be useful, then keep track of all the chunks, and eventually, as you start "freeing" the chunks of the array you no longer require, you'll hit the problem of memory fragmentation.
All in all, you'll end up implementing a lot of memory management functionality (actually pretty much reimplementing malloc) without the benefit of portability.
Hence the reasons:
Code portability via memory management encapsulation and standardisation.
Personal productivity enhancement by way of code reuse.
Please see:
malloc() and the C/C++ heap
Should a list of objects be stored on the heap or stack?
C++ Which is faster: Stack allocation or Heap allocation
Proper stack and heap usage in C++?
About C/C++ stack allocation
Stack, Static and Heap in C++
Of Memory Management, Heap Corruption, and C++
new on stack instead of heap (like alloca vs malloc)
With malloc you can grow and shrink your array: it becomes dynamic, so you can allocate exactly what you need.
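For example, growing and shrinking with realloc (a sketch; the sizes and names are arbitrary):

#include <stdlib.h>

int main(void) {
    int *a = malloc(10 * sizeof *a);
    int *tmp;
    if (a == NULL)
        return 1;
    tmp = realloc(a, 1000 * sizeof *a);   /* grow to 1000 ints */
    if (tmp != NULL)
        a = tmp;                          /* on failure, a still points at the old block */
    tmp = realloc(a, 10 * sizeof *a);     /* shrink back down */
    if (tmp != NULL)
        a = tmp;
    free(a);
    return 0;
}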
This is called custom memory management, I guess.
You can do that, but you'll have to manage that chunk of memory yourself.
You'd end up writing your own malloc() working over this chunk.
Regarding:
"After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit?"
One upper bound will depend on how the 4 GB (32-bit) virtual address space is partitioned between user-space and kernel-space. For Linux, I believe the most common partitioning scheme has a 3 GB range of addresses for user-space and a 1 GB range for kernel-space. The partitioning is configurable at kernel build time; 2GB/2GB and 1GB/3GB splits are also in use. When the executable is loaded, virtual address space must be allocated for every object, regardless of whether real memory is allocated to back it up.
You may be able to allocate that gigantic array in one context but not in others. For example, if your array is a member of a struct that you wish to pass around, some environments have a 32K limit on struct size.
As previously mentioned, you can also resize your memory to use exactly what you need. It's important in performance-critical contexts to not be paging out to virtual memory if it can be avoided.
There is no way to free a stack allocation other than going out of scope. So when you use global allocation and the VM has to give you real physical memory, it is allocated and will stay there until your program exits. This means that any process will only grow in its virtual memory use (functions have local stack allocations, and those will be freed).
You cannot keep stack memory once it goes out of the function's scope; it is always freed. So you must know at compile time how much memory you will use.
Which then boils down to how many int foo[1<<29]s you can have. Since the first one takes up the whole (32-bit) memory, starting at (let's pretend) 0x00000000, the second would resolve to 0xffffffff or thereabouts. Then what would the third one resolve to? Something that 32-bit pointers cannot express. (Remember that stack reservations are resolved partly at compile time and partly at run time, via offsets: how far the stack pointer is moved when this or that variable is allocated.)
So the answer is pretty much that once you have int foo[1<<29], you can't have any reasonable depth of function calls with other local stack variables any more.
You really should avoid doing this unless you know what you're doing. Try to request only as much memory as you need. Even if the excess is not being used or getting in the way of other programs, it can mess up the process itself. There are two reasons for this. First, on certain systems, particularly 32-bit ones, it can cause address space to be exhausted prematurely in rare circumstances. Second, many kernels have some kind of per-process limit on reserved/virtual/not-in-use memory. If your program asks at run time for more memory to be reserved than this limit allows, the kernel can kill the process. I've seen programs that crashed or exited due to a failed malloc because they were reserving GBs of memory while only using a few MB.