I have 4 GB of RAM installed in a Core 2 Duo PC with a 32-bit Windows 7 operating system. I have increased the paging file size up to 106110 MB, but after doing all this I am still not able to significantly increase the maximum array size.
Following are the specs
memory
Maximum possible array: 129 MB (1.348e+08 bytes) *
Memory available for all arrays: 732 MB (7.673e+08 bytes) **
Memory used by MATLAB: 563 MB (5.899e+08 bytes)
Physical Memory (RAM): 3549 MB (3.722e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Kindly help me at your earliest convenience. I am not even able to read a file of 48+ MB into a double array.
There are two things you can do to free up memory for MATLAB. Since you're using a 32-bit version of the program, it is normally limited to 2 GB of address space. Booting Windows with the /3GB switch makes up to an additional 1 GB of address space available to large-address-aware programs.
Second, you should consider using the pack() function, which rearranges variables in memory to free up more contiguous space. Contiguous virtual address space, more than anything, is what limits the size of any individual array (note the asterisk in your memory output).
Remember: you can figure out how many elements an array can hold by dividing the available memory by the size of the variable type. Double variables take up 8 bytes each, so your 129 MB of contiguous space should allow around 16.85 million double values in a single array.
You can view information about memory usage using the memory functions included in MATLAB.
memory shows the memory information
inmem will show you the variables and functions stored in memory
clear will allow you to clear the memory of specific variables or functions.
You may try to set the /3GB switch; maybe this increases the available memory. Otherwise: switch to a 64-bit OS. Your system wastes 547 MB of RAM simply because there are no addresses for it.
My background knowledge:
To my understanding, to be allocated/used properly, memory must be contiguous in the virtual address space, but doesn't have to be actually contiguous in the physical memory or even the physical memory address space.
This suggests that physical-to-virtual address translation works as a series of mappings, where any free memory blocks in the physical address space can be assigned to a corresponding area in the virtual address space.
Setup to the question:
This answer, in response to a question about freeing memory in C, refers to memory fragmentation: a scenario in which (in this specific case) repeatedly allocating and freeing memory can leave enough OS-allocated memory for future use by the process, yet none of it usable because it isn't contiguous in the free store's linked list.
If we could keep plucking memory blocks out of the OS-allocated memory that are not in use, even if they are dispersed (not contiguous), wouldn't that fix the problem of memory fragmentation? To me, that seems like exactly how physical-to-virtual address translation works, where non-contiguous blocks are used as if they were contiguous.
So, to repeat my question, why does memory have to be contiguous?
Two issues here:
It is necessary for each object to occupy a contiguous region in virtual memory, so that indexing and pointer arithmetic can be done efficiently. If you have an array int arr[5000];, then a statement like arr[i] = 0; boils down to simple arithmetic: the value of i is multiplied by 4 (or whatever sizeof(int) may be) and then added to the base address of arr. Addition is very fast for a CPU. If the elements of arr weren't located at consecutive virtual addresses, then arr[i] would require some more elaborate computation, and your program would be orders of magnitude slower. Likewise, with contiguous arrays, pointer arithmetic like ptr++ really is just addition.
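To make that concrete, here is a minimal C sketch (illustrative only, not from the original answer) showing that &arr[i] really is just the base address plus i * sizeof(int):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int arr[5000];
        size_t i = 42;

        /* arr[i] is defined as *(arr + i): base address plus i * sizeof(int). */
        uintptr_t base = (uintptr_t)&arr[0];
        uintptr_t computed = base + i * sizeof(int);

        /* Both expressions name the same element, so the addresses must match. */
        printf("&arr[i]  = %p\n", (void *)&arr[i]);
        printf("computed = %p\n", (void *)computed);
        return 0;
    }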
Virtual memory has granularity. Every mapping of a virtual to a physical address requires some metadata to be kept somewhere in memory (say 8 bytes per mapping), and when this mapping is used, it is cached by the CPU in a translation lookaside buffer which requires some silicon on the chip. If every byte of memory could be mapped independently, your mapping tables would require 8 times more memory than the program itself, and you'd need an immense number of TLBs to keep them cached.
So virtual memory is instead done in units of pages, typically 4KB or 16KB or so. A page is a contiguous 4K region of virtual memory that is mapped to a contiguous 4K region of physical memory. Thus you only need one entry in your mapping tables (page tables) and TLB for the entire page, and addresses within a page are mapped one-to-one (the low bits of the virtual address are used directly as the low bits of the physical address).
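A rough sketch of that split, assuming 4 KiB pages (the real page size is system-dependent): only the page number goes through the page table / TLB; the offset is passed straight through.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const uintptr_t PAGE_SIZE = 4096;   /* assumed 4 KiB pages: 12 offset bits */

        int x = 0;
        uintptr_t vaddr = (uintptr_t)&x;

        uintptr_t page_number = vaddr / PAGE_SIZE;  /* translated via page table / TLB */
        uintptr_t offset      = vaddr % PAGE_SIZE;  /* reused as the low physical bits */

        printf("virtual address: %#lx\n", (unsigned long)vaddr);
        printf("page number:     %#lx\n", (unsigned long)page_number);
        printf("offset in page:  %#lx\n", (unsigned long)offset);
        return 0;
    }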
But this means that fragmentation by sub-page amounts can't be fixed with virtual memory. As in Steve Summit's example, suppose you allocated 1000 objects of 1KB each, which were stored consecutively in virtual memory. Now you free all the odd-numbered objects. Nominally there is now 500 KB of memory available. But if you now want to allocate a new object of size 2KB, none of that 500 KB is usable, since there is no contiguous block of size 2KB in your original 1000 KB region. The available 1KB blocks can't be remapped to coalesce them into a larger block, because they can't be separated from the even-numbered objects with which they share pages. And the even-numbered objects can't be moved around in virtual memory, because there may be pointers to those objects elsewhere in the program, and there is no good way to update them all. (Implementations that do garbage collection might be able to do so, but most C/C++ implementations do not, because that comes with substantial costs of its own.)
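Here is a small C sketch of that allocation pattern (exact behaviour depends on the allocator; the point is simply that the 2 KB request cannot reuse any of the 1 KB holes):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        enum { COUNT = 1000, SMALL = 1024, BIG = 2048 };
        void *blocks[COUNT];

        /* Allocate 1000 objects of 1 KB each. */
        for (int i = 0; i < COUNT; i++)
            blocks[i] = malloc(SMALL);

        /* Free the odd-numbered ones: ~500 KB is nominally free, but only as 1 KB holes. */
        for (int i = 1; i < COUNT; i += 2) {
            free(blocks[i]);
            blocks[i] = NULL;
        }

        /* A 2 KB request cannot fit in any single 1 KB hole, so the allocator must
           take it from somewhere else (typically fresh memory from the OS). */
        void *big = malloc(BIG);
        printf("2 KB block at %p, first 1 KB block at %p\n", big, blocks[0]);

        free(big);
        for (int i = 0; i < COUNT; i += 2)
            free(blocks[i]);
        return 0;
    }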
So, to repeat my question, why does memory have to be contiguous?
It doesn't have to be contiguous.
To avoid fragmentation within a block of OS-allocated memory (a page), you need to ensure that each allocation made from the "heap" (e.g. using "malloc()") is at least as large as a block of OS-allocated memory. This gives 2 possible options:
a) change the hardware (and OS/software) so that a block of OS-allocated memory is much smaller (e.g. maybe the same size as a cache line, or maybe 64 bytes instead of 4 KiB). This would significantly increase the overhead of managing virtual memory.
b) change the minimum allocation size of the heap so that it's much larger. Typically (for modern systems) if you "malloc(1);" it rounds the size up to 8 bytes or 16 bytes for alignment and calls it "padding". In the same way, it could round the size up to the size of a block of OS-allocated memory instead and call that "padding" (e.g. "malloc(1);" might have 4095 bytes of padding and cost 4 KiB of memory). This is worse than fragmentation because padding can't be allocated (e.g. if you did "malloc(1); malloc(1);" those allocations couldn't use different parts of the same block of OS-allocated memory).
However; this only really applies to small allocations. If you use "malloc();" to allocate a large amount of memory (e.g. maybe 1234 KiB for an array) most modern memory managers will just use blocks of OS-allocated memory, and won't have a reason to care about fragmentation for those large blocks.
In other words; for smaller allocations you can solve fragmentation in the way you've suggested but it would be worse than allowing some fragmentation; and for larger allocations you can solve fragmentation in the way you've suggested and most modern memory managers already do that.
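As a hedged illustration of both halves of that, assuming glibc (malloc_usable_size() is a glibc extension, and the exact numbers will differ between allocators):

    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>   /* malloc_usable_size() -- glibc extension */

    int main(void)
    {
        /* Small request: rounded up to the allocator's minimum chunk size ("padding"). */
        void *tiny = malloc(1);
        printf("requested 1 byte, usable size: %zu bytes\n", malloc_usable_size(tiny));

        /* Large request: typically handed to the OS directly (e.g. via mmap() in glibc),
           so the allocator has no fragmentation problem for it. */
        void *big = malloc(1234 * 1024);
        printf("requested 1234 KiB, usable size: %zu bytes\n", malloc_usable_size(big));

        free(tiny);
        free(big);
        return 0;
    }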
I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
Let's suppose I have an array of size 4 KB (one page = one frame size). In virtual memory that array is one page.
Every page in virtual memory is translated into one frame, so our array is still contiguous in physical memory too...
(In the page table we translate pages into frames, not every byte into its own frame...)
Side question (when answering this, please mention clearly that it's for the side question):
When allocating an array of one page in size in virtual memory, does it have to be one page, or could it be split across two contiguous pages in virtual memory (for example, the bottom half of the first one and the top half of the second)? In that case, at worst the answer above is 2; am I wrong?
Unless the start of the array happens to be aligned to the beginning of a memory page, it can still occupy two pages; it can start near the end of one page and end on the next page. Arrays allocated on the stack will probably not be forced to occupy a single page, because stack frames are simply allocated sequentially in the stack memory, and the array will usually be at the same offset within each stack frame.
The heap memory allocator (malloc()) could try to ensure that arrays that are smaller than a page will be allocated entirely on the same page, but I'm not sure if this is actually how most allocators are implemented. Doing this might increase memory fragmentation.
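If you want to check this yourself on a POSIX system, here is a small sketch (assuming sysconf(_SC_PAGESIZE) reports the page size; the array name and size are made up):

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);   /* usually 4096 */
        char arr[3000];                      /* smaller than one page */

        uintptr_t first = (uintptr_t)&arr[0];
        uintptr_t last  = (uintptr_t)&arr[sizeof arr - 1];

        if (first / (uintptr_t)page == last / (uintptr_t)page)
            printf("array lies entirely within one %ld-byte page\n", page);
        else
            printf("array straddles a page boundary\n");
        return 0;
    }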
I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
This statement is missing something very important: the array size.
For small arrays the statement is wrong. For "large/huge" arrays the statement is correct.
In other words: The probability of an array being split over multiple non-contiguous physical pages is a function of the array size.
For small arrays the probability is close to zero, but it increases with the array size. When the array size grows above the system's page size, the probability gets closer and closer to 1. But an array requiring multiple pages may still be contiguous in physical memory.
For your side question:
With an array size equal to your system's page size, the array can span at most two physical pages.
Anything (array, structure, ...) that is larger than the page size must be split across multiple pages, and may therefore be "virtually contiguous, physically non-contiguous".
Without further knowledge or restriction; anything (array, structure, ...) that is between its minimum alignment (e.g. 4 bytes for an array of uint32_t) and the page size has a probability of being split across multiple pages; where the probability depends on its size and alignment. For example, if page size is 4096 bytes and an array has a minimum alignment of 4 bytes and a size of 4092 bytes, then there's 2 chances in 1024 that it will end up on a single page (and a 99.8% chance that it will be split across multiple pages).
Anything (variable, tiny array, tiny structure, ...) that has a size equal to its minimum alignment won't (shouldn't - see note 3) be split across multiple pages.
Note 1: For anything using memory allocated from the heap, the minimum alignment can be assumed to be the (implementation defined) minimum alignment provided by the heap and not the minimum alignment of the object itself. E.g. for an array of uint16_t the minimum alignment would be 2 bytes; but malloc() will return memory with much larger alignment (maybe 16 bytes)
Note 2: When things are nested (e.g. array inside a structure inside another structure) all of the above applies to the outer structure only. E.g. if you have an array of uint16_t inside a structure where the array happens to begin at offset 4094 within the structure; then it will be significantly more likely that the array will be split across pages.
Note 3: It's possible to explicitly break minimum alignment using pointers (e.g. use malloc() to allocate 1024 bytes, then create a pointer to an array that begins at any offset you want within the allocated area).
Note 4: If something (array, structure, ...) is split across multiple pages; then there's a chance that it will still be physically contiguous. For worst case this depends on the amount of physical memory (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, then there's approximately 1 chance in 262000 that 2 virtually contiguous pages will be "physically contiguous by accident"). If the OS implements page/cache coloring (see https://en.wikipedia.org/wiki/Cache_coloring ) it improves the probability of "physically contiguous by accident" by the number of page/cache "colors" (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, and the OS uses 256 page/cache colors, then there's approximately 1 chance in 1024 that 2 virtually contiguous pages will be "physically contiguous by accident").
Note 5: Most modern operating systems use multiple page sizes (e.g. 4 KiB pages and 2 MiB pages, and maybe also 1 GiB pages). This can either make it hard to guess what the page size actually is, or improve the probability of "physically contiguous by accident" if you assume the smallest page size is used.
Note 6: For some CPUs (e.g. recent AMD/Zen) the TLBs behave as if pages are larger (e.g. as if you're using 16 KiB pages and not 4 KiB pages) if and only if page table entries are compatible (e.g. if 4 page table entries describe four physically contiguous 4 KiB pages with the same permissions/attributes). If an OS is optimized for these CPUs the result is similar to having an extra page size (4 KiB, "16 KiB", 2 MiB and maybe 1 GiB).
When allocating an array of one page in size in virtual memory, does it have to be one page, or could it be split across two contiguous pages in virtual memory (for example, the bottom half of the first one and the top half of the second)?
When allocating an array in heap memory of size one page; the minimum alignment would be the implementation defined minimum alignment provided by the heap manager/malloc() (e.g. maybe 16 bytes). However; most modern heap managers switch to using an alternative (e.g. mmap() or VirtualAlloc() or similar) when the amount of memory being allocated is "large enough"; so (depending on the implementation and their definition of "large enough") it might be page aligned.
When allocating an array in raw virtual memory (e.g. using mmap() or VirtualAlloc() or similar yourself, and NOT using the heap and not using something like malloc()); page alignment is guaranteed (mostly because the virtual memory manager doesn't deal with anything smaller).
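A minimal POSIX sketch of that last case (mmap() here; VirtualAlloc() would be the Windows counterpart), showing that raw virtual memory comes back page-aligned:

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        size_t len = (size_t)page;           /* one page for the array */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* The virtual memory manager only deals in whole pages, so the
           address is always a multiple of the page size. */
        printf("mapping at %p, page-aligned: %s\n",
               p, ((uintptr_t)p % (uintptr_t)page == 0) ? "yes" : "no");

        munmap(p, len);
        return 0;
    }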
I - not a professional software engineer - am currently extending a fairly large piece of scientific software.
At runtime I get an error stating "insufficient virtual memory".
At this point during runtime, about 550 MB of working memory is in use, and the error occurs when a rather big three-dimensional array is dynamically allocated. The array, if it could be allocated, would be about 170 MB in size. Adding this to the 550 MB already in use, the program would still be well below the 2 GB boundary that is set for 32-bit applications. There is also more than enough working memory available on the system.
Visual Studio is currently set to allocate arrays on the stack. Allocating them on the heap does not make any difference anyway.
Splitting the array into smaller arrays (which together add up to the size of the one big array) results in the program running just fine. So I guess that the dynamically allocated memory has to be available in one adjacent block.
So there I am and I have no clue how to solve this. I cannot deallocate any of the 550 MB already in use, as the data is still required. I also cannot change very much of the configuration (e.g. the compiler).
Is there a solution for my problem?
Thank you so much in advance and best regards,
phroth248
The virtual memory is the memory your program can address. It is usually the sum of the physical memory and the swap space. For example, if you have 16GB of physical memory and 4GB of swap space, the virtual memory will be 20GB. If your Fortran program tries to allocate more than those 20 addressable GB, you will get an "insufficient virtual memory" error.
To get an idea of the required memory of your 3D array:
allocate (A(nx,ny,nz))
You have nx*ny*nz elements and each element takes 8 bytes in double precision or 4 bytes in single precision. I let you do the math.
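For example (purely illustrative numbers, since nx, ny and nz are not given in the question): nx = ny = nz = 282 in double precision needs 282 * 282 * 282 * 8 bytes = 179,406,144 bytes, i.e. roughly 171 MiB, which is about the size of the failing allocation.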
Some things:
1. It is usually preferable to allocate huge arrays using operating system services rather than language facilities. That will circumvent any underlying library problems (see the sketch after this list).
2. You may have a problem with 550 MB in a 32-bit system. Usually there is some division of the 4 GB address space into dedicated regions.
3. You need to make sure you have enough virtual memory.
a) Make sure your page file space is large enough.
b) Make sure that your system is not configured to limit processes address space sizes to smaller than what you need.
c) Make sure that your accounts settings are not limiting your process address space to smaller than allowed by the system.
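For point 1, a minimal C sketch of what "operating system services" could mean on Windows (VirtualAlloc() instead of the language's own allocator; the ~170 MB figure is simply taken from the question):

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        SIZE_T bytes = (SIZE_T)170 * 1024 * 1024;   /* roughly the failing array size */

        /* Reserve and commit the region in one call; the OS returns page-aligned
           virtual memory, or NULL if the address space is too small/fragmented. */
        double *a = (double *)VirtualAlloc(NULL, bytes,
                                           MEM_RESERVE | MEM_COMMIT,
                                           PAGE_READWRITE);
        if (a == NULL) {
            printf("VirtualAlloc failed: %lu\n", (unsigned long)GetLastError());
            return 1;
        }

        a[0] = 1.0;                          /* use the memory ... */

        VirtualFree(a, 0, MEM_RELEASE);      /* release the whole region */
        return 0;
    }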
I was writing code which requires a large 'int' array to be allocated (10^9 elements).
While doing so I faced several issues, and after reading stuff on Google I came to the following conclusions of my own. Can someone look at this and point out if I am missing something, and also suggest a better way to do this?
(Machine config: Ubuntu 10.04 VM, gcc 4.4.3, 32-bit, 2 GB RAM (though my host machine has 6 GB).)
1. I declared the array as 'unsigned long int' with size 1*10^9. It didn't work: on compiling the code I got the error 'array size too long'.
So I searched for this and finally realized that I can't allocate that much memory on the stack, as my physical memory was 2 GB. (I had already tried allocating the array as a global variable, which would place it in the global area instead of the stack, but got the same error.)
So I tried allocating the same amount of memory using 'malloc', but again got an error, this time 'Cannot allocate memory'.
So after doing all this, my understanding/problems are as follows:
3. I can't allocate that much memory, be it stack or heap, as my physical memory is only 2 GB (so is this the actual problem, or do other factors also govern this memory allocation?).
4. Is there any possible workaround where I can allocate memory of size 10^9 on a 2 GB machine? (I know allocating an array or memory area this big is neither good algorithm design nor efficient, but I just want to know the limits.)
5. Is there any better solution for allocating this much memory (i.e. should I use two small arrays/heap allocations instead of one big chunk)?
(Note: points 4 and 5 are two different approaches; I would appreciate suggestions for both.)
Many thanks.
P.S. Forgive me if I am being a novice.
You are compiling a 32-bit process and there is simply not enough address space for your huge data block. A 32-bit pointer can hold 2^32 distinct values, i.e. 4 GB. You can't allocate more than that because you would have no way to refer to the memory: each byte of memory that is mapped into your process must have a unique address.
So, nothing is going to fit your data into a 4GB address space. Even if your array was less than 4GB you may have problems allocating a single contiguous block of memory.
You could use a 64 bit process but you'd need to make sure you had enough physical memory to avoid disk thrashing when your array was swapped. Or you could find a different algorithm that did not require such a huge block of memory.
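As a rough C sketch of the arithmetic (a 32-bit build is assumed; on a 64-bit build the allocation may well succeed because the address space is far larger):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t n = 1000000000UL;                 /* 10^9 elements */
        size_t bytes = n * sizeof(unsigned long);

        /* On a 32-bit build this is ~3.7 GiB: it fits in a size_t, but not in the
           2-3 GiB of address space a 32-bit process can actually use. */
        printf("requested: %zu bytes\n", bytes);

        unsigned long *a = malloc(bytes);
        if (a == NULL) {
            printf("malloc failed: no single block of address space is that big\n");
            return 1;
        }

        free(a);
        return 0;
    }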
This is a follow-up to my previous question about why size_t is necessary.
Given that size_t is guaranteed to be big enough to represent the largest size of a block of memory you can allocate (meaning there can still be some integers bigger than size_t), my question is...
What determines how much you can allocate at once?
The architecture of your machine, the operating system (but the two are intertwined) and your compiler/set of libraries determine how much memory you can allocate at once.
malloc doesn't need to be able to use all the memory the OS could give it. The OS doesn't need to make available all the memory present in the machine (and various versions of Windows Server, for example, have different maximum memory for licensing reasons).
But note that the OS can make available more memory than is physically present in the machine, and even more than the motherboard permits (say the motherboard has a single memory slot that accepts only a 1 GB memory stick; Windows could still let a program allocate 2 GB of memory). This is done through the use of virtual memory and paging (you know, the swap file, your old and slow friend :-) or, for example, through the use of NUMA.
I can think of three constraints, in actual code:
The biggest value a size_t can represent; size_t should be the same type (same size, etc.) that the OS's memory allocation mechanism uses.
The biggest block the operating system is able to handle in RAM (how is a block's size represented? How does this representation affect the maximum block size?).
Memory fragmentation (largest free block) and the total available free RAM; see the probe sketch below.
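A hedged probe sketch for the last constraint (the result also depends on overcommit policy and on the first two constraints, so treat it as a curiosity rather than a guarantee):

    #include <stdio.h>
    #include <stdlib.h>

    /* Binary-search for the largest single block malloc() will hand out right now. */
    int main(void)
    {
        size_t lo = 0;                /* trivially available */
        size_t hi = (size_t)-1;       /* SIZE_MAX: assumed to fail */

        while (hi - lo > 1024 * 1024) {          /* stop at 1 MiB resolution */
            size_t mid = lo + (hi - lo) / 2;
            void *p = malloc(mid);
            if (p != NULL) {
                free(p);
                lo = mid;                        /* mid bytes were available */
            } else {
                hi = mid;                        /* mid bytes were not */
            }
        }

        printf("largest single malloc() observed: about %zu MiB\n", lo / (1024 * 1024));
        return 0;
    }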