The problem I'm having can be reproduced by running the code below.
gcp;
C={};
for i=1:1000
    C = [C, {tall(ones(1000,1,1000,2))}];
    pause(0.05)
end
My expectation was that, because tall arrays are only brought into memory when an expression is evaluated, and then only a few rows at a time, the above would not cause immediate memory problems. However, it fills up my RAM in exactly the same way as calling
gcp;
C={};
for i=1:1000
    C = [C, {ones(1000,1,1000,2)}];
    pause(0.05)
end
That is, using tall arrays does not seem to have any impact on memory usage at all.
If I wish to store large arrays of data produced by MATLAB outside of memory, how should I do it? Using tall arrays doesn't seem to work.
Note: I am using MATLAB R2017a, which doesn't support vertical concatenation of tall arrays. As such, I am using the structure
{rows1,rows2,...,rowsn}
to represent blocks of rows of the same array. This may not be optimal.
As mentioned in the comments - the local tall array constructor (where you give it local data rather than a datastore) is generally used only for in-memory prototyping before you point your tall arrays at your real large data in a datastore.
You can use tall/write to write out your tall arrays to disk, then make a datastore reading in from the locations you used in write. This will process the data without loading it all back into memory at once.
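For example, a minimal sketch of that round trip (assuming the tall/write and datastore behaviour available from R2016b onwards; the folder name is arbitrary):
tA = tall(ones(1000,1,1000,2));           % in-memory prototype tall array
location = fullfile(tempdir, 'tallData');
write(location, tA);                      % spill the tall array to a set of files on disk

% Later: rebuild a tall array backed by those files rather than by RAM
ds = datastore(location);                 % datastore over the files written by tall/write
tB = tall(ds);                            % operations on tB are deferred and evaluated in chunks
After this, results are only brought into memory (a chunk at a time) when you call gather on an expression involving tB.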
Is there a way to create a 3D array for which only certain elements are defined, while the rest does not take up memory?
Context: I am running Monte-Carlo simulations in which I want to solve 10^5 matrices. All of these matrices have a majority of elements that are zero, for which I wouldn't need to use 8 bytes of memory per element. These elements are the same for all matrices. For simplicity, I have combined all of these matrices into a 3D array, but if my matrices start to become too large, I encounter memory issues (since at matrix dimensions of 100*100*100000, the array already takes up 8 GB of memory).
One workaround would be to store every matrix element, with its 10^6 iterations, in its own vector; that way, no additional information needs to be stored. The inconvenience is that I would then need to work with more than 50 different vectors, and I prefer working with arrays.
Is there any way to tell R that some matrix elements don't need to store any information?
I have been thinking that defining a new class could help for this, but since I have just discovered classes, I am not sure what all the options are. Do you think this could be a good approach? Are there specific things I should keep in mind?
I also know that there are packages made to deal with memory problems, but that did not seem like the quickest solution in terms of human and computation effort for this specific problem.
I have a column vector of very large size, and I want to repeat this vector multiple times. The simple method that works for small arrays is repmat, but I am running out of memory. I tried bsxfun but still had no success; MATLAB gives me an out-of-memory error from ones. Any idea how to do this?
Here is the simple code (just for demonstration):
t=linspace(0,1000,89759)';
tt=repmat(t,1,length(t));
or using bsxfun:
tt=bsxfun(@times, t, ones(length(t),length(t)));
The problem here is simply too much data, it does not have to do with the repmat function itself. To verify that it is too much data, you can simply try creating a matrix of ones of that size with a clear workspace to reproduce the error. On my system, I get this error:
>> clear
>> a = ones(89759,89759)
Error using ones
Requested 89759x89759 (60.0GB) array exceeds maximum array size preference. Creation of arrays greater than
this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
So you fundamentally need to reduce the amount of data you are handling.
Also, I should note that plots will hold onto references to the data, so even if you try plotting this "in chunks", then you will still run into the same problem. So again, you fundamentally need to reduce the amount of data you are handling.
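If whatever you intend to do with tt can be done one block of columns at a time (so the full 60 GB matrix never has to exist at once), a chunked loop along these lines avoids materialising it. This is only a sketch; the block size and the per-block work are placeholders:
t = linspace(0,1000,89759)';
blockSize = 100;                              % tune to your available memory
for k = 1:blockSize:numel(t)
    cols = k:min(k + blockSize - 1, numel(t));
    block = repmat(t, 1, numel(cols));        % equals tt(:,cols): at most 89759-by-100, ~70 MB
    % ... process or summarise "block" here, then let it be overwritten ...
end
Whether this helps depends entirely on whether your downstream computation can consume the data in pieces rather than needing the whole matrix at once.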
Thanks to everyone trying to help me!
I am trying to do a finite element analysis in Mathematica. We can obtain all of the local stiffness matrices, each of which is 8x8. There are 2000 of these matrices; they are similar but not identical. Every local stiffness matrix is stored as a function named KK, so for example KK[1] is the first element's local stiffness matrix.
I am trying to assemble all of the local matrices into the global stiffness matrix. To keep it simple:
Do[K[e][i][j] = KK[[e]][[i]][[j]], {e, 2000}, {i, 8}, {j, 8}]
Here is my question: can this assignment affect the analysis time? If so, what can I do to improve it?
In MATLAB this would be called a 3D array, but I don't know what it is called in Mathematica.
What are the advantages and disadvantages of representing the data this way in Mathematica? Is it faster, or just easier?
Thanks for your help...
It is difficult to understand what your question is, so you might want to reformulate it.
As others have mentioned, there is no advantage to be expected from a switch from a 3D array to DownValues or SubValues. In fact, you would then move from accessing data structures to pattern matching, which is powerful and the real strength of Mathematica, but not very efficient for what you plan to do, so I would strongly suggest staying in the realm of ordinary arrays.
There is another thing that might not be clear to someone more familiar with MATLAB than with Mathematica: in Mathematica, the "default" arrays behave a lot like cell arrays in MATLAB. Each entry can contain arbitrary content, and they don't need to be rectangular (as High Performance Mark has mentioned, they are just expressions with a head List and can roughly be compared to MATLAB cell arrays).
But if such a nested list is a rectangular array and every element of it is of the same type, it can be converted to a so-called PackedArray. PackedArrays are much more memory efficient and will also speed up many calculations; they behave in many respects like regular ("non-cell") arrays in MATLAB. This conversion is often done implicitly by functions like Table, which will often return a packed array automatically. But if you are interested in efficiency, it is a good idea to check with Developer`PackedArrayQ and convert explicitly with Developer`ToPackedArray if necessary.
If you are working with PackedArrays, the speed and memory efficiency of many operations are much better and usually comparable to vectorized operations on normal MATLAB arrays. Unfortunately, packed arrays can get "unpacked" by some operations, so if calculations become slow it is usually a good idea to check whether that has happened.
Neither "normal" arrays nor PackedArrays are restricted in the rank (called Depth in Mathematica) they can have, so you can of course create and use "3D arrays" just as you can in matlab. I have never experienced or would know of any efficiency penalties when doing so.
It is probably of interest that newer versions of Mathematica (>= 10) include the finite element method as one of the solver methods for NDSolve, so if you are not doing this as an exercise you might want to have a look at what is already available; there is quite extensive documentation about it.
A final remark: instead of kk[[e]][[i]][[j]] you can use the much more readable form kk[[e,i,j]], which is also easier and less error-prone to type.
Extended comment, I guess, but
KK[e][[i]][[j]]
is not the (e,i,j) element of a "3D array". Note the single brackets on the e: when you use single brackets you are not denoting an array or list element but a DownValue, which is quite different from a list element.
If you do for example,
f[1]=0
f[2]=2
...
the resulting f appears similar to an array, but is actually more akin to an overloaded function in some other language. It is convenient because the indices need not be contiguous or even integers, but there is a significant performance drawback if you ever want to operate on the structure as a list.
Your Do loop example would almost certainly be better written as:
kk = Table[ k[e][i][j] ,{e,2000},{i,8},{j,8} ]
(Your loop won't even work as-is unless you previously "initialized" each of the kk[e] as an 8x8 array.)
Note that now the list elements are all double-bracketed, i.e. kk[[e]][[i]][[j]] or kk[[e,i,j]].
I am using a simple for loop to crop a large amount of images and then storing them in a cell array. I keep getting the message:
The variable croppedSag appears to change size on every loop iteration. Consider preallocating for speed.
I have seen this several times before while coding in MATLAB. I have always ignored it, and am curious how much preallocating will improve the runtime if I have, say, 10,000 images or an even larger number.
Also, I have read about preallocating in the documentation and it says to use zeros() for that purpose. How would I use that for the code below?
croppedSag = {};
for i = 1:sagNum
    croppedSag{end+1} = imcrop(SagArray{i},rect);
end
I didn't quite follow the examples in the documentation.
Pre-allocating an array is always a good idea in Matlab. The alternative is to have an array which grows during each iteration through a loop. Each time an element is added to the end of the array, Matlab must produce a totally new array, copy the contents of the old array into the new one, and then, finally, add the new element at the end. Pre-allocating eliminates the need to allocate a new array and spend time copying the existing contents of the array into the new memory.
However, in your case, you might not see as much benefit as you might expect. When copying the cell array to a new, enlarged cell array, Matlab doesn't actually have to copy the contents of the cell array (the image data), but only pointers to that data.
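If you want to gauge the effect for your own case, a quick and unscientific comparison along these lines shows the cost of growing the cell array versus preallocating it (the element size and count below are placeholders, and absolute times vary by machine and MATLAB release):
N = 1e4;
img = zeros(200);                 % stand-in for a cropped image

tic
a = {};
for k = 1:N
    a{end+1} = img;               % cell array grows on every iteration
end
toc

tic
b = cell(1, N);                   % preallocated once
for k = 1:N
    b{k} = img;
end
toc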
Nonetheless, there is no reason not to pre-allocate (unless you actually don't know the final size in advance). Here's a pre-allocated version of your loop:
croppedSag = cell(1, sagNum);
for ii = 1:sagNum
    croppedSag{ii} = imcrop(SagArray{ii}, rect);
end
I also changed the index variable "i" to "ii" so that it doesn't shadow the built-in imaginary unit.
You can also re-write this loop in one line using the cellfun function:
croppedSag = cellfun(@(im) imcrop(im, rect), SagArray, 'UniformOutput', false);
(The 'UniformOutput', false option is needed because imcrop returns an array rather than a scalar.)
Here's a blog entry that might be informative:
Matlab - Speed up your Code by Preallocating the size of Arrays, Cells, and Structures
I have a structure called Patch that represents a 2D array of data.
import qualified Data.ByteString as Strict

newtype Size = Size (Int, Int)
data Patch = Patch Size Strict.ByteString
I want to construct a larger Patch from a set of smaller Patches and their assigned positions. (The Patches do not overlap.) The function looks like this:
newtype Position = Position (Int, Int)
combinePatches :: [(Position, Patch)] -> Patch
combinePatches plan = undefined
I see two sub-problems. First, I must define a function to translate 2D array copies into a set of 1D array copies. Second, I must construct the final Patch from all those copies.
Note that the final Patch will be around 4 MB of data. This is why I want to avoid a naive approach.
I'm fairly confident that I could do this horribly inefficiently, but I would like some advice on how to efficiently manipulate large 2D arrays in Haskell. I have been looking at the "vector" library, but I have never used it before.
Thanks for your time.
If the spec is really just a one-time creation of a new Patch from a set of previous ones and their positions, then this is a straightforward single-pass algorithm. Conceptually, I'd think of it as two steps -- first, combine the existing patches into a data structure with reasonable lookup for any given position. Next, write your new structure lazily by querying the compound structure. This should be roughly O(n log(m)) -- n being the size of the new array you're writing, and m being the number of patches.
This is conceptually much simpler if you use the Vector library instead of a raw ByteString. But it is simpler still if you simply use Data.Array.Unboxed. If you need arrays that can interop with C, then use Data.Array.Storable instead.
If you ditch purity, at least locally, and work with an ST array, you should be able to trivially do this in O(n) time. Of course, the constant factors will still be worse than using fast copying of chunks of memory at a time, but there's no way to keep that code from looking low-level.