Preallocating arrays in Matlab?

Preallocating arrays in Matlab? - arrays

I am using a simple for loop to crop a large amount of images and then storing them in a cell array. I keep getting the message:
The variable croppedSag appears to change size on every loop iteration. Consider preallocating for speed.
I have seen this several times before while coding in MATLAB. I have always ignored it and am curious how much preallocating will increase the runtime if I have, say, 10,000 images or a larger number?
Also, I have read about preallocating in the documentation and it says to use zeros() for that purpose. How would I use that for the code below?
croppedSag = {};
for i = 1:sagNum
croppedSag{end+1} = imcrop(SagArray{i},rect);
end
I didn't quite follow the examples in the documentation.

Pre-allocating an array is always a good idea in Matlab. The alternative is to have an array which grows during each iteration through a loop. Each time an element is added to the end of the array, Matlab must produce a totally new array, copy the contents of the old array into the new one, and then, finally, add the new element at the end. Pre-allocating eliminates the need to allocate a new array and spend time copying the existing contents of the array into the new memory.
However, in your case, you might not see as much benefit as you might expect. When copying the cell array to a new, enlarged cell array, Matlab doesn't actually have to copy the contents of the cell array (the image data), but only pointers to that data.
Nonetheless, there is no reason not to pre-allocate (unless you actually don't know the final size in advance). Here's a pre-allocated version of your loop:
croppedSag = cell(1, sagNum);
for ii = 1:sagNum
croppedSag{ii} = imcrop(SagArray{ii}, rect);
end
I also changed the index variable "i" to "ii" so that it doesn't over-write the imaginary unit.
You can also re-write this loop in one line using the cellfun function:
croppedSag = cellfun(#(im) imcrop(im, rect), SagArray);
Here's a blog entry that might be informative:
Matlab - Speed up your Code by Preallocating the size of Arrays, Cells, and Structures

Related

How to efficiently append items to large arrays in Swift?

I am working on a Swift project the involves very large dynamically changing arrays. I am running into a problem where each successive operation take longer than the former. I am reasonably sure this problem is caused by appending to the arrays, as I get the same problem with a simple test that just appends to a large array.
My Test Code:
import Foundation
func measureExecution(elements: Int, appendedValue: Int) -> Void {
var array = Array(0...elements)
//array.reserveCapacity(elements)
let start = DispatchTime.now()
array.append(appendedValue)
let end = DispatchTime.now()
print(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000)
}
for i in 0...100 {
measureExecution(elements: i*10000, appendedValue: 1)
}
This tries for a 100 different array sizes between 10000 and 1000000, timing how long it take to append one item to the end of the array. As I understand it, Swift arrays are dynamic arrays that will reallocate memory geometrically (it allocates more and more memory each time it needs to reallocate), which Apple's documentation says should mean appending a single element to an array is an O(1) operation when averaged over many calls to the append(_:) method (source). As such, I don't think memory allocation is causing the issue.
However, there is a linear relationship between the length of the array and the time it takes to append an element. I graphed the times for a bunch of array lengths, and baring some outliers it is pretty clearly O(n). I also ran the same test with reserved capacity (commented out in the code block) to confirm that memory allocation was not the issue, and I got nearly identical results:
How do I efficiently append to the end of massive arrays (preferably without using reserveCapacity)?

From what I've read, Swift arrays pre-allocate storage. Each time you fill an Array's allocated storage, it doubles the space allocated. That way you don't do a new memory allocation that often, and also don't allocate a bunch of space you don't need.
The Array class does have a reserveCapacity(_:). If you know how many elements you are going to store you might want to try that.

How to blit from a 1D array along a dimension of a 2D array?

I have a 2D array, and have computed necessary updates along a given dimension of it using a 1D array (said updates can't be computed in place as earlier calculations would override values needed in later calculations). I thus want to copy the updates into my 2D array. The most obvious way to do this would, at first glance, appear to be to use Array slicing and Array.blit.
I have tried the approach of extracting the relevant dimension using array slicing, and then blitting across to that, but that doesn't update the values inside the 2D array. I think what is happening is that a new, separate, 1D array is being created when I make the slice, and the values are being blitted into that new array, which of course is dropped a moment later when it goes back out of scope.
I suppose you could say that I was expecting the slicing to return a view into the 2D array which would work for the blit function call, but instead the slicing actually returns a new array with the values copied into it (which, thinking about it, is what slicing does otherwise, I believe).
Currently I am using a workaround whereby I create a 2D array, where one of the dimensions is only 1 element wide (thus effectively re-creating a 1D array), and then using Array2D.blit. I would prefer to do it directly though, both because I find this ugly, and moreover because it would be quite useful elsewhere in my program where I can't just declare a 1D array as 2D.
My first approach:
let srcArray = Array.zeroCreate srcArrayLength
... // do relevant computation
srcArray.[index] <- result
... // finish computation
Array.blit srcArray 0 destArray.[index, *] 0 srcArrayLength
My current approach:
let srcArray = Array2D.zeroCreate 1 srcArrayLength
... // do relevant computation
srcArray.[0,index] <- result
... // finish computation
Array2D.blit srcArray 0 0 destArray index 0 1 srcArrayLength
The former approach has no effect on my destination 2D array. The latter approach works where I use it, but as I said above it isn't nice, and cannot be used in another situation, where I have a jagged 2D array (i.e. 'a[][]) that I would like to blit across from.
How might I go about achieiving my aim? I thought of Span/Memory, but it wasn't clear to me if and how they could be used here. Alternatively, if you can spot a better way to do this that doesn't involve blit, I'm all-virtual-ears.

I figured out a fairly good solution to this, with the help of someone over in the F# Foundation Slack. Since nobody else has posted an answer, I'll put this one up.
Both Array.Copy (note that that is the .NET Array.Copy method, not the F#-specific Array.copy) and Buffer.BlockCopy were suggested to me. Array.Copy still complains about mismatching array types, but Buffer.BlockCopy ignores the dimensionality of the supplied array, and merely copies the specified number of bytes from one location to another. Using this and relying on the fact that 2D arrays are really stored as 1D arrays in row-major order (the same as C, I believe), it is quite possible to overwrite the last dimension of a multi-dimensional array reasonably cleanly.
I updated the code from the 'current approach' in my question to the below:
let srcArray = Array.zeroCreate srcArrayLength
... //do relevant computation
srcArray.[index] <- result
... //finish computation
Buffer.BlockCopy(srcArray, 0, destArray, firstDimIndex * lengthOfSecondDim * sizeof<'a>, lengthOfSecondDim * sizeof<'a>
Not only does it do the job in a way which I personally find a bit tidier, but it has a side-benefit in that it is noticeably faster than the second approach described in the question - I haven't yet run a benchmark to quantify the difference though.

Numpy concatenate is slow: any alternative approach?

I am running the following code:
for i in range(1000)
My_Array=numpy.concatenate((My_Array,New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?

This is basically what is happening in all algorithms based on arrays.
Each time you change the size of the array, it needs to be resized and every element needs to be copied. This is happening here too. (some implementations reserve some empty slots; e.g. doubling space of internal memory with each growing).
If you got your data at np.array creation-time, just add these all at once (memory will allocated only once then!)
If not, collect them with something like a linked list (allowing O(1) appending-operations). Then read it in your np.array at once (again only one memory allocation).
This is not much of a numpy-specific topic, but much more about data-strucures.
Edit: as this quite vague answer got some upvotes, i feel the need to make clear that my linked-list approach is one possible example. As indicated in the comment, python's lists are more array-like (and definitely not linked-lists). But the core-fact is: list.append() in python is fast (amortized: O(1)) while that's not true for numpy-arrays! There is also a small part about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(bold annotations by me)

Maybe creating an empty array with the correct size and than populating it?
if you have a list of arrays with same dimensions you could
import numpy as np
arr = np.zeros((len(l),)+l[0].shape)
for i, v in enumerate(l):
arr[i] = v
works much faster for me, it only requires one memory allocation

It depends on what New_Rows[i] is, and what kind of array do you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array) just concatenate them all at once. Concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility)
np.concatenate([row for row in New_Rows])
or closer to your example.
np.concatenate([New_Rows[i] for i in range(1000)])
But if New_Rows elements are all the same length, and you want a 2d array, one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
np.array is designed primarily to build an array from a list of lists.
np.concatenate can also build in 2d, but the inputs need to be 2d to start with. vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
In general it is better/faster to iterate or append with lists, and apply the np.array (or concatenate) just once. appending to a list is fast; much faster than making a new array.

I think #thebeancounter 's solution is the way to go.
If you do not know the exact size of your numpy array ahead of time, you can also take an approach similar to how vector class is implemented in C++.
To be more specific, you can wrap the numpy ndarray into a new class which has a default size which is larger than your current needs. When the numpy array is almost fully populated, copy the current array to a larger one.

Assume you have a large list of 2D numpy arrays, with the same number of columns and different number of rows like this :
x = [numpy_array1(r_1, c),......,numpy_arrayN(r_n, c)]
concatenate like this:
while len(x) != 1:
if len(x) == 2:
x = np.concatenate((x[0], x[1]))
break
for i in range(0, len(x), 2):
if (i+1) == len(x):
x[0] = np.concatenate((x[0], x[i]))
else:
x[i] = np.concatenate((x[i], x[i+1]))
x = x[::2]

creating an array of financial time series objects

I have a couple of fints, how do I preallocate a cell array so that I can loop through them later? I don't really care if they are stored as a cell array or array or anything different, I just want to be able to do the following
for(i = 1:size(stocks))
figure(i);
plot(stocks(i));
end
or something equivalent. allocating with stocks = zeros(0,5) works great first, but doesn't work when I try to insert the fints because it is assumes it is a double. How would you even go about preallocating arrays for financial time series obejcts? Since it would be different lenghts everytime you insert a new one.

From the Matlab's doc on Preallocate Memory for a Cell Array
Cell arrays do not require completely contiguous memory. However, each
cell requires contiguous memory, as does the cell array header that
MATLAB® creates to describe the array. For very large arrays,
incrementally increasing the number of cells or the number of elements
in a cell results in Out of Memory errors.
Initialize a cell array by calling the cell function, or by assigning
to the last element. For example, these statements are equivalent:
C = cell(25,50); C{25,50} = [];
MATLAB creates the header for a
25-by-50 cell array. However, MATLAB does not allocate any memory for
the contents of each cell.

Array of Matrices in MATLAB

I am looking for a way to store a large variable number of matrixes in an array in MATLAB.
Are there any ways to achieve this?
Example:
for i: 1:unknown
myArray(i) = zeros(500,800);
end
Where unknown is the varied length of the array, I can revise with additional info if needed.
Update:
Performance is the main reason I am trying to accomplish this. I had it before where it would grab the data as a single matrix, show it in real time and then proceed to process the next set of data.
I attempted it using multidimensional arrays as suggested below by Rocco, however my data is so large that I ran out of Memory, I might have to look into another alternative for my case. Will update as I attempt other suggestions.
Update 2:
Thank you all for suggestions, however I should have specified beforehand, precision AND speed are both an integral factor here, I may have to look into going back to my original method before trying 3-d arrays and re-evaluate the method for importing the data.

Use cell arrays. This has an advantage over 3D arrays in that it does not require a contiguous memory space to store all the matrices. In fact, each matrix can be stored in a different space in memory, which will save you from Out-of-Memory errors if your free memory is fragmented. Here is a sample function to create your matrices in a cell array:
function result = createArrays(nArrays, arraySize)
result = cell(1, nArrays);
for i = 1 : nArrays
result{i} = zeros(arraySize);
end
end
To use it:
myArray = createArrays(requiredNumberOfArrays, [500 800]);
And to access your elements:
myArray{1}(2,3) = 10;
If you can't know the number of matrices in advance, you could simply use MATLAB's dynamic indexing to make the array as large as you need. The performance overhead will be proportional to the size of the cell array, and is not affected by the size of the matrices themselves. For example:
myArray{1} = zeros(500, 800);
if twoRequired, myArray{2} = zeros(500, 800); end

If all of the matrices are going to be the same size (i.e. 500x800), then you can just make a 3D array:
nUnknown; % The number of unknown arrays
myArray = zeros(500,800,nUnknown);
To access one array, you would use the following syntax:
subMatrix = myArray(:,:,3); % Gets the third matrix
You can add more matrices to myArray in a couple of ways:
myArray = cat(3,myArray,zeros(500,800));
% OR
myArray(:,:,nUnknown+1) = zeros(500,800);
If each matrix is not going to be the same size, you would need to use cell arrays like Hosam suggested.
EDIT: I missed the part about running out of memory. I'm guessing your nUnknown is fairly large. You may have to switch the data type of the matrices (single or even a uintXX type if you are using integers). You can do this in the call to zeros:
myArray = zeros(500,800,nUnknown,'single');

myArrayOfMatrices = zeros(unknown,500,800);
If you're running out of memory throw more RAM in your system, and make sure you're running a 64 bit OS. Also try reducing your precision (do you really need doubles or can you get by with singles?):
myArrayOfMatrices = zeros(unknown,500,800,'single');
To append to that array try:
myArrayOfMatrices(unknown+1,:,:) = zeros(500,800);

I was doing some volume rendering in octave (matlab clone) and building my 3D arrays (ie an array of 2d slices) using
buffer=zeros(1,512*512*512,"uint16");
vol=reshape(buffer,512,512,512);
Memory consumption seemed to be efficient. (can't say the same for the subsequent speed of computations :^)

if you know what unknown is,
you can do something like
myArray = zeros(2,2);
for i: 1:unknown
myArray(:,i) = zeros(x,y);
end
However it has been a while since I last used matlab.
so this page might shed some light on the matter :
http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_prog/f1-86528.html

just do it like this
x=zeros(100,200);
for i=1:100
for j=1:200
x(i,j)=input('enter the number');
end
end

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight