I want to load a CSV file into MATLAB using textread(). The data in it has more than 2 million records, so I should preallocate the array for those data.
Suppose I cannot know the exact length of the arrays in advance. The MATLAB v6.5 docs recommend using repmat() for my expanding array. The original wording in the doc is below:
"In cases where you cannot preallocate, see if you can increase the
size of your array using the repmat function. repmat tries to get you
a contiguous block of memory for your expanding array".
I really don't know how to use repmat for expanding.
Does it mean estimating a rough length, preallocating with repmat(), and then removing the empty elements?
If so, how is that different from preallocating with zeros() or cell()?
The documentation also says:
When you preallocate a block of memory to hold a matrix of some type
other than double, it is more memory efficient and sometimes faster to
use the repmat function for this.
The statement below uses zeros to preallocate a 100-by-100 matrix of
int8. It does this by first creating a full matrix of doubles, and
then converting the matrix to int8. This costs time and uses memory
unnecessarily.
A = int8(zeros(100));
Using repmat, you create only one double, thus reducing your memory
needs.
A = repmat(int8(0), 100, 100);
So the advantage is that if you want a datatype other than double, you can use repmat to preallocate by replicating a single element of that non-double type.
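To see the difference concretely, a quick sketch using the sizes from the doc's example (whos just reports the memory used):
A1 = int8(zeros(100));          % builds a full 100x100 double (80,000 bytes) first, then converts
A2 = repmat(int8(0), 100, 100); % only ever creates a single double scalar
whos A1 A2                      % both end up as 100x100 int8, 10,000 bytes each
isequal(A1, A2)                 % true: same result, smaller temporary memory cost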
Also see: http://undocumentedmatlab.com/blog/preallocation-performance, which suggests:
data1(1000,3000) = 0
instead of:
data1 = zeros(1000,3000)
to avoid initialisation of other elements.
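A minimal timing sketch of the two variants (absolute numbers depend heavily on the MATLAB version and platform):
tic; data1 = zeros(1000,3000); toc % allocates and fills a 1000x3000 array
clear data1
tic; data1(1000,3000) = 0;     toc % grows a new array to full size in one assignment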
As for dynamic resizing, repmat can be used to concisely double the size of your array (a common method which results in amortized O(1) appends for each element):
data = 0;                        % initial buffer of one element
i = 1;                           % index of the next free slot
while another_element_available  % pseudocode: loop while input remains
    ...
    if i > numel(data)
        data = repmat(data,1,2); % doubles the size of data
    end
    data(i) = element;           % pseudocode: store the new element
    i = i + 1;
end
And yes, after you have gathered all your elements, you can trim the array to remove the empty elements at the end.
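For example, with i holding the index of the next free slot as in the loop above, the trim is a single line:
data = data(1:i-1); % keep only the i-1 elements actually written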
Related
In Julia, one can preallocate an array of a given type and dims with
A = Array{<type>}(undef,<dims>)
For example, for a 10x10 matrix of floats:
A = Array{Float64,2}(undef,10,10)
However, for an array of arrays, it does not seem possible to also preallocate the underlying arrays.
For instance, if I want to initialize a vector of n matrices of complex floats, the only syntax I can figure out is
A = Vector{Array{ComplexF64,2}}(undef, n)
but how could I preallocate the size of each Array in the vector, other than with a loop afterwards? I tried e.g.
A = Vector{Array{ComplexF64,2}(undef,10,10)}(undef, n)
which obviously does not work.
Remember that "allocate" means "give me a contiguous chunk of memory, of size exactly blah". For an array of arrays, which is really a contiguous chunk of pointers to other contiguous chunks, this doesn't really make sense in general as a combined operation -- the latter chunks might just totally differ.
However, by stating your problem, you make clear that you actually have more structural information: you know that you have n 10x10 arrays. This really is a 3D array, conceptually:
A = Array{Float64}(undef, n, 10, 10)
At that point, you can just take slices, or better: views along the first axis, if you need an array of them:
[@view A[i, :, :] for i in axes(A, 1)]
This is a length n array of AbstractArrays that in all respects behave like the individual 10x10 arrays you wanted.
In cases like the one you describe, you can use a comprehension:
a = [Matrix{ComplexF64}(undef, 2,3) for _ in 1:4]
This allocates a Vector of Arrays. In a Julia comprehension you can iterate over multiple dimensions, so higher dimensionality is also available.
The essential part of the code in question can be distilled into:
list = rand(1,x); % where x is some arbitrarily large integer
hitlist = [];
for n = 1:x
    if rand(1) < list(n)
        hitlist = [hitlist n]; % grows hitlist by one element on every hit
    end
end
list(hitlist) = [];
This program runs quite slowly, and I suspect the growing hitlist is why, but I don't know how to fix it. The length of hitlist will necessarily vary in a random way, so I can't simply preallocate a zeros array of the proper size. I contemplated making hitlist a zeros array the length of list, but then I would have to remove all the superfluous zeros, and I don't know how to do that without running into the same problem.
How can I preallocate an array of random size?
I'm unsure about preallocating a 'random size', but you can preallocate in large chunks, e.g. 1e3, or whatever suits your use case:
list = rand(1,x); % where x is some arbitrarily large integer
a = 1e3;          % increment of preallocation
hitlist = zeros(1,a);
k = 1;            % counter for the next free slot
for n = 1:x
    if rand(1) < list(n)
        if k > numel(hitlist)               % buffer is full
            hitlist = [hitlist zeros(1,a)]; % extend by another chunk
        end
        hitlist(k) = n;
        k = k + 1;
    end
end
hitlist = hitlist(1:k-1); % trim excess zeros
% hitlist(k:end) = [];    % alternative trim
list(hitlist) = [];
This won't be the fastest possible, but at least a whole lot faster than growing the array every iteration. Make sure to choose a suitable a; you can even base it on the available amount of RAM using the memory function, and trim the excess afterwards, so that you don't need the in-loop extension at all.
As an aside: MATLAB stores arrays column-major, so traversing matrices that way is faster, i.e. first down the first column, then the second, and so on. For a 1D array this doesn't matter, but for matrices it does. Hence I prefer to use list = rand(x,1), i.e. a column vector.
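A small sketch showing the effect (the column-order loop walks memory in storage order, so it is typically faster):
M = rand(5000);
tic
s = 0;
for j = 1:5000      % columns in the outer loop: memory order
    for i = 1:5000
        s = s + M(i,j);
    end
end
toc
tic
s = 0;
for i = 1:5000      % rows in the outer loop: strided access
    for j = 1:5000
        s = s + M(i,j);
    end
end
toc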
For this specific case, don't use this looped approach anyway, but use logical indexing:
list = rand(x,1);
list = list(list<rand(size(list)));
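As a sanity check of the one-liner: an element survives when list(n) < rand, which for uniform random values happens about half the time:
x = 1e6;
list = rand(x,1);
list = list(list < rand(size(list))); % keep the elements that are not 'hit'
numel(list)/x                         % roughly 0.5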
I have a matrix of size 4*n, say for instance 4*3000.
What is the fastest way to store and read the elements of the matrix?
I have tried two solutions, which gave me approximately the same time:
one array with 12000 elements (2D flattened to 1D), read by (i + width*j)
4 arrays of size 1*3000, with an if-else or switch-case statement deciding which array to read
Is there another solution to use?
Furthermore, how can the shift technique (>>) be used to solve the problem, if it is applicable in this case?
The first technique should be faster.
You can also improve performance by accessing elements sequentially inside a loop (...arr[11] = ...; arr[12] = ...; arr[13] = ...;...).
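On the shift question: since the width 4 is a power of two, the multiplication in (i + width*j) can be done as a left shift by 2. A small sketch in MATLAB (indices are hypothetical; MATLAB is 1-based, hence the j-1):
width = 4; n = 3000;
M = rand(width, n);            % the 4xn matrix
flat = M(:).';                 % flattened to a single 1x12000 array (column-major)
i = 3; j = 1234;               % an arbitrary element
idx  = i + width*(j-1);        % 2D -> 1D index
idx2 = i + bitshift(j-1, 2);   % same index: the *4 done as a left shift by 2
isequal(M(i,j), flat(idx), flat(idx2)) % true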
I have the following loop running on these variables:
A is a 2D array of size mxn.
mask is a 1D logical array of size 1xn.
result is a 1D array of size 1xn.
B is a vector of size mx1.
C is an mxm matrix, with the same m as above.
Edit: expanded foo(x) into the function.
here is the code:
temp = (B.'*C*B);
for k = 1:n
    x = A(:,k);
    if mask(k) == 1
        result(k) = (B.'*C*x)^2 / (temp*(x.'*C*x)); % returns a scalar
    end
end
Note that I am already successfully using the above code as a parfor loop instead of for. I was hoping you could suggest some way to use meshgrid or the like to yield a better performance improvement. I don't think I have RAM problems, so a solution can also be expensive memory-wise.
Many thanks.
try this:
result = (B.'*C*A).^2 ./ diag(temp*(A.'*C*A))' .* mask;
This vectorization via matrix multiplication also ensures that result is a 1xn vector. With the code you provided, if the last elements of mask are zeros, result will be truncated to a shorter length, whereas this answer keeps those elements as zeros.
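A quick check on small hypothetical sizes that the one-liner matches the loop:
m = 5; n = 8;
A = rand(m,n); B = rand(m,1); C = rand(m);
mask = rand(1,n) > 0.5;
temp = (B.'*C*B);
result = zeros(1,n);
for k = 1:n
    x = A(:,k);
    if mask(k)
        result(k) = (B.'*C*x)^2 / (temp*(x.'*C*x));
    end
end
vec = (B.'*C*A).^2 ./ diag(temp*(A.'*C*A))' .* mask;
max(abs(result - vec))   % ~0, up to floating-point round-off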
If your foo admits matrix input, you could do:
result = zeros(1,n); % preallocate result with zeros
mask = logical(mask); % make mask logical type
result(mask) = foo(A(:,mask)); % compute foo for all selected columns at once
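For instance, with a hypothetical foo that maps each column to a scalar (here, squared column norms), the pattern looks like this:
foo = @(X) sum(X.^2, 1);        % accepts an mxk matrix, returns a 1xk row
A = rand(4, 6);
mask = logical([1 0 1 1 0 1]);
result = zeros(1, 6);
result(mask) = foo(A(:, mask)); % evaluate foo only on the selected columns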
Excerpt from the O'Reilly book:
Based on the above excerpt, can someone explain, in big-O or other performance terms, why there should be a performance difference, and the basis for the formula used to find any element in an n-by-c dimensional array?
Additional: Why are different data types used in the three-dimensional example? Why would you even bother to represent this in different ways?
The article seems to point out different ways to represent a matrix data structure, and the performance gain of a single-array representation, although it doesn't really explain why you get that gain.
For example, to represent an NxNxN matrix:
In object form:
class Cell {
    int x, y, z;
}
class Matrix {
    int size = 10;
    Cell[] cells = new Cell[size * size * size]; // one Cell object per element
}
In three-arrays form:
class Matrix {
    int size = 10;
    int[][][] data = new int[size][size][size];
}
In a single array:
class Matrix {
    int size = 10;
    int[] data = new int[size * size * size];
}
To your question: there is a performance gain from representing an NxN matrix as a single array of length N*N, and the gain comes from caching (assuming you cannot fit the entire matrix in one chunk). A single-array representation guarantees the entire matrix sits in one contiguous chunk of memory. When data is moved from memory into cache (or from disk into memory), it is moved in chunks, so you sometimes grab more data than you need; the extra data is whatever surrounds the data you asked for. The index formula follows from that layout: element (x, y, z) of an NxNxN matrix flattened row-major into a single array lives at index x*N*N + y*N + z.
Say you are processing the matrix row by row. When fetching new data, the system may grab N+10 items per chunk. In the object or three-arrays case, the extra data (+10) may be unrelated data; in the case of the N*N-length array, the extra data (+10) is most likely more of the matrix.
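To make the index arithmetic concrete, here is a small sketch (MATLAB, to match the rest of this page; note MATLAB stores arrays column-major, so its formula mirrors the row-major Java layout above):
N = 10;
T = reshape(1:N^3, N, N, N);   % a 3D array backed by one contiguous block
flat = T(:);                   % the same data viewed as a single array
x = 2; y = 5; z = 7;
idx = x + N*(y-1) + N*N*(z-1); % column-major single-array index of (x,y,z)
isequal(flat(idx), T(x,y,z))   % true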
This article from SGI seems to give a bit more detail, specifically the Principles of Good Cache Use:
http://techpubs.sgi.com/library/dynaweb_docs/0640/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch06.html