I have a row vector q with 200 elements, and another row vector, dij, which is the output of the pdist function and currently has 48216200 elements, but I'd like to be able to go higher. The operation I want to do is essentially:
t=sum(q'*dij,2);
However, since this tries to allocate a 200x48216200 array, MATLAB complains that this would require 70GB of memory. Therefore I do it this way:
t = zeros(numel(q),1);
for i = 1:numel(q)
    qi = q(i);
    factor = qi*dij;
    t(i) = sum(factor);
end
However, this takes too much time. By too much time, I mean about 36 s, which is orders of magnitude longer than the time required by the pdist call itself. Is there a way I can speed up this operation without explicitly allocating so much memory? My assumption is that if the first approach could allocate the memory, it would be faster, being a vectorized operation.
Just use the distributive property of multiplication with respect to addition:
t = q'*sum(dij);
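A quick way to convince yourself of the identity on small data (the sizes here are hypothetical, just for verification):
q   = rand(1,5);
dij = rand(1,7);
t1  = sum(q'*dij,2);  % memory-hungry version
t2  = q'*sum(dij);    % distributive version
max(abs(t1-t2))       % should be ~0 up to rounding
Since sum(dij) collapses dij to a scalar before the multiplication, the 200-by-48216200 intermediate is never formed.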
To test what Cris said in the comment on the first post, I created three .m files, as follows:
vec.m:
res = sum(sin(d.*q')./(d.*q')); % d.*q' expands to a 4e6-by-200 matrix; sum runs down each column
forloop.m:
for i = 1:200
    res(i) = sum(sin(d.*q(i))./(d.*q(i)));
end
and test.m:
clc
clear all
d=rand(4e6,1);
q=rand(200,1);
res=zeros(1,200);
forloop;
vec;
forloop;
vec;
forloop;
vec;
Then I used MATLAB's Run and Time profiler, and the results were very surprising:
3 calls to forloop: ~10.5 s
3 calls to vec: ~15.5 s!
Additionally, when I converted the data to single, the results were:
... forloop: ~7.5 s
... vec: ~8.5 s
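(If you want to double-check outside the profiler, whose line-by-line instrumentation adds overhead of its own, timeit gives steadier numbers. Here forloopFun and vecFun are hypothetical function wrappers around the two scripts above:)
d = rand(4e6,1);
q = rand(200,1);
timeit(@() forloopFun(d,q))  % forloopFun: hypothetical wrapper around forloop.m
timeit(@() vecFun(d,q))      % vecFun: hypothetical wrapper around vec.m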
I don't know precisely why the for loop is faster in these scenarios, but as for your problem, you could speed things up by creating fewer temporary variables in the loop and by using column vectors (I think), and finally by converting your data to single precision:
q=single(rand(200,1));
...
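A minimal sketch of those suggestions applied to your loop (the sizes here are hypothetical):
q   = single(rand(200,1));
dij = single(rand(4e6,1));
t   = zeros(numel(q),1,'single');
for i = 1:numel(q)
    t(i) = sum(q(i)*dij);  % no intermediate 'factor' variable
end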
I am trying to perform a specific downsampling process. It is described by the following pseudocode.
//Let V be an input image with dimension of M by N (row by column)
//Let U be the destination image of size floor((M+1)/2) by floor((N+1)/2)
//The floor function is to emphasize the rounding for the even dimensions
//U and V are part of a wrapper class of Pixel_FFFF vImageBuffer
for i in 0 ..< U.size.rows {
    for j in 0 ..< U.size.columns {
        U[i,j] = V[(i * 2), (j * 2)]
    }
}
The process basically takes the pixel values at every other location along both dimensions. The resulting image is approximately half the size of the original in each dimension.
As a one-off call, the process is relatively fast on its own. However, it becomes a bottleneck when it is called numerous times inside a bigger algorithm. Therefore, I am trying to optimize it. Since I use Accelerate in my app, I would like to adapt this process in a similar spirit.
Attempts
First, this process can be easily done by a 2D convolution using the 1x1 kernel [1] with a stride [2,2]. Hence, I considered the function vImageConvolve_ARGBFFFF. However, I couldn't find a way to specify the stride. This function would be the best solution, since it takes care of the image Pixel_FFFF structure.
Second, I noticed that this is merely transferring data from one array to another. So, I thought the vDSP_vgathr function would be a good solution. However, I hit a wall: vectorizing a vImageBuffer yields the interleaved structure A,R,G,B,A,R,G,B,..., where each term is 4 bytes, and vDSP_vgathr transfers every 4 bytes to the destination array using a specified indexing vector. I could use a linear indexing formula to build such a vector, but, considering both even and odd dimensions, generating the indexing vector would be as inefficient as the original solution; it would require loops.
Also, neither of the vDSP 2D convolution functions fits the solution.
Are there any other functions in Accelerate that I might have overlooked? I saw that there's a stride option in the vDSP 1D convolution functions. Does someone know an efficient way to translate a strided 2D convolution into a 1D convolution?
I have a very large column vector that I want to repeat multiple times. The simple method that works for small arrays is repmat, but I am running out of memory. I tried bsxfun with no success; MATLAB gives me an out-of-memory error for ones. Any idea how to do this?
Here is the simple code (just for demonstration):
t=linspace(0,1000,89759)';
tt=repmat(t,1,length(t));
or using bsxfun:
tt=bsxfun(@times,t, ones(length(t),length(t)));
The problem here is simply too much data, it does not have to do with the repmat function itself. To verify that it is too much data, you can simply try creating a matrix of ones of that size with a clear workspace to reproduce the error. On my system, I get this error:
>> clear
>> a = ones(89759,89759)
Error using ones
Requested 89759x89759 (60.0GB) array exceeds maximum array size preference. Creation of arrays greater than
this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
So you fundamentally need to reduce the amount of data you are handling.
Also, I should note that plots will hold onto references to the data, so even if you try plotting this "in chunks", then you will still run into the same problem. So again, you fundamentally need to reduce the amount of data you are handling.
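If the replicated matrix is only consumed elementwise downstream, one common workaround (a sketch; the block width and the usage are hypothetical) is to process it in column blocks so that only a slice of it ever exists in memory:
t = linspace(0,1000,89759)';
blk = 1000;                          % hypothetical block width
for c = 1:blk:numel(t)
    cols = c:min(c+blk-1,numel(t));
    tt = repmat(t,1,numel(cols));    % at most 89759-by-blk at a time
    % ... use this block of tt here, then let it be overwritten ...
end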
Currently I have the following portion of code:
for i = 2:N-1
    res(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
where the variables k, m, c and e are vectors of size N and x is a vector of size 2*N. Is there any way to do this a lot faster using something like arrayfun? I couldn't figure it out. I especially want to make it faster by running on the GPU later, and arrayfun would help there too, since MATLAB doesn't support parallelizing for loops and I don't want to buy the Jacket package...
Thanks a lot!
You don't have to use arrayfun. It works if you use some smart indexing:
clear all
N=20;
k=rand(N,1);
m=rand(N,1);
c=rand(N,1);
e=rand(N,1);
x=rand(2*N,1);
% for-based implementation
%Watch out, you are not filling the first element of forres!
forres=zeros(N-1,1); %Initialize array first to gain some speed.
for i = 2:N-1
    forres(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
%vectorized implementation
parres=k(2:N-1)./m(2:N-1).*x(1:N-2) -(c(2:N-1)+c(3:N))./m(2:N-1).*x(N+2:2*N-1) +e(3:N)./m(2:N-1).*x(3:N);
%compare results; strip the first element from forres
difference=forres(2:end)-parres %#ok<NOPTS>
Firstly, MATLAB does support parallel for loops via PARFOR. However, that doesn't have much chance of speeding up this sort of computation since the amount of computation is small compared to the amount of data you're reading and writing.
To restructure things for gpuArray arrayfun, you need to make all the array references in the loop body refer to the loop iterate, and have the loop run across the full range. You should be able to do this by offsetting some of the arrays and padding them with dummy values. For example, you could pad your arrays with NaN and replace x(i-1) with a new variable x_1 = [NaN x(1:N-1)], so that x_1(i) equals x(i-1).
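A sketch of that idea for the loop above (the shifted-variable names are hypothetical; elements 1 and N come out as NaN dummies and should be ignored or overwritten):
c_next = [c(2:N); NaN];             % stands in for c(i+1)
e_next = [e(2:N); NaN];             % stands in for e(i+1)
x_prev = [NaN; x(1:N-1)];           % stands in for x(i-1)
x_next = [x(2:N); NaN];             % stands in for x(i+1)
x_mid  = x(N+1:2*N);                % stands in for x(N+i)
res = arrayfun(@(k,m,c,cn,en,xp,xm,xn) k/m*xp - (c+cn)/m*xm + en/m*xn, ...
               k, m, c, c_next, e_next, x_prev, x_mid, x_next);
The same call works on gpuArray inputs, where arrayfun fuses the whole expression into a single GPU kernel.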
What are the pros and cons of using a Vector.<> instead of an Array?
From the Adobe documentation page:
As a result of its restrictions, a Vector has two primary benefits over an Array instance whose elements are all instances of a single class:
Performance: array element access and iteration are much faster when using a Vector instance than when using an Array.
Type safety: in strict mode the compiler can identify data type errors such as assigning a value of the incorrect data type to a Vector or expecting the wrong data type when reading a value from a Vector. Note, however,
that when using the push() method or unshift() method to add values to a Vector, the arguments' data types are not checked at compile time but are checked at run time.
Pro: Vector is faster than Array - e.g. see this: Faster JPEG Encoding with Flash Player 10
Contra: Vector requires FP10, and according to http://riastats.com/ some 20% of users are still using FP9
Vectors are faster. Although for sequential iteration the fastest thing seems to be linked-lists.
Vectors can also be useful for bitmap operations (check out BitmapData.setVector, also BitmapData.lock and unlock).
The linked-list example mentioned earlier in the comments is incorrectly written, though: it skips odd nodes and therefore only iterates over half of the data. No wonder it gets such great results; it might be faster with correct code as well, but not by the same percentage. The loop advances current = current.next one time too many per iteration (both in the loop body and in the loop condition), which causes that behavior.
According to the Flash Player penetration website, it is a little higher: around 85%.
This is the source
I am looking for a way to store a large, variable number of matrices in an array in MATLAB.
Are there any ways to achieve this?
Example:
for i = 1:unknown
    myArray(i) = zeros(500,800);
end
where unknown is the varying length of the array. I can revise with additional info if needed.
Update:
Performance is the main reason I am trying to accomplish this. Previously, it would grab the data as a single matrix, show it in real time, and then proceed to process the next set of data.
I attempted it using multidimensional arrays as suggested below by Rocco; however, my data is so large that I ran out of memory, so I might have to look into another alternative for my case. I will update as I attempt other suggestions.
Update 2:
Thank you all for the suggestions. However, I should have specified beforehand that precision AND speed are both integral factors here; I may have to go back to my original method instead of 3-D arrays and re-evaluate the method for importing the data.
Use cell arrays. This has an advantage over 3D arrays in that it does not require a contiguous memory space to store all the matrices. In fact, each matrix can be stored in a different space in memory, which will save you from Out-of-Memory errors if your free memory is fragmented. Here is a sample function to create your matrices in a cell array:
function result = createArrays(nArrays, arraySize)
    result = cell(1, nArrays);
    for i = 1 : nArrays
        result{i} = zeros(arraySize);
    end
end
To use it:
myArray = createArrays(requiredNumberOfArrays, [500 800]);
And to access your elements:
myArray{1}(2,3) = 10;
If you can't know the number of matrices in advance, you could simply use MATLAB's dynamic indexing to make the array as large as you need. The performance overhead will be proportional to the size of the cell array, and is not affected by the size of the matrices themselves. For example:
myArray{1} = zeros(500, 800);
if twoRequired, myArray{2} = zeros(500, 800); end
If all of the matrices are going to be the same size (i.e. 500x800), then you can just make a 3D array:
nUnknown = 10; % the number of arrays (example value; use your actual count)
myArray = zeros(500,800,nUnknown);
To access one array, you would use the following syntax:
subMatrix = myArray(:,:,3); % Gets the third matrix
You can add more matrices to myArray in a couple of ways:
myArray = cat(3,myArray,zeros(500,800));
% OR
myArray(:,:,nUnknown+1) = zeros(500,800);
If each matrix is not going to be the same size, you would need to use cell arrays like Hosam suggested.
EDIT: I missed the part about running out of memory. I'm guessing your nUnknown is fairly large. You may have to switch the data type of the matrices (single or even a uintXX type if you are using integers). You can do this in the call to zeros:
myArray = zeros(500,800,nUnknown,'single');
myArrayOfMatrices = zeros(unknown,500,800);
If you're running out of memory, throw more RAM into your system and make sure you're running a 64-bit OS. Also try reducing your precision (do you really need doubles, or can you get by with singles?):
myArrayOfMatrices = zeros(unknown,500,800,'single');
To append to that array try:
myArrayOfMatrices(unknown+1,:,:) = zeros(500,800);
I was doing some volume rendering in Octave (a MATLAB clone) and building my 3D arrays (i.e. an array of 2D slices) using
buffer=zeros(1,512*512*512,"uint16");
vol=reshape(buffer,512,512,512);
Memory consumption seemed to be efficient. (can't say the same for the subsequent speed of computations :^)
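(For what it's worth, I believe the buffer-plus-reshape step isn't required for the memory savings; since reshape doesn't copy data, allocating the typed array directly should be equivalent:)
vol = zeros(512,512,512,"uint16");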
If you know what unknown is, you can do something like:
myArray = zeros(x, y, unknown);   % preallocate (x and y are your matrix dimensions)
for i = 1:unknown
    myArray(:,:,i) = zeros(x, y);
end
However, it has been a while since I last used MATLAB, so this page might shed some light on the matter:
http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_prog/f1-86528.html
Just do it like this:
x = zeros(100,200);
for i = 1:100
    for j = 1:200
        x(i,j) = input('enter the number');
    end
end