Is there a way to perform 2D convolutions with strides using the Accelerate library in Swift?

I am trying to perform a specific downsampling process. It is described by the following pseudocode.
//Let V be an input image with dimension of M by N (row by column)
//Let U be the destination image of size floor((M+1)/2) by floor((N+1)/2)
//The floor function is to emphasize the rounding for the even dimensions
//U and V are part of a wrapper class of Pixel_FFFF vImageBuffer
for i in 0 ..< U.size.rows {
    for j in 0 ..< U.size.columns {
        U[i,j] = V[(i * 2), (j * 2)]
    }
}
The process takes every other pixel along both dimensions, so the resulting image is approximately half the size of the original in each dimension.
Called once, the process is relatively fast. However, it becomes a bottleneck when it is called many times inside a larger algorithm, so I am trying to optimize it. Since I already use Accelerate in my app, I would like to implement this process in the same spirit.
Attempts
First, this process can be expressed as a 2D convolution with the 1x1 kernel [1] and a stride of [2,2]. Hence, I considered the function vImageConvolve_ARGBFFFF. However, I couldn't find a way to specify the stride. This function would be the best solution, since it handles the Pixel_FFFF image structure.
Second, I noticed that this is merely transferring data from one array to another, so I thought the vDSP_vgathr function would be a good solution. However, I hit a wall: vectorizing a vImageBuffer yields the interleaved structure A,R,G,B,A,R,G,B,..., in which each term is 4 bytes. vDSP_vgathr transfers one 4-byte element at a time to the destination array according to a specified indexing vector. I could use a linear indexing formula to build such a vector, but, considering both even and odd dimensions, generating the indexing vector would be as inefficient as the original solution, since it would require loops.
Also, neither of the vDSP 2D convolution functions fits this problem.
Are there any other functions in Accelerate that I might have overlooked? I saw that there is a stride option in the vDSP 1D convolution functions. Does anyone know an efficient way to translate a 2D convolution with strides into a 1D convolution?
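For reference, the index arithmetic behind the pseudocode and the gather idea can be sketched outside Accelerate. The following is a minimal Python sketch, not Swift and not an Accelerate call. It assumes rows are tightly packed (no row padding), four interleaved components per pixel, and zero-based indices, so the index convention of any real gather routine would need to be checked separately. The helper names downsample_by_two and gather_indices are hypothetical.

```python
def downsample_by_two(image):
    """Keep every other row and every other column of a 2D list of pixels."""
    return [row[::2] for row in image[::2]]


def gather_indices(rows, cols, channels=4):
    """Flat, zero-based indices into an interleaved buffer for the kept pixels."""
    return [
        (r * cols + c) * channels + ch
        for r in range(0, rows, 2)      # every other row
        for c in range(0, cols, 2)      # every other column
        for ch in range(channels)       # all components of the kept pixel
    ]


# 3 x 5 image whose "pixels" are (A, R, G, B) tuples tagged with their position.
M, N = 3, 5
V = [[(1.0, float(i), float(j), 0.0) for j in range(N)] for i in range(M)]
U = downsample_by_two(V)

# Flatten V the way a tightly packed interleaved buffer would store it, then gather.
flat = [component for row in V for pixel in row for component in pixel]
gathered = [flat[k] for k in gather_indices(M, N)]
```

Here U has floor((M+1)/2) rows and floor((N+1)/2) columns, and gathered holds the same components as U in flattened order. The index list depends only on the image dimensions, so under these assumptions it could be built once and reused across calls with the same size.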

Related

Large matrices with a certain structure: how can one define where memory allocation is not needed?

Is there a way to create a 3D array for which only certain elements are defined, while the rest does not take up memory?
Context: I am running Monte-Carlo simulations in which I want to solve 10^5 matrices. All of these matrices have a majority of elements that are zero, for which I wouldn't need to use 8 bytes of memory per element. These elements are the same for all matrices. For simplicity, I have combined all of these matrices into a 3D array, but if my matrices start to become too large, I encounter memory issues (since at matrix dimensions of 100*100*100000, the array already takes up 8 GB of memory).
One workaround would be to store every matrix element with its 10^6 iterations in a vector, that way, no additional information needs to be stored. The inconvenience is that then I would need to work with more than 50 different vectors, and I prefer working with arrays.
Is there any way to tell R that some matrix elements don't need information?
I have been thinking that defining a new class could help for this, but since I have just discovered classes, I am not sure what all the options are. Do you think this could be a good approach? Are there specific things I should keep in mind?
I also know that there are packages made to deal with memory problems, but that did not seem like the quickest solution in terms of human and computation effort for this specific problem.
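The idea of storing only the elements that can be nonzero, with one position list shared by all matrices, can be sketched in a language-neutral way. The following is a minimal Python sketch of that concept, not R code and not an existing package; the class name SharedPatternStack and its methods are hypothetical, and it assumes the set of nonzero positions is known up front and identical for every matrix.

```python
from array import array


class SharedPatternStack:
    """Stack of equally sized matrices whose nonzero positions are identical."""

    def __init__(self, shape, positions, count):
        self.shape = shape
        self.count = count
        # Map each structurally nonzero (row, col) to its slot within a matrix.
        self.index = {pos: slot for slot, pos in enumerate(positions)}
        # One contiguous array of doubles: count blocks of len(positions) values.
        self.values = array("d", [0.0]) * (count * len(positions))

    def _offset(self, m, i, j):
        return m * len(self.index) + self.index[(i, j)]

    def get(self, m, i, j):
        """Value of matrix m at (i, j); structural zeros cost no storage."""
        if (i, j) not in self.index:
            return 0.0
        return self.values[self._offset(m, i, j)]

    def set(self, m, i, j, value):
        """Assign a stored element; raises KeyError for a structural zero."""
        self.values[self._offset(m, i, j)] = value


stack = SharedPatternStack((100, 100), [(0, 0), (0, 3), (2, 1)], count=1000)
stack.set(7, 0, 3, 2.5)
```

In this sketch the memory used grows with the number of stored positions times the number of matrices, rather than with the full matrix dimensions.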

Matlab: multiply subset of three dimensional array with two dimensional array

I have an AxBxC array where AxB indexes the individual grid points of a field that I sampled (like coordinates) and C corresponds to the layers underneath. Now I want to calculate the impact of certain activities on these individual points by multiplying it with a 2D matrix.
E.g.
x=5; %x-Dimensions of the sampled area
y=5; %y-Dimensions of the sampled area
z=3; %z-number of layers sampled
Area= zeros(x,y,z);
AreaN= zeros(x,y,z);
now I want to multiply every layer of a given point in X*Y with:
AppA=[0.4,0.4,0.2;0.4,0.5,0.1;0.1,0.2,0.7];
I tried:
for i=1:x
    for j=1:y
        AreaN(i,j,:)= AppA*Area(i,j,:);
    end
end
Unfortunately I get the error:
Error using *
Inputs must be 2-D, or at least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.
Any help with this is appreciated, since I am not yet really familiar with MATLAB.
Correct Approach
I think, to correct your code, you need to convert Area(i,j,:) to a column vector, which you can do with squeeze. Thus, the corrected loop-based code would look something like this -
AreaN= zeros(x,y,z);
for i=1:x
    for j=1:y
        AreaN(i,j,:)= AppA*squeeze(Area(i,j,:));
    end
end
Now, there are efficient no-loop/vectorized approaches that can be suggested here to get to the output.
Vectorized Approach #1
The first approach uses matrix multiplication and should be quite efficient -
AreaN = reshape(reshape(Area,x*y,z)*AppA.',x,y,z)
Vectorized Approach #2
Second one with bsxfun -
AreaN = squeeze(sum(bsxfun(@times,Area,permute(AppA,[3 4 2 1])),3))
Vectorized Approach #2 Rev 1
If you would like to get rid of the squeeze in the bsxfun code, you need to use an extra permute in there -
AreaN = sum(bsxfun(@times,permute(Area,[1 2 4 3]),permute(AppA,[4 3 1 2])),4)
This would solve the matrix multiplication problem:
AreaN(i,j,:)= AppA*reshape(Area(i,j,:),3,[]);
You might want to consider using bsxfun to avoid loops.
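To see why the reshape-based approach gives the same result as the loop, it can help to check the identity on a small example: applying AppA to each point's layer vector is the same as stacking all points as rows and multiplying once by the transpose of AppA. The following is a minimal pure-Python sketch of that check, not MATLAB code; the helper names matvec and matmul and the sample values in area are made up for illustration.

```python
def matvec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(a_lk * v_k for a_lk, v_k in zip(row, v)) for row in a]


def matmul(p, q):
    """Matrix-matrix product for nested lists."""
    q_cols = list(zip(*q))
    return [[sum(s * t for s, t in zip(row, col)) for col in q_cols] for row in p]


app_a = [[0.4, 0.4, 0.2], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
x, y, z = 2, 2, 3
# Illustrative data: area[i][j] is the layer vector at grid point (i, j).
area = [[[float(i + 2 * j + 3 * k) for k in range(z)] for j in range(y)]
        for i in range(x)]

# Loop version: one matrix-vector product per grid point.
looped = [[matvec(app_a, area[i][j]) for j in range(y)] for i in range(x)]

# Single product: rows are grid points, columns are layers.
points = [area[i][j] for i in range(x) for j in range(y)]
app_a_t = [list(col) for col in zip(*app_a)]
flat = matmul(points, app_a_t)
reshaped = [[flat[i * y + j] for j in range(y)] for i in range(x)]
```

Both looped and reshaped hold, for every point, the sum over layers k of AppA(l,k) times the value in layer k. The row ordering of points here is row-major, which differs from MATLAB's column-major reshape but does not affect the identity.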

How to indicate specific slice of a 3D array in MATLAB using GPUs?

I have a 4x4x1250 matrix in MATLAB. I want to find a way to move through the 4x4 matrices slice by slice in order to find the condition of the 4x4 matrices individually.
I don't want to do it in a loop because I want to do this on the GPU and would like it to be indexed.
I saw "squeeze", but I don't think it works for 3D arrays...
I kind of want to use arrayfun, but I don't know how to indicate the specific dimension that I'm interested in.
Any ideas?
Edit: I thought the details I gave are sufficient, nevertheless:
I have a matrix A, size 4x4x1250.
I am interested in the conditions of the 1250 4x4 matrices that make up A. So let's say B = A(:,:,1).
I want to calculate cond(B), but in reality I want 1250 of these calculations.
If I do arrayfun, I don't know how to specify the specific dimension of A along which to slice.
ARRAYFUN disregards the shape of the input, and operates in a purely element-wise fashion. There's also PAGEFUN on the GPU which operates on pages of an array - however, PAGEFUN only really offers an advantage if you're using one of the functions explicitly supported - otherwise it operates in an element-wise fashion.

Making a for-loop in Matlab faster by using arrayfun?

currently I have the following portion of code:
for i = 2:N-1
    res(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
where the variables k, m, c and e are vectors of size N and x is a vector of size 2*N. Is there any way to make this a lot faster using something like arrayfun? I couldn't figure this out :( I especially want to make it faster by running it on the GPU later, so arrayfun would also be helpful, since MATLAB doesn't support parallelizing for-loops and I don't want to buy the Jacket package...
Thanks a lot!
You don't have to use arrayfun. It works if you use some smart indexing:
clear all
N=20;
k=rand(N,1);
m=rand(N,1);
c=rand(N,1);
e=rand(N,1);
x=rand(2*N,1);
% for-based implementation
%Watch out, you are not filling the first element of forres!
forres=zeros(N-1,1); %Initialize array first to gain some speed.
for i = 2:N-1
    forres(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
%vectorized implementation
parres=k(2:N-1)./m(2:N-1).*x(1:N-2) -(c(2:N-1)+c(3:N))./m(2:N-1).*x(N+2:2*N-1) +e(3:N)./m(2:N-1).*x(3:N);
%compare results; strip the first element from forres
difference=forres(2:end)-parres %#ok<NOPTS>
Firstly, MATLAB does support parallel for loops via PARFOR. However, that doesn't have much chance of speeding up this sort of computation since the amount of computation is small compared to the amount of data you're reading and writing.
To restructure things for GPUArray "arrayfun", you need to make all the array references in the loop body refer to the loop iterate, and have the loop run across the full range. You should be able to do this by offsetting some of the arrays and padding with dummy values. For example, you could prepend NaN to a shifted copy of an array, replacing x(i-1) with a new variable x_1 = [NaN x(1:N-1)], so that x_1(i) equals x(i-1).
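The offset-and-pad idea can be illustrated with a small sketch: every shifted input becomes a full-length array, so the loop body only ever touches index i and can run over the whole range. The following is a minimal Python sketch of that restructuring, not MATLAB or GPU code; it uses zero-based indices, made-up sample data, and hypothetical helper names (shifted_inputs, shift_left, kernel).

```python
import math


def shifted_inputs(x, n):
    """Build full-length arrays so that entry i holds x[i-1], x[i+1], x[n+i]."""
    x_prev = [math.nan] + x[:n - 1]   # x_prev[i] == x[i-1], padded at the front
    x_next = x[1:n] + [math.nan]      # x_next[i] == x[i+1], padded at the end
    x_upper = x[n:2 * n]              # x_upper[i] == x[n+i]
    return x_prev, x_next, x_upper


def shift_left(v):
    """Entry i holds v[i+1]; the last entry is padded with NaN."""
    return v[1:] + [math.nan]


def kernel(k, m, c, c_next, e_next, x_prev, x_upper, x_next):
    """Element-wise body: every argument is a scalar taken at the same index."""
    return k / m * x_prev - (c + c_next) / m * x_upper + e_next / m * x_next


n = 6
k = [1.0 + i for i in range(n)]
m = [2.0 + i for i in range(n)]
c = [0.5 * i for i in range(n)]
e = [3.0 - 0.25 * i for i in range(n)]
x = [0.1 * i for i in range(2 * n)]

x_prev, x_next, x_upper = shifted_inputs(x, n)
res = [
    kernel(*args)
    for args in zip(k, m, c, shift_left(c), shift_left(e), x_prev, x_upper, x_next)
]
```

The first and last entries of res come out as NaN because of the padding and correspond to the elements the original loop never fills; the interior entries match the loop formula.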

Slow array operations

I'm quite a new MATLAB programmer, so this might be an easy one.. :)
I'm trying to write a script that will be able to read any number of XYZ files, in any order, into an array, and arrange them in the array according to the X and Y coordinates given in the file.
My attempt is to use load to get the files into an array and then read through the array and, as explained, use the X and Y coordinates as the locations in a new array.
I've tried preallocating the array, and I'm also subtracting a value from both X and Y to minimize the size of the array (fullArray).
%# Script for extraction of XYZ-data from DSM/DTM xyz files
%# Define folders and filter
DSMfolder='/share/CFDwork/site/OFSites/MABH/DSM/*.xyz';
DTMfolder='/share/CFDwork/site/OFSites/MABH/DTM/*.xyz';
%# Define minimumvalues, to reduce arrays.. Please leave some slack, for the
%# reduction-algorithm..
borderX=100000;
borderY=210000;
%% Expected array-size
expSizeX=20000;
expSizeY=20000;
%# Program starts.. Please do not edit below this line!
files=ls(DSMfolder);
clear fullArray
fullArray=zeros(expSizeX,expSizeY);
minX=999999999;
minY=999999999;
maxX=0;
maxY=0;
disp('Reading DSM files');
[thisFile,remaining]=strtok(files);
while (~isempty(thisFile))
    disp(['Reading: ' thisFile]);
    clear fromFile;
    fromFile=load(thisFile);
    for k=1:size(fromFile,1)
        tic
        fullArray(fromFile(k,1)-borderX,fromFile(k,2)-borderY)=fromFile(k,3);
        disp([k size(fromFile,1)]);
        if (fromFile(k,1)<minX)
            minX=fromFile(k,1);
        end
        if (fromFile(k,2)<minY)
            minY=fromFile(k,2);
        end
        if (fromFile(k,1)>maxX)
            maxX=fromFile(k,1);
        end
        if (fromFile(k,2)>maxY)
            maxY=fromFile(k,2);
        end
        toc
    end
    [thisFile,remaining]=strtok(remaining);
end
As can be seen, I've added a tic-toc, and the time was 3.36 secs for one operation!
Any suggestions on why this is so slow and how to improve the speed? I need to order 2x6,000,000 lines, and I can't be bothered to wait 466 days.. :D
Best regards
Mark
Have you considered using a sparse matrix?
A sparse matrix in matlab is defined by a list of values and their location in the matrix -
incidentally this matches your input file perfectly.
While this representation is generally meant for matrices which are truly sparse, (i.e. most of their values are zeros), it appears that in your case it would be much faster to load the matrix using the sparse function even if it is not truly sparse.
Since your data is organised in such a way (location of every data point) my guess is it is sparse anyway.
The function to create a sparse matrix takes the location as columns so instead of a for loop your code will look something like this (this segment replaces the whole for loop):
minX = min(fromFile(:,1));
maxX = max(fromFile(:,1));
minY = min(fromFile(:,2));
maxY = max(fromFile(:,2));
S = sparse(fromFile(:,1) - borderX, fromFile(:,2) - borderY, fromFile(:,3));
Note that the other change I've made is calculating minimum / maximum values directly from the matrix - this is much faster than going over a for loop, as operating on vectors and matrices unleashes the true power of matlab :)
You can perform all sorts of operations on the sparse matrix, but if you want to convert it to a regular matrix you can use the matlab full function.
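The triplet idea can also be sketched outside MATLAB: each (x, y, z) row of the file becomes one stored entry keyed by its offset coordinates, and the bounds are computed over whole columns instead of inside a per-row loop. The following is a minimal Python sketch of that idea, not the MATLAB sparse function; the function name load_triplets and the sample rows are made up for illustration. One difference to keep in mind is that a dictionary overwrites repeated coordinates (like the original loop), whereas MATLAB's sparse(i,j,v) adds up values that share the same position.

```python
def load_triplets(rows, border_x, border_y):
    """Store z values keyed by offset (x, y) and report the coordinate bounds."""
    grid = {(x - border_x, y - border_y): z for x, y, z in rows}
    xs = [x for x, _, _ in rows]
    ys = [y for _, y, _ in rows]
    return grid, (min(xs), max(xs), min(ys), max(ys))


# Illustrative rows in the same (X, Y, Z) layout as the files described above.
rows = [(100003, 210001, 12.5), (100001, 210004, 9.0), (100002, 210002, 11.25)]
grid, (min_x, max_x, min_y, max_y) = load_triplets(rows, 100000, 210000)
```

Only the three listed points take up storage here; any other coordinate can be treated as zero with grid.get((i, j), 0.0).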

Resources