Calculating sum of array elements and reiterate for entire array in MATLAB - arrays

I have a vector A of size 7812x1 and would like to calculate the sum of fixed windows of length 21 (so 372 blocks). This should be reiterated, so that the output should return a vector of size 372x1.
I have t=7812, p=372, w=21;
for t=1:p
out = sum(A((t*w-w+1):(t*w)));
end
This code, however, does not work. My idea is that the part ((t*w-w+1):(t*w)) allows for something like a rolling window. The window is of length 21, so there is not really a need to express it with variables, yet I think it keeps some flexibility.
I've seen potentially related questions (such as partial sums of a vector), yet I'm not sure whether they would produce the desired output.

Reshape into a matrix so that each block of A is a column, and compute the sum of each column:
result = sum(reshape(A, w, []), 1);
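For comparison, the loop in the question appears to fail mainly because out is overwritten on every pass, and because the loop variable t reuses the name that held the vector length. A minimal sketch of a corrected loop, checked against the reshape one-liner; the test vector and the names k, outLoop and outReshape are illustrative assumptions, not part of the original post:

```matlab
w = 21; p = 372;                 % window length and number of blocks
A = (1:p*w).';                   % illustrative test vector of size 7812x1

outLoop = zeros(p, 1);           % preallocate one slot per block
for k = 1:p                      % separate loop variable, so t is left alone
    outLoop(k) = sum(A((k*w - w + 1):(k*w)));
end

outReshape = sum(reshape(A, w, []), 1).';   % transpose to get a 372x1 column
```

Both variables should hold the same 372 block sums; the reshape version simply avoids the explicit loop.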

Following your idea of using a rolling/moving window (requires MATLAB R2016a or later):
t = 7812; w = 21; % your parameters
A = rand(t,1); % generate some test data
B = movsum(A,w); % the sum of a moving window with width w
out = B(ceil(w/2):w:end); % get every w'th element

Related

MATLAB: Improving for-loop

I need to multiply parts of a column vector with a fixed row vector. I solved this problem using a for-loop. However, I am wondering if the performance can be improved as I have to perform this kind of computation around 50 million times. Here's my code so far:
multMat = 1:5;
mat = randi(5,10,1);
windowSize = 5;
vout = nan(10,1);
for r = windowSize : 10
vout(r) = multMat * mat( (r - windowSize + 1) : r);
end
I was thinking about using arrayfun. However, first, I don't know how to address the cell range (i.e. the previous five cells including the current cell), and second, I am not sure whether arrayfun would be any faster than the loop.
This sliding vector multiplication you're describing is an example of what is known as convolution. The following produces the same result as the loop in your example:
vout = [nan(windowSize-1,1);
conv(mat,flip(multMat),'valid')];
If your output doesn't really need the leading NaN values which aren't overwritten in your loop then the conv expression is sufficient without concatenating the NaN elements to it.
For sufficiently large vectors this is of course not guaranteed to be as fast as you'd like it to be, but MATLAB's built-in convolution implementation is likely to be pretty close to an optimal tool for the job.
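A quick way to convince yourself that the convolution matches the loop is to run both on the same data. This is only a sketch; the names voutLoop, voutConv and sameResult are illustrative, and multMat is forced into a column so that the output orientation does not depend on how conv handles mixed row/column inputs:

```matlab
multMat = 1:5;                   % fixed row vector of weights
mat = randi(5, 10, 1);           % column vector of test data
windowSize = 5;

% Reference: the loop from the question
voutLoop = nan(10, 1);
for r = windowSize:10
    voutLoop(r) = multMat * mat((r - windowSize + 1):r);
end

% Convolution version; flip undoes the reversal that conv applies
voutConv = [nan(windowSize - 1, 1); conv(mat, flip(multMat(:)), 'valid')];

sameResult = isequaln(voutLoop, voutConv);   % isequaln treats NaN as equal
```

isequaln is used instead of isequal because the leading NaN entries would otherwise compare as unequal.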

How to average independent consecutive blocks of an array as fast as possible?

Here is the problem:
data = 1:0.5:(8E6+0.5);
An array of 16 million points, needs to be averaged every 10,000 elements.
Like this:
x = mean(data(1:10000))
But repeated N times, where N depends on the number of elements we average over
range = 10000;
N = ceil(numel(data)/range);
My current method is this:
data(1) = mean(data(1,1:range));
for i = 2:N
data(i) = mean(data(1,range*(i-1):range*i));
end
How can the speed be improved?
N.B: We need to overwrite the original array of data (essentially bin the data and average it)
data = 1:0.5:(8E6-0.5); % Your data, actually 16M-2 elements
N = 1e4; % Amount to average over
tmp = mod(-numel(data),N); % number of NaNs needed to fill the last block (0 if it already fits)
data = [data nan(1,tmp)]; % add NaN if necessary
data2=reshape(data,N,[]); % reshape into a matrix
out = nanmean(data2,1); % average each column (one block per column), ignoring NaN
You can confirm visually that it works with plot(out).
Note that technically you can't do what you want if mod(numel(data),N) is not equal to 0, since then you'd have a remainder. I elected to average over everything in there, although ignoring the remainder is also an option.
If you're sure mod(numel(data),N) is zero every time, you can leave all that out and reshape directly. I'd not recommend using this though, because if your mod is not 0, this will error out on the reshape:
data = 1:0.5:(8E6+0.5); % 16M elements now
N = 1e4; % Amount to average over
out = sum(reshape(data,N,[]),1)./N; % alternative
This is a bit wasteful, but you can use movmean (which will handle the endpoints the way you want it to) and then subsample the output:
y = movmean(x, [0 9999]);
y = y(1:10000:end);
Even though this is wasteful (you're computing a lot of elements you don't need), it appears to outperform the nanmean approach (at least on my machine).
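A small-scale sketch of the movmean idea, using a block length of 4 instead of 10000 so the numbers can be checked by hand. The data and the name blockMeans are illustrative assumptions; the sketch relies on movmean's default behaviour of shrinking the window at the end of the vector:

```matlab
x = 1:10;                        % illustrative data; the last block has only 2 elements
K = 4;                           % block length (10000 in the answer above)

y = movmean(x, [0 K-1]);         % mean of each element and the K-1 that follow it
blockMeans = y(1:K:end);         % keep only the windows that start a block
% the blocks are 1:4, 5:8 and 9:10, so blockMeans is [2.5 6.5 9.5]
```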
=====================
There's also the option to just compensate for the extra elements you added:
x = 1:0.5:(8E6-0.5);
K = 1e4;
Npad = ceil(length(x)/K)*K - length(x);
x((end+1):(end+Npad)) = 0;
y = mean(reshape(x, K, []));
y(end) = y(end) * K/(K - Npad);
Reshape the data array into a 10000-by-N matrix, then compute the mean of each column with the mean function.
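A minimal sketch of that suggestion, assuming that any trailing elements which do not fill a complete block may be dropped (with the question's data there are none). The name nBlocks is illustrative; the result overwrites data, as the question requests:

```matlab
data = 1:0.5:(8E6 + 0.5);        % 16 million points, as in the question
K = 10000;                       % block length
nBlocks = floor(numel(data) / K);            % number of complete blocks
data = mean(reshape(data(1:nBlocks*K), K, nBlocks), 1);   % overwrite with block means
```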

Compute mean value over a sliding window in MATLAB

I have time series data, or more generally real-valued data, of length N. I want to split it into sub-blocks of length k, the window length, where k can be chosen arbitrarily. This creates a problem, because the window size is the same across the whole series and N is not necessarily a multiple of k. I want to store each sub-block in an array, but I am stuck on creating the sub-blocks and on including a check, such as mod(N, nseg), that the data length divides evenly into blocks.
N = 512; %length of the time series
data = rand(N,1);
window_length = 30; %k
Nseg = floor(N/window_length) %Number of segments or blocks
Modified_Data = [mean(reshape(data,window_length,Nseg))]; %Throws error
If you have the Image Processing toolbox you could use im2col to slide a specific block size over the entire time series. Each column of the output represents the data from one of those blocks.
values = im2col(data, [window_length 1], 'distinct');
Since it looks like you just want the mean over each block, you could also use blockproc to do this.
means = blockproc(data, [window_length, 1], @(x)mean(x.data));
If you do not have the Image Processing Toolbox, you can instead use accumarray to perform this task.
means = accumarray(floor((0:(N-1)).'/window_length) + 1, data, [], @mean);
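To see what the accumarray call does, it may help to look at the subscript vector on a very short series. The sizes and the name subs below are illustrative assumptions chosen so the grouping can be read off directly:

```matlab
N = 7;                           % short series so the grouping is easy to read
window_length = 3;
data = (1:N).';                  % illustrative column vector

% block index for every sample: [1 1 1 2 2 2 3].'
subs = floor((0:(N-1)).' / window_length) + 1;

% apply mean to the samples that share a block index
means = accumarray(subs, data, [], @mean);   % [2; 5; 7]
```

Note that the last block here contains a single sample, so a trailing partial block is averaged over whatever elements it has.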
If you want to discard any data that extends beyond a number which is divisible by window_length, you can do this with something like the following:
data = data(1:(numel(data) - mod(numel(data), window_length)));
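Combined with the truncation above, the reshape from the question should no longer throw an error. A sketch under the assumption that discarding the trailing samples is acceptable:

```matlab
N = 512;                         % length of the time series
data = rand(N, 1);
window_length = 30;              % k

% drop the trailing samples that do not fill a whole block (2 of them here)
data = data(1:(numel(data) - mod(numel(data), window_length)));

Nseg = numel(data) / window_length;                       % 17 complete blocks
Modified_Data = mean(reshape(data, window_length, Nseg), 1);
```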
If you want overlapping data, you'll either want to use straight-up convolution (the preferred method)
means = conv(data(:), ones(window_length, 1)/window_length, 'same');
Or you can create overlapping blocks with im2col by omitting the last input.
values = im2col(data, [window_length 1]);
means = mean(values, 1);
If you have R2016a+, consider using the built-in movmean function:
N = 512; %length of the time series
data = rand(N,1);
window_length = 30; %k
Modified_Data = movmean(data, window_length);
See the documentation for further details and other options.
If I understand your question correctly, it's pretty straightforward:
filter(ones(N,1)/N,1,signal)
If you think about it, filtering with [1/N 1/N 1/N...1/N] is exactly calculating the localized mean...
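A short sketch of the filter approach on hand-checkable data. Here k plays the role of the window length (called N in the answer above), and signal and trailingMean are illustrative names:

```matlab
k = 4;                           % window length
signal = (1:10).';               % illustrative data

y = filter(ones(k, 1) / k, 1, signal);       % causal moving average

% once the window is full, y(n) equals mean(signal(n-k+1:n))
trailingMean = mean(signal(7:10));           % window ending at n = 10
% the first k-1 outputs are partial sums divided by k, not true means
```

Because the filter is causal, each output is the mean of the current and the previous k-1 samples, so the result lags a centered moving mean by about half a window.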

Matrix calculations without loops in MATLAB

I have an issue with code that performs some array operations. It is too slow, because I use loops and the input data are quite big. Loops were the easiest way for me, but now I am looking for something faster than for loops. I tried to optimize or rewrite the code, but without success. I would really appreciate your help.
In my code I have three arrays: x1 and y1 (coordinates of the grid points) and g1 (values at those points). In this example each is 300 x 300. I treat each matrix as a composition of 9 blocks and compute results for the points in the middle block. For example, I start with g1(101,101), using the data in g2 = g1(1:201,1:201). I need to calculate the distance from each point of g1(1:201,1:201) to g1(101,101) (the ll matrix), then I calculate nn as in the code, and finally I take the value for g1(101,101) from nn and store it in the N array. Then I move on to g1(101,102), and so on until g1(200,200), where g2 = g1(100:300,100:300).
As I said, this code is not very efficient, and since I have to use even larger arrays than in the example, it takes too much time. I hope I have explained clearly enough what I expect from the code. I was thinking of using arrayfun, but I have never worked with this function, so I don't know how to use it, and it seems to me that it won't handle this case. Maybe there are other solutions, but I couldn't find anything appropriate.
tic
x1=randn(300,300);
y1=randn(300,300);
g1=randn(300,300);
m=size(g1,1);
n=size(g1,2);
w=1/3*m;
k=1/3*n;
N=zeros(w,k);
for i=w+1:2*w
for j=k+1:2*k
x=x1(i,j);
y=y1(i,j);
x2=y1(i-k:i+k,j-w:j+w);
y2=y1(i-k:i+k,j-w:j+w);
g2=g1(i-k:i+k,j-w:j+w);
ll=1./sqrt((x2-x).^2+(y2-y).^2);
ll(isinf(ll))=0;
nn=ifft2(fft2(g2).*fft2(ll));
N(i-w,j-k)=nn(w+1,k+1);
end
end
czas=toc;
For what it's worth, arrayfun() is just a wrapper for a for loop, so it wouldn't lead to any performance improvements. Also, you probably have a typo in the definition of x2, I'll assume that it depends on x1. Otherwise it would be a superfluous variable. Also, your i<->w/k, j<->k/w pairing seems inconsistent, you should check that as well. Also also, just timing with tic/toc is rarely accurate. When profiling your code, put it in a function and run the timing multiple times, and exclude the variable generation from the timing. Even better: use the built-in profiler.
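A rough sketch of the timing advice: generate the data outside the timed region, wrap the computation in a function handle, and repeat the measurement. The computation being timed and the names f, nRuns, iRun, elapsed and typical are purely illustrative:

```matlab
A = rand(7812, 1);               % data generation stays outside the timed part
w = 21;
f = @() sum(reshape(A, w, []), 1);   % illustrative computation to time

nRuns = 20;
elapsed = zeros(nRuns, 1);
for iRun = 1:nRuns
    tStart = tic;
    result = f();
    elapsed(iRun) = toc(tStart);
end
typical = median(elapsed);       % less sensitive to a slow first run than a single tic/toc
```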
Disclaimer: this solution will likely not help for your actual problem due to its huge memory need. For your input of 300x300 matrices this works with arrays of size 201x201x100x100, which is usually a no-go. Still, it's here for reference with a smaller input size. I wanted to add a solution based on nlfilter(), but your problem seems to be too convoluted to be able to use that.
As always with vectorization, you can do it faster if you can spare the memory for it. You are trying to work with matrices of size [2*k+1,2*w+1] for each [i,j] index. This calls for 4d arrays, of shape [2*k+1,2*w+1,w,k]. For each element [i,j] you have a matrix with indices [:,:,i,j] to treat together with the corresponding elements of x1 and y1. It also helps that fft2 accepts multidimensional arrays.
Here's what I mean:
tic
x1 = randn(30,30); %// smaller input for tractability
y1 = randn(30,30);
g1 = randn(30,30);
m = size(g1,1);
n = size(g1,2);
w = 1/3*m;
k = 1/3*n;
%// these will be indexed on the fly:
%//x = x1(w+1:2*w,k+1:2*k); %// size [w,k]
%//y = y1(w+1:2*w,k+1:2*k); %// size [w,k]
x2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
y2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
g2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
%// manual definition for now, maybe could be done smarter:
for ii=w+1:2*w %// don't use i and j as variables
for jj=k+1:2*k %// don't use i and j as variables
x2(:,:,ii-w,jj-k) = x1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
y2(:,:,ii-w,jj-k) = y1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
g2(:,:,ii-w,jj-k) = g1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
end
end
%// use bsxfun to operate on [2*k+1,2*w+1,w,k] vs [w,k]-sized arrays
%// need to introduce leading singletons with permute() in the latter
%// in order to have shape [1,1,w,k] compatible with the first array
ll = 1./sqrt(bsxfun(@minus,x2,permute(x1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2 ...
+ bsxfun(@minus,y2,permute(y1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2);
ll(isinf(ll)) = 0;
%// compute fft2, operating on [2*k+1,2*w+1,w,k]
%// will return fft2 for each index in the [w,k] subspace
nn = ifft2(fft2(g2).*fft2(ll));
%// we need nn(w+1,k+1,:,:) which is exactly of size [w,k] as needed
N = reshape(nn(w+1,k+1,:,:),[w,k]); %// quicker than squeeze()
N = real(N); %// this solution leaves an imaginary part of around 1e-12
czas=toc;

Summing blocks of numbers in a vector in matlab

I need to sum consecutive 96 value blocks in a vector of n (in one case 14112) values. The background is that the values are 15-min temperature measurements and I want to average 96 at a time (1 to 96, 96+1 to 2*96 ... n*96+1 to (n+1)*96) to produce a daily average. This could of course be done in a loop stepping 96 but my question is if there is a more efficient way to accomplish this in Matlab.
By using reshape and mean:
data = randn(1,14112); % example data. Row vector
m = 96; % block size. It is assumed that m divides length(data)
result = mean(reshape(data,m,[]));
As @Dan points out, if the number of elements is not a multiple of the block size, some padding is necessary. The following code, due to him, pads the last block while preserving the mean of that block. Thanks also to @DennisJaheruddin for his suggestion not to modify the original variable:
data = randn(1,14100); % example data. Row vector
m = 96; % block size
n = length(data);
result = mean(reshape([data repmat(mean(data(n-mod(n,m)+1:n)), 1, mod(-n, m))], m, []));
Here is an alternative way to deal with the problem. It also works if the length of the data is not a multiple of the window size:
data = randn(1,14112);
w = 96;
N = numel(data);
M = NaN(w,ceil(N/w));
M(1:N) = data;
nanmean(M)
If you don't want to include partial days at the end, use fix instead of ceil and assign only the elements that fit, e.g. M(:) = data(1:numel(M)).
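A sketch of the variant that drops a trailing partial day, assuming a data length that is not a multiple of 96. The names nDays and dailyMean are illustrative; since no padding is involved, plain mean is sufficient:

```matlab
data = randn(1, 14100);          % example data, not a multiple of 96
w = 96;                          % samples per day
nDays = fix(numel(data) / w);    % 146 complete days
M = reshape(data(1:nDays*w), w, nDays);   % the trailing partial day is dropped
dailyMean = mean(M, 1);          % one mean per column, i.e. per day
```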
