I need to sum consecutive 96 value blocks in a vector of n (in one case 14112) values. The background is that the values are 15-min temperature measurements and I want to average 96 at a time (1 to 96, 96+1 to 2*96 ... n*96+1 to (n+1)*96) to produce a daily average. This could of course be done in a loop stepping 96 but my question is if there is a more efficient way to accomplish this in Matlab.
By using reshape and mean:
data = randn(1,14112); % example data. Row vector
m = 96; % block size. It is assumed that m divides length(data)
result = mean(reshape(data,m,[]));
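A small worked example may make the reshape step easier to follow. This is only a sketch: the 6-element vector and block size of 3 are made up for illustration, and the loop is there just as a reference to compare against.

```matlab
data = [1 2 3 10 20 30];        % toy data: two blocks of three values
m = 3;                          % block size (assumed to divide numel(data))
blocks = reshape(data, m, []);  % each column holds one block: [1 10; 2 20; 3 30]
result = mean(blocks, 1);       % mean of each column -> [2 20]

% Reference: the same computation with an explicit loop
ref = zeros(1, numel(data)/m);
for k = 1:numel(ref)
    ref(k) = mean(data((k-1)*m+1 : k*m));
end
assert(isequal(result, ref));
```

Because reshape fills columns first, consecutive values end up in the same column, which is why the column-wise mean is the block mean.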
As @Dan points out, if the number of elements is not a multiple of the block size, some padding is necessary. The following code, due to him, pads the last block while preserving that block's mean. Thanks also to @DennisJaheruddin for his suggestion not to modify the original variable:
data = randn(1,14100); % example data. Row vector
m = 96; % block size
n = length(data);
result = mean(reshape([data repmat(mean(data(n-mod(n,m)+1:n)), 1, mod(-n, m))], m, [])); % mod(-n,m) is the pad length; it is 0 when m divides n
Here is an alternative way to deal with the problem; it also works if the length of the data is not a multiple of the window size:
data = randn(1,14112);
w = 96;
N = numel(data);
M = NaN(w,ceil(N/w));
M(1:N) = data;
nanmean(M)
If you don't want to include a partial day at the end, use fix instead of ceil and assign only the first numel(M) values, i.e. M(:) = data(1:numel(M)).
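Both variants can be sketched on a toy vector. The data and block size below are made up for illustration, and mean(...,'omitnan') is used as a stand-in for nanmean, assuming a release that supports that flag (R2015a or later):

```matlab
data = 1:10;                      % toy data: two full blocks plus two leftovers
w = 4;                            % block size
N = numel(data);

% Variant 1: keep the partial block (ceil), padding it with NaN
M = NaN(w, ceil(N/w));
M(1:N) = data;
withPartial = mean(M, 1, 'omitnan');   % [2.5 6.5 9.5]

% Variant 2: drop the partial block (fix), using only complete blocks
nFull = fix(N/w);
fullOnly = mean(reshape(data(1:nFull*w), w, []), 1);   % [2.5 6.5]
```

The second variant needs no NaN handling at all, since every column is a complete block.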
How can I delete all-zero pages from a 3D matrix in a loop?
I have come up with the following code, though it is not 'entirely' correct, if at all. I am using MATLAB 2019b.
%pseudo data
x = zeros(3,2,2);
y = ones(3,2,2);
positions = 2:4;
y(positions) = 0;
xy = cat(3,x,y); %this is a 3x2x4 array; (:,:,1) and (:,:,2) are all zeros,
% (:,:,3) is ones and zeros, and (:,:,4) is all ones
%my aim is to delete the arrays that are entirely zeros i.e. xy(:,:,1) and xy(:,:,2),
%and this is what I have come up with; it doesn't delete the arrays but instead,
%all the ones.
for ii = 1:size(xy,3)
for idx = find(xy(:,:,ii) == 0)
xy(:,:,ii) = strcmp(xy, []);
end
end
Use any to find indices of the slices with at least one non-zero value. Use these indices to extract the required result.
idx = any(any(xy)); % idx = any(xy,[1 2]); for >=R2018b
xy = xy(:,:,idx);
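To see what the mask looks like, here is a minimal sketch on a hand-made 2x2x4 array (the array contents are an assumption chosen for illustration). The squeeze call is optional and only turns the 1x1x4 result into a plain column vector for easier inspection:

```matlab
% pages 1 and 3 are all zeros, page 2 has a single non-zero, page 4 is all ones
xy = cat(3, zeros(2,2), [0 1; 0 0], zeros(2,2), ones(2,2));

idx = squeeze(any(any(xy, 1), 2));  % true where a page has at least one non-zero
kept = xy(:,:,idx);                 % original pages 2 and 4 survive

assert(isequal(idx, [false; true; false; true]));
assert(isequal(size(kept), [2 2 2]));
```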
I am unsure what you'd expect your code to do, especially given you're comparing strings in all-numerical arrays. Here's a piece of code which does what you desire:
x = zeros(3,2,2);
y = ones(3,2,2);
positions = 2:4;
y(positions) = 0;
xy = cat(3,x,y);
idx = true(size(xy,3),1); % initialise the logical mask of pages to keep
for ii = 1:size(xy,3)
if nnz(xy(:,:,ii)) == 0 % If this page (slice along the third dimension) is all zeros
idx(ii)= false; % exclude it
end
end
xy = xy(:,:,idx); % reindex to get rid of all-zero pages
The trick here is that nnz(xy(:,:,ii)) is zero iff all elements on the given page (third dimension) are zero; testing the page's sum instead could be fooled by positive and negative entries cancelling. In that case, exclude the page from idx. Then, in the last line, simply re-index using logical indexing to retain only pages with at least one non-zero element.
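You can do it even faster, without a loop, by summing over the first two dimensions at once with sum(a,[1 2]) (vector-dimension syntax, R2018b or later). Note that nnz(xy) would return a single count for the whole array, so count the non-zeros per page instead:
idx = sum(xy ~= 0, [1 2]) ~= 0;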
xy = xy(:,:,idx);
I have a vector A of size 7812x1 and would like to calculate the sum of fixed windows of length 21 (so 372 blocks). This should be reiterated, so that the output should return a vector of size 372x1.
I have t=7812, p=372, w=21;
for t=1:p
out = sum(A((t*w-w+1):(t*w)));
end
This code, however, does not work. My idea is that the part ((t*w-w+1):(t*w)) allows for something like a rolling window. The window is of length 21, so there is not really a need to express it with variables, yet I think it keeps some flexibility.
I've seen potentially related questions (such as the partial sum of a vector), yet I'm not sure whether they would produce the desired output.
Reshape into a matrix so that each block of A is a column, and compute the sum of each column:
result = sum(reshape(A, w, []), 1);
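As a quick check, the sketch below compares the reshape approach with a corrected version of the loop on a toy vector. The data and window length are made up for illustration. The loop in the question overwrites out on every pass and reuses t as the loop counter, so the version here indexes the output and uses a separate counter:

```matlab
A = (1:6).';                           % toy column vector
w = 2;                                 % window length (assumed to divide numel(A))

result = sum(reshape(A, w, []), 1).';  % transpose to get a column: [3; 7; 11]

% Loop version, storing one sum per block
p = numel(A) / w;                      % number of blocks
out = zeros(p, 1);
for k = 1:p
    out(k) = sum(A((k-1)*w+1 : k*w));
end
assert(isequal(result, out));
```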
Following your idea of using a rolling/moving window (requires MATLAB R2016a or later):
t = 7812; w = 21; % your parameters
A = rand(t,1); % generate some test data
B = movsum(A,w); % the sum of a moving window with width w
out = B(ceil(w/2):w:end); % take the window centred on each block (this indexing assumes odd w)
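If the window length might be even, one possible variation (a sketch, not part of the answer above) is to use a forward-looking window with the [before after] form of movsum, so that the sum starting at the first element of each block covers exactly that block. The toy data below is made up for illustration:

```matlab
A = (1:8).';               % toy column vector
w = 4;                     % even window length
B = movsum(A, [0 w-1]);    % B(k) = sum of A(k) through A(k+w-1)
out = B(1:w:end);          % sums starting at the first element of each block: [10; 26]
```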
Here is the problem:
data = 1:0.5:(8E6+0.5);
An array of 16 million points needs to be averaged over every 10,000 elements.
Like this:
x = mean(data(1:10000))
But repeated N times, where N depends on the number of elements we average over
range = 10000;
N = ceil(numel(data)/range);
My current method is this:
data(1) = mean(data(1,1:range));
for i = 2:N
data(i) = mean(data(1,range*(i-1):range*i));
end
How can the speed be improved?
N.B: We need to overwrite the original array of data (essentially bin the data and average it)
data = 1:0.5:(8E6-0.5); % Your data, actually 16M-2 elements
N = 1e4; % Amount to average over
tmp = mod(-numel(data),N); % number of NaNs needed to fill the last block (0 if it fits)
data = [data nan(1,tmp)]; % pad with NaN only if necessary
data2=reshape(data,N,[]); % reshape into a matrix, one block per column
out = nanmean(data2,1); % mean of each column, ignoring NaN
Visual confirmation that it works using plot(out)
Note that technically you can't do what you want if mod(numel(data),N) is not equal to 0, since then you'd have a remainder. I elected to average over everything in there, although ignoring the remainder is also an option.
If you're sure mod(numel(data),N) is zero every time, you can leave all that out and reshape directly. I'd not recommend using this though, because if your mod is not 0, this will error out on the reshape:
data = 1:0.5:(8E6+0.5); % 16M elements now
N = 1e4; % Amount to average over
out = sum(reshape(data,N,[]),1)./N; % alternative
This is a bit wasteful, but you can use movmean (which will handle the endpoints the way you want it to) and then subsample the output:
y = movmean(x, [0 9999]);
y = y(1:10000:end);
Even though this is wasteful (you're computing a lot of elements you don't need), it appears to outperform the nanmean approach (at least on my machine).
=====================
There's also the option to just compensate for the extra elements you added:
x = 1:0.5:(8E6-0.5);
K = 1e4;
Npad = ceil(length(x)/K)*K - length(x);
x((end+1):(end+Npad)) = 0;
y = mean(reshape(x, K, []));
y(end) = y(end) * K/(K - Npad);
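As a sanity check, the movmean approach and the zero-padding compensation can be compared on a toy vector whose length is not a multiple of the block size. This is only a sketch with made-up data; the names yMov and yPad are introduced here for illustration:

```matlab
x = 1:10;                         % toy data: 10 elements
K = 4;                            % block size; the last block has only 2 elements

% movmean with a forward-looking window, then keep every K-th value
yMov = movmean(x, [0 K-1]);
yMov = yMov(1:K:end);

% zero padding, then rescale the mean of the last (partial) block
Npad = ceil(numel(x)/K)*K - numel(x);
xPad = [x zeros(1, Npad)];
yPad = mean(reshape(xPad, K, []), 1);
yPad(end) = yPad(end) * K/(K - Npad);

assert(max(abs(yMov - yPad)) < 1e-12);   % both give [2.5 6.5 9.5]
```

The rescaling works because the padded zeros add nothing to the sum, so only the divisor of the last block needs correcting.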
Reshape the data array into a 10000-by-N matrix, then compute the mean of each column using the mean function.
I have a matrix called V1all which has 1556480 variables in it. All in the first column. I am trying to get the average over every 1024 points. i.e. the average of the first 1024 points, then the second 1024 points and so on. In the end I should have a matrix with 1520 points. I have the following code but I only get one value repeated 1520 times.
V1 = zeros(1520,1);
for jj = 1024:1024:1556480;
V1(1:1520) = mean(V1all(jj-1023:jj));
end
Any idea what I am doing wrong? Regards, Jer
You can do it in one line: reshape into a 1024-row matrix and then apply mean to compute the mean of each column:
V1 = mean(reshape(V1all, 1024, []));
If you really want to use a loop: You are not indexing V1 correctly. Modify your code as follows:
V1 = zeros(1520,1);
for n = 1:1520;
jj = 1024*n;
V1(n) = mean(V1all(jj-1023:jj));
end
I want to store data coming from for-loops in an array. How can I do that?
sample output:
for x=1:100
for y=1:100
Diff(x,y) = B(x,y)-C(x,y);
if (Diff(x,y) ~= 0)
% I want to store these values of coordinates in array
% and find x-max,x-min,y-max,y-min
fprintf('(%d,%d)\n',x,y);
end
end
end
Can anybody please tell me how can i do that. Thanks
Marry
So you want lists of the x and y (or row and column) coordinates at which B and C are different. I assume B and C are matrices. First, you should vectorize your code to get rid of the loops, and second, use the find() function:
Diff = B - C; % vectorized, loops over indices automatically
[list_x, list_y] = find(Diff~=0);
% finds the row and column indices at which Diff~=0 is true
Or, even shorter,
[list_x, list_y] = find(B~=C);
Remember that the first index in matlab is the row of the matrix, and the second index is the column; if you tried to visualize your matrices B or C or Diff by using imagesc, say, what you're calling the X coordinate would actually be displayed in the vertical direction, and what you're calling the Y coordinate would be displayed in the horizontal direction. To be a little more clear, you could say instead
[list_rows, list_cols] = find(B~=C);
To then find the maximum and minimum, use
maxrow = max(list_rows);
minrow = min(list_rows);
and likewise for list_cols.
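Putting these pieces together on a small example may help. The matrices below are made up for illustration (B is a magic square, C is a copy with two entries changed), and the names mincol and maxcol are introduced here by analogy with minrow and maxrow:

```matlab
B = magic(4);                 % toy matrix with entries 1..16
C = B;
C(2,3) = 0;                   % introduce a difference at row 2, column 3
C(4,1) = 0;                   % and another at row 4, column 1

[list_rows, list_cols] = find(B ~= C);   % coordinates where B and C differ

minrow = min(list_rows);  maxrow = max(list_rows);   % 2 and 4
mincol = min(list_cols);  maxcol = max(list_cols);   % 1 and 3
```

Note that find returns the matches in column-major order, so the entry at (4,1) is listed before the one at (2,3).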
If B(x,y) and C(x,y) are functions that accept matrix input, then instead of the double-for loop you can do
[x,y] = meshgrid(1:100);
Diff = B(x,y)-C(x,y);
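mask = Diff ~= 0; % positions where B and C differ
min_x = min(x(mask)); max_x = max(x(mask)); % extreme x coordinates of the differences
min_y = min(y(mask)); max_y = max(y(mask)); % extreme y coordinates of the differences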
If B and C are just matrices holding data, then you can do
Diff = B-C;
But really, I need more detail before I can answer this completely.
So: are B and C functions, matrices? You want to find min_x, max_x, but in the example you give that's just 1 and 100, respectively, so...what do you mean?