Reshape a 3D array and remove missing values - arrays

I have an NxMxT array where each element of the array is a grid of Earth. If the grid is over the ocean, then the value is 999. If the grid is over land, it contains an observed value. N is longitude, M is latitude, and T is months.
In particular, I have an array called tmp60 for the ten years 1960 through 1969, so 120 months for each grid.
To test what the global mean in January 1960 was, I write:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60(:,:)>200)=NaN;
nanmean(nanmean(tmpJan60))
which gives me 5.855.
I am confused about the reshape function. I thought the following code should yield the same average, namely 5.855, but it does not:
load tmp60
N1=size(tmp60,1)
N2=size(tmp60,2)
N3=size(tmp60,3)
reshtmp60 = reshape(tmp60, N1*N2,N3);
reshtmp60( reshtmp60(:,1)>200,: )=[];
mean(reshtmp60(:,1))
this gives me -1.6265, which is not correct.
I have checked the result in Excel (!) and 5.855 is correct, so I assume I make a mistake in the reshape function.
Ideally, I want a matrix that takes each grid, going first down the N-dimension, and make the 720 rows with 120 columns (each column is a month). These first 720 rows will represent one longitude band around Earth for the same latitude. Next, I want to increase the latitude by 1, thus another 720 rows with 120 columns. Ultimately I want to do this for all 360 latitudes.
If longitude and latitude were inputs, say column 1 and 2, then the matrix should look like this:
temp = [-179.75 -89.75 -1 2 ...
-179.25 -89.75 2 4 ...
...
179.75 -89.75 5 9 ...
-179.75 -89.25 2 5 ...
-179.25 -89.25 3 4 ...
...
-179.75 89.75 2 3 ...
...
179.75 89.75 6 9 ...]
So temp(:,3) should be all January 1960 observations.
One way to do this is:
grid1 = tmp60(1,1,:);
g1 = reshape(grid1, [1,120]);
grid2 = tmp60(2,1,:);
g2 = reshape(grid2,[1,120]);
g = [g1;g2];
But obviously very cumbersome.
I am not able to automate this procedure for the N*M elements, so comments are appreciated!
A link to the file tmp60.mat

The main problem in your code is how the NaNs are treated. Observe the following example:
a = randi(10,6);
a(a>7)=nan
m = [mean(a(:),'omitnan') mean(mean(a,'omitnan'),'omitnan')]
m =
3.8421 3.6806
Both elements in m are supposed to be the mean of all elements in a, yet they differ. The reason is that taking the mean of all values together, with mean(a(:),'omitnan'), amounts to summing all non-NaN values and dividing by the number of values summed:
sum(a(:),'omitnan')/sum(~isnan(a(:)))==mean(a(:),'omitnan') % this is true
but taking the mean of the first dimension, we get 6 mean values:
sum(a,'omitnan')./sum(~isnan(a))==mean(a,'omitnan') % this is also true
and when we take the mean of those, every column gets the same weight no matter how many non-NaN values it contained, so the result generally differs from the pooled mean:
mean(sum(a,'omitnan')./sum(~isnan(a)))==mean(a(:),'omitnan') % this is false
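The same effect can be reproduced on a tiny fixed matrix, which may be easier to follow than the random one. This is only a sketch: the 2x2 matrix is made up for illustration, and the NaNs are handled with an explicit mask rather than 'omitnan' so the arithmetic stays visible.

```matlab
a = [1 NaN; 3 5];                         % made-up data: one missing value
valid = ~isnan(a);                        % mask of usable entries
a0 = a;
a0(~valid) = 0;                           % zero out NaNs so plain sums work

pooled = sum(a0(:)) / sum(valid(:));      % (1+3+5)/3 = 3
colMeans = sum(a0, 1) ./ sum(valid, 1);   % per-column means: [2 5]
meanOfMeans = mean(colMeans);             % (2+5)/2 = 3.5
```

The first column contributes two values and the second only one, yet both column means count equally in meanOfMeans, which is why it is not equal to pooled.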
Here is what I think you want in your code:
% this is exactly as your first test:
tmpJan60=tmn60(:,:,1);
tmpJan60(tmpJan60>200) = nan;
m1 = mean(mean(tmpJan60,'omitnan'),'omitnan')
% this creates the matrix as you want it:
result = reshape(permute(tmn60,[3 1 2]),120,[]).';
result(result>200) = nan;
r = reshape(result(:,1),720,360);
m2 = mean(mean(r,'omitnan'),'omitnan')
isequal(m1,m2)
To create the matrix you first permute the dimensions so the one you want to keep as is (time) will be the first. Then reshape the array to Tx(lon*lat), so you get 120 rows for all time steps and 259200 columns for all combinations of the coordinates. All that's left is to transpose it.
m1 is your first calculation, and m2 is what you try to do in the second one. They are equal here, but their value is not 5.855, even if I use your code.
However, I think the right solution will be to take the mean of all values together:
mean(result(:,1),'omitnan')
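To see which row order the permute/reshape step produces, it can help to try it on a toy array. The sketch below assumes a 2x3x4 array standing in for lon x lat x months, and the coordinate vectors lon and lat are hypothetical values, not taken from tmp60.mat. On this toy array the plain reshape(A, N*M, T) from the question gives the same matrix, which suggests the differing averages come from how the mean is taken rather than from reshape.

```matlab
A = reshape(1:24, [2 3 4]);                   % toy array: N=2 (lon), M=3 (lat), T=4 (months)
[N, M, T] = size(A);

R1 = reshape(permute(A, [3 1 2]), T, []).';   % permute, reshape, transpose
R2 = reshape(A, N*M, T);                      % plain reshape, as in the question
same = isequal(R1, R2);                       % both orderings agree on this array

lon = [10 20];                                % hypothetical longitudes
lat = [-5 0 5];                               % hypothetical latitudes
[LON, LAT] = ndgrid(lon, lat);                % longitude varies fastest, like the rows of R2
temp = [LON(:), LAT(:), R2];                  % columns: lon, lat, month 1..T
```

Each row of temp then holds one grid cell, with all longitudes for the first latitude listed before the second latitude starts.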

Related

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in function's input.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending whether n is odd or even or length(v) is odd or even, I get sometimes right answers but sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers equals the exponential of the sum of their logarithms:
a_1 * a_2 * ... * a_n = exp(log(a_1) + log(a_2) + ... + log(a_n))
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
Update
Inspired by the nicely thought answer of Dev-iL comes this handy solution, which does not require Matlab R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to turn the multiplication into a sum, so that a moving sum can be used, which in turn can be realised by convolution with a vector of ones.
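As a quick sanity check of the log/convolution idea, the result can be compared against a plain loop over all full windows. This is a minimal sketch on the question's small positive example only; it says nothing about speed, and inputs with zeros or negative values are not exercised here.

```matlab
v = [1 2 2 1 3 1];                      % example vector from the question
n = 3;                                  % window length

numWindows = numel(v) - n + 1;          % number of full windows
pLoop = zeros(1, numWindows);
for ii = 1:numWindows
    pLoop(ii) = prod(v(ii:ii+n-1));     % direct product of each window
end

% moving sum of logarithms via convolution, then back with exp
pConv = real(exp(conv(log(v), ones(1, n), 'valid')));

maxErr = max(abs(pLoop - pConv));       % should be at rounding level
```

Because exp and log introduce rounding, pConv is compared with a tolerance rather than with isequal, which matches the note that outputs may require rounding.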
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in @thewaywewalk's answer.
I think the problem may be based on your indexing. The line that states for ii = 1:(length(v)-2) does not provide the correct range of ii.
Try this:
function out = max_product(in, n)
w = n - 1; % offset from the first to the last element of each window
out = zeros(length(in)-w, 1); % one product per full window, as a column vector
for i = 1:length(in)-w
    out(i) = prod(in(i:i+w));
end
end
Your code works when restated like so:
for ii = 1:(length(v)-(n-1))
    p(ii) = prod(v(ii:ii+(n-1))); % keep every window's product instead of overwriting p
end
That should take care of the indexing problem.
Using bsxfun, you can build an index matrix in which each row holds the positions of n consecutive elements, and then take prod along the 2nd dimension. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
Some of the other solutions have since been updated, and some, such as @Dev-iL's answer, outperform the others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.

Matlab: Help in implementing quantized time series

I am having trouble implementing this code due to the variable s_k being logical 0/1. In what way can I implement this statement?
s_k is a random sequence of 0/1 generated using a rand() and quantizing the output of rand() by its mean given below. After this, I don't know how to implement. Please help.
N =1000;
input = rand(N,1); % uniform samples in [0,1], whose mean is 0.5
s = (input>=0.5); %converting into logical 0/1;
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i =1 : tmax+N-1 %// Change here
y(i+1) = 4*y(i)*(1-y(i)); %nonlinear model for generating the input to Autoregressive model
end
s = (y>=0.5);
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) =0;
for j =2 : N
z(j) = 0.195 *z(j-1) + x(j);
end
You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to access into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum along the rows. So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.
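To check the indexing, the vectorised sum can be compared against an explicit double loop. The sketch below uses a fixed logical sequence instead of a random one so the numbers are reproducible; that sequence is an assumption for illustration, and the weight 2^(-k+N+1) is taken as written above.

```matlab
N = 3;                                    % window length
tmax = 5;                                 % last output time
s = logical([1 0 1 1 0 0 1 0]).';         % fixed example sequence (made up)

ind = bsxfun(@plus, 0:tmax, (0:N-1).');   % N-by-(tmax+1) matrix of time indices k
x = sum(s(ind+1) .* 2.^(-ind+N+1), 1);    % sum each column, i.e. each window

% reference computation with explicit loops
xLoop = zeros(1, tmax+1);
for t = 0:tmax
    for k = t:t+N-1
        xLoop(t+1) = xLoop(t+1) + s(k+1) * 2^(-k+N+1);
    end
end
```

For the first window the indices are k = 0, 1, 2 with s = 1, 0, 1, so x(1) = 2^4 + 2^2 = 20, and the loop version gives the same vector.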

Create a matrix with a changing number of columns

I'm trying to write a homemade version of peakfinder.m that works with multiple arrays at once instead of just one at a time, for better time efficiency. (http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder)
I have a 2D matrix where I need to find if the sign changes in the 2nd dimension.
dx0 = diff(x0,1,2); % Find derivative
dx0(dx0 == 0) = -eps; % This is so we find the first of repeated values
ind = find(dx0(:,1:end-1).*(dx0(:,2:end)) < 0)+1; % Find where the derivative changes sign
Now my problem is that it does find where the derivative changes sign, but it is one big vector. So if the signs changes twice in the same row (or doesn't in a row), I have no way to find out.
So if x0 is of size 1000x10, I'd like ind to be of size 1000xY, where Y is the number of times it changes sign in EACH row. I also need to know at which values of x0 there is a sign change. So each row will be in the style of :
2 4 7
4 8
2 5 6 8
etc.
Is this possible at all? Or should I change the code so it places a 0 if it doesn't change and a 1 if it does change, considering I'll be working with the values where it changes?
cellfun approach -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
The above code assumes you have padarray which seems like a recent addition to MATLAB's Image Processing Toolbox. So, if you don't have it, you can concatenate zeros (with false) like this -
b1 = sign(dx0(:,1:end-1))~=sign(dx0(:,2:end))
b1 = [false(size(b1,1),1) b1]
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
Alternative solution using cellfun with nonzeros function -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@nonzeros,mat2cell(bsxfun(@times,b1,1:size(b1,2)),ones(1,size(b1,1)),size(b1,2)),'uni',0)
out contains the locations of sign change across the rows, which can be displayed using celldisp(out).
The counts of the sign changes can be calculated using -
counts = cellfun(@numel,out)
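A small worked example may make the output format clearer. The sketch below uses the variant without padarray on a made-up 2x6 matrix x0: the first row has three sign changes, and the second is monotone and has none.

```matlab
x0 = [1 3 2 4 4 1;
      5 4 3 2 1 0];                                  % made-up test data

dx0 = diff(x0, 1, 2);                                % derivative along each row
dx0(dx0 == 0) = -eps;                                % so the first of repeated values is found

b1 = sign(dx0(:,1:end-1)) ~= sign(dx0(:,2:end));     % true where the slope changes sign
b1 = [false(size(b1,1),1) b1];                       % pad so columns line up with x0

% one cell per row, each holding the column indices of the sign changes
out = cellfun(@find, mat2cell(b1, ones(1,size(b1,1)), size(b1,2)), ...
              'UniformOutput', false);
counts = cellfun(@numel, out);                       % number of sign changes per row
```

Here out{1} is [2 3 4] (a peak at 3, a valley at 2, and the first of the repeated 4s), out{2} is empty, and counts is [3; 0], so rows with different numbers of sign changes coexist without any padding.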

how to get more than one number inside of matrices

Ok. I have a simple question although I'm still fairly new to Matlab (taught myself). So I was wanting a 1x6 matrix to look like this below:
0
0
1
0
321, 12 <--- needs to be in one box in 1x6 matrices
4,30,17,19 <--- needs to be in one box in 1x6 matrices
Is there a possible way to do this or am I going to just have to write them all in separate boxes thus making it a 1x10 matrix?
My code:
event_marker = 0;
event_count = 0;
block_number = 1;
date = [321,12] % (its corresponding variables = 321 and 12)
time = [4,30,17,19] % (its corresponding variable = 4 and 30 and 17 and 19)
So if I understand you correctly, you want an array that contains 6 elements, of which 1 element equals 1, another element is the array [321,12] and the last element is the array [4,30,17,19].
I'll suggest two things to accomplish this: matrices, and cell-arrays.
Cell arrays
In Matlab, a cell array is a container for arbitrary types of data. You define it using curly braces (as opposed to square brackets for matrices). So, for example,
C = {'test', rand(4), {@cos,@sin}}
is something that contains a string (C{1}), a normal matrix (C{2}), and another cell which contains function handles (C{3}).
For your case, you can do this:
C = {0,0,1,0, [321,12], [4,30,17,19]};
or of course,
C = {0, event_marker, event_count, block_number, date, time};
Matrices
Depending on where you use it, a normal matrix might suffice as well:
M = [0 0 0 0
event_marker 0 0 0
event_count 0 0 0
block_number 0 0 0
321 12 0 0
4 30 17 19];
Note that you'll need some padding (meaning, you'll have to add those zeros in the top-right somehow). There's tonnes of ways to do that, but I'll "leave that as an exercise" :)
Again, it all depends on the context which one will be easier.
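For the matrix variant, one possible way to do the zero padding is to start from a cell array and copy each entry into a preallocated matrix. This is just a sketch of one option; the names dateVals and timeVals are hypothetical stand-ins, chosen so the built-in date function is not shadowed.

```matlab
event_marker = 0;
event_count  = 0;
block_number = 1;
dateVals = [321 12];                       % hypothetical name for the date values
timeVals = [4 30 17 19];                   % hypothetical name for the time values

parts = {0, event_marker, event_count, block_number, dateVals, timeVals};

width = max(cellfun(@numel, parts));       % widest row decides the column count
M = zeros(numel(parts), width);            % unused slots stay zero
for k = 1:numel(parts)
    M(k, 1:numel(parts{k})) = parts{k};    % copy each row, left-aligned
end
```

Note that the padding zeros cannot be told apart from genuine zero values afterwards, which is one reason the cell array may be the easier choice.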
Consider using cell arrays rather than matrices for your task.
data = cell(6,1); % allocate cell
data{1} = event_marker; % note the curly braces here!
...
data{6} = date; % all elements of date fit into a single cell.
If your date and time variables actually represent a date (numbers of days, months, years) and a time (hours, mins, sec), they can be packed into one or two numbers.
Look into DATENUM function. If you have a vector, for example, [2013, 4, 10], representing April 10th of 2013 you can convert it into a serial date:
daten = datenum([2013, 4, 10]);
It's ok if you have number of days in a year, but not months. datenum([2013, 0, 300]) will also work.
The time can be packed together with date or separately:
timen = datenum([0, 0, 0, 4, 30, 17.19]);
or
datetimen = datenum([2013, 4, 10, 4, 30, 17.19]);
Once you have this serial date you can just keep it in one vector with other numbers.
You can convert this number back into either date vector or date string with DATEVEC and DATESTR function.

How to structure a cell to store values in a specific format in Matlab?

I have a code that looks for the best combination between two arrays that are less than a specific value. The code only uses one value from each row of array B at a time.
B =
1 2 3
10 20 30
100 200 300
1000 2000 3000
and the code i'm using is :
B=[1 2 3; 10 20 30 ; 100 200 300 ; 1000 2000 3000];
A=[100; 500; 300 ; 425];
SA = sum(A);
V={}; % number of rows for cell V = num of combinations -- column = 1
n = 1;
for k = 1:length(B)
for idx = nchoosek(1:numel(B), k)'
rows = mod(idx, length(B));
if ~isequal(rows, unique(rows)) %if rows not equal to unique(rows)
continue %combination possibility valid
end %Ignore the combination if there are two elements from the same row
B_subset = B(idx);
if (SA + sum(B_subset) <= 2000) %if sum of A + (combination) < 2000
V(n,1) = {B_subset(:)}; %iterate cell V with possible combinations
n = n + 1;
end
end
end
However, I would like to display results differently than how this code stores them in a cell.
Instead of displaying results in cell V such as :
[1]
[10]
[300]
[10;200]
[1000;30]
[1;10;300]
This is preferred : (each row X column takes a specific position in the cell)
Here, this means that they should be arranged as cell(1,1)={[B(1,x),B(2,y),B(3,z),B(4,w)]}. Where x y z w are the columns with chosen values. So that the displayed output is :
[1;0;0;0]
[0;10;0;0]
[0;0;300;0]
[0;10;200;0]
[0;30;0;1000]
[1;10;300;0]
In each answer, the combination is determined by choosing a value from the 1st to 4th row of matrix B. Each row has 3 columns, and only one value from each row can be chosen at once. However, if for example B(1,2) cannot be used, it will be replaced with a zero. e.g. if row 1 of B cannot be used, then B(1,1:3) will be a single 0. And the result will be [0;x;y;z].
So, if 2 is chosen from the 1st row, and 20 is chosen from the 2nd row, while the 3rd and 4th rows are NOT included, they should show a 0. So the answer would be [2;20;0;0].
If only the 4th row is used (such as 1000 for example), the answer should be [0;0;0;1000]
In summary I want to implement the following :
Each cell contains length(B) values from every row of B (based on the combination)
Each value not used for the combination should be a 0 and printed in the cell
I am currently trying to implement this but my methods are not working .. If you require more info, please let me know.
edit
I have tried to implement the code in the dfb's answer below but having difficulties, please take a look at the answer as it contains half of the solution.
My MATLAB is super rusty, but doesn't something like this do what you need?
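arr = zeros(size(B,1),1); % one slot per row of B
r = mod(idx-1, size(B,1)) + 1; % row of B that each chosen element came from
arr(r) = B_subset(:);
V(n,1) = {arr};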
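Put together with the question's loop, the idea could look roughly like the sketch below. It assumes that each chosen value should land in the slot of the row of B it came from; the row index r, derived from the linear index, is my reading of the requirement and not something confirmed in the question.

```matlab
B = [1 2 3; 10 20 30; 100 200 300; 1000 2000 3000];
A = [100; 500; 300; 425];
SA = sum(A);
nRows = size(B, 1);

V = {};                                        % one nRows-by-1 vector per valid combination
for k = 1:nRows
    for idx = nchoosek(1:numel(B), k).'        % each column is one set of linear indices
        r = mod(idx - 1, nRows) + 1;           % row of B for every chosen element
        if numel(unique(r)) < numel(r)
            continue                           % two elements from the same row: skip
        end
        if SA + sum(B(idx)) <= 2000
            arr = zeros(nRows, 1);             % rows that are not used stay 0
            arr(r) = B(idx);                   % place each value in its row's slot
            V{end+1, 1} = arr;
        end
    end
end
```

With these inputs sum(A) is 1325, so no value from the 4th row fits under the limit of 2000 and every stored vector ends in 0; an entry such as [2;20;0;0] appears for the combination of 2 and 20.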
