Ignore NaN when detrending a 3-D array

I'm using Matlab 2016a; I'm attempting to detrend a 3-dimensional array along the third dimension, but where there are missing values. It is critical that the values stay in the same positions in the array since the position relates to a geographic location.
In this image, imagine that Page 2 has NaN at random locations but that Page 1 and Page 3 have complete data. Detrending along the 3rd dimension, some vectors will have three data points and some will have two. I need to be able to detrend along the third dimension using all available values. If I were to look at the values for the detrended Page 1 or Page 3, there should be no missing values (since there are always either 2 or 3 data points to use), but Page 2 would have NaN placeholders in the location where the NaN was located.
My question is: how can I detrend along the third dimension while ignoring NaN?
I've attempted using detrend3 (found on the Matlab file exchange: https://www.mathworks.com/matlabcentral/fileexchange/61328-detrend3?focused=7203929&tab=function), which works perfectly when detrending 3-d arrays with no missing values.
Detrending with NaN present produces an error. I've tried ignoring NaN and also setting NaN to -9999 and then ignoring that number, but have been unable to get these efforts to work.
Any guidance about what direction to go would be greatly appreciated.

function detrended = detrendNaN3(A,t)
%DETRENDNAN3 Detrends a matrix along the third dimension, ignoring NaNs
% Input Arguments:
% - A: NxMxK matrix
% - t: 1xK time vector
% expand time to the same size as A
t = bsxfun(@times,permute(t,[3 1 2]),ones(size(A)));
% where A is NaN, set t to NaN as well
t(isnan(A)) = NaN;
% mean of time for each pixel
xm = nanmean(t,3);
% mean of every pixel in A
ym = nanmean(A,3);
% calculate slope using least squares for every pixel
a = nansum(bsxfun(@times,bsxfun(@minus,t,xm),bsxfun(@minus,A,ym)),3)./nansum(bsxfun(@minus,t,xm).^2,3);
% calculate intercept for every pixel
b = ym - a.*xm;
% calculate trend for every pixel
trend = bsxfun(@plus,b,bsxfun(@times,a,t));
% remove trend
detrended = A - trend;
end
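For reference, the slope a and intercept b computed per pixel above are the usual least-squares estimates over the available (non-NaN) time steps:

$$a = \frac{\sum_k (t_k - \bar t)(A_k - \bar A)}{\sum_k (t_k - \bar t)^2}, \qquad b = \bar A - a\,\bar t$$

where the sums run over the time steps k for which A is not NaN.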
Even though the function is fully vectorised, it could be written a bit faster; but it's currently very readable, and with a 2500x1700x100 matrix it takes about 8 seconds, which I deem acceptable.
An updated version is maintained at the file exchange.
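For illustration, a minimal usage sketch (the array size, time vector, and NaN position below are made up):
% made-up example: 4x5 grid over 3 time steps, one missing value on "Page 2"
A = rand(4,5,3);
A(2,3,2) = NaN;
t = 1:3;                  % 1xK time vector
D = detrendNaN3(A,t);     % detrended along dim 3; D(2,3,2) stays NaN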


How to count for 2 different arrays how many times the elements are repeated, in MATLAB?

I have arrays A (44x1) and B (41x1), and I want to count, for both arrays, how many times each element is repeated. And if the repeated values are present in both arrays, I want their counts to be divided (for instance: value 0.5 appears 500 times in A and 350 times in B, so divide 500 by 350).
I have to do this for bigger arrays as well, so I was thinking about using a loop (but I have no idea how to do it in MATLAB).
I got what I want in Python:
import pandas as pd
data1 = pd.read_excel('C:/Users/Desktop/Python/data1.xlsx')
data2 = pd.read_excel('C:/Users/Desktop/Python/data2.xlsx')
for i in data1['Mag'].value_counts() & data2['Mag'].value_counts():
    a = data1['Mag'].value_counts()/data2['Mag'].value_counts()
    print(a)
    break
Any idea of how to do the same in MATLAB? Thanks!
Since you can enumerate all valid earthquake magnitude values, you could use:
% Make up some data
A=randi([2 58],[100 1])/10;
B=randi([2 58],[20 1])/10;
% Round data to nearest tenth
%A=round(A,1); %uncomment if necessary
%B=round(B,1); %same
% Divide frequencies
validmags=0.2:0.1:5.8;
Afreqs=sum(double( abs(A-validmags)<1e-6 ),1); %relies on implicit expansion; A must be a column vector and validmags must be a row vector; dimension argument to sum() only to remind user; double() not really needed
Bfreqs=sum(double( abs(B-validmags)<1e-6 ),1); %same
Bfreqs./Afreqs, %for a fancier version: [{'Magnitude'} num2cell(validmags) ; {'Freq(B)/Freq(A)'} num2cell(Bfreqs./Afreqs)].'
The last line will produce NaN for 0/0, +Inf for a nonzero count divided by 0, and 0 for 0 divided by a nonzero count.
You could also use uniquetol, align the unique values of each vector, and divide the respective absolute frequencies. But I think the above approach is cleaner and easier to understand.
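For completeness, a rough sketch of that alternative (assuming, as above, that A and B are column vectors and that implicit expansion is available):
vals = uniquetol([A;B], 1e-6);            % common set of observed magnitudes
Acnt = sum( abs(A - vals.') < 1e-6 , 1);  % frequency of each value in A
Bcnt = sum( abs(B - vals.') < 1e-6 , 1);  % frequency of each value in B
ratio = Bcnt./Acnt;                       % Freq(B)/Freq(A); Inf if a value never occurs in A, 0 if never in B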

Reshape a 3D array and remove missing values

I have an NxMxT array where each element corresponds to a grid cell on Earth. If the cell is over the ocean, the value is 999. If the cell is over land, it contains an observed value. N is longitude, M is latitude, and T is months.
In particular, I have an array called tmp60 for the ten years 1960 through 1969, so 120 months for each grid.
To test what the global mean in January 1960 was, I write:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60(:,:)>200)=NaN;
nanmean(nanmean(tmpJan60))
which gives me 5.855.
I am confused about the reshape function. I thought the following code should yield the same average, namely 5.855, but it does not:
load tmp60
N1=size(tmp60,1)
N2=size(tmp60,2)
N3=size(tmp60,3)
reshtmp60 = reshape(tmp60, N1*N2,N3);
reshtmp60( reshtmp60(:,1)>200,: )=[];
mean(reshtmp60(:,1))
This gives me -1.6265, which is not correct.
I have checked the result in Excel (!) and 5.855 is correct, so I assume I make a mistake in the reshape function.
Ideally, I want a matrix that takes each grid cell, going first down the N dimension, and makes 720 rows with 120 columns (each column is a month). These first 720 rows will represent one longitude band around Earth for the same latitude. Next, I want to increase the latitude by one step, giving another 720 rows with 120 columns. Ultimately I want to do this for all 360 latitudes.
If longitude and latitude were inputs, say column 1 and 2, then the matrix should look like this:
temp = [-179.75 -89.75 -1 2 ...
-179.25 -89.75 2 4 ...
...
179.75 -89.75 5 9 ...
-179.75 -89.25 2 5 ...
-179.25 -89.25 3 4 ...
...
-179.75 89.75 2 3 ...
...
179.75 89.75 6 9 ...]
So temp(:,3) should be all January 1960 observations.
One way to do this is:
grid1 = tmp60(1,1,:);
g1 = reshape(grid1, [1,120]);
grid2 = tmp60(2,1,:);
g2 = reshape(grid2,[1,120]);
g = [g1;g2];
But this is obviously very cumbersome.
I am not able to automate this procedure for the N*M elements, so comments are appreciated!
A link to the file tmp60.mat
The main problem in your code is the treatment of the NaNs. Observe the following example:
a = randi(10,6);
a(a>7)=nan
m = [mean(a(:),'omitnan') mean(mean(a,'omitnan'),'omitnan')]
m =
3.8421 3.6806
Both elements in m are supposed to be simply the mean of all elements in a. But they are different! The reason is that taking the mean of all values together, with mean(a(:),'omitnan'), is like summing all non-NaN values and dividing by the number of values we summed:
sum(a(:),'omitnan')/sum(~isnan(a(:)))==mean(a(:),'omitnan') % this is true
but taking the mean along the first dimension, we get 6 mean values:
sum(a,'omitnan')./sum(~isnan(a))==mean(a,'omitnan') % this is also true
and when we take the mean of those, we divide by the full number of columns, because the NaNs were already omitted in the first step:
mean(sum(a,'omitnan')./sum(~isnan(a)))==mean(a(:),'omitnan') % this is false
Here is what I think you want in your code:
% this is exactly as your first test:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60>200) = nan;
m1 = mean(mean(tmpJan60,'omitnan'),'omitnan')
% this creates the matrix as you want it:
result = reshape(permute(tmp60,[3 1 2]),120,[]).';
result(result>200) = nan;
r = reshape(result(:,1),720,360);
m2 = mean(mean(r,'omitnan'),'omitnan')
isequal(m1,m2)
To create the matrix you first permute the dimensions so the one you want to keep as is (time) will be the first. Then reshape the array to Tx(lon*lat), so you get 120 rows for all time steps and 259200 columns for all combinations of the coordinates. All that's left is to transpose it.
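A tiny sketch with a made-up 2x3x2 array shows the ordering that this produces:
x = reshape(1:12, 2, 3, 2);                % 2 "longitudes" x 3 "latitudes" x 2 "months"
y = reshape(permute(x,[3 1 2]), 2, []).';  % 6 rows x 2 columns
% y(1,:) is cell (lon 1, lat 1), y(2,:) is (lon 2, lat 1), y(3,:) is (lon 1, lat 2), ...
% so the rows run through longitude first, then latitude, and each column is a month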
m1 is your first calculation, and m2 is what you try to do in the second one. They are equal here, but their value is not 5.855, even if I use your code.
However, I think the right solution will be to take the mean of all values together:
mean(result(:,1),'omitnan')

Draw imaginary numbers in MATLAB

I am trying to learn MATLAB.
I am trying to make a program that draws these imaginary numbers ("," = decimal separator)
and determines which of the 500 numbers is closest to the real axis.
And I need a little guidance.
What do I have to do to solve this task?
I was thinking about making a loop where all the values get stored in an array:
n = 1
while n < 500
value = 1 + 0.1^n;
disp(value)
n = n + 1
end
(It seems like value is printing wrong values? And how do I store them in an array?)
And then somehow determine what number that is nearest the real axis and then display the value.
would be really grateful if someone could help me.
thanks in advance.
MATLAB creates imaginary numbers by appending an i or j term with the number. For example, if you wanted to create an imaginary number such that the real component was 1 and the imaginary component was 1, you would simply do:
>> A = 1 + i
A =
1.0000 + 1.0000i
You can see that there is a distinct real component as well as an imaginary component and is stored in A. Similarly, if you want to make the imaginary component have anything other than 1, you would need to add a constant in front of the i (or j). Something like:
>> A = 3 + 6i
A =
3.0000 + 6.0000i
Therefore, for your task, you simply need to create a vector of n between 1 and 500, input this into the above equation, then plot the resulting imaginary numbers. In this case, you would plot the real component on the x axis and the imaginary component on the y axis. Something like:
>> n = 1 : 500;
>> A = (1 + 0.1i).^n;
>> plot(real(A), imag(A));
real and imag are functions in MATLAB that access the real and imaginary components of complex numbers stored in arrays, matrices or single values. As noted by knedlsepp, you can simply plot the array itself as plot can handle complex-valued arrays:
>> plot(A);
Nice picture btw! Be mindful of the . operator combined with the ^ operator. The . means an element-wise operation: we apply the power operation for each value of n from 1 to 500, with 1 + 0.1i as the base. The result is a 500-element array with the resulting calculations. If we used ^ by itself, MATLAB would expect a matrix power operation, which is not what we want here.
The values that you want to analyze for each value of n applied to the equation in your post are stored in A. We then plot the real and imaginary components on the graph. Now, if you want to find which numbers are closest to the real axis, you simply need to find the smallest absolute imaginary component of the numbers stored in A, then search for all of those numbers that share this minimum.
>> min_dist = min(abs(imag(A)));
>> vals = A(abs(imag(A)) == min_dist)
vals =
1.3681 - 0.0056i
This means that the value of 1.3681 - 0.0056i is the closest to the real axis.
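If you would rather keep a loop like in your original attempt, a minimal sketch that stores each value in a preallocated array could look like this (the variable names are just for illustration):
A = zeros(1,500);        % preallocate storage for the 500 values
for n = 1:500
A(n) = (1 + 0.1i)^n;     % note: the base is 1 + 0.1i, not 1 + 0.1^n
end
plot(real(A), imag(A));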

How to determine the most repeated values in an interval of a vector array in MATLAB

This is my question:
I want to know which value is repeated, and how many times, in an interval of a vector array. I know that many people will tell me to use "hist", but I did that and the results aren't exact enough; let me show you my problem in a picture:
In that picture, you can see the "Data" in blue; and I have used 3 kinds of values: 1st "Mode", 2nd "Mean" and finally "Most repeated value in Histogram", which means that I used something like [a,b]=hist(Data), then MayorValue = b(a==max(a)), and it is very important NOT to use a predefined range; but this picture doesn't represent the most repeated values, so let me show you another pic, which is a closer view of the data:
That blue "Data", which varies approximately between (0-0.5)E-5, is the interval that I need to obtain, but as you can see, the other three values are not close enough. And the "mode" value is just "0". I hope that you can help me to solve this problem; thanks by the way!
OK, to be more clear, I add this new pic:
What exactly I'm looking for is to get an interval, like in this example I wrote manually 0.1 - 0.4 E-4 (in purple), so the function will say:
[A,B]=magicfunction(Data);
A=[0.1E-4 0.4E-4]; B=[123];
Where B=123 means the amount of data contained in that interval; as you can see, I just input the vector "Data", nothing else.
In the next link you can get the "Data":
https://drive.google.com/file/d/0B4WGV21GqSL5Vk0tRUdLNk5XVnc/edit?usp=sharing
Isn't taking the max of a hist in a range what you want? You almost got it; you just didn't define the bins well. For example:
range=4750:5050;
[counts val]=hist(data(range),unique(data(range)));
most_repeated_value_in_range=val(counts==max(counts));
Edit:
Following the clarification, what you want is a statistical bound regarding the histogram width around its maximum (most frequent value); here's a solution:
[c, v]=hist(data,linspace(min(data),max(data),num_of_bins));
range=find(c>1/exp(1)*max(c)); % can be also c>0.5*max(c) etc...
A=[v(range(1)) v(range(end))];
B=sum(c(range));
Let's test with some fake data:
t=linspace(-50,50,1e3);
data=0.3*exp(-(t-30).^2)+0.2*exp(-(t-10).^2)+0.3*exp(-(t+10).^2)+0.01*randn(1,numel(t));
[c, v]=hist(data,linspace(min(data),max(data),numel(t)));
range=find(c>1/exp(1)*max(c));
A=[v(range(1)) v(range(end))];
B=sum(c(range));
plot(t,data,'b'); hold on
plot([min(t) max(t)],[A(1) A(1)] ,'--r');
plot([min(t) max(t)],[A(2) A(2)] ,'--r');
B
B =
518
Of course you can change the definition of the "width" of the histogram: I took the width at 1/e of the maximum; you can take the full width at half max (c>0.5*max(c)), or something narrower according to the type of data used, etc.
The function below is designed based on several assumptions:
The "interval" of interest is close to 0.
The majority of the samples are small.
The basic idea is to first filter out the samples that are too big, and then define the interval based on the sorted array of the remaining samples.
function [A, B] = magicfunction(data)
% Assuming the outlier samples only exist on the positive side, some
% samples with big, positive values can be excluded in order to obtain a
% better estimation of "the interval". Here we exclude the
% samples that are greater than mean(data)+K1*std(data), where K1 is
% empirically selected as 1.0
K1 = 1.0;
filtered_data = data( data < mean(data)+K1*std(data) );
sorted_data = sort(filtered_data);
% Define the interval in terms of percentiles of the
% sorted_data. Here the interval is empirically selected as [0, 0.75]
interval = [0 0.75];
% Map the percentile interval to actual indices into sorted_data.
% Note that interval_index(1) cannot be smaller than 1, and
% interval_index(2) cannot be greater than length(sorted_data)
interval_index = round( length(sorted_data)*interval );
interval_index(1) = max(1, interval_index(1));
interval_index(2) = min(length(sorted_data), interval_index(2));
% Assign output A in terms of the values in sorted_data
A = sorted_data(interval_index)
% Assign output B: the number of samples that fall inside the interval
B = sum( data>A(1) & data<A(2) )
% Visualization
x = 1:length(data);
figure;
subplot(211);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
xlim([1 20000]);
subplot(212);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
ylim([0, 3*10^-5]);
xlim([1 20000]);
Feeding the data provided in your question into the function yields the following plot:
You may want to empirically tune the two variables in the function, K1 and interval, to obtain the desired result.
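A rough usage sketch with made-up data in the spirit of the description (mostly very small values plus some larger outliers):
data = [5e-6*rand(1,19000), 1e-4*rand(1,1000)];  % made-up data: 19000 small samples, 1000 larger ones
[A, B] = magicfunction(data);                    % A = [lower upper] bounds, B = number of samples inside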

Create a matrix with a changing number of columns

I'm trying to do a homemade version of peakfinder.m, making it work with multiple arrays instead of just one at a time, for more time-efficient performance. (http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder)
I have a 2D matrix where I need to find if the sign changes in the 2nd dimension.
dx0 = diff(x0,1,2); % Find derivative
dx0(dx0 == 0) = -eps; % This is so we find the first of repeated values
ind = find(dx0(:,1:end-1).*(dx0(:,2:end)) < 0)+1; % Find where the derivative changes sign
Now my problem is that it does find where the derivative changes sign, but the result is one big vector. So if the sign changes twice in the same row (or doesn't change at all in a row), I have no way to find out.
So if x0 is of size 1000x10, I'd like ind to be of size 1000xY, where Y is the number of times it changes sign in EACH row. I also need to know at which values of x0 there is a sign change. So each row will be in the style of:
2 4 7
4 8
2 5 6 8
etc.
Is this possible at all? Or should I change the code so it places a 0 if it doesn't change and a 1 if it does change, considering I'll be working with the values where it changes?
cellfun approach -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
The above code assumes you have padarray which seems like a recent addition to MATLAB's Image Processing Toolbox. So, if you don't have it, you can concatenate zeros (with false) like this -
b1 = sign(dx0(:,1:end-1))~=sign(dx0(:,2:end))
b1 = [false(size(b1,1),1) b1]
out = cellfun(@find,mat2cell(b1,ones(1,size(b1,1)),size(b1,2)),'uni',0)
Alternative solution using cellfun with nonzeros function -
b1 = padarray(sign(dx0(:,1:end-1))~=sign(dx0(:,2:end)),[0 1],'pre')
out = cellfun(@nonzeros,mat2cell(bsxfun(@times,b1,1:size(b1,2)),ones(1,size(b1,1)),size(b1,2)),'uni',0)
out contains the locations of sign change across the rows, which can be displayed using celldisp(out).
The counts of the sign changes can be calculated using -
counts = cellfun(#numel,out)
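A quick way to try the approach on a small made-up matrix:
x0 = [1 3 2 4 1;
      1 2 3 4 5];
dx0 = diff(x0,1,2);
dx0(dx0 == 0) = -eps;
b1 = sign(dx0(:,1:end-1)) ~= sign(dx0(:,2:end));
b1 = [false(size(b1,1),1) b1];
out = cellfun(@find, mat2cell(b1, ones(1,size(b1,1)), size(b1,2)), 'uni', 0);
celldisp(out)                  % out{1} = [2 3 4] (sign changes in row 1), out{2} is empty (row 2 is monotonic)
counts = cellfun(@numel, out)  % counts = [3; 0]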
