Converting dataset to numeric array changes value in Matlab

I would be grateful for any insight on a quirk I'm trying to figure out in Matlab (I'm more used to R). I've looked through the help pages and googled, but I can't find this exact problem.
I am working with data consisting of climate variables from several years. I converted my numerical array to a dataset because I wanted to calculate means based on different categories of data.
% Make matrix then a dataset out of column vectors
data = [Year MO DD HR MM SS DecimalDate T_21m RH_21m P_bar_12m ws_21m wd_21m ustar_21m z_L_21m precip_mm Td_21m vpd wet_b T_soil T_bole_pi T_bole_fi T_bole_sp Rppfd_in_ Rppfd_out Rnet_25m_ Rsw_in_25 Rsw_out_2 Rlw_in_25 Rlw_out_2 T_2m T_8m RH_2m RH_8m h2o_soil co2_21m q];
header = {'Year', 'MO', 'DD', 'HR', 'MM', 'SS', 'DecimalDate', 'T_21m', 'RH_21m', 'P_bar_12m', 'ws_21m', 'wd_21m', 'ustar_21m', 'z_L_21m', 'precip_mm', 'Td_21m', 'vpd', 'wet_b', 'T_soil', 'T_bole_pi', 'T_bole_fi', 'T_bole_sp', 'Rppfd_in_', 'Rppfd_out', 'Rnet_25m_', 'Rsw_in_25', 'Rsw_out_2', 'Rlw_in_25', 'Rlw_out_2','T_2m', 'T_8m', 'RH_2m', 'RH_8m', 'h2o_soil', 'co2_21m', 'q'};
dataset1 = dataset({data,header{:}});
Here's what the first few rows look like to give you an idea of the dataset:
dataset1(1:5,:)
ans =
Year MO DD HR MM SS DecimalDate T_21m RH_21m P_bar_12m ws_21m wd_21m ustar_21m z_L_21m
1998 11 1 0 15 0 305.01 1.9 86.9 70.27 NaN 279.8 NaN NaN
1998 11 1 0 45 0 305.03 1.9 86.9 70.27 NaN 279.8 NaN NaN
1998 11 1 1 15 0 305.05 2.03 86.9 70.27 NaN 108.2 NaN NaN
1998 11 1 1 45 0 305.07 2.03 86.9 70.27 NaN 108.2 NaN NaN
1998 11 1 2 15 0 305.09 1.75 87 70.27 NaN 255.7 NaN NaN
precip_mm Td_21m vpd wet_b T_soil T_bole_pi T_bole_fi T_bole_sp Rppfd_in_ Rppfd_out Rnet_25m_
0 4.47 NaN NaN NaN NaN NaN NaN 0 NaN -5.8
0 4.47 NaN NaN NaN NaN NaN NaN 0 NaN -5.8
0 4.61 NaN NaN NaN NaN NaN NaN 0 NaN -6.2
0 4.61 NaN NaN NaN NaN NaN NaN 0 NaN -6.2
0 4.33 NaN NaN NaN NaN NaN NaN 0 NaN -6.6
Rsw_in_25 Rsw_out_2 Rlw_in_25 Rlw_out_2 T_2m T_8m RH_2m RH_8m h2o_soil co2_21m q quantum
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 night
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 night
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 night
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 night
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 night
YearOrd Day YearNom
1998 305 1998
1998 305 1998
1998 305 1998
1998 305 1998
1998 305 1998
Next I added several columns that were ordinal so that I could use them as categories. Here is an example of the code I used:
% Make a new data column based on year that is ordinal
y = min(Year(:));
Y = max(Year(:));
labels2 = num2str((y:Y)');
edges = y:Y+1;
dataset1.YearOrd = ordinal(dataset1.Year,labels2,[],edges);
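To see what an ordinal built with edges actually stores, here is a minimal sketch in base MATLAB that mimics the binning step. It assumes a small hypothetical Year vector and uses histc (not ordinal itself) to find each value's bin. The bin number is the integer code that identifies each row's level, and labels2 only supplies the text shown for each level.

```matlab
% Hypothetical sample of the Year column (not the real data)
Year = [1998; 1998; 1999; 2000; 2000; 2001];
y = min(Year(:));
Y = max(Year(:));
labels2 = cellstr(num2str((y:Y)'));  % display labels: '1998', '1999', ...
edges = y:Y+1;                       % bins [1998,1999), [1999,2000), ...
[~, code] = histc(Year, edges);      % bin number of each value
code = code(:);                      % code for 1998 is 1, not 1998
yearLabels = labels2(code);          % text shown when the column is displayed
```

The level code and the label are separate things: the label '1998' is only text attached to level 1.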
Next I used the categories to calculate means, as follows:
statmean = grpstats(dataset1,{'YearOrd','Day','quantum'},'mean','DataVars',{'T_21m', 'RH_21m', 'P_bar_12m'});
And here is what the output looks like (notice how year is the first column):
ans =
YearOrd Day quantum GroupCount mean_T_21m mean_RH_21m mean_P_bar_12m
1998_305_night 1998 305 night 28 1.9579 87.067 70.151
1998_305_day 1998 305 day 20 3.646 86.587 70.166
1998_306_night 1998 306 night 28 0.76357 87.249 69.781
1998_306_day 1998 306 day 20 2.258 86.669 69.668
1998_307_night 1998 307 night 28 -2.735 80.785 69.862
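One way to sidestep the ordinal entirely is to compute the group means with unique and accumarray on plain numeric grouping columns. This is a swapped-in technique, not what grpstats does internally. The sample values are hypothetical, and quantum is assumed to be coded as 1 = night, 2 = day. Because every column stays numeric, the result is already a matrix and the years are kept as real years.

```matlab
% Hypothetical records: year, day of year, day/night flag (1 = night, 2 = day), temperature
Year    = [1998; 1998; 1998; 1998; 1998; 1998];
Day     = [305;  305;  305;  306;  306;  306];
quantum = [1;    1;    2;    1;    2;    2];
T_21m   = [1.8;  2.0;  3.6;  0.7;  2.2;  2.4];

keys = [Year Day quantum];
[groups, ~, g] = unique(keys, 'rows');    % g(k) = group number of record k
g = g(:);
counts = accumarray(g, 1);                % records per group (like GroupCount)
meanT  = accumarray(g, T_21m, [], @mean); % mean temperature per group
statNum = [groups counts meanT];          % plain numeric matrix, years intact
```

Further arithmetic, such as dividing a column by a constant, then works directly on statNum.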
Now comes the problem. I want to be able to do further calculations with these means (e.g. divide all values by a number), and it seems that this is not permitted in Matlab's dataset format. My solution was to convert the dataset 'statmean' into a numerical array using the 'double' function, as follows:
statTest = double(statmean);
However, this conversion from dataset to numeric array changes the values in my 'Year' column. I printed out the first few rows of the numeric array to show this. I suspect it has something to do with the levels in the previously ordinal Year column since 1998 was the first year. However, I can't find information on how to change it. Strangely, day of year (the second column) went through the transformation from ordinal to numeric correctly. For the year, I know I could just add 1997, but I want to understand what is happening so I don't accidentally change other values when converting between numeric arrays and datasets. Many thanks to all.
statTest(1:5,:)
ans =
1.0000 305.0000 1.0000 28.0000 1.9579 87.0671 70.1507
1.0000 305.0000 2.0000 20.0000 3.6460 86.5870 70.1660
1.0000 306.0000 1.0000 28.0000 0.7636 87.2493 69.7814
1.0000 306.0000 2.0000 20.0000 2.2580 86.6690 69.6680
1.0000 307.0000 1.0000 28.0000 -2.7350 80.7850 69.8621
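The suspicion about levels matches how these arrays work: double applied to an ordinal or nominal array returns its internal integer codes (the position of each value's level in the label list), so 1998 becomes 1 because it is the first level. A minimal sketch of mapping codes back to years, assuming a label list built like labels2 in the question (here a hypothetical 1998 to 2001 range) and taking the codes as given, e.g. the first column of statTest:

```matlab
% Label list built as in the question (hypothetical range 1998-2001)
labels2 = num2str((1998:2001)');
% Codes as returned by double() on the YearOrd column (first column of statTest)
code = [1; 1; 1; 1; 1];
% Look each code up in the label list, then convert the label text to numbers
yearFromLabels = str2double(cellstr(labels2(code, :)));
% Shortcut that works here only because edges start at 1998 and step by 1
yearFromOffset = 1998 + code - 1;
```

On the dataset itself, the same idea would presumably be str2double(cellstr(statmean.YearOrd)), since cellstr returns the labels of a categorical array; that line is untested here.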

Related

Storing values in matrices with different dimensions in matlab

My RHSvec is a 51X21 matrix. kdpolind is 11X51X21. Doing the following:
[RHSval,kprimeind] = max(RHSvec,[],2);
gives me a 51X1 RHSval and a 51X1 kprimeind.
if kprimeind is as follows:
16
20
20
16
20
16
16
then I want to store values in kdpolind at these positions:
kdpolind(act,1,16)
kdpolind(act,2,20)
kdpolind(act,3,20)
kdpolind(act,4,16)
...
I am unable to do this because of a dimension mismatch. Is there a simple way of doing this?
Thanks!
If I understand you correctly, you want something like this:
An example of how to insert a matrix of a different size into another matrix
sub = randn(2,3); % Will give a random matrix of 2 rows and 3 columns
M = nan(3,4,5); % Creates a nan matrix of 3 by 4 by 5
M(2,2+(1:size(sub,1)),2+(1:size(sub,2))) = sub % Inserts the sub matrix into M with an offset of 2 (can be set to 0 for no offset)
will give:
M(:,:,1) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
M(:,:,2) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
M(:,:,3) =
NaN NaN NaN NaN
NaN NaN 0.3252 -0.7549
NaN NaN NaN NaN
M(:,:,4) =
NaN NaN NaN NaN
NaN NaN 1.3703 -1.7115
NaN NaN NaN NaN
M(:,:,5) =
NaN NaN NaN NaN
NaN NaN -0.1022 -0.2414
NaN NaN NaN NaN
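The answer above covers inserting a block, but the pattern in the question (one element per row j, at the third index kprimeind(j)) is a scattered assignment, which sub2ind handles directly. A minimal sketch with the question's sizes; the random data, act = 3 and storing RHSval are assumptions, so substitute whatever values should go into kdpolind.

```matlab
% Hypothetical data with the sizes from the question
RHSvec = rand(51, 21);                      % 51x21
kdpolind = zeros(11, 51, 21);               % 11x51x21
act = 3;                                    % hypothetical value of act
[RHSval, kprimeind] = max(RHSvec, [], 2);   % both 51x1
n = numel(kprimeind);
% Linear indices of kdpolind(act, j, kprimeind(j)) for j = 1..n
idx = sub2ind(size(kdpolind), repmat(act, n, 1), (1:n)', kprimeind);
kdpolind(idx) = RHSval;                     % assumption: store the row maxima there
```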

Indexing into matrix with logical array

I have a matrix A, which is m x n. What I want to do is count the number of NaN elements in each row. If the number of NaN elements is greater than or equal to some arbitrary threshold, then all the values in that row should be set to NaN.
num_obs = sum(isnan(rets), 2);
index = num_obs >= min_obs;
Like I say, I am struggling to get my brain to work. I have been trying different variations of the line below, but with no luck.
rets(index==0, :) = rets(index==0, :) .* NaN;
Example data, with a threshold of 1:
A = [-7 -8 1.6 11.9;
NaN NaN NaN NaN;
5.5 6.3 2.1 NaN;
5.5 4.2 2.2 5.6;
NaN NaN NaN NaN];
and the result I want is:
A = [-7 -8 1.6 11.9;
NaN NaN NaN NaN;
NaN NaN NaN NaN;
5.5 4.2 2.2 5.6;
NaN NaN NaN NaN];
Use
A = magic(4);A(3,3)=nan;
threshold=1;
for ii = 1:size(A,1) % loop over rows
if sum(isnan(A(ii,:)))>=threshold % get the nans, sum the occurrences
A(ii,:)=nan(1,size(A,2)); % fill the row with column width amount of nans
end
end
Results in
A =
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
4 14 15 1
Or, as @Obchardon mentioned in his comment, you can vectorise:
A(sum(isnan(A),2)>=threshold,:) = NaN
A =
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
4 14 15 1
As a side note, you can easily apply this to columns instead: simply do all the indexing along the other dimension:
A(:,sum(isnan(A),1)>=threshold) = NaN;
Instead of the isnan function, you can use A ~= A to find the NaN elements, since NaN is the only value that is not equal to itself.
A(sum((A ~= A),2) >= t,:) = NaN
where t is the threshold for the minimum number of existing NaN elements.
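Applied to the question's own example data, the vectorised form gives exactly the requested result. In terms of the question's variables it is rets(index, :) = NaN; the attempt used index==0, which selects the rows with fewer NaNs than the threshold, and multiplying by NaN is unnecessary because a scalar NaN can be assigned directly.

```matlab
% Example data from the question
A = [-7 -8 1.6 11.9;
     NaN NaN NaN NaN;
     5.5 6.3 2.1 NaN;
     5.5 4.2 2.2 5.6;
     NaN NaN NaN NaN];
threshold = 1;
% Logical column vector: true for rows with at least `threshold` NaNs
rowsToBlank = sum(isnan(A), 2) >= threshold;
% Logical indexing selects exactly those rows; a scalar NaN fills them
A(rowsToBlank, :) = NaN;
```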

Assign rows and column values to a NaN matrix in specific locations

I have a 155x135 matrix of NaNs, and another matrix listing specific values with their row and column numbers. Is there a way to assign these values to the NaN matrix at those same locations, with everything else remaining NaN?
R C Value
19 4 -1133.803
20 4 -295.6810
32 4 -1906.021
20 5 -1027.048
21 5 -293.0065
32 5 236.0525
33 5 -425.1248
Use sub2ind:
data = [
% R C Value
19 4 -1133.803
20 4 -295.6810
32 4 -1906.021
20 5 -1027.048
21 5 -293.0065
32 5 236.0525
33 5 -425.1248];
N = nan(155,135);
N(sub2ind(size(N),data(:,1),data(:,2))) = data(:,3);
So for N(min(data(:,1)):max(data(:,1)),min(data(:,2)):max(data(:,2))), i.e. N(19:33,4:5), you get:
ans =
-1133.8 NaN
-295.68 -1027
NaN -293.01
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
-1906 236.05
NaN -425.12
You can use accumarray:
result = accumarray([R C] , Value,[155,135],[],NaN)
Note: R and C assumed to be column vectors
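The two answers agree when every (R, C) pair occurs only once. A minimal check on the question's sample rows follows. One difference worth knowing: with repeated pairs, the sub2ind assignment keeps the last value, while accumarray (whose default function is @sum) adds them up.

```matlab
% Sample rows from the question: R, C, Value
data = [19 4 -1133.803; 20 4 -295.6810; 32 4 -1906.021; 20 5 -1027.048;
        21 5 -293.0065; 32 5 236.0525;  33 5 -425.1248];
R = data(:, 1); C = data(:, 2); Value = data(:, 3);

% Approach 1: linear indices into a NaN matrix
N = nan(155, 135);
N(sub2ind(size(N), R, C)) = Value;

% Approach 2: accumarray with NaN as the fill value for empty cells
result = accumarray([R C], Value, [155, 135], [], NaN);
```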

Elementwise comparison of two vectors while ignoring all NaN's in between

I have two 1x5000 vectors. They consist of numbers like this:
vec1 = [NaN NaN 2 NaN NaN NaN 5 NaN 8 NaN NaN 7 NaN 5 NaN 3 NaN 4]
vec2 = [NaN 2 NaN NaN 5 NaN NaN NaN 8 NaN 1 NaN NaN NaN 5 NaN NaN NaN]
I would like to check whether the numbers appear in the same order, ignoring the NaNs. But I do not want to remove the NaNs (Not-a-Number), since I will use them later. So I create a new vector called results: whenever the next numbers in the two vectors are equal, results gets a 1; when they are not equal, it gets a 0.
For vec1 and vec2, results would look like this:
[1 1 1 0 1 0 0]
The first 3 numbers are the same, then 7 is compared to 1, which gives 0, then 5 compared to 5 is true, which gives 1. Finally, the last two numbers of vec1 have no counterpart in vec2, which gives 0 for each.
One reason that I don't want to remove the NaNs is that I have a time vector 1x500 and somehow I want to get the time for each 1 and 0 (in a new vector). Is that possible too?
Help is super appreciated!
This is how I would do it:
temp1 = vec1(~isnan(vec1));
temp2 = vec2(~isnan(vec2));
m = min(numel(temp1), numel(temp2));
M = max(numel(temp1), numel(temp2));
results = [(temp1(1:m) == temp2(1:m)), false(1,M-m)];
Note that here results is a logical array. If you need it numeric, you can convert it with double, for instance.
Regarding your concern about NaNs, it depends on what you want to do with your arrays. If you are going to process them, it is more convenient to remove the NaNs. To keep track of things, you can store the indices of the kept elements:
id1 = find(~isnan(vec1));
vec1 = vec1(id1);
vec1 =
2 5 8 7 5 3 4
id1 =
3 7 9 12 14 16 18
% and same for vec2
If you decide to remove the NaNs, the solution will be the same, with all temps replaced with vec.
This would be my solution, using a mix of logical indexing and the find function. Returning the timestamps for the 1's and 0's is actually more tedious than finding the 1's and 0's.
vec1 = [NaN NaN 2 NaN NaN NaN 5 NaN 8 NaN NaN 7 NaN 5 NaN 3 NaN 4];
vec2 = [NaN 2 NaN NaN 5 NaN NaN NaN 8 NaN 1 NaN NaN NaN 5 NaN NaN NaN];
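t=1:numel(vec1);                 % time stamps (here just the element positions)
ind1=find(~isnan(vec1));         % positions of the numbers in each vector
ind2=find(~isnan(vec2));
v1=vec1(ind1);                   % the numbers themselves, NaNs removed
v2=vec2(ind2);
if length(v1)>length(v2)         % remember which vector holds more numbers
ibig=1;
else
ibig=2;
end
n=min(length(v1),length(v2));    % number of pairs that can be compared
N=max(length(v1),length(v2));
v=false(1,N);                    % unmatched trailing numbers stay false (0)
v(1:n)=v1(1:n)==v2(1:n);         % true where the k-th numbers agree
t_ones1=t(ind1(v));              % times of the matches in vec1 and vec2
t_ones2=t(ind2(v));
if ibig==1                       % times of the mismatches; only the longer
t_zeros1=t(ind1(~v));            % vector has entries beyond position n
t_zeros2=t(ind2(~v(1:n)));
else
t_zeros1=t(ind1(~v(1:n)));
t_zeros2=t(ind2(~v));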
end
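Both answers rely on the same idea: compare the non-NaN values in order, and keep their original positions so they can be mapped to times. A compact sketch combining the two, assuming t is a time vector with one sample per element (here simply 1:numel(vec1)) and taking the time of each comparison from vec1's positions; vec2's positions id2 could be used instead.

```matlab
vec1 = [NaN NaN 2 NaN NaN NaN 5 NaN 8 NaN NaN 7 NaN 5 NaN 3 NaN 4];
vec2 = [NaN 2 NaN NaN 5 NaN NaN NaN 8 NaN 1 NaN NaN NaN 5 NaN NaN NaN];
t = 1:numel(vec1);                  % hypothetical time vector
id1 = find(~isnan(vec1));           % positions of the numbers in vec1
id2 = find(~isnan(vec2));           % positions of the numbers in vec2
n = min(numel(id1), numel(id2));    % pairs that can be compared
N = max(numel(id1), numel(id2));
results = zeros(1, N);              % unmatched trailing numbers count as 0
results(1:n) = vec1(id1(1:n)) == vec2(id2(1:n));
% Time of each comparison, taken from vec1's positions (assumption);
% NaN where vec1 has no number at that comparison
tResults = NaN(1, N);
m1 = min(N, numel(id1));
tResults(1:m1) = t(id1(1:m1));
```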

Simplifying for-loop

I have this data:
minval = NaN 7 8 9 9 9 10 10 10 10
NaN NaN 10 10 10 10 10 10 10 10
NaN NaN NaN 10 10 9 10 10 10 9
NaN NaN NaN NaN 9 9 10 9 10 10
NaN NaN NaN NaN NaN 9 10 10 10 10
NaN NaN NaN NaN NaN NaN 10 11 10 10
NaN NaN NaN NaN NaN NaN NaN 10 10 10
NaN NaN NaN NaN NaN NaN NaN NaN 10 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
and I do the following:
C=size(minval,2);
for e=2:C
D1(1,e)=minval(1,e);
end
D1(D1 == 0) = nan;
for e=3:C
for b=2:e-1
D2(b,e)= minval(b,e)+D1(1,b-1);
D2(D2 == 0) = nan;
[D1(2,e), idx_bt(1,e)]=min(nonzeros(D2(:,e)));
end
end
D1(D1 == 0) = nan;
for e=4:C
for b=3:e-1
D3(b,e)= minval(b,e)+D1(2,b-1);
D3(D3 == 0) = nan;
[D1(3,e), idx_bt(2,e)]=min(nonzeros(D3(:,e)));
end
end
D1(D1 == 0) = nan;
It works and gives me the right answer:
D1 = NaN 7 8 9 9 9 10 10 10 10
NaN NaN NaN 17 17 16 17 17 17 16
NaN NaN NaN NaN NaN 26 27 26 26 26
and
idx_bt = 0 2 3 4 5 6 7 8 9 10
0 0 1 3 3 3 3 3 3 3
I guess there's a trick to make this code simpler and faster. Can anyone help? Thank you.
The crux of the following code is bsxfun, which is one of the standard ways to vectorize code.
Code
%%// Get C
C=size(minval,2);
%%// Declare variables to store required outputs
D1 = NaN(3,C);
idx_bt = zeros(2,C);
%%// --------- STAGE 0 -------------------------
D1(1,2:end) = minval(1,2:C);
%%// --------- STAGE 1 -------------------------
ft1 = bsxfun(@plus,minval(2:C-1,3:C),D1(1,1:C-2)');
ft1 = [zeros(1,size(ft1,2)) ;ft1];
ft1(ft1==0) = NaN;
D2 = ft1;
[D1(2,3:end) ,idx_bt(1,3:end)] = nanmin(D2);
%%// Probably not needed for your data, but if it contains zeros
%%// along with the NaNs and you want to replace those zeros
%%// with NaNs, keep this line. It all depends on your data.
%%// This could also be handled later on in the code.
D1(D1 == 0) = NaN;
%%// --------- STAGE 2 -------------------------
ft11 = bsxfun(@plus,minval(3:C-1,4:C),D1(2,2:C-2)');
ft11 = [zeros(2,size(ft11,2)) ;ft11];
ft11(ft11==0) = NaN;
D3 = ft11;
[D1(3,4:end) ,idx_bt(2,4:end)] = nanmin(D3);
D1(D1 == 0) = NaN;
Output
D1 =
NaN 7 8 9 9 9 10 10 10 10
NaN NaN NaN 17 17 16 17 17 17 16
NaN NaN NaN NaN NaN 26 27 26 26 26
idx_bt =
0 0 1 3 3 3 3 3 3 3
0 0 0 1 1 5 5 7 7 7
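The two STAGE blocks follow the same pattern, D1(k+1,e) = min over b of minval(b,e) + D1(k,b-1), so they can be folded into one loop over stages. This is a sketch under stated assumptions, not the answerer's code: it uses min (which skips NaN by default) instead of nanmin, masks the invalid b >= e candidates explicitly instead of relying on the NaN layout of minval, and leaves idx_bt at 0 where a column has no valid candidate (the code above reports 1 there). For the question's minval it reproduces D1.

```matlab
% minval from the question (upper-triangular, NaN elsewhere)
minval = [NaN 7 8 9 9 9 10 10 10 10
          NaN NaN 10 10 10 10 10 10 10 10
          NaN NaN NaN 10 10 9 10 10 10 9
          NaN NaN NaN NaN 9 9 10 9 10 10
          NaN NaN NaN NaN NaN 9 10 10 10 10
          NaN NaN NaN NaN NaN NaN 10 11 10 10
          NaN NaN NaN NaN NaN NaN NaN 10 10 10
          NaN NaN NaN NaN NaN NaN NaN NaN 10 10
          NaN NaN NaN NaN NaN NaN NaN NaN NaN 10
          NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN];
nStages = 2;                        % the question computes two stages
C = size(minval, 2);
D1 = NaN(nStages + 1, C);
idx_bt = zeros(nStages, C);
D1(1, 2:C) = minval(1, 2:C);        % stage 0: first row of minval
for k = 1:nStages
    bIdx = k+1:C-1;                 % candidate b values
    eIdx = k+2:C;                   % target e values
    % cand(i,j) = minval(b,e) + D1(k,b-1) for b = bIdx(i), e = eIdx(j)
    cand = bsxfun(@plus, minval(bIdx, eIdx), D1(k, bIdx - 1).');
    % Only b <= e-1 is a valid candidate; mask the rest explicitly
    cand(bsxfun(@ge, bIdx.', eIdx)) = NaN;
    [m, pos] = min(cand, [], 1);    % column-wise minimum, NaNs skipped
    pos = pos + k;                  % convert row position back to b
    pos(isnan(m)) = 0;              % no valid candidate: keep index 0
    D1(k+1, eIdx) = m;
    idx_bt(k, eIdx) = pos;
end
```

More stages only require a larger nStages (at most C-2), since each stage reads the row produced by the previous one.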
