I have two cell arrays, named as countryname and export.
There is only one column in countryname, which is the code of the names of countries:
USA
CHN
ABW
There are two columns in export:
USA ABW
USA CHN
CHN USA
ABW USA
Each pair (X,Y) in a row of export means "country X has relation with country Y". The size of countryname has been simplified to 3. How can I achieve the following in MATLAB?
Create a square 3 by 3 (in general n by n, where n is the size of countryname) matrix M such that
M(i,j)=1 if country i has relation with country j
M(i,j)=0 otherwise.
Here i and j are the positions of the countries in countryname (USA is 1, CHN is 2, ABW is 3).
The first thing you need to do is establish a mapping from the country name to an integer value from 1 to 3. You can do that with a containers.Map where the input is a string and the output is an integer. Therefore, we will assign 'USA' to 1, 'CHN' to 2 and 'ABW' to 3. Assuming that you've initialized the cell arrays like you've mentioned above:
countryname = {'USA', 'CHN', 'ABW'};
export = {'USA', 'ABW'; 'USA', 'CHN'; 'CHN', 'USA'; 'ABW', 'USA'};
... you would create a containers.Map like so:
map = containers.Map(countryname, 1:numel(countryname));
Once you have this, you simply map the country names to integers and you can use the values function to help you do this. However, what will be returned is a cell array of individual elements. We need to unpack the cell array, so you can use cell2mat for that. As such, we can now create a 4 x 2 index matrix where each cell element is converted to a numerical value:
ind = cell2mat(values(map, export));
We thus get:
>> ind
ind =
1 3
1 2
2 1
3 1
Now that we have this, you can use sparse to create the final matrix, where the first column of ind serves as the row locations and the second column serves as the column locations. These locations tell you where the final matrix is non-zero. Passing the size explicitly keeps the result n by n even if the last country in countryname never appears in export. The result is a sparse matrix, so convert it with full to get an ordinary numerical matrix:
n = numel(countryname);
M = full(sparse(ind(:,1), ind(:,2), 1, n, n));
We get:
>> M
M =
0 1 1
1 0 0
1 0 0
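For readers who want to check the logic outside MATLAB, here is a minimal Python sketch of the same two steps (name-to-index mapping, then filling the matrix). It is not part of the MATLAB answer; the variable name index is just an illustrative choice, and the indices are 0-based rather than 1-based.

```python
# Same data as in the question, written as Python lists.
countryname = ["USA", "CHN", "ABW"]
export = [("USA", "ABW"), ("USA", "CHN"), ("CHN", "USA"), ("ABW", "USA")]

# Name -> 0-based position (the containers.Map above uses 1-based values).
index = {name: i for i, name in enumerate(countryname)}

n = len(countryname)
M = [[0] * n for _ in range(n)]
for src, dst in export:
    # Plain assignment, so a pair that is listed twice still yields 1.
    M[index[src]][index[dst]] = 1

for row in M:
    print(row)
```

Because the matrix is allocated as n by n up front, countries that never appear in export simply keep all-zero rows and columns.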
As a more convenient representation, you can create a table to display the final matrix. Convert the matrix M to a table using array2table and we can add the row and column names to be the country names themselves:
>> T = array2table(M, 'RowNames', countryname, 'VariableNames', countryname)
T =
USA CHN ABW
___ ___ ___
USA 0 1 1
CHN 1 0 0
ABW 1 0 0
Take note that the above code to create the table only works for MATLAB R2013b and above. If that isn't what you require, just stick with the original numerical matrix M.
This uses basic MATLAB functionality only. The solution posted above by @rayryeng is more advanced and probably faster to code as well, but this version should help you understand the approach at a fundamental level.
clear
country={'USA','CHN','ABW'};
export={'USA' 'ABW'; 'USA' 'CHN'; 'CHN' 'USA' ; 'ABW' 'USA'};
M=zeros(length(country));
for i=1:length(country)
    c=country(i);
    ind_state=strfind(export(:,1),char(c)); % each cell is non-empty where the entry contains country i, empty otherwise
    ind_match=find(not(cellfun('isempty', ind_state))); % keep only the row indices that matched
    exp_match=export(ind_match,2); % partner countries from the second column
    % The inner loop is needed because a country can appear more than once
    % in the first column of export, as 'USA' does here.
    for j=1:length(exp_match)
        c=exp_match(j);
        ind_state=strfind(country,char(c));
        ind_match=find(not(cellfun('isempty', ind_state)));
        M(i,ind_match)=1; % set the entry for each matching partner to 1
    end
end
M
I have two numpy arrays:
e.g.
import numpy as np
array_1 = np.array([
    [5,2,0],
    [4,3,0],
    [4,2,0],
    [3,2,1],
    [4,1,1],
])
array_2 = np.array([
    [5,2,10],
    [4,2,52],
    [3,2,80],
    [1,2,4],
    [5,3,6],
])
In array_1, 0 and 1 at index 2 represent two different categories. For argument's sake say 0 = Red and 1 = Blue.
So, where the first two elements match in the two numpy arrays, I need to average the third element in array_2 by category. For example, [5,2,10] and [4,2,52] both match rows in category 0, i.e. Red. The code should return the average of the elements at index 2 for the Red category, and do the same for the Blue category.
I have no idea where to start with this, any ideas welcome.
You tagged your post with NumPy because of the type of the source arrays,
but it is much easier and more intuitive to generate your result using pandas.
Start by converting both arrays to pandas DataFrames.
While converting the first array, also convert 0 and 1 in the last
column to Red and Blue:
import pandas as pd
df1 = pd.DataFrame(array_1, columns=['A', 'B', 'key'])
df1['key'] = df1['key'].map({0: 'Red', 1: 'Blue'})
df2 = pd.DataFrame(array_2, columns=['A', 'B', 'C'])
Then, to generate the result, run:
result = df2.merge(df1, on=['A', 'B']).groupby('key').C.mean().rename('Mean')
The result is:
key
Blue 80
Red 31
Name: Mean, dtype: int32
Details:
df2.merge(df1, on=['A', 'B']) - Generates:
A B C key
0 5 2 10 Red
1 4 2 52 Red
2 3 2 80 Blue
The inner join also drops the rows of df2 that have no matching (A, B)
pair in df1 and so belong to neither Red nor Blue.
groupby('key') - From the above result, generates groups by key
(Red / Blue).
C.mean() - the last step is to take C column (from each group)
and compute its mean.
The result is a Series with:
index - the grouping key,
value - the value computed for the corresponding group.
rename('Mean') - Change the name from the source column name (C)
to a more meaningful Mean.
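If it helps to see what the merge and groupby steps do under the hood, here is a rough plain-Python sketch of the same join-then-average logic. It is only an illustration, not how pandas implements it, and it assumes each (A, B) pair appears at most once in array_1. The names labels, category and groups are made up for this example.

```python
from collections import defaultdict

array_1 = [[5, 2, 0], [4, 3, 0], [4, 2, 0], [3, 2, 1], [4, 1, 1]]
array_2 = [[5, 2, 10], [4, 2, 52], [3, 2, 80], [1, 2, 4], [5, 3, 6]]
labels = {0: "Red", 1: "Blue"}

# (A, B) -> category label; this plays the role of df1.
category = {(a, b): labels[k] for a, b, k in array_1}

# Inner join on (A, B), collecting C per category (the groupby step).
groups = defaultdict(list)
for a, b, c in array_2:
    if (a, b) in category:  # rows without a match are dropped
        groups[category[(a, b)]].append(c)

# Mean of C within each category.
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(means)
```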
I am new to array formulae and am having trouble with the following scenario:
I have the following matrix:
F G H I J ... R S T U V
1 0 0 1 1
0 1 1 1 2 3 1 2
2 0 2 3 1 2 0 1 0 0
2 1 0 0 1 0 0 3 0 0
My goal is to count the number of rows within which the difference between the sum of columns F:J and the sum of columns R:V is greater than a threshold. Critically, only rows with full data should be included: row 1 (where there are only values for columns F1:J1) and row 2 (where there are only some values for columns F2:J2) should be ignored.
If the threshold = 2.5, then the solution is 1. That is, row 3 is the only row with complete data where the difference between the sum of F3:J3 (8) and the sum of R3:V3 (3) is greater than 2.5 (i.e., 5 > 2.5).
I have tried to put together the following formula, rather pathetically, based on the teachings of @Tom Sharpe and @QHarr:
=COUNT(IF(SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))-SUBTOTAL(9,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))>2.5,IF(AND(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1),SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1)),SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))),IF(AND(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1),SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1)),SUBTOTAL(9,OFFSET(R1,ROW(R1:V1)-ROW(R1),0,1,COLUMNS(R1:V1))))))
But it seems to always produce a value of 1, even if I edit the matrix such that the difference between the sum of F4:J4 and R4:V4 also exceeds 2.5. Sadly I am struggling to understand why and would appreciate any guidance on the matter.
As an array formula in one cell without volatile functions:
=SUM((MMULT(--(LEN(F2:J5)*LEN(R2:V5)>0),--TRANSPOSE(COLUMN(F2:J2)>0))=5)*(MMULT(F2:J5-R2:V5,TRANSPOSE(--(COLUMN(F2:J2)>0)))>2.5))
should do the trick :D
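To sanity-check whatever formula you end up with, the intended rule can be written out in a few lines of Python. This is only a sketch of the logic described in the question, not a translation of the formula above. None stands for a blank cell, and since the question does not say which cells in rows 1 and 2 are blank, the positions used here are an assumption. The names count_rows, F_to_J and R_to_V are made up for the example.

```python
def count_rows(left, right, threshold):
    """Count rows where both blocks are complete and sum(left) - sum(right) > threshold."""
    count = 0
    for l, r in zip(left, right):
        complete = all(v is not None for v in l + r)
        # The sums are only evaluated when the row has no blanks.
        if complete and sum(l) - sum(r) > threshold:
            count += 1
    return count

# Rows 3 and 4 follow the question; blank positions in rows 1 and 2 are assumed.
F_to_J = [[1, 0, 0, 1, 1], [0, 1, 1, None, None], [2, 0, 2, 3, 1], [2, 1, 0, 0, 1]]
R_to_V = [[None] * 5, [1, 2, 3, 1, 2], [2, 0, 1, 0, 0], [0, 0, 3, 0, 0]]

print(count_rows(F_to_J, R_to_V, 2.5))
```

With the data above only row 3 qualifies (8 - 3 = 5), while row 4 is complete but its difference is only 1.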
Maybe, in say X1 (assuming you have labelled your columns):
=COUNTIF(Y:Y,TRUE)
In Y1 whatever your chosen cutoff (eg 2.5) and in Y2:
=AND(COUNTBLANK(F2:J2)+COUNTBLANK(R2:V2)=0,SUM(F2:J2)-SUM(R2:V2)>Y$1)
copied down to suit.
Try this:
=SUMPRODUCT((MMULT(F1:J4-R1:V4,--(ROW(INDIRECT("1:"&COLUMNS(F1:J4)))>0))>2.5)*(MMULT((LEN(F1:J4)>0)+(LEN(R1:V4)>0),--(ROW(INDIRECT("1:"&COLUMNS(F1:J4)))>0))=(COLUMNS(F1:J4)+COLUMNS(R1:V4))))
I think this will do it, replacing your AND's by multiplies (*):
=SUMPRODUCT(--((SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))-SUBTOTAL(9,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))>2.5)*(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1))*(SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1))>0))
It could be simplified a bit more but a bit short of time.
Just another option...
=IF(NOT(OR(IFERROR(MATCH(TRUE,ISBLANK(F1:J1),0),FALSE),IFERROR(MATCH(TRUE,ISBLANK(R1:V1),0),FALSE))), SUBTOTAL(9,F1:J1)-SUBTOTAL(9,R1:V1), "Missing Value(s)")
My approach is a little different from what you tried to adapt from @Tom Sharpe: I first validate that the cells have data (are not blank) and then perform the calculation, otherwise return an error message. This is still an array formula, so press Ctrl+Shift+Enter when you enter it.
The condition in the opening IF() checks that no cell in either range is blank. If MATCH(TRUE,ISBLANK(range),0) finds a match, a cell is blank (bad).
If it finds no match, i.e. there are no blank cells, MATCH returns #N/A, which IFERROR turns into FALSE (good).
With that formula in X1:X4, the threshold count becomes:
=COUNTIF(X1:X4,">"&Threshold)
Note that this one is not an array formula. I gave the threshold (cell W6) a named range for readability.
I have an NxMxT array where each element of the array is a grid of Earth. If the grid is over the ocean, then the value is 999. If the grid is over land, it contains an observed value. N is longitude, M is latitude, and T is months.
In particular, I have an array called tmp60 for the ten years 1960 through 1969, so 120 months for each grid.
To test what the global mean in January 1960 was, I write:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60(:,:)>200)=NaN;
nanmean(nanmean(tmpJan60))
which gives me 5.855.
I am confused about the reshape function. I thought the following code should yield the same average, namely 5.855, but it does not:
load tmp60
N1=size(tmp60,1)
N2=size(tmp60,2)
N3=size(tmp60,3)
reshtmp60 = reshape(tmp60, N1*N2,N3);
reshtmp60( reshtmp60(:,1)>200,: )=[];
mean(reshtmp60(:,1))
This gives me -1.6265, which is not correct.
I have checked the result in Excel (!) and 5.855 is correct, so I assume I am making a mistake with the reshape function.
Ideally, I want a matrix that takes each grid cell, going first down the N dimension, giving 720 rows with 120 columns (each column is a month). These first 720 rows represent one longitude band around Earth at a single latitude. Next, I want to increase the latitude by one step, giving another 720 rows with 120 columns. Ultimately I want to do this for all 360 latitudes.
If longitude and latitude were inputs, say column 1 and 2, then the matrix should look like this:
temp = [-179.75 -89.75 -1 2 ...
-179.25 -89.75 2 4 ...
...
179.75 -89.75 5 9 ...
-179.75 -89.25 2 5 ...
-179.25 -89.25 3 4 ...
...
-179.75 89.75 2 3 ...
...
179.75 89.75 6 9 ...]
So temp(:,3) should be all January 1960 observations.
One way to do this is:
grid1 = tmp60(1,1,:);
g1 = reshape(grid1, [1,120]);
grid2 = tmp60(2,1,:);
g2 = reshape(grid2,[1,120]);
g = [g1;g2];
But obviously very cumbersome.
I am not able to automate this procedure for the N*M elements, so comments are appreciated!
A link to the file tmp60.mat
The main problem in your code is how the NaNs are handled. Observe the following example:
a = randi(10,6);
a(a>7)=nan
m = [mean(a(:),'omitnan') mean(mean(a,'omitnan'),'omitnan')]
m =
3.8421 3.6806
Both elements in m are meant to be the mean of all elements in a, but they are different! The reason is that taking the mean of all values together with mean(a(:),'omitnan') sums all non-NaN values and divides by the number of values summed:
sum(a(:),'omitnan')/sum(~isnan(a(:)))==mean(a(:),'omitnan') % this is true
but taking the mean of the first dimension, we get 6 mean values:
sum(a,'omitnan')./sum(~isnan(a))==mean(a,'omitnan') % this is also true
and when we take the mean of those column means, every column gets the same weight no matter how many non-NaN values it contains, so the result differs from the overall mean:
mean(sum(a,'omitnan')./sum(~isnan(a)))==mean(a(:),'omitnan') % this is false
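A tiny hand-checkable case makes the difference obvious. The sketch below is in Python rather than MATLAB and uses a made-up grid with two columns, one of which has a single valid value. The nanmean helper is defined here for the example and is not a library function.

```python
import math

nan = math.nan
# Hypothetical grid stored column by column:
# column 0 has one valid value, column 1 has three.
columns = [[10.0, nan, nan], [1.0, 2.0, 3.0]]

def nanmean(values):
    """Mean of the non-NaN entries."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)

# All valid values pooled together: (10 + 1 + 2 + 3) / 4
overall = nanmean([v for col in columns for v in col])

# Mean of the per-column means: (10 + 2) / 2
mean_of_means = nanmean([nanmean(col) for col in columns])

print(overall, mean_of_means)
```

The single value in column 0 counts as much as the three values in column 1 when the column means are averaged, which is why the two numbers disagree.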
Here is what I think you want in your code:
% this is exactly as your first test:
tmpJan60=tmp60(:,:,1);
tmpJan60(tmpJan60>200) = nan;
m1 = mean(mean(tmpJan60,'omitnan'),'omitnan')
% this creates the matrix as you want it:
result = reshape(permute(tmp60,[3 1 2]),120,[]).';
result(result>200) = nan;
r = reshape(result(:,1),720,360);
m2 = mean(mean(r,'omitnan'),'omitnan')
isequal(m1,m2)
To create the matrix you first permute the dimensions so the one you want to keep as is (time) will be the first. Then reshape the array to Tx(lon*lat), so you get 120 rows for all time steps and 259200 columns for all combinations of the coordinates. All that's left is to transpose it.
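To see the resulting row order on something small, here is a Python sketch with a made-up 2 x 2 x 3 array (2 longitudes, 2 latitudes, 3 months) whose values encode their own indices. It mimics the ordering produced by the permute/reshape/transpose combination using nested lists, as an illustration only.

```python
# Hypothetical tiny array: N=2 longitudes, M=2 latitudes, T=3 months.
N, M, T = 2, 2, 3

# a[i][j][t] = 100*i + 10*j + t, so each value shows where it came from.
a = [[[100 * i + 10 * j + t for t in range(T)] for j in range(M)] for i in range(N)]

# One row per grid cell with longitude (i) varying fastest,
# one column per month (t).
result = [[a[i][j][t] for t in range(T)] for j in range(M) for i in range(N)]

for row in result:
    print(row)

# The first column holds the first month for every grid cell.
first_month = [row[0] for row in result]
```

The first N rows cover all longitudes at the first latitude, the next N rows the second latitude, and so on, which is the layout asked for in the question.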
m1 is your first calculation, and m2 is what you try to do in the second one. They are equal here, but their value is not 5.855, even if I use your code.
However, I think the right solution will be to take the mean of all values together:
mean(result(:,1),'omitnan')
I currently have a vector containing a cell array of predefined values. The number and content of these values should be able to vary:
names = {'r1','r2','r3'};
Furthermore, I have a Matrix, that should serve as an index Matrix. It looks like the following example, however, should also be variable in its size.
mat = [1 3 3; 2 1 3; 1 1 1];
Delivering:
1 3 3
2 1 3
1 1 1
I would now like to create a matrix containing the respective values of the array in the same matrix format. Hence, wherever mat contains a 1 the output should contain the first value of names, and so on. The final result should then look like:
r1 r3 r3
r2 r1 r3
r1 r1 r1
Just to avoid misunderstandings: the content of names simply serves as an example here. Later, specific names should be matched, so it cannot be solved by simply adding an r in front of every index value.
Many thanks for your help!
That's simple:
result = names(mat);
The only caveat is that every numeric element in mat must be integer and between 1 and the number of elements in names.
Explanation: mat works as a linear index into names. The general rule when indexing linearly is that the values are taken from the source array in column order (as usual), but the shape of the result is the same as the shape of the index array.
Later Edit, thanks to Luis Mendo: this rule is valid except for the singleton dimensions of the index array. To enforce the rule for this corner case, one may use the slightly more elaborate (and more time-consuming) form:
result = reshape(names(mat), size(mat));
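For comparison, the same lookup written as a Python sketch. Python lists are 0-based, so the 1-based values in mat are shifted by one. This is only meant to illustrate that the output takes the shape of the index array.

```python
names = ["r1", "r2", "r3"]
mat = [[1, 3, 3], [2, 1, 3], [1, 1, 1]]

# Look up each 1-based index in names, keeping the shape of mat.
result = [[names[k - 1] for k in row] for row in mat]

for row in result:
    print(row)
```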
I have code that looks for the combinations of values from two arrays whose total stays below a specific value. The code only uses one value from each row of array B at a time.
B =
1 2 3
10 20 30
100 200 300
1000 2000 3000
and the code I'm using is:
B=[1 2 3; 10 20 30 ; 100 200 300 ; 1000 2000 3000];
A=[100; 500; 300 ; 425];
SA = sum(A);
V={}; % number of rows for cell V = num of combinations -- column = 1
n = 1;
for k = 1:length(B)
    for idx = nchoosek(1:numel(B), k)'
        rows = mod(idx, length(B));
        if ~isequal(rows, unique(rows)) % rows repeat (or are not in ascending order)
            continue                    % so skip this combination
        end
        B_subset = B(idx);
        if (SA + sum(B_subset) <= 2000) % sum of A plus the combination is at most 2000
            V(n,1) = {B_subset(:)};     % append the valid combination to V
            n = n + 1;
        end
    end
end
However, I would like to display results differently than how this code stores them in a cell.
Instead of displaying results in cell V such as :
[1]
[10]
[300]
[10;200]
[1000;30]
[1;10;300]
This is preferred : (each row X column takes a specific position in the cell)
Here, this means that they should be arranged as cell(1,1)={[B(1,x),B(2,y),B(3,z),B(4,w)]}. Where x y z w are the columns with chosen values. So that the displayed output is :
[1;0;0;0]
[0;10;0;0]
[0;0;300;0]
[0;10;200;0]
[0;30;0;1000]
[1;10;300;0]
In each answer, the combination is determined by choosing a value from the 1st to 4th row of matrix B. Each row has 3 columns, and only one value from each row can be chosen at once. However, if for example B(1,2) cannot be used, it will be replaced with a zero. e.g. if row 1 of B cannot be used, then B(1,1:3) will be a single 0. And the result will be [0;x;y;z].
So, if 2 is chosen from the 1st row, and 20 is chosen from the 2nd row, while the 3rd and 4th rows are NOT included, they should show a 0. So the answer would be [2;20;0;0].
If only the 4th row is used (such as 1000 for example), the answer should be [0;0;0;1000]
In summary I want to implement the following :
Each cell contains length(B) values from every row of B (based on the combination)
Each value not used for the combination should be a 0 and printed in the cell
I am currently trying to implement this but my methods are not working .. If you require more info, please let me know.
edit
I have tried to implement the code in dfb's answer below but am having difficulties. Please take a look at the answer, as it contains half of the solution.
My MATLAB is super rusty, but doesn't something like this do what you need?
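arr = zeros(size(B,1), 1);      % one slot per row of B, zero if the row is unused
[r, ~] = ind2sub(size(B), idx); % row of each chosen element (idx is a linear index)
arr(r) = B(idx);                % put each chosen value in its row's slot
V(n,1) = {arr};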
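Another way to get the zero-padded layout directly is to treat "row not used" as one more choice per row and enumerate the Cartesian product. The Python sketch below illustrates that idea. It is a different enumeration from the nchoosek loop in the question, not a port of it, and it assumes that B itself contains no zeros. The name limit is made up for the 2000 bound.

```python
from itertools import product

B = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]
A = [100, 500, 300, 425]
limit = 2000
SA = sum(A)

V = []
# For every row of B, pick either 0 (row unused) or one of its values.
for combo in product(*[[0] + row for row in B]):
    # Skip the all-zero choice and anything that breaks the limit.
    if any(combo) and SA + sum(combo) <= limit:
        V.append(list(combo))

print(len(V))
print(V[:3])
```

Each entry of V already has one slot per row of B, so no reshuffling is needed afterwards. With the data above, no value from the fourth row fits under the limit, so its slot is always 0.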