Rowmax as new column in data table - dataset

I have rank scores of countries for different variables.
I would like to create a column with the maximum rank that occurs per row.
Say the data look something like:
A B C D E F G H I ....
V1 1 4 5 3 12 . 6 9 83
V2 . . 4 6 1 4 7 6 32
So A - X are countries. Each row (V1, V2, and so on) is a variable, and each cell holds the country's rank score for that variable.
The problem is that some countries, for whatever reason, don't have a score for certain variables, perhaps because V1 is not relevant to country C.
So in the end I'd like something like:
A B C D E F G H I .... newv
V1 1 4 5 3 12 . 6 9 83 83
V2 . . 4 6 1 4 7 6 5 6

I think egen newvar = rowmax(A B C D E F G H I…) does what you need. rowmax() ignores missing values, so the "." cells are skipped. Have a look at the egen help file for more information. (I presume you need value 7 in the second row, not 6?)
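For readers working outside Stata, the same row-wise maximum can be sketched in pandas. This is only an illustration, not part of the original answer: the DataFrame below is hypothetical, it copies the two rows from the desired-output table, and np.nan stands in for Stata's "." missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data copied from the desired-output table in the question;
# np.nan plays the role of Stata's missing value ".".
df = pd.DataFrame(
    {
        "A": [1, np.nan],
        "B": [4, np.nan],
        "C": [5, 4],
        "D": [3, 6],
        "E": [12, 1],
        "F": [np.nan, 4],
        "G": [6, 7],
        "H": [9, 6],
        "I": [83, 5],
    },
    index=["V1", "V2"],
)

# Row-wise maximum across the country columns; NaN cells are skipped
# because DataFrame.max uses skipna=True by default.
df["newv"] = df.max(axis=1)
print(df["newv"])
```

With these values the second row comes out as 7, which matches the remark in the answer above.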

Related

How to count classes in columns

I'm trying to write a query and I'm having a hard time with one thing. Suppose I have a table that looks like this:
id  Sample  Species  Quantity  Group
1   1       AA       5         A
2   1       AB       6         A
3   1       AC       10        A
4   1       CD       15        C
5   1       CE       20        C
6   1       DA       13        D
7   1       DB       7         D
8   1       EA       6         E
9   1       EF       4         E
10  1       EB       2         E
The table is filtered to show just one sample (I have many). Each row has the species, the quantity of that species, and a functional group (there are only five groups, A to E). I would like a query that groups by sample and produces one column per group with the count of species in that group, something like this:
Sample  N_especies  Group A  Group B  Group C  Group D  Group E
1       10          3        0        2        2        3
So I have to count the species (that's easy), but I don't know how to make the columns for each group. Can anyone help me?
You can use PIVOT:
-- Grp is the group column here (GROUP is a reserved word in SQL)
SELECT a.Sample, [A], [B], [C], [D], [E],
       [B] + [A] + [C] + [D] + [E] N_especies
FROM (SELECT t.Sample, t.Grp FROM [WS_Database].[dbo].[test1] t) t
PIVOT (
    COUNT(t.Grp)
    FOR t.Grp IN ([A], [B], [C], [D], [E])
) a
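If the data is available outside the database, the same count-per-group layout can be sketched in pandas with a cross-tabulation. This is an alternative illustration, not the answerer's method; the DataFrame is built by hand from the sample table in the question.

```python
import pandas as pd

# Sample rows copied from the question's table.
rows = [
    (1, 1, "AA", 5, "A"),
    (2, 1, "AB", 6, "A"),
    (3, 1, "AC", 10, "A"),
    (4, 1, "CD", 15, "C"),
    (5, 1, "CE", 20, "C"),
    (6, 1, "DA", 13, "D"),
    (7, 1, "DB", 7, "D"),
    (8, 1, "EA", 6, "E"),
    (9, 1, "EF", 4, "E"),
    (10, 1, "EB", 2, "E"),
]
df = pd.DataFrame(rows, columns=["id", "Sample", "Species", "Quantity", "Group"])

# Count rows per (Sample, Group); reindex so that groups with no rows
# (B in this sample) still appear as a column of zeros.
groups = list("ABCDE")
counts = pd.crosstab(df["Sample"], df["Group"]).reindex(columns=groups, fill_value=0)

# Total number of species per sample, placed first as in the desired output.
counts.insert(0, "N_especies", counts.sum(axis=1))
print(counts)
```

The reindex step matters because a group that never occurs would otherwise be missing from the result instead of showing 0.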

How to keep the index when using "pd.Series(a,index=).unique"

I have a problem with pd.Series(a).unique().
I made a Series and called .unique() on it.
However, this drops the Series index.
How can I get the unique values while keeping the original index?
Instead of using .unique() you can use .drop_duplicates():
import pandas as pd
x = pd.Series([1,2,3,1,1,2,4,5,6], index=list("abcdefghi"))
print(x)
a 1
b 2
c 3
d 1
e 1
f 2
g 4
h 5
i 6
dtype: int64
.drop_duplicates() will remove all duplicates from the Series while maintaining reference to the index. You can choose whether you want to keep the index location of the "first" or the "last" duplicated item via the keep argument:
# Keep the first entry of each duplicated value
x.drop_duplicates(keep="first")
a 1
b 2
c 3
g 4
h 5
i 6
dtype: int64
# Keep the last entry of each duplicated item
x.drop_duplicates(keep="last")
c 3
e 1
f 2
g 4
h 5
i 6
dtype: int64

How to find minimum value of a column imported from Excel using MATLAB

I have a set of values in the following pattern.
A B C D
1 5 6 11
2 6 5 21
3 7 3 42
4 3 7 22
1 2 3 54
2 3 2 43
3 4 3 27
4 3 2 14
I imported each column into the MATLAB workspace as follows.
A = xlsread('F:\R.xlsx','Complete Data','A2:A43');
B = xlsread('F:\R.xlsx','Complete Data','B2:B43');
C = xlsread('F:\R.xlsx','Complete Data','C2:C43');
D = xlsread('F:\R.xlsx','Complete Data','D2:D43');
I need help with code that, for each value in column A, finds the row with the lowest D value and outputs the corresponding B and C values. I need the output to look like this:
1 5 6 11
2 6 5 21
3 4 3 27
4 3 2 14
I read through related questions and understand that I need to make it a matrix and sort it based on the element in the 4th column using sortrows and get the indices of the sorted elements. But I am stuck here. Please guide me.
You can import those columns in one go:
ABCD = xlsread('F:\R.xlsx','Complete Data','A2:D43');
Now use sortrows to sort the rows according to the first and the fourth column.
req = sortrows(ABCD, [1 4]);
☆ If every value in the first column occurs exactly twice:
req = req(1:2:end,:);
☆ If values in the first column do not necessarily occur exactly twice:
[~, ind] = unique(req(:,1));
req = req(ind,:);
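The same sort-then-keep-first idea can be sketched in pandas, for anyone comparing approaches. This is an illustration under the assumption that the eight sample rows from the question are loaded into a DataFrame by hand; it is not a translation of the xlsread step.

```python
import pandas as pd

# The eight sample rows from the question, columns A to D.
df = pd.DataFrame(
    [
        [1, 5, 6, 11],
        [2, 6, 5, 21],
        [3, 7, 3, 42],
        [4, 3, 7, 22],
        [1, 2, 3, 54],
        [2, 3, 2, 43],
        [3, 4, 3, 27],
        [4, 3, 2, 14],
    ],
    columns=list("ABCD"),
)

# Sort by A, then by D, so the smallest D comes first within each A value.
# Keeping only the first row per A then gives the minimum-D row of each group,
# regardless of how many times each A value occurs.
req = (
    df.sort_values(["A", "D"])
    .drop_duplicates("A", keep="first")
    .reset_index(drop=True)
)
print(req)
```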

Cumulative Sum from a range identified based on Vlookup

In Excel sheet 1, I have the following data:
A B C D E F G
------------------------------
Name1 1 2 3 4 5 6
Name2 2 9 3 8 4 7
Name3 4 6 0 3 2 1
In Excel sheet 2, I have to calculate the cumulative sum based on the values in sheet 1.
For example,
A B C D E F G
------------------------------
Name1 1 3 6 10 15 21
While I can calculate the cumulative sum easily, I do not know how to select the correct range of cells from sheet 1 by searching for 'Name1'.
You need a SUMPRODUCT with both relative and absolute column/row cell references.
=SUMPRODUCT(($A2:INDEX($A:$A,MATCH(1E+99,$B:$B))=$I5)*($B2:INDEX(B:B,MATCH(1E+99, B:B))))
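To make the intended result concrete, here is the lookup-then-accumulate logic sketched in plain Python rather than as a worksheet formula. It is only a model of the calculation: the dictionary sheet1 and the function cumulative_row are hypothetical names, and the values are the three rows from the question.

```python
from itertools import accumulate

# Hypothetical stand-in for sheet 1: name in column A, values in columns B to G.
sheet1 = {
    "Name1": [1, 2, 3, 4, 5, 6],
    "Name2": [2, 9, 3, 8, 4, 7],
    "Name3": [4, 6, 0, 3, 2, 1],
}


def cumulative_row(name):
    """Find the row by name, then return its running total from left to right."""
    return list(accumulate(sheet1[name]))


print(cumulative_row("Name1"))
```

For Name1 this gives the row shown in the question's sheet 2 example.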

SAS: Calculate an average excluding the current observation

I am searching for an elegant way (or, failing that, an inelegant way) to calculate an average which does not include the current record. So, if I have 30 observations I would end up with 30 different averages. Each would be the average of the other 29 values.
From this made-up data, I would want to create 5 new observations with the averages of A, B, and C not including their own data.
A B C
Albert 12 4 6
Bob 14 7 12
Clyde 6 7 11
Dennis 9 11 7
Earl 8 8 6
I have a vague idea that this will involve proc sql inside a loop. Other ideas or approaches are appreciated.
No loop needed. Use SQL to get the totals for each variable. The average without the current observation is (total sum - value)/(n-1).
data test;
    input NAME $ A B C;
    datalines;
Albert 12 4 6
Bob 14 7 12
Clyde 6 7 11
Dennis 9 11 7
Earl 8 8 6
;
run;
/* Store the row count and the column totals in macro variables */
proc sql noprint;
    select count(*),
           sum(A),
           sum(B),
           sum(C)
        into :n,
             :a,
             :b,
             :c
        from test;
quit;
/* Average excluding the current row: (total - own value) / (n - 1) */
data test2;
    set test;
    Ave_A = (&a - a)/(&n-1);
    Ave_B = (&b - b)/(&n-1);
    Ave_C = (&c - c)/(&n-1);
run;
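As a quick check of the (total sum - value)/(n-1) formula, the same calculation can be sketched in plain Python on the made-up data from the question. The function name leave_one_out_means is hypothetical and this is not a replacement for the SAS code.

```python
# Made-up data from the question, one list per variable,
# in the order Albert, Bob, Clyde, Dennis, Earl.
data = {
    "A": [12, 14, 6, 9, 8],
    "B": [4, 7, 7, 11, 8],
    "C": [6, 12, 11, 7, 6],
}


def leave_one_out_means(values):
    """Mean of all other values for each position: (total - value) / (n - 1)."""
    total = sum(values)
    n = len(values)
    return [(total - v) / (n - 1) for v in values]


loo = {name: leave_one_out_means(values) for name, values in data.items()}
print(loo["A"])
```

For variable A the total is 49, so Albert's value is (49 - 12) / 4 = 9.25, the average of the other four observations.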
