How can I keep the index when calling pd.Series(a,index=).unique()?

I have a problem with pd.Series(a).unique().
I created a Series and called .unique() on it.
However, this discards the Series index.
How can I get the unique values while keeping the original index?

Instead of using .unique() you can use .drop_duplicates():
import pandas as pd
x = pd.Series([1,2,3,1,1,2,4,5,6], index=list("abcdefghi"))
print(x)
a 1
b 2
c 3
d 1
e 1
f 2
g 4
h 5
i 6
dtype: int64
.drop_duplicates() will remove all duplicates from the Series while maintaining reference to the index. You can choose whether you want to keep the index location of the "first" or the "last" duplicated item via the keep argument:
# Keep the first entry of each duplicated value
x.drop_duplicates(keep="first")
a 1
b 2
c 3
g 4
h 5
i 6
dtype: int64
# Keep the last entry of each duplicated item
x.drop_duplicates(keep="last")
c 3
e 1
f 2
g 4
h 5
i 6
dtype: int64
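As a small sketch of why the index disappears with .unique(), and of one equivalent way to keep it: .unique() hands back a plain NumPy array, while a boolean mask from .duplicated() selects the same rows as .drop_duplicates() and keeps their labels. The variable names below are only for illustration.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 1, 1, 2, 4, 5, 6], index=list("abcdefghi"))

# unique() returns a plain NumPy array, so the labels are gone.
values_only = x.unique()

# duplicated() marks every repeat of an earlier value; negating the mask
# keeps the first occurrence of each value together with its label.
first = x[~x.duplicated(keep="first")]

# keep=False drops every value that occurs more than once.
never_repeated = x.drop_duplicates(keep=False)

print(type(values_only).__name__)   # ndarray
print(list(first.index))            # ['a', 'b', 'c', 'g', 'h', 'i']
print(list(never_repeated.index))   # ['c', 'g', 'h', 'i']
```

If you only need the labels of the surviving rows, first.index gives them directly.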


Enforce same value for the combination of two other values

I have a table T with 4 integer columns: A, B, C, and D. There is already a UNIQUE constraint on ABC, but I need to write a constraint enforcing that, for the same AB combination, D has the same value no matter what C is. For example:
A B C D Note
1 1 1 1 AB is 1,1, D is 1
1 1 2 1
1 1 3 2 wrong! D must be 1, because AB is 1,1
1 1 4 1 ok
2 1 1 5 ok, it's a new AB combination, so a new D value is possible
2 1 2 5 D must be 5 here (and for any following row with AB 2,1)
etc.
I have no idea where to start, and my Google-fu is weak in this case.
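To make the rule concrete, here is a minimal Python sketch that checks it outside the database. It is not a constraint and not a suggested schema; find_violations is a hypothetical helper name, and the rows are the ones from the example table.

```python
from collections import defaultdict

# Rows (A, B, C, D) copied from the example table in the question.
rows = [
    (1, 1, 1, 1),
    (1, 1, 2, 1),
    (1, 1, 3, 2),  # violates the rule: D should be 1 for AB = (1, 1)
    (1, 1, 4, 1),
    (2, 1, 1, 5),
    (2, 1, 2, 5),
]

def find_violations(rows):
    """Return the (A, B) pairs that map to more than one distinct D value."""
    d_values = defaultdict(set)
    for a, b, _c, d in rows:
        d_values[(a, b)].add(d)
    return {ab: sorted(ds) for ab, ds in d_values.items() if len(ds) > 1}

print(find_violations(rows))  # {(1, 1): [1, 2]}
```

Put differently, the rule says D depends only on the pair (A, B), which is the property any real constraint would have to enforce.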

find largest value in an array if value in first column matches specified value

I'm trying to find the largest or max value in an array/range (E44:I205) among rows with values in column D (D44:D2015) that match a word. For instance:
D E F G H I
Cheetah Cat 0 1 2 3 4
Tiger Cat 1 1 2 3 4 5
Dog 0 0 1 2 3
Among the rows with the word "*"&"cat", I want to find the max value. In this example, the formula should return 5. I've tried the following formula, but it just returns the first instance of "cat" and the associated max value in that row.
=LARGE(IF($D$25:$D$205="*"&"cat",$E$44:$I$205,),1)
Any help is much appreciated!
Use:
=AGGREGATE(14,6,E25:I205/(RIGHT(D25:D205,3)="cat"),1)
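For readers more used to code than to array formulas: function 14 in AGGREGATE is LARGE and option 6 tells it to ignore errors. Rows whose label does not end in "cat" divide by FALSE (zero), produce #DIV/0!, and are skipped, so only matching rows compete for the largest value. A rough Python equivalent of that logic, with made-up sample rows and a hypothetical function name:

```python
# Hypothetical rows: (label from column D, values from columns E..I).
rows = [
    ("Cheetah Cat", [0, 1, 2, 3, 4]),
    ("Tiger Cat", [1, 2, 3, 4, 5]),
    ("Dog", [0, 0, 1, 2, 3]),
]

def max_where_label_ends_with(rows, suffix):
    """Largest value among rows whose label ends with `suffix` (case-insensitive)."""
    matching = [
        value
        for label, values in rows
        if label.lower().endswith(suffix.lower())
        for value in values
    ]
    # Return None when no row matches, rather than raising on an empty list.
    return max(matching) if matching else None

print(max_where_label_ends_with(rows, "cat"))  # 5
```

The comparison is lowercased here because Excel's = on text ignores case.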

Rowmax as new column in data table

I have rank scores of countries for different variables.
I would like to create a column with the maximum rank that occurs per row.
Say the data look something like:
A B C D E F G H I ....
V1 1 4 5 3 12 . 6 9 83
V2 . . 4 6 1 4 7 6 32
So A - X are countries. In rows V1 up you have various variables and in the cells you have the rank score relating to the variable.
Problem is that some countries for whatever reasons don't score in relation to certain variables, perhaps because V1 is not relevant to country C or whatever.
So in the end I'd like something like
A B C D E F G H I .... newv
V1 1 4 5 3 12 . 6 9 83 83
V2 . . 4 6 1 4 7 6 5 6
I think egen newvar=rowmax(A B C D E F G H I…) does what you need. Have a look at the egen help file for more information. (I presume you need value 7 in the second row, not 6?)
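For comparison, the same row-wise maximum can be sketched in Python. Like rowmax, it skips missing values and only gives a missing result when the whole row is missing. The data are the two rows from the desired output above, with None standing in for "."; row_max is a hypothetical helper name.

```python
# None stands in for a missing rank (shown as "." in the table above).
ranks = {
    "V1": [1, 4, 5, 3, 12, None, 6, 9, 83],
    "V2": [None, None, 4, 6, 1, 4, 7, 6, 5],
}

def row_max(values):
    """Maximum of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

newv = {name: row_max(values) for name, values in ranks.items()}
print(newv)  # {'V1': 83, 'V2': 7}
```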

How to ignore NA in subscripted assignment

Given two (named) arrays x and y, where all dimnames(y) exist in x.
How can I fill (update) x with values from y, but ignoring NAs in y?
I have come so far:
x<-array(1:15,dim=c(5,3),dimnames=list(1:5,1:3))
y<-(NA^!diag(1:3))*diag(1:3)
dimnames(y)<-list(1:3,1:3)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))]<-y
But this also overwrites x with "NA"s from y.
1 2 3
1 1 NA NA
2 NA 2 NA
3 NA NA 3
4 4 9 14
5 5 10 15
I guess it's something involving a filter !is.na(y) but I haven't found the right place to put it?
Thanks for any hint
We match the rownames of 'y' with rownames of 'x' to create the row index ('rn'), similarly get the corresponding column index ('cn') by matching. Get the index of values in 'y' that are non-NAs ('indx'). Subset the 'x' with row index, column index and resubset with 'indx' and replace those values with the non-NA values in y (y[indx]).
rn <- match(rownames(y), rownames(x))
cn <- match(colnames(y), colnames(x))
indx <- which(!is.na(y), arr.ind=TRUE)
x[rn,cn][indx] <- y[indx]
Or instead of matching, we can subset the 'x' with rownames(y) and colnames(y) and replace it as before.
x[rownames(y), colnames(y)][indx] <- y[indx]
You can index directly with rownames and colnames to get the relevant parts of x covered by y, and replace conditionally using ifelse:
x[rownames(y),colnames(y)] <- ifelse(is.na(y),x[rownames(y),colnames(y)],y)
x
1 2 3
1 1 6 11
2 2 2 12
3 3 8 3
4 4 9 14
5 5 10 15
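The same idea carries over to NumPy, sketched here under the assumption that NaN plays the role of NA and that the dimnames are kept in plain Python lists (x_rows, x_cols, y_rows, y_cols are illustrative names, not part of any API):

```python
import numpy as np

# x mirrors R's array(1:15, dim=c(5,3)): filled column by column.
x = np.arange(1, 16, dtype=float).reshape(5, 3, order="F")
x_rows, x_cols = ["1", "2", "3", "4", "5"], ["1", "2", "3"]

# y has 1, 2, 3 on the diagonal and NaN (R's NA) everywhere else.
y = np.diag([1.0, 2.0, 3.0])
y[~np.eye(3, dtype=bool)] = np.nan
y_rows, y_cols = ["1", "2", "3"], ["1", "2", "3"]

# Positions in x of y's row and column labels (the role of match() in R).
rn = [x_rows.index(r) for r in y_rows]
cn = [x_cols.index(c) for c in y_cols]
block = np.ix_(rn, cn)

# Keep x where y is NaN, take y otherwise, then write the block back once.
x[block] = np.where(np.isnan(y), x[block], y)
print(x)
```

One difference from R: in NumPy a chained assignment such as x[block][mask] = ... would write into a temporary copy, which is why the block is rebuilt with np.where and assigned in a single step.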
Just for completeness:
The accepted answer works under the assumption that we have a 2d-array (row/colnames).
But as the real problem was in a higher-dimensional space (and this may be the case for later readers), I show here how the solution can also be applied to the initial dimension-independent approach:
indx <- !is.na(y)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))][indx] <- y[indx]
Thanks!

Hash Table: Which is the right linear-probing array?

I am studying data structures right now, specifically hash tables. I came across the following question:
Imagine that we have placed the following keys
in an initial empty hash table with a length of 7
with linear probing, using the following table of hash-values:
key: A B C D E F G
hash: 3 1 4 1 5 2 5
Which of the following arrays could be the linear-probing array?
1.
0 1 2 3 4 5 6
G B D F A C E
2.
0 1 2 3 4 5 6
B G D F A C E
3.
0 1 2 3 4 5 6
E G F A B C D
When I create the linear-probing array I get this:
0 1 2 3 4 5 6
G B D A C E F
Could somebody please tell me why I am wrong and what the right answer is?
Notice how the question doesn't specify the order in which the keys are inserted, so your answer is only correct assuming that the keys are actually inserted in the order A-B-C-D-E-F-G, but since the question doesn't explicitly state the order, you need to dig deeper.
What you do know, however, is that one of those keys will be inserted first, and it will go to its designated slot as shown in the key-to-hash table, since the hash table is initially empty. This immediately rules out choice 2, because none of its keys are in their designated array entry, leaving you with choices 1 and 3.
For table 1, B is in slot 1, which corresponds to its hash value and for table 3, keys F and A are in their initial hash-value spots.
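It's simple to prove that no sequence of key inserts will yield table 3, even though F and A sit in their home slots. B is in slot 4 but hashes to 1, so slots 1 to 3 were full when B was inserted, which means G (in slot 1) went in before B. G hashes to 5, so slot 5 was already full, which means C went in before G. C hashes to 4 but sits in slot 5, so slot 4 was already full, which means B went in before C. That is a circular dependency. It's likewise easy to check that the sequence of key inserts B-D-F-A-C-E-G results in table 1.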
Although this is a question based on hash tables, I honestly don't consider it a good way to assess your knowledge of linear probing. It is more of a puzzle, as #gnasher729 mentioned.
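If you would rather check the three candidates mechanically than by argument, a brute-force sketch is enough here, since seven keys give only 5040 insertion orders. The helper names build_table and is_reachable are made up for this example; the hash values are the ones from the question.

```python
from itertools import permutations

HASHES = {"A": 3, "B": 1, "C": 4, "D": 1, "E": 5, "F": 2, "G": 5}
SIZE = 7

def build_table(order):
    """Insert keys in the given order using linear probing."""
    table = [None] * SIZE
    for key in order:
        slot = HASHES[key]
        while table[slot] is not None:   # step to the next slot, wrapping around
            slot = (slot + 1) % SIZE
        table[slot] = key
    return "".join(table)

def is_reachable(target):
    """True if some insertion order of the seven keys produces `target`."""
    return any(build_table(order) == target for order in permutations(HASHES))

print(build_table("ABCDEFG"))   # GBDACEF
print(build_table("BDFACEG"))   # GBDFACE
for candidate in ("GBDFACE", "BGDFACE", "EGFABCD"):
    print(candidate, is_reachable(candidate))
```

The first line reproduces the array from the question (insertion order A to G), and the second reproduces table 1 from the order given in the answer.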
