I have a table T with four integer columns: A, B, C, and D. There is already a UNIQUE constraint on (A, B, C), but I need an additional constraint enforcing that, for the same (A, B) combination, D always has the same value, no matter what C is. For example:
A B C D Note
1 1 1 1 AB is 1,1, D is 1
1 1 2 1
1 1 3 2 wrong! D must be 1, because AB is 1,1
1 1 4 1 ok
2 1 1 5 ok, it's a new AB combination, so a new D value is possible
2 1 2 5 D must be 5 here (and for any following row with AB 2,1)
etc.
I have no idea where to start, and my Google-fu is weak in this case.
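To make the rule concrete, here is a minimal pandas sketch that flags the rows breaking it. It is only a check on data that already exists, not a database constraint, and the names `t`, `expected_d` and `violations` are illustrative assumptions. The rows are the sample from the question.

```python
import pandas as pd

# Sample rows from the question (columns A, B, C, D).
t = pd.DataFrame(
    [
        (1, 1, 1, 1),
        (1, 1, 2, 1),
        (1, 1, 3, 2),  # the row marked "wrong!" in the question
        (1, 1, 4, 1),
        (2, 1, 1, 5),
        (2, 1, 2, 5),
    ],
    columns=["A", "B", "C", "D"],
)

# First D value seen for each (A, B) pair, broadcast back to every row.
expected_d = t.groupby(["A", "B"])["D"].transform("first")

# Rows that break the rule "same (A, B) implies same D".
violations = t[t["D"] != expected_d]
print(violations)
```

Only the row with C = 3 is reported, which matches the row the question marks as wrong.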
Description
I have a table like this in Google Sheet:
     A      B                C    D      E      F    G
1    Cond1  Person_code      n/a  Count  Cond2  n/a  Result
2    0      Tom T_44767           1      1
3    0      Isrel I_44767         1      1
4    1      Patty P_44767         1      1           x
5    1      Isrel I_44767         0      1
6    0      Dummy D_44767         1      1
7    1      Patty P_447677        0      1
8    1      Jarson X_44768        1      1           x
A - Cond1 - either 0 or 1
B - Person_code - first name, second name and a number which represents a date
C - n/a - column not relevant to the case, included only to keep the column lettering intact
D - Count - either 0 or 1, because it flags THE first occurrence of B with the formula:
(COUNTIF($B$1:$B2;$B2)=1)+0 for row 2
(COUNTIF($B$1:$B3;$B3)=1)+0 for row 3, and so on.
NOTE: The important thing is to count ONLY THE FIRST occurrence (see rows 4 and 7 for an example).
E - Cond2 - either 0 or 1
F - n/a - column not relevant to the case, included only to keep the column lettering intact
G - Result - IF (Cond1 + Count + Cond2 = 3) THEN x
What the problem is
Currently, column D flags the first occurrence of each value in B and takes nothing else into account. However, I need it to ignore (i.e. not count) rows where Cond1 + Cond2 is anything other than 2 (i.e. 0 or 1). Instead, it should find the first occurrence of B among the rows where Cond1 + Cond2 = 2 and count that one.
So the table should look like this (pay attention to D3, D5 and G5):
     A      B                C    D      E      F    G
1    Cond1  Person_code      n/a  Count  Cond2  n/a  Result
2    0      Tom T_44767           1      1
3    0      Isrel I_44767         0      1
4    1      Patty P_44767         1      1           x
5    1      Isrel I_44767         1      1           x
6    0      Dummy D_44767         1      1
7    1      Patty P_447677        0      1
8    1      Jarson X_44768        1      1           x
Row 3 was ignored, so the first counted occurrence of 'Isrel I_44767' is in row 5. Therefore an 'x' appears in G5.
I've tried to include additional conditions in D but can't get it to work. Any solution would be acceptable. It's okay to add additional columns, if needed or use a totally different approach.
I will be grateful for any advice on this.
I need it to ignore (i.e. do not count) rows where Cond1 + Cond2 is different than 2 (i.e. 0 or 1). Instead, it should look for the first occurrence of B where Cond1 + Cond2 = 2 and count it
=ARRAYFORMULA(IF(A2:A8+E2:E8=2, 1, 0))
Now, to also account for repeated occurrences, i.e. not count duplicates (if that is what you are after; it is not entirely clear from your question):
=ARRAYFORMULA(IF(1=COUNTIFS(
IF(A2:A10+E2:E10=2, B2:B10, ),
IF(A2:A10+E2:E10=2, B2:B10, ),
ROW(B2:B10), "<="&ROW(B2:B10)), 1, 0))
G - Result - IF (Cond1 + Count + Cond 2 = 3) THEN x
and G2 would be:
=INDEX(IF(A2:A10+D2:D10+E2:E10=3, "x", ))
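For readers who want to check the logic outside Sheets, the same idea can be sketched in pandas: mask out the rows where Cond1 + Cond2 is not 2, then flag the first occurrence of each person code among the remaining rows. This is only an illustration under stated assumptions: the column names are made up, and row 7's code is written as 'Patty P_44767' on the assumption that the extra digit in the question is a typo. Rows that fail the condition get Count = 0 here.

```python
import pandas as pd

# Rows 2-8 of the sheet (column names are assumptions for this sketch).
df = pd.DataFrame(
    {
        "cond1": [0, 0, 1, 1, 0, 1, 1],
        "person_code": [
            "Tom T_44767", "Isrel I_44767", "Patty P_44767", "Isrel I_44767",
            "Dummy D_44767", "Patty P_44767", "Jarson X_44768",
        ],
        "cond2": [1, 1, 1, 1, 1, 1, 1],
    },
    index=range(2, 9),  # sheet row numbers
)

# Only rows with Cond1 + Cond2 = 2 are eligible to be counted.
qualifies = (df["cond1"] + df["cond2"]) == 2

# Blank out the code on non-eligible rows, so they cannot "use up" a first occurrence.
masked_code = df["person_code"].where(qualifies)

# Count = 1 for the first eligible occurrence of each code, otherwise 0.
df["count"] = (qualifies & ~masked_code.duplicated()).astype(int)

# Result = "x" when Cond1 + Count + Cond2 = 3.
total = df["cond1"] + df["count"] + df["cond2"]
df["result"] = total.eq(3).map({True: "x", False: ""})
print(df)
```

With this data the 'x' lands on rows 4, 5 and 8, the same rows as in the expected table from the question.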
I have a table with the columns "ID" and "Values", and I want to know how many times the value "A" changes to a different value on the next row, as shown below:
ID  Values
1   A
1   A
1   A
1   B
1   A
1   B
1   B
1   C
1   C
1   C
1   A
2   A
2   A
2   B
2   A
2   B
2   C
2   B
Expected Result:
ID  Values  Desired Output
1   A       0
1   A       0
1   A       1
1   B       0
1   A       1
1   B       0
1   B       0
1   C       0
1   C       0
1   C       0
1   A       0
2   A       0
2   A       1
2   B       0
2   A       1
2   B       0
2   C       0
2   B       0
The final table should be like this:
ID  Number of Transitions
1   2
2   2
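You just need LEAD() to look at the next value. LEAD() needs an ORDER BY in its window to define what "next" means; ordering_col below is a placeholder for whichever column gives the row order within each id:
select id, values, lead(values) over (partition by id order by ordering_col) as next_value
from table
Then you can compare next_value with values and apply iff(values = 'A' and next_value != 'A', 1, 0).
Then just SUM() that flag and GROUP BY id.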
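The same look-ahead idea can be sketched in pandas, where `shift(-1)` within each group plays the role of LEAD(). This is an illustration only: it assumes the DataFrame's row order is the intended order within each ID, and the names `df`, `next_value` and `transitions` are made up for the example.

```python
import pandas as pd

# Data from the question, in the order shown there.
df = pd.DataFrame(
    {
        "ID": [1] * 11 + [2] * 7,
        "Values": list("AAABABBCCCA") + list("AABABCB"),
    }
)

# Next value within the same ID (NaN on the last row of each ID).
next_value = df.groupby("ID")["Values"].shift(-1)

# 1 where an 'A' is followed by a different, non-missing value.
df["is_transition"] = (
    (df["Values"] == "A") & next_value.notna() & (next_value != "A")
).astype(int)

# Sum the flags per ID.
transitions = df.groupby("ID")["is_transition"].sum()
print(transitions)
```

Both IDs come out with 2 transitions, matching the final table in the question.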
You could also treat this as a regexp problem, where you count how many times a given pattern occurs for each id. The missing piece in your question is: which column dictates the order in which the values appear for each id? You'll need that for either of the solutions.
select id, regexp_count(listagg(val,',') within group (order by ordering_col), 'A,[^A]')
from t
group by id;
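As a rough way to try the pattern itself, here is a small Python sketch that joins each id's values with commas and counts matches of 'A,[^A]' using the standard `re` module. It stands in for listagg and regexp_count rather than reproducing them, and it assumes the list order is the intended row order; the names `rows` and `counts` are illustrative.

```python
import re
from collections import defaultdict

# (id, value) pairs from the question, in the order shown there.
rows = [(1, v) for v in "AAABABBCCCA"] + [(2, v) for v in "AABABCB"]

# Collect the values of each id in order, like listagg(...) within group (order by ...).
values_by_id = defaultdict(list)
for row_id, value in rows:
    values_by_id[row_id].append(value)

# Count occurrences of "an A followed by something other than A" per id.
counts = {
    row_id: len(re.findall(r"A,[^A]", ",".join(values)))
    for row_id, values in values_by_id.items()
}
print(counts)
```

A trailing 'A' has nothing after it, so it is not counted, which is the behaviour the expected output asks for.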
How can I find the first and last elements of a dataframe for each group of rows, where the groups are defined by a column?
df1 =
g col1 col2
h 1 2
h 0 1
h 7 8
h 5 2
h 0 1
k 7 3
k 2 1
k 9 1
I want to group the rows by column g and, for each group and each column, get the following information:
first element, last element, size of the group
IIUC, try:
df_g = df.groupby('dates1').agg(['first','last','size']).T.unstack()
df_g.columns = [f'{i}/{j}' for i, j in df_g.columns]
print(df_g)
Output:
2020-01/first 2020-01/last 2020-01/size 2020-02/first 2020-02/last 2020-02/size
col1 7 9 3 1 0 5
col2 3 1 3 2 1 5
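The snippet above groups by a column named 'dates1', which does not appear in the question's sample. A minimal sketch of the same aggregation applied to the df1 from the question, grouped by g, might look like this (the name `summary` and the 'column/statistic' label format are choices made for the example):

```python
import pandas as pd

# df1 from the question.
df1 = pd.DataFrame(
    {
        "g": list("hhhhhkkk"),
        "col1": [1, 0, 7, 5, 0, 7, 2, 9],
        "col2": [2, 1, 8, 2, 1, 3, 1, 1],
    }
)

# First element, last element and group size for every column, per group.
summary = df1.groupby("g").agg(["first", "last", "size"])

# Flatten the two-level column index into labels such as "col1/first".
summary.columns = [f"{col}/{stat}" for col, stat in summary.columns]
print(summary)
```

Here each row of `summary` is one group (h or k); transpose it with `.T` if you prefer the groups as columns.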
I have a problem with pd.Series(a).unique().
I created a Series and called .unique() on it.
However, this discards the Series index.
How can I get the unique values while keeping the original index?
Instead of using .unique() you can use .drop_duplicates():
import pandas as pd
x = pd.Series([1,2,3,1,1,2,4,5,6], index=list("abcdefghi"))
print(x)
a 1
b 2
c 3
d 1
e 1
f 2
g 4
h 5
i 6
dtype: int64
.drop_duplicates() will remove all duplicates from the Series while maintaining reference to the index. You can choose whether you want to keep the index location of the "first" or the "last" duplicated item via the keep argument:
# Keep the first entry of each duplicated value
x.drop_duplicates(keep="first")
a 1
b 2
c 3
g 4
h 5
i 6
dtype: int64
# Keep the last entry of each duplicated item
x.drop_duplicates(keep="last")
c 3
e 1
f 2
g 4
h 5
i 6
dtype: int64
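If you want the selection to be explicit, the same result can be obtained with a boolean mask from `.duplicated()`. The sketch below is an equivalent formulation for illustration, not a different technique; the names `first_mask` and `unique_with_index` are made up.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 1, 1, 2, 4, 5, 6], index=list("abcdefghi"))

# True for the first occurrence of each value, False for later repeats.
first_mask = ~x.duplicated(keep="first")

# Boolean indexing keeps the original index labels of the surviving rows.
unique_with_index = x[first_mask]
print(unique_with_index)

# Same content as drop_duplicates(keep="first").
print(unique_with_index.equals(x.drop_duplicates(keep="first")))
```

The mask form is handy when you also need the complement, for example `x[~first_mask]` to inspect the repeated entries.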
Given a set of results in table 1
Col 1 Col 2 Col 3 Result
A B C 1
A B D 2
A B D 3
A B E 4
A B E 5
A B F 6
and a set of conditions in table 2
Col 1 Col 2 Col 3
A B C
A B D
A B E
how do I return a table of 'Result Sets' (or grouped results) in T-SQL to identify all the possible combinations of results from table 1 where all conditions in table 2 are met?
Result Set Result
1 1
1 2
1 4
2 1
2 2
2 5
3 1
3 3
3 4
4 1
4 3
4 5
EDIT: To clarify, the 'Result Set' value in the output table would be generated in the T-SQL to identify each set of results.
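To pin down the expected output, the logic can be sketched in Python (not T-SQL): collect the results that match each condition, then take every way of picking one result per condition. The tuples below are the two tables from the question; the names `matches`, `result_sets` and `rows` are illustrative assumptions.

```python
from itertools import product

# Table 1: (Col 1, Col 2, Col 3, Result).
results = [
    ("A", "B", "C", 1),
    ("A", "B", "D", 2),
    ("A", "B", "D", 3),
    ("A", "B", "E", 4),
    ("A", "B", "E", 5),
    ("A", "B", "F", 6),
]

# Table 2: (Col 1, Col 2, Col 3).
conditions = [("A", "B", "C"), ("A", "B", "D"), ("A", "B", "E")]

# For each condition, the Result values whose three columns match it.
matches = [[r[3] for r in results if r[:3] == cond] for cond in conditions]

# Each way of choosing one matching result per condition is one result set.
result_sets = list(product(*matches))

# Flatten into (Result Set, Result) rows, numbering the sets from 1.
rows = [
    (set_no, result)
    for set_no, combo in enumerate(result_sets, start=1)
    for result in combo
]
for row in rows:
    print(row)
```

If any condition has no matching result, the product is empty and no result set is produced, which fits the requirement that all conditions must be met.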