Snowflake: Subtracting one column from another and partitioning it by another column

Data:
Group | Date | Current | Next
A | 03/09/2020 | 4 | 7
A | 04/09/2020 | 2 | 4
A | 05/09/2020 | 4 | null
B | 17/08/2020 | 4 | 9
B | 19/08/2020 | 4 | null
I don't think I can use a window function, as I'm not using SUM, COUNT, MAX, MIN, etc. If I just do Next - Current, it's not going to partition by group, so the whole set will be treated as one group.
I want to subtract Current from Next, but have it partitioned by Group.
Desired output:
Group | Date | Current | Next | Diff
A | 03/09/2020 | 4 | 7 | 3
A | 04/09/2020 | 2 | 4 | 2
A | 05/09/2020 | 4 | null | null
B | 17/08/2020 | 4 | 9 | 5
B | 19/08/2020 | 4 | null | null

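This is just simple row-level arithmetic: Current and Next are in the same row, so Next - Current needs no window function or partitioning. Not sure what the real question is?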
with data("Group", "Date", "Current", "Next") as (
select * from values
('A', '03/09/2020', 4, 7),
('A', '04/09/2020', 2, 4),
('A', '05/09/2020', 4, null),
('B', '17/08/2020', 4, 9),
('B', '19/08/2020', 4, null)
)
select *, "Next" - "Current" as Diff
from data;
Group | Date | Current | Next | DIFF
A | 03/09/2020 | 4 | 7 | 3
A | 04/09/2020 | 2 | 4 | 2
A | 05/09/2020 | 4 | NULL | NULL
B | 17/08/2020 | 4 | 9 | 5
B | 19/08/2020 | 4 | NULL | NULL
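The same point can be sketched outside SQL. This is a minimal Python illustration (not Snowflake code) using the question's sample rows, where None stands in for SQL NULL and the helper name diff is hypothetical: each Diff is computed only from its own row's Current and Next, so Group never enters the calculation.

```python
# Sample rows from the question: (Group, Date, Current, Next); None plays the role of NULL.
rows = [
    ("A", "03/09/2020", 4, 7),
    ("A", "04/09/2020", 2, 4),
    ("A", "05/09/2020", 4, None),
    ("B", "17/08/2020", 4, 9),
    ("B", "19/08/2020", 4, None),
]


def diff(current, nxt):
    # As in SQL arithmetic, a NULL operand yields a NULL result.
    if current is None or nxt is None:
        return None
    return nxt - current


# Each Diff only looks at values in its own row, so no grouping is involved.
result = [(*row, diff(row[2], row[3])) for row in rows]
for r in result:
    print(r)
```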

Related

Cumulative sum of value with arrayformula and reset if =>20 - Google sheets [duplicate]

Please provide an array formula. Can you help me reset the running total when the MOQ is reached? Here MOQ = 15: when the running total becomes equal to or greater than 15, it should restart.
Date | Value | Desired
12/2022 | 6 | 6
01/2023 | 5 | 11
02/2023 | 4 | 15
03/2023 | 3 | 3
04/2023 | 9 | 12
05/2023 | 2 | 14
06/2023 | 6 | 20
07/2023 | 1 | 1
08/2023 | 6 | 7
09/2023 | 1 | 8
10/2023 | 8 | 16
11/2023 | 9 | 9
12/2023 | 3 | 12
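All we need is a unique common ID for grouping the sum. We start by mapping each date's month down to one of the months 3, 7 or 11 (the months where the Desired column restarts in this sample); January and February have no match and fall back to 11: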
=ARRAYFORMULA(IFNA(VLOOKUP(MONTH(A2:A), {3;7;11}, 1, 1), 11))
Next, we use the year to differentiate between, e.g., 11/2022 and 11/2023: we convert each date to the first day of its month and then shift it back by 58 days, so that January and February count toward the previous year's November group:
=ARRAYFORMULA(YEAR(EOMONTH(A2:A, -1)+1-58))
We combine both parts to get a unique ID per MOQ group:
=ARRAYFORMULA(IFERROR(IFNA(VLOOKUP(MONTH(A2:A), {3;7;11}, 1, 1), 11)&
" "&YEAR(EOMONTH(A2:A, -1)+1-58)))
Then we just use the standard running-total formula, grouped by that ID:
=ARRAYFORMULA(IF(A2:A="",,
MMULT(--TRANSPOSE(IF((TRANSPOSE(ROW(A2:A))>=ROW(A2:A))*(
IFERROR(IFNA(VLOOKUP(MONTH(A2:A), {3;7;11}, 1, 1), 11)&"×"&
YEAR(EOMONTH(A2:A, -1)+1-58))=TRANSPOSE(
IFERROR(IFNA(VLOOKUP(MONTH(A2:A), {3;7;11}, 1, 1), 11)&"×"&
YEAR(EOMONTH(A2:A, -1)+1-58)))), B2:B, 0)), ROW(A2:A)^0)))
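To see why the month buckets and the 58-day offset work, here is a minimal Python sketch of the same group ID and per-group running total. It is a translation written for illustration, not part of the answer; group_id and grouped_running_total are hypothetical names, and each month is assumed to be stored as its first day. Like the formula, it hard-codes the restart months 3, 7 and 11 from this sample, so it reproduces the Desired column here but does not detect the 15 threshold by itself. Subtracting 58 days from the 1st of January or February lands in the previous year, while the 1st of March or any later month stays in the same year.

```python
from bisect import bisect_right
from datetime import date, timedelta

RESTART_MONTHS = [3, 7, 11]  # the {3;7;11} array from the formula


def group_id(d):
    # Approximate-match VLOOKUP: largest restart month <= MONTH(d);
    # January and February find none (#N/A), which IFNA turns into 11.
    i = bisect_right(RESTART_MONTHS, d.month)
    bucket = RESTART_MONTHS[i - 1] if i else 11
    # YEAR(EOMONTH(d, -1) + 1 - 58): first day of the month, shifted back 58 days.
    year = (d.replace(day=1) - timedelta(days=58)).year
    return f"{bucket} {year}"


def grouped_running_total(dates, values):
    # Per-group cumulative sum in row order (the MMULT part of the formula).
    totals, out = {}, []
    for d, v in zip(dates, values):
        key = group_id(d)
        totals[key] = totals.get(key, 0) + v
        out.append(totals[key])
    return out


# Sample from the question, assuming each month is stored as its first day.
dates = [date(2022, 12, 1)] + [date(2023, m, 1) for m in range(1, 13)]
values = [6, 5, 4, 3, 9, 2, 6, 1, 6, 1, 8, 9, 3]
print(grouped_running_total(dates, values))
```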
Update:
=ARRAYFORMULA(IF(A2:A="",, MMULT(--TRANSPOSE(IF((TRANSPOSE(ROW(B2:B))>=ROW(B2:B))*(
ARRAY_CONSTRAIN({0; IF(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15)>
MAX(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))-1,
MAX(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))-1,
TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))}, ROWS(B2:B), 1)=TRANSPOSE(
ARRAY_CONSTRAIN({0; IF(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15)>
MAX(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))-1,
MAX(TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))-1,
TRUNC(SUMIF(ROW(B2:B), "<="&ROW(B2:B), B2:B)/15))}, ROWS(B2:B), 1))), B2:B, 0)),
ROW(B2:B)^0)))
Use the new SCAN function
=SCAN(0,B2:B, LAMBDA(a,c, IF(c="",,IF(a>=15,c,a+c))))
What this does: it keeps adding each Value to the running total, and once the running total has reached 15 or more, it restarts from the current Value.
Explanation
SCAN takes an initial_value, which we set to 0, and an array_or_range, here B2:B. Inside the LAMBDA, a is the accumulator and c is the current value of array_or_range; you can name them anything.
If the accumulator a is greater than or equal to 15, reset by returning the current value c; otherwise, add c to the accumulator a. Empty cells in B2:B return a blank.
Formulas used: SCAN, LAMBDA, IF
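The SCAN/LAMBDA logic maps directly onto a left scan in other languages. As a minimal sketch for comparison (Python 3.8+, not Sheets; reset_running_total and moq are hypothetical names), itertools.accumulate plays the role of SCAN, and the step function is the same IF(a>=15, c, a+c):

```python
from itertools import accumulate


def reset_running_total(values, moq=15):
    # One step of SCAN's LAMBDA(a, c, IF(a>=15, c, a+c)).
    def step(a, c):
        return c if a >= moq else a + c

    # initial=0 plays the role of SCAN's initial_value; accumulate also yields
    # that initial value first, so drop it to keep one output per input.
    return list(accumulate(values, step, initial=0))[1:]


values = [6, 5, 4, 3, 9, 2, 6, 1, 6, 1, 8, 9, 3]
print(reset_running_total(values))
```

Note that the reset check looks at the previous running total, so a row that pushes the total to 15 or more still shows that total (15, 20, 16 in the sample), and the next row starts over.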

SQLite: combination of different ordering policies in one query

I have a table which holds 4 columns: int x, int y, int z, int type.
I need to select a sorted list of records such that the following properties are preserved:
All records of type 1 must be ordered by x and y (y is the secondary key) in ascending order;
All records of type 2 must be ordered by z in ascending order.
Records of type 2 are usually also ordered by x and y, so in theory ordering them by z should give the same order. In reality, however, some type 2 records may be partially out of order with respect to z, i.e. for them table.x2 > table.x1 but table.z2 < table.z1.
For all selected records of type 2, I need to ensure table.z2 > table.z1, even at the price of having table.x1 > table.x2 (whereas table.x1 < table.x2 is implied for all other records).
I figured out some query like this:
SELECT * FROM aTable ORDER BY CASE type WHEN 1 THEN x ASC, y ASC ELSE z ASC END;
but for some reason SQLite rejects it as syntactically wrong.
Is it possible to construct such a query at all?
UPDATE:
Example
Data in table
x y z type
1 4 3 1
2 3 6 1
2 2 6 1
8 3 5 2
4 6 3 2
5 8 6 1
7 6 2 2
Expected result:
1 4 3 1
2 2 6 1
2 3 6 1
7 6 2 2
4 6 3 2
5 8 6 1
8 3 5 2

Algorithm for Number series row and column

I am trying to create a simple algorithm in Dart, but I think the programming language doesn't matter; it is more about the algorithm.
I am trying to make two lists of pairs of numbers depending on "row" and "column", for example:
col_1 | col_2
1 | 2
3 | 4
5 | 6
7 | 8
9 | 10
I need an algorithm that gives me two lists of numbers:
first list: 2,3,6,7,10...
second list: 4,5,8,9...
But this must also work when the number of "columns" changes, like this:
col_1 | col_2 | col_3
1 | 2 | 3
4 | 5 | 6
7 | 8 | 9
This time the first list must be:
3,4,9...
and the second list:
6,7...
Does anyone have an idea how I could achieve this with a simple calculation, or an algorithm that depends on the "maximum" number of values?
You're probably looking for the modulo operator %.
It directly creates a repeating series based on the remainder of dividing by the value.
For example, you could place your values into the Nth list in a list of lists (whew):
>>> cols = [[],[],[]]
>>> for x in range(12):
...     cols[x % 3].append(x)
...
>>> cols
[[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
0%5 --> 0
1%5 --> 1
...
11%5 --> 1
To adjust the values when they are offset (for example, numbered from 1 instead of 0), you can do a little math:
>>> for x in range(12):
...     print("x={} is in column {}".format(x + 1, (x % 3) + 1 ))
...
x=1 is in column 1
x=2 is in column 2
x=3 is in column 3
x=4 is in column 1
x=5 is in column 2
x=6 is in column 3
x=7 is in column 1
x=8 is in column 2
x=9 is in column 3
x=10 is in column 1
x=11 is in column 2
x=12 is in column 3
Note that you may need to do some extra work to either:
know how many values you need to fill all the columns
count and stop when you have enough rows
Examples are in Python because I'm most familiar with it.
Note that structures like [[],[],[]] should largely be avoided if a better collection is available (perhaps a dict of lists keyed by column name, or a Pandas DataFrame), but they make for a good illustration.
After a little discussion, https://oeis.org/A042964 seems to be the solution; it suggests the following test for set membership:
binomial(m+2, m) mod 2 = 0
in Python, this could be
>>> from scipy.special import binom # 3rd party math library
>>> for m in range(1, 15):
...     print("{} belongs in list {}".format(m, 1 if binom(m+2, m) % 2 == 0 else 2))
...
1 belongs in list 2
2 belongs in list 1
3 belongs in list 1
4 belongs in list 2
5 belongs in list 2
6 belongs in list 1
7 belongs in list 1
8 belongs in list 2
9 belongs in list 2
10 belongs in list 1
11 belongs in list 1
12 belongs in list 2
13 belongs in list 2
14 belongs in list 1
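If SciPy is not available, the same membership test can be written with the standard library's math.comb (Python 3.8+). Since binomial(m+2, m) = (m+2)(m+1)/2, it is even exactly when m % 4 is 2 or 3, which gives an even simpler test. A minimal sketch (in_first_list is a hypothetical name):

```python
from math import comb


def in_first_list(m):
    # Same test as above: binomial(m+2, m) is even.
    return comb(m + 2, m) % 2 == 0


first_list = [m for m in range(1, 15) if in_first_list(m)]
# The parity test is equivalent to checking m % 4 in (2, 3).
simpler = [m for m in range(1, 15) if m % 4 in (2, 3)]
print(first_list)
print(first_list == simpler)
```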
If only the first list is desired, entries can be directly generated by the formula a(n)=2*n+2-n%2 (refer to PROG on OEIS page), rather than testing values for membership
>>> a = lambda n: 2*n+2-n%2
>>> [a(n) for n in range(10)]
[2, 3, 6, 7, 10, 11, 14, 15, 18, 19]
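The OEIS sequence covers the two-column case. Reading the question's examples, each pair straddles a row boundary: (k*cols, k*cols + 1) for k = 1, 2, ..., with odd k going to the first list and even k to the second. The following is a minimal Python sketch of that reading, which is our generalization rather than part of the answer; split_pairs and max_n are hypothetical names, and values above max_n are simply dropped:

```python
def split_pairs(cols, max_n):
    """Split row-boundary pairs of a grid into two alternating lists.

    Assumes the numbers 1..max_n are laid out row by row in `cols` columns,
    so each pair is (last number of a row, first number of the next row).
    """
    first, second = [], []
    k = 1
    while k * cols <= max_n:
        pair = [n for n in (k * cols, k * cols + 1) if n <= max_n]
        # Odd-numbered row boundaries go to the first list, even ones to the second.
        (first if k % 2 == 1 else second).extend(pair)
        k += 1
    return first, second


print(split_pairs(2, 10))
print(split_pairs(3, 9))
```

For two columns the first list matches the OEIS formula above (numbers congruent to 2 or 3 mod 4).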

Recursive CTE - Updating nodes in SQL Server

I have some main tests. Each main test consists of other tests, each of which consists of further tests, and so on. See the trees below as an example.
Main Test 1 (ID:1)
|-- Test ID:2 (+)
|   |-- Test ID:5 (+)
|   |   `-- Test ID:12 (o)
|   `-- Test ID:6 (o)
|-- Test ID:3 (o)
`-- Test ID:4 (+)
    |-- Test ID:7 (+)
    |   |-- Test ID:9 (o)
    |   `-- Test ID:10 (o)
    `-- Test ID:8 (o)

Main Test 2 (ID:2)
|-- Test ID:3 (+)
|   |-- Test ID:5 (o)
|   |-- Test ID:10 (o)
|   `-- Test ID:7 (o)
`-- Test ID:8 (+)
Symbols:
'o' marks a leaf
'+' marks a parent
Main Test 1 and Main Test 2 are main tests (root tests).
Within each main test, test IDs are unique, but the same test ID can appear in another main test, as the trees above show.
I have an input table, let's say "INPUT", with the following columns:
ID_MainTest | ID_TEST | PASSED
With this input table we indicate which tests for each main test are passed.
We also have another table that represents the trees above, let's say table "TREES":
ID_MainTest | ID_TEST | PARENT_ID_TEST
Finally, we have another table, let's say "TESTS", which contains all the tests and the current result (PENDING, FAILED, PASSED) of each one:
ID_MainTest | ID_TEST | RESULT
Suppose the tables contain the following:
INPUT table (ID_MainTest and ID_TEST form the primary key):
ID_MainTest | ID_TEST | PASSED
1 4 1
1 5 1
1 6 1
1 2 1
1 3 1
2 3 1
TREES table (ID_MainTest and ID_TEST form the primary key):
ID_MainTest | ID_TEST | PARENT_ID_TEST
1 2 NULL
1 3 NULL
1 4 NULL
1 5 2
1 6 2
1 7 4
1 8 4
1 12 5
1 9 7
1 10 7
2 3 NULL
2 8 NULL
2 5 3
2 10 3
2 7 3
TESTS table (ID_MainTest and ID_TEST form the primary key):
ID_MainTest | ID_TEST | RESULT
1 2 PENDING
1 3 FAILED
1 4 FAILED
1 5 PASSED
1 6 PENDING
1 7 PASSED
1 8 FAILED
1 12 PASSED
1 9 PASSED
1 10 PENDING
2 3 PENDING
2 8 FAILED
2 5 PASSED
2 10 PENDING
2 7 PENDING
The required behavior is the following:
A test listed in the input table is switched to PASSED if and only if all its children are PASSED. If any of its children (or descendants) is FAILED, the parent is set to FAILED, even though it is marked as passed in the input table.
If a test is marked as passed in the input table, all its children (and descendants) are switched to PASSED, from the parent down to the leaves, where possible: a child (or descendant) may only be switched to PASSED if it is PENDING. If it is FAILED, it cannot be switched to PASSED and stays FAILED. If it is already PASSED, it simply stays PASSED.
A parent marked as passed in the input table can be switched to PASSED if all its descendants are PASSED, regardless of whether the parent itself is FAILED or PENDING in the TESTS table (this is an exception).
Taking into account the behavior and table contents above, I would like to obtain the result table below. It should contain only the tests we have tried to switch to PASSED (successfully or not), i.e. those switched to PASSED or kept as FAILED or PASSED, including those listed in the input table:
Result table (ID_MainTest and ID_TEST form the primary key):
ID_MainTest | ID_TEST | RESULT
1 2 PASSED
1 3 PASSED
1 4 FAILED
1 5 PASSED
1 6 PASSED
1 7 PASSED
1 8 FAILED
1 12 PASSED
1 9 PASSED
1 10 PASSED
2 3 PASSED
2 5 PASSED
2 10 PASSED
2 7 PASSED
I provide the initial tables below:
DECLARE @INPUT AS TABLE
(
    ID_MainTest int,
    ID_TEST int,
    PASSED bit
)
INSERT INTO @INPUT VALUES
(1, 4, 1),
(1, 5, 1),
(1, 6, 1),
(1, 2, 1),
(1, 3, 1),
(2, 3, 1)
DECLARE @TREES AS TABLE
(
    ID_MainTest int,
    ID_TEST int,
    PARENT_ID_TEST int
)
INSERT INTO @TREES VALUES
(1, 2, NULL),
(1, 3, NULL),
(1, 4, NULL),
(1, 5, 2),
(1, 6, 2),
(1, 7, 4),
(1, 8, 4),
(1, 12, 5),
(1, 9, 7),
(1, 10, 7),
(2, 3, NULL),
(2, 8, NULL),
(2, 5, 3),
(2, 10, 3),
(2, 7, 3)
DECLARE @TESTS AS TABLE
(
    ID_MainTest int,
    ID_TEST int,
    RESULT NVARCHAR(50)
)
INSERT INTO @TESTS VALUES
(1, 2, 'PENDING'),
(1, 3, 'FAILED'),
(1, 4, 'FAILED'),
(1, 5, 'PASSED'),
(1, 6, 'PENDING'),
(1, 7, 'PASSED'),
(1, 8, 'FAILED'),
(1, 12, 'PASSED'),
(1, 9, 'PASSED'),
(1, 10, 'PENDING'),
(2, 3, 'PENDING'),
(2, 8, 'FAILED'),
(2, 5, 'PASSED'),
(2, 10, 'PENDING'),
(2, 7, 'PENDING')
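The rules are easier to check against the expected result with a small model. Below is a minimal Python sketch (not the recursive CTE being asked for) of one reading of them, using the sample data: (1) from every test in INPUT, PENDING descendants are switched to PASSED while FAILED and PASSED ones are kept; (2) each INPUT test then becomes FAILED if any descendant is FAILED, or PASSED if all descendants are PASSED. Processing nested input tests deepest first, and leaving an input test unchanged if it still has PENDING descendants, are our assumptions; apply_input is a hypothetical name. On the sample data this reproduces the expected result table above.

```python
from collections import defaultdict

# Sample data from the question. TREES rows are (ID_MainTest, ID_TEST, PARENT_ID_TEST).
TREES = [
    (1, 2, None), (1, 3, None), (1, 4, None), (1, 5, 2), (1, 6, 2),
    (1, 7, 4), (1, 8, 4), (1, 12, 5), (1, 9, 7), (1, 10, 7),
    (2, 3, None), (2, 8, None), (2, 5, 3), (2, 10, 3), (2, 7, 3),
]
# TESTS: (ID_MainTest, ID_TEST) -> RESULT
TESTS = {
    (1, 2): "PENDING", (1, 3): "FAILED", (1, 4): "FAILED", (1, 5): "PASSED",
    (1, 6): "PENDING", (1, 7): "PASSED", (1, 8): "FAILED", (1, 12): "PASSED",
    (1, 9): "PASSED", (1, 10): "PENDING", (2, 3): "PENDING", (2, 8): "FAILED",
    (2, 5): "PASSED", (2, 10): "PENDING", (2, 7): "PENDING",
}
# INPUT rows with PASSED = 1, as (ID_MainTest, ID_TEST).
INPUT = [(1, 4), (1, 5), (1, 6), (1, 2), (1, 3), (2, 3)]


def apply_input(trees, tests, passed_inputs):
    parent, children = {}, defaultdict(list)
    for main, test, par in trees:
        parent[(main, test)] = None if par is None else (main, par)
        if par is not None:
            children[(main, par)].append((main, test))

    def descendants(node):
        # Depth-first walk of everything below `node` (what a recursive CTE would expand).
        for child in children[node]:
            yield child
            yield from descendants(child)

    def depth(node):
        level = 0
        while parent[node] is not None:
            node, level = parent[node], level + 1
        return level

    result, touched = dict(tests), set()
    # Rule 1: push PASSED down from each input test; only PENDING tests change.
    for node in passed_inputs:
        touched.add(node)
        for desc in descendants(node):
            touched.add(desc)
            if result[desc] == "PENDING":
                result[desc] = "PASSED"
    # Rule 2: settle the input tests themselves, deepest first, so a nested
    # input test is decided before its ancestors inspect it.
    for node in sorted(passed_inputs, key=depth, reverse=True):
        below = [result[desc] for desc in descendants(node)]
        if "FAILED" in below:
            result[node] = "FAILED"
        elif all(r == "PASSED" for r in below):
            # The exception from the question: allowed even if it was FAILED or PENDING.
            result[node] = "PASSED"
    # Return only the tests we tried to switch.
    return {node: result[node] for node in sorted(touched)}


for (main, test), status in apply_input(TREES, TESTS, INPUT).items():
    print(main, test, status)
```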

How do you fill-in missing values despite differences in index values?

Here's my situation: I have predicted values in the form of an array (e.g. [1,3,1,2,3,...3]) and a DataFrame column with missing values (NAs). The array and the column of NAs have the same dimensions, but their indices don't match one another.
For instance, the indices of the predicted array are 0:100.
On the other hand, the indices of the NAs in the column don't begin at 0; they begin at the first index where an NA is observed in the DataFrame.
Which pandas function will fill in the first missing value with the first element of the predicted array, the second missing value with the second element, and so forth?
Assuming your missing data is represented in the DF as NaN/None values:
import pandas as pd

df = pd.DataFrame({'col1': [2, 3, 4, 5, 7, 6, 5], 'col2': [2, 3, None, 5, None, None, 5]})  # Column 2 has missing values
pred_vals = [11, 22, 33]  # Predicted values to be inserted in place of the missing values
print('Original:')
print(df)
missing = df[pd.isnull(df['col2'])].index  # Find indices of missing values
df.loc[missing, 'col2'] = pred_vals  # Replace missing values in order
print('\nFilled:')
print(df)
Result:
Original:
col1 col2
0 2 2
1 3 3
2 4 NaN
3 5 5
4 7 NaN
5 6 NaN
6 5 5
Filled:
col1 col2
0 2 2
1 3 3
2 4 11
3 5 5
4 7 22
5 6 33
6 5 5
