How would you rewrite this MySQL syntax using Snowflake?

ifnull(min(r.sold_date),IF('xxx' in ('xx','xxx','xxxx') and r.refill_status = 'Sold',r.refill_request_date,p.latest_sold_date)) as sold_date
The part I am having the most trouble with is min(r.sold_date). I need sold_date as part of the output, but I can't put it in the main select statement because you can't use an aggregate such as min or max in your group by.

If you have
a  b   c
1  10  100
2  10  101
3  20  102
4  20  103
and you want the min(c) for each value of b, but you also want all four rows returned, so that the output is
a  b   c    min_c
1  10  100  100
2  10  101  100
3  20  102  102
4  20  103  102
Then you want the window function version of MIN, which looks like:
SELECT
    a,
    b,
    c,
    MIN(c) OVER (PARTITION BY b) AS min_c
FROM VALUES
    (1,10,100),
    (2,10,101),
    (3,20,102),
    (4,20,103)
    v(a,b,c);
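If you want to try the windowed MIN without a Snowflake account, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption on my part, and it needs SQLite 3.25 or later for window functions), and the table name v is just borrowed from the alias in the query above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Snowflake (assumption:
# the bundled SQLite is 3.25+ so that window functions are available).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO v VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 101), (3, 20, 102), (4, 20, 103)],
)

# MIN(c) OVER (PARTITION BY b) computes the minimum per b
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT a, b, c,
           MIN(c) OVER (PARTITION BY b) AS min_c
    FROM v
    ORDER BY a
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row comes back, each carrying the minimum c of its own b partition.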

Related

Creating calculated variables in one datastep

Any help is much appreciated. Thanks
I would like to create a couple of variables with my transactional data
I am trying to create variables 'act_bal' and 'penalty' using amount, type and Op_Bal. The rules I have are:
For the first record, the id will have op_bal and it will be subtracted from the 'amount' for type=C and added if type=D to calculate act_bal.
For the second record onwards it is act_bal + amount for type=C and act_bal - amount for type=D.
I will add the penalty 10 only if the amount is >4 and the type=D. The id can have only two penalties.
Total penalty should be subtracted from the act_bal of the last record, which would become op_bal for the next day (e.g. for id 101, -178 - 20 = -198 will become the op_bal for 4/2/2019).
This is the data I have for two customers IDs 101 and 102 for two different dates (My actual dataset has the data for all the 30 days).
id   date      amount  type  Op_Bal
101  4/1/2019  50      C     100
101  4/1/2019  25      D
101  4/1/2019  75      D
101  4/1/2019  3       D
101  4/1/2019  75      D
101  4/1/2019  75      D
101  4/2/2019  100     C
101  4/2/2019  125     D
101  4/2/2019  150     D
102  4/1/2019  50      C     125
102  4/1/2019  125     C
102  4/2/2019  250     D
102  4/2/2019  10      D
The code I wrote is like this
data want;
set have;
by id date;
if first.id or first.date then do;
if first.id then do;
if type='C' then act_bal=Op_Bal - amount;
if type='D' then act_bal=Op_Bal + amount;
end;
else do;
retain act_bal;
if type='C' then act_bal=act_bal + amount;
if type='D' then act_bal=act_bal - amount;
if amount>4 and type='D' then do;
penalty=10;
end;
run;
I couldn't create a counter to cap the penalties at 2, and I could not subtract the total penalty amount from the act_bal of the last row. Could someone help me get the desired result? Thanks
id   date      amount  type  Op_Bal  act_bal  penalty
101  4/1/2019  50      C     200     150      0
101  4/1/2019  25      D             125      0
101  4/1/2019  150     D             -25      10
101  4/1/2019  75      D             -100     10
101  4/1/2019  3       D             -103     0
101  4/1/2019  75      D             -178     0
101  4/2/2019  100     C     -198    -98      0
101  4/2/2019  125     D             -223     10
101  4/2/2019  150     D             -373     10
102  4/1/2019  50      C     125     175      0
102  4/1/2019  125     C             300      0
102  4/2/2019  250     D             50       0
102  4/2/2019  10      D             40       0
A few tips:
You have the same code for incrementing act_bal in both the if and else blocks, so factor it out. Don't repeat yourself.
You can skip the retain statement if you use a sum statement.
Use a separate variable to keep track of the number of penalties triggered per day, but only apply the first two of them.
So, putting it all together:
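data want;
  set have;
  by id date;
  /* Carry the previous day's closing balance forward as the opening balance */
  if first.date and not first.id then op_bal = act_bal;
  /* Start each day from the opening balance with no penalties counted */
  if first.date then do;
    act_bal = op_bal;
    penalties = 0;
  end;
  /* Sum statements retain act_bal and penalties across rows automatically */
  if type='C' then act_bal + amount;
  if type='D' then act_bal + (-amount);
  if amount > 4 and type='D' then penalties + 1;
  /* On the last row of the day, deduct at most two penalties of 10 each */
  if last.date then act_bal + (-min(penalties,2) * 10);
run;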
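For anyone who wants to check the logic outside SAS, here is a rough Python sketch of the same steps: carry the balance forward per id, count qualifying debits per day, and deduct at most two penalties at the end of each day. The function name closing_balances and its parameters are my own invention, and it uses the input rows exactly as posted in the question, so the balances will not match the asker's desired table, which lists different amounts.

```python
from itertools import groupby

# Rows as (id, date, amount, type, op_bal), already sorted by id and date.
# op_bal is only present on the first row of each id, as in the question.
have = [
    (101, "4/1/2019", 50, "C", 100),
    (101, "4/1/2019", 25, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 3, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/1/2019", 75, "D", None),
    (101, "4/2/2019", 100, "C", None),
    (101, "4/2/2019", 125, "D", None),
    (101, "4/2/2019", 150, "D", None),
    (102, "4/1/2019", 50, "C", 125),
    (102, "4/1/2019", 125, "C", None),
    (102, "4/2/2019", 250, "D", None),
    (102, "4/2/2019", 10, "D", None),
]


def closing_balances(rows, penalty=10, max_penalties=2):
    """Return {(id, date): closing act_bal} (hypothetical helper)."""
    closing = {}
    act_bal = None
    prev_id = None
    for (cust_id, date), group in groupby(rows, key=lambda r: (r[0], r[1])):
        group = list(group)
        # First day of an id: start from the op_bal in the data.
        # Later days: keep the previous day's closing balance.
        if cust_id != prev_id:
            act_bal = group[0][4]
        penalties = 0
        for _, _, amount, kind, _ in group:
            act_bal += amount if kind == "C" else -amount
            if kind == "D" and amount > 4:
                penalties += 1
        # Deduct the capped penalties on the last row of the day.
        act_bal -= min(penalties, max_penalties) * penalty
        closing[(cust_id, date)] = act_bal
        prev_id = cust_id
    return closing


result = closing_balances(have)
for key, balance in result.items():
    print(key, balance)
```

The cap is applied once per day through min(penalties, max_penalties), which mirrors the last.date line in the data step.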

How do I subset data set with specific condition?

I have a dataset that has ages of children. I want to subset the variables that correspond to a certain bracket of age.
Basically, if I have a dataset
data =
names  age  marks  attendance
A      5    1      90
B      12   9      87
C      16   7      98
D      3    0      70
E      7    4      77
I want this:
df =
names  age (2-10)  marks  attendance
A      5           1      90
D      3           0      70
E      7           4      77
And similarly for the age bracket 11-16
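The question does not say which tool is being used, so the following is only a tool-agnostic sketch in plain Python of what the filter amounts to: keep the rows whose age falls inside an inclusive bracket. The helper name subset_by_age is made up for illustration.

```python
# The sample data from the question, one dict per row.
rows = [
    {"names": "A", "age": 5, "marks": 1, "attendance": 90},
    {"names": "B", "age": 12, "marks": 9, "attendance": 87},
    {"names": "C", "age": 16, "marks": 7, "attendance": 98},
    {"names": "D", "age": 3, "marks": 0, "attendance": 70},
    {"names": "E", "age": 7, "marks": 4, "attendance": 77},
]


def subset_by_age(data, low, high):
    """Keep rows whose age lies in the inclusive bracket [low, high]."""
    return [row for row in data if low <= row["age"] <= high]


young = subset_by_age(rows, 2, 10)   # ages 2-10
older = subset_by_age(rows, 11, 16)  # ages 11-16

for row in young:
    print(row)
```

The same two-sided comparison carries over to a WHERE clause or a data frame filter in whatever tool you actually use.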

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the previously calculated values, SUM(R_{t-1}), is required as input. The calculation is done grouped over a ProjectID column. I have no clue how to proceed.
The formula for the calculation I am trying to achieve is R_t = ([ContractValue]_t - SUM(R_{t-1})) / [RemainingHours]_t * [HoursRegistered]_t, where "t" denotes time and SUM(R_{t-1}) is the sum of R from time 0 to t-1.
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across ProjectIDs, e.g. one project may have t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period will never "jump" from 5 to 7.
The expected output for ProjectID 101 (using the data below) is
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
My question is how to formulate this in a SQL query. My first thought was to create a recursive CTE, but I am not sure it is the right way to proceed. A recursive CTE is (from my understanding) made for handling hierarchical structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) using window functions, i.e. SUM OVER (PARTITION BY ... ORDER BY ...) with a LAG, but the recursiveness really gives me trouble and I keep running my head against the wall when I try.
Below is a query for creating the input data
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare to download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formula in the HTML representation shown when first clicking the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column, SumR, that holds the running sum of R per ProjectID, which the next row uses as the sum of the previous rows.
;with recu as
(
    select
        Time,
        ProjectId,
        ContractValue,
        ContractHours,
        HoursRegistered,
        RemainingHours,
        cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
        cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
    from
        InputForRecursiveCalculation
    where
        Time=0
    union all
    select
        input.Time,
        input.ProjectId,
        input.ContractValue,
        input.ContractHours,
        input.HoursRegistered,
        input.RemainingHours,
        cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
        cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
    from
        recu prev
    inner join
        InputForRecursiveCalculation input
        on input.ProjectId = prev.ProjectId
        and input.Time = prev.Time + 1
)
select
    *
from
    recu
order by
    ProjectID,
    Time
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000
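To sanity-check the recurrence independently of SQL Server, here is a small Python sketch that walks each project in time order and keeps a running sum of R. The function name compute_r is hypothetical, and unlike the query above it does not round to whole numbers at each step, so later values can differ from the table by a fraction.

```python
from collections import defaultdict

# (Time, ProjectID, ContractValue, ContractHours, HoursRegistered, RemainingHours)
rows = [
    (0, 101, 500000, 500, 65, 500),
    (1, 101, 500000, 500, 100, 435),
    (2, 101, 500000, 500, 85, 335),
    (3, 101, 500000, 500, 69, 250),
    (4, 101, 450000, 650, 100, 331),
    (5, 101, 450000, 650, 80, 231),
    (6, 101, 450000, 650, 90, 151),
    (7, 101, 450000, 650, 45, 61),
    (8, 101, 450000, 650, 16, 16),
    (0, 110, 120000, 90, 10, 90),
    (1, 110, 120000, 90, 10, 80),
    (2, 110, 130000, 90, 10, 70),
    (3, 110, 130000, 90, 10, 60),
    (4, 110, 130000, 90, 10, 50),
    (5, 110, 130000, 90, 10, 40),
    (6, 110, 130000, 90, 10, 30),
    (7, 110, 130000, 90, 10, 20),
    (8, 110, 130000, 90, 10, 10),
]


def compute_r(data):
    """Return {(ProjectID, Time): R_t} using a per-project running sum."""
    running = defaultdict(float)  # sum of R over earlier periods, per project
    out = {}
    # Process each project in time order so the running sum is always complete.
    for t, pid, value, _hours, registered, remaining in sorted(
        data, key=lambda r: (r[1], r[0])
    ):
        r = (value - running[pid]) / remaining * registered
        running[pid] += r
        out[(pid, t)] = r
    return out


r_values = compute_r(rows)
for (pid, t), r in r_values.items():
    print(pid, t, round(r))
```

The loop plays the role of the recursive member of the CTE: each step only needs the previous running sum, which is why SumR is carried along in the query.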

How to aggregate number of notes sent to each user?

Consider the following tables
group (obj_id here is user_id)
group_id obj_id role
--------------------------
100 1 A
100 2 root
100 3 B
100 4 C
notes
obj_id  ref_obj_id  note    note_id
-------------------------------------------
1       2                   10
1       3                   10
1       0           foobar  10
1       4                   20
1       2                   20
1       0           barbaz  20
2       0           caszes  30
2       1                   30
4       1                   70
4       0           taz     70
4       3                   70
Note: a note in the system can be assigned to multiple users (for instance: an admin could write "sent warning to 2 users" and link it to 2 user_ids). The first user the note gets linked to is stored differently than the other linked users. The note itself is linked to the first linked user only. Whenever group.obj_id = notes.obj_id then ref_obj_id = 0 and note <> null
I need to make an overview of the notes per user. Normally I would do this by joining on group.obj_id = notes.obj_id, but here this goes wrong because of ref_obj_id being 0 (in which case I should join on notes.obj_id).
There are 4 notes in this system (foobar, barbaz, caszes and taz).
The desired output is:
obj_id  user_is_primary  notes_primary  user_is_linked  notes_linked
-------------------------------------------------------------------
1       2                10;20          2               30;70
2       1                30             2               10;20
3       0                               2               10;70
4       1                70             1               20
How can I get to this aggregated result?
I hope that I was able to explain the situation clearly; perhaps it is my inexperience but I find the data model not the most straightforward.
Couldn't you simply put this in the ON clause of your join?
case when notes.ref_obj_id = 0 then notes.obj_id else notes.ref_obj_id end = group.obj_id
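To see what that CASE rule does to the sample data, here is a short Python sketch that applies the same choice of user per note row and then builds the overview. It is only an illustration of the matching rule, not a SQL solution, and the variable names (primary, linked, overview) are my own.

```python
from collections import defaultdict

# (obj_id, ref_obj_id, note, note_id) rows from the notes table.
notes = [
    (1, 2, None, 10),
    (1, 3, None, 10),
    (1, 0, "foobar", 10),
    (1, 4, None, 20),
    (1, 2, None, 20),
    (1, 0, "barbaz", 20),
    (2, 0, "caszes", 30),
    (2, 1, None, 30),
    (4, 1, None, 70),
    (4, 0, "taz", 70),
    (4, 3, None, 70),
]
group_members = [1, 2, 3, 4]  # obj_id values from the group table

primary = defaultdict(list)  # notes where the user is the first linked user
linked = defaultdict(list)   # notes where the user is an additional linked user
for obj_id, ref_obj_id, _note, note_id in notes:
    # Same rule as the CASE expression: ref_obj_id = 0 means obj_id is the user.
    user = obj_id if ref_obj_id == 0 else ref_obj_id
    (primary if ref_obj_id == 0 else linked)[user].append(note_id)

overview = {
    user: (
        len(primary[user]),
        ";".join(str(n) for n in sorted(primary[user])),
        len(linked[user]),
        ";".join(str(n) for n in sorted(linked[user])),
    )
    for user in group_members
}

for user, summary in overview.items():
    print(user, summary)
```

In SQL the two halves would typically become conditional aggregates over the joined rows, split on whether ref_obj_id is 0.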

T-SQL to sum total value instead of rejoining table multiple times

I've looked for an example question like this, I ask for grace if it's been answered (I thought it would have been but have a hard time finding meaningful results with the terms I searched.)
I work at a manufacturing plant where at every manufacturing operation a part is issued a new serial number. The database table I have to work with has the serial number recorded in the Container field and the previous serial number the part had recorded in the From_Container field.
I'm trying to SUM the Extended_Cost column on parts we've had to re-do operations on.
Here's a sample of data from tbl_Container:
Container  From_Container  Extended_Cost  Part_Key  Operation
10         9               10             PN_100    60
9          8               10             PN_100    50
8          7               10             PN_100    40
7          6               10             PN_100    30
6          5               10             PN_100    20
5          4               10             PN_100    50
4          3               10             PN_100    40
3          2               10             PN_100    30
2          1               10             PN_100    20
1          100             10             PN_100    10
In this example the SUM I would expect returned is 40, because operations 20, 30, 40 and 50 were all re-done and cost $10 each.
So far I've been able to do this by rejoining the table to itself 10 times using aliases in the following fashion:
LEFT OUTER JOIN tbl_Container AS FCP_1
ON tbl_Container.From_Container = FCP_1.Container
AND FCP_1.Operation <= tbl_Container.Operation
AND tbl_Container.Part_Key = FCP_1.Part_Key
And then using SUM to add the Extended_Cost fields together. However, I'm violating the DRY principle and there has got to be a better way.
Thank you in advance for your help,
Me
You can try this query.
;WITH CTE AS
(
    SELECT TOP 1 *, I = 0 FROM tbl_Container C ORDER BY Container
    UNION ALL
    SELECT T.*, I = I + 1 FROM CTE
    INNER JOIN tbl_Container T
        ON CTE.Container = T.From_Container
        AND CTE.Part_Key = T.Part_Key
)
SELECT Part_Key, SUM(T1.Extended_Cost) Sum_Extended_Cost FROM CTE T1
WHERE
    EXISTS( SELECT * FROM
        CTE T2 WHERE
        T1.Operation = T2.Operation
        AND T1.I > T2.I )
GROUP BY Part_Key
Result:
Part_Key Sum_Extended_Cost
---------- -----------------
PN_100 40
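The idea behind the recursive query can be sketched in Python as well: follow the From_Container links from the start of the chain and add up the cost of every row whose operation has already appeared earlier. This is only a sketch under a few assumptions of mine: each container has at most one successor per part, there are no cycles, and a chain starts at a row whose From_Container is not itself a Container (the query above instead starts from the lowest Container). The function name rework_cost is hypothetical.

```python
# (Container, From_Container, Extended_Cost, Part_Key, Operation)
rows = [
    (10, 9, 10, "PN_100", 60),
    (9, 8, 10, "PN_100", 50),
    (8, 7, 10, "PN_100", 40),
    (7, 6, 10, "PN_100", 30),
    (6, 5, 10, "PN_100", 20),
    (5, 4, 10, "PN_100", 50),
    (4, 3, 10, "PN_100", 40),
    (3, 2, 10, "PN_100", 30),
    (2, 1, 10, "PN_100", 20),
    (1, 100, 10, "PN_100", 10),
]


def rework_cost(data):
    """Sum Extended_Cost of rows whose Operation already occurred in the chain."""
    # Look up the next row in the chain by (Part_Key, From_Container).
    by_parent = {(r[3], r[1]): r for r in data}
    containers = {(r[3], r[0]) for r in data}
    totals = {}
    # Chain heads: rows whose From_Container is not a known Container.
    heads = [r for r in data if (r[3], r[1]) not in containers]
    for head in heads:
        seen_ops = set()
        row = head
        while row is not None:
            container, _parent, cost, part, operation = row
            if operation in seen_ops:
                # This operation was done before, so this row is a re-do.
                totals[part] = totals.get(part, 0) + cost
            seen_ops.add(operation)
            row = by_parent.get((part, container))
    return totals


totals = rework_cost(rows)
print(totals)
```

The seen_ops set does the job of the EXISTS subquery: a row counts only if the same operation occurred at an earlier step of the chain.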

Resources