Need to shorten the SQL Query - snowflake-cloud-data-platform

Query #1:
SELECT
id,
SUM(spent_time) AS spent_time_01,
COUNT(vpno) AS spend_time_cnt_01
FROM
(SELECT
LEAD(create_ts) OVER (PARTITION BY id ORDER BY CAST(vpno AS INT)) AS lead,
DATEDIFF('second', create_ts::timestamp, LEAD::timestamp) AS spent_time,
*
FROM
table_1) A
WHERE
item_category = 'A'
AND pid IS NOT NULL
GROUP BY
id
Query #2:
SELECT
id,
SUM(spent_time) AS spent_time_02,
COUNT(vpno) AS spend_time_cnt_02
FROM
(SELECT
LEAD(create_ts) OVER (PARTITION BY id ORDER BY CAST(vpno AS INT)) AS lead,
DATEDIFF('second', create_ts::timestamp, LEAD::timestamp) AS spent_time,
*
FROM
table_1) A
WHERE
item_category = 'B'
AND pid IS NOT NULL
GROUP BY
id
I tried the below but I'm not getting the correct results. I think I'm missing something in the query.
SUM(CASE WHEN pid IS NOT NULL
AND vpno IS NOT NULL
AND item_category = 'A' THEN spent_time END) AS spent_time_01
,COUNT_IF(vpno IS NOT NULL
AND item_category = 'A'
AND pid IS NOT NULL) AS spend_time_cnt
Someone kindly guide me on this.

Common part to be turn into CTE and item_category moved as conditional aggregation:
WITH cte AS (
SELECT *,
LEAD(create_ts) OVER (PARTITION BY id ORDER BY vpno::INT) AS lead,
DATEDIFF('second', create_ts::timestamp, LEAD::timestamp) AS spent_time
FROM table_1
)
SELECT id,
SUM(CASE WHEN item_category = 'A' THEN spent_time END) AS spent_time_01,
COUNT(CASE WHEN item_category = 'A' THEN vpno END) AS spend_time_cnt_01,
SUM(CASE WHEN item_category = 'B' THEN spent_time END) AS spent_time_02,
COUNT(CASE WHEN item_category = 'B' THEN vpno END) AS spend_time_cnt_02
FROM cte
WHERE pid IS NOT NULL
AND item_category IN ('A', 'B')
GROUP BY id;

Well if we are trying for Code Golf (smaller code) Lukasz answer can be rewritten a couple line less, and smaller by moving the lead value into the only place it is used, and not have a CTE as it's only used once, and just have it as a subselect. Moving the WHERE filters into the select, and swapping from CASE to IFF. But these are points I have made before..
Also by only asking for the columns you want in the sub-select, the compile can be faster on cold meta data. but those are details that don't matter here.
SELECT id,
SUM(iff(item_category = 'A', spent_time, null)) AS spent_time_01,
COUNT(iff(item_category = 'A', vpno, null)) AS spend_time_cnt_01,
SUM(iff(item_category = 'B', spent_time, null)) AS spent_time_02,
COUNT(iff(item_category = 'B', vpno, null)) AS spend_time_cnt_02
FROM (
SELECT id, item_category,
DATEDIFF('second', create_ts::timestamp, (LEAD(create_ts) OVER (PARTITION BY id ORDER BY vpno::INT))::timestamp) AS spent_time
FROM table_1
WHERE pid IS NOT NULL
AND item_category IN ('A', 'B')
)
GROUP BY id;
But I do tend to prefer CTE for readability and I don't like over long lines, so even I will give my points vote to Lukasz's answer.

Related

Average day gap in between a repeat order for each product

Can someone please help me to find the average time between first and second purchase on a product level.
This is what I have written -
Select A.CustomerId,A.ProductId , A.OrderSequence, (Case WHEN OrderSequence = 1 THEN OrderDate END) AS First_Order_Date,
MAX(Case WHEN OrderSequence = 2 THEN OrderDate END) AS Second_Order_Date
From
(
Select t.CustomerId, t.ProductId, t.OrderDate,
Dense_RANK() OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate Asc) as OrderSequence
From Transactions t (NOLOCK)
Where t.SiteKey = 01
Group by t.CustomerId, t.ProductId, t.OrderDate)
A
Where A.OrderSequence IN (1,2)
Group By A.Customer_Id, A.ProductId, A.OrderSequence, A.OrderDate
Sample Data:
It looks like row-numbering and LEAD should do the trick for you here.
Don't use NOLOCK unless you really know what you're doing
It's unclear if you want the results to be partitioned by CustomerId also. If not, you can remove it everywhere in the query
SELECT
A.CustomerId,
A.ProductId,
AVG(DATEDIFF(day, OrderDate, NextOrderDate))
FROM
(
SELECT
t.CustomerId,
t.ProductId,
t.OrderDate,
ROW_NUMBER() OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate) AS rn,
LEAD(OrderDate) OVER (PARTITION BY t.CustomerId, t.ProductId ORDER BY OrderDate) AS NextOrderDate
FROM Transactions t
WHERE t.SiteKey = '01'
) t
WHERE t.rn = 1
GROUP BY
t.Customer_Id,
t.ProductId;

TSQL results in specific number of columns

I'm trying to create a query that will distribute the row results in specific number of columns (ex. 4). See below screenshot.
It is presentation matter and it should be handled on application level.
But if you insist:
SELECT column1 = MIN(CASE WHEN grp=1 THEN PartNumber+CHAR(13)+SerialNumber END)
,column2 = MIN(CASE WHEN grp=2 THEN PartNumber+CHAR(13)+SerialNumber END)
,column3 = MIN(CASE WHEN grp=3 THEN PartNumber+CHAR(13)+SerialNumber END)
,column4 = MIN(CASE WHEN grp=4 THEN PartNumber+CHAR(13)+SerialNumber END)
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY rn ORDER BY rn) AS grp
FROM (SELECT *,(ROW_NUMBER() OVER(ORDER BY 1/0)-1)/4 AS rn FROM tab)s)sub
GROUP BY rn;
DBFiddle Demo

Get running balance of a work center using partition

I am writing a script that will run on SQL Server 2014.
I have a table of transactions recording transfers from one work center to another. The simplified table is below:
DECLARE #transactionTable TABLE (wono varchar(10),transferDate date
,fromWC varchar(10),toWC varchar(10),qty float)
INSERT INTO #transactionTable
SELECT '0000000123','5/10/2018','STAG','PP-B',10
UNION
SELECT '0000000123','5/11/2018','PP-B','PP-T',5
UNION
SELECT '0000000123','5/11/2018','PP-T','TEST',3
UNION
SELECT '0000000123','5/12/2018','PP-B','PP-T',5
UNION
SELECT '0000000123','5/12/2018','PP-T','TEST',5
UNION
SELECT '0000000123','5/13/2018','PP-T','TEST',2
UNION
SELECT '0000000123','5/13/2018','TEST','FGI',8
UNION
SELECT '0000000123','5/14/2018','TEST','FGI',2
SELECT *,
fromTotal = -SUM(qty) OVER(PARTITION BY fromWC ORDER BY wono, transferdate, fromWC),
toTotal = SUM(qty) OVER(PARTITION BY toWC ORDER BY wono, transferdate, toWC)
FROM #transactionTable
ORDER BY wono, transferDate, fromWC
I want to get a running balance of the fromWC and toWC after each transaction.
Given the records above, the end result should be this:
I believe it is possible to use SUM(qty) OVER(PARTITION BY..., but I am not sure how to write the statement. When I try to get the increase and decrease, each line always results in 0.
How do I write the SUM statement to achieve the desired results?
UPDATE
This image shows each transaction, the resulting WC qty, and highlights the corresponding from and to work centers for each transaction.
For example, looking at the second record on 5/11, 3 were transferred from PP-T to TEST. After the transaction, there were 5 in PP-B, 2 in PP-T, and 3 in TEST.
I can get close, except starting balances:
SELECT wono, transferDate, fromWC, toWC, qty,
SUM( CASE WHEN WC = fromWC THEN RunningTotal ELSE 0 END ) AS FromQTY,
SUM( CASE WHEN WC = toWC THEN RunningTotal ELSE 0 END ) AS ToQTY
FROM( -- b
SELECT *, SUM(Newqty) OVER(PARTITION BY WC ORDER BY wono,transferdate, fromWC, toWC) AS RunningTotal
FROM(-- a
SELECT wono, transferDate, fromWC, toWC, fromWC AS WC, qty, -qty AS Newqty, 'From' AS RecType
FROM #transactionTable
UNION ALL
SELECT wono, transferDate, fromWC, toWC, toWC AS WC, qty, qty AS Newqty, 'To' AS RecType
FROM #transactionTable
) AS a
) AS b
GROUP BY wono, transferDate, fromWC, toWC, qty
My logic assumes that all balances start at 0, therefore "STAG" balance will be -10.
How the query works:
"Unpivot" the input record set into "From" and "To" records with quantities negated for "From" records.
Calculate running totals for each "WC".
Combine "Unpivoted" records back into original shape
Solution 2
WITH CTE
AS(
SELECT *,
ROW_NUMBER() OVER( ORDER BY wono, transferDate, fromWC, toWC ) AS Sequence
FROM #transactionTable
),
CTE2
AS(
SELECT *,
fromTotal = -SUM(qty) OVER(PARTITION BY fromWC ORDER BY Sequence),
toTotal = SUM(qty) OVER(PARTITION BY toWC ORDER BY Sequence)
FROM CTE
)
SELECT a.Sequence, b.Sequence, c.Sequence, a.wono, a.transferDate, a.fromWC, a.toWC, a.qty, a.fromTotal + ISNULL( b.toTotal, 0 ) AS FromTotal, a.toTotal + ISNULL( c.fromTotal, 0 ) AS ToTotal
FROM CTE2 AS a
OUTER APPLY( SELECT TOP 1 * FROM CTE2 WHERE wono = a.wono AND Sequence < a.Sequence AND toWC = a.fromWC ORDER BY Sequence DESC ) AS b
OUTER APPLY( SELECT TOP 1 * FROM CTE2 WHERE wono = a.wono AND Sequence < a.Sequence AND fromWC = a.toWC ORDER BY Sequence DESC ) AS c
ORDER BY a.Sequence
Note: This solution would benefit greatly from an "ID" column, that mirrors transaction order OR at least you will need an index on wono, transferDate, fromWC, toWC

Fill the gaps, event based

I am trying to calculate churn of customers based on activity they could have done, opposed to churn by date that is the normal thing. We have events that is connected to a specific host, in my example all events are hosted by Alice but it could be different hosts.
All the people that follow a specific event should be placed in a category (new, active, churned and resurrected).
New: First time a person follow an event from the specific host.
Active: Follow again (and last event from specific host was also followed).
Churned: Follower had a chance to follow but didn't.
Resurrected: Follower that has churned has started to follow a previously followed host.
declare #events table (event varchar(50), host varchar(50), date date)
declare #eventFollows table (event varchar(50), follower varchar(50))
insert into #events values ('e_1', 'Alice', GETDATE())
insert into #events values ('e_2', 'Alice', GETDATE())
insert into #events values ('e_3', 'Alice', GETDATE())
insert into #events values ('e_4', 'Alice', GETDATE())
insert into #events values ('e_5', 'Alice', GETDATE())
insert into #eventFollows values ('e_1', 'Bob') --new
insert into #eventFollows values ('e_2', 'Bob') --active
--Bob churned
insert into #eventFollows values ('e_4', 'Megan') --new
insert into #eventFollows values ('e_5', 'Bob') --resurrected
insert into #eventFollows values ('e_5', 'Megan') --active
select * from #events
select * from #eventFollows
The expected outcome should be something like this
select 'e_1', 1 as New, 0 as resurrected, 0 as active, 0 as churned --First time Bob follows Alice event
union all
select 'e_2', 0 as New, 0 as resurrected, 1 as active, 0 as churned --Bob follows the next event that Alice host (considered as Active)
union all
select 'e_3', 0 as New, 0 as resurrected, 0 as active, 1 as churned --Bob churns since he does not follow the next event
union all
select 'e_4', 1 as New, 0 as resurrected, 0 as active, 0 as churned --First time Megan follows Alice event
union all
select 'e_5', 0 as New, 1 as resurrected, 1 as active, 0 as churned --Second time (active) for Megan and Bob is resurrected
I started with a query of something like below, but the problem is that I don't get all the events that the followers did not follow (but could have followed).
select a.event, follower, date,
LAG (a.event,1) over (partition by a.host, ma.follower order by date) as lag,
LEAD (a.event,1) over (partition by a.host, ma.follower order by date) as lead,
LAG (a.event,1) over (partition by a.host order by date) as lagP,
LEAD (a.event,1) over (partition by a.host order by date) as leadP
from #events a left join #eventFollows ma on ma.event = a.event order by host, follower, date
Any ideas?
This may seem a bit of an indirect approach, but it's possible to detect islands by checking for gaps in the numbers:
;with nrsE as
(
select *, ROW_NUMBER() over (order by event) rnrE from #events
), nrs as
(
select f.*,host, rnrE, ROW_NUMBER() over (partition by f.follower, e.host order by f.event ) rnrF
from nrsE e
join #eventFollows f on f.event = e.event
), f as
(
select host, follower, min(rnrE) FirstE, max(rnrE) LastE, ROW_NUMBER() over (partition by follower, host order by rnrE - rnrF) SeqNr
from nrs
group by host, follower, rnrE - rnrF --difference between rnr-Event and rnr-Follower to detect gaps
), stat as --from the result above on there are several options. this example uses getting a 'status' and pivoting on it
(
select e.event, e.host, case when f.FirstE is null then 'No participants' when f.LastE = e.rnrE - 1 then 'Churned' when rnrE = f.FirstE then case when SeqNr = 1 then 'New' else 'Resurrected' end else 'Active' end Status
from nrsE e
left join f on e.rnrE between f.FirstE and f.LastE + 1 and e.host = f.host
)
select p.* from stat pivot(count(Status) for Status in ([New], [Resurrected], [Active], [Churned])) p
The last 2 steps could be simplified, but getting the 'Status' this way might be reusable for other scenarios
This matches your desired result
SELECT
X.event, X.host, X.date,
IsNew = SUM(CASE WHEN X.FirstFollowerEvent = X.event THEN 1 ELSE 0 END),
IsActive = SUM(CASE WHEN X.lagFollowerEvent = X.lagEvent THEN 1 ELSE 0 END),
IsChurned = SUM(CASE WHEN X.follower IS NULL THEN 1 ELSE 0 END),
IsResurrected = SUM(CASE WHEN X.lagFollowerEvent <> X.lagEvent AND X.FirstFollowerEvent IS NOT NULL THEN 1 ELSE 0 END)
FROM
(
select
a.event, a.host, ma.follower, a.date,
FIRST_VALUE(a.event) over (partition by a.host, ma.follower order by a.date, a.event) as FirstFollowerEvent,
LAG (a.event,1) over (partition by a.host, ma.follower order by a.date, a.event) as lagFollowerEvent,
LAG (a.event,1) over (partition by a.host order by a.date, a.event) as lagEvent
FROM
#events a
LEFT join
#eventFollows ma on a.event = ma.event
) X
GROUP BY
X.event, X.host, X.date
ORDER by
X.event, X.host, X.date

Help with finding the difference (delta) from a value returned from the last two records

I'm using MS SQL 2005 and I have created a CTE query to return values from the last two records. I then use this to find the delta of two figures returned. I have a working query of sorts but
I'm having problems getting anything other than the delta figure.
here is my query:
;with data as(
SELECT
NetObjectID,
RawStatus,
RowID,
rn
from(
SELECT
CustomPollerAssignmentID AS NetObjectID,
RawStatus,
RowID,
row_number() over(order by DateTime desc)as rn
FROM CustomPollerStatistics_Detail
WHERE
(CustomPollerAssignmentID='a87f531d-4842-4bb3-9d68-7fd118004356')
) x where rn<=2
)
SELECT
case when
max(case rn when 1 then RawStatus end) > max(case rn when 2 then RawStatus end)
then
max(case rn when 1 then RawStatus end) - max(case rn when 2 then RawStatus end)
else
max(case rn when 2 then RawStatus end) - max(case rn when 1 then RawStatus end)
end as Delta
from data having
(SELECT
case when
max(case rn when 1 then RawStatus end) > max(case rn when 2 then RawStatus end)
then
max(case rn when 1 then RawStatus end) - max(case rn when 2 then RawStatus end)
else
max(case rn when 2 then RawStatus end) - max(case rn when 1 then RawStatus end)
end
from data) >= 1
What I'm after is to get the Delta & NetObjectID returned. Each time I try, I get errors.
data.NetObjectID is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
If I try adding group by etc.. to the end of the query I get further error complaining about the word 'group'.
I'm relatively new to SQL and I am picking things up as I go. Any help would be gratefully received.
see if something like this will work.
;with data as
(
SELECT
NetObjectID,
RawStatus,
RowID,
rn
from
(
SELECT
CustomPollerAssignmentID AS NetObjectID,
RawStatus,
RowID,
row_number() over(order by DateTime desc)as rn
FROM CustomPollerStatistics_Detail
WHERE
(
CustomPollerAssignmentID='a87f531d-4842-4bb3-9d68-7fd118004356'
)
) x
where rn<=2
)
select
NetObjectID,
max(RawStatus)-min(RawStatus) as Delta
from data
group by NetObjectID
Sorry, I'm very new here and I'm unsure of CTE queries, however it looks like after you define Data, you are selecting case ... as Delta FROM.... Meaning you only have Delta in your select statement. Again, sorry if I'm way off base.

Resources