Using both max and count aggregates in Snowflake


I have Snowflake table data as below:
RUNID  OBJECT_NAME  LAST_EXECUTION_DATETIME  EXECUTION_STATUS
1      HR_JOB       4/19/2022 22:58:48       SUCCESS
2      HR_JOB       4/19/2022 23:30:42       SUCCESS
3      HR_JOB       4/19/2022 23:32:23       SUCCESS
4      HR_JOB       4/19/2022 23:35:38       SUCCESS
4      HR_JOB       4/19/2022 23:35:38       FAILED
5      HR_JOB       4/19/2022 23:37:58       FAILED
My requirement is to get the max of LAST_EXECUTION_DATETIME where EXECUTION_STATUS is SUCCESS.
In the example above, the max date (RUNID 5) has a FAILED status and cannot be taken into account.
RUNID 4 has both a SUCCESS and a FAILED status, so it cannot be taken either.
In this scenario, the max date to take as the last execution is that of RUNID 3, since its status is SUCCESS.
I tried the following Snowflake queries but was not able to achieve the result.
select max(LAST_EXECUTION_DATETIME)
from
(
    select LAST_EXECUTION_DATETIME, count(*) dt_cnt
    from EXECUTION_CONTROL
    WHERE OBJECT_NAME = 'HR_JOB'
    group by LAST_EXECUTION_DATETIME
)
where dt_cnt = 1;
with cte AS
(
    select *, row_number() over (partition by LAST_EXECUTION_DATETIME order by execution_status desc) as rn
    from EXECUTION_CONTROL WHERE OBJECT_NAME = 'HR_JOB'
)
select count(*) as cnt, max(LAST_EXECUTION_DATETIME) as mxdt
from cte
group by LAST_EXECUTION_DATETIME

It seems like "all rows are part of the same batch", but I am going to assume that
LAST_EXECUTION_DATE is the "batch" and LAST_EXECUTION_DATETIME is the order within it.
select *
from (
    select *
    from EXECUTION_CONTROL
    where object_name = 'HR_JOB'
    qualify count(distinct execution_status)
        over (partition by runid, LAST_EXECUTION_DATE) = 1
)
where execution_status = 'SUCCESS'
qualify row_number()
    over (partition by LAST_EXECUTION_DATE order by LAST_EXECUTION_DATETIME desc) = 1;
The inner QUALIFY removes any RUNIDs that have more than one distinct status. If two SUCCESS rows or two FAILED rows for the same RUNID are possible and should also count as invalid, remove the DISTINCT.
Then we remove any row that is not a SUCCESS.
Then we rank the remaining rows of the same LAST_EXECUTION_DATE batch by LAST_EXECUTION_DATETIME and take the last valid one.
Thus, with your updated data:
with EXECUTION_CONTROL(runid, object_name, last_execution_datetime, last_exection_date, execution_status) as (
select column1, column2, to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss'), to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss')::date, column4 from values
(1,'HR_JOB','4/19/2022 22:58:48','SUCCESS'),
(2,'HR_JOB','4/19/2022 23:30:42','SUCCESS'),
(3,'HR_JOB','4/19/2022 23:32:23','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','FAILED'),
(5,'HR_JOB','4/19/2022 23:37:58','FAILED')
)
select *
from (
select *
from EXECUTION_CONTROL
where object_name = 'HR_JOB'
qualify count(distinct execution_status)
over (partition by runid, last_exection_date) = 1
)
where execution_status = 'SUCCESS'
qualify row_number()
over (partition by last_exection_date order by last_execution_datetime desc) = 1;
gives:
RUNID  OBJECT_NAME  LAST_EXECUTION_DATETIME  LAST_EXECTION_DATE  EXECUTION_STATUS
3      HR_JOB       2022-04-19 23:32:23.000  2022-04-19          SUCCESS
And if you don't have any batch ID (I was using LAST_EXECTION_DATE as one; sigh, a typo in the name), you can use:
with EXECUTION_CONTROL(runid, object_name, last_execution_datetime, execution_status) as (
select column1, column2, to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss'), column4 from values
(1,'HR_JOB','4/19/2022 22:58:48','SUCCESS'),
(2,'HR_JOB','4/19/2022 23:30:42','SUCCESS'),
(3,'HR_JOB','4/19/2022 23:32:23','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','FAILED'),
(5,'HR_JOB','4/19/2022 23:37:58','FAILED')
)
select *
from (
select *
from EXECUTION_CONTROL
where object_name = 'HR_JOB'
qualify count(distinct execution_status)
over (partition by runid) = 1
)
where execution_status = 'SUCCESS'
qualify row_number()
over (order by last_execution_datetime desc) = 1;
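The same rule (keep only runs whose rows are all SUCCESS, then take the latest one) can also be expressed with plain MAX and COUNT aggregates. What follows is a minimal sketch, not the Snowflake query above: it runs the logic in SQLite through Python's sqlite3 module so it can be tried without a Snowflake account. The ISO-formatted timestamps and the function name last_clean_success are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question; timestamps rewritten in ISO format
# (an assumption) so that string comparison matches chronological order.
rows = [
    (1, "HR_JOB", "2022-04-19 22:58:48", "SUCCESS"),
    (2, "HR_JOB", "2022-04-19 23:30:42", "SUCCESS"),
    (3, "HR_JOB", "2022-04-19 23:32:23", "SUCCESS"),
    (4, "HR_JOB", "2022-04-19 23:35:38", "SUCCESS"),
    (4, "HR_JOB", "2022-04-19 23:35:38", "FAILED"),
    (5, "HR_JOB", "2022-04-19 23:37:58", "FAILED"),
]


def last_clean_success(rows, object_name="HR_JOB"):
    """Return (runid, datetime) of the latest run that has only SUCCESS rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE execution_control ("
        "runid INTEGER, object_name TEXT, "
        "last_execution_datetime TEXT, execution_status TEXT)"
    )
    con.executemany("INSERT INTO execution_control VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT runid, MAX(last_execution_datetime) AS last_dt
        FROM execution_control
        WHERE object_name = ?
        GROUP BY runid
        -- exactly one distinct status, and that status is SUCCESS
        HAVING COUNT(DISTINCT execution_status) = 1
           AND MAX(execution_status) = 'SUCCESS'
        ORDER BY last_dt DESC
        LIMIT 1
        """,
        (object_name,),
    ).fetchone()
    con.close()
    return result


print(last_clean_success(rows))
```

For the sample data this selects RUNID 3: RUNID 4 is dropped because it has two distinct statuses, and RUNID 5 because its only status is FAILED.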

Related

Second server UNION ALL displays same output data from first query with equal timestamp

After running my UNION ALL query, the second query returns the same output data as the first, with identical timestamps. How can I get the correct output, given that the AREA2 server has different vendors and timestamps? Could the identical output be caused by the ORDER BY at the bottom of the query? I have tried the following query.
Current table data from both servers, AREA1 with AREA2.
QUERY
DECLARE @Invoice_Date SMALLINT;
SET @Invoice_Date = 2020;
SELECT DISTINCT 'AREA1' AS 'Server',
*
FROM (
SELECT Name,
Vendor,
Invoice_Date,
count(*) Count_InvoiceNo,
rank() OVER (
PARTITION BY Name ORDER BY count(*) DESC
) rn
FROM dbo.Invoices
WHERE Invoice_Date >= '2020-01-01'
GROUP BY Name,
Vendor,
Invoice_Date
) t
WHERE rn = 1
AND Invoice_Date >= DATEADD(MONTH, - 12, GETDATE())
UNION ALL
SELECT DISTINCT 'AREA2' AS 'Server',
*
FROM (
SELECT Name,
Vendor,
Invoice_Date,
count(*) Count_InvoiceNo,
rank() OVER (
PARTITION BY Name ORDER BY count(*) DESC
) rn
FROM dbo.Invoices
WHERE Invoice_Date >= '2020-01-01'
GROUP BY Name,
Vendor,
Invoice_Date
) t
WHERE rn = 1
AND Invoice_Date >= DATEADD(MONTH, - 12, GETDATE())
ORDER BY SERVER,
Invoice_Date

how to select last rows where one certain value exist but not if it's in between

I have this table, with case#, Linenumber and Code#.
case# Linenumber Code#
99L1HV 1 1510
99L1HV 2 4320
99PX58 1 1510
99PX58 2 4320
99PX58 3 4500
99PX59 1 1510
99PX59 2 918
99PX59 3 4320
How can I get the records with the last Linenumber per case# where Code# = 4320?
The output should be like this:
case# Linenumber Code
99L1HV 2 4320
99PX59 3 4320

Use ROW_NUMBER to number the rows of each case# in descending Linenumber order.
Then the last line of each case# has RN = 1, and the outer query keeps it only if its Code# is 4320.
SELECT [case#], Linenumber, [Code#]
FROM
(
SELECT [case#], Linenumber, [Code#],
ROW_NUMBER() OVER (PARTITION BY [case#] ORDER BY Linenumber DESC) AS RN
FROM yourtable
) q
WHERE RN = 1
AND [Code#] = 4320
ORDER BY [case#];
Or a more concise version, using TOP 1 WITH TIES in combination with an ORDER BY ROW_NUMBER:
SELECT *
FROM
(
SELECT TOP 1 WITH TIES [case#], Linenumber, [Code#]
FROM yourtable
ORDER BY ROW_NUMBER() OVER (PARTITION BY [case#] ORDER BY Linenumber DESC)
) q
WHERE [Code#] = 4320
ORDER BY [case#];
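The ROW_NUMBER approach can be tried out on the question's sample rows. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module rather than SQL Server, assumes a SQLite version with window functions (3.25 or later), and renames the columns to case_no and code_no (hypothetical names) to avoid the # character.

```python
import sqlite3

# Sample rows from the question: (case#, Linenumber, Code#).
rows = [
    ("99L1HV", 1, 1510),
    ("99L1HV", 2, 4320),
    ("99PX58", 1, 1510),
    ("99PX58", 2, 4320),
    ("99PX58", 3, 4500),
    ("99PX59", 1, 1510),
    ("99PX59", 2, 918),
    ("99PX59", 3, 4320),
]


def last_lines_with_code(rows, code):
    """Return the last line of each case, but only if it carries `code`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourtable (case_no TEXT, linenumber INTEGER, code_no INTEGER)"
    )
    con.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT case_no, linenumber, code_no
        FROM (
            SELECT case_no, linenumber, code_no,
                   -- rn = 1 marks the highest linenumber within each case
                   ROW_NUMBER() OVER (
                       PARTITION BY case_no ORDER BY linenumber DESC
                   ) AS rn
            FROM yourtable
        ) AS q
        WHERE rn = 1 AND code_no = ?
        ORDER BY case_no
        """,
        (code,),
    ).fetchall()
    con.close()
    return result


print(last_lines_with_code(rows, 4320))
```

Case 99PX58 is excluded because its 4320 line sits in between: its last line (3) has code 4500.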

The CTE generates a row number per case#, ordered by linenumber descending, so rn = 1 is the last row for each case#.
; with cte as
(
select *, rn = row_number() over (partition by [case#] order by linenumber desc)
from yourtable
)
select *
from cte
where rn = 1
and [code#] = 4320

declare @t table (
    CaseNumber varchar(10),
    LineNumber int,
    CodeNumber int
);
-- Filling the table with data, skipped
select t.*
from @t t
where t.CodeNumber = 4320
and not exists (
    select 0 from @t x
    where x.CaseNumber = t.CaseNumber
    and x.LineNumber > t.LineNumber
);

with cte as
(
    select [case#], max(linenumber) as linenumber
    from source_table
    group by [case#]
)
select t1.*
from source_table t1 inner join cte t2
    on t1.[case#] = t2.[case#] and t1.linenumber = t2.linenumber
where t1.[Code#] = 4320

Get running balance of a work center using partition

I am writing a script that will run on SQL Server 2014.
I have a table of transactions recording transfers from one work center to another. The simplified table is below:
DECLARE @transactionTable TABLE (wono varchar(10),transferDate date
,fromWC varchar(10),toWC varchar(10),qty float)
INSERT INTO @transactionTable
SELECT '0000000123','5/10/2018','STAG','PP-B',10
UNION
SELECT '0000000123','5/11/2018','PP-B','PP-T',5
UNION
SELECT '0000000123','5/11/2018','PP-T','TEST',3
UNION
SELECT '0000000123','5/12/2018','PP-B','PP-T',5
UNION
SELECT '0000000123','5/12/2018','PP-T','TEST',5
UNION
SELECT '0000000123','5/13/2018','PP-T','TEST',2
UNION
SELECT '0000000123','5/13/2018','TEST','FGI',8
UNION
SELECT '0000000123','5/14/2018','TEST','FGI',2
SELECT *,
fromTotal = -SUM(qty) OVER(PARTITION BY fromWC ORDER BY wono, transferdate, fromWC),
toTotal = SUM(qty) OVER(PARTITION BY toWC ORDER BY wono, transferdate, toWC)
FROM @transactionTable
ORDER BY wono, transferDate, fromWC
I want to get a running balance of the fromWC and toWC after each transaction.
Given the records above, the end result should be this:
I believe it is possible to use SUM(qty) OVER(PARTITION BY..., but I am not sure how to write the statement. When I try to get the increase and decrease, each line always results in 0.
How do I write the SUM statement to achieve the desired results?
UPDATE
This image shows each transaction, the resulting WC qty, and highlights the corresponding from and to work centers for each transaction.
For example, looking at the second record on 5/11, 3 were transferred from PP-T to TEST. After the transaction, there were 5 in PP-B, 2 in PP-T, and 3 in TEST.

I can get close, except starting balances:
SELECT wono, transferDate, fromWC, toWC, qty,
    SUM( CASE WHEN WC = fromWC THEN RunningTotal ELSE 0 END ) AS FromQTY,
    SUM( CASE WHEN WC = toWC THEN RunningTotal ELSE 0 END ) AS ToQTY
FROM( -- b
    SELECT *, SUM(Newqty) OVER(PARTITION BY WC ORDER BY wono,transferdate, fromWC, toWC) AS RunningTotal
    FROM( -- a
        SELECT wono, transferDate, fromWC, toWC, fromWC AS WC, qty, -qty AS Newqty, 'From' AS RecType
        FROM @transactionTable
        UNION ALL
        SELECT wono, transferDate, fromWC, toWC, toWC AS WC, qty, qty AS Newqty, 'To' AS RecType
        FROM @transactionTable
    ) AS a
) AS b
GROUP BY wono, transferDate, fromWC, toWC, qty
My logic assumes that all balances start at 0, therefore "STAG" balance will be -10.
How the query works:
"Unpivot" the input record set into "From" and "To" records with quantities negated for "From" records.
Calculate running totals for each "WC".
Combine "Unpivoted" records back into original shape
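The three steps above can be sketched end to end on the sample transfers. This is an illustrative sketch, not the SQL Server query itself: it runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module, stores qty as an integer, and assumes an explicit id column that records the transaction order. That id column is not in the original table.

```python
import sqlite3

# (id, wono, transferDate, fromWC, toWC, qty); id is an assumed ordering key.
rows = [
    (1, "0000000123", "2018-05-10", "STAG", "PP-B", 10),
    (2, "0000000123", "2018-05-11", "PP-B", "PP-T", 5),
    (3, "0000000123", "2018-05-11", "PP-T", "TEST", 3),
    (4, "0000000123", "2018-05-12", "PP-B", "PP-T", 5),
    (5, "0000000123", "2018-05-12", "PP-T", "TEST", 5),
    (6, "0000000123", "2018-05-13", "PP-T", "TEST", 2),
    (7, "0000000123", "2018-05-13", "TEST", "FGI", 8),
    (8, "0000000123", "2018-05-14", "TEST", "FGI", 2),
]


def running_balances(rows):
    """Return (id, fromWC, toWC, qty, from_balance, to_balance) per transfer."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transfers (id INTEGER PRIMARY KEY, wono TEXT, "
        "transferDate TEXT, fromWC TEXT, toWC TEXT, qty INTEGER)"
    )
    con.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH moves AS (
            -- step 1: unpivot, negating the quantity on the "from" side
            SELECT id, fromWC AS wc, -qty AS delta FROM transfers
            UNION ALL
            SELECT id, toWC AS wc, qty AS delta FROM transfers
        ),
        running AS (
            -- step 2: running total per work center, all balances start at 0
            SELECT id, wc,
                   SUM(delta) OVER (PARTITION BY wc ORDER BY id) AS balance
            FROM moves
        )
        -- step 3: fold the two unpivoted rows back into one row per transfer
        SELECT s.id, s.fromWC, s.toWC, s.qty,
               f.balance AS from_balance, t.balance AS to_balance
        FROM transfers AS s
        JOIN running AS f ON f.id = s.id AND f.wc = s.fromWC
        JOIN running AS t ON t.id = s.id AND t.wc = s.toWC
        ORDER BY s.id
        """
    ).fetchall()
    con.close()
    return result


for row in running_balances(rows):
    print(row)
```

With the sample data, the third row (PP-T to TEST, qty 3) shows 2 left in PP-T and 3 in TEST, which matches the example in the question, and STAG ends at -10 because every balance starts at 0.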
Solution 2
WITH CTE
AS(
SELECT *,
ROW_NUMBER() OVER( ORDER BY wono, transferDate, fromWC, toWC ) AS Sequence
FROM @transactionTable
),
CTE2
AS(
SELECT *,
fromTotal = -SUM(qty) OVER(PARTITION BY fromWC ORDER BY Sequence),
toTotal = SUM(qty) OVER(PARTITION BY toWC ORDER BY Sequence)
FROM CTE
)
SELECT a.Sequence, b.Sequence, c.Sequence, a.wono, a.transferDate, a.fromWC, a.toWC, a.qty, a.fromTotal + ISNULL( b.toTotal, 0 ) AS FromTotal, a.toTotal + ISNULL( c.fromTotal, 0 ) AS ToTotal
FROM CTE2 AS a
OUTER APPLY( SELECT TOP 1 * FROM CTE2 WHERE wono = a.wono AND Sequence < a.Sequence AND toWC = a.fromWC ORDER BY Sequence DESC ) AS b
OUTER APPLY( SELECT TOP 1 * FROM CTE2 WHERE wono = a.wono AND Sequence < a.Sequence AND fromWC = a.toWC ORDER BY Sequence DESC ) AS c
ORDER BY a.Sequence
Note: This solution would benefit greatly from an "ID" column that mirrors the transaction order; failing that, you will at least need an index on wono, transferDate, fromWC, toWC.

SQL Server 2008 R2 GROUP BY or OVER

I have this table:
ID COLOR TYPE DATE
-------------------------------
1 blue A 2012.02.05
2 white V 2010.10.23
3 white V 2014.03.05
4 black S 2013.02.14
I'd like to select only the ID, but for the 2nd and 3rd rows, which share the same COLOR and TYPE, I want only the 3rd row because it has the latest DATE value.
I have tried this query, but it returns both rows:
SELECT
ID, MAX(DATE) OVER(PARTITION BY COLOR, TYPE)
FROM
TABLE
WHERE
...
How can I select just one column value while I group the rows by other columns, please?

;WITH CTE AS
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY COLOR,[TYPE] ORDER BY [DATE] DESC) rn
FROM TableName
)
SELECT ID
,COLOR
,[TYPE]
,[DATE]
FROM CTE
WHERE rn = 1
Or, with a derived table instead of a CTE:
SELECT ID
,COLOR
,[TYPE]
,[DATE]
FROM
(
SELECT * , ROW_NUMBER() OVER (PARTITION BY COLOR,[TYPE] ORDER BY [DATE] DESC) rn
FROM TableName
) A
WHERE rn = 1
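To check the ROW_NUMBER pattern against the question's four rows, here is a small sketch that runs it in SQLite (3.25 or later) through Python's sqlite3 module instead of SQL Server 2008 R2. The table name items and the column names item_type and item_date are hypothetical stand-ins for TYPE and DATE.

```python
import sqlite3

# Sample rows from the question: (ID, COLOR, TYPE, DATE).
rows = [
    (1, "blue", "A", "2012.02.05"),
    (2, "white", "V", "2010.10.23"),
    (3, "white", "V", "2014.03.05"),
    (4, "black", "S", "2013.02.14"),
]


def latest_id_per_group(rows):
    """Return the ID of the newest row for each (color, type) pair."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE items (id INTEGER, color TEXT, item_type TEXT, item_date TEXT)"
    )
    con.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id
        FROM (
            SELECT id,
                   -- yyyy.mm.dd strings sort in chronological order
                   ROW_NUMBER() OVER (
                       PARTITION BY color, item_type ORDER BY item_date DESC
                   ) AS rn
            FROM items
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in result]


print(latest_id_per_group(rows))
```

Row 2 drops out because row 3 has the same COLOR and TYPE with a later date.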

SQL Server - Select top 2 rows

I'm attempting to write a query that will return:
The most recent AccountDate with a Record of 0 per LocationID.
Then the second most recent AccountDate per LocationID. The Record can be either 1 or 0.
If there are two rows with the same AccountDate, then return the most recent one based on DateAccountLoaded.
However, my solution doesn't look very elegant. Has anyone got a better way of achieving this?
Please see my solution below.
CREATE TABLE [dbo].[TopTwoKeyed](
ID INT IDENTITY(1,1) PRIMARY KEY(ID),
[LocationID] [int] NULL,
[AccountDate] [date] NULL,
[Record] [tinyint] NULL,
[DateAccountLoaded] [date] NULL
)
INSERT INTO [dbo].[TopTwoKeyed] (
[LocationID],
AccountDate,
Record,
DateAccountLoaded
)
VALUES(1,'2009-10-31',0,'2011-03-23'),
(1,'2008-10-31',1,'2011-03-23'),
(1,'2008-10-31',0,'2010-03-22'),
(1,'2008-10-31',1,'2009-03-23'),
(1,'2011-10-31',1,'2010-03-22'),
(1,'2009-10-31',0,'2010-03-23'),
(2,'2011-10-31',0,'2010-03-23'),
(2,'2010-10-31',0,'2010-03-23'),
(2,'2010-10-31',1,'2010-03-23'),
(2,'2010-10-31',1,'2009-03-23'),
(3,'2010-10-31',0,'2010-03-23'),
(3,'2009-10-31',0,'2010-03-23'),
(3,'2008-10-31',1,'2010-03-23')
-- Get the most recent Account Date per locationID which has a record type of 0
SELECT f.LocationID
,f.AccountDate
,f.DateAccountLoaded
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
,DateAccountLoaded AS DateAccountLoaded
FROM [dbo].[TopTwoKeyed]
WHERE Record = 0
) f
WHERE f.RowNumber = 1
UNION ALL
SELECT ff.LocationID
,ff.AccountDate
,ff.DateAccountLoaded
FROM (
-- Get the SECOND most recent AccountDate. Can be either Record 0 or 1.
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
,DateAccountLoaded 'DateAccountLoaded'
FROM [dbo].[TopTwoKeyed] tt
WHERE EXISTS
(
-- Same query as top of UNION. Get the most recent Account Date per locationID which has a record type of 0
SELECT 1
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
FROM [dbo].[TopTwoKeyed]
WHERE Record = 0
) f
WHERE f.RowNumber = 1
AND tt.LocationID = f.LocationID
AND tt.AccountDate < f.AccountDate
)
) ff
WHERE ff.RowNumber = 1
-- DROP TABLE [dbo].[TopTwoKeyed]

You could use a row_number subquery to find the most recent account date. Then you can outer apply to search for the next most recent account date:
select MostRecent.LocationID
, MostRecent.AccountDate
, SecondRecent.AccountDate
from (
select row_number() over (partition by LocationID order by
AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed
where Record = 0
) MostRecent
outer apply
(
select top 1 *
from TopTwoKeyed
where Record in (0,1)
and LocationID = MostRecent.LocationID
and AccountDate < MostRecent.AccountDate
order by
AccountDate desc
, DateAccountLoaded desc
) SecondRecent
where MostRecent.rn = 1
EDIT: To place the rows below each other, you probably have to use a union. A single row_number can't work because the second row has a different criterion for the Record column.
; with Rec0 as
(
select ROW_NUMBER() over (partition by LocationID
order by AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed
where Record = 0
)
, Rec01 as
(
select ROW_NUMBER() over (partition by LocationID
order by AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed t1
where Record in (0,1)
and not exists
(
select *
from Rec0 t2
where t2.rn = 1
and t1.LocationID = t2.LocationID
and t2.AccountDate <= t1.AccountDate
)
)
select *
from Rec0
where rn = 1
union all
select *
from Rec01
where rn = 1
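The two-step idea (first the newest Record = 0 row per location, then the newest row of any Record type with a strictly earlier AccountDate) can be tried on the question's sample data. The sketch below is an assumption-laden illustration: it uses SQLite (3.25 or later) via Python's sqlite3 module, snake_case column names, and a join where the answer uses NOT EXISTS, so locations without any Record = 0 row return nothing.

```python
import sqlite3

# (LocationID, AccountDate, Record, DateAccountLoaded) from the question.
rows = [
    (1, "2009-10-31", 0, "2011-03-23"), (1, "2008-10-31", 1, "2011-03-23"),
    (1, "2008-10-31", 0, "2010-03-22"), (1, "2008-10-31", 1, "2009-03-23"),
    (1, "2011-10-31", 1, "2010-03-22"), (1, "2009-10-31", 0, "2010-03-23"),
    (2, "2011-10-31", 0, "2010-03-23"), (2, "2010-10-31", 0, "2010-03-23"),
    (2, "2010-10-31", 1, "2010-03-23"), (2, "2010-10-31", 1, "2009-03-23"),
    (3, "2010-10-31", 0, "2010-03-23"), (3, "2009-10-31", 0, "2010-03-23"),
    (3, "2008-10-31", 1, "2010-03-23"),
]


def top_two_per_location(rows):
    """Return (location_id, account_date, date_loaded), two rows per location."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE top_two (location_id INTEGER, account_date TEXT, "
        "record INTEGER, date_loaded TEXT)"
    )
    con.executemany("INSERT INTO top_two VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH rec0 AS (
            SELECT location_id, account_date, date_loaded,
                   ROW_NUMBER() OVER (
                       PARTITION BY location_id
                       ORDER BY account_date DESC, date_loaded DESC
                   ) AS rn
            FROM top_two
            WHERE record = 0
        ),
        first_row AS (
            SELECT location_id, account_date, date_loaded FROM rec0 WHERE rn = 1
        ),
        second_row AS (
            -- any Record type, but strictly older than the first row's date
            SELECT t.location_id, t.account_date, t.date_loaded,
                   ROW_NUMBER() OVER (
                       PARTITION BY t.location_id
                       ORDER BY t.account_date DESC, t.date_loaded DESC
                   ) AS rn
            FROM top_two AS t
            JOIN first_row AS f
              ON f.location_id = t.location_id
             AND t.account_date < f.account_date
        )
        SELECT location_id, account_date, date_loaded FROM first_row
        UNION ALL
        SELECT location_id, account_date, date_loaded FROM second_row WHERE rn = 1
        ORDER BY location_id, account_date DESC
        """
    ).fetchall()
    con.close()
    return result


for row in top_two_per_location(rows):
    print(row)
```

The Record column is left out of the result on purpose: for location 2 the second row is tied between a Record 0 and a Record 1 row with the same dates, so which one is picked is not deterministic.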
