I have two tables.
Repair -
RepairID, EquipID, RepairDate
Events -
EventID, EquipID, ReturnDate, CustomerID
I am trying to determine who the last customer was that returned the equipment, before the repair was done. Equipment could have been returned multiple times in the past, but I only need to track the very last customer that returned it.
Final result will include CustomerID, EquipID, ReturnDate, RepairDate
My SQLFiddle for a sample DDL and query:
http://sqlfiddle.com/#!3/f2691/6/0
This returns all the customers, not only the very last one that returned.
Does that return what you expect?
Option 1:
SELECT E.EquipID,
E.CustomerID,
max(E.ReturnDate) MAXRETURN
FROM Repair R
CROSS APPLY (
SELECT *,
row_number() OVER (
PARTITION BY EquipID ORDER BY ReturnDate DESC
) AS RN
FROM Event E
WHERE R.RepairDate > E.ReturnDate
AND E.EquipID = R.EquipID
) E
WHERE E.RN = 1
GROUP BY E.EquipID,
E.CustomerID
Option 2:
SELECT E.EquipID,
E.CustomerID,
max(E.ReturnDate) MAXRETURN
FROM (
SELECT E.*,
row_number() OVER (
PARTITION BY R.RepairID ORDER BY E.ReturnDate DESC
) AS RN
FROM Event E
INNER JOIN Repair R
ON E.EquipID = R.EquipID
WHERE R.RepairDate > E.ReturnDate
) E
WHERE E.RN = 1
GROUP BY E.EquipID,
E.CustomerID
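A minimal sketch of the "last return before each repair" idea, run against SQLite through Python's sqlite3 module instead of SQL Server. The sample rows are made up for illustration (the real data is in the SQLFiddle), the table is called Events as in the question, and the ranking is partitioned by RepairID on the assumption that one piece of equipment can be repaired more than once. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows, not the data from the SQLFiddle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Repair (RepairID INTEGER, EquipID INTEGER, RepairDate TEXT);
CREATE TABLE Events (EventID INTEGER, EquipID INTEGER, ReturnDate TEXT, CustomerID INTEGER);
INSERT INTO Repair VALUES
  (1, 100, '2014-03-01'),
  (2, 100, '2014-06-01'),
  (3, 200, '2014-04-15');
INSERT INTO Events VALUES
  (1, 100, '2014-01-10', 7),
  (2, 100, '2014-02-20', 8),
  (3, 100, '2014-05-05', 9),
  (4, 200, '2014-04-01', 7),
  (5, 200, '2014-05-01', 8);
""")

# Rank the returns that happened before each repair, newest first,
# and keep only the newest one per repair.
LAST_RETURN_SQL = """
SELECT CustomerID, EquipID, ReturnDate, RepairDate
FROM (
    SELECT e.CustomerID, e.EquipID, e.ReturnDate, r.RepairDate, r.RepairID,
           ROW_NUMBER() OVER (
               PARTITION BY r.RepairID ORDER BY e.ReturnDate DESC
           ) AS rn
    FROM Repair r
    JOIN Events e
      ON e.EquipID = r.EquipID
     AND e.ReturnDate < r.RepairDate
) ranked
WHERE rn = 1
ORDER BY RepairID
"""


def last_return_before_repair(connection):
    """Return (CustomerID, EquipID, ReturnDate, RepairDate), one row per repair."""
    return connection.execute(LAST_RETURN_SQL).fetchall()


rows = last_return_before_repair(conn)
for row in rows:
    print(row)
```

With this sample data, equipment 100 shows up twice, once per repair, each time with the customer who returned it most recently before that repair.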
I am getting the following error in dbt, using Snowflake, and I can't figure out what the issue is.
Database Error in model stg_bank_balances2 (models/staging/cas/vpapay/finance/stg_bank_balances.sql)
000603 (XX000): SQL execution internal error:
Processing aborted due to error 300010:2077141494; incident 5570604.
compiled SQL at target/run/cas_datawarehouse/staging/cas/vpapay/finance/stg_bank_balances.sql
I have a staging model that runs without errors when I open the file and run the SQL manually.
However when I run it with
dbt run --models +stg_bank_balances
then I get this error... any ideas?
Compiled SQL code:
with
__dbt__CTE__dw_bank_balance_base as (
with
source as (select * from CAS_RAW.BANK_BALANCE_INFORMATION_FOR_DATAWAREHOUSE.FACILITY_DATA),
renamed as (
select
to_date(date) as date
,FACILITY_BALANCE as facility_balance
,FACILITY_LIMIT as facility_limit
,LVR as loan_to_value_ratio_expected
,UNENCUMBERED_CASH as unencumbered_cash
from source
)
select *
from renamed
),data_sheet as ( select *
,row_number() over (order by date) as row_num
from __dbt__CTE__dw_bank_balance_base
),
calendar as ( select *
from ANALYTICS.dev_avanwyk.stg_calendar
where date >= (select min(date) from data_sheet)
and date <= current_date()
),
creating_leads as (
select a.*
,a.date as date_from
,case
when b.date is null then current_date()
else b.date
end as date_to
from data_sheet a
left join data_sheet b on a.row_num = b.row_num-1
),
renamed as (
select cal.date as cal_date
,ds.date_from, ds.date_to
,ds.facility_balance
,ds.facility_limit
,ds.loan_to_value_ratio_expected
,ds.unencumbered_cash
from calendar cal
left join creating_leads ds on
ds.date_from <= cal.date
and
cal.date < ds.date_to
)
select *
from renamed
Two of your CTEs (common table expressions) have the same name: the compiled SQL defines "renamed" twice. Try giving every CTE in your models a unique name, then write back with what Snowflake returns.
I think Mincho is right.
The first thing to note is that this is a Database Error (docs) — this means that Snowflake is returning the error, and dbt is just passing it on.
Here, Snowflake is having difficulty because you have two CTEs (common table expressions) with the same name — renamed. It looks like you have an upstream model named dw_bank_balance_base that is ephemeral, so it's being injected as a CTE.
You can:
Rename one of your renamed CTEs to something else
Make dw_bank_balance_base a view or table by changing the materialized config
Let me know if that fixes it!
Found the issue - dbt doesn't want me joining a table to itself.
Hence I created another CTE with the prev_row_num = row_num -1 to facilitate this.
with
__dbt__CTE__dw_bank_balance_base as (
with
source as (select * from CAS_RAW.BANK_BALANCE_INFORMATION_FOR_DATAWAREHOUSE.FACILITY_DATA),
renamed as (
select
to_date(date) as date
,FACILITY_BALANCE as facility_balance
,FACILITY_LIMIT as facility_limit
,LVR as loan_to_value_ratio_expected
,UNENCUMBERED_CASH as unencumbered_cash
from source
)
select *
from renamed
),data_sheet as ( select *
,row_number() over (order by date) as row_num
,(row_number() over (order by date))-1 as prev_row_num
from __dbt__CTE__dw_bank_balance_base
),
data_sheet1 as ( select *
,(row_number() over (order by date))-1 as prev_row_num
from __dbt__CTE__dw_bank_balance_base
),
calendar as ( select *
from ANALYTICS.dev_avanwyk.stg_calendar
where date >= (select min(date) from data_sheet)
and date <= current_date()
),
creating_leads as (
select
a.date as date_from
,a.facility_balance
,a.facility_limit
,a.loan_to_value_ratio_expected
,a.unencumbered_cash
,case
when b.date is null then current_date()
else b.date
end as date_to
from data_sheet a
left join data_sheet1 b on a.row_num = b.prev_row_num
),
staging as (
select cal.date as cal_date
,ds.date_from
, ds.date_to
,ds.facility_balance
,ds.facility_limit
,ds.loan_to_value_ratio_expected
,ds.unencumbered_cash
from calendar cal
left join creating_leads ds on
ds.date_from <= cal.date
and
cal.date < ds.date_to
)
select *
from staging
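The self-join on row numbers above is there to pick up the next row's date as date_to. As a side note, the same date_from/date_to pairs can be sketched with the LEAD window function, which avoids the join altogether. This is only an illustration in SQLite via Python's sqlite3 module, with made-up rows and a fixed :today parameter standing in for current_date(); it has not been checked against the Snowflake error in this thread.

```python
import sqlite3

# Hypothetical stand-in for the data_sheet CTE (only two of its columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_sheet (date TEXT, facility_balance REAL);
INSERT INTO data_sheet VALUES
  ('2021-01-01', 100.0),
  ('2021-01-05', 150.0),
  ('2021-01-09', 120.0);
""")

# LEAD(date) reads the date of the following row; the last row has no
# successor, so COALESCE falls back to the supplied "today" value.
PERIODS_SQL = """
SELECT date AS date_from,
       COALESCE(LEAD(date) OVER (ORDER BY date), :today) AS date_to,
       facility_balance
FROM data_sheet
ORDER BY date
"""


def balance_periods(connection, today):
    """Return (date_from, date_to, facility_balance) for each balance row."""
    return connection.execute(PERIODS_SQL, {"today": today}).fetchall()


periods = balance_periods(conn, "2021-01-31")
for period in periods:
    print(period)
```

Each row's date_to is the next row's date, and the final row runs up to the value passed as today.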
I don't know exactly where I'm going wrong, but I need a list of all the workers who are currently at work (for the current day). This is my SQL query:
SELECT
zp.ID,
zp.USER_ID,
zp.Arrive,
zp.Deppart,
zp.DATUM
FROM time_recording as zp
INNER JOIN personal AS a on zp.USER_ID = zp.USER_ID
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
ORDER BY zp.ID DESC
this is what the data looks like with my query:
For me the question is, how can I correct my query so that I only get the last Arrive time for the current day for each user?
In this case to get only these values:
Try the script below, which uses ROW_NUMBER:
SELECT * FROM
(
SELECT zp.ID, zp.USER_ID, zp.Arrive, zp.Deppart, zp.DATUM,
ROW_NUMBER() OVER(PARTITION BY zp.User_id ORDER BY zp.Arrive DESC) RN
FROM time_recording as zp
INNER JOIN personal AS a
on zp.USER_ID = zp.USER_ID
-- Adjust the join condition above: both sides currently refer to the same table (zp)
-- Also, since nothing is selected from personal, you can drop the JOIN entirely
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
)A
WHERE RN =1
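To see the ROW_NUMBER approach on concrete rows, here is a small sketch in SQLite via Python's sqlite3 module. The rows are invented, the JOIN to personal is left out because nothing is selected from it, and a :today parameter replaces convert(date, getdate()), which is SQL Server syntax.

```python
import sqlite3

# Hypothetical rows; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_recording (
    ID INTEGER, USER_ID INTEGER, Arrive TEXT, Deppart TEXT, DATUM TEXT);
INSERT INTO time_recording VALUES
  (1, 10, '08:00', '12:00', '2020-05-04'),
  (2, 10, '13:00', NULL,    '2020-05-04'),
  (3, 11, '07:30', NULL,    '2020-05-04'),
  (4, 11, '09:15', NULL,    '2020-05-04'),
  (5, 12, '08:45', NULL,    '2020-05-03');
""")

# Number each user's open rows for the day, newest Arrive first,
# then keep only row number 1 per user.
PRESENT_SQL = """
SELECT ID, USER_ID, Arrive
FROM (
    SELECT zp.ID, zp.USER_ID, zp.Arrive,
           ROW_NUMBER() OVER (
               PARTITION BY zp.USER_ID ORDER BY zp.Arrive DESC
           ) AS RN
    FROM time_recording AS zp
    WHERE zp.Arrive IS NOT NULL
      AND zp.Deppart IS NULL
      AND zp.DATUM = :today
) AS A
WHERE RN = 1
ORDER BY USER_ID
"""

present = conn.execute(PRESENT_SQL, {"today": "2020-05-04"}).fetchall()
for row in present:
    print(row)
```

User 11 has two open rows for the day and only the later one survives; user 12 is filtered out because the row belongs to another day.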
You can try this:
SELECT DISTINCT
USER_ID,
LAR.LastArrive
FROM time_recording as tr
CROSS APPLY (
SELECT
MAX(Arrive) as LastArrive
FROM time_recording as ta
WHERE
tr.USER_ID = ta.USER_ID AND
ta.Arrive IS NOT NULL
) as LAR
Id Mshp_Id Action
1 9029 Register
2 9029 Create CV
3 8476 Register
4 8476 Create CV
5 8476 JOB SEARCH
I want to return the two membership IDs and their latest action,
so only the rows with Id 2 and 5 would be left.
If you are using SQL Server 2012+, you can use LAST_VALUE
SELECT ID
,mshp_id
,action
FROM (
SELECT *,LAST_VALUE(id) OVER (PARTITION BY mshp_id
ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) last_val
FROM YOUR_TABLE
) a
WHERE id = last_val
ORDER BY ID
The last action per member can be fetched in either of the following ways:
Solution 1:
select Id, Mshp_Id, Action from (
select *, row_number() over (partition by Mshp_Id order by id desc) r from user_action
) a
where a.r = 1
order by id
Solution 2
select u.* from user_action u
join (select Mshp_Id, max(id) id from user_action
group by Mshp_Id ) a
on a.Mshp_Id = u.Mshp_Id and a.id = u.id
order by u.id
Good luck with your work !
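Both solutions can be checked against the five sample rows from the question. The sketch below runs them in SQLite through Python's sqlite3 module (an assumption; the thread is about SQL Server) and uses the table name user_action from the answer.

```python
import sqlite3

# The five rows shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_action (Id INTEGER PRIMARY KEY, Mshp_Id INTEGER, Action TEXT);
INSERT INTO user_action VALUES
  (1, 9029, 'Register'),
  (2, 9029, 'Create CV'),
  (3, 8476, 'Register'),
  (4, 8476, 'Create CV'),
  (5, 8476, 'JOB SEARCH');
""")

# Solution 1: rank rows per member by Id, newest first, keep rank 1.
ROW_NUMBER_SQL = """
SELECT Id, Mshp_Id, Action
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Mshp_Id ORDER BY Id DESC) AS r
    FROM user_action
) a
WHERE a.r = 1
ORDER BY Id
"""

# Solution 2: find MAX(Id) per member and join back to fetch the full row.
MAX_JOIN_SQL = """
SELECT u.Id, u.Mshp_Id, u.Action
FROM user_action u
JOIN (SELECT Mshp_Id, MAX(Id) AS Id
      FROM user_action
      GROUP BY Mshp_Id) a
  ON a.Mshp_Id = u.Mshp_Id AND a.Id = u.Id
ORDER BY u.Id
"""

by_row_number = conn.execute(ROW_NUMBER_SQL).fetchall()
by_max_join = conn.execute(MAX_JOIN_SQL).fetchall()
print(by_row_number)
print(by_max_join)
```

Both queries return the rows with Id 2 and 5, which is what the question asks for.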
I'm trying to get some individual stats from a score keeping system. In essence, teams are scheduled into matches
Match
---------
Matchid (uniqueidentifier)
SessionId (int)
WeekNum (int)
Those matches are broken into sets, where two particular players from a team play each other
MatchSet
-----------
SetId (int)
Matchid (uniqueidentifier)
HomePlayer (int)
AwayPlayer (int)
WinningPlayer (int)
LosingPlayer (int)
WinningPoints (int)
LosingPoints (int)
MatchEndTime (datetime)
In order to allow for player absences, players are allowed to play twice per Match. The points from each set will count for their team totals, but for the individual awards, only the first time that a player plays should be counted.
I had been trying to make use of a CTE to number the rows
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MatchId ORDER BY MatchEndTime) AS rn
FROM
(SELECT
SetId, MS.MatchId, WinningPlayer, LosingPlayer,
HomePlayer, AwayPlayer, WinningPoints, LosingPoints, MatchEndTime
FROM
MatchSet MS
INNER JOIN
[Match] M ON M.MatchId = MS.MatchId AND M.[Session] = @SessionId
)
but I'm struggling as the player could be either the home player or away player in a given set (also, could either be the winner or the loser)
Ideally, this result could then be joined based on either WinningPlayer or LosingPlayer back to the players table, which would let me get a list of individual standings
I think the first step is to write a couple CTEs that get the data into a structure where you can evaluate player points regardless of win/loss. Here's a possible start:
;with PlayersPoints as
(
select m.MatchId
,m.SessionId
,m.WeekNum
,ms.SetId
,ms.WinningPlayer as PlayerId
,ms.WinningPoints as Points
,'W' as Outcome
,ms.MatchEndTime
from MatchSet ms
join [Match] m on ms.MatchId = m.MatchId
and m.SessionId = @SessionId
union all
select m.MatchId
,m.SessionId
,m.WeekNum
,ms.SetId
,ms.LosingPlayer as PlayerId
,ms.LosingPoints as Points
,'L' as Outcome
,ms.MatchEndTime
from MatchSet ms
join [Match] m on ms.MatchId = m.MatchId
and m.SessionId = @SessionId
)
, PlayerMatch as
(
select SetId
,WeekNum
,MatchId
,PlayerId
,row_number() over (partition by PlayerId, WeekNum order by MatchEndTime) as PlayerMatchSequence
from PlayersPoints
)
....
The first CTE pulls out the points for each player, whether they won or lost, and the second CTE numbers each player's sets within a week in order of MatchEndTime. So for calculating individual points, you'd keep only the rows where PlayerMatchSequence = 1.
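Here is one way the two CTEs could be finished off into individual standings, sketched in SQLite via Python's sqlite3 module. The sample rows, the TotalPoints and Wins columns, and the final GROUP BY are my own additions for illustration. The sequence is partitioned by PlayerId and MatchId (rather than WeekNum) on the assumption that "first time a player plays" is meant per match, with SetId as a tie-breaker.

```python
import sqlite3

# Hypothetical data: players 1 and 4 each play twice in match m1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Match] (MatchId TEXT, SessionId INTEGER, WeekNum INTEGER);
CREATE TABLE MatchSet (
    SetId INTEGER, MatchId TEXT, HomePlayer INTEGER, AwayPlayer INTEGER,
    WinningPlayer INTEGER, LosingPlayer INTEGER,
    WinningPoints INTEGER, LosingPoints INTEGER, MatchEndTime TEXT);
INSERT INTO [Match] VALUES ('m1', 1, 1), ('m2', 1, 2), ('m3', 2, 1);
INSERT INTO MatchSet VALUES
  (1, 'm1', 1, 2, 1, 2, 10, 6, '2020-01-01 19:00'),
  (2, 'm1', 3, 4, 4, 3, 10, 8, '2020-01-01 19:30'),
  (3, 'm1', 1, 4, 4, 1, 10, 3, '2020-01-01 20:00'),
  (4, 'm2', 1, 2, 2, 1, 10, 9, '2020-01-08 19:00'),
  (5, 'm3', 1, 2, 1, 2, 10, 0, '2020-02-01 19:00');
""")

STANDINGS_SQL = """
WITH PlayersPoints AS (
    -- One row per player per set: winners first, then losers.
    SELECT ms.MatchId, ms.SetId, ms.WinningPlayer AS PlayerId,
           ms.WinningPoints AS Points, 'W' AS Outcome, ms.MatchEndTime
    FROM MatchSet ms
    JOIN [Match] m ON ms.MatchId = m.MatchId AND m.SessionId = :session_id
    UNION ALL
    SELECT ms.MatchId, ms.SetId, ms.LosingPlayer,
           ms.LosingPoints, 'L', ms.MatchEndTime
    FROM MatchSet ms
    JOIN [Match] m ON ms.MatchId = m.MatchId AND m.SessionId = :session_id
),
PlayerMatch AS (
    -- Number each player's sets within a match, earliest first.
    SELECT *, ROW_NUMBER() OVER (
                  PARTITION BY PlayerId, MatchId
                  ORDER BY MatchEndTime, SetId) AS PlayerMatchSequence
    FROM PlayersPoints
)
SELECT PlayerId,
       SUM(Points) AS TotalPoints,
       SUM(CASE WHEN Outcome = 'W' THEN 1 ELSE 0 END) AS Wins
FROM PlayerMatch
WHERE PlayerMatchSequence = 1
GROUP BY PlayerId
ORDER BY PlayerId
"""

standings = conn.execute(STANDINGS_SQL, {"session_id": 1}).fetchall()
for row in standings:
    print(row)
```

Set 3 is the second appearance in match m1 for both players 1 and 4, so it is ignored for the individual totals, and match m3 is left out because it belongs to another session.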
Perhaps you could virtualize a normalized view of your data and key off of it instead of the MatchSet table.
;WITH TeamPlayerMatch AS
(
SELECT TeamID,PlayerID=WinningPlayer,MatchID,Points = MS.WinningPoints, IsWinner=1 FROM MatchSet MS INNER JOIN TeamPlayer T ON T.PlayerID=HomePlayer
UNION ALL
SELECT TeamID,PlayerID=LosingPlayer,MatchID,Points = MS.LosingPoints, IsWinner=0 FROM MatchSet MS INNER JOIN TeamPlayer T ON T.PlayerID=AwayPlayer
)
,cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MatchId ORDER BY MatchEndTime) AS rn
FROM
(SELECT
SetId, MS.MatchId, PlayerID, TeamID, Points, MatchEndTime, IsWinner
FROM
TeamPlayerMatch MS
INNER JOIN
[Match] M ON M.MatchId = MS.MatchId AND M.[Session] = @SessionId
WHERE
IsWinner=1
)
I have a history table containing a snapshot of each time a record is changed. I'm trying to return a certain history row with the original captured date. I am currently using this at the moment:
select
s.Description,
h.CaptureDate OriginalCaptureDate
from
HistoryStock s
left join
( select
StockId,
CaptureDate
from
HistoryStock
where
HistoryStockId in ( select MIN(HistoryStockId) from HistoryStock group by StockId )
) h on s.StockId = h.StockId
where
s.HistoryStockId = @HistoryStockId
This works, but with 1 million records it's on the slow side and I'm not sure how to optimize this query.
How can this query be optimized?
UPDATE:
WITH OriginalStock (StockId, HistoryStockId)
AS (
SELECT StockId, min(HistoryStockId)
from HistoryStock group by StockId
),
OriginalCaptureDate (StockId, OriginalCaptureDate)
As (
SELECT h.StockId, h.CaptureDate
from HistoryStock h join OriginalStock o on h.HistoryStockId = o.HistoryStockId
)
select
s.Description,
h.OriginalCaptureDate
from
HistoryStock s left join OriginalCaptureDate h on s.StockId = h.StockId
where
s.HistoryStockId = @HistoryStockId
I've updated the code to use CTEs, but the performance gain is only small. Any ideas?
One more note: I need the first record in the history table for the StockId (the lowest HistoryStockId), not the earliest CaptureDate.
I am not certain I understand entirely how the data works from your query but nesting queries like that is never good for performance in my opinion. You could try something along the lines of:
WITH MinCaptureDate (StockID, MinCaptureDate)
AS (
SELECT HS.StockID
,MIN(HS.CaptureDate) AS MinCaptureDate
FROM HistoryStock HS
GROUP BY
HS.StockID
)
SELECT HS.Description
,MCD.MinCaptureDate AS OriginalCaptureDate
FROM HistoryStock HS
JOIN MinCaptureDate MCD
ON HS.StockID = MCD.StockID
WHERE HS.StockID = @StockID
I think I see what you are trying to achieve. You basically want the description of the specified history stock record, but you want the date associated with the first history record for the stock. So if your history table looks like this
StockId HistoryStockId CaptureDate Description
1 1 Apr 1 Desc 1
1 2 Apr 2 Desc 2
1 3 Apr 3 Desc 3
and you specify @HistoryStockId = 2, you want the following result
Description OriginalCaptureDate
Desc 2 Apr 1
I think the following query would give you slightly better performance.
WITH OriginalStock (StockId, CaptureDate, RowNumber)
AS (
SELECT
StockId,
CaptureDate,
RowNumber = ROW_NUMBER() OVER (PARTITION BY StockId ORDER BY HistoryStockId ASC)
from HistoryStock
)
select
s.Description,
h.CaptureDate
from
HistoryStock s left join OriginalStock h on s.StockId = h.StockId and h.RowNumber = 1
where
s.HistoryStockId = @HistoryStockId
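The three-row example above can be used to check the query. The sketch below runs it in SQLite through Python's sqlite3 module, next to a correlated-subquery variant that only looks up the first history row for the one StockId being asked about. The variant is my own suggestion, not something from the thread, and whether it is faster on SQL Server would depend on the available indexes; it has not been measured here. LIMIT 1 is SQLite syntax (SQL Server would use TOP 1).

```python
import sqlite3

# The example history table from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HistoryStock (
    HistoryStockId INTEGER PRIMARY KEY, StockId INTEGER,
    CaptureDate TEXT, Description TEXT);
INSERT INTO HistoryStock VALUES
  (1, 1, 'Apr 1', 'Desc 1'),
  (2, 1, 'Apr 2', 'Desc 2'),
  (3, 1, 'Apr 3', 'Desc 3');
""")

# The ROW_NUMBER query: row 1 per StockId is the original history record.
ROW_NUMBER_SQL = """
WITH OriginalStock (StockId, CaptureDate, RowNumber) AS (
    SELECT StockId, CaptureDate,
           ROW_NUMBER() OVER (PARTITION BY StockId ORDER BY HistoryStockId ASC)
    FROM HistoryStock
)
SELECT s.Description, h.CaptureDate
FROM HistoryStock s
LEFT JOIN OriginalStock h ON s.StockId = h.StockId AND h.RowNumber = 1
WHERE s.HistoryStockId = :history_stock_id
"""

# Hypothetical variant: fetch the first history row only for the StockId
# of the requested record instead of ranking the whole table.
CORRELATED_SQL = """
SELECT s.Description,
       (SELECT h.CaptureDate
        FROM HistoryStock h
        WHERE h.StockId = s.StockId
        ORDER BY h.HistoryStockId
        LIMIT 1) AS OriginalCaptureDate
FROM HistoryStock s
WHERE s.HistoryStockId = :history_stock_id
"""

params = {"history_stock_id": 2}
via_row_number = conn.execute(ROW_NUMBER_SQL, params).fetchall()
via_correlated = conn.execute(CORRELATED_SQL, params).fetchall()
print(via_row_number)
print(via_correlated)
```

For history record 2, both queries return the description of record 2 together with the capture date of record 1.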