Update gaps in sequential table - sql-server

I have a table that contains employee bank data:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement that updates 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, for each row with a NULL Delta, I want to use the value from the most recent row with a non-NULL Delta whose date is not later than the date on that row.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following query, but I am wondering whether there is a more efficient way to do so.
WITH cte_date
AS (
-- one row per calendar date x balance type x employee
SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
-- match the latest test1 row on or before this calendar date
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = dd.balance_key
AND tt1.employee_key = dd.employee_key
AND tt1.date_key <= dd.date_key
ORDER BY tt1.date_key DESC); -- without ORDER BY, TOP 1 returns an arbitrary row
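The rule stated in the question (take the latest non-NULL Delta whose date is not later than the row's date) can also be applied in place with an UPDATE and a correlated subquery, instead of building a new table. Below is a minimal sketch run against SQLite through Python's standard sqlite3 module so it is self-contained; SQLite writes LIMIT 1 where T-SQL would use SELECT TOP 1 ... ORDER BY [Date] DESC. YourTable and the Vacation rows come from the question; the "Sick" bank is a hypothetical addition showing that a leading NULL with no earlier value stays NULL. On a table of up to 2 billion rows, this per-row lookup would lean heavily on the (Employee, Bank, Date) index, so treat it as an illustration of the logic rather than a tuned solution.

```python
import sqlite3

SAMPLE = [
    ("Smith", "Vacation", "2023-01-01", 15.0),
    ("Smith", "Vacation", "2023-01-02", None),
    ("Smith", "Vacation", "2023-01-03", None),
    ("Smith", "Vacation", "2023-01-04", 7.5),
    # Hypothetical extra bank: its first row has no earlier value to copy.
    ("Smith", "Sick", "2023-01-01", None),
    ("Smith", "Sick", "2023-01-02", 4.0),
]

FILL_SQL = """
UPDATE YourTable
SET Delta = (
    SELECT prev.Delta                 -- latest non-NULL value on or before this row
    FROM YourTable AS prev
    WHERE prev.Employee = YourTable.Employee
      AND prev.Bank = YourTable.Bank
      AND prev.Delta IS NOT NULL
      AND prev.Date <= YourTable.Date
    ORDER BY prev.Date DESC
    LIMIT 1                           -- T-SQL: SELECT TOP 1 ...
)
WHERE Delta IS NULL                   -- only touch the gaps
"""


def fill_gaps(rows):
    """Load rows into an in-memory table, fill NULL deltas, return (Bank, Date, Delta)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL,"
        " PRIMARY KEY (Employee, Bank, Date))"
    )
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    con.execute(FILL_SQL)
    result = con.execute(
        "SELECT Bank, Date, Delta FROM YourTable ORDER BY Employee, Bank, Date"
    ).fetchall()
    con.close()
    return result


for row in fill_gaps(SAMPLE):
    print(row)
```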

Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide: in this example, at most 7 days.
Measured as query cost relative to the batch (from the execution plan), the lag() over() approach was 22% while the CROSS APPLY was 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update: the results were the same with gaps of 20 days.
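The "gaps are not too wide" assumption matters because a chain of N LAG calls can only reach back N rows. A minimal sketch of the COALESCE-over-LAG idea on SQLite through Python's sqlite3 module (assuming an SQLite build with window functions, 3.25 or later) makes this visible; the sample rows are hypothetical, extending the question's data with a three-row gap.

```python
import sqlite3

# Hypothetical sample: one employee/bank with a gap of three NULL deltas.
ROWS = [
    ("Smith", "Vacation", "2023-01-01", 15.0),
    ("Smith", "Vacation", "2023-01-02", None),
    ("Smith", "Vacation", "2023-01-03", None),
    ("Smith", "Vacation", "2023-01-04", None),
    ("Smith", "Vacation", "2023-01-05", 7.5),
]


def fill_with_lag_chain(rows, max_gap):
    """Return (Date, Delta) using COALESCE(Delta, LAG(Delta, 1), ..., LAG(Delta, max_gap))."""
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")  # COALESCE needs 2+ arguments
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    # One LAG term per offset; each reads the ORIGINAL Delta k rows back.
    lags = ",\n".join(
        f"LAG(Delta, {k}) OVER (PARTITION BY Employee, Bank ORDER BY Date)"
        for k in range(1, max_gap + 1)
    )
    sql = f"SELECT Date, COALESCE(Delta, {lags}) FROM YourTable ORDER BY Employee, Bank, Date"
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(fill_with_lag_chain(ROWS, 2))  # 2023-01-04 stays None: it is 3 rows after 15.0
print(fill_with_lag_chain(ROWS, 3))  # every gap row is filled
```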

Here is another way: use sum() as a window function to assign each row to a group, "Grp", made up of one row with a non-null Delta followed by the null rows after it. Then max(Delta) over each Grp returns that group's non-null value.
select Employee, Bank, [Date],
Delta = max(Delta) over (partition by Employee, Bank, Grp) -- the one non-null Delta in each group
from
(
select *, Grp = sum(case when Delta is not null then 1 else 0 end)
over (partition by Employee, Bank
order by [Date]) -- running count of non-null rows
from YourTable
) t
-- no GROUP BY needed: (Employee, Bank, Date) is unique
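A minimal sketch of the running-count grouping on SQLite via Python's sqlite3 module (window functions need SQLite 3.25+). It returns Grp next to the filled value so you can see how each non-NULL row starts a new group that the following NULL rows join. The rows are the question's, plus a hypothetical leading NULL row on 2022-12-31 to show that rows before the first value fall into group 0 and stay NULL.

```python
import sqlite3

SAMPLE = [
    ("Smith", "Vacation", "2022-12-31", None),  # hypothetical: nothing earlier to copy
    ("Smith", "Vacation", "2023-01-01", 15.0),
    ("Smith", "Vacation", "2023-01-02", None),
    ("Smith", "Vacation", "2023-01-03", None),
    ("Smith", "Vacation", "2023-01-04", 7.5),
]

GROUP_FILL_SQL = """
SELECT Date, Grp,
       MAX(Delta) OVER (PARTITION BY Employee, Bank, Grp) AS Delta  -- the group's single value
FROM (
    SELECT *,
           SUM(CASE WHEN Delta IS NOT NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY Employee, Bank ORDER BY Date) AS Grp  -- running count
    FROM YourTable
) AS t
ORDER BY Employee, Bank, Date
"""


def fill_by_group(rows):
    """Return (Date, Grp, filled Delta) for every row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(GROUP_FILL_SQL).fetchall()
    con.close()
    return result


for row in fill_by_group(SAMPLE):
    print(row)
```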

Related

Reporting Previous Records on SQL Server

I’m struggling a bit here. The data is fabricated, but the query concept is very real.
I need to select the Customer, Current Amount, Previous Amount, Sequence and Date
WHERE DATE < 1190105
AND the DATE/SEQ is the maximum date/seq before that date, per customer.
I’ve spent quite a few days trying all sorts of things using HAVING, nested select to try and obtain the max-date/amount and min-date/amount by customer and can’t quite get my head around it. I’m sure it should be quite easy, but any help you can offer would be really appreciated.
Thanks
SEQ  DATE     CUSTOMER  AMOUNT
------------------------------
1    1181225  Bob       400
2    1181226  Fred      300
3    1190101  Bob       100
4    1190104  Fred      500
5    1190104  George    200
6    1190105  Bob       150
7    1190106  Bob       200
8    1190110  Fred      160
9    1190110  Bob       300
10   1190112  Fred      400
Option 1: use the ROW_NUMBER and LAG functions
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Seq,
[Date],
Customer,
Amount as CurrentAmount,
Lag(Amount) OVER (Partition By Customer Order By [Date]) as PreviousAmount -- LAG reads the previous row; LEAD would read the next
FROM
YourTable
WHERE
[Date] < 1190105
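A minimal sketch of Option 1 against the question's sample data, run on SQLite through Python's sqlite3 module (window functions need SQLite 3.25+). Because WHERE is applied before window functions are evaluated, LAG only sees rows that pass the date filter.

```python
import sqlite3

# Sample data from the question: (SEQ, DATE, CUSTOMER, AMOUNT).
SAMPLE = [
    (1, 1181225, "Bob", 400), (2, 1181226, "Fred", 300),
    (3, 1190101, "Bob", 100), (4, 1190104, "Fred", 500),
    (5, 1190104, "George", 200), (6, 1190105, "Bob", 150),
    (7, 1190106, "Bob", 200), (8, 1190110, "Fred", 160),
    (9, 1190110, "Bob", 300), (10, 1190112, "Fred", 400),
]

PREVIOUS_SQL = """
SELECT ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Date) AS Seq,
       Date, Customer,
       Amount AS CurrentAmount,
       LAG(Amount) OVER (PARTITION BY Customer ORDER BY Date) AS PreviousAmount
FROM YourTable
WHERE Date < ?            -- filter first; LAG then works on the remaining rows
ORDER BY Customer, Date
"""


def previous_amounts(rows, filter_date):
    """Return (Seq, Date, Customer, CurrentAmount, PreviousAmount) before filter_date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (SEQ INTEGER, Date INTEGER, Customer TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(PREVIOUS_SQL, (filter_date,)).fetchall()
    con.close()
    return result


for row in previous_amounts(SAMPLE, 1190105):
    print(row)
```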
Option 2: use OUTER APPLY
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Seq,
[Date],
Customer,
Amount as CurrentAmount,
Prev.Amount as PreviousAmount
FROM
YourTable T
OUTER APPLY (
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) Prev
WHERE
DATE < 1190105
Option 3: use a correlated subquery
SELECT
ROW_NUMBER() OVER (Partition By Customer Order By [Date]) as Seq,
[Date],
Customer,
Amount as CurrentAmount,
(
SELECT TOP 1 Amount FROM YourTable
WHERE Customer = T.Customer AND [Date] < T.[Date]
ORDER BY [DATE] DESC
) as PreviousAmount
FROM YourTable T -- alias T is referenced by the correlated subquery
WHERE
DATE < 1190105
First restrict the rows with the date filter, then search for the max by customer.
Using GROUP BY:
DECLARE @FilterDate INT = 1190105
;WITH MaxDateByCustomer AS
(
SELECT
T.CUSTOMER,
MaxSEQ = MAX(T.SEQ)
FROM
YourTable AS T
WHERE
T.Date < @FilterDate
GROUP BY
T.CUSTOMER
)
SELECT
T.*
FROM
YourTable AS T
INNER JOIN MaxDateByCustomer AS M ON
T.CUSTOMER = M.CUSTOMER AND
T.SEQ = M.MaxSEQ
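The same GROUP BY approach as a self-contained sketch on SQLite via Python's sqlite3 module, with a bound parameter standing in for @FilterDate. Like the answer, it assumes SEQ increases with DATE, so the largest SEQ before the cutoff is also the latest date.

```python
import sqlite3

# Sample data from the question: (SEQ, DATE, CUSTOMER, AMOUNT).
SAMPLE = [
    (1, 1181225, "Bob", 400), (2, 1181226, "Fred", 300),
    (3, 1190101, "Bob", 100), (4, 1190104, "Fred", 500),
    (5, 1190104, "George", 200), (6, 1190105, "Bob", 150),
    (7, 1190106, "Bob", 200), (8, 1190110, "Fred", 160),
    (9, 1190110, "Bob", 300), (10, 1190112, "Fred", 400),
]

LATEST_SQL = """
WITH MaxSeqByCustomer AS (
    SELECT Customer, MAX(SEQ) AS MaxSEQ
    FROM YourTable
    WHERE Date < :filter_date          -- restrict rows first
    GROUP BY Customer
)
SELECT T.SEQ, T.Date, T.Customer, T.Amount
FROM YourTable AS T
JOIN MaxSeqByCustomer AS M
  ON T.Customer = M.Customer AND T.SEQ = M.MaxSEQ
ORDER BY T.Customer
"""


def latest_before(rows, filter_date):
    """Return each customer's row with the highest SEQ before filter_date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (SEQ INTEGER, Date INTEGER, Customer TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(LATEST_SQL, {"filter_date": filter_date}).fetchall()
    con.close()
    return result


print(latest_before(SAMPLE, 1190105))
```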
Using ROW_NUMBER window function:
DECLARE @FilterDate INT = 1190105
;WITH DateRankingByCustomer AS
(
SELECT
T.*,
DateRanking = ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER ORDER BY T.SEQ DESC)
FROM
YourTable AS T
WHERE
T.Date < @FilterDate
)
SELECT
D.*
FROM
DateRankingByCustomer AS D
WHERE
D.DateRanking = 1
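The question also asks for the previous amount next to each customer's latest row. One way to get both, sketched here on SQLite through Python's sqlite3 module, is to compute LAG and the descending ROW_NUMBER in the same filtered CTE and keep DateRanking = 1. Ordering by SEQ follows the answer's assumption that SEQ tracks DATE; this combination is a suggestion, not part of the original answer.

```python
import sqlite3

# Sample data from the question: (SEQ, DATE, CUSTOMER, AMOUNT).
SAMPLE = [
    (1, 1181225, "Bob", 400), (2, 1181226, "Fred", 300),
    (3, 1190101, "Bob", 100), (4, 1190104, "Fred", 500),
    (5, 1190104, "George", 200), (6, 1190105, "Bob", 150),
    (7, 1190106, "Bob", 200), (8, 1190110, "Fred", 160),
    (9, 1190110, "Bob", 300), (10, 1190112, "Fred", 400),
]

LATEST_WITH_PREVIOUS_SQL = """
WITH DateRankingByCustomer AS (
    SELECT SEQ, Date, Customer,
           Amount AS CurrentAmount,
           LAG(Amount) OVER (PARTITION BY Customer ORDER BY SEQ) AS PreviousAmount,
           ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY SEQ DESC) AS DateRanking
    FROM YourTable
    WHERE Date < :filter_date
)
SELECT Customer, CurrentAmount, PreviousAmount, SEQ, Date
FROM DateRankingByCustomer
WHERE DateRanking = 1                  -- keep only the latest row per customer
ORDER BY Customer
"""


def latest_with_previous(rows, filter_date):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (SEQ INTEGER, Date INTEGER, Customer TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(LATEST_WITH_PREVIOUS_SQL, {"filter_date": filter_date}).fetchall()
    con.close()
    return result


for row in latest_with_previous(SAMPLE, 1190105):
    print(row)
```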

UNION Returns NULL When First SELECT Returns Nothing

Hi, I have a table T1 that contains two columns, Date and Price:
T1
DATE       | PRICE
-----------+------
2018-07-25 | 2.00
2018-06-20 | 3.00
2017-05-10 | 3.00
Here are my requirements:
If a user enters a date that is not covered by the DB (later than every date in T1), I need to return the last price and date in T1.
If a user enters a date that falls before or between the dates in T1 (for example '2017-05-09', which is not in the table), I have to return the next date after the given date; in this case, '2017-05-10'.
I am using UNION in my script but it returns empty when one of the SELECT statements returns empty.
I am using a CTE table:
DECLARE @DateEntered DATE;
WITH HistoricalCTE (Date, Price, RowNumber) AS (
SELECT R.Date,
R.Price,
ROW_NUMBER() OVER (PARTITION BY R.Date, R.Price ORDER BY Date DESC)
FROM T1 R
WHERE Date = @DateEntered
UNION
SELECT R.Date,
R.Price,
ROW_NUMBER() OVER (PARTITION BY R.Date, R.Price ORDER BY Date DESC)
FROM T1 R
WHERE Date < @DateEntered
UNION
SELECT R.Date,
R.Price,
ROW_NUMBER() OVER (PARTITION BY R.Date, R.Price ORDER BY Date DESC)
FROM T1 R
WHERE Date > @DateEntered
)
SELECT * FROM HistoricalCTE; -- a CTE must be followed by a statement that uses it
The issue is that when I enter a date later than all the dates in T1, I get an empty result, because the first SELECT returns nothing. Any idea how I could solve this?
You might be overcomplicating this. If I read your question correctly, we can just take the smallest value greater than the input, or if that doesn't exist, then just take the max of the table.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY Date) rn
FROM T1
WHERE Date > @DateEntered
)
SELECT
CASE WHEN EXISTS (SELECT 1 FROM cte WHERE rn = 1)
THEN (SELECT Date FROM cte WHERE rn = 1)
ELSE (SELECT MAX(Date) FROM T1) END AS Date,
CASE WHEN EXISTS (SELECT 1 FROM cte WHERE rn = 1)
THEN (SELECT Price FROM cte WHERE rn = 1)
ELSE (SELECT Price FROM T1 WHERE Date = (SELECT MAX(Date) FROM T1)) END AS Price;
Demo
All the edge cases seem to be working in the above demo, and you may test any input date against your sample data.
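The same rule (earliest date after the input, otherwise the latest row) can also be written as a single ordered lookup, so the CTE is not evaluated several times. This CASE-based ORDER BY is a swapped-in technique, not the answer's; the sketch runs on SQLite via Python's sqlite3 module, with a bound parameter in place of @DateEntered and LIMIT 1 in place of T-SQL's TOP 1. Dates are stored as ISO-8601 text, which sorts chronologically.

```python
import sqlite3

# Sample data from the question.
SAMPLE = [("2018-07-25", 2.00), ("2018-06-20", 3.00), ("2017-05-10", 3.00)]

NEXT_OR_LATEST_SQL = """
SELECT Date, Price
FROM T1
ORDER BY CASE WHEN Date > :entered THEN 0 ELSE 1 END,  -- later dates first, if any exist
         CASE WHEN Date > :entered THEN Date END,      -- earliest of the later dates
         Date DESC                                     -- otherwise, the latest date overall
LIMIT 1
"""


def next_or_latest(date_entered, rows=SAMPLE):
    """Return (Date, Price) for the first date after date_entered, else the latest row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T1 (Date TEXT, Price REAL)")
    con.executemany("INSERT INTO T1 VALUES (?, ?)", rows)
    row = con.execute(NEXT_OR_LATEST_SQL, {"entered": date_entered}).fetchone()
    con.close()
    return row


print(next_or_latest("2017-05-09"))
print(next_or_latest("2019-01-01"))
```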

T-SQL: how to select a row with multiple conditions

I have the data set below in SQL Server, and I need to select rows using the following conditions, in order:
First, if date_end is 1/1/2099, then among the rows with the same employee_id, select the row that has the smallest gap in days and whose skill_group is not SWAT; in this case, that is row 2.
Second, for rows that do not have a date_end of 1/1/2099, select the row with the most recent date_end; in this case, it's row 4.
ID employee_id last_name first_name date_start date_end skill_group
---------------------------------------------------------------------------
1 N05E0F Mike Pamela 12/19/2013 1/1/2099 SWAT
2 N05E0F Mike Pamela 9/16/2015 1/1/2099 Welcome Team
3 NSH8A David Smith 12/19/2013 9/16/2016 Unlicensed
4 NSH8A David Smith 8/16/2015 10/16/2016 CMT
There are many ways to do this. Here are some of them:
top with ties version:
select top 1 with ties
*
from tbl
where skill_group != 'SWAT'
order by
row_number() over (
partition by employee_id
order by date_end desc, datediff(day,date_start,date_end) asc
)
Common table expression (with cte as ...) version using row_number():
with cte as (
select *
, rn = row_number() over (
partition by employee_id
order by date_end desc, datediff(day,date_start,date_end) asc
)
from tbl
where skill_group != 'SWAT'
)
select *
from cte
where rn = 1
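A minimal sketch of the ROW_NUMBER version on SQLite via Python's sqlite3 module. SQLite has no DATEDIFF, so the day gap is computed with julianday(), and the dates are assumed to be stored as ISO-8601 text (the question shows m/d/yyyy) so that they sort chronologically.

```python
import sqlite3

# The question's rows, with dates rewritten as ISO-8601 text (an assumption).
SAMPLE = [
    (1, "N05E0F", "Mike", "Pamela", "2013-12-19", "2099-01-01", "SWAT"),
    (2, "N05E0F", "Mike", "Pamela", "2015-09-16", "2099-01-01", "Welcome Team"),
    (3, "NSH8A", "David", "Smith", "2013-12-19", "2016-09-16", "Unlicensed"),
    (4, "NSH8A", "David", "Smith", "2015-08-16", "2016-10-16", "CMT"),
]

PICK_SQL = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY employee_id
               ORDER BY date_end DESC,                               -- 2099 / latest end first
                        julianday(date_end) - julianday(date_start)  -- then the smallest gap
           ) AS rn
    FROM tbl
    WHERE skill_group <> 'SWAT'
)
SELECT ID, employee_id, skill_group
FROM cte
WHERE rn = 1
ORDER BY ID
"""


def pick_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (ID INTEGER, employee_id TEXT, last_name TEXT, first_name TEXT,"
        " date_start TEXT, date_end TEXT, skill_group TEXT)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(PICK_SQL).fetchall()
    con.close()
    return result


print(pick_rows(SAMPLE))
```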

How to get the top 6 records in different columns in T-SQL?

I need to display the top 6 records: the first 3 records in FirstCol and the next 3 in SecondCol. My query looks like this:
select top 6 [EmpName]
from [Emp]
order by [Salary] Desc
Result:
[EmpName]
----------------------
Sam
Pam
Oliver
Jam
Kim
Nixon
But I want the result to look like this:
FirstCol SecondCol
Sam Jam
Pam Kim
Oliver Nixon
; WITH TOP_3 AS
(
select TOP 3 [EmpName]
,ROW_NUMBER() OVER (ORDER BY [Salary] Desc) rn
from [Emp]
order by [Salary] Desc
),
Other3 AS
(
SELECT [EmpName]
,ROW_NUMBER() OVER (ORDER BY [Salary] Desc) rn -- computed before OFFSET, so these rows get 4, 5, 6
FROM [Emp]
ORDER BY [Salary] DESC OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY
)
SELECT T3.[EmpName] AS FirstCol, O3.[EmpName] AS SecondCol
FROM TOP_3 T3 INNER JOIN Other3 O3
ON O3.rn = T3.rn + 3 -- pair row 1 with 4, 2 with 5, 3 with 6
ORDER BY T3.rn ASC
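The pairing idea can be checked with a minimal sketch on SQLite through Python's sqlite3 module: number every row once and join row n to row n + 3. The salaries are hypothetical (the question only shows the resulting order), and a seventh employee, "Zed", is added to show that only the top 6 appear. A LEFT JOIN keeps the first column filled even if fewer than 6 employees exist.

```python
import sqlite3

# Hypothetical salaries chosen to reproduce the question's ordering.
EMPLOYEES = [
    ("Sam", 900), ("Pam", 800), ("Oliver", 700),
    ("Jam", 600), ("Kim", 500), ("Nixon", 400), ("Zed", 100),
]

PAIR_SQL = """
WITH Ranked AS (
    SELECT EmpName, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS rn
    FROM Emp
)
SELECT a.EmpName AS FirstCol, b.EmpName AS SecondCol
FROM Ranked AS a
LEFT JOIN Ranked AS b ON b.rn = a.rn + 3   -- pair 1 with 4, 2 with 5, 3 with 6
WHERE a.rn <= 3
ORDER BY a.rn
"""


def two_columns(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Emp (EmpName TEXT, Salary INTEGER)")
    con.executemany("INSERT INTO Emp VALUES (?, ?)", rows)
    result = con.execute(PAIR_SQL).fetchall()
    con.close()
    return result


print(two_columns(EMPLOYEES))
```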
You can do this using several windowing functions. It is kind of ugly, but it will get you the result that you want:
;with data as
(
-- get your Top 6
select top 6 empname, salary
from emp
order by salary desc
),
buckets as
(
-- use NTILE to split the six rows into 2 buckets
select empname,
nt = ntile(2) over(order by salary desc),
salary
from data
)
select
FirstCol = max(case when nt = 1 then empname end),
SecondCol = max(case when nt = 2 then empname end)
from
(
-- create a row number for each item in the buckets to return multiple rows
select empname,
nt,
rn = row_number() over(partition by nt order by salary desc)
from buckets
) d
group by rn;
See SQL Fiddle with Demo. This uses the NTILE function, which takes your dataset of six rows and splits it into two buckets: 3 rows in bucket 1 and 3 rows in bucket 2. The argument (2) passed to NTILE sets the number of buckets.
Next, I used row_number() to number the rows within each bucket. Grouping by that number puts the nth row of each bucket on the same output row, which lets you return multiple rows with one name per column.
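A minimal sketch of the NTILE approach on SQLite via Python's sqlite3 module (window functions need SQLite 3.25+), using LIMIT 6 in place of TOP 6. The salaries are hypothetical, and an ORDER BY rn is added (the original query has none) so the rows come back in rank order.

```python
import sqlite3

# Hypothetical salaries chosen to reproduce the question's ordering.
EMPLOYEES = [
    ("Sam", 900), ("Pam", 800), ("Oliver", 700),
    ("Jam", 600), ("Kim", 500), ("Nixon", 400), ("Zed", 100),
]

NTILE_SQL = """
WITH data AS (
    SELECT EmpName, Salary FROM Emp ORDER BY Salary DESC LIMIT 6       -- top 6
),
buckets AS (
    SELECT EmpName, Salary, NTILE(2) OVER (ORDER BY Salary DESC) AS nt -- 3 rows per bucket
    FROM data
),
numbered AS (
    SELECT EmpName, nt, ROW_NUMBER() OVER (PARTITION BY nt ORDER BY Salary DESC) AS rn
    FROM buckets
)
SELECT MAX(CASE WHEN nt = 1 THEN EmpName END) AS FirstCol,
       MAX(CASE WHEN nt = 2 THEN EmpName END) AS SecondCol
FROM numbered
GROUP BY rn               -- nth row of bucket 1 beside nth row of bucket 2
ORDER BY rn
"""


def ntile_columns(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Emp (EmpName TEXT, Salary INTEGER)")
    con.executemany("INSERT INTO Emp VALUES (?, ?)", rows)
    result = con.execute(NTILE_SQL).fetchall()
    con.close()
    return result


print(ntile_columns(EMPLOYEES))
```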

Determine consecutive date count in SQL Server

I have some data that looks like this:
id date
--------------------------------
123 2013-04-08 00:00:00.000
123 2013-04-07 00:00:00.000
123 2013-04-06 00:00:00.000
123 2013-04-04 00:00:00.000
123 2013-04-03 00:00:00.000
I need to return a count of the most recent consecutive date streak for a given ID, which in this case would be 3 for id 123. I have no idea if this can be done in SQL. Any suggestions?
One way to do this is to subtract a sequence of numbers (from row_number()) from the dates and look at the difference, which stays constant within a run of consecutive dates. Here is an example that gets the length of every sequence for an id:
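select id, grp, count(*) as NumInSequence, min(date) as mindate, max(date) as maxdate
from (select t.*,
-- date minus its row number: the same value for every day in a consecutive run
dateadd(day, -row_number() over (partition by id order by date), date) as grp
from data t
) t
group by id, grp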
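A minimal sketch of the date-minus-row-number trick on SQLite via Python's sqlite3 module. SQLite stores these dates as text, so julianday(date), a day number, stands in for the date arithmetic; consecutive dates then share the same grp value. The table name data and the dates come from the question, stored without the time part.

```python
import sqlite3

# Sample data from the question (time part dropped).
SAMPLE = [(123, d) for d in ("2013-04-08", "2013-04-07", "2013-04-06", "2013-04-04", "2013-04-03")]

RUNS_SQL = """
SELECT id, COUNT(*) AS NumInSequence, MIN(date) AS mindate, MAX(date) AS maxdate
FROM (
    SELECT id, date,
           -- day number minus position: constant within a run of consecutive days
           julianday(date) - ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS grp
    FROM data
) AS t
GROUP BY id, grp
ORDER BY id, mindate
"""


def runs(rows):
    """Return (id, length, first date, last date) for every consecutive run."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id INTEGER, date TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = con.execute(RUNS_SQL).fetchall()
    con.close()
    return result


for run in runs(SAMPLE):
    print(run)
```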
To get the longest one, I would use row_number() again:
select t.*
from (select id, grp, count(*) as NumInSequence,
min(date) as mindate, max(date) as maxdate,
row_number() over (partition by id order by count(*) desc) as seqnum
from (select t.*,
dateadd(day, -row_number() over (partition by id order by date), date) as grp
from data t
) t
group by id, grp
) t
where seqnum = 1
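The question asks for the most recent streak, while the query above ranks runs by length. Ordering the runs by their last date instead picks the most recent one. A minimal sketch on SQLite via Python's sqlite3 module, where id 456 is a hypothetical extra series whose latest run (1 day) is shorter than its longest (4 days):

```python
import sqlite3

SAMPLE = [(123, d) for d in ("2013-04-08", "2013-04-07", "2013-04-06", "2013-04-04", "2013-04-03")]
# Hypothetical series: a 4-day run followed later by a single day.
SAMPLE += [(456, d) for d in ("2013-04-01", "2013-04-02", "2013-04-03", "2013-04-04", "2013-04-10")]

LATEST_RUN_SQL = """
SELECT COUNT(*) AS NumInSequence
FROM (
    SELECT date,
           julianday(date) - ROW_NUMBER() OVER (ORDER BY date) AS grp
    FROM data
    WHERE id = :id
) AS t
GROUP BY grp
ORDER BY MAX(date) DESC     -- most recent run first, not the longest
LIMIT 1
"""


def most_recent_streak(rows, id_):
    """Length of the latest run of consecutive dates for id_, or 0 if it has no rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id INTEGER, date TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?)", rows)
    row = con.execute(LATEST_RUN_SQL, {"id": id_}).fetchone()
    con.close()
    return row[0] if row else 0


print(most_recent_streak(SAMPLE, 123))
print(most_recent_streak(SAMPLE, 456))
```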
