Getting sum value of a column having minimum date? - sql-server

I have data like this:
DATE ID weight
---- ---- -------
2017-04-25 11:05:42.273 247 0.418
2017-04-25 11:05:42.310 248 0.568
2017-04-25 13:57:55.327 247 0.418
2017-04-25 13:57:55.360 247 0.534
2017-04-25 13:57:55.397 248 0.568
2017-04-25 13:57:55.453 248 0.448
Now the requirement is to sum the gross weight of each barcode ID at its minimum date.
Here the output should be 0.418 + 0.568, because those rows have the minimum dates for barcodes 247 and 248, respectively.

Use a window function to assign a row number that starts over for each partition (ID), then sum only the rows with row number 1. A CTE or subquery is needed because RN cannot be used in the WHERE clause of the same SELECT that defines it.
A partition is just a grouping of records by the specified columns, so IDs 247 and 248 are different groups, and row number 1 is assigned to the earliest date in each partition. Filtering on WHERE RN = 1 then returns only the weights for the earliest date of each ID.
WITH CTE AS (SELECT A.*
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date ASC) RN
FROM YourTable A) -- YourTable is a placeholder; TABLE itself is a reserved word
SELECT SUM(Weight)
FROM CTE
WHERE RN = 1
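A minimal sketch of the ROW_NUMBER approach that can be run without SQL Server, assuming Python's built-in sqlite3 module as a stand-in (the SQLite library behind it must be 3.25 or newer, which added window functions). The table name `readings` is hypothetical; the rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (date TEXT, id INTEGER, weight REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2017-04-25 11:05:42.273", 247, 0.418),
        ("2017-04-25 11:05:42.310", 248, 0.568),
        ("2017-04-25 13:57:55.327", 247, 0.418),
        ("2017-04-25 13:57:55.360", 247, 0.534),
        ("2017-04-25 13:57:55.397", 248, 0.568),
        ("2017-04-25 13:57:55.453", 248, 0.448),
    ],
)

# Row 1 of each ID partition is that ID's earliest row. ISO-8601 strings
# sort chronologically, so ORDER BY works on the text column.
query = """
WITH cte AS (
    SELECT r.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date ASC) AS rn
    FROM readings r
)
SELECT SUM(weight) FROM cte WHERE rn = 1
"""
total = conn.execute(query).fetchone()[0]
print(round(total, 3))  # 0.986
```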

I believe a simple correlated subquery will suffice:
SELECT SUM(weight)
FROM YourTable t1
WHERE DATE = (SELECT MIN(DATE) FROM YourTable t2 WHERE t2.ID = t1.ID)
-- if an ID has several rows at its minimum DATE, all of them are summed
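The correlated subquery keeps every row that sits at its ID's minimum DATE, so tied rows are summed together, whereas ROW_NUMBER picks exactly one row per ID. A small sketch of that difference, again assuming Python's sqlite3 as a stand-in for SQL Server; the table name `readings` and the extra tied row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (date TEXT, id INTEGER, weight REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2017-04-25 11:05:42.273", 247, 0.418),
        ("2017-04-25 11:05:42.310", 248, 0.568),
        ("2017-04-25 13:57:55.327", 247, 0.418),
        ("2017-04-25 13:57:55.360", 247, 0.534),
        ("2017-04-25 13:57:55.397", 248, 0.568),
        ("2017-04-25 13:57:55.453", 248, 0.448),
    ],
)

# Keep every row whose date equals the minimum date of its own ID.
subquery_sql = """
SELECT SUM(weight)
FROM readings t1
WHERE date = (SELECT MIN(date) FROM readings t2 WHERE t2.id = t1.id)
"""
before = conn.execute(subquery_sql).fetchone()[0]

# Hypothetical extra row: ID 247 gets a second reading at its earliest timestamp.
conn.execute("INSERT INTO readings VALUES ('2017-04-25 11:05:42.273', 247, 0.1)")
after = conn.execute(subquery_sql).fetchone()[0]

print(round(before, 3), round(after, 3))  # 0.986 1.086
```

With the tie present, the subquery adds both 247 rows; the ROW_NUMBER version would still sum a single row per ID.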

;With cte([DATE],ID,[weight])
AS
(
SELECT '2017-04-25 11:05:42.273', 247, 0.418 Union all
SELECT '2017-04-25 11:05:42.310', 248, 0.568 Union all
SELECT '2017-04-25 13:57:55.327', 247, 0.418 Union all
SELECT '2017-04-25 13:57:55.360', 247, 0.534 Union all
SELECT '2017-04-25 13:57:55.397', 248, 0.568 Union all
SELECT '2017-04-25 13:57:55.453', 248, 0.448
)
SELECT SUM(MinWeight) [SumOFweight] FROM
(
SELECT ID, DATE, [weight] AS MinWeight, -- weight of this ID at this DATE
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE ASC) RN FROM
(
SELECT DATE, ID, SUM([weight]) [weight] FROM cte -- collapse duplicate (ID, DATE) rows
GROUP BY ID, DATE
) dt
) Final
WHERE Final.RN = 1
Output
SumOFweight
-------------
0.986

Related

Update gaps in sequential table

I have a table that contains employee bank data:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement that updates 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, I want to use the Delta from the most recent non-null row whose date is not later than the date of the row being filled.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index on Employee, Bank and Date (descending). There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering whether there is a more efficient way to do so.
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
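AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key <= dd.date_key -- not later than the calendar date
ORDER BY tt1.date_key DESC); -- TOP 1 needs an ORDER BY to pick the most recent row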
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide (7 days in this example).
In the execution plan, with query cost relative to the batch, the LAG() OVER() approach was 22% while the CROSS APPLY was 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update: same results with 20 days.
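A minimal sketch comparing the two fill strategies on the question's sample rows, assuming Python's sqlite3 as a stand-in for SQL Server. SQLite has no CROSS APPLY, so a correlated scalar subquery with ORDER BY ... LIMIT 1 plays that role, and the LAG version here looks back only 3 rows for brevity. The table name `YourTable` follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Vacation", "2023-01-01", 15.0),
        ("Smith", "Vacation", "2023-01-02", None),
        ("Smith", "Vacation", "2023-01-03", None),
        ("Smith", "Vacation", "2023-01-04", 7.5),
    ],
)

# Bounded look-back: COALESCE over the previous 3 rows only,
# so a run of more than 3 NULLs would not be fully filled.
lag_sql = """
SELECT Date,
       COALESCE(Delta,
                LAG(Delta, 1) OVER (PARTITION BY Employee, Bank ORDER BY Date),
                LAG(Delta, 2) OVER (PARTITION BY Employee, Bank ORDER BY Date),
                LAG(Delta, 3) OVER (PARTITION BY Employee, Bank ORDER BY Date)) AS Delta
FROM YourTable
ORDER BY Date
"""

# Unbounded look-back: the latest non-null Delta on or before each row's date.
apply_sql = """
SELECT A.Date,
       COALESCE(A.Delta,
                (SELECT B.Delta FROM YourTable B
                 WHERE B.Employee = A.Employee AND B.Bank = A.Bank
                   AND B.Delta IS NOT NULL AND B.Date <= A.Date
                 ORDER BY B.Date DESC LIMIT 1)) AS Delta
FROM YourTable A
ORDER BY A.Date
"""

lag_rows = conn.execute(lag_sql).fetchall()
apply_rows = conn.execute(apply_sql).fetchall()
print(lag_rows == apply_rows)  # True
```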
Here is another way: use SUM() as a window function to assign a group number "Grp" to the rows (one non-null row followed by its subsequent null rows). Then MAX(Delta) over each Grp returns the non-null value.
select Employee, Bank, [Date], max(max(Delta))
over (partition by Employee, Bank, Grp) as Delta
from
(
select *, Grp = sum(case when Delta is not null then 1 else 0 end)
over (partition by Employee, Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
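A runnable sketch of the running-count technique, assuming Python's sqlite3 as a stand-in for SQL Server. Because (Employee, Bank, Date) is unique, the nested max(max(Delta)) with GROUP BY reduces to a plain windowed MAX, which is what this sketch uses. The 2023-01-05 row is hypothetical, added to show that a later gap is filled from 7.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Vacation", "2023-01-01", 15.0),
        ("Smith", "Vacation", "2023-01-02", None),
        ("Smith", "Vacation", "2023-01-03", None),
        ("Smith", "Vacation", "2023-01-04", 7.5),
        ("Smith", "Vacation", "2023-01-05", None),  # hypothetical extra row
    ],
)

# Grp is a running count of non-null Deltas: each non-null row starts a new
# group and the NULL rows after it share its number. MAX() ignores NULLs, so
# MAX(Delta) per group is that group's single non-null value. Rows before the
# first non-null Delta fall into Grp 0 and stay NULL.
query = """
SELECT Date,
       MAX(Delta) OVER (PARTITION BY Employee, Bank, Grp) AS Delta
FROM (
    SELECT *,
           SUM(CASE WHEN Delta IS NOT NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY Employee, Bank ORDER BY Date) AS Grp
    FROM YourTable
) t
ORDER BY Date
"""
filled = conn.execute(query).fetchall()
print([delta for _, delta in filled])  # [15.0, 15.0, 15.0, 7.5, 7.5]
```

Unlike the LAG version, this needs no assumption about the maximum gap width.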

How do I add original amount of oldest transaction in an aggregate function?

I'm trying to find out how to include the original amount of the first transaction (oldest by Posted Date) in an aggregate query.
The following query finds reversed transactions:
SELECT DISTINCT
[Account], [Voucher],
[DocumentDate],
SUM([Amount])
FROM
MyTable
WHERE
[Account] = 'abc'
GROUP BY
[Account], [Voucher], [DocumentDate]
HAVING
SUM([Amount]) = 0
How would I include, for each group, the amount of the transaction with the oldest posted date?
For example, using the following:
Account Voucher DocumentDate PostedDate Amount
---------------------------------------------------------
abc 1 01/01/2018 08/01/2018 100.00
abc 1 01/01/2018 15/01/2018 -100.00
The expected result would be:
Account Voucher DocumentDate OriginalAmount Sum(Amount) Records
-------------------------------------------------------------------------
abc 1 01/01/2018 100.00 0.00 2
One way to do it is to use a CTE with FIRST_VALUE, SUM ... OVER and COUNT ... OVER.
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
Account char(3),
Voucher int,
DocumentDate date,
PostedDate date,
Amount numeric(5,2)
)
INSERT INTO @T VALUES
('abc', 1, '2018-01-01', '2018-01-08', 100),
('abc', 1, '2018-01-01', '2018-01-15', -100)
The CTE (run it in the same batch as the table variable, together with the query that follows):
;WITH CTE
AS
(
SELECT [Account],
[Voucher],
[DocumentDate],
FIRST_VALUE(Amount) OVER(PARTITION BY [Account], [Voucher], [DocumentDate] ORDER BY PostedDate) AS OriginalAmount,
SUM([Amount]) OVER(PARTITION BY [Account], [Voucher], [DocumentDate]) AS [Sum(Amount)],
COUNT(*) OVER(PARTITION BY [Account], [Voucher], [DocumentDate]) Records
FROM
@T
WHERE
[Account] = 'abc'
)
The query:
SELECT DISTINCT *
FROM CTE
WHERE [Sum(Amount)] = 0
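A minimal sketch of the FIRST_VALUE query that runs without SQL Server, assuming Python's sqlite3 as a stand-in and a hypothetical table `MyTable`. Amount is stored as REAL here for brevity; with fractional amounts an exact type (numeric in SQL Server, or integer cents) keeps the SUM = 0 test reliable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Account TEXT, Voucher INTEGER, DocumentDate TEXT,"
    " PostedDate TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("abc", 1, "2018-01-01", "2018-01-08", 100.00),
        ("abc", 1, "2018-01-01", "2018-01-15", -100.00),
    ],
)

# Every row of a partition gets the same OriginalAmount, SumAmount and
# Records, so DISTINCT collapses each partition to a single row.
query = """
WITH cte AS (
    SELECT Account, Voucher, DocumentDate,
           FIRST_VALUE(Amount) OVER (PARTITION BY Account, Voucher, DocumentDate
                                     ORDER BY PostedDate) AS OriginalAmount,
           SUM(Amount) OVER (PARTITION BY Account, Voucher, DocumentDate) AS SumAmount,
           COUNT(*) OVER (PARTITION BY Account, Voucher, DocumentDate) AS Records
    FROM MyTable
    WHERE Account = 'abc'
)
SELECT DISTINCT * FROM cte WHERE SumAmount = 0
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('abc', 1, '2018-01-01', 100.0, 0.0, 2)]
```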
Results:
Account Voucher DocumentDate OriginalAmount Sum(Amount) Records
abc 1 01.01.2018 00:00:00 100,00 0,00 2
See a live demo on rextester.
It seems straightforward ... am I missing something?
WITH CTE AS
(
SELECT
[Account],
[Voucher],
[DocumentDate],
ROW_NUMBER() OVER (PARTITION BY [Account],[Voucher],[DocumentDate] ORDER BY [PostedDate]) RN, -- RN = 1 is the oldest posting in each group
[Amount]
FROM
MyTable
WHERE
[Account] = 'abc'
)
SELECT
[Account],
[Voucher],
[DocumentDate],
max(case when RN = 1 THEN [Amount] else null end) OriginalAmount,
sum([Amount]) SUM_Amount,
count(*) Records
from cte
GROUP BY
[Account], [Voucher], [DocumentDate]
HAVING
SUM([Amount]) = 0
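The same result via ROW_NUMBER plus conditional aggregation, sketched with Python's sqlite3 as a stand-in for SQL Server (table name `MyTable` is hypothetical). The rows are inserted newest-first on purpose: RN = 1 is still the 2018-01-08 posting because the window is ordered by PostedDate, not by insertion order or DocumentDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Account TEXT, Voucher INTEGER, DocumentDate TEXT,"
    " PostedDate TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("abc", 1, "2018-01-01", "2018-01-15", -100.00),
        ("abc", 1, "2018-01-01", "2018-01-08", 100.00),
    ],
)

# Number the postings of each (Account, Voucher, DocumentDate) group by
# PostedDate, then pick the RN = 1 amount with a CASE inside MAX().
query = """
WITH cte AS (
    SELECT Account, Voucher, DocumentDate, Amount,
           ROW_NUMBER() OVER (PARTITION BY Account, Voucher, DocumentDate
                              ORDER BY PostedDate) AS rn
    FROM MyTable
    WHERE Account = 'abc'
)
SELECT Account, Voucher, DocumentDate,
       MAX(CASE WHEN rn = 1 THEN Amount END) AS OriginalAmount,
       SUM(Amount) AS SumAmount,
       COUNT(*) AS Records
FROM cte
GROUP BY Account, Voucher, DocumentDate
HAVING SUM(Amount) = 0
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('abc', 1, '2018-01-01', 100.0, 0.0, 2)]
```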

SQL Server: fill a range with dates from overlapping intervals with priority

I need to fill the range from 2017-04-01 to 2017-04-30 with the data from this table, knowing that higher-priority records (a larger priority number) should prevail over lower-priority ones:
id startValidity endValidity priority
-------------------------------------------
1004 2017-04-03 2017-04-30 1
1005 2017-04-10 2017-04-22 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-28 2
1008 2017-04-26 2017-04-28 3
In practice I would need to get a result like this:
id startValidity endValidity priority
--------------------------------------------
1004 2017-04-03 2017-04-09 1
1005 2017-04-10 2017-04-18 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-25 2
1008 2017-04-26 2017-04-28 3
1004 2017-04-29 2017-04-30 1
I can't think of anything more elegant or efficient right now:
-- Sample Table
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
-- Query
; with
date_range as -- find the min and max date for generating list of dates
(
select start_date = min(startValidity), end_date = max(endValidty)
from @tbl
),
dates as -- gen the list of dates using recursive CTE
(
select rn = 1, date = start_date
from date_range
union all
select rn = rn + 1, date = dateadd(day, 1, d.date)
from dates d
where d.date < (select end_date from date_range)
),
cte as -- for each date, get the ID based on priority
(
-- grp is constant within a run of consecutive dates won by the same id
select *, grp = rn - row_number() over (partition by t.id order by d.date)
from dates d
outer apply
(
select top 1 x.id, x.priority
from @tbl x
where x.startValidity <= d.date
and x.endValidty >= d.date
order by x.priority desc
) t
)
-- final result
select id, startValidity = min(date), endValidty = max(date), priority
from cte
group by grp, id, priority
order by startValidity
-- append option (maxrecursion 0) if the range spans more than 100 days
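A runnable sketch of the calendar-plus-islands approach, assuming Python's sqlite3 as a stand-in for SQL Server. SQLite has no OUTER APPLY or DATEADD, so a correlated scalar subquery with ORDER BY ... LIMIT 1 picks the winning id per day and `date(d, '+1 day')` advances the calendar. The table name `tbl` and the column spelling `endValidity` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, startValidity TEXT, endValidity TEXT, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1004, "2017-04-03", "2017-04-30", 1),
        (1005, "2017-04-10", "2017-04-22", 2),
        (1010, "2017-04-19", "2017-04-23", 3),
        (1006, "2017-04-24", "2017-04-28", 2),
        (1008, "2017-04-26", "2017-04-28", 3),
    ],
)

query = """
WITH RECURSIVE
dates(rn, d) AS (              -- one row per day, numbered 1, 2, 3, ...
    SELECT 1, MIN(startValidity) FROM tbl
    UNION ALL
    SELECT rn + 1, date(d, '+1 day') FROM dates
    WHERE d < (SELECT MAX(endValidity) FROM tbl)
),
winner AS (                    -- highest-priority interval covering each day
    SELECT rn, d,
           (SELECT x.id FROM tbl x
            WHERE x.startValidity <= dates.d AND x.endValidity >= dates.d
            ORDER BY x.priority DESC LIMIT 1) AS id
    FROM dates
),
islands AS (                   -- consecutive days won by the same id share grp
    SELECT w.*, rn - ROW_NUMBER() OVER (PARTITION BY id ORDER BY d) AS grp
    FROM winner w
)
SELECT i.id, MIN(i.d) AS startValidity, MAX(i.d) AS endValidity, t.priority
FROM islands i
LEFT JOIN tbl t ON t.id = i.id
GROUP BY i.grp, i.id, t.priority
ORDER BY MIN(i.d)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The last 1004 row (2017-04-29 to 2017-04-30) gets its own grp value, which is why the islands step is needed rather than a plain GROUP BY id.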
I do not understand the purpose of the calendar CTE or table, so I am not using a recursive CTE or calendar.
Maybe I haven't understood the requirement completely.
Try this with different sample data:
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
;With CTE as
(
select * ,ROW_NUMBER()over(order by startValidity)rn
from @tbl
)
,CTE1 as
(
select c.id,c.startvalidity,isnull(dateadd(day,-1, c1.startvalidity)
,c.endValidty) Endvalidity
,c.[priority],c.rn
from cte c
left join cte c1
on c.rn+1=c1.rn
)
select id,startvalidity,Endvalidity,priority from cte1
union ALL
select id,startvalidity,Endvalidity,priority from
(
select top 1 id,ca.startvalidity,ca.Endvalidity,priority from cte1
cross apply(
select top 1
dateadd(day,1,endvalidity) startvalidity
,dateadd(day,-1,dateadd(month, datediff(month,0,endvalidity)+1,0)) Endvalidity
from cte1
order by rn desc)CA
order by priority
)t4
--order by startvalidity --if req

How to retrieve the third most recent date grouped by another column?

EDIT: I am using SQL Server 2005
So here's a tricky one. For audit purposes, we need to make 3 attempts to contact a customer. We can make more than 3 attempts to go above and beyond, but for audit purposes I need to retrieve the date of the third most recent attempt for each customer.
In most cases you just need the most recent date, so you can do something like:
SELECT CustID,MAX(AttemptDate) FROM Attempts GROUP BY CustID
...but that obviously won't work in this scenario.
Say I have a table of attempts that occur which are tied to a customer.
CustID AttemptDate
123 2014-01-02
123 2014-01-05
123 2014-01-06 * retrieve this one
123 2014-01-07
123 2014-01-10
555 2014-02-01
555 2014-02-03
555 2014-02-07 * retrieve this one
555 2014-02-12
555 2014-02-20
Output:
CustID AttemptDate
123 2014-01-06
555 2014-02-07
Any tips for pulling this off?
;WITH t AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY CustId ORDER BY AttemptDate DESC) AS nth_most_recent
FROM Attempts
)
SELECT *
FROM t
WHERE nth_most_recent = 3
The ROW_NUMBER ranking function is your friend here:
WITH cte (CustId, AttemptDate, AttemptNumber) AS (
SELECT
CustId,
AttemptDate,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY AttemptDate DESC) AS AttemptNumber
FROM Attempts
)
SELECT
CustId,
AttemptDate
FROM cte
WHERE AttemptNumber = 3
Alternatively, if the common table expression syntax is causing problems, you could use a subquery:
SELECT
CustId,
AttemptDate
FROM (
SELECT
CustId,
AttemptDate,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY AttemptDate DESC) AS AttemptNumber
FROM Attempts
) sq
WHERE AttemptNumber = 3
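A runnable sketch of the subquery version, assuming Python's sqlite3 as a stand-in for SQL Server. Customer 777 is hypothetical and has only two attempts, to show that such customers simply produce no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Attempts (CustID INTEGER, AttemptDate TEXT)")
conn.executemany(
    "INSERT INTO Attempts VALUES (?, ?)",
    [
        (123, "2014-01-02"), (123, "2014-01-05"), (123, "2014-01-06"),
        (123, "2014-01-07"), (123, "2014-01-10"),
        (555, "2014-02-01"), (555, "2014-02-03"), (555, "2014-02-07"),
        (555, "2014-02-12"), (555, "2014-02-20"),
        (777, "2014-03-01"), (777, "2014-03-04"),  # hypothetical: only 2 attempts
    ],
)

# Number each customer's attempts newest-first and keep number 3.
query = """
SELECT CustID, AttemptDate
FROM (
    SELECT CustID, AttemptDate,
           ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY AttemptDate DESC) AS AttemptNumber
    FROM Attempts
) sq
WHERE AttemptNumber = 3
ORDER BY CustID
"""
third = conn.execute(query).fetchall()
print(third)  # [(123, '2014-01-06'), (555, '2014-02-07')]
```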

Determine consecutive date count in SQL Server

I have some data that looks like this:
id date
--------------------------------
123 2013-04-08 00:00:00.000
123 2013-04-07 00:00:00.000
123 2013-04-06 00:00:00.000
123 2013-04-04 00:00:00.000
123 2013-04-03 00:00:00.000
I need to return a count of the most recent consecutive date streak for a given ID, which in this case would be 3 for id 123. I have no idea if this can be done in SQL. Any suggestions?
The way to do this is to subtract a sequence of numbers (a row number) from each date. Within a run of consecutive dates the difference is constant, so it identifies the run. Here is an example to get the length of all sequences for an id:
select id, grp, count(*) as NumInSequence, min(date) as mindate, max(date) as maxdate
from (select t.*,
dateadd(day, -cast(row_number() over (partition by id order by date) as int), date) as grp
from data t
) t
group by id, grp
To get the longest one, I would use row_number() again:
select t.*
from (select id, grp, count(*) as NumInSequence,
min(date) as mindate, max(date) as maxdate,
-- order by max(date) desc instead to pick the most recent streak
row_number() over (partition by id order by count(*) desc) as seqnum
from (select t.*,
dateadd(day, -cast(row_number() over (partition by id order by date) as int), date) as grp
from data t
) t
group by id, grp
) t
where seqnum = 1
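A runnable sketch of the date-minus-row-number trick that returns the most recent streak (what the question asks for), assuming Python's sqlite3 as a stand-in for SQL Server. SQLite has no DATEADD, so `julianday(date) - ROW_NUMBER()` plays the same role. It assumes dates are unique per id; duplicates would need DENSE_RANK instead of ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (123, "2013-04-08"), (123, "2013-04-07"), (123, "2013-04-06"),
        (123, "2013-04-04"), (123, "2013-04-03"),
    ],
)

# julianday(date) - row_number is constant within a run of consecutive days,
# so it labels each streak; the newest streak has the latest maxdate.
query = """
WITH grouped AS (
    SELECT id, date,
           julianday(date) - ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS grp
    FROM data
),
streaks AS (
    SELECT id, COUNT(*) AS NumInSequence, MIN(date) AS mindate, MAX(date) AS maxdate
    FROM grouped
    GROUP BY id, grp
)
SELECT id, NumInSequence, mindate, maxdate
FROM (
    SELECT s.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY maxdate DESC) AS seqnum
    FROM streaks s
) t
WHERE seqnum = 1
"""
latest = conn.execute(query).fetchall()
print(latest)  # [(123, 3, '2013-04-06', '2013-04-08')]
```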
