Merge rows based on date in SQL Server - sql-server

I want to display data based on start date and end date. a code can contain different dates. if any time intervel is continues then I need to merge that rows and display as single row
Here is sample data
Code Start_Date End_Date Volume
470 24-Oct-10 30-Oct-10 28
470 17-Oct-10 23-Oct-10 2
470 26-Sep-10 2-Oct-10 2
471 22-Aug-10 29-Aug-10 2
471 15-Aug-10 21-Aug-10 2
The output result I want is
Code Start_Date End_Date Volume
470 17-Oct-10 30-Oct-10 30
470 26-Sep-10 2-Oct-10 2
471 15-Aug-10 29-Aug-10 4
a code can have any no. of time intervels. Pls help. Thank you

Based on your sample data (which I've put in a table called Test), and assuming no overlaps:
;with Ranges as (
select Code,Start_Date,End_Date,Volume from Test
union all
select r.Code,r.Start_Date,t.End_Date,(r.Volume + t.Volume)
from
Ranges r
inner join
Test t
on
r.Code = t.Code and
DATEDIFF(day,r.End_Date,t.Start_Date) = 1
), ExtendedRanges as (
select Code,MIN(Start_Date) as Start_Date,End_Date,MAX(Volume) as Volume
from Ranges
group by Code,End_Date
)
select Code,Start_Date,MAX(End_Date),MAX(Volume)
from ExtendedRanges
group by Code,Start_Date
Explanation:
The Ranges CTE contains all rows from the original table (because some of them might be relevant) and all rows we can form by joining ranges together (both original ranges, and any intermediate ranges we construct - we're doing recursion here).
Then ExtendedRanges (poorly named) finds, for any particular End_Date, the earliest Start_Date that can reach it.
Finally, we query this second CTE, to find, for any particular Start_Date, the latest End_Date that is associated with it.
These two queries combine to basically filter the Ranges CTE down to "the widest possible Start_Date/End_Date pair" in each set of overlapping date ranges.
Sample data setup:
create table Test (
Code int not null,
Start_Date date not null,
End_Date date not null,
Volume int not null
)
insert into Test(Code, Start_Date, End_Date, Volume)
select 470,'24-Oct-10','30-Oct-10',28 union all
select 470,'17-Oct-10','23-Oct-10',2 union all
select 470,'26-Sep-10','2-Oct-10',2 union all
select 471,'22-Aug-10','29-Aug-10',2 union all
select 471,'15-Aug-10','21-Aug-10',2
go

if I understand your request, you're looking for something like:
select code, min(Start_date), max(end_date), sum(volume)
from yourtable
group by code

Related

SQL Server : Join If Between

I have 2 tables:
Query1: contains 3 columns, Due_Date, Received_Date, Diff
where Diff is the difference in the two dates in days
QueryHol with 2 columns, Date, Count
This has a list of dates and the count is set to 1 for everything. All these dates represent public holidays.
I want to be able to get the sum of QueryHol["Count"] if QueryHol["Date"] is between Query1["Due_Date"] and Query1["Received_Date"]
Result Wanted: a column joined onto Query1 to state how many public holidays fell into the date range so they can be subtracted from the Query1["Diff"] column to give a reflection of working days.
Because the 01-01-19 is a bank holiday i would want to minus that from the Diff to end up with results like below
Let me know if you require any more info.
Here's an option:
SELECT query1.due_date
, query1.received_date
, query1.diff
, queryhol.count
, COALESCE(query1.diff - queryhol.count, query1.diff) as DiffCount
FROM Query1
OUTER APPLY(
SELECT COUNT(*) AS count
FROM QueryHol
WHERE QueryHol.Date <= Query1.Received_Date
AND QueryHol.Date >= Query1.Due_Date
) AS queryhol
You may need to play around with the join condition - as it is assumes that the Received_Date is always later than the Due_Date which there is not enough data to know all of the use cases.
If I understand your problem, I think this is a possible solution:
select due_date,
receive_date,
diff,
(select sum(table2.count)
from table2
where table2.due_date between table1.due_date and table1.due_date) sum_holi,
table1.diff - (select sum(table2.count)
from table2
where table2.date between table1.due_date and table2.due_date) diff_holi
from table1
where [...] --here your conditions over table1.

Creating sequential date ranges for items in a queue

I have a table 'item_queue' containing, items, groups and a sequence number.
Each item is unique and is held against a group with a number indicating the sequence. The count is a total for that item e.g.
group_id|item_id|sequence_order_number|count
--------------------------------------------
A |123 |1 |20
A |124 |2 |30
B |125 |1 |10
Given this information I am trying to set up sequential start and end dates
The start datetime of the first item for a group is the current time, for example assume start of item 123 is '2019-04-04 12:00:00.000' then
end datetime would be start + (count * minutes) so '2019-04-04 12:20:00.000'
The start of item 124 would equal that end date as it is the next in the sequence for that group. the end is then calculated the same way to be '2019-04-04 12:50:00.000'
item 125 would start the time again at '2019-04-04 12:00:00.000' as it is in a different group
I have attempted a few ways to do this, and I think the answer is a recursive cte, but I can't wrap my head around it to make it work for one or multiple groups, my unsuccessful attempt for a single group:
;with cte as
(
select
group_id,
item_id,
count,
GETDATE() as start_datetime,
DATEADD(MINUTE, count, GETDATE()) as end_datetime,
iq.sequence_order_number
from item_queue iq
where iq.group_id = 'A'
union all
select
group_id,
item_id,
count,
cte.end_datetime,
DATEADD(MINUTE, count, cte2.end_datetime) as end_datetime,
iq.sequence_order_number
from item_queue iq
inner join cte
on cte.group_id = iq.group_id
and cte.sequence_order_number > iq.sequence_order_number
where iq.group_id = 'A'
)
select * from cte
I suspect the answer may involve a row number window something like
ROW_NUMBER() OVER (Partition By iq.group_id Order By iq.sequence_order_number ASC)
But I have had trouble using it recursively.
Using SQL server 2012, without the ability to upgrade this database.
The minutes you want to add are practically a cumulative sum. The sum() over() window function is available in 2012 and performs exactly that. Try:
select
*,
isnull(sum([count]) over
(
partition by group_id
order by item_id asc
rows between unbounded PRECEDING and 1 PRECEDING
)
,0) as cum_count_start,
sum([count]) over ( partition by group_id order by item_id asc ) as cum_count_end
from item_queue
You already know how to use dateadd after this point.
What the individual window function caluses do:
partition by group_id : Seperate (partition) the calculations for each group_id value subset
order by item_id asc : make a virtual sorting of the rows on which the window range will be applied
rows between.... : The actual window. For the start date, we want to consider all the lines from the start (thus unbounded preceding) to the previous one (thus 1 preceding), since you don't want the start date to include the current line's [count]. Note that ommitting this clause like we did on the cum_count_end is equivelant to rows between unbounded preceding and current row.
The isnull(...,0) is needed because for the first line of each group_id you want to add 0 to the start date, but the window function sees no rows and returns NULL, so we need to change this to 0.

how to get multiple min values from two SQL tables?

I have two tables, a Members table and a Plan table. They are structured as follows.
member start_date Mplan Pplan version start_dt end_dt
John 20120701 johnplan johnplan 1 20120601 20130531
John 20130201 johnplan johnplan 2 20130601 20140531
John 20130901 johnplan
John 20131201 johnplan
I need to update the start_date on the Members table to be the minimum value present for that member but within the same Plan version.
Example:
20130201 would be changed to 20120701 and 20131201 would change to 20130901.
Code:
UPDATE Members
SET start_date =(
SELECT MIN(start_date) FROM Members a
LEFT JOIN Plan ON Mplan = Pplan AND
start_date BETWEEN start_dt AND end_dt
WHERE member=a.member
AND start_date BETWEEN start_dt AND end_dt
)
Unfortunately this sets every single start_date to 19900101 aka the lowest value in the entire table for that column.
First you need to get the minimum start date of each member for a specific plan. The following will provide you that.
select MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan
The result will be something like this.
min_date member_name plan_name
2012-07-01 00:00:00.000 John johnplan1
2013-09-01 00:00:00.000 John johnplan2
Use this to update each member's start date for a plan with the lowest start date of the respective plan.
update members
set start_date= tbl.min_date from
(SELECT MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a
inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan) as tbl
where member=tbl.member_name and Mplan=tbl.plan_name
I created your 2 tables, members and plan, and tested this solution with sample data and it works. I hope it helps.
You really need to convert the dates to Datetime. You will have a greater precision, the possibility to store hours, days and minutes as well as access to date specific functions, international conversion and localization.
If your column is a Varchar(8), then it uses no less space than a Datetime column.
That said, what you are looking for is row_number().
Something like:
SELECT Member, MPlan, Start_Date, Row_Number() OVER (PARTITION BY Member, MPLan ORDER BY Start_Date) as Version
FROM Members
Could you try this ? I didn't test it.
With Member_start_dt as
(
select *, (select start_dt from Pplan where M.start_date <= start_dt AND M.start_date >= end_dt) as Pplan_date
from Members M
),
Member_by_plan as
(
select *, ROW_NUMBER () over (partition by Pplan_date order by start_date) num
from Member_start_dt
)
update M
Set M.start_date = MBP1.start_date
from Members M
inner join Member_by_plan MBP1 ON MBP1.member = M.Member AND num = 1
inner join Member_by_plan MBP2 ON MBP2.member = M.Member AND MBP2.Pplan_date = MBP1.Pplan_date AND MBP2.start_date = M.start_date

Find the date when a bit column toggled state

I have this requirement.
My table contains a series of rows with serialnos and several bit columns and date-time.
To Simplify I will focus on 1 bit column.In essence, I need to know the recent date that this bit was toggled.
Ex: The following table depicts the bit values for 7 serials for the latest 6 days (10 to 5).
SQl Fiddle schema + query
I have succesfully managed to get the result in a sample but is taking ages on the real table containing over 30 million records and approx 300K serial nos.
Pseudo -->
For each Serial:
Get (max Date) bit value as A (latest bit value ex 1)
Get (max Date) NOT A as B ( Find most recent date that was ex 0)
Get the (Min Date) > B
Group by SNO
I am sure an optimised approach exists.
For completeness the dataset contains rows that I need to filter out etc. However I can build and add these later when getting the basic executing more efficiently.
Tks for your time!
with cte as
(
select *, rn = ROW_NUMBER() OVER (ORDER BY sno)
from dbo.TestCape2
)
select MAX(y.Device_date) as MaxDate,
y.SNo
from cte x
inner join cte as y
on x.rn = y.rn + 1
and x.SNo = y.SNo
and x.Cape <> y.Cape
group by y.SNo
order by SNo;
And if you're using SQL-Server 2012 and up you can make use of LAG, which will take a look at the previous row.
select max(Device_date) as MaxDate,
SNo
from (
select SNo
,Device_date
,Cape
,LAG (Cape, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevCape
,LAG (Sno, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevSno
from dbo.TestCape2) t
where sno = PrevSno
and t.Cape <> t.PrevCape
group by sno
order by sno;

Flatten/merge overlapping time intervals

I have a 'Service' table with millions of rows. Each row corresponds to a service provided by a staff in a given date and time interval (Each row has a unique ID). There are cases where a staff might provide services in overlapping time frames. I need to write a query that merges overlapping time intervals and returns the data in the format shown below.
I tried grouping by StaffID and Date fields and getting the Min of BeginTime and Max of EndTime but that does not account for the non-overlapping time frames. How can I accomplish this? Again, the table contains several million records so a recursive CTE approach might have performance issues. Thanks in advance.
Service Table
ID StaffID Date BeginTime EndTime
1 101 2014-01-01 08:00 09:00
2 101 2014-01-01 08:30 09:30
3 101 2014-01-01 18:00 20:30
4 101 2014-01-01 19:00 21:00
Output
StaffID Date BeginTime EndTime
101 2014-01-01 08:00 09:30
101 2014-01-01 18:00 21:00
Here is another sample data set with a query proposed by a contributor.
http://sqlfiddle.com/#!6/bfbdc/3
The first two rows in the results set should be merged into one row (06:00-08:45) but it generates two rows (06:00-08:30 & 06:00-08:45)
I only came up with a CTE query as the problem is there may be a chain of overlapping times, e.g. record 1 overlaps with record 2, record 2 with record 3 and so on. This is hard to resolve without CTE or some other kind of loops, etc. Please give it a go anyway.
The first part of the CTE query gets the services that start a new group and are do not have the same starting time as some other service (I need to have just one record that starts a group). The second part gets those that start a group but there's more then one with the same start time - again, I need just one of them. The last part recursively builds up on the starting group, taking all overlapping services.
Here is SQLFiddle with more records added to demonstrate different kinds of overlapping and duplicate times.
I couldn't use ServiceID as it would have to be ordered in the same way as BeginTime.
;with flat as
(
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime <= S1.BeginTime and S2.EndTime <> S1.EndTime
and S2.EndTime > S1.BeginTime)
union all
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime = S1.BeginTime and S2.EndTime > S1.EndTime)
and not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime < S1.BeginTime
and S2.EndTime > S1.BeginTime)
union all
select S.StaffID, S.ServiceDate, S.BeginTime, S.EndTime, flat.groupid
from flat
inner join services S
on flat.StaffID = S.StaffID
and flat.ServiceDate = S.ServiceDate
and flat.EndTime > S.BeginTime
and flat.BeginTime < S.BeginTime and flat.EndTime < S.EndTime
)
select StaffID, ServiceDate, MIN(BeginTime) as begintime, MAX(EndTime) as endtime
from flat
group by StaffID, ServiceDate, groupid
order by StaffID, ServiceDate, begintime, endtime
Elsewhere I've answered a similar Date Packing question with
a geometric strategy. Namely, I interperet the date ranges
as a line, and utilize geometry::UnionAggregate to merge
the ranges.
Your question has two peculiarities though. First, it calls
for sql-server-2008. geometry::UnionAggregate is not then
avialable. However, download the microsoft library at
https://github.com/microsoft/SQLServerSpatialTools and load
it in as a clr assembly to your instance and you have it
available as dbo.GeometryUnionAggregate.
But the real peculiarity that has my interest is the concern
that you have several million rows to work with. So I thought
I'd repeat the strategy here but with an added technique to
improve it's performance. This technique will work well if
you have a lot of your StaffID/date subsets that are the same.
First, let's build a numbers table. Swap this out with your favorite
way to do it.
select i = row_number() over (order by (select null))
into #numbers
from #services; -- where i put your data
Then convert the dates to floats and use those floats to create
geometrical points.
These points can then be turned into lines via STUnion and STEnvelope.
With your ranges now represented as geometric lines, merge them via
UnionAggregate. The resulting geometry object 'lines' might contain
multiple lines. But any overlapping lines turn into one line.
select s.StaffID,
s.Date,
linesWKT = geometry::UnionAggregate(line).ToString()
-- If you have SQLSpatialTools installed then:
-- linesWKT = dbo.GeometryUnionAggregate(line).ToString()
into #aggregateRangesToGeo
from #services s
cross apply (select
beginTimeF = convert(float, convert(datetime,beginTime)),
endTimeF = convert(float, convert(datetime,endTime))
) prepare
cross apply (select
beginPt = geometry::Point(beginTimeF, 0, 0),
endPt = geometry::Point(endTimeF, 0, 0)
) pointify
cross apply (select
line = beginPt.STUnion(endPt).STEnvelope()
) lineify
group by s.StaffID,
s.Date;
You have one 'lines' object for each staffId/date combo. But depending
on your dataset, there may be many 'lines' objects that are the same
between these combos. This may very well be true if staff are expected
to follow a routine and data is recorded to the nearest whatever.
So get a distinct lising of 'lines' objects. This should improve
performance.
From this, extract the individual lines inside 'lines'. Envelope the lines,
which ensures that the lines are stored only as their endpoints. Read the
endpoint x values and convert them back to their time representations.
Keep the WKT representation to join it back to the combos later on.
select lns.linesWKT,
beginTime = convert(time, convert(datetime, ap.beginTime)),
endTime = convert(time, convert(datetime, ap.endTime))
into #parsedLines
from (select distinct linesWKT from #aggregateRangesToGeo) lns
cross apply (select
lines = geometry::STGeomFromText(linesWKT, 0)
) geo
join #numbers n on n.i between 1 and geo.lines.STNumGeometries()
cross apply (select
line = geo.lines.STGeometryN(n.i).STEnvelope()
) ln
cross apply (select
beginTime = ln.line.STPointN(1).STX,
endTime = ln.line.STPointN(3).STX
) ap;
Now just join your parsed data back to the StaffId/Date combos.
select ar.StaffID,
ar.Date,
pl.beginTime,
pl.endTime
from #aggregateRangesToGeo ar
join #parsedLines pl on ar.linesWKT = pl.linesWKT
order by ar.StaffID,
ar.Date,
pl.beginTime;

Resources