Combining rows with overlapping dates in T-SQL - sql-server

I have some data similar to the below:
Base data
Student Start Date End Date Course
John 01-Jan-20 30-Sep-20 Business
John 01-Jan-20 30-Dec-20 Psychology
John 01-Oct-20 NULL Music
Jack 01-Feb-20 30-Sep-20 Business
Jack 01-Apr-20 30-Nov-20 Music
I want to transform the data so I have a row for each student, for each time period, with a concatenated list of courses, i.e.
Target output
Student Start Date End Date Course
John 01-Jan-20 30-Sep-20 Business, Psychology
John 01-Oct-20 30-Dec-20 Psychology, Music
John 01-Jan-21 NULL Music
Jack 01-Feb-20 31-Mar-20 Business
Jack 01-Apr-20 30-Sep-20 Business, Music
Jack 01-Oct-20 30-Nov-20 Music
I have a script that works if the dates are identical, using STUFF on the course field and grouping on student/dates (code below). But I can't work out how to handle the overlapping dates?
Select Student,
Courses =
STUFF((select ',' + course
from Table1 b
where a.student = b.student
for XML PATH('')
),1,1,''
)
from table1 a
Group by student

This is a little long-winded, as you need to get the groups for the dates. As the dates don't overlap, you then need to do a bit of elimination of some of the groupings too, so it takes a couple of sweeps.
I use CTEs to get the groups I need, and then use a subquery to string aggregate (on SQL Server 2017 or later you can use STRING_AGG and avoid a second scan of the table). This ends up with this:
WITH YourTable AS(
SELECT *
FROM (VALUES('John',CONVERT(date,'01-Jan-20'),CONVERT(date,'30-Sep-20'),'Business'),
('John',CONVERT(date,'01-Jan-20'),CONVERT(date,'30-Dec-20'),'Psychology'),
('John',CONVERT(date,'01-Oct-20'),CONVERT(date,NULL),'Music'),
('Jack',CONVERT(date,'01-Feb-20'),CONVERT(date,'30-Sep-20'),'Business'),
('Jack',CONVERT(date,'01-Apr-20'),CONVERT(date,'30-Nov-20'),'Music'))V(Student,StartDate,EndDate,Course)),
Dates AS(
SELECT DISTINCT V.Student, V.[Date]
FROM YourTable YT
CROSS APPLY (VALUES(YT.Student,YT.StartDate),
(YT.Student,YT.EndDate)) V(Student,[Date])),
Islands AS(
SELECT *,
LEAD(ISNULL([Date],'99991231')) OVER (PARTITION BY Student ORDER BY ISNULL([Date],'99991231')) AS NextDate
FROM Dates
WHERE [Date] IS NOT NULL),
Groups AS(
SELECT I.Student,
I.Date AS StartDate,
CASE DATEPART(DAY,I.NextDate) WHEN 1 THEN DATEADD(DAY, -1, I.NextDate) ELSE I.NextDate END AS EndDate,
STUFF((SELECT ',' + YT.Course
FROM YourTable YT
WHERE YT.Student = I.Student
AND YT.StartDate <= I.[Date]
AND (YT.EndDate >= I.NextDate OR YT.EndDate IS NULL)
ORDER BY YT.Course
FOR XML PATH(''),TYPE).value('(./text())[1]','nvarchar(MAX)'),1,1,'') AS Courses
FROM Islands I)
SELECT Student,
StartDate,
EndDate,
Courses
FROM Groups
WHERE ([StartDate] != EndDate OR EndDate IS NULL)
AND Courses IS NOT NULL
ORDER BY Student DESC,
StartDate ASC;
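For reference, on SQL Server 2017 or later the same grouping can be sketched with STRING_AGG, which swaps the correlated FOR XML PATH subquery for a join and a GROUP BY. This is only a sketch that keeps the sample data and the boundary handling of the query above; the date literals are written as yyyymmdd (my choice, not the original's) so they don't depend on the session language.

```sql
WITH YourTable AS(
    SELECT *
    FROM (VALUES('John',CONVERT(date,'20200101'),CONVERT(date,'20200930'),'Business'),
                ('John',CONVERT(date,'20200101'),CONVERT(date,'20201230'),'Psychology'),
                ('John',CONVERT(date,'20201001'),CONVERT(date,NULL),'Music'),
                ('Jack',CONVERT(date,'20200201'),CONVERT(date,'20200930'),'Business'),
                ('Jack',CONVERT(date,'20200401'),CONVERT(date,'20201130'),'Music'))V(Student,StartDate,EndDate,Course)),
Dates AS(
    -- Every distinct boundary date per student (open-ended NULLs are skipped)
    SELECT DISTINCT YT.Student, V.[Date]
    FROM YourTable YT
         CROSS APPLY (VALUES(YT.StartDate),(YT.EndDate)) V([Date])
    WHERE V.[Date] IS NOT NULL),
Islands AS(
    -- Pair each boundary date with the next one for the same student
    SELECT Student, [Date],
           LEAD([Date]) OVER (PARTITION BY Student ORDER BY [Date]) AS NextDate
    FROM Dates),
Groups AS(
    -- The inner join drops intervals in which no course is running
    SELECT I.Student,
           I.[Date] AS StartDate,
           CASE DATEPART(DAY,I.NextDate) WHEN 1 THEN DATEADD(DAY,-1,I.NextDate) ELSE I.NextDate END AS EndDate,
           STRING_AGG(YT.Course, ',') WITHIN GROUP (ORDER BY YT.Course) AS Courses
    FROM Islands I
         JOIN YourTable YT ON YT.Student = I.Student
                          AND YT.StartDate <= I.[Date]
                          AND (YT.EndDate >= I.NextDate OR YT.EndDate IS NULL)
    GROUP BY I.Student, I.[Date], I.NextDate)
SELECT Student, StartDate, EndDate, Courses
FROM Groups
WHERE StartDate != EndDate OR EndDate IS NULL
ORDER BY Student DESC, StartDate ASC;
```

Because the courses come from a join rather than a per-row subquery, the separate "Courses IS NOT NULL" filter is no longer needed.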

Related

SQL Server Group rows with multiple occurences of Group BY columns

I am trying to summarize a dataset and get the minimum and maximum date for each group. However, a group can exist multiple times if there is a gap. Here is sample data:
CREATE TABLE temp (
id int,
FIRSTNAME nvarchar(50),
LASTNAME nvarchar(50),
STARTDATE datetime2(7),
ENDDATE datetime2(7)
)
INSERT into temp values(1,'JOHN','SMITH','2013-04-02','2013-05-31')
INSERT into temp values(2,'JOHN','SMITH','2013-05-31','2013-10-31')
INSERT into temp values(3,'JANE','DOE','2013-10-31','2016-07-19')
INSERT into temp values(4,'JANE','DOE','2016-07-19','2016-08-11')
INSERT into temp values(5,'JOHN','SMITH','2016-08-11','2017-02-01')
INSERT into temp values(6,'JOHN','SMITH','2017-02-01','9999-12-31')
I am looking to summarize the data as follows:
JOHN SMITH 2013-04-02 2013-10-31
JANE DOE 2013-10-31 2016-08-11
JOHN SMITH 2016-08-11 9999-12-31
A "group by" will combine the two John Smith records together with the incorrect min and max dates.
Any help is appreciated.
Thanks.
As JNevill pointed out, this is a classic Gaps and Islands problem. Below is one solution using Row_Number().
Select FirstName
,LastName
,StartDate=min(StartDate)
,EndDate =max(EndDate)
From (
Select *
,Grp = Row_Number() over (Order by ID) - Row_Number() over (Partition By FirstName,LastName Order by EndDate)
From Temp
) A
Group By FirstName,LastName,Grp
Order By min(StartDate)
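To see why subtracting the two row numbers works, it can help to look at the intermediate values. The sketch below inlines the sample rows from the question; the CTE name SampleRows and the aliases RnAll and RnPerName are only illustrative.

```sql
WITH SampleRows AS (
    SELECT *
    FROM (VALUES
        (1,'JOHN','SMITH',CONVERT(date,'20130402'),CONVERT(date,'20130531')),
        (2,'JOHN','SMITH',CONVERT(date,'20130531'),CONVERT(date,'20131031')),
        (3,'JANE','DOE',  CONVERT(date,'20131031'),CONVERT(date,'20160719')),
        (4,'JANE','DOE',  CONVERT(date,'20160719'),CONVERT(date,'20160811')),
        (5,'JOHN','SMITH',CONVERT(date,'20160811'),CONVERT(date,'20170201')),
        (6,'JOHN','SMITH',CONVERT(date,'20170201'),CONVERT(date,'99991231'))
    ) V(ID,FirstName,LastName,StartDate,EndDate)
)
Select ID, FirstName, LastName
      ,RnAll     = Row_Number() over (Order by ID)
      ,RnPerName = Row_Number() over (Partition By FirstName,LastName Order by EndDate)
      ,Grp       = Row_Number() over (Order by ID)
                 - Row_Number() over (Partition By FirstName,LastName Order by EndDate)
From SampleRows
Order By ID;
-- ID  Name        RnAll  RnPerName  Grp
-- 1   JOHN SMITH  1      1          0
-- 2   JOHN SMITH  2      2          0
-- 3   JANE DOE    3      1          2
-- 4   JANE DOE    4      2          2
-- 5   JOHN SMITH  5      3          2
-- 6   JOHN SMITH  6      4          2
```

Within one unbroken run both counters advance together, so their difference stays constant. Note that Grp on its own is not unique (JANE DOE and the second JOHN SMITH run both get 2), which is why the outer query groups by FirstName, LastName and Grp together.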
Please try the following...
SELECT firstName,
lastName,
MIN( startDate ) AS earliestStartDate,
MAX( endDate ) AS latestEndDate
FROM temp
GROUP BY firstName,
lastName;
This statement will use the GROUP BY statement to group together the records based on firstName and lastName combinations. It will then return the firstName and lastName for each group as well as the earliest startDate for that group courtesy of the MIN() function and the latest endDate for that group courtesy of the MAX() function.
If you have any questions or comments, then please feel free to post a Comment accordingly.

T-SQL - Get last as-at date SUM(Quantity) was not negative

I am trying to find a way to get the last date, by location and product, on which a sum was positive. The only way I can think of to do it is with a cursor, and if that's the case I may as well just do it in code. Before I go down that route, I was hoping someone may have a better idea.
Table:
Product, Date, Location, Quantity
The scenario is: I find the quantity by location and product at a particular date; if it is negative, I need to get the sum and date when the group was last positive.
select
Product,
Location,
SUM(Quantity) Qty,
SUM(Value) Value
from
ProductTransactions PT
where
Date <= @AsAtDate
group by
Product,
Location
I am looking for the last date where the sum of the transactions up to and including it is positive.
Based on your revised question and your comment, here is another solution that I hope answers your question.
select Product, Location, max(Date) as Date
from (
select a.Product, a.Location, a.Date from ProductTransactions as a
join ProductTransactions as b
on a.Product = b.Product and a.Location = b.Location
where b.Date <= a.Date
group by a.Product, a.Location, a.Date
having sum(b.Value) >= 0
) as T
group by Product, Location
The subquery (table T) produces a list of {product, location, date} rows for which the sum of the values prior (and inclusive) is positive. From that set, we select the last date for each {product, location} pair.
This can be done in a set based way using windowed aggregates in order to construct the running total. Depending on the number of rows in the table this could be a bit slow but you can't really limit the time range going backwards as the last positive date is an unknown quantity.
I've used a CTE for convenience to construct the aggregated data set but converting that to a temp table should be faster. (CTEs get executed each time they are called whereas a temp table will only execute once.)
The basic theory is to construct the running totals for all of the previous days using the OVER clause to partition and order the SUM aggregates. This data set is then used and filtered to the expected date. When a row in that table has a quantity less than zero it is joined back to the aggregate data set for all previous days for that product and location where the quantity was greater than zero.
Since this may return multiple positive date rows the ROW_NUMBER() function is used to order the rows based on the date of the positive quantity day. This is done in descending order so that row number 1 is the most recent positive day. It isn't possible to use a simple MIN() here because the MIN([Date]) may not correspond to the MIN(Quantity).
WITH x AS (
SELECT [Date],
Product,
[Location],
SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
SUM([Value]) OVER(PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
FROM ProductTransactions
WHERE [Date] <= @AsAtDate
)
SELECT [Date], Product, [Location], Quantity, [Value], Positive_date, Positive_date_quantity
FROM (
SELECT x1.[Date], x1.Product, x1.[Location], x1.Quantity, x1.[Value],
x2.[Date] AS Positive_date, x2.[Quantity] AS Positive_date_quantity,
ROW_NUMBER() OVER (PARTITION BY x1.Product, x1.[Location] ORDER BY x2.[Date] DESC) AS Positive_date_row
FROM x AS x1
LEFT JOIN x AS x2 ON x1.Product=x2.Product AND x1.[Location]=x2.[Location]
AND x2.[Date]<x1.[Date] AND x1.Quantity<0 AND x2.Quantity>0
WHERE x1.[Date] = @AsAtDate
) AS y
WHERE Positive_date_row=1
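If only the last positive date per product and location is needed, the self-join can be avoided: build the running total once, keep the rows where it is positive, and take the latest date. The following is a minimal sketch of that idea, not a drop-in replacement; the tb CTE holds made-up sample rows standing in for ProductTransactions, and the @AsAtDate value is hypothetical.

```sql
DECLARE @AsAtDate date = '20170104';  -- hypothetical as-at date

WITH tb AS (  -- made-up sample rows standing in for ProductTransactions
    SELECT *
    FROM (VALUES
        ('A','B',CONVERT(date,'20170101'),-10),
        ('A','B',CONVERT(date,'20170102'),  5),
        ('A','B',CONVERT(date,'20170103'),  6),
        ('A','B',CONVERT(date,'20170104'), -3)
    ) V(Product,[Location],[Date],Quantity)
),
Running AS (
    -- Running total per product and location, up to the as-at date
    SELECT Product, [Location], [Date],
           SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date]) AS RunningQty
    FROM tb
    WHERE [Date] <= @AsAtDate
)
SELECT Product, [Location], MAX([Date]) AS LastPositiveDate
FROM Running
WHERE RunningQty > 0
GROUP BY Product, [Location];
-- The running totals are -10, -5, 1, -2, so this returns A, B, 2017-01-03
```

A product/location pair whose running total was never positive simply does not appear in the result, so a LEFT JOIN back to the list of pairs would be needed if those rows matter.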
Do you mean that you want the last date on which the running total of the quantity became positive within each group?
For example, if you are using SQL Server 2012+:
In the following scenario, when the date reaches 01/03/2017 the running total of the quantity becomes 1 (-10+5+6).
Is it possible for the running total to become negative again on a later date?
;WITH tb(Product, Location,[Date],Quantity) AS(
SELECT 'A','B',CONVERT(DATETIME,'01/01/2017'),-10 UNION ALL
SELECT 'A','B','01/02/2017',5 UNION ALL
SELECT 'A','B','01/03/2017',6 UNION ALL
SELECT 'A','B','01/04/2017',2
)
SELECT t.Product,t.Location,SUM(t.Quantity) AS Qty,MIN(CASE WHEN t.CurrentSum>0 THEN t.Date ELSE NULL END ) AS LastPositiveDate
FROM (
SELECT *,SUM(tb.Quantity)OVER(ORDER BY [Date]) AS CurrentSum FROM tb
) AS t GROUP BY t.Product,t.Location
Product Location Qty LastPositiveDate
------- -------- ----------- -----------------------
A B 3 2017-01-03 00:00:00.000

SQL Server : optimize the efficiency with many joins relationship

I have SQL Server code that takes a long time to return its result. In the past, it took 15 minutes. But recently, perhaps as a result of accumulated sales data, it took 2 hours to get the result!
Therefore, I would like to get some advice on how to optimize the code.
The code structure is simple: it just gets the sales sum for different regions, for different time periods, and for each SKU. (I have deleted some code here that finds the different SKUs for each material without size.)
Many thanks in advance for your help.
The main code structure is as below. Since the parts are almost the same, I just give the first 2 as an example:
SELECT SKU from [MATINFO]
-- Global Sales History Qty - All the years
LEFT JOIN
(
SELECT SKU,SUM([SALES Qty]) as [Global Sales History Qty - All the years]
from dbo.[SALES]
where [PO] IS NOT NULL
group by SKU
)histORy
on MATINFO.[SKU]=histORy.[SKU]
-- Global Sales History Qty - Past 2 years
LEFT JOIN
(
SELECT SKU,SUM([SALES Qty]) as [Global Sales History Qty - Past 2 years]
from dbo.[SALES]
where [PO] IS NOT NULL
/* date range */
and ([ORDER DATE] = '2015.11' OR [ORDER DATE] = '2015.12' or [ORDER DATE] like '%2015%' OR [ORDER DATE] like '%2016%' )
group by SKU
)histORy2
on MATINFO.[SKU]=histORy2.[SKU]
--Global Sales History Qty - Past 1 years
......SIMILAR TO THE CODE STRUCTURE AS ABOVE
The most likely cause of the poor performance is using strings for dates, and possibly the lack of adequate indexes.
like '%2015%'
Using double-ended wildcards with LIKE results in full table scans, so the subqueries scan the whole table each time you search for a different date range. Using temp tables will not solve the underlying issues.
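As a rough illustration of the difference, here is a sketch in which the order date is stored as a real date and filtered with a half-open range, which an index on that column can support. The #Sales table, its column types, the index name and the sample rows are all assumptions standing in for dbo.[SALES].

```sql
-- Hypothetical stand-in for dbo.[SALES], with [ORDER DATE] stored as a date
CREATE TABLE #Sales (
    SKU          varchar(20) NOT NULL,
    [PO]         varchar(20) NULL,
    [ORDER DATE] date        NOT NULL,
    [SALES Qty]  int         NOT NULL
);

-- An index keyed on the date lets a range predicate seek instead of scanning
CREATE INDEX IX_Sales_OrderDate ON #Sales ([ORDER DATE]) INCLUDE (SKU, [PO], [SALES Qty]);

INSERT INTO #Sales (SKU, [PO], [ORDER DATE], [SALES Qty])
VALUES ('X1','P1','20141220',3),
       ('X1','P2','20151105',4),
       ('X1',NULL,'20160310',5),
       ('X2','P3','20161231',7);

-- Half-open range on the date column instead of LIKE '%2015%' OR LIKE '%2016%'
SELECT SKU, SUM([SALES Qty]) AS Qty_2015_2016
FROM #Sales
WHERE [PO] IS NOT NULL
  AND [ORDER DATE] >= '20150101' AND [ORDER DATE] < '20170101'
GROUP BY SKU;

DROP TABLE #Sales;
```

Whether the optimizer actually chooses a seek depends on the data volume and statistics, so it is worth checking the execution plan against the real table.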
[added later]
Another change to your original query structure might reduce the number of scans of the data you need: using "conditional aggregates".
For example, here is a condensed version of your original query:
SELECT
SKU
FROM [MATINFO]
-- Global Sales History Qty - All the years
LEFT JOIN (SELECT
SKU
, SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY
SKU) histORy ON MATINFO.[SKU] = histORy.[SKU]
-- Global Sales History Qty - Past 2 years
LEFT JOIN (SELECT
SKU
, SUM([SALES Qty]) AS [Global Sales History Qty - Past 2 years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
/* date range */
AND [ORDER DATE] >= '20151101' AND [ORDER DATE] < '20161101'
GROUP BY
SKU) histORy2 ON MATINFO.[SKU] = histORy2.[SKU]
That requires two complete passes of the data in dbo.[SALES], but if you use a case expression inside the SUM() function you need only one pass of the data (in this example):
SELECT
SKU
, SUM([SALES Qty]) AS [Qty_all_years]
, SUM(CASE
WHEN [ORDER DATE] >= '20151101' AND [ORDER DATE] < '20161101'
THEN [SALES Qty]
END) AS [Qty_past_2_years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY
SKU
I suspect you could apply this logic to most of the columns and substantially improve the efficiency of the query when coupled with proper date columns and appropriate indexing.
Expansion on my comment. Note that it is just a suggestion; there is no guarantee it will run faster.
Take the following derived table histORy:
SELECT SKU,SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU
Before you run your query, materialize the derived table in a temporary table:
SELECT SKU,SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
INTO #histORy
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU
Then use the temporary table in the query:
LEFT JOIN #histORy AS h ON MATINFO.[SKU]=h.[SKU]
In this case you may want to have an index on the SKU field, so you could create the temporary table yourself, slap an index on it, and populate it with INSERT INTO #histORy ... SELECT ... etc.
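A minimal sketch of that last variant follows. The column types and the index name are assumptions (match them to the real dbo.[SALES] columns), and it relies on the dbo.[SALES] and [MATINFO] tables from the question.

```sql
-- Column types are assumptions; adjust them to match dbo.[SALES]
CREATE TABLE #histORy (
    SKU varchar(50) NULL,
    [Global Sales History Qty - All the years] decimal(18,2) NULL
);

-- Index the join column before the table is used in the main query
CREATE CLUSTERED INDEX IX_histORy_SKU ON #histORy (SKU);

-- Materialize the aggregate once
INSERT INTO #histORy (SKU, [Global Sales History Qty - All the years])
SELECT SKU, SUM([SALES Qty])
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU;

-- Use the temporary table in place of the derived table
SELECT MATINFO.[SKU], h.[Global Sales History Qty - All the years]
FROM [MATINFO]
LEFT JOIN #histORy AS h ON MATINFO.[SKU] = h.[SKU];
```

The same pattern would be repeated for each of the other derived tables, one temporary table per aggregate.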

how to get multiple min values from two SQL tables?

I have two tables, a Members table and a Plan table. They are structured as follows.
Members table
member start_date Mplan
John 20120701 johnplan
John 20130201 johnplan
John 20130901 johnplan
John 20131201 johnplan
Plan table
Pplan version start_dt end_dt
johnplan 1 20120601 20130531
johnplan 2 20130601 20140531
I need to update the start_date on the Members table to be the minimum value present for that member but within the same Plan version.
Example:
20130201 would be changed to 20120701 and 20131201 would change to 20130901.
Code:
UPDATE Members
SET start_date =(
SELECT MIN(start_date) FROM Members a
LEFT JOIN Plan ON Mplan = Pplan AND
start_date BETWEEN start_dt AND end_dt
WHERE member=a.member
AND start_date BETWEEN start_dt AND end_dt
)
Unfortunately this sets every single start_date to 19900101 aka the lowest value in the entire table for that column.
First you need to get the minimum start date of each member for a specific plan. The following will provide you that.
select MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan
The result will be something like this.
min_date member_name plan_name
2012-07-01 00:00:00.000 John johnplan1
2013-09-01 00:00:00.000 John johnplan2
Use this to update each member's start date for a plan with the lowest start date of the respective plan.
update members
set start_date= tbl.min_date from
(SELECT MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a
inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan) as tbl
where member=tbl.member_name and Mplan=tbl.plan_name
I created your 2 tables, members and plan, and tested this solution with sample data and it works. I hope it helps.
You really need to convert the dates to Datetime. You will have a greater precision, the possibility to store hours, days and minutes as well as access to date specific functions, international conversion and localization.
If your column is a Varchar(8), then it uses no less space than a Datetime column.
That said, what you are looking for is row_number().
Something like:
SELECT Member, MPlan, Start_Date, Row_Number() OVER (PARTITION BY Member, MPLan ORDER BY Start_Date) as Version
FROM Members
Could you try this? I didn't test it.
With Member_start_dt as
(
select *, (select P.start_dt from [Plan] P where P.Pplan = M.Mplan AND M.start_date >= P.start_dt AND M.start_date <= P.end_dt) as Pplan_date
from Members M
),
Member_by_plan as
(
select *, ROW_NUMBER () over (partition by Pplan_date order by start_date) num
from Member_start_dt
)
update M
Set M.start_date = MBP1.start_date
from Members M
inner join Member_by_plan MBP1 ON MBP1.member = M.Member AND num = 1
inner join Member_by_plan MBP2 ON MBP2.member = M.Member AND MBP2.Pplan_date = MBP1.Pplan_date AND MBP2.start_date = M.start_date

How to query a Master database using Inner Join links to 2 sub-databases that are identical to each other

I have an Inventory table containing Master file info and 2 Movement History tables (Current Year and Last Year).
I want to use a Query to extract Movements from (say) June LAST Year to March THIS Year in Code, Date sequence.
I am relatively new to SQL and have tried to use the following INNER JOIN structure to do this:
SELECT Code, Descrip, Category, MLast.Date, MLast.DocNo, MCurr.Date, MCurr.DocNo
FROM Stock AS S
INNER JOIN MoveTrnArc MLast ON MLast.Stockcode = S.Code
AND MLast.Date >='2011/06/01' AND MLast.Date <='2012/03/31'
INNER JOIN MoveTrn MCurr ON MCurr.Stockcode = S.Code
AND MCurr.Date >='2011/06/01' AND MCurr.Date <='2012/03/31'
ORDER BY S.Code
This creates a Query Table with the following column structure:
Code | Descrip | Category | Date | DocNo | Date | DocNo |
...where the data from the LAST Year table appears in the first Date/DocNo columns and the CURRENT Year data appears in the second Date/DocNo columns.
What must I do to the Query to have each Movement in its own row or is there a better, more efficient Query to achieve this?
Also, I need the Movements listed in Code followed by Date sequence.
Use UNION ALL instead of joins:
select s.Code , s.Descrip , s.Category , t.Date , t.DocNo
from
(
select Stockcode, Date, DocNo from MoveTrnArc
union all
select Stockcode, Date, DocNo from MoveTrn
) t join Stock s on s.Code = t.Stockcode
where t.Date >='2011/06/01' AND t.Date <='2012/03/31'
Also, be careful when comparing dates: if the Date column is of type datetime and includes a time, you have to change t.Date <='2012/03/31' to t.Date <'2012/04/01' to include all the rows from the 31st of March,
as '2012/03/31' is cast to '2012/03/31 00:00:00.000'.
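Putting both points together, and adding the Code/Date ordering the question asks for, the query might look like the sketch below. It assumes the Stock, MoveTrnArc and MoveTrn tables from the question and writes the date literals as yyyymmdd so they don't depend on the session's date format.

```sql
SELECT s.Code, s.Descrip, s.Category, t.[Date], t.DocNo
FROM (
    -- Stack last year's and this year's movements into one set of rows
    SELECT Stockcode, [Date], DocNo FROM MoveTrnArc
    UNION ALL
    SELECT Stockcode, [Date], DocNo FROM MoveTrn
) AS t
JOIN Stock AS s ON s.Code = t.Stockcode
-- Half-open range: includes every time of day on 31 March 2012
WHERE t.[Date] >= '20110601' AND t.[Date] < '20120401'
ORDER BY s.Code, t.[Date];
```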
