Date Range Intersection Splitting in SQL - sql-server

I have a SQL Server 2005 database which contains a table called Memberships.
The table schema is:
PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime
I'm currently working on a grid feature which shows a breakdown of memberships by person. One of the requirements is to split membership rows where date ranges intersect. The intersection must be bound by the Surname and FirstName, i.e. splits only occur between membership records with the same Surname and FirstName.
Example table data:
18 Smith John Poker Club 01/01/2009 NULL
18 Smith John Library 05/01/2009 18/01/2009
18 Smith John Gym 10/01/2009 28/01/2009
26 Adams Jane Pilates 03/01/2009 16/02/2009
Expected result set:
18 Smith John Poker Club 01/01/2009 04/01/2009
18 Smith John Poker Club / Library 05/01/2009 09/01/2009
18 Smith John Poker Club / Library / Gym 10/01/2009 18/01/2009
18 Smith John Poker Club / Gym 19/01/2009 28/01/2009
18 Smith John Poker Club 29/01/2009 NULL
26 Adams Jane Pilates 03/01/2009 16/02/2009
Does anyone have any idea how I could write a stored procedure that will return a result set with the breakdown described above?

The difficulty with this problem is that as the data set grows, T-SQL solutions won't scale well. The solution below uses a series of temporary tables built on the fly. It splits each date range entry into its individual days using a numbers table. This is where it won't scale, primarily because of your open-ended NULL values, which appear to mean infinity, so you have to swap in a fixed date far in the future that limits the range of conversion to a feasible length of time. You could likely see better performance by building a table of days or a calendar table with appropriate indexing for optimized rendering of each day.
Once the ranges are split, the descriptions are merged using XML PATH so that each day in the range series has all of the descriptions listed for it. Row Numbering by PersonID and Date allows for the first and last row of each range to be found using two NOT EXISTS checks to find instances where a previous row doesn't exist for a matching PersonID and Description set, or where the next row doesn't exist for a matching PersonID and Description set.
This result set is then renumbered using ROW_NUMBER so that they can be paired up to build the final results.
/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int,
Surname nvarchar(30),
FirstName nvarchar(30),
Description nvarchar(100),
StartDate datetime,
EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO
*/
SELECT
PersonID,
Description,
theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
FROM master..spt_values
WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate
AND theDate <= isnull(EndDate, '31/12/2012')
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
PersonID,
theDate,
STUFF((
SELECT '/' + Description
FROM #SplitRanges AS s
WHERE s.PersonID = sr.PersonID
AND s.theDate = sr.theDate
FOR XML PATH('')
), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID,
*
INTO #InterimResults
FROM
(
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID - 1 = t2.RowID
AND t1.Descriptions = t2.Descriptions)
UNION ALL
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID = t2.RowID - 1
AND t1.Descriptions = t2.Descriptions)
) AS t
SELECT DISTINCT
PersonID,
Surname,
FirstName
INTO #DistinctPerson
FROM Schedule
SELECT
t1.PersonID,
dp.Surname,
dp.FirstName,
t1.Descriptions,
t1.theDate AS StartDate,
CASE
WHEN t2.theDate = '31/12/2012' THEN NULL
ELSE t2.theDate
END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1
ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2
ON t2.PersonID = t1.PersonID
AND t1.ID + 1 = t2.ID
AND t1.Descriptions = t2.Descriptions
DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults
/*
DROP TABLE Schedule
*/
The above solution will also handle gaps between additional Descriptions as well, so if you were to add another Description for PersonID 18 leaving a gap:
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')
It will fill the gap appropriately. As pointed out in the comments, you shouldn't have name information in this table, it should be normalized out to a Persons Table that can be JOIN'd to in the final result. I simulated this other table by using a SELECT DISTINCT to build a temp table to create that JOIN.
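The day-expansion idea in this answer can also be sketched outside SQL. The following is a minimal Python illustration, not the answer's T-SQL: `split_ranges` and `OPEN_END` are hypothetical names, the far-future date mirrors the '31/12/2012' placeholder used above, and the sketch treats a missing day as the end of a run. Descriptions are joined in input order.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the far-future date that closes open-ended (NULL) ranges.
OPEN_END = date(2012, 12, 31)

def split_ranges(rows):
    """rows: (person_id, description, start, end_or_None) -> merged segments."""
    per_day = {}  # (person_id, day) -> descriptions active on that day, in input order
    for person_id, description, start, end in rows:
        day = start
        last = end or OPEN_END
        while day <= last:
            per_day.setdefault((person_id, day), []).append(description)
            day += timedelta(days=1)

    segments = []  # each entry: [person_id, label, first_day, last_day]
    for (person_id, day), descs in sorted(per_day.items()):
        label = " / ".join(descs)
        prev = segments[-1] if segments else None
        if (prev and prev[0] == person_id and prev[1] == label
                and prev[3] == day - timedelta(days=1)):
            prev[3] = day  # same person, same label, next day: extend the run
        else:
            segments.append([person_id, label, day, day])
    # Map the placeholder end date back to None (NULL in the table).
    return [(p, l, s, None if e == OPEN_END else e) for p, l, s, e in segments]
```

Fed the four sample rows from the question, this should reproduce the six rows of the expected result set, with the final Poker Club segment left open-ended.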

Try this
SET DATEFORMAT dmy
DECLARE @Membership TABLE(
PersonID int,
Surname nvarchar(16),
FirstName nvarchar(16),
Description nvarchar(16),
StartDate datetime,
EndDate datetime)
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane','Pilates', '03/01/2009', '16/02/2009')
--Program Starts
declare @enddate datetime
--Handle the extreme case where all the end dates are null (i.e. all the memberships for all members are in progress):
-- in that case take an arbitrary date, e.g. '31/12/2009' here; otherwise add 1 more day to the highest end date
select @enddate = case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership
--Fill the null enddates
; with fillNullEndDates_cte as
(
select
row_number() over(partition by PersonId order by PersonId) RowNum
,PersonId
,Surname
,FirstName
,Description
,StartDate
,isnull(EndDate,@enddate) EndDate
from @Membership
)
--Generate a date calendar
, generateCalender_cte as
(
select
1 as CalenderRows
,min(startdate) DateValue
from @Membership
union all
select
CalenderRows+1
,DateValue + 1
from generateCalender_cte
where DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
(
select
t.RowNum
,t.PersonId
,t.Surname
,t.FirstName
,t.Description
, d.DateValue
,d.CalenderRows
from generateCalender_cte d
join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Description Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
select
PersonID
,Surname
,FirstName
,stuff((
select '/' + Description
from datesBasedOnMemberships_cte d1
where d1.PersonID = d2.PersonID
and d1.DateValue = d2.DateValue
for xml path('')
), 1, 1,'') as Description
, DateValue
,CalenderRows
from datesBasedOnMemberships_cte d2
group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
select d.*,
CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS [Group]
from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId,StartDate
option(maxrecursion 0)
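The last CTE relies on a common "gaps and islands" trick: within a run of consecutive calendar rows, the row value minus its row number stays constant, so that difference can serve as a group key. A minimal Python sketch of the same idea follows; the function name `islands` is hypothetical and plain integers stand in for CalenderRows.

```python
from itertools import groupby

def islands(values):
    """Collapse integers into (first, last) runs of consecutive values."""
    ordered = sorted(set(values))
    runs = []
    # value - position is constant inside a run of consecutive integers,
    # which plays the role of CalenderRows - row_number() in the query.
    for _, grp in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        members = [value for _, value in grp]
        runs.append((members[0], members[-1]))
    return runs
```

For instance, `islands([1, 2, 3, 7, 8, 10])` yields three runs, because the difference changes each time a value is skipped.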

[Only many, many years later.]
I created a stored procedure that will align and break segments by a partition within a single table, and then you can use those aligned breaks to pivot the description into a ragged column using a subquery and XML PATH.
See if the below helps:
Documentation: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.md
Stored Procedure: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.sql
For example, your call might look like:
EXEC dbo.DateSegments_AlignWithinTable
@tableName = 'tableName',
@keyFieldList = 'PersonID',
@nonKeyFieldList = 'Description',
@effectivveDateFieldName = 'StartDate',
@terminationDateFieldName = 'EndDate'
You will want to capture the result (which is a table) into another table or temporary table (assuming it is called "AlignedDataTable" in below example). Then, you can pivot using a subquery.
SELECT
PersonID, StartDate, EndDate,
SUBSTRING ((SELECT ',' + [Description] FROM AlignedDataTable AS innerTable
WHERE
innerTable.PersonID = AlignedDataTable.PersonID
AND (innerTable.StartDate = AlignedDataTable.StartDate)
AND (innerTable.EndDate = AlignedDataTable.EndDate)
ORDER BY id
FOR XML PATH ('')), 2, 999999999999999) AS IdList
FROM AlignedDataTable
GROUP BY PersonID, StartDate, EndDate
ORDER BY PersonID, StartDate

Related

how can i know the total number of holidays by employee

I have these tables: Holiday(Id, FK(EmployeeId), StartDate, EndDate) and Employee(Id, FullName, etc...).
I want to know the number of holiday days that each employee has.
I was trying something like this:
SELECT Employee.Id, SUM(DATEDIFF(day,Holiday.StartDate,Holiday.EndDate) + 1)
FROM Employee
LEFT JOIN Holiday ON Holiday.EmployeeId=Employee.Id
GROUP BY Employee.id
I know this doesn't work because, to sum that, I would need to group by Holiday.Id, since I will have many rows in the Holiday table for the same EmployeeId.
How can I accomplish this?
Thanks for the help.
Or, using a workingday calculation I found here: https://www.sqlshack.com/how-to-calculate-work-days-and-hours-in-sql-server you could do the following:
CREATE FUNCTION workingdays ( @DateFrom Date, @DateTo Date) RETURNS INT AS
BEGIN
DECLARE @TotDays INT= DATEDIFF(DAY, @DateFrom, @DateTo) + 1;
DECLARE @TotWeeks INT= DATEDIFF(WEEK, @DateFrom, @DateTo) * 2;
DECLARE @IsSunday INT= CASE
WHEN DATENAME(WEEKDAY, @DateFrom) = 'Sunday'
THEN 1
ELSE 0
END;
DECLARE @IsSaturday INT= CASE
WHEN DATENAME(WEEKDAY, @DateTo) = 'Saturday'
THEN 1
ELSE 0
END;
-- A range starting on Sunday or ending on Saturday holds one weekend day that @TotWeeks does not cover, so both are subtracted.
DECLARE @TotWorkingDays INT= @TotDays - @TotWeeks - @IsSunday - @IsSaturday;
RETURN @TotWorkingDays;
END
GO
create table Employee (Id int identity, name varchar(64), Primary Key (Id));
create table Holiday (EmployeeId int, StartDate date, EndDate date);
insert into Employee VALUES ('Harry Potter'),('Hermiony Granger'),('Ron Weasly'),('Ginny Weasley');
insert into Holiday VALUES (1,'2020-02-12','2020-02-18'),(1,'2020-04-02','2020-04-07'),(1,'2020-08-21','2020-09-05'),
(2,'2020-01-04','2020-01-13'),(2,'2020-03-17','2020-03-23'),(2,'2020-05-29','2020-06-7');
SELECT Employee.Id, SUM(dbo.workingdays(Holiday.StartDate,Holiday.EndDate))
FROM Employee
LEFT JOIN Holiday ON Holiday.EmployeeId=Employee.Id
GROUP BY Employee.id
This will still only be a crude estimation as it does not account for public holidays.
DEMO: https://rextester.com/QKGUT41272
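Boundary arithmetic of this kind is easy to get wrong at the edges (ranges that start on a Sunday or end on a Saturday), so a brute-force count can serve as a cross-check. The sketch below is a Python illustration under the same assumption as the function above, namely that only Saturdays and Sundays are non-working days; `working_days` is a hypothetical name.

```python
from datetime import date, timedelta

def working_days(start, end):
    """Count Monday-Friday days in the inclusive range [start, end]."""
    if end < start:
        return 0
    total = (end - start).days + 1
    # weekday() returns 0 for Monday through 6 for Sunday.
    return sum(1 for i in range(total)
               if (start + timedelta(days=i)).weekday() < 5)
```

Comparing its output with the SQL function for a handful of ranges, including ones that begin or end on a weekend, is a quick way to validate the formula.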
Hi, I think you can build the query using a CTE like this:
Resource: CTE Microsoft: https://learn.microsoft.com/fr-fr/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15
Or using a SELECT in the FROM clause.
EDIT: Or it works with your current query too.
create Table #EMPLOYEE
(
EMP_ID INT,
EMP_NAME varchar(128)
)
create Table #HOLIDAYS
(
HL_ID INT,
HL_EMP varchar(128),
StartDate DATE,
EndDate DATE
)
Insert into #EMPLOYEE
(
EMP_ID,
EMP_NAME
)
SELECT 1, 'Toto'
UNION ALL
SELECT 2,'Dupont'
UNION ALL
SELECT 3,'Titi'
UNION ALL
SELECT 4,'Tata'
Insert into #HOLIDAYS
(
HL_ID,
HL_EMP,
StartDate,
EndDate
)
SELECT '1', '1',GETDATE(),DATEADD(day,4,GETDATE())
UNION ALL
SELECT '2','1',DATEADD(day,-7,GETDATE()), DATEADD(day,-5,GETDATE())
UNION ALL
SELECT '3','2',DATEADD(day,4,GETDATE()),DATEADD(day,15,GETDATE())
-- USING CTE EXAMPLE
;WITH MyCteAPP AS (
SELECT EMP_ID, EMP_NAME, HL_ID, ISNULL(StartDate,GETDATE()) AS 'StartDate', ISNULL(EndDate,GETDATE()) AS 'EndDate'
FROM #EMPLOYEE
LEFT JOIN #HOLIDAYS ON EMP_ID = HL_EMP
)
SELECT EMP_ID, EMP_NAME, SUM(DATEDIFF(day,StartDate,EndDate)) AS 'NbDay'
FROM MyCteAPP
GROUP BY EMP_ID,EMP_NAME
ORDER BY EMP_ID
-- USING SELECT IN FROM EXAMPLE
SELECT EMP_ID, EMP_NAME, SUM(DATEDIFF(day,StartDate,EndDate)) AS 'NbDay'
FROM (SELECT EMP_ID, EMP_NAME, HL_ID, ISNULL(StartDate,GETDATE()) AS 'StartDate', ISNULL(EndDate,GETDATE()) AS 'EndDate'
FROM #EMPLOYEE
LEFT JOIN #HOLIDAYS ON EMP_ID = HL_EMP
) AS SUBQUERY
GROUP BY EMP_ID,EMP_NAME
ORDER BY EMP_ID
--USING CURRENT QUERY
SELECT EMP_ID, EMP_NAME, SUM(DATEDIFF(day,ISNULL(StartDate,GETDATE()),ISNULL(EndDate,GETDATE()))) AS 'NbDay'
FROM #EMPLOYEE
LEFT JOIN #HOLIDAYS ON EMP_ID = HL_EMP
GROUP BY EMP_ID,EMP_NAME
ORDER BY EMP_ID
DROP TABLE #EMPLOYEE
DROP TABLE #HOLIDAYS
I have tried with sub query. Please use below query. It will be helpful.
SELECT Id, SUM(leave) AS leave
FROM
(
SELECT Employee.Id, DATEDIFF(dd,Holiday.StartDate,Holiday.EndDate) as leave
FROM Employee
LEFT JOIN Holiday ON Holiday.EmployeeId=Employee.Id
)a
GROUP BY ID

How can I combine tables and create a new record (when missing) from a previous table?

I get provided a list of users and their home zip code every month. However, not every user provides a zip code for every month so my monthly tables are never the same size.
What I want to do is create one master table that has a record for every month for every user starting with the first month. Then if a user in the first month doesn't appear in the second month they should still get a record for the second month with the zip code assigned based on the prior month.
For example, I have two tables that look like this:
UserNumber Month ZIP
1 201701 12345
2 201701 30032
3 201701 01432
Etc.
UserNumber Month ZIP
1 201702 12345
3 201702 01433
4 201702 30032
Etc.
You can see that some ZIP codes will change (user 3 "moved") which is ok. But user 2 doesn't have a record for 201702. But my new master table should have a record for them where the ZIP code from 201701 is used. So the master table should look like this:
UserNumber Month ZIP
1 201701 12345
1 201702 12345
2 201701 30032
2 201702 30032
3 201701 01432
Etc.
As mentioned, there is a record for user 2 for 201702 using the same zip code where we had a record. Sometimes there will be multiple missing months so I want to grab the most recent record that is less than the current month.
I have tried creating multiple temporary tables based on the table intersects and then appending them together and that worked. But with 30+ months of data that was going to get very complicated and tedious so I'm hoping there is a better way. And this master table would have to be updated each month as well.
I would appreciate any suggestions!
Currently the data is in S3 which I access using Hive so a HiveQL solution would be ideal so I don't have to import all this data into SSMS but if it's easier to do this in SSMS using SQL I can make that work as well.
Something like this sounds like it would work, at least once you get the initial set built. This does make the assumption that the process is updated every month so that there can't be gaps in the series:
insert into Master (userid, month, zip)
select
coalesce(u.userid, m.userid),
coalesce(u.month, convert(char(6), dateadd(month, 1, m.month + '01'), 112)),
coalesce(u.zip, m.zip)
from ZipUpdate u full outer join Master m
on m.userid = u.userid and m.month =
convert(char(6), dateadd(month, -1, u.month + '01'), 112);
If you want to fill in the gaps you could start with an empty table and just run this 30+ times in a loop. If the table names are predictable then something like this could generate the entire script.
declare @sql varchar(8000);
declare @dt date = '20170101';
while @dt < cast('20190901' as date)
begin
set @sql = 'insert into Master (userid, month, zip)
select
coalesce(u.userid, m.userid),
''' + convert(char(6), @dt, 112) + ''',
coalesce(u.zip, m.zip)
from '
/* change this expression as needed */
+ 'ZipUpdate'
+ convert(varchar(3), datediff(month, '20170101', @dt) + 1)
+ ' u full outer join Master m
on m.userid = u.userid and m.month = ''' +
convert(char(6), dateadd(month, -1, @dt), 112) + '''
where u.month = ''' + convert(char(6), @dt, 112) + ''';'
set @dt = dateadd(month, 1, @dt);
select @sql;
end
The following solution is suitable for your question:
begin tran
create table #tbl1 (UserNumber int, [Month] int, ZIP char(5));
create table #tbl2 (UserNumber int, [Month] int, ZIP char(5));
create table #tbl3 (UserNumber int, [Month] int, ZIP char(5));
insert into #tbl1 (UserNumber, [Month], ZIP)
select 1, 201701, '12345' union all
select 2, 201701, '30032' union all
select 3, 201701, '01432';
insert into #tbl2 (UserNumber, [Month], ZIP)
select 1, 201702, '12345' union all
select 3, 201702, '01433' union all
select 4, 201702, '30032';
insert into #tbl3 (UserNumber, [Month], ZIP)
select 3, 201703, '01435' union all
select 4, 201703, '30032';
create table #full (UserNumber int, [Month] int, ZIP char(5));
insert into #full (UserNumber, [Month], ZIP)
select UserNumber, [Month], ZIP from #tbl1
union all
select UserNumber, [Month], ZIP from #tbl2
union all
select UserNumber, [Month], ZIP from #tbl3;
CREATE UNIQUE CLUSTERED INDEX [CI_Full] ON #full (UserNumber asc, [Month] asc);
create table #month ([Month] int);
insert into #month ([Month])
select [Month]
from #full
group by [Month];
CREATE UNIQUE CLUSTERED INDEX [CI_Month] ON #month ([Month] asc);
create table #start_usernumber (UserNumber int, [Month] int);
insert into #start_usernumber (UserNumber, [Month])
select UserNumber, min([Month])
from #full
group by UserNumber;
CREATE UNIQUE CLUSTERED INDEX [CI_StartUserNumber] ON #start_usernumber (UserNumber asc, [Month] asc);
select su.UserNumber,
m.[Month],
case when(f.ZIP is null) then (select top(1) f0.ZIP from #full as f0 where f0.UserNumber=su.UserNumber and f0.[Month]<m.[Month] and f0.ZIP is not null order by f0.[Month] desc) else f.ZIP end as ZIP
from #start_usernumber as su
inner join #month as m on su.[Month]<=m.[Month]
left join #full as f on m.[Month]=f.[Month] and su.UserNumber=f.UserNumber
order by su.UserNumber, m.[Month];
rollback tran
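The carry-forward rule in the question (a user missing from a month inherits the most recent earlier ZIP) can be sketched in a few lines of Python. This is only an illustration of the logic, not a HiveQL or T-SQL solution; `build_master` is a hypothetical name and the monthly tables are assumed to arrive in chronological order.

```python
def build_master(monthly_tables):
    """monthly_tables: list of (month, {user: zip}) in chronological order."""
    last_zip = {}  # user -> most recent known ZIP
    master = []
    for month, table in monthly_tables:
        last_zip.update(table)          # new or changed ZIPs override older ones
        for user in sorted(last_zip):   # every user seen so far gets a row
            master.append((user, month, last_zip[user]))
    return master
```

A user first seen in a later month gets no rows for earlier months, which matches the "starting with the first month" requirement as stated in the question.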

Sum up a column in SQL

I have four tables in my database which is Star Schema Design. Those tables are
Product_DM (Product_Id, Product_Name)
Shop_DM (Branch_Id, Branch_Name, Branch_State)
Date_DM (Date_Id as Date, Day, Month and Year). Day and Month and Year values are populated based on Date_Id.
Revenue_FT (Product_id, Branch_Id, Date_Id, Quantity)
What I want is to find the 5 branches (shops) that sold the most products on Boxing Day over the last five years.
SELECT
RF.BRANCH_ID,
SD.BRANCH_NAME,
SD.BRANCH_STATE,
PD.PRODUCT_NAME,
SELF_RF.TOTAL,
FORMAT ( DD.[date], 'd', 'en-US' ) AS 'Great Britain English Result'
FROM
PRODUCT_DM AS PD,
SHOP_DM AS SD,
DATE_DM AS DD,
REVENUE_FT AS RF
JOIN
(SELECT BRANCH_ID, [date], SUM(quantity) AS TOTAL
FROM REVENUE_FT
GROUP BY BRANCH_ID, [date]) AS SELF_RF ON SELF_RF.BRANCH_ID = RF.BRANCH_ID
AND SELF_RF.[date] = RF.[date]
WHERE
RF.BRANCH_ID = SD.BRANCH_ID
AND SD.BRANCH_STATE = 'NSW'
AND RF.[date] = DD.[date]
AND DD.[day] = 26
AND DD.[month] = 12
AND DD.[year] BETWEEN 2012 AND 2018
ORDER BY
SELF_RF.TOTAL DESC;
This is the query I have and this is the result:
The problem is it is not summing up different products and different dates (for example 12/26/2013 and 12/26/2014 should also sum up). I know I am doing something wrong in my query but I needed a hand.
See an example below where you're able to get the top 5 branch / shops which have had the highest sales over the past 5 boxing days.
It's an example based off your listed table structure above.
;with GetBranchDetailsAndData
as
(
--Joins the date table, sales, and shop details to get the branch details and quantity per transaction
SELECT
c.Branch_Id,
c.Branch_Name,
c.Branch_State,
a.Quantity
FROM
Revenue_FT a
INNER JOIN
DATE_DM b
ON
a.Date_Id = b.Date_Id
INNER JOIN
Shop_DM c
ON
c.Branch_Id = a.Branch_Id
WHERE
[DAY] = 26 AND
[Month] = 12
AND [Year] BETWEEN 2012 AND 2017 --Boxing days in 2012 - 2017
--You could also filter on a specific state in the where clause
),
SUMDetailsAndData
as
(
SELECT
Branch_ID,
Branch_Name,
Branch_State,
SUM(Quantity) as [Quantity] --Sum quantity per branch
FROM
GetBranchDetailsAndData
GROUP BY
Branch_ID,
Branch_Name,
Branch_State
),
GetTop5
as
(
SELECT
Branch_ID,
Branch_Name,
Branch_State,
Quantity,
DENSE_RANK() OVER(ORDER BY Quantity DESC) as [QuantityOrder] --DENSE RANK to get the quantity order
FROM
SUMDetailsAndData
)
SELECT
*
FROM
GetTop5
WHERE
QuantityOrder <= 5 --Where the quantity order less or equal to 5. This will return multiple rows if there is multiple with the same number in the top 5.
ORDER BY
QuantityOrder
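The effect of DENSE_RANK on ties is easy to see in a small Python sketch: branches with equal totals share a rank, so filtering on rank <= n can return more than n rows. The function name `top_branches` and the input shape are assumptions made for illustration only.

```python
from collections import defaultdict

def top_branches(sales, n=5):
    """sales: iterable of (branch, quantity) -> (branch, total, dense_rank) rows."""
    totals = defaultdict(int)
    for branch, qty in sales:
        totals[branch] += qty  # SUM(Quantity) per branch
    # Dense rank: equal totals share a rank and no rank numbers are skipped.
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {total: i + 1 for i, total in enumerate(distinct)}
    rows = [(b, t, rank_of[t]) for b, t in totals.items() if rank_of[t] <= n]
    return sorted(rows, key=lambda r: (r[2], r[0]))
```

If exactly five rows are required regardless of ties, a ROW_NUMBER-style ranking with an explicit tie-breaker would be the alternative.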
Here is a snippet below which I used to generate testing data.
--Create tables
CREATE TABLE Product_DM (Product_Id bigint identity(1,1), Product_Name NVARCHAR(200))
CREATE TABLE Shop_DM (Branch_Id bigint identity(1,1), Branch_Name NVARCHAR(100), Branch_State NVARCHAR(100))
CREATE TABLE Date_DM (Date_Id bigint identity(1,1), [Day] int, [Month] int, [Year] int)
CREATE TABLE Revenue_FT (Product_id bigint, Branch_Id bigint, Date_Id bigint, Quantity bigint)
--Insert Data
INSERT INTO Product_DM (Product_Name) VALUES ('Test Product'),('Test Product2')
INSERT INTO Shop_DM (Branch_Name, Branch_State) VALUES
('Branch1', 'State1'), ('Branch2', 'State1'), ('Branch3', 'State1'), ('Branch4', 'State1'), ('Branch5', 'State1'),
('Branch6', 'State1'), ('Branch7', 'State1'), ('Branch8', 'State1'), ('Branch9', 'State1'), ('Branch10', 'State1')
DECLARE @DateStart date = '2010-01-01', @DateEnd date = '2018-01-01'
WHILE(@DateStart <= @DateEnd)
BEGIN
INSERT INTO DATE_DM ([day], [month], [year]) VALUES (DATEPART(dd, @DateStart), DATEPART(MM, @DateStart), DATEPART(YYYY, @DateStart))
SET @DateStart = DATEADD(dd, 1, @DateStart)
END
--Insert random product sales
DECLARE @MinProduct int = 1, @MaxProduct int = 2
DECLARE @MinBranch int = 1, @MaxBranch int = 10
DECLARE @MinDate int = 1, @MaxDate int = 2923
DECLARE @Startloop int = 1, @EndLoop int = 200000
WHILE @Startloop <= @EndLoop
BEGIN
INSERT INTO Revenue_FT VALUES (
ROUND(((@MaxProduct - @MinProduct) * RAND() + @MinProduct), 0),
ROUND(((@MaxBranch - @MinBranch) * RAND() + @MinBranch), 0),
ROUND(((@MaxDate - @MinDate) * RAND() + @MinDate), 0), 1)
SET @Startloop = @Startloop + 1
END

Grouping similar items recursively

I have been reading the following Microsoft article on recursive queries using CTE and just can't seem to wrap my head around how to use it to group common items.
I have a table the contains the following columns:
ID
FirstName
LastName
DateOfBirth
BirthCountry
GroupID
What I need to do is start with the first person in the table and iterate through the table and find all the people that have the same (LastName and BirthCountry) or have the same (DateOfBirth and BirthCountry).
Now the tricky part is that I have to assign them the same GroupID and then, for each person in that GroupID, I need to see if anyone else has the same information and then put them in the same GroupID.
I think I could do this with multiple cursors but it is getting tricky.
Here is sample data and output.
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
John Doe and Jane Doe are in the same Group (100) because they have the same (LastName and BirthCountry).
Adam Wayne is in Group (100) because he has the same (BirthDate and BirthCountry) as John Doe.
Kay Wayne is in Group (100) because she has the same (LastName and BirthCountry) as Adam Wayne who is already in Group (100).
Matt Knox is in a new group (101) because he does not match anyone in previous groups.
Jack Stone is in a group (100) because he has the same (BirthDate and BirthCountry) as Kay Wayne who is already in Group (100).
Data scripts:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);
Here's what I came up with. I have rarely written recursive queries so it was some good practice for me. By the way Kay and Adam do not share a birth country in your sample data.
with data as (
select
LastName, DateOfBirth, BirthCountry,
row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
from T group by LastName, DateOfBirth, BirthCountry
), r as (
select
d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
from data as d
union all
select
d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
from r inner join data as d
on d.grpNum > r.grpNum
and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
and d.BirthCountry = r.BirthCountry
), g as (
select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
from r group by LastName, DateOfBirth, BirthCountry
)
select t.*, dense_rank() over (order by g.grpNum) + 100 as GroupID
from T as t
inner join g
on g.LastName = t.LastName
and g.DateOfBirth = t.DateOfBirth
and g.BirthCountry = t.BirthCountry
For the recursion to terminate it's necessary to keep track of the equivalences (via string concatenation) so that at each level it only needs to consider newly discovered equivalences (or connections, transitivities, etc.) Notice that I've avoided using the word group to avoid bleeding into the GROUP BY concept.
http://rextester.com/edit/TVRVZ10193
EDIT: I used an almost arbitrary numbering for the equivalences but if you wanted them to appear in a sequence based on the lowest ID with each block that's easy to do. Instead of using row_number() say min(ID) as grpNum presuming, of course, that IDs are unique.
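Viewed abstractly, the requirement is a connected-components problem: two people are linked if they share (LastName, BirthCountry) or (DateOfBirth, BirthCountry), and a group is everything reachable through such links. The Python sketch below uses a union-find structure, which is a different technique from the recursive CTE above; `group_people` is a hypothetical name, and each group is identified by its smallest ID rather than by a number starting at 100.

```python
def group_people(people):
    """people: list of (id, last_name, date_of_birth, birth_country).

    Returns {id: smallest id in the same connected group}.
    """
    parent = {p[0]: p[0] for p in people}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest ID stays the root

    seen = {}  # matching key -> first ID seen with that key
    for pid, last, dob, country in people:
        for key in (("name", last, country), ("dob", dob, country)):
            if key in seen:
                union(pid, seen[key])
            else:
                seen[key] = pid
    return {pid: find(pid) for pid in parent}
```

On the question's sample rows, IDs 1 through 5 end up in one group and ID 6 stays alone, matching the expected GroupID column.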
I assume GroupID is the output you want, starting from 100.
Even if GroupID comes from another table, that is no problem.
Firstly, sorry for my "no cursor" comments. A cursor or RBAR (row-by-agonizing-row) operation is required for this task. In fact, this is the first time in a very long while that I have met such a requirement; it took a long time, and I used an RBAR operation.
If tomorrow I am able to do it using a set-based method, I will come back and edit this.
Most importantly, using an RBAR operation makes the script easier to understand, and I think it will work for other sample data too.
Please also give feedback about the performance and how it works with other sample data.
Also, note that in my script the IDs are not in serial order, and it does not matter; I did this in order to test.
I use PRINT for debugging purposes; you can remove it.
SET NOCOUNT ON
DECLARE @Tbl TABLE(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO @Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL) ,
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL),
(7, 'Jerry', 'Stone', '1976-06-08', 'Hay', NULL)
DECLARE @StartGroupid INT = 100
DECLARE @id INT
DECLARE @Groupid INT
DECLARE @Maxid INT
DECLARE @i INT = 1
DECLARE @MinGroupID int=@StartGroupid
DECLARE @MaxGroupID int=@StartGroupid
DECLARE @LastGroupID int
SELECT @maxid = max(id)
FROM @tbl
WHILE (@i <= @maxid)
BEGIN
SELECT @id = id
,@Groupid = Groupid
FROM @Tbl a
WHERE id = @i
if(@Groupid is not null and @Groupid<@MinGroupID)
set @MinGroupID=@Groupid
if(@Groupid is not null and @Groupid>@MaxGroupID)
set @MaxGroupID=@Groupid
if(@Groupid is not null)
set @LastGroupID=@Groupid
UPDATE A
SET groupid =case
when @id=1 and b.groupid is null then @StartGroupid
when @id>1 and b.groupid is null then @MaxGroupID+1--(Select max(groupid)+1 from @tbl where id<@id)
when @id>1 and b.groupid is not null then @MinGroupID --(Select min(groupid) from @tbl where id<@id)
end
FROM @Tbl A
INNER JOIN @tbl B ON b.id = @ID
WHERE (
(
a.BirthCountry = b.BirthCountry
and a.DateOfBirth = b.dateofbirth
)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
)
--if(@id=7) --@id=2,@id=3 and so on (for debug
--break
SET @i = @i + 1
SET @ID = @I
END
SELECT *
FROM @Tbl
Alternate method; it still returns 56,000 rows without the rownum = 1 filter. See if it works with other sample data, or see if you can optimize it further.
;with CTE as
(
select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,@StartGroupid GroupID
,1 rn
FROM @Tbl A where a.id=1
UNION ALL
Select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,case when ((a.BirthCountry = b.BirthCountry and a.DateOfBirth = b.dateofbirth)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
) then b.groupid else b.groupid+1 end
, b.rn+1
FROM @tbl A
inner join CTE B on a.id>1
where b.rn<@Maxid
)
,CTE1 as
(select * ,row_number()over(partition by id order by groupid )rownum
from CTE )
select * from cte1
where rownum=1
Maybe you can run it in this way
SELECT *
FROM table_name
GROUP BY
FirstName,
LastName,
GroupID
HAVING COUNT(GroupID) >= 2
ORDER BY GroupID

How can I select first inserted row in SQL Server?

Maybe this question is unrelated to Stack Overflow, but this is my problem and I do not know the syntax.
With this query I select the persons who had transactions, along with their date and time.
This is my query:
I want to write a query that selects their first TransactionsTimeStamp.
I assume that you are looking for ranking functions like ROW_NUMBER, you could use them for example with a Common Table Expression (CTE):
WITH CTE AS
(
SELECT ..., RN = ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
ORDER BY TransactionsTimeStamp ASC)
FROM dbo.TableName ... (join tables here)
)
SELECT ....
FROM CTE
WHERE RN = 1
... are the columns that you want to select, you can select all, as opposed to a GROUP BY.
But if you just want to select the TransactionsTimeStamp-column for every user:
SELECT MIN(TransactionsTimeStamp) AS TransactionsTimeStamp, FirstName, LastName
FROM dbo.tableName ... (join tables here)
GROUP BY FirstName, LastName
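The difference between the two queries above is that ROW_NUMBER with RN = 1 keeps the whole earliest row per person, while MIN only returns the earliest timestamp. A small Python sketch of the "keep the whole earliest row" variant follows; `first_transactions` and the tuple layout are assumptions for illustration.

```python
def first_transactions(rows):
    """rows: iterable of (first_name, last_name, timestamp, ...extra columns).

    Returns the earliest full row per (first_name, last_name).
    """
    earliest = {}
    for row in rows:
        key = (row[0], row[1])  # PARTITION BY FirstName, LastName
        # Keep the row with the smallest timestamp (ORDER BY ... ASC, RN = 1).
        if key not in earliest or row[2] < earliest[key][2]:
            earliest[key] = row
    return sorted(earliest.values())
```

Any extra columns travel along with the selected row, which is what makes the ROW_NUMBER approach more flexible than GROUP BY.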
The problem in your query is that you are grouping by Date column. So you are getting all different Date values as a result. You should group only by FirstName and LastName and apply some aggregation functions to Date column.
If just Min date is needed then you can get that date using aggregate function like:
DECLARE @test TABLE
(
first_name NVARCHAR(MAX) ,
last_name NVARCHAR(MAX) ,
transaction_date DATETIME
)
INSERT INTO @test
VALUES ( 'A', 'B', '20150101' )
INSERT INTO @test
VALUES ( 'A', 'B', '20150120' )
INSERT INTO @test
VALUES ( 'C', 'D', '20150103' )
INSERT INTO @test
VALUES ( 'C', 'D', '20150119' )
SELECT first_name ,
last_name ,
MIN(transaction_date) AS min_transaction_date
FROM @test
GROUP BY first_name ,
last_name
Output:
first_name last_name min_transaction_date
A B 2015-01-01 00:00:00.000
C D 2015-01-03 00:00:00.000
Select firstname, lastname, min(date) as minimum_date from clubprofile_cp
group by firstname, lastname
