Grouping similar items recursively - sql-server

I have been reading a Microsoft article on recursive queries using CTEs and just can't seem to wrap my head around how to use them to group common items.
I have a table that contains the following columns:
ID
FirstName
LastName
DateOfBirth
BirthCountry
GroupID
What I need to do is start with the first person in the table and iterate through the table and find all the people that have the same (LastName and BirthCountry) or have the same (DateOfBirth and BirthCountry).
Now the tricky part is that I have to assign them the same GroupID and then, for each person in that GroupID, I need to see if anyone else has the same information and put them in the same GroupID too.
I think I could do this with multiple cursors but it is getting tricky.
Here is sample data and output.
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
John Doe and Jane Doe are in the same Group (100) because they have the same (LastName and BirthCountry).
Adam Wayne is in Group (100) because he has the same (BirthDate and BirthCountry) as John Doe.
Kay Wayne is in Group (100) because she has the same (LastName and BirthCountry) as Adam Wayne who is already in Group (100).
Matt Knox is in a new group (101) because he does not match anyone in previous groups.
Jack Stone is in a group (100) because he has the same (BirthDate and BirthCountry) as Kay Wayne who is already in Group (100).
Data scripts:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);

Here's what I came up with. I have rarely written recursive queries so it was some good practice for me. By the way Kay and Adam do not share a birth country in your sample data.
with data as (
select
LastName, DateOfBirth, BirthCountry,
row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
from T group by LastName, DateOfBirth, BirthCountry
), r as (
select
d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
from data as d
union all
select
d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
from r inner join data as d
on d.grpNum > r.grpNum
and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
and d.BirthCountry = r.BirthCountry
), g as (
select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
from r group by LastName, DateOfBirth, BirthCountry
)
select t.*, dense_rank() over (order by g.grpNum) + 100 as GroupID
from T as t
inner join g
on g.LastName = t.LastName
and g.DateOfBirth = t.DateOfBirth
and g.BirthCountry = t.BirthCountry
For the recursion to terminate, it's necessary to keep track of the equivalences already visited (via string concatenation) so that each level only considers newly discovered equivalences (or connections, transitivities, etc.). Notice that I've avoided the word "group" so that it doesn't bleed into the GROUP BY concept.
http://rextester.com/edit/TVRVZ10193
EDIT: I used an almost arbitrary numbering for the equivalences, but if you want them to appear in a sequence based on the lowest ID within each block, that's easy to do: instead of row_number(), use min(ID) as grpNum, presuming, of course, that IDs are unique.
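As a minimal sketch of that change, the first CTE could look like the following. The inline T built from VALUES is only a stand-in so the snippet runs on its own (it assumes SQL Server 2008 or later, and the dates are left as ISO strings); against a real table you would keep just the data CTE.

```sql
-- T here is a stand-in holding the question's sample rows.
WITH T AS (
    SELECT *
    FROM (VALUES
        (1, 'Doe',   '1983-01-01', 'Grand'),
        (2, 'Stone', '1976-06-08', 'Grand'),
        (3, 'Doe',   '1982-02-08', 'Grand'),
        (4, 'Wayne', '1983-01-01', 'Grand'),
        (5, 'Wayne', '1976-06-08', 'Grand'),
        (6, 'Knox',  '1983-01-01', 'Hay')
    ) AS v (ID, LastName, DateOfBirth, BirthCountry)
), data AS (
    SELECT LastName, DateOfBirth, BirthCountry,
           MIN(ID) AS grpNum   -- lowest ID in each distinct combination, instead of row_number()
    FROM T
    GROUP BY LastName, DateOfBirth, BirthCountry
)
SELECT LastName, DateOfBirth, BirthCountry, grpNum
FROM data
ORDER BY grpNum;
```

The rest of the query stays the same; the final dense_rank() still produces consecutive GroupID values, now ordered by the lowest ID in each block.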

I assume GroupID is the output you want and that it starts from 100.
Even if the GroupID values come from another table, that is not a problem.
First, sorry for my "no cursor" comments: a cursor or RBAR (row-by-agonizing-row) operation is required for this task. It has been a very long time since I met a requirement like this; it took me a long while, and I used an RBAR operation.
If I manage to do it with a set-based method tomorrow, I will come back and edit this.
Most importantly, the RBAR operation makes the script easier to understand, and I think it will work for other sample data too.
Please also give feedback about the performance and how it works with other sample data.
Also note that the IDs in my script need not be sequential; it does not matter, I did this in order to test.
I use PRINT for debugging purposes; you can remove it.
SET NOCOUNT ON
DECLARE @Tbl TABLE(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO @Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL) ,
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL),
(7, 'Jerry', 'Stone', '1976-06-08', 'Hay', NULL)
DECLARE @StartGroupid INT = 100
DECLARE @id INT
DECLARE @Groupid INT
DECLARE @Maxid INT
DECLARE @i INT = 1
DECLARE @MinGroupID int=@StartGroupid
DECLARE @MaxGroupID int=@StartGroupid
DECLARE @LastGroupID int
SELECT @maxid = max(id)
FROM @tbl
WHILE (@i <= @maxid)
BEGIN
SELECT @id = id
,@Groupid = Groupid
FROM @Tbl a
WHERE id = @i
if(@Groupid is not null and @Groupid<@MinGroupID)
set @MinGroupID=@Groupid
if(@Groupid is not null and @Groupid>@MaxGroupID)
set @MaxGroupID=@Groupid
if(@Groupid is not null)
set @LastGroupID=@Groupid
UPDATE A
SET groupid =case
when @id=1 and b.groupid is null then @StartGroupid
when @id>1 and b.groupid is null then @MaxGroupID+1--(Select max(groupid)+1 from @tbl where id<@id)
when @id>1 and b.groupid is not null then @MinGroupID --(Select min(groupid) from @tbl where id<@id)
end
FROM @Tbl A
INNER JOIN @tbl B ON b.id = @ID
WHERE (
(
a.BirthCountry = b.BirthCountry
and a.DateOfBirth = b.dateofbirth
)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
)
--if(@id=7) --@id=2,@id=3 and so on (for debug)
--break
SET @i = @i + 1
SET @ID = @I
END
SELECT *
FROM @Tbl
An alternative method, though it still returns 56,000 rows before the rownum = 1 filter is applied. See if it works with other sample data, or whether you can optimize it further.
;with CTE as
(
select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,@StartGroupid GroupID
,1 rn
FROM @Tbl A where a.id=1
UNION ALL
Select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,case when ((a.BirthCountry = b.BirthCountry and a.DateOfBirth = b.dateofbirth)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
) then b.groupid else b.groupid+1 end
, b.rn+1
FROM @tbl A
inner join CTE B on a.id>1
where b.rn<@Maxid
)
,CTE1 as
(select * ,row_number()over(partition by id order by groupid )rownum
from CTE )
select * from cte1
where rownum=1
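As a set-based sketch of the same idea (a different technique from the RBAR loop above, and not tested against large data): give every row its own label, then repeatedly replace each label with the smallest label among the rows it matches on (LastName and BirthCountry) or (DateOfBirth and BirthCountry), until a pass changes nothing. The temp table #People, the column Label and the variable @changed are names assumed here for illustration; the rows are the question's sample data.

```sql
CREATE TABLE #People (
    ID INT PRIMARY KEY,
    LastName VARCHAR(50),
    DateOfBirth DATE,
    BirthCountry VARCHAR(50),
    Label INT NULL          -- hypothetical working column: current block label
);
INSERT INTO #People (ID, LastName, DateOfBirth, BirthCountry) VALUES
(1, 'Doe',   '1983-01-01', 'Grand'),
(2, 'Stone', '1976-06-08', 'Grand'),
(3, 'Doe',   '1982-02-08', 'Grand'),
(4, 'Wayne', '1983-01-01', 'Grand'),
(5, 'Wayne', '1976-06-08', 'Grand'),
(6, 'Knox',  '1983-01-01', 'Hay');

-- Every row starts in its own block.
UPDATE #People SET Label = ID;

DECLARE @changed INT = 1;
WHILE @changed > 0
BEGIN
    -- Pull each row's label down to the smallest label among its direct matches.
    UPDATE p
    SET Label = m.MinLabel
    FROM #People AS p
    CROSS APPLY (
        SELECT MIN(q.Label) AS MinLabel
        FROM #People AS q
        WHERE q.BirthCountry = p.BirthCountry
          AND (q.LastName = p.LastName OR q.DateOfBirth = p.DateOfBirth)
    ) AS m
    WHERE m.MinLabel < p.Label;

    SET @changed = @@ROWCOUNT;   -- stop once a pass changes nothing
END;

-- Turn the surviving labels into consecutive group numbers starting at 100.
SELECT ID, LastName, DateOfBirth, BirthCountry,
       DENSE_RANK() OVER (ORDER BY Label) + 99 AS GroupID
FROM #People
ORDER BY ID;

DROP TABLE #People;
```

On the sample data this should settle after a few passes, with IDs 1 to 5 sharing one label and ID 6 keeping its own, which DENSE_RANK() then maps to 100 and 101. Each pass is a single UPDATE, but a long chain of matches needs more passes, so performance on real data would have to be measured.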

Maybe you can run it in this way
SELECT FirstName, LastName, GroupID, COUNT(GroupID) AS MemberCount
FROM table_name
GROUP BY
FirstName,
LastName,
GroupID
HAVING COUNT(GroupID) >= 2
ORDER BY GroupID

Related

Need help pulling specific type of ID when there are multiple IDs

I have a dataset where an employee with one SSN can have multiple employee IDs. In that situation I need to bring back only the records where the employee ID begins with '200'. In most situations there will only be one employee ID, or the employee ID is NULL (which is okay to bring back).
This is a sample dataset:
declare @t table(id int, name varchar(100), ssn int, eeid int)
insert into @t
values(1, 'John Smith', '55512', '2006544'),
(1, 'John Smith', '55512', '12345'),
(2, 'Bob Johnson', '55514', '200454'),
(3, 'Tom Smith', '44454', NULL),
(4, 'John Thompson', '45434', '204435'),
(4, 'John Thompson', '45434', '12353568')
The output should look like this:
Id Name SSN EEID
1 John Smith 55512 2006544
2 Bob Johnson 55514 200454
3 Tom Smith 44454 NULL
4 John Thompson 45434 204435
I tried playing with a Window function but got stuck. I tried using Rownum but it didn't give the correct result with 'John Smith.'
select *,
row_number()over(partition by ssn ORDER BY case when EEID like '200%'
then 1 end) AS ROWNUM
from @t
ROW_NUMBER, on its own, doesn't change the returned rows; it just "numbers" them. You'd either need to put the expression in a CTE/subquery and then filter to the first row in the WHERE, or use TOP and put the expression in the ORDER BY:
--CTE solution
WITH CTE AS(
SELECT ID,
[Name],
SSN,
EEID,
ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY CASE WHEN eeid LIKE '200%' THEN 1 ELSE 2 END, eeid ASC) AS RN
FROM dbo.YourTable)
SELECT ID,
[Name],
SSN,
EEID
FROM CTE
WHERE RN = 1;
--ORDER BY and TOP solution
SELECT TOP (1) WITH TIES
ID,
[Name],
SSN,
EEID
FROM dbo.YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY CASE WHEN eeid LIKE '200%' THEN 1 ELSE 2 END, eeid);
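To try the CTE approach against the question's sample data, a self-contained sketch could look like this. It reuses the table variable @t from the question in place of dbo.YourTable and inserts the values as integers; everything else follows the CTE solution above.

```sql
-- Sample rows from the question.
DECLARE @t TABLE (id INT, name VARCHAR(100), ssn INT, eeid INT);
INSERT INTO @t (id, name, ssn, eeid) VALUES
(1, 'John Smith',    55512, 2006544),
(1, 'John Smith',    55512, 12345),
(2, 'Bob Johnson',   55514, 200454),
(3, 'Tom Smith',     44454, NULL),
(4, 'John Thompson', 45434, 204435),
(4, 'John Thompson', 45434, 12353568);

WITH CTE AS (
    SELECT id, name, ssn, eeid,
           ROW_NUMBER() OVER (
               PARTITION BY ssn
               -- rows whose eeid starts with 200 sort first; eeid breaks ties
               ORDER BY CASE WHEN eeid LIKE '200%' THEN 1 ELSE 2 END, eeid
           ) AS RN
    FROM @t
)
SELECT id, name, ssn, eeid
FROM CTE
WHERE RN = 1      -- keep only the preferred row per SSN
ORDER BY id;
```

Note that for John Thompson neither employee ID starts with '200', so the tie-break on eeid decides which row is kept; with this ordering it is 204435, which matches the expected output in the question.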

As of date queries

I'm trying to implement similar functionality to temporal tables
https://msdn.microsoft.com/en-us/library/dn935015.aspx
Due to some complex business requirements, I cannot use SQL Server's temporal tables, so I'm trying to implement my own version.
Basically, I need a way to be able to determine what is the data in a specific point in time.
The table will store deltas, and a bitmask to determine what columns actually changed (changing the data to NULL is a valid scenario, that's why we need this update mask)
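To make the update mask concrete, here is a small sketch of how one bit per column can be set and tested. The bit positions mirror the ones used in the data setup further down; the variable @mask and the output column names are assumptions made for this illustration only.

```sql
DECLARE @first_name_bit INT = 3;
DECLARE @position_bit INT = 6;

-- A delta row that changed only first_name has just that bit set.
DECLARE @mask VARBINARY(128) = CAST(POWER(2, @first_name_bit) AS VARBINARY(128));

SELECT
    -- bitwise AND against the column's bit: non-zero means "this column changed"
    CASE WHEN (@mask & POWER(2, @first_name_bit)) > 0 THEN 1 ELSE 0 END AS first_name_changed,
    CASE WHEN (@mask & POWER(2, @position_bit)) > 0 THEN 1 ELSE 0 END AS position_changed;
```

Because the flag lives in the mask rather than in the value, a row can record "position was changed to NULL" without being confused with "position was not touched".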
I do have a prototype working, but my query seems very expensive. Instead of one subquery per column, I've been trying to combine them into just 1 query (with a cursor-like behavior).
Tables:
CREATE TABLE user_efdt
(
user_id INT not null,
date_effective DATE not null,
first_name NVARCHAR(100),
last_name NVARCHAR(100),
address1 NVARCHAR(100),
position NVARCHAR(100),
modified_fields VARBINARY(128) NOT NULL
)
ALTER TABLE user_efdt
ADD CONSTRAINT PK_user_efdt
PRIMARY KEY CLUSTERED (user_id, date_effective);
GO
Data setup:
DECLARE @first_name_bit INT
DECLARE @last_name_bit INT
DECLARE @address1_bit INT
DECLARE @position_bit INT
SET @first_name_bit = 3
SET @last_name_bit = 4
SET @address1_bit = 5
SET @position_bit = 6
-- user john doe gets added to system on 1-may-16 with first name and last name
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-may-16', 'john', 'doe', null, null, CAST(POWER(2, @first_name_bit) + POWER(2, @last_name_bit) AS VARBINARY))
-- user mary ann gets added to system on 1-may-16 with first name, last name, address and position
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-may-16', 'mary', 'ann', '123 main st', 'manager', CAST(POWER(2, @first_name_bit) + POWER(2, @last_name_bit) + POWER(2, @address1_bit) + POWER(2, @position_bit) AS VARBINARY))
-- john doe gets address updated on 1-apr-16
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-apr-16', null, null, '456 el dorado st', null, CAST(POWER(2, @address1_bit) AS VARBINARY))
-- john doe gets position updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-mar-16', null, null, null, 'engineer', CAST(POWER(2, @position_bit) AS VARBINARY))
-- john doe gets position updated again
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-jun-16', null, null, null, 'engineer 2', CAST(POWER(2, @position_bit) AS VARBINARY))
-- john doe gets position updated to NULL (erased)
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-jul-16', null, null, null, null, CAST(POWER(2, @position_bit) AS VARBINARY))
-- mary ann gets address updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-jun-16', null, null, '1443 hoover st', null, CAST(POWER(2, @address1_bit) AS VARBINARY))
-- mary ann gets position updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-jul-16', null, null, null, 'manager 2', CAST(POWER(2, @position_bit) AS VARBINARY))
Here's my version of the query:
CREATE PROCEDURE as_of_date_query (@date DATE)
AS
BEGIN
DECLARE @first_name_bit INT
DECLARE @last_name_bit INT
DECLARE @address1_bit INT
DECLARE @position_bit INT
SET @first_name_bit = 3
SET @last_name_bit = 4
SET @address1_bit = 5
SET @position_bit = 6
SELECT
q1.date_effective, fn.val first_name, ln.val last_name,
addr.val address1, pos.val position
FROM
(SELECT
user_id, MAX(date_effective) date_effective
FROM
user_efdt
WHERE
date_effective <= @date
GROUP BY
user_id) q1
LEFT JOIN
(SELECT DISTINCT
user_id,
FIRST_VALUE(first_name) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM
user_efdt
WHERE
(modified_fields & POWER(2, @first_name_bit)) > 0
AND date_effective <= @date) fn ON q1.user_id = fn.user_id
LEFT JOIN
(SELECT DISTINCT
user_id,
FIRST_VALUE(last_name) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM
user_efdt
WHERE
(modified_fields & POWER(2, @last_name_bit)) > 0
AND date_effective <= @date) ln ON q1.user_id = ln.user_id
LEFT JOIN (
SELECT DISTINCT user_id, FIRST_VALUE(address1) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM user_efdt
WHERE (modified_fields & POWER(2, @address1_bit)) > 0
AND date_effective <= @date
) addr ON q1.user_id = addr.user_id
LEFT JOIN (
SELECT DISTINCT user_id, FIRST_VALUE(position) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM user_efdt
WHERE (modified_fields & POWER(2, @position_bit)) > 0
AND date_effective <= @date
) pos ON q1.user_id = pos.user_id
END
expected results:
EXEC as_of_date_query '1-mar-16'
-- 2016-03-01 NULL NULL NULL engineer
EXEC as_of_date_query '5-apr-16'
-- 2016-04-01 NULL NULL 456 el dorado st engineer
EXEC as_of_date_query '15-may-16'
-- 2016-05-01 john doe 456 el dorado st engineer
-- 2016-05-01 mary ann 123 main st manager
EXEC as_of_date_query '5-jun-16'
-- 2016-06-01 john doe 456 el dorado st engineer 2
-- 2016-06-01 mary ann 1443 hoover st manager
EXEC as_of_date_query '1-jul-16'
-- 2016-07-01 john doe 456 el dorado st NULL
-- 2016-07-01 mary ann 1443 hoover st manager 2

Computing a column based on comparisons of other columns in table

OK, I am working on a database table with 4 columns: let's say a first name, middle name, last name and a group ID. I want to group people based on the fact that they have the same first and last names, regardless of their middle name. I also want to give any new entry that comes in the correct group ID.
Here is an example of the result:
----------------------------------------------------------
| First_Name | Middle_Name | Last_Name | Group_ID |
----------------------------------------------------------
| Jon | Jacob | Schmidt | 1 |
----------------------------------------------------------
| William | B. | Schmidt | 1 |
----------------------------------------------------------
| Sally | Anne | Johnson | 2 |
----------------------------------------------------------
I'm not sure whether this falls under the jurisdiction of a computed column, some kind of join, or something far less obscure. Please help!
If you only need to enumerate the groups within a query then row_number() will work for you:
declare @Names table ( First_Name varchar(10), Middle_Name varchar(10), Last_Name varchar(10))
insert into @Names
select 'Jon', 'Jacob', 'Schmidt' union all
select 'William', 'B.', 'Schmidt' union all
select 'Sally', 'Anne', 'Johnson' union all
select 'Jon', 'Two', 'Schmidt'
;with Groups (First_Name, Last_Name, Group_ID) as
( select First_Name, Last_Name, row_number()over(order by Last_Name)
from @Names
group
by First_Name, Last_Name
)
select n.First_Name, n.Middle_Name, n.Last_Name, g.Group_Id
from @Names n
join Groups g on
n.First_Name = g.First_Name and
n.Last_Name = g.Last_Name;
Be aware the Group_ID value will change as new nameGroups are introduced.
If you want to assign and persist a Group_ID then I would suggest creating an ancillary table and assign the Group_IDs there.
By storing the mapping outside of the #Names table you are allowing users to change their names and not have to worry about re-evaluating the group assignment. It also allows you to modify the grouping logic without re-assigning names. You also have the ability to map similar enough values to the same grouping (John, Jon, Jonny).
Group_ID is composed of a First_Name and Last_Name. So, store it that way.
declare @Names table ( First_Name varchar(10), Middle_Name varchar(10), Last_Name varchar(10))
insert into @Names
select 'Jon', 'Jacob', 'Schmidt' union all
select 'William', 'B.', 'Schmidt' union all
select 'Sally', 'Anne', 'Johnson' union all
select 'Jon', 'Two', 'Schmidt'
declare @NameGroup table (Group_Id int identity(1,1), First_Name varchar(10), Last_Name varchar(10), unique(First_Name, Last_Name));
insert into @NameGroup (First_Name, Last_Name)
select 'Jon', 'Schmidt' union all
select 'Sally', 'Johnson';
declare @Group_ID int;
declare @First_Name varchar(10),
@Middle_Name varchar(10),
@Last_Name varchar(10)
select @First_Name = 'Jon',
@Middle_Name = 'X',
@Last_Name = 'Schmidt'
--be sure the Id has already been assigned
insert into @NameGroup
select @First_Name, @Last_Name
where not exists(select 1 from @NameGroup where First_Name = @First_Name and Last_Name = @Last_Name)
--resolve the id
select @Group_ID = Group_ID
from @NameGroup
where First_Name = @First_Name and
Last_Name = @Last_Name;
--store the name
insert into @Names (First_Name, Middle_Name, Last_Name)
values(@First_Name, @Middle_Name, @Last_Name);
select n.First_Name, n.Middle_Name, n.Last_Name, ng.Group_Id
from @Names n
join @NameGroup ng on
n.First_Name = ng.First_Name and
n.Last_Name = ng.Last_Name;

TSQL - SQL 2000

I'm struggling with this one. I have a table A which looks like this:
Employee_ID Dependant_FirstName DOB
1 John 12/12/1980
1 Lisa 11/11/1982
2 Mark 06/06/1985
2 Shane 07/07/1982
2 Mike 03/04/1990
3 NULL NULL
and would like to copy these data in Table B like this (knowing that there could only be a maximum of 6 dependants in Table A):
Employee_ID Dependant1_FirstName DOB Dependant2_FirstName DOB Dependant3_FirstName DOB
1 John 12/12/1980 Lisa 11/11/1982 NULL NULL
2 Mark 06/06/1985 Shane 07/07/1982 Mike 03/04/1990
3 NULL NULL NULL NULL NULL NULL
Thanks very much for the help.
Marc
This is a working example for just your example data, to give an idea of how I'd do it. I'm using a faked-up dependant counter based on date of birth and name. Bear in mind it will break if an employee has twins with the same name, but if they do that, then they deserve all the lifelong data-confusion that they've got in store :)
Also, please consider upgrading that SQL Server. Or moving this kind of pivoting to your reporting tool rather than the database.
CREATE TABLE #employees (employee_id INTEGER, Dependant_FirstName VARCHAR(20), DOB DATETIME)
INSERT INTO #employees VALUES (1, 'John', '12/12/1980')
INSERT INTO #employees VALUES (1, 'Lisa', '11/11/1982')
INSERT INTO #employees VALUES (2, 'Shane', '07/07/1982')
INSERT INTO #employees VALUES (2, 'Mark', '06/06/1985')
INSERT INTO #employees VALUES (2, 'Mike', '03/04/1990')
INSERT INTO #employees VALUES (3, NULL, NULL)
SELECT
employee_id,
MAX(CASE WHEN dep_count = 1 THEN Dependant_FirstName ELSE NULL END) 'Dependant1_FirstName',
MAX(CASE WHEN dep_count = 1 THEN DOB ELSE NULL END) 'Dependant1_DOB',
MAX(CASE WHEN dep_count = 2 THEN Dependant_FirstName ELSE NULL END) 'Dependant2_FirstName',
MAX(CASE WHEN dep_count = 2 THEN DOB ELSE NULL END) 'Dependant2_DOB',
MAX(CASE WHEN dep_count = 3 THEN Dependant_FirstName ELSE NULL END) 'Dependant3_FirstName',
MAX(CASE WHEN dep_count = 3 THEN DOB ELSE NULL END) 'Dependant3_DOB'
FROM
(
SELECT
employee_id,
Dependant_FirstName,
DOB,
(
SELECT
COUNT(*)
FROM
#employees deps
WHERE
#employees.employee_id = deps.employee_id AND
CONVERT(VARCHAR, #employees.DOB, 126) + #employees.Dependant_FirstName <=
CONVERT(VARCHAR, deps.DOB, 126) + deps.Dependant_FirstName
) 'dep_count'
FROM
#employees
) add_dep_count_query
GROUP BY
employee_id
You could:
Create a view
Calculate a fictitious ranking
Group to find the maximum ranking for each Employee_ID
Return the results
Note: I have omitted the DOB column in the examples.
Statement
CREATE VIEW dbo.VIEW_Employees_Ranking AS
SELECT Ranking = ISNULL(e6.Employee_ID, 0)
+ ISNULL(e5.Employee_ID, 0)
+ ISNULL(e4.Employee_ID, 0)
+ ISNULL(e3.Employee_ID, 0)
+ ISNULL(e2.Employee_ID, 0)
+ ISNULL(e1.Employee_ID, 0)
, e1.Employee_ID
, Name1 = e1.Dependant_FirstName
, Name2 = e2.Dependant_FirstName
, Name3 = e3.Dependant_FirstName
, Name4 = e4.Dependant_FirstName
, Name5 = e5.Dependant_FirstName
, Name6 = e6.Dependant_FirstName
FROM dbo.Employees e1
LEFT OUTER JOIN dbo.Employees e2 ON e2.Employee_ID = e1.Employee_ID AND e2.DOB > e1.DOB
LEFT OUTER JOIN dbo.Employees e3 ON e3.Employee_ID = e2.Employee_ID AND e3.DOB > e2.DOB
LEFT OUTER JOIN dbo.Employees e4 ON e4.Employee_ID = e3.Employee_ID AND e4.DOB > e3.DOB
LEFT OUTER JOIN dbo.Employees e5 ON e5.Employee_ID = e4.Employee_ID AND e5.DOB > e4.DOB
LEFT OUTER JOIN dbo.Employees e6 ON e6.Employee_ID = e5.Employee_ID AND e6.DOB > e5.DOB
GO
SELECT er.*
FROM dbo.VIEW_Employees_Ranking er
INNER JOIN (
SELECT Ranking = MAX(Ranking)
, Employee_ID
FROM dbo.VIEW_Employees_Ranking
GROUP BY
Employee_ID
) ermax ON ermax.Ranking = er.Ranking AND ermax.Employee_ID = er.Employee_ID
Check this code please, It might work for you.
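declare @Emp_Id int
declare @Name varchar(50) -- was int; the dependant name is character data
declare @DOB datetime -- was int; the date of birth is a date
declare @Count int
declare @sql varchar(1000)
set @Count=1
DECLARE x_cursor CURSOR FOR
SELECT distinct Employee_ID from tableA
OPEN x_cursor
FETCH NEXT FROM x_cursor
INTO @Emp_Id
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE second_cursor CURSOR FOR
SELECT distinct Dependant_FirstName,DOB from tableA
where Employee_ID=@Emp_Id
OPEN second_cursor
FETCH NEXT FROM second_cursor
INTO @Name,@DOB
WHILE @@FETCH_STATUS = 0
BEGIN
if(@Count=1)
begin
insert into tableB (Employee_ID , Dependant1_FirstName,DOB)
values(@Emp_Id,@Name,@DOB)
set @Count=@Count+1
end
else
begin
-- build the statement in a variable first: EXEC() cannot call CAST/CONVERT inline,
-- and the name and date must be quoted inside the dynamic SQL
-- (column names follow the original: DOB for the first dependant, DOB2..DOB6 after that; adjust to your tableB)
set @sql='Update tableB set Dependant'+cast(@Count as varchar(10))+'_FirstName='''+replace(@Name,'''','''''')+''' ,DOB'+cast(@Count as varchar(10))+'='''+convert(varchar(8),@DOB,112)+''' where Employee_ID='+cast(@Emp_Id as varchar(10))
exec(@sql)
set @Count=@Count+1
end
FETCH NEXT FROM second_cursor
INTO @Name,@DOB
END
CLOSE second_cursor
DEALLOCATE second_cursor
set @Count=1
FETCH NEXT FROM x_cursor
INTO @Emp_Id
END
CLOSE x_cursor;
DEALLOCATE x_cursor
GO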
Have a look at this example:
http://ryanfarley.com/blog/archive/2005/02/17/1712.aspx
Here he is concatenating the child elements of a parent key into a string, which should allow you to write out a flat record.
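A minimal sketch of that kind of concatenation, written so it should also run on SQL Server 2000, is shown below. It uses the variable-assignment idiom with COALESCE; this is an assumption about the general technique, and the linked post may differ in detail. The temp table #deps and the variable @list are hypothetical names, and the rows are taken from the question's Table A.

```sql
-- Hypothetical stand-in for Table A, holding employee 2's dependants.
CREATE TABLE #deps (Employee_ID INT, Dependant_FirstName VARCHAR(20))
INSERT INTO #deps VALUES (2, 'Mark')
INSERT INTO #deps VALUES (2, 'Shane')
INSERT INTO #deps VALUES (2, 'Mike')

DECLARE @list VARCHAR(8000)

-- Each row appends its name; COALESCE supplies the empty prefix for the first row.
SELECT @list = COALESCE(@list + ', ', '') + Dependant_FirstName
FROM #deps
WHERE Employee_ID = 2

-- One flat string for the employee; the order of the names is not guaranteed.
SELECT @list AS Dependants

DROP TABLE #deps
```

This yields one delimited string per employee rather than one column per dependant, so it suits a flat export better than the fixed six-column layout asked for in the question.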

Date Range Intersection Splitting in SQL

I have a SQL Server 2005 database which contains a table called Memberships.
The table schema is:
PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime
I'm currently working on a grid feature which shows a break-down of memberships by person. One of the requirements is to split membership rows where there is an intersection of date ranges. The intersection must be bound by the Surname and FirstName, ie splits only occur with membership records of the same Surname and FirstName.
Example table data:
18 Smith John Poker Club 01/01/2009 NULL
18 Smith John Library 05/01/2009 18/01/2009
18 Smith John Gym 10/01/2009 28/01/2009
26 Adams Jane Pilates 03/01/2009 16/02/2009
Expected result set:
18 Smith John Poker Club 01/01/2009 04/01/2009
18 Smith John Poker Club / Library 05/01/2009 09/01/2009
18 Smith John Poker Club / Library / Gym 10/01/2009 18/01/2009
18 Smith John Poker Club / Gym 19/01/2009 28/01/2009
18 Smith John Poker Club 29/01/2009 NULL
26 Adams Jane Pilates 03/01/2009 16/02/2009
Does anyone have any idea how I could write a stored procedure that will return a result set which has the break-down described above.
The difficulty you are going to have here is that, as the data set grows, T-SQL solutions won't scale well. The approach below uses a series of temporary tables built on the fly to solve the problem. It splits each date range entry into its respective days using a numbers table. This is where it won't scale, primarily because of your open-ended NULL values, which appear to mean infinity, so you have to swap in a fixed date far in the future that limits the range of conversion to a feasible length of time. You could likely see better performance by building a table of days or a calendar table with appropriate indexing for optimized rendering of each day.
Once the ranges are split, the descriptions are merged using XML PATH so that each day in the range series has all of the descriptions listed for it. Row Numbering by PersonID and Date allows for the first and last row of each range to be found using two NOT EXISTS checks to find instances where a previous row doesn't exist for a matching PersonID and Description set, or where the next row doesn't exist for a matching PersonID and Description set.
This result set is then renumbered using ROW_NUMBER so that they can be paired up to build the final results.
/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int,
Surname nvarchar(30),
FirstName nvarchar(30),
Description nvarchar(100),
StartDate datetime,
EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO
*/
SELECT
PersonID,
Description,
theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
FROM master..spt_values
WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate
AND theDate <= isnull(EndDate, '31/12/2012')
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
PersonID,
theDate,
STUFF((
SELECT '/' + Description
FROM #SplitRanges AS s
WHERE s.PersonID = sr.PersonID
AND s.theDate = sr.theDate
FOR XML PATH('')
), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID,
*
INTO #InterimResults
FROM
(
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID - 1 = t2.RowID
AND t1.Descriptions = t2.Descriptions)
UNION ALL
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID = t2.RowID - 1
AND t1.Descriptions = t2.Descriptions)
) AS t
SELECT DISTINCT
PersonID,
Surname,
FirstName
INTO #DistinctPerson
FROM Schedule
SELECT
t1.PersonID,
dp.Surname,
dp.FirstName,
t1.Descriptions,
t1.theDate AS StartDate,
CASE
WHEN t2.theDate = '31/12/2012' THEN NULL
ELSE t2.theDate
END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1
ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2
ON t2.PersonID = t1.PersonID
AND t1.ID + 1 = t2.ID
AND t1.Descriptions = t2.Descriptions
DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults
/*
DROP TABLE Schedule
*/
The above solution will also handle gaps between additional Descriptions as well, so if you were to add another Description for PersonID 18 leaving a gap:
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')
It will fill the gap appropriately. As pointed out in the comments, you shouldn't have name information in this table, it should be normalized out to a Persons Table that can be JOIN'd to in the final result. I simulated this other table by using a SELECT DISTINCT to build a temp table to create that JOIN.
Try this
SET DATEFORMAT dmy
DECLARE @Membership TABLE(
PersonID int,
Surname nvarchar(16),
FirstName nvarchar(16),
Description nvarchar(16),
StartDate datetime,
EndDate datetime)
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane','Pilates', '03/01/2009', '16/02/2009')
--Program Starts
declare @enddate datetime
--Handle the extreme case where all the end dates are null (i.e. all the memberships for all members are in progress):
-- in that case take an arbitrary date, e.g. '31/12/2009' here; otherwise add 1 more day to the highest end date
select @enddate = case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership
--Fill the null enddates
; with fillNullEndDates_cte as
(
select
row_number() over(partition by PersonId order by PersonId) RowNum
,PersonId
,Surname
,FirstName
,Description
,StartDate
,isnull(EndDate,@enddate) EndDate
from @Membership
)
--Generate a date calender
, generateCalender_cte as
(
select
1 as CalenderRows
,min(startdate) DateValue
from @Membership
union all
select
CalenderRows+1
,DateValue + 1
from generateCalender_cte
where DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
(
select
t.RowNum
,t.PersonId
,t.Surname
,t.FirstName
,t.Description
, d.DateValue
,d.CalenderRows
from generateCalender_cte d
join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Description Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
select
PersonID
,Surname
,FirstName
,stuff((
select '/' + Description
from datesBasedOnMemberships_cte d1
where d1.PersonID = d2.PersonID
and d1.DateValue = d2.DateValue
for xml path('')
), 1, 1,'') as Description
, DateValue
,CalenderRows
from datesBasedOnMemberships_cte d2
group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
select d.*,
CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS [Group]
from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId,StartDate
option(maxrecursion 0)
[Only many, many years later.]
I created a stored procedure that will align and break segments by a partition within a single table, and then you can use those aligned breaks to pivot the description into a ragged column using a subquery and XML PATH.
See if the links below help:
Documentation: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.md
Stored Procedure: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.sql
For example, your call might look like:
EXEC dbo.DateSegments_AlignWithinTable
@tableName = 'tableName',
@keyFieldList = 'PersonID',
@nonKeyFieldList = 'Description',
@effectivveDateFieldName = 'StartDate',
@terminationDateFieldName = 'EndDate'
You will want to capture the result (which is a table) into another table or temporary table (assuming it is called "AlignedDataTable" in below example). Then, you can pivot using a subquery.
SELECT
PersonID, StartDate, EndDate,
SUBSTRING ((SELECT ',' + [Description] FROM AlignedDataTable AS innerTable
WHERE
innerTable.PersonID = AlignedDataTable.PersonID
AND (innerTable.StartDate = AlignedDataTable.StartDate)
AND (innerTable.EndDate = AlignedDataTable.EndDate)
ORDER BY id
FOR XML PATH ('')), 2, 999999999999999) AS IdList
FROM AlignedDataTable
GROUP BY PersonID, StartDate, EndDate
ORDER BY PersonID, StartDate

Resources