I'm trying to implement functionality similar to temporal tables
https://msdn.microsoft.com/en-us/library/dn935015.aspx
Due to some complex business requirements, I cannot use SQL Server's temporal tables, so I'm trying to implement my own version.
Basically, I need a way to determine what the data was at a specific point in time.
The table will store deltas, plus a bitmask recording which columns actually changed (setting a column to NULL is a valid scenario, which is why we need this update mask).
I do have a prototype working, but my query seems very expensive. Instead of one subquery per column, I've been trying to combine them into just one query (with cursor-like behavior).
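To pin down the update-mask idea, here is a minimal sketch in Python (not T-SQL; `make_mask` and `column_changed` are illustrative names, not part of the schema) of how one bit per column distinguishes "column set to NULL" from "column not touched":

```python
# Illustrative sketch (Python, not T-SQL): one bit per column in the
# update mask; POWER(2, bit) in the T-SQL corresponds to 1 << bit here.

FIRST_NAME_BIT = 3
LAST_NAME_BIT = 4
ADDRESS1_BIT = 5
POSITION_BIT = 6

def make_mask(*bits):
    # Combine column bit positions into one modified_fields value.
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask

def column_changed(mask, bit):
    # True when the delta row carries a value for this column,
    # even if that value is NULL (None).
    return (mask & (1 << bit)) != 0
```

A row that sets position to NULL still has POSITION_BIT set in its mask, which is exactly what lets a NULL-ing update be told apart from "no change".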
Tables:
CREATE TABLE user_efdt
(
user_id INT not null,
date_effective DATE not null,
first_name NVARCHAR(100),
last_name NVARCHAR(100),
address1 NVARCHAR(100),
position NVARCHAR(100),
modified_fields VARBINARY(128) NOT NULL
)
ALTER TABLE user_efdt
ADD CONSTRAINT PK_user_efdt
PRIMARY KEY CLUSTERED (user_id, date_effective);
GO
Data setup:
DECLARE @first_name_bit INT
DECLARE @last_name_bit INT
DECLARE @address1_bit INT
DECLARE @position_bit INT
SET @first_name_bit = 3
SET @last_name_bit = 4
SET @address1_bit = 5
SET @position_bit = 6
-- user john doe gets added to the system on 1-may-16 with first name and last name
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-may-16', 'john', 'doe', null, null, CAST(POWER(2, @first_name_bit) + POWER(2, @last_name_bit) AS VARBINARY))
-- user mary ann gets added to the system on 1-may-16 with first name, last name, address and position
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-may-16', 'mary', 'ann', '123 main st', 'manager', CAST(POWER(2, @first_name_bit) + POWER(2, @last_name_bit) + POWER(2, @address1_bit) + POWER(2, @position_bit) AS VARBINARY))
-- john doe gets address updated on 1-apr-16
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-apr-16', null, null, '456 el dorado st', null, CAST(POWER(2, @address1_bit) AS VARBINARY))
-- john doe gets position updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-mar-16', null, null, null, 'engineer', CAST(POWER(2, @position_bit) AS VARBINARY))
-- john doe gets position updated again
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-jun-16', null, null, null, 'engineer 2', CAST(POWER(2, @position_bit) AS VARBINARY))
-- john doe gets position updated to NULL (erased)
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (1000, '1-jul-16', null, null, null, null, CAST(POWER(2, @position_bit) AS VARBINARY))
-- mary ann gets address updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-jun-16', null, null, '1443 hoover st', null, CAST(POWER(2, @address1_bit) AS VARBINARY))
-- mary ann gets position updated
INSERT INTO user_efdt (user_id, date_effective, first_name, last_name, address1, position, modified_fields)
VALUES (2000, '1-jul-16', null, null, null, 'manager 2', CAST(POWER(2, @position_bit) AS VARBINARY))
Here's my version of the query:
CREATE PROCEDURE as_of_date_query (@date DATE)
AS
BEGIN
DECLARE @first_name_bit INT
DECLARE @last_name_bit INT
DECLARE @address1_bit INT
DECLARE @position_bit INT
SET @first_name_bit = 3
SET @last_name_bit = 4
SET @address1_bit = 5
SET @position_bit = 6
SELECT
q1.date_effective, fn.val first_name, ln.val last_name,
addr.val address1, pos.val position
FROM
(SELECT
user_id, MAX(date_effective) date_effective
FROM
user_efdt
WHERE
date_effective <= @date
GROUP BY
user_id) q1
LEFT JOIN
(SELECT DISTINCT
user_id,
FIRST_VALUE(first_name) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM
user_efdt
WHERE
(modified_fields & POWER(2, @first_name_bit)) > 0
AND date_effective <= @date) fn ON q1.user_id = fn.user_id
LEFT JOIN
(SELECT DISTINCT
user_id,
FIRST_VALUE(last_name) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM
user_efdt
WHERE
(modified_fields & POWER(2, @last_name_bit)) > 0
AND date_effective <= @date) ln ON q1.user_id = ln.user_id
LEFT JOIN (
SELECT DISTINCT user_id, FIRST_VALUE(address1) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM user_efdt
WHERE (modified_fields & POWER(2, @address1_bit)) > 0
AND date_effective <= @date
) addr ON q1.user_id = addr.user_id
LEFT JOIN (
SELECT DISTINCT user_id, FIRST_VALUE(position) OVER (PARTITION BY user_id ORDER BY date_effective DESC) val
FROM user_efdt
WHERE (modified_fields & POWER(2, @position_bit)) > 0
AND date_effective <= @date
) pos ON q1.user_id = pos.user_id
END
expected results:
EXEC as_of_date_query '1-mar-16'
-- 2016-03-01 NULL NULL NULL engineer
EXEC as_of_date_query '5-apr-16'
-- 2016-04-01 NULL NULL 456 el dorado st engineer
EXEC as_of_date_query '15-may-16'
-- 2016-05-01 john doe 456 el dorado st engineer
-- 2016-05-01 mary ann 123 main st manager
EXEC as_of_date_query '5-jun-16'
-- 2016-06-01 john doe 456 el dorado st engineer 2
-- 2016-06-01 mary ann 1443 hoover st manager
EXEC as_of_date_query '1-jul-16'
-- 2016-07-01 john doe 456 el dorado st NULL
-- 2016-07-01 mary ann 1443 hoover st manager 2
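For reference, the reconstruction the procedure performs can be modeled outside SQL. This is an illustrative Python sketch (not the stored procedure itself) of the same "latest value per column whose bit is set" logic, using john doe's delta rows from the setup above:

```python
from datetime import date

# Illustrative Python model of the as-of query: for each column, take the
# value from the latest delta row on or before the requested date whose
# mask says that column was actually modified. None plays the role of NULL.

FIRST_NAME, LAST_NAME, ADDRESS1, POSITION = 3, 4, 5, 6
COLS = {"first_name": FIRST_NAME, "last_name": LAST_NAME,
        "address1": ADDRESS1, "position": POSITION}

# john doe's delta rows: (user_id, date_effective, changed values, mask)
rows = [
    (1000, date(2016, 3, 1), {"position": "engineer"}, 1 << POSITION),
    (1000, date(2016, 4, 1), {"address1": "456 el dorado st"}, 1 << ADDRESS1),
    (1000, date(2016, 5, 1), {"first_name": "john", "last_name": "doe"},
     (1 << FIRST_NAME) | (1 << LAST_NAME)),
    (1000, date(2016, 6, 1), {"position": "engineer 2"}, 1 << POSITION),
    (1000, date(2016, 7, 1), {"position": None}, 1 << POSITION),
]

def as_of(rows, user_id, when):
    result = {c: None for c in COLS}
    user_rows = sorted((r for r in rows if r[0] == user_id),
                       key=lambda r: r[1])
    for _, eff, values, mask in user_rows:
        if eff > when:
            break
        for col, bit in COLS.items():
            if mask & (1 << bit):
                # Overwrite with the newer value; None is a real value here.
                result[col] = values.get(col)
    return result
```

`as_of(rows, 1000, date(2016, 5, 15))` reproduces the '15-may-16' row above, and the '1-jul-16' call yields position = None because the mask marks the column as deliberately erased.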
We receive data on a weekly and monthly basis with information regarding customers. We also sometimes have the same information stored from another source. The two sources sometimes provide contradictory information regarding customers.
How would I write a query which tells me the mismatched CustomerId and corresponding Vehicle? For example, CustomerId 947623 is associated with Kia in the vendor extract [Table 1] whereas we have the same customer stored as related to Hyundai [Table 2].
Table 1: Data received from the vendor.
CustomerId FirstName LastName Vehicle MiscColumns
---------- --------- -------- ------- -----------
027548     Jane      Doe      Honda   MiscData
947623     John      Smith    Kia     MiscData
549816     Erin      Woods    Chevy   MiscData
739232     Henry     Jackson  Ford    MiscData
Table 2: Internal data records
CustomerId FirstName LastName Vehicle MiscColumns
---------- --------- -------- ------- -----------
027548     Jane      Doe      Honda   MiscData
947623     John      Smith    Hyundai MiscData
549816     Erin      Woods    Chevy   MiscData
739232     Henry     Jackson  Ford    MiscData
Please try the following solution.
It will work on SQL Server 2016 and later.
-- DDL and sample data population, start
DECLARE #TableA TABLE (CustomerId CHAR(6) PRIMARY KEY, FirstName VARCHAR(100), LastName VARCHAR(100), Vehicle VARCHAR(100));
DECLARE #TableB table (CustomerId CHAR(6) PRIMARY KEY, FirstName VARCHAR(100), LastName VARCHAR(100), Vehicle VARCHAR(100));
INSERT INTO #TableA (CustomerId, FirstName, LastName, Vehicle) VALUES
('027548', 'Jane', 'Doe', 'Honda'),
('947623', 'John', 'Smith', 'Kia'),
('549816', 'Erin', 'Woods', 'Chevy'),
('739232', 'Henry', 'Jackson', 'Ford');
INSERT INTO #TableB (CustomerId, FirstName, LastName, Vehicle) VALUES
('027548', 'Jane', 'Doe', 'Honda'),
('947623', 'John', 'Smith', 'Hyundai'),
('549816', 'Erin', 'Woods', 'Chevy'),
('739232', 'Henry', 'Jackson', 'Ford');
-- DDL and sample data population, end
SELECT CustomerId
,[key] AS [column]
,Org_Value = MAX( CASE WHEN Src=1 THEN Value END)
,New_Value = MAX( CASE WHEN Src=2 THEN Value END)
FROM (
SELECT Src=1
,CustomerId
,B.*
FROM #TableA A
CROSS APPLY ( SELECT [Key]
,COALESCE(Value, '') AS Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
UNION ALL
SELECT Src=2
,CustomerId
,B.*
FROM #TableB A
CROSS APPLY ( SELECT [Key]
,COALESCE(Value, '') AS Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
) AS A
GROUP BY CustomerId,[key]
HAVING MAX(CASE WHEN Src=1 THEN Value END)
<> MAX(CASE WHEN Src=2 THEN Value END)
ORDER BY CustomerId,[key];
Output
CustomerId column  Org_Value New_Value
---------- ------- --------- ---------
947623     Vehicle Kia       Hyundai
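The JSON unpivot trick boils down to: turn each row into (column, value) pairs per source, then keep the pairs that disagree. A rough Python equivalent of that logic (illustrative only; `diff` is a hypothetical name, and only two of the sample rows are shown):

```python
# Illustrative Python equivalent of the unpivot-and-compare logic.

table_a = {  # vendor extract
    "027548": {"FirstName": "Jane", "LastName": "Doe", "Vehicle": "Honda"},
    "947623": {"FirstName": "John", "LastName": "Smith", "Vehicle": "Kia"},
}
table_b = {  # internal records
    "027548": {"FirstName": "Jane", "LastName": "Doe", "Vehicle": "Honda"},
    "947623": {"FirstName": "John", "LastName": "Smith", "Vehicle": "Hyundai"},
}

def diff(a, b):
    # Emit (CustomerId, column, Org_Value, New_Value) for every mismatch.
    out = []
    for cid in sorted(a):
        for col, old in a[cid].items():
            new = b.get(cid, {}).get(col)
            if old != new:
                out.append((cid, col, old, new))
    return out
```

This is what the OPENJSON / FOR JSON pair does inside the query: serializing each row to JSON and reading it back produces exactly these (key, value) pairs without naming the columns.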
I have been reading the following Microsoft article on recursive queries using CTEs and just can't seem to wrap my head around how to use it to group common items.
I have a table the contains the following columns:
ID
FirstName
LastName
DateOfBirth
BirthCountry
GroupID
What I need to do is start with the first person in the table and iterate through the table and find all the people that have the same (LastName and BirthCountry) or have the same (DateOfBirth and BirthCountry).
Now the tricky part is that I have to assign them the same GroupID, and then for each person in that GroupID I need to see if anyone else has the same information and then put them in the same GroupID.
I think I could do this with multiple cursors but it is getting tricky.
Here is sample data and output.
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
John Doe and Jane Doe are in the same Group (100) because they have the same (LastName and BirthCountry).
Adam Wayne is in Group (100) because he has the same (BirthDate and BirthCountry) as John Doe.
Kay Wayne is in Group (100) because she has the same (LastName and BirthCountry) as Adam Wayne who is already in Group (100).
Matt Knox is in a new group (101) because he does not match anyone in previous groups.
Jack Stone is in a group (100) because he has the same (BirthDate and BirthCountry) as Kay Wayne who is already in Group (100).
Data scripts:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);
Here's what I came up with. I have rarely written recursive queries, so it was some good practice for me. By the way, Kay and Adam do not share a birth country in your sample data.
with data as (
select
LastName, DateOfBirth, BirthCountry,
row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
from T group by LastName, DateOfBirth, BirthCountry
), r as (
select
d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
from data as d
union all
select
d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
from r inner join data as d
on d.grpNum > r.grpNum
and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
and d.BirthCountry = r.BirthCountry
), g as (
select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
from r group by LastName, DateOfBirth, BirthCountry
)
select t.*, dense_rank() over (order by g.grpNum) + 100 as GroupID
from T as t
inner join g
on g.LastName = t.LastName
and g.DateOfBirth = t.DateOfBirth
and g.BirthCountry = t.BirthCountry
For the recursion to terminate, it's necessary to keep track of the equivalences (via string concatenation) so that at each level it only needs to consider newly discovered equivalences (or connections, transitivities, etc.). Notice that I've avoided using the word "group" to avoid bleeding into the GROUP BY concept.
http://rextester.com/edit/TVRVZ10193
EDIT: I used an almost arbitrary numbering for the equivalences, but if you wanted them to appear in a sequence based on the lowest ID within each block, that's easy to do. Instead of using row_number(), say min(ID) as grpNum, presuming, of course, that IDs are unique.
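For what it's worth, the equivalence-merging that the recursive CTE performs is the classic connected-components problem, which outside SQL is usually solved with a disjoint-set (union-find) structure. A sketch in Python over the sample rows (`assign_groups` and its helpers are illustrative names, not part of the query above):

```python
# Illustrative Python sketch: the grouping is connected components over
# "same (LastName, BirthCountry) or same (DateOfBirth, BirthCountry)",
# computed here with union-find instead of a recursive CTE.

rows = [  # (ID, FirstName, LastName, DateOfBirth, BirthCountry)
    (1, "Jonh", "Doe", "1983-01-01", "Grand"),
    (2, "Jack", "Stone", "1976-06-08", "Grand"),
    (3, "Jane", "Doe", "1982-02-08", "Grand"),
    (4, "Adam", "Wayne", "1983-01-01", "Grand"),
    (5, "Kay", "Wayne", "1976-06-08", "Grand"),
    (6, "Matt", "Knox", "1983-01-01", "Hay"),
]

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

def assign_groups(rows, start=100):
    parent = {r[0]: r[0] for r in rows}
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            if (a[2], a[4]) == (b[2], b[4]) or (a[3], a[4]) == (b[3], b[4]):
                union(parent, a[0], b[0])
    numbers, groups = {}, {}
    for r in rows:  # number components in first-seen (lowest ID) order
        root = find(parent, r[0])
        if root not in numbers:
            numbers[root] = start + len(numbers)
        groups[r[0]] = numbers[root]
    return groups
```

The pairwise loop is O(n^2), like the CTE's self-join, but union-find makes the transitive chaining (Kay via Adam via John) immediate.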
I assume GroupID is the output you want, starting from 100.
Even if GroupID came from another table, that would not be a problem.
Firstly, sorry for my earlier "no cursor" comments: a cursor or RBAR operation is required for this task. In fact, it has been a very long time since I met a requirement that took this long, and I ended up using an RBAR operation.
If I manage to do it with a set-based method tomorrow, I will come back and edit this.
Most importantly, the RBAR approach makes the script easier to understand, and I think it will work for other sample data too.
Please give feedback about the performance and how it works with other sample data.
Also, note that in my script the IDs are not sequential; that does not matter, I did it this way in order to test.
I use PRINT for debugging purposes; you can remove it.
SET NOCOUNT ON
DECLARE @Tbl TABLE(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO @Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL),
(7, 'Jerry', 'Stone', '1976-06-08', 'Hay', NULL)
DECLARE @StartGroupid INT = 100
DECLARE @id INT
DECLARE @Groupid INT
DECLARE @Maxid INT
DECLARE @i INT = 1
DECLARE @MinGroupID INT = @StartGroupid
DECLARE @MaxGroupID INT = @StartGroupid
DECLARE @LastGroupID INT
SELECT @Maxid = MAX(id)
FROM @Tbl
WHILE (@i <= @Maxid)
BEGIN
SELECT @id = id
,@Groupid = Groupid
FROM @Tbl a
WHERE id = @i
IF (@Groupid IS NOT NULL AND @Groupid < @MinGroupID)
SET @MinGroupID = @Groupid
IF (@Groupid IS NOT NULL AND @Groupid > @MaxGroupID)
SET @MaxGroupID = @Groupid
IF (@Groupid IS NOT NULL)
SET @LastGroupID = @Groupid
UPDATE A
SET groupid = CASE
WHEN @id = 1 AND b.groupid IS NULL THEN @StartGroupid
WHEN @id > 1 AND b.groupid IS NULL THEN @MaxGroupID + 1 --(SELECT MAX(groupid) + 1 FROM @Tbl WHERE id < @id)
WHEN @id > 1 AND b.groupid IS NOT NULL THEN @MinGroupID --(SELECT MIN(groupid) FROM @Tbl WHERE id < @id)
END
FROM @Tbl A
INNER JOIN @Tbl B ON b.id = @id
WHERE (
(
a.BirthCountry = b.BirthCountry
AND a.DateOfBirth = b.dateofbirth
)
OR (a.LastName = b.LastName AND a.BirthCountry = b.BirthCountry)
OR (a.LastName = b.LastName AND a.dateofbirth = b.dateofbirth)
)
--IF (@id = 7) --@id = 2, @id = 3 and so on (for debugging)
--BREAK
SET @i = @i + 1
SET @id = @i
END
SELECT *
FROM @Tbl
Alternate method, but it still returns 56,000 rows without rownum = 1. See if it works with other sample data, or whether you can optimize it further.
;with CTE as
(
select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,@StartGroupid GroupID
,1 rn
FROM @Tbl A where a.id=1
UNION ALL
Select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,case when ((a.BirthCountry = b.BirthCountry and a.DateOfBirth = b.dateofbirth)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
) then b.groupid else b.groupid+1 end
, b.rn+1
FROM @Tbl A
inner join CTE B on a.id>1
where b.rn<@Maxid
)
,CTE1 as
(select * ,row_number()over(partition by id order by groupid )rownum
from CTE )
select * from cte1
where rownum=1
Maybe you can run it in this way (note that SELECT * is not allowed with GROUP BY, so list the grouped columns explicitly):
SELECT FirstName,
LastName,
GroupID,
COUNT(GroupID) AS Cnt
FROM table_name
GROUP BY
FirstName,
LastName,
GroupID
HAVING COUNT(GroupID) >= 2
ORDER BY GroupID
I have a Customer table with the following structure.
CustomerId Name Address Phone
1 Joe 123 Main NULL
I also have an Audit table that tracks changes to the Customer table.
Id Entity EntityId Field OldValue NewValue Type AuditDate
1 Customer 1 Name NULL Joe Add 2016-01-01
2 Customer 1 Phone NULL 567-54-3332 Add 2016-01-01
3 Customer 1 Address NULL 456 Centre Add 2016-01-01
4 Customer 1 Address 456 Centre 123 Main Edit 2016-01-02
5 Customer 1 Phone 567-54-3332 843-43-1230 Edit 2016-01-03
6 Customer 1 Phone 843-43-1230 NULL Delete 2016-01-04
I have a CustomerHistory reporting table that will be populated with a daily ETL job. It has the same fields as Customer Table with additional field SnapShotDate.
I need to write a query that takes the records in Audit table, transforms and inserts into CustomerHistory as seen below.
CustomerId Name Address Phone SnapShotDate
1 Joe 456 Centre 567-54-3332 2016-01-01
1 Joe 123 Main 567-54-3332 2016-01-02
1 Joe 123 Main 843-43-1230 2016-01-03
1 Joe 123 Main NULL 2016-01-04
I guess the solution would involve a self-join on Audit table or a recursive CTE. I would appreciate any help with developing this solution.
Note: Unfortunately, I do not have the option to use triggers or change the Audit table schema. Query performance is not a concern since this will be a nightly ETL process.
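One way to pin down the transform before writing the SQL: sort the audit rows by date, fold each day's NewValue changes onto a running snapshot, and emit one snapshot per AuditDate. A small illustrative Python model of that logic (not the requested SQL; it mirrors the sample audit rows for CustomerId 1):

```python
# Illustrative Python model: replay the audit rows in date order, folding
# each day's NewValue onto a running snapshot and emitting one snapshot
# per AuditDate. None plays the role of NULL.

audit = [  # (AuditDate, Field, NewValue) for CustomerId 1, sorted by date
    ("2016-01-01", "Name", "Joe"),
    ("2016-01-01", "Phone", "567-54-3332"),
    ("2016-01-01", "Address", "456 Centre"),
    ("2016-01-02", "Address", "123 Main"),
    ("2016-01-03", "Phone", "843-43-1230"),
    ("2016-01-04", "Phone", None),  # Delete: this NULL must survive
]

def snapshots(audit):
    state = {"Name": None, "Address": None, "Phone": None}
    out, current = [], None
    for day, field, new_value in audit:
        if current is not None and day != current:
            out.append((current, dict(state)))  # close the previous day
        current = day
        state[field] = new_value
    if current is not None:
        out.append((current, dict(state)))
    return out
```

Because each day's snapshot is a copy of the full running state, a Delete that writes NULL stays NULL in that day's row instead of being back-filled, which matches the desired 2016-01-04 output.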
You can use the script below.
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
CREATE TABLE #tmp (
id INT Identity
, EntityId INT
, NAME VARCHAR(10)
, Address VARCHAR(100)
, Phone VARCHAR(20)
, Type VARCHAR(10)
, SnapShotDate DATETIME
)
;with cte1 as (
select AuditDate, EntityId, Type, [Name], [Address], [Phone]
from
(select AuditDate, EntityId, Type, Field, NewValue from #Audit) p
pivot
(
max(NewValue)
for Field in ([Name], [Address], [Phone])
) as xx
)
insert into #tmp (EntityId, Name, Address, Phone, Type, SnapShotDate)
select EntityId, Name, Address, Phone, Type, AuditDate
from cte1
-- update NULL columns with the most recent non-NULL value for the same entity
update t
set Name = (select top 1 tp2.Name from #tmp tp2
where tp2.EntityId = t.EntityId and tp2.Name is not null
order by tp2.id desc)
from #tmp t
where t.Name is null
update t
set Address = (select top 1 tp2.Address from #tmp tp2
where tp2.EntityId = t.EntityId and tp2.Address is not null
order by tp2.id desc)
from #tmp t
where t.Address is null
update t
set Phone = (select top 1 tp2.Phone from #tmp tp2
where tp2.EntityId = t.EntityId and tp2.Phone is not null
order by tp2.id desc)
from #tmp t
where t.Phone is null
To create test data, use the script below:
CREATE TABLE #Customer (
CustomerId INT
, NAME VARCHAR(10)
, Address VARCHAR(100)
, Phone VARCHAR(20)
)
INSERT INTO #Customer
VALUES (1, 'Joe', '123 Main', NULL)
CREATE TABLE #Audit (
Id INT
, Entity VARCHAR(50)
, EntityId INT
, Field VARCHAR(20)
, OldValue VARCHAR(100)
, NewValue VARCHAR(100)
, Type VARCHAR(10)
, AuditDate DATETIME
)
insert into #Audit values
(1, 'Customer', 1, 'Name' ,NULL ,'Joe' ,'Add' ,'2016-01-01'),
(2, 'Customer', 1, 'Phone' ,NULL ,'567-54-3332' ,'Add' ,'2016-01-01'),
(3, 'Customer', 1, 'Address' ,NULL ,'456 Centre' ,'Add' ,'2016-01-01'),
(4, 'Customer', 1, 'Address' ,'456 Centre' ,'123 Main' ,'Edit' ,'2016-01-02'),
(5, 'Customer', 1, 'Phone' ,'567-54-3332' ,'843-43-1230' ,'Edit' ,'2016-01-03'),
(6, 'Customer', 1, 'Phone' ,'843-43-1230' ,NULL ,'Delete' ,'2016-01-04'),
(7, 'Customer', 2, 'Name' ,NULL ,'Peter' ,'Add' ,'2016-01-01'),
(8, 'Customer', 2, 'Phone' ,NULL ,'111-222-3333' ,'Add' ,'2016-01-01'),
(8, 'Customer', 2, 'Address' ,NULL ,'Parthenia' ,'Add' ,'2016-01-01')
Result
EntityId Name Address Phone Type SnapShotDate
1 Joe 456 Centre 567-54-3332 Add 2016-01-01
1 Joe 123 Main 843-43-1230 Edit 2016-01-02
1 Joe 123 Main 843-43-1230 Edit 2016-01-03
1 Joe 123 Main 843-43-1230 Delete 2016-01-04
I need to write to a table all the rows where values have changed between 2 data cuts.
This must be done in SQL, not with any third-party tools.
I can find the difference between the 2 data cuts easily by using EXCEPT.
I have not tried CHECKSUM, but I added a column just in case.
What I am struggling with, and where I need your help, is:
How do I pull all the data out from my findings into my #Changes table?
WANTED RESULT
EmployeeId ColumnName OldValue NewValue
3 MaritalStatus Single Married
3 Surname Malone Evans
10 MaritalStatus Single Married
SETUP TEST DATA
Dummy data setup (2 employees, with Id (3, 10), have changes); if you look closely,
employee Id 3 has changes in 2 columns.
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee
GO
IF OBJECT_ID('tempdb..#Changes') IS NOT NULL DROP TABLE #Changes
GO
CREATE TABLE #Employee
(
[Id] [int] NOT NULL,
EmployeeNo INT NOT NULL,
[DataCut] [int] NULL,
[Name] [varchar](50) NULL,
[Surname] [varchar](50) NULL,
[Gender] [varchar](10) NULL,
[MaritalStatus] [varchar](10) NULL,
[Chksum] [int] NULL,
CONSTRAINT [PK_#Employee]
PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
CREATE TABLE #Changes
(
[EmployeeNo] [int] ,
[ColumnName] [varchar](50) NULL,
[OldValue] [varchar](50) NULL,
[NewValue] [varchar](50) NULL
)
INSERT INTO #Employee([Id], EmployeeNo,[DataCut], [Name], [Surname], [Gender], [MaritalStatus],[Chksum])
SELECT 1, 1,1, N'Jo', N'Bloggs', N'Male', N'Single', NULL UNION ALL
SELECT 2, 2,1, N'Mark', N'Smith', N'Male', N'Single', NULL UNION ALL
SELECT 3, 3,1, N'Jenny', N'Malone', N'Female', N'Single', NULL UNION ALL
SELECT 4, 4,1, N'Mario', N'Rossi', N'Male', N'Single', NULL UNION ALL
SELECT 5, 5,1, N'Richard', N'Jones', N'Male', N'Single', NULL UNION ALL
SELECT 6, 1,2, N'Jo', N'Bloggs', N'Male', N'Single', NULL UNION ALL
SELECT 7, 2,2, N'Mark', N'Smith', N'Male', N'Single', NULL UNION ALL
SELECT 8, 3,2, N'Jenny', N'Evans', N'Female', N'Married', NULL UNION ALL
SELECT 9, 4,2, N'Mario', N'Rossi', N'Male', N'Single', NULL UNION ALL
SELECT 10,5,2, N'Richard', N'Jones', N'Male', N'Married', NULL
--Find all the Rows that have changed between 2 datacuts using EXCEPT
SELECT EmployeeNo,Name, Surname, Gender, MaritalStatus
FROM #Employee
WHERE DataCut=1
EXCEPT
SELECT EmployeeNo,Name, Surname, Gender, MaritalStatus
FROM #Employee
WHERE DataCut=2
UNION
--do the opposite so that we get all the rows.
SELECT EmployeeNo,Name, Surname, Gender, MaritalStatus
FROM #Employee
WHERE DataCut=2
EXCEPT
SELECT EmployeeNo,Name, Surname, Gender, MaritalStatus
FROM #Employee
WHERE DataCut=1
--HOW DO I FILL MY #CHANGES TABLES TO MATCH MY WANTED RESULT?
DROP TABLE #Changes
DROP TABLE #Employee
You can use UNPIVOT:
;WITH UnpivotedTable AS (
SELECT Id, EmployeeNo, DataCut, Val, Col
FROM
(SELECT Id, EmployeeNo, DataCut, CAST(Name AS VARCHAR(50)) AS Name,
CAST(Surname AS VARCHAR(50)) AS Surname,
CAST(Gender AS VARCHAR(50)) AS Gender,
CAST(MaritalStatus AS VARCHAR(50)) AS MaritalStatus
FROM #Employee) AS src
UNPIVOT
(Val FOR Col IN
(Name, Surname, Gender, MaritalStatus)) AS unpvt
)
SELECT t1.Id As EmployeeId,
t1.Col AS ColumnName,
t1.Val AS OldValue,
t2.Val AS NewValue
FROM UnpivotedTable AS t1
INNER JOIN UnpivotedTable AS t2
ON t1.EmployeeNo = t2.EmployeeNo AND t1.Col = t2.Col AND
t1.DataCut = 1 AND t2.DataCut = 2
WHERE t1.Val <> t2.Val
Demo here
Explanation:
Here's an excerpt of the data returned by the CTE (for EmployeeNo = 1):
Id EmployeeNo DataCut Val Col
---------------------------------------------
1 1 1 Jo Name
1 1 1 Bloggs Surname
1 1 1 Male Gender
1 1 1 Single MaritalStatus
6 1 2 Jo Name
6 1 2 Bloggs Surname
6 1 2 Male Gender
6 1 2 Single MaritalStatus
Using the above table expression we can easily get the expected result with an INNER JOIN: we just have to compare 'old' (DataCut = 1) vs 'new' (DataCut = 2) values for the same EmployeeNo and Col.
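The same unpivot-then-join mechanics can be modeled outside SQL if that helps. A small illustrative Python sketch, keying by EmployeeNo and using only the two employees with changes from the sample (`unpivot` and `changes` are hypothetical names):

```python
# Illustrative Python model of the UNPIVOT + self-join comparison.

COLS = ("Name", "Surname", "Gender", "MaritalStatus")

employees = [  # (EmployeeNo, DataCut, Name, Surname, Gender, MaritalStatus)
    (3, 1, "Jenny", "Malone", "Female", "Single"),
    (3, 2, "Jenny", "Evans", "Female", "Married"),
    (5, 1, "Richard", "Jones", "Male", "Single"),
    (5, 2, "Richard", "Jones", "Male", "Married"),
]

def unpivot(rows):
    # Wide rows -> {(EmployeeNo, ColumnName): Value}
    return {(r[0], COLS[i]): r[2 + i] for r in rows for i in range(len(COLS))}

def changes(rows):
    old = unpivot([r for r in rows if r[1] == 1])  # DataCut = 1
    new = unpivot([r for r in rows if r[1] == 2])  # DataCut = 2
    return sorted((emp, col, old.get((emp, col)), val)
                  for (emp, col), val in new.items()
                  if old.get((emp, col)) != val)
```

The two dict builds correspond to the CTE's unpivoted rows for each data cut, and the comprehension's filter is the join's `t1.Val <> t2.Val` predicate.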
I have a SQL Server 2005 database which contains a table called Memberships.
The table schema is:
PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime
I'm currently working on a grid feature which shows a break-down of memberships by person. One of the requirements is to split membership rows where there is an intersection of date ranges. The intersection must be bound by the Surname and FirstName, ie splits only occur with membership records of the same Surname and FirstName.
Example table data:
18 Smith John Poker Club 01/01/2009 NULL
18 Smith John Library 05/01/2009 18/01/2009
18 Smith John Gym 10/01/2009 28/01/2009
26 Adams Jane Pilates 03/01/2009 16/02/2009
Expected result set:
18 Smith John Poker Club 01/01/2009 04/01/2009
18 Smith John Poker Club / Library 05/01/2009 09/01/2009
18 Smith John Poker Club / Library / Gym 10/01/2009 18/01/2009
18 Smith John Poker Club / Gym 19/01/2009 28/01/2009
18 Smith John Poker Club 29/01/2009 NULL
26 Adams Jane Pilates 03/01/2009 16/02/2009
Does anyone have any idea how I could write a stored procedure that will return a result set which has the break-down described above.
The problem you are going to have here is that as the data set grows, T-SQL solutions won't scale well. The code below uses a series of temporary tables built on the fly to solve the problem. It splits each date range entry into its respective days using a numbers table. This is where it won't scale, primarily because of your open-ended NULL values, which appear to mean infinity, so you have to swap in a fixed date far in the future to limit the range of conversion to a feasible length of time. You would likely see better performance by building a table of days or a calendar table with appropriate indexing for optimized rendering of each day.
Once the ranges are split, the descriptions are merged using XML PATH so that each day in the range series has all of the descriptions listed for it. Row Numbering by PersonID and Date allows for the first and last row of each range to be found using two NOT EXISTS checks to find instances where a previous row doesn't exist for a matching PersonID and Description set, or where the next row doesn't exist for a matching PersonID and Description set.
This result set is then renumbered using ROW_NUMBER so that they can be paired up to build the final results.
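Before reading the T-SQL, it may help to see the split/merge/collapse pipeline in miniature. This is an illustrative Python model of the same steps (expand ranges to days, concatenate the descriptions active on each day, then collapse runs of identical description sets), using John Smith's rows and an arbitrary far-future stand-in for the open-ended NULL:

```python
from datetime import date, timedelta

# Illustrative Python miniature of the pipeline: expand each membership
# to days, join the descriptions active per day, collapse identical runs.
# OPEN_END is an arbitrary far-future stand-in for the NULL end date.

OPEN_END = date(2012, 12, 31)

memberships = [  # John Smith's rows: (Description, StartDate, EndDate)
    ("Poker Club", date(2009, 1, 1), None),
    ("Library", date(2009, 1, 5), date(2009, 1, 18)),
    ("Gym", date(2009, 1, 10), date(2009, 1, 28)),
]

def split_ranges(memberships):
    by_day = {}
    for desc, start, end in memberships:
        d, last = start, end or OPEN_END
        while d <= last:
            by_day.setdefault(d, []).append(desc)
            d += timedelta(days=1)
    out = []
    for d in sorted(by_day):
        label = " / ".join(by_day[d])
        if out and out[-1][0] == label and out[-1][2] == d - timedelta(days=1):
            out[-1][2] = d                 # same set on the next day: extend
        else:
            out.append([label, d, d])      # description set changed: new range
    return [(lbl, s, None if e == OPEN_END else e) for lbl, s, e in out]
```

The day-expansion loop plays the role of the numbers-table join, the `" / ".join` the role of the XML PATH concatenation, and the run-collapsing the role of the NOT EXISTS / row-pairing steps.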
/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int,
Surname nvarchar(30),
FirstName nvarchar(30),
Description nvarchar(100),
StartDate datetime,
EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO
*/
SELECT
PersonID,
Description,
theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
FROM master..spt_values
WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate
AND theDate <= isnull(EndDate, '31/12/2012')
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
PersonID,
theDate,
STUFF((
SELECT '/' + Description
FROM #SplitRanges AS s
WHERE s.PersonID = sr.PersonID
AND s.theDate = sr.theDate
FOR XML PATH('')
), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate
SELECT
ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID,
*
INTO #InterimResults
FROM
(
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID - 1 = t2.RowID
AND t1.Descriptions = t2.Descriptions)
UNION ALL
SELECT *
FROM #MergedDescriptions AS t1
WHERE NOT EXISTS
(SELECT 1
FROM #MergedDescriptions AS t2
WHERE t1.PersonID = t2.PersonID
AND t1.RowID = t2.RowID - 1
AND t1.Descriptions = t2.Descriptions)
) AS t
SELECT DISTINCT
PersonID,
Surname,
FirstName
INTO #DistinctPerson
FROM Schedule
SELECT
t1.PersonID,
dp.Surname,
dp.FirstName,
t1.Descriptions,
t1.theDate AS StartDate,
CASE
WHEN t2.theDate = '31/12/2012' THEN NULL
ELSE t2.theDate
END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1
ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2
ON t2.PersonID = t1.PersonID
AND t1.ID + 1 = t2.ID
AND t1.Descriptions = t2.Descriptions
DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults
/*
DROP TABLE Schedule
*/
The above solution will also handle gaps between additional Descriptions, so if you were to add another Description for PersonID 18 leaving a gap:
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')
It will fill the gap appropriately. As pointed out in the comments, you shouldn't have name information in this table; it should be normalized out to a Persons table that can be JOINed to in the final result. I simulated this other table by using SELECT DISTINCT to build a temp table for that JOIN.
Try this
SET DATEFORMAT dmy
DECLARE @Membership TABLE(
PersonID int,
Surname nvarchar(16),
FirstName nvarchar(16),
Description nvarchar(16),
StartDate datetime,
EndDate datetime)
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
--Program starts
declare @enddate datetime
--Handle the extreme condition when all the enddates are null (i.e. all the memberships for all members are in progress);
--in that case take an arbitrary date, e.g. '31/12/2009' here; otherwise add 1 more day to the highest enddate
select @enddate = case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership
--Fill the null enddates
; with fillNullEndDates_cte as
(
select
row_number() over(partition by PersonId order by PersonId) RowNum
,PersonId
,Surname
,FirstName
,Description
,StartDate
,isnull(EndDate,@enddate) EndDate
from @Membership
)
--Generate a date calender
, generateCalender_cte as
(
select
1 as CalenderRows
,min(startdate) DateValue
from @Membership
union all
select
CalenderRows+1
,DateValue + 1
from generateCalender_cte
where DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
(
select
t.RowNum
,t.PersonId
,t.Surname
,t.FirstName
,t.Description
, d.DateValue
,d.CalenderRows
from generateCalender_cte d
join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Dscription Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
select
PersonID
,Surname
,FirstName
,stuff((
select '/' + Description
from datesBasedOnMemberships_cte d1
where d1.PersonID = d2.PersonID
and d1.DateValue = d2.DateValue
for xml path('')
), 1, 1,'') as Description
, DateValue
,CalenderRows
from datesBasedOnMemberships_cte d2
group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
select d.*,
CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS [Group]
from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId,StartDate
option(maxrecursion 0)
[Only many, many years later.]
I created a stored procedure that will align and break segments by a partition within a single table, and then you can use those aligned breaks to pivot the description into a ragged column using a subquery and XML PATH.
See if the below helps:
Documentation: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.md
Stored Procedure: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.sql
For example, your call might look like:
EXEC dbo.DateSegments_AlignWithinTable
@tableName = 'tableName',
@keyFieldList = 'PersonID',
@nonKeyFieldList = 'Description',
@effectivveDateFieldName = 'StartDate',
@terminationDateFieldName = 'EndDate'
You will want to capture the result (which is a table) into another table or a temporary table (assumed to be called "AlignedDataTable" in the example below). Then you can pivot using a subquery.
SELECT
PersonID, StartDate, EndDate,
SUBSTRING ((SELECT ',' + [Description] FROM AlignedDataTable AS innerTable
WHERE
innerTable.PersonID = AlignedDataTable.PersonID
AND (innerTable.StartDate = AlignedDataTable.StartDate)
AND (innerTable.EndDate = AlignedDataTable.EndDate)
ORDER BY id
FOR XML PATH ('')), 2, 999999999999999) AS IdList
FROM AlignedDataTable
GROUP BY PersonID, StartDate, EndDate
ORDER BY PersonID, StartDate