System Versioned (Temporal) tables in a view - sql-server

I have a number of joined system-versioned tables, e.g. Person, PhoneNumber and EmailAddress.
The Person will only have one PhoneNumber and one EmailAddress at a time.
The PhoneNumber and EmailAddress will not usually be updated outside of a transaction that updates all 3 at once. (But they can be updated independently, just not in the normal scenario)
E.g. if I change the phone number, then all 3 records will be issued an update in the same transaction, hence giving them all the same "start time" in the history table.
Let's say I insert a person and then change the person's name, email address and phone number in 2 transactions:
DECLARE @Id TABLE(ID INT)
DECLARE @PersonId INT
-- Initial insert
BEGIN TRANSACTION
INSERT INTO Person (Name) OUTPUT inserted.PersonId INTO @Id VALUES ('Homer')
SELECT @PersonId = Id FROM @Id
INSERT INTO EmailAddress (Address, PersonId) VALUES ('homer@fake', @PersonId)
INSERT INTO PhoneNumber (Number, PersonId) VALUES ('999', @PersonId)
COMMIT TRANSACTION
-- Update
WAITFOR DELAY '00:00:02'
BEGIN TRANSACTION
UPDATE Person SET Name = 'Kwyjibo' WHERE PersonID = @PersonId
UPDATE EmailAddress SET Address = 'kwyjibo@fake' WHERE PersonID = @PersonId
UPDATE PhoneNumber SET Number = '000' WHERE PersonID = @PersonId
COMMIT TRANSACTION
Now I select from the view (just an inner join of the tables) using a temporal query:
SELECT * FROM vwPerson FOR SYSTEM_TIME ALL
WHERE PersonId = @PersonId
ORDER BY SysStartTime DESC
And I get returned a row for every combination of edit!
How can I query this view (if at all possible) to only return 1 row for the updates that were made in the same transaction?
I could add a WHERE clause to match all the SysStartTimes, however that would exclude those cases where a table was updated independently of the other 2.

Because of the independent updates, you first have to reconstruct a timeline onto which you can join the data. A sketch of this follows. I don't have your actual table definitions, so it is untested:
;WITH AllTimes as (
SELECT PersonId,SysStartTime as ATime FROM Person
UNION
SELECT PersonId,SysEndTime FROM Person
UNION
SELECT PersonId,SysStartTime FROM EmailAddress
UNION
SELECT PersonId,SysEndTime FROM EmailAddress
UNION
SELECT PersonId,SysStartTime FROM PhoneNumber
UNION
SELECT PersonId,SysEndTime FROM PhoneNumber
), Ordered as (
SELECT
PersonId, ATime, ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY Atime) rn
FROM
AllTimes
), Intervals as (
SELECT
o1.PersonId,
o1.ATime as StartTime,
o2.ATime as EndTime
FROM
Ordered o1
inner join
Ordered o2
on
o1.PersonId = o2.PersonId and
o1.rn = o2.rn - 1
)
SELECT
* --TODO - Columns
FROM
Intervals i
inner join
Person p
on
i.PersonId = p.PersonId and
i.StartTime < p.SysEndTime and
p.SysStartTime < i.EndTime
inner join
EmailAddress e
on
i.PersonId = e.PersonId and
i.StartTime < e.SysEndTime and
e.SysStartTime < i.EndTime
inner join
PhoneNumber pn
on
i.PersonId = pn.PersonId and
i.StartTime < pn.SysEndTime and
pn.SysStartTime < i.EndTime
With appropriate filters, if you just want one person's details, the optimizer will hopefully work it out. There may also be additional filters for the joins that I've missed.
Hopefully you can see how the three CTEs construct the timeline. The first one takes advantage of UNION eliminating duplicates.
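One thing the sketch leaves implicit: a plain SELECT from a system-versioned table returns only the current rows, so each table reference needs FOR SYSTEM_TIME ALL to bring in the history. Below is a minimal variant under a few assumptions: the period columns are called SysStartTime and SysEndTime on all three tables, and the data columns are Name, Address and Number as in the question. It also swaps the ROW_NUMBER self-join for LEAD to pair consecutive timestamps. Like the original, it is untested against the real schema.

```sql
WITH AllTimes AS (
    -- UNION (not UNION ALL) removes duplicate timestamps, so a transaction
    -- that touched all three tables contributes a single boundary.
    SELECT PersonId, SysStartTime AS ATime FROM Person FOR SYSTEM_TIME ALL
    UNION
    SELECT PersonId, SysEndTime FROM Person FOR SYSTEM_TIME ALL
    UNION
    SELECT PersonId, SysStartTime FROM EmailAddress FOR SYSTEM_TIME ALL
    UNION
    SELECT PersonId, SysEndTime FROM EmailAddress FOR SYSTEM_TIME ALL
    UNION
    SELECT PersonId, SysStartTime FROM PhoneNumber FOR SYSTEM_TIME ALL
    UNION
    SELECT PersonId, SysEndTime FROM PhoneNumber FOR SYSTEM_TIME ALL
), Intervals AS (
    -- Each boundary is paired with the next one for the same person.
    SELECT PersonId,
           ATime AS StartTime,
           LEAD(ATime) OVER (PARTITION BY PersonId ORDER BY ATime) AS EndTime
    FROM AllTimes
)
SELECT i.PersonId, i.StartTime, i.EndTime, p.Name, e.Address, pn.Number
FROM Intervals AS i
-- An interval never straddles a row boundary, so it is enough to check
-- that the row version was valid at the interval's start.
INNER JOIN Person FOR SYSTEM_TIME ALL AS p
    ON p.PersonId = i.PersonId
   AND p.SysStartTime <= i.StartTime AND i.StartTime < p.SysEndTime
INNER JOIN EmailAddress FOR SYSTEM_TIME ALL AS e
    ON e.PersonId = i.PersonId
   AND e.SysStartTime <= i.StartTime AND i.StartTime < e.SysEndTime
INNER JOIN PhoneNumber FOR SYSTEM_TIME ALL AS pn
    ON pn.PersonId = i.PersonId
   AND pn.SysStartTime <= i.StartTime AND i.StartTime < pn.SysEndTime
WHERE i.EndTime IS NOT NULL   -- the last boundary has no following interval
ORDER BY i.PersonId, i.StartTime DESC;
```

With the data from the question this should give one row per transaction, and an independent update to a single table would simply add one more interval.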

Related

How to decrease logical reads while getting results from a view?

I have a view, whose code is shown below. When I apply the filter inside the view's code, I get a reasonable number of logical reads, but when I filter the view itself, the logical reads increase dramatically!
I tried a subquery instead of a CTE and made many other changes to my code, but I couldn't get the result I wanted.
This is my view code:
create view att.view_notRule
as
with timeline
as
(
select person,location,dateTime,d_base
from att.view_pD1
union all
select person,location,dateTime_in,d_base
from att.view_rule
union all
select person,location,dateTime_out,d_base
from att.view_rule
),
timelineRanking
as
(
select person,location,row_number() over (partition by person,location,d_base order by dateTime) rank,dateTime,d_base
from timeline
)
select x.person,x.location,x.dateTime dateTime_start,y.dateTime dateTime_end,x.d_base
from timelineRanking x
inner join timelineRanking y on x.person=y.person and x.location=y.location and x.d_base=y.d_base
where x.rank+1=y.rank and x.rank%2=1
When I execute this query, I get a very large number of logical reads:
select *
from att.view_notRule
where person='B18FE132-2779-4E0D-A776-4BD27E7EEB7C'
But when I filter on person inside the view code, I get a reasonable number of logical reads.
I need to execute this:
select *
from att.view_notRule
where person='B18FE132-2779-4E0D-A776-4BD27E7EEB7C'
but with a reasonable number of logical reads.
When you are querying a view that can't propagate predicates down to the base tables (sometimes because of the view design, and sometimes because of limitations in the query optimizer), a useful pattern is to replace the view with an inline table-valued function, which is kind of like a parameterized view.
e.g.:
create or alter function att.view_notRule(@person varchar(200)) returns table
as return
with timeline
as
(
select person,location,dateTime,d_base
from att.view_pD1
where person = @person
union all
select person,location,dateTime_in,d_base
from att.view_rule
where person = @person
union all
select person,location,dateTime_out,d_base
from att.view_rule
where person = @person
),
timelineRanking
as
(
select person,location,row_number() over (partition by person,location,d_base order by dateTime) rank,dateTime,d_base
from timeline
)
select x.person,x.location,x.dateTime dateTime_start,y.dateTime dateTime_end,x.d_base
from timelineRanking x
inner join timelineRanking y on x.person=y.person and x.location=y.location and x.d_base=y.d_base
where x.rank+1=y.rank and x.rank%2=1
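A short usage sketch for the function above. It assumes the function has been created under a name that is free: the answer reuses att.view_notRule, which would clash with the existing view unless the view is dropped or the function is renamed first. The table att.person_list in the second query is hypothetical and only stands in for wherever the list of people comes from.

```sql
-- One person: pass the id as the function argument instead of a WHERE clause.
select *
from att.view_notRule('B18FE132-2779-4E0D-A776-4BD27E7EEB7C');

-- Several people: att.person_list is a hypothetical table of person ids.
select f.*
from att.person_list p
cross apply att.view_notRule(p.person) f;
```

Because the function is inline, the optimizer expands it into the calling query, so the filter lands on the base views rather than on the finished result.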
Then if you need to run the view across multiple people, you can do so with CROSS APPLY. But for your
If logical reads are the concern, I would normally start by filtering the base tables (as mentioned by David Browne). In your case, that means using a function instead of a view and applying the WHERE clause inside the CTE.
I would also suggest avoiding the self-join and using the LEAD() function instead. The following is an example that you can start with:
Declare @PersonID varchar(20) = 'YourData';
with timeline
as
(
select person, location, dateTime, d_base
from att.view_pD1
where person = @PersonID
union all
select person, location, dateTime_in, d_base
from att.view_rule
where person = @PersonID
union all
select person, location, dateTime_out, d_base
from att.view_rule
where person = @PersonID
)
select person,
location,
d_base,
dateTime as StartTime,
LEAD(dateTime) over (partition by person, location, d_base order by dateTime) EndTime
from timeline
go
Or, to get the extreme start and end dates, you may start with the following:
Declare @PersonID varchar(20) = 'YourData';
select m.person,
m.location,
m.d_base,
s.datetime as StartTime,
e.datetime as EndTime
from att.<viewMasterData> as M
LEFT JOIN
(select person, location, datetime, d_base,
ROW_NUMBER () OVER (Partition by person, location, d_base order by datetime) as StartRN
from att.view_rule
where person = @PersonID
) as s ON M.person = s.person and m.location = s.location and m.d_base = s.d_base
LEFT JOIN
(select person, location, datetime, d_base,
ROW_NUMBER () OVER (Partition by person, location, d_base order by datetime desc) as EndRN
from att.view_rule
where person = @PersonID
) as e ON M.person = e.person and m.location = e.location and m.d_base = e.d_base
where s.StartRN = 1 and e.EndRN = 1

Create a trigger to update a field in a table from the total of line items in another table

I have two tables, Order and Order_Details. I would like to create a trigger that updates Order.Order_Total by summing the Order_Details.Price fields that belong to that specific order. Here's what I have so far, but it's giving me the following error:
Subquery returned more than 1 value. This is not permitted when the subquery follows
Update Order
Set Order_Total =
(Select SUM(Price)
From Order_Details
Group By Order_Id)
From Order_Details
Try this. The issue is that your subquery is not correlated with the Order table, so it returns one sum per order instead of a single value.
UPDATE o
SET o.Order_Total = t.tprice
FROM [Order] o
LEFT JOIN ( SELECT Order_Id, SUM(isnull(price,0)) tprice
FROM Order_Details
GROUP BY Order_Id) t
ON o.Order_Id=t.Order_Id
Ok here's what I ended up doing in case someone has the same question. I created a CTE to add the Order_Details price and I updated the Order.Total from that CTE. Here's the full code I used.
IF EXISTS ( SELECT 1 FROM sys.triggers WHERE object_id = object_id('dbo.trOrder_Details_AIU') )
DROP TRIGGER dbo.trOrder_Details_AIU
GO
CREATE TRIGGER dbo.trOrder_Details_AIU
ON dbo.Order_Details
AFTER INSERT,UPDATE, Delete
AS
BEGIN
set nocount on;
begin
; with Total_CTE(Order_Id, Total)
as
(
Select Order_Id, SUM(Price)
From Order_Details
Group By Order_Id
)
Update [Order]
Set Order_Total = Total_CTE.Total
From Total_CTE
Where Total_CTE.Order_Id = [Order].Order_Id
end
END
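The trigger above recalculates every order each time any detail row changes. A possible refinement, sketched here and untested, is to touch only the orders named in the inserted and deleted pseudo-tables. The trigger name trOrder_Details_AIUD is made up, and the dbo schema for the Order table is an assumption carried over from dbo.Order_Details.

```sql
CREATE OR ALTER TRIGGER dbo.trOrder_Details_AIUD
ON dbo.Order_Details
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Order ids touched by this statement: new row versions are in inserted,
    -- removed or pre-update row versions are in deleted.
    WITH Affected AS (
        SELECT Order_Id FROM inserted
        UNION
        SELECT Order_Id FROM deleted
    )
    UPDATE o
    SET o.Order_Total = COALESCE(t.Total, 0)   -- 0 when the last line item was deleted
    FROM dbo.[Order] AS o
    INNER JOIN Affected AS a
        ON a.Order_Id = o.Order_Id
    OUTER APPLY (
        -- Recompute the total from the detail rows that remain for this order.
        SELECT SUM(d.Price) AS Total
        FROM dbo.Order_Details AS d
        WHERE d.Order_Id = o.Order_Id
    ) AS t;
END
```

Reading both pseudo-tables matters for updates that move a line item from one order to another, since both the old and the new order then need a fresh total.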

Select from multiple tables without cartesian product

I'm trying to create a command to copy the details of client memberships from one database to another with identical structure. I've pared it down to the bare essentials for the purposes of the question so the four items to copy are the expiry date, the subscription ID, the client ID and the item ID (which is the service which comprises the subscription).
The clients have a common GUID in both databases. The subscription ID is a unique long int which should be the same in both databases, and the expiry date is just a date. So far, so easy. The tricky part is that the item_id is not necessarily the same in each database. I need to map from one to the other with a where clause, which I know how to do.
My problem is that I need to select from the destination database's own ITEM table (item_0) in order to get and insert the correct item_id and when I do this I get thousands of duplicate rows returned. I assume I need to use a join to avoid this but as I have nothing meaningful to join item_0 to I can't get any results.
insert into DestDB..subscription (expiry_date,id,client_id,item_id)
select
sub_1.expiry_date,
sub_1.id,
cli_0.id as client_id,
item_0.id as item_id
from SourceDB..subscription sub_1,
DestDB..item item_0,
DestDB..client cli_0
inner join SourceDB..client cli_1
on cli_1.[guid] = cli_0.[guid]
where sub_1.id not in (select id from DestDB..subscription)
and item_0.id =
(select id from DestDB..collectiondetails
where service_ID =
(select id from DestDB..service s_0 where s_0.code =
(select code from SourceDB..service s_1 where s_1.id =
(select service_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.item_id)))
and collection_ID =
(select id from DestDB..collection col_0
where col_0.code =
(select code from SourceDB..collection col_1 where col_1.id =
(select collection_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.collection_ID)))
)
I am afraid the updated question is even more confusing. Is the SELECT in the WHERE clause guaranteed to return only one record, or should the comparison be IN instead of =?
If there is no rule to identify a particular DestDB..item from the list of matches, then the top one should do as well. It would still seem that item_0 can be omitted altogether:
insert into DestDB..subscription (expiry_date,id,client_id,item_id)
select
sub_1.expiry_date,
sub_1.id,
cli_0.id as client_id,
(select Top 1 id from DestDB..collectiondetails --<- Limit to top 1 only
where service_ID =
(select id from DestDB..service s_0 where s_0.code =
(select code from SourceDB..service s_1 where s_1.id =
(select service_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.item_id)))
and collection_ID =
(select id from DestDB..collection col_0
where col_0.code =
(select code from SourceDB..collection col_1 where col_1.id =
(select collection_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.collection_ID)))
) as item_id
from SourceDB..subscription sub_1,
DestDB..client cli_0
inner join SourceDB..client cli_1
on cli_1.[guid] = cli_0.[guid]
where sub_1.id not in (select id from DestDB..subscription)
Please note: if DestDB..item is empty, the statement in the question would not insert anything, whereas the one in this answer would insert rows with item_id set to NULL.
I personally would try splitting this task into two separate statements in one transaction:
1. Insert into the target table with a NULL item_id.
2. Update the target table with the new item_id where item_id is NULL.
3. (Optional) Delete unwanted records for which no appropriate item_id could be found.

SELECT from multiple queries

I have these tables:
tblDiving(
diving_number int primary key
diving_club int
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int
club_number int
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "Super clubs" in it.
A Super club is a club which has more than 25 working divers (tblWorks_for.end_working_date is null) or had more than 100 dives (tblDiving) in the last year.
After I get the country and number of Super clubs, I need to show only the countries that contain more than 2 Super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works separately, but how do I combine them and select from them?
It's a university assignment and I'm not allowed to use views or temporary tables.
It's my first program so I'm not really sure what I'm doing :)
WITH CTE AS (
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as diving_number
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
UNION ALL
select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using a UNION ALL as long as there are the same number of columns in each query. You can then roll them into a Common Table Expression (CTE) and do a select from that.
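To get from the combined result to the per-country count the question asks for, one possible continuation is sketched below. It is untested and rests on a few assumptions: "in the last year" is read as date_of_diving on or after one year ago (the question's queries use <=, which selects older dives instead), tblDivingClub.country holds the country name, and UNION is used in place of UNION ALL so that a club qualifying on both criteria is counted once.

```sql
WITH ActiveDivers AS (
    -- Clubs with more than 25 divers currently working there
    SELECT w.club_number
    FROM tblWorks_for AS w
    WHERE w.end_working_date IS NULL
    GROUP BY w.club_number
    HAVING COUNT(DISTINCT w.diver_number) > 25
),
RecentDives AS (
    -- Clubs with more than 100 dives in the last year
    SELECT d.diving_club AS club_number
    FROM tblDiving AS d
    WHERE d.date_of_diving >= DATEADD(year, -1, GETDATE())
    GROUP BY d.diving_club
    HAVING COUNT(*) > 100
),
SuperClubs AS (
    -- UNION removes duplicates, so each club appears at most once
    SELECT club_number FROM ActiveDivers
    UNION
    SELECT club_number FROM RecentDives
)
SELECT c.country, COUNT(*) AS number_of_super_clubs
FROM SuperClubs AS s
INNER JOIN tblDivingClub AS c
    ON c.number = s.club_number
GROUP BY c.country
HAVING COUNT(*) > 2;
```

This uses only CTEs, so it stays within the "no views or temporary tables" restriction.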

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also with ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours(DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date on which these records are entered: the higher the number, the more recent the entry. It is common and expected that each employee has several records, but some employees may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each employee has a value of 1. Then just select that row as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber char(7)
)
returns table as
return (
select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.ssn, t1.buildingNumber, t1.trainingHours, t2.fName, t2.lName,
row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
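As a sketch, such an index could be declared as follows. The index name is made up, and the INCLUDE list assumes fName and lName are the only table2 columns the query reads. Adjust it if more columns are selected.

```sql
-- ssn first so each employee's rows are adjacent, then sequence descending
-- so the most recent row for an employee is the first one found.
CREATE NONCLUSTERED INDEX IX_table2_ssn_sequence
    ON table2 (ssn, [sequence] DESC)
    INCLUDE (fName, lName);
```

The existing sequence_ssn index leads with sequence, so it cannot seek on ssn alone, which may be why the plan falls back to a clustered index scan.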
Try using temp tables instead of table variables. I'm not sure what kind of system you are working on, but I have had pretty good luck with them. Unlike table variables, temp tables have column statistics, so the optimizer can estimate row counts more accurately (both kinds are stored in tempdb). Depending on other system usage, this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery into a temp table before everything else fires off, and join to it.
Just make sure to drop the table at the end. A temp table created inside the SP is dropped automatically when the SP finishes, but it is a good idea to drop it explicitly to be on the safe side.
