How to decrease logical reads when selecting from a view? - sql-server

I have a view (code below). When I apply the filter inside the view's code, I get a reasonable number of logical reads, but when I filter the view itself, logical reads increase dramatically.
I tried a subquery instead of the CTE and made many other changes to my code, but I couldn't get a proper result.
This is my view code:
create view att.view_notRule
as
with timeline
as
(
select person,location,dateTime,d_base
from att.view_pD1
union all
select person,location,dateTime_in,d_base
from att.view_rule
union all
select person,location,dateTime_out,d_base
from att.view_rule
),
timelineRanking
as
(
select person,location,row_number() over (partition by person,location,d_base order by dateTime) rank,dateTime,d_base
from timeline
)
select x.person,x.location,x.dateTime dateTime_start,y.dateTime dateTime_end,x.d_base
from timelineRanking x
inner join timelineRanking y on x.person=y.person and x.location=y.location and x.d_base=y.d_base
where x.rank+1=y.rank and x.rank%2=1
When I execute this query, I see a very large number of logical reads:
select *
from att.view_notRule
where person='B18FE132-2779-4E0D-A776-4BD27E7EEB7C'
But when I filter on person inside the view's code, I get a reasonable number of logical reads.
I need to execute this:
select *
from att.view_notRule
where person='B18FE132-2779-4E0D-A776-4BD27E7EEB7C'
while still getting a reasonable number of logical reads.

When you are querying a view that can't propagate predicates down to the base tables (sometimes due to the view design, and sometimes due to limitations in the query optimizer), a useful pattern is to replace the view with an inline table-valued function, which is essentially a parameterized view.
For example:
create or alter function att.view_notRule(@person varchar(200)) returns table
as return
with timeline
as
(
select person,location,dateTime,d_base
from att.view_pD1
where person = @person
union all
select person,location,dateTime_in,d_base
from att.view_rule
where person = @person
union all
select person,location,dateTime_out,d_base
from att.view_rule
where person = @person
),
timelineRanking
as
(
select person,location,row_number() over (partition by person,location,d_base order by dateTime) rank,dateTime,d_base
from timeline
)
select x.person,x.location,x.dateTime dateTime_start,y.dateTime dateTime_end,x.d_base
from timelineRanking x
inner join timelineRanking y on x.person=y.person and x.location=y.location and x.d_base=y.d_base
where x.rank+1=y.rank and x.rank%2=1
Then, if you need to run it across multiple people, you can do so with CROSS APPLY; for a single person you can just call the function directly.
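A hedged usage sketch of both (att.people is a hypothetical table holding person ids, not something from the question):
-- Single person: call the inline function directly with the filter value.
select *
from att.view_notRule('B18FE132-2779-4E0D-A776-4BD27E7EEB7C');
-- Multiple people: apply the function once per row of a person list.
-- att.people is a placeholder; substitute whatever table holds the ids.
select r.*
from att.people as p
cross apply att.view_notRule(p.person) as r;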

If logical reads are the concern, I would normally start by filtering the base tables (as David Browne mentioned). In your case, that means using a function instead of the view and applying the WHERE clause inside the CTE.
I would also suggest avoiding the self join and using the LEAD() function instead; the following is an example you can start with:
Declare @PersonID varchar(20) = 'YourData';
with timeline
as
(
select person, location, dateTime, d_base
from att.view_pD1
where person = @PersonID
union all
select person, location, dateTime_in, d_base
from att.view_rule
where person = @PersonID
union all
select person, location, dateTime_out, d_base
from att.view_rule
where person = @PersonID
)
select person,
location,
d_base,
dateTime as StartTime,
LEAD(dateTime) over (partition by person, location, d_base order by dateTime) EndTime
from timeline
go
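If you need the same pairing as the original view (each odd-numbered event starts an interval and the next event ends it), one possible sketch is to add a ROW_NUMBER alongside LEAD and keep only the odd-numbered rows:
Declare @PersonID varchar(200) = 'YourData';
with timeline as
(
select person, location, dateTime, d_base
from att.view_pD1
where person = @PersonID
union all
select person, location, dateTime_in, d_base
from att.view_rule
where person = @PersonID
union all
select person, location, dateTime_out, d_base
from att.view_rule
where person = @PersonID
),
paired as
(
select person, location, d_base,
       dateTime as dateTime_start,
       lead(dateTime) over (partition by person, location, d_base order by dateTime) as dateTime_end,
       row_number() over (partition by person, location, d_base order by dateTime) as rn
from timeline
)
select person, location, dateTime_start, dateTime_end, d_base
from paired
where rn % 2 = 1;   -- keep only the rows that start an interval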
Or, to get the extreme (earliest start and latest end) datetimes, you could start with the following:
Declare @PersonID varchar(20) = 'YourData';
select m.person,
m.location,
m.d_base,
s.datetime as StartTime,
e.datetime as EndTime
from att.<viewMasterData> as M
LEFT JOIN
(select person, location, datetime, d_base,
ROW_NUMBER () OVER (Partition by person, location, d_base order by datetime) as StartRN
from att.view_rule
where person = @PersonID
) as s ON M.person = s.person and m.location = s.location and m.d_base = s.d_base
LEFT JOIN
(select person, location, datetime, d_base,
ROW_NUMBER () OVER (Partition by person, location, d_base order by datetime desc) as EndRN
from att.view_rule
where person = @PersonID
) as e ON M.person = e.person and m.location = e.location and m.d_base = e.d_base
where s.StartRN = 1 and e.EndRN = 1

Related

System Versioned (Temporal) tables in a view

I have a number of joined "system versioned" tables, e.g. Person, PhoneNumber and EmailAddress
The Person will only have one PhoneNumber and one EmailAddress at a time.
The PhoneNumber and EmailAddress will not usually be updated outside of a transaction that updates all 3 at once. (But they can be updated independently, just not in the normal scenario)
E.g. if I change the phone number, then all 3 records will be issued an update in the same transaction, hence giving them all the same "start time" in the history table.
Let's say I insert a person and then change the person's name, email address and phone number in 2 transactions:
DECLARE @Id TABLE(ID INT)
DECLARE @PersonId INT
-- Initial insert
BEGIN TRANSACTION
INSERT INTO Person (Name) OUTPUT inserted.PersonId INTO @Id VALUES ('Homer')
SELECT @PersonId = Id FROM @Id
INSERT INTO EmailAddress (Address, PersonId) VALUES ('homer@fake', @PersonId)
INSERT INTO PhoneNumber (Number, PersonId) VALUES ('999', @PersonId)
COMMIT TRANSACTION
-- Update
WAITFOR DELAY '00:00:02'
BEGIN TRANSACTION
UPDATE Person SET Name = 'Kwyjibo' WHERE PersonID = @PersonId
UPDATE EmailAddress SET Address = 'kwyjibo@fake' WHERE PersonID = @PersonId
UPDATE PhoneNumber SET Number = '000' WHERE PersonID = @PersonId
COMMIT TRANSACTION
Now I select from the view (just an inner join of the tables) using a temporal query:
SELECT * FROM vwPerson FOR SYSTEM_TIME ALL
WHERE PersonId = @PersonId
ORDER BY SysStartTime DESC
And I get back a row for every combination of edits!
How can I query this view (if at all possible) to only return 1 row for the updates that were made in the same transaction?
I could add a WHERE clause to match all the SysStartTimes, however that would exclude those cases where a table was updated independently of the other 2.
Because of the independent updates, you actually first have to "reconstruct" a timeline onto which you can join the data. A sketch of this follows; obviously I don't have your actual table definitions, so it is untested:
;WITH AllTimes as (
SELECT PersonId,SysStartTime as ATime FROM Person
UNION
SELECT PersonId,SysEndTime FROM Person
UNION
SELECT PersonId,SysStartTime FROM EmailAddress
UNION
SELECT PersonId,SysEndTime FROM EmailAddress
UNION
SELECT PersonId,SysStartTime FROM PhoneNumber
UNION
SELECT PersonId,SysEndTime FROM PhoneNumber
), Ordered as (
SELECT
PersonId, ATime, ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY Atime) rn
FROM
AllTimes
), Intervals as (
SELECT
o1.PersonId,
o1.ATime as StartTime,
o2.ATime as EndTime
FROM
Ordered o1
inner join
Ordered o2
on
o1.PersonId = o2.PersonId and
o1.rn = o2.rn - 1
)
SELECT
* --TODO - Columns
FROM
Intervals i
inner join
Person p
on
i.PersonId = p.PersonId and
i.StartTime < p.SysEndTime and
p.SysStartTime < i.EndTime
inner join
EmailAddress e
on
i.PersonId = e.PersonId and
i.StartTime < e.SysEndTime and
e.SysStartTime < i.EndTime
inner join
PhoneNumber pn
on
i.PersonId = pn.PersonId and
i.StartTime < pn.SysEndTime and
pn.SysStartTime < i.EndTime
With appropriate filters, if you just want one person's details, the optimizer will hopefully work it out. There may also be additional join filters that I've missed.
Hopefully you can see how the three CTEs construct the timeline. We take advantage of UNION (rather than UNION ALL) eliminating duplicates in the first one.

ROW_NUMBER in cross apply generating "incorrect" values based on exists clause

Here is the SQL:
-- Schema
DECLARE @ModelItem TABLE (
ModelItemId UNIQUEIDENTIFIER,
MetamodelItemId UNIQUEIDENTIFIER
)
DECLARE @MetamodelItemAncestor TABLE (
MetamodelItemId UNIQUEIDENTIFIER,
ParentMetamodelItemId UNIQUEIDENTIFIER,
AncestorLevel INT
)
DECLARE @SolutionMetamodelItem TABLE (
MetamodelItemId UNIQUEIDENTIFIER,
SolutionId UNIQUEIDENTIFIER
)
INSERT INTO @ModelItem VALUES ('EC6AC6A9-684E-E611-8117-00155D026308', '2AB1F075-684E-E611-8117-00155D026308')
INSERT INTO @MetamodelItemAncestor
VALUES ('2AB1F075-684E-E611-8117-00155D026308', '2AB1F075-684E-E611-8117-00155D026308', 0),
('2AB1F075-684E-E611-8117-00155D026308', 'AA12E380-CA4D-E611-8117-00155D026308', 1)
INSERT INTO @SolutionMetamodelItem
VALUES ('2AB1F075-684E-E611-8117-00155D026308', 'f612a333-ca4d-e611-8117-00155d026308'),
('AA12E380-CA4D-E611-8117-00155D026308', 'fc160f3e-ca4d-e611-8117-00155d026308')
-- query
DECLARE @ModelItemId TABLE (EntityId UNIQUEIDENTIFIER)
DECLARE @SolutionId TABLE (EntityId UNIQUEIDENTIFIER)
INSERT INTO @ModelItemId
VALUES ('EC6AC6A9-684E-E611-8117-00155D026308')
INSERT INTO @SolutionId
VALUES ('f612a333-ca4d-e611-8117-00155d026308'), ('fc160f3e-ca4d-e611-8117-00155d026308')
SELECT mia.*
FROM (
SELECT M.EntityId AS ModelItemId, S.EntityId AS SolutionId
FROM @ModelItemId AS M
CROSS JOIN @SolutionId AS S
) AS m
CROSS APPLY (
SELECT
MI.ModelItemId,
OTA.ParentMetamodelItemId AS [MetamodelItemId],
ROW_NUMBER() OVER (PARTITION BY [MI].[ModelItemId] ORDER BY [OTA].[AncestorLevel] ASC) AS [AspectRank]
FROM @ModelItem AS MI
INNER JOIN @MetamodelItemAncestor AS OTA
ON MI.MetamodelItemId = OTA.MetamodelItemId
WHERE
MI.ModelItemId = m.ModelItemId
AND EXISTS (
SELECT 1
FROM @SolutionMetamodelItem AS MSMI
WHERE MSMI.MetamodelItemId = OTA.ParentMetamodelItemId
AND MSMI.SolutionId = m.SolutionId
)
) mia
SELECT mia.*
FROM @ModelItemId AS m
CROSS APPLY (
SELECT
MI.ModelItemId,
OTA.ParentMetamodelItemId AS [MetamodelItemId],
ROW_NUMBER() OVER (PARTITION BY [MI].[ModelItemId] ORDER BY [OTA].[AncestorLevel] ASC) AS [AspectRank]
FROM @ModelItem AS MI
INNER JOIN @MetamodelItemAncestor AS OTA
ON MI.MetamodelItemId = OTA.MetamodelItemId
WHERE
MI.ModelItemId = m.EntityId
AND EXISTS (
SELECT 1
FROM @SolutionMetamodelItem MSMI
WHERE MSMI.MetamodelItemId = OTA.ParentMetamodelItemId
AND MSMI.SolutionId IN (SELECT s.EntityId FROM @SolutionId AS s)
)
) mia
Notice the AspectRank. In the second query it correctly increases sequentially within the partition.
Looking at the execution plan for the first query, it seems like the ROW_NUMBER (Sequence Project) is running concurrently with the scan of the @SolutionId table, but I am still not fully sure why the row number does not increase, since there are duplicate items.
Could someone explain this? I need to use the first approach because the CROSS APPLY query is in fact a UDF that takes ModelItemId and SolutionId as parameters.
I would assume the CROSS APPLY is executed separately for each row of your outer query, so each invocation sees and ranks only its own rows, and every row returned is the first (and only) row of its partition.
Why do you need the row number inside the CROSS APPLY, instead of in the outer query, if that is actually where your data is?
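A minimal sketch of that suggestion, assuming the inner query stays a UDF but can also return the AncestorLevel (or another ordering column); the function name dbo.GetModelItemAspects is hypothetical:
SELECT m.ModelItemId,
       m.SolutionId,
       mia.MetamodelItemId,
       ROW_NUMBER() OVER (PARTITION BY mia.ModelItemId ORDER BY mia.AncestorLevel) AS AspectRank
FROM (
    SELECT M.EntityId AS ModelItemId, S.EntityId AS SolutionId
    FROM @ModelItemId AS M
    CROSS JOIN @SolutionId AS S
) AS m
CROSS APPLY dbo.GetModelItemAspects(m.ModelItemId, m.SolutionId) AS mia   -- hypothetical UDF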

How to use 'Merge' to combine rows into a single one

Scenario:
I have a simplified version of a result set obtained from a series of complex joins. I have placed the result set in a temporary table. The result set consists of records of activity/activities in a day.
I need to merge rows that share the same date (combining a day's activities into a single row); the desired result set was shown as an image.
I am trying to make this work:
Merge #temp as target
using #temp as source
on (target.Date = source.Date) and target.Writing is NULL
when matched then
update set target.Writing = source.Writing;
but I'm running into this error:
The MERGE statement attempted to UPDATE or DELETE the same row more
than once. This happens when a target row matches more than one source
row. A MERGE statement cannot UPDATE/DELETE the same row of the target
table multiple times. Refine the ON clause to ensure a target row
matches at most one source row, or use the GROUP BY clause to group
the source rows.
What code modifications can you suggest?
This should do it:
SELECT dfl.mydate, dfl.firststart, dfl.lastend, fa.ActivityA, sa.ActivityB
FROM
(select s.mydate, firststart, lastend FROM
(SELECT mydate, MIN(starttime) as firststart from target GROUP by mydate) s
INNER JOIN
(SELECT mydate, MAX(EndTime) as lastend from target GROUP by mydate) e
ON s.mydate = e.mydate) AS dfl
INNER JOIN
target fa on dfl.mydate = fa.mydate and dfl.firststart = fa.starttime
INNER JOIN
target sa on dfl.mydate = sa.mydate and dfl.lastend = sa.EndTime
Please note that for my test I called my table target, with columns mydate, starttime, endtime, activitya and activityb.
No need to merge; a (relatively) simple SELECT yields the results you want.
HTH
PS: It helps, when working with time data, to use a 24-hour clock. I have assumed that by 5:00 you really meant 17:00.
You don't need a MERGE statement.
DECLARE @Test TABLE ([Id] int, [Date] nvarchar(10), [TimeIn] nvarchar(10), [TimeOut] nvarchar(10), [Reading] nvarchar(10), [Writing] nvarchar(10))
INSERT INTO @Test
VALUES
(1,'08-01','8:00','5:00','Y',NULL),
(2,'08-02','8:00','5:00',NULL,'Y'),
(3,'08-02','5:00','12:00',NULL,'Y'),
(4,'08-03','8:00','5:00',NULL,'Y'),
(5,'08-04','1:00','5:00','Y',NULL),
(6,'08-04','5:00','7:00',NULL,'Y'),
(7,'08-04','7:00','10:00',NULL,'Y'),
(8,'08-04','10:00','13:00',NULL,'Y'),
(9,'08-05','8:00','5:00','Y',NULL)
;WITH CTE AS
(
SELECT
t1.[Date],
t1.TimeIn,
ISNULL(t2.TimeOut, t1.TimeOut) AS TimeOut,
ROW_NUMBER() OVER (PARTITION BY t1.[Date] ORDER BY t1.Id) AS RowNumber
FROM @Test AS t1
LEFT OUTER JOIN @Test AS t2 ON t1.TimeOut = t2.TimeIn AND t1.[Date] = t2.[Date]
)
SELECT
c.[Date],
(SELECT c2.TimeIn FROM CTE AS c2 WHERE c2.[Date] = c.[Date] AND c2.RowNumber = MIN(c.RowNumber)) AS TimeIn,
(SELECT c2.TimeOut FROM CTE AS c2 WHERE c2.[Date] = c.[Date] AND c2.RowNumber = MAX(c.RowNumber)) AS TimeOut
FROM CTE AS c
GROUP BY c.[Date]
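The query above returns only the date and the merged times. If you also need the Reading/Writing flags in the merged row, one hedged addition (assuming at most one flag value per date, as in the sample data) is to aggregate them per date and join the result back on [Date]; MAX ignores NULLs, so it returns 'Y' whenever any row of that day carries the flag:
-- Hedged sketch: collapse the per-day flags; join this to the time result on [Date].
SELECT [Date],
       MAX([Reading]) AS Reading,   -- 'Y' if any row of the day has Reading = 'Y'
       MAX([Writing]) AS Writing    -- 'Y' if any row of the day has Writing = 'Y'
FROM @Test
GROUP BY [Date];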
You can use MERGE when the source and target rows can be matched on one or more columns that uniquely identify the target row to be merged; refine the ON clause (or pre-aggregate the source) so that each target row matches at most one source row.

SQL Query based on occurrence of records

After a long time, I am getting a chance to post a SQL Server question here.
I have a table variable, shown below, in SQL Server 2005. This table is populated by a stored procedure written by another team.
This is an order processing system. Each order can be accomplished by multiple processes in various departments, based on the OPRouteCode.
Taking OrderNo = 2 as an example: it has two OPRouteCodes, but both use the same processes in the same departments, so they are considered equivalent OPRouteCodes.
On the other hand, for OrderNo = 1 the processes and departments vary, so its OPRouteCodes are not equivalent.
What is the best way to select only the orders that have non-equivalent OPRouteCodes?
Note: if there is only one OPRouteCode, it is considered equivalent; non-equivalence can only arise when there is more than one OPRouteCode.
What is the best SQL Server query to get this result? I couldn't write anything that worked after hours of effort.
DECLARE @OrderProcess TABLE (OrderNo Int,
OPRouteCode VARCHAR(5),
Department VARCHAR(10),
Process VARCHAR(20) )
--Order = 1 OPRouteCode = '0023'
INSERT INTO @OrderProcess
SELECT 1,'0023' ,'103','Receive'
UNION ALL
SELECT 1,'0023' ,'104','Produce'
UNION ALL
SELECT 1,'0023' ,'104','Pack'
UNION ALL
SELECT 1,'0023' ,'105','Ship'
--Order = 1 OPRouteCode = '0077'
INSERT INTO @OrderProcess
SELECT 1,'0077' ,'103','Receive'
UNION ALL
SELECT 1,'0077' ,'104','Produce'
UNION ALL
SELECT 1,'0077' ,'105','Ship'
--Order = 2 OPRouteCode = '0044'
INSERT INTO @OrderProcess
SELECT 2,'0044' ,'105','Receive'
UNION ALL
SELECT 2,'0044' ,'106','Ship'
--Order = 2 OPRouteCode = '0055'
INSERT INTO @OrderProcess
SELECT 2,'0055' ,'105','Receive'
UNION ALL
SELECT 2,'0055' ,'106','Ship'
(The table variable contents and the expected output were shown as images.)
Alright I got it this time. Sorry for the wrong answer before.
Select OrderNo,OPRouteCode
From (
select OrderNo,OPRouteCode, RANK() OVER(PARTITION BY OrderNo,Department ORDER BY Process ) 'Rnk'
from @OrderProcess
) a
Where Rnk =2
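A different, hedged approach is to build a signature of each route's department/process pairs and keep the orders whose route codes disagree. The sketch below uses STRING_AGG, which needs SQL Server 2017 or later; on 2005 the same signature can be built with FOR XML PATH:
WITH RouteSignature AS
(
    SELECT OrderNo,
           OPRouteCode,
           STRING_AGG(Department + ':' + Process, '|')
               WITHIN GROUP (ORDER BY Department + ':' + Process) AS Signature
    FROM @OrderProcess
    GROUP BY OrderNo, OPRouteCode
)
SELECT OrderNo
FROM RouteSignature
GROUP BY OrderNo
HAVING COUNT(DISTINCT Signature) > 1;   -- at least two route codes with different steps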

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation: I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also keyed by ssn, but including name and other personal data. The second table contains multiple records for each employee, captured at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the returned columns.
I have this in place, and it works fine; it is just very slow.
A very simplified version of the tables:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table2 is a number corresponding to a predetermined date on which these records are entered; the higher the number, the more recent the entry. It is common/expected that each employee has several records, but several may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fName VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT ssn, buildingNum, fName, lName, rowNum
FROM (SELECT ..., ROW_NUMBER() OVER (PARTITION BY buildingNumber ORDER BY CASE @SortField ... {sortField column} ... END) AS rowNum
FROM table1 a
OUTER APPLY (SELECT TOP 1 fName, lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
WHERE buildingNumber = @BuildingNumber) AS x
SELECT * FROM @returnValue ORDER BY rowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plan: it shows a clustered index scan on table2 (18%), then a compute scalar (0%), an eager spool (59%), a filter (0%), and a top N sort (14%).
That's 78% of the execution, so I know the cost is in the section that gets the names; I'm just not sure of a better (faster) way to do it.
The reason I'm asking is that table1 needs to be updated with current data. This is done through a webpage with a RadGrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record gets a value of 1. Then just select that as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber char(7)
)
returns table as
return (
select t1.ssn, t1.buildingNum, t2.fname, t2.lname, rowNum
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
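A hedged sketch of that index; the name, the descending key order, and the INCLUDE columns are illustrative additions rather than part of the answer:
CREATE NONCLUSTERED INDEX IX_table2_ssn_sequence
    ON table2 (ssn, sequence DESC)
    INCLUDE (fName, lName);   -- covers the name lookup so the full scan of table2 can be avoided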
Try using temp tables instead of the table variables. Not sure what kind of system you are working on, but I have had pretty good luck with them. Temp tables are backed by tempdb on disk, so you won't be holding and processing as much in memory; depending on other system usage, this might do the trick.
Simply define the temp table using #TableName instead of @TableName. Put the name-sorting subquery in a temp table before everything else fires off and join to it.
Just make sure to drop the table at the end. It will be dropped automatically when the stored procedure's session ends, but it is a good idea to drop it explicitly to be on the safe side.
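A minimal sketch of that staging step, using the table and column names from the question (the temp table name #LatestName is just illustrative):
-- Stage the most recent name per ssn once, then join to it.
SELECT ssn, fName, lName
INTO #LatestName
FROM (
    SELECT ssn, fName, lName,
           ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS rn
    FROM table2
) AS ranked
WHERE rn = 1;

SELECT t1.ssn, t1.buildingNumber, n.fName, n.lName
FROM table1 AS t1
JOIN #LatestName AS n ON n.ssn = t1.ssn
WHERE t1.buildingNumber = @BuildingNumber;

DROP TABLE #LatestName;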
