TSQL optimisation - sql-server

I have the below query which is taking 2 seconds to execute as there is a significant number of rows (1 million + each) in the two tables and was wondering if there is anything further I can do to optimise the query.
Tables
tblInspection.ID bigint (Primary Key)
tblInspection.IsPassedFirstTime bit (Non clustered index)
tblInspectionFailures.ID bigint (Primary Key)
tblInspectionFailures.InspectionID bigint (Non clustered index)
Query
SELECT TOP 1 tblInspection.ID FROM tblInspection
INNER JOIN tblInspectionFailures ON tblInspection.ID = tblInspectionFailures.InspectionID
WHERE (tblInspection.IsPassedFirstTime = 1)
Execution Plan
I can see that I am doing clustered seeks on the indexes, but it's still taking some time.

The only thing I can think of is:
SELECT i.ID FROM
(select TOP 1 id from tblInspection
WHERE IsPassedFirstTime = 1) i
INNER JOIN tblInspectionFailures ON
i.ID = tblInspectionFailures.InspectionID
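Note that the derived table above picks one passed inspection before the join, so it can return no row even when some other passed inspection does have failures. A minimal sketch of an alternative that keeps the semantics of the original query, assuming only the table and column names given in the question, is to test for a failure row with EXISTS:

```sql
-- Sketch only: returns one inspection that passed first time
-- and has at least one row in tblInspectionFailures.
SELECT TOP (1) i.ID
FROM tblInspection AS i
WHERE i.IsPassedFirstTime = 1
  AND EXISTS (
      SELECT 1
      FROM tblInspectionFailures AS f
      WHERE f.InspectionID = i.ID  -- can seek on the nonclustered index on InspectionID
  );
```

Whether this is faster than the join depends on the data distribution, so it is worth comparing the two execution plans.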

Try:
SET ROWCOUNT 1
SELECT tblInspection.ID FROM tblInspection
INNER JOIN tblInspectionFailures ON tblInspection.ID = tblInspectionFailures.InspectionID
WHERE (tblInspection.IsPassedFirstTime = 1)
SET ROWCOUNT 0 -- reset, otherwise the limit stays in effect for later statements in the session
This does basically the same thing as TOP 1, but tells SQL Server to stop returning rows after the first one.

Related

LEFT JOIN With Redundant Predicate Performs Better Than a CROSS JOIN?

I'm looking at the execution plans for two of these statements and am kind of stumped on why the LEFT JOIN statement performs better than the CROSS JOIN statement:
Table Definitions:
CREATE TABLE [Employee] (
[ID] int NOT NULL IDENTITY(1,1),
[FirstName] varchar(40) NOT NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([ID] ASC)
);
CREATE TABLE [dbo].[Numbers] (
[N] INT IDENTITY (1, 1) NOT NULL,
CONSTRAINT [PK_Numbers] PRIMARY KEY CLUSTERED ([N] ASC)
); --The Numbers table contains numbers 0 to 100,000.
Queries in Question where I join one 'day' to each Employee:
DECLARE @PeriodStart AS date = '2019-11-05';
DECLARE @PeriodEnd AS date = '2019-11-05';
SELECT E.FirstName, CD.ClockDate
FROM Employee E
CROSS JOIN (SELECT DATEADD(day, N.N, @PeriodStart) AS ClockDate
FROM Numbers N
WHERE N.N <= DATEDIFF(day, @PeriodStart, @PeriodEnd)
) CD
WHERE E.ID > 2000;
SELECT E.FirstName, CD.ClockDate
FROM Employee E
LEFT JOIN (SELECT DATEADD(day, N.N, @PeriodStart) AS ClockDate
FROM Numbers N
WHERE N.N <= DATEDIFF(day, @PeriodStart, @PeriodEnd)
) CD ON CD.ClockDate = CD.ClockDate
WHERE E.ID > 2000;
The Execution Plans:
https://www.brentozar.com/pastetheplan/?id=B139JjPKK
As you can see, according to the optimizer the second (left join) query with the seemingly redundant predicate seems to cost way less than the first (cross join) query. This is also the case when the period dates span multiple days.
What's weird is that if I change the LEFT JOIN's predicate to something different like 1 = 1, it performs like the CROSS JOIN. I also tried changing the SELECT portion of the LEFT JOIN to SELECT N and joining on CD.N = CD.N, but that also seems to perform poorly.
According to the execution plan, the second query has an index seek that only reads 3000 rows from the Numbers table while the first query is reading 10 times as many. The second query's index seek also has this predicate (which I assume comes from the LEFT JOIN):
dateadd(day,[Numbers].[N] as [N].[N],[@PeriodStart])=dateadd(day,[Numbers].[N] as [N].[N],[@PeriodStart])
I would like to understand why the second query seems to perform so much better even though I wouldn't expect it to. Does it have something to do with the fact that I'm joining on the results of the DATEADD function? Is SQL Server evaluating the results of DATEADD before joining?
The reason these queries get different estimates, even though the plans are almost the same and will probably take the same time, appears to be that DATEADD(day, N.N, @PeriodStart) is nullable, so CD.ClockDate = CD.ClockDate essentially just verifies that the result is not null. The optimizer cannot see that it will always be non-null, so it lowers the row estimate because of it.
But it seems to me that the primary performance problem in your query is that you are selecting the whole of your numbers table every time. Instead, you should select only the number of rows you need:
SELECT E.FirstName, CD.ClockDate
FROM Employee E
CROSS JOIN (
SELECT TOP (DATEDIFF(day, @PeriodStart, @PeriodEnd) + 1)
DATEADD(day, N.N, @PeriodStart) AS ClockDate
FROM Numbers N
ORDER BY N.N
) CD
WHERE E.ID > 2000;
Using this technique, you can even use CROSS APPLY (SELECT TOP (outerValue) ...) if you want to correlate the number of rows to the rest of the query.
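As a rough sketch of the correlated variant, assume a hypothetical column Employee.DaysToGenerate (int, NOT NULL, non-negative) that is not part of the schema in the question. Each employee then gets its own number of dates:

```sql
DECLARE @PeriodStart AS date = '2019-11-05';

-- Employee.DaysToGenerate is a hypothetical column used only for illustration.
SELECT E.FirstName, CD.ClockDate
FROM Employee AS E
CROSS APPLY (
    SELECT TOP (E.DaysToGenerate)             -- row count taken from the outer row
           DATEADD(day, N.N, @PeriodStart) AS ClockDate
    FROM Numbers AS N
    ORDER BY N.N                              -- lowest numbers first, read in clustered index order
) AS CD
WHERE E.ID > 2000;
```

Because CROSS APPLY behaves like an inner join, employees whose DaysToGenerate is 0 drop out of the result; OUTER APPLY would keep them with a NULL ClockDate.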
For further tips on numbers tables, see Itzik Ben-Gan's excellent series

Optimizing SQL Function

I'm trying to optimize or completely rewrite this query. It currently takes about 1500ms to run. I know the DISTINCTs are fairly inefficient, as is the UNION, but I'm struggling to figure out exactly where to go from here.
I am thinking that the first select statement might not be needed to return the output of:
[Key | User_ID,(User_ID)]
Note: Program and Program_Scenario both use clustered indexes. I can provide a screenshot of the execution plan if needed.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID] (@_CompKey INT)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @UserIDs AS VARCHAR(8000);
SET @UserIDs = '';
SELECT @UserIDs = @UserIDs + ', ' + x.User_ID
FROM
(SELECT DISTINCT (UPPER(p.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
WHERE p.CompKey = @_CompKey
UNION
SELECT DISTINCT (UPPER(ps.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
LEFT OUTER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL) x
RETURN Substring(@UserIDs, 3, 8000);
END
There are two things happening in this query
1. Locating rows in the [Program] table matching the specified CompKey (@_CompKey)
2. Locating rows in the [Program_Scenario] table that have the same ProgKey as the rows located in (1) above.
Finally, non-null UserIDs from both these sets of rows are concatenated into a scalar.
For step 1 to be efficient, you'd need an index on the CompKey column (clustered or non-clustered)
For step 2 to be efficient, you'd need an index on the join key, which is ProgKey on the Program_Scenario table (this is likely a non-clustered index, as I can't imagine ProgKey being the PK there). SQL Server would likely resort to a loop join strategy, i.e., for each row found in [Program] matching the CompKey criteria, it would need to look up the corresponding rows in [Program_Scenario] with the same ProgKey. This is a guess though, as there is not sufficient information on the cardinality and distribution of the data.
Ensure the above two indexes are present.
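A minimal sketch of the two indexes described above. The index names are hypothetical, and the INCLUDE lists are an assumption added so that the query can be answered from the indexes alone:

```sql
-- Step 1: find Program rows by CompKey.
-- Hypothetical index name; INCLUDE columns are an assumption, not from the question.
CREATE NONCLUSTERED INDEX IX_Program_CompKey
ON dbo.Program (CompKey)
INCLUDE (ProgKey, User_ID);

-- Step 2: find Program_Scenario rows by the join key ProgKey.
CREATE NONCLUSTERED INDEX IX_Program_Scenario_ProgKey
ON dbo.Program_Scenario (ProgKey)
INCLUDE (User_ID);
```

If either table already has a clustered index leading on these columns, the corresponding nonclustered index is probably unnecessary.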
Also, as others have noted, the left outer join in the second SELECT is a bit confusing: because of the ps.User_ID IS NOT NULL filter, an inner join is the right way to express it.
By my interpretation, the inner part of the query can be rewritten this way. This is also the query you'd ideally run and optimize before tackling the string concatenation part. The DISTINCT is dropped because a UNION removes duplicates automatically. Try this version of the query along with the indexes above, and if it provides the necessary boost, then add the string concatenation or the STUFF ... FOR XML PATH approach to return a scalar.
SELECT UPPER(p.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
WHERE
p.CompKey = @_CompKey
UNION
SELECT UPPER(ps.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
INNER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE
p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
I am taking a shot in the dark here. I am guessing that the last code you posted is still a scalar function. It also did not have all the logic of your original query. Again, this is a shot in the dark since there are no table definitions or sample data posted.
This might be how this would look as an inline table valued function.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID]
(
@_CompKey INT
) RETURNS TABLE AS RETURN
select MyResult = STUFF(
(
-- FOR XML cannot be applied directly to a UNION, so wrap it in a derived table
SELECT ', ' + x.User_ID
FROM (
SELECT UPPER(p.User_ID) as User_ID
FROM dbo.Program AS p
WHERE p.CompKey = @_CompKey
UNION -- UNION already removes duplicates
SELECT UPPER(ps.User_ID) as User_ID
FROM dbo.Program AS p
LEFT OUTER JOIN dbo.Program_Scenario AS ps ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
) AS x
for xml path ('')
), 1, 2, '') -- strip the leading ', '

Can not get rid of Key Lookup in the explain plan

I am trying to get rid of the Key Lookup operation in the explain plan of the following query:
SELECT s.CompanyId ,
t.PeriodEndDate ,
t.DurationId ,
s.conceptid AS SConceptId ,
c.ConceptId AS CConceptId,
t.NumOfPeriods ,
cast(cast(s.Value as numeric) as varchar(100)) as Value,
s.ConceptId * 17.0 AS ConceptOrdering ,
t.CompoundSortKeyLogicalKey,
1980 + (s.NumberOfQuarters / 4) AS FiscalYear,
(s.NumberOfQuarters % 4) + 1 AS FiscalQuarter,
cam.Alias
FROM [dbo].[TmpCompanyOrderedAndFilteredPKs] t
INNER JOIN [dbo].[synt_ScreenerDb_dbo_ScreenerHistoricalYTD_Number_t] s ON s.CompanyId = t.CompanyId
AND s.numberofquarters = t.numberofquarters AND ( ( t.numberOfQuarters % 4 ) + 1 ) = 4
INNER JOIN [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982] c ON c.ConceptMapId = s.ConceptId
LEFT JOIN dbo.ConceptAliasMapping cam ON cam.ConceptId = c.ConceptId
WHERE t.OperationGUID = '7FD96D75-FCDB-44B0-9DED-6FE0BC128982'
The screenshot of the explain plan:
I've tried to create indexes on following columns:
Value, ConceptId, CompanyId, NumberOfQuarters
with different combinations of key and INCLUDE columns. What did I miss?
There are many performance issues in your query. Follow the steps below to avoid Key Lookups.
Include all the columns used in the SELECT statement in the nonclustered indexes:
create nonclustered index ncli_1 on TmpCompanyOrderedAndFilteredPKs(CompanyId)
include(PeriodEndDate, DurationId, NumOfPeriods, CompoundSortKeyLogicalKey, numberofquarters)
create nonclustered index ncli_2 on synt_ScreenerDb_dbo_ScreenerHistoricalYTD_Number_t(CompanyId)
include(conceptid, Value, NumberOfQuarters)
create nonclustered index ncli_3 on [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982](ConceptId)
create unique clustered index cli_4 on [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982](ConceptMapId)
The clustered index should let SQL Server use a merge join instead of a hash join, which may provide a performance gain.
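A Key Lookup appears when the nonclustered index chosen by the optimizer does not contain every column the query needs from that table. As a hedged variation on the indexes above, the key columns can follow the join and filter predicates of the query, with the remaining selected columns in INCLUDE. The index names are hypothetical, and the sketch assumes both objects are ordinary tables in the current database:

```sql
-- Keyed on the WHERE filter first, then on the join columns.
CREATE NONCLUSTERED INDEX IX_Tmp_OperationGUID
ON dbo.TmpCompanyOrderedAndFilteredPKs (OperationGUID, CompanyId, NumberOfQuarters)
INCLUDE (PeriodEndDate, DurationId, NumOfPeriods, CompoundSortKeyLogicalKey);

-- Keyed on both join columns; ConceptId and Value are the other columns the query reads.
CREATE NONCLUSTERED INDEX IX_Screener_Company_Quarters
ON dbo.synt_ScreenerDb_dbo_ScreenerHistoricalYTD_Number_t (CompanyId, NumberOfQuarters)
INCLUDE (ConceptId, Value);
```

Checking the execution plan again after creating them shows whether the lookup operator has disappeared.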

Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall
CREATE TABLE contacts
(
id int PRIMARY KEY,
phone1 varchar(20) NULL,
phone2 varchar(20) NULL,
phone3 varchar(20) NULL,
phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
list_id int NOT NULL,
phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
list_id ASC,
phone ASC
);
I would like to see which contacts match a phone number in a specific DoNotCall list.
For faster lookup, I have indexed donotcall on list_id and phone.
When I make the following JOIN it takes a long time (e.g. 9 seconds):
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
Execution plan on Pastebin
If I instead LEFT JOIN on each phone field separately, it runs a lot faster (e.g. 1.5 seconds):
SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
ON d1.list_id = 1
AND d1.phone = c.phone1
LEFT JOIN donotcall d2
ON d2.list_id = 1
AND d2.phone = c.phone2
LEFT JOIN donotcall d3
ON d3.list_id = 1
AND d3.phone = c.phone3
LEFT JOIN donotcall d4
ON d4.list_id = 1
AND d4.phone = c.phone4
WHERE
d1.phone IS NOT NULL
OR d2.phone IS NOT NULL
OR d3.phone IS NOT NULL
OR d4.phone IS NOT NULL
Execution plan on Pastebin
My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall.
So, how to do a join towards multiple columns and still have it use the index?
SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.
You can test if the index would be faster with a hint:
SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
From the query plans you posted, it shows the first plan is estimated to produce 40k rows, but it just returns 21 rows. The second plan estimates 1 row (and of course returns 21 too.)
Are your statistics up to date? Out-of-date statistics can explain the query optimizer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:
select object_name(ind.object_id) as TableName
, ind.name as IndexName
, stats_date(ind.object_id, ind.index_id) as StatisticsDate
from sys.indexes ind
order by
stats_date(ind.object_id, ind.index_id) desc
You can update them manually with:
EXEC sp_updatestats;
With this poor database structure, a UNION ALL query might be fastest.
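A minimal sketch of that idea against the tables from the question: one branch per phone column, so each branch is a simple equality join that can seek on IX_donotcall_list_phone. The sketch uses UNION rather than UNION ALL, which is a substitution on my part, because a contact matching on two phone columns would otherwise be returned twice:

```sql
SELECT c.id
FROM contacts AS c
JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone1
UNION
SELECT c.id
FROM contacts AS c
JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone2
UNION
SELECT c.id
FROM contacts AS c
JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone3
UNION
SELECT c.id
FROM contacts AS c
JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone4;
```

If duplicates are acceptable or impossible in your data, UNION ALL avoids the extra de-duplication step.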

Reads are not getting low after putting a Index

The requirement is to load 50 records per page, with all 65 columns of table "empl", with minimum IO. There are 280000+ records in the table. There is only one clustered index, on the PK.
The pagination query is as follows:
WITH result_set AS (
SELECT
ROW_NUMBER() OVER (ORDER BY e.[uon] DESC ) AS [row_number], e.*
FROM
empl e with (NOLOCK)
LEFT JOIN empl_add ea with (NOLOCK)
ON ea.ptid = e.ptid
WHERE
e.del = 0 AND e.pub = 1 AND e.sid = 2
AND e.md = 0
AND e.tid = 3
AND e.coid = 2
AND (e.cid = 102)
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1))
SELECT
*
FROM
result_set
WHERE
[row_number] BETWEEN 0 AND 50
The following are the stats from Profiler after running the above query:
CPU: 1500, Reads: 25576, Duration: 25704
Then I created the following index on the table empl:
CREATE NONCLUSTERED INDEX [ci_empl]
ON [dbo].[empl] ([del],[md],[pub],[tid],[coid],[sid],[ptid],[cid],[uon])
GO
After creating the index, CPU and Reads are still high. I don't know whether something is wrong with the index or with the query.
Edit:
The following query also shows high reads after creating the index, and it returns only 3 columns and 1 count.
SELECT TOP (2147483647)
ame.aid ID, ame.name name,
COUNT(empl.pid) [Count], ps.uff uff FROM ame with (NOLOCK)
JOIN pam AS pa WITH (NOLOCK) ON pa.aid = ame.aid
JOIN empl WITH (NOLOCK) ON empl.pid = pa.pid
LEFT JOIN psam AS ps
ON ps.psid = 1001
AND ps.aid = ame.aid
LEFT JOIN empl_add ea with (NOLOCK)
ON ea.ptid = empl.ptid
WHERE
empl.del = 0 AND empl.pub = 1 AND empl.sid = 2
AND empl.md = 0
AND (empl.tid = 3)
AND (empl.coid = 2)
AND (empl.cid = 102)
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1)
AND ame.pub = 1 AND ame.del = 0
GROUP BY ame.aid, ame.name, ps.uff
ORDER BY ame.name ASC
Second Edit:
I have now created the following index on the "uon" column:
CREATE NONCLUSTERED INDEX [ci_empl_uon]
ON [dbo].[empl] (uon)
GO
But CPU and Reads are still high.
Third Edit:
DTA suggests an index with all columns included for the first query, so I altered the suggested index, converting it to a filtered index on the basic four filters to make it more effective.
I added the line below after INCLUDE while creating the index.
Where e.del = 0 AND e.pub = 1 AND e.sid = 2 AND e.md = 0 AND e.coid = 2
But the reads are still high on both the development and production machines.
Fourth Edit:
I have now come to a solution that improves the performance, but still not up to the goal. The key is that it no longer goes for ALL THE DATA.
The query is as follows:
WITH result_set AS (
SELECT
ROW_NUMBER() OVER (ORDER BY e.[uon] DESC ) AS [row_number], e.pID pID
FROM
empl e with (NOLOCK)
LEFT JOIN empl_add ea with (NOLOCK)
ON ea.ptid = e.ptid
WHERE
e.del = 0 AND e.pub = 1 AND e.sid = 2
AND e.md = 0
AND e.tid = 3
AND e.coid = 2
AND (e.cid = 102)
AND ea.ptgid IN (SELECT ptgid FROM empl_dep where psid = 1001
AND ib = 1))
SELECT
*
FROM
result_set join empl on result_set.pID = empl.pID
WHERE
[row_number] BETWEEN @start AND @end
I also recreated the index with altered key columns, an INCLUDE and a filter:
CREATE NONCLUSTERED INDEX [ci_empl]
ON [dbo].[empl] ([ptid],[cid],[tid],[uon])
INCLUDE ([pID])
Where
[coID] = 2 and
[sID] = 2 and
[pub] = 1 and
[del] = 0 and
[md] = 0
GO
It improves the performance, but not up to the goal.
You are selecting the top 50 rows ordered by e.uon desc. An index that starts with uon will speed up the query:
create index IX_Empl_Uon on dbo.empl (uon)
The index will allow SQL Server to scan the top N rows of this index. N is the highest number in your pagination: for the 3rd page of 50 elements, N equals 150. SQL Server then does 50 key lookups to retrieve the full rows from the clustered index. As far as I know, this is a textbook example of where an index can make a big difference.
Not all query optimizers will be smart enough to notice that row_number() over ... as rn with where rn between 1 and 50 means the top 50 rows. But SQL Server 2012 does. It uses the index both for the first and for subsequent pages, like row_number() between 51 and 100.
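To make the page arithmetic concrete, here is a small sketch. The variable names @page and @pageSize are assumptions, the filters from the question are left out for brevity, and dbo.empl with columns uon and pID is taken from the question:

```sql
DECLARE @page int = 3, @pageSize int = 50;
DECLARE @start int = (@page - 1) * @pageSize + 1;  -- 101 for page 3
DECLARE @end   int = @page * @pageSize;            -- 150 for page 3, the N rows read from the index

WITH result_set AS (
    SELECT ROW_NUMBER() OVER (ORDER BY e.uon DESC) AS rn, e.pID
    FROM dbo.empl AS e
)
SELECT r.rn, r.pID
FROM result_set AS r
WHERE r.rn BETWEEN @start AND @end
ORDER BY r.rn;
```

With an index leading on uon, the plan only needs to read the first @end index rows rather than sort the whole table.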
You are trying to find the X through X+Nth row from a dataset, based on an order specified by column uon.
I’m assuming here that uon is the mentioned primary key. If not, without an index where uon is the first (if not only) column, a table scan is inevitable.
Next wrinkle: You don’t want that direct span of rows, you want that span of rows as filtered by a fairly extensive assortment of filters. The clustered index might pull the first 50 rows, but the WHERE may filter none, some, or all of those out. More rows will almost certainly have to be read in order to "fill your span".
More fun: you perform a left outer join on table empl_add (i.e. retaining the empl row even if no empl_add row is found), and then filter out all rows where empl_add.ptgid is not found in the subquery. You might as well make this an inner join; it may speed things up and certainly will not make them slower. It is also a "filtering factor" that cannot be addressed with an index on table empl.
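The inner join version of the key-only query might look like the sketch below. Table and column names are taken from the question. The results are the same as with the left join, because the filter on ea.ptgid already discards rows that have no empl_add match:

```sql
SELECT e.pID
FROM empl AS e
INNER JOIN empl_add AS ea
    ON ea.ptid = e.ptid
WHERE e.del = 0 AND e.pub = 1 AND e.sid = 2
  AND e.md = 0
  AND e.tid = 3
  AND e.coid = 2
  AND e.cid = 102
  AND ea.ptgid IN (SELECT d.ptgid
                   FROM empl_dep AS d
                   WHERE d.psid = 1001
                     AND d.ib = 1);
```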
So: as I see it (i.e. I’m not testing it all out locally), SQL has to first assemble the data, filter out the invalid rows (which involves table joins), order what remains, and return that span of rows you are interested in. I believe that, with or without the index on uon, SQL is identifying a need to read all the data and filter/sort before it can pick out the desired range.
(Your new index would appear to be insufficient. The sixth column is sid, but sid is not referenced in the query, so it might only be able to help “so far”. This raises lots of questions about data cardinality and whatnot, at which point I defer to @Aarons’ point that we have insufficient information on the overall problem set for a full analysis.)

Resources