Can not get rid of Key Lookup in the explain plan - sql-server

I am trying to get rid of the Key Lookup operation in the explain plan of the following query:
SELECT s.CompanyId ,
t.PeriodEndDate ,
t.DurationId ,
s.conceptid AS SConceptId ,
c.ConceptId AS CConceptId,
t.NumOfPeriods ,
cast(cast(s.Value as numeric) as varchar(100)) as Value,
s.ConceptId * 17.0 AS ConceptOrdering ,
t.CompoundSortKeyLogicalKey,
1980 + (s.NumberOfQuarters / 4) AS FiscalYear,
(s.NumberOfQuarters % 4) + 1 AS FiscalQuarter,
cam.Alias
FROM [dbo].[TmpCompanyOrderedAndFilteredPKs] t
INNER JOIN [dbo].[synt_ScreenerDb_dbo_ScreenerHistoricalYTD_Number_t] s ON s.CompanyId = t.CompanyId
AND s.numberofquarters = t.numberofquarters AND ( ( t.numberOfQuarters % 4 ) + 1 ) = 4
INNER JOIN [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982] c ON c.ConceptMapId = s.ConceptId
LEFT JOIN dbo.ConceptAliasMapping cam ON cam.ConceptId = c.ConceptId
WHERE t.OperationGUID = '7FD96D75-FCDB-44B0-9DED-6FE0BC128982'
The screenshot of the explain plan:
I've tried to create indexes on following columns:
Value, ConceptId, CompanyId, NumberOfQuarters
with different combination on INDEX and INCLUDE columns. What did I miss?

There are several performance issues in your query. To avoid Key Lookups, include every column the query reads from a table (selected, joined on, or filtered on) in a nonclustered index on that table, so that the index covers the query:
create nonclustered index ncli_1 on TmpCompanyOrderedAndFilteredPKs(CompanyId)
include(PeriodEndDate,DurationId ,NumOfPeriods,CompoundSortKeyLogicalKey,numberofquarters,OperationGUID )
create nonclustered index ncli_2 on synt_ScreenerDb_dbo_ScreenerHistoricalYTD_Number_t(CompanyId)
include(conceptid ,Value ,NumberOfQuarters )
create nonclustered index ncli_3 on [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982](ConceptId)
create unique clustered index cli_4 on [##FinancialsConcepts7FD96D75-FCDB-44B0-9DED-6FE0BC128982](ConceptMapId)
The unique clustered index on ConceptMapId may let SQL Server use a merge join instead of a hash join, which can give a performance gain.

Related

Improve a query with Pivot and Recursive code in SQL Server

I need to reach the next result considering these two tables.
An area receives services from different departments. Each department belongs to a hierarchy on three (or fewer) levels. The idea is to represent in one column the relationship between the area and all the hierarchies where it can be present. The Level Nro should be 1 for the record that does not have any father.
So far, I have this code https://rextester.com/KYHKR17801 . I've got the result that I need. However, the performance is not the best because the table is too large, and I had to do many transformations:
Pivot
Recursion
Addition of register because I lost the nulls when creating the Pivot table
Update the level Nro
I do not know if anyone can give any advice to improve the runtime of this query.
This appears to do everything you need in one statement:
WITH R AS
(
SELECT
SA.AreaID,
S.[service],
S.[description],
L.[Level],
L.child_service,
Recursion = 1
FROM dbo.service_area AS SA
JOIN dbo.[service] AS S
ON S.[service] = SA.[Service]
OUTER APPLY
(
-- Unpivot
VALUES
(1, S.level1),
(2, S.level2),
(3, S.level3)
) AS L ([Level], child_service)
WHERE
L.child_service IS NOT NULL
UNION ALL
SELECT
R.AreaID,
S.[service],
S.[description],
R.[Level],
child_service = CHOOSE(R.[Level], S.level1, S.level2, S.level3),
Recursion = R.Recursion + 1
FROM R
JOIN dbo.[service] AS S
ON S.[service] = R.child_service
)
SELECT
R.AreaID,
R.[service],
R.[description],
[Level] = 'Level' + CONVERT(char(1), R.[Level]),
[Level Nro] = ROW_NUMBER() OVER (
PARTITION BY R.AreaID, R.[Level]
ORDER BY R.Recursion DESC)
FROM R
ORDER BY
R.AreaID ASC,
R.[Level] ASC,
[Level Nro]
OPTION (MAXRECURSION 3);
The following index will help the recursive section locate rows quickly:
CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.[service] ([service]);
db<>fiddle demo
If your version of SQL Server doesn't have CHOOSE (it was added in SQL Server 2012), write the CASE expression out by hand:
CASE R.[Level] WHEN 1 THEN S.level1 WHEN 2 THEN S.level2 ELSE S.level3 END

Return the values that do not exist in the table but do exist in my IN list?

Environment
I have a table named DEVICE that contains 3 rows:
DeviceID | Number
1 1111111111111111111
2 2222222222222222222
3 4444444444444444444
Using SSMS I query an Azure SQL table for 3 rows using an IN list:
select number from device
where number in
(
'1111111111111111111',
'2222222222222222222',
'3333333333333333333'
)
Result:
NUMBER
1111111111111111111
2222222222222222222
Great that works as expected:
3 rows in table.
2 rows returned from my IN list
1 row not returned that is not in my IN list.
Question
How do I query to return the NUMBERS that DO NOT exist in the table but DO exist in my IN list? (ideally using IN if possible).
Expected result:
NUMBER
3333333333333333333
Important to note the IN list in my production environment contains:
8366 rows in the IN list.
7225 of the 8366 values are returned from the production database, i.e. they exist in the table.
Therefore 1141 are missing. It's these values I need.
Production table contains 80,000 rows in total.
Testing
NOT EXISTS attempted but no rows are returned in the result set at all.
select number from device
where NOT EXISTS
(
SELECT number FROM DEVICE WHERE number IN
(
'1111111111111111111',
'2222222222222222222',
'3333333333333333333',
--(etc followed by 8366 unique values)
)
)
Result:
NUMBER
(null)
Expected to see the value 3333333333333333333 as it does not exist in my table.
LEFT JOIN is not helpful as there are no other tables to join.
NOT IN returned values that are in the database table but are not in my IN source list.
Result:
NUMBER
4444444444444444444
I have also considered creating a TEMP table, inserting my IN list values and running a LEFT JOIN on the NUMBER. I need to proceed cautiously here as it's a production environment.
Trying this too from another stack post but struggling:
select column_value as missing_num
from table (sys.odcinumberlist (123,345,555,777))
where column_value not in (select accnt from my_table);
Ref
A)
CREATE TABLE [dbo].[Device](
[DeviceID] [int] IDENTITY(1,1) NOT NULL,
[Number] [varchar](20) NULL,
CONSTRAINT [Device_PK] PRIMARY KEY CLUSTERED
(
[DeviceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
B)
Find values that do not exist in a table
Thank you.
Constructing an IN list with 8,366 different values is not efficient. It essentially results in 8,366 different OR conditions in your query.
For something like this, I recommend using a temp table and inserting your values into that, then using a JOIN to it. In this specific case, you should use a LEFT JOIN to that table and take only those values that are not found.
For example:
Declare @Numbers Table (Number Varchar (20));
Insert @Numbers
Values ('1111111111111111111'),
('2222222222222222222'),
('3333333333333333333'),
...
Select N.Number
From @Numbers N
Left Join Device D On N.Number = D.Number
Where D.Number Is Null
You can also use a CTE to build your list of numbers as well:
;With Numbers (Number) As
(
Select '1111111111111111111' Union All
Select '2222222222222222222' Union All
Select '3333333333333333333' Union All
...
)
Select N.Number
From Numbers N
Left Join Device D On N.Number = D.Number
Where D.Number Is Null
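A variation on the same idea, as a minimal sketch rather than a tested production query: a VALUES list used as a derived table avoids declaring anything, and NOT EXISTS expresses the anti-join directly. The alias v is made up for illustration; the table and column names come from the question. Note that SQL Server limits a single INSERT ... VALUES statement to 1000 row constructors, so loading all 8,366 values into a table variable that way would need several INSERT statements. As far as I know, that limit does not apply to a VALUES list used as a derived table like the one below.

```sql
-- Sketch: values that are in the list but missing from dbo.Device.
-- "v" is an illustrative alias; extend the VALUES list with the real numbers.
SELECT v.Number
FROM (VALUES
        ('1111111111111111111'),
        ('2222222222222222222'),
        ('3333333333333333333')
     ) AS v (Number)
WHERE NOT EXISTS
      (
        SELECT 1
        FROM dbo.Device AS d
        WHERE d.Number = v.Number
      );
```

With the three sample rows from the question, this should return only 3333333333333333333. An index on Device.Number (not present in the posted table definition) would probably help the lookup on the 80,000-row production table.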

different estimated rows on same index operation?

Introduction and Background
I had to optimize a simple query (example below). After rewriting it several times I noticed that the estimated row count for one and the same index operation differs depending on how the query is written.
Originally the query did a clustered index scan. Because the production table contains a binary column, it is quite large (about 100 GB), and the full table scan takes too long to execute.
Question
Why is the estimated row count different on the same index operation (example will show)? What is the optimizer doing here?
The example database (I am using SQL Server 2008 R2)
I tried to create a very simplified version of my production tables that shows the behaviour.
-- CREATE THE SAMPLE TABLES
----------------------------
CREATE TABLE dbo.MasterTable(
MasterId smallint NOT NULL,
Name varchar(5) NOT NULL,
CONSTRAINT PK_MasterTable PRIMARY KEY CLUSTERED (MasterId ASC)
) ON [PRIMARY]
GO
CREATE TABLE dbo.DetailTable(
DetailId bigint IDENTITY(1,1) NOT NULL,
MasterId smallint NOT NULL,
Name nvarchar(50) NOT NULL,
CreateDate datetime NOT NULL,
CONSTRAINT PK_DetailTable PRIMARY KEY CLUSTERED (DetailId ASC)
) ON [PRIMARY]
GO
ALTER TABLE dbo.DetailTable
ADD CONSTRAINT FK1
FOREIGN KEY(MasterId) REFERENCES dbo.MasterTable (MasterId)
GO
CREATE NONCLUSTERED INDEX IX_DetailTable
ON dbo.DetailTable( MasterId ASC, Name ASC )
GO
-- INSERT SOME SAMPLE DATA
----------------------------
SET NOCOUNT ON
GO
-- These are some Codes. In our system we always use these codes to search for "types" of data.
INSERT INTO dbo.MasterTable (MasterId, Name)
VALUES (1, 'N1'), (2, 'N2'), (3, 'N3'), (4, 'N4'), (5, 'N5'), (6, 'N6'), (7, 'N7'), (8, 'N8')
GO
-- ADD ROWS TO THE DETAIL TABLE
-- Takes about 1 minute to run
-- Don't care about the logic, it's just to get a distribution similar to production system
----------------------------
declare @x int = 1
DECLARE @MasterID INT
while (@x <= 400000)
begin
SET @MasterID = ABS(CHECKSUM(NEWID())) % 8 + 1
INSERT INTO dbo.DetailTable(MasterId,Name,CreateDate)
VALUES(
CASE
WHEN @MasterID IN (1, 3, 4) AND @x % 20 != 0 THEN 2
WHEN @MasterID IN (5, 6) AND @x % 20 != 0 THEN 7
WHEN @MasterID = 8 AND @x % 100 != 0 THEN 7
ELSE @MasterID
END,
NEWID(),
DATEADD(DAY, - ABS(CHECKSUM(NEWID())) % 1000, GETDATE())
)
SET @x = @x + 1
end
go
-- DO THE INDEX AND STATISTIC MAINTENANCE
----------------------------
alter index all on dbo.DetailTable reorganize
alter index all on dbo.MasterTable reorganize
update statistics dbo.DetailTable WITH FULLSCAN
update statistics dbo.MasterTable WITH FULLSCAN
go
Preparation is done, let's start with the query
Let's look at the statistics first: for RANGE_HI_KEY = 8 there are 489 EQ_ROWS.
-- CHECK THE STATISTICS
----------------------------
dbcc show_statistics ('dbo.DetailTable', IX_DetailTable)
GO
Now we do the query. The first one is the original query I had to optimize.
Please enable the actual execution plan when executing.
Have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]"
-- ORIGINAL QUERY
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- FORCESEEK
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated 50.000
-- TABLE VARIABLE
----------------------------
DECLARE @MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO @MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN @MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated 40.000
-- TEMP TABLE
----------------------------
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
-- Actual 489, Estimated 489
DROP TABLE #MasterId
GO
Analyse and final question(s)
Please have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]"
The comments in the script above show you the values I got for estimated and actual row count.
In our production environment this table has 33 million rows, the estimated rows in the queries above differ from 3 million to 16 million.
To summarize:
when a join between the DetailTable and the MasterTable is made, the estimated rowcount is 12,5% (there are 8 values in the master table, it makes sense, kind of...)
when a join between the DetailTable and the table variable is made, the estimated rowcount is 10%
when a join between the DetailTable and the temp table is made, the estimated rowcount is exactly the same as the actual row count
The question is why do these values differ?
The statistics are up to date and making an estimation should really be easy.
I just would like to understand this.
As nobody has answered, I'll try to give an answer.
Please don't force the optimizer to follow you.
(1) Explanation of your original query:
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
Why is this query slow?
It is slow because your indexes do not cover the query.
Both tables are read with an index scan and then joined with a hash join.
Why scan the whole MasterTable?
Because the index on MasterTable is on the MasterId column, not on the Name column.
Why scan the whole DetailTable? Because here, too, the indexes are on
(DetailId) "CLUSTERED" and (MasterId ASC, Name ASC) "NONCLUSTERED",
not on the CreateDate column.
A NONCLUSTERED index on (CreateDate, MasterId) would help this particular query.
If your MasterTable is also huge, you can create a NONCLUSTERED index on its Name column.
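The two suggested indexes could be created roughly as follows. This is a sketch only: the index names are made up for illustration, and whether the optimizer actually picks them depends on the data distribution.

```sql
-- Hypothetical index names; table and column names are from the question.
-- The clustered key (DetailId) is carried in every nonclustered index,
-- so this index covers the original query without an INCLUDE list.
CREATE NONCLUSTERED INDEX IX_DetailTable_CreateDate_MasterId
    ON dbo.DetailTable (CreateDate, MasterId);

-- Lets the lookup of Name = 'N8' seek instead of scanning MasterTable.
CREATE NONCLUSTERED INDEX IX_MasterTable_Name
    ON dbo.MasterTable (Name);
```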
(2) Explanation on FORCESEEK :
-- FORCESEEK
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
Why did the optimizer estimate 50,000 rows?
Here you are joining on d.MasterId = m.MasterId and FORCING the optimizer to seek on the Detail table, so
the optimizer uses the index IX_DetailTable to join to MasterTable with a LOOP join.
With a loop join, each row of MasterTable (actually just ONE) is joined to DetailTable:
the optimizer takes one key from MasterTable, seeks into the index for it, and passes the matching rows on to the next iterator.
It does not know at compile time which key that will be, so it uses the average number of rows per value.
There are 8 unique values in the column and the table cardinality is 400,000 rows, so
400,000 / 8 = 50,000 estimated rows (fair enough).
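The average-rows-per-value figure can be checked against the sample data with a query along these lines. It is a rough sketch that only reproduces the arithmetic; it does not show what the optimizer does internally. The column aliases are illustrative.

```sql
-- Average number of DetailTable rows per distinct MasterId.
-- With the 400,000 rows loaded by the sample script and 8 distinct
-- MasterId values, AvgRowsPerValue should come out at about 50,000.
SELECT COUNT(*)                                   AS TableRows,
       COUNT(DISTINCT MasterId)                   AS DistinctMasterIds,
       COUNT(*) * 1.0 / COUNT(DISTINCT MasterId)  AS AvgRowsPerValue
FROM dbo.DetailTable;

-- The density the optimizer stores for the index can be inspected with:
DBCC SHOW_STATISTICS ('dbo.DetailTable', IX_DetailTable) WITH DENSITY_VECTOR;
```

In the density vector, the "All density" value for MasterId multiplied by the table's row count gives the same average.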
(3) -- TABLE VARIABLE
Here is your query :
DECLARE @MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO @MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN @MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
GO
Statistics are not maintained on table variables, so the optimizer has no idea how many rows it will have to deal with (it estimates 1 row) when producing a plan.
Here the estimate for the table variable is 1 row and the actual count is 1 row as well, congratulations!
But how did the optimizer arrive at the estimate of "40.000" rows?
Personally I had never checked this. Because of this question I ran several tests, but I have no idea how the optimizer calculates this estimate, so it would be great if someone could enlighten us.
(4) -- TEMP TABLE
Your Query
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
-- Actual 489, Estimated 489
DROP TABLE #MasterId
Here the optimizer chooses the same query plan as with the table variable, but the difference is that
statistics are maintained on temp tables, so the optimizer has a fair idea of which rows it is actually going to join.
'N8' maps to key 8, and the estimated row count for 8 in dbo.DetailTable is 489.

Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall
CREATE TABLE contacts
(
id int PRIMARY KEY,
phone1 varchar(20) NULL,
phone2 varchar(20) NULL,
phone3 varchar(20) NULL,
phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
list_id int NOT NULL,
phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
list_id ASC,
phone ASC
);
I would like to see which contacts match a phone number in a specific DoNotCall list.
For faster lookup, I have indexed donotcall on list_id and phone.
When I make the following JOIN it takes a long time (eg. 9 seconds):
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
Execution plan on Pastebin
Whereas if I LEFT JOIN on each phone field separately it runs a lot faster (e.g. 1.5 seconds):
SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
ON d1.list_id = 1
AND d1.phone = c.phone1
LEFT JOIN donotcall d2
ON d2.list_id = 1
AND d2.phone = c.phone2
LEFT JOIN donotcall d3
ON d3.list_id = 1
AND d3.phone = c.phone3
LEFT JOIN donotcall d4
ON d4.list_id = 1
AND d4.phone = c.phone4
WHERE
d1.phone IS NOT NULL
OR d2.phone IS NOT NULL
OR d3.phone IS NOT NULL
OR d4.phone IS NOT NULL
Execution plan on Pastebin
My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall.
So, how to do a join towards multiple columns and still have it use the index?
SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.
You can test if the index would be faster with a hint:
SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
From the query plans you posted, it shows the first plan is estimated to produce 40k rows, but it just returns 21 rows. The second plan estimates 1 row (and of course returns 21 too.)
Are your statistics up to date? Out-of-date statistics can explain the query optimizer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:
select object_name(ind.object_id) as TableName
, ind.name as IndexName
, stats_date(ind.object_id, ind.index_id) as StatisticsDate
from sys.indexes ind
order by
stats_date(ind.object_id, ind.index_id) desc
You can update them manually with:
EXEC sp_updatestats;
With this poor database structure, a UNION ALL query might be fastest.
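A UNION ALL version could look roughly like this. It is a sketch under the schema from the question, not a measured result; the alias m is made up for illustration. Each branch is a plain equality join, so each one can seek on IX_donotcall_list_phone (list_id, phone).

```sql
-- One equality join per phone column, combined with UNION ALL.
-- DISTINCT on the outside removes contacts that match on more than one phone.
SELECT DISTINCT m.id
FROM (
    SELECT c.id
    FROM contacts AS c
    JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone1
    UNION ALL
    SELECT c.id
    FROM contacts AS c
    JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone2
    UNION ALL
    SELECT c.id
    FROM contacts AS c
    JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone3
    UNION ALL
    SELECT c.id
    FROM contacts AS c
    JOIN donotcall AS d ON d.list_id = 1 AND d.phone = c.phone4
) AS m;
```

Using UNION instead of UNION ALL plus DISTINCT would give the same set of ids; which form is faster is something to compare in the actual execution plans.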

TSQL optimisation

I have the below query which is taking 2 seconds to execute as there is a significant number of rows (1 million + each) in the two tables and was wondering if there is anything further I can do to optimise the query.
Tables
tblInspection.ID bigint (Primary Key)
tblInspection.IsPassedFirstTime bit (Non clustered index)
tblInspectionFailures.ID bigint (Primary Key)
tblInspectionFailures.InspectionID bigint (Non clustered index)
Query
SELECT TOP 1 tblInspection.ID FROM tblInspection
INNER JOIN tblInspectionFailures ON tblInspection.ID = tblInspectionFailures.InspectionID
WHERE (tblInspection.IsPassedFirstTime = 1)
Execution Plan
I can see that I am doing clustered seeks on the indexes but it's still taking some time
The only thing I can think of is
SELECT i.ID FROM
(select TOP 1 id from tblInspection
WHERE IsPassedFirstTime = 1) i
INNER JOIN tblInspectionFailures ON
i.ID = tblInspectionFailures.InspectionID
Try
SET ROWCOUNT 1
SELECT tblInspection.ID FROM tblInspection
INNER JOIN tblInspectionFailures ON tblInspection.ID = tblInspectionFailures.InspectionID
WHERE (tblInspection.IsPassedFirstTime = 1)
SET ROWCOUNT 0
This does basically the same thing but tells SQL Server to stop returning rows after the first one. SET ROWCOUNT stays in effect for the rest of the session, so reset it to 0 afterwards.
