Missing index (Impact 97): Create Non Clustered Index - sql-server

I am trying to optimize my stored procedure. When I look at the query plan, I can see that a table scan on #tempCompany accounts for 97 percent of the cost. I am also seeing the following message: Missing index (Impact 97): Create Non Clustered Index on #tempCompany.
I have already set up non-clustered indexes. Could somebody point out what the problem is?
if object_id('tempdb..#tempCompany') is not null drop table #tempCompany

select
    fp.companyId, fp.fiscalYear, fp.fiscalQuarter, fi.financialperiodid, fi.periodEndDate,
    fc.currencyId, fp.periodtypeid,
    ROW_NUMBER() OVER (PARTITION BY fp.companyId, fp.fiscalYear, fp.fiscalQuarter
                       ORDER BY fi.periodEndDate DESC) rowno
into #tempCompany
from
    ciqFinPeriod fp
    inner join #companyId c on c.val = fp.companyId
    join ciqFinInstance fi on fi.financialperiodid = fp.financialperiodid
    join ciqFinInstanceToCollection ic on ic.financialInstanceId = fi.financialInstanceId
    left join ciqFinCollection fc on fc.financialCollectionId = ic.financialCollectionId
    left join ciqFinCollectionData fd on fd.financialCollectionId = fc.financialCollectionId
where
    fp.periodTypeId = @periodtypeId
    and fi.periodenddate >= @date
    --and fp.companyId in (select val from #companyId)

CREATE NONCLUSTERED INDEX id_companyId2 on #tempCompany(companyId, fiscalYear, fiscalQuarter, financialperiodid, periodEndDate, currencyId, periodtypeid, rowno)
if object_id('tempdb..#EstPeriodTbl') is not null drop table #EstPeriodTbl

select
    companyId, fiscalYear, fiscalQuarter, financialPeriodId, periodenddate, currencyId,
    periodtypeid, rowno
into #EstPeriodTbl
from #tempCompany a
where a.rowno = 1
order by companyid, periodenddate

CREATE NONCLUSTERED INDEX id_companyId3 on #EstPeriodTbl(companyId, periodenddate, fiscalYear, fiscalQuarter, currencyId, financialPeriodId, rowno)
(Execution plan screenshot omitted.)

You do not need to include everything in the #tempCompany index; just rowno:
CREATE NONCLUSTERED INDEX id_companyId2 on #tempCompany(rowno)

Short answer: the index you created does not help SQL Server with this particular query. If you create another non-clustered index with rowno as the first column, SQL Server will probably be able to use it.
Long explanation:
The reason you have a problem is that the index you created isn't useful to SQL Server for this specific query. The order in which records are sorted within an index is determined by the order in which you list the columns when creating it.
(e.g. your index orders records by companyId first, then orders records with the same companyId by fiscalYear, and then by fiscalQuarter.)
Trying to use that index to find an item by just its rowno value would be like trying to find an entry in the phone book given only someone's phone number. The only way to locate all of the matching records is to search through every record in the book (i.e. a table scan).
In general, you can utilize nonclustered indexes only when the information you use in your where clause matches the first column in your index (i.e. if you can provide a SARGable predicate for companyId in your where clause, you could probably use this index).
Using the phone book again: if I gave you a last name and a phone number, you would no longer need to do a full scan of the phone book; you could seek to that last name and scan only its entries, which would be more efficient. If I gave you a last name, first name, middle initial, and phone number, the search would be narrower still. But if I only gave you a last name, middle initial, and phone number, you would be back to seeking on the last name alone and scanning everything within it.
So if you can narrow down your record set using at least companyId (i.e. use companyId in your where clause), you can use the index you have provided.
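For example (a sketch with hypothetical filter values), a predicate like this could seek on the existing id_companyId2 index:

select *
from #tempCompany
where companyId = 1234      -- leading column of id_companyId2: a seek is possible
  and fiscalYear = 2020     -- second key column: narrows the seek further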
Or, and I imagine this is what you will want to do, create an index that sorts by rowno, then companyId and periodEndDate.
e.g.
CREATE NONCLUSTERED INDEX idx_temp_rowno ON #tempCompany(rowno, companyId, periodenddate)
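For reference, that index lines up with the query from the question that builds #EstPeriodTbl: the equality filter on rowno matches the leading column, and the remaining key columns match the ORDER BY, so the sort can potentially be avoided as well:

select companyId, fiscalYear, fiscalQuarter, financialPeriodId, periodenddate, currencyId, periodtypeid, rowno
into #EstPeriodTbl
from #tempCompany a
where a.rowno = 1                     -- seek on the leading index column
order by companyid, periodenddate     -- matches the next two key columns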

Related

Keyset Pagination - Filter By Search Term across Multiple Columns

I'm trying to move away from OFFSET/FETCH pagination to keyset pagination (also known as the seek method). Since I've just started, I have many questions in mind, but this is one of many where I try to get the pagination right along with a filter.
So I have 2 tables
aspnet_users
having columns
PK
UserId uniqueidentifier
Fields
UserName NVARCHAR(256) NOT NULL,
AffiliateTag varchar(50) NULL
.....other fields
aspnet_membership
having columns
PK+FK
UserId uniqueidentifier
Fields
Email NVARCHAR(256) NOT NULL
.....other fields
Indexes
Non Clustered Index on Table aspnet_users (UserName)
Non Clustered Index on Table aspnet_users (AffiliateTag)
Non Clustered Index on Table aspnet_membership(Email)
I have a page that lists users (based on a search term) with the page size set to 20. I want to search across multiple columns, so instead of using OR, I found that having a separate query for each column and then UNIONing them makes the indexes get used correctly.
So I have a stored proc that takes the search term, and optionally the UserName and UserId of the last record, for the next page.
Create proc [dbo].[sp_searchuser]
@take int,
@searchTerm nvarchar(max) = NULL,
@lastUserName nvarchar(256) = NULL,
@lastUserId nvarchar(256) = NULL
AS
IF(@lastUserName IS NOT NULL AND @lastUserId IS NOT NULL)
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
where u1.UserName > @lastUserName
OR (u1.UserName=@lastUserName And u1.UserId > convert(uniqueidentifier, @lastUserId))
order by u1.UserName
End
Else
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
order by u1.UserName
End
Now, to get the result for the first page with the search term mua:
exec [sp_searchuser] 20, 'mua%'
it uses both of the indexes created, one on the UserName column and another on the AffiliateTag column, which is good.
But the problem is that the inner union queries return all the matching rows.
In this case, the execution plan shows:
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 465
so in total the inner queries return 469 matching rows,
and then the outer query takes 20 of them for the final result set. So it is really reading more data than needed.
And when going to the next page
exec [sp_searchuser] 20, 'mua%', 'lastUserName', 'lastUserId'
the execution plan shows
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 445
in total the inner queries return 449 matching rows,
so with or without pagination, it reads more data than needed.
My expectation is to somehow limit the inner queries so they do not return all matching rows.
You might be interested in the Logical Processing Order, which determines when the objects defined in one step are made available to the clauses in subsequent steps. The Logical Processing Order steps are:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Of course, as noted in the docs:
The actual physical execution of the statement is determined by the
query processor and the order may vary from this list.
meaning that some steps can start before the previous ones complete.
In your case, your query looks like:
some data extraction
sort by user_name
get TOP records
There is no way to reduce the rows in the data extraction part, because to have a deterministic result (we actually need to order by user_name, user_id for that) we have to get all matching rows, sort them, and only then take the desired ones.
For example, imagine the first query returning 20 names starting with 'Z' and the second query returning only one name starting with 'A'. If you somehow stopped the execution and skipped the second query, you would get wrong results: 20 names starting with 'Z' instead of one starting with 'A' and 19 with 'Z'.
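For completeness, a deterministic version of the paginated branch would add UserId as a tie-breaker in the ORDER BY; a sketch based on the proc above:

select top (@take) *
from (
    select u.UserId, u.UserName, u.AffiliateTag, m.Email
    from aspnet_Users as u
    inner join aspnet_Membership as m on u.UserId = m.UserId
    where u.UserName like @searchTerm
    union
    select u.UserId, u.UserName, u.AffiliateTag, m.Email
    from aspnet_Users as u
    inner join aspnet_Membership as m on u.UserId = m.UserId
    where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
where u1.UserName > @lastUserName
   or (u1.UserName = @lastUserName and u1.UserId > convert(uniqueidentifier, @lastUserId))
order by u1.UserName, u1.UserId;   -- UserId breaks ties, making page boundaries deterministic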
In such cases, I prefer to use dynamic T-SQL statements in order to get better execution times and reduce the code length. You are saying:
And I want to search across multiple columns so instead of doing OR I
find out having a separate query for each and then Union them will
make the index use correctly.
When you use UNION you perform double reads of your tables. In your case, you read the aspnet_Membership table twice and the aspnet_Users table twice (yes, you use two different indexes, but I believe they are not covering, so you end up performing lookups to extract the user's name and email).
I would have started with covering indexes, like in the example below:
DROP TABLE IF EXISTS [dbo].[StackOverflow];
CREATE TABLE [dbo].[StackOverflow]
(
[UserID] INT PRIMARY KEY
,[UserName] NVARCHAR(128)
,[AffiliateTag] NVARCHAR(128)
,[UserEmail] NVARCHAR(128)
,[a] INT
,[b] INT
,[c] INT
,[z] INT
);
CREATE INDEX IX_StackOverflow_UserID_UserName_AffiliateTag_I_UserEmail ON [dbo].[StackOverflow]
(
[UserID]
,[UserName]
,[AffiliateTag]
)
INCLUDE ([UserEmail]);
GO
INSERT INTO [dbo].[StackOverflow] ([UserID], [UserName], [AffiliateTag], [UserEmail])
SELECT TOP (1000000) ROW_NUMBER() OVER(ORDER BY t1.number)
,CONCAT('UserName',ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('AffiliateTag', ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('UserEmail', ROW_NUMBER() OVER(ORDER BY t1.number))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
GO
So, for the following query:
SELECT TOP 20 [UserID]
,[UserName]
,[AffiliateTag]
,[UserEmail]
FROM [dbo].[StackOverflow]
WHERE [UserName] LIKE 'UserName200%'
OR [AffiliateTag] LIKE 'UserName200%'
ORDER BY [UserName];
GO
The issue here is that we are reading all the rows even though we are using the index.
What's good is that the index is covering, so we are not performing lookups. Depending on the search criteria, it may perform better than your approach.
If the performance is bad, we can use a trigger to UNPIVOT the original data and record it in a separate table. It may look like this (it would be better to use an attribute_id rather than the attribute name text as I do here):
DROP TABLE IF EXISTS [dbo].[StackOverflowAttributes];
CREATE TABLE [dbo].[StackOverflowAttributes]
(
[UserID] INT
,[AttributeName] NVARCHAR(128)
,[AttributeValue] NVARCHAR(128)
,PRIMARY KEY([UserID], [AttributeName], [AttributeValue])
);
GO
CREATE INDEX IX_StackOverflowAttributes_AttributeValue ON [dbo].[StackOverflowAttributes]
(
[AttributeValue]
)
INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
SELECT [UserID]
,'Name'
,[UserName]
FROM [dbo].[StackOverflow]
UNION
SELECT [UserID]
,'AffiliateTag'
,[AffiliateTag]
FROM [dbo].[StackOverflow];
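A sketch of the synchronizing trigger alluded to above (the trigger name is assumed; a real implementation would also need to handle UPDATE and DELETE):

CREATE TRIGGER [dbo].[trg_StackOverflow_SyncAttributes]
ON [dbo].[StackOverflow]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- mirror each new user row as one attribute row per searchable column
    INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
    SELECT [UserID], 'Name', [UserName] FROM inserted
    UNION ALL
    SELECT [UserID], 'AffiliateTag', [AffiliateTag] FROM inserted;
END;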
and the query from before will now look like:
SELECT TOP 20 U.[UserID]
,U.[UserName]
,U.[AffiliateTag]
,U.[UserEmail]
FROM [dbo].[StackOverflowAttributes] A
INNER JOIN [dbo].[StackOverflow] U
ON A.[UserID] = U.[UserID]
WHERE A.[AttributeValue] LIKE 'UserName200%'
ORDER BY U.[UserName];
Now we are reading only a part of the index rows, and after that performing a lookup.
In order to compare performance it will be better to use:
SET STATISTICS IO, TIME ON;
as it will show you how many pages are read from each index.
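A minimal usage pattern (a sketch; wrap the query you want to measure, then check the Messages tab for logical reads):

SET STATISTICS IO, TIME ON;

SELECT TOP 20 U.[UserID]
      ,U.[UserName]
      ,U.[AffiliateTag]
      ,U.[UserEmail]
FROM [dbo].[StackOverflowAttributes] A
INNER JOIN [dbo].[StackOverflow] U
    ON A.[UserID] = U.[UserID]
WHERE A.[AttributeValue] LIKE 'UserName200%'
ORDER BY U.[UserName];

SET STATISTICS IO, TIME OFF;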

What is the difference between Lookup, Scan and Seek?

So I found this query
SELECT MAX(us.[last_user_lookup]) as [last_user_lookup], MAX(us.[last_user_scan])
AS [last_user_scan], MAX(us.[last_user_seek]) as [last_user_seek]
from sys.dm_db_index_usage_stats as us
where us.[database_id] = DB_ID() AND us.[object_id] = OBJECT_ID('tblName')
group by us.[database_id], us.[object_id];
When I look up the documentation on sys.dm_db_index_usage_stats, all it says is:
last_user_seek (datetime): Time of last user seek.
last_user_scan (datetime): Time of last user scan.
last_user_lookup (datetime): Time of last user lookup.
...
Every individual seek, scan, lookup, or update on the specified index by one query execution is counted as a use of that index and increments the corresponding counter in this view. Information is reported both for operations caused by user-submitted queries, and for operations caused by internally generated queries, such as scans for gathering statistics.
Now, I understand that when I run the query it takes the highest time from each of those 3 fields, since sys.dm_db_index_usage_stats can contain duplicate database_id and object_id rows where one or more of the fields may be NULL (so you can't just do a SELECT TOP 1 ... ORDER BY last_user_seek, last_user_scan, last_user_lookup DESC, otherwise you would potentially miss data). But when I run it I get values like
NULL | 2017-05-15 08:56:29.260 | 2017-05-15 08:54:02.510
but I don't understand what the user has done with the table that is represented by these values.
So what is the difference between Lookup, Scan and Seek?
The basic difference between these operations:
Let's say you have two tables, TableA and TableB. Both contain more than 1,000,000 rows, and both have clustered indexes on their Id columns. TableB also has a nonclustered index on its code column. (Remember that a nonclustered index always points at the pages of the clustered one...)
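For reference, a minimal setup matching that description (a sketch; column types assumed):

CREATE TABLE TableA
(
    Id   INT PRIMARY KEY CLUSTERED,   -- clustered index on Id
    Name NVARCHAR(100)
);
CREATE TABLE TableB
(
    Id       INT PRIMARY KEY CLUSTERED,  -- clustered index on Id
    TableAId INT,                        -- foreign key to TableA.Id
    code     VARCHAR(20),
    dim      NVARCHAR(100)
);
CREATE NONCLUSTERED INDEX IX_TableB_code ON TableB (code);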
seek:
Let's say you want only one record from TableA, whose clustered index is on column Id.
The query would look like:
SELECT Name
FROM TableA
WHERE Id = 1
When your result contains fewer than roughly 15% of the full data set (somewhere between 10 and 20 percent, depending on the situation), SQL Server performs an index seek in this scenario (the optimizer has found a useful index to retrieve the data).
scan:
If, for example, your query needs more than 15% of the data from TableA, then it is necessary to scan the whole index to satisfy the query.
Let's say TableB has TableA's Id column as a foreign key and contains all the Ids from TableA. The query would look like:
SELECT a.Id
FROM TableA a
JOIN TableB b ON a.Id = b.TableAId
Or just
SELECT *
FROM TableA
For the index on TableA, SQL Server performs an index scan, because all data pages are needed to satisfy the query...
lookup:
Let's say TableB has a column dim in addition to column code, and the nonclustered index on code (as mentioned).
SQL Server uses a lookup when it needs to retrieve non-key data from the data page while a nonclustered index is used to resolve the query.
For example, a key lookup could be used in a query like:
SELECT id, dim
FROM TableB
WHERE code = 'codeX'
You can eliminate the lookup with a covering index (INCLUDE dim in the nonclustered one).
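For example, a sketch (the index name is assumed):

CREATE NONCLUSTERED INDEX IX_TableB_code_covering
ON TableB (code)
INCLUDE (dim);   -- dim is stored at the index leaf level, so the lookup disappears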

How to remove one record so my unique key constraint won't break in the future

I have a table, Core_Faculty, with 4 fields: ID (PK, INT), InstitutionID (INT), PersonID (INT), DeprecatedDate (SMALLDATETIME).
What I'd like to do is delete the deprecated records for institution/person combinations that have both deprecated and non-deprecated (DeprecatedDate IS NULL) records, keeping the non-deprecated record.
If an institution/person combination has just one record (whether deprecated or not), I'd like to keep it and leave it alone. I'm only considering combinations that have both a DeprecatedDate IS NULL record and a DeprecatedDate IS NOT NULL record.
The end goal is to be left with one record per institution/person combination, whether deprecated or not, giving priority to the record with a NULL deprecated date. Those are the good, live records. However, if we start with only one record and it's deprecated, go ahead and keep it.
The database can currently have at most one of each, as institution/person/deprecateddate is a unique key on the table.
How would I go about solving this, and what methods can I use to find the appropriate records, while only considering records that have both deprecated and non-deprecated values for the combination?
DELETE f
FROM
Core_Faculty f
INNER JOIN
(
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY
f.InstitutionID,
f.PersonID
ORDER BY
CASE
WHEN f.DeprecatedDate IS NULL THEN 1
ELSE 2
END,
f.DeprecatedDate
) RowNum
FROM
Core_Faculty f
) d ON
f.ID = d.ID
WHERE
d.RowNum > 1;
In SQL Server you can use a common table expression with a ROW_NUMBER function to identify the rows you want to keep:
WITH cte AS (
SELECT [ID]
,[InstitutionID]
,[PersonID]
,[DeprecatedDate]
,ROW_NUMBER() OVER (PARTITION BY [InstitutionID], [PersonID]
                    ORDER BY CASE WHEN [DeprecatedDate] IS NULL THEN 0 ELSE 1 END,
                             [DeprecatedDate] DESC) as [RowNumber]
FROM [Blog].[dbo].[Core_Faculty]
)
SELECT [ID]
,[InstitutionID]
,[PersonID]
,[DeprecatedDate]
,[RowNumber]
FROM cte
--WHERE [RowNumber] = 1
The ORDER BY puts the non-deprecated (NULL DeprecatedDate) record first within each [InstitutionID], [PersonID] grouping, followed by the most recent deprecated record. If there is only one row, even a deprecated one, it will be kept, since it is the 1st row in its grouping.
You can then use
DELETE
FROM cte
WHERE [RowNumber] > 1
instead of the select to remove the rest of the rows, leaving you with just one row per person/institution combo.
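Putting it together (note that the CTE must be part of the same statement as the DELETE):

WITH cte AS (
    SELECT [ID]
          ,ROW_NUMBER() OVER (PARTITION BY [InstitutionID], [PersonID]
                              ORDER BY CASE WHEN [DeprecatedDate] IS NULL THEN 0 ELSE 1 END,
                                       [DeprecatedDate] DESC) as [RowNumber]
    FROM [Blog].[dbo].[Core_Faculty]
)
DELETE FROM cte
WHERE [RowNumber] > 1;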

OR in WHERE statement slowing things down dramatically

I have the following query that finds the customers related to an order. Customers have a legacy ID, so I have to check both the old (legacy) ID and the customer ID, hence the OR:
SELECT
c.Title,
c.Name
FROM productOrder po
INNER JOIN Employee e ON po.BookedBy = e.ID
CROSS APPLY (
SELECT TOP 1 *
FROM Customer c
WHERE(po.CustID = c.OldID OR po.CustID = c.CustID)
) c
GROUP BY
c.CustID, c.Title, c.Name
If I remove the OR it runs fine in both situations. There is an index on the customer ID and one on the legacy ID.
For the Customer table, you need to create separate indexes on the oldid and custid columns. If you already have a clustered index on custid, then add an index on oldid as well:
CREATE INDEX customer_oldid_idx ON customer(oldid);
Without this index, the search for oldid in this clause:
WHERE (po.CustID = c.OldID OR po.CustID = c.CustID)
will have to use a full table scan, and that will be super slow.
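If the OR still blocks an efficient plan, a common rewrite (a sketch using the tables above) splits the predicate so each branch can seek its own index:

SELECT c.Title, c.Name
FROM productOrder po
INNER JOIN Employee e ON po.BookedBy = e.ID
CROSS APPLY (
    SELECT TOP 1 x.*
    FROM (
        SELECT c1.* FROM Customer c1 WHERE c1.OldID  = po.CustID   -- seeks the oldid index
        UNION ALL
        SELECT c2.* FROM Customer c2 WHERE c2.CustID = po.CustID   -- seeks the custid index
    ) x
) c
GROUP BY c.CustID, c.Title, c.Name;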

How to avoid table scan and index scan for huge tables

I am using MS SQL Server 2008 R2. I have a table (test) with a huge number of rows.
I have the following SQL code; please suggest where I can use index hints, FORCESEEK, or any other means to improve performance.
Indexes
1. non-clustered - idx_id (id)
2. non-clustered - idx_name (name)
SELECT DISTINCT
p.id,
p.name
FROM
test p
LEFT OUTER JOIN
(
SELECT
e.id
FROM
test e
INNER JOIN
(
SELECT
c.id
FROM
test c
GROUP BY
c.id
HAVING
COUNT(1) > 1
) f
ON e.id = f.id
WHERE
e.name = 'test_name'
) m
ON p.id = m.id
WHERE
m.id is null
Prerequisite: have a primary key
select distinct
p.id
, p.name
from test p
where not exists (
SELECT TOP(1)
1
FROM test e
WHERE e.PrimaryKey <> p.PrimaryKey
AND e.id = p.id
AND 'test_name' IN (e.name, p.name)
)
How many columns does your table contain? If there are only these two columns, it makes no sense to add a nonclustered index. You should create a CLUSTERED index on the ID column, and that's it; you'll see a performance increase.
If you have many columns, consider two options:
Create a clustered index on the NAME column and a nonclustered index on the ID column.
Create a nonclustered index on the ID column, and INCLUDE the NAME column (you'll create a covering index that way; see the sketch below).
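A sketch of the second option (the index name is assumed):

CREATE NONCLUSTERED INDEX idx_id_covering
ON test (id)
INCLUDE (name);   -- name is carried at the leaf level, so the query needs no lookup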
Generally speaking, relational databases (being relational) are written to optimize join statements. When you use a JOIN clause with ON criteria, the database engine can create an optimized execution plan that takes the table structure, indexes, etc. into account. When joining on a sub-select, the same optimizing factors are sometimes not available, or are not taken into account the same way. It depends on your schema, but it is a good rule of thumb to assume that a standard join with an ON clause will be more efficient than a join on a sub-select.
Your schema is pretty vague, so I am not even sure that you need the joins, but if you do, you should try performing the joins directly with "on" criteria.
