First, I am using: Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64)
I have created a table that looks like this:
create table dbo.pos_key
( keyid int identity(1,1) not null
, systemid int not null
, partyid int not null
, portfolioid int null
, instrumentid int not null
, security_no decimal(10,0) null
, entry_date datetime not null
)
keyid is a clustered primary key. My table has about 144,000 rows. Currently systemid has very little variation: it is the same in every row except one.
Now I perform the following query:
select *
from pos_key
where systemid = 33000
and portfolioid = 150444
and instrumentid = 639
which returns 1 row after a clustered index scan on [pos_key].[PK_pos_key].
The execution plan showed an estimated row count of 1.082.
SQL Server quickly suggests that I add an index.
CREATE NONCLUSTERED INDEX IDX_SYS_PORT_INST
ON [dbo].[pos_key] ([systemid],[portfolioid],[instrumentid])
So I do and run the query again.
Surprisingly, SQL Server doesn't use the new index; instead it again goes for the same clustered index scan, but now it claims to expect 4087 rows! It doesn't suggest any new index this time, however.
To get it to use the new index I have done the following:
Updated table statistics (UPDATE STATISTICS)
Updated index statistics (UPDATE STATISTICS)
Dropped cached execution plans related to this query (DBCC FREEPROCCACHE)
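For reference, those three maintenance steps correspond to commands along these lines (the FULLSCAN option is my addition; a sampled statistics update can produce a different histogram than a full scan):

```sql
-- Rebuild all statistics objects on the table
UPDATE STATISTICS dbo.pos_key WITH FULLSCAN;

-- Rebuild statistics for just the new index
UPDATE STATISTICS dbo.pos_key IDX_SYS_PORT_INST WITH FULLSCAN;

-- Flush every cached plan on the instance (use with care on production)
DBCC FREEPROCCACHE;
```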
No luck, SQL server always goes for the clustered scan and expects 4087 rows.
Index statistics look like this:
All Density Average Length Columns
----------------------------------------------------------------------------
0.5 4 systemid
6.095331E-05 7.446431 systemid, portfolioid
1.862301E-05 11.44643 systemid, portfolioid, instrumentid
6.9314E-06 15.44643 systemid, portfolioid, instrumentid, keyid
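A density vector like the one above is typically produced by DBCC SHOW_STATISTICS; assuming the statistics object carries the index's name, something like:

```sql
DBCC SHOW_STATISTICS ('dbo.pos_key', IDX_SYS_PORT_INST) WITH DENSITY_VECTOR;
```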
Curiously I left this overnight and in the morning ran the query again and BAMM now it hits the index. I dropped the index, ran the select and then created the index again. Now SQL server is back to 4087 expected rows and clustered index scan.
So what am I missing. The index obviously works but SQL server doesn't want to use it right away.
Is the lack of fluctuation in systemid somehow causing trouble?
Is DBCC FREEPROCCACHE not enough to get rid of cached execution plans?
Are the ways of SQL-Server just mysterious?
With a composite index and all columns used in equality predicates, specify the most selective column first (portfolioid here). SQL Server maintains a histogram only for the first column of the index.
With the less selective column first, SQL Server probably overestimated the row count and chose the clustered index scan instead, thinking it was more efficient since you are selecting all columns.
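As a sketch of that advice (the index name and column order below are my suggestion, assuming portfolioid is the most selective of the three columns):

```sql
CREATE NONCLUSTERED INDEX IDX_PORT_INST_SYS
ON dbo.pos_key (portfolioid, instrumentid, systemid);
```

The histogram is then built on portfolioid, which actually discriminates between rows, so the estimate for portfolioid = 150444 should land much closer to 1 than to 4087.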
Related
I am optimizing a query on SQL Server 2005. I have a simple query against mytable that has about 2 million rows:
SELECT id, num
FROM mytable
WHERE t_id = 587
The id field is the Primary Key (clustered index) and there exists a non-clustered index on the t_id field.
The query plan for the above query includes both a Clustered Index Seek and an Index Seek, followed by a Nested Loops (Inner Join) to combine the results. STATISTICS IO shows 3325 page reads.
If I change the query to just the following, the server performs only 6 page reads and a single Index Seek with no join:
SELECT id
FROM mytable
WHERE t_id = 587
I have tried adding an index on the num column, and an index on both num and t_id. Neither index was selected by the server.
I'm looking to reduce the number of page reads but still retrieve the id and num columns.
The following index should be optimal:
CREATE INDEX idx ON MyTable (t_id)
INCLUDE (num)
I cannot remember if INCLUDEd columns were valid syntax in 2005, you may have to use:
CREATE INDEX idx ON MyTable (t_id, num)
The [id] column will be included in the index as it is the clustered key.
The optimal index would be on (t_id, num, id).
The reason your query probably lands on the bad side is that multiple rows are being selected. I wonder if rephrasing the query like this would improve performance:
SELECT t.id, t.num
FROM mytable t
WHERE EXISTS (SELECT 1
              FROM mytable t2
              WHERE t2.t_id = 587 AND t2.id = t.id
             );
Let's clarify the problem and then discuss solutions to improve it:
You have a table (let's call it tblTest1, containing 2M records) with a clustered index on id and a nonclustered index on t_id, and you are querying the data filtered through the nonclustered index, returning the id and num columns.
So SQL Server will use the nonclustered index to filter the data (t_id = 587), but after filtering it still needs the values stored in the id and num columns. Because you have a clustered index, SQL Server uses it to obtain the data stored in those columns. This happens because the leaves of the nonclustered index's tree contain the clustered index's key value, which is why you see the Key Lookup operator in the execution plan. In fact, SQL Server uses the Index Seek (NonClustered) to find t_id = 587 and then uses the Key Lookup to get the num data. (It does not need that operator for the id column, because the leaves of the nonclustered index already contain the clustered key.)
As the execution plan shows, when we have an Index Seek (NonClustered) and a Key Lookup, SQL Server needs a Nested Loops join operator to combine them to get the data in the num column. At this stage SQL Server effectively has two separate sets: the results obtained from the nonclustered index tree, and the data inside the clustered index tree.
Based on this, the problem is clear! What happens if we tell SQL Server not to do a Key Lookup? That lets SQL Server execute the query via a shorter path: no Key Lookup, and consequently no Nested Loops join.
To achieve this, we need to INCLUDE the num column in the nonclustered index's tree, so that the leaves of this index contain both the id column's data and the num column's data. Clearly, when we ask SQL Server to find data using the nonclustered index and return the id and num columns, it will not need a Key Lookup!
Finally, what we need to do is to INCLUDE num in the nonclustered index. Thanks to @MJH's answer:
CREATE NONCLUSTERED INDEX idx ON tblTest1 (t_id)
INCLUDE (num)
Luckily, SQL Server 2005 provided a new feature for NonClustered indexes, the ability to include additional, non-key columns in the leaf level of the NonClustered indexes!
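To verify the effect, you can compare logical reads before and after creating the covering index; a sketch (the exact read counts will vary with your data):

```sql
SET STATISTICS IO ON;

SELECT id, num
FROM tblTest1
WHERE t_id = 587;

SET STATISTICS IO OFF;
```

Before the covering index you should see the large read count caused by the key lookups; afterwards, only a handful of reads from the single index seek.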
Read more:
https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-2017
But what will happen if we write the query like this?
SELECT id, num
FROM tblTest1 AS t1
WHERE
EXISTS (SELECT 1
FROM tblTest1 t2
WHERE t2.t_id = 587 AND t2.id = t1.id
)
This is a great approach, but lets see the execution plan:
Clearly, SQL Server still needs to do an Index Seek (NonClustered) to find t_id = 587 and then obtain the data from the clustered index using a Clustered Index Seek. In this case we do not get any notable performance improvement.
Note: when you use indexes, you need an appropriate plan to maintain them. As indexes become fragmented, their benefit to query performance decreases and you may face performance problems after a while. Have a plan to reorganize and rebuild them when they get fragmented!
Read more: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-2017
I'm looking for some information on how Microsoft SQL Server (specifically 2016 in my case) decides on which index to use for joins. Most of the time it seems to work well, but other times I have to scratch my head...
Here is a simple query:
SELECT
OrderMaster.StartDate, Employee.EmpName
FROM
OrderMaster
INNER JOIN
Employee ON OrderMaster.EmpId = Employee.EmpId
EmpId is the PK (int) on the Employee table.
There are 2 indexes on Employee: PK_Employee, which is a clustered PK index on EmpId, and IX_Employee_EmpName which is a nonclustered index, and only uses column EmpName.
This query returns 500 order records, but has performed an Index Scan using IX_Employee_EmpName on the Employee table, which reads 30,000 records.
Why would it have picked this index, instead of doing an index seek on PK_Employee?
What is the preferred method to solve this problem? Would I specify that the join use the PK_Employee index, or should I create a new index on EmpId, but includes the additional columns I may pull from selecting (empname, address, etc)?
This query returns 500 order records, but has performed an Index Scan using IX_Employee_EmpName on the Employee table, which read 30,000 records.
SQL Server uses a cost-based optimizer, so it chooses a plan based on estimated cost, and in this case it chose the index IX_Employee_EmpName. Since EmpId is the clustered key, it is by default included in every nonclustered index, so SQL Server can get both columns it needs (EmpId and EmpName) from this narrow index.
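If you want to see the alternative plan yourself, you can force the seek with a table hint (for diagnosis only; hints are generally best avoided in production code):

```sql
SELECT OrderMaster.StartDate, Employee.EmpName
FROM OrderMaster
INNER JOIN Employee WITH (FORCESEEK)
    ON OrderMaster.EmpId = Employee.EmpId;
```

Comparing the estimated costs of the two plans usually shows why the optimizer preferred one scan of the narrow index over 500 individual seeks into the clustered index.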
We have 10M records in a table in a SQL Server 2012 database and we want to retrieve the top 2000 records based on a condition.
Here's the SQL statement:
SELECT TOP 2000 *
FROM Users
WHERE LastName = 'Stokes'
ORDER BY LastName
I have added a nonclustered index on the column LastName and it takes 9 seconds to retrieve the 2000 records. I tried creating an indexed view with an index created on the same column, but to no avail; it takes about the same time. Is there anything else I can do to improve the performance?
Using select * will cause key lookups for all the rows that match your criteria (=for each value of the clustered key, the database has to travel through the clustered index into the leaf level to find the rest of the values).
You can see that in the actual plan, and you can also check that the index you created is actually being used (an index seek on that index). If the key lookup is the reason for the slowness, the select will be fast if you run just select LastName from ....
If there is actually just few columns you need from the table (or there's not that many columns in the table) you can add those columns in as included columns in your index and that should speed it up. Always specify what fields you need in the select instead of just using select *.
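A sketch of that advice; the column names here (FirstName, Email) are hypothetical stand-ins for whichever columns your query actually needs:

```sql
CREATE NONCLUSTERED INDEX IX_Users_LastName_Covering
ON dbo.Users (LastName)
INCLUDE (FirstName, Email);  -- hypothetical columns; list the ones your SELECT uses
```

Then replace SELECT TOP 2000 * with an explicit column list matching the INCLUDEd columns, so the query can be answered entirely from this index with no key lookups.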
I have a non-clustered index on a datetime column (AdmitDate) and a varchar column (Status) in SQL Server. Now the issue is that I'm filtering the result only on the basis of the datetime column (no Index on AdmitDate column alone).
In order for me to utilize the non-clustered index I used a not null condition for the varchar column (Status) but in that scenario the execution plan shows "Index Scan".
select ClientName, ID
from PatientVisit
where
(PatientVisit.AdmitDate between '2010-01-01 00:00:00.000' AND '2014-01-31 00:00:00.000' )
AND PatientVisit.Status is not null
-- Index Scan
But if I pass a specific Status value then as expected the excution plan shows Index Seek.
select ClientName, ID
from PatientVisit
where
(PatientVisit.AdmitDate between '2010-01-01 00:00:00.000' AND '2014-01-31 00:00:00.000')
AND PatientVisit.Status = 'ADM'
--Index Seek
Should I use in operator and pass all the possible values for the Status column to utilize the non-clustered index?
Or is there any other way to utilize the index?
Thanks,
Shubham
You're using SELECT ClientName, ID, and because you fetch columns that are not part of the index, SQL Server will need to go to the actual data page to get those column values.
So if SQL Server finds a match in the non-clustered index, it will have to do an (expensive) key-lookup into the clustered index to fetch the data page, which contains all columns.
If too many rows have a Status that is not NULL, SQL Server will come to the conclusion that it's faster to just scan the whole index rather than doing a great many index seeks and key lookups. In the other case, when you specify a concrete value that matches only a few (or just one) rows, it might be faster to actually do the index seek and one expensive key lookup.
One thing you could try is an index which includes the two columns you need for your SELECT:
CREATE NONCLUSTERED INDEX IX_PatientVisit_DateStatusIncluded
ON dbo.PatientVisit (AdmitDate, Status)
INCLUDE (ClientName, ID)
Now in this case, SQL Server can find all the values it needs to satisfy this query in the index leaf pages, so it is much more likely to actually use that index, even if it finds a lot of hits, possibly with an index scan on that small index (which isn't bad either!).
Create a filtered index. You can then create an index for the datetime field only for values where status is not null.
CREATE NONCLUSTERED INDEX FI_IX_AdmitDate_StatusNotNull
ON dbo.PatientVisit(AdmitDate)
WHERE Status IS NOT NULL
This will be used for your query with Status IS NOT NULL, and your existing index will be used for queries where Status = 'ASpecificValue'.
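The two suggestions can also be combined; this sketch (my combination, not from either answer) filters on non-NULL status and covers the two selected columns:

```sql
CREATE NONCLUSTERED INDEX FI_IX_AdmitDate_StatusNotNull_Covering
ON dbo.PatientVisit (AdmitDate)
INCLUDE (ClientName, ID)
WHERE Status IS NOT NULL;
```

Note that for the optimizer to use a filtered index, the query's predicate must imply the filter (Status IS NOT NULL here), which your first query does.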
Example table:
CREATE TABLE Log (
    logID int identity,
    logDate datetime,
    logText varchar(42)
)
logID is already indexed because it is a primary key, but if you were to query this table you would likely want to use logDate as a constraint. However, both logID and logDate are going to be in the same order because logDate would always be set to GETDATE().
Does it make sense to put an extra non-clustered index on logDate, taking into account that for the Log table it is important to have fast writes.
Make the clustered index (logDate, logID), in that order.
As datetime values are ever-increasing, this should not cost anything extra. Adding logID keeps the key unique when two log entries are inserted at the same time (which could happen).
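A sketch of that layout, assuming the primary key is declared nonclustered so the clustered index can go on (logDate, logID):

```sql
CREATE TABLE Log (
    logID int identity NOT NULL,
    logDate datetime NOT NULL,
    logText varchar(42),
    CONSTRAINT PK_Log PRIMARY KEY NONCLUSTERED (logID)
);

CREATE UNIQUE CLUSTERED INDEX CIX_Log_Date_ID
ON Log (logDate, logID);
```

Because new rows always arrive at the end of the clustered key, inserts stay fast, with no page splits in the middle of the index.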
If you'll have lots of query with a
WHERE LogDate > ......
(or something similar) clause - yes, by all means!
The clustered index on the LogID will not help if you select by date - so if that's a common operation, a non-clustered index on the date will definitely help a lot.
Marc
Arthur is spot on in considering the clustered index on (logDate, logID), since you will potentially run into duplicates within the window of the datetime accuracy (3.33 ms on SQL 2005).
I would also consider in advance whether the table is going to be large and should use table partitioning, allowing the log to keep a sliding window and old data to be removed without long-running DELETE statements. That all depends on the quantity of log records and whether you have Enterprise Edition.
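A minimal sketch of the sliding-window setup (the names and monthly boundaries are illustrative; on these versions table partitioning requires Enterprise Edition):

```sql
-- Partition rows by month of logDate
CREATE PARTITION FUNCTION pfLogByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME psLogByMonth
AS PARTITION pfLogByMonth ALL TO ([PRIMARY]);

-- Removing an old month is then a fast metadata operation:
-- switch the oldest partition out to a staging table and truncate it,
-- instead of running a long DELETE.
```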