May I know how the include clause improves performance in a covering index?
CREATE NONCLUSTERED INDEX includeIndex
ON mytable(COL1)
INCLUDE(COL2,COL3,COL4)
And what is the difference between
CREATE NONCLUSTERED INDEX includeIndex ON mytable(COL1) INCLUDE(COL2,COL3,COL4)
and
CREATE NONCLUSTERED INDEX nonincludeIndex ON mytable(COL1,COL2,COL3,COL4)
Thanks
You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
http://msdn.microsoft.com/en-us/library/ms190806.aspx
If your query only references fields that are in the index (including the included fields), then after finding which rows to return, the SQL processor does not have to load the actual data page to get the data. It is that simple: it can answer the query from the index alone.
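As a minimal sketch of the two variants being compared (the table definition, column types, and the COL4 column are assumptions made only for illustration): key columns are sorted and stored at every level of the index, while included columns are stored only at the leaf level and may use types that are not allowed in an index key.

```sql
-- Hypothetical table; names and types are assumptions, not from the question.
CREATE TABLE dbo.mytable
(
    ID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    COL1 INT           NOT NULL,
    COL2 VARCHAR(50)   NOT NULL,
    COL3 DATETIME      NOT NULL,
    COL4 NVARCHAR(MAX) NULL      -- cannot be an index key column
);

-- COL1 is the only key column; COL2..COL4 are carried at the leaf level only.
CREATE NONCLUSTERED INDEX includeIndex
ON dbo.mytable (COL1)
INCLUDE (COL2, COL3, COL4);

-- Every referenced column is present in the index, so the index covers
-- this query and the base table does not need to be read.
SELECT COL1, COL2, COL3
FROM dbo.mytable
WHERE COL1 = 42;
```

With these assumed types, the all-key variant ON dbo.mytable (COL1, COL2, COL3, COL4) would be rejected because NVARCHAR(MAX) is not valid as a key column. Making COL2 and COL3 key columns mainly pays off when queries also filter or sort on them.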
Related
I would like to create a covering index to improve query performance.
I know that one more index will impact INSERT performance.
The table has only INSERT operations, no UPDATE or DELETE. The data in the covering index will be unique because the index keys contain the table's PK, so I need no further uniqueness constraint; my only goal is to improve query performance.
Question
Which type of index will be optimal for INSERT performance (that is, degrade it less): unique or non-unique?
The difference if any is negligible.
On the other hand, what could have some impact is the index key length: smaller is better. You could remove the primary key from the index key and make the index non-unique. It will remain a covering index because all nonclustered indexes carry the clustered index key in their leaf-level rows, assuming your primary key is clustered.
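A rough sketch of the two options, assuming a hypothetical insert-only table whose primary key is clustered (all names here are invented for illustration; in practice you would create only one of the two indexes):

```sql
-- Hypothetical insert-only table; the primary key is assumed to be clustered.
CREATE TABLE dbo.Events
(
    EventId   BIGINT IDENTITY(1,1) NOT NULL,
    DeviceId  INT          NOT NULL,
    CreatedAt DATETIME2    NOT NULL,
    Payload   VARCHAR(200) NOT NULL,
    CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (EventId)
);

-- Option A: primary key repeated in the declared index key, index declared unique.
CREATE UNIQUE NONCLUSTERED INDEX IX_Events_Device_Unique
ON dbo.Events (DeviceId, CreatedAt, EventId);

-- Option B: shorter declared key, non-unique. EventId is still present in the
-- index rows because it is the clustered index key.
CREATE NONCLUSTERED INDEX IX_Events_Device
ON dbo.Events (DeviceId, CreatedAt);

-- Either index covers this query, since EventId is available in both.
SELECT EventId, DeviceId, CreatedAt
FROM dbo.Events
WHERE DeviceId = 7;
```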
The data in covering index will be unique because the index keys contain the table's PK
...
Which type of index will be optimal, unique or not unique?
Assuming your table has a clustered index:
A non-unique nonclustered index in SQL Server is physically stored as a unique index after the clustered index keys are added as index key columns.
So for a nonclustered index that includes the clustered index keys there is absolutely no difference whether you declare the index to be unique or not.
This depends on your definition of "degrade". If you are indexing the same column in either case and can use the UNIQUE keyword because the column will always contain a unique set of values, then specifying that keyword can give slightly better performance for future queries that use that index.
You can read about the performance benefits of declaring an index UNIQUE in Brent Ozar's "Performance Benefits of Unique Indexes".
In response to your comments, there will be no difference in INSERT performance if you denote your index as UNIQUE vs not specifying that keyword.
In our case, I prefer non-unique indexes. For instance, we have a table with hundreds of millions of keywords to be analyzed. The keyword is unique (clustered index and primary key). But as we cannot process them all at once, we have a priority column as well. So we create a non-unique, nonclustered index with only the Priority column as the key and the keyword as an included column.
This gives the following performance advantages:
No additional constraint is checked when adding.
When inserting a keyword, any page that has free space left and contains entries with the same priority can be used, meaning less fragmentation and less maintenance (rebuild/reorganize). Note that we also delete from this table and priority is just a tinyint, so this happens in most cases.
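The setup described above might look roughly like this. The table name, column lengths, and index name are assumptions, since the answer does not give them:

```sql
-- Hypothetical schema matching the description: the keyword is the clustered
-- primary key and Priority is a tinyint.
CREATE TABLE dbo.Keywords
(
    Keyword  NVARCHAR(200) NOT NULL,
    Priority TINYINT       NOT NULL,
    CONSTRAINT PK_Keywords PRIMARY KEY CLUSTERED (Keyword)
);

-- Non-unique nonclustered index: Priority is the only key column and the
-- keyword is listed as an included column.
CREATE NONCLUSTERED INDEX IX_Keywords_Priority
ON dbo.Keywords (Priority)
INCLUDE (Keyword);

-- Fetching the next batch of keywords for one priority can be answered
-- from the nonclustered index alone.
SELECT TOP (1000) Keyword
FROM dbo.Keywords
WHERE Priority = 1;
```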
We have a huge table, Table1 (2.5 billion rows), with a single column A (NVARCHAR(255) datatype). What is the right approach for seek operations against this table: a clustered index on A or a clustered columnstore index on A?
We already keep this table in a separate filegroup from the other table, Table2, with which it will be joined.
Do you suggest partitioning this table for better performance? This column will also contain Unicode data, so what kind of partitioning approach works for a Unicode datatype?
UPDATE: To clarify further, the use case for the table is SEEK. The table stores identifiers for individuals. The major concern here is SEEK performance on a huge table. This table will be referenced inside a transaction, and we want the transaction to be short.
Clustered index vs columnstore index depends on the use case for the table. Columnstore keeps track of unique entries in the column and the rows where those entries are stored. This makes it very useful for data warehousing tasks such as aggregates against the indexed columns, but less suitable for transactional tasks that need to pull a small number of specific rows. If you are using SQL Server 2014 or later you can create a clustered columnstore index, which stores the whole table in columnstore format, and from SQL Server 2016 onward you can also add nonclustered rowstore indexes on top of it. It does have some limitations and overhead that you should read up on, though.
Given that this is a seek for specific rows and not an aggregation of the column, I would recommend a clustered index instead of a column store index.
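A minimal sketch of the recommended option, assuming the identifiers in A are unique and non-null (the question does not say so; drop UNIQUE otherwise). The index names and the sample value are invented for illustration:

```sql
-- Table as described in the question; NOT NULL is an assumption.
CREATE TABLE dbo.Table1
(
    A NVARCHAR(255) NOT NULL
);

-- Rowstore clustered index on A. UNIQUE is an assumption about the data.
CREATE UNIQUE CLUSTERED INDEX CIX_Table1_A
ON dbo.Table1 (A);

-- The columnstore alternative discussed above would instead be:
--   CREATE CLUSTERED COLUMNSTORE INDEX CCI_Table1 ON dbo.Table1;
-- A table can have only one clustered index, so it is one or the other.

-- Point lookup. Declaring the variable with the column's type (NVARCHAR)
-- avoids an implicit conversion in the predicate.
DECLARE @id NVARCHAR(255) = N'some-identifier';

SELECT A
FROM dbo.Table1
WHERE A = @id;
```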
If one column in a table has both a clustered and a nonclustered index defined on it for any reason, is there any disadvantage in that? Just curious.
If both indexes are on the same column or columns (in the same order), then yes. Both provide the same optimization for SELECT queries that fetch individual records, and the clustered index additionally improves SELECT queries that return multiple records filtered on a range of values for that column, so the nonclustered one is redundant.
But with both in place you incur an additional write (INSERT/UPDATE/DELETE) performance hit, because two indexes must be maintained instead of one.
The documentation for SQL Server 2008 R2 states:
Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.
Does this mean that when there is a search using a nonclustered index, the clustered index is searched as well? I originally thought that the nonclustered index directly contains the address of the page (block) holding the row it references. From the text above it seems that it contains just the key of the clustered index instead of the address.
Could somebody explain please?
Yes, that's exactly what happens:
SQL Server searches for your search value in the nonclustered index.
If a match is found, that index entry also contains the clustering key (the column or columns that make up the clustered index).
With that clustering key, a key lookup (often also called a bookmark lookup) is performed: the clustered index is searched for that key value.
When the item is found, the entire data record is available at the leaf level of the clustered index and can be returned.
SQL Server does this because using a physical address would be really, really bad:
If a page split occurs, all the entries that are moved to a new page would have to be updated.
For all those entries, all nonclustered indexes would also have to be updated.
And this is really, really bad for performance.
This is one of the reasons why it is beneficial to use limited column lists in SELECT (instead of always SELECT *) and possibly even include a few extra columns in the nonclustered index (to make it a covering index). That way, you can avoid unnecessary and expensive bookmark lookups.
And because the clustering key is included in each and every nonclustered index, it's highly important that this be a small and narrow key - optimally an INT IDENTITY or something like that - and not a huge structure; the clustering key is the most replicated data structure in SQL Server and should be as small as possible.
The fact that these bookmark lookups are relatively expensive is also one of the reasons why the query optimizer might opt for an index scan as soon as you select a larger number of rows - at that point, just scanning the clustered index might be cheaper than doing a lot of key lookups.
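To make the lookup-versus-covering distinction concrete, here is a small sketch with a hypothetical Customers table (all names are assumptions). Which plan is actually chosen depends on statistics and row counts, so the comments describe what is possible rather than what is guaranteed:

```sql
-- Hypothetical table with a narrow INT IDENTITY clustering key.
CREATE TABLE dbo.Customers
(
    CustomerId INT IDENTITY(1,1) NOT NULL,
    LastName   NVARCHAR(100) NOT NULL,
    FirstName  NVARCHAR(100) NOT NULL,
    Email      NVARCHAR(255) NOT NULL,
    Notes      NVARCHAR(MAX) NULL,
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerId)
);

-- Nonclustered index with a few extra columns included. CustomerId is
-- carried automatically because it is the clustering key.
CREATE NONCLUSTERED INDEX IX_Customers_LastName
ON dbo.Customers (LastName)
INCLUDE (FirstName, Email);

-- SELECT * also needs Notes, which is not in the index, so each matching
-- row requires a key lookup (or the optimizer scans the clustered index).
SELECT *
FROM dbo.Customers
WHERE LastName = N'Smith';

-- A limited column list that the index covers: no key lookup is needed.
SELECT CustomerId, LastName, FirstName, Email
FROM dbo.Customers
WHERE LastName = N'Smith';
```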
One of the biggest benefits of table partitioning is that it is possible to rebuild an index on a specific partition.
Imagine there is a partitioned table (with 12 partitions for now) that has a clustered index and a few nonclustered indexes, all partition-aligned.
I want to add a new nonclustered index to the table that does not have to be built for the old partitions. I need this index only for the last 3 partitions.
So, how can I create a new nonclustered index for only the last 3 partitions of a table with 12 partitions?
That is not possible. It would create problems for parameterized queries because the query planner would never statically know that the index can be used (except if there was a constant-expression predicate).
You can create a filtered index with `WHERE partitionKey >= startOfSomePartition`. Your queries have to include this predicate statically, though.
You might try a view over two partitioned tables that have different schemas. That is not very convenient to develop against, though.
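The filtered-index suggestion could be sketched as follows. The partition function, scheme, table, and boundary dates are all hypothetical, and only four partitions are used to keep the example short. The filter boundary is chosen to coincide with a partition boundary so that older partitions contribute no rows to the index:

```sql
-- Hypothetical monthly partitioning (4 partitions for brevity).
CREATE PARTITION FUNCTION pfMonthly (DATE)
AS RANGE RIGHT FOR VALUES ('20240201', '20240301', '20240401');

CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    SaleId       BIGINT        NOT NULL,
    partitionKey DATE          NOT NULL,
    CustomerId   INT           NOT NULL,
    Amount       DECIMAL(12,2) NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (partitionKey, SaleId)
) ON psMonthly (partitionKey);

-- Filtered, partition-aligned index: only rows from 2024-03-01 onward are
-- indexed, so the older partitions of this index stay empty.
CREATE NONCLUSTERED INDEX IX_Sales_Customer_Recent
ON dbo.Sales (CustomerId)
INCLUDE (Amount)
WHERE partitionKey >= '20240301'
ON psMonthly (partitionKey);

-- The query repeats the filter predicate as a literal so that the optimizer
-- can match it to the filtered index; a parameter here would generally not work.
SELECT CustomerId, Amount
FROM dbo.Sales
WHERE CustomerId = 42
  AND partitionKey >= '20240301';
```

When the window of "recent" partitions moves, the index has to be recreated with a new literal boundary, which is part of the cost of this workaround.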