Indexing in Sql Server - sql-server

What is Clustered and non clustered indexing? How to index a table using sql server 2000 Enterprise manager?

In a clustered index on ID, the table rows are ordered by ID.
In a non-clustered index on ID, the references to table rows are ordered by ID.
We can compare a database to a CSV file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue2
In a clustered table, when we insert a new row, we need to squeeze it between the existing rows:
ID,Value
-------
1,ReallyReallyLongValue1
2,ReallyReallyLongValue2
3,ReallyReallyLongValue3
, which is slow on insert but fast on retrieve.
In a non-clustered table, we keep a separate file index file which orders our rows:
Id,RowNumber
------------
1, 1
3, 2
When we insert the new row, we just append it to our main file and update the short index file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
2,ReallyReallyLongValue2
Id,RowNumber
------------
1, 1
2, 3
3, 2
, which is fast on insert but less efficient on retrieve.
In real databases indexes use more efficient binary trees, but the principle remains the same.
Clustered indexes are faster on SELECT, non-clustered indexes are faster on INSERT / UPDATE / DELETE

A clustered index means that the rows are physically ordered by the values in that index. A non-clustered index means that an index table is kept up to date that allows for quick seeking and sorting based upon value, but does not physically order the rows.
Only one clustered index can exist for a table, and if a primary key exists then that is the clustered index (in SQL Server).

A clustered index defines how the actual table is stored. The rows are stored in a way to make searches on the fields in the clustered index fast. (They're not physically stored in the sort order of the index fields, but in a binary tree or something similiar.)
You can have only one clustered index per table. The clustered index contains all fields in the table, for example:
indexfield1 - indexfield2 - field2 - field3 - ....
A non-clustered index is like a separate table. It contains the fields in the index, and a reference to the fields in the table. For example:
secondindexfield1 - secondindexfield2 - reference to table row
When searching a non-clustered index, SQL server will find the value in the index, do a "bookmark lookup" to the table, and retrieve the other row fields from there. This is why non-clustered indexes perform slightly less wel then clustered indexes.
To add an index in SQL Server Management Studio, expand the table node in object view. Right click on "Indexes" and select "New Index".

Clustered Index: Only one clustered index per table is allowed. If an index is clustered, it means that the table on which the clustered index is based is physically sorted according to that index. Think of the page numbers in an encyclopedia.
Non-clustered Index: Can have many non-clustered indexes per table. Think of the keyword index at the back of the book.

Related

execution plan suggesting to add an index on columns which are not part of where clause

I am running following query in SSMS and execution plan suggesting to add index on columns which are not part of where clause. I was planning to add index on two columns which are being used in where clause (OID and TransactionDate).
SELECT
[OID] , //this is not a PK. Primary key column is not a part of sql script
[CustomerNum] ,
[Amount] ,
[TransactionDate] ,
[CreatedDate]
FROM [dbo].[Transaction]
WHERE OID = 489
AND TransactionDate > '01/01/2018 06:13:06.46';
Index suggestion
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction] ([OID],[TransactionDate])
INCLUDE ([CustomerNum],[Amount],[CreatedDate])
Updated
Do i need to include other columns? Data is being imported to that table through a back end process using SQLBulkCopy class in .net. I am wondering if having non cluster index on all columns would reduce the performance. (In my table is Pk column called TransactionID which is not needed but i have this in the table in case its needed in the future otherwise SQLBulkCopy works better with heap. Other option is to drop and recreate indexes before and after SQLBulkCopy operation)
the INCLUDE keyword specifies the non-key columns to be added to the leaf level of the nonclustered index.
This means that if you will add this index and run the query again, SQL Server can get all the information needed from the index, thus eliminating the need to perform a lookup in the table as well.
As a general rule of thumb - when SSMS suggest an index, create it. You can always drop it later if it doesn't help.
You don't need to add all table columns in your non-clustered index, suggested index is good for the query provided. SQL Server database engine suggestions are usually really good.
INCLUDE keyword is required to avoid KEY LOOKUP and use NONCLUSTERED INDEX SEEK.
All in all: No NONCLUSTERED INDEX results in Clustered index scan
Created NONCLUSTERED INDEX with no included columns results in NONCLUSTERED INDEX scan plus key lookup.
Created NONCLUSTERED INDEX with included columns results in NONCLUSTERED INDEX SEEK.

Multiple Clustered Indexes on a Single Table?

I thought we could only place one clustered index on one table, and put multiple non-clustered indexes on a table, but using the code below I can easily add more than one clustered index to my table.
CREATE CLUSTERED INDEX TBL_MULTI_LC_HIST ON dbo.TBL_MULTI_LC_HIST (ID,AsOfDate)
Is this completely wrong?
It isn't possible to create multiple clustered indexes for a single table. From the docs (emphasis mine):
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
For example this will fail:
CREATE TABLE Thing
(
Column1 INT NOT NULL,
Column2 INT NOT NULL
)
CREATE CLUSTERED INDEX IX1 ON dbo.Thing(Column1)
CREATE CLUSTERED INDEX IX2 ON dbo.Thing(Column2)
Error:
Cannot create more than one clustered index on table 'dbo.Thing'. Drop the existing clustered index 'IX1' before creating another.
Example: http://www.sqlfiddle.com/#!18/53a63/1
You can however have a single index with multiple columns in it which is perhaps where you are getting confused:
CREATE CLUSTERED INDEX IX3 ON dbo.Thing(Column1, Column2)
You can only have one clustered index. A "Clustered" index IS the row... it contains all the columns. Every other index would just contain a pointer to the clustered row. The key of the clustered index enforces an 'ordering' on the rows by default.
If there is no clustered index, then the rows are basically stored in a heap, with no order or structure.

SQL Server : expected clustered index scan, but got non clustered index scan

I have this table:
CREATE TABLE Ta
(
coda int NOT NULL PRIMARY KEY,
a2 int UNIQUE
);
and a SQL select statement:
select *
from Ta
I have a clustered index, the primary key and a non-clustered index, specified by the unique constraint.
Executing the select I get the following execution plan:
But I'm not sure why.
The data should be on the leaf level, therefore it should scan the leaf level, hence it should do a clustered scan.
EDIT: the table has 10000 rows, coda has values from 9999 to 0 and a2 has values from 0 to 9999.
The non-clustered index is a covering index for the query. That is, the index contains all of the columns needed to satisfy the query.
The execution plan is showing that SQL Server is using the non-clustered index.
For the given query, it seems like a reasonable execution plan.
If there were some predicate (a WHERE clause condition on a column) or an ORDER BY clause, then we would expect that to influence which index is used.
But in this case, retrieving two columns (a2 and coda) for every row in the table with the rows returned in an unspecified order, then a full scan of either index is a suitable plan.

Why does SQL Server use an Index Scan instead of a Seek + RID lookup?

I have a table with approx. 135M rows:
CREATE TABLE [LargeTable]
(
[ID] UNIQUEIDENTIFIER NOT NULL,
[ChildID] UNIQUEIDENTIFIER NOT NULL,
[ChildType] INT NOT NULL
)
It has a non-clustered index with no included columns:
CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX]
ON [LargeTable]
(
[ChildID] ASC
)
(It is clustered on ID).
I wish to join this against a temporary table which contains a few thousand rows:
CREATE TABLE #temp
(
ChildID UNIQUEIDENTIFIER PRIMARY KEY,
ChildType INT
)
...add #temp data...
SELECT lt.ChildID, lt.ChildType
FROM #temp t
INNER JOIN [LargeTable] lt
ON lt.[ChildID] = t.[ChildID]
However the query plan includes an index scan on the large table:
If I change the index to include extra columns:
CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX] ON [LargeTable]
(
[ChildID] ASC
)
INCLUDE [ChildType]
Then the query plan changes to something more sensible:
So my question is: Why can't SQL Server still use an index seek in the first scenario, but with a RID lookup to get from the non-clustered index to the table data? Surely that would be more efficient than an index scan on such a large table?
The first query plan actually makes a lot of sense. Remember that SQL Server never reads records, it reads pages. In your table, a page contains many records, since those records are so small.
With the original index, if the second query plan would be used, after finding all the RID's in the index, and reading index pages to do so, pages in the clustered index need to be read to read the ChildType column. In a worst case scenario, that is an entire page for each record it needs to read. As there are many records per page, that might boil down to reading a large percentage of the pages in the clustered index.
SQL server guessed, based on statistics, that simply scanning the pages in the clustered index would require less page reads in total, because it then avoids reading the pages in the non-clustered index.
What matters here is the number of rows in the temp table compared to the number of pages in the large table. Assuming a random distribution of ChildID in the large table, as soon as the number of rows in the temp table approaches or supersedes the number of pages in the large table, SQL server will have to read virtually every page in the large table anyway.
Because the column ChildType isn't covered in an index, it has to go back to the clustered index (with the mentioned Row IDentifier lookup) to get the values for ChildType.
When you INCLUDE this column in the nonclustered index it will be added to the leaf-level of the index where it is available for querying.
Colloquially is called 'the index tipping point'. Basically, at what point does the cost based optimizer consider that is more effective to do a scan rather than seek + lookup. Usually is around 20% of the size, which in your case will base on an estimate coming from the #temp table stats. YMMV.
You already have your answer: include the required column, make the index covering.

Why does performance degrade when using a non-indexed field in the SELECT clause?

Consider these three queries:
select sampleno from sample
where markupdate > '1/1/2010'
select sampleno, markupdate from sample
where markupdate > '1/1/2010'
select sampleno, markuptime from sample
where markupdate > '1/1/2010'
sampleno and markupdate are indexed fields (sampleno is the primary key)
markuptime is not indexed
Queries 1 and 2 take about 1 second to run (returning 237K rows). Query 3 is still running after 3 minutes.
Why would the inclusion of a non-indexed field in the SELECT clause cause such a performance degradation?
This is a SQL 6.5 database.
A table's data (basically: all columns) is stored in a clustered index. A clustered index is a binary tree that allows a binary search on the indexed column(s). It is special (clustered) in that it contains all other columns at the leaf level. Usually, the clustered index is also the primary key. In your case, it's:
(sampleno) include (markupdate, markuptime, ...)
A non-clustered index contains the indexed column(s) and (at the leaf level) the clustered index. When you use a non-clustered index, the database has to look up all the other columns in the clustered index. That process is called a lookup. In your case, the non-clustered index on (markupdate) is:
(markupdate) include (sampleno)
This index contains all data for a query on markupdate, sampleno. The technical term for such an index is a covering index. But when you add markuptime to the query, the index is no longer covering. It has to look up the value for markuptime in the clustered index. And lookups are expansive.
Only your third query requires lookups. And that's why your third query is slower.

Resources