I have a table that doesn't have a primary key, and it already contains data. I created a non-clustered index on it, but when I run my query, the actual execution plan does not show the index being scanned. I think the non-clustered index is not working. What could be the reason? Please help.
First of all - why isn't there a primary key?? If it doesn't have a primary key, it's not a table - just add one! That will help on so many levels....
Secondly: even if you have an index, SQL Server query optimizer will always look at your query to decide whether it makes sense to use the index (or not). If you select all columns, and a large portion of the rows, then using an index is pointless.
So things to avoid are:
SELECT * FROM dbo.YourTable is almost guaranteed not to use any indices
if you don't have a good WHERE clause in your query
if your index is on a column that doesn't really select a small percentage of data; an index on a boolean column, or a Gender column with at most three different values doesn't help at all
Without knowing a lot more about your table structure, the data contained in those tables, the number of rows, and what kind of queries you're executing, no one can really answer your question - it's just way too broad....
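A minimal sketch of the difference between those cases, assuming a hypothetical heap table dbo.Customers with a highly selective Email column (none of these names come from the question, and the optimizer's actual choice depends on row counts and statistics):

```sql
-- Hypothetical table with no primary key (a heap); all names are illustrative only.
CREATE TABLE dbo.Customers (
    CustomerId INT IDENTITY(1, 1) NOT NULL,
    Email      VARCHAR(200) NOT NULL,
    Gender     CHAR(1) NOT NULL,
    Notes      VARCHAR(1000) NULL
);

CREATE NONCLUSTERED INDEX IX_Customers_Email
    ON dbo.Customers (Email);

-- Unlikely to use the index: every column of every row is requested.
SELECT * FROM dbo.Customers;

-- A good candidate for an index seek: selective predicate, narrow column list.
-- CustomerId is not in the index, so expect a seek plus a RID lookup on a heap.
SELECT CustomerId, Email
FROM dbo.Customers
WHERE Email = 'someone@example.com';
```

Comparing the actual execution plans of the two SELECT statements on a table with realistic data volumes is usually the quickest way to see whether the index is being considered at all.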
Update: if you want to create a clustered index on a table which is different from your primary key, do these steps:
1) First, design your table
2) Then open up the index designer - create a new, clustered index on a column of your choice. Mind you - this is NOT the primary key !
3) After that, you can put your primary key on the ID column - it will create an index, but that index is not clustered !
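If you prefer a script to the designer, the same three steps might look roughly like this. The table dbo.Orders, the OrderDate column and the constraint names are assumptions made for the sketch, not something from the question:

```sql
-- Step 1: design the table (hypothetical columns, illustrative names only).
CREATE TABLE dbo.Orders (
    ID        INT IDENTITY(1, 1) NOT NULL,
    OrderDate DATETIME NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL
);

-- Step 2: clustered index on a column of your choice (NOT the primary key).
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- Step 3: primary key on the ID column, explicitly nonclustered.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (ID);
```

Spelling out NONCLUSTERED on the primary key is optional once a clustered index already exists, but it makes the intent obvious to whoever reads the script later.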
Without having any more information I'd guess that the reason is that the table is too small for an index seek to be worth it.
If your table has fewer than a few thousand rows, then SQL Server will almost always choose to do a table / index scan regardless of the indexes on that table, simply because a scan is in fact faster.
An index scan in itself doesn't necessarily indicate a performance problem - is the query actually slow?
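One way to check both points (table size and whether the query is actually slow) is sketched below. The columns on dbo.YourTable are placeholders I made up; substitute your own table and query:

```sql
-- Hypothetical stand-in for your table; column names are assumptions.
CREATE TABLE dbo.YourTable (
    Id         INT NOT NULL,
    SomeColumn INT NOT NULL
);

CREATE NONCLUSTERED INDEX IX_YourTable_SomeColumn
    ON dbo.YourTable (SomeColumn);

-- How many rows are there? A small table rarely justifies an index seek.
SELECT COUNT(*) AS TotalRows FROM dbo.YourTable;

-- Report logical reads and elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Id, SomeColumn
FROM dbo.YourTable
WHERE SomeColumn = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the logical reads and elapsed time are already small, the scan is probably not worth worrying about.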
THE SITUATION
I Have a table with only one index, a Clustered index (two columns).
I do a 'SELECT * FROM TABLE' and the optimizer decides a Table scan.
I get the rows kinda sorted by clustered index. I say kinda because it doesn't look randomly sorted, but it has a lot of glitches.
If I force use of the clustered index with SELECT * FROM TABLE (index 1 MRU), I get exactly the clustered table order.
QUESTIONS
how can the table scan result be different in order than clustered index scan if the data in a clustered table is sorted by its index?
Is the table scan in a clustered index a scan to the leaf level of the table, aren't those sorted?
Is the clustered index scan a scan to all the possible paths of the b-tree in an ordered manner?
Excuse my possible lack of knowledge; I'm trying my best to understand the underlying concepts.
HOW I TESTED THIS
I got these inconsistent ordering results by testing two different clustered indexes (one with two columns and the other with one column), creating and dropping the constraint and checking the select statement each time.
After truncating the table and creating the index, the data is correctly sorted, but after dropping the index and creating a different one, the data is not perfectly sorted with a table scan. I need to force index use.
WHY IS THIS IMPORTANT
Because I want to guarantee order without using an order by clause in a clustered table.
On 15.0 and upwards ALWAYS specify an order by if you want a specific order as the structure of the data and index varies between allpages and data only locked (DOL) tables.
The optimizer may choose to do parts of the query retrieval in parallel under the covers for example depending on your parallelism settings which is why the order by is important. Just saying select * hasn't requested any specific order.
Just add the order by and you'll be fine because the select * is going to tablescan anyway as you're asking for the whole table and therefore no need for index hints.
THE EXPLANATION
Clustered indexes are logically ordered but not physically ordered.
This means that a table scan if it's done in physical order will return different results than clustered index scan, which is sorted logically.
This logical-physical mapping is controlled by OAM (Object Allocation Map)
I have a database where all tables include a Site column (char(4)) and a PrimaryId column (int).
Currently the clustered index on all tables is the combination of these two columns. Many customers only have one site so in those cases I think it definitely makes sense to change the clustered index to only include the PrimaryId.
In cases where there are multiple sites though, I'm wondering whether it would still be advantageous to only use the PrimaryId as the clustered index? Might having a smaller clustered index produce better performance than having a unique one?
In case it's relevant, there are generally not going to be more than a few sites. 10 sites would be a lot.
The answer is simple: a UNIQUE index is always better than a NON-UNIQUE one. There is some maths behind it, but the greater the uniqueness, the faster the server can look up a record from the index.
A CLUSTERED index is great because it physically orders the records on disk, and it is always a good idea to put the CLUSTERED INDEX on UNIQUE keys.
A CLUSTERED INDEX on the PRIMARY KEY gives very good performance with large data. If the table does not hold much data, it will not matter much.
I have recently read an article about how nonclustered indexes match table rows. I will try to summarize what I believe is relevant to your question.
There are two types of tables (in the context of indexes):
heap - a table without clustered index
clustered index - a table with clustered index
In the first case, a nonclustered index matches rows using RID-based (row ID) bookmarks, which have the following format:
file number - page number - row number
[Figure not reproduced: a nonclustered index on a heap, with the RID bookmark highlighted in red.]
Generally speaking, the rows of a heap do not move; once they have
been inserted into a page they remain on that page. To be more
technically-precise: rows in a heap seldom move, and when they do
move, they leave a forwarding address at the old location. The rows of
a clustered index, however, can move; that is, they can be relocated
to another page during data modification or index reorganization.
In the second case, the nonclustered index uses the index key of the clustered index as a bookmark, and the clustered index itself should meet several criteria:
it must be unique
it should be short
it should be static
I am going to describe the first criterion (the others are described in the link below):
Each index entry bookmark must allow SQL Server to find the one row in
the table that corresponds to that entry. If you create a clustered
index that is not unique, SQL Server will make the clustered index
unique by generating an additional value that "breaks the tie" for
duplicate keys. This extra value, generated by SQL Server to create
uniqueness, is called the uniquifier and is transparent to any client
application. You should carefully consider whether or not to allow
duplicates in a clustered index, for the following reasons:
Generating uniquifiers is extra overhead. SQL Server must decide, at
insert time, if a new row's key is a duplicate of an existing row's
key; and, if so, generate a uniquifier value to add to the new row.
The uniquifier is a meaningless piece of information; a meaningless
piece of information that is being propagated into the table's
nonclustered indexes. It's usually better to propagate a meaningful
piece of information into the nonclustered indexes.
The whole article can be found here.
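Applied to the Site / PrimaryId question, the two options could be sketched as follows. The table name and the Payload column are made up, and the sketch assumes PrimaryId on its own may repeat across sites, which the question does not state:

```sql
-- Hypothetical table modelled on the question: Site char(4), PrimaryId int.
CREATE TABLE dbo.SiteData (
    Site      CHAR(4) NOT NULL,
    PrimaryId INT NOT NULL,
    Payload   VARCHAR(100) NULL
);

-- Option 1: unique composite clustering key. It is wider (8 bytes per key),
-- but no uniquifier is needed and every nonclustered index carries a
-- meaningful bookmark.
CREATE UNIQUE CLUSTERED INDEX CIX_SiteData_Site_PrimaryId
    ON dbo.SiteData (Site, PrimaryId);

-- Option 2 (instead of option 1): narrower key that is not declared unique.
-- Rows that share a PrimaryId value get a hidden uniquifier added.
-- DROP INDEX CIX_SiteData_Site_PrimaryId ON dbo.SiteData;
-- CREATE CLUSTERED INDEX CIX_SiteData_PrimaryId
--     ON dbo.SiteData (PrimaryId);
```

If PrimaryId happens to be unique on its own, declaring the narrower index as UNIQUE would avoid the uniquifier altogether.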
I've got a very simple table which stores Titles for people ("Mr", "Mrs", etc). Here's a brief version of what I'm doing (using a temporary table in this example, but the results are the same):
create table #titles (
    t_id  tinyint not null identity(1, 1),
    title varchar(20) not null,
    constraint pk_titles primary key clustered (t_id),
    constraint ux_titles unique nonclustered (title)
)
go
insert #titles values ('Mr')
insert #titles values ('Mrs')
insert #titles values ('Miss')
select * from #titles
drop table #titles
Notice that the primary key of the table is clustered (explicitly, for the sake of the example) and there's a non-clustered uniqueness constraint on the title column.
Here's the results from the select operation:
t_id title
---- --------------------
3 Miss
1 Mr
2 Mrs
Looking at the execution plan, SQL uses the non-clustered index over the clustered primary key. I'm guessing this explains why the results come back in this order, but what I don't know is why it does this.
Any ideas? And more importantly, any way of stopping this behavior? I want the rows to be returned in the order they were inserted.
Thanks!
If you want order, you need to specify an explicit ORDER BY - anything else does not produce an order (its "order" is random and could change). There is no implied ordering in SQL Server - not by anything. If you need order - say so with ORDER BY.
SQL Server probably uses the non-clustered index (if it can - if that index has all the columns your query is asking for) since it is smaller - usually just the index column(s) and the clustering key (again: one or multiple columns). The clustered index on the other hand is the whole data (at the leaf level), so it might require a lot more data to be read, in order to get your answer (not in this over-simplified example, of course - but in the real world).
The only way to (absolutely and correctly) guarantee row order is to use ORDER BY -- anything else is an implementation detail and apt to explode, as demonstrated.
As to why the engine chose the unique index: it just didn't matter.
There were no criteria favoring one index over another
The unique index covered the data (title and PK) returned; this is somewhat speculative on my part, but SQL Server is doing what it thinks best.
Try it on a table with an additional column which is not covered -- no bets, but it may make the query planner change its mind.
Happy coding.
SQL Server probably chose the non-clustered index because all the data you requested (the id and title) could be retrieved from that index.
For such a trivial table it doesn't really matter which access path was chosen, as the worst path is still only two IOs.
As someone commented above, if you want your data in a particular order you must specifically request this using the "ORDER BY" clause; otherwise it's pretty random what you get back.
Nonclustered indexes are usually smaller than clustered ones so it is usually faster to scan a nonclustered index rather than a clustered one. That probably explains SQL Server's preference for a nonclustered index, even though in your case the indexes are the same size.
The only way to guarantee the order of rows returned is to specify ORDER BY. If you don't specify ORDER BY then you are implicitly telling the optimizer that it can choose what order to return the rows in.
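For the #titles example from the question, that could look like the sketch below. It assumes that "insertion order" can be taken from the identity column t_id, which holds for this single-session script but is my assumption rather than a general guarantee:

```sql
create table #titles (
    t_id  tinyint not null identity(1, 1),
    title varchar(20) not null,
    constraint pk_titles primary key clustered (t_id),
    constraint ux_titles unique nonclustered (title)
)

insert #titles values ('Mr')
insert #titles values ('Mrs')
insert #titles values ('Miss')

-- Explicit ordering: whichever index the optimizer reads, the rows
-- come back as 1 Mr, 2 Mrs, 3 Miss.
select t_id, title
from #titles
order by t_id

drop table #titles
```

With the ORDER BY in place, the optimizer is still free to pick either index; it just has to deliver the rows sorted by t_id.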
I need to add proper index to my tables and need some help.
I'm confused and need to clarify a few points:
Should I use index for non-int columns? Why/why not
I've read a lot about clustered and non-clustered index yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about and how can I know that it is all good before going to test phase?
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL server sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book.
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way!
One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, that is a lookup table etc.) should have a clustering key. There's really no point not to have a clustering key. Actually, contrary to common belief, having a clustering key actually speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).
Kimberly Tripp, the Queen of Indexing has a great many excellent articles on the topic of why to have a clustering key, and what kind of columns to best use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.
GUIDs as PRIMARY KEY and/or clustered key
The clustered index debate continues
Ever-increasing clustering key - the Clustered Index Debate..........again!
Disk space is cheap - that's not the point!
Marc
You should be using indexes to help SQL server performance. Usually that implies that columns that are used to find rows in a table are indexed.
Clustered indexes make SQL Server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However, if the column(s) that have a clustered index are frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Having many indexes is not good either. They cost to maintain. So start out with the obvious ones, and then profile to see which ones you miss and would benefit from. You do not need them from start, they can be added later on.
Most column datatypes can be used when indexing, but it is better to have small columns indexed than large. Also it is common to create indexes on groups of columns (e.g. country + city + street).
Also you will not notice performance issues until you have quite a bit of data in your tables. And another thing to think about is that SQL server needs statistics to do its query optimizations the right way, so make sure that you do generate that.
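A rough sketch of the last two points, a multi-column index and keeping statistics fresh. The dbo.Addresses table and its columns are assumptions picked to match the country + city + street example:

```sql
-- Hypothetical table; names and sizes are illustrative only.
CREATE TABLE dbo.Addresses (
    AddressId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Country   VARCHAR(50) NOT NULL,
    City      VARCHAR(50) NOT NULL,
    Street    VARCHAR(100) NOT NULL
);

-- One index over a group of columns, most general column first.
CREATE NONCLUSTERED INDEX IX_Addresses_Country_City_Street
    ON dbo.Addresses (Country, City, Street);

-- Refresh all statistics on the table (normally auto-update handles this).
UPDATE STATISTICS dbo.Addresses;

-- See when each statistics object on the table was last updated.
SELECT name, STATS_DATE(object_id, stats_id) AS LastUpdated
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Addresses');
```

The column order matters: this index helps queries that filter on Country, or on Country and City, but not ones that filter on Street alone.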
A comparison of a non-clustered index with a clustered index with an example
As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID - so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.
In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)
SELECT * FROM FOO WHERE FOO.BAR = 2
You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example if you have the query
SELECT * FROM FOO ORDER BY FOO.BAR ASC
You might want to consider a clustered index on FOO.BAR.
Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Management Studio can give you suggestions on where to optimize, and MSDN has some information that you might find useful.
Clustered index:
faster to read than a non-clustered index, as data is physically stored in index order
only one can be created per table
Non-clustered index:
quicker for insert and update operations than a clustered index
many can be created per table
I am a beginner in SQL Server 2005 and I am learning it from an online tutorial. Here are some of my questions:
1: What is the difference between Select * from XYZ and Select ALL * from XYZ?
2: The purpose of a clustered index is to make searching easier by physically sorting the table (as far as I know :-)). Let's say I have a primary key column on a table; is it good to create a clustered index on the table, given that we already have a column which is sorted?
3: Why can we create 1 clustered index + 249 nonclustered indexes = 250 indexes on a table? I understand the requirement of 1 clustered index. But why 249? Why not more than 249?
1: No difference. SELECT ALL is the default, as opposed to SELECT DISTINCT.
2: Opinion varies. For performance reasons, clustered indexes should ideally be small, stable, unique, and monotonically increasing. Primary keys should also be stable and unique, so there is an obvious fit there. However, clustered indexes are well suited for range queries. Looking up individual records by PK can perform well if the PK is nonclustered, so some authors suggest not "wasting" the clustered index on the PK.
3: In SQL Server 2008 you can create up to 999 NCIs (nonclustered indexes) on a table. I can't imagine ever doing so, but I think the limit was raised because with "filtered indexes" there might be a viable case for this many. Indexes add a cost to data modification operations, though, as the changes need to be propagated in multiple places, so I would imagine it would only be largely read-only (e.g. reporting) databases that ever reach even double figures of nonclustered, non-filtered indexes.
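For the first question, a small sketch that shows ALL and DISTINCT side by side. The table variable @XYZ and its Val column are stand-ins I made up for the XYZ table in the question:

```sql
-- Hypothetical stand-in for XYZ, with one duplicated value.
DECLARE @XYZ TABLE (Val INT NOT NULL);

INSERT INTO @XYZ (Val) VALUES (1);
INSERT INTO @XYZ (Val) VALUES (1);
INSERT INTO @XYZ (Val) VALUES (2);

SELECT * FROM @XYZ;           -- 3 rows: ALL is implied
SELECT ALL * FROM @XYZ;       -- 3 rows: identical to the query above
SELECT DISTINCT * FROM @XYZ;  -- 2 rows: duplicates removed
```

Single-row INSERT statements are used so that the script also runs on SQL Server 2005, which does not accept multi-row VALUES lists.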
For 3:
Every time you insert/delete a record in the table, ALL indexes must be updated. If you have too many indexes, that takes too long.
If your table has more than 5-6 indexes, I think you need to take the time to check whether you really need them all.
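To find the tables worth reviewing, something along these lines could be used. The threshold of 5 simply mirrors the rule of thumb above and is not an official limit:

```sql
-- List user tables that carry more than 5 indexes.
SELECT  SCHEMA_NAME(t.schema_id) AS SchemaName,
        t.name                   AS TableName,
        COUNT(*)                 AS IndexCount
FROM    sys.tables  AS t
JOIN    sys.indexes AS i
        ON i.object_id = t.object_id
WHERE   i.index_id > 0          -- index_id 0 is the heap itself, not an index
GROUP BY t.schema_id, t.name
HAVING  COUNT(*) > 5
ORDER BY IndexCount DESC;
```

The count includes the clustered index and the indexes that back primary key and unique constraints, so a table showing up here is a prompt to look closer, not proof of a problem.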