Clustered primary key on unique identifier ID column in SQL Server - sql-server

If your ID column on a table is a unique identifier (Guid), is there any point creating a clustered primary key on the ID column?
Given that they are globally unique, how would the sorting work?

GUIDs as they are are terrible for performance since they are effectively random values (this "breaks" clustered index), and they are awful for indexes, since less entries fit on a single page/extent (SQL Server terms). SQL Server 2005 introduces newsequentialid() which helps solving first problem.

I strongly advise against using clustered Guid key... We had big performance issues on SQL server because of such poor design a few years ago.
Also check out: Improving performance of cluster index GUID primary key

Putting a clustered index on a guid column is not such a good idea (unless you're making use of sequential guids).
The clustered index determines the physical order of how the records are stored.
This means that, if you put a clustered index on a column that does not sequentially grow, SQL Server will have some work making sure that the records are correctly ordered physically when you insert new records.

The idea of having a sorted index is very good in itself as searching then becomes very efficient.
The problem however is that in case of a GUID one never searches with "WHERE GUID = xyz". So the whole concept is wasted. So I would suggest to rather have a clustered index on a column which is used most often as a SARG to increase the query efficiency.

Related

Primary key indexes

Is searching a primary key column in a table faster than searching a non primary key column due to the default clustered index on primary keys in SQL Server?
from a previous question; "The reason we specify keys for a table is primarily to improve the data integrity and usefulness of the data. Keys guarantee the table is free from duplicate data and therefore they allow the user/consumer of the data to identify information correctly. DBMS query optimizers and storage engines are designed to take advantage of keys so having a key will also give your DBMS the best chance of executing some queries efficiently but there's no guarantee that adding a key will improve performance in every case"
That being said the existence of an Index should make searching faster in most cases, but is unrelated to the key per-se
Searching according to an index is generally faster than searching without an index (unless your table is extremely small, or the index is in a terrible condition). The fact that this index also supports a primary key is inconsequential to this discussion.
In general a relational database will be optimised to use primary keys efficiently, however your database structure, your query and which database engine on what platform will all be major factors in the performance.
Additionally, if the non-primary key column is not indexed, it will be orders of magnitude slower, regardless of your query.
Is searching a primary key column in a table faster than searching a
non primary key column due to the default clustered index on primary
keys in SQL Server?
Since the optimizer doesn't care whether an index supports a constraint or not, I'll paraphrase the question as:
Is searching using the clustered index key column(s) faster than searching via a
non-clustered index key column(s)?
A clustered index seek a disk-based table is the fastest method to return the entire row 1) for singleton lookups and 2) for key range searches. This minimizes the number of logical reads need to locate and retrieve rows.
SQL Server uses clustered as the default for primary key indexes to leverage this performance benefit and encourage the practice that all tables should have a clustered index. Also, since the clustered index key is implicitly included in all non-clustered indexes as the row locator, the likelihood that non-clustered indexes cover queries is increased.
This is not to say that the primary key should always be the clustered index. There may be a better choice when queries most often use other columns in join/where clause predicates.

What is the best choice for clustered index when the primary key can't be used?

Given an application where the primary key for most tables is uuid, what is the best choice for a clustered index on these tables?
The clustered index is required because the back-end database is Azure SQL Server and Azure requires a clustered index.
The uuid was a design choice to satisfy the need for n clients to create entities in a disconnected state and then sync when connected.
We are not using the primary key as the clustered index in order to avoid fragmentation issues due to the randomness of uuids.
Considering the following data types:
int or bigint. Easy - can be auto incrementing (good/bad) but seems so arbitrary and has limited utility. Feels most like a hack.
datetime - increased utility - could be a createdOnServer column. but would result in some dupes and thus require uniqueifiers (how big a problem this is, I don't know)
datetime2 - wider than datetime but greater precision and less dupes.
Looking for comments on which is best, things to consider, or alternative ideas.
I am not sure why you dismiss auto incrementing int. Anything that is widely used, works very well and as long as you are not planning on merging the table (via a Union) with other versions of the table why wouldn't you want to use it? It will provide a very good key (not too wide) for a balanced B Tree that clustered indexes are behind the curtain. Remember that in a table, that has a clustered index, all the other indexes will use the clustered index column to get to the desired page and row so you want to make it as small as possible.
If you don't want to create an index you can consider to upgrade/move to a V12 Azure SQL Database. These don't have the requirement for a clustered index.
On the index itself: You should create it on the columns that you query on. If you do lookups on the UUIDs + a date for example, you should create the index on the two to supported your queries.

Should every User Table have a Clustered Index?

Recently I found a couple of tables in a Database with no Clustered Indexes defined.
But there are non-clustered indexes defined, so they are on HEAP.
On analysis I found that select statements were using filter on the columns defined in non-clustered indexes.
Not having a clustered index on these tables affect performance?
It's hard to state this more succinctly than SQL Server MVP Brad McGehee:
As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index.
BOL echoes this sentiment:
With few exceptions, every table should have a clustered index.
The reasons for doing this are many and are primarily based upon the fact that a clustered index physically orders your data in storage.
If your clustered index is on a single column monotonically increases, inserts occur in order on your storage device and page splits will not happen.
Clustered indexes are efficient for finding a specific row when the indexed value is unique, such as the common pattern of selecting a row based upon the primary key.
A clustered index often allows for efficient queries on columns that are often searched for ranges of values (between, >, etc.).
Clustering can speed up queries where data is commonly sorted by a specific column or columns.
A clustered index can be rebuilt or reorganized on demand to control table fragmentation.
These benefits can even be applied to views.
You may not want to have a clustered index on:
Columns that have frequent data changes, as SQL Server must then physically re-order the data in storage.
Columns that are already covered by other indexes.
Wide keys, as the clustered index is also used in non-clustered index lookups.
GUID columns, which are larger than identities and also effectively random values (not likely to be sorted upon), though newsequentialid() could be used to help mitigate physical reordering during inserts.
A rare reason to use a heap (table without a clustered index) is if the data is always accessed through nonclustered indexes and the RID (SQL Server internal row identifier) is known to be smaller than a clustered index key.
Because of these and other considerations, such as your particular application workloads, you should carefully select your clustered indexes to get maximum benefit for your queries.
Also note that when you create a primary key on a table in SQL Server, it will by default create a unique clustered index (if it doesn't already have one). This means that if you find a table that doesn't have a clustered index, but does have a primary key (as all tables should), a developer had previously made the decision to create it that way. You may want to have a compelling reason to change that (of which there are many, as we've seen). Adding, changing or dropping the clustered index requires rewriting the entire table and any non-clustered indexes, so this can take some time on a large table.
I would not say "Every table should have a clustered index", I would say "Look carefully at every table and how they are accessed and try to define a clustered index on it if it makes sense". It's a plus, like a Joker, you have only one Joker per table, but you don't have to use it. Other database systems don't have this, at least in this form, BTW.
Putting clustered indices everywhere without understanding what you're doing can also kill your performance (in general, the INSERT performance because a clustered index means physical re-ordering on the disk, or at least it's a good way to understand it), for example with GUID primary keys as we see more and more.
So, read Tim Lehner's exceptions and reason.
Performance is a big hairy problem. Make sure you are optimizing for the right thing.
Free advice is always worth it's price, and there is no substitute for actual experimentation.
The purpose of an index is to find matching rows and help retrieve the data when found.
A non-clustered index on your search criteria will help to find rows, but there needs to be additional operation to get at the row's data.
If there is no clustered index, SQL uses an internal rowId to point to the location of the data.
However, If there is a clustered index on the table, that rowId is replaced by the data values in the clustered index.
So the step of reading the rows data would not be needed, and would be covered by the values in the index.
Even if a clustered index isn't very good at being selective, if those keys are frequently most or all of the results requested - it may be helpful to have them as the leaf of the non-clustered index.
Yes you should have clustered index on a table.So that all nonclustered indexes perform in better way.
Consider using a clustered index when Columns that contain a large number of distinct values so to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values
Disadvantage : It takes longer to update records if only when the fields in the clustering index are changed.
Avoid clustering index constructions where there is a risk that many concurrent inserts will happen on almost the same clustering index value
Searches against a nonclustered index will appear slower is the clustered index isn't build correctly, or it does not include all the columns needed to return the data back to the calling application. In the event that the non-clustered index doesn't contain all the needed data then the SQL Server will go to the clustered index to get the missing data (via a lookup) which will make the query run slower as the lookup is done row by row.
Yes, every table should have a clustered index. The clustered index sets the physical order of data in a table. You can compare this to the ordering of music at a store, by bands name and or Yellow pages ordered by a last name. Since this deals with the physical order you can have only one it can be comprised by many columns but you can only have one.
It’s best to place the clustered index on columns often searched for a range of values. Example would be a date range. Clustered indexes are also efficient for finding a specific row when the indexed value is unique. Microsoft SQL will place clustered indexes on a PRIMARY KEY constraint automatically if no clustered indexes are defined.
Clustered indexes are not a good choice for:
Columns that undergo frequent changes
This results in the entire row moving (because SQL Server must keep
the data values of a row in physical order). This is an important
consideration in high-volume transaction processing systems where
data tends to be volatile.
Wide keys
The key values from the clustered index are used by all
nonclustered indexes as lookup keys and therefore are stored in each
nonclustered index leaf entry.

Difference between clustered and nonclustered index [duplicate]

This question already has answers here:
What are the differences between a clustered and a non-clustered index?
(13 answers)
Closed 7 years ago.
I need to add proper index to my tables and need some help.
I'm confused and need to clarify a few points:
Should I use index for non-int columns? Why/why not
I've read a lot about clustered and non-clustered index yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about and how can I know that it is all good before going to test phase?
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL server sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book.
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way!
One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, that is a lookup table etc.) should have a clustering key. There's really no point not to have a clustering key. Actually, contrary to common believe, having a clustering key actually speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).
Kimberly Tripp, the Queen of Indexing has a great many excellent articles on the topic of why to have a clustering key, and what kind of columns to best use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.
GUIDs as PRIMARY KEY and/or clustered key
The clustered index debate continues
Ever-increasing clustering key - the Clustered Index Debate..........again!
Disk space is cheap - that's not the point!
Marc
You should be using indexes to help SQL server performance. Usually that implies that columns that are used to find rows in a table are indexed.
Clustered indexes makes SQL server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However if the column(s) that have a clustered index is frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Having many indexes is not good either. They cost to maintain. So start out with the obvious ones, and then profile to see which ones you miss and would benefit from. You do not need them from start, they can be added later on.
Most column datatypes can be used when indexing, but it is better to have small columns indexed than large. Also it is common to create indexes on groups of columns (e.g. country + city + street).
Also you will not notice performance issues until you have quite a bit of data in your tables. And another thing to think about is that SQL server needs statistics to do its query optimizations the right way, so make sure that you do generate that.
A comparison of a non-clustered index with a clustered index with an example
As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the
EmployeeID
AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like
EmployeeName, EmployeeAddress, etc
. will all actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.
In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)
SELECT * FROM FOO WHERE FOO.BAR = 2
You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example if you have the query
SELECT * FROM FOO ORDER BY FOO.BAR ASCENDING
You might want to consider a clustered index on FOO.BAR.
Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Studio can give you suggestions on where to optimize, and MSDN has some information1 that you might find useful
faster to read than non cluster as data is physically storted in index order
we can create only one per table.(cluster index)
quicker for insert and update operation than a cluster index.
we can create n number of non cluster index.

Sql Server 2005 novice query

I am very beginner in SQL Server 2005 and I am learning it from online tutorial, here is some of my question:
1: What is the difference between Select * from XYZ and Select ALL * from XYZ.
2: The purpose of Clustered index is like to make the search easier by physically sorting the table [as far as I kknow :-)]. Let say if have primary column on a table than is it good to create a clustered index on the table? because we have already a column which is sorted.
3: Why we can create 1 Clustered Index + 249 Nonclustered Index = 250 Index on a table? I understand the requirement of 1 clustered index. But why 249?? Why not more than 249?
No difference SELECT ALL is the default as opposed to SELECT DISTINCT
Opinion varies. For performance reasons Clustered indexes should ideally be small, stable, unique, and monotonically increasing. Primary keys should also be stable and unique so there is an obvious fit there. However clustered indexes are well suited for range queries. Looking up individual records by PK can perform well if the PK is nonclustered so some authors suggest not "wasting" the clustered index on the PK.
In SQL Server 2008 you can create up to 999 NCIs on a table. I can't imagine ever doing so but I think the limit was raised as potentially with "filtered indexes" there might be a viable case for this many. Indexes add a cost to data modification operations though as the changes need to be propagated in multiple places so I would imagine it would only be largely read only (e.g. reporting) databases that ever achieve even double figures of non clustered non filtered indexes.
For 3:
Everytime when you insert/delete record in the table ALL indexes must be updated. If you will have too many indexes it takes too long time.
If your table have more then 5-6 indexes I think you need take the time and check yourself.

Resources