In SQL Server, I have a non-nullable column with a unique clustered index on it.
If I make this column a Primary Key, the exact same index is created automatically, and
the column is recognized as a Primary Key.
I understand the abstract/semantic difference.
(The Primary Key identifies the entity, while any other column with such an index may not.
For example, a Person can have an Email field which is Unique and Non-nullable... but can be changed.)
But what bothers me is the actual difference when it comes to the DB engine itself.
What will happen if I just create an Id column, make it non-nullable, create a unique clustered index on it, and make it an Identity column, but without the Primary Key constraint?
In what scenarios does the Primary Key constraint come into play?
(I've looked at many related questions before asking this, but all the answers I saw ended up with an abstract/theoretical explanation).
Nothing will be different, really. You specify PRIMARY KEY to relay your intentions, not so that the engine does anything differently. When constructing a query plan, the optimizer will still use the uniqueness for all of its properties, and will still use the clustered index for all of its properties, regardless of whether you technically created it as a PRIMARY KEY. When creating a FOREIGN KEY, you can still reference the column(s) specified as unique (clustered or not).
The difference is solely in the metadata (sys.indexes.is_primary_key) and in SSMS' representation to you (oh, and the fact that you can create a unique clustered index on a NULLable column, but you can't create a PRIMARY KEY on that column).
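To make that concrete, here's a minimal sketch (table, column and constraint names are invented) of the question's setup: no PRIMARY KEY, just a unique clustered index, and a foreign key can still reference it.

CREATE TABLE dbo.Person
(
    Id INT IDENTITY(1,1) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX UX_Person_Id ON dbo.Person (Id);

-- A FOREIGN KEY can reference the unique index even though there is no PK:
CREATE TABLE dbo.PersonPhone
(
    PersonId INT NOT NULL
        CONSTRAINT FK_PersonPhone_Person FOREIGN KEY REFERENCES dbo.Person (Id),
    Phone VARCHAR(32) NOT NULL
);

-- The only visible difference is in the metadata:
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Person');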
In fact there are many cases where you want to completely separate the clustered index from the PRIMARY KEY. If you have a table where the PK is a GUID, for example, and you are typically running date range queries against the table, you are probably better off having the PK be non-clustered and have a clustered index on a naturally increasing column (the datetime column) - both to minimize page splits on heavy insert activity and also to best assist date range queries. The non-clustered index will be perfectly fine for looking up individual GUIDs. (I wanted to mention that because a lot of people think the primary key has to be clustered. Not true.)
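A hedged sketch of that layout (names invented): the GUID PK is declared NONCLUSTERED and the clustered index goes on the datetime column instead.

CREATE TABLE dbo.Orders
(
    OrderId   UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    CreatedAt DATETIME2 NOT NULL
);

-- Cluster on the naturally increasing column: inserts append to the end of the
-- table and date range queries scan a contiguous range of pages.
CREATE CLUSTERED INDEX CIX_Orders_CreatedAt ON dbo.Orders (CreatedAt);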
Also interesting to note that if you create a PRIMARY KEY constraint, then create a unique clustered index with the same name using DROP_EXISTING, the is_primary_key column will still be 1 and Object Explorer will still show the index name under Keys.
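You can verify that with something like the following (hypothetical table; the index name matches the constraint name):

CREATE TABLE dbo.Demo
(
    Id INT NOT NULL CONSTRAINT PK_Demo PRIMARY KEY CLUSTERED
);

-- Recreate the same index in place, under the same name and definition.
CREATE UNIQUE CLUSTERED INDEX PK_Demo ON dbo.Demo (Id)
    WITH (DROP_EXISTING = ON);

-- is_primary_key is still 1, and Object Explorer still lists PK_Demo under Keys.
SELECT name, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Demo');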
Here is one scenario - a lot of code-to-data mapping frameworks (ORMs) look at the database metadata (what the primary keys, foreign keys, etc. are) to determine how code is executed. For example, Hibernate requires a primary key.
A typical scenario might be generating a WHERE clause for an UPDATE.
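As an illustration of the kind of metadata such a framework reads, a PRIMARY KEY (but not a bare unique index) shows up in the standard INFORMATION_SCHEMA views:

-- Which columns make up the primary key of a given (hypothetical) table?
SELECT kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS kcu
    ON kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
   AND kcu.TABLE_NAME = tc.TABLE_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND tc.TABLE_NAME = 'Person'
ORDER BY kcu.ORDINAL_POSITION;

If the table only has a unique clustered index and no PRIMARY KEY constraint, this query returns nothing, and the framework has no key columns from which to generate that WHERE clause.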
Related
I have a table with a composite primary key of 7 fields, but the table is allowing duplicate entries for the primary key. Later I noticed it also has a unique clustered index with 10 fields, including the 7 of the primary key. Is that the reason the system allows inserting rows with duplicate primary key values?
If so, I cannot think of a reason for creating a unique index with additional fields that are not much used for searching data, except perhaps a limit on the number of fields in a composite key. I tried to look for an answer but didn't find anything about such a limitation. Can someone please help? I am using Sybase.
Performance-wise, a clustered key, set correctly (i.e. one that does not include all the fields you deem unique), gives you the most advantage.
A 7-column PK is a strange construct; if you need to guard uniqueness, I would go for a combination of a clustered index + a unique constraint.
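As a rough sketch of that split (column names are placeholders, and the syntax below is SQL Server-flavoured; Sybase ASE syntax differs slightly):

-- Narrow clustered index on the columns actually used to search the table.
CREATE CLUSTERED INDEX CIX_SalesDetail_Search ON dbo.SalesDetail (CustomerId, OrderDate);

-- Uniqueness guarded separately, over the full business key.
ALTER TABLE dbo.SalesDetail
    ADD CONSTRAINT UQ_SalesDetail_BusinessKey
        UNIQUE (Col1, Col2, Col3, Col4, Col5, Col6, Col7);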
I have a unique constraint on 5 nullable columns that together represent the identifier of one row.
Is it okay to create a unique key and a clustered index on it instead of a primary key? I cannot use a primary key on these columns because they are nullable, and I cannot create an identity column because there are a lot of deletes and inserts and it would overflow the identity column.
Yes, and there's an argument that this is actually "better" than a primary key, as the rule that a primary key column must be non-nullable is in many ways an artificial constraint.
If you make it a UNIQUE CLUSTERED INDEX then you get just about everything that a primary key brings to the table, except the unwanted rule that the columns must be non-nullable. However, they must still be unique, so you could only ever have one row where all five columns in your index are NULL, for example.
So you can use your index when creating foreign key constraints, you will guarantee the order the data is stored in, and each row must be unique. However, the index probably won't be incredibly useful for querying, and, because it's going to be wide and you said there are a lot of deletes/inserts, it will have a tendency to fragment your data.
Personally, I would be tempted to make it a unique constraint, but not clustered. Then it will do the job of keeping non-unique data from being created.
You could then add a surrogate key and make this the primary key. I doubt you would ever "run out" (or "overflow"?) of numbers doing this.
So why would I use a surrogate key?
Your surrogate key will be much narrower, so less impact from fragmentation due to so many inserts/updates/deletes.
It's then useful if you need to extend your database. Say you only have one table, and this is always going to be the only table in the entire database. In this one scenario it would make sense to not bother with a surrogate key. It doesn't give you any value; it's just an unnecessary overhead.
However, let's assume that you have other tables hanging off your "main" table (the one with 5 columns forming a unique key). Adding a surrogate key here allows you to make any child tables with a single id that links back to the parent table. The alternative would be to enforce the addition of ALL five columns forming the unique (candidate) key every time you create a child table.
Now you have a narrow clustered index that actually serves a purpose, and the fragmentation will not be quite as bad as it would with five columns.
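A minimal sketch of that final shape (column names and types are invented; BIGINT is used in case the insert/delete volume really could exhaust an INT):

CREATE TABLE dbo.Measurement
(
    MeasurementId BIGINT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Measurement PRIMARY KEY CLUSTERED,   -- narrow surrogate clustered key
    Col1 INT NULL,
    Col2 INT NULL,
    Col3 INT NULL,
    Col4 INT NULL,
    Col5 INT NULL,
    -- The natural key stays unique, but non-clustered; note that SQL Server
    -- allows at most one row where all five columns are NULL.
    CONSTRAINT UQ_Measurement_NaturalKey UNIQUE NONCLUSTERED (Col1, Col2, Col3, Col4, Col5)
);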
I am doing a review of some DB tables that were created in our project and came across this. The table contains an Identity column (ID) which is the primary key for the table, and a clustered index has been defined on this ID column. But when I look at the SPROC that retrieves records from this table, I see that the ID column is never used in the query; the records are queried based on a USERID column (which is not unique), and there can be multiple records for the same USERID.
So my question is: is there any advantage/purpose in creating a clustered index when we know that the records won't be queried by that column?
If the IDENTITY column is never used in WHERE and JOIN clauses, or referenced by foreign keys, perhaps USERID should be a clustered primary key. I would question the need for the ID column at all in that case.
The best choice for the clustered index depends much on how the table is queried. If the majority of queries are by USERID, then it should probably be a unique clustered index (or clustered unique constraint) and the ID column non-clustered.
Keep in mind that the clustered index key is implicitly included in all non-clustered indexes as the row locator. The implication is that non-clustered indexes are more likely to cover queries, but their leaf-level pages are wider as a result.
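A hedged sketch of that arrangement (hypothetical table name; ID is appended to USERID to keep the clustered key unique), assuming the existing clustered primary key has been dropped first:

-- The primary key stays on ID, but as a non-clustered index.
ALTER TABLE dbo.UserActivity
    ADD CONSTRAINT PK_UserActivity PRIMARY KEY NONCLUSTERED (ID);

-- The clustered index leads with the column the queries actually filter on.
CREATE UNIQUE CLUSTERED INDEX CIX_UserActivity_UserId
    ON dbo.UserActivity (USERID, ID);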
I would say your table is mis-designed. Someone apparently thought every table needs a primary key and the primary key is the clustered index. Adding a system-generated unique number as an identifier just adds noise if that number isn't used anywhere. Noise in the clustered index is unhelpful, to say the least.
They are different concepts, by the way. A primary key is a data modeling concern, a logical concept. An index is a physical design issue. A SQL DBMS must support primary keys, but need not have any indexes, clustered or no.
If USERID is what is usually used to search the table, it should be in your clustered index. The clustered index need not be unique and need not be the primary key. I would look at the data carefully to see if some combination of USERID and another column (or two, or more) form a unique identifier for the row. If so, I'd make that the primary key (and clustered index), with USERID as the first column. If query analysis showed that many queries use only USERID and nothing else (for existence testing) I might create a separate index just of USERID.
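As an alternative to the sketch above (again with invented columns), if USERID plus, say, an activity date uniquely identify a row, the natural key itself becomes the clustered primary key:

ALTER TABLE dbo.UserActivity
    ADD CONSTRAINT PK_UserActivity
        PRIMARY KEY CLUSTERED (USERID, ActivityDate);

-- Only if analysis shows many existence checks on USERID alone:
-- CREATE INDEX IX_UserActivity_UserId ON dbo.UserActivity (USERID);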
If no combination of columns constitutes a unique identifier, you have a logical problem, to wit: what does the row mean? What aspect of the real world does it represent?
A basic tenet of the Relational Model is that elements in a relation (rows in a table) are unique, that each one identifies something. If two rows are identical, they identify the same thing. What does it mean to delete one of them? Is the thing that they both identify still there, or not? If it is, what purpose did the 2nd row serve?
I hope that gives you another way to think about clustered indexes and keys. I wouldn't be surprised if you find other tables that could be improved, too.
What meaning does the concept of a primary key have to the database engine of SQL Server? I don't mean the clustered/non-clustered index created on the "ID" column, I mean the constraint object "primary key". Does it matter if it exists or not?
Alternatives:
ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY CLUSTERED (...)
CREATE CLUSTERED INDEX ... ON ... (...)
Does it make a difference?
In general, a KEY is a column (or combination of columns) that uniquely identifies each row in the table. It is possible to have multiple KEYs in a table (for example, you might have a Person table where both the social security number and an auto-incrementing number are KEYs).
The database designer chooses one of these KEYs to be the PRIMARY KEY. Conceptually, it does not matter which KEY is chosen as the PRIMARY KEY. However, since the PRIMARY KEY is usually used to refer to entries in this table from other tables (through FOREIGN KEYs), choosing a good PRIMARY KEY can be relevant w.r.t. (a) performance and (b) maintainability:
(a) Since the primary key will usually be used in JOINs, the index on the primary key (its size, its distribution, ...) is much more relevant to performance than other indexes.
(b) Since the primary key is used as a foreign key in other tables, changing the primary key value is always a hassle, since all the foreign key values in the other tables need to be modified as well.
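A small sketch of the Person example above (columns invented): both columns are KEYs, one is chosen as the PRIMARY KEY, the other remains an alternate key via a UNIQUE constraint, and other tables reference the chosen PRIMARY KEY.

CREATE TABLE dbo.Person
(
    PersonId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Person PRIMARY KEY,                 -- the chosen KEY
    Ssn      CHAR(11) NOT NULL
        CONSTRAINT UQ_Person_Ssn UNIQUE,                  -- the other KEY
    FullName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.BankAccount
(
    AccountId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_BankAccount PRIMARY KEY,
    PersonId  INT NOT NULL
        CONSTRAINT FK_BankAccount_Person FOREIGN KEY REFERENCES dbo.Person (PersonId)
);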
A PRIMARY KEY is a constraint - this is a logical object that says something about the rules that your data must adhere to. An index is an access structure - it says something about the way the machine can search through the data. To implement a PRIMARY KEY, most RDBMS-es use an index.
Some RDBMSes (e.g. MySQL) do not make the distinction between a PRIMARY KEY or UNIQUE constraint and the index that is used to help implement it. But, for example, Oracle does: in Oracle you can do something like ALTER TABLE t DROP CONSTRAINT pk KEEP INDEX. This is useful if you want to change the definition of the primary key (for example, you are replacing a natural primary key with a surrogate primary key) but you still want to have a unique constraint on the original primary key columns without rebuilding the index. That makes sense if the index is very large and would take considerable time and resources to rebuild.
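In Oracle, the pattern looks roughly like this (constraint, index and column names are assumed):

-- Drop the primary key constraint but keep the underlying index...
ALTER TABLE t DROP CONSTRAINT t_pk KEEP INDEX;
-- ...then reuse that index for a plain unique constraint on the same columns.
ALTER TABLE t ADD CONSTRAINT t_uk UNIQUE (col1, col2) USING INDEX t_pk;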
From what I can see, MS SQL does not make the distinction. I mean, a tool like Management Studio does display "Keys", "Indexes" and "Constraints" in different folders, but changing the name of one immediately changes the name of the corresponding objects in the other folders. So I think the distinction is not really present in this case.
I have a junction table in my SQL Server 2005 database that consist of two columns:
object_id (uniqueidentifier)
property_id (integer)
These values together make a compound primary key.
What's the best way to create this PK index for SELECT performance?
If the columns were two integers, I would just use a compound clustered index (the default). However, I've heard bad things about clustered indexes when uniqueidentifiers are involved.
Anyone have experience with this situation?
Yes, GUIDs are really bad for clustered indexes, since a GUID is by design very random and thus leads to massive fragmentation and thus performance problems.
See Kim Tripp's blog - most notably "The Clustered Index Debate continues" and "GUIDs as PRIMARY and/or CLUSTERED key" - for a lot of valuable background info.
If you really need to have an index on these TWO columns, I'd suggest a non-clustered index - it can be the primary key - just better not a clustered index.
Marc
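A sketch of that suggestion (table name invented): the compound key is still the PRIMARY KEY, but declared NONCLUSTERED so the random GUIDs don't dictate the physical order of the table.

CREATE TABLE dbo.ObjectProperty
(
    object_id   UNIQUEIDENTIFIER NOT NULL,
    property_id INT NOT NULL,
    CONSTRAINT PK_ObjectProperty
        PRIMARY KEY NONCLUSTERED (object_id, property_id)
);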
One alternative is to use what is known as a surrogate key (which incidentally can also be assigned as the primary key).
For example, adding an identity column that can be used to uniquely identify each row within the table, i.e. a primary key.
Understand that a GUID is used to identify a record globally within SQL Server (which arguably is not a relationally correct practice; however, that is not a concern for us here).
The identity column, now also the primary key, can/will have a clustered index applied. A separate, non-clustered index can then be applied to the compound key described by the original poster.
This practice avoids the issue of frequent page splits occurring within the clustered index (inserts into a random GUID primary key) as well as producing a smaller and more efficient clustered index, whilst also preserving the relationships defined within the database.
Surrogate Key Definition: http://en.wikipedia.org/wiki/Surrogate_key
I would create an identity column and then make this your primary key and clustered index. You can then create non-clustered indexes on object_id/property_id as needed.
You can create a unique constraint to ensure the uniqueness of your key.
The reason for this is that the rows will be inserted sequentially, so you're reducing page splits. In addition, using an integer for your PK means you have a smaller key for your clustered index.
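Putting those pieces together as a hedged sketch (names invented; an alternative to the non-clustered compound PK shown earlier):

CREATE TABLE dbo.ObjectProperty
(
    Id          INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ObjectProperty PRIMARY KEY CLUSTERED,   -- narrow, ever-increasing clustered key
    object_id   UNIQUEIDENTIFIER NOT NULL,
    property_id INT NOT NULL,
    -- The original compound key stays unique; its non-clustered index
    -- also serves SELECTs that look up (object_id, property_id).
    CONSTRAINT UQ_ObjectProperty UNIQUE NONCLUSTERED (object_id, property_id)
);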