I have a table in SQL Server with duplicate IDs, but I cannot delete the duplicate records. Now the requirement is to create a primary key on the column that contains the duplicate data. Is there any way to create the primary key without changing the data?
No, there is no way to add a PRIMARY KEY constraint to a column that already has duplicate values.
Creating and Modifying PRIMARY KEY Constraints:
When a PRIMARY KEY constraint is added to an existing column or
columns in the table, the Database Engine examines the existing column
data and metadata to make sure that the following rules for primary
keys are met:
The columns cannot allow for null values.
There can be no duplicate values.
If a PRIMARY KEY constraint is added to a
column that has duplicate values or allows for null values, the
Database Engine returns an error and does not add the constraint.
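To see which values would block the constraint, a query along these lines can help. It is only a sketch: dbo.MyTable and ID are the names used in the workaround in this thread, and PK_MyTable is a hypothetical constraint name.

```sql
-- List every ID value that occurs more than once.
SELECT ID, COUNT(*) AS Occurrences
FROM dbo.MyTable
GROUP BY ID
HAVING COUNT(*) > 1;

-- While the query above returns rows (or while ID is nullable),
-- this statement fails and no constraint is added.
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT PK_MyTable PRIMARY KEY (ID);
```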
If the ID column is incremental, a possible workaround is to add a unique filtered index:
CREATE UNIQUE INDEX AK_MyUniqueIndex ON dbo.MyTable (ID)
WHERE ID > ... max value of existing ID here
This way, uniqueness will be applied only to newly added records.
I know this is old, but I had an idea that I wanted to share:
Step 1. Add a non-nullable int column with a default value, which can be 0.
Optional step. Update that column to 1 in all existing rows, so you can identify the existing records afterwards.
Step 2. In all existing rows that have duplicate IDs, update the column with a standard ROW_NUMBER(), ordered by a combination of unique columns or by all columns.
Step 3. Define the primary key with your ID column first (so it is indexed first), followed by the Step 1 column.
And you are done, with a special column that makes the duplicates easy to identify; new records will all be marked 0. The best practice, though, would be to add a character or number to all IDs if possible and standardize them (this approach helps you do that afterwards), or to use something like a per-year sequence.
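The steps above might look roughly like this in T-SQL. This is a sketch under assumptions, not a tested migration: dbo.MyTable and ID are taken from the earlier answer, while DupSeq, DF_MyTable_DupSeq and PK_MyTable are hypothetical names, ID is assumed to be NOT NULL already, and the optional step is left out.

```sql
-- Step 1: add a non-nullable int column that defaults to 0.
ALTER TABLE dbo.MyTable
    ADD DupSeq INT NOT NULL
        CONSTRAINT DF_MyTable_DupSeq DEFAULT (0);
GO  -- batch separator (SSMS/sqlcmd) so the next batch can see DupSeq

-- Step 2: number the rows inside each group of duplicate IDs.
-- ORDER BY (SELECT NULL) numbers them in arbitrary order; replace it
-- with real columns if the numbering has to be repeatable.
WITH Numbered AS (
    SELECT DupSeq,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS rn,
           COUNT(*)     OVER (PARTITION BY ID)                        AS cnt
    FROM dbo.MyTable
)
UPDATE Numbered
SET DupSeq = rn
WHERE cnt > 1;

-- Step 3: composite primary key, ID first, then the new column.
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT PK_MyTable PRIMARY KEY (ID, DupSeq);
```

Note that with this layout one more row with DupSeq = 0 can still be inserted for an ID that was already duplicated, because (ID, 0) is not taken for those IDs.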
I would like to know which index is created automatically when we create a primary key and a unique key on an Oracle table.
Assume that there is no index present on table.
Oracle will create a unique index in both cases. A primary key cannot contain null values. A unique key can.
If you want to try and see what happens, information about indexes is available in the user_indexes view.
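A small experiment along those lines, in Oracle SQL. The table and constraint names (demo_keys, pk_demo_keys, uq_demo_keys_code) are made up for illustration.

```sql
-- A table with one primary key and one unique key, and no other indexes.
CREATE TABLE demo_keys (
    id   NUMBER       CONSTRAINT pk_demo_keys      PRIMARY KEY,
    code VARCHAR2(10) CONSTRAINT uq_demo_keys_code UNIQUE
);

-- Show the indexes that now exist on the table and whether they are unique.
-- Unquoted identifiers are stored in upper case in the data dictionary.
SELECT index_name, uniqueness
FROM   user_indexes
WHERE  table_name = 'DEMO_KEYS';
```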
We are trying to enforce a unique table constraint on certain tables in SQL Server. I have it working, but I am running into a few issues. I want the data to be ordered by primary key, but if I include the primary key in the index keys, the constraint no longer enforces uniqueness, because every row obviously has a unique ID, since it's the primary key.
If I remove the ID from the index keys, the constraint works as it is supposed to, but the data is no longer sorted by primary key, which is what I want. It is sorted by one of the other columns instead.
How do I include the primary key in the constraint so I can use it for sorting, but have it ignored when checking the constraint for uniqueness (i.e., a new record should still be rejected if all of its other values match an existing row and only the ID differs)?
UPDATE: How do I handle a situation where a table has more columns than can be put into an index? Is there no way to prevent duplicate entries in such tables?
A relational database is built on set theory and predicate logic. According to set theory, there is no difference between sets like A {1,2,3} and B {2,3,1}.
This is why no RDBMS guarantees that results will come back in a particular order.
You will get them in your order only when you explicitly provide an ORDER BY in the SELECT statement.
So it is better to sort in the front end or to add an ORDER BY clause to your query.
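Applied to the question, that means keeping the ID out of the unique constraint and asking for the order in the query. A minimal sketch, in which dbo.Widget and its columns are hypothetical:

```sql
CREATE TABLE dbo.Widget
(
    Id    INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Widget PRIMARY KEY CLUSTERED,
    Name  NVARCHAR(80) NOT NULL,
    Color NVARCHAR(40) NOT NULL,
    -- Uniqueness is checked on the business columns only, not on Id.
    CONSTRAINT UQ_Widget_Name_Color UNIQUE (Name, Color)
);

-- The order is requested explicitly instead of relying on an index.
SELECT Id, Name, Color
FROM dbo.Widget
ORDER BY Id;
```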
In SQL Server I have the following lookup table that holds degree levels:
create table dbo.DegreeLevel
(
Id int identity not null
constraint PK_DegreeLevel_Id primary key clustered (Id),
Name nvarchar (80) not null
constraint UQ_DegreeLevel_Name unique (Name)
)
Should I use identity on the ID?
When should I use identity or a simple int in a lookup table?
After dealing with multiple environments where we move changes from one environment to the next, I'd say not to use identity columns on lookup tables.
Here's why: if you need to reference an ID as a "magic #", you need consistency. Ideally, you wouldn't ever reference a magic #, but in reality, that is not what is commonly done. And it's a pain to correct when the IDs are out of sync. And it's really not much more effort to insert the table's data with an ID.
In a lookup table, having a "normal" Id INT might be better, because it gives you the ability to pick and choose the Id values. You get to define which values you have, and what they mean.
Identity is very useful for actual data tables, where you just need to know that you have a good, unique ID value - but you don't really care about what that value is.
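For the lookup-table case, the same DegreeLevel table without identity could be populated with fixed IDs, as sketched below. 'Bachelor' appears elsewhere in this thread; the other two rows are made-up sample data.

```sql
create table dbo.DegreeLevel
(
    Id int not null
        constraint PK_DegreeLevel_Id primary key clustered (Id),
    Name nvarchar (80) not null
        constraint UQ_DegreeLevel_Name unique (Name)
)

-- The same script can be run in every environment,
-- so the Id values always mean the same thing.
insert into dbo.DegreeLevel (Id, Name)
values (1, N'Bachelor'),
       (2, N'Master'),      -- sample value, not from the question
       (3, N'Doctorate')    -- sample value, not from the question
```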
I guess it comes down to whether or not you have a natural candidate to use in the clustered index...
If you already have a property that can uniquely identify the row, then it's definitely worth considering whether adding an identity column is the right move.
If you don't have a natural candidate, then you'd need to invent a value and in this case using an identity column or sequence is probably easier than hand-rolling something.
As an example of having a natural key, imagine a 'DegreeModule' table where each module has a 4-character reference code that is printed on course materials (e.g. U212).
In this case, I would definitely skip creating an internal identifier and use the natural identifier as primary key...
create table dbo.DegreeModule
(
Reference char(4) not null primary key clustered,
Name nvarchar(80) not null
constraint UQ_DegreeModule_Name unique (Name)
/* .. plus FK's for stuff like parent degree, prerequisites,etc .. */
)
When you specify the IDENTITY property on an integer column of a table, the column becomes an auto-incrementing integer column. If you want your lookup table to create the Id value automatically when you insert a row, use identity. If you want to create it yourself, just define the column as int.
A table can have only one identity column.
You cannot insert explicit values into an identity column unless you SET IDENTITY_INSERT ON for that table, and existing identity values cannot be updated at all.
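As a sketch of the last point, using the DegreeLevel table from the question (where Id is an identity column); the value 10 and the name 'Diploma' are arbitrary examples:

```sql
-- Allow explicit values for the identity column of this one table.
set identity_insert dbo.DegreeLevel on

-- An explicit column list is required while identity_insert is on.
insert into dbo.DegreeLevel (Id, Name)
values (10, N'Diploma')

-- Switch it back off; only one table per session can have it on.
set identity_insert dbo.DegreeLevel off
```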
If you are going to use an object-relational mapping (ORM) tool, refer to its documentation. In that case, you will most probably want to let the ORM handle the primary key, and you should not use identity.
If you have no specific requirements for primary key generation, then using identity here is fine. Specific requirements may be: primary keys follow special format, primary keys should be globally unique, primary keys are imported from other database, e.g. by insert into DegreeLevel values (1, 'Bachelor') etc.
In a SQL Server database, what is the difference between a primary key and an identity column? A column can be a primary key without being an identity. A column cannot, however, be an identity without being a primary key.
In addition to the differences, what does a column that is both a PK and an identity offer that a PK-only column doesn't?
A column can definitely be an identity without being a PK.
An identity is simply an auto-increasing column.
A primary key is the unique column or columns that define the row.
These two are often used together, but there's no requirement that this be so.
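A small illustration of the two being used separately. The table and column names are hypothetical.

```sql
CREATE TABLE dbo.Country
(
    -- Primary key without identity: the value is supplied by the caller.
    IsoCode CHAR(2) NOT NULL
        CONSTRAINT PK_Country PRIMARY KEY,
    -- Identity without primary key: an auto-incrementing number that is
    -- not declared as a key and has no uniqueness constraint of its own.
    LoadSeq INT IDENTITY(1,1) NOT NULL,
    Name    NVARCHAR(80) NOT NULL
);

INSERT INTO dbo.Country (IsoCode, Name)
VALUES ('NL', N'Netherlands');  -- LoadSeq is generated automatically
```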
This answer is more of WHY identity and primary key than WHAT they are since Joe has answered WHAT correctly above.
An identity is a value that SQL Server controls. Identity is a row function. It is sequential, either increasing or decreasing in value, at least in SQL Server. It should never be modified, and gaps in the values should be ignored. Identity values are very useful for linking table B to table A, since the value is never duplicated. The identity is not the best choice for a clustered index in every case. If a table contains audit data, the clustered index may be better created on the date the event occurred, as it will answer the question "what happened between today and four days ago?" with less work, because the records for those dates are sequential in the data pages.
A primary key makes the column or columns in a row unique. Primary key is a column function. Only one primary key may be defined on any table, but multiple unique indexes may be created, which simulate the primary key. Clustering the primary key is not always the correct choice. Consider a phone book. If the phone book is clustered by the primary key (phone number), the query to return the phone numbers on "First Street" will be very costly.
The general rules I follow for identity and primary key are:
Always use an identity column
Create the clustered index on the column or columns which are used in range lookups
Keep the clustered index narrow, since the clustered index key is added to every nonclustered index
Create primary key and unique indexes to reject duplicate values
Narrow keys are better
Create an index for every column or columns used in joins
These are my GENERAL rules.
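For the audit-data example above, these rules could translate into something like the following sketch. All object names are hypothetical.

```sql
CREATE TABLE dbo.AuditEvent
(
    AuditEventId INT IDENTITY(1,1) NOT NULL,
    OccurredAt   DATETIME      NOT NULL,
    Detail       NVARCHAR(400) NOT NULL,
    -- The primary key rejects duplicates but is deliberately NOT clustered.
    CONSTRAINT PK_AuditEvent PRIMARY KEY NONCLUSTERED (AuditEventId)
);

-- The clustered index goes on the narrow column used for range lookups.
CREATE CLUSTERED INDEX CIX_AuditEvent_OccurredAt
    ON dbo.AuditEvent (OccurredAt);

-- "What happened between today and four days ago?"
SELECT AuditEventId, OccurredAt, Detail
FROM dbo.AuditEvent
WHERE OccurredAt >= DATEADD(DAY, -4, GETDATE());
```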
A primary key is a candidate key: a set of attributes that has the properties of uniqueness and minimality. That means the key column or columns are constrained to be unique. In other words, the DBMS won't permit any two rows to have the same set of values for those attributes.
The IDENTITY property effectively creates an auto-incrementing default value for a column. That column does not have to be unique though, so an IDENTITY column isn't necessarily a key.
However, an IDENTITY column is typically intended to be used as a key and therefore it usually has a uniqueness constraint on it to ensure that duplicates are not permitted.
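Major differences between a primary key and an identity column
Primary key column:
A primary key cannot have duplicate values.
By default it creates a clustered index for the table, unless a clustered index already exists or NONCLUSTERED is specified.
It can be set on most column types.
We need to provide the primary key value when inserting into the table, unless the column has a default or is an identity column.
Identity column:
An identity column can hold duplicate values unless it also has a PRIMARY KEY or UNIQUE constraint.
It can only be set on integer-like columns: int, bigint, smallint, tinyint, or decimal/numeric with a scale of 0.
There is no need to insert values into the identity column. They are generated automatically from the seed and increment.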
EDITS MADE BASED ON FEEDBACK
A key is unique to a row. It's a way of identifying a row. Rows may have none, one, or several keys. These keys may consist of one or more columns.
Keys are indexes with a unique constraint. This differentiates them from non-key indexes.
Any index on multiple columns is called a "composite index".
Traditionally, a primary key is viewed as the main key that uniquely identifies a row. There may only be one of these.
Depending on the table's design, one may have no primary key.
A primary key is just that - a "prime key". It's the main one that specifies the unique identity of a row. Depending on a table's design, this can be a misnomer and multiple keys express the uniqueness.
In SQL Server, a primary key may be clustered. This means the remaining columns are attached to this key at the leaf level of the index. In other words, once SQL Server has found the key, it has also found the row (to be clear, this is because of the clustered aspect).
An identity column is simply a method of generating a unique ID for a row.
These two are often used together, but this is not a requirement.
You can use IDENTITY not only with integers, but also with any numeric data type that has a scale of 0.
A primary key column can have a nonzero scale, but it is not required.
IDENTITY, combined with a PRIMARY KEY or UNIQUE constraint, lets you provide a simple unique row identifier
A primary key is about uniqueness: it prevents duplicate values across all records in the same column. An identity provides increasing numbers in a column without you having to insert them.
Both features can be on a single column or on different ones.
I am using SQL Server 2008 and had a table with an id (numeric) column as the primary key. It also had a unique index on three varchar columns. I was able to add a row with exactly the same values in those three columns as an existing row. I verified it with a simple query on the values, and 2 rows were returned.
I edited the index and added the id column. When I tried to edit it again and remove the id column, it complained that there were duplicate rows; it deleted the index but couldn't recreate it.
I then cleaned the duplicates out of the database and recreated the index on the same 3 varchar columns as unique and nonclustered, and now it works properly, not allowing duplicates.
Does anyone know why the uniqueness of this index was ignored?
The index could have been disabled (see Disabling Indexes), your 'duplicate' values may have been different (trailing spaces, for example), or your test may have been incorrect.
You certainly did not insert a duplicate into an enforced unique index.
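Two checks that may help narrow this down, as a sketch. dbo.MyTable, ColA and the search value are placeholders for the real table, one of the three varchar columns, and the value that appeared twice.

```sql
-- 1. Is the index really unique, and is it currently enabled?
SELECT name, is_unique, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.MyTable');

-- 2. Are the 'duplicate' values really identical? Comparing byte lengths
--    and raw bytes exposes differences that are invisible in a grid.
SELECT id,
       ColA,
       DATALENGTH(ColA)              AS ColA_Bytes,
       CAST(ColA AS VARBINARY(200))  AS ColA_Raw
FROM dbo.MyTable
WHERE ColA LIKE 'some value%';
```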
I'm not a pro on this subject, but the 'is unique' setting of the index probably refers to the way the index is built and stored, not to a constraint. This probably also means the performance of the index is sub-optimal.
When creating the index, the DBMS might check this.