SQL Server identity column and imports - sql-server

I'll try to explain in simple terms, leaving out the whys and wherefores of how this occurred.
Currently there are 2 databases that need to be merged. They have the same tables etc. In some cases the lookup tables are identical, in some cases they are not, and in some cases records in one database have different identity values from their equivalents in the other DB. So it's a mess.
Let us say that on one of the databases we update all the identity values by adding 10,000 to them and update the related records. Then we could import the data as is, and yes, in some cases lookups would have the same value twice with different identities.
The question is not about the above mess :). After re-enabling the identity column we will have identity values of
1, 2, 3, 4, 5 etc. and 10001, 10002, 10003 etc. If more rows are inserted and the values continue from, let's say, 9999, will the identity column use 10,000 and then 10,004, or will SQL Server complain on the next insert that the identity value is already used?

I just tested this with simple INSERTs: you first have to enable IDENTITY_INSERT for each table you want to import data into
SET IDENTITY_INSERT table ON
Then you can insert your data with their original identity column values (which you'll need in order to maintain the references correctly)
Afterwards, switch it off again (only one table per session can have IDENTITY_INSERT ON at a time)
SET IDENTITY_INSERT table OFF
SQL Server continues the sequence with the highest element plus one, so in your case (after inserting IDs 10001, 10002, 10003) it would continue with 10004.
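A minimal sketch of that import sequence, assuming a hypothetical table `dbo.ImportDemo` (the table and its columns are made up for illustration; they are not from the question). `IDENT_CURRENT` is used only to inspect the last identity value recorded for the table.

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE dbo.ImportDemo (
    ID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Value varchar(10) NOT NULL
);

-- Ordinary inserts: SQL Server generates IDs 1 and 2.
INSERT INTO dbo.ImportDemo (Value) VALUES ('a'), ('b');

-- Import rows with explicit identity values.
SET IDENTITY_INSERT dbo.ImportDemo ON;
INSERT INTO dbo.ImportDemo (ID, Value)
VALUES (10001, 'x'), (10002, 'y'), (10003, 'z');
SET IDENTITY_INSERT dbo.ImportDemo OFF;

-- Inspect the current identity value, then insert normally again.
SELECT IDENT_CURRENT('dbo.ImportDemo') AS CurrentIdentity;
INSERT INTO dbo.ImportDemo (Value) VALUES ('c');

SELECT ID, Value FROM dbo.ImportDemo ORDER BY ID;

DROP TABLE dbo.ImportDemo;
```

Because the explicitly inserted 10003 is higher than the previous identity value, the last row should receive 10004, so the lower range is not resumed.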

It's important to realise that, although they frequently appear together, IDENTITY and PRIMARY KEY are two orthogonal concepts [1]. So, to the question as asked, the answer is no: SQL Server will not complain, as an IDENTITY column will quite happily provide a value that has already been used in the same column:
set nocount on
go
create table II (
ID int IDENTITY(1,1) not null,
Value varchar(10) not null
)
insert into II(Value) values ('abc'),('def')
set identity_insert II on
insert into II(ID,Value) values (6,'ghi')
set identity_insert II off
select * from II
insert into II(Value) values ('jkl')
select * from II
GO
dbcc checkident (II, RESEED, 5);
GO
insert into II(Value) values ('mno'),('pqr')
select * from II
Results:
ID Value
----------- ----------
1 abc
2 def
6 ghi
ID Value
----------- ----------
1 abc
2 def
6 ghi
7 jkl
Checking identity information: current identity value '7'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
ID Value
----------- ----------
1 abc
2 def
6 ghi
7 jkl
6 mno
7 pqr
Whereas a PRIMARY KEY will complain if you attempt to insert a duplicate value:
create table III (
ID int IDENTITY(1,1) not null PRIMARY KEY,
Value varchar(10) not null
)
insert into III(Value) values ('abc'),('def')
set identity_insert III on
insert into III(ID,Value) values (6,'ghi')
set identity_insert III off
select * from III
insert into III(Value) values ('jkl')
select * from III
GO
dbcc checkident (III, RESEED, 5);
GO
insert into III(Value) values ('mno'),('pqr')
select * from III
go
(The only difference from the previous script is the table name and the addition of PRIMARY KEY)
Results:
ID Value
----------- ----------
1 abc
2 def
6 ghi
ID Value
----------- ----------
1 abc
2 def
6 ghi
7 jkl
Checking identity information: current identity value '7'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__III__3214EC27FCCBBCB7'. Cannot insert duplicate key in object 'dbo.III'. The duplicate key value is (6).
The statement has been terminated.
ID Value
----------- ----------
1 abc
2 def
6 ghi
7 jkl
[1] The third concept that is frequently conflated with these two is that of the Clustered Index. It's perfectly possible for a table to have a Primary Key, an Identity Column and a Clustered Index that have no columns in common.

Related

Another way to insert with an Id, but without updating the Identity seed

I have two tables, an old one and a new one (the new will eventually replace the old); both are currently in use.
We will migrate items over, but some processes still use the old table and cannot be switched right away, so we want to create a "dummy item" in the old table from the new table's data so that those old processes keep working.
We want to insert the dummy items at ids of 500000 and above, but keep the identity seed of the OLD table below 500000.
Test table
CREATE TABLE [dbo].[OLD]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[RowNumber] [int] NOT NULL
)
INSERT INTO [dbo].[OLD] ([RowNumber])
VALUES (1), (2)
SET IDENTITY_INSERT OLD ON
INSERT INTO OLD (id, [RowNumber])
VALUES (500000, 500000)
SET IDENTITY_INSERT OLD OFF
-- -- Uncomment for the reseeding
-- DECLARE @Reseed AS INT
-- SET @Reseed = (SELECT MAX(ID) FROM OLD WHERE ID < 500000)
-- DBCC CHECKIDENT('OLD', RESEED, @Reseed)
INSERT INTO [dbo].[OLD] ([RowNumber])
VALUES (3)
SELECT *
FROM old
DROP TABLE [dbo].[OLD]
Current data
Id RowNumber
1 1
2 2
So, after the IDENTITY_INSERT of Id 500000, I want to insert RowNumber 500003 without IDENTITY_INSERT and have its Id automatically continue from the old seed.
Looking for:
Id RowNumber
1 1
2 2
500000 500000
3 500003
I looked into reseeding but feel it could be quite dangerous, and the NOT FOR REPLICATION seems to be only for the SQL Server replication. Is there another way that doesn't feel as dangerous?

Resetting Primary key without deleting/truncating table

I have a table with a primary key. For reasons I don't understand, when I insert data the key values come out like this:
Pk_Col Some_Other_Col
1 A
2 B
3 C
1002 D
1003 E
1901 F
Is there any way I can reset my table like below, without deleting/ truncating the table?
Pk_Col Some_Other_Col
1 A
2 B
3 C
4 D
5 E
6 F
You can't update the IDENTITY column so DELETE/INSERT is the only way. You can reseed the IDENTITY column and recreate the data, like this:
DBCC CHECKIDENT ('dbo.tbl',RESEED,0);
INSERT INTO dbo.tbl (Some_Other_Col)
SELECT Some_Other_Col
FROM (DELETE FROM tbl OUTPUT deleted.*) d;
That assumes there are no foreign keys referencing this data.
If you really, really want to have neat identity values you can write a cursor (slow but maintainable) or investigate any number of "how can I find gaps in my sequence" questions on SO and perform an UPDATE accordingly (runs faster but tricky to get right). This becomes much harder once you have foreign keys pointing back to this table. Be prepared to re-run this script any time data is put into, or removed from, this table.
Edit: IDENTITY columns cannot be updated per se. You can, however, SET IDENTITY_INSERT dbo.MyTable ON;, INSERT a row with the desired IDENTITY value and the values from the other columns of an existing row, then DELETE the existing row. The net effect on the data is the same as an UPDATE.
The only sensible reason to do this is if your table has about two billion rows and you're about to run out of integers for your identity column. If that's the case you have a whole world of other stuff to worry about, too.
But seriously - listen to @Damien, don't worry about it.
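The IDENTITY_INSERT route described in the edit above can be sketched as follows. This is only an illustration, assuming the question's column names (`Pk_Col`, `Some_Other_Col`) on the `dbo.MyTable` name used in the answer, no foreign keys referencing the table, and that row 1002 should become row 4.

```sql
-- Sketch only: "renumber" one row by copying it under a new identity
-- value and deleting the original. Table and column names are assumed.
BEGIN TRANSACTION;

SET IDENTITY_INSERT dbo.MyTable ON;

-- Copy the existing row, supplying the desired identity value explicitly.
INSERT INTO dbo.MyTable (Pk_Col, Some_Other_Col)
SELECT 4, Some_Other_Col
FROM dbo.MyTable
WHERE Pk_Col = 1002;

-- Remove the original row; the data now looks as if it had been updated.
DELETE FROM dbo.MyTable
WHERE Pk_Col = 1002;

SET IDENTITY_INSERT dbo.MyTable OFF;

COMMIT TRANSACTION;
```

Inserting an explicit value below the current identity value does not lower it, so later automatic inserts would still continue from the old high value unless the table is also reseeded.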
ALTER TABLE #temp1
DROP CONSTRAINT PK_Id
ALTER TABLE #temp1
DROP COLUMN Id
ALTER TABLE #temp1
ADD Id int identity(1,1)
Try this one.

Adding Unique ID to table

I am trying to add a unique ID to an existing table that already has data in it. I don't need a new value for each row but rather one for each insert operation. The time stamp indicates a new insert. Can anyone be kind enough to point me in the right direction? My current table is basically column A and the time stamp.
ID COLUMN A TIME STAMP
1 abc 05-09-2013 11:00:23
1 bcd 05-09-2013 11:00:23
1 ab3 05-09-2013 11:00:23
2 abc 05-09-2013 11:15:00
2 123 05-09-2013 11:15:00
3 abc 05-09-2013 11:18:07
4 abc 05-09-2013 11:19:55
4 123 05-09-2013 11:19:55
4 165 05-09-2013 11:19:55
4 def 05-09-2013 11:19:55
I can't think of an easy way to assign a unique ID per insert operation. One idea is to use a trigger on the table, where you can inspect how the data is being inserted and set the ID value as the rows go in.
First create an ID column which allows NULL.
Then write a procedure with a cursor over a query that groups by and orders by the time stamp column.
For every row the cursor returns, increase a counter by one and
update the table with id = counter where time stamp = the time stamp from the query.
At the end, add a constraint on the ID column to make it NOT NULL.
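The steps above can also be sketched without a cursor, by swapping in `DENSE_RANK()` to number the distinct time stamps. This is a set-based alternative, not the cursor procedure the answer describes. The table name `dbo.MyLog` and the column name `[TimeStamp]` are hypothetical, and the sketch assumes, as the question does, that rows from one insert share exactly one time stamp value.

```sql
-- Hypothetical names: dbo.MyLog, [TimeStamp], ID.
-- Step 1: add a nullable ID column.
ALTER TABLE dbo.MyLog ADD ID int NULL;
GO

-- Step 2: give every distinct time stamp the next number (1, 2, 3, ...)
-- and write it into ID for all rows sharing that time stamp.
WITH numbered AS (
    SELECT ID,
           DENSE_RANK() OVER (ORDER BY [TimeStamp]) AS BatchNo
    FROM dbo.MyLog
)
UPDATE numbered
SET ID = BatchNo;
GO

-- Step 3: once every row has a value, disallow NULLs.
ALTER TABLE dbo.MyLog ALTER COLUMN ID int NOT NULL;
GO
```

The `GO` separators matter here because the new column has to exist before the batch that references it is compiled.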
You will most likely want an auto-increment field; in SQL Server, use IDENTITY for this. It lets you insert rows without supplying the P_Id value, which is then incremented automatically.
You don't want to use time stamps for uniqueness because they are costly to index, and it is hard to guarantee that time values are unique.
CREATE TABLE Persons
(
P_Id int PRIMARY KEY IDENTITY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
If your inserts are not very frequent, you can use the time stamp itself to get a unique Id,
e.g. 130509110023
If that is too long for the required data length, you can apply some arithmetic to shorten it while keeping it unique.

SQL Server dependent Identity - is there such a thing?

I use SQL Server 2008 R2.
I'm looking for a feature that I describe as dependent identity.
I'll explain by an example.
consider a table like this one:
script
CREATE TABLE [dbo].[Rooms](
[RoomID] [int] NOT NULL,
[ItemID] [int] NOT NULL,
[ItemDescription] [nvarchar] (250))
GO
data:
RoomID ItemID ItemDescription
------ ------ ---------------
7 1 Door
7 2 Window (West)
7 3 Window (North)
8 1 Door
8 2 Table #1
8 3 Table #2
7 4 Table #1
8 4 Chair #1
7 5 Table #2
7 6 Table #3
8 5 Chair #2
(can anyone tell the secret how to format an example table here?)
I would have loved to be able to declare a dependent identity column like this:
ItemID [int] Identity(RoomID,1,1) NOT NULL
A new row in [Rooms] should trigger a lookup of the max value of ItemID where RoomID = @roomID and add 1.
Instead of an update that changes RoomID, I would delete the row and insert the required data again.
Nowadays I do that programmatically like this:
DECLARE @roomID INT
SET @roomID = 7
INSERT INTO [Allocation].[dbo].[Rooms]
([RoomID], [ItemID], [ItemDescription]) VALUES (@roomID,
(SELECT max([ItemID])+1 FROM [Allocation].[dbo].[Rooms] WHERE [RoomID]=@roomID)
,'Chair #1')
GO
So, Is there such a feature?
In the probable case there is none, could I program the server to set next dependent identity for me automatically, given a specific table, parent column and dependent identity column?
You can use a trigger, and an index to improve performance and ensure there are no duplicates.
Change your table to have a primary key, and allow null for ItemID
CREATE TABLE [dbo].[Rooms](
[RoomID] [int] NOT NULL,
[ItemID] [int] NULL,
[ItemDescription] [nvarchar](250) NULL,
[Id] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_Rooms] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
and then add a trigger
CREATE TRIGGER RoomTrigger
ON Rooms
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
update Rooms
set
ItemID = (select coalesce(MAX(itemid), 0) + 1
from Rooms r where r.RoomID = inserted.RoomID )
from
inserted where Rooms.Id = inserted.Id
END
Then you can do this
insert into Rooms (RoomID, ItemDescription) values (1, 'Test')
insert into Rooms (RoomID, ItemDescription) values (1, 'Test')
which results in
RoomID ItemID ItemDescription Id
1 1 Test 1
1 2 Test 2
As suggested by marc_s I've used SQL Query Stress with 10 threads to see what happens with this trigger under load. I didn't get any duplicates at all (using the default isolation level), but I did get loads of deadlocks as I would have expected.
Using the original query from the question I get a lot of duplicates.
Using the trigger approach I get deadlocks and results like this:
RoomID ItemID ItemDescription Id
1 6 Test 6
1 7 Test 9
1 8 Test 902
1 9 Test 903
Here ItemID is contiguous, but about 900 out of 1000 rows failed to be inserted leaving large gaps in Id.
If we add the following index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_Rooms] ON [dbo].[Rooms]
(
[RoomID] ASC,
[ItemID] ASC
)
in order to guarantee no duplicates, and to improve the performance of calculating Max(ItemId) for a particular RoomID, then:
the original query from the question causes duplicates and only manages to insert 500 rows.
the trigger version using the default isolation level succeeds without any deadlocks or errors and runs very fast.
Using the trigger with isolation level = serializable brings back deadlocks so only 40% of the inserts succeed (but no exceptions due to duplicates).
As a final test I tried the trigger with 50 threads and the default isolation level. No errors.
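For comparison, the single-statement insert from the question is often written with locking hints so that concurrent sessions working on the same room queue up instead of reading the same MAX. The sketch below is an assumption on my part rather than something measured in the tests above; it targets the question's original table layout (ItemID NOT NULL, no trigger), and it has not been load-tested here.

```sql
-- Sketch only: compute the next ItemID for one room under an update lock
-- that is held until the transaction ends.
DECLARE @roomID INT = 7;

BEGIN TRANSACTION;

INSERT INTO dbo.Rooms (RoomID, ItemID, ItemDescription)
SELECT @roomID,
       COALESCE(MAX(ItemID), 0) + 1,   -- 1 for the first item in a room
       N'Chair #1'
FROM dbo.Rooms WITH (UPDLOCK, HOLDLOCK)
WHERE RoomID = @roomID;

COMMIT TRANSACTION;
```

The unique index on (RoomID, ItemID) is still worth keeping as a safety net, since it rejects any duplicate that slips through.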

Question about skipping IDs in an identity column in MSSQL

Say I have an MSSQL table with two columns: an int ID column that's the identity column and some other datetime or whatever column. Say the table has 10 records with IDs 1-10. Now I delete the record with ID = 5.
Are there any scenarios where another record will "fill-in" that missing ID? I.e. when would a record be inserted and given an ID of 5?
No, unless you specifically enable identity inserts (typically done when copying tables with identity columns) and insert a row manually with the id of 5. SQL Server keeps track of the last identity value used in each table with an identity column and increments that value to obtain the next one on insert.
Only if you manually override identity generation with the SET IDENTITY_INSERT command and then do an insert with ID = 5.
Otherwise SQL Server will always increment to a higher number, and missing slots are never re-used.
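A minimal sketch of that default behaviour, using a hypothetical table `dbo.GapDemo` with three rows instead of the ten in the question:

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE dbo.GapDemo (
    ID        int IDENTITY(1,1) NOT NULL,
    CreatedAt datetime NOT NULL
);

-- Three ordinary inserts: IDs 1, 2 and 3 are generated.
INSERT INTO dbo.GapDemo (CreatedAt)
VALUES (GETDATE()), (GETDATE()), (GETDATE());

-- Delete the middle row, leaving a gap at ID 2.
DELETE FROM dbo.GapDemo WHERE ID = 2;

-- The next ordinary insert continues after the last value used (ID 4);
-- the gap at 2 is not filled.
INSERT INTO dbo.GapDemo (CreatedAt) VALUES (GETDATE());

SELECT ID FROM dbo.GapDemo ORDER BY ID;

DROP TABLE dbo.GapDemo;
```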
One scenario not already mentioned where another record will "fill-in" missing IDENTITY values is when the IDENTITY is reseeded. Example (SQL Server 2008):
CREATE TABLE Test
(
ID INTEGER IDENTITY(1, 1) NOT NULL,
data_col INTEGER NOT NULL
);
INSERT INTO Test (data_col)
VALUES (1), (2), (3), (4);
DELETE
FROM Test
WHERE ID BETWEEN 2 AND 3;
DBCC CHECKIDENT ('Test', RESEED, 1)
INSERT INTO Test (data_col)
VALUES (5), (6), (7), (8);
SELECT T1.ID, T1.data_col
FROM Test AS T1
ORDER
BY data_col;
The results are:
ID data_col
1 1
4 4
2 5
3 6
4 7
5 8
This shows that not only are the 'holes' filled in with new auto-generated values, but values that were auto-generated before the reseed are reused and can even duplicate existing IDENTITY values.

Resources