SQL Server fastest way to change data types on large tables - sql-server

We need to change the data type of about 10 primary keys in our db from numeric(19,0) to bigint. On the smaller tables a simple update of the datatype works just fine but on the larger tables (60-70 million rows) it takes a considerable amount of time.
What is the fastest way to achieve this, preferably without locking the database.
I've written a script that generates the following (which I believe I got from a different SO post)
--Add a new temporary column to store the changed value.
ALTER TABLE query_log ADD id_bigint bigint NULL;
GO
CREATE NONCLUSTERED INDEX IX_query_log_id_bigint ON query_log (id_bigint)
INCLUDE (id); -- the include only works on SQL 2008 and up
-- This index may help or hurt performance, I'm not sure... :)
GO
declare #count int
declare #iteration int
declare #progress int
set #iteration = 0
set #progress = 0
select #count = COUNT(*) from query_log
RAISERROR ('Processing %d records', 0, 1, #count) WITH NOWAIT
-- Update the table in batches of 10000 at a time
WHILE 1 = 1 BEGIN
UPDATE X -- Updating a derived table only works on SQL 2005 and up
SET X.id_bigint = id
FROM (
SELECT TOP 10000 * FROM query_log WHERE id_bigint IS NULL
) X;
IF ##RowCount = 0 BREAK;
set #iteration = #iteration + 1
set #progress = #iteration * 10000
RAISERROR ('processed %d of %d records', 0, 1, #progress, #count) WITH NOWAIT
END;
GO
--kill the pkey on the old column
ALTER TABLE query_log
DROP CONSTRAINT PK__query_log__53833672
GO
BEGIN TRAN; -- now do as *little* work as possible in this blocking transaction
UPDATE T -- catch any updates that happened after we touched the row
SET T.id_bigint = T.id
FROM query_log T WITH (TABLOCKX, HOLDLOCK)
WHERE T.id_bigint <> T.id;
-- The lock hints ensure everyone is blocked until we do the switcheroo
EXEC sp_rename 'query_log.id', 'id_numeric';
EXEC sp_rename 'query_log.id_bigint', 'id';
COMMIT TRAN;
GO
DROP INDEX IX_query_log_id_bigint ON query_log;
GO
ALTER TABLE query_log ALTER COLUMN id bigint NOT NULL;
GO
/*
ALTER TABLE query_log DROP COLUMN id_numeric;
GO
*/
ALTER TABLE query_log
ADD CONSTRAINT PK_query_log PRIMARY KEY (id)
GO
This works very well for the smaller tables but is extremely slow going for the very large tables.
Note this is in preparation for a migration to Postgres and the EnterpriseDB Migration toolkit doesn't seem to understand the numeric(19,0) datatype

If is not possible to change a primary key without locking. The fastest way with the least impact is to create a new table with the new columns and primary keys without foreign keys and indexes. Then batch insert blocks of data in sequential order relative to their primary key(s). When that is finished, add your indexes, then foreign keys back. Finally, drop or rename the old table and rename your new table to the systems expected table name.
In practice your approach will have to vary based on how many records are inserted, updated, and/or deleted. If you're only inserting then you can perform the initial load, and top of the table just before your swap.
This approach should provide the fastest migration, minimal logs, and very little fragmentation on your table and indexes.
You have to remember that every time you modify a record, the data is being modified, indexes are being modified, and foreign keys are being checked. All within one implicit or explicit transaction. The table and/or row(s) will be locked while all changes are made. Even if your database is set to simple logging, the server will still write all changes to the log files. Updates actually are a delete paired with an insert so it is not possible to prevent fragmentation during any other process.

Related

Can I determine when a Azure SQL DB row was last updated? [duplicate]

I need to create a new DATETIME column in SQL Server that will always contain the date of when the record was created, and then it needs to automatically update whenever the record is modified. I've heard people say I need a trigger, which is fine, but I don't know how to write it. Could somebody help with the syntax for a trigger to accomplish this?
In MySQL terms, it should do exactly the same as this MySQL statement:
ADD `modstamp` timestamp NULL
DEFAULT CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
Here are a few requirements:
I can't alter my UPDATE statements to set the field when the row is modified, because I don't control the application logic that writes to the records.
Ideally, I would not need to know the names of any other columns in the table (such as the primary key)
It should be short and efficient, because it will happen very often.
SQL Server doesn't have a way to define a default value for UPDATE.
So you need to add a column with default value for inserting:
ADD modstamp DATETIME2 NULL DEFAULT GETDATE()
And add a trigger on that table:
CREATE TRIGGER tgr_modstamp
ON **TABLENAME**
AFTER UPDATE AS
UPDATE **TABLENAME**
SET ModStamp = GETDATE()
WHERE **ID** IN (SELECT DISTINCT **ID** FROM Inserted)
And yes, you need to specify a identity column for each trigger.
CAUTION: take care when inserting columns on tables where you don't know the code of the application. If your app have INSERT VALUES command without column definition, it will raise errors even with default value on new columns.
This is possible since SQL Server 2016 by using PERIOD FOR SYSTEM_TIME.
This is something that was introduced for temporal tables but you don't have to use temporal tables to use this.
An example is below
CREATE TABLE dbo.YourTable
(
FooId INT PRIMARY KEY CLUSTERED,
FooName VARCHAR(50) NOT NULL,
modstamp DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
MaxDateTime2 DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
PERIOD FOR SYSTEM_TIME (modstamp,MaxDateTime2)
)
INSERT INTO dbo.YourTable (FooId, FooName)
VALUES (1,'abc');
SELECT *
FROM dbo.YourTable;
WAITFOR DELAY '00:00:05'
UPDATE dbo.YourTable
SET FooName = 'xyz'
WHERE FooId = 1;
SELECT *
FROM dbo.YourTable;
DROP TABLE dbo.YourTable;
It has some limitations.
The time stored will be updated by the system and always be UTC.
There is a need to declare a second column (MaxDateTime2 above) that is completely superfluous for this use case. But it can be marked as hidden making it easier to ignore.
Okay, I always like to keep track of not only when something happened but who did it!
Lets create a test table in [tempdb] named [dwarfs]. At a prior job, a financial institution, we keep track of inserted (create) date and updated (modify) date.
-- just playing
use tempdb;
go
-- drop table
if object_id('dwarfs') > 0
drop table dwarfs
go
-- create table
create table dwarfs
(
asigned_id int identity(1,1),
full_name varchar(16),
ins_date datetime,
ins_name sysname,
upd_date datetime,
upd_name sysname,
);
go
-- insert/update dates
alter table dwarfs
add constraint [df_ins_date] default (getdate()) for ins_date;
alter table dwarfs
add constraint [df_upd_date] default (getdate()) for upd_date;
-- insert/update names
alter table dwarfs
add constraint [df_ins_name] default (coalesce(suser_sname(),'?')) for ins_name;
alter table dwarfs
add constraint [df_upd_name] default (coalesce(suser_sname(),'?')) for upd_name;
go
For updates, but the inserted and deleted tables exist. I choose to join on the inserted for the update.
-- create the update trigger
create trigger trg_changed_info on dbo.dwarfs
for update
as
begin
-- nothing to do?
if (##rowcount = 0)
return;
update d
set
upd_date = getdate(),
upd_name = (coalesce(suser_sname(),'?'))
from
dwarfs d join inserted i
on
d.asigned_id = i.asigned_id;
end
go
Last but not least, lets test the code. Anyone can type a untested TSQL statement in. However, I always stress testing to my team!
-- remove data
truncate table dwarfs;
go
-- add data
insert into dwarfs (full_name) values
('bilbo baggins'),
('gandalf the grey');
go
-- show the data
select * from dwarfs;
-- update data
update dwarfs
set full_name = 'gandalf'
where asigned_id = 2;
-- show the data
select * from dwarfs;
The output. I only waited 10 seconds between the insert and the delete. Nice thing is that who and when are both captured.
Create trigger tr_somename
On table_name
For update
As
Begin
Set nocount on;
Update t
Set t.field_name = getdate()
From table_name t inner join inserted I
On t.pk_column = I.pk_column
End
ALTER TRIGGER [trg_table_name_Modified]
ON [table_name]
AFTER UPDATE
AS
Begin
UPDATE table_name
SET modified_dt_tm = GETDATE() -- or use SYSDATETIME() for 2008 and newer
FROM Inserted i
WHERE i.ID = table_name.id
end

Missing rows after updating SQL Server index key column

"T-SQL Querying" book (http://www.amazon.com/Inside-Microsoft-Querying-Developer-Reference/dp/0735626030) has an interesting example, where, querying a table under default transaction isolation level during clustered index key column update, you may miss a row or read a row twice. It looks to be acceptable, since updating table/entity key is not a good idea anyway. However, I've updated this example so that the same happens, when you update non-clustered index key column value.
Following is the table structure:
SET NOCOUNT ON;
USE master;
IF DB_ID('TestIndexColUpdate') IS NULL CREATE DATABASE TestIndexColUpdate;
GO
USE TestIndexColUpdate;
GO
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;
CREATE TABLE dbo.Employees
(
empid CHAR(900) NOT NULL, -- this column should be big enough, so that 9 rows fit on 2 index pages
salary MONEY NOT NULL,
filler CHAR(1) NOT NULL DEFAULT('a')
);
CREATE INDEX idx_salary ON dbo.Employees(salary) include (empid); -- include empid into index, so that test query reads from it
ALTER TABLE dbo.Employees ADD CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid);
INSERT INTO dbo.Employees(empid, salary) VALUES
('A', 1500.00),('B', 2000.00),('C', 3000.00),('D', 4000.00),
('E', 5000.00),('F', 6000.00),('G', 7000.00),('H', 8000.00),
('I', 9000.00);
This is what needs to be done in the first connection (on each update, the row will jump between 2 index pages):
SET NOCOUNT ON;
USE TestIndexColUpdate;
WHILE 1=1
BEGIN
UPDATE dbo.Employees SET salary = 10800.00 - salary WHERE empid = 'I'; -- on each update, "I" employee jumps between 2 pages
END
This is what needs to be done in the second connection:
SET NOCOUNT ON;
USE TestIndexColUpdate;
DECLARE #c INT
WHILE 1 = 1
BEGIN
SELECT salary, empid FROM dbo.Employees
if ##ROWCOUNT <> 9 BREAK;
END
Normally, this query should return 9 records we inserted in the first code sample. However, very soon, I see 8 records being returned. This query reads all it's data from the "idx_salary" index, which is being updated by previous sample code.
This seems to be quite lax attitude towards data consistency from SQL Server. I would expect some locking coordination, when data is being read from index, while its key column is being updated.
Do I interpret this behavior correctly? Does this mean, that even non-clustered index keys should not be updated?
UPDATE:
To solve this problem, you only need to enable "snapshots" on the db (READ_COMMITTED_SNAPSHOT ON). No more deadlocking or missing rows. I've tried summarize all of this here: http://blog.konstantins.net/2015/01/missing-rows-after-updating-sql-server.html
UPDATE 2:
This seems to be the very same problem, as in this good old article: http://blog.codinghorror.com/deadlocked/
Do I interpret this behavior correctly?
Yes.
Does this mean, that even non-clustered index keys should not be updated?
No. You should use a proper isolation level or make the application tolerate the inconsistencies that READ COMMITTED allows.
This issue of missing rows is not limited to clustered indexes. It is caused by moving a row in a b-tree. Clustered and nonclustered indexes are implemented as b-trees with only tiny physical differences between them.
So you are seeing the exact same physical phenomenon. It applies every time your query reads a range of rows from a b-tree. The contents of that range can move around.
Use an isolation level that provides you the guarantees that you need. For read-only transactions the snapshot isolation level is usually a very elegant and total solution to concurrency. It seems to apply to your case.
This seems to be quite lax attitude towards data consistency from SQL Server. I would expect some locking coordination, when data is being read from index, while its key column is being updated.
This is an understandable request. On the other hand you specifically requested a low level of isolation. You can dial all the way up to SERIALIZABLE of you want. SERIALIZABLE presents you as-if serial execution.
Missing rows are just one special case of the many effects that READ COMMITTED allows. It makes no sense to specifically prevent them while allowing all kinds of other inconsistencies.
SET NOCOUNT ON;
USE TestIndexColUpdate;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE #c INT
WHILE 1 = 1
BEGIN
DECLARE #count INT
SELECT #count = COUNT(*) FROM dbo.Employees WITH (INDEX (idx_salary))
WHERE empid > '' AND CONVERT(NVARCHAR(MAX), empid) > '__'
AND salary > 0
if #count <> 9 BREAK;
END

Changing Datatype from int to bigint for tables containing billions of rows

I have couple tables with millions, and in some table billions, of rows, with one column as int now I am changing to bigint. I tried changing datatype using SSMS and it failed after a couple of hours as transaction log full.
Another approach I took is to create a new column and started updating value from old column to new column in batches, by setting ROWCOUNT property to 100000, it works but it very slow and it claims full server memory. With this approach, it may take a couple of days to complete, and it won't be acceptable in production.
What is the fast\best way to change datatype? The source column is not identity column and duplicate, and null is allowed. The table has an index on other columns, shall disabling index will speed up the process? Will adding Begin Tran and Commit help?
I ran a test for the ALTER COLUMN that shows the actual time required to make the change. The results show that the ALTER COLUMN is not instantaneous, and the time required grows linearly.
RecordCt Elapsed Mcs
----------- -----------
10000 184019
100000 1814181
1000000 18410841
My recommendation would be to batch it as you suggested. Create a new column, and pre-populate the column over time using a combination of ROWCOUNT and WAITFOR.
Code your script so that the WAITFOR value is read from a table. That way you can modify the WAITFOR value on-the-fly as your production server starts to bog down. You can shorten the WAITFOR during off-peak hours. (You can even use DMVs to make your WAITFOR value automatic, but this is certainly more complex.)
This is a complex update that will require planning and a lot of babysitting.
Rob
Here is the ALTER COLUMN test code.
USE tempdb;
SET NOCOUNT ON;
GO
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.TestTable'))
DROP TABLE dbo.TestTable;
GO
CREATE TABLE dbo.TestTable (
ColID int IDENTITY,
ColTest int NULL,
ColGuid uniqueidentifier DEFAULT NEWSEQUENTIALID()
);
GO
INSERT INTO dbo.TestTable DEFAULT VALUES;
GO 10000
UPDATE dbo.TestTable SET ColTest = ColID;
GO
DECLARE #t1 time(7) = SYSDATETIME();
DECLARE #t2 time(7);
ALTER TABLE dbo.TestTable ALTER COLUMN ColTest bigint NULL;
SET #t2 = SYSDATETIME();
SELECT
MAX(ColID) AS RecordCt,
DATEDIFF(mcs, #t1, #t2) AS [Elapsed Mcs]
FROM dbo.TestTable;
a simple alter table <table> alter column <column> bigint null should take basically no time. there won't be any conversion issues or null checks - i don't see why this wouldn't be relatively instant
if you do it through the GUI, it'll probably try to create a temp table, drop the existing table, and create a new one - definitely don't do that
In SQL Server 2016+, this alter table <table> alter column <column> bigint null statement will be a simple metadata change (instant) if the table is fully compressed.
More info here from #Paul White:
https://sqlperformance.com/2020/04/database-design/new-metadata-column-changes-sql-server-2016
Compression must be enabled:
On all indexes and partitions, including the base heap or clustered index.
Either ROW or PAGE compression.
Indexes and partitions may use a mixture of these compression levels. The important thing is there are no uncompressed indexes or partitions.
Changing from NULL to NOT NULL is not allowed.
The following integer type changes are supported:
smallint to integer or bigint.
integer to bigint.
smallmoney to money (uses integer representation internally).
The following string and binary type changes are supported:
char(n) to char(m) or varchar(m)
nchar(n) to nchar(m) or nvarchar(m)
binary(n) to binary(m) or varbinary(m)
All of the above only for n < m and m != max
Collation changes are not allowed

SQL Server bit column constraint, 1 row = 1, all others 0

I have a bit IsDefault column. Only one row of data within the table may have this bit column set to 1, all the others must be 0.
How can I enforce this?
All versions:
Trigger
Indexed view
Stored proc (eg test on write)
SQL Server 2008: a filtered index
CREATE UNIQUE INDEX IX_foo ON bar (MyBitCol) WHERE MyBitCol = 1
Assuming your PK is a single, numeric column, you could add a computed column to your table:
ALTER TABLE YourTable
ADD IsDefaultCheck AS CASE IsDefault
WHEN 1 THEN -1
WHEN 0 THEN YourPK
END
Then create a unique index on the computed column.
CREATE UNIQUE INDEX IX_DefaultCheck ON YourTable(IsDefaultCheck)
I think the trigger is the best idea if you want to change the old default record to 0 when you insert/update a new one and if you want to make sure one record always has that value (i.e. if you delete the record with the value you would assign it to a different record). You would have to decide on the rules for doing so. These triggers can be tricky because you have to account for multiple records in the inserted and deleted tables. So if 3 records in a batch try to update to become the default record, which one wins?
If you want to make sure the one default record never changes when someone else tries to change it, the filtered index is a good idea.
Different approaches can be taken here, but I think only two are correct. But lets do it step by step.
We have table Hierachy table in which we have Root column. This column tells us what row is currently the starting point. As in question asked, we want to have only one starting point.
We think that we can do it with:
Constraint
Indexed View
Trigger
Different table and relation
Constraint
In this approach first we need to create function which will do the job.
CREATE FUNCTION [gt].[fnOnlyOneRoot]()
RETURNS BIT
BEGIN
DECLARE #rootAmount TINYINT
DECLARE #result BIT
SELECT #rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
IF #rootAmount=1
set #result=1
ELSE
set #result=0
RETURN #result
END
GO
And then the constraint:
ALTER TABLE [gt].[Hierarchy] WITH CHECK ADD CONSTRAINT [ckOnlyOneRoot] CHECK (([gt].[fnOnlyOneRoot]()=(1)))
Unfortunately approach is wrong as this constraint won't allow us to change any values in the table. It need to have exactly one root marked (insert with Root=1 will throw exception, and update with set Root=0 also)
We could change the fnOnyOneRoot to allow having 0 selected roots but it not what we wanted.
Index
Index will remove all rows which are defined in the where clause and on the rest data will setup unique constraint. We have different options here:
- Root can be nullable and we can add in where Root!=0 and Root is not null
- Root must have value and we can add only in where Root!=0
- and different combinations
CREATE UNIQUE INDEX ix_OnyOneRoot ON [gt].[Hierarchy](Root) WHERE Root !=0 and Root is not null
This approach also is not perfect. Maximum one Root will be forced, but minimum not. To update data we need to set previous rows to null or 0.
Trigger
We can do two kinds of trigger both behaves differently
- Prevent trigger - which won't allow us to put wrong data
- DoTheJob trigger - which in background will update data for us
Prevent trigger
This is basically the same as constraint, if we want to force only one root than we cannot update or insert.
CREATE TRIGGER tOnlyOneRoot
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE #rootAmount TINYINT
DECLARE #result BIT
SELECT #rootAmount=COUNT(1) FROM [gt].[Hierarchy] WHERE [Root]=1
IF #rootAmount=1
set #result=1
ELSE
set #result=0
IF #result=0
BEGIN
RAISERROR ('Only one root',0,0);
ROLLBACK TRANSACTION
RETURN
END
GO
DoTheJob trigger
This trigger will check for all inserted/updated rows and if more than one Root will be passed it will throw exception. In other case, so if one new Root will be updated or inserted, trigger will allow to do it and after operation it will change Root value for all other rows to 0.
CREATE TRIGGER tOnlyOneRootDoTheJob
ON [gt].[Hierarchy]
AFTER INSERT, UPDATE
AS
DECLARE #insertedCount TINYINT
SELECT #insertedCount = COUNT(1) FROM inserted WHERE [Root]=1
if (#insertedCount > 1)
BEGIN
RAISERROR ('Only one root',0,0);
ROLLBACK TRANSACTION
RETURN
END
DECLARE #newRootId INT
SELECT #newRootId = [HierarchyId] FROM inserted WHERE [Root]=1
UPDATE [gt].[Hierarchy] SET [Root]=0 WHERE [HierarchyId] <> #newRootId
GO
This is the solution we tried to achieve. Only one root rule is always meet. (Additional trigger for Delete should be done)
Different table and relation
This is lets say more normalized way. We create new table allow only to have one row (using the options described above) and we join.
CREATE TABLE [gt].[HierarchyDefault](
[HierarchyId] INT PRIMARY KEY NOT NULL,
CONSTRAINT FK_HierarchyDefault_Hierarchy FOREIGN KEY (HierarchyId) REFERENCES [gt].[Hierarchy](HierarchyId)
)
Does it will hit the performance?
With one column
SET STATISTICS TIME ON;
SELECT [HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
With join:
SET STATISTICS TIME ON;
SELECT h.[HierarchyId],[ParentHierarchyId],[Root]
FROM [gt].[Hierarchy] h
INNER JOIN [gt].[HierarchyDefault] hd on h.[HierarchyId]=hd.[HierarchyId]
WHERE [root]=1
SET STATISTICS TIME OFF;
Result
CPU time = 0 ms, elapsed time = 0 ms.
Summary
I will use the trigger. It is some magic in the table, but it did all job under the hood.
Easy table creation:
CREATE TABLE [gt].[Hierarchy](
[HierarchyId] INT PRIMARY KEY IDENTITY(1,1),
[ParentHierarchyId] INT NULL,
[Root] BIT
CONSTRAINT FK_Hierarchy_Hierarchy FOREIGN KEY (ParentHierarchyId)
REFERENCES [gt].[Hierarchy](HierarchyId)
)
You could apply an Instead of Insert trigger and check the value as it's coming in.
Create Trigger TRG_MyTrigger
on MyTable
Instead of Insert
as
Begin
--Check to see if the row is marked as active....
If Exists(Select * from inserted where IsDefault= 1)
Begin
Update Table Set IsDefault=0 where ID= (select ID from inserted);
insert into Table(Columns)
select Columns from inserted
End
End
Alternatively you could apply a unique constraint on the column.
The accepted answer to the below question is both interesting and relevant:
Constraint for only one record marked as default
"But the serious relational folks will tell you this information
should just be in another table."
Have a separate 1 row table that tells you which record is 'default'. Anon touched on this in his comment.
I think this is the best approach - simple, clean & doesn't require a 'clever' esoteric solution prone to errors or later misunderstanding. You can even drop the IsDefualt column.

Using a trigger to simulate a second identity column in SQL Server 2005

I have various reasons for needing to implement, in addition to the identity column PK, a second, concurrency safe, auto-incrementing column in a SQL Server 2005 database. Being able to have more than one identity column would be ideal, but I'm looking at using a trigger to simulate this as close as possible to the metal.
I believe I have to use a serializable isolation level transaction in the trigger. Do I go about this like Ii would use such a transaction in a normal SQL query?
It is a non-negotiable requirement that the business meaning of the second incrementing column remain separated from the behind the scenes meaning of the first, PK, incrementing column.
To put things as simply as I can, if I create JobCards '0001', '0002', and '0003', then delete JobCards '0002' and '0003', the next Jobcard I create must have ID '0002', not '0004'.
Just an idea, if you have 2 "identity" columns, then surely they would be 'in sync' - if not exactly the same value, then would differ by a constant value. If so, then why not add the "second identity" column as a COMPUTED column, which offsets the primary identity? Or is my logic flawed here?
Edit : As per Martin's comment, note that your calc might need to be N * id + C, where N is the Increment and C the offset / delta - excuse my rusty maths.
For example:
ALTER TABLE MyTable ADD OtherIdentity AS Id * 2 + 1;
Edit
Note that for Sql 2012 and later, that you can now use an independent sequence to create two or more independently incrementing columns in the same table.
Note: OP has edited the original requirement to include reclaiming sequences (noting that identity columns in SQL do not reclaim used ID's once deleted).
I would disallow all the deletes from this table altogether. Instead of deleting, I would mark rows as available or inactive. Instead of inserting, I would first search if there are inactive rows, and reuse the one with the smallest ID if they exist. I would insert only if there are no available rows already in the table.
Of course, I would serialize all inserts and deletes with sp_getapplock.
You can use a trigger to disallow all deletes, it is simpler than filling gaps.
A solution to this issue from "Inside Microsoft SQL Server 2008: T-SQL Querying" is to create another table with a single row that holds the current max value.
CREATE TABLE dbo.Sequence(
val int
)
Then to allocate a range of sufficient size for your insert
CREATE PROC dbo.GetSequence
#val AS int OUTPUT,
#n as int =1
AS
UPDATE dbo.Sequence
SET #val = val = val + #n;
SET #val = #val - #n + 1;
This will block other concurrent attempts to increment the sequence until the first transaction commits.
For a non blocking solution that doesn't handle multi row inserts see my answer here.
This is probably a terrible idea, but it works in at least a limited use scenario
Just use a regular identity and reseed on deletes.
create table reseedtest (
a int identity(1,1) not null,
name varchar(100)
)
insert reseedtest values('erik'),('john'),('selina')
select * from reseedtest
go
CREATE TRIGGER TR_reseedtest_D ON reseedtest FOR DELETE
AS
BEGIN TRAN
DECLARE #a int
SET #a = (SELECT TOP 1 a FROM reseedtest WITH (TABLOCKX, HOLDLOCK))
--anyone know another way to lock a table besides doing something to it?
DBCC CHECKIDENT(reseedtest, reseed, 0)
DBCC CHECKIDENT(reseedtest, reseed)
COMMIT TRAN
GO
delete reseedtest where a >= 2
insert reseedtest values('katarina'),('david')
select * from reseedtest
drop table reseedtest
This won't work if you are deleting from the "middle of the stack" as it were, but it works fine for deletes from the incrementing end.
Reseeding once to 0 then again is just a trick to avoid having to calculate the correct reseed value.
if you never delete from the table, you could create a view with a materialized column that uses ROW_NUMBER().
ALSO, a SQL Server identity can get out of sync with a user generated one, depending on the use of rollback.

Resources