I have migrated a series of tables from an SQL Server 2008 machine to a new SQL Server 2012 machine via Transactional Replication. That works fine so far, and I also replicated indexes and default values to ensure the replicated data is equal to the original.
When I detach the new machine from replication to act standalone as a replacement for the old machine, SQL Server starts filling up all the gaps in my identity columns, which may cause various problems. For instance, my code can no longer rely on the last inserted row having the highest ID. Also, previously deleted user IDs are recycled, which may cause confusion on the users' side too.
Is there a way to make the new instance of my DB act exactly like the old replicated one? In my experience, a running instance of a database never fills up gaps of previously deleted rows.
I have never seen this either. However, it could be that the table gets reseeded to 0 when it gets deployed. Observe this example:
SET NOCOUNT ON;
GO
CREATE TABLE dbo.fooblat(id INT IDENTITY(1,1));
GO
INSERT dbo.fooblat DEFAULT VALUES;
GO 3
DELETE dbo.fooblat WHERE id = 2;
GO
DBCC CHECKIDENT('dbo.fooblat', RESEED, 0);
GO
INSERT dbo.fooblat DEFAULT VALUES;
GO 3
SELECT id FROM dbo.fooblat ORDER BY id;
GO
DROP TABLE dbo.fooblat;
Results:
id
----
1
1
2
3
3
Now, if the identity column is also the primary key or otherwise unique, this same operation could still fill in the gaps; only the duplicate violations would fail. So, repeating again with this table definition:
CREATE TABLE dbo.fooblat(id INT IDENTITY(1,1) PRIMARY KEY);
GO
Yields this (the first successful insert filled the gap):
id
----
1
2
3
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_fooblat_3213E83F6BF0A0C6'. Cannot insert duplicate key in object 'dbo.fooblat'. The duplicate key value is (1).
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_fooblat_3213E83F6BF0A0C6'. Cannot insert duplicate key in object 'dbo.fooblat'. The duplicate key value is (3).
You can avoid this situation by ensuring that the table is seeded correctly when it is put on the other system. This may just mean using a more reliable deployment technique, or it may mean - after populating and before letting any users in - you do something like this:
DECLARE @i INT, @sql NVARCHAR(MAX);
SELECT @i = MAX(id) FROM dbo.fooblat;
SET @sql = N'DBCC CHECKIDENT(''dbo.fooblat'', RESEED, ' + RTRIM(@i) + ');';
EXEC sp_executesql @sql;
Suppose a table in SQL Server with this structure:
TABLE t (Id INT PRIMARY KEY)
Then I have a stored procedure, which is constantly being called, that inserts data into this table among other things:
BEGIN TRAN
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t)
INSERT t VALUES (@Id)
...
-- Stuff that gets a long time to get completed
...
COMMIT
The problem with this approach is that sometimes I get a primary key violation because two or more procedure calls read and try to insert the same Id into the table.
I have been able to solve this problem by adding a TABLOCK hint to the SELECT statement:
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t WITH (TABLOCK))
The problem now is that successive calls to the procedure must wait for the currently executing transaction to complete before they can start, so only one procedure can run at a time.
Is there any advice or trick to hold the lock only during the execution of the SELECT and INSERT statements?
Thanks.
TABLOCK is a terrible idea, since you're serialising all the calls (no concurrency).
Note that with an SP you will retain all the locks granted over the run until the SP completes.
So you want to minimise locks except for where you really need them.
Unless you have a special case, use an internally generated id:
CREATE TABLE t (Id INT IDENTITY PRIMARY KEY)
Improved performance, concurrency etc. since you are not dependent on external tables to manage the id.
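For example, with the table defined as above, the body of the stored procedure could be reduced to something like this sketch (dbo.InsertT is just an illustrative name; the long-running work stays where it was):
CREATE PROC dbo.InsertT
AS
BEGIN TRAN
    INSERT t DEFAULT VALUES;            -- the engine hands out the next Id safely under concurrency
    DECLARE @Id INT = SCOPE_IDENTITY(); -- the Id generated by this call, if the rest of the proc needs it
    -- ... stuff that takes a long time to complete ...
COMMIT
GO
No SELECT MAX(Id) and no table lock are needed, so concurrent calls don't block each other on id generation.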
If you have existing data you can (re)set the start value using DBCC
DBCC CHECKIDENT ('t', RESEED, 100)
If you need to inject rows with a value preassigned, use:
SET IDENTITY_INSERT t ON
(and off again afterwards, resetting the seed as required).
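A minimal sketch of that pattern, using the same table t (the explicit value 42 is just an example):
SET IDENTITY_INSERT t ON;
INSERT t (Id) VALUES (42);     -- a column list is required while IDENTITY_INSERT is ON
SET IDENTITY_INSERT t OFF;
DBCC CHECKIDENT ('t', RESEED); -- bumps the current identity value up to MAX(Id) if it fell behind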
[Consider whether you want this value to be the primary key, or simply unique.
In many cases where you need to reference a table's PK as an FK, you'll want it as the PK for simplicity of joins, but having a business-readable value (e.g. Accounting Code or OrderNo+OrderLine) is completely valid: that's just modelling]
"T-SQL Querying" book (http://www.amazon.com/Inside-Microsoft-Querying-Developer-Reference/dp/0735626030) has an interesting example, where, querying a table under default transaction isolation level during clustered index key column update, you may miss a row or read a row twice. It looks to be acceptable, since updating table/entity key is not a good idea anyway. However, I've updated this example so that the same happens, when you update non-clustered index key column value.
Following is the table structure:
SET NOCOUNT ON;
USE master;
IF DB_ID('TestIndexColUpdate') IS NULL CREATE DATABASE TestIndexColUpdate;
GO
USE TestIndexColUpdate;
GO
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;
CREATE TABLE dbo.Employees
(
empid CHAR(900) NOT NULL, -- this column should be big enough, so that 9 rows fit on 2 index pages
salary MONEY NOT NULL,
filler CHAR(1) NOT NULL DEFAULT('a')
);
CREATE INDEX idx_salary ON dbo.Employees(salary) include (empid); -- include empid into index, so that test query reads from it
ALTER TABLE dbo.Employees ADD CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid);
INSERT INTO dbo.Employees(empid, salary) VALUES
('A', 1500.00),('B', 2000.00),('C', 3000.00),('D', 4000.00),
('E', 5000.00),('F', 6000.00),('G', 7000.00),('H', 8000.00),
('I', 9000.00);
This is what needs to be done in the first connection (on each update, the row will jump between 2 index pages):
SET NOCOUNT ON;
USE TestIndexColUpdate;
WHILE 1=1
BEGIN
UPDATE dbo.Employees SET salary = 10800.00 - salary WHERE empid = 'I'; -- on each update, "I" employee jumps between 2 pages
END
This is what needs to be done in the second connection:
SET NOCOUNT ON;
USE TestIndexColUpdate;
DECLARE @c INT
WHILE 1 = 1
BEGIN
SELECT salary, empid FROM dbo.Employees
if @@ROWCOUNT <> 9 BREAK;
END
Normally, this query should return the 9 records we inserted in the first code sample. However, very soon I see 8 records being returned. This query reads all its data from the "idx_salary" index, which is being updated by the previous sample code.
This seems to be quite a lax attitude towards data consistency from SQL Server. I would expect some locking coordination when data is being read from an index while its key column is being updated.
Do I interpret this behavior correctly? Does this mean that even non-clustered index keys should not be updated?
UPDATE:
To solve this problem, you only need to enable "snapshots" on the db (READ_COMMITTED_SNAPSHOT ON). No more deadlocking or missing rows. I've tried to summarize all of this here: http://blog.konstantins.net/2015/01/missing-rows-after-updating-sql-server.html
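Enabling it is a one-liner along these lines (WITH ROLLBACK IMMEDIATE just closes other active connections so the option can be applied):
ALTER DATABASE TestIndexColUpdate SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;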
UPDATE 2:
This seems to be the very same problem as in this good old article: http://blog.codinghorror.com/deadlocked/
Do I interpret this behavior correctly?
Yes.
Does this mean that even non-clustered index keys should not be updated?
No. You should use a proper isolation level or make the application tolerate the inconsistencies that READ COMMITTED allows.
This issue of missing rows is not limited to clustered indexes. It is caused by moving a row in a b-tree. Clustered and nonclustered indexes are implemented as b-trees with only tiny physical differences between them.
So you are seeing the exact same physical phenomenon. It applies every time your query reads a range of rows from a b-tree. The contents of that range can move around.
Use an isolation level that provides you the guarantees that you need. For read-only transactions the snapshot isolation level is usually a very elegant and total solution to concurrency. It seems to apply to your case.
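A rough sketch of what that looks like here (database and table names taken from the question):
-- one-time database setting
ALTER DATABASE TestIndexColUpdate SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
-- in the reading connection
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT salary, empid FROM dbo.Employees; -- sees a consistent point-in-time view; rows cannot go missing mid-scan
COMMIT;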
This seems to be quite a lax attitude towards data consistency from SQL Server. I would expect some locking coordination when data is being read from an index while its key column is being updated.
This is an understandable request. On the other hand, you specifically requested a low level of isolation. You can dial all the way up to SERIALIZABLE if you want. SERIALIZABLE gives you as-if-serial execution.
Missing rows are just one special case of the many effects that READ COMMITTED allows. It makes no sense to specifically prevent them while allowing all kinds of other inconsistencies.
SET NOCOUNT ON;
USE TestIndexColUpdate;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE @c INT
WHILE 1 = 1
BEGIN
DECLARE @count INT
SELECT @count = COUNT(*) FROM dbo.Employees WITH (INDEX (idx_salary))
WHERE empid > '' AND CONVERT(NVARCHAR(MAX), empid) > '__'
AND salary > 0
if @count <> 9 BREAK;
END
I created a primary key in my table
[id] [bigint] IDENTITY(1,1) NOT NULL
I noticed that the id has been increasing about every minute for the last week. I'm using this to check the increase:
DBCC CHECKIDENT ('Issue');
I can't find any trigger or anything else, and I don't create or remove records in the table. Could this be a bug in MS SQL?
I'm using SQL Server 2008 R2.
The problem could be related to transaction rollbacks. A rollback will remove the data rows inserted in a transaction, but it does not reset the identity value to its previous setting.
Check this:
CREATE TABLE #ident
(
ID INT IDENTITY(1,1)
, column1 VARCHAR(20) NULL
)
INSERT #ident
VALUES ('a'),('b'),('c')
BEGIN TRAN
INSERT #ident
VALUES ('d'),('e'),('f')
ROLLBACK TRAN
INSERT #ident
VALUES ('g'),('h'),('i')
select * from #ident
and result is
ID column1
-- -------
1 a
2 b
3 c
7 g
8 h
9 i
So whenever a transaction is rolled back, the identity won't be reset.
There can also be a situation where you added some rows and then deleted them. After that, the next id will be one greater than the id of the last deleted row.
Another thing to check is that any new stored procedures are correct. I've run into situations where people had scripted out changes and forgot a GO in between two parts, so the second part was included in the stored procedure. We were mysteriously getting records added to a table. It took a while to catch it.
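For example, a hypothetical deployment script like this silently folds the seed INSERT into the procedure body because the GO is missing:
CREATE PROCEDURE dbo.DoSomething
AS
    UPDATE dbo.SomeTable SET SomeCol = 1 WHERE Id = 1;
-- a GO was supposed to be here
INSERT INTO dbo.SomeTable (SomeCol) VALUES (2); -- becomes part of dbo.DoSomething and runs on every call
GO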
In your case, it may be doing something (like seeding the data in the table), but a transaction is causing it to roll back.
We need to change the data type of about 10 primary keys in our db from numeric(19,0) to bigint. On the smaller tables a simple update of the datatype works just fine but on the larger tables (60-70 million rows) it takes a considerable amount of time.
What is the fastest way to achieve this, preferably without locking the database?
I've written a script that generates the following (which I believe I got from a different SO post)
--Add a new temporary column to store the changed value.
ALTER TABLE query_log ADD id_bigint bigint NULL;
GO
CREATE NONCLUSTERED INDEX IX_query_log_id_bigint ON query_log (id_bigint)
INCLUDE (id); -- the include only works on SQL 2008 and up
-- This index may help or hurt performance, I'm not sure... :)
GO
declare @count int
declare @iteration int
declare @progress int
set @iteration = 0
set @progress = 0
select @count = COUNT(*) from query_log
RAISERROR ('Processing %d records', 0, 1, @count) WITH NOWAIT
-- Update the table in batches of 10000 at a time
WHILE 1 = 1 BEGIN
UPDATE X -- Updating a derived table only works on SQL 2005 and up
SET X.id_bigint = id
FROM (
SELECT TOP 10000 * FROM query_log WHERE id_bigint IS NULL
) X;
IF @@ROWCOUNT = 0 BREAK;
set @iteration = @iteration + 1
set @progress = @iteration * 10000
RAISERROR ('processed %d of %d records', 0, 1, @progress, @count) WITH NOWAIT
END;
GO
--kill the pkey on the old column
ALTER TABLE query_log
DROP CONSTRAINT PK__query_log__53833672
GO
BEGIN TRAN; -- now do as *little* work as possible in this blocking transaction
UPDATE T -- catch any updates that happened after we touched the row
SET T.id_bigint = T.id
FROM query_log T WITH (TABLOCKX, HOLDLOCK)
WHERE T.id_bigint <> T.id;
-- The lock hints ensure everyone is blocked until we do the switcheroo
EXEC sp_rename 'query_log.id', 'id_numeric';
EXEC sp_rename 'query_log.id_bigint', 'id';
COMMIT TRAN;
GO
DROP INDEX IX_query_log_id_bigint ON query_log;
GO
ALTER TABLE query_log ALTER COLUMN id bigint NOT NULL;
GO
/*
ALTER TABLE query_log DROP COLUMN id_numeric;
GO
*/
ALTER TABLE query_log
ADD CONSTRAINT PK_query_log PRIMARY KEY (id)
GO
This works very well for the smaller tables but is extremely slow going for the very large tables.
Note this is in preparation for a migration to Postgres, and the EnterpriseDB Migration toolkit doesn't seem to understand the numeric(19,0) datatype.
It is not possible to change a primary key without locking. The fastest way with the least impact is to create a new table with the new columns and primary keys, without foreign keys and indexes. Then batch insert blocks of data in sequential order relative to their primary key(s). When that is finished, add your indexes, then your foreign keys back. Finally, drop or rename the old table and rename your new table to the name the system expects.
In practice your approach will have to vary based on how many records are inserted, updated, and/or deleted. If you're only inserting, then you can perform the initial load and top off the table just before your swap.
This approach should provide the fastest migration, minimal logs, and very little fragmentation on your table and indexes.
You have to remember that every time you modify a record, the data is being modified, indexes are being modified, and foreign keys are being checked, all within one implicit or explicit transaction. The table and/or row(s) will be locked while all changes are made. Even if your database is set to simple logging, the server will still write all changes to the log files. Updates are actually a delete paired with an insert, so it is not possible to prevent fragmentation with any other process.
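A stripped-down sketch of that flow against the question's table (the batch size, the elided columns, and the constraint name are assumptions; the real script also has to recreate every index, foreign key, and permission):
-- 1. New table with the target data type, no indexes or FKs yet
CREATE TABLE dbo.query_log_new (id bigint NOT NULL /* , ...other columns... */);
GO
-- 2. Copy in ordered batches so each insert touches a contiguous key range
DECLARE @from bigint = 0, @batch int = 100000;
WHILE 1 = 1
BEGIN
    INSERT dbo.query_log_new (id /* , ... */)
    SELECT TOP (@batch) id /* , ... */
    FROM dbo.query_log
    WHERE id > @from
    ORDER BY id;
    IF @@ROWCOUNT = 0 BREAK;
    SELECT @from = MAX(id) FROM dbo.query_log_new;
END
GO
-- 3. Add the PK/indexes/FKs to the new table, then swap the names
ALTER TABLE dbo.query_log_new ADD CONSTRAINT PK_query_log_new PRIMARY KEY (id);
EXEC sp_rename 'dbo.query_log', 'query_log_old';
EXEC sp_rename 'dbo.query_log_new', 'query_log';
GO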
I have various reasons for needing to implement, in addition to the identity column PK, a second, concurrency-safe, auto-incrementing column in a SQL Server 2005 database. Being able to have more than one identity column would be ideal, but I'm looking at using a trigger to simulate this as closely as possible to the metal.
I believe I have to use a serializable isolation level transaction in the trigger. Do I go about this like I would use such a transaction in a normal SQL query?
It is a non-negotiable requirement that the business meaning of the second incrementing column remain separated from the behind the scenes meaning of the first, PK, incrementing column.
To put things as simply as I can, if I create JobCards '0001', '0002', and '0003', then delete JobCards '0002' and '0003', the next Jobcard I create must have ID '0002', not '0004'.
Just an idea: if you have 2 "identity" columns, then surely they would be 'in sync' - if not exactly the same value, then they would differ by a constant value. If so, then why not add the "second identity" column as a COMPUTED column, which offsets the primary identity? Or is my logic flawed here?
Edit: As per Martin's comment, note that your calc might need to be N * id + C, where N is the increment and C the offset / delta - excuse my rusty maths.
For example:
ALTER TABLE MyTable ADD OtherIdentity AS Id * 2 + 1;
Edit
Note that for SQL 2012 and later, you can now use an independent sequence to create two or more independently incrementing columns in the same table.
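For example, something along these lines (the names are illustrative):
CREATE SEQUENCE dbo.JobCardSeq START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.JobCard
(
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    JobCardNo INT NOT NULL DEFAULT (NEXT VALUE FOR dbo.JobCardSeq)
);
GO
-- the two columns increment independently, and the sequence can be restarted at will:
ALTER SEQUENCE dbo.JobCardSeq RESTART WITH 100;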
Note: OP has edited the original requirement to include reclaiming sequences (noting that identity columns in SQL do not reclaim used IDs once deleted).
I would disallow all the deletes from this table altogether. Instead of deleting, I would mark rows as available or inactive. Instead of inserting, I would first search if there are inactive rows, and reuse the one with the smallest ID if they exist. I would insert only if there are no available rows already in the table.
Of course, I would serialize all inserts and deletes with sp_getapplock.
You can use a trigger to disallow all deletes; it is simpler than filling gaps.
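A rough sketch of the insert path under that scheme (the table, column, and lock resource names are made up; error handling is omitted):
CREATE PROCEDURE dbo.AddJobCard
AS
BEGIN TRAN;
    -- serialize all inserts and "deletes" against this table
    EXEC sp_getapplock @Resource = 'JobCardIds', @LockMode = 'Exclusive', @LockOwner = 'Transaction';

    DECLARE @Id INT = (SELECT MIN(Id) FROM dbo.JobCard WHERE IsActive = 0);
    IF @Id IS NOT NULL
        UPDATE dbo.JobCard SET IsActive = 1 WHERE Id = @Id;  -- reuse the smallest freed ID
    ELSE
        INSERT dbo.JobCard (Id, IsActive)
        SELECT ISNULL(MAX(Id), 0) + 1, 1 FROM dbo.JobCard;   -- append a brand-new ID
COMMIT; -- the applock is released with the transaction
GO
A "delete" is then just UPDATE dbo.JobCard SET IsActive = 0 under the same applock.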
A solution to this issue from "Inside Microsoft SQL Server 2008: T-SQL Querying" is to create another table with a single row that holds the current max value.
CREATE TABLE dbo.Sequence(
val int
)
Then to allocate a range of sufficient size for your insert
CREATE PROC dbo.GetSequence
@val AS int OUTPUT,
@n as int = 1
AS
UPDATE dbo.Sequence
SET @val = val = val + @n;
SET @val = @val - @n + 1;
This will block other concurrent attempts to increment the sequence until the first transaction commits.
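The Sequence table needs one seed row before first use, and a caller grabs a block of values like this (a sketch; the @n of 3 is just an example):
INSERT INTO dbo.Sequence (val) VALUES (0); -- one-time seed row
GO
DECLARE @first int;
EXEC dbo.GetSequence @val = @first OUTPUT, @n = 3; -- reserves @first, @first + 1, @first + 2
SELECT @first AS FirstReservedValue;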
For a non-blocking solution that doesn't handle multi-row inserts, see my answer here.
This is probably a terrible idea, but it works in at least a limited use scenario
Just use a regular identity and reseed on deletes.
create table reseedtest (
a int identity(1,1) not null,
name varchar(100)
)
insert reseedtest values('erik'),('john'),('selina')
select * from reseedtest
go
CREATE TRIGGER TR_reseedtest_D ON reseedtest FOR DELETE
AS
BEGIN TRAN
DECLARE @a int
SET @a = (SELECT TOP 1 a FROM reseedtest WITH (TABLOCKX, HOLDLOCK))
--anyone know another way to lock a table besides doing something to it?
DBCC CHECKIDENT(reseedtest, reseed, 0)
DBCC CHECKIDENT(reseedtest, reseed)
COMMIT TRAN
GO
delete reseedtest where a >= 2
insert reseedtest values('katarina'),('david')
select * from reseedtest
drop table reseedtest
This won't work if you are deleting from the "middle of the stack" as it were, but it works fine for deletes from the incrementing end.
Reseeding once to 0 then again is just a trick to avoid having to calculate the correct reseed value.
If you never delete from the table, you could create a view with a materialized column that uses ROW_NUMBER().
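A sketch of that idea (the table and column names are assumptions; note the numbering is computed at query time):
CREATE VIEW dbo.JobCardNumbered
AS
SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS JobCardNo -- dense 1..N numbering as long as rows are never deleted
FROM dbo.JobCard;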
Also, a SQL Server identity can get out of sync with a user-generated one, depending on the use of rollbacks.