Best way to get totally sequential int values in SQL Server

I have a business requirement that the InvoiceNumber field in my Invoices table be totally sequential - no gaps or the auditors might think our accountants are up to something fishy!
My first thought was to simply use the primary key (identity) but if a transaction is rolled back a gap appears in the sequence.
So my second thought is to use a trigger which, at the point of insert, looks for the highest InvoiceNumber value in the table, adds 1 to it, and uses it as the InvoiceNumber for the new row. Easy to implement.
Are there potential issues with near-simultaneous inserts? For example, might two near-simultaneous inserts running the trigger at the same time get the same 'currently highest InvoiceNumber' value and therefore insert rows with the same InvoiceNumber?
Are there other issues I might be missing? Would another approach be better?

Create a table which keeps track of 'counters'.
For your invoices, add a record to that table which tracks the next integer to be used.
When creating an invoice, use that value and increment it. If your transaction is rolled back, the update to the counter will be rolled back as well. (Make sure you take a lock on that row, so that no other process can use the same value.)
This is much more reliable than looking at the highest counter currently in use in your invoice table.
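A minimal sketch of that approach, assuming a dedicated counters table (all names here are illustrative, not from the question):
CREATE TABLE dbo.Counters (
    CounterName varchar(50) NOT NULL PRIMARY KEY,
    NextValue   int         NOT NULL
);
INSERT INTO dbo.Counters (CounterName, NextValue) VALUES ('Invoice', 1);

BEGIN TRANSACTION;
DECLARE @InvoiceNumber int;
-- The single atomic UPDATE takes an exclusive lock on the row that is held
-- until commit, so no other session can claim the same value.
UPDATE dbo.Counters
SET @InvoiceNumber = NextValue,   -- variable gets the pre-update value
    NextValue = NextValue + 1
WHERE CounterName = 'Invoice';
-- ... insert the invoice row here, using @InvoiceNumber ...
COMMIT TRANSACTION;  -- a rollback would undo the increment as well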

You may still get gaps if data gets deleted from the table. But if data only goes in and not out, then with proper use of transactions on an external sequence table, it should be possible to do this nicely. Don't use MAX()+1 because it can have timing issues, or you may have to lock more of the table (page/table) than required.
Have a sequential table that has only one single record and column. Retrieve numbers from the table atomically, wrapping the retrieval and usage in a single transaction.
begin tran
declare @next int
update seqn_for_invoice set @next = next = next + 1
insert invoice (invoicenumber, ...) values (@next, ....)
commit
The UPDATE statement is atomic and cannot be interrupted, and the double assignment makes the value of @next atomic as well. It is equivalent to using an OUTPUT clause in SQL Server 2005+ to return the updated value. If you need a range of numbers in one go, it is easier to use the PRE-update value rather than the POST-update value, i.e.
begin tran
declare @next int
update seqn_for_invoice set @next = next, next = next + 3 -- 3 in one go
insert invoice (invoicenumber, ...) values (@next, ....)
insert invoice (invoicenumber, ...) values (@next + 1, ....)
insert invoice (invoicenumber, ...) values (@next + 2, ....)
commit
Reference for SQL Server UPDATE statement
SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.
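For comparison, a sketch of the OUTPUT-clause equivalent mentioned above (SQL Server 2005+); OUTPUT from an UPDATE cannot assign straight into a scalar variable, so it goes through a table variable:
DECLARE @out TABLE (next int);
DECLARE @next int;
UPDATE seqn_for_invoice
SET next = next + 1
OUTPUT inserted.next INTO @out;   -- inserted.next holds the post-update value
SELECT @next = next FROM @out;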

CREATE TABLE dbo.Sequence(
val int
)
Insert a row with an initial seed. Then, to allocate a range of sufficient size for your insert, call the procedure below (in the same transaction as the insert, obviously):
CREATE PROC dbo.GetSequence
@val AS int OUTPUT,
@n AS int = 1
AS
UPDATE dbo.Sequence
SET @val = val = val + @n;
SET @val = @val - @n + 1;
This will block other concurrent attempts to increment the sequence until the first transaction commits.
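A sketch of how the procedure might be called from the inserting transaction (the Invoice table and column are placeholders):
BEGIN TRANSACTION;
DECLARE @first int;
EXEC dbo.GetSequence @val = @first OUTPUT, @n = 3;  -- reserve three numbers
INSERT INTO Invoice (InvoiceNumber) VALUES (@first);
INSERT INTO Invoice (InvoiceNumber) VALUES (@first + 1);
INSERT INTO Invoice (InvoiceNumber) VALUES (@first + 2);
COMMIT TRANSACTION;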

Related

SQL Server GetDate in trigger called sequentially has the same value

I have a trigger on a table for insert, delete, and update that, on its first line, gets the current date with GetDate().
The trigger compares the deleted and inserted tables to determine which field has been changed, and stores the id, datetime, and changed field in another table. This combination must be unique.
A stored procedure does an insert and then an update sequentially on the table. Sometimes I get a primary key violation, and I suspect that GetDate() returns the same value for both.
How can I make GetDate() return different values in the trigger?
EDIT
Here is the code of the trigger
CREATE TRIGGER dbo.TR
ON table
FOR DELETE, INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON
DECLARE @dt datetime
SELECT @dt = GetDate()
insert tableLog (id, date, field, old, new)
select I.id, @dt, 'field', D.field, I.field
from INSERTED I LEFT JOIN DELETED D ON I.id = D.id
where IsNull(I.field, -1) <> IsNull(D.field, -1)
END
and the code of the calls
...
insert into table (anotherfield)
values (@anotherfield)
if @@rowcount = 1 SET @ID = @@identity
...
update table
set field = @field
where Id = @ID
...
Sometimes GetDate() differs by 7 milliseconds between the two calls (insert and update), and sometimes it returns the same value.
That's not exactly a full solution, but try using SYSDATETIME instead, and of course make sure the target table can store datetime2 with up-to-microsecond precision.
Note that you can't force different datetimes regardless of precision (unless you start counting ticks), as things can simply happen at the same time within a given precision.
If stretching to microseconds doesn't solve the issue on a practical level, I think you will have to either redesign this logging schema (perhaps add an identity column on top of what you have) or add some dirty trick - like doing the insert in a try/catch block and adding a microsecond (nanosecond?) in a loop until the insert succeeds. Definitely not something I would recommend.
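As a sketch, the change to the trigger amounts to this (assuming the log table's date column is widened to datetime2 first):
-- Assumption: ALTER TABLE tableLog ALTER COLUMN date datetime2(7) has been run.
DECLARE @dt datetime2(7)
SELECT @dt = SYSDATETIME()   -- ~100ns precision instead of datetime's ~3ms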
Look at this answer: SQL Server: intrigued by GETDATE()
If you are inserting multiple rows, they will all use the same value of GetDate(), so you can try wrapping it in a UDF to get unique values. But as I said, this is just a guess unless you post the code of your trigger so we can see what you are actually doing.
It sounds like you're trying to create an audit trail - but now you want to forge some of the entries?
I'd suggest instead adding a rowversion column to the table and including that in your uniqueness criteria - either instead of or as well as the datetime value that is being recorded.
In this way, even if two rows are inserted with identical date/time data, you can still tell the actual insertion order.
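A sketch of that suggestion (the constraint and column names are illustrative); rowversion values are assigned by the server and strictly increase with every insert or update, so they disambiguate rows that share the same datetime:
ALTER TABLE tableLog ADD rv rowversion;
ALTER TABLE tableLog ADD CONSTRAINT UQ_tableLog_id_field_rv UNIQUE (id, field, rv);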

Generating Unique Random Numbers Efficiently

We are using the technique outlined here to generate random record IDs without collisions. In short, we create a randomly-ordered table of every possible ID, and mark each record as 'Taken' as it is used.
I use the following Stored Procedure to obtain an ID:
ALTER PROCEDURE spc_GetId @retVal BIGINT OUTPUT
AS
DECLARE @curUpdate TABLE (Id BIGINT);
SET NOCOUNT ON;
UPDATE IdMasterList SET Taken = 1
OUTPUT DELETED.Id INTO @curUpdate
WHERE ID = (SELECT TOP 1 ID FROM IdMasterList WITH (INDEX(IX_Taken)) WHERE Taken IS NULL ORDER BY SeqNo);
SELECT TOP 1 @retVal = Id FROM @curUpdate;
RETURN;
The retrieval of the ID must be an atomic operation, as simultaneous inserts are possible.
For large inserts (10+ million), the process is quite slow, as I must pass through the table to be inserted via a cursor.
The IdMasterList has a schema:
SeqNo (BIGINT, NOT NULL) (PK) -- sequence of ordered numbers
Id (BIGINT) -- sequence of random numbers
Taken (BIT, NULL) -- 1 if taken, NULL if not
The IX_Taken index is:
CREATE NONCLUSTERED INDEX IX_Taken ON IdMasterList (Taken ASC)
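For reference, the schema described above reconstructs to roughly this DDL (a sketch; the constraint name is illustrative):
CREATE TABLE dbo.IdMasterList (
    SeqNo bigint NOT NULL CONSTRAINT PK_IdMasterList PRIMARY KEY,  -- ordered sequence
    Id    bigint NULL,  -- pre-generated random number
    Taken bit    NULL   -- 1 if taken, NULL if free
);
CREATE NONCLUSTERED INDEX IX_Taken ON dbo.IdMasterList (Taken ASC);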
I generally populate a table with Ids in this manner:
DECLARE @recNo BIGINT;
DECLARE @newId BIGINT;
DECLARE newAdds CURSOR FOR SELECT recNo FROM Adds;
OPEN newAdds;
FETCH NEXT FROM newAdds INTO @recNo;
WHILE @@FETCH_STATUS = 0 BEGIN
EXEC spc_GetId @newId OUTPUT;
UPDATE Adds SET id = @newId WHERE recNo = @recNo;
FETCH NEXT FROM newAdds INTO @recNo;
END;
CLOSE newAdds;
DEALLOCATE newAdds;
Questions:
Is there any way I can improve the SP to extract Ids faster?
Would a conditional index improve performance (I've yet to test, as IdMasterList is very big)?
Is there a better way to populate a table with these Ids?
As with most things in SQL Server, if you are using cursors, you are doing it wrong.
Since you are using SQL Server 2012, you can use a SEQUENCE to keep track of what random value you already used and effectively replace the Taken column.
CREATE SEQUENCE SeqNoSequence
AS bigint
START WITH 1 -- Start with the first SeqNo that is not taken yet
CACHE 1000; -- Increase the cache size if you regularly need large blocks
Usage:
CREATE TABLE #tmp
(
recNo bigint,
SeqNo bigint
)
INSERT INTO #tmp (recNo, SeqNo)
SELECT recNo,
NEXT VALUE FOR SeqNoSequence
FROM Adds
UPDATE a
SET id = m.Id
FROM Adds a
INNER JOIN #tmp tmp ON a.recNo = tmp.recNo
INNER JOIN IdMasterList m ON tmp.SeqNo = m.SeqNo
SEQUENCE is atomic. Subsequent calls to NEXT VALUE FOR SeqNoSequence are guaranteed to return unique values, even for parallel processes. Note that there can be gaps in SeqNo, but it's a very small trade-off for the huge speed increase.
Put a PK index of BigInt on each table, then either update in one set-based statement:
insert into user (name)
values (...)
update user
set user.ID = id.ID
from user
left join id
on user.PK = id.PK
where user.ID is null;
or insert one row at a time and use SCOPE_IDENTITY():
insert into user (name) values ('justsaynotocursor');
set @PK = SCOPE_IDENTITY();
update user set ID = (select ID from id where PK = @PK);
A few ideas that come to mind:
Try removing the TOP and the inner SELECT to see if it improves the performance of the ID fetching (look at statistics io and the query plan):
UPDATE TOP (1) IdMasterList
SET @retVal = Id, Taken = 1
WHERE Taken IS NULL
Change the index to be a filtered index, since you don't need to fetch numbers that are already taken; it can filter on Taken IS NULL directly, or on Taken = 0 if you change Taken to a 0/1 column (see the sketch after this list).
What actually is your problem: fetching single IDs or 10+ million IDs? Is the problem CPU / I/O caused by the cursor and ID-fetching logic, or are the parallel processes being blocked by other processes?
Use a sequence object to get the SeqNo, and then fetch the Id from IdMasterList using the value it returns. This works as long as there are no gaps in the IdMasterList sequence numbers.
Using the READPAST hint could help with blocking; for CPU / I/O issues, you should try to optimize the SQL.
If the cause is purely the table being a hotspot, and no other easy solution helps, split it into several tables and use some kind of simple logic (even @@spid, rand() or something similar) to decide which table the ID should be fetched from. You would need extra checking that all the tables still have free numbers, but it shouldn't be that bad.
Create different procedures (or even tables) to handle fetching a single ID, hundreds of IDs, and millions of IDs.
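A sketch of the filtered-index idea from the list above, shown for a 0/1 Taken column:
-- Assumption: Taken has been converted to a non-nullable bit (0 = free, 1 = taken).
CREATE NONCLUSTERED INDEX IX_Taken_Free
ON dbo.IdMasterList (SeqNo)
WHERE Taken = 0;   -- the index contains only the free IDs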

SQL Server select for update

I am struggling to find a SQL Server replacement for SELECT FOR UPDATE that works.
I have a master table that contains a column used for the next order number. The application does a SELECT FOR UPDATE on this row, reads the current value (while it is locked), adds one to it, updates the row, and then uses the number it received. This works perfectly on every database I've tried, except SQL Server, which does not seem to have any mechanism for selecting data for exclusive use.
How do I do a locked read and update of something like a next order number from a sequence table in SQL Server?
BTW, I know I can use things like IDENTITY columns to do this, but in this case I must read from this existing column, get the value and increment it, and do it in a safe locked manner to avoid two users getting the same value.
UPDATE:
Thank you, that works for this case :)
DECLARE @Output char(30)
UPDATE scheme.sysdirm
SET @Output = key_value = cast(key_value as int) + 1
WHERE system_key = 'OPLASTORD'
SELECT @Output
I have one other place I do something similar. I read and lock a stock record too.
SELECT STOCK
FROM PRODUCT
WHERE ID = ? FOR UPDATE.
I then do some validation and then do
UPDATE PRODUCT SET STOCK = ?
WHERE ID=?
I can't just use your method above here, as the value I update is based on things I do with the stock I read. But I need to ensure no one else can mess with the stock while I do this. Again, this is easy on other DBs with SELECT FOR UPDATE... is there a SQL Server workaround? :)
You can simply do an UPDATE that also reads out the new value into a SQL Server variable:
DECLARE @Output INT
UPDATE dbo.YourTable
SET @Output = YourColumn = YourColumn + 1
WHERE ID = ????
SELECT @Output
Since it's an atomic UPDATE statement, it's safe against concurrency issues (only one connection can hold an update lock at any given time). A second session that wants to get the incremented value at the same time will have to wait until the first one completes, and will then get the next value from the table.
As an alternative you can use the OUTPUT clause of the UPDATE statement, although this will insert into a table variable.
Create table YourTable
(
ID int,
YourColumn int
)
GO
INSERT INTO YourTable VALUES (1, 1)
GO
DECLARE @Output TABLE
(
YourColumn int
)
UPDATE YourTable
SET YourColumn = YourColumn + 1
OUTPUT inserted.YourColumn INTO @Output
WHERE ID = 1
SELECT TOP 1 YourColumn
FROM #Output
EDIT
If you want to ensure that no one can change the data after you have read it, you can use REPEATABLE READ. Be aware that the rows you read will then stay locked until the transaction ends (pessimistic locking), which may cause deadlocks. You can also use a SELECT ... FROM table WITH (UPDLOCK) hint within a transaction (a sketch of that variant follows the example below).
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT STOCK
FROM PRODUCT
WHERE ID = ?
.....
...
UPDATE Product
SET Stock = nnn
WHERE ID = ?
COMMIT TRANSACTION
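And a sketch of the UPDLOCK variant, which locks only the rows you read instead of raising the isolation level for the session (@id stands in for the ? parameter markers above):
BEGIN TRANSACTION
DECLARE @stock int
-- UPDLOCK takes an update lock that is held until the end of the transaction,
-- so no other session can read-for-update or modify the row in the meantime.
SELECT @stock = STOCK
FROM PRODUCT WITH (UPDLOCK)
WHERE ID = @id
-- ... validation based on @stock ...
UPDATE PRODUCT
SET STOCK = @stock - 1   -- illustrative; any value computed from @stock works
WHERE ID = @id
COMMIT TRANSACTION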

Delete large amount of data in sql server

Suppose I have a table with 10,000,000 records. What is the difference between these two solutions?
Delete the data with a single statement:
DELETE FROM MyTable
Delete all of the data from an application, row by row:
DELETE FROM MyTable WHERE ID = @SelectedID
Does the first solution have the best performance?
What is the impact on the log and on performance?
If you need to restrict which rows you delete rather than doing a complete delete, or you can't use TRUNCATE TABLE (e.g. the table is referenced by a FK constraint, or included in an indexed view), then you can do the delete in chunks:
DECLARE @RowsDeleted INTEGER
SET @RowsDeleted = 1
WHILE (@RowsDeleted > 0)
BEGIN
-- delete 10,000 rows a time
DELETE TOP (10000) FROM MyTable [WHERE .....] -- WHERE is optional
SET @RowsDeleted = @@ROWCOUNT
END
Generally, TRUNCATE is the best way and I'd use that if possible. But it cannot be used in all scenarios. Also, note that TRUNCATE will reset the IDENTITY value for the table if there is one.
If you are using SQL 2000 or earlier, the TOP condition is not available, so you can use SET ROWCOUNT instead.
DECLARE @RowsDeleted INTEGER
SET @RowsDeleted = 1
SET ROWCOUNT 10000 -- delete 10,000 rows a time
WHILE (@RowsDeleted > 0)
BEGIN
DELETE FROM MyTable [WHERE .....] -- WHERE is optional
SET @RowsDeleted = @@ROWCOUNT
END
If you have that many records in your table and you want to delete them all, you should consider truncate <table> instead of delete from <table>. It will be much faster, but be aware that it cannot fire any triggers.
See this for more details (in this case SQL Server 2000):
http://msdn.microsoft.com/en-us/library/aa260621%28SQL.80%29.aspx
Deleting the data from the application row by row will take a very long time, as the DBMS cannot optimize anything, since it doesn't know in advance that you are going to delete everything.
The first clearly has better performance.
When you specify DELETE FROM MyTable, it simply erases everything without checking for an ID. The second one wastes time and disk operations locating each respective record before deleting it.
It also gets worse because every time a record disappears from the middle of the table, the engine may want to condense data on disk, wasting time and work again.
Maybe a better idea would be to delete data based on the clustered index columns in descending order; then the table is basically truncated from the end at every delete operation (a sketch follows).
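A sketch of that idea, assuming ID is the clustered primary key (the batch size is illustrative):
DECLARE @RowsDeleted int = 1
WHILE @RowsDeleted > 0
BEGIN
    -- Delete from the tail of the clustered index so the remaining pages are left untouched.
    DELETE FROM MyTable
    WHERE ID IN (SELECT TOP (10000) ID FROM MyTable ORDER BY ID DESC)
    SET @RowsDeleted = @@ROWCOUNT
END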
Option 1 will create a single very large transaction and have a big impact on the log and on performance, as well as escalating locks so that the table becomes unavailable.
Option 2 will be slower, although it will generate less impact on the log (assuming the bulk-logged or full recovery model).
If you want to get rid of all the data, TRUNCATE TABLE MyTable would be faster than both; although it has no facility to filter rows, it makes a metadata-only change at the back and basically drops the IAM on the floor for the table in question.
The best performance for clearing a table comes from TRUNCATE TABLE MyTable. See http://msdn.microsoft.com/en-us/library/ms177570.aspx for a more detailed explanation.
Found this post on Microsoft TechNet.
Basically, it recommends:
Copy the data that you want to KEEP to an intermediate table, using SELECT INTO;
Truncate the source table;
Copy the data back into the source table with INSERT INTO from the intermediate table.
BEGIN TRANSACTION
SELECT *
INTO dbo.bigtable_intermediate
FROM dbo.bigtable
WHERE Id % 2 = 0;
TRUNCATE TABLE dbo.bigtable;
SET IDENTITY_INSERT dbo.bigtable ON;
INSERT INTO dbo.bigtable WITH (TABLOCK) (Id, c1, c2, c3)
SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate ORDER BY Id;
SET IDENTITY_INSERT dbo.bigtable OFF;
COMMIT TRANSACTION
The first will delete all the data from the table and will have better performance than your second, which deletes only the data matching a specific key.
Now, if you have to delete all the data from the table and you don't rely on rollback, think of using TRUNCATE TABLE.

Validating UPDATE and INSERT statements against an entire table

I'm looking for the best way to go about adding a constraint to a table that is effectively a unique index on the relationship between the record and the rest of the records in that table.
Imagine the following table describing the patrols of various guards (from the previous watchman scenario)
PK PatrolID Integer
FK GuardID Integer
Starts DateTime
Ends DateTime
We start with a constraint specifying that the start and end times must be logical:
Ends >= Starts
However I want to add another logical constraint: A specific guard (GuardID) cannot be in two places at the same time, meaning that for any record the period specified by Start/Ends should not overlap with the period defined for any other patrol by the same guard.
I can think of two ways of trying to approach this:
Create an INSTEAD OF INSERT trigger. This trigger would then use cursors to go through the INSERTED table, checking each record. If any record conflicted with an existing record, an error would be raised. The two problems I have with this approach are: I dislike using cursors in a modern version of SQL Server, and I'm not sure how to go about implementing the same logic for UPDATEs. There may also be the complexity of records within INSERTED conflicting with each other.
The second, seemingly better, approach would be to create a CONSTRAINT that calls a user defined function, passing the PatrolID, GuardID, Starts and Ends. The function would then do a WHERE EXISTS query checking for any records that overlap the GuardID/Starts/Ends parameters that are not the original PatrolID record. However I'm not sure of what potential side effects this approach might have.
Is the second approach better? Does anyone see any pitfalls, such as when inserting/updating multiple rows at once (I'm concerned here because rows within that group could conflict, meaning the order in which they are "inserted" makes a difference)? Is there a better way of doing this (such as some fancy INDEX trick)?
Use an after trigger to check that the overlap constraint has not been violated:
create trigger Patrol_NoOverlap_AIU on Patrol for insert, update as
begin
if exists (select *
from inserted i
inner join Patrol p
on i.GuardId = p.GuardId
and i.PatrolId <> p.PatrolId
where (i.Starts between p.starts and p.Ends)
or (i.Ends between p.Starts and p.Ends))
rollback transaction
end
NOTE: Rolling back a transaction within a trigger will terminate the batch. Unlike a normal constraint violation, you will not be able to catch the error.
You may want a different WHERE clause depending on how you define the time range and overlap. For instance, if you want to be able to say Guard #1 is at X from 6:00 to 7:00 and then at Y from 7:00 to 8:00, the above would not allow that. You would want instead:
create trigger Patrol_NoOverlap_AIU on Patrol for insert, update as
begin
if exists (select *
from inserted i
inner join Patrol p
on i.GuardId = p.GuardId
and i.PatrolId <> p.PatrolId
where (p.Starts <= i.Starts and i.Starts < p.Ends)
or (p.Starts <= i.Ends and i.Ends < p.Ends))
rollback transaction
end
Where Starts is the time the guarding starts and Ends is the infinitesimal moment after guarding ends.
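One refinement worth sketching: raising a descriptive error before the rollback makes the failure easier to diagnose from the client (the message text is illustrative):
create trigger Patrol_NoOverlap_AIU on Patrol for insert, update as
begin
if exists (select *
from inserted i
inner join Patrol p
on i.GuardId = p.GuardId
and i.PatrolId <> p.PatrolId
where (p.Starts <= i.Starts and i.Starts < p.Ends)
or (p.Starts <= i.Ends and i.Ends < p.Ends))
begin
raiserror('Guard is already on patrol during this period.', 16, 1)
rollback transaction
end
end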
The simplest way would be to use a stored procedure for the inserts. The stored procedure can do the insert in a single statement:
insert into YourTable
(GuardID, Starts, Ends)
select @GuardID, @Starts, @Ends
where not exists (
select *
from YourTable
where GuardID = @GuardID
and Starts <= @Ends
and Ends >= @Starts
)
if @@rowcount <> 1
return -1 -- Failure
In my experience triggers and constraints with UDF's tend to become very complex. They have side effects that can require a lot of debugging to figure out.
Stored procedures just work, and they have the added advantage that you can deny INSERT permissions to clients, giving you fine-grained control over what enters your database.
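A sketch of that permission setup (the role and procedure names are hypothetical):
DENY INSERT ON dbo.Patrol TO app_role;         -- clients cannot insert directly
GRANT EXECUTE ON dbo.AddPatrol TO app_role;    -- they must go through the procedure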
CREATE TRIGGER [dbo].[emaill] ON [dbo].[email]
FOR INSERT
AS
BEGIN
declare @email CHAR(50);
SELECT @email = i.email from inserted i; -- note: this only handles single-row inserts
IF @email NOT LIKE '%_@%_.__%'
BEGIN
print 'Trigger Fired';
print 'Invalid Email....';
ROLLBACK TRANSACTION
END
END
Can be done with constraints too:
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx
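For completeness, a sketch of the questioner's second approach, a CHECK constraint backed by a UDF (the function and constraint names are illustrative; note the function runs once per modified row, and scalar UDFs carry a performance cost):
CREATE FUNCTION dbo.fn_GuardOverlaps
(@PatrolID int, @GuardID int, @Starts datetime, @Ends datetime)
RETURNS bit
AS
BEGIN
IF EXISTS (SELECT *
FROM dbo.Patrol p
WHERE p.GuardID = @GuardID
AND p.PatrolID <> @PatrolID
AND p.Starts <= @Ends
AND p.Ends >= @Starts)
RETURN 1
RETURN 0
END
GO
ALTER TABLE dbo.Patrol
ADD CONSTRAINT CK_Patrol_NoOverlap
CHECK (dbo.fn_GuardOverlaps(PatrolID, GuardID, Starts, Ends) = 0)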
