This is my scenario, I have a table like this:
CREATE TABLE [MyTable]
(
[Id] BIGINT PRIMARY KEY,
[Value] NVARCHAR(100) NOT NULL,
[IndexColumnA] NVARCHAR(100) NOT NULL,
[IndexColumnB] NVARCHAR(100) NOT NULL
)
CREATE INDEX [IX_A] ON [MyTable] ([IndexColumnA])
CREATE INDEX [IX_B] ON [MyTable] ([IndexColumnB])
And I have two use cases with two different update commands:
UPDATE [MyTable] SET [Value] = '...' WHERE [IndexColumnA] = '...'
and
UPDATE [MyTable] SET [Value] = '...' WHERE [IndexColumnB] = '...'
Both update commands may update multiple rows, and these commands caused a deadlock when executed concurrently.
My speculation is that the two update commands use different indexes when scanning the rows, so the order in which locks are placed on rows differs. As a result, one update command may try to place a U lock on a row which already has an X lock placed by the other update command. (I am not a database expert, correct me if I'm wrong.)
One possible solution would be forcing the database to place locks in the same order. According to https://dba.stackexchange.com/questions/257217/why-am-i-getting-a-deadlock-for-a-single-update-query, it seems we can do this with SELECT ... ORDER BY ... FOR UPDATE in PostgreSQL.
Can we (and should we) do this in SQL Server? If not, is the only solution to handle the deadlock in application code?
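For example, my untested guess at a SQL Server equivalent would be to read the target keys with an UPDLOCK hint through the clustered index first, so both commands acquire locks in [Id] order, and then update by key. Would something like this help?
BEGIN TRANSACTION;
-- Untested sketch: take U locks in clustered-key order first
-- (INDEX(1) forces the clustered index), then update by primary key
DECLARE @ids TABLE ([Id] BIGINT PRIMARY KEY);
INSERT INTO @ids ([Id])
SELECT [Id]
FROM [MyTable] WITH (UPDLOCK, INDEX(1))
WHERE [IndexColumnA] = '...'
ORDER BY [Id];
UPDATE [MyTable]
SET [Value] = '...'
WHERE [Id] IN (SELECT [Id] FROM @ids);
COMMIT TRANSACTION;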
I have a result table that holds the output of a large, complicated, slow running query.
It's defined something like:
create table ResultsStore (
Foo int not null,
Bar int not null,
... other fields
constraint [PK_ResultsStore_foo_bar] primary key clustered
(
Foo asc,
Bar asc
)
)
I then insert to this table with a query like:
insert ResultsStore (Foo, Bar)
output inserted.*
select subQuery.ID, @bar
from (
-- large complex slow query
) subQuery
where subQuery.ID not in (
select Foo
from ResultsStore
where Bar = @bar
)
In testing this is fine, but in production, with lots of users hitting it regularly, we often get an exception:
Violation of PRIMARY KEY constraint 'PK_ResultsStore_foo_bar'. Cannot insert duplicate key in object 'ResultsStore'.
How is this possible? Surely the WHERE clause should exclude any combination of the primary key fields that is already in the table?
How to best avoid this?
As written, two sessions can run the query concurrently, both checking for the existence of the row at the same time, both not finding it, then both proceeding to attempt the insert. The first one will succeed under READ COMMITTED, and the second one will fail.
You need WITH (UPDLOCK, HOLDLOCK, ROWLOCK) on the subquery to avoid this race condition. At the default READ COMMITTED isolation level, either the S locks taken by the subquery are released as soon as each row is read, or (with READ_COMMITTED_SNAPSHOT on) row versioning is used and no locks at all are taken.
The HOLDLOCK gives serializable semantics and protects the range. UPDLOCK forces the read to use a U lock, which will block other sessions from also reading with UPDLOCK.
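Applied to the original statement, that would look something like this (a sketch, keeping the rest of the query as posted):
insert ResultsStore (Foo, Bar)
output inserted.*
select subQuery.ID, @bar
from (
-- large complex slow query
) subQuery
where subQuery.ID not in (
select Foo
from ResultsStore with (updlock, holdlock, rowlock)
where Bar = @bar
)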
You can also use a table variable to hold interim results and perform the final insert at the end.
The following also includes a DISTINCT (which might or might not be needed), changes the duplicate test to use EXISTS, and applies the WITH (UPDLOCK, HOLDLOCK, ROWLOCK) options to the final insert as suggested by others.
declare @TempResults table (
Foo int not null,
Bar int not null
)
insert @TempResults
select distinct subQuery.ID, #bar
from (
-- large complex slow query
) subQuery
insert ResultsStore (Foo, Bar)
output inserted.*
select T.Foo, T.Bar
from @TempResults T
where not exists (
select *
from ResultsStore RS with (updlock, holdlock, rowlock)
where RS.Foo = T.Foo
and RS.Bar = T.Bar
)
This lets your long running query run fast and dirty (as you intend), but should maintain integrity and minimize actual lock duration for the final insert.
Suppose a table in SQL Server with this structure:
TABLE t (Id INT PRIMARY KEY)
Then I have a stored procedure, which is constantly being called, that among other things inserts data into this table:
BEGIN TRAN
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t)
INSERT t VALUES (@Id)
...
-- Stuff that gets a long time to get completed
...
COMMIT
The problem with this approach is that sometimes I get a primary key violation, because two or more procedure calls read and try to insert the same Id into the table.
I have been able to solve this problem by adding a TABLOCK hint to the SELECT statement:
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t WITH (TABLOCK))
The problem now is that successive calls to the procedure must wait for the currently executing transaction to complete before starting their work, allowing only one procedure call to run at a time.
Is there any advice or trick to get the lock just during the execution of the select and insert sentence?
Thanks.
TABLOCK is a terrible idea, since you're serialising all the calls (no concurrency).
Note that locks acquired inside the SP's transaction are retained until the transaction completes.
So you want to minimise locks except for where you really need them.
Unless you have a special case, use an internally generated id:
CREATE TABLE t (Id INT IDENTITY PRIMARY KEY)
Improved performance, concurrency etc. since you are not dependent on external tables to manage the id.
If you have existing data you can (re)set the start value using DBCC
DBCC CHECKIDENT ('t', RESEED, 100)
If you need to inject rows with a value preassigned, use:
SET IDENTITY_INSERT t ON
(and off again afterwards, resetting the seed as required).
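For example (a minimal sketch; an explicit column list is required while IDENTITY_INSERT is on):
SET IDENTITY_INSERT t ON
INSERT t (Id) VALUES (100)
SET IDENTITY_INSERT t OFF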
[Consider whether you want this value to be the primary key, or simply unique.
In many cases where you need to reference a table's PK as an FK, you'll want it as the PK for simplicity of joins, but having a business-readable value (e.g. Accounting Code, or OrderNo+OrderLine) is completely valid: that's just modelling.]
I have a question regarding locking in TSQL. Suppose I have a the following table:
A(int id, varchar name)
where id is the primary key, but is NOT an identity column.
I want to use the following pseudocode to insert a value into this table:
lock (A)
uniqueID = GenerateUniqueID()
insert into A values (uniqueID, somename)
unlock(A)
How can this be accomplished in T-SQL? The computation of the next id should be done with table A locked, to prevent other sessions from doing the same operation at the same time and getting the same id.
If you have custom logic that you want to apply in generating the ids, wrap it up in a user-defined function, and then use that function as the default for the column. This should reduce concurrency issues, much like the built-in id generators do, by deferring the generation to the point of insert and piggybacking on the insert's locking behavior.
create table ids (id int, somval varchar(20))
Go
Create function GenerateUniqueID()
returns int as
Begin
declare @ret int
select @ret = max(isnull(id,1)) * 2 from ids
if @ret is null set @ret = 2
return @ret
End
go
alter table ids add Constraint DF_IDS Default(dbo.GenerateUniqueID()) for Id
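With the default in place, any insert that omits Id picks up the generated value, e.g.:
insert ids (somval) values ('abc')
select * from ids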
There are really only three ways to go about this.
Change the ID column to be an IDENTITY column that auto-increments by some value on each insert.
Change the ID column to be a GUID with a default constraint of NEWID() or NEWSEQUENTIALID(). Then you can insert your own value or let the table generate one for you on each insert.
On each insert, start a transaction. Then get the next available ID using something like SELECT MAX(id) + 1. Do this in a single SQL statement if possible, in order to limit the possibility of a collision.
On the whole, most people prefer option 1. It's fast, easy to implement, and most people understand it.
I tend to go with option 2 with the apps I work on simply because we tend to scale out (and up) our databases. This means we routinely have apps with a multi-master situation. Be aware that using GUIDs as primary keys can mean your indexes are routinely trashed.
I'd stay away from option 3 unless you just don't have a choice, in which case I'd look at how the data model is structured anyway, because there's bound to be something wrong.
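As a sketch of option 2 against the question's table (the constraint names are made up; note that NEWSEQUENTIALID() is only allowed in a default constraint):
CREATE TABLE A (
    id uniqueidentifier NOT NULL CONSTRAINT DF_A_id DEFAULT NEWSEQUENTIALID(),
    name varchar(100) NOT NULL,
    CONSTRAINT PK_A PRIMARY KEY (id)
)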
If you use the NEWID() function, you do not need any locking mechanism.
If you make the column an IDENTITY column, you do not need any locking mechanism.
If you generate these IDs manually and there is a chance parallel calls could generate the same IDs, then something like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
DECLARE @NextID INT = dbo.GenerateUniqueID()
WHILE EXISTS (SELECT id FROM A WHERE id = @NextID)
BEGIN
    SET @NextID = dbo.GenerateUniqueID()
END
INSERT INTO A (id, name) VALUES (@NextID, 'content')
COMMIT TRANSACTION
@Markus, you should look at using either IDENTITY or NEWID() as noted in the other answers. If you absolutely can't, here's an option for you...
DECLARE @NewID INT
BEGIN TRAN
SELECT @NewID = MAX(ID) + 1
FROM TableA WITH (TABLOCKX)
INSERT TableA
(ID, OtherFields)
VALUES (@NewID, OtherFields)
COMMIT TRAN
If you're using SQL 2005+, you can use the OUTPUT clause to do what you're asking, without any kind of lock. (The table test1 simulates the table you're inserting into, and since OUTPUT needs a table rather than a scalar variable to hold the results, the temp table #result serves that purpose):
create table test1 (test INT)
create table #result (LastValue INT)
insert into test1
output INSERTED.test into #result (LastValue)
select dbo.GenerateUniqueID()
select LastValue from #result
Just to update an old post: with SQL Server 2012 it is now possible to use a feature called a sequence. Sequences are created in much the same way as a function, and it is possible to specify the range, the direction (ascending or descending), and a rollover point. After that, it's possible to use the NEXT VALUE FOR expression to generate the next value in the range.
See the following documentation from Microsoft.
http://technet.microsoft.com/en-us/library/ff878091.aspx
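A minimal sketch against the question's table (the sequence name is made up):
CREATE SEQUENCE dbo.MyIdSequence AS INT START WITH 1 INCREMENT BY 1
INSERT INTO A (id, name) VALUES (NEXT VALUE FOR dbo.MyIdSequence, 'somename')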
I have a MS SQL 2008 database which stores data for creating a weighted, undirected graph. The data is stored in tables with the following structure:
[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL
where [id1] and [id2] represents the two connected nodes and [weight] the weight of the edge that connects these nodes.
There are several different algorithms, that create the graph from some basic data. For each algorithm, I want to store the graph-data in a separate table. Those tables all have the same structure (as shown above) and use a specified prefix (similarityALB, similaritybyArticle, similaritybyCategory, ...) so I can identify them as graph-tables.
The client program can select, which table (i.e. by which algorithm the graph is created) to use for the further operations.
Access to the data is done through stored procedures. As I have different tables, I would need to use a variable table name, e.g.:
SELECT id1, id2, weight FROM @tableName
This doesn't work because T-SQL doesn't support variable table names in a statement. I have searched the web, and all solutions to this problem use the dynamic SQL EXEC() statement, e.g.:
EXEC('SELECT id1, id2, weight FROM ' + @tableName)
As most of them mentioned, this makes the statement prone to SQL-injection, which I'd like to avoid. A simple redesign idea would be to put all the different graphs in one table and add a column to identify the different graphs.
[graphId] [int] NOT NULL,
[id1] [int] NOT NULL,
[id2] [int] NOT NULL,
[weight] [float] NOT NULL
My problem with this solution is that the graphs can be very large, depending on the algorithm used (up to 500 million entries). I need to index the table over (id1, id2) and (id2, id1). Putting all the graphs in one big table would make it even larger (and queries slower). Adding a new graph would perform badly because of the active indexes. Deleting a graph could no longer be done with TRUNCATE; I would need to use
DELETE FROM myTable WHERE graphId = @Id
which performs very badly on large tables and creates a very large log file (which would exceed my disk space once the graph is big enough). So I'd like to keep the independent tables for each graph.
Any suggestions on how to solve these problems, either by finding a way to parameterize the table name or by redesigning the database structure, while avoiding the aforementioned issues?
SQL injection can easily be avoided in this case by comparing @tableName to the names of the existing tables. If it isn't one of them, it's bad input. (Obligatory xkcd reference: that is, unless you have a table called "bobby'; drop table students;")
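For example (a sketch; it assumes the graph tables all share the 'similarity' prefix from the question, and uses QUOTENAME as a further safeguard):
DECLARE @tableName sysname = 'similaritybyArticle'
IF EXISTS (SELECT 1 FROM sys.tables
           WHERE name = @tableName AND name LIKE 'similarity%')
BEGIN
    DECLARE @sql nvarchar(max) = N'SELECT id1, id2, weight FROM ' + QUOTENAME(@tableName)
    EXEC sp_executesql @sql
END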
Anyway, regarding your performance problems: with partitioned tables (available since SQL Server 2005), you can have the same advantages as having several tables, but without the need for dynamic SQL.
Maybe I did not understand everything, but:
CREATE PROCEDURE dbo.GetMyData (
@TableName AS varchar(50)
)
AS
BEGIN
IF @TableName = 'Table_1'
BEGIN
SELECT id1
,id2
,[weight]
FROM dbo.Table_1
END
IF @TableName = 'Table_2'
BEGIN
SELECT id1
,id2
,[weight]
FROM dbo.Table_2
END
END
and then:
EXEC dbo.GetMyData @TableName = 'Table_1'
A different technique involves using synonyms dynamically, for example:
DECLARE @TableName varchar(50)
SET @TableName = 'Table_1'
-- drop synonym if it exists
IF object_id('dbo.MyCurrentTable', 'SN') IS NOT NULL
DROP SYNONYM dbo.MyCurrentTable ;
-- create synonym for the current table
IF @TableName = 'Table_1'
CREATE SYNONYM dbo.MyCurrentTable FOR dbo.Table_1 ;
IF @TableName = 'Table_2'
CREATE SYNONYM dbo.MyCurrentTable FOR dbo.Table_2 ;
-- use synonym
SELECT id1, id2, [weight]
FROM dbo.MyCurrentTable
Partitioned tables may be the answer to your problem. But I've got another idea, that's "the other way around":
each graph has its own table (so you can still truncate the table)
define a view (with the structure you mentioned for your redesigned table) as a UNION ALL over all graph tables, as sketched below
I have no idea about the performance of a select on this view and so on, but it may give you what you are looking for. I'd be interested in the results if you try this out.
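A sketch of such a view, using the table prefixes from the question (the graphId values are made up):
CREATE VIEW dbo.AllGraphs AS
SELECT 1 AS graphId, id1, id2, weight FROM dbo.similarityALB
UNION ALL
SELECT 2 AS graphId, id1, id2, weight FROM dbo.similaritybyArticle
UNION ALL
SELECT 3 AS graphId, id1, id2, weight FROM dbo.similaritybyCategory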
I was just prototyping a new system for deferring certain operations until out of hours on one of our databases. I've come up with what I think is a pretty simple schema. I was first prototyping on SQL Server 2005 Express, but have confirmed the same problem on 2008 Developer. The error I'm getting is:
Msg 8646, Level 21, State 1, Procedure Cancel, Line 6
Unable to find index entry in index ID 1, of table 277576027, in database 'xxxxxx'. The indicated index is corrupt or there is a problem with the current update plan. Run DBCC CHECKDB or DBCC CHECKTABLE. If the problem persists, contact product support.
The schema I'm using is:
create schema Writeback authorization dbo
create table Deferrals (
ClientID uniqueidentifier not null,
RequestedAt datetime not null,
CompletedAt datetime null,
CancelledAt datetime null,
ResolvedAt as ISNULL(CompletedAt,CancelledAt) persisted,
constraint PK_Writeback_Deferrals PRIMARY KEY (ClientID,RequestedAt) on [PRIMARY],
constraint CK_Writeback_Deferrals_NoTimeTravel CHECK ((RequestedAt <= CompletedAt) AND (RequestedAt <= CancelledAt)),
constraint CK_Writeback_Deferrals_NoSchrodinger CHECK ((CompletedAt is null) or (CancelledAt is null))
/* TODO:FOREIGN KEY */
)
create view Pending with schemabinding as
select
ClientID
from
Writeback.Deferrals
where
ResolvedAt is null
go
alter table Writeback.Deferrals add constraint
DF_Writeback_Deferrals_RequestedAt DEFAULT CURRENT_TIMESTAMP for RequestedAt
go
create unique clustered index PK_Writeback_Pending on Writeback.Pending (ClientID)
go
create procedure Writeback.Defer
@ClientID uniqueidentifier
as
set nocount on
insert into Writeback.Deferrals (ClientID)
select @ClientID
where not exists(select * from Writeback.Pending where ClientID = @ClientID)
go
create procedure Writeback.Cancel
@ClientID uniqueidentifier
as
set nocount on
update
Writeback.Deferrals
set
CancelledAt = CURRENT_TIMESTAMP
where
ClientID = @ClientID and
CompletedAt is null and
CancelledAt is null
go
create procedure Writeback.Complete
@ClientID uniqueidentifier
as
set nocount on
update
Writeback.Deferrals
set
CompletedAt = CURRENT_TIMESTAMP
where
ClientID = @ClientID and
CompletedAt is null and
CancelledAt is null
go
And the code that provokes the error is as follows:
declare @ClientA uniqueidentifier
declare @ClientB uniqueidentifier
select @ClientA = newid(), @ClientB = newid()
select * from Writeback.Pending
exec Writeback.Defer @ClientA
select * from Writeback.Pending
exec Writeback.Defer @ClientB
select * from Writeback.Pending
exec Writeback.Cancel @ClientB --<-- Error being raised here
select * from Writeback.Pending
exec Writeback.Complete @ClientA
select * from Writeback.Pending
select * from Writeback.Deferrals
I've seen a few others encountering such problems, but they seem either to have aggregates in their views (and a message back from MS saying they'd remove the ability to create such indexed views in 2005 SP1), or to have resolved it by applying a merge join in their join clause (but I don't have a join).
Initially there was no computed column in the Deferrals table, and the WHERE clause in the view tested the CompletedAt and CancelledAt columns for NULL separately. I changed to the above just to see if I could provoke different behaviour.
All of my SET options look right for working with indexed views, and if they weren't, I'd expect a less violent error to be thrown.
Any ideas?
You have index corruption. Run DBCC CHECKDB and see what errors it gives you. The easiest thing you could do is rebuild your indexes.
Also take a look at this KB article, if it applies to your situation.
Also note that putting a primary key on a GUID column will create a clustered index on it, which is about the worst thing you could do performance-wise.
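If you do keep the GUID key, one option (a sketch against the table from the question) is to declare the primary key as nonclustered, leaving the clustered index free for something else:
create table Writeback.Deferrals (
    ClientID uniqueidentifier not null,
    RequestedAt datetime not null,
    constraint PK_Writeback_Deferrals primary key nonclustered (ClientID, RequestedAt)
)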
I managed to work out what's causing this error by trying to build the script up from scratch, adding in pieces as I went.
It's some kind of bug that's triggered when the view is created as part of a CREATE SCHEMA statement. If I separate the CREATE SCHEMA into its own batch, and then create the table and view in separate batches, everything works fine.
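A condensed sketch of the working layout (the procedures and the extra constraints from the question are trimmed for brevity):
create schema Writeback authorization dbo
go
create table Writeback.Deferrals (
    ClientID uniqueidentifier not null,
    RequestedAt datetime not null,
    CompletedAt datetime null,
    CancelledAt datetime null,
    ResolvedAt as ISNULL(CompletedAt, CancelledAt) persisted,
    constraint PK_Writeback_Deferrals primary key (ClientID, RequestedAt)
)
go
create view Writeback.Pending with schemabinding as
select ClientID from Writeback.Deferrals where ResolvedAt is null
go
create unique clustered index PK_Writeback_Pending on Writeback.Pending (ClientID)
go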
Long overdue edit - I raised this on Connect here. It was confirmed as being an issue in SQL Server 2008.
Internal builds (in 2010) indicated it was no longer an issue, and I have (just now, 2016) confirmed that the script in the question does not generate the same error in SQL Server 2012. The fix was not back-ported to SQL Server 2008.