So I'm trying, in a single query, to only insert a row if it doesn't exist already.
My query is the following:
INSERT INTO [dbo].[users_roles] ([user_id], [role_id])
SELECT 29851, 1 WHERE NOT EXISTS (
SELECT 1 FROM [dbo].[users_roles] WHERE user_id = 29851 AND role_id = 1)
Sometimes (very rarely, but still), it generates the following error:
Violation of PRIMARY KEY constraint 'PK_USERS_ROLES'. Cannot
insert duplicate key in object 'dbo.users_roles'. The duplicate
key value is (29851, 1).
PK_USERS_ROLES is [user_id], [role_id]. Here is the full SQL of the table's schema:
create table users_roles
(
user_id int not null
constraint FK_USERS_ROLES_USER
references user,
role_id int not null
constraint FK_USERS_ROLES_USER_ROLE
references user_role,
constraint PK_USERS_ROLES
primary key (user_id, role_id)
)
Context:
This is executed by a PHP script hosted on an Apache server, and "randomly" happens once out of hundreds of occurrences (most likely concurrency-related).
More info:
SELECT @@VERSION gives:
Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64) Jun 28 2012
08:36:30 Copyright (c) Microsoft Corporation Enterprise Edition
(64-bit) on Windows NT 6.1 (Build 7601: Service Pack)
SQL Server version: SQL Server 2008 R2
Transaction Isolation level: ReadCommitted
This is executed within an explicit transaction (through PHP statements, but I figure the end result is the same)
Questions:
Could someone explain why/how this is happening?
What would be an efficient way to safely insert in one go (in other words, in a single query)?
I've seen other answers such as this one but the solutions are meant for stored procedures.
Thanks.
It might help to be explicit about this. The snippet below runs the check and the insert inside an explicit transaction and takes an exclusive lock on the row (or the range where it would be) up front.
DECLARE @user_id INT; SET @user_id = 29851;
DECLARE @role_id INT; SET @role_id = 1;

BEGIN TRY
    BEGIN TRANSACTION;

    DECLARE @exists INT;

    SELECT @exists = 1
    FROM [dbo].[users_roles] WITH (ROWLOCK, HOLDLOCK, XLOCK)
    WHERE user_id = @user_id AND role_id = @role_id;

    IF @exists IS NULL
    BEGIN
        INSERT INTO [dbo].[users_roles] ([user_id], [role_id])
        VALUES (@user_id, @role_id);
    END

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
END CATCH
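If you prefer to keep the single-statement form from the question, a commonly used variant (a sketch, not tested against your workload) is to put the locking hints on the existence check itself:

-- The UPDLOCK + HOLDLOCK hints make the NOT EXISTS check take and hold a
-- key(-range) lock until the end of the statement (or of the surrounding
-- transaction), so two concurrent callers cannot both pass the check for
-- the same (user_id, role_id) and then both insert.
INSERT INTO [dbo].[users_roles] ([user_id], [role_id])
SELECT 29851, 1
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[users_roles] WITH (UPDLOCK, HOLDLOCK)
    WHERE user_id = 29851 AND role_id = 1
);

HOLDLOCK gives serializable range-locking semantics for just that statement, which closes the window between the check and the insert; the trade-off is more blocking (and potential deadlocks) under heavy contention on the same key range.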
Is this table truncated, or are its rows deleted at some point? And how often? It makes sense to me that the row would not be found at some moment, since you're running "insert if not exists": at that moment two or more queries may hit the database to insert the same data. Only one will; the other should do nothing if the row was inserted before its "not exists" look-up, or fail if the row was inserted after the look-up.
I have only an Oracle database right now to do some tests and I can reproduce this problem. My commit mode is explicit:
Create the empty table, the unique constraint and grant select, insert to another user:
CREATE TABLE just_a_test (val NUMBER(3,0));
ALTER TABLE just_a_test ADD CONSTRAINT foobar UNIQUE (val);
GRANT SELECT, INSERT ON just_a_test TO user2;
DB session on user1:
INSERT INTO just_a_test
SELECT 10
FROM DUAL
WHERE NOT EXISTS
(
SELECT 1
FROM just_a_test
WHERE val = 10
)
;
-- no commit yet...
DB session on user2:
INSERT INTO user1.just_a_test
SELECT 10
FROM DUAL
WHERE NOT EXISTS
(
SELECT 1
FROM user1.just_a_test
WHERE val = 10
)
;
-- no commit yet; the session just hangs until the other session commits...
So I commit the first transaction, inserting the row, and then I get the following error on the user2 session:
"unique constraint violated"
*Cause: An UPDATE or INSERT statement attempted to insert a duplicate key.
For Trusted Oracle configured in DBMS MAC mode, you may see
this message if a duplicate entry exists at a different level.
Now I roll back the second transaction, run the same insert again on user2, and get the following output:
0 rows inserted.
Probably your scenario is just like this one. Hope it helps.
EDIT
I'm sorry. You asked two questions and I answered only "Could someone explain why/how this is happening?". So I missed "What would be an efficient way to safely insert in one go (in other words, in a single query)?".
What exactly does "safely" mean for you? Let's say you're running an INSERT/SELECT of lots of rows and just one of them duplicates a stored row. At your required level of "safety", should all the rows being inserted be rejected, or only the duplicate one, keeping the others?
Again, I don't have a SQL Server right now to give it a try, but it looks like you can tell SQL Server either to reject all the rows being inserted when there is any duplicate, or to reject only the duplicates and keep the rest. The same holds for a single-row insert: if it's a duplicate, either throw an error or simply ignore it.
To ignore only the duplicate rows and throw no error, the syntax should look like this:
ALTER TABLE [TableName] REBUILD WITH (IGNORE_DUP_KEY = ON);
By default this option is OFF, which means SQL Server throws an error and discards the non-duplicate rows being inserted as well.
This way you would keep your INSERT/SELECT syntax, which looks good, IMO.
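To illustrate the behaviour, here is a minimal sketch (the demo table and constraint names are made up) showing a primary key created with that option silently dropping a duplicate row instead of failing:

-- Demo table modelled on the asker's schema; names are illustrative only.
CREATE TABLE dbo.users_roles_demo
(
    user_id INT NOT NULL,
    role_id INT NOT NULL,
    CONSTRAINT PK_USERS_ROLES_DEMO
        PRIMARY KEY (user_id, role_id)
        WITH (IGNORE_DUP_KEY = ON)
);

INSERT INTO dbo.users_roles_demo (user_id, role_id) VALUES (29851, 1);
INSERT INTO dbo.users_roles_demo (user_id, role_id) VALUES (29851, 1);
-- The second insert raises the warning "Duplicate key was ignored." instead of
-- a primary key violation, and the table still contains a single row.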
Hope it helps.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-table-index-option-transact-sql?view=sql-server-2008
https://stackoverflow.com/a/11207687/1977836
Related
I have a migration script with the following statement:
ALTER TABLE [Tasks] ALTER COLUMN [SortOrder] int NOT NULL
What will happen if I run that twice? Will it change anything the second time? MS SQL Management Studio just reports "Command(s) completed successfully", but with no details on whether they actually did anything.
If it's not already idempotent, how do I make it so?
I would say that the second time, SQL Server checks the metadata and does nothing because nothing has changed.
But if you don't like possibility of multiple execution you can add simple condition to your script:
CREATE TABLE Tasks(SortOrder VARCHAR(100));
IF NOT EXISTS (SELECT 1
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [TABLE_NAME] = 'Tasks'
AND [COLUMN_NAME] = 'SortOrder'
AND IS_NULLABLE = 'NO'
AND DATA_TYPE = 'INT')
BEGIN
ALTER TABLE [Tasks] ALTER COLUMN [SortOrder] INT NOT NULL
END
SqlFiddleDemo
When you execute it the second time, the query still runs, but since the column has already been altered it has no effect on the table. Nothing changes when the script executes twice.
Here is a good MSDN read on the subject: Inside ALTER TABLE
Let's look at what SQL Server does internally when performing an ALTER
TABLE command. SQL Server can carry out an ALTER TABLE command in any
of three ways:
SQL Server might need to change only metadata.
SQL Server might need to examine all the existing data to make sure
it's compatible with the change but then change only metadata.
SQL Server might need to physically change every row.
I'm quite experienced with SQL databases but mostly with Oracle and MySQL.
Now I'm dealing with SQL Server 2012 (Management Studio 2008) and facing a weird behaviour that I cannot explain.
Considering these 3 queries and a source table of 400k rows:
SELECT ID_TARJETA
INTO [SGMENTIA_TEMP].[dbo].[borra_borra_]
FROM [DATAMART_SEGMENTIA].[DESA].[CLIENTES]
ALTER TABLE [SGMENTIA_TEMP].[dbo].[borra_borra_]
ADD PRIMARY KEY (ID_TARJETA)
SELECT COUNT(*)
FROM [SGMENTIA_TEMP].[dbo].[borra_borra_]
If I run them one after the other, it runs fine (total: ~7 sec).
If I select them all and run them at once, it runs badly (total: ~60 sec).
Finally, if I wrap it all in a transaction, it runs fine again:
BEGIN TRANSACTION;
SELECT ID_TARJETA
INTO [SGMENTIA_TEMP].[dbo].[borra_borra_]
FROM [DATAMART_SEGMENTIA].[DESA].[CLIENTES]
ALTER TABLE [SGMENTIA_TEMP].[dbo].[borra_borra_]
ADD PRIMARY KEY(ID_TARJETA)
SELECT COUNT(*)
FROM [SGMENTIA_TEMP].[dbo].[borra_borra_]
COMMIT;
The whole picture makes no sense to me: considering that creating transactions looks quite expensive, the first scenario should be the slow one and the second should work far better. Am I wrong?
The question is quite important for me since I'm building these packages of queries programmatically (JDBC), and I need a way to tune their performance.
The only difference between the two snippets provided is that the first uses the default transaction mode and the second uses an explicit transaction.
Since SQL Server's default transaction mode is autocommit, each individual statement is its own transaction.
You can find more information about transaction modes here.
You can try this to see if it runs in ~60 sec too:
BEGIN TRANSACTION;
SELECT ID_TARJETA
INTO [SGMENTIA_TEMP].[dbo].[borra_borra_]
FROM [DATAMART_SEGMENTIA].[DESA].[CLIENTES];
COMMIT;
BEGIN TRANSACTION;
ALTER TABLE [SGMENTIA_TEMP].[dbo].[borra_borra_]
ADD PRIMARY KEY(ID_TARJETA);
COMMIT;
BEGIN TRANSACTION;
SELECT COUNT(*)
FROM [SGMENTIA_TEMP].[dbo].[borra_borra_]
COMMIT;
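As a quick way to see the difference between the two modes, you can inspect @@TRANCOUNT (a trivial sketch):

SELECT @@TRANCOUNT AS open_transactions;  -- 0 under the default autocommit mode

BEGIN TRANSACTION;
SELECT @@TRANCOUNT AS open_transactions;  -- 1 inside an explicit transaction
COMMIT;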
I have a stored procedure that does an insert of a row like this:
CREATE PROCEDURE dbo.sp_add_test
    @CreatedDate DATETIME,
    @TestId      INT
AS
BEGIN
    SET NOCOUNT ON

    INSERT INTO dbo.Test (
        CreatedDate,
        Title,
        ParentTestId
    )
    SELECT
        @CreatedDate,
        Title,
        @TestId
    FROM Test
    WHERE TestId = @TestId;

    SELECT * FROM Test
    WHERE TestId = @TestId
      AND CreatedDate = @CreatedDate;
END
When the row is inserted, a new identity value is generated for the primary key. As soon as the insert completes, I then do a select from that table.
Can someone tell me if there is another way I can do this? The reason I do a second select is that I need to get a value for the new TestId which is an identity column.
I am not familiar with the way SQL Server caches data. Does it cache recently used rows in the same way as Oracle does or will it go to the disk to get the row it just inserted?
In SQL Server, the right way to do this is with the OUTPUT clause. The documentation is here.
As far as I know, SQL Azure supports the OUTPUT clause.
As for your question, when the database commits a page (that is, when the insert is completed), the page often remains in memory. Typically, the "commit" is a log operation, so the data page stays in memory, and an immediate access to it should be fast in the sense that it doesn't require a disk read. But the OUTPUT clause is the right approach in SQL Server.
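A sketch of what that could look like for your procedure (parameter names and types are guessed from the question, so adjust as needed):

CREATE PROCEDURE dbo.sp_add_test
    @CreatedDate DATETIME,
    @TestId      INT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Test (CreatedDate, Title, ParentTestId)
    OUTPUT INSERTED.*   -- returns the inserted row, including the new identity TestId
    SELECT @CreatedDate, Title, @TestId
    FROM dbo.Test
    WHERE TestId = @TestId;
END

This removes the second SELECT entirely, and it also works when the INSERT ... SELECT produces more than one row, which SCOPE_IDENTITY() would not.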
I have users with INSERT permissions on a table. They can insert records on the publisher until the table's primary identity range runs out. Then they start getting this error every time they try to do an INSERT:
[Microsoft][ODBC SQL Server Driver]Fractional truncation
[Microsoft][ODBC SQL Server Driver][SQL Server]The insert failed. It conflicted with an identity range check constraint in database 'TaxDB', replicated table 'dbo.ClientHistory', column 'ClientHistoryID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
[Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.
ODBC--insert on a linked table 'ClientHistory' failed.
According to the MS Documentation for SQL Server 2008 R2:
If the Publisher exhausts its identity range after an insert, it can automatically assign a new range if the insert was performed by a member of the db_owner fixed database role. If the insert was performed by a user not in that role, the Log Reader Agent, Merge Agent, or a user who is a member of the db_owner role must run sp_adjustpublisheridentityrange (Transact-SQL).
So the docs say that the user must be a member of the 'db_owner' role but they do not say why. Here is the applicable section of T-SQL from one of the auto-generated MSmerge_ins triggers:
if is_member('db_owner') = 1
begin
    -- select the range values from the MSmerge_identity_range table
    -- this can be hardcoded if performance is a problem
    declare @range_begin numeric(38,0)
    declare @range_end numeric(38,0)
    declare @next_range_begin numeric(38,0)
    declare @next_range_end numeric(38,0)

    select @range_begin = range_begin,
           @range_end = range_end,
           @next_range_begin = next_range_begin,
           @next_range_end = next_range_end
    from dbo.MSmerge_identity_range where artid='A2D114CE-8436-48BF-9235-E47A059ACB13' and subid='2689FFDE-991E-4122-BFC2-C9739CC55917' and is_pub_range=0

    if @range_begin is not null and @range_end is not NULL and @next_range_begin is not null and @next_range_end is not NULL
    begin
        if IDENT_CURRENT('[dbo].[ClientHistory]') = @range_end
        begin
            DBCC CHECKIDENT ('[dbo].[ClientHistory]', RESEED, @next_range_begin) with no_infomsgs
        end
        else if IDENT_CURRENT('[dbo].[ClientHistory]') >= @next_range_end
        begin
            exec sys.sp_MSrefresh_publisher_idrange '[dbo].[ClientHistory]', '2689FFDE-991E-4122-BFC2-C9739CC55917', 'A2D114CE-8436-48BF-9235-E47A059ACB13', 2, 1
            if @@error<>0 or @retcode<>0
                goto FAILURE
        end
    end
end
I would like to provide users who have INSERT permissions on the table (but otherwise limited permissions) the ability to switch from the primary to secondary identity range. Making these users 'db_owner's is not an option. However, giving them limited additional permissions is certainly a possibility. I just don't know what those permissions would be.
Since automatic identity range management is turned on by default in SQL Server 2008 merge replication, I incorrectly assumed it would "just work" out of the box. I'm starting to think I'd be better off going back to NO identity range management. Really, I just want my users to be able to insert records without needing an admin to step in all the time.
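One approach that might work, though I haven't tested it against merge replication: wrap the range adjustment in a small procedure that runs as its owner and grant EXECUTE on just that wrapper to the limited users. The wrapper name and role name below are made up, and you should check the sp_adjustpublisheridentityrange parameter list for your build before relying on it.

-- Untested sketch: usp_AdjustClientHistoryRange and LimitedInsertRole are
-- hypothetical names. EXECUTE AS OWNER makes the call run under the
-- procedure owner's (typically dbo's) permissions.
CREATE PROCEDURE dbo.usp_AdjustClientHistoryRange
WITH EXECUTE AS OWNER
AS
BEGIN
    SET NOCOUNT ON;
    -- sp_adjustpublisheridentityrange is the procedure named in the error message.
    EXEC sys.sp_adjustpublisheridentityrange
        @table_name  = N'ClientHistory',
        @table_owner = N'dbo';
END
GO

GRANT EXECUTE ON dbo.usp_AdjustClientHistoryRange TO [LimitedInsertRole];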
I've just moved a database from a SQL 2000 instance to a SQL 2008 instance and have encountered an odd problem which appears to be related to IDENTITY columns and stored procedures.
I have a number of stored procedures in the database along the lines of this
create procedure usp_add_something @somethingId int, @somethingName nvarchar(100)
with encryption
as
-- If there's an ID then update the record
if @somethingId <> -1 begin
    UPDATE something SET somethingName = @somethingName
    WHERE somethingId = @somethingId
end else begin
    -- Add a new record
    INSERT INTO something ( somethingName ) VALUES ( @somethingName )
end
go
These are all created as ENCRYPTED stored procedures. The id column (e.g. somethingId in this example) is an IDENTITY(1,1) with a PRIMARY KEY on it, and there are lots of rows in these tables.
Upon restoring onto the SQL 2008 instance a lot of my database seems to be working fine, but calls like
exec usp_add_something @somethingId = -1, @somethingName = 'A Name'
result in an error like this:
Violation of PRIMARY KEY constraint 'Something_PK'. Cannot insert duplicate key in object 'dbo.something'.
It seems that something is messed up that either causes SQL Server to not allocate the next IDENTITY correctly...or something like that. This is very odd!
I'm able to INSERT into the table directly without specifying the id column and it allocates an id just fine for the identity column.
There are no records with somethingId = -1 ... not that that should make any difference.
If I drop and recreate the procedure the problem goes away. But I have lots of these procedures so don't really want to do that in case I miss some or there is a customized procedure in the database that I overwrite.
Does anyone know of any known issues to do with this? (and a solution ideally!)
Is there a different way I should be moving my sql 2000 database to the sql 2008 instance? e.g. is it likely that Detach and Attach would behave differently?
I've tried recompiling the procedure using sp_recompile 'usp_add_something' but that didn't solve the problem, so I can't simply call that on all procedures.
thanks for any help
R
(cross-posted here)
If the problem is an improperly set identity seed, you can reset a table this way:
DBCC CHECKIDENT (TableName, RESEED, 0);
DBCC CHECKIDENT (TableName, RESEED);
This will automatically find the highest value in the table and set the seed appropriately so you don't have to do a SELECT Max() query. Now fixing the table can be done in automation, without dynamic SQL or manual script writing.
But you said you can insert to the table directly without a problem, so it's probably not the issue. But I wanted to post to set the record straight about the easy way to reset the identity seed.
Note: if your table's increment is negative, or if you have in the past reset the seed to start at the lowest negative number after consuming all the positive numbers, all bets are off. Especially in the latter case (a positive increment, but identity values in use that are lower than others already in the table), you never want to run DBCC CHECKIDENT without specifying NORESEED, because a plain DBCC CHECKIDENT (TableName); will screw up your identity value. You must use DBCC CHECKIDENT (TableName, NORESEED). Fun times will ensue if you forget this. :)
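For reference, the check-only form looks like this (same placeholder table name as above):

-- Reports the current identity value and the current maximum column value,
-- without changing anything.
DBCC CHECKIDENT (TableName, NORESEED);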
First, check the maximum ID from your table:
select max(id_column) from YourTable
Then, check the current identity seed:
select ident_seed('YourTable')
If the current seed is lower than the maximum, reseed the table with dbcc checkident:
DBCC CHECKIDENT (YourTable, RESEED, 42)
Where 42 is the current maximum.
Demonstration code for how this can go wrong:
create table YourTable (id int identity primary key, name varchar(25))
DBCC CHECKIDENT (YourTable, RESEED, 42)
insert into YourTable (name) values ('Zaphod Beeblebrox')
DBCC CHECKIDENT (YourTable, RESEED, 41)
insert into YourTable (name) values ('Ford Prefect') --> Violation of PRIMARY KEY
I tried and was unable to replicate this on another server.
However, on my live servers I dropped the problem database from SQL 2008 and recreated it using a detach and reattach, and this worked fine, without these PRIMARY KEY violation errors.
Since I wanted to keep the original database live, in fact my exact steps were:
back up sourceDb and restore as sourceDbCopy on the same instance
take sourceDbCopy offline
move the sourceDbCopy files to the new server
attach the database
rename the database to the original name
If recreating the procedures helps, here's an easy way to generate a recreation script:
Right click database -> Tasks -> Generate scripts
On page 2 ("Choose Objects") select the stored procedures
On page 3 ("set scripting options") choose Advanced, find the "Script DROP and CREATE" option, and set it to Script DROP and CREATE.
Save the script somewhere and run it