SQL Server merge and pk violation - sql-server

I'm trying to understand why in a scenario like the following
DECLARE @source TABLE
(
orderId NVARCHAR(50),
customerId NVARCHAR(50)
)
DECLARE @target TABLE
(
orderId NVARCHAR(50) PRIMARY KEY,
customerId NVARCHAR(50) NOT NULL
)
INSERT INTO @source
VALUES ('test', '123'), ('test', '234')
MERGE @target AS TRG
USING (SELECT DISTINCT orderId, customerId
FROM @source) AS SRC ON SRC.orderId = TRG.orderId
WHEN MATCHED THEN
UPDATE SET TRG.customerId = SRC.customerId
WHEN NOT MATCHED BY TARGET THEN
INSERT (orderId, customerId)
VALUES (orderId, customerId);
I'm getting a duplicate key violation error:
Msg 2627, Level 14, State 1, Line 21
Violation of PRIMARY KEY constraint 'PK__#B3D7759__0809335D4BE1521F'. Cannot insert duplicate key in object 'dbo.#target'. The duplicate key value is (test).
What I expect is that the update statement finds the existing key and updates the customerId, so that at the end @target holds 1 row with orderId = 'test' and customerId = '234'.
As far as I can tell, it instead tries to insert all the records, because it finds no key match at the start of the merge, and that causes the violation since the source contains the key more than once.
Is this right? Is there any way to achieve what I am expecting using the merge function?
@user1443098
I've read your link, thanks. However, I have a massive data insertion coming from a source table and going into 10 different tables; I tried to implement the procedure with a cursor and it took about 0.5 s per record (with all the IF EXISTS statements). With the MERGE statement, 300 rows were inserted into the 10 different tables in less than one second. So in my case it makes a big difference in performance.

There are two records in @source with the same orderId. MERGE matches every source row against the target as it stands when the statement starts, so neither record finds a match in @target and the NOT MATCHED clause tries to insert both of them. It cannot do this, because the primary key on orderId in @target requires every inserted record to have a unique orderId. The duplicated key value is what causes the primary key violation.
If duplicates are possible in the source, you should eliminate them in your USING sub-query. Something like this:
(SELECT orderId, max(customerId) customerId
FROM @source
group by orderId)
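For completeness, here is a minimal sketch of the whole statement with that sub-query in place. It reuses the table variables from the question; qualifying the inserted values with SRC is my own addition for readability, not something the original requires.

```sql
-- Same setup as in the question.
DECLARE @source TABLE
(
    orderId NVARCHAR(50),
    customerId NVARCHAR(50)
);
DECLARE @target TABLE
(
    orderId NVARCHAR(50) PRIMARY KEY,
    customerId NVARCHAR(50) NOT NULL
);

INSERT INTO @source
VALUES ('test', '123'), ('test', '234');

-- The USING sub-query collapses the source to one row per orderId,
-- so the NOT MATCHED branch inserts each key at most once.
MERGE @target AS TRG
USING (SELECT orderId, MAX(customerId) AS customerId
       FROM @source
       GROUP BY orderId) AS SRC
    ON SRC.orderId = TRG.orderId
WHEN MATCHED THEN
    UPDATE SET TRG.customerId = SRC.customerId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (orderId, customerId)
    VALUES (SRC.orderId, SRC.customerId);

-- One row is left: orderId = 'test', customerId = '234'.
SELECT orderId, customerId FROM @target;
```

MAX is only one possible rule for choosing the surviving row. If "last row wins" should follow some other ordering, the source needs a column that defines that order.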

Related

Insert/Update, ignore rows which would violate FK

If someone performs an update or insert on a table that has a foreign key to another table and a nonexistent value appears, an error is thrown. Is there a reasonably automated way to just ignore the faulty rows and continue with the others?
I can only think of an instead-of trigger, but it sounds messy.
You should check the rows in your INSERT query, by joining the source data with the referenced table.
For example, let's suppose that you have the following data in the SourceData table and you want to insert it in the Sales.CurrencyRate table in the AdventureWorks2019 database:
CREATE TABLE SourceData (
CurrencyRateDate datetime,
FromCurrencyCode char(3),
ToCurrencyCode char(3),
PRIMARY KEY (CurrencyRateDate, FromCurrencyCode, ToCurrencyCode),
Rate money NOT NULL
)
INSERT INTO SourceData (CurrencyRateDate, FromCurrencyCode, ToCurrencyCode, Rate) VALUES
('20200429','EUR','RON',4.8425),
('20200429','EUR','ROL',48425),
('20200430','EUR','RON',4.8421),
('20200430','EUR','ROL',48421)
INSERT INTO Sales.CurrencyRate (CurrencyRateDate, FromCurrencyCode, ToCurrencyCode, AverageRate, EndOfDayRate)
SELECT sd.CurrencyRateDate, sd.FromCurrencyCode, sd.ToCurrencyCode, sd.Rate, sd.Rate
FROM SourceData sd
If you simply run the INSERT statement mentioned above, you would get an error saying "The INSERT statement conflicted with the FOREIGN KEY constraint "FK_CurrencyRate_Currency_ToCurrencyCode". The conflict occurred in database "AdventureWorks2019", table "Sales.Currency", column 'CurrencyCode'.", because the currency "RON" is not present in the Sales.Currency table.
To avoid this error and insert only the data that has corresponding rows in the referenced table, you would simply use a JOIN for each FK, like this:
INSERT INTO Sales.CurrencyRate (CurrencyRateDate, FromCurrencyCode, ToCurrencyCode, AverageRate, EndOfDayRate)
SELECT sd.CurrencyRateDate, sd.FromCurrencyCode, sd.ToCurrencyCode, sd.Rate, sd.Rate
FROM SourceData sd
INNER JOIN Sales.Currency c1 ON c1.CurrencyCode=sd.FromCurrencyCode
INNER JOIN Sales.Currency c2 ON c2.CurrencyCode=sd.ToCurrencyCode
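If you also want to see which rows were left out, a query along these lines might help. This is only a sketch against the same SourceData and Sales.Currency tables used above; it lists every source row whose "from" or "to" currency has no match in the referenced table.

```sql
-- Rows the INNER JOIN version skips: at least one of the two
-- currency codes is missing from Sales.Currency.
SELECT sd.CurrencyRateDate, sd.FromCurrencyCode, sd.ToCurrencyCode, sd.Rate
FROM SourceData sd
WHERE NOT EXISTS (SELECT 1
                  FROM Sales.Currency c1
                  WHERE c1.CurrencyCode = sd.FromCurrencyCode)
   OR NOT EXISTS (SELECT 1
                  FROM Sales.Currency c2
                  WHERE c2.CurrencyCode = sd.ToCurrencyCode);
```

Running it before the INSERT gives you a chance to log or fix the rejected rows instead of dropping them silently.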

Can I grab the inserted IDs when doing multiple inserts?

In my head this sounds improbable, but I'd like to know if I can do it:
INSERT INTO MyTable (Name)
VALUES ('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT INSERTED Name, ID FROM TheAboveQuery
Where ID is an auto-indexed column?
Just to clarify, I want to select ONLY the newly inserted rows.
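You can use the OUTPUT clause with the INSERT statement (the OUTPUT clause has been available since SQL Server 2005; the multi-row VALUES list used here needs SQL Server 2008 or later):
DECLARE @T TABLE (ID INT, Name NVARCHAR(100))
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name INTO @T
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT Name, ID FROM @T;
UPDATE: if the table has no triggers, you can return the rows directly, without INTO: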
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
Sure, you can use an IDENTITY property on your ID field, and create the CLUSTERED INDEX on it
create table MyTable ( ID int identity(1,1),
[Name] varchar(64),
constraint [PK_MyTable] primary key clustered (ID asc) on [Primary]
)
--suppose this data already existed...
INSERT INTO MyTable (Name)
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
--now we insert some more... and then only return these rows
INSERT INTO MyTable (Name)
VALUES
('Sixth'),
('Seventh')
select top (@@ROWCOUNT)
ID,
Name
from MyTable
order by ID desc
@@ROWCOUNT returns the number of rows affected by the last statement executed. You can always see this in the messages tab of SQL Server Management Studio. Thus, we get the number of rows inserted and combine it with TOP, which limits the rows returned by a query to the specified number of rows (or percentage if you use PERCENT). It is important to use ORDER BY with TOP; otherwise your results aren't guaranteed to be the same.
From my previous edited answer...
If you are trying to see what values were inserted, then I assume you are inserting them a different way and this is usually handled with an OUTPUT clause, TRIGGER if you are trying to do something with these records after the insert, etc... more information would be needed.

SQL Server 2014 - parallel processes Insert same value in table with unique index

I have a table called dbo.mtestUnique with two columns, id and desc, and a unique index on "desc". Two processes insert data into this table at the same time. How can I avoid inserting a duplicate value and violating the unique index?
NOT EXISTS and LEFT JOIN don't work.
To replicate this, you can create a table in a test database:
CREATE TABLE mtestUnique
(
id INT ,
[DESC] varchar(50),
UNIQUE([DESC])
)
and then run the following script on two different queries on SSMS.
SET XACT_ABORT ON;
DECLARE @time VARCHAR(50)
WHILE (1=1)
BEGIN
IF OBJECT_ID('tempdb..#t') IS NOT NULL
DROP TABLE #t
SELECT @time = CAST(DATEPART(HOUR , GETDATE()) AS VARCHAR(10)) + ':' + RIGHT('00' +CAST(DATEPART(MINUTE , GETDATE())+1 AS VARCHAR(2)),2)
SELECT MAX(id) + 1 id , 'test' + @time [DESC]
INTO #t
FROM dbo.mtestUnique
-- to insert as exact same time
WAITFOR TIME @time
INSERT INTO dbo.mtestUnique
( id, [DESC] )
SELECT *
FROM #t t
WHERE NOT EXISTS (
SELECT 1
FROM dbo.mtestUnique u
WHERE u.[DESC] = t.[Desc]
)
END
I even put the INSERT in a transaction, but no luck.
Thanks in advance for your help.
The only way to prevent a unique constraint violation is to not insert duplicate values for the column. If you have the unique constraint, it will throw an error when you try to insert a duplicate description, but it will not control which descriptions are attempted.
With that said, if you only need a unique identifier, I would highly recommend using the ID instead. Set it to an auto-incrementing integer and do not insert it manually. Just provide the description and SQL Server will populate the ID for you, avoiding duplicates.
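A minimal sketch of that suggestion is below. The table name dbo.mtestUniqueIdentity and the variable @desc are hypothetical, chosen so the example does not clash with the table from the question. The UPDLOCK and HOLDLOCK hints on the existence check are not part of the answer above; they are a commonly used pattern, added here as an assumption, for making two sessions queue up on the same description instead of both passing the check at once.

```sql
-- Hypothetical variant of the question's table: SQL Server assigns id itself.
CREATE TABLE dbo.mtestUniqueIdentity
(
    id     INT IDENTITY(1,1) PRIMARY KEY,
    [DESC] VARCHAR(50) NOT NULL,
    UNIQUE ([DESC])
);

DECLARE @desc VARCHAR(50) = 'test12:01';

-- Assumption, not from the answer: UPDLOCK + HOLDLOCK make the NOT EXISTS
-- check take and keep a key-range lock for the duration of the statement's
-- transaction, so a concurrent session running the same statement waits
-- and then sees the row that was just inserted.
INSERT INTO dbo.mtestUniqueIdentity ([DESC])
SELECT @desc
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.mtestUniqueIdentity AS u WITH (UPDLOCK, HOLDLOCK)
                  WHERE u.[DESC] = @desc);
```

The unique constraint stays in place as the final safeguard, so it is still worth testing this under your own workload before relying on it.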

Auto increment a duplicate number

I have a database that handles duplicate records by using a duplicate number as part of the primary key. I'm trying to insert some rows that I've recovered from the Audit table. However, when I go to insert the records I violate the primary key. (These aren't really duplicates; the field that makes them unique should have been part of the key.) My question is: is there a way to auto-increment the duplicate number if a violation occurs when I'm inserting records with a SELECT/INSERT statement, or do I need to take a different approach?
fieldA nvarchar(255) not null,
fieldB nvarchar(255) not null,
duplicateNbr int not null,
fieldD nvarchar(255) null, /* Should be part of the key */
Table1
'Dave','Jones','FieldD1'
'Dave','Jones','FieldD2'
INSERT INTO Table2 (FName, LName, DuplicateNbr, FieldD)
SELECT FName, LName, 0, FieldD FROM TABLE1
In your SELECT statement on TABLE1, you can use the ROW_NUMBER() function with a PARTITION BY clause to number the rows within each set of duplicates.
The same technique, ROW_NUMBER() with PARTITION BY, is commonly used for detecting and deleting duplicates.
Here is the query:
INSERT INTO Table2 (fieldA, fieldB, DuplicateNbr, FieldD)
SELECT
fieldA,
fieldB,
duplicateNbr = ROW_NUMBER() over (partition by fieldA, fieldB Order by newid())-1,
fieldD
FROM TABLE1
I hope it helps.
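The query above numbers each set of duplicates from 0, which can still collide if Table2 already holds rows for the same fieldA/fieldB pair. One possible extension, sketched here under the assumption that the key is (fieldA, fieldB, duplicateNbr), is to start numbering after the highest duplicateNbr already present. The aliases s and m and the column maxNbr are names I made up for the example.

```sql
INSERT INTO Table2 (fieldA, fieldB, duplicateNbr, fieldD)
SELECT
    s.fieldA,
    s.fieldB,
    -- Start at 0 when the pair is new, otherwise right after the
    -- highest duplicateNbr already stored for that pair.
    ISNULL(m.maxNbr + 1, 0)
        + ROW_NUMBER() OVER (PARTITION BY s.fieldA, s.fieldB
                             ORDER BY s.fieldD) - 1 AS duplicateNbr,
    s.fieldD
FROM Table1 AS s
LEFT JOIN (SELECT fieldA, fieldB, MAX(duplicateNbr) AS maxNbr
           FROM Table2
           GROUP BY fieldA, fieldB) AS m
    ON m.fieldA = s.fieldA
   AND m.fieldB = s.fieldB;
```

This is not safe against another session inserting the same pair at the same time, so it suits a one-off recovery better than a concurrent workload.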
I would do it using the merge statement.
Just generate a new unique id in the "matched" clause and a normal insert, if nothing matches.

Reordering Identity primary key in sql server

Yes, I am very well aware of the consequences, but I just want to reorder them, starting from 1 and running to the end.
How do I go about reordering the keys using a single query?
It is clustered primary key index
Reordering like
First record Id 1
second record Id 2
The primary key is Int
1. Drop the PK constraint
2. Drop the Identity column
3. Re-create the Identity column
4. Re-create the PK
USE Test
go
if(object_id('IdentityTest') Is not null)
drop table IdentityTest
create table IdentityTest
(
Id int identity not null,
Name varchar(5),
constraint pk primary key (Id)
)
set identity_insert dbo.IdentityTest ON
insert into dbo.IdentityTest (Id,Name) Values(23,'A'),(26,'B'),(34,'C'),(35,'D'),(40,'E')
set identity_insert dbo.IdentityTest OFF
select * from IdentityTest
------------------1. Drop PK constraint ------------------------------------
ALTER TABLE [dbo].[IdentityTest] DROP CONSTRAINT [pk]
GO
------------------2. Drop Identity column -----------------------------------
ALTER table dbo.IdentityTest
drop column Id
------------------3. Re-create Identity Column -----------------------------------
ALTER table dbo.IdentityTest
add Id int identity(1,1)
-------------------4. Re-Create PK-----------------------
ALTER TABLE [dbo].[IdentityTest] ADD CONSTRAINT [pk] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
--------------------------------------------------------------
insert into dbo.IdentityTest (Name) Values('F')
select * from IdentityTest
IDENTITY columns are not updatable irrespective of SET IDENTITY_INSERT options.
You could create a shadow table with the same definition as the original except for the IDENTITY property. Switch the data into it (this is a metadata-only change that moves no rows and just affects the table's definition), update the rows there, and then switch back.
A full worked example going from a situation with gaps to no gaps is shown below (error handling and transactions are omitted below for brevity).
Demo Scenario
/*Your original table*/
CREATE TABLE YourTable
(
Id INT IDENTITY PRIMARY KEY,
OtherColumns CHAR(100) NULL
)
/*Some dummy data*/
INSERT INTO YourTable (OtherColumns) VALUES ('A'),('B'),('C')
/*Delete a row leaving a gap*/
DELETE FROM YourTable WHERE Id =2
/*Verify there is a gap*/
SELECT *
FROM YourTable
Remove Gaps
/*Create table with same definition as original but no `IDENTITY`*/
CREATE TABLE ShadowTable
(
Id INT PRIMARY KEY,
OtherColumns CHAR(100) NULL
)
/*1st metadata switch*/
ALTER TABLE YourTable SWITCH TO ShadowTable;
/*Do the update*/
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY Id) AS RN
FROM ShadowTable
)
UPDATE CTE SET Id = RN
/*Metadata switch back to restore IDENTITY property*/
ALTER TABLE ShadowTable SWITCH TO YourTable;
/*Remove unneeded table*/
DROP TABLE ShadowTable;
/*No Gaps*/
SELECT *
FROM YourTable
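One thing the demo does not cover is the identity counter itself. As far as I know, renumbering the rows this way leaves the table's current identity value where it was, so the next insert may still skip ahead. A sketch of how you might check and reset it with DBCC CHECKIDENT follows; the variable @maxId is a name introduced for this example, and the comments assume the table already holds rows.

```sql
-- Assumes the YourTable demo above has just been renumbered.
DECLARE @maxId INT;
SELECT @maxId = ISNULL(MAX(Id), 0) FROM YourTable;

-- Report the current identity value without changing it.
DBCC CHECKIDENT ('YourTable', NORESEED);

-- Set the current identity value to the highest Id in use,
-- so the next inserted row gets @maxId + 1.
DBCC CHECKIDENT ('YourTable', RESEED, @maxId);
```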
I don't think there is any way to do this in a single query. Your best bet is to copy the data to a new table, drop and recreate the original table (or delete the data and reseed the identity) and reinsert the data in the original order using the previous identity as the ordering (but not re-inserting it).
CREATE TABLE Table1_Stg (bla bla bla)
INSERT INTO Table1_Stg (Column2, Column3,...) SELECT Column2, Column3,... FROM Table1 ORDER BY Id
Here the Id column is excluded from the SELECT column list.
Or, you can do:
SELECT * INTO Table1_Stg FROM Table1 ORDER BY Id
DROP TABLE Table1
EXEC sp_rename 'Table1_Stg', 'Table1'
Please look up the usage of sp_rename, as I am doing this from memory.
Hope this helps.
EDIT: Please save a script with all your indexes and constraints if any on Table1.
EDIT2: Added second method of creating table and inserting into table.
UPDATE tbl SET id = (SELECT COUNT(*) FROM tbl t WHERE t.id <= tbl.id);
This last statement is genius. I just had to remove the primary key from the table design first and make sure that, under the design options, Identity Specification is set to No. Once you have run the query, set these options back.
