Auto increment a duplicate number - sql-server

I have a database that handles duplicate records by using a duplicate number as part of the primary key. I'm trying to insert some rows that I've recovered from the Audit table, but the insert violates the primary key. (These rows aren't really duplicates; the field that makes them unique should have been part of the key.) Is there a way to auto-increment the duplicate number when a violation occurs while inserting with an INSERT ... SELECT statement, or do I need to take a different approach?
fieldA nvarchar(255) not null,
fieldB nvarchar(255) not null,
duplicateNbr int not null,
fieldD nvarchar(255) null, /* Should be part of the key */
Table1
'Dave','Jones','FieldD1'
'Dave','Jones','FieldD2'
INSERT INTO Table2 (FName, LName, DuplicateNbr, FieldD)
SELECT FName, LName, 0, FieldD FROM TABLE1

In your SELECT statement on TABLE1, you can use the ROW_NUMBER() function with a PARTITION BY clause to number the rows within each set of duplicates. Subtracting 1 makes the numbering start at 0, matching the 0 used in the question.
The following SQL tutorial shows a way of detecting and deleting duplicates using ROW_NUMBER() with PARTITION BY.
Here is the query:
INSERT INTO Table2 (fieldA, fieldB, DuplicateNbr, FieldD)
SELECT
fieldA,
fieldB,
duplicateNbr = ROW_NUMBER() over (partition by fieldA, fieldB Order by newid())-1,
fieldD
FROM TABLE1
I hope it helps,
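If the target table may already hold rows for the same fieldA/fieldB pair, numbering from 0 would still collide. A minimal sketch of one way around that, assuming hypothetical temp tables #Table1 and #Table2 that mirror the question (column widths shortened to keep the key small), is to offset ROW_NUMBER() by the highest duplicateNbr already present. It does not protect against concurrent inserts into the target.

```sql
-- Hypothetical staging and target tables modelled on the question.
CREATE TABLE #Table1 (
    fieldA nvarchar(100) NOT NULL,
    fieldB nvarchar(100) NOT NULL,
    fieldD nvarchar(100) NULL
);
CREATE TABLE #Table2 (
    fieldA nvarchar(100) NOT NULL,
    fieldB nvarchar(100) NOT NULL,
    duplicateNbr int NOT NULL,
    fieldD nvarchar(100) NULL,
    PRIMARY KEY (fieldA, fieldB, duplicateNbr)
);

-- One row already exists in the target with duplicateNbr = 0.
INSERT INTO #Table2 VALUES (N'Dave', N'Jones', 0, N'FieldD0');
INSERT INTO #Table1 VALUES (N'Dave', N'Jones', N'FieldD1'),
                           (N'Dave', N'Jones', N'FieldD2');

-- Continue numbering after the current maximum for each key;
-- keys with no existing rows start at 0 (ISNULL(..., -1) + 1).
INSERT INTO #Table2 (fieldA, fieldB, duplicateNbr, fieldD)
SELECT s.fieldA,
       s.fieldB,
       ISNULL(m.maxNbr, -1)
         + ROW_NUMBER() OVER (PARTITION BY s.fieldA, s.fieldB ORDER BY s.fieldD),
       s.fieldD
FROM #Table1 AS s
LEFT JOIN (SELECT fieldA, fieldB, MAX(duplicateNbr) AS maxNbr
           FROM #Table2
           GROUP BY fieldA, fieldB) AS m
       ON m.fieldA = s.fieldA AND m.fieldB = s.fieldB;

SELECT fieldA, fieldB, duplicateNbr, fieldD
FROM #Table2
ORDER BY fieldA, fieldB, duplicateNbr;
```

Ordering by fieldD instead of NEWID() is a choice made here so the assigned numbers are repeatable; any ordering works as far as the key is concerned.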

I would do it using the merge statement.
Just generate a new unique id in the "matched" clause and a normal insert, if nothing matches.

Related

MSSQL to create temporary identity column when selecting from table

I have a temporary table that I am selecting all the records from. Say it looks like this:
select * from #mytable
I need to add a temporary identity field to that selection. I've looked around and suggestions include using the IDENTITY(1,1) keyword in some fashion or just creating an auto-incrementing field like this:
row_Number() over (order by col1, col2) as myid
But that doesn't make the column an identity, it just creates a uniquely incremented field.
I know there should be a simple solution to this but I just can't find it.
I just want to know if it's possible to create a key identity field
while doing a select
Only by SELECT INTO:
SELECT IDENTITY(INT,1,1) AS IdentityColumn
,*
INTO #Temp
FROM sys.databases
But not via a plain SELECT: IDENTITY is just a column property backed by an internal sequence generator, and it only takes effect when rows are inserted.
You can use SELECT INTO
SELECT
IDENTITY (INT, 1, 1) AS NEW_ID, *
INTO #tempTable
FROM #mytable
SELECT * FROM #tempTable

SQL Server merge and pk violation

I'm trying to understand why in a scenario like the following
DECLARE @source TABLE
(
orderId NVARCHAR(50),
customerId NVARCHAR(50)
)
DECLARE @target TABLE
(
orderId NVARCHAR(50) PRIMARY KEY,
customerId NVARCHAR(50) NOT NULL
)
INSERT INTO @source
VALUES ('test', '123'), ('test', '234')
MERGE @target AS TRG
USING (SELECT DISTINCT orderId, customerId
FROM @source) AS SRC ON SRC.orderId = TRG.orderId
WHEN MATCHED THEN
UPDATE SET TRG.customerId = SRC.customerId
WHEN NOT MATCHED BY TARGET THEN
INSERT (orderId, customerId)
VALUES (orderId, customerId);
I'm getting a duplicate key violation error:
Msg 2627, Level 14, State 1, Line 21
Violation of PRIMARY KEY constraint 'PK__#B3D7759__0809335D4BE1521F'. Cannot insert duplicate key in object 'dbo.@target'. The duplicate key value is (test).
What I expect is that the update finds the existing key and updates the customerId, so that at the end @target holds one row with orderId = 'test' and customerId = '234'.
As far as I can tell, it instead tries to insert all the records, because it finds no key match at the start of the merge. That causes the violation, since the source contains the key more than once.
Is this right? Is there any way to achieve what I am expecting using the MERGE statement?
@user1443098
I've read your link, thanks. However, I have a massive data load coming from a source table and going into 10 different tables. I tried to implement the procedure with a cursor and it took about 0.5 s per record (with all the IF EXISTS checks). With the MERGE statement, 300 rows were inserted into the 10 tables in less than one second, so in my case it makes a big difference in performance.
There are two records in @source with the same orderId. Neither has a match in @target, so the NOT MATCHED clause tries to insert both of them. It cannot, because the primary key on orderId in @target requires every inserted row to have a unique orderId. Inserting the same value twice is what causes the primary key violation.
If duplicates are possible in the source, you should eliminate them in your USING sub-query. Something like this:
(SELECT orderId, MAX(customerId) AS customerId
FROM @source
GROUP BY orderId)
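Put together with the question's tables, the de-duplicated merge might look like the sketch below. It assumes that keeping the highest customerId per orderId is acceptable; which duplicate should win is a business decision the thread does not settle.

```sql
DECLARE @source TABLE
(
    orderId    NVARCHAR(50),
    customerId NVARCHAR(50)
);
DECLARE @target TABLE
(
    orderId    NVARCHAR(50) PRIMARY KEY,
    customerId NVARCHAR(50) NOT NULL
);

INSERT INTO @source
VALUES ('test', '123'), ('test', '234');

-- Collapse the source to one row per orderId before merging,
-- so each target key is touched at most once.
MERGE @target AS TRG
USING (SELECT orderId, MAX(customerId) AS customerId
       FROM @source
       GROUP BY orderId) AS SRC
   ON SRC.orderId = TRG.orderId
WHEN MATCHED THEN
    UPDATE SET TRG.customerId = SRC.customerId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (orderId, customerId)
    VALUES (SRC.orderId, SRC.customerId);

SELECT orderId, customerId FROM @target;
```

With the sample data this leaves a single row for 'test' in @target, carrying customerId '234'.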

Can I grab the inserted IDs when doing multiple inserts?

In my head this sounds improbable, but I'd like to know if I can do it:
INSERT INTO MyTable (Name)
VALUES ('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT INSERTED Name, ID FROM TheAboveQuery
Where ID is an auto-indexed column?
Just to clarify, I want to select ONLY the newly inserted rows.
Starting with SQL Server 2008 you can use OUTPUT clause with INSERT statement
DECLARE @T TABLE (ID INT, Name NVARCHAR(100))
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name INTO @T
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT Name, ID FROM @T;
UPDATE: if the table has no triggers, you can return the inserted rows directly:
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
Sure, you can use an IDENTITY property on your ID field, and create the CLUSTERED INDEX on it
create table MyTable ( ID int identity(1,1),
[Name] varchar(64),
constraint [PK_MyTable] primary key clustered (ID asc) on [Primary]
)
--suppose this data already existed...
INSERT INTO MyTable (Name)
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
--now we insert some more... and then only return these rows
INSERT INTO MyTable (Name)
VALUES
('Sixth'),
('Seventh')
select top (@@ROWCOUNT)
ID,
Name
from MyTable
order by ID desc
@@ROWCOUNT returns the number of rows affected by the last statement executed. You can always see this in the Messages tab of SQL Server Management Studio. So we take the number of rows just inserted and combine it with TOP, which limits the rows returned by a query to the specified number (or percentage, if you use PERCENT). It is important to use ORDER BY with TOP; otherwise there is no guarantee which rows are returned.
From my previous edited answer...
If you are trying to see which values were inserted, then I assume you are inserting them some other way. This is usually handled with an OUTPUT clause, or with a TRIGGER if you need to do something with the records after the insert. More information would be needed.

Populating Row Number on new column for table without primary key

I have a table where I am adding a new column to indicate row numbers. This table doesn't have any primary key associated, so I am not sure how I can populate it with row number values.
Here is the code I have put in the CTE but in the primary table I don't know how to correlate the Id column. Id is the newly added column to which I need to start the numbering.
;with CTE_RowNum as (
select
ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowId
,Id
from dbo.Test
)
I can partition the table on certain columns but how would I join such criteria with original table?
Edit- Here is the schema that I am referring to. Id is the newly added column which I want to populate with as row numbers. Name and Error column together form the uniqueness.
Expected Output would be -
1,ABC,Time Out Issue
2,ABC,Page Not Found
3,DEF,Page Not Found
The order doesn't matter, I mean the last row can have '1' and the others some other value.
The other way I can think of is repopulating the data into a new table with an IDENTITY column, but I was wondering whether there is a way to do this in place through T-SQL.
CREATE TABLE #Test
(
Id INT
,Name NVARCHAR(100)
,Error NVARCHAR(100)
)
INSERT INTO #Test (Name,Error) VALUES
('ABC','Time Out Issue')
,('ABC','Page Not Found')
,('DEF','Page Not Found')
It's really simple:
;WITH cte AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowId
FROM #test
)
UPDATE cte
SET Id = RowId

Sql Server doing a full table scan when first field in PK has few distinct values

I have this table (TableA):
CREATE TABLE TableA
(
[FieldA] [int] NOT NULL,
[FieldB] [int] NOT NULL,
[Value] [float] NULL,
CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
(
[FieldA] ASC,
[FieldB] ASC
)
)
There are few distinct FieldA values; let's say FieldA can be {1,2,3,4,5,6}.
Why does this query cause a full table scan:
SELECT COUNT(*) FROM TableA WHERE FieldB = 1
While this one doesn't:
SELECT COUNT(*) FROM TableA WHERE FieldB = 1 AND FieldA IN (1,2,3,4,5,6)
Can't SQL Server optimize this? If I had a TableB where FieldA was the PK and I joined TableB to TableA, the query would run similarly to the second query.
The clustered index you've created is keyed on two columns, FieldA then FieldB. If you filter only on FieldB, which is not the leading column, SQL Server cannot build a key value to seek with on that index, so it falls back to scanning it.
Even though FieldA can only contain a very small range of values, the SQL optimizer doesn't look at that range to determine whether it could "fudge" a key out of the information you've given it.
If you want to improve the performance of the first query, you will have to create another index on FieldB. If, as you say, there are not many distinct values in FieldA, and you do most of your lookups on FieldB exclusively, you might want to consider moving your clustered index so that it is built only on FieldB, and creating a unique index over FieldA and FieldB.
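The two suggestions above could be sketched as follows. The index names IX_TableA_FieldB and CIX_TableA_FieldB are made up for illustration, and the two options are alternatives rather than steps to run in sequence.

```sql
-- Table as described in the question (skip if it already exists).
CREATE TABLE dbo.TableA
(
    FieldA int   NOT NULL,
    FieldB int   NOT NULL,
    [Value] float NULL,
    CONSTRAINT PK_TableA PRIMARY KEY CLUSTERED (FieldA ASC, FieldB ASC)
);

-- Option 1: keep the clustered primary key and add a separate index
-- whose leading column is FieldB, so WHERE FieldB = 1 can seek.
CREATE NONCLUSTERED INDEX IX_TableA_FieldB
    ON dbo.TableA (FieldB);

-- Option 2 (alternative to option 1): cluster on FieldB alone and
-- keep uniqueness of (FieldA, FieldB) through a nonclustered key.
-- ALTER TABLE dbo.TableA DROP CONSTRAINT PK_TableA;
-- CREATE CLUSTERED INDEX CIX_TableA_FieldB ON dbo.TableA (FieldB);
-- ALTER TABLE dbo.TableA
--     ADD CONSTRAINT PK_TableA PRIMARY KEY NONCLUSTERED (FieldA, FieldB);
```

Option 1 is the less invasive change. Option 2 rebuilds the table's storage order, so it is probably only worth it when most lookups really are by FieldB.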
Apparently, what I was looking for is a skip-scan optimization which is available on Oracle but not on SQL Server. Skip scan can utilize an index if the leading edge column predicate is missing:
http://social.msdn.microsoft.com/Forums/eu/transactsql/thread/48de15ad-f8e9-4930-9f40-ca74946bc401
