Maintaining foreign key references while migrating data using T-SQL - sql-server

I have 4 tables.
1) SourceTable - a source table with some data in it
2) DestinationTable - a destination table with the same schema as SourceTable; both tables hold similar kinds of data
3) FKSourceTable - a separate table that has an FK reference to SourceTable
4) FKDestinationTable - a separate table that has an FK reference to DestinationTable, with the same schema as FKSourceTable
Now I'm given the task of migrating some of the data from SourceTable to DestinationTable, and from FKSourceTable to FKDestinationTable.
However, I cannot migrate the primary keys, because DestinationTable may have its own records with the same PK values, which would cause a PK violation.
DestinationTable has an auto-identity column for its PK, and when I do a bulk insert I don't specify the PK column, so the identity does its job.
This means the new records in DestinationTable will have brand new IDs.
The problem I'm having is: how do I maintain the FK reference when migrating FKSourceTable to FKDestinationTable? When I do a bulk insert into DestinationTable as follows, I lose track of the identities:
INSERT INTO DestinationTable
(Col1, Col2)
SELECT st.Col1, st.Col2
FROM SourceTable st
(DestinationTable has 3 columns: Id, Col1, Col2)
The challenge is that I cannot use SSIS or any other ETL solution. I need to be able to do this with a simple SQL Script.
Does anyone have any ideas on how to tackle this? I've tried using OUTPUT INTO etc., but I haven't figured out a way to keep a reference between the original Id and the new Id.
Any help is greatly appreciated
Thank you
Nandun.

This is probably not the most optimal solution, but it should get the job done.
The idea is to enable identity insert and generate the IDs yourself based on what is already in the table.
This iterates through the source data and inserts it into the destination tables one row at a time.
Please review this code thoroughly before executing, because I didn't test it myself.
declare @col1 varchar(20)
declare @col2 varchar(20)
declare @col3 varchar(20)
declare @new_id int

set identity_insert DestinationTable on

declare source_data cursor for
select col1, col2, col3
from SourceTable

open source_data
fetch next from source_data
into @col1, @col2, @col3

while @@FETCH_STATUS = 0
begin
    -- base the new ID on what is already in the destination table
    select @new_id = ISNULL(MAX(ID), 0) + 1 from DestinationTable
    insert into DestinationTable (ID, col1, col2, col3) values (@new_id, @col1, @col2, @col3)
    -- do something similar for FKDestinationTable: the column list below is a
    -- placeholder; use FKDestinationTable's actual columns and point its FK at @new_id
    insert into FKDestinationTable (ID, col1, col2, col3) values (@new_id, @col1, @col2, @col3)
    fetch next from source_data
    into @col1, @col2, @col3
end

close source_data
deallocate source_data

set identity_insert DestinationTable off

Insert the data into the destination table using IDENT_CURRENT of the destination table:
DECLARE @ID INT = IDENT_CURRENT('DestinationTable')

SET IDENTITY_INSERT DestinationTable ON  -- needed because ID is an identity column

INSERT INTO DestinationTable
(ID, Col1, Col2)
SELECT @ID + ROW_NUMBER() OVER(ORDER BY st.ID), st.Col1, st.Col2
FROM SourceTable st
WHERE -----

SET IDENTITY_INSERT DestinationTable OFF
Now you have a mapping of each ID in the source table to its ID in the destination table:
SELECT @ID + ROW_NUMBER() OVER(ORDER BY st.ID) [NEW_ID], st.ID [OLD_ID]
FROM SourceTable st
WHERE -----
Note: Make sure this is done in a transaction; the appropriate isolation level depends on how these tables are used.
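Another approach that directly yields the old-to-new mapping is MERGE with an OUTPUT clause. Unlike a plain INSERT ... OUTPUT, MERGE's OUTPUT clause may reference the source table's columns, so the original and newly generated identity values can be captured in one statement. A sketch, assuming the table names from the question and hypothetical column names (DestinationId, ColA, SourceId) for the FK tables:

```sql
-- Mapping table to hold old ID -> new ID pairs
DECLARE @IdMap TABLE (OldId INT, NewId INT);

-- A MERGE whose match condition is always false behaves like an INSERT,
-- but its OUTPUT clause may reference the source's columns.
MERGE INTO DestinationTable AS d
USING (SELECT Id, Col1, Col2 FROM SourceTable) AS s
    ON 1 = 0                       -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (Col1, Col2) VALUES (s.Col1, s.Col2)
OUTPUT s.Id, inserted.Id INTO @IdMap (OldId, NewId);

-- Use the mapping to remap the FK column while copying the child rows.
-- DestinationId, SourceId and ColA are assumed names; substitute the real ones.
INSERT INTO FKDestinationTable (DestinationId, ColA)
SELECT m.NewId, f.ColA
FROM FKSourceTable f
JOIN @IdMap m ON m.OldId = f.SourceId;
```

Because the mapping lives in a table variable, the child-table insert can run in the same batch without any IDENTITY_INSERT juggling.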

Related

How to update a column in a table in SQL Server Management Studio and not introduce duplicates

Hi, this is my first post on Stack Overflow and I'm hoping to get some help with a query.
I am trying to update a table in SSMS by changing some entries of a column (due to a data entry mistake). However, when I try to update the table I get an error like this:
Violation of PRIMARY KEY constraint 'XXXXX'. Cannot insert duplicate key in object 'XXXXX'. The duplicate key value is XXXXX.
I understand that the error comes from trying to insert duplicate rows. What I also want to do is sum one column to aggregate the data, but I am not sure how, since I cannot use sum(Count) with group by x, y, z inside an UPDATE.
I have created a sample of data below to illustrate the problem and then what I'd like it to look like.
Change Apples to Oranges where col3 = 'Large' and col4 = 'USA', and sum the Count column.
[Sample data before update]
[Sample data after update]
Sample code for this problem that I have tried:
USE database1;
GO
UPDATE [dbo].[table]
SET col5 = 'Oranges', Count = sum(Count)
WHERE col3 = 'Large' and col4 = 'USA'
group by col3, col4, col5;
GO
Thanks in advance for any suggestions! :)
You cannot write sum(Count) that way.
Replace sum(Count) with the following query:
select sum([count]) from dbo.table where col3 = 'LARGE' and col4 = 'USA'
group by col3, col4, col5;
Regarding the primary key violation: you would need to remove the primary key constraint from the column it is imposed on, because you cannot insert duplicate values into a primary key column.
Below is a sample query to remove the primary key from a table:
DECLARE @table NVARCHAR(512), @sql NVARCHAR(MAX);
SELECT @table = N'dbo.table';
SELECT @sql = 'ALTER TABLE ' + @table
    + ' DROP CONSTRAINT ' + name + ';'
FROM sys.key_constraints
WHERE [type] = 'PK'
AND [parent_object_id] = OBJECT_ID(@table);
EXEC sp_executesql @sql;
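Dropping the primary key is rarely desirable, though. If the goal is to fold the 'Apples' count into an existing 'Oranges' row and then remove the duplicate, a two-step sketch (using the column names from the question; dbo.[table] is the asker's placeholder name, and this assumes an 'Oranges' row already exists for the same key):

```sql
-- Add the Apples count onto the existing Oranges row...
UPDATE o
SET o.[Count] = o.[Count] + a.[Count]
FROM dbo.[table] o
JOIN dbo.[table] a
  ON a.col3 = o.col3 AND a.col4 = o.col4
WHERE o.col5 = 'Oranges' AND a.col5 = 'Apples'
  AND o.col3 = 'Large' AND o.col4 = 'USA';

-- ...then remove the now-merged Apples row so the PK is not violated
DELETE FROM dbo.[table]
WHERE col5 = 'Apples' AND col3 = 'Large' AND col4 = 'USA';
```

If no 'Oranges' row exists yet for that key, the original single-row UPDATE SET col5 = 'Oranges' works without any aggregation or PK conflict.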

SQL Server - how to manually increment a PK in multiple-row INSERT transaction

I am working in SQL Server. I have a table that has a PK int column. This column does not have auto-increment enabled, and I am not allowed to change the schema. I need to insert lots of rows (perhaps thousands) into this table manually. None of the data inserted will come from any existing table. However, I need to ensure that the PK column gets incremented by +1 for each new row. My current script is like the following:
BEGIN TRAN
INSERT INTO DB1.dbo.table1
(PK_col, col1)
VALUES
(10, 'a')
,(11, 'something')
,(12, 'more text')
;
where I already know via a pre-query (SELECT MAX(PK_col) + 1) that PK_col is currently at 9.
My problem is ensuring that the PK column gets incremented by +1 for each new row. Because there could be thousands of rows to insert, I want to reduce the possibility of skipping values or a PK constraint violation being thrown. I know that I can achieve this outside of the DB (via Excel), as long as I validate the PK values prior to running the SQL script. However, I would like to create a solution that handles the auto-increment within the TRAN statement itself. Is this possible (without running into a race condition)? If so, how?
The following should do what you want:
INSERT INTO DB1.dbo.table1(PK_col, col1)
SELECT COALESCE(l.max_pk_col, 0) + row_number() over (order by (select null)) as PK_col,
col1
FROM (VALUES ('a'), ('something'), ('more text')) v(col1) CROSS JOIN
(SELECT MAX(pk_col) as max_pk_col FROM DB1.dbo.table1) l;
You need to be careful with this arrangement. Locking the entire table for the duration of the INSERT is probably a good idea -- if anything else could be updating the table.
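One way to take that lock, sketched under the assumption that the table and column names above are correct (and untested beyond that): wrap the statement in a transaction and add TABLOCKX, HOLDLOCK hints to the MAX subquery, so the exclusive table lock is held until commit:

```sql
BEGIN TRAN;

INSERT INTO DB1.dbo.table1 (PK_col, col1)
SELECT COALESCE(l.max_pk_col, 0) + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       v.col1
FROM (VALUES ('a'), ('something'), ('more text')) v(col1) CROSS JOIN
     -- TABLOCKX + HOLDLOCK keeps an exclusive table lock until COMMIT,
     -- so no other session can insert a competing MAX value mid-statement
     (SELECT MAX(PK_col) AS max_pk_col
      FROM DB1.dbo.table1 WITH (TABLOCKX, HOLDLOCK)) l;

COMMIT TRAN;
```

The trade-off is that all other readers and writers of table1 are blocked for the duration of the insert, which is usually acceptable for a one-off load of a few thousand rows.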
One way to do this would be to create a temporary table with an identity column and a data column, do your insert, then insert the contents of this table back into your desired original table, and finally clean up.
BEGIN TRAN

DECLARE @initialIndex INT
SELECT @initialIndex = MAX(PK_col) + 1
FROM DB1.dbo.table1

-- IDENTITY seeds cannot be variables, so seed at 1
-- and add the offset when copying back
CREATE TABLE #tempData(
    PK_col INT IDENTITY(1, 1),
    col1 VARCHAR(MAX)
)

INSERT INTO #tempData (col1)
VALUES
('a')
,('something')
,('more text')

INSERT INTO DB1.dbo.table1 (PK_col, col1)
SELECT PK_col + @initialIndex - 1, col1
FROM #tempData

DROP TABLE #tempData

COMMIT TRAN

SQL Server 2014 - parallel processes Insert same value in table with unique index

I have a table called dbo.mtestUnique with two columns, id and [DESC], and a unique index on [DESC]. Two processes insert data into this table at the same time. How can I avoid inserting duplicate values and violating the unique index?
NOT EXISTS and LEFT JOIN don't work.
to replicate this you can create a table on a test database:
CREATE TABLE mtestUnique
(
id INT ,
[DESC] varchar(50),
UNIQUE([DESC])
)
and then run the following script in two different query windows in SSMS.
SET XACT_ABORT ON;
DECLARE @time VARCHAR(50)
WHILE (1=1)
BEGIN
    IF OBJECT_ID('tempdb..#t') IS NOT NULL
        DROP TABLE #t
    SELECT @time = CAST(DATEPART(HOUR, GETDATE()) AS VARCHAR(10)) + ':' + RIGHT('00' + CAST(DATEPART(MINUTE, GETDATE()) + 1 AS VARCHAR(2)), 2)
    SELECT MAX(id) + 1 id, 'test' + @time [DESC]
    INTO #t
    FROM dbo.mtestUnique
    -- to insert at the exact same time
    WAITFOR TIME @time
    INSERT INTO dbo.mtestUnique
    ( id, [DESC] )
    SELECT *
    FROM #t t
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.mtestUnique u
        WHERE u.[DESC] = t.[DESC]
    )
END
I even put the insert in a TRAN but no luck.
thanks for your help in advance.
The only way to prevent a unique constraint violation is to not insert duplicate values into the column. The unique constraint will throw an error when you try to insert a duplicate description, but it cannot control which descriptions are attempted.
That said, if you only need a unique identifier, I would highly recommend using the ID instead. Make it an auto-incrementing identity column and do not insert it manually; just provide the description and SQL Server will populate the ID for you, avoiding duplicates.
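If the schema cannot change and both sessions must keep running the NOT EXISTS insert, one commonly used pattern (a sketch, untested against the repro above) is to serialize the existence check with UPDLOCK and HOLDLOCK hints inside a transaction, so the second session blocks until the first commits instead of racing it:

```sql
BEGIN TRAN;

INSERT INTO dbo.mtestUnique (id, [DESC])
SELECT t.id, t.[DESC]
FROM #t t
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.mtestUnique u WITH (UPDLOCK, HOLDLOCK)  -- key-range lock: makes the
    WHERE u.[DESC] = t.[DESC]                        -- second session wait for COMMIT
);

COMMIT TRAN;
```

With the hints, the existence check and the insert behave atomically with respect to each other, so the loser of the race sees the winner's row and inserts nothing rather than hitting the unique index.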

How to refactor this deadlock issue?

I ran into a deadlock issue synchronizing a table multiple times in a short period of time. By synchronize I mean doing the following:
Insert data to be synchronized into a temp table
Update existing records in destination table
Insert new records into the destination table
Delete records that are not in the sync table, under certain circumstances
Drop temp table
For the INSERT and DELETE statements, I'm using a LEFT JOIN similar to:
INSERT INTO destination_table (fk1, fk2, val1)
SELECT t.fk1, t.fk2, t.val1
FROM #tmp t
LEFT JOIN destination_table dt ON dt.fk1 = t.fk1
    AND dt.fk2 = t.fk2
WHERE dt.pk IS NULL;
The deadlock graph is reporting the destination_table's primary key is under an exclusive lock. I assume the above query is causing a table or page lock instead of a row lock. How would I confirm that?
I could rewrite the above query with an IN, EXIST or EXCEPT command. Are there any additional ways of refactoring the code? Will refactoring using any of these commands avoid the deadlock issue? Which one would be the best? I'm assuming EXCEPT.
Well, under normal circumstances I can execute this scenario without problems. Below is the test script I created. Are you trying something else?
drop table #destination_table
drop table #tmp

declare @x int = 0

create table #tmp(fk1 int, fk2 int, val int)
set @x = 2
while (@x < 1000)
begin
    insert into #tmp
    select @x, @x, 100
    set @x = @x + 3
end

create table #destination_table(fk1 int, fk2 int, val int)
set @x = 1   -- reset the counter so the second loop actually runs
while (@x < 1000)
begin
    insert into #destination_table
    select @x, @x, 100
    set @x = @x + 1
end

INSERT INTO #destination_table (fk1, fk2, val)
select t.*
FROM #tmp t
LEFT JOIN #destination_table dt ON dt.fk1 = t.fk1
    AND dt.fk2 = t.fk2
WHERE dt.fk1 IS NULL
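As to the "how would I confirm that?" part of the question: one way to see whether the statement is taking table or page locks rather than row (KEY) locks is to query sys.dm_tran_locks from a second session while the synchronization batch runs:

```sql
-- While the synchronization runs in another session, inspect what kinds of
-- locks are held: resource_type OBJECT = whole table, PAGE = page, KEY = row.
SELECT resource_type, request_mode, request_status, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode, request_status;
```

A large count of KEY locks suggests row-level locking; a single OBJECT lock in X mode (or many PAGE locks) suggests escalation, which is one common ingredient in this kind of deadlock.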

Check if a temporary table exists and delete if it exists before creating a temporary table

I am using the following code to check if the temporary table exists and drop the table if it exists before creating again. It works fine as long as I don't change the columns. If I add a column later, it will give an error saying "invalid column". Please let me know what I am doing wrong.
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT
)
select company, stepid, fieldid from #Results
--Works fine to this point
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT,
NewColumn NVARCHAR(50)
)
select company, stepid, fieldid, NewColumn from #Results
--Does not work
I cannot reproduce the error.
Perhaps I'm not understanding the problem.
The following works fine for me in SQL Server 2005, with the extra "foo" column appearing in the second select result:
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
GO
CREATE TABLE #Results ( Company CHAR(3), StepId TINYINT, FieldId TINYINT )
GO
select company, stepid, fieldid from #Results
GO
ALTER TABLE #Results ADD foo VARCHAR(50) NULL
GO
select company, stepid, fieldid, foo from #Results
GO
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
GO
The statements should be run in this order:
ALTER statement for the table
GO
SELECT statement
Without 'GO' in between, the whole thing is treated as one single batch, and when the SELECT statement is compiled the new column won't be found.
With 'GO', the part of the script up to 'GO' is treated as a single batch and executed before the query after 'GO' is compiled.
Instead of dropping and re-creating the temp table you can truncate and reuse it
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
Truncate TABLE #Results
else
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT
)
If you are using SQL Server 2016 or Azure SQL Database, then use the below syntax to drop the temp table and recreate it. More info here: MSDN
Syntax
DROP TABLE [ IF EXISTS ] [ database_name . [ schema_name ] . |
schema_name . ] table_name [ ,...n ]
Query:
DROP TABLE IF EXISTS #Results
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT
)
I think the problem is that you need to add a GO statement in between to separate the execution into batches. The second drop script, i.e. IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results, did not drop the temp table because it was part of a single batch. Can you please try the script below?
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT
)
GO
select company, stepid, fieldid from #Results
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results
CREATE TABLE #Results
(
Company CHAR(3),
StepId TINYINT,
FieldId TINYINT,
NewColumn NVARCHAR(50)
)
GO
select company, stepid, fieldid, NewColumn from #Results
This could be accomplished with a single line of code:
IF OBJECT_ID('tempdb..#tempTableName') IS NOT NULL DROP TABLE #tempTableName;
This worked for me:
social.msdn.microsoft.com/Forums/en/transactsql/thread/02c6da90-954d-487d-a823-e24b891ec1b0?prof=required
if exists (
select * from tempdb.dbo.sysobjects o
where o.xtype in ('U')
and o.id = object_id(N'tempdb..#tempTable')
)
DROP TABLE #tempTable;
You can use the below syntax if you are using one of the newer versions of SQL Server (2016+).
DROP TABLE IF EXISTS schema.yourtable -- also works for temporary tables (#...)
Just a little comment from my side, since OBJECT_ID didn't work for me. It kept reporting that #tempTable doesn't exist, even though it does. I found it's stored with a different name (suffixed with underscores), like so:
#tempTable________
This works well for me:
IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like '#tempTable%') BEGIN
DROP TABLE #tempTable;
END;
This worked for me,
IF OBJECT_ID('tempdb.dbo.#tempTable') IS NOT NULL
DROP TABLE #tempTable;
Here tempdb.dbo (dbo being the schema) is the important part.
pmac72 is using GO to break the query into batches and using an ALTER.
You appear to be running the same batch, but running it twice after changing it: DROP... CREATE... edit... DROP... CREATE...
Perhaps post your exact code so we can see what is going on.
I usually hit this error when I have already created the temp table; the code that checks the SQL statement for errors sees the "old" temp table in place and returns a miscount of the number of columns in later statements, as if the temp table had never been dropped.
After changing the number of columns in a temp table that was already created with fewer columns, drop the table and THEN run your query.
I recently saw a DBA do something similar to this:
begin try
drop table #temp
end try
begin catch
print 'table does not exist'
end catch
create table #temp(a int, b int)
Note: This also works for ## temp tables.
i.e.
IF OBJECT_ID('tempdb.dbo.##AuditLogTempTable1', 'U') IS NOT NULL
DROP TABLE ##AuditLogTempTable1
Note: the DROP TABLE IF EXISTS form below is only suitable on SQL Server 2016 and later.
Ask yourself: do I have any customers who are still on SQL Server 2012?
DROP TABLE IF EXISTS ##AuditLogTempTable1
My code uses a Source table that changes, and a Destination table that must match those changes.
--
-- Sample SQL to update only rows in a "Destination" Table
-- based on only rows that have changed in a "Source" table
--
--
-- Drop and Create a Temp Table to use as the "Source" Table
--
IF OBJECT_ID('tempdb..#tSource') IS NOT NULL drop table #tSource
create table #tSource (Col1 int, Col2 int, Col3 int, Col4 int)
--
-- Insert some values into the source
--
Insert #tSource (Col1, Col2, Col3, Col4) Values(1,1,1,1)
Insert #tSource (Col1, Col2, Col3, Col4) Values(2,1,1,2)
Insert #tSource (Col1, Col2, Col3, Col4) Values(3,1,1,3)
Insert #tSource (Col1, Col2, Col3, Col4) Values(4,1,1,4)
Insert #tSource (Col1, Col2, Col3, Col4) Values(5,1,1,5)
Insert #tSource (Col1, Col2, Col3, Col4) Values(6,1,1,6)
--
-- Drop and Create a Temp Table to use as the "Destination" Table
--
IF OBJECT_ID('tempdb..#tDest') IS NOT NULL drop Table #tDest
create table #tDest (Col1 int, Col2 int, Col3 int, Col4 int)
--
-- Add all Rows from the Source to the Destination
--
Insert #tDest
Select Col1, Col2, Col3, Col4 from #tSource
--
-- Look at both tables to see that they are the same
--
select *
from #tSource
Select *
from #tDest
--
-- Make some changes to the Source
--
update #tSource
Set Col3=19
Where Col1=1
update #tSource
Set Col3=29
Where Col1=2
update #tSource
Set Col2=38
Where Col1=3
update #tSource
Set Col2=48
Where Col1=4
--
-- Look at the Differences
-- Note: Only 4 rows are different. 2 Rows have remained the same.
--
Select Col1, Col2, Col3, Col4
from #tSource
except
Select Col1, Col2, Col3, Col4
from #tDest
--
-- Update only the rows that have changed
-- Note: I am using Col1 like an ID column
--
Update #tDest
Set Col2=S.Col2,
Col3=S.Col3,
Col4=S.Col4
From ( Select Col1, Col2, Col3, Col4
from #tSource
except
Select Col1, Col2, Col3, Col4
from #tDest
) S
Where #tDest.Col1=S.Col1
--
-- Look at the tables again to see that
-- the destination table has changed to match
-- the source table.
select *
from #tSource
Select *
from #tDest
--
-- Clean Up
--
drop table #tSource
drop table #tDest
Yes, the "invalid column" error is raised from the line "select company, stepid, fieldid, NewColumn from #Results".
There are two phases to running T-SQL:
First, parsing and compilation, in which SQL Server checks the correctness of the submitted SQL string, including the columns of each table, and optimizes the query for fastest retrieval.
Second, execution, which actually retrieves the data.
If table #Results exists, the parsing process checks that the columns you specified are valid; if the table doesn't exist, the column check is bypassed (deferred name resolution).
When you change a column in a temp table, you must drop the table before running the query again. (Yes, it is annoying, but that is just what you have to do.)
I have always assumed this is because the "invalid column" check is done by the parser before the query is run, so it is based on the columns in the table before it is dropped... which is what pnbs also said.