Best way to move data between tables and generate mapping of old to new identity values - sql-server

I need to merge data from 2 tables into a third (all having the same schema) and generate a mapping of old identity values to new ones. The obvious approach is to loop through the source tables using a cursor, inserting the old and new identity values along the way. Is there a better (possibly set-oriented) way to do this?
UPDATE: One additional bit of info: the destination table already has data.

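Create your mapping table with an IDENTITY column for the new ID, plus columns for the source table and the old ID (the two source tables can share old ID values). Because the destination table already has data, the new IDs must start above the destination's current maximum ID. Insert from your source tables into this table, creating your mapping.
Run SET IDENTITY_INSERT <target table> ON.
Insert into the target table from your source tables joined to the mapping table, then run SET IDENTITY_INSERT <target table> OFF.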
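The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: #SourceA, #SourceB, #Target and #IdMap (and their columns) are hypothetical temp-table stand-ins, and instead of an IDENTITY column on the mapping table it computes the new IDs with ROW_NUMBER() offset by the target's current maximum ID, which avoids having to reseed the mapping table's identity.

```sql
-- Hypothetical stand-ins: #SourceA and #SourceB are merged into #Target,
-- which already contains data. All three share the schema (ID, Name).
CREATE TABLE #SourceA (ID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100) NOT NULL);
CREATE TABLE #SourceB (ID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100) NOT NULL);
CREATE TABLE #Target  (ID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100) NOT NULL);

INSERT INTO #SourceA (Name) VALUES (N'a1'), (N'a2');
INSERT INTO #SourceB (Name) VALUES (N'b1'), (N'b2'), (N'b3');
INSERT INTO #Target  (Name) VALUES (N'existing1'), (N'existing2');

-- Mapping table: which source row (table + old ID) becomes which new ID.
CREATE TABLE #IdMap (
    SourceTable sysname NOT NULL,
    OldRecordID int     NOT NULL,
    NewRecordID int     NOT NULL PRIMARY KEY,
    UNIQUE (SourceTable, OldRecordID)
);

BEGIN TRAN;

-- New IDs start above the rows already in the target. On a real shared table,
-- TABLOCKX + HOLDLOCK keeps other sessions from inserting until COMMIT.
DECLARE @MaxId int = (SELECT ISNULL(MAX(ID), 0) FROM #Target WITH (TABLOCKX, HOLDLOCK));

INSERT INTO #IdMap (SourceTable, OldRecordID, NewRecordID)
SELECT s.SourceTable, s.OldRecordID,
       @MaxId + ROW_NUMBER() OVER (ORDER BY s.SourceTable, s.OldRecordID)
FROM (
    SELECT N'SourceA' AS SourceTable, ID AS OldRecordID FROM #SourceA
    UNION ALL
    SELECT N'SourceB', ID FROM #SourceB
) AS s;

SET IDENTITY_INSERT #Target ON;

-- An explicit column list is mandatory while IDENTITY_INSERT is ON.
INSERT INTO #Target (ID, Name)
SELECT m.NewRecordID, a.Name
FROM #SourceA AS a
JOIN #IdMap AS m ON m.SourceTable = N'SourceA' AND m.OldRecordID = a.ID
UNION ALL
SELECT m.NewRecordID, b.Name
FROM #SourceB AS b
JOIN #IdMap AS m ON m.SourceTable = N'SourceB' AND m.OldRecordID = b.ID;

SET IDENTITY_INSERT #Target OFF;

COMMIT TRAN;
```

Because the explicit IDs are larger than the target's current identity value, SQL Server advances the identity automatically, so later ordinary inserts continue after the copied rows.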

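I created a mapping table from the OUTPUT clause of a MERGE statement. No IDENTITY_INSERT is required. (A plain INSERT ... OUTPUT won't work here, because its OUTPUT clause can only reference the inserted columns, not columns of the source rows such as the old ID; MERGE's OUTPUT can reference both.)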
In the example below, there are RecordImportQueue and RecordDataImportQueue, and RecordDataImportQueue.RecordID is an FK to RecordImportQueue.RecordID. The data in these staging tables needs to go to Record and RecordData, and the FK must be preserved.
RecordImportQueue to Record is done using a MERGE statement, producing a mapping table from its OUTPUT, and RecordDataImportQueue goes to RecordData using an INSERT from a SELECT of the source table joined to the mapping table.
DECLARE @MappingTable TABLE ([NewRecordID] [bigint], [OldRecordID] [bigint]);
MERGE [dbo].[Record] AS target
USING (SELECT [InstanceID]
             ,RecordID AS RecordID_Original
             ,[Status]
       FROM [RecordImportQueue]
      ) AS source
ON 1 = 0 -- never matches, so every source row goes to WHEN NOT MATCHED and is inserted
WHEN NOT MATCHED THEN
    INSERT ([InstanceID],[Status])
    VALUES (source.[InstanceID],source.[Status])
OUTPUT inserted.RecordID, source.RecordID_Original INTO @MappingTable;
After that, you can insert the records into a referencing table as follows (in the same batch, because a table variable goes out of scope when the batch ends):
INSERT INTO [dbo].[RecordData]
    ([InstanceID]
    ,[RecordID]
    ,[Status])
SELECT [InstanceID]
      ,mt.NewRecordID -- the new RecordID from the mapping table
      ,[Status]
FROM [dbo].[RecordDataImportQueue] AS rdiq
JOIN @MappingTable AS mt
    ON rdiq.RecordID = mt.OldRecordID;
Although this is long after the original post, I hope it helps other people, and I'm curious to hear any feedback.

I think I would temporarily add an extra column to the new table to hold the old ID. Once your inserts are complete, you can extract the mapping into another table and drop the column.
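A minimal sketch of the temporary-column approach, assuming hypothetical temp tables #Source and #Target with an (ID, Name) schema and a helper column named OldID. GO is the SSMS/sqlcmd batch separator, not a T-SQL statement; it is used here because a batch that references a column added earlier in that same batch may fail to compile.

```sql
-- Hypothetical stand-ins: #Source is copied into #Target, which already has rows.
CREATE TABLE #Source (ID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100) NOT NULL);
CREATE TABLE #Target (ID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100) NOT NULL);
INSERT INTO #Source (Name) VALUES (N's1'), (N's2');
INSERT INTO #Target (Name) VALUES (N'existing');

-- 1. Temporary column that carries the old ID along with each copied row.
ALTER TABLE #Target ADD OldID int NULL;
-- End the batch so the next one compiles against the new column.
GO

-- 2. Copy the rows; ORDER BY fixes the order in which identity values are assigned.
INSERT INTO #Target (Name, OldID)
SELECT Name, ID
FROM #Source
ORDER BY ID;

-- 3. Extract the old-to-new mapping into its own table.
SELECT OldID AS OldRecordID, ID AS NewRecordID
INTO #IdMap
FROM #Target
WHERE OldID IS NOT NULL;

-- 4. Drop the helper column again.
ALTER TABLE #Target DROP COLUMN OldID;
GO
```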

Related

How can I split a row of data and insert each part (different columns) into two tables (with an FK relationship) in SSIS?

I have two tables in SQL Server:
Person
    ID (PK, int, IDENTITY)
    Name (varchar(100))
    UploadedBy (varchar(50))
    DateAdded (datetime)
PersonFile
    ID (PK, int, IDENTITY)
    PersonId (FK, int)
    PersonFile (varchar(max))
I am reading in a large file (150MB), and I have a script component that successfully parses the file into several columns. The issue is that I need to insert the first 3 columns of each parsed row into my Person table first, then use the ID of that row to insert the final column into my PersonFile table. Is there an easy way to do this in SSIS?
I suppose I could script everything out to handle the inserts in the database, but in that case I might as well skip SSIS altogether and use PowerShell. I also thought about writing a procedure in SQL Server and passing the information to it to handle the inserts, but again, that seems very inefficient.
What's the best way for me to insert a row of data into two tables, if one of them has a foreign key constraint?
I think the best way is to use a stage table in the database to hold the parsed source file and then use stored procedures or SQL queries to load your tables. There is a Lookup component in SSIS that could be used for your case, but I try to avoid it for various reasons.
Create a table resembling the source file, something like:
CREATE TABLE dbo.[SourceFileName](
Name nvarchar(100) NULL,
UploadedBy nvarchar(50) NULL,
DateAdded datetime NULL,
PersonFile nvarchar(max) NULL
)
Truncate the stage table. Use a data flow component to get the source data. Use a script or stored procedures to insert the source data into your destination tables (begin with Person, then load PersonFile). Your SSIS data flow should look something like this: [data flow screenshot not preserved]
For the Person insert script, do something like:
INSERT INTO dbo.Person (Name, UploadedBy,DateAdded)
SELECT Name,UploadedBy,DateAdded
FROM dbo.SourceFileName;
For the insert for PersonFile make a join to the destination table:
INSERT INTO dbo.PersonFile(PersonId,PersonFile)
SELECT
Person.ID,
SourceFile.PersonFile
FROM dbo.SourceFileName SourceFile
JOIN dbo.Person Person
ON Person.Name = SourceFile.Name
You should also add a UNIQUE CONSTRAINT to the column that identifies the person (Name for example).
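A minimal sketch of what that constraint buys you, using a temp table as a hypothetical stand-in for dbo.Person so it runs anywhere; on the real table the statement would be along the lines of ALTER TABLE dbo.Person ADD CONSTRAINT UQ_Person_Name UNIQUE (Name), where UQ_Person_Name is a hypothetical name.

```sql
-- Temp stand-in for dbo.Person.
CREATE TABLE #Person (
    ID         int IDENTITY(1,1) PRIMARY KEY,
    Name       varchar(100) NOT NULL,
    UploadedBy varchar(50)  NULL,
    DateAdded  datetime     NULL
);

-- One row per Name, so the Name join in the PersonFile insert can never fan out.
ALTER TABLE #Person ADD UNIQUE (Name);

INSERT INTO #Person (Name) VALUES ('Alice');

-- A second 'Alice' is rejected with a UNIQUE KEY violation (error 2627).
BEGIN TRY
    INSERT INTO #Person (Name) VALUES ('Alice');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```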
One very common thing to do would be to stage the data first.
So you insert all columns into a table on the server, which also has an extra nullable column for the PersonID.
Then you’d have a stored procedure that inserts unique Person records into the Person table and updates the staging table with the resulting PersonID. That PersonID is the extra field you need for the PersonFile insert, which could then be performed either in the same procedure or in another one. (You’d call these procedures from SSIS with an Execute SQL Task.)
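A minimal sketch of that procedure body, with temp tables as hypothetical stand-ins for Person, PersonFile and the staging table (the nullable PersonID column is the extra field). In practice the three statements would live in a stored procedure (a hypothetical dbo.LoadPersonStage, say) run from an Execute SQL Task. When a name appears more than once in the file, MIN() picks one UploadedBy/DateAdded value; that choice is an assumption.

```sql
-- Stand-ins for dbo.Person, dbo.PersonFile and the staging table.
CREATE TABLE #Person (ID int IDENTITY(1,1) PRIMARY KEY, Name varchar(100) NOT NULL UNIQUE,
                      UploadedBy varchar(50) NULL, DateAdded datetime NULL);
CREATE TABLE #PersonFile (ID int IDENTITY(1,1) PRIMARY KEY, PersonId int NOT NULL,
                          PersonFile varchar(max) NULL);
CREATE TABLE #Stage (
    Name       varchar(100) NOT NULL,
    UploadedBy varchar(50)  NULL,
    DateAdded  datetime     NULL,
    PersonFile varchar(max) NULL,
    PersonID   int          NULL   -- filled in after the Person insert
);
INSERT INTO #Stage (Name, UploadedBy, DateAdded, PersonFile)
VALUES ('Alice', 'loader', '20240101', 'file-a'),
       ('Bob',   'loader', '20240101', 'file-b');

-- 1. Insert people that are not already present, one row per Name.
INSERT INTO #Person (Name, UploadedBy, DateAdded)
SELECT s.Name, MIN(s.UploadedBy), MIN(s.DateAdded)
FROM #Stage AS s
WHERE NOT EXISTS (SELECT 1 FROM #Person AS p WHERE p.Name = s.Name)
GROUP BY s.Name;

-- 2. Write the generated IDs back to the staging rows.
UPDATE s
SET PersonID = p.ID
FROM #Stage AS s
JOIN #Person AS p ON p.Name = s.Name;

-- 3. Load the child table using the staged IDs.
INSERT INTO #PersonFile (PersonId, PersonFile)
SELECT PersonID, PersonFile
FROM #Stage;
```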
I suppose this could possibly be done purely in SSIS, for example with a Script Destination that performs an insert and retrieves the PersonID for a second insert, but I’m fairly sure performance would take a huge hit with an approach like that.

SQL INSERT: does column order matter?

I have two tables with the same field names and a stored procedure that refreshes one table from the other: it deletes everything from the current table and then inserts the updated rows from the other table:
delete from ac.Table1
insert into ac.Table1
select *
from dbo.OriginalTable
where dtcreate <getdate()-1
I had to recreate Table1 through GIS software which adds GlobalIDs and an Object ID field. The original order had Object ID at the end and the new table has it at the front. Will this impact executing the SQL statement above?
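Yes, it will. INSERT ... SELECT * matches values to columns by position, not by name, so the column order has to match for each value to go into the desired column (and if the recreated table has extra columns such as GlobalID, the column counts may not match either).
You can list the columns explicitly instead:
INSERT INTO ac.Table1 (column1, ..., columnN)
SELECT column1, ..., columnN
FROM dbo.OriginalTable
WHERE dtcreate < GETDATE() - 1;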
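A small self-contained illustration of why the explicit list matters, using hypothetical temp tables whose columns (Name, dtcreate, ObjectID) only loosely mirror the question. With SELECT *, the Name value would land in ObjectID (an int) and the insert would fail on conversion; with compatible types it could silently put data in the wrong column instead.

```sql
-- Stand-ins: the source keeps ObjectID last, the recreated copy has it first.
CREATE TABLE #OriginalTable (Name varchar(50), dtcreate datetime, ObjectID int);
CREATE TABLE #Table1        (ObjectID int, Name varchar(50), dtcreate datetime);

INSERT INTO #OriginalTable (Name, dtcreate, ObjectID)
VALUES ('old row', '20200101', 42);

-- Naming the columns on both sides maps values by name, not position.
INSERT INTO #Table1 (ObjectID, Name, dtcreate)
SELECT ObjectID, Name, dtcreate
FROM #OriginalTable
WHERE dtcreate < GETDATE() - 1;
```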

SQL Server Insert If Not Exists - No Primary Key

I have Table A and Table B.
Table A contains data from another source.
Table B contains data that is inserted from Table A along with data from other tables. I have done the initial insert of data from A to B but now what I am trying to do is insert the records that do not exist already in Table B from Table A on a daily basis. Unfortunately, there is no primary key or unique identifier in Table A which is making this difficult.
Table A contains a field called file_name, which has values that look like this:
this_is_a_file_name_01011980.txt
There can be duplicate values in this column (multiple files from the same date).
In Table B I created a column data_date, which extracts the date from Table A's file_name field. There is also a load_date field, which just uses GETDATE() at the time the data is inserted.
I am thinking I can somehow compare the dates in these tables to decide what needs to be inserted. For example:
If the file date from Table A (would need to extract again) is greater than the load_date of Table B, then insert these records into Table B.
Let me know if any clarification is needed.
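You could use EXISTS or EXCEPT. With the explanation here, it seems like EXCEPT would make short work of this. Something like the following, which assumes both tables have the same columns in the same order (Table B's extra data_date and load_date columns would require listing the shared columns explicitly on both sides):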
insert tableB
select * from tableA
except
select * from tableB
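A minimal sketch of the explicit-column variant, with hypothetical temp tables and a hypothetical amount column: only the columns both tables share are compared, and load_date is added afterwards. Note that EXCEPT returns distinct rows, so two identical rows in Table A are inserted only once; a NOT EXISTS filter would keep both.

```sql
-- Stand-ins: TableA has no key; TableB has the same data columns plus load_date.
CREATE TABLE #TableA (file_name varchar(200) NOT NULL, amount int NOT NULL);
CREATE TABLE #TableB (file_name varchar(200) NOT NULL, amount int NOT NULL,
                      load_date datetime NOT NULL);

INSERT INTO #TableA (file_name, amount) VALUES
    ('this_is_a_file_name_01011980.txt', 10),
    ('this_is_a_file_name_01011980.txt', 20),
    ('another_file_01021980.txt', 5);
INSERT INTO #TableB (file_name, amount, load_date) VALUES
    ('this_is_a_file_name_01011980.txt', 10, '19800102');

-- Compare only the shared columns; rows already in TableB are filtered out.
INSERT INTO #TableB (file_name, amount, load_date)
SELECT n.file_name, n.amount, GETDATE()
FROM (
    SELECT file_name, amount FROM #TableA
    EXCEPT
    SELECT file_name, amount FROM #TableB
) AS n;
```

Running the final INSERT a second time adds nothing, since every row of Table A now exists in Table B.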

update data when importing a duplicate record in SQL

I have a unique requirement: I have a data list in Excel format that I import into SQL Server 2008 R2 once every year, using SQL Server's import functionality. In the table "Patient_Info", I have a primary key on the column "MemberID", and when I import data without any duplicates, all is well.
But sometimes, when I get this data, some of the patients' info is repeated with an updated address, telephone, etc., under the same MemberID. Since MemberID is the primary key, that record is left out of the import, and so I don't have an updated record for that patient.
EDIT
I am not sure how to update the rows that have existing MemberIDs; any pointers are greatly appreciated.
This is not a terribly unique requirement.
One acceptable pattern you can use to resolve this problem is to import your data into a "staging" table. The staging table would have the same structure as the target table you're importing into, but it would be a heap: it would not have a primary key.
Once the data is imported, you would then use queries to consolidate newer data records with older data records by MemberID.
Once you've consolidated all same MemberID records, there will be no duplicate MemberID values, and you can then insert all the staging table records into the target table.
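One way to do that consolidation is sketched below, assuming "newer" means "imported later". The temp staging table and its StageRowID column (an identity recording import order) are hypothetical; the sketch keeps only the newest staged row per MemberID.

```sql
-- Stand-in staging heap (no primary key); StageRowID records import order.
CREATE TABLE #Patient_Info_Stage (
    StageRowID  int IDENTITY(1,1) NOT NULL,
    MemberID    int          NOT NULL,
    FirstName   varchar(50)  NULL,
    LastName    varchar(50)  NULL,
    Address     varchar(200) NULL,
    PhoneNumber varchar(20)  NULL
);

INSERT INTO #Patient_Info_Stage (MemberID, FirstName, LastName, Address, PhoneNumber)
VALUES (1, 'Ann', 'Lee', '1 Old Rd', '555-0100'),
       (2, 'Bo',  'Kim', '9 Elm St', '555-0111');
-- A later, updated record for member 1.
INSERT INTO #Patient_Info_Stage (MemberID, FirstName, LastName, Address, PhoneNumber)
VALUES (1, 'Ann', 'Lee', '2 New Ave', '555-0199');

-- Delete every staged row except the newest one per MemberID.
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY StageRowID DESC) AS rn
    FROM #Patient_Info_Stage
)
DELETE FROM ranked
WHERE rn > 1;
```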
EDIT
As @Panagiotis Kanavos suggests, you can use a SQL MERGE statement to both insert new records and update existing records from your staging table to the target table.
Assume that the Staging table is named Patient_Info_Stage, the target table is named Patient_Info, and that these tables have similar schemas. Also assume that field MemberId is the primary key of table Patient_Info.
The following MERGE statement will merge the staging table data into the target table:
BEGIN TRAN;
MERGE Patient_Info WITH (SERIALIZABLE) AS Target
USING Patient_Info_Stage AS Source
ON Target.MemberId = Source.MemberId
WHEN MATCHED THEN UPDATE
SET Target.FirstName = Source.FirstName
,Target.LastName = Source.LastName
,Target.Address = Source.Address
,Target.PhoneNumber = Source.PhoneNumber
WHEN NOT MATCHED THEN INSERT (
MemberID
,FirstName
,LastName
,Address
,PhoneNumber
) Values (
Source.MemberId
,Source.FirstName
,Source.LastName
,Source.Address
,Source.PhoneNumber
);
COMMIT TRAN;
NOTE: Under concurrent load, a T-SQL MERGE is not protected from race conditions by default (another session can insert the same key between the match check and the insert). To ensure it works properly, do these things:
Ensure that your SQL Server is up to date with service packs and patches (at the time of writing, the current release of SQL Server 2008 R2 was SP3, version 10.50.6000.34).
Wrap your MERGE in a transaction (BEGIN TRAN; ... COMMIT TRAN;).
Use the SERIALIZABLE hint to help prevent a potential race condition with the T-SQL MERGE statement.

Microsoft SQL server: have one auto-incrementing column update another table

I have a table of orders with an orderID. When I create a new row in orders, I want the same orderID to be added automatically to a new row in orderDetails. I got the auto-incrementing to work, but whenever I try to link the two tables with cascade delete, I get an error:
'order' table saved successfully
'orderDetail' table
- Unable to create relationship 'FK_orderDetail_order'.
Cascading foreign key 'FK_orderDetail_order' cannot be created where the referencing column 'orderDetail.orderID' is an identity column.
Could not create constraint. See previous errors.
Which seems to be because of the fact there is no orderID at row creation. Without these two linked it's pretty hard to link an order to its information.
I am using Microsoft SQL Server Management Studio. I learned on command-line MySQL, not SQL Server, so all this GUI stuff is throwing me off (and I'm a tad rusty).
Your problem is that orderDetail.orderID should not be an identity (auto-incrementing) column. It should take its value from the orderID in the order table. You can do that in a variety of ways. If you are using stored procedures and making separate calls to the database for the orderDetail records, have the code save the order row first and return the newly created orderID value, then use that value in the calls that save the order details. If you are making one call to a stored procedure that saves the order header record and all order detail records at once, then in the stored procedure insert the order record first and use SCOPE_IDENTITY() to capture the newly created orderID in a T-SQL variable:
DECLARE @orderId int;
INSERT Orders ([Order table columns])
VALUES ([Order table column values]);
SET @orderId = SCOPE_IDENTITY();
and then use the value in @orderId for all inserts into the OrderDetails table:
INSERT OrderDetails (OrderId, [Other OrderDetail table columns])
VALUES (@orderId, [Other OrderDetail table column values]);
You want an AFTER INSERT trigger on the order table. In SQL Server, the newly generated IDs are available in the trigger's inserted pseudo-table (NEW.orderID is the MySQL syntax), and from there they can easily be inserted into orderDetail.
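A minimal sketch of such a trigger, under stated assumptions: the customer and orderDetailID columns and the trigger name dbo.trg_order_AddDetail are hypothetical, and unlike the other examples it creates real tables in the current database (triggers cannot be defined on temp tables). orderDetail.orderID is a plain int foreign key rather than an identity, which is also what allows the cascading delete. A SQL Server trigger fires once per statement, so it reads all new rows from inserted instead of a single NEW row.

```sql
-- Hypothetical minimal tables; orderDetail.orderID is a plain FK, not an identity.
CREATE TABLE dbo.[order] (
    orderID  int IDENTITY(1,1) PRIMARY KEY,
    customer varchar(50) NOT NULL
);
CREATE TABLE dbo.orderDetail (
    orderDetailID int IDENTITY(1,1) PRIMARY KEY,
    orderID       int NOT NULL
        REFERENCES dbo.[order] (orderID) ON DELETE CASCADE
);
GO
-- Fires once per INSERT statement; "inserted" holds every new order row,
-- so multi-row inserts get one detail row per order.
CREATE TRIGGER dbo.trg_order_AddDetail
ON dbo.[order]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.orderDetail (orderID)
    SELECT orderID FROM inserted;
END;
GO
INSERT INTO dbo.[order] (customer) VALUES ('alice'), ('bob');
```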
Just do this via the command line. I certainly do.
