How do I perform regular updates to a database table in SSIS when the table has foreign key constraints?
I have a package that runs every week, and I have to update the data in the table from a flat file. Most of the contents are the same, with some updated values and some new rows.
UPDATE: My data file contains updated contents (some rows missing, some rows added, some modified). The data file does not have the primary keys (I create the primary keys when I first bulk insert the data from the data file). On subsequent SSIS package runs, I need to update the table with the new data file contents.
e.g.
table
---------------------------------------------
1 Mango $0.99
2 Apple $0.59
3 Orange $0.33
data file
---------------------------------------------
Mango 0.79
Kiwi 0.45
Banana 0.54
How would I update the table with data from the file? The table has foreign key constraints with other tables.
Another approach is to load the data as a set instead of dealing with it row by row:
On database
Create a staging table (e.g. StagingTable [name], [price])
Create a procedure (this is just a draft: you may need to change the object names and add transaction control, error handling, etc.):
create procedure spLoadData
as
begin
    -- update prices for names that already exist in the destination
    update DestinationTable
    set DestinationTable.Price = StagingTable.Price
    from DestinationTable
    join StagingTable
        on DestinationTable.Name = StagingTable.Name;

    -- insert names that are not in the destination yet
    insert into DestinationTable (Name, Price)
    select Name, Price
    from StagingTable
    where not exists (select 1
                      from DestinationTable
                      where DestinationTable.Name = StagingTable.Name);
end
On SSIS
Execute SQL Task with (truncate table [staging_table_name])
Data Flow task transferring from your Flat File to the Staging Table
Execute SQL Task calling the procedure you created (spLoadData).
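To see what the update-then-insert procedure does to the fruit example from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names follow the draft procedure; the `Id` column, the `demo` helper and the use of SQLite are assumptions made only so the example runs on its own.

```python
import sqlite3

def load_from_staging(conn):
    """Apply StagingTable to DestinationTable: update matches, then insert new names."""
    with conn:  # one transaction: commit on success, roll back on error
        # Update prices for names that already exist in the destination.
        conn.execute("""
            UPDATE DestinationTable
               SET Price = (SELECT s.Price FROM StagingTable s
                             WHERE s.Name = DestinationTable.Name)
             WHERE EXISTS (SELECT 1 FROM StagingTable s
                            WHERE s.Name = DestinationTable.Name)
        """)
        # Insert names that are not in the destination yet.
        conn.execute("""
            INSERT INTO DestinationTable (Name, Price)
            SELECT s.Name, s.Price FROM StagingTable s
             WHERE NOT EXISTS (SELECT 1 FROM DestinationTable d
                                WHERE d.Name = s.Name)
        """)

def demo():
    """Build the example table and data file from the question, then load."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DestinationTable "
                 "(Id INTEGER PRIMARY KEY, Name TEXT UNIQUE, Price REAL)")
    conn.execute("CREATE TABLE StagingTable (Name TEXT, Price REAL)")
    conn.executemany("INSERT INTO DestinationTable (Id, Name, Price) VALUES (?, ?, ?)",
                     [(1, "Mango", 0.99), (2, "Apple", 0.59), (3, "Orange", 0.33)])
    conn.executemany("INSERT INTO StagingTable VALUES (?, ?)",
                     [("Mango", 0.79), ("Kiwi", 0.45), ("Banana", 0.54)])
    load_from_staging(conn)
    return conn.execute(
        "SELECT Id, Name, Price FROM DestinationTable ORDER BY Id").fetchall()
```

Note that Mango keeps its existing key, so rows in other tables that reference it stay valid. Rows missing from the file (Apple, Orange) are left untouched here; whether to delete them is a separate decision because of the foreign keys.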
Here are a few thoughts/steps:
Create a Flat File Connection Manager.
Add a Data Flow Task.
Create a Flat File Source that uses the connection manager you just created.
Add as many Lookup transformations as you need to get the FK values based on your source file values.
After those lookups, add one more Lookup transformation to get the existing values from the destination table.
Add a Conditional Split and compare the source values with the destination values.
If the row exists in the destination and any column differs, UPDATE; if it does not exist, INSERT.
Map the Conditional Split outputs accordingly to an OLE DB Command (updates) and an OLE DB Destination (inserts).
Give it a try and let me know the results/comments.
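The routing that the Lookup and Conditional Split perform can be sketched in plain Python. This is only an illustration of the decision logic, not SSIS code; `split_rows` and `destination_by_key` are hypothetical names, and the fruit name is assumed to be the business key.

```python
def split_rows(source_rows, destination_by_key):
    """Classify source rows the way a Lookup plus Conditional Split would."""
    inserts, updates, unchanged = [], [], []
    for name, price in source_rows:
        existing = destination_by_key.get(name)  # Lookup on the business key
        if existing is None:
            inserts.append((name, price))        # no match: goes to the destination
        elif existing != price:
            updates.append((name, price))        # match, value differs: goes to the update command
        else:
            unchanged.append((name, price))      # match, same value: nothing to do
    return inserts, updates, unchanged
```

Rows that match and have identical values are dropped from both outputs, which keeps the row-by-row update command from running more often than needed.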
I have two tables in SQL Server:
Person
ID (PK, int, IDENTITY)
Name (varchar(100))
UploadedBy (varchar(50))
DateAdded (datetime)
PersonFile
ID (PK, int, IDENTITY)
PersonId (FK, int)
PersonFile (varchar(max))
I am reading in a large file (150MB), and I have a script component that can successfully parse the file into several columns. The issue is that I need to insert the first 3 columns of my parsed data row into my Person table first, then use the ID of that Row to insert the final column into my PersonFile table. Is there an easy way to do this in SSIS?
I suppose I could technically script everything out to handle inserts in the database, but I feel like in that case, I might as well just skip SSIS altogether and use PowerShell. I also thought about writing a procedure in SQL Server and then passing the information to the procedure to handle inserts. But again, this seems very inefficient.
What's the best way for me to insert a row of data into two tables, if one of them has a foreign key constraint?
I think the best way is to use a stage table in the database to hold the parsed source file and then use stored procedures or a SQL query to load your tables. There is a Lookup component in SSIS that can be used for your case, but I try to avoid it for various reasons.
Create a table resembling the source file, something like:
CREATE TABLE dbo.[SourceFileName](
Name nvarchar(100) NULL,
UploadedBy nvarchar(50) NULL,
DateAdded datetime NULL,
PersonFile nvarchar(max) NULL
)
Truncate the stage table. Use a data flow component to load the source data into it. Then use a script or stored procedures to insert the staged data into your destination tables (begin with Person and then load PersonFile).
For the insert script for person do something like:
INSERT INTO dbo.Person (Name, UploadedBy,DateAdded)
SELECT Name,UploadedBy,DateAdded
FROM dbo.SourceFileName;
For the insert for PersonFile make a join to the destination table:
INSERT INTO dbo.PersonFile(PersonId,PersonFile)
SELECT
Person.ID,
SourceFile.PersonFile
FROM dbo.SourceFileName SourceFile
JOIN dbo.Person Person
ON Person.Name = SourceFile.Name
You should also add a UNIQUE CONSTRAINT to the column that identifies the person (Name for example).
One very common thing to do would be to stage the data first.
So you insert all columns into a table on the server, which also has an extra nullable column for the PersonID.
Then you’d have a stored procedure which inserts unique Person records into the Person table, and updates the staging table with the resulting PersonID, which is the extra field you need for the PersonFile insert, which could then be performed either in the same procedure or another one. (You’d call these procedures in SSIS with an Execute SQL Task.)
I suppose this could possibly be done purely in SSIS, for example with a Script Destination that performs an insert and retrieves the PersonID for a second insert, but I’m fairly sure performance would take a huge hit with an approach like that.
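As a rough sketch of the staged approach with the extra PersonID column, here is a version using Python's sqlite3 in place of SQL Server. The `Staging` table name, the `load_people` helper and the choice of Name as the identifying column are assumptions for illustration; dates are stored as text only because SQLite has no datetime type.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE,
                     UploadedBy TEXT, DateAdded TEXT);
CREATE TABLE PersonFile (ID INTEGER PRIMARY KEY,
                         PersonId INTEGER REFERENCES Person(ID),
                         PersonFile TEXT);
CREATE TABLE Staging (Name TEXT, UploadedBy TEXT, DateAdded TEXT,
                      PersonFile TEXT, PersonID INTEGER NULL);
"""

def load_people(conn):
    """Load Person first, write the new IDs back to staging, then load PersonFile."""
    with conn:  # single transaction
        # 1. Insert each person that is not known yet, once per name.
        conn.execute("""
            INSERT INTO Person (Name, UploadedBy, DateAdded)
            SELECT s.Name, MIN(s.UploadedBy), MIN(s.DateAdded)
              FROM Staging s
             WHERE NOT EXISTS (SELECT 1 FROM Person p WHERE p.Name = s.Name)
             GROUP BY s.Name
        """)
        # 2. Fill the nullable PersonID column in staging.
        conn.execute("""
            UPDATE Staging
               SET PersonID = (SELECT p.ID FROM Person p
                                WHERE p.Name = Staging.Name)
        """)
        # 3. Insert the child rows using the resolved key.
        conn.execute("""
            INSERT INTO PersonFile (PersonId, PersonFile)
            SELECT PersonID, PersonFile FROM Staging
        """)
```

All three statements are set-based, so the cost does not grow with one round trip per row the way a script destination would.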
In the source database we have a table, let's call it TableA, with primary key PK_TableA. This table has a dependent table in the source database, let's call it TableB, via an FK, let's call it FK_TableA.
We synchronize TableA from the source database to the target database, with the same table names.
We do NOT synchronize TableB from the source database to the target database, but it exists in the target database with the same name and has the same dependency on TableA.
When a row is deleted from TableA in the source database, TableB is updated by modifying all the rows with the deleted FK, setting the FK_TableA column to null.
We intend to produce the same behaviour in the target database without having to synchronize TableB.
So, on delete of a row from TableA in the source database we:
1) want to update, to null, column FK_TableA from TableB in the target database, for the corresponding rows
2) delete the row from TableA in the target database
Is this possible?
What is the best mechanism? Transforms or Table Triggers (maybe with a Sync On Delete Condition)?
Can you please try to explain the way to do it?
Thanks.
Either a load filter or a load transform would work. The load filter is probably simpler for this case. Use sym_load_filter to configure a "before write" BeanShell script that does this:
if (data.getDataEventType().name().equals("DELETE")) {
context.findTransaction().execute("update tableb set fk_tablea = null " +
"where fk_tablea = " + OLD_FK_TABLEA);
}
return true;
The script checks that it's a DELETE statement, then it will run the SQL you need. The values for the table columns on the current row are available as upper case variables. The script returns true so the original delete will also run.
See https://www.symmetricds.org/doc/3.10/html/user-guide.html#_load_filters for more details on how to use load filters.
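The reason the update has to run before the delete can be shown outside SymmetricDS with a small sqlite3 sketch. This is not SymmetricDS code; `make_db` and `delete_parent` are hypothetical helpers, and the lowercase table and column names simply mirror the SQL in the script above. The key is passed as a bound parameter rather than concatenated into the statement.

```python
import sqlite3

def make_db():
    """Create tablea/tableb with an enforced foreign key and a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE tablea (pk INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE tableb (id INTEGER PRIMARY KEY, "
                 "fk_tablea INTEGER REFERENCES tablea(pk))")
    conn.executemany("INSERT INTO tablea VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO tableb VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    conn.commit()
    return conn

def delete_parent(conn, pk):
    """Null the dependent rows first, then delete the parent, in one transaction."""
    with conn:
        conn.execute("UPDATE tableb SET fk_tablea = NULL WHERE fk_tablea = ?", (pk,))
        conn.execute("DELETE FROM tablea WHERE pk = ?", (pk,))
```

Deleting the parent row directly fails with a foreign key violation while child rows still reference it, which is why the filter runs before the write.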
I have a unique requirement - I have a data list in Excel format, and I import this data into SQL Server 2008 R2 once every year, using SQL Server's import functionality. In the table "Patient_Info", I have a primary key set on the column "MemberID", and when I import the data without any duplicates, all is well.
But sometimes, when I get this data, some of the patients' info is repeated with an updated address, telephone number, etc., under the same MemberID. Since I set this column as the primary key, the repeated record is left out of the import, and thus I don't have an updated record for that patient.
EDIT
I am not sure how to achieve this, i.e. how to update the rows that have existing MemberIDs. Any pointer is greatly appreciated.
This is not a terribly unique requirement.
One acceptable pattern you can use to resolve this problem would be to import your data into "staging" table. The staging table would have the same structure as the target table to which you're importing, but it would be a heap - it would not have a primary key.
Once the data is imported, you would then use queries to consolidate newer data records with older data records by MemberID.
Once you've consolidated all same MemberID records, there will be no duplicate MemberID values, and you can then insert all the staging table records into the target table.
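A minimal sketch of that consolidate-then-load pattern, using Python's sqlite3 as a stand-in for SQL Server. The staging table name `Patient_Info_Stage`, the `Address` and `PhoneNumber` columns and the `consolidate_and_load` helper are assumptions; so is the rule that the row imported last for a MemberID is the newest one (SQLite's implicit `rowid` is used as the import order here).

```python
import sqlite3

SCHEMA = """
CREATE TABLE Patient_Info (MemberID INTEGER PRIMARY KEY,
                           Address TEXT, PhoneNumber TEXT);
CREATE TABLE Patient_Info_Stage (MemberID INTEGER,
                                 Address TEXT, PhoneNumber TEXT);
"""

def consolidate_and_load(conn):
    """Deduplicate the staging heap by MemberID, then update and insert."""
    with conn:  # single transaction
        # Keep only the last imported row per MemberID (assumed to be the newest).
        conn.execute("""
            DELETE FROM Patient_Info_Stage
             WHERE rowid NOT IN (SELECT MAX(rowid) FROM Patient_Info_Stage
                                  GROUP BY MemberID)
        """)
        # Update patients that already exist in the target.
        conn.execute("""
            UPDATE Patient_Info
               SET Address = (SELECT s.Address FROM Patient_Info_Stage s
                               WHERE s.MemberID = Patient_Info.MemberID),
                   PhoneNumber = (SELECT s.PhoneNumber FROM Patient_Info_Stage s
                                   WHERE s.MemberID = Patient_Info.MemberID)
             WHERE EXISTS (SELECT 1 FROM Patient_Info_Stage s
                            WHERE s.MemberID = Patient_Info.MemberID)
        """)
        # Insert patients that are new.
        conn.execute("""
            INSERT INTO Patient_Info (MemberID, Address, PhoneNumber)
            SELECT s.MemberID, s.Address, s.PhoneNumber
              FROM Patient_Info_Stage s
             WHERE NOT EXISTS (SELECT 1 FROM Patient_Info p
                                WHERE p.MemberID = s.MemberID)
        """)
```

If the spreadsheet has a column that says which record is newer (a date, for example), that would be a safer tie-breaker than import order.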
EDIT
As @Panagiotis Kanavos suggests, you can use a SQL MERGE statement to both insert new records and update existing records from your staging table to the target table.
Assume that the Staging table is named Patient_Info_Stage, the target table is named Patient_Info, and that these tables have similar schemas. Also assume that field MemberId is the primary key of table Patient_Info.
The following MERGE statement will merge the staging table data into the target table:
BEGIN TRAN;
MERGE Patient_Info WITH (SERIALIZABLE) AS Target
USING Patient_Info_Stage AS Source
ON Target.MemberId = Source.MemberId
WHEN MATCHED THEN UPDATE
SET Target.FirstName = Source.FirstName
,Target.LastName = Source.LastName
,Target.Address = Source.Address
,Target.PhoneNumber = Source.PhoneNumber
WHEN NOT MATCHED THEN INSERT (
MemberID
,FirstName
,LastName
,Address
,PhoneNumber
) Values (
Source.MemberId
,Source.FirstName
,Source.LastName
,Source.Address
,Source.PhoneNumber
);
COMMIT TRAN;
*NOTE: The T-SQL MERGE statement does not by itself protect against concurrent sessions, and it is possible to get into a race condition with it. To ensure it works properly, do these things:
Ensure that your SQL Server is up-to-date with service packs and patches (current rev of SQL Server 2008 R2 is SP3, version 10.50.6000.34).
Wrap your MERGE in a transaction (BEGIN TRAN; ... COMMIT TRAN;).
Use the SERIALIZABLE hint to help prevent a potential race condition with the T-SQL MERGE statement.
I have DB tables that have no identity column. Our client data is fetched from DB2 into SQL Server, and unfortunately the DB2 design doesn't have identity columns.
Now some data has been inserted, updated and deleted in the source (DB2/SQL Server), and I want to load these changes to the destination (SQL Server) using some incremental load concept.
I tried SSIS Lookups in a Data Flow task; however, it takes a huge amount of time to insert even one new record. Please note that in the Lookup Transformation Editor I'm mapping all "Available Input Columns" to "Available Lookup Columns", as there is no identity column. I think this is why it's taking so long. I have a few tables with around 20 million records.
Is there any faster method available to do this, especially when the table does not have an identity column? Would EXCEPT or a SQL MERGE help?
I'm open to approaches other than SSIS.
A Lookup in SSIS takes some time on large tables, so you can use an Execute SQL Task and call a MERGE procedure instead.
With MERGE you can also add a flag column (for example UpdatedRecord) and set it for the rows that matched. MERGE can only modify its target, so the flag goes on the destination table. Something like this (table and column names are placeholders):
MERGE Destination AS d
USING Source AS s
    ON s.PrimaryKey = d.PrimaryKey
WHEN MATCHED THEN
    UPDATE SET d.Column1 = s.Column1,
               d.UpdatedRecord = 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PrimaryKey, Column1)
    VALUES (s.PrimaryKey, s.Column1);
With the query above, new records are inserted and matched records are updated, and the UpdatedRecord column tells you which destination rows were updated.
You can go to the following links for more on MERGE procedures.
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx
If your source is a SQL query from DB2, for instance, try adding a new column to it: a checksum value over the columns you expect to change or want to monitor for changes.
SELECT
BINARY_CHECKSUM(
Column1
,Column2
,Column3)AS ChecksumValue
,Column1
,Column2
,Column3
FROM #TEMP
You would have to add this column to your existing table in SQL Server as well to be able to start comparing.
If you have this, then you can do the lookup on the checksum value rather than on the columns, since number lookups are a lot quicker than varchar comparisons over multiple columns. I am guessing that, since there is no key, you would then have to split the data between checksum matches (which should be existing records with no change) and non-matches. The non-matches could be new rows or just updates, but your set should be smaller to work with.
Good luck. HTH
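The checksum split can be sketched in Python to show the idea. This uses a SHA-256 hash from the standard library instead of BINARY_CHECKSUM, which is a swap made for the example; `row_hash` and `split_by_hash` are hypothetical names, and `existing_hashes` is assumed to hold the hashes already stored in the destination.

```python
import hashlib

def row_hash(row):
    """Hash the monitored columns of one row into a single comparable value."""
    # A separator keeps ("ab", "c") and ("a", "bc") from hashing the same.
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def split_by_hash(source_rows, existing_hashes):
    """Separate rows whose hash already exists from rows that are new or changed."""
    unchanged, candidates = [], []
    for row in source_rows:
        if row_hash(row) in existing_hashes:
            unchanged.append(row)    # hash match: treat as an existing, unchanged row
        else:
            candidates.append(row)   # no match: new row or updated row
    return unchanged, candidates
```

Only the candidates need the expensive column-by-column handling. Keep in mind that any checksum can collide, so two different rows may occasionally produce the same value; a wider hash makes that less likely than a 32-bit checksum.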
I want to export data from one table into a new one with a nightly job.
To avoid generating duplicates, I added a column named "ExportState" to the source table, which is 0 for not exported and 1 for exported.
My problem is that I want to export the data and then set the state to 1. But I cannot run an INSERT INTO ... SELECT followed by an UPDATE statement, because additional data could be inserted into the source table while the export routine runs. I would then end up setting ExportState to 1 on records that I never inserted into the destination table.
Do you have any suggestions regarding the following solutions?
A. INSERT INTO ... SELECT and UPDATE ExportState row by row
B. Take a snapshot, then INSERT and UPDATE ExportState for the snapshotted data
Which makes more sense?
The second problem: the source and destination tables are on different SQL Servers and database instances. Any ideas?
I would create a stored procedure to perform the task.
Within the stored procedure create a table variable or temp table. Insert the data from the source table where ExportState = 0 into the temp table. (If you have a primary key on this table just store the primary key in your temp table.)
Perform your insert statement from source table to destination table.
Using your temp table, perform your update statement to set ExportState = 1 for each record in your temp table.
Wrap all of this within a transaction.
Sample Code:
BEGIN TRAN;
-- Snapshot the keys of the rows to export
DECLARE @Exported TABLE (PK INTEGER NOT NULL);
INSERT INTO @Exported (PK) SELECT PK FROM SourceTable WHERE ExportState = 0;
-- Copy only the snapshotted rows
INSERT INTO DestinationTable (FieldNames)
SELECT FieldNames
FROM SourceTable s
INNER JOIN @Exported e
    ON s.PK = e.PK
WHERE s.ExportState = 0;
-- Flag only the snapshotted rows
UPDATE s SET ExportState = 1
FROM SourceTable s
INNER JOIN @Exported e
    ON s.PK = e.PK;
COMMIT TRAN;
Invoke the stored procedure from your nightly job.
To connect to databases on other SQL Servers, look into using Linked Servers. You should be able to configure one under the "Server Objects" folder in SSMS 2008. Here is a link to more info if you are interested: http://msdn.microsoft.com/en-us/library/ff772782.aspx
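For the case where source and destination really are separate servers, the same snapshot idea can be sketched with two independent connections. This uses Python's sqlite3 purely as a stand-in; the `Payload` column and the `export_pending` helper are assumptions, and `INSERT OR IGNORE` is SQLite syntax used here so that a re-run after a failure between the two commits does not create duplicates.

```python
import sqlite3

def export_pending(src, dst):
    """Copy rows with ExportState = 0 from src to dst, then flag exactly those rows."""
    # Snapshot: rows inserted into the source after this point are not touched.
    rows = src.execute(
        "SELECT PK, Payload FROM SourceTable WHERE ExportState = 0").fetchall()
    keys = [(pk,) for pk, _ in rows]

    # Write the snapshot to the destination and commit there first.
    with dst:
        dst.executemany(
            "INSERT OR IGNORE INTO DestinationTable (PK, Payload) VALUES (?, ?)",
            rows)

    # Flag only the snapshotted keys in the source.
    with src:
        src.executemany(
            "UPDATE SourceTable SET ExportState = 1 WHERE PK = ?", keys)
    return len(keys)
```

Because the two commits are separate, this is not one atomic transaction across both servers. Committing the destination first means the worst case is a row that is exported but not yet flagged, and the primary key on the destination absorbs the repeat on the next run.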