SSIS I need to use primary key values in additional tables - sql-server

I am new to SSIS and I hope someone can point me in the right direction!
I need to move data from one database to another. I have written a query that takes data from a number of tables (SOURCE). I then use a conditional split (Condition: Id = id) to a number of tables in the destination database. Here is my problem, I need another table populating which takes the ‘id’ value from the three tables and uses them in a fourth table as attributes, along with additional data from SOURCE.
I think I need to pass the id values to parameters but there does not seem a way to do this when inserting to ADO NET Destination.
Fourth table will have inserted id values(auto incremented) from table1, table2 and table3.
Am I going about this correctly or is there a better way?
Thanks in advance!

I know of no way to get the IDENTITY values of rows inserted in a Dataflow destination for use in the same Dataflow.
Probably the way to do what you want to do is to make a fourth branch in your dataflow inserting the columns that you have into the fourth table, and leaving the foreign keys (the ids from the other 3 tables) blank.
Then after the Dataflow, use an ExecuteSQL task to call a stored procedure that populates the missing columns in the fourth table by looking up their ids in the other three tables.
If your fourth table doesn't have the values you need to lookup the ids in the other three tables, then you can have the dataflow go to a staging table that does have those values, and populate the fourth table from the staging table while looking up the ids from the corresponding values.

Related

SQL Server: adding rows/tables with the same columns

I have two tables in SQL Server and both of those tables have the same headers, which means its the same columns, but since I added them from Excel, it means that I was not able to import them as one table, since it is more then 1 million rows.
So now I have one table with a bit less than a million rows and one with like 400000 rows and actually it should be one table, but Excel only allows around one million.
I have them both imported into SQL Server and actually I really want them to be both in one table like union.
The question is how to do it.
I just want to put one of them below the other since it is exactly the same column header.
What you should have done was import the first sheet, and create the table at the same time, and then import the second sheet into the existing table, in a separate import process. Or, if you were using SSIS, you could have used a Union Data Transformation to "combine" the 2 datasets into one, and then insert all the data into a single table.
You can, however, easily get the data into one table. Assuming you want to retain Table1 and that Table1 and Table2 do indeed have the same definitions (and don't have IDENTITY columns) you can just do the following:
INSERT INTO dbo.Table1
SELECT *
FROM dbo.Table2;
DROP TABLE dbo.Table2;
Now all your data is in one table, Table1.

SSIS flat file with joins

I have a flat file which has following columns
Device Name
Device Type
Device Location
Device Zone
Which I need to insert into SQL Server table called Devices.
Devices table has following structure
DeviceName
DeviceTypeId (foreign key from DeviceType table)
DeviceLocationId (foreign key from DeviceLocation table)
DeviceZoneId (foreign key from DeviceZone table)
DeviceType, DeviceLocation and DeviceZone tables are already prepopulated.
Now I need to write ETL which reads flat file and for each row get DeviceTypeId, DeviceLocationId and DeviceZoneId from corresponding tables and insert into Devices table.
I am sure this is not new but its being a while I worked on such SSIS packages and help would be appreciated.
Load the flat content into a staging table and write a stored procedure to handle the inserts and updates in T-SQL.
Having FK relationships between the destination tables, can probably make a lot of trouble with a single data flow and a multicast.
The problem is that you have no control over the order of the inserts so the child record could be inserted before the parent.
Also, for identity columns on the tables, you cannot retrieve the identity value from one stream and use it in another without using subsequent merge joins.
The simplest way to do that, is by using Lookup Transformation to get the ID for each value. You must be aware that duplicates may lead to a problem, you have to make sure that the value is not found multiple times in the foreign tables.
Also, make sure to redirect rows that have no match into a staging table to check them later.
You can refer to the following article for a step by step guide to Lookup Transformation:
An Overview of the LOOKUP TRANSFORMATION in SSIS

SSIS only extract Delta changes

After some advice. I'm using SSIS\SQL Server 2014. I have a nightly SSIS package that pulls in data from non-SQL Server db's into a single table (the SQL table is truncated beforehand each time) and I then extract from this table to create a daily csv file.
Going forward, I only want to extract to csv on a daily basis the records that have changed i.e. the Deltas.
What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the initial load each time, will this be best method? Or will I need to have a master table in SQL with an initial load, then import into another table and just extract where there are different? For info, the table in SQL contains a Primary Key.
I just want to double check as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.
Thanks for any help.
The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.
For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.
Add the column HashValue to your table as a varbinary data type. NOT NULL as well.
Create your SentRows table with columns for all the columns in the main table's primary key, plus the HashValue column.
Create a RowsToSend table that's structurally identical to your main table, including the HashValue.
Modify your queries to create the HashValue by applying HASHBYTES to all of the non-key columns in the table. (This will be horribly tedious. Sorry about that.)
Send out your full data set.
Now move all of the key values and HashValues to the SentRows table. Truncate your main table.
On the next pull, compare the key values and HashValues from SentRows to the new data in the main table.
Primary key match + hash match = Unchanged row
Primary key match + hash mismatch = Updated row
Primary key in incoming data but missing from existing data set = New row
Primary key not in incoming data but in existing data set = Deleted row
Pull out any changes you need to send to the RowsToSend table.
Send the changes from RowsToSend.
Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.
Truncate the SentRows table to get ready for tomorrow.
If you'd like (and you'll thank yourself later if you do) add a computed column to the SentRows table with default of GETDATE(), which will tell you when the row was added.
And away you go. Nothing but deltas from now on.
Edit 2019-10-31:
Step by step (or TL;DR):
1) Flush and Fill MainTable.
2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.
3) Move new/changed rows to RowsToSend.
4) Send the rows that are in RowsToSend.
5) Move all the rows from RowsToSend to SentRows.
6) Truncate RowsToSend.

DB2: Update query with several tables involved

I have two tables, A and B. Both are related by a field. What I want to do is update a field of a subset of data from table B. This subset is filtered by a table A data. The data to be set is also taken from table B but from another different subset of data also filtered by another data from table A. Can this be done in DB2 using a direct query?
Thank you

Extract and load only new records

I have simple SSIS data flow which extract records from table A and load them to the table B. Table A and Table B has unique key.
What is the best way to extract and load only new records?
A)if records are on sequential unique key, check the max record in table B and then select record from table A greater than this value.
B)After each row of iteration, save the unique key in some table/xml file. Use this value at the start of task to select records from table A.
I have found the LOOKUP-task solution, but I am still searching another good solutions.

Resources