SSIS Join Recordset With Table - sql-server

I have an SSIS package in which I'm reading the records from a Flat File and storing them in a recordset. Is it possible to compare the values in the recordset with the values in a database table and update the table?
I'm using SQL Server 2008 R2 and the same version of SSIS.

Leran2002's answer is right in general: the most straightforward way is to set up a Lookup component with Redirect rows to no match output, followed by a destination for the new rows and an OLE DB Command for the updates.
However depending on the size of the result sets, this might be slow, since the lookup component will check each row one-by-one and if your destination table has lots of records, this will take some time. Furthermore, depending on your cache settings in the lookup component, it can use lots of memory.
There are two more ways to achieve this:
Merge Join
Using your file source and your destination table as sources, you can use a Merge Join (both inputs have to be sorted on the join key). The logic in the DFT is a bit more complex, but this is a more set-based approach and it works better with large result sets.
You'll have to implement the logic that decides which records have to be updated, inserted, deleted or discarded, using a Conditional Split component.
I highly recommend this question (not exactly your problem, but a good comparison in my opinion): What are the differences between Merge Join and Lookup transformations in SSIS?
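The Merge Join plus Conditional Split logic can be sketched outside SSIS as a single pass over two sorted inputs. This is only an illustration in Python, not SSIS code: the function name, the (key, value) row shape and the action labels are assumptions made for the example, and keys are assumed to be unique in each input.

```python
def merge_join_classify(file_rows, table_rows):
    """Full outer merge join of two inputs sorted ascending by key.

    Each input is a list of (key, value) tuples with unique keys.
    Returns a list of (action, key, value) tuples, which is the decision
    a Conditional Split would make after the Merge Join.
    """
    actions = []
    i = j = 0
    while i < len(file_rows) and j < len(table_rows):
        fkey, fval = file_rows[i]
        tkey, tval = table_rows[j]
        if fkey < tkey:
            # Key exists only in the file: new record.
            actions.append(("insert", fkey, fval))
            i += 1
        elif fkey > tkey:
            # Key exists only in the table: no longer in the file.
            actions.append(("delete", tkey, tval))
            j += 1
        else:
            # Key exists on both sides: update only if the value differs.
            action = "update" if fval != tval else "discard"
            actions.append((action, fkey, fval))
            i += 1
            j += 1
    # Whatever is left on one side has no partner on the other side.
    actions.extend(("insert", k, v) for k, v in file_rows[i:])
    actions.extend(("delete", k, v) for k, v in table_rows[j:])
    return actions


file_rows = [(1, "a"), (2, "B"), (4, "d")]
table_rows = [(1, "a"), (2, "b"), (3, "c")]
result = merge_join_classify(file_rows, table_rows)
```

Because both inputs are read once and in order, neither side has to be cached in memory, which is why this approach tends to scale better than a row-by-row lookup.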
Staging table
Another way is to use a staging table to temporarily store the records from the file. In this case, your DFT just loads the records from the file into the staging table; then one or more Execute SQL Tasks merge the two data sets (UPDATE, INSERT, DELETE or MERGE, whichever fits your needs).
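A minimal sketch of the staging-table pattern follows. It uses Python's built-in sqlite3 module purely so that it runs on its own; the table names target and staging and the columns id and name are hypothetical. On SQL Server the two statements in steps 2 and 3 would sit in Execute SQL Tasks (or be combined into one MERGE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO target VALUES (1, 'alpha'), (2, 'beta');
""")

# Step 1 (the data flow): bulk-load the file rows into the staging table.
file_rows = [(2, "BETA"), (3, "gamma")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", file_rows)

# Step 2 (set-based): update rows that already exist in the target.
conn.execute("""
    UPDATE target
    SET name = (SELECT s.name FROM staging s WHERE s.id = target.id)
    WHERE EXISTS (SELECT 1 FROM staging s WHERE s.id = target.id)
""")

# Step 3 (set-based): insert rows that are missing from the target.
conn.execute("""
    INSERT INTO target (id, name)
    SELECT s.id, s.name FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)
""")

# Step 4: empty the staging table for the next run.
conn.execute("DELETE FROM staging")
conn.commit()

merged = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

The data flow stays trivial (file to staging), and all comparison work is done by the database engine in two set-based statements.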

Usually I use a Lookup component with the option Redirect rows to no match output.
After that you have two rowsets to work with, named Lookup No Match Output and Lookup Match Output.
PS. I have three articles about SSIS, but they are in Russian (though they contain a lot of SQL scripts and pictures).
If you are interested, you can follow this link - https://habrahabr.ru/post/330618/
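What the two outputs contain can be illustrated with a small Python sketch. It is not SSIS code: the function name, the dictionary-based rows and the column names are assumptions for the example, and the reference dictionary plays the role of a fully cached reference table.

```python
def lookup(rows, reference, key):
    """Split rows into (match, no_match) by looking up row[key] in reference."""
    match, no_match = [], []
    for row in rows:
        ref = reference.get(row[key])
        if ref is None:
            # No partner in the reference table: candidate for INSERT.
            no_match.append(row)
        else:
            # Partner found: add the reference columns, candidate for UPDATE.
            match.append({**row, **ref})
    return match, no_match


reference = {10: {"target_name": "ten"}}  # cached reference rows keyed by id
rows = [{"id": 10, "name": "TEN"}, {"id": 11, "name": "eleven"}]
match_output, no_match_output = lookup(rows, reference, "id")
```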

Related

How to transfer only new records between two different databases (ie. Oracle and MSSQL) using SSIS?

Do you know how to transfer only new records between two different databases (e.g. Oracle and MSSQL) using SSIS? There is no problem transferring only new data between two tables in the same database and server, but is it possible to do this between completely different servers and databases?
PS. I know about the solution using Lookup, but it is not very efficient when a lot of records (50k and more) have to be checked and added several times per day. I would like to operate on new data only.
You have several options:
Timestamp based solution
If you have a column which stores the insertion time in the source system, you can select only the new records created since the last load. With the same logic you can transfer modified records too: just stamp each record with the current timestamp whenever it changes.
Sequence based solution
If there is a sequence in the source table, you can load the new records based on that sequence. Query the last value from the destination system, then load everything which is larger than that value.
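The sequence-based approach can be sketched as follows. Two separate in-memory sqlite3 connections stand in for the two servers so that the example runs on its own; the table name orders and its columns are hypothetical. The point is that only the last id crosses from the destination to the source, and only the newer rows cross back.

```python
import sqlite3

source = sqlite3.connect(":memory:")       # stands in for the source server
destination = sqlite3.connect(":memory:")  # stands in for the destination server

source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    INSERT INTO orders VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")
destination.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    INSERT INTO orders VALUES (1, 'a'), (2, 'b');
""")

# 1. Read the last loaded id from the destination (0 if the table is empty).
last_id = destination.execute(
    "SELECT COALESCE(MAX(id), 0) FROM orders"
).fetchone()[0]

# 2. Pass it as a parameter to the source query, so only newer rows travel.
new_rows = source.execute(
    "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_id,)
).fetchall()

# 3. Append the new rows to the destination.
destination.executemany("INSERT INTO orders VALUES (?, ?)", new_rows)
destination.commit()
```

In a package, step 1 would typically be an Execute SQL Task that stores its result in a package variable, and step 2 a source query that takes that variable as a parameter.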
CDC based solution
If you have CDC (Change Data Capture) in your source system, you can track the changes and you can load them based on the CDC entries.
Full load
This is the most resource hungry solution: you have to copy all data from the source to the destination. If you do not have any column which marks the new records, you should use this solution.
You have several options to achieve this:
TRUNCATE the destination table and reload it from source
Use a Lookup component to determine which records are missing
Load all data from source to a temporary table and write a query which retrieves the new/changed records.
Summary
If you have at least one column which marks the new/modified records, you can use it to implement a differential/incremental load with SSIS. If you have no way of telling which columns/rows have changed, you have to load (or at least query) all of them.
There is no way to do this with a single query (INSERT .. SELECT) across multiple servers without transferring all the data. (Please note that a multi-server query using Linked Servers also transfers the data from the source system.)
What about variables? Is it possible to use the same variable between different databases and servers in SSIS?
I would like to transfer last id number from a destination table and transfer it to the source table (different server!).
I can set a variable in a database scope like this:
DECLARE @Last int
SET @Last = (SELECT TOP 1 Id FROM dbo.Table_1 ORDER BY Id DESC)
SELECT *
FROM dbo.Table_2
WHERE ID > @Last;
However, this only works between two tables in the same database (as a SQL command). I can create a variable for an entire SSIS package in Variables --> Add variable, but I don't know whether it is possible to use that variable in a similar way: to hold the last id in the destination table and pass it to a table on the source server as the data limit.

Use SSIS Lookup Transformation to update ADO .NET Destination table

I have an SSIS package that I want to use to update a column in a data warehouse staging table based on the values of a surrogate key mapping table that contains the surrogate key paired with the natural key. Specifically, I want to use the cache Lookup to update the fact staging table to contain the surrogate key for the inventory dimension in the same way that the following SQL would.
UPDATE A
SET A.DWHSurrogateKey = B.DWHSurrogateKey
FROM SaleStagingTable A INNER JOIN inventoryStagingTable B ON B.OLTPInventoryKey = A.OLTPInventoryKey
Unfortunately the nature of the data flow from Lookup transformation to destination means that it creates a whole new row, rather than updating the existing matched row. Is it possible to manipulate SSIS to do this?
Couple of constraints:
My destination is an ADO .NET destination, and we cannot use OLE DB Destinations or sources (we need to be able to use named parameters and you can't do that with OLE DB Connections)
I need to do this for multiple dimensions to link them to the fact table, so I can't just push the mapped data to new tables every time, as that becomes really messy and hard to manage
I'd like to be able to do what these guys have suggested but with ADO connectors rather than OLE DB:
http://redsouljaz.wordpress.com/2009/11/30/ssis-update-data-from-different-table-if-data-is-null/
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html
For such a simple update I would use an Execute SQL Task and save the hassle of messing around with a data flow. If you have lots of similar updates but with different fields and tables, I would put the column and table names in a Foreach Loop Container using a Foreach Item Enumerator. I would then add a Script Task that takes the item names and generates some dynamic SQL, stored in a variable. Next, add the Execute SQL Task and get it to use the SQL variable.
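The statement-generation step can be sketched like this. It is written in Python only to show the logic (an SSIS Script Task would be written in C# or VB); the function name is made up, and the second item in the list, with its customer table and key names, is a hypothetical extra dimension that does not appear in the question.

```python
def build_update(fact_table, dim_table, natural_key, surrogate_key):
    """Return an UPDATE ... FROM ... JOIN statement for one dimension."""
    # Identifiers cannot be passed as query parameters, so reject anything
    # that is not a plain name before splicing it into the SQL text.
    for name in (fact_table, dim_table, natural_key, surrogate_key):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"UPDATE A SET A.{surrogate_key} = B.{surrogate_key} "
        f"FROM {fact_table} A "
        f"INNER JOIN {dim_table} B ON B.{natural_key} = A.{natural_key}"
    )


# One tuple per iteration of the Foreach Item Enumerator.
items = [
    ("SaleStagingTable", "inventoryStagingTable", "OLTPInventoryKey", "DWHSurrogateKey"),
    ("SaleStagingTable", "customerStagingTable", "OLTPCustomerKey", "DWHCustomerKey"),
]
statements = [build_update(*item) for item in items]
```

Each generated string is what would be placed in the SQL variable that the Execute SQL Task runs on that iteration.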

Export large amounts of binary data from one SQL database and import it into another database of the same schema

I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all of those 37,000 records into another database containing the same table and schema that has about 12,500 records. I need to insert these images into the database with an IF NOT EXISTS approach to make sure that there are no duplicates when I am done.
I tried exporting the data into Excel and formatting it into a script. (I have done this before with other tables.) The thing is, Excel does not support binary data.
I also tried the "generate scripts" wizard in SSMS which did not work because the .sql file was well over 18GB and my PC could not handle it.
Is there some other SQL tool to be able to do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files and the text file contains the filename).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approach you could use the update/insert mode, although that would change existing rows.
I don't think there is a "insert only if it does not exist mode", but you should be able to achieve this by defining a unique index and ignore errors (although that wouldn't be really fast, but should be OK for that small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert into another.
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare images. Unfortunately, you cannot compare images directly. But you could cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images
2. Use the MERGE command, or an INSERT ... SELECT like the one below, to insert new images only.
INSERT INTO DB1.dbo.table_name
SELECT * FROM DB2.dbo.table_name
WHERE column_name NOT IN
(
SELECT column_name FROM DB1.dbo.table_name
)
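The copy-then-insert-missing idea can be tried out end to end with the sketch below. It uses Python's built-in sqlite3 module so that it runs without a server; the images table and its data column are hypothetical, and tmp_images is the staging copy from step 1. On SQL Server, an image column would first need the cast to varbinary(max) mentioned above before it can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images     (id INTEGER PRIMARY KEY, data BLOB);
    CREATE TABLE tmp_images (id INTEGER PRIMARY KEY, data BLOB);
""")
# Existing rows in the real table.
conn.executemany("INSERT INTO images (data) VALUES (?)",
                 [(b"\x01\x02",), (b"\x03",)])
# Rows copied over from the other database; b"\x03" is a duplicate.
conn.executemany("INSERT INTO tmp_images (data) VALUES (?)",
                 [(b"\x03",), (b"\x04\x05",), (b"\x06",)])

before = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]

# Insert only those staged images whose bytes are not already present.
conn.execute("""
    INSERT INTO images (data)
    SELECT t.data FROM tmp_images t
    WHERE NOT EXISTS (SELECT 1 FROM images i WHERE i.data = t.data)
""")
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
inserted = total - before
```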

ADO - Can I edit results of a complex query with multiple join statements?

I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is kept in the destination (table structure), but I would like to allow for a SQL statement to be written that returns data from the destination using a complex query with multiple join statements, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would like to do is then modify the live data in this ADO Query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know at least with BDE (I've never used BDE), it was very strict and you had to return all fields (*) and such. ADO I know is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table which is joined by the same primary key field: one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So, a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing these Fields which have no influence on the joins is usually no problem.
Appending is ... you can limit the Append to one of the Tables by
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
  inherited;
  // Restrict inserts and updates on the joined result to one base table.
  TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
A better way is to set the values through a procedure, e.g. in BeforePost, then Requery and Abort.
If your view were persistent, you would be able to use INSTEAD OF triggers.
Jerry,
I encountered the same problem on Firebird, and from experience I can tell you that it can be done (up to a small complexity) by using CachedUpdates. A very good resource is this one - http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However the concept of cached updates was not appealing to me, simply for the fact pointed out in the link in RBA's answer that data can be changed in the database in the mean-time. So I will instead integrate my own method of customizable data pushes.

SSIS, splitting a single row into multiple rows

My problem is as follows. I have a CSV file (~100k rows) containing history information with the column format of:
ID1,History1,ID2,History2...ID110,History110
Each row may have anywhere between 0 and 110 history entries. Each separate entry requires a stored procedure to be called.
If there were a small number of possible entries per row, I imagine the way to do this would be to transform the data using a script, and send it to a unique path. Creating 110 paths would probably work, but isn't very elegant (and quite time consuming).
What would the best way to approach this be?
Just load the data (raw csv unchanged, one row per file line) into a staging table. Then, call a stored procedure that will use a string splitter to break up and loop over the staging table rows and call your other procedure for each history entry.
see: Arrays and Lists in SQL Server 2005 and Beyond
also see this previous answer: SQL comma delimted column => to rows then sum totals?
If you want to solve this in SSIS without the staging tables, you could create a destination script component. You could use a switch statement or a hashtable to look up the right sproc to execute for the data row.
It is unclear whether this is a better solution than the staging table approach above, but it is an alternative.
I know you already accepted an answer, but couldn't you use an Unpivot task to achieve what you wanted to do here?
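Whichever component does the work, the reshaping itself is the same: each wide line becomes zero or more (ID, History) pairs, and every pair is what the per-entry stored procedure would be called with. Here is a minimal Python sketch of that step; the function name and the sample data are made up, and the stored procedure call itself is left out.

```python
import csv
import io


def unpivot_history(line_fields):
    """Turn [ID1, History1, ID2, History2, ...] into (id, history) pairs,
    skipping slots whose ID is empty."""
    pairs = []
    for k in range(0, len(line_fields) - 1, 2):
        entry_id = line_fields[k].strip()
        history = line_fields[k + 1].strip()
        if entry_id:
            pairs.append((entry_id, history))
    return pairs


# Three sample lines: two entries plus an empty slot, one entry, no entries.
raw = "7,created,9,shipped,,\n12,created\n\n"
entries = []
for fields in csv.reader(io.StringIO(raw)):
    entries.extend(unpivot_history(fields))
```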
