SSIS Insert Row and Get Resulting ID Using OLE DB Command - sql-server

Having seen other questions with answers that don't totally address what I am after, I am wondering how in SSIS to use an OLE DB Command transformation to do an Insert and immediately get the resulting primary key for each row inserted as a new column, all within the same Data Flow Task. That sounds like it should be a common, built-in, fairly simple thing to ask for in SSIS, right?
So the obvious first choice for me would be to use an OLE DB Command where I do an INSERT and include an OUTPUT clause in my command:
INSERT INTO dbo.MyReleaseTable(releaseDate)
OUTPUT ?=Inserted.id
VALUES (?)
Only I can't figure out how to do this in an OLE DB Command (with an output parameter) without it complaining. I've read about using stored procedures to do this, so am I required to use a stored procedure if I want to do it?
Let's say this won't work. I could use a Script Transformation and execute direct SQL in that, right? Well, if that's what I must do, then the line between using custom code and SSIS building-block components gets blurred, and I am tempted to throw SSIS away and just do the whole ETL in code.
Then I hear talk about using an Execute SQL task. So now I can't even do 1 data flow within 1 data flow task? Am I getting that right? I'd like to keep 1 single data flow contained within 1 data flow task and not have to break my 1 flow out between separate tasks.
If it turns out that this seemingly simple data flow objective is not built into SSIS then I will consider dumping SSIS altogether. Talend has a free ETL offering, don't they?

Well, this can be done in SSIS inside the Data Flow, but with some tricks. You need to create a stored procedure with input and output parameters and use it in the Data Flow, as described here, fetching the result value.
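A minimal sketch of such a procedure, assuming the question's dbo.MyReleaseTable and an IDENTITY column named id (the procedure name and parameter names are just placeholders):

-- Hypothetical procedure; adjust the names to your schema.
CREATE PROCEDURE dbo.usp_InsertRelease
    @releaseDate DATETIME,
    @newId       INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.MyReleaseTable (releaseDate)
    VALUES (@releaseDate);

    -- SCOPE_IDENTITY() returns the identity value generated by the INSERT above
    SET @newId = CAST(SCOPE_IDENTITY() AS INT);
END

In the OLE DB Command the SqlCommand property would then be something like EXEC dbo.usp_InsertRelease ?, ? OUTPUT, with the second parameter mapped to a new output column in the data flow.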
Drawbacks of this approach:
You need to create a Stored Procedure
Each row is processed with a separate SP call, in its own implicit transaction, instead of being batch processed. This can slow down your package.
A solution without the performance penalty is to do it in two Data Flows: the first inserts the values into some temp table, and the second uses a SQL MERGE command at the OLE DB Source and handles the output data as you wish. All of this runs inside a transaction, handled either by MSDTC or by yourself.
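A sketch of the second Data Flow's source query, assuming a hypothetical staging table dbo.MyReleaseStage that the first Data Flow loaded:

SET NOCOUNT ON;
-- The ON 1 = 0 predicate never matches, so every staged row is inserted,
-- and the OUTPUT clause returns the generated identity alongside the inserted values.
MERGE dbo.MyReleaseTable AS tgt
USING dbo.MyReleaseStage AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (releaseDate) VALUES (src.releaseDate)
OUTPUT Inserted.id, Inserted.releaseDate;

The OUTPUT rowset becomes the source of the second Data Flow, so each row already carries its new id.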

Related

SSIS - importing identical data from multiple databases

I want to copy and merge data from tables with identical structure (in a number of different source databases) to a single table of similar structure in a destination database. From time to time I need to add or remove a source database.
This is currently achieved using a Data Flow Task containing an OLEDB source with a SQL query within which there is a UNION for each of the databases I am extracting from. There is quite a lot of SQL within each UNION so, if I need to add fields, I need to add the same additional SQL to each UNION. Similarly, when I add or remove a source database I need to add or remove a UNION.
I was hoping that, rather than use such a UNION with a lot of duplicated code, I could instead use a Foreach Loop Container that executes SQL contained in a variable, using parameters to substitute the name of the database and other database-dependent items within the SQL on each iteration. However, I hit problems with that, as I assume the Data Flow Task within the loop could not interpret the incoming fields because of the use of what is effectively dynamic SQL.
Any suggestions as to how I might best achieve this without duplicating a lot of SQL?
It sounds like you have your loop figured out for moving from database to database. As long as the table schemas are identical (other than names as noted) from database to database, this should work for you.
Inside the For Each Loop container, create either a Script Task or an Execute SQL Task, whichever you're more comfortable working with.
Use that task to dynamically generate the SQL of your OLE DB Source query, changing the database name (and any other database-dependent items) for each iteration. Assign the SQL text to a variable, either directly in a Script Task, or by assigning the Result Set of your Execute SQL Task (the result set being the query text) to a variable.
Inside your Data Flow Task, in the OLE DB Source, under Data Access Mode select "SQL Command from variable". Select the variable that you populated with your query in the last task.
You'll also need to handle changing the connection string between iterations, but, again, it sounds like you have a handle on that part already.
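For example, an Execute SQL Task could build the query text for the current iteration and return it as a one-row result set that is assigned to the string variable (all object names here are hypothetical, and this assumes an OLE DB connection where the ? parameter is mapped from the loop's current database name):

-- Build the per-database source query and return it as a single value
DECLARE @db SYSNAME = ?;
SELECT N'SELECT col1, col2, col3 FROM ' + QUOTENAME(@db) + N'.dbo.SourceTable' AS QueryText;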

How do I execute a SQL Server stored procedure from Informatica Developer (10.1, not Power Center)

I am trying to execute (call) a SQL Server stored procedure from Infa Developer. I created a mapping (new mapping from SQL Query), and I am trying to pass it runtime variables from the previous mapping task in order to log these to a SQL Server table (the stored procedure does an INSERT). It generated the following T-SQL query:
?RETURN_VALUE? = call usp_TempTestInsertINFARunTimeParams (?Workflow_Name?, ?Instance_Id?, ?StartTime?, ?EndTime?, ?SourceRows?, ?TargetRows?)
However, it does not validate; the validation log states 'the mapping must have a source' and '... must have a target'. I have a feeling I'm doing this completely wrong. And: this is not Power Center (no sessions, as far as I can tell).
Any help is appreciated! Thanks
Now with the comments I can confirm and answer your question:
Yes, Source and Target transformations in Informatica are mandatory elements of the mapping. It will not be a valid mapping without them. Let me try to explain a bit more.
The whole concept of an ETL tool is to Extract data from the Source, do all the needed Transformations outside the database, and Load the data to the required Target. It is possible - and quite often necessary - to invoke Stored Procedures before or after the data load, and sometimes even to use existing Stored Procedures as part of the data load. However, from an ETL perspective, this is an additional feature. An ETL tool - Informatica being a perfect example - is not meant to be a tool for invoking SPs. This reminds me of a question any T-SQL developer asks with their first PL/SQL query: what in the world is this DUAL? Why do I need 'from dual' if I just want to do some calculation like SELECT 123*456? That is the theory.
Now in the real world it happens quite often that you NEED to invoke a stored procedure, and that it is the ONLY thing you need to do. Then you do use the DUAL ;) Which in the PowerCenter world means you use DUAL as the Source (or actually any table you know exists in the source system), you put 1=2 in the Source Filter property (or put a Filter Transformation in the mapping with FALSE as the condition), and link just one port to the target. Next, you put the Stored Procedure call as the Pre- or Post-SQL property on your source or target - depending on where you actually want to run it.
Odd? Well - the odd part is where you want to use the ETL tool as a trigger, not as an ETL tool ;)
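If it helps, the Pre- or Post-SQL property would then just contain a plain call to the procedure from the question, with literal or parameterized values substituted in (the values below are made up for illustration):

-- Hypothetical Post-SQL text calling the logging procedure from the question
EXEC usp_TempTestInsertINFARunTimeParams
    'wf_LoadSales', 42, '2024-01-01 00:00', '2024-01-01 00:05', 1000, 1000;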

SSIS trying to only load file if all rows are good

I am trying to use a SSIS package to insert data from a file into a table but only if all the data in the file is good. I have read around and realise that I can split my good data and bad data with a conditional split.
However, I cannot come up with a way to avoid writing the good data if there are any bad rows in the file.
I can solve my problem using a staging table. I just thought I would ask if I am missing a more elegant way to do this within the SSIS package rather than load and then transform with T-SQL.
Thanks
The SSIS way allows wrapping actions in a transaction. For your task, you need to count the bad rows in the data flow, and if there is at least one bad row, do nothing, i.e. roll back.
Below is how I would do it in pure SSIS. Create a Sequence Container, specify TransactionOption=Required on it, and move your Data Flow into the sequence. Add a Row Count transformation to your bad-rows path and store its result in a variable. After the Data Flow inside the sequence, create a precedence constraint with an expression that checks whether the bad_rowcount variable is greater than 0, and follow it with a little Script Task that raises an error to roll back the transaction.
Pure SSIS - yes! Simpler than using a staging table - not sure.

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is the data import from an external database via a SQL command embedded into an ADO.NET Source Data Flow Source. Whenever I make even the slightest adjustment to the query (such as changing a column name) it takes ages (in that case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table, and the first part of my package executes a SELECT and stores the query returned from the table in a package variable, which is then used by the ADO.NET Source in the Data Flow. For the default value of the variable, I usually have the query that is stored in the database with a "where 1=2" appended to the end. Hence during design time it does execute the query but just returns the column metadata. Let me know if you have any questions.
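For example, if the stored query selects from a hypothetical dbo.ExternalOrders table, the design-time default value of the variable might look like this:

-- WHERE 1=2 returns no rows, so design-time validation only resolves column metadata.
SELECT OrderID, CustomerName, OrderDate, Amount
FROM dbo.ExternalOrders
WHERE 1 = 2;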

Copy data from one database to another using VB.NET

I need to copy data from one database to another using a VB.NET program.
The target database is SQL Server; the source database is some proprietary ODBC-compliant database.
I need to loop through a list of tables to copy, read the data from the source database table for a given modified date, delete the corresponding data from the target database table, and insert the records from the source table. The databases are of the same structure, i.e. table names and field names, but the data types may differ (however they are compatible, e.g. double in source, float in target). No primary keys exist.
Here's how I might do it:
First, execute a Delete command against the target.
I could then use a DataReader to obtain data from the source, loop through the items, create an Insert command for each row, add parameters to the command with the appropriate values, and execute it, wrapping the whole thing in a transaction.
I was just wondering if I am missing a trick here. Any suggestions?
I think you should use the right tool for the job, and I'm guessing that that is SSIS in this case, but I could be wrong and perhaps you have already explored that path.
In that case, yes, a DataReader would do, depending on how much data you have. A DataTable might even be easier and faster to program (no need to worry about data types, since the adapter should take care of that).
The trick would be to use set-based operations and not the 'row at a time' approach which we programmers were first taught :)
Here's some pseudocode
INSERT INTO DestTable (columns, columns...)
SELECT columns, columns...
FROM SourceTable
WHERE ModifiedDate = @ModifiedDate  -- the given modified date
Perhaps your requirements are more complicated and may need the row by row approach, but this is normally not the case.
I'd opt to put this code in a job step and schedule it on SQL Server. It could also be a stored procedure run from .NET.
Also, using SSIS for a db to db transfer is most likely overkill unless you are going to be using some of the special transformations in there.
Take a look at the SqlBulkCopy class. If you can get the source into a DataTable or read it with an IDataReader then it's eligible. It will also attempt to convert between compatible types. See Single Bulk Copy Operations for more details.
This would be more desirable than using INSERT statements for each row.
' Check whether the SQL Server data directory is marked read-only
Dim reader As System.IO.DirectoryInfo
reader = My.Computer.FileSystem.GetDirectoryInfo("c:\program Files\Microsoft SQL Server\MSSQL.1\mssql\data")
If (reader.Attributes And System.IO.FileAttributes.ReadOnly) > 0 Then
    MsgBox("File is readonly!")
Else
    MsgBox("Database is not read-only protected")
End If
Check all the tables first
