I want to copy and merge data from tables with identical structure (in a number of different source databases) to a single table of similar structure in a destination database. From time to time I need to add or remove a source database.
This is currently achieved using a Data Flow Task containing an OLEDB source with a SQL query within which there is a UNION for each of the databases I am extracting from. There is quite a lot of SQL within each UNION so, if I need to add fields, I need to add the same additional SQL to each UNION. Similarly, when I add or remove a source database I need to add or remove a UNION.
I was hoping that, rather than use such a UNION with a lot of duplicated code, I could instead use a Foreach Loop Container that executes SQL held in a variable, using parameters to substitute the database name and other database-dependent items on each iteration. However, I hit problems with that: I assume the Data Flow Task within the loop could not interpret the incoming fields because of the use of what is effectively dynamic SQL.
Any suggestions as to how I might best achieve this without duplicating a lot of SQL?
It sounds like you have your loop figured out for moving from database to database. As long as the table schemas are identical (other than names as noted) from database to database, this should work for you.
Inside the For Each Loop container, create either a Script Task or an Execute SQL Task, whichever you're more comfortable working with.
Use that task to dynamically generate the SQL of your OLE DB Source query, changing the Customer Code prefix for each iteration. Assign the SQL text to a variable, either directly in a Script Task, or by assigning the Result Set of your Execute SQL Task (the result set being the query text) to a variable.
Inside your Data Flow Task, in the OLE DB Source, under Data Access Mode select "SQL Command from variable". Select the variable that you populated with your query in the last task.
You'll also need to handle changing the connection string between iterations, but, again, it sounds like you have a handle on that part already.
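For illustration, here is a minimal sketch of what the Script Task body might look like, assuming package variables named User::DatabaseName (set by the Foreach Loop) and User::SourceQuery (read by the OLE DB Source); the variable names, columns and table are placeholders, not your actual schema:

Public Sub Main()
    ' Database name supplied by the Foreach Loop for this iteration.
    Dim dbName As String = CStr(Dts.Variables("User::DatabaseName").Value)

    ' Build the source query for this database and hand it to the OLE DB Source,
    ' which reads it via "SQL command from variable".
    Dts.Variables("User::SourceQuery").Value = _
        "SELECT CustomerCode, OrderId, OrderDate " & _
        "FROM [" & dbName & "].dbo.Orders"

    Dts.TaskResult = ScriptResults.Success
End Sub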
I am trying to create some sort of automation whereby I can generate a series of pipe-delimited text extracts for about 100 different tables each month. Each extract would be based on a simple query like this:
SELECT *
FROM tablename
WHERE AsOfDate = 'currentmonth'
where both tablename and currentmonth would be variables. The tablename variable name would change for each of the tables but currentmonth would remain the same throughout the execution.
I have been attempting to build an SSIS package that uses a ForEach Loop container that runs through a list of all the table names and passes that variable into a SQL string, which is then used by the OLE DB Data source in the data flow.
However, all of these tables have different columns. Based on what I can tell, it would not be feasible to do a simple OLE DB Source to a Flat File Destination within that loop container since the Flat File Connection Manager must be configured to account for the different columns of each table.
Would there be any feasible way to do this outside of configuring the process manually for each of the 100+ tables?
You could look into Biml, which programmatically creates your data flows based on metadata.
Or you could use a Script task that loops through the tables, loops through their columns, and generates text files instead of using any dataflow at all.
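As a rough illustration of that script-only route, here is a minimal sketch, assuming the source is SQL Server, the script has Imports System.Data.SqlClient and Imports System.IO at the top, and the connection string, table list, output path, and the User::CurrentMonth variable are all placeholders:

Public Sub Main()
    ' In practice the 100+ table names would come from a config table or package variable.
    Dim tables() As String = {"TableA", "TableB"}
    Dim asOfDate As String = CStr(Dts.Variables("User::CurrentMonth").Value)

    Using conn As New SqlConnection("Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI")
        conn.Open()

        For Each tableName As String In tables
            Using cmd As New SqlCommand("SELECT * FROM " & tableName & " WHERE AsOfDate = @AsOfDate", conn)
                cmd.Parameters.AddWithValue("@AsOfDate", asOfDate)

                Using reader As SqlDataReader = cmd.ExecuteReader(), _
                      writer As New StreamWriter("C:\Extracts\" & tableName & ".txt")

                    ' Header row built from whatever columns this table happens to have.
                    Dim names(reader.FieldCount - 1) As String
                    For i As Integer = 0 To reader.FieldCount - 1
                        names(i) = reader.GetName(i)
                    Next
                    writer.WriteLine(String.Join("|", names))

                    ' Data rows, pipe-delimited.
                    While reader.Read()
                        Dim fields(reader.FieldCount - 1) As String
                        For i As Integer = 0 To reader.FieldCount - 1
                            fields(i) = Convert.ToString(reader.GetValue(i))
                        Next
                        writer.WriteLine(String.Join("|", fields))
                    End While
                End Using
            End Using
        Next
    End Using

    Dts.TaskResult = ScriptResults.Success
End Sub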
Having seen other questions with answers that don't totally address what I am after, I am wondering how in SSIS to use an OLE DB Command transformation to do an Insert and immediately get the resulting primary key for each row inserted as a new column, all within the same Data Flow Task. That sounds like it should be a common, built-in, fairly simple thing to ask for in SSIS, right?
So the obvious first choice for me would be to use an OLE DB Command where I do an INSERT and include an OUTPUT clause in my command:
INSERT INTO dbo.MyReleaseTable(releaseDate)
OUTPUT ?=Inserted.id
VALUES (?)
Only I can't figure out how to do this in an OLE DB Command (with an output parameter) without it complaining. I've read about using stored procedures to do this, so am I required to use a stored procedure if I want to do this?
Let's say this won't work. I could use a Script Transformation and execute direct SQL in that, right? Well if that's what I must do, then the line between using custom code and SSIS block-components gets blurred and I am tempted to throw SSIS away and just do the whole ETL in code.
Then I hear talk about using an Execute SQL task. So now I can't even do 1 data flow within 1 data flow task? Am I getting that right? I'd like to keep 1 single data flow contained within 1 data flow task and not have to break my 1 flow out between separate tasks.
If it turns out that this seemingly simple data flow objective is not built into SSIS then I will consider dumping SSIS altogether. Talend has a free ETL offering, don't they?
Well, this can be done within an SSIS Data Flow, but with some tricks. You need to create a stored procedure with input and output parameters and call it from the Data Flow, as described here, fetching the result value.
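For reference, a minimal sketch of such a stored procedure might look like this (the procedure name dbo.InsertRelease is a placeholder; the table, column, and id names come from the question):

CREATE PROCEDURE dbo.InsertRelease
    @releaseDate DATETIME,
    @newId       INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.MyReleaseTable (releaseDate)
    VALUES (@releaseDate);

    -- Return the identity value generated for this row.
    SET @newId = SCOPE_IDENTITY();
END

The OLE DB Command would then call it with something like EXEC dbo.InsertRelease ?, ? OUTPUT, mapping the output parameter to a column added earlier in the pipeline (for example with a Derived Column).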
Drawbacks of this approach:
You need to create a Stored Procedure
Each row is processed with a separate stored procedure call, which causes implicit per-row transactions instead of batch processing. This can slow down your package.
A solution without the performance penalty is to do it in two Data Flows: the first inserts the values into some temp table, and the second runs a SQL MERGE command at the OLE DB source and handles the OUTPUT data as you wish. All of this runs inside a transaction, handled either by MSDTC or by your own transaction management.
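A rough sketch of the second Data Flow's source query, assuming the first Data Flow loaded the new rows into a staging table dbo.ReleaseStaging with a StagingId column (both of those names are assumptions; dbo.MyReleaseTable, releaseDate, and id come from the question):

SET NOCOUNT ON;

-- Insert every staged row and return the generated identity alongside the staging key.
-- MERGE is used because, unlike INSERT, its OUTPUT clause may reference source columns.
MERGE dbo.MyReleaseTable AS tgt
USING dbo.ReleaseStaging AS src
    ON 1 = 0                            -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (releaseDate)
    VALUES (src.releaseDate)
OUTPUT src.StagingId, inserted.id;

The (StagingId, id) pairs come back as the OLE DB Source's result set, so the rest of the data flow can join the new surrogate keys to whatever needs them.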
Part of an SSIS package is a data import from an external database via a SQL command embedded in an ADO.NET Source. Whenever I make even the slightest adjustment to the query (such as changing a column name), it takes ages (in one case 1-2 hours) until the package has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table, and the first part of my package executes a SELECT and stores the query returned from the table in a package variable, which is then used by the ADO.NET Source. For the default value of the variable I use the same query that is stored in the database, with a "where 1=2" appended to the end. Hence at design time it does execute the query, but it only returns the column metadata. Let me know if you have any questions.
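For example, the design-time default value of the variable might be something like this (the table and column names are only illustrative):

SELECT OrderId, CustomerCode, OrderDate, Amount
FROM dbo.ExternalOrders
WHERE 1 = 2   -- returns no rows, so validation only has to fetch column metadata

At run time the first task overwrites the variable with the real query from the table, without the 1=2 predicate.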
I have an SSIS package that I want to use to update a column in a data warehouse staging table based on the values of a surrogate key mapping table that contains the surrogate key paired with the natural key. Specifically, I want to use a cached Lookup to update the fact staging table so that it contains the surrogate key for the inventory dimension, in the same way that the following SQL would.
UPDATE A
SET A.DWHSurrogateKey = B.DWHSurrogateKey
FROM SaleStagingTable A INNER JOIN inventoryStagingTable B ON B.OLTPInventoryKey = A.OLTPInventoryKey
Unfortunately the nature of the data flow from Lookup transformation to destination means that it creates a whole new row, rather than updating the existing matched row. Is it possible to manipulate SSIS to do this?
Couple of constraints:
My destination is an ADO.NET destination, and we cannot use OLE DB destinations or sources (we need to be able to use named parameters and you can't do that with OLE DB connections)
I need to do this for multiple dimensions to link them to the fact table, so I can't just push the mapped data to new tables every time, as that becomes really messy and hard to manage
I'd like to be able to do what these guys have suggested but with ADO connectors rather than OLE DB:
http://redsouljaz.wordpress.com/2009/11/30/ssis-update-data-from-different-table-if-data-is-null/
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html
For such a simple update I would use an Execute SQL Task and save the hassle of having to mess around with a data flow. If you have lots of similar updates but with different fields and tables, I would store the column and table names in a Foreach Loop Container using a Foreach Item Enumerator. I would then add a Script Task that takes the item names and generates some dynamic SQL, which can be stored in a variable. Next, add the Execute SQL Task and get it to use the SQL variable.
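A minimal sketch of that Script Task body, assuming the Foreach Item Enumerator supplies the dimension staging table and natural key column into variables named User::DimTable and User::KeyColumn, and the generated statement lands in User::UpdateSql (all three variable names are placeholders; the table and column names come from the question):

Public Sub Main()
    Dim dimTable As String = CStr(Dts.Variables("User::DimTable").Value)
    Dim keyColumn As String = CStr(Dts.Variables("User::KeyColumn").Value)

    ' Build the set-based UPDATE for this dimension; a downstream Execute SQL Task runs it.
    Dts.Variables("User::UpdateSql").Value = _
        "UPDATE A " & _
        "SET A.DWHSurrogateKey = B.DWHSurrogateKey " & _
        "FROM SaleStagingTable A " & _
        "INNER JOIN " & dimTable & " B ON B." & keyColumn & " = A." & keyColumn

    Dts.TaskResult = ScriptResults.Success
End Sub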
I need to copy data from one database to another using a VB.NET program.
The target database is SQL Server; the source database is some proprietary ODBC-compliant database.
I need to loop through a list of tables to copy: read the data from the source database table for a given modified date, delete the rows for that date from the target database table, and insert the records from the source table. The databases have the same structure, i.e. table names and field names, but the data types may differ (however they are compatible, e.g. double in the source, float in the target). No primary keys exist.
Here's how I might do it:
First, execute a DELETE command against the target.
I could then use a DataReader to obtain the data from the source, loop through the rows, and create an INSERT command for each row, adding parameters with the appropriate values and executing it, and wrap the whole thing in a transaction.
I was just wondering if I am missing a trick here. Any suggestions?
I think you should use the right tool for the job, and I'm guessing that that is SSIS in this case, but I could be wrong and perhaps you have already explored that path.
In that case, yes, a DataReader would do, depending on how much data you have. A DataTable might even be easier and faster to program (no need to worry about data types, since the adapter should take care of that).
The trick would be to use set based operations and not the 'row at a time' concept which we programmers were first taught :)
Here's some pseudocode
INSERT INTO DestTable (column1, column2, ...)
SELECT column1, column2, ...
FROM SourceTable
WHERE ModifiedDate = @ModifiedDate
Perhaps your requirements are more complicated and may need the row by row approach, but this is normally not the case.
I'd opt to put this code in a job step and schedule it on SQL Server. It could also be a stored procedure run from .NET.
Also, using SSIS for a db to db transfer is most likely overkill unless you are going to be using some of the special transformations in there.
Take a look at the SqlBulkCopy class. If you can get the source into a DataTable or read it with an IDataReader then it's eligible. It will also attempt to convert between compatible types. See Single Bulk Copy Operations for more details.
This would be more desirable than using INSERT statements for each row.
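A minimal sketch of that approach, assuming the source is reachable through ODBC and the destination is SQL Server; the connection strings, table name, and ModifiedDate column are placeholders:

Imports System.Data.Odbc
Imports System.Data.SqlClient

Module CopyTables
    Sub CopyTable(ByVal sourceConnStr As String, ByVal destConnStr As String, _
                  ByVal tableName As String, ByVal modifiedDate As Date)
        Using srcConn As New OdbcConnection(sourceConnStr), _
              destConn As New SqlConnection(destConnStr)
            srcConn.Open()
            destConn.Open()

            ' Remove the existing rows for this modified date from the target table.
            Using del As New SqlCommand("DELETE FROM [" & tableName & "] WHERE ModifiedDate = @d", destConn)
                del.Parameters.AddWithValue("@d", modifiedDate)
                del.ExecuteNonQuery()
            End Using

            ' Stream the source rows straight into the target table; SqlBulkCopy
            ' attempts to convert compatible types (e.g. double to float) as it goes.
            Using cmd As New OdbcCommand("SELECT * FROM " & tableName & " WHERE ModifiedDate = ?", srcConn)
                cmd.Parameters.AddWithValue("ModifiedDate", modifiedDate)
                Using reader As OdbcDataReader = cmd.ExecuteReader()
                    Using bulk As New SqlBulkCopy(destConn)
                        bulk.DestinationTableName = tableName
                        bulk.WriteToServer(reader)
                    End Using
                End Using
            End Using
        End Using
    End Sub
End Module

If the delete-and-reload needs to be atomic, a SqlTransaction can be passed to both the DELETE command and the SqlBulkCopy constructor.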
' Check whether the SQL Server data directory has the read-only attribute set
' (the path will differ between installations).
Dim dataDir As System.IO.DirectoryInfo
dataDir = My.Computer.FileSystem.GetDirectoryInfo("c:\program Files\Microsoft SQL Server\MSSQL.1\mssql\data")
If (dataDir.Attributes And System.IO.FileAttributes.ReadOnly) = System.IO.FileAttributes.ReadOnly Then
    MsgBox("Data directory is read-only!")
Else
    MsgBox("Data directory is not read-only protected")
End If
Check all the tables first.