SSIS - Dynamically loop over multiple databases - sql-server

I have to consolidate data from 1000+ databases that share the same structure/tables into a single DB.
Databases may be added or removed, potentially on a daily basis, so I need to retrieve the list of DBs dynamically and run the dynamically generated SQL query against each of them to extract the data.
I designed the Data Flow with a query from a variable, and it works fine when executed with a static value:
With a SQL task I get the list of instances and loop over them; with a nested Foreach Loop/SQL task I retrieve the database names and build the dynamic SQL with the following statement (DB name is anonymized):
SELECT 'select ''' + name + ''' as DatabaseName, ID from ' + name + '.[dbo].[Orders]' AS querytext FROM sys.databases WHERE name LIKE ( 'XXX%_%' );
This part is also working fine:
How can I use the result of the SQL task "Execute SQL Task - Get query text" as query to be executed in the Source "OLE DB Source 1" (part of "Data Flow Task 3")?
I tried mapping an Object variable, "User::SqlCommandFromSQLTask", to the result set of the SQL task, then using it as an ADO object source variable, with a Script task converting it to a string and passing the value to the variable SqlStringFromSQLTask3 (used as the source in "OLE DB Source 1"). However, I get the error "Violation of PRIMARY KEY constraint", as if the data flow always runs with the static value I set up as the default:
While, if I remove the value from the variable panel, I get the error "Command text was not set for the command object.", even after changing the DelayValidation property of the Data Flow to false.
Any help is much appreciated.

When I have used SSIS to connect to multiple SQL Server boxes, I have stored those SQL Server connection strings in a table in a central database. I then use a query of that table as the input to a Foreach Loop container that runs the data flow task once per connection. If we ever have to change a SQL Server connection string, which does happen, we just update that table with the new value.
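A minimal Python sketch of that metadata-driven pattern, assuming a hypothetical config table with columns ServerName and ConnectionString (modelled here as a list of tuples; the connection strings are placeholders). In SSIS the loop would be a Foreach Loop over the config query's result set, with a property expression swapping the connection manager's ConnectionString on each iteration; here "execution" is just collecting the statement that would run against each connection.

```python
from typing import List, Tuple

# Hypothetical rows from the central config table: (ServerName, ConnectionString).
CONFIG_ROWS: List[Tuple[str, str]] = [
    ("SRV01", "Data Source=SRV01;Initial Catalog=master;Integrated Security=SSPI;"),
    ("SRV02", "Data Source=SRV02;Initial Catalog=master;Integrated Security=SSPI;"),
]


def plan_iterations(rows: List[Tuple[str, str]], query_template: str) -> List[Tuple[str, str]]:
    """Return one (connection string, query text) pair per config row.

    Mirrors a Foreach Loop over the config query: each iteration swaps in a
    new connection string and fills the server name into the same query text.
    """
    plan = []
    for server, conn_str in rows:
        plan.append((conn_str, query_template.format(server=server)))
    return plan


if __name__ == "__main__":
    template = "SELECT '{server}' AS ServerName, name FROM sys.databases;"
    for conn, query in plan_iterations(CONFIG_ROWS, template):
        print(conn, "->", query)
```

Adding or retiring a server then means inserting or deleting a row in the config table; the package itself does not change.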

Related

How to execute a SQL Server stored procedure after a data flow task in SSIS

I am new to SSIS. I am trying to create an ETL pipeline to automate the updating and deleting process for a database.
I have created a data flow task which reads the Excel file and sends the data to respective staging tables in SQL Server.
For the data to be updated in the main database, it has to go through some transformation in the staging tables. I have created a stored procedure that will enforce these changes.
I want the stored procedure to be called right after the data flow task loads the data into the staging tables, instead of going into SSMS to execute the stored procedure manually.
I have tried adding an "Execute SQL Task" on the Control Flow tab, but I'm not getting any results.
I plan to add many more transformations to this process in later steps, so any ideas on how to make the whole process more convenient would also be appreciated.
[Data Flow Task] -> [Execute SQL Task]
Configure the Execute SQL Task with a Direct Input value of
EXECUTE dbo.MasterQuery;
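The arrow above is a precedence constraint. With the default Success constraint, the Execute SQL Task starts only after the Data Flow Task has completed successfully, so the procedure never runs against a half-loaded staging table. A toy Python model of that ordering (the task names and the lambdas standing in for real work are placeholders, not SSIS API):

```python
from typing import Callable, List, Tuple


def run_chain(tasks: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run tasks in order, like tasks joined by Success precedence constraints.

    Each task returns True on success. Execution stops at the first failure,
    and the names of the tasks that completed are returned.
    """
    completed = []
    for name, task in tasks:
        if not task():
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_chain([
        ("Data Flow Task", lambda: True),
        ("Execute SQL Task: EXECUTE dbo.MasterQuery;", lambda: True),
    ]))
```

Further transformations can be chained the same way: each new step hangs off the previous one with its own constraint.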
Based on the image of your stored procedure, it would appear you have a logic error in there.
IF EXISTS(SELECT 1 FROM dbo.OutlookDataStg WHERE [Flag] = 'Outlook')
BEGIN
UPDATE dbo.OutlookDataStg
SET [Data Type] = 'Outlook'
WHERE [Flag] = 'Actual'
-- Cut off at this point
END
The logic provided is
If there is at least one row in the table dbo.OutlookDataStg where the Flag value is Outlook, then update the same table and set the Data Type to Outlook for any rows with a Flag of Actual.
Unless you have some unusual condition, it would seem you've mixed up your Flag and Data Type values.
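Tracing the fragment on a couple of rows makes the problem concrete. The Python below only simulates the T-SQL exactly as written (the sample rows are made up): a row flagged Actual gets its Data Type changed to Outlook whenever some other row is flagged Outlook.

```python
def apply_as_written(rows):
    """Simulate the procedure fragment as written.

    rows: list of dicts with 'Flag' and 'Data Type' keys (hypothetical sample data).
    Rows are modified in place and also returned.
    """
    # IF EXISTS(SELECT 1 FROM dbo.OutlookDataStg WHERE [Flag] = 'Outlook')
    if any(r["Flag"] == "Outlook" for r in rows):
        # UPDATE dbo.OutlookDataStg SET [Data Type] = 'Outlook' WHERE [Flag] = 'Actual'
        for r in rows:
            if r["Flag"] == "Actual":
                r["Data Type"] = "Outlook"
    return rows


if __name__ == "__main__":
    sample = [
        {"Flag": "Outlook", "Data Type": "Actual"},
        {"Flag": "Actual", "Data Type": "Actual"},
    ]
    print(apply_as_written(sample))
```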

How to do an inner join rather than for each loop in SSIS?

On the ETL server I have a DW user table.
On the prod OLTP server I have the sales database. I want to pull the sales only for users that are present in the user table on the ETL server.
Presently I am using an Execute SQL Task to fetch the DW users into an SSIS System.Object variable. I then use a Foreach Loop to iterate over each item (userid) in this variable, and a data flow task inside it fetches that user's rows from the OLTP sales table and dumps them into the DW staging table. The Foreach Loop takes a long time to run.
I want to be able to do an inner join so that the response is quicker, but I can't, since the tables are on separate servers. For the same reason, I can't use a global temp table to make the inner join.
I tried collecting the DW users into a comma-separated string variable and using it (via STRING_SPLIT) in the OLTP query, but this also takes more time in the pre-execute phase (not sure exactly why), even for a small number of users.
I am also aware of the Lookup transform, but that too would bring all the OLTP rows over to the DW ETL server to test the lookup condition.
Is there any alternate approach to be able to do an inner join by taking the list of users into the source?
Note: I do not have write permissions on the OLTP db.
Based on the comments, I think we can use a temporary table to solve this.
Can you help me understand this restriction? "Neither can I use a global temp table to make the inner join, for the same reason."
The restriction is that the OLTP server and the DW server are separate, so I can't have a global temp table common to both servers. Hope that makes sense.
The general pattern we're going to use is:
Execute SQL Task to create a temporary table on the OLTP server
A Data Flow task to populate the new temporary table. Source = DW. Destination = OLTP. Ensure Delay Validation = True
Modify existing Data Flow. Modify source to be a query that uses the temporary table i.e. SELECT S.* FROM oltp.sales AS S WHERE EXISTS (SELECT * FROM #SalesPerson AS SP WHERE SP.UserId = S.UserId); Ensure Delay Validation = True
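The point of the temporary table is that the filtering happens on the OLTP server, so only matching rows cross the network. Semantically, the EXISTS predicate in step 3 is a semi-join; a small Python illustration (the row dicts and the UserId column are placeholders matching the example query):

```python
def exists_filter(sales, user_ids):
    """Keep sales rows whose UserId appears in user_ids.

    Equivalent to WHERE EXISTS (SELECT * FROM #SalesPerson AS SP
    WHERE SP.UserId = S.UserId): a semi-join, so each sale is returned
    at most once even if a UserId is duplicated in the temp table.
    """
    wanted = set(user_ids)
    return [row for row in sales if row["UserId"] in wanted]


if __name__ == "__main__":
    sales = [{"UserId": 1, "Amount": 10}, {"UserId": 2, "Amount": 20}]
    print(exists_filter(sales, [1]))
```

An INNER JOIN against a temp table containing duplicate user IDs would repeat sales rows; EXISTS does not, which is why it is the safer choice here.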
A long form answer on using temporary tables (global to set the metadata, regular thereafter)
I don't use temp tables in SSIS
Temporary tables live in tempdb. Your OLTP and DW connection managers likely do not point to tempdb. To reference a temporary table, local or global, in SSIS, you need to either define an additional connection manager for the same server that points explicitly at tempdb so you can use the drop-down in the source/destination components (technically accurate but dumb), or use an SSIS Variable to hold the name of the table and use the "from variable" named options in the source/destination components (best option, maximum flexibility).
Soup to nuts example
I will use WideWorldImporters as my OLTP system and WideWorldImportersDW as my DW system.
One-time task
Open SQL Server Management Studio, SSMS, and connect to your OLTP system. Define a global temporary table with a unique name and the expected structure. Leave your connection open so the table structure remains intact during initial development.
I used the following statement.
DROP TABLE IF EXISTS ##SO_70530036;
CREATE TABLE ##SO_70530036(EmployeeId int NOT NULL);
Keep track of your query because we'll use it later on. As I advocate in my SSIS answers, perform the smallest task, test that it works, and then move on to the next. It's the only way to debug.
Connection Managers
Define two OLE DB Connection Managers. WWI_DW points to the named instance DEV2019UTF8 and WWI_OLTP points to DEV2019EXPRESS. Right-click on WWI_OLTP and select Properties. Find the property RetainSameConnection and flip it from the default of False to True. This ensures the same connection is used throughout the package. As temporary tables go out of scope when their connection goes away, closing and reopening a connection in a package will result in a fatal error.
These two databases are on different instances, so we can't cheat and directly commingle the data.
Variables
Define 4 variables in SSIS, all of type String.
TempTableName - I used a value of ##SO_70530036 but use whatever value you specified in the One-time task section.
QuerySourceEmployees - This will be the query you run to generate the candidate set of data to go into the temporary table. I used SELECT TOP (3) E.[WWI Employee ID] AS EmployeeId FROM Dimension.Employee AS E WHERE E.[Is SalesPerson] = CAST(1 AS bit);
QueryDefineTables - Remember the drop/create statements from the one-time task? We're going to use the essence of them but use the expression builder to let us dynamically swap the table name. I clicked the ellipsis, ..., on the Expression section and used the following: "DROP TABLE IF EXISTS " + @[User::TempTableName] + "; CREATE TABLE " + @[User::TempTableName] + "( EmployeeId int NOT NULL);" You should be able to copy the Value from the row and paste it into SSMS to confirm it works.
QuerySales - This is the actual query you're going to use to pull your filtered set of sales data. Again, we'll use the Expression to allow us to dynamically reference the temporary table name. The prettified version of the expression would look something like
"SELECT
SI.InvoiceID
, SI.SalespersonPersonID
, SO.OrderID
, SOL.StockItemID
, SOL.Quantity
, SOL.OrderLineID
FROM
Sales.Invoices AS SI
INNER JOIN
Sales.Orders AS SO
ON SO.OrderID = SI.OrderID
INNER JOIN
Sales.OrderLines AS SOL
ON SO.OrderID = SOL.OrderID
WHERE
EXISTS (SELECT * FROM " + @[User::TempTableName] + " AS TT WHERE TT.EmployeeID = SI.SalespersonPersonID);"
Again, you should be able to pull the Value from the three queries and run them independently and verify they work.
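Both expressions are plain string concatenation around TempTableName, so switching between the global name (##SO_70530036) used while designing and a local name only means changing that one variable. The Python below is not SSIS code; it simply mirrors the two expressions (on one line rather than prettified) so you can see the exact text they evaluate to.

```python
def query_define_tables(temp_table_name: str) -> str:
    # Same concatenation as the QueryDefineTables expression.
    return ("DROP TABLE IF EXISTS " + temp_table_name
            + "; CREATE TABLE " + temp_table_name
            + "( EmployeeId int NOT NULL);")


def query_sales(temp_table_name: str) -> str:
    # Same shape as the QuerySales expression; only the temp table name varies.
    return ("SELECT SI.InvoiceID, SI.SalespersonPersonID, SO.OrderID, "
            "SOL.StockItemID, SOL.Quantity, SOL.OrderLineID "
            "FROM Sales.Invoices AS SI "
            "INNER JOIN Sales.Orders AS SO ON SO.OrderID = SI.OrderID "
            "INNER JOIN Sales.OrderLines AS SOL ON SO.OrderID = SOL.OrderID "
            "WHERE EXISTS (SELECT * FROM " + temp_table_name
            + " AS TT WHERE TT.EmployeeID = SI.SalespersonPersonID);")


if __name__ == "__main__":
    print(query_define_tables("##SO_70530036"))
    print(query_sales("##SO_70530036"))
```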
Execute SQL Task
Add an Execute SQL Task to the Control Flow. I named mine SQL Create temporary table. My Connection Manager is WWI_OLTP, I changed the SQLSourceType to Variable, and the SourceVariable is User::QueryDefineTables.
Every time your package runs, the first thing it will do is create the temporary table. This is good, because SSIS is a metadata-driven ETL engine and the next two steps would fail if the table didn't exist.
Data Flow Task - Prime the pump
This data flow is where we'll transfer DW data back to the OLTP system so we can filter in the source system.
Drag a Data Flow Task onto the Control Flow. I named mine DFT Load Temp, and before you click into it, right-click the Task, find the DelayValidation property, and change it from the default of False to True. Normally, a package validates all metadata before actual execution begins, as the idea is that you want to know everything is good before any data starts moving. Since we're using temporary tables, we need to tell the execution engine "trust us, it'll be ready".
Double click inside the Data Flow Task.
Add an OLE DB Source. I named mine OLESRC SourceEmployees and used the connection manager WWI_DW. The data access mode changes to SQL command from variable, and then I select my variable User::QuerySourceEmployees.
Add an OLE DB Destination. I named mine OLEDST TempTableName and double clicked to configure it. The Connection Manager is WWI_OLTP and again, since the table lives in tempdb, we can't select it from the drop down. Change the Data access mode to Table name or view name variable - fast load and then select your variable name User::TempTableName. Click the Mapping tab and ensure source columns map to destination columns.
Data Flow Task - Transfer data
Finally, we will pull our source data, nicely filtered against the data from our target system.
Add an OLE DB Source. I named it OLESRC QuerySales. The Connection Manager is WWI_OLTP. Data access mode again changes to SQL command from variable and the variable name is User::QuerySales
From here, do whatever else you need to do to make the magic happen.
Instead of 270k rows with an unfiltered query, I have 67k, as there are only 3 employees in the temporary table.
Reference package
But wait, there's more!
Close out Visual Studio, open it back up, and try to touch something in the data flows. Suddenly, there are red Xs everywhere! Any time you close a data flow component's editor, it fires a metadata revalidation and, guess what, it can't do that because the connection holding the temporary table is gone.
The package will run fine, it will not throw VS_NEEDSNEWMETADATA but editing/maintenance becomes a pain.
If you switched from global temporary table to local, switch the table name variable's value back to a global and then run the define statement in SSMS. Once that's done, then you can continue editing the package.
I assure you, the local temporary table does work once you have the metadata set and you use queries via variables for source/destination.
No need for the global temporary table hack, or the SET FMTONLY OFF hack (which no longer works).
Just specify the result set metadata in the SQL query with WITH RESULT SETS, e.g.:
EXEC ('
create table #t
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
insert into #t (Id, Name, Number)
select object_id, name, 12
from sys.objects
select * from #t
')
WITH RESULT SETS
(
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
)
If you need to parameterize the query, there's a catch, because there are some limitations in how SSIS discovers parameters. SSIS runs sp_describe_undeclared_parameters, which doesn't really work with batches that call sp_executesql, because sp_executesql handles parameters in a very particular way, one you couldn't replicate with a user stored procedure.
So to parameterize the query you'll either need to pass the parameter values into the query using the "query from variable" and SSIS expressions, or push all this TSQL into a stored procedure.
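With the "query from variable" route, the parameter value ends up concatenated into the SQL text, so any single quotes in it must be doubled or the query breaks (and is open to injection). A Python sketch of the text the expression needs to produce; the query is a hypothetical variation on the sys.objects example above. In an SSIS expression the equivalent doubling can be done with REPLACE(@[User::SomeVar], "'", "''"), where User::SomeVar is a placeholder name.

```python
def sql_string_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(object_type: str) -> str:
    # Hypothetical query; in SSIS the same concatenation would live in the
    # expression of the variable used by "SQL command from variable".
    return ("SELECT object_id, name FROM sys.objects WHERE type = "
            + sql_string_literal(object_type) + ";")


if __name__ == "__main__":
    print(build_query("U"))
```

Pushing the T-SQL into a stored procedure avoids this quoting entirely, since the values then travel as real parameters.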

Generate UniqueIdentifier GUID in SSIS

I am using a GUID for a batch identifier in SSIS. My final output goes to SQL Server.
I know I can generate one in SQL Server with Select NewId() MyUniqueIdentifier, using a query in an Execute SQL task.
However, I'm looking to do this within the SSIS package itself, if possible, without SQL Server being available.
Can I generate a GUID within SSIS?
I had a similar problem. To fix it, I created an SSIS Script Component in which I added a "guid" output column. The C# script code was the following:
Row.guid = Guid.NewGuid();
Finally, I routed the output, as a new column, into my database's "OLE DB Destination" to generate a GUID for every new row.
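Note the granularity here: the question asks for a batch identifier (one value per run), whereas calling Guid.NewGuid() inside the Script Component's row processing produces a different GUID for every row. Python's uuid.uuid4() generates the same kind of random (version 4) GUID as .NET's Guid.NewGuid(); this illustrative sketch contrasts the two scopes (the function and row shape are hypothetical).

```python
import uuid


def tag_rows(rows, per_row=False):
    """Attach a GUID to each row under a 'guid' key.

    per_row=False: one GUID for the whole run (a batch identifier, like a
    package-level value generated once).
    per_row=True: a new GUID per row, like Guid.NewGuid() called once per
    row inside a Script Component.
    """
    batch_id = uuid.uuid4()
    return [dict(row, guid=(uuid.uuid4() if per_row else batch_id)) for row in rows]


if __name__ == "__main__":
    print(tag_rows([{"a": 1}, {"a": 2}]))
```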
Simply do it in an Execute SQL Task.
Open the task
Under General -> SQL Statement, enter your query Select NewID() MyID in the "SQLStatement" field
Under General -> Result Set, choose "Single row"
Under Parameter Mapping, nothing needs to be entered, because the query has no ? placeholders; the variable is filled from the result set in the next step.
Under Result Set, enter "MyID" for your Result Name and select your variable (User::myID) under Variable Name
Click OK
Done. Note that "MyID" is a value you can choose. EDIT: "User::myID" corresponds to the SSIS variable that you create.

SSIS: use a table to provide a single SQL Server 2012 query and DB2 query whose outputs will be inserted into a single SQL Server column

My best attempt at visualizing this:
ForEach row in dbo.runThese
**** Start Loop
(grab select statements from sql table)
dbo.runThese
Output:
ID db2_script sql_script
---------------------------------------------------------------------------
1 'select count(*) from db2_cstmr' 'select count(*) from sql_cstmr'
(Run each script on an individual connection to the DB2 and SQL Server database)
(create a combined string with each result)
149, 149
(Insert the combined results into a SQL Server table)
INSERT INTO dbo.storeResults
VALUES (149,149)
**** End Loop
I see three different ways to do this, but I'll provide the one I consider the most elegant. I will split the tasks based on their location within the package:
1. Variables
New variable "Statements" of Object data type, which will hold the list of DB2 and SQL Server statements
db2_script: String
sql_script: String
id: Int32
2. Control Flow
Execute SQL Task: Get all the records (sql statements) into the object variable using something like: SELECT id, db2_script, sql_script FROM dbo.StatementsToExecute. You need to set ResultSet property of the component to "Full result set" and configure the object variable in the Result Set pane
For Each Loop: using the object variable as enumerator (Foreach From Variable Enumerator) in the Collection pane, and assigning into db2_script, sql_script and id variables in the Variable Mapping pane
Data Flow Task (see next)
3. Data Flow
OLEDB Source for DB2 database: specify variable db2_script for source statement (Data access mode: SQL command from variable)
OLEDB Source for SQL Server database: specify variable sql_script for source statement (Data access mode: SQL command from variable)
Edit both sources with the Advanced Editor, go to the "Input and Output Properties" tab, click on "OLE DB Source Output" and set IsSorted=True, then click on "OLE DB Source Output"->"Output Columns"->db2_count/sql_count and set SortKeyPosition=1
MERGE: merge both sources into one single pipeline and two different output columns
OLE DB Destination: map db2_count and sql_count to the target columns
Note: you need to provide aliases for the counts in each SELECT statement (e.g. SELECT COUNT(*) AS db2_count FROM ...) because they give names to the columns in the Data Flow's pipeline. Alternatively, edit both sources in the Advanced Editor and give the columns ad-hoc names.
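The overall control flow (one iteration per row of the statements table, two single-value queries per iteration, one combined row out) can be sketched in Python. The two executor callables are stand-ins for the DB2 and SQL Server connections, and the fake results in the usage are made up; nothing here is SSIS API.

```python
from typing import Callable, Iterable, List, Tuple


def compare_counts(
    statements: Iterable[Tuple[int, str, str]],
    run_db2: Callable[[str], int],
    run_sql: Callable[[str], int],
) -> List[Tuple[int, int, int]]:
    """For each (id, db2_script, sql_script), run both scripts and collect
    the (id, db2_count, sql_count) row that would go into dbo.storeResults."""
    results = []
    for stmt_id, db2_script, sql_script in statements:
        results.append((stmt_id, run_db2(db2_script), run_sql(sql_script)))
    return results


if __name__ == "__main__":
    stmts = [(1, "select count(*) from db2_cstmr", "select count(*) from sql_cstmr")]
    fake_db2 = {"select count(*) from db2_cstmr": 149}
    fake_sql = {"select count(*) from sql_cstmr": 149}
    print(compare_counts(stmts, fake_db2.__getitem__, fake_sql.__getitem__))
```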

Create a copy of a table within the same database with SSIS

I want to create a copy of a table, say TestTable, with a new name, say TestTableNew, in the same database with the use of an SSIS package. I've created a "Transfer SQL Server Objects Task" for this with the source database specified as both the SourceDatabase and the DestinationDatabase. When I run this task, the original table TestTable is overwritten with a new, empty TestTable.
This might well be something really obvious that I've overlooked, but can I somehow specify another name for the destination table somewhere in this transfer task? Or should I solve this in another way?
You can't use the "Transfer SQL Server Objects Task" to copy a table to the same database because there isn't an option to specify the new table name. You would be copying table "TestTable" to table "TestTable", which will fail because they both have the same name.
You can set the "DropObjectsFirst" property to true, but that will make you lose your original table and its data, which I think you did on your test, otherwise you would have received a failure message.
The best option here is to use an "Execute SQL Task" to create the structure of TestTableNew based on TestTable, and then use a simple OLE DB Source -> OLE DB Destination data flow to load all the data from one table to the other.
My knowledge of SSIS is very limited, but I assume you can run SQL commands with parameters and therefore generate something like the following dynamically:
SELECT *
INTO TestTableNew
FROM TestTable;
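If the table names come from variables, building that statement by concatenation means quoting the identifiers. A Python sketch of the text an expression (or a dynamic SQL step) would need to produce; quote_name is a hypothetical helper that only imitates QUOTENAME's default bracket behaviour, and the dbo schema is an assumption. Keep in mind that SELECT ... INTO copies columns and data but not indexes, constraints or triggers.

```python
def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier like T-SQL QUOTENAME() does by default:
    wrap it in [ ] and double any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def copy_table_sql(source: str, target: str, schema: str = "dbo") -> str:
    # SELECT ... INTO creates the target table; it fails if the target already exists.
    return (f"SELECT * INTO {quote_name(schema)}.{quote_name(target)} "
            f"FROM {quote_name(schema)}.{quote_name(source)};")


if __name__ == "__main__":
    print(copy_table_sql("TestTable", "TestTableNew"))
```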
