Scenario
Note: I am using SQL Server 2017 Enterprise
I am looping through a list of databases and copying data into them from a single source database. That source database will only be accessed by the script (no other transactions will be made against it from anywhere else). The copying ranges from straight table-to-table copies to more complex, longer-running queries and stored procedures. All of this is done with SQL Server jobs calling procedures; I'm not using anything like SSIS.
Question
Instead of looping through all the databases and running the statements one at a time, I want to be able to run them in parallel. Is there an easy way to do this?
Options I've thought of:
Run each data transfer as a job and then start all the jobs at once. From my understanding they would be executed asynchronously, but I'm not 100% sure (see the sketch after this list).
Generate the SQL statements, write a script outside of SQL Server (e.g. PowerShell or Python), and run all the commands in parallel
Leverage SSIS
I prefer not to do this, since this would take too much work and I'm not very familiar with it. This may be used down the road though.
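For reference, a minimal sketch of what option 1 might look like, assuming the SqlServer PowerShell module is installed; the instance and job names are placeholders. msdb.dbo.sp_start_job returns as soon as the job is queued, so starting several jobs back to back runs them concurrently:

```powershell
# Sketch of option 1: each sp_start_job call returns immediately, so the
# SQL Server Agent jobs run in parallel on their own.
# 'MyServer' and the job names are placeholders for illustration.
Import-Module SqlServer

$jobs = @('Copy_To_Database1', 'Copy_To_Database2', 'Copy_To_Database3')

foreach ($job in $jobs) {
    Invoke-Sqlcmd -ServerInstance 'MyServer' -Database 'msdb' `
        -Query "EXEC dbo.sp_start_job @job_name = N'$job';"
}
```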
Use PowerShell...
Create a table on the central database to house instance / connection string details. (Remember to obfuscate for security)
Create another table to house the queries.
Create a third table to map Instance to Query.
In PowerShell, create a collection/list object deserialized from your data entries. Each element is made up of three properties: {Source / Destination / Query}
Write a method/function to carry out the ETL work: connect to the DB, read from the source, write to the destination.
Iterate over the collection using the Foreach -Parallel construct with your function nested within. This will open a separate SPID for each element in the collection and pass its values into your function, where the work will be carried out.
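A minimal sketch of the above, assuming PowerShell 7+ (where ForEach-Object -Parallel takes the place of the workflow Foreach -Parallel construct); the connection strings, query, and table names are placeholders you would deserialize from the mapping tables described above:

```powershell
# Sketch only: $work would normally be built from the instance / query /
# mapping tables rather than hard-coded. Requires PowerShell 7+ for
# ForEach-Object -Parallel; System.Data.SqlClient (or Microsoft.Data.SqlClient
# loaded via Add-Type) must be available.
$work = @(
    [pscustomobject]@{
        Source      = 'Server=CentralSrv;Database=SourceDb;Integrated Security=True'
        Destination = 'Server=Target1;Database=Db1;Integrated Security=True'
        Query       = 'SELECT * FROM dbo.SomeTable'
        DestTable   = 'dbo.SomeTable'
    }
    # ...one element per destination / query pair
)

$work | ForEach-Object -Parallel {
    # Each element runs in its own runspace with its own connections (SPIDs).
    $src = [System.Data.SqlClient.SqlConnection]::new($_.Source)
    $dst = [System.Data.SqlClient.SqlConnection]::new($_.Destination)
    try {
        $src.Open(); $dst.Open()

        $cmd             = $src.CreateCommand()
        $cmd.CommandText = $_.Query
        $reader          = $cmd.ExecuteReader()

        $bulk = [System.Data.SqlClient.SqlBulkCopy]::new($dst)
        $bulk.DestinationTableName = $_.DestTable
        $bulk.WriteToServer($reader)   # stream source rows straight to the destination
    }
    finally {
        $src.Dispose(); $dst.Dispose()
    }
} -ThrottleLimit 4   # cap how many copies run at once
```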
Related
First things first: I'm totally new to SSIS and trying to figure out its potential when it comes to ETL, with the eventual goal of moving on to SSAS. I have the following scenario:
I have an Intersystems database which I can connect to via ADO.NET
I want to take data from this db and make inserts into MS SQL through incremental loads
My proposed solution/target is:
Have a table in MS SQL that stores the last pointer read or a date/time snapshot (irrelevant at this stage). Let's keep it simple and say we are going to use the record ID that exists in the Intersystems database
Get the pointer from this table and use it as a parameter through ODBC to read the source database and then make inserts into the target MS SQL db
Update the pointer with the last record read so that next time we continue from there. (I don't want to get into the complications of updates/deletes. Let's keep it simple)
Progress so far:
I have succeeded in making a connection to MS SQL to read the pointer from there and place it in a variable
I have managed to use the [Execute SQL task] with parameters to read data from the Intersystems DB, and I'm placing that into a variable using a Full result set
I have managed to use the [ForEach Loop Container] using the [Foreach ADO Enumerator] to go through each record and each field (yeeeey!)
Now I can (theoretically) use a [Script task] that makes inserts into the MS SQL database using VB.NET code and then updates the counter with the last record read from the source database. I have spent endless hours looking for solutions using ODBC parameters, and the above is the only way forward I could see working.
My question is this:
Is this the only way, and is it best practice? Isn't there some easy way that I can plug this resultset into some dataflow components that do the inserts and update the record pointer for me?
Please assume that I do not have write access to the Intersystems DB, so I cannot make any changes to its table structures; I can only read data so that I can place it into MS SQL.
Over to you guys (or gals?)
I would suggest using a dataflow to improve your design for both efficiency (bulk loading vs row by row in script) and ease of use (no need for scripting).
You should use an Execute SQL task to get your pointer and save it into a variable.
You should then build a SQL string variable using dynamic SQL and the above variable.
Make a data connection to the source in the Connection Manager
Add a dataflow and go into it
Add a source component and select your source connection from the popup
Choose 'SQL command from variable' and select your variable
At this point you should have all the data you want and you can continue to transform or directly load to your target.
Edit: Record Pointer part
Add a multicast (this makes as many copies as you want)
Add an Aggregate object and take the MAX of whatever your pointer is
Add an OLE DB Command object (this allows live SQL and is used mainly for updates)
In it, use: UPDATE "YourPointerTable" SET "PointerField in DB" = ? (the ? is literally what you enter; it becomes a mappable parameter)
Map that parameter to the MAX value you created in the Aggregate step.
This will allow you to handle inserts/updates
From the Multicast, flow a new stream to a Lookup object and map your key to the key of the destination table
Specify that rows with no match redirect to the no-match output
Your matches map to an UPDATE
Your no matches map to an INSERT
What I want to do is build a dynamic data pull from different SQL source servers (Server1, Server2, Server3, etc.)
down to dynamic locations on my SQL Server (Dev, Prod) into databases (database1, database2, etc.).
The tables will be dropped and recreated each time the package is run, so that I am sure I match the source servers if they change anything on the source (field names, datatypes, lengths, etc.) and I will still get the data to extract.
I want to pull this down using a single dataflow in a foreach loop.
I have a table that has all the server names, databases, and tables in it,
and I want to loop through that table and pull all the rows of those tables down to my server (server1.database1.table_x, server5.database3.table_y, etc.) so that I don't have to build a new data flow for each table.
In order to do this I have already built the foreach loop with a SQL task that dumps results into an object. The foreach loop then takes that object, which has 7 different fields (Source_Server_Name, Source_Server_Type_Driver, Source_Database, Source_Table, Source_Where_Clause, Source_Connection_String, then destination stuff), and puts each of those fields into a different string variable for use inside the loop.
I can change the connections dynamically using the variables, but I can't figure out how to get the column mapping in the dataflow to function.
Is there some kind of script task I can use to edit the backend XML that will create the column mapping for me so the metadata does not error out? Any help would be greatly appreciated :-)
This is the best illustrated example I could find of what I am doing; just remember I need to have a different metadata setup for each table I pull down to my server.
http://sql-bi-dev.blogspot.com/2010/07/dynamic-database-connection-using-ssis.html
The solution I ended up using is BIML, which generates the package on the fly using dynamic SQL. Not pretty, but it works :-)
I have heard that it is possible to dynamically generate and publish the packages, but I would never go this route. I have done something similar using C# code, which can be run from an application, via SQL Agent, or from inside an SSIS package script task.
If you try this approach, look into SqlConnection and SqlCommand, and write code to build the SQL statements dynamically.
For example, run your CREATE TABLE statements using ExecuteNonQuery(), use a DataReader to read the source, and pass that reader to SqlBulkCopy to write to the destination.
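As a rough illustration of that idea (the answer above describes doing it in C#; the same .NET classes are called here from PowerShell, and every connection string, table, and column name is a placeholder):

```powershell
# $row stands in for one record from the control table of source/destination
# details; all connection strings and names below are placeholders.
$row = [pscustomobject]@{
    Source_Connection_String = 'Server=Server1;Database=database1;Integrated Security=True'
    Source_Table             = 'dbo.table_x'
    Source_Where_Clause      = ''
    Dest_Connection_String   = 'Server=DevServer;Database=database1;Integrated Security=True'
    Dest_Table               = 'dbo.table_x'
}

$src = [System.Data.SqlClient.SqlConnection]::new($row.Source_Connection_String)
$dst = [System.Data.SqlClient.SqlConnection]::new($row.Dest_Connection_String)
$src.Open(); $dst.Open()

# Build and run the drop/create statements dynamically with ExecuteNonQuery().
$ddl = $dst.CreateCommand()
$ddl.CommandText = "IF OBJECT_ID('$($row.Dest_Table)') IS NOT NULL DROP TABLE $($row.Dest_Table);
CREATE TABLE $($row.Dest_Table) (Id int NOT NULL, Name nvarchar(100) NULL); -- placeholder: generate from source metadata"
[void]$ddl.ExecuteNonQuery()

# Pipe a reader over the source table straight into SqlBulkCopy on the destination.
$cmd             = $src.CreateCommand()
$cmd.CommandText = "SELECT * FROM $($row.Source_Table) $($row.Source_Where_Clause)"
$reader          = $cmd.ExecuteReader()

$bulk = [System.Data.SqlClient.SqlBulkCopy]::new($dst)
$bulk.DestinationTableName = $row.Dest_Table
$bulk.WriteToServer($reader)

$src.Dispose(); $dst.Dispose()
```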
Is it possible to use the SSIS (SQL Server Integration Services) Transfer SQL Server Objects task to loop through a pre-made list of table names and then execute the task on just that one table (with a variable name, or whatever)?
I see the place where you can pre-select a bunch of tables to copy, but for the bigger picture of the overall automated process, logging, and logging information about backup events back into another table, I'd prefer to execute the task once on each table. How can I tell the task to do just that one table?
Do I have to change it via a script task at run time? Why doesn't it just have an expression that can be set from a variable, and a "single table" option?
Yes.
You can use a Foreach Loop with a 'Foreach ADO Enumerator'. You first need to save the SQL query result into an object variable (as a full result set) with an Execute SQL task. Following that, use the Foreach task. You may need to add some variables too, to realize the condition checking.
I found a wonderful script that collects all the (shared) datasources used on a reportserver:
LINK
I simply love this script.
However, I am looking for a way to execute this script on several report servers and add the results to a centralised table. That way my colleagues and I would be able to see pretty quickly what datasources are used.
I could place this script on each report server, collect the CSVs on a central server, and then use SSIS to insert them into an MSSQL table. That way I would have a nice central overview of all the used datasources.
However, I would prefer to have the script in one location and then execute that script on a list of servers.
Something like:
Loop through table with servers
execute script (see link)
insert resulting csv into central table (preferably skip this step, have script insert data in table directly)
next server
Any suggestions as to what the best approach would be? Should it be a web service task? A script task?
Something else completely?
The level of scripting in the mentioned script is right at the edge of what I understand, so if someone knows how to adapt the script in such a way that I could use it as input in a dataflow in SSIS, I would be very happy.
Thanks for thinking with me,
Henro
This script is called using a utility called rs.exe, so you would use an Execute Process task to call it. To avoid writing to a file, you could modify the script and have it insert the results into a table directly. The package could be set up as follows:
Create a foreach loop which iterates over a list or ADO.NET recordset of your servers
Put the server name in a variable
Create a variable for the arguments for the process task, referencing the server variable from step 2
Add a process task which uses the above argument and calls rs.exe
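For reference, a rough sketch of that loop as a plain PowerShell script (the server URLs and .rss script path are placeholders; in the SSIS version the loop becomes the Foreach Loop Container and the command line goes into the Execute Process task's executable and arguments):

```powershell
# Run the same .rss script against each report server with rs.exe.
# -i = input script file, -s = report server URL. Paths/URLs are placeholders.
$servers = @('http://reportserver1/ReportServer', 'http://reportserver2/ReportServer')
$script  = 'C:\Scripts\GetDataSources.rss'

foreach ($server in $servers) {
    & rs.exe -i $script -s $server
}
```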
Due to an employee quitting, I've been given a project that is outside my area of expertise.
I have a product where each customer will have their own copy of a database. The UI for creating the database (licensing, basic info collection, etc.) is being outsourced, so I was hoping to just have a single stored procedure they can call, providing a few parameters, and have the SP create the database. I have a script for creating the database, but I'm not sure of the best way to actually execute it.
From what I've found, this seems to be outside the scope of what an SP can easily do. Is there any sort of "best practice" for handling this sort of program flow?
Generally speaking, SQL scripts - both DML and DDL - are what you use for database creation and population. SQL Server has a command line interface called SQLCMD that these scripts can be run through - here's a link to the MSDN tutorial.
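For example, a minimal sketch of running such a creation script through SQLCMD, passing the new database name in as a scripting variable (the instance name, script path, and variable name are placeholders):

```powershell
# Run the database-creation script via sqlcmd:
#   -S = instance, -E = Windows auth, -i = input script, -v = scripting variable
$newDb = 'CustomerA_Db'
& sqlcmd -S 'MyServer' -E -i 'C:\Scripts\CreateCustomerDatabase.sql' -v DatabaseName="$newDb"

# Inside CreateCustomerDatabase.sql the variable would be referenced as, e.g.:
#   CREATE DATABASE [$(DatabaseName)];
```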
Assuming there's no customization to the tables or columns involved, you could get away with using either detach/attach or backup/restore. These would require that a baseline database exist with no customer data. Then you use either of those methods to capture the database as-is. Backup/restore is preferable because detach/attach requires taking the database offline. But users need to be synced before they can access the database.
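A minimal sketch of the backup/restore route, assuming a baseline backup already exists; the database name, file paths, and logical file names are placeholders and must match your baseline database:

```powershell
# Restore the baseline (no customer data) backup under the new customer
# database name. The MOVE clauses must use the baseline's logical file names.
$newDb = 'CustomerA_Db'
Invoke-Sqlcmd -ServerInstance 'MyServer' -Query @"
RESTORE DATABASE [$newDb]
FROM DISK = N'C:\Backups\BaselineProduct.bak'
WITH MOVE N'BaselineProduct'     TO N'C:\Data\$newDb.mdf',
     MOVE N'BaselineProduct_log' TO N'C:\Data\$newDb.ldf';
"@
```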
If you have the script to create the database, it is easy for them to use it within their program. If you have any specific prerequisites for creating the database and setting permissions accordingly, you can wrap all the scripts up into one script file to execute.