Import data from multiple MySQL data sources via SSIS? - sql-server

I have a particularly challenging situation that I could use some assistance with.
I work for a manufacturing facility and am working on a proof of concept.
I have a number of client devices (PIs) fixed to manufacturing equipment, all collecting data from the equipment and storing this data locally within an embedded MySQL database on the device. I would like to import the data from each of the devices, into a central Microsoft SQL Data Warehouse. I would prefer this to be pulled from the devices by the server, rather than being pushed from the client devices.
I would then like the embedded database on the device to be updated / purged, to prevent the same data from being resent (initially I was thinking a date field in a table which I just timestamp once that record has been copied).
My feeling is that an SSIS package would be the way to go here. I have IP addresses and connection information for the PIs in a table within the DW, so I would like to connect to each client in turn to import the data and update it.
Is there a way to change a connection string on the fly within SSIS? OR would there be a better way to achieve this - maybe via a sproc on the DW?
I'm ok with sprocs, but very new to SSIS. If you have any links/tutorials/posts that may help, please share. Thanks.
EDIT: This is what I already have
Here are my variables:
As you can see it is showing an error when attempting to run on the first step.
Also, FWIW, here's the progress output...

Is there a way to change a connection string on the fly within SSIS?
Use a variable to store the connection string, and use that variable to populate the Expression value of the connection string. Then when you change the value of the variable, you will change the value of the connection string.
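As a rough illustration of that idea outside SSIS, here is a minimal Python sketch of a connection string that is rebuilt whenever its input values change. The template, the ODBC driver name, and the field names are assumptions for illustration only; in SSIS the equivalent would be an expression on the connection manager's ConnectionString property.

```python
# Hypothetical template: the driver name and keys are assumptions, not from the question.
# Doubled braces produce literal braces in str.format().
CONNECTION_TEMPLATE = (
    "Driver={{MySQL ODBC 8.0 Unicode Driver}};"
    "Server={ip};Database={database};Uid={user};Pwd={password};"
)


def build_connection_string(ip, database, user, password):
    """Return a connection string for one device, like an SSIS expression would."""
    return CONNECTION_TEMPLATE.format(
        ip=ip, database=database, user=user, password=password
    )


# Changing the "variable" (here, the ip argument) changes the connection string.
first = build_connection_string("192.168.1.20", "plant", "reader", "secret")
second = build_connection_string("192.168.1.21", "plant", "reader", "secret")
```

Only the Server part differs between `first` and `second`, which is the behaviour you want from a variable-driven connection manager.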

This isn't a complete answer, but the approach looks something like this:
A) Create a table holding all the IP addresses and connection strings.
B) In SSIS, create a variable for each property, e.g. a variable IPAddress.
C) Create an Execute SQL Task that selects from that table; set ResultSet to Full result set.
On the Result Set tab, add a result with Result Name 0 and Variable Name Rows.
D) Create another variable, Rows, with data type System.Object.
E) Add a Foreach Loop Container using the Foreach ADO Enumerator, with Rows as the source variable.
On the Variable Mappings tab, map IPAddress.
F) Create the source Connection Manager.
Use an expression to set its connection string from your variables.
G) Add a Data Flow Task inside the loop and fetch the data from each connection.
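The loop in steps A to G, combined with the "timestamp the copied rows" idea from the question, can be sketched in plain Python. This is only a model of the logic, not an SSIS package: in-memory SQLite databases stand in for both the devices' MySQL stores and the warehouse, and the table names, column names (`readings`, `copied_at`, `warehouse_readings`), and device names are all made up for the example.

```python
import sqlite3


def make_device(readings):
    """Create an in-memory database standing in for one device's embedded store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL, copied_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)", [(v,) for v in readings]
    )
    conn.commit()
    return conn


def pull_from_device(device_name, device, warehouse):
    """Copy rows not yet marked as copied, then stamp exactly those rows."""
    rows = device.execute(
        "SELECT id, value FROM readings WHERE copied_at IS NULL"
    ).fetchall()
    warehouse.executemany(
        "INSERT INTO warehouse_readings (device, source_id, value) VALUES (?, ?, ?)",
        [(device_name, row_id, value) for row_id, value in rows],
    )
    warehouse.commit()
    # Stamp by id so rows inserted on the device during the copy are not skipped.
    device.executemany(
        "UPDATE readings SET copied_at = datetime('now') WHERE id = ?",
        [(row_id,) for row_id, _ in rows],
    )
    device.commit()
    return len(rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse_readings (device TEXT, source_id INTEGER, value REAL)"
)

# Stand-in for the table of devices kept in the warehouse (step A).
devices = {"pi-01": make_device([1.5, 2.5]), "pi-02": make_device([3.0])}

# Stand-in for the Foreach Loop (steps E to G), run twice to show nothing is resent.
first_pass = [pull_from_device(name, conn, warehouse) for name, conn in devices.items()]
second_pass = [pull_from_device(name, conn, warehouse) for name, conn in devices.items()]
total = warehouse.execute("SELECT COUNT(*) FROM warehouse_readings").fetchone()[0]
```

Stamping the copied rows by id, rather than updating every unstamped row, is a deliberate choice: a blanket update could mark rows that arrived on the device after the SELECT ran.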

Related

SSIS - importing identical data from multiple databases

I want to copy and merge data from tables with identical structure (in a number of different source databases) to a single table of similar structure in a destination database. From time to time I need to add or remove a source database.
This is currently achieved using a Data Flow Task containing an OLEDB source with a SQL query within which there is a UNION for each of the databases I am extracting from. There is quite a lot of SQL within each UNION so, if I need to add fields, I need to add the same additional SQL to each UNION. Similarly, when I add or remove a source database I need to add or remove a UNION.
I was hoping that, rather than use such a UNION with a lot of duplicated code, I could instead use a Foreach Loop Container that executes SQL contained in a variable, using parameters to substitute the name of the database and other database-dependent items within the SQL on each iteration. However, I hit problems with that; I assume the Data Flow Task within the loop could not interpret the incoming fields because of the use of what is effectively dynamic SQL.
Any suggestions as to how I might best achieve this without duplicating a lot of SQL?
It sounds like you have your loop figured out for moving from database to database. As long as the table schemas are identical (other than names as noted) from database to database, this should work for you.
Inside the For Each Loop container, create either a Script Task or an Execute SQL Task, whichever you're more comfortable working with.
Use that task to dynamically generate the SQL of your OLE DB Source query, changing the Customer Code prefix for each iteration. Assign the SQL text to a variable, either directly in a Script Task, or by assigning the Result Set of your Execute SQL Task (the result set being the query text) to a variable.
Inside your Data Flow Task, in the OLE DB Source, under Data Access Mode select "SQL Command from variable". Select the variable that you populated with your query in the last task.
You'll also need to handle changing the connection string between iterations, but, again, it sounds like you have a handle on that part already.
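To make the "generate the query text per iteration" step concrete, here is a small Python sketch of what the Script Task would be doing. The table name, column names, and database names are hypothetical; the only point is that one template produces the SQL for every source database, so the column list lives in a single place.

```python
# Hypothetical query template: table and column names are invented for illustration.
QUERY_TEMPLATE = (
    "SELECT '{database}' AS SourceDatabase, CustomerId, CustomerName "
    "FROM [{database}].dbo.Customers"
)


def build_source_query(database):
    """Return the source query for one database, rejecting unsafe names."""
    # The name is spliced into the SQL text, so refuse characters that would
    # break out of the bracket or string quoting.
    if not database or "]" in database or "'" in database:
        raise ValueError("unsafe database name: %r" % (database,))
    return QUERY_TEMPLATE.format(database=database)


# Stand-in for the list the Foreach Loop iterates over.
source_databases = ["SalesEast", "SalesWest"]
queries = [build_source_query(name) for name in source_databases]
```

Because every generated query returns the same columns, the Data Flow Task's metadata stays valid from one iteration to the next.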

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is the data import from an external database via a SQL command embedded into an ADO.NET Source Data Flow Source. Whenever I make even the slightest adjustment to the query (such as changing a column name) it takes ages (in that case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table. The first part of my package executes a SELECT and stores the query returned from the table in a package variable, which is then used by the ADO.NET Source in the Data Flow. For the default value of the variable in the package, I use the same query that is stored in the database with "WHERE 1=2" appended. At design time the query still executes, but it returns only the column metadata and no rows. Let me know if you have any questions.
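The effect of the "WHERE 1=2" default can be seen in a small Python sketch. SQLite stands in for the external database and the `orders` table is invented for the example; it also assumes the stored query has no WHERE clause of its own, otherwise the predicate would have to be added with AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "globex", 20.0)],
)

# The query as it would be stored in the table and used at run time.
source_query = "SELECT order_id, customer, amount FROM orders"
# The design-time default: same shape, but the predicate can never be true.
design_time_query = source_query + " WHERE 1=2"

cursor = conn.execute(design_time_query)
# Column metadata is available even though no row will be returned.
design_columns = [column[0] for column in cursor.description]
design_rows = cursor.fetchall()

runtime_rows = conn.execute(source_query).fetchall()
```

The design-time query exposes the same three column names as the real one while returning nothing, which is all the validation step needs.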

How can I use a variable in a data source?

I am transferring a very large table that has a column called EndOfSessionTime and I would like to get a var LastSess=MAX(EndOfSessionTime) from the destination and transfer only rows WHERE EndOfSessionTime > LastSess.
I set the variable using a ScriptTask in the control flow, but I can't seem to find a straight-forward way to use the variable in a data source.
2 options:
1) Stuff your entire SQL Query into a variable, and in the OLEDB Data Source, choose "SQL Command From a Variable"
2) Use parameters in your SQL query. Indicate a placeholder for a parameter with a Question Mark character, and then hop over to the Parameters tab to assign the variable to the parameter. Google "SSIS Data Source Parameters" for tutorials and examples.
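Option 2 can be sketched in Python, where sqlite3 happens to use the same question-mark placeholder. SQLite stands in for both the source and the destination, and the session rows and timestamps are invented sample data; only the EndOfSessionTime column and the LastSess idea come from the question.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sessions (session_id INTEGER, EndOfSessionTime TEXT)")
source.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (2, "2024-01-02 10:00:00"),
        (3, "2024-01-03 10:00:00"),
    ],
)

destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE sessions (session_id INTEGER, EndOfSessionTime TEXT)")
destination.execute("INSERT INTO sessions VALUES (1, '2024-01-01 10:00:00')")


def transfer_new_sessions(source, destination):
    """Copy only rows newer than the destination's latest EndOfSessionTime."""
    last_sess = destination.execute(
        "SELECT MAX(EndOfSessionTime) FROM sessions"
    ).fetchone()[0]
    if last_sess is None:
        # Empty destination: MAX() is NULL, so use a value that sorts before any timestamp.
        last_sess = ""
    new_rows = source.execute(
        "SELECT session_id, EndOfSessionTime FROM sessions "
        "WHERE EndOfSessionTime > ? ORDER BY EndOfSessionTime",
        (last_sess,),  # the variable bound to the ? placeholder
    ).fetchall()
    destination.executemany("INSERT INTO sessions VALUES (?, ?)", new_rows)
    destination.commit()
    return [row[0] for row in new_rows]


first_run = transfer_new_sessions(source, destination)
second_run = transfer_new_sessions(source, destination)
```

Binding the value as a parameter, rather than concatenating it into the SQL text, avoids quoting and date-format problems, which is the main argument for option 2 over option 1.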
Using connection parameters is a must in SSIS, but I had a heck of a time finding out how. Microsoft made up the word "Parametrization", and using it as a search term seems to be the only way to find the instructions.
The connection manager names at the bottom of the SSIS designer are actual objects. Right-clicking on one and selecting "Parametrization" is how connection parameters are set. One more note on that: there is no "Apply" button on this dialog, so set one parameter at a time, click OK, then right-click and parametrize again. If you set a parameter and go on to the next one without clicking OK, the first one will be lost.

How do I pass value to a stored procedure parameter in OLE DB Source component?

I am working with SSIS 2008. I have a select query named sqlquery1 that returns some rows:
aq
dr
tb
This query is not implemented on the SSIS at the moment.
I am calling a stored procedure from an OLE DB Source within a Data Flow Task. I would like to pass the data obtained from the query to the stored procedure parameter.
Example:
I would like to call the stored procedure by passing the first value aq
storedProdecure1 'aq'
then pass the second value dr
storedProdecure1 'dr'
I guess it would be something like a cycle. I need this because the data generated by the OLE DB Source through the stored procedure needs to be sent to another destination and this must be done for each record of the sqlquery1.
I would like to know how to call the query sqlquery1 and pass its output to call another stored procedure.
How do I need to do this in SSIS?
Conceptually, your solution will execute your source query to generate a result set and store that in a variable. You'll then iterate through those results and, for each row, call your stored procedure with that row's value and send the results into a new Excel file.
I'd envision your package looking something like this
An Execute SQL Task, named "SQL Load Recordset", attached to a Foreach Loop Container, named "FELC Shred Recordset". Nested inside there I have a File System Task, named "FST Copy Template" which is a precedence for a Data Flow Task, named "DFT Generate Output".
Set up
As you're a beginner, I'm going to try and explain in detail. To save yourself some hassle, grab a copy of BIDSHelper. It's a free, open source tool that improves the design experience in BIDS/SSDT.
Variables
Click on the background of your Control Flow. With nothing selected, right-click and select Variables. In the new window that pops up, click the button that creates a New Variable 4 times. The reason for clicking on nothing is that until SQL Server 2012, the default behaviour of variable creation is to create them at the scope of the current object. This has resulted in many lost hairs for new and experienced developers alike. Variable names are case sensitive so be aware of that as well.
Rename Variable to RecordSet. Change the data type from Int32 to Object.
Rename Variable1 to ParameterValue. Change the data type from Int32 to String.
Rename Variable2 to TemplateFile. Change the data type from Int32 to String. Set the value to the path of your template Excel file. I used C:\ssisdata\ShredRecordset.xlsx
Rename Variable3 to OutputFileName. Change the data type from Int32 to String. Here we're going to do something slightly advanced. Click on the variable and hit F4 to bring up the Properties window. Change the value of EvaluateAsExpression to True. In Expression, set it to "C:\\ssisdata\\ShredRecordset." + @[User::ParameterValue] + ".xlsx" (or whatever your file and path are). This configures the variable to change as the value of ParameterValue changes, which helps ensure we get a unique file name. You're welcome to change the naming convention as needed. Note that you need to escape the \ any time you are in an expression.
Connection Managers
I have made the assumption you are using an OLE DB connection manager. Mine is named FOO. If you are using ADO.NET the concepts will be similar but there will be nuances pertaining to parameters and such.
You will also need a second Connection Manager to handle Excel. If SSIS is temperamental about data types, Excel is flat out psychotic-stab-you-in-the-back-with-a-fork-while-you're-sleeping about data types. We're going to wait and let the data flow actually create this Connection Manager to ensure our types are good.
Source Query to Result Set
The SQL Load Recordset is an instance of the Execute SQL Task. Here I have a simple query to mimic your source.
SELECT 'aq' AS parameterValue
UNION ALL SELECT 'dr'
UNION ALL SELECT 'tb'
What's important to note on the General tab is that I have switched my ResultSet from None to Full result set. Doing this makes the Result Set tab go from being greyed out to usable.
You can observe that I have assigned the Variable Name to the variable we created above (User::RecordSet) and that the Result Name is 0. That is important, as the default value, NewResultName, doesn't work.
FELC Shred Recordset
Grab a Foreach Loop Container and we will use that to "shred" the results that were generated in the preceding step.
Configure the enumerator as a Foreach ADO Enumerator. Use User::RecordSet as your ADO object source variable. Select "Rows in the first table" as your Enumeration mode.
On the Variable Mappings tab, you will need to select your variable User::ParameterValue and assign it the Index of 0. This will result in the zeroth element of each row in your recordset object being assigned to the variable ParameterValue. It is important that you have data type agreement, as SSIS won't do implicit conversions here.
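What the Foreach Loop does with the recordset can be modelled in a few lines of Python. This is only an analogy for the control flow: SQLite runs the sample query, `row[0]` plays the role of the Index 0 mapping, and the helper name `output_file_name` is hypothetical. The file path follows the expression used for OutputFileName above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for "SQL Load Recordset": the full result set stored in an object variable.
record_set = conn.execute(
    "SELECT 'aq' AS parameterValue "
    "UNION ALL SELECT 'dr' "
    "UNION ALL SELECT 'tb'"
).fetchall()


def output_file_name(parameter_value):
    """Mirror the OutputFileName expression: one file name per parameter value."""
    return "C:\\ssisdata\\ShredRecordset." + parameter_value + ".xlsx"


output_files = []
for row in record_set:
    # Index 0 of the current row is assigned to ParameterValue ...
    parameter_value = row[0]
    # ... and OutputFileName is re-evaluated because it depends on ParameterValue.
    output_files.append(output_file_name(parameter_value))
```

Each pass through the loop yields a distinct file name, which is what lets the later tasks write one workbook per parameter value.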
FST Copy Template
This is a File System Task. We are going to copy our template Excel file so that we have a well-named output file (one that has the parameter value in its name). Configure it as follows:
IsDestinationPathVariable: True
DestinationVariable: User::OutputFileName
OverwriteDestination: True
Operation: Copy File
IsSourcePathVariable: True
SourceVariable: User::TemplateFile
DFT Generate Output
This is a Data Flow Task. I'm assuming you're just dumping results straight to a file so we'll just need an OLE DB Source and an Excel Destination
OLEDB dbo_storedProcedure1
This is where your data is pulled from your source system with the parameter we shredded in the Control Flow. I am going to write my query in here and use the ? to indicate it has a parameter.
Change your Data access mode to "SQL Command" and in the SQL command text that is available, put your query
EXECUTE dbo.storedProcedure1 ?
I click the Parameters... button and fill it out as shown
Parameters: @parameterValue
Variables: User::ParameterValue
Param direction: Input
Connect an Excel Destination to the OLE DB Source. Double-click it and, in the Excel Connection Manager section, click New... Determine whether you need the 2003 or 2007 format (.xls vs .xlsx) and whether you want your file to have header rows. For your File Path, put in the same value you used for your @[User::TemplateFile] variable and click OK.
We now need to populate the name of the Excel Sheet. Click that New... button and it may bark that there is not sufficient information about mapping data types. Don't worry, that's semi-standard. It will then pop up a table definition something like
CREATE TABLE `Excel Destination` (
`name` NVARCHAR(35),
`number` INT,
`type` NVARCHAR(3),
`low` INT,
`high` INT,
`status` INT
)
The "table" name is going to be the worksheet name, or precisely, the named data set in the worksheet. I made mine Sheet1 and clicked OK. Now that the sheet exists, select it in the drop down. I went with the Sheet1$ as the target sheet name. Not sure if it makes a difference.
Click the Mappings tab and things should auto-map just fine so click OK.
Finally
At this point, if we ran the package it would overwrite the template file every time. The secret is we need to tell that Excel Connection Manager we just made that it needs to not have a hard coded name.
Click once on the Excel Connection Manager in the Connection Managers tab. In the Properties window, find the Expressions section and click the ellipses ... Here we will configure the Property ExcelFilePath and the Expression we will use is
@[User::OutputFileName]
If your icons and such look different, that's to be expected. This was documented using SSIS 2012. Your work flow will be the same in 2005 and 2008/2008R2 just the skin is different.
If you run this package and it doesn't even start and there is an error about the ACE 12 or Jet 4.0 something not available, then you are on a 64bit machine and need to tell BIDS/SSDT that you want to run in 32 bit mode.
Ensure the Run64BitRuntime value is False. This project setting can be found by right clicking on the project, expand the Configuration Properties and it will be an option under Debugging.
Further reading
A different example of shredding a recordset object can be found on How to automate the execution of a stored procedure with an SSIS package?

What's the best way to allow queries/views to see how tables looked in the past based on a datetime stamp in SQL Server 2008?

Scenario: We have a great deal of server environmental information (names, IPs, roles, firewall ports, etc.). Currently it's stored in a big Excel workbook on a SharePoint, which trivially allows for side-by-side comparisons of past versions of the data with current, for example.
We're planning to move it into a SQL Server 2008 database to make it easier for tools/automation to tap into as well as for better reporting. However, as you'd expect, one of the requirements given was that the admins would like to be able to see how an environment looked at some point in the past. Some piece of magic like: sp_getEnvironmentsAsOf('PERF1', '2009-11-14 00:00:00') and suddenly all the data that was current as of 11/14/09 is returned.
I'm looking into SQL Server 2008 Change Tracking and Change Data Capture, but none of the scenarios and examples seem to relate to the specific requirement of seeing the data in the tables as they were at some arbitrary point in the past.
Is CT/CDC apropos? And what are the other options, beyond rolling my own solution out of ticky-tacky and hope?
You should design your schema to track these changes instead of relying on a DBMS feature. Something like:
Devices
Id
Description
Serial number
Some immutable properties
Properties
Id
Description
Device-Properties
DeviceId
PropertyId
Value
TimeStamp
You never update or delete Device-Properties, you only add rows with a new timestamp.
Sample Data:
Devices
1,Server A,1123123
2,Server B,1323454
Properties
1,IP Address
2,Location
3,Role
Device-Properties
1,1,192.168.0.10,2010-02-12
1,2,Rack D,2010-02-12
1,3,Proxy,2010-02-12
2,1,192.168.0.105,2010-02-12
2,2,Rack C,2010-02-12
2,3,Mail server,2010-02-12
1,1,192.168.0.11,2010-02-15
In the sample data, Server A IP address was changed from 192.168.0.10 to 192.168.0.11 on 2010-02-15
You can construct views or stored procedures to join and filter the data as needed.
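One way such a view or procedure could filter the data is sketched below in Python, with SQLite standing in for SQL Server. It loads the sample data from this answer and returns, for each device and property, the latest value recorded on or before a given date. The table is named `DeviceProperties` here (an assumption, to avoid the hyphen), and the function name `environment_as_of` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (Id INTEGER PRIMARY KEY, Description TEXT, SerialNumber TEXT);
CREATE TABLE Properties (Id INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE DeviceProperties (
    DeviceId INTEGER, PropertyId INTEGER, Value TEXT, TimeStamp TEXT
);
INSERT INTO Devices VALUES (1, 'Server A', '1123123'), (2, 'Server B', '1323454');
INSERT INTO Properties VALUES (1, 'IP Address'), (2, 'Location'), (3, 'Role');
INSERT INTO DeviceProperties VALUES
    (1, 1, '192.168.0.10',  '2010-02-12'),
    (1, 2, 'Rack D',        '2010-02-12'),
    (1, 3, 'Proxy',         '2010-02-12'),
    (2, 1, '192.168.0.105', '2010-02-12'),
    (2, 2, 'Rack C',        '2010-02-12'),
    (2, 3, 'Mail server',   '2010-02-12'),
    (1, 1, '192.168.0.11',  '2010-02-15');
""")

# For each device/property pair, keep only the row with the newest
# timestamp that is not later than the requested point in time.
AS_OF_QUERY = """
SELECT d.Description, p.Description, dp.Value
FROM DeviceProperties AS dp
JOIN Devices AS d ON d.Id = dp.DeviceId
JOIN Properties AS p ON p.Id = dp.PropertyId
WHERE dp.TimeStamp = (
    SELECT MAX(older.TimeStamp)
    FROM DeviceProperties AS older
    WHERE older.DeviceId = dp.DeviceId
      AND older.PropertyId = dp.PropertyId
      AND older.TimeStamp <= ?
)
ORDER BY d.Description, p.Description
"""


def environment_as_of(as_of):
    """Return (device, property, value) rows as they stood on the given date."""
    return conn.execute(AS_OF_QUERY, (as_of,)).fetchall()


before_change = environment_as_of("2010-02-14")
after_change = environment_as_of("2010-02-15")
```

With the sample data, `before_change` still lists 192.168.0.10 for Server A while `after_change` lists 192.168.0.11. In SQL Server the same correlated subquery could sit inside a stored procedure that takes the date as a parameter.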
CDC is appropriate but you might also want to look at AutoAudit on CodePlex.
What comes to mind is the database snapshot feature (a snapshot shows you the state of your database at the time the snapshot was created). However, a snapshot looks like a different database (so you would query something like 'MyDBSnapshot_DATE' instead of 'MyDB'), and it definitely takes resources to track changes.
Other option is ... do it yourself.
