How to use a centralized SSIS data flow and modularize SSIS packages

Perhaps I am just too new to SSIS or have not really understood the basic concept, but I am a programmer who likes to reuse as much as possible.
We have several SSIS projects that have many things in common. For example, we have programmed a flow that handles errors in a specific way. Today we copy and paste this flow into each new project. It would be more convenient to refer to an external project or package, so that we can enhance the error handler centrally instead of copying it into each and every project. We are thinking of something like the good old DLL concept.
The only ways we have found so far are to exchange data via DB tables or to use real external libraries. We would prefer to use as much built-in functionality as possible.
I have not found any literature or tutorial on modularizing SSIS projects. It would be great to see the best practice here.

There is no way to reuse C#/VB code written in the Script Tasks of an SSIS package, even inside the same package.
Instead, you can put the code into a DLL (assembly) and install it into the GAC.
You can then add it as a reference to the project inside a Script Task and use your classes.
This is the only way to reuse code in SSIS.

On modularizing SSIS packages, you have several possibilities:
Pack the defined actions into an SSIS Custom Task or Custom Transformation, which is a specific kind of DLL, and then reference it in your SSIS packages. Once you have changed your custom component and installed the updated version, the next package runs will use the updated logic. Complexity: high; you have to implement specific interfaces and develop a UI to be used inside the Visual Studio designer.
Code-driven SSIS package generation. You define the SSIS package logic in code and then use that code to generate the SSIS packages. This can be done with the C# ManagedDTS classes (see examples), with the EzAPI classes, which provide some abstraction over ManagedDTS, or with the BIML script language.
Complexity: medium/high; you have to generate the SSIS package either in C#, after modelling it in Visual Studio, or in the BIML script language.
Package generation is practical if you have a multitude of similar packages. Otherwise it does not pay off; a single package can be created in Visual Studio.
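To make the package-generation idea concrete, here is a minimal sketch that emits a Biml document with one package per table. The function name, the table names, the `Load_`/`Truncate_` naming scheme and the `Staging` connection are all hypothetical, and the connection is assumed to be declared elsewhere in the Biml project. It is written in Python only to show the shape of the generated XML; in practice BimlScript or ManagedDTS would do this job.

```python
import xml.etree.ElementTree as ET

# Namespace used by Biml documents.
BIML_NS = "http://schemas.varigence.com/biml.xsd"


def build_biml(table_names, connection_name="Staging"):
    """Return a Biml document (as a string) with one package per table.

    Each package holds a single Execute SQL task; the connection name is a
    placeholder that must match a connection defined elsewhere.
    """
    ET.register_namespace("", BIML_NS)

    def tag(name):
        return f"{{{BIML_NS}}}{name}"

    root = ET.Element(tag("Biml"))
    packages = ET.SubElement(root, tag("Packages"))
    for table in table_names:
        package = ET.SubElement(
            packages, tag("Package"),
            Name=f"Load_{table}", ConstraintMode="Linear",
        )
        tasks = ET.SubElement(package, tag("Tasks"))
        sql_task = ET.SubElement(
            tasks, tag("ExecuteSQL"),
            Name=f"Truncate_{table}", ConnectionName=connection_name,
        )
        direct_input = ET.SubElement(sql_task, tag("DirectInput"))
        direct_input.text = f"TRUNCATE TABLE dbo.{table};"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_biml(["Customer", "Orders"]))
```

The shared logic (here just one task) lives in a single place, so changing it and regenerating updates every package, which is the payoff when there are many similar packages.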
Why SSIS is not modular: my two cents is that it was designed for non-programmers who can draw data flows in a designer. This can be done on an ad hoc basis, quickly, and by less experienced staff. Besides, SSIS was created in the early 2000s.
Nowadays the approach has changed and we talk about CI/CD etc., but the SSIS concept stays the same.

Related

SSIS - export to excel through open XML

Previously we used the Microsoft Jet OLE DB provider in an SSIS package. After a recent update from Microsoft, we are facing issues with the package, so we have decided to export the data to Excel using Open XML. What would be the best approach for the implementation, given that we are still using the .xls (97-2003) format?
Note: We have already tried the Microsoft Access Database Engine 2010 Redistributable.
From my point of view, you have the following options (both involve the Script Task, unfortunately):
Call a REST API and create the document there (using the Open XML SDK). This is easy to develop, support and deploy.
Use the Open XML SDK directly in the Script Task.
I would recommend following the first approach, but it all depends on your system.
UPDATE:
For the first option, you have to develop a small Web API service. Here is the link with an example in C#.
For the second option, in order to use external DLLs such as Open XML, you have to register them in the GAC (if the installer doesn't). Here is the link with an example of using external libraries.
If you are going to follow this option, I would recommend developing a DLL that works with Open XML directly and exposes a simple API for calling it from the SSIS Script Task. You register your DLL in the GAC and reference it from the Script Task. This will help you avoid a number of debugging issues.

How can I export an SSIS diagram as part of my SSIS build step?

Is it even possible to programmatically export an SSIS package's flow diagrams from outside of Visual Studio?
We're setting up our SSIS project for automatic builds inside a TeamCity server using devenv.exe (per this walkthrough). I'd like to make a build step that exports the Control and/or Data Flow diagrams.
Thanks ahead of time for any advice. All the responses I see when I search the web are suggestions to just screencap inside VS :/
There's nothing built in or easy that does this.
If you're feeling ambitious you could write a script task that does it.
You can do pretty much anything with a script task.

Committing Stored Procedures to SVN Repository

My current development environment for C# projects is Visual Studio, with a SQL Server database and VisualSVN to connect to my SVN repository. To manage revisions of my stored procedures, views, etc., I save the ALTER script to a folder watched by my SVN client so these get included in the repository.
I have checked out some (now older) posts like this one (How to keep Stored Procedures and other scripts in SVN/Other repository? and Is there a SVN plugin for SQL Server Management Studio 2005 or 2008?) and have seen a recommendation for these tools: http://www.red-gate.com/products/sql-development/sql-source-control/ and http://www.zeusedit.com/agent/ssms/ms_ssms.html .
As I infrequently work with projects doing much DB-side programming, this has never been a major bother (a dozen scripts in a folder with some naming scheme is not much to manage manually), but I have just inherited a project with a few hundred views and 1000+ Stored Procedures which have never been included in version control.
My question is:
What process do others follow for managing the versioning of their SQL Server code? Is there an accepted, clever or otherwise obvious approach I am missing here? I am currently leaning towards purchasing one of the aforementioned tools, but am looking for advice from the community before I do this.
I realize this may result in a tool recommendation rather than a code solution but posted to SO as I think this is the appropriate crowd to ask this of.
I would recommend you go with something like the Redgate tool and treat any SQL database the same way you'd treat your C# source code; manually keeping track of the ALTER statements will trip you up sooner or later as the number of modifications grows. I can't speak for the Zeus edit tool, but having used the Redgate one, it "just works". Another benefit of a tool like this is that it can manage your migration scripts: you can make a bunch of changes on your development version and then generate a single update script to update your testing database, etc., including data changes, which are IMHO the biggest PITA to manage manually.
The other thing to consider: even if changes are infrequent and you get away with manually tracking the ALTER statements, what if someone else ends up working on the same project? Now you have another potential source of mismanaged change scripts.
Anyway, do let us know how you get on and best of luck with it!
I’ve been maintaining a database with around 800+ db objects in it. We've always just scripted the database objects to a svn-watched folder as you describe. We have had some issues with this method, mostly with people forgetting to script new or modified objects. At the end of the day it hasn't been a huge problem for our project, but yours may be different.
We’ve looked into a couple tools, but they always assume you are starting from scratch, and we have almost 10 years of history we’d like to preserve. In the end we just end up settling back into our text-based manual solution. It's cheap and easy.
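As a rough illustration of the text-based approach, the sketch below writes one `.sql` file per database object into a folder tree that a version control client could watch. It assumes the rows have already been fetched, for example from `sys.sql_modules` joined to `sys.objects`; the function name, folder names and file naming scheme are hypothetical choices, not part of any tool mentioned here.

```python
from pathlib import Path

# Map sys.objects type codes to (hypothetical) folder names.
FOLDER_BY_TYPE = {"P": "StoredProcedures", "V": "Views", "FN": "Functions"}


def write_object_scripts(rows, root):
    """Write one script file per object and return the paths that changed.

    rows: iterable of (schema, name, type_code, definition) tuples.
    Files whose content is already up to date are left untouched, so the
    working copy only shows real modifications.
    """
    written = []
    for schema, name, type_code, definition in rows:
        folder = Path(root) / FOLDER_BY_TYPE.get(type_code, "Other")
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{schema}.{name}.sql"
        text = definition.strip() + "\n"
        if not path.exists() or path.read_text(encoding="utf-8") != text:
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written
```

Running something like this on a schedule, or as a pre-commit step, is one way to reduce the "someone forgot to script an object" problem, since nobody has to remember to export anything by hand.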
Another option you might want to look into is setting up a Visual Studio Database Project. It will script all your objects and provide some deployment options as well. My opinion was that it tried to be a little too tightly integrated for our tastes: we have a few named references to linked databases that it just wouldn't give up on.

Generate Data Change Scripts from VSTS Database Edition

I'm using the GDR release of VSTS Database Edition to source control the DB and generate deployment scripts. It works pretty well, but the problem is that it only seems to handle scripting and deploying the schema. It stops short of scripting and deploying the actual data itself (i.e. the lookup and standing data which is also deployed with the DB).
I know it's easy enough to write the deployment scripts by hand, but is this what everyone does? Is there a recommended way of deploying data with the VSTS deployment engine? Is there some tooling that helps with this? I don't mean a full product like SQLCompare, just something that fills the gap in VSTS DB.
Thanks in advance.
Kaneda
The VSTS: DB best practices blog advocates using post-deployment scripts to insert reference data into temporary tables, then update the target tables based on the delta (i.e. update x inner join temp where x.something <> temp.something).
There are some suggestions floating around that this might make a powertool, and at least one MVP has written a tool to generate those scripts.
(NB: I haven't tried this - I only just found out about it myself)
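The staging-and-delta pattern described above can be sketched as follows. This uses Python's built-in `sqlite3` module as a stand-in so the example is runnable; a real post-deployment script would be T-SQL against SQL Server. The `order_status` table, its columns and the function name are hypothetical, and inserting rows that are missing from the target is an assumption on top of the update step the blog describes.

```python
import sqlite3


def sync_reference_data(conn, rows):
    """Stage reference rows, then apply only the differences to the target.

    Returns a tuple (rows_updated, rows_inserted).
    """
    cur = conn.cursor()
    # 1. Load the desired reference data into a temporary staging table.
    cur.execute(
        "CREATE TEMP TABLE staged_status "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    cur.executemany("INSERT INTO staged_status (id, name) VALUES (?, ?)", rows)
    # 2. Update only the target rows whose values differ from the staged ones.
    cur.execute(
        "UPDATE order_status "
        "SET name = (SELECT s.name FROM staged_status s "
        "            WHERE s.id = order_status.id) "
        "WHERE EXISTS (SELECT 1 FROM staged_status s "
        "              WHERE s.id = order_status.id "
        "                AND s.name <> order_status.name)"
    )
    updated = cur.rowcount
    # 3. Insert staged rows that do not exist in the target yet (assumption).
    cur.execute(
        "INSERT INTO order_status (id, name) "
        "SELECT s.id, s.name FROM staged_status s "
        "WHERE NOT EXISTS (SELECT 1 FROM order_status t WHERE t.id = s.id)"
    )
    inserted = cur.rowcount
    cur.execute("DROP TABLE staged_status")
    conn.commit()
    return updated, inserted
```

Because unchanged rows are never touched, the script can be rerun on every deployment without side effects.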
Personally I would still stick with RedGate if I had any choice in the matter.
GDR comes with a data comparison engine, but as far as I've been able to tell so far a data comparison can't even be stored in a project (let alone be properly supported by it) - so it's pretty ad-hoc. Unlike a Schema Compare, there is no File \ Save As.
The comparison engine can be automated via DDE but that's automation within the Visual Studio IDE, and not really suitable for some kind of scripted installation process. As much as anything there's no way I could see to specify which tables to include in the comparison (since all you get to do via DDE is open the wizard for the user to select)
Alternatively all the functionality appears to reside in Microsoft.VisualStudio.TeamSystem.DataPackage.dll , but since the API documentation hasn't been written yet (the help doco that comes with GDR is full of errors as it is) it's going to be a bit of a hit-and-miss adventure to work out where to start.
As someone who's used RedGate's SqlCompare, SqlDataCompare and their respective APIs to do this before, much of the GDR functionality seems a bit half-baked to me.
What I will probably do this time round is sync the data with an SSIS package (export to CSV at build time / import from CSV at install time), but I'd far rather be using the SqlDataCompare API (or SqlPackager) right now.

How do I perform automated unit testing in SSIS packages?

How can I unit test SSIS packages? I want to be able to create and maintain unit tests for various components such as the workflow tasks, data flow tasks, event handlers, etc.
Are there any existing techniques, frameworks, and/or tools that can be used?
ssisUnit
A unit testing framework for SQL Server Integration Services
Nowadays ssisUnit is no longer kept up to date, and a more modern unit testing framework for SQL Server Integration Services exists, called SSISTester.
MSDN article
Nuget
Some testing practices I usually follow when testing SSIS packages:
I always test at package level (it usually does not make a lot of sense to me to test at a lower level than this).
I usually keep a testing data environment with pretty small data sets.
I also keep a testing configuration profile (config files) pointing to the testing data sets and any other testing-specific parameters.
Depending on the nature of the project, I sometimes also keep some database backups that can be restored whenever we want to reset the environment to its initial state (or to any other state in the ETL process).
All of these combined with a good set of testing scripts (Python, PowerShell...) calling the packages via dtexec make a pretty useful recipe for me ;-)
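A minimal sketch of such a testing script in Python might look like the following. It assumes `dtexec` is on the PATH and that the package is run from a `.dtsx` file with an optional test configuration file; the function names and file names are hypothetical.

```python
import subprocess


def build_dtexec_command(package_path, config_path=None, dtexec="dtexec"):
    """Build the dtexec argument list for a file-based package.

    /File names the .dtsx package; /ConfigFile points at the test
    configuration profile, if one is given.
    """
    cmd = [dtexec, "/File", package_path]
    if config_path:
        cmd += ["/ConfigFile", config_path]
    return cmd


def run_package(package_path, config_path=None):
    """Run the package and return (succeeded, captured_output).

    dtexec exits with code 0 when the package executed successfully.
    """
    result = subprocess.run(
        build_dtexec_command(package_path, config_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout
```

A test would then typically restore or reset the test data, call `run_package("LoadCustomers.dtsx", "test.dtsConfig")`, and assert on both the returned flag and the resulting table contents.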
ssisUnitLearning is an SSIS project for learning unit testing with ssisUnit.
For more, see the "ssisUnit testing" series at bartekr.

Resources