How do I perform automated unit testing in SSIS packages? - sql-server

How can I unit test SSIS packages? I want to be able to create and maintain unit tests for various components such as the workflow tasks, data flow tasks, event handlers, etc.
Are there any existing techniques, frameworks, and/or tools that can be used?

ssisUnit
A unit testing framework for SQL Server Integration Services

Nowadays, ssisUnit isn't up to date and exist modern unit testing framework for SQL Server Integration Services called SSISTester.
MSDN article
Nuget

some testing practices I usually follow when testing SSIS packages.
I always test at package level (it usually does not make a lot of sense to me to test at a lower level than this.... )
I usually keep a testing data environment with pretty small data sets.
Also a testing configuration profile (config files) pointing to the testing data sets and any other different testing parameters.
Depending of the nature of the project sometimes I also keep some database backups used to be restored whenever we want to reset the environment initial status (or any other statuses in the ETL process).
All of these combined in a good set of testing scripts (python, powershell...) calling the packages via dtexec, it's a pretty useful recipe for me ;-)

ssisUnitLearning is a SSIS Tester
SSIS project to learn SSIS-Unit testing
For more go to with "ssisUnit testing" series at bartekr

Related

Does anyone has "Best Practice" to share for Unit/Integration/Regression testing with Snowflake?

We are embarking in a project using both Matillion and Snowflake and want to put in place some Unit/Integration/Regression testing.
Automated would be brilliant but manual would be good too.
We could invent something (simple) ourselves... but it would be better to benefit from other people experience.
Indeed there is a lack of possibility to unit/integration testing your models within Matillion.
You need some external tools to implement these - we were using a Spring Boot Microservice with the following steps:
Setup the Testdata with some Plain SQL scripts via JDBC connection to the underlying Database
Run the correspondig job via the Matillion REST API
Using JUnit to make assertion and verify the outcome
Looks like Matillion just released Object Validation in version 1.46. I haven't tried it yet but I imagine this would allow users to set up unit tests, which could then be scheduled or run after the orchestration job it's testing.

Auto-deploy Zend Framework 2 application + Database schema + Actual data

Background:
I am using GitHub to store a ZF2 application.
The database schema + the actual data stored inside the schema are not being stored inside a version control. At the moment I am in development mode, so I have some database dump scripts that I load into the database when I need to. I also tweak entries in the database via phpMyAdmin when I need ongoing granular control for immediate testing purposes. I am also looking into using Doctrire ORM, so my schema will be part of my code via Annotations, and that will be checked into GitHub. Doctrine ORM will generate the actual schema for me, although it is still a separate step in the deployment process. The actual data however, will still be outside of the application and outside of the repository and currently has to be dealt with separately and is not automated.
Goal:
I want to be able to deploy ZF2 application and the database schema, and the data onto Zend Server and have it "just work" in the most automated, least manual way possible.
Question:
What is a recommended, best practice way to deploy every aspect of ZF2 application in the most automated, least manual way possible and have it "just work"? Let's focus on the Development and Testing mode here, as in Production it may be good to have separate deployment steps to protect against accidental live data overwrites.
You can try Phing (http://www.phing.info/) for deploying your PHP application, adjusting directory permissions, running database migrations, running unit tests, etc. I used Phing in couple of my projects with great success.

Integration tests in Continuous Integration environment: Database and filesystem state

I'm trying to implement automated integration tests for my application. It's a very complex monster. You could say that its database and part of the filesystem are part of its state, because it saves image files in the hard drive, and references to those in the DB. The software needs all those, in a coherent state, to work properly.
Back to writing tests: To run any relevant test, I need some image files in the filesystem, and certain records filled in the database. I thought of putting all of these in a separate folder called TestEnvironmentData in the repository, and retrieving them from the Continuous Integration Server (Team City), but a colleague said the repo is quite full as it is, and that I should set up a special directory, and databases, only in the Continuous Integration server. I don't like that because the tests success depend on me manually mantaining stuff in the server, and restoring initial state before every test becomes cumbersome.
What do you guys do when you need to write integration tests for an app like this? The main goal is having an automated test harness to approach a large scale refactoring. There's lots of spaghetti code and the app's current architecture is hardly unit testable, that's why I decided on integration tests first.
Any alternative approach is welcome.
Developer Repeatability is key when setting up a Continous Integrations Server. I have set one up for my last three employers and I have found the key to success is the developers being able to run the same tests from their dev system in order to get the same results as the CI Server.
The easiest way to do this would be to check in the test artifacts into source control but you could also use dropbox or a Network Share that you copy them from in one of the build steps.
For a .Net solution I have always used MsBuild as you can most easily replicate the build process of Visual Studio and get the same binaries/deployables. As for keeping your database in sync so that tests can be repeatable in the past I used the MbUnit test framework and the [Rollback] attribute as it would roll back any changes to Sql Server that happened in the test. I believe that Nunit now has this attribute as well.
The CI server is great for finding code that breaks existing functionality but unless developers can reproduce the error on their machine they won't trust the CI server for some time.
First of all, we use Maven to build our code. It's like ant, but it relies on convention instead of configuration for many things, like Ruby On Rails does. One of those conventions is a standardized directory structure:
(project)----src----main----(language)
| | \--resources
| \--test----(language)
| \--resources
\--target---...
Using a directory structure like this makes it easy to keep your application resources and testing resources near each other, yet still be able to build for test or build for production, or just build both but just package up the application parts after running the tests.
As far as resetting the database between tests, how you do that is greatly dependent on the DBMS you're using. For instance, if you're using MySQL it's very easy to get the test data the way you want and do a mysqldump to a file you then load before the test. With other DBMSs you may have to drop and recreate the tables and reload the data, or make separate tables for the starting point and use a CREATE/SELECT sql statement to duplicate it each time.
There really is no reliable way around the "reset the database between tests" step.

Spring JPA: Testing DAO layer with multiple databases in a CI environment

We are working on a project where database requirement is not clear. So we are building a database agnostic application.
See my previous question here: Database Agnostic Application
Now I want to test my Spring application DAO with multiple database. I've written number of test cases using TestNG and DBUnit.
When I run these test in a CI environment, I want them to test the application against all the configured databases. I've installed the databases on the 'test server'.
e.g. I want something like this:
for ( each database configured ) {
run each dao test
}
Not sure what is the best way of doing this? And help is welcome.
Thanks,
Adi
If you want to be database independent, you have to test against every single database system you want to support. There are very fine differences which leak through Hibernate.
What I did in the past was to make the test retrieve their database configuration through some System Property. Typically by using hibernate_.property instead of the default hibernate.property. Then setup CI Jobs, which set the property to different values and provide one hibernate_xxx.property for every database to test against. I did this using JUnit Rules, to have the logic in one place. Don't know the apropriate tool for TestNG
I'm not to fond of the loop construct you are hinting at, because it might make it difficult to run a test suit against a single specific database.
I'm also not to fond of dbunit, because it seems to make maintaining testdata rather painful. I prefer in most cases a handcrafted DSL. Have a look at some articles I wrote about it:
http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/
http://blog.schauderhaft.de/2011/03/20/testing-databases-with-junit-and-hibernate-part-2-the-mother-of-all-things/
http://blog.schauderhaft.de/2011/03/27/testing-databases-with-junit-and-hibernate-part-3-cleaning-up-and-further-ideas/
If you're building a database agnostic application and not using any of the inherent features of a specific database vendor, then the scope of your test cases should be to test the setup, manipulation, and accessing of the data through the DAO objects and less with the testing of the actual database backend. Hibernate 3.5 has dialects available for both Oracle 11g and DB2, so if you were writing test cases that tested the integration of the database agnostic application with a specific database vendor, then really what you are doing is testing that the hibernate dialects do as they say they do (which I'm sure has been covered by test cases in the hibernate project).
In other words, in your case I would think that the testing should focus more on the DAO retrieving the data that you think that it will retrieve after you've set that data up, and in-memory databases are fine for that.
Now all that said, both DB2 and Oracle have very good documentation related to setup. Indeed, both of them have "wizards" to do that. If you still think that it's prudent to test adding data to the database and retrieving it from the physical, non-in-memory database, then I would recommend setting up a "test database" environment and pointing your datasource to that during your continuous integration tests. If you're using Hudson or Jenkins for CI, you can set it up to run a script after the build completes that will truncate the database tables so that the next round of tests work from a blank slate.
EDIT:
I just saw the updates that you posted to your question, so let me address them. Since you already have the databases setup and configured then what you really want to do is dynamically select what the database should be. One way to do this would be to setup your datasource using System properties that can be inherited from a properties file, and running your tests in a "DB2-test" environment and an "Oracle-test" environment. Using this method, you'll have to setup the datasource programmatically and have it read system environment variables to determine which database it connects to. This would essentially require you to change your CI script to run the DB2-test environment first, then the Oracle-test environment following that -- your test suites will run twice.
Hope this helps!
Unit 4.9 has a new Feature: TestRule
You should be able to write a rule, that repeat a test for different databases.
There is this stack overflow question: How to Re-run failed JUnit tests immediately?
It is a slightly different question, but the solution should be the same technique.

SQL Server Database Management with Continuous Integration

Let's say we have a continuous integration server. When I check in, the post-hook pulls the latest code, runs the tests, packages everything. What is the best way to also automate the database changes?
Ideally, I'd build an installer that could either build a database from scratch or update an existing one using some automated syncing method.
I've recently bumped into an article, that might be of use.
The author explained some of the best continuous integration practices including testing, processing and automation.
Here are some of the key takeaways:
In many shops code is unit tested at the point of commit. For databases, it is preferred running all unit tests at once and in sequence against a QA database, vs development, as a part of the Test step
The test step is a critical part of any CI/CD process. Test scripts, including unit tests themselves, should also be versioned in source control, extracted at the point of the Build step and executed
Pulling data from production is appealing as a quick expedient, but is never a good idea
The best approach is using a tool or script to quickly, repeatedly and reliably create synthetic test data for your transactional tables
Running unit tests to produce manual summary results for human consumption defeats the purpose of automation. We need machine readable results, that can allow an automated process to abort, branch and/or continue.
Running a CI process, which requires 100% of all tests to pass, is akin to not having CI at all, if the workflow pipeline is set up atomically to stop on failure, which it should. To thread the needle, tests should have built in thresholds, that will raise an error based on either the % of tests failing or in some cases, if certain high priority tests fail.
All processes should ultimately produce a Boolean result of pass or fail, but some non-automated processes can easily find their way into your CI workflow pipeline (e.g. unit testing). Software should be plug-n-play into any workflow pipeline, taking known inputs and producing expected outputs – like pass, fail.
CI/CD process should be aborted on failure and a notification email should be immediately sent vs continuing to cycle the pipeline.
The CI process should not cycle again until any errors in the last build are fixed. On failure, the entire team should get the failure notification, including as many details as to what failed as possible.
If a pipeline takes 1 hour, from start to finish, to complete, including all the testing, then all the build intervals should be set to no less than one hour and all new commits should be queued, and applied to the next build.
No plain text passwords should exist in automation scripts
If you have the opportunity to define and control the whole database management and db creation process, have a serious look at DB Ghost - it's more than just a tool - it's a process.
If you like it and can implement it, you'll get great returns on it - but it's a bit of a "all-or-nothing" kind of approach. Recommended.
I would caution against using a db backup as a development artifact, most CI best practices suggest that you manage the schema, procedures, triggers, and views as first class development artifacts. The side effects is that you can take this one step further and use them to build a new database whenever you want, ideally you also have some data that can be pushed into the database.
Here is a cliff notes version to get your feet wet, but there is lots out there in this space:
http://www.infoq.com/news/2008/02/versioning_databases_series
I like some of the ideas that Scott Ambler has here as well, the site is good but the book is surprisingly deep for such a difficult set of problems.
http://www.agiledata.org/
http://www.amazon.com/exec/obidos/ASIN/0321293533/ambysoftinc
Red Gate is a quite robust solution and it works out of the box.
But the best thing is that you can integrate it with your continuous integration process. I use it with Msbuild and Hudson.
quickly explaining how it works:
http://blog.vincentbrouillet.com/post/2011/02/10/Database-schema-synchronisation-with-RedGate
if you need to know more about this, feel free to ask
The Red Gate approach using SQL Source Control and the SQL Compare Pro command line is detailed with code samples here:
http://downloads.red-gate.com/HelpPDF/ContinuousIntegrationForDatabasesUsingRedGateSQLTools.pdf
Troy Hunt wrote an article on Simple Talk entitled "Continuous Integration for SQL Server Databases":
http://www.simple-talk.com/content/article.aspx?article=1247
Have you looked at FluentMigrator? The default download includes Nant scripts that would be easy to add in to a CI. Free, open source and easy to use. Works for a wide variety of databases.
The latest version (5.0) of DB Ghost doesn't suffer from the "non ASCII character" problem (it just means that the file is UTF8 encoded) and it should be able to do exactly what you need.
Also, the tools can actually be used standalone to perform the various functions (scripting, building, comparing, upgrading and packaging) if you want, it's just that using them all together provides a full end-to-end process thus making the overall value greater than the sum of it's parts.
In essence, to make changes to the schema you update individual object creation scripts and per-table insert scripts (for reference data) that are held under source control just like you were developing a “day one” greenfield database. The DB Ghost tools are used to enable the whole thing by building these scripts into a brand new database (using continuous integration if required) and then comparing and upgrading a target database, which can be a copy of the production database. This process produces a delta script which can be used on the real production database during go-live.
You can even produce a Visual Studio database project and add it into any solutions you currently have.
Malc
I know this post is old, but we have a new solution that takes the following approach:
Developers script individual SQL changes and commit them to source
control.
Our program (OneScript) pulls the change script files from
source control, filters and sorts them, and generates a single
release script file.
That release script file is then applied to a
database to do a release.
Our home page here explains this process in more detail and has a link to an example that does these steps automatically from a Subversion hook. So soon after a commit, the developer receives an email saying if the release was successful or had errors. The PowerScript code is included.
Disclaimer -I'm working at the company that makes OneScript.

Resources