How to organize live data integrity tests and code unit tests?

I have several files of code that tests code (using a "unittest" class).
Later I found it would be nice to also test database integrity. I put this into a separate directory tree. (Things like: keys have the correct format, parent and child nodes point to each other correctly, and such. Edit: this is a NoSQL project, where I cannot rely on database-level checks like referential integrity.)
I use the same unittest class for the integrity tests.
Now I wonder if it really makes sense to keep this separate. To test the integrity of the data I often duplicate parts of the code that I use to test the code that handles the data.
But it is not the same. The code tests use test databases (which get deleted after each test), while the integrity tests connect to the live data and analyze it. The integrity tests I want to call from cron, sending an alarm if something goes wrong in the live database.
How would you handle that? Are there standards for such a setup? What is your experience?
My tendency is to put everything in the same file, which would result in the code tests also being executed by the cron on the production environment.
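For illustration, here is a minimal sketch of the two kinds of tests as I have them in mind, using the "unittest" class; the helper names (connect_test_db, drop_test_db, connect_live) are placeholders, not my real code:

import unittest

# Placeholder helpers standing in for the project's own connection code.
from myproject.db import connect_test_db, drop_test_db, connect_live


class CodeTests(unittest.TestCase):
    """Code tests: run during development against a throwaway test database."""

    def setUp(self):
        self.db = connect_test_db()      # fresh test database per test

    def tearDown(self):
        drop_test_db(self.db)            # deleted after each test

    def test_saving_a_child_sets_the_parent_link(self):
        child = self.db.save_child({"parent": "p1"})
        self.assertEqual(child["parent"], "p1")


class IntegrityTests(unittest.TestCase):
    """Integrity tests: run from cron against the live database, alarm on failure."""

    def setUp(self):
        self.db = connect_live()         # read-only view of the live data

    def test_children_point_to_existing_parents(self):
        orphans = [c for c in self.db.children() if not self.db.exists(c["parent"])]
        self.assertEqual(orphans, [])


if __name__ == "__main__":
    unittest.main()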
Edit: What also drives me is trying to keep the project simple and not to have too many files touched by a single task or workflow. Without all the testing I already have a class file, a subclass, a related class, some library (helper) files and the main code. Testing adds one file. It helps me keep my attention focused while coding, it is less stressful, I believe I make fewer errors, and I can remember and find a certain piece of code faster with fewer files affected. Only one testing file per workflow would help here. If I keep it separate there are 2 files (data integrity testing and code testing) and maybe 3 (a common library for both). Abstraction would add complexity.
Edit2: I am now refactoring a little bit and only moving the data testing files to the same directory tree where the code testing also lives, but keeping separate files whose names indicate "integrity" or "testing". I will not (yet) merge the files, because 2 people recommended against it, and I trust their experience and advice for now. I will live with the code duplication for the moment.
Edit3: I forgot to mention that the selection of the tests per run is not determined by the tree structure in this case. The tests are enumerated in a master file, so at present I have 2 master files, "integrity" and "code testing", and the tests can live in the same directory structure.
Maybe more people will answer. Thank you so far for the valuable input, which is already helping me develop the final structure!
Edit4: I did more refactoring now. It seems I should keep 2 files, but with a slightly modified purpose: one targeted at scheduled monitoring on the production server, and another one for development. But both files can contain integrity tests or code tests. And in both files operations can be performed on test databases (which are erased after the test) and on the permanent database (each server, production and development, has a permanent database). And one important thing: I find myself moving lots of common code from the testing files to the class files, so the classes also gain abilities that are for testing only. I like this so far, it feels clean. I am not (yet) creating a library for testing that is shared between the 2 testing frontends; for now this code has gone into the class file of the object that is being tested.
Please note that my comments below are signed with "user89021" but it's me, karlthorwald. I can't do anything about it.

You should separate the database related tests from the "pure" unit tests.
The cost of having two different assemblies is very low considering the benefits - you have one suite of fast tests that require no environment setup and can run on any machine, and a slower suite that tests the database integrity and can only run in specific places (e.g. the build server).
Another benefit is that you can have two build processes (quick and nightly) that run different test suites.
To avoid duplicating code you can create another assembly with the common methods/actions that both test suites need. Don't worry too much about duplicating the actual tests, because you're testing different things (either logic or the database), so sooner or later your tests will become quite different depending on what you're trying to test.

Your approach of separating them is good.
Your concern about code duplication is 100% valid.
The solution is fairly straightforward - abstract away common functionality between the tests - e.g. "RunTest", "AnalyzeResult", "ConnectToDB" - into a common library (you did not specify which language, but I assume it has a concept of a library) which can be passed configuration details such as which database to connect to.
Then use that library independently from the unit test driver and the integrity test driver - which, if you are skilled/lucky enough, might have very little code of their own other than configuration (e.g. which database to connect to, how to report results, and which tests to run).
Similarly, if needed, common inputs/datasets can be placed in a common directory.
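To make the idea concrete, a minimal sketch of such a shared library in Python (all names, the key format, and the use of sqlite3 as a stand-in client are illustrative assumptions, not a prescription):

# common_dbtest.py - helpers shared by the unit-test driver and the integrity-test driver.
import re
import sqlite3  # stand-in for whatever database client the project really uses

KEY_PATTERN = re.compile(r"^[a-z]+:\d+$")  # hypothetical key format


def connect(connection_string):
    """ConnectToDB: open whichever database the driver was configured with."""
    return sqlite3.connect(connection_string)


def key_has_valid_format(key):
    """Shared check used by both the code tests and the integrity tests."""
    return bool(KEY_PATTERN.match(key))


def child_points_to_parent(conn, child_id, parent_id):
    """Shared lookup used to verify parent/child links."""
    row = conn.execute(
        "SELECT parent_id FROM nodes WHERE id = ?", (child_id,)
    ).fetchone()
    return row is not None and row[0] == parent_id

The unit test driver would call connect() with a throwaway test database, the integrity driver with the live one; only the configuration differs.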

One more answer. You have two types of tests. What I would like to do is address the integrity tests. What you may want to do is include the integrity tests as a function of the production code. In other words, make the integrity checking part of the system.
You already mentioned that duplication is an issue and that you are refactoring to remove the duplication. The refactored code of course has development tests?
System monitoring can be production code. So whatever code you write becomes part of the system.
The nice thing about this is that you evolve your code through your development tests.
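A rough sketch of what that can look like (Python, with a hypothetical NodeStore and alarm hook, since the original backend is not specified): the integrity check is a method on the production class, so the cron monitor and the development tests exercise exactly the same code.

class NodeStore:
    """Production class; the integrity check ships with it."""

    def __init__(self, backend):
        self.backend = backend  # whatever client the production code already uses

    def find_orphan_children(self):
        """Integrity check: children whose parent document is missing."""
        docs = list(self.backend.all_documents())
        parents = {d["id"] for d in docs if d.get("type") == "parent"}
        return [d["id"] for d in docs
                if d.get("type") == "child" and d.get("parent") not in parents]


def monitoring_job(store, send_alarm):
    """Cron entry point: run the production-level check and alarm on failure."""
    orphans = store.find_orphan_children()
    if orphans:
        send_alarm("orphaned child nodes: %r" % orphans)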

Related

SpecFlow Integration Testing with Database Patterns

I'm attempting to set up SpecFlow for integration/acceptance testing. Our product has a backing database (not a huge one though) in Sqlite.
This is actually proving to be a slightly sticky point though; how do I model the database for the tests?
I would like to know what patterns others out there use for doing integration/acceptance testing with backing databases.
I can think of the following approaches:
Compile a database into the assembly with the tests, then shadow-copy it for each test. Seems slow though.
I could create the database in memory and populate it with pre-determined data.
I could create the database in memory and somehow have Givens populate the database. This seems like it would bloat the tests horribly, but might give them more control and make the tests less fragile.
I could abstract every database interaction and use mocks. Not in love with this idea since I'd like to use this to test the database interactions as well.
Compile the database into the tests and rely on clean-up code to return it to the base state (this one seems dodgy to me). Don't want to do it with transactions since there will be multiple interactions with some tests (so write an item then attempt to read it back with different privileges).
Before considering how to test, I think you might find it valuable to look at what you want to test.
Starting with what data, I find that it really helps to take a single element, or a small number, and imagine a set of events around them in order to give you the right test data to run your tests with. For example;
If you were working on a healthcare system, you might define a person "Bob" and then produce his life events. Bob was born 37 years ago today, fell off his bike as a child and broke his arm, got married, and has two children.
If you are working on a financial trading system, you might look at a day between opening and closing for a couple of stocks, e.g. "MSFT" and "APPL". On this day you might see one starting low and climbing, the other starting high and falling. A piece of news comes out that reverses their fortunes.
Now that you have the what, you can evaluate which of your scenarios actually work for your data, e.g. "MSFT" and "APPL" could have 1,000s of price changes throughout the day, so generating the Givens and Mocks would be very time consuming. This data lends itself to being pre-captured. On the other hand, the "Bob" data works particularly well with generated data, because the data can always change so that it is his birthday today.
One thing your question doesn't seem to consider is updating your data. For example, you might want to have a set of tests that work at various stages of your entities' life cycle, e.g. some tests deal with "Baby Bob", others with "10yr old Bob", or "Married Bob", etc. If your DB is read-only then this isn't a problem, provided you can write your tests so that they just don't see the other data, but sometimes you want to build a story through your tests. If your tests do change the data, then you will have problems with ensuring that either your tests run in order (see MSTest OrderedTest or mbUnit DependsOn), or that you can separate your tests so they each deal with an isolated data entity (this is fine if your entity can be described in a single row, but gets harder when you have to read many tables to get it).
You also might want to consider what code you are testing, you can vary the approach inside your different test sets. I currently work on a multi-tier application that has a UI Views, View Models, Client Models, multiple communication systems, and server models. I also have different sets of tests for these. I have some tests that work in a single tier, mocking out other tiers to keep my tests small. Other tests fire up a local server and local client and wire the two up directly. Finally I have some tests that launch a full server process, communicate via EMS and run some simple client side operations using everything but the UI Views.
So now to actually answer your question,
Shadow copy your database - Yes, I’ve done this once with SQLServer Developer and had an xxx.mdb that got copied in before running the tests. However some modern testing frameworks will run tests in parallel e.g. NCrunch, so this just breaks.
Create the database and pre-populate - Not done this one, but my concern would be what happens when a test changes the database to an unexpected state. Other tests will then fail when they have done nothing wrong.
Create the database and use Givens - I've done this with NUnit via [SetUpFixture] on top of a LINQ to SQL DB. You still have concerns about parallel test runs, you have to balance the granularity of your Givens (see StackOverflow - When do BDD scenarios become too specific), and you have the data update ordering/data isolation problem, but this can work really well to allow you to work through your data stories and grow the data throughout your tests. On the other hand, should one test fail and leave the data in a bad state, you can end up with lots of failures, but at least you simply need to look at the one that fails first. This kind of testing will also not play very nicely for developers on their workstations, as they can't just run a single test, particularly with tools such as NCrunch, which can just run tests whose code has changed.
Mock the database - This is how I choose to do things now. The trick is that if you are personally following a reasonably strict TDD process where you only test the method you are working on, then you actually end up with some tiers that test the database interaction, e.g. [Test] DALLayerTests.ShouldReadARowAndCreatePOCO(), but most others that use mocked data to test what actually happens, e.g. [Test] BusinessObjectPersonTests.ShouldGetBirthdayCongratulations().
Use clean up code - Never tried it, it sounds dodgy :-)

When to mock database access

What I've done many times when testing database calls is set up a database, open a transaction and roll it back at the end. I've even used an in-memory SQLite db that I create and destroy around each test. This works and is relatively quick.
My question is: should I mock the database calls, should I use the technique above, or should I use both - one for unit tests, one for integration tests (which, to me at least, seems like double work)?
The problem is that if you use your technique of setting up a database, opening transactions and rolling back, your unit tests will rely on the database service, connections, transactions, the network and such. If you mock this out, there is no dependency on other pieces of code in your application and there are no external factors influencing your unit-test results.
The goal of a unit test is to test the smallest testable piece of code without involving other application logic. This cannot be achieved when using your technique IMO.
Making your code testable by abstracting your data layer is a good practice. It will make your code more robust and easier to maintain. If you implement a repository pattern, mocking out your database calls is fairly easy.
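As a sketch of what that looks like with a repository abstraction (Python here for brevity; UserService and the repository interface are made-up examples):

import unittest
from unittest import mock


class UserService:
    """Business logic depends on a repository, never on the database directly."""

    def __init__(self, repository):
        self.repository = repository

    def is_adult(self, user_id):
        user = self.repository.get_user(user_id)
        return user["age"] >= 18


class UserServiceTests(unittest.TestCase):
    def test_is_adult_uses_repository_data(self):
        repo = mock.Mock()
        repo.get_user.return_value = {"id": 1, "age": 30}

        self.assertTrue(UserService(repo).is_adult(1))
        repo.get_user.assert_called_once_with(1)   # no database involved at all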
Also, unit tests and integration tests serve different needs. Unit tests are there to prove that a piece of code works technically, and to catch corner cases.
Integration tests verify the interfaces between components against a software design. Unit-tests alone cannot verify the functionality of a piece of software.
HTH
All I have to add to @Stephane's answer is: it depends on how you fit unit testing into your own personal development practices. If you've got end-to-end integration tests involving a real database which you create and tidy up as needed - provided you've covered all the different paths through your code and the various eventualities which could occur with your users hacking POST data, etc. - you're covered from the point of view of your tests telling you whether your system is working, which is probably the main reason for having tests.
I would guess though that having each of your tests run through every layer of your system makes test-driven development very difficult. Needing every layer in place and working in order for a test to pass pretty much excludes spending a few minutes writing a test, a few minutes making it pass, and repeating. This means your tests can't guide you in terms of how individual components behave and interact; your tests won't force you to make things loosely-coupled, for example. Also, say you add a new feature and something breaks elsewhere; granular tests which run against components in isolation make tracking down what went wrong much easier.
For these reasons I'd say it's worth the "double work" of creating and maintaining both integration and unit tests, with your DAL mocked or stubbed in the latter.

MSTest unit tests and database access without touching the actual database

In my code I interact with a database (not part of my solution file). The database is owned by a separate team of DBAs, and the code we developers write is only allowed to access stored procs. However, we have a full view of the database's procs, tables, and columns (its definition). For my code that is dependent upon data, I currently write unit tests that dummy up data in the tables (and tear down/remove those rows after the unit test is done), so I can run unit tests to exercise my code that interacts with the DB. All of the code to do this is in the test file (especially in the ClassInitialize() and ClassCleanup() functions). However, I've been given some amount of grief from my new coworkers, who call my style of unit tests "destructive" because I read/write to the dev database, inserting and removing rows. At the time we code the unit tests, the database design is generally not stable, so we can often find issues in the stored proc code before we unleash the QA department on our programs (which saves resources). They all tell me there's a way to clone the database into memory at the time the MSTest unit tests are run, but they don't know how to do it. I've researched around the web and cannot find a way to do what my coworkers need me to do.
Can someone tell me for sure whether or not this can happen in the environment I've described above? If so, can you point me in the right direction?
Do you have SQL scripts that can be used to create your database? You should have, and they should be under version control. If so, then you can do the following:
In your test setup code:
create a 'temporary' database using the SQL scripts. Use a unique name, for example unitTestDatabase_[timestamp].
set up the data you require for your test in the test database. Ideally, use public API functions (e.g. CreateUser, AddNewCustomer), but where the required API does not exist, use SQL commands. Using the API to set up test data makes the tests more robust against changes to the low-level implementation (i.e. the database schema), which is one reason why we write unit tests: to ensure that changes to the implementation do not break functionality.
run your unit tests, using dependency injection to pass the test database connection string from the test code into the code under test.
and in your test teardown code, delete the database. Ideally this should be done using your database uninstall scripts, which should also be under version control.
You can control how often you want to create a unit test database: e.g. per test project, test class or test method, or a combination, by creating the database in either an [AssemblyInitialize], [ClassInitialize] or [TestInitialize] method.
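The question is about MSTest, but the flow is easy to sketch in any xUnit-style framework; here is a hedged Python/SQLite analogue (schema script and names invented for the example), creating one uniquely named database per test class and deleting it afterwards:

import os
import sqlite3
import time
import unittest

SCHEMA_SQL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);"


class CustomerDalTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # 'Temporary' database with a unique name, built from the install scripts.
        cls.db_path = "unitTestDatabase_%d.sqlite" % int(time.time())
        cls.conn = sqlite3.connect(cls.db_path)
        cls.conn.executescript(SCHEMA_SQL)

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()
        os.remove(cls.db_path)  # the equivalent of running the uninstall scripts

    def test_insert_and_read_back(self):
        self.conn.execute("INSERT INTO customers (name) VALUES (?)", ("Bob",))
        row = self.conn.execute("SELECT name FROM customers").fetchone()
        self.assertEqual(row[0], "Bob")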
This is a technique we use with great success. The advantages are:
every time we run the unit tests, we are testing that our database installation scripts work together with the code.
test isolation, that is the tests only affect their test database. And it doesn't matter if the rollback code goes wrong, you are not touching anyone else's data.
Confidence in the code. That is, because we are using a real database, the unit tests give me more confidence that the code works than if I was mocking the database. Of course, this depends on how good your suite of higher level integration/component tests are.
Disadvantages:
the unit tests are dependent on an external system (the DBMS). You will need to find the name of a DBMS in your test setup code. This can be done by using a config file or by looking at run time for a running local DBMS.
Tests may be slowed down by the database installation scripts. In our experience, the tests are still running quickly enough, and there are plenty of opportunities to optimize. We run our test suite of approx 400 unit tests in approx 1 min, which includes creating 5 separate databases on a local installation of SQL Server 2008.
If you can create a 'seam' between the business logic code and your data access layer you should be ok. Use interfaces to represent the contract your DAL exposes to your business logic and then either write your own set of Fake objects or use a mocking tool such as rhino-mocks.
If you are writing tests that hit that database then you have a huge maintenance headache, since, as you state, the database is changing, and it also makes it difficult to maintain an environment that has access to the database. What you are actually writing are integration tests, which are still valid, but true unit tests shouldn't have any dependencies on databases, the file system, etc.
I would mock out the database, rather than trying to interact with a test instance. This will make your tests faster (so you're more likely to run them).
Assuming you can't do what the others suggested because you're actually testing that the stored procedures do what you expect, then I think what your colleagues are referring to is using an in-memory database.
When people talk about in-memory databases for testing they're usually referring to SQLite. They build up the database in memory at the start of the test and destroy it at the end. Unfortunately SQLite doesn't support Stored Procedures so that won't help you.
What I would suggest is that you write specific integration tests for the Stored Procedures and insert/remove data as you currently do. Note that it's easier if you wrap the test in a transaction that you then roll back. You could also use the database "unit testing" features in Visual Studio for testing the sprocs if you have that available.
For the rest of your code, mock your DAL as @Ben suggested and test your business logic as a normal unit test. However, given the complexity of your DAL being a static class, you're going to have to do some work to wrap the DAL and start using the wrapper class throughout your application - a little like how ASP.NET MVC deals with HttpContext.
Even with a bounty, I could not find out if this does exist. I would assume at this point, the people who had told me that this technology does exist might have been mistaken.
Could we not ask the DBA to provide a backup of the DB, restore it on your local machine, and perform the tests on that?
Backup and restore is the fastest way, I think.

What's the best strategy for unit-testing database-driven applications?

I work with a lot of web applications that are driven by databases of varying complexity on the backend. Typically, there's an ORM layer separate from the business and presentation logic. This makes unit-testing the business logic fairly straightforward; things can be implemented in discrete modules and any data needed for the test can be faked through object mocking.
But testing the ORM and database itself has always been fraught with problems and compromises.
Over the years, I have tried a few strategies, none of which completely satisfied me.
Load a test database with known data. Run tests against the ORM and confirm that the right data comes back. The disadvantage here is that your test DB has to keep up with any schema changes in the application database, and might get out of sync. It also relies on artificial data, and may not expose bugs that occur due to stupid user input. Finally, if the test database is small, it won't reveal inefficiencies like a missing index. (OK, that last one isn't really what unit testing should be used for, but it doesn't hurt.)
Load a copy of the production database and test against that. The problem here is that you may have no idea what's in the production DB at any given time; your tests may need to be rewritten if data changes over time.
Some people have pointed out that both of these strategies rely on specific data, and a unit test should test only functionality. To that end, I've seen suggested:
Use a mock database server, and check only that the ORM is sending the correct queries in response to a given method call.
What strategies have you used for testing database-driven applications, if any? What has worked the best for you?
I've actually used your first approach with quite some success, but in slightly different ways that I think solve some of your problems:
Keep the entire schema and the scripts for creating it in source control so that anyone can create the current database schema after a checkout. In addition, keep sample data in data files that get loaded as part of the build process. As you discover data that causes errors, add it to your sample data to check that those errors don't re-emerge.
Use a continuous integration server to build the database schema, load the sample data, and run the tests. This is how we keep our test database in sync (rebuilding it at every test run). Though this requires that the CI server have access to and ownership of its own dedicated database instance, I'd say that having our db schema built 3 times a day has dramatically helped find errors that probably would not have been found until just before delivery (if not later). I can't say that I rebuild the schema before every commit. Does anybody? With this approach you won't have to (well, maybe we should, but it's not a big deal if someone forgets).
For my group, user input is done at the application level (not db) so this is tested via standard unit tests.
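A minimal sketch of such a CI step (Python and SQLite used as stand-ins; schema.sql, sample_data.sql and the tests directory are whatever you keep in source control):

# rebuild_test_db.py - rebuild the schema, load the sample data, then run the tests.
import os
import sqlite3
import subprocess
import sys


def rebuild(db_path="test.sqlite"):
    if os.path.exists(db_path):
        os.remove(db_path)                    # start from a blank database every run
    conn = sqlite3.connect(db_path)
    with open("schema.sql") as f:
        conn.executescript(f.read())          # current schema from source control
    with open("sample_data.sql") as f:
        conn.executescript(f.read())          # sample data, grown as new bugs are found
    conn.commit()
    conn.close()


if __name__ == "__main__":
    rebuild()
    sys.exit(subprocess.call([sys.executable, "-m", "unittest", "discover", "-s", "tests"]))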
Loading Production Database Copy:
This was the approach that was used at my last job. It was a huge pain cause of a couple of issues:
The copy would get out of date from the production version
Changes would be made to the copy's schema and wouldn't get propagated to the production systems. At this point we'd have diverging schemas. Not fun.
Mocking Database Server:
We also do this at my current job. After every commit we execute unit tests against the application code with mock db accessors injected. Then, three times a day, we execute the full db build described above. I definitely recommend both approaches.
I'm always running tests against an in-memory DB (HSQLDB or Derby) for these reasons:
It makes you think which data to keep in your test DB and why. Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why and if something breaks, it wasn't me!!" ;)
It makes sure the database can be recreated with little effort in a new place (for example when we need to replicate a bug from production)
It helps enormously with the quality of the DDL files.
The in-memory DB is loaded with fresh data once the tests start and after most tests, I invoke ROLLBACK to keep it stable. ALWAYS keep the data in the test DB stable! If the data changes all the time, you can't test.
The data is loaded from SQL, a template DB or a dump/backup. I prefer dumps if they are in a readable format because I can put them in VCS. If that doesn't work, I use a CSV file or XML. If I have to load enormous amounts of data ... I don't. You never have to load enormous amounts of data :) Not for unit tests. Performance tests are another issue and different rules apply.
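The same load-once-then-rollback pattern, sketched with Python's built-in SQLite instead of HSQLDB/Derby (table and data invented for the example):

import sqlite3
import unittest


class RollbackKeepsDataStable(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # In-memory DB loaded once with fresh, known data.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.executescript(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);"
            "INSERT INTO accounts VALUES (1, 100);"
        )
        cls.conn.commit()

    def tearDown(self):
        self.conn.rollback()  # undo whatever the test changed, keep the data stable

    def test_withdrawal_changes_balance(self):
        self.conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")
        balance = self.conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
        self.assertEqual(balance, 60)

    def test_next_test_still_sees_the_original_data(self):
        balance = self.conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
        self.assertEqual(balance, 100)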
I have been asking this question for a long time, but I think there is no silver bullet for that.
What I currently do is mock the DAO objects and keep an in-memory representation of a good collection of objects that represent interesting cases of data that could live in the database.
The main problem I see with that approach is that you're covering only the code that interacts with your DAO layer, but never testing the DAO itself, and in my experience I see that a lot of errors happen in that layer as well. I also keep a few unit tests that run against the database (for the sake of using TDD or quick testing locally), but those tests never run on my continuous integration server, since we don't keep a database for that purpose and I think tests that run on a CI server should be self-contained.
Another approach I find very interesting, but not always worthwhile since it is a little time-consuming, is to create the same schema you use in production on an embedded database that runs just within the unit tests.
Even though there's no question this approach improves your coverage, there are a few drawbacks, since you have to be as close as possible to ANSI SQL to make it work both with your current DBMS and the embedded replacement.
No matter what you think is more relevant for your code, there are a few projects out there that may make it easier, like DbUnit.
Even if there are tools that allow you to mock your database in one way or another (e.g. jOOQ's MockConnection, which can be seen in this answer - disclaimer, I work for jOOQ's vendor), I would advise not to mock larger databases with complex queries.
Even if you just want to integration-test your ORM, beware that an ORM issues a very complex series of queries to your database, that may vary in
syntax
complexity
order (!)
Mocking all that to produce sensible dummy data is quite hard, unless you're actually building a little database inside your mock, which interprets the transmitted SQL statements. Having said so, use a well-known integration-test database that you can easily reset with well-known data, against which you can run your integration tests.
I use the first (running the code against a test database). The only substantive issue I see you raising with this approach is the possibility of schemas getting out of sync, which I deal with by keeping a version number in my database and making all schema changes via a script which applies the changes for each version increment.
I also make all changes (including to the database schema) against my test environment first, so it ends up being the other way around: After all tests pass, apply the schema updates to the production host. I also keep a separate pair of testing vs. application databases on my development system so that I can verify there that the db upgrade works properly before touching the real production box(es).
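A hedged, minimal sketch of that versioning scheme (Python/SQLite; the schema_version table and the migration contents are invented for the example):

import sqlite3

# One entry per schema version increment, applied in order.
MIGRATIONS = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
    2: "ALTER TABLE users ADD COLUMN email TEXT;",
}


def upgrade(conn):
    """Apply every migration newer than the version recorded in the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    current = row[0] if row else 0
    for version in sorted(v for v in MIGRATIONS if v > current):
        conn.executescript(MIGRATIONS[version])
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

The same upgrade() runs against the test database first, and against production only after all tests pass.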
I'm using the first approach, but a bit differently, in a way that addresses the problems you mentioned.
Everything that is needed to run tests for the DAOs is in source control. It includes the schema and the scripts to create the DB (Docker is very good for this). If an embedded DB can be used, I use it for speed.
The important difference from the other described approaches is that the data required for a test is not loaded from SQL scripts or XML files. Everything (except some dictionary data that is effectively constant) is created by the application using utility functions/classes.
The main purpose is to make the data used by a test:
very close to the test,
explicit (using SQL files for data makes it very hard to see what piece of data is used by what test), and
isolated from unrelated changes.
It basically means that these utilities allow you to declaratively specify, in the test itself, only the things essential for the test and omit the irrelevant ones.
To give some idea of what this means in practice, consider a test for some DAO which works with Comments to Posts written by Authors. In order to test CRUD operations for such a DAO, some data has to be created in the DB. The test would look like:
@Test
public void savedCommentCanBeRead() {
    // The builder declaratively specifies the entity with all attributes relevant
    // to this specific test; missing attributes are generated with reasonable values.
    // The factory's responsibility is to create the entity (and all entities it
    // requires, in our example the Author) in the DB.
    Post post = factory.create(PostBuilder.post());
    Comment comment = CommentBuilder.comment().forPost(post).build();
    sut.save(comment);

    Comment savedComment = sut.get(comment.getId());

    // This checks fields that are stored directly.
    assertThat(savedComment, fieldwiseEqualTo(comment));
    // If some fields are generated during save, check them separately.
    assertThat(savedComment.getGeneratedField(), equalTo(expectedValue));
}
This has several advantages over SQL scripts or XML files with test data:
Maintaining the code is much easier (adding a mandatory column, for example, to some entity that is referenced in many tests, like Author, does not require changing lots of files/records, only a change in the builder and/or factory)
The data required by specific test is described in the test itself and not in some other file. This proximity is very important for test comprehensibility.
Rollback vs Commit
I find it more convenient for tests to commit when they are executed. Firstly, some effects (for example DEFERRED CONSTRAINTS) cannot be checked if a commit never happens. Secondly, when a test fails, the data can be examined in the DB, as it has not been reverted by a rollback.
Of course, this has the downside that a test may produce broken data, which will lead to failures in other tests. To deal with this I try to isolate the tests. In the example above, every test may create a new Author, and all other entities are created related to it, so collisions are rare. To deal with the remaining invariants that can potentially be broken but cannot be expressed as DB-level constraints, I use some programmatic checks for erroneous conditions that may be run after every single test (they are run in CI but are usually switched off locally for performance reasons).
For a JDBC-based project (directly or indirectly, e.g. JPA, EJB, ...), you can mock up not the entire database (in that case it would be better to use a test DB on a real RDBMS), but only at the JDBC level.
The advantage is the abstraction which comes with that approach, as JDBC data (result sets, update counts, warnings, ...) are the same whatever the backend: your prod DB, a test DB, or just some mock data provided for each test case.
With the JDBC connection mocked up for each case, there is no need to manage a test DB (cleanup, only one test at a time, reloading fixtures, ...). Every mocked connection is isolated and there is no need to clean up. Only the minimal required fixtures are provided in each test case to mock up the JDBC exchange, which helps to avoid the complexity of managing a whole test DB.
Acolyte is my framework which includes a JDBC driver and utility for this kind of mockup: http://acolyte.eu.org .

Unit-Testing Databases

This past summer I was developing a basic ASP.NET/SQL Server CRUD app, and unit testing was one of the requirements. I ran into some trouble when I tried to test against the database. To my understanding, unit tests should be:
stateless
independent from each other
repeatable with the same results i.e. no persisting changes
These requirements seem to be at odds with each other when developing for a database. For example, I can't test Insert() without making sure the rows to be inserted aren't there yet, thus I need to call the Delete() first. But, what if they aren't already there? Then I would need to call the Exists() function first.
My eventual solution involved very large setup functions (yuck!) and an empty test case which would run first and indicate that the setup ran without problems. This is sacrificing on the independence of the tests while maintaining their statelessness.
Another solution I found is to wrap the function calls in a transaction which can be easily rolled back, like Roy Osherove's XtUnit. This works, but it involves another library, another dependency, and it seems a little too heavy a solution for the problem at hand.
So, what has the SO community done when confronted with this situation?
tgmdbm said:
You typically use your favourite automated unit testing framework to perform integration tests, which is why some people get confused, but they don't follow the same rules. You are allowed to involve the concrete implementation of many of your classes (because they've been unit tested). You are testing how your concrete classes interact with each other and with the database.
So if I read this correctly, there is really no way to effectively unit-test a Data Access Layer. Or, would a "unit test" of a Data Access Layer involve testing, say, the SQL/commands generated by the classes, independent of actual interaction with the database?
There's no real way to unit test a database other than asserting that the tables exist, contain the expected columns, and have the appropriate constraints. But that's usually not really worth doing.
You don't typically unit test the database. You usually involve the database in integration tests.
You typically use your favourite automated unit testing framework to perform integration tests, which is why some people get confused, but they don't follow the same rules. You are allowed to involve the concrete implementation of many of your classes (because they've been unit tested). You are testing how your concrete classes interact with each other and with the database.
DbUnit
You can use this tool to export the state of a database at a given time, and then when you're unit testing, it can be rolled back to its previous state automatically at the beginning of the tests. We use it quite often where I work.
The usual solution to external dependencies in unit tests is to use mock objects - which is to say, libraries that mimic the behavior of the real ones against which you are testing. This is not always straightforward, and sometimes requires some ingenuity, but there are several good (freeware) mock libraries out there for .Net if you don't want to "roll your own". Two come to mind immediately:
Rhino Mocks is one that has a pretty good reputation.
NMock is another.
There are plenty of commercial mock libraries available, too. Part of writing good unit tests is actually designing your code for them - for example, by using interfaces where it makes sense, so that you can "mock" a dependent object by implementing a "fake" version of its interface that nonetheless behaves in a predictable way, for testing purposes.
In database mocks, this means "mocking" your own DB access layer with objects that return made up table, row, or dataset objects for your unit tests to deal with.
Where I work, we typically make our own mock libs from scratch, but that doesn't mean you have to.
Yeah, you should refactor your code to access Repositories and Services which access the database and you can then mock or stub those objects so that the object under test never touches the database. This is much faster than storing the state of the database and resetting it after every test!
I highly recommend Moq as your mocking framework. I've used Rhino Mocks and NMock. Moq was so simple and solved all the problems I had with the other frameworks.
I've had the same question and have come to the same basic conclusions as the other answerers here: Don't bother unit testing the actual db communication layer, but if you want to unit test your Model functions (to ensure they're pulling data properly, formatting it properly, etc.), use some kind of dummy data source and setup tests to verify the data being retrieved.
I too find the bare-bones definition of unit testing to be a poor fit for a lot of web development activities. But this page describes some more 'advanced' unit testing models and may help to inspire some ideas for applying unit testing in various situations:
Unit Test Patterns
I explained a technique that I have been using for this very situation here.
The basic idea is to exercise each method in your DAL - assert your results - and when each test is complete, rollback so your database is clean (no junk/test data).
The only issue that you might not find "great" is that I typically do an entire CRUD test (not pure from the unit testing perspective), but this integration test allows you to see your CRUD + mapping code in action. This way, if it breaks, you will know before you fire up the application (this saves me a ton of work when I'm trying to go fast).
What you should do is run your tests from a blank copy of the database that you generate from a script. You can run your tests and then analyze the data to make sure it has exactly what it should after your tests run. Then you just delete the database, since it's a throwaway. This can all be automated, and can be considered an atomic action.
Testing the data layer and the database together leaves few surprises for later in the project. But testing against the database has its problems, the main one being that you're testing against state shared by many tests. If you insert a line into the database in one test, the next test can see that line as well.
What you need is a way to roll back the changes you make to the database. The TransactionScope class is smart enough to handle very complicated transactions, as well as nested transactions where your code under test calls commits on its own local transaction.
Here's a simple piece of code that shows how easy it is to add rollback ability to your tests:
using System;
using System.Transactions;
using NUnit.Framework;

[TestFixture]
public class TransactionScopeTests
{
    private TransactionScope trans = null;

    [SetUp]
    public void SetUp()
    {
        trans = new TransactionScope(TransactionScopeOption.Required);
    }

    [TearDown]
    public void TearDown()
    {
        // Disposing without calling Complete() rolls the transaction back.
        trans.Dispose();
    }

    [Test]
    public void TestServicedSameTransaction()
    {
        MySimpleClass c = new MySimpleClass();
        long id = c.InsertCategoryStandard("whatever");
        long id2 = c.InsertCategoryStandard("whatever");
        Console.WriteLine("Got id of " + id);
        Console.WriteLine("Got id of " + id2);
        Assert.AreNotEqual(id, id2);
    }
}
If you're using LINQ to SQL as the ORM then you can generate the database on-the-fly (provided that you have enough access from the account used for the unit testing). See http://www.aaron-powell.com/blog.aspx?id=1125
