I'm running Oracle 11g SE1.
Just wondering if there are any tools that would let me test the data integrity of a (mostly read-only) schema. Essentially, I want a set of queries that run every night or so and check whether they return the expected results. For example:
SELECT COUNT(*) FROM PATIENTS WHERE DISEASE = 'Clone-Killing Nanovirus';
Expected result: 59.
How do people normally do this kind of testing?
I've used SQLUnit and written about it here. I don't believe any new development is being done on it, but it should accomplish your goal.
SQL Developer (free, as in beer) also has a unit testing framework. I have installed it, and that's about it. I want to use it more, but I've been working with BI for the past few years, so there has been no external pressure to learn it.
The tests you want to create sound pretty simple, so either of those should work well for you. The next step would be to run them on a schedule (cron, Windows Task Scheduler, etc.), or you can go all out with a continuous integration tool like Atlassian's Bamboo (which I haven't used).
Of course, you could skip the tools altogether and just write scripts that are called from the command line. The fancy version would write the results to a database table so you can easily put a front end on it; the simple version would pipe the results to a text file that you review each day.
Hope this helps.
You could batch up your queries and run a simple Perl script using DBI that runs the queries, checks the results against an accepted tolerance, and emails you if something doesn't meet the thresholds. I have written database-checking code like this before to make sure items were within thresholds. Perl is a good tool for this sort of thing: the DBI module connects to your database, you run some canned queries, and you can easily send yourself an email using the MIME package. http://www.perl.com/pub/1999/10/DBI.html
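A minimal sketch of that approach, written in Python with the DB-API rather than Perl's DBI (the structure is the same). Assumptions: sqlite3 stands in for an Oracle driver so the example runs anywhere, the CHECKS list and the run_checks name are made up for illustration, and sending the email is left out.

```python
import sqlite3

# Hypothetical canned checks: (description, query returning one number, expected value, tolerance).
CHECKS = [
    ("nanovirus patients",
     "SELECT COUNT(*) FROM patients WHERE disease = 'Clone-Killing Nanovirus'",
     59, 0),
]


def run_checks(conn, checks):
    """Run each check query and return a list of failure messages (an empty list means all passed)."""
    failures = []
    for name, sql, expected, tolerance in checks:
        (actual,) = conn.execute(sql).fetchone()
        if abs(actual - expected) > tolerance:
            failures.append(f"{name}: expected {expected} (+/- {tolerance}), got {actual}")
    return failures


if __name__ == "__main__":
    # Demo against an in-memory database; a real job would connect to the production schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, disease TEXT)")
    conn.executemany("INSERT INTO patients (disease) VALUES (?)",
                     [("Clone-Killing Nanovirus",)] * 59)
    problems = run_checks(conn, CHECKS)
    print("\n".join(problems) if problems else "all checks passed")
```

Scheduled from cron or the Windows Task Scheduler, a non-empty failure list is the point at which the script would send the email.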
Related
So I'm learning about Spark, and I have a question about how client libraries work.
My goal is to do some data analysis in Spark: tell it where the data sources to process are (databases, CSV files, etc.) and store the results in HDFS, S3, or a database such as MariaDB or MongoDB.
I thought about having a service (an API application) that "tells" Spark what I want to do. The question is: is it enough to set the master to spark://remote-host:7077 when creating the context, or should I send the application to Spark with something like the spark-submit command?
This depends entirely on how your environment is set up. If all the paths are linked to your account, you should be able to run one of the two commands below to open an interactive shell and run test commands. The reason to use a shell is that it lets you run commands dynamically, validate and learn how to chain commands together, and see what results come out.
Scala: spark-shell
Python: pyspark
Inside the shell, if everything is linked to Hive tables, you can list the tables by running:
spark.sql("show tables").show(100,false)
The command above runs "show tables" against the Spark Hive metastore catalog and returns all the active tables you can see (which doesn't mean you can access the underlying data). The 100 means show up to 100 rows, and false means show the full string in each column instead of truncating it (by default, show() cuts strings off at 20 characters).
In a hypothetical example, if one of the tables you see is called Input_Table, you can bring it into the environment with the commands below:
val inputDF = spark.sql("select * from Input_Table")
inputDF.count
While you're learning, I would strongly advise against running the commands via spark-submit: you would need to pass in the main class and the JAR, forcing you to edit and rebuild for every test, which makes it hard to figure out how commands will behave without a lot of downtime.
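For comparison with the interactive shell, here is a sketch of the spark-submit route the question asks about. It only builds the command line in Python; --master and --class are standard spark-submit options and the application JAR follows them, but the master URL, class name and JAR path below are placeholders, and spark_submit_args is a hypothetical helper.

```python
import shlex


def spark_submit_args(master_url, main_class, app_jar, app_args=()):
    """Build a spark-submit command line for a packaged Scala/Java application."""
    return [
        "spark-submit",
        "--master", master_url,   # e.g. a standalone cluster master
        "--class", main_class,    # entry point inside the JAR
        app_jar,                  # must be rebuilt after every code change
        *app_args,                # arguments passed to the application's main method
    ]


cmd = spark_submit_args("spark://remote-host:7077", "com.example.Analysis", "analysis.jar")
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

The rebuild step on every change is exactly the friction the answer warns about while experimenting.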
I am new to TDD. I have watched Brandon Satrom's videos and am trying to implement tests the way he does: an outer loop of acceptance tests and an inner loop of unit tests. I assumed acceptance tests would run against the database too, so I expected to find examples of using the [BeforeScenario]/[AfterScenario] hooks for database clean-up in SpecFlow. They are said to be used for database clean-up, but none of the examples I saw do it.
Am I misunderstanding the acceptance test concept? Doesn't it cover the database too? Should we use mock objects there, as we do in unit tests?
I'm using a real MS SQL Server database in my integration tests (MSTest) and in my acceptance tests (with the BDD tool SpecFlow) in this way: I have a dump of my test database (MDF/LDF files) stored as a template. On test initialize, I copy the files to a temporary location and attach them to a dedicated SQL Server instance using the sp_attach_db stored procedure (you may use an Express edition for this); then I run whatever test code I want, and on test cleanup I detach the test database and delete the MDF/LDF files. The whole copy/attach/detach/delete cycle is pretty fast (at least much faster than I expected).
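The copy/attach/test/detach/delete cycle can be sketched in Python with SQLite standing in for SQL Server: copying the template file plays the part of copying the MDF/LDF dump, and opening and closing the copy stand in for sp_attach_db and detaching. fresh_test_db and make_template are hypothetical names, and the users table is invented for the demo.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def fresh_test_db(template_path):
    """Copy the template database to a temp directory, yield a connection to the copy, then delete it."""
    workdir = tempfile.mkdtemp()
    copy_path = os.path.join(workdir, "test.db")
    shutil.copyfile(template_path, copy_path)   # like copying the MDF/LDF template files
    conn = sqlite3.connect(copy_path)           # like sp_attach_db
    try:
        yield conn
    finally:
        conn.close()                            # like detaching the database
        shutil.rmtree(workdir)                  # like deleting the MDF/LDF copies


def make_template(path):
    """Hypothetical template: one table with one known row."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    conn.commit()
    conn.close()


if __name__ == "__main__":
    template = os.path.join(tempfile.mkdtemp(), "template.db")
    make_template(template)
    with fresh_test_db(template) as conn:
        conn.execute("INSERT INTO users VALUES ('bob')")   # changes only the copy
        print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Because each test gets its own copy, tests can't see each other's data and the template itself is never modified.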
If you're interested, I could write it up in more detail on my blog.
At last I am convinced that I must use the real database in my acceptance tests. I had to see some examples and read about it in several sources before it settled in my mind.
Now I am using acceptance tests as intended, to test the flow of my user interfaces and database.
I wrote a happy-path scenario for my registration page to design the page flow. Then I wrote some tests for the logic kept in my stored procedures in the database. Other logic lives in controllers and model classes, so for those I used unit tests. Now it makes more sense to me, until my next confusion about TDD :).
As for the clean-up process, I use the [BeforeScenario]/[AfterScenario] hooks. In BeforeScenario I store a DateTime.Now.Ticks value in a global variable and prepend it to the values I send to the database. Then, during clean-up in AfterScenario, I find the records that start with this Ticks value and delete them. This gives each scenario unique values that don't interfere with other records. It seems to work so far.
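A rough Python equivalent of that clean-up trick, assuming a single text column to tag (the users table and the Scenario class are hypothetical): time.time_ns() plays the role of DateTime.Now.Ticks, every value the scenario writes gets the prefix, and clean-up deletes exactly the rows that start with it.

```python
import sqlite3
import time


class Scenario:
    """Tags the values a scenario writes so that clean-up can find exactly those rows."""

    def __init__(self, conn):
        self.conn = conn
        # Analogous to DateTime.Now.Ticks; the "_" separator stops one marker matching a longer one.
        self.marker = f"{time.time_ns()}_"

    def unique(self, value):
        return self.marker + value

    def cleanup(self, table, column):
        # Table/column names come from the test code, not from user input.
        self.conn.execute(
            f"DELETE FROM {table} WHERE substr({column}, 1, ?) = ?",
            (len(self.marker), self.marker),
        )
        self.conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    scenario = Scenario(conn)                                     # "BeforeScenario"
    conn.execute("INSERT INTO users VALUES (?)", (scenario.unique("newuser"),))
    scenario.cleanup("users", "name")                             # "AfterScenario"
```

Comparing a fixed-length substring avoids LIKE, where "_" would act as a wildcard.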
Regarding this matter, this article is very helpful.
It describes using MSDTC transactions, started in BeforeScenario and rolled back in AfterScenario.
(SpecFlow is not used in the article, but it's the same concept.)
We are currently using this technique successfully in a mid-scale development project.
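A minimal sketch of the transaction-per-scenario idea, with one SQLite connection in place of MSDTC: open a transaction before the scenario and roll it back afterwards. before_scenario and after_scenario are hypothetical stand-ins for the SpecFlow hooks, and the orders table is invented.

```python
import sqlite3


def before_scenario(conn):
    """Open a transaction that every statement in the scenario will run inside."""
    conn.execute("BEGIN")


def after_scenario(conn):
    """Undo everything the scenario did, leaving the database as it was."""
    conn.execute("ROLLBACK")


if __name__ == "__main__":
    # isolation_level=None: sqlite3 won't open transactions implicitly, so BEGIN/ROLLBACK are ours.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER)")
    before_scenario(conn)
    conn.execute("INSERT INTO orders VALUES (1)")
    after_scenario(conn)
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

This only works if the code under test uses the same connection and never commits on its own; coordinating separate connections is what MSDTC adds in the article's version.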
Is anybody out there writing unit tests for their T-SQL stored procedures, triggers, functions, etc.?
I've recently started making database restores and installs part of our automated CruiseControl build process. Now I'm thinking about taking it to the next level: do the install, then run through a list of stored procedure tests, etc.
I was going to roll my own, using MSBuild Extensions to invoke the tests. However, I'm aware of http://www.tsqltest.org/ and http://tsqlunit.sourceforge.net/. I'm also aware that TFS has SQL testing.
I just wanted to see what people in the real world are doing and if they have any suggestions.
Thanks
The critical parts:
Make it automated and integrated with your build/test (so you get a green or red from your build)
Make it easy to add a new test
Keep your tests up to date
Advanced:
Test failure conditions in your code
Make sure your tests clean up after themselves (TSqlTest's example scripts use #beforeCount and #afterCount variables to validate the clean-up)
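The #beforeCount/#afterCount check carries over to any test harness. A minimal Python sketch, assuming the test body is a callable that should leave the table exactly as it found it; row_count and assert_cleans_up are hypothetical helper names, and SQLite stands in for SQL Server.

```python
import sqlite3


def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def assert_cleans_up(conn, table, test_body):
    """Run test_body(conn) and fail if the table's row count changed."""
    before_count = row_count(conn, table)
    test_body(conn)
    after_count = row_count(conn, table)
    if before_count != after_count:
        raise AssertionError(
            f"{table}: {before_count} rows before the test, {after_count} after; clean-up is missing")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, name TEXT)")

    def inserts_and_cleans_up(c):
        c.execute("INSERT INTO patients VALUES (1, 'test patient')")
        c.execute("DELETE FROM patients WHERE id = 1")

    assert_cleans_up(conn, "patients", inserts_and_cleans_up)   # passes
```

A count comparison is cheap, but it won't notice a test that deletes one row and inserts a different one.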
Stored procedures. I generally include test queries in comments in the SP header, and record the correct results and query times. This still leaves it as a manual exercise, however.
Functions. Again, put SQL statements in the header with the same info.
Triggers. I avoid them for a number of reasons, one of them being that they are so hard to test and debug for so little benefit compared to putting the same logic in another tier. It's like asking how to test for Referential Integrity.
This is still a manual process. But since I think SQL artifacts should be intentionally designed to be totally decoupled (e.g. no SPs calling SPs, the same for functions, and another strike against triggers IMHO), it's relatively less complex.
I have used the database testing that is built into Visual Studio 2008 Database Edition on a project here. It works well, but feels more like a third party bolt-on to Visual Studio than a native component. Some of the pains I felt with it are:
Because SQL code lives in the resource (.resx) files and a single code file can include multiple tests, it is not as easy to search for tests based on table/column names.
Because multiple tests live in the same code files, you get some annoying variable name collisions (e.g., if you have two tests in a single code file, all of the assertions for those tests need unique names, so your assertion names will probably look like "testname_assertionname", which really shouldn't be necessary).
Refactoring your tests is not easy. For example, if you want to move a test from one code file to another, the easiest way is to recreate the test from scratch in the new file, because bits and pieces of the test are scattered across the resource file and the code file.
All of that said, as I said at the start, it does work well. Unfortunately, we have not added these tests to our continuous integration server yet, so I can't comment on how easy it is to automate running them. We are using TFS for CI, and I assume that automating these tests would work much like automating standard unit tests; in other words, there should be an MSTest command line that runs them.
Of course, this is only an option if you are licensed to run Visual Studio 2008 DB Edition (which I understand is now included in the VS 2008 Pro license).
I've done this in Java, using dbunit.
Basically, anything you do in the database either:
returns a result set
or alters the state of the database.
The state of the database can be described as all the values in all the rows in all the tables in all the schemas of a database; the state of any subset is the state of all the data affected by some test.
So, start with a database filled with enough test data to perform your tests, and call this the baseline. Extract a snapshot with dbunit or the tool of your choice.
Given that your database is at baseline, any result set is deterministic (as long as your SP is deterministic; less so if it does a "select random();").
Get the baseline result set of all your SPs, and save those as snapshots with dbunit or whatever tool you're using.
To test operations that don't change state, just check that the result set you get is the one you initially got. To test operations that change the database, check that baseline + operation = expected change. After each test that potentially changes the database, restore it to the baseline.
Basically, the ability to restore to a baseline makes the testing possible.
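A small sketch of the baseline idea in Python, using SQLite's backup API (sqlite3.Connection.backup) instead of dbunit to save and restore the baseline, and a sorted list of rows as the snapshot. The helper names and the accounts table are assumptions for illustration.

```python
import sqlite3


def snapshot(conn, table):
    """A deterministic picture of a table's state: all of its rows, sorted."""
    return sorted(conn.execute(f"SELECT * FROM {table}").fetchall())


def save_baseline(conn):
    """Copy the whole (committed) database into a separate in-memory baseline."""
    baseline = sqlite3.connect(":memory:")
    conn.backup(baseline)
    return baseline


def restore_baseline(baseline, conn):
    """Discard uncommitted work, then overwrite the working database with the baseline."""
    conn.rollback()          # backup() cannot write into a connection with an open transaction
    baseline.backup(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    baseline = save_baseline(conn)

    # Operation under test, then "baseline + operation = expected change".
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.commit()
    print(snapshot(conn, "accounts"))
    restore_baseline(baseline, conn)   # ready for the next test
```

Result-set tests compare a snapshot with one saved at baseline; state-changing tests compare the snapshot after the operation with the expected rows and then restore.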
Have you tried using the red-gate.com API?
They have a bunch of products for comparing things in SQL Server and the API allows virtually the same functionality programmatically.
http://help.red-gate.com/help/SQLDataCompareAPIv5/4/en/GettingStartedAPI.html
Can anyone provide some real examples of how best to keep script files for views, stored procedures and functions in an SVN (or other) repository?
Obviously one solution is to have the script files for all the different components in a directory or more somewhere and simply use TortoiseSVN or the like to keep them in SVN. Then, whenever a change is to be made, I load the script in Management Studio, etc. I don't really want this.
What I'd really prefer is some kind of batch script that I can run periodically (nightly?) that would export all the stored procedures / views etc that had changed in a given timeframe and then commit them to SVN.
Ideas?
To me, it sounds like you don't want to use revision control properly.
Obviously one solution is to have the script files for all the different components in a directory or more somewhere and simply using TortoiseSVN or the like to keep them in SVN
This is what should be done. You would have a local copy you are working on (developing new components, tweaking old ones, etc.), and as individual components/procedures/etc. get finished, you would commit them individually until you have to start the process over.
Committing half-done code just because it's been 'X' time since it was last committed is sloppy and guaranteed to cause anyone else using the repository grief.
I find it best to treat Stored Procedures just like any other compilable code: Code lives in the repository, you check it out to make changes and load it in your development tool to compile or deploy the code.
You can create a batch file and schedule it:
delete the contents of your scripts directory
using something like ExportSQLScript to export all objects to script/scripts
svn commit
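A hedged Python sketch of that scheduled job. It reads object definitions from SQLite's sqlite_master table as a stand-in for SQL Server's catalog (or for a tool like ExportSQLScript), writes one .sql file per view or trigger, and returns the svn commands rather than running them. It deletes only .sql files rather than the whole directory, so the working copy's .svn metadata survives; new files need svn add before svn commit picks them up, and objects dropped from the database would still need an svn delete, which this sketch does not handle. The file naming, helper names and commit message are assumptions.

```python
import os
import sqlite3
import tempfile


def export_objects(conn, scripts_dir):
    """Steps 1 and 2: remove old .sql files, then write one file per view/trigger definition."""
    os.makedirs(scripts_dir, exist_ok=True)
    for entry in os.listdir(scripts_dir):
        if entry.endswith(".sql"):                 # leave .svn metadata alone
            os.remove(os.path.join(scripts_dir, entry))
    written = []
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE type IN ('view', 'trigger') ORDER BY name")
    for obj_type, name, sql in rows:
        path = os.path.join(scripts_dir, f"{obj_type}.{name}.sql")
        with open(path, "w", encoding="utf-8") as f:
            f.write(sql + ";\n")
        written.append(path)
    return written


def svn_commands(scripts_dir, message):
    """Step 3: add any new files, then commit. Run each with subprocess.run(cmd, check=True)."""
    return [
        ["svn", "add", "--force", scripts_dir],
        ["svn", "commit", "-m", message, scripts_dir],
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER, amount INTEGER)")
    conn.execute("CREATE VIEW big_invoices AS SELECT * FROM invoices WHERE amount > 1000")
    out_dir = tempfile.mkdtemp()
    print(export_objects(conn, out_dir))
    print(svn_commands(out_dir, "Nightly schema export"))
```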
Please note that although you'll have the objects under source control, you won't have the data or its progression (is that a renamed field, or one new field and one deleted?).
This approach is fine for maintaining change history. But, of course, you should never be automatically committing to the "production build" (unless you like broken builds).
Although you didn't ask for it: this approach also won't produce a set of scripts that will upgrade a current DB. You'll only have initial creation scripts. Recording data progression and creating upgrade scripts is beyond basic source control systems.
I'd recommend Redgate SQL Compare for this - it allows you to compare database versions and generate change scripts - it's also fairly easily scriptable.
Based on your expanded question, you really want to use DDL triggers. Check out this article that details how to create a changelog system for your database.
Not sure about your price range, but DB Ghost could be an option for you.
I don't work for this company (or own the product), but while researching the same issue, I found that this product looked quite promising.
I should've been a little more descriptive. The database in question is for an internal ERP system and thus we don't have many versions of our database, just Production/Testing/Development. When we've done a change request, some new fancy feature or something, we simply execute a script or series of scripts to update the procedures in question on the Testing database, if that is all good, then we do the same to Production.
So I'm not really after a full schema script per se, just something that can keep track of the various edits to the stored procedures over time. For example, PROCESS_INVOICE does stuff. It gets updated in some minor way in March. Some time later in say May it is discovered that in a rare case customers get double invoiced (or some other crazy corner case). I'd like to be able to see what has happened over time to this procedure. Currently the way the development environment is setup here I don't have that, which I'm trying to change.
I can recommend DBPro which is part of Visual Studio Team Edition. Have been using it for a few months for storing all parts of the database in Team Foundation Server as well as for deployment and database compares, etc.
Of course, as someone else mentioned, it does depend on your environment and price range.
I wrote a utility for dumping all of the relevant parts of my db into a directory structure that I use SVN on. I never got around to trying to incorporate it into the Manager but, if you're interested, it's here: http://www.reluctantdba.com/dbas-and-programmers/sqltools/svnforsql2005.aspx
It's free and, since I regularly run it, you know any bugs get fixed quickly.
You can always try integrating SourceSafe with SQL Server. Here's a quick start: link. To work with it, you need the Developer Edition of Management Studio.
Does anyone have some good hints for writing test code for database-backend development where there is a heavy dependency on state?
Specifically, I want to write tests for code that retrieve records from the database, but the answers will depend on the data in the database (which may change over time).
Do people usually make a separate development system with a 'frozen' database so that any given function should always return the exact same result set?
I am quite sure this is not a new issue, so I would be very interested to learn from other people's experience.
Are there good articles out there that discuss this issue of web-based development in general?
I usually write PHP code, but I would expect all of these issues are largely language and framework agnostic.
You should look into DBUnit, or try to find a PHP equivalent (there must be one out there). You can use it to prepare the database with a specific set of data which represents your test data, and thus each test will no longer depend on the database and some existing state. This way, each test is self contained and will not break during further database usage.
Update: a quick Google search turned up a DbUnit extension for PHPUnit.
If you're mostly concerned with data layer testing, you might want to check out this book: xUnit Test Patterns: Refactoring Test Code. I was always unsure about it myself, but this book does a great job of enumerating concerns like performance, reproducibility, etc.
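What DbUnit does before each test, in its CLEAN_INSERT operation, is to empty the tables in a dataset and then insert the dataset's rows. A minimal sketch of that in Python with SQLite rather than PHPUnit's extension; DATASET, clean_insert and the customers table are made-up names.

```python
import sqlite3

# Hypothetical dataset: table name -> list of rows (column name -> value).
DATASET = {
    "customers": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ],
}


def clean_insert(conn, dataset):
    """Empty each dataset table, then insert the dataset rows, so every test starts from the same state."""
    for table, rows in dataset.items():
        conn.execute(f"DELETE FROM {table}")
        for row in rows:
            columns = ", ".join(row)
            placeholders = ", ".join("?" for _ in row)
            conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
                         tuple(row.values()))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    clean_insert(conn, DATASET)   # call this in each test's setup
    print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Whatever earlier runs left behind is wiped, which is what makes each test self-contained.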
I guess it depends what database you're using, but Red Gate (www.red-gate.com) make a tool called SQL Data Generator. This can be configured to fill your database with sensible looking test data. You can also tell it to always use the same seed in its random number generator so your 'random' data is the same every time.
You can then write your unit tests to make use of this reliable, repeatable data.
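The fixed-seed trick works with any generator, not only SQL Data Generator. A minimal Python sketch (generate_customers and its columns are hypothetical): a dedicated random.Random instance seeded with a constant yields the same "random" rows on every run, so tests can rely on them.

```python
import random

FIRST_NAMES = ["Ada", "Alan", "Grace", "Edsger", "Barbara"]


def generate_customers(count, seed=42):
    """Return (id, name, age) rows that are identical on every run for the same seed."""
    rng = random.Random(seed)   # private generator: unaffected by other code using the random module
    return [
        (i, rng.choice(FIRST_NAMES), rng.randint(18, 90))
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    for row in generate_customers(5):
        print(row)
```

Using a private Random instance rather than random.seed() keeps the data stable even if other code also draws random numbers.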
As for testing the web side of things, I'm currently looking into Selenium (selenium.openqa.org). This appears to be a cross-browser capable test suite which will help you test functionality. However, as with all of these web site test tools, there's no real way to test how well these things look in all of the browsers without casting a human eye over them!
We use an in-memory database (HSQLDB: http://hsqldb.org/). Hibernate (http://www.hibernate.org/) makes it easy for us to point our unit tests at the testing DB, with the added bonus that they run lightning fast.
I have the exact same problem with my work and I find that the best idea is to have a PHP script to re-create the database and then a separate script where I throw crazy data at it to see if it breaks it.
I have never used unit testing or the like, so I can't say whether it works, sorry.
If you can setup the database with a known quantity prior to running the tests and tear down at the end, then you'll know what data you are working with.
Then you can use something like Selenium to easily test from your UI (assuming web-based here, but there are a lot of UI testing tools out there for other UI-flavours) and detect the presence of certain records pulled back from the database.
It's definitely worth setting up either a test version of the database - or make your test scripts populate the database with known data as part of the tests.
You could try http://selenium.openqa.org/. It is more for GUI testing than for data layer testing, but it does record your actions, which can then be played back to automate tests across different platforms.
Here's my strategy (I use JUnit, but I'm sure there's a way to do the equivalent in PHP):
I have a method that runs before all of the Unit Tests for a specific DAO class. It puts the dev database into a known state (adds all test data, etc.). As I run tests, I keep track of any data added to the known state. This data is cleaned up at the end of each test. After all the tests for the class have run, another method removes all the test data in the dev database, leaving it in the state it was in before the tests were run. It's a bit of work to do all this, but I usually write the methods in a DBTestCommon class where all of my DAO test classes can get to them.
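The same lifecycle maps onto Python's unittest, as a sketch: setUpClass puts the database into a known state, each test records the rows it adds so tearDown can remove them, and tearDownClass removes the fixture data. An in-memory SQLite database stands in for the dev database, and CustomerDaoTest and its table are invented; in a real project the shared helpers would go in a common base class like the DBTestCommon class mentioned above.

```python
import sqlite3
import unittest


class CustomerDaoTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Known state shared by every test in this class.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        cls.conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                             [(1, "Ada"), (2, "Grace")])
        cls.conn.commit()

    def setUp(self):
        self.added_ids = []   # rows this test adds on top of the known state

    def add_customer(self, name):
        cur = self.conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        self.added_ids.append(cur.lastrowid)
        return cur.lastrowid

    def tearDown(self):
        # Remove only what this test added, restoring the known state.
        self.conn.executemany("DELETE FROM customers WHERE id = ?",
                              [(i,) for i in self.added_ids])
        self.conn.commit()

    @classmethod
    def tearDownClass(cls):
        # Remove the fixture rows, leaving the database as it was before the tests.
        cls.conn.execute("DELETE FROM customers WHERE id IN (1, 2)")
        cls.conn.commit()
        cls.conn.close()

    def test_add_customer(self):
        new_id = self.add_customer("Edsger")
        row = self.conn.execute("SELECT name FROM customers WHERE id = ?", (new_id,)).fetchone()
        self.assertEqual(row, ("Edsger",))

    def test_known_state(self):
        count = self.conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        self.assertEqual(count, 2)
```

Run it with python -m unittest; test_known_state passes after test_add_customer only because tearDown removed the added row.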
I would propose to use three databases. One production database, one development database (filled with some meaningful data for each developer) and one testing database (with empty tables and maybe a few rows that are always needed).
A way to test database code is:
Insert a few rows (using SQL) to initialize state
Run the function that you want to test
Compare expected with actual results. Here you could use your normal unit testing framework
Clean up the rows that were changed (so the next run won't see the previous run)
The clean-up could be done in a standard way (only in the testing database, of course) with DELETE FROM table.
In general I agree with Peter but for creating and deleting of test data I wouldn't use SQL directly. I prefer to use some CRUD API that is used in product to create data as similar to production as possible...