stored procedures and testing -- still a problem even today. Why? - sql-server

Right now this is just a theory, so if I'm way off, feel free to comment and offer some ideas (I'm running out of them). I guess it's more of an update to this question, and as I look at the "related questions" list, there are a lot with zero answers. That tells me there's a real gap.
We have multiple problems with our SQL setups in general, the majority of which stem from stored procedures that have grown into monsters from hell, plus assorted user functions scattered about the database. My biggest concern is that they're completely untested -- when something goes wrong, no one can say with 100% certainty "yes, I know for a fact this works". That makes debugging a recurring nightmare.
This afternoon, I got this crazy idea that we could start writing some assemblies (CLR-ing, yo!) for SQL Server and test them. I ran into the constraints (static methods only, safe/external/unsafe, etc.) and overall it didn't go all that well -- at least not as well as I'd hoped, and it didn't move me toward my goal.
I've also tried setting up data in a test by hand (they tried it here too, before I showed up), even using an ORM to seed the data -- it becomes rather difficult very quickly and turns into a maintenance hassle. Of course, most of this pain is in the data setup, not the actual test.
So what's out there now in 2011 that helps fix or curb this problem, or have we (as devs) abandoned the idea of testing stored procedures because of the heavy cost?

You can actually set up stored procedure tests as a project. Our DBEs at work do that - here's a link you might like: Database Unit Testing with Visual Studio

We've had a lot of success with DbFit.
Yes, there is a cost to setting up test data (there is no way to avoid this cost, IMHO), but the FitNesse platform (on which DbFit is based) enables you to reuse data population scripts by including them within multiple tests.

Corporate culture rules the day. Some places test extensively. Other places, well, not so much.
I did a short-term contract with a Fortune 500 a few years ago. Design, build, and deploy internally. My database had to interface with two legacy systems. It was clear early on that I was going to have to spend more time testing than usual. (Some days, a query of historical data would return 35 rows. Other days the identical query would return 20,000 rows.)
I built a tool in Microsoft Access that stored and executed SQL statements. (Access was the only tool I was allowed to use.) I could build a current version of the database, populate it with test data, and run all the tests I'd built--several hundred of them--in about 20 minutes.
It helped a lot to be able to go into meetings armed with a one-page printout that said my code was working exactly like it was when they signed off on it. But it wasn't easily automated--most of the SQL was hand-coded.

Can DBUnit help you?
I've not used it much myself, but you should be able to set the database to a known state, execute the procedure, and then verify the data has changed as expected.
EDIT: After looking into this more, it would seem you need something like SQLUnit rather than DBUnit. SQLUnit is described as
SQLUnit is a regression and unit testing harness for testing database stored procedures. An SQLUnit test suite would be written as an XML file. The SQLUnit harness, which is written in Java, uses the JUnit unit testing framework to convert the XML test specifications to JDBC calls and compare the results generated from the calls with the specified results.
There are downsides: it's Java-based, which might not be your preference, and more importantly there doesn't seem to have been much activity on the project since June '06 :(
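Whichever harness you pick, the underlying cycle is the same: put the database into a known state, execute the procedure, and verify the results. Here is a minimal hand-rolled sketch of that cycle using plain ADO.NET and NUnit; the connection string, the dbo.Orders table, and the dbo.GetCustomerTotal procedure are made-up placeholders for illustration only.

    using System.Data;
    using System.Data.SqlClient;
    using NUnit.Framework;

    [TestFixture]
    public class OrderTotalsProcTests
    {
        // Placeholder connection string for a dedicated test database.
        private const string ConnString =
            "Server=localhost;Database=SalesTest;Integrated Security=true";

        [SetUp]
        public void ResetData()
        {
            // Arrange: put the test database into a known state.
            Exec("DELETE FROM dbo.Orders; " +
                 "INSERT INTO dbo.Orders (Id, CustomerId, Amount) VALUES (1, 42, 100.00), (2, 42, 50.00);");
        }

        [Test]
        public void GetCustomerTotal_SumsAllOrders()
        {
            using (var conn = new SqlConnection(ConnString))
            using (var cmd = new SqlCommand("dbo.GetCustomerTotal", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@CustomerId", 42);
                conn.Open();

                // Act + Assert: the (hypothetical) procedure should return the summed amount.
                var total = (decimal)cmd.ExecuteScalar();
                Assert.AreEqual(150.00m, total);
            }
        }

        private static void Exec(string sql)
        {
            using (var conn = new SqlConnection(ConnString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

The data-setup pain the question mentions doesn't go away, but once the setup is scripted in the [SetUp] step, every test runs against the same known state.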

Related

Entity Framework performance difference with many includes

In my application I have a big nasty query that uses 25 includes. I know that might be a bit excessive, but it hasn't given us many problems and has been working fine. If I take the query generated by EF and run it manually against the database it takes around 500 ms, and from code EF takes around 700 ms to get the data from the database and build up the object structure, which is perfectly acceptable.
The problem, however, is on the production server. If I run the query manually there I see the same roughly 500 ms to fetch the data, but Entity Framework now takes around 11000 ms to get the data and build the objects, and that is of course not good by any measure.
So my question is: what can be the cause of these extreme differences when the query, fired manually against the database, takes roughly the same time?
I ended up using Entity Framework in a more "manual" way.
So instead of using dbContext.Set<T> and a lot of includes, I had to manually use a series of dbContext.Database.SqlQuery<T>("select something from something else") calls. After a bit of painful coding to bind all the objects together, I tested it on the machines that had the problem, and it now works as expected on all machines.
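Roughly what that looks like as a sketch; the entities, tables, and SQL below are hypothetical stand-ins, and only the contrast between Set<T>-with-includes and Database.SqlQuery<T>-plus-manual-stitching is the point:

    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    // Hypothetical flat row types matching the hand-written SQL.
    public class OrderRow     { public int Id { get; set; } public decimal Amount { get; set; } }
    public class OrderLineRow { public int Id { get; set; } public int OrderId { get; set; } public string Sku { get; set; } }

    public class OrderDto
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
        public List<OrderLineRow> Lines { get; set; }
    }

    public static class ManualLoader
    {
        // One flat SqlQuery per table instead of a single Set<T> query with 25 Include()s;
        // the object graph is then stitched together in memory by hand.
        public static List<OrderDto> LoadOrders(DbContext db, int customerId)
        {
            var orders = db.Database.SqlQuery<OrderRow>(
                "SELECT Id, Amount FROM dbo.Orders WHERE CustomerId = @p0", customerId).ToList();

            var lines = db.Database.SqlQuery<OrderLineRow>(
                "SELECT l.Id, l.OrderId, l.Sku FROM dbo.OrderLines l " +
                "JOIN dbo.Orders o ON o.Id = l.OrderId WHERE o.CustomerId = @p0", customerId).ToList();

            return orders.Select(o => new OrderDto
            {
                Id = o.Id,
                Amount = o.Amount,
                Lines = lines.Where(l => l.OrderId == o.Id).ToList()
            }).ToList();
        }
    }
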
So I don't know why it worked on some machines and not others, but it seems that EF has problems on some machine setups when there are a great deal of includes.

Feasible way to do automated performance testing on various database technologies?

A lot of guys on this site state that "optimizing something for performance is the root of all evil". My problem is that I have a lot of complex SQL queries, many of them using user-created functions in PL/pgSQL or PL/Python, and I do not have any performance profiling tool to show me which functions actually make the queries slow. My current method is to exclude the various functions one at a time and time the query for each variant. I know that I could use EXPLAIN ANALYZE as well, but I do not think it will give me information about user-created functions.
My current method is quite tedious, especially since there is no query progress indicator in PostgreSQL, so I sometimes have to wait 60 seconds for a query to finish if I run it on too much data.
Therefore, I am wondering whether it would be a good idea to create a tool that automatically profiles SQL queries by modifying the query and measuring the actual processing time of various versions of it. Each version would be a simplified one, perhaps containing just a single user-created function. I know I am not describing how to do this very clearly, and I can think of a lot of complicating factors, but I can also see that there are workarounds for many of them. I basically need your gut feeling on whether such a method is feasible.
Another, similar idea is to run the query with the work_mem server setting at various values and show how this impacts performance.
Such a tool could be written using JDBC so it could be modified to work across all major databases. In this case it might be a viable commercial product.
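To make the work_mem part of the idea concrete, here is a rough sketch of the kind of loop such a tool could run. The question suggests JDBC; this illustration happens to use C# with the Npgsql ADO.NET provider, and the connection string and query are placeholders:

    using System;
    using System.Diagnostics;
    using Npgsql;

    class WorkMemProbe
    {
        static void Main()
        {
            const string connString = "Host=localhost;Database=mydb;Username=me;Password=secret"; // placeholder
            const string query = "SELECT count(*) FROM big_table t JOIN other_table o ON o.id = t.other_id"; // placeholder

            foreach (var workMem in new[] { "4MB", "64MB", "256MB" })
            {
                using (var conn = new NpgsqlConnection(connString))
                {
                    conn.Open();

                    // Change work_mem for this session only, then time the same query.
                    using (var set = new NpgsqlCommand($"SET work_mem = '{workMem}'", conn))
                        set.ExecuteNonQuery();

                    var sw = Stopwatch.StartNew();
                    using (var cmd = new NpgsqlCommand(query, conn))
                        cmd.ExecuteScalar();
                    sw.Stop();

                    Console.WriteLine($"work_mem={workMem}: {sw.ElapsedMilliseconds} ms");
                }
            }
        }
    }
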
Apache JMeter can be used to load test and monitor the performance of SQL queries (using JDBC). It will, however, not modify your SQL.
Actually I don't think any tool out there could simplify and then re-run your SQL. How should that "simplifying" work?

Does it make sense to use an OR-Mapper?

Does it make sense to use an OR-mapper?
I am putting this question out there on Stack Overflow because this is the best place I know of to find smart developers willing to give their assistance and opinions.
My reasoning is as follows:
1.) Where does the SQL belong?
a.) In every professional project I have worked on, security of the data has been a key requirement. Stored Procedures provide a natural gateway for controlling access and auditing.
b.) Issues with Applications in production can often be resolved between the tables and stored procedures without putting out new builds.
2.) How do I control the SQL that is generated? I am trusting parse trees to generate efficient SQL.
I have quite a bit of experience optimizing SQL in SQL-Server and Oracle, but would not feel cheated if I never had to do it again. :)
3.) What is the point of using an OR-Mapper if I am getting my data from stored procedures?
I have used the repository pattern with a homegrown generic data access layer.
If a collection needed to be cached, I cache it. I also have experience using EF on a small CRUD application and experience helping tuning an NHibernate application that was experiencing performance issues. So I am a little biased, but willing to learn.
For the past several years we have all been hearing a lot of respectable developers advocating the use of specific OR-Mappers (Entity-Framework, NHibernate, etc...).
Can anyone tell me why someone should move to an ORM for mainstream development on a major project?
edit: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html seems to have a strong discussion on this topic but it is out of date.
Yet another edit:
Everyone seems to agree that Stored Procedures are to be used for heavy-duty enterprise applications, due to their performance advantage and their ability to add programming logic nearer to the data.
I am seeing that the strongest argument in favor of OR mappers is developer productivity.
I suspect a large motivator for the ORM movement is developer preference towards remaining persistence-agnostic (don’t care if the data is in memory [unless caching] or on the database).
ORMs seem to be outstanding time-savers for local and small web applications.
Maybe the best advice I am seeing is from client09: to use an ORM setup, but use Stored Procedures for the database intensive stuff (AKA when the ORM appears to be insufficient).
I was pro-SP for many, many years and thought it was the ONLY right way to do DB development, but the last 3-4 projects I have done I completed in EF 4.0 without SPs, and the improvements in my productivity have been truly awe-inspiring - I can do things in a few lines of code now that would have taken me a day before.
I still think SP's are important for some things, (there are times when you can significantly improve performance with a well chosen SP), but for the general CRUD operations, I can't imagine ever going back.
So the short answer for me is, developer productivity is the reason to use the ORM - once you get over the learning curve anyway.
A different approach... With the rise of the NoSQL movement, you might want to try an object/document database to store your data instead. That way you basically avoid the hell that is OR mapping. Store the data the way your application uses it, and do transformations behind the scenes in a worker process to move it into a more relational/OLAP format for further analysis and reporting.
Stored procedures are great for encapsulating database logic in one place. I've worked on a project that used only Oracle stored procedures, and am currently on one that uses Hibernate. We found that it is very easy to develop redundant procedures, as our Java developers weren't versed in PL/SQL package dependencies.
As the DBA for the project I find that the Java developers prefer to keep everything in the Java code. You run into the occasional "Why don't I just loop through all the objects that were just returned?", which caused a number of "Why isn't the index taking care of this?" issues.
With Hibernate your entities can contain not only their linked database properties, but can also contain any actions taken upon them.
For example, we have a Task Entity. One could Add or Modify a Task among other things. This can be modeled in the Hibernate Entity in Named Queries.
So I would say go with an ORM setup, but use procedures for the database intensive stuff.
A downside of keeping your SQL in Java is that you run the risk of developers using non-parameterized queries leaving your app open to a SQL Injection.
The following is just my private opinion, so it's rather subjective.
1.) I think that one needs to differentiate between local applications and enterprise applications. For local and some web applications, direct access to the DB is okay. For enterprise applications, I feel that the better encapsulation and rights management makes stored procedures the better choice in the end.
2.) This is one of the big issues with ORMs. They are usually optimized for specific query patterns, and as long as you use those, the generated SQL is typically of good quality. However, for complex operations which need to be performed close to the data to remain efficient, my feeling is that manual SQL code is still the way to go, and in this case the code goes into SPs.
3.) Dealing with objects as data entities is also beneficial compared to direct access to "loose" datasets (even if those are typed). Deserializing a result set into an object graph is very useful, no matter whether the result set was returned by a SP or from a dynamic SQL query.
If you're using SQL Server, I invite you to have a look at my open-source bsn ModuleStore project; it's a framework for DB schema versioning and for using SPs via a lightweight ORM concept (serialization and deserialization of objects when calling SPs).

Testers' knowledge of databases [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
How much should one expect a tester to know about databases?
Is it only writing queries in SQL, or do we need to know about stored procedures, triggers, etc.?
Might want to tag this with a testing tag.
Depending on what your product is, a tester may or may not need to know specific things. From my perspective, the professional tester is the last line of defense before a product gets into the wild. Thus, testers should at the root of it be using the product like users (crazy, manic users who are only interested in pain and misery, of course).
That mindset in place, you can use black-box testing (testers use no more information than users), white-box testing (testers use all information, source code, etc.), or something in between.(1) In my company we are more black-box, but even so it is helpful to have a detailed understanding of the implementation. This is true not necessarily from a development perspective, but because it gives ideas about where the complexities - and therefore, often, the bugs - lie. Depending on the quality of your developers, the testers will need to be more or less capable of determining this on their own, as it is a rare programmer who can thoroughly test her own code.
Once you have settled what you are testing and what your customers are doing, then you will know if testers need to know SQL, procedures, triggers, etc. If you are delivering, for example, a hosted database solution, then your testers will have to know these things. If you use a traditional, non-custom database server on the back end of your delivered software package and you are a black-box shop, then your testers don't necessarily need to know anything about SQL at all - the software should be handling it. (That's not to say it's not helpful in debugging, test selection, etc., but I'd always rather have a good tester with no knowledge of my field, than an average tester with coincidental domain knowledge.)
Looking at the question again, if you are just asking from a personal skills perspective - then yes, it is always a good thing to learn something else, and it will undoubtedly come in handy eventually. :)
1) I have a strong preference toward black-box testing, but there are plenty of arguments for white-box as well, so it would be worthwhile to review the differences if you're in a position to determine overall testing strategy on a project.
Shouldn't know much, but if you know how to write queries (you understand the language) you know stored procedures and triggers :)
You don't really need to know anything about DB administration, though.
I've never expected a tester to have any real knowledge of sql... Unless they were testing a database implementation ;)
My feeling is that testers should be on par with end users plus certain other knowledge. The "other knowledge" is a bit ambiguous based on the project needs but usually is one or more of the following:
Has detailed knowledge of the business plan and exactly what the program is supposed to do. In this case they need to work hand in hand with your business people.
Knows how to do edge case checking. For example, entering 1/1/1800 or 12/31/2040 in a birthdate field. Is this allowed? Other examples include entering negative amounts or even alpha characters in numeric entry fields.
Has experience doing what the users typically do. This might come simply from on-the-job training, such as sitting next to a user for a week or two every so often.
Has experience or knowledge of tools of the trade. There are a ton of testing tools out there which allow recording and playback of testing scenarios. I would expect a tester to be proficient in those tools.
Now, sometimes running simple sql queries is part of the job. For example, you might have to verify a workflow process that kicks off automated systems. In this case, being able to execute sql and reading the results is important.
However, those queries are usually provided to the testing staff by development or a dba. Again, most people can be trained on how to copy / paste, maybe change a parameter, and execute a query while interpreting results in less than an hour.
Obviously, it depends on your unique situation, as any software shop will do business differently than elsewhere.
I've worked with testers who knew nothing of SQL. We trained them to simply execute the sproc tests we wrote and note whether the results changed. They would write up the logic for the test cases, and we'd do the code implementation.
So I'd say the testers could know nothing of SQL if your situation would allow them to merely be trained to do whatever tasks they're really needed for.
However, in an ideal world, I'd vote to say the testers should have good-enough knowledge of code or SQL to write their own tests themselves, though this may introduce some friction in how the teams interface when you have two testing teams (one strictly dev, one strictly testing).
You will be a better tester if you know SQL. You don't need to know it in depth, but it's useful to know for a number of reasons:
You'll be able to run simple queries to check that the data displayed on the screen is correct with respect to the database, without having to go to a developer. This saves time.
You can start to analyse a bug without having to go to a developer, for instance to see whether the data in the database is incorrect or whether it is merely displayed incorrectly. This saves time and helps you target your tests more effectively, which will help you find bugs.
You'll understand a little of the developer's job, and you can start to understand what's possible and what is not. This will help your relationship with the developers.
You can improve the quality of your testing: in the past, I've written simple Excel spreadsheets which calculated values based upon database queries (you are independently checking the developer's work). This helps the overall quality of the product.
The principle is that your core competence is testing, but it is always a good idea to know a little of all of the surrounding competences: a tester should know a little development, a little analysis, a little project management, but not to the same depth. In the same way, a developer should know a little about how a tester works (and about analysis and project management).
You don't NEED to know SQL, but it's a good idea. The same applies to triggers and stored procedures.

What's the best database access pattern for testability?

I've read about and dabbled with a few, including Active Record, Repository, and data transfer objects. Which is best?
'Best' questions are not really valid. The world is filled with combinations and variations. You should start with the question that you actually have to answer: what problem are you trying to solve? After you answer that, you look at the tools that work best for the issue.
While I agree "best" questions are not the greatest form (since they are so arbitrary), they're not totally irrelevant either.
In the world constructed here at S.O. where developers vote on what's "best", why not have best questions? "Best questions" prompt discussion and differing opinions.
Eventually, when someone "googles" 'data access pattern' they should come to this page and see a plethora of answers then, right?
Repository is probably the best pattern for testability, since it allows you to replace a repository with a mock when you need to test. Active Record ties your models to the database (convenient sometimes, but generally more difficult to test).
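A minimal sketch of why that helps: the code under test depends only on an interface, so a test can hand it an in-memory fake instead of the database-backed repository (all names here are made up for illustration):

    using System.Collections.Generic;
    using System.Linq;

    public class Customer { public int Id { get; set; } public bool IsActive { get; set; } }

    // The code under test depends only on this abstraction...
    public interface ICustomerRepository
    {
        IEnumerable<Customer> GetAll();
    }

    public class CustomerService
    {
        private readonly ICustomerRepository _repo;
        public CustomerService(ICustomerRepository repo) { _repo = repo; }

        public int CountActive() => _repo.GetAll().Count(c => c.IsActive);
    }

    // ...so a test can swap the real (database-backed) repository for an in-memory fake.
    public class FakeCustomerRepository : ICustomerRepository
    {
        private readonly List<Customer> _data;
        public FakeCustomerRepository(IEnumerable<Customer> data) { _data = data.ToList(); }
        public IEnumerable<Customer> GetAll() => _data;
    }

    // Usage in a test (framework-agnostic):
    //   var service = new CustomerService(new FakeCustomerRepository(new[] {
    //       new Customer { Id = 1, IsActive = true },
    //       new Customer { Id = 2, IsActive = false } }));
    //   // service.CountActive() should equal 1, and no database was touched.
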
It really depends on your task. At the very least you should know and understand the common database access patterns so you can choose the one most suitable for the problem at hand.
This is a good question which should provoke some thought.
Database access is often not subject to rigorous testing - particularly not automated testing, and I would certainly like to increase the amount of testing on my database.
I'm using the MbUnit test framework running from inside Visual Studio to do some testing.
Our application uses stored procedures wherever possible, and the tests I have written set up the database for testing, call a stored procedure, and check the results.
For a collection of related stored procedures, we have a C# file with tests for those stored procs. (However, our coverage is probably about 1% so far!).
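For what it's worth, a common way to keep such tests from polluting the database is to run each one inside a transaction that is rolled back afterwards. A rough sketch of that pattern (shown with NUnit-style attributes, which MbUnit also provides; the connection string, table, and procedure names are hypothetical):

    using System.Data.SqlClient;
    using NUnit.Framework; // MbUnit exposes equivalent [TestFixture]/[Test]/[SetUp]/[TearDown] attributes

    [TestFixture]
    public class ArchiveOrdersProcTests
    {
        private SqlConnection _conn;
        private SqlTransaction _tx;

        [SetUp]
        public void Open()
        {
            _conn = new SqlConnection("Server=localhost;Database=AppTest;Integrated Security=true"); // placeholder
            _conn.Open();
            _tx = _conn.BeginTransaction(); // everything the test does happens inside this transaction
        }

        [TearDown]
        public void Rollback()
        {
            _tx.Rollback();   // undo the test's data changes so each test starts from the same state
            _conn.Dispose();
        }

        [Test]
        public void ArchiveOrders_MovesClosedOrders()
        {
            Exec("INSERT INTO dbo.Orders (Id, Status) VALUES (1, 'Closed')");
            Exec("EXEC dbo.ArchiveOrders"); // hypothetical procedure under test
            Assert.AreEqual(1, Scalar("SELECT COUNT(*) FROM dbo.OrdersArchive WHERE Id = 1"));
        }

        private void Exec(string sql)
        {
            using (var cmd = new SqlCommand(sql, _conn, _tx)) cmd.ExecuteNonQuery();
        }

        private int Scalar(string sql)
        {
            using (var cmd = new SqlCommand(sql, _conn, _tx)) return (int)cmd.ExecuteScalar();
        }
    }
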
Active record is an attractive option because of Ruby's built-in emphasis on automated testing. If I were starting over, that would be a point for using active record.
