What's the best database access pattern for testability?

I've read about and dabbled with some including active record, repository, data transfer objects. Which is best?

'Best' questions are not really valid. The world is filled with combinations and variations. You should start with the question that you have to answer: what problem are you trying to solve? After you answer that, you look at the tools that work best for the issue.

While I agree "best" questions are not the greatest form (since they are so arbitrary), they're not totally irrelevant either.
In the world constructed here at S.O. where developers vote on what's "best", why not have best questions? "Best questions" prompt discussion and differing opinions.
Eventually, when someone "googles" 'data access pattern' they should come to this page and see a plethora of answers then, right?

Repository is probably the best pattern for testability since it allows you to replace a repository with a mock when you need to test. ActiveRecord ties your models to the database (convenient sometimes, but generally more difficult to test).
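To make the point about swapping in a mock concrete, here is a minimal sketch. The names (User, UserRepository, InMemoryUserRepository, change_email) are made up for illustration and are not from any particular framework; the idea is only that the business logic depends on an interface, so a test can hand it a dictionary-backed stand-in instead of a real database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class User:
    user_id: int
    email: str


class UserRepository(Protocol):
    """Hypothetical persistence boundary that the business code depends on."""

    def get(self, user_id: int) -> Optional[User]: ...

    def save(self, user: User) -> None: ...


class InMemoryUserRepository:
    """Test double: same interface, backed by a dict instead of a database."""

    def __init__(self) -> None:
        self._rows: Dict[int, User] = {}

    def get(self, user_id: int) -> Optional[User]:
        return self._rows.get(user_id)

    def save(self, user: User) -> None:
        self._rows[user.user_id] = user


def change_email(repo: UserRepository, user_id: int, new_email: str) -> bool:
    """Business logic under test; it never touches a connection directly."""
    user = repo.get(user_id)
    if user is None or "@" not in new_email:
        return False
    repo.save(User(user.user_id, new_email))
    return True
```

In production the same change_email function would receive a repository that talks to the real database; with ActiveRecord-style models that seam is harder to find because the model and the persistence code are the same object.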

It really depends on your task. At the least, you should know and understand all the database access patterns so you can choose the one most suitable for the current problem.

This is a good question which should provoke some thought.
Database access is often not subject to rigorous testing - particularly not automated testing, and I would certainly like to increase the amount of testing on my database.
I'm using the MbUnit test framework running from inside Visual Studio to do some testing.
Our application uses stored procedures wherever possible, and the tests I have written set up the database for testing, call a stored procedure, and check the results.
For a collection of related stored procedures, we have a C# file with tests for those stored procs. (However, our coverage is probably about 1% so far!).
Active record is an attractive option because of Ruby's built-in emphasis on automated testing. If I were starting over, that would be a point for using active record.

Related

stored procedures and testing -- still a problem even today. Why?

Right now this is just a theory so if I'm way off, feel free to comment and give some ideas (I'm running out of them). I guess it's more so an update to this question and as I look at the "related question" list -- there's a lot of 0 answers. This tells me there's a real gap.
We have multiple problems with our sql setups in general, the majority of which stem from stored procedures that have grown into monsters from hell and some other user functions scattered about in the db. My biggest concern is they're completely untested -- when something goes wrong, no one can say with 100% certainty "yes, I know for a fact this works". Makes debugging a recurring nightmare.
This afternoon, I got this crazy idea we could start writing some assemblies (CLR-ing yo!) for SQL and test them. I ran into the constraints (static methods only, safe/external/unsafe, etc) and overall, that didn't go all that well. At least not as well as I'd hoped and didn't help me move toward my goal.
I've also tried setting up data in a test by hand (they tried it here too before I showed up). Even using an ORM to seed the data -- this also becomes rather difficult very quickly and a maintenance hassle. Of course, most of this pain is in the data setup and not the actual test.
So what's out there now in 2011 that helps fix/curb this problem, or have we (as devs) abandoned the idea of testing stored procedures because of the heavy cost?
You can actually make stored procedure tests as a project. Our DBEs at work do that - here's a link you might like: Database Unit Testing with Visual Studio
We've had a lot of success with DbFit.
Yes, there is a cost to setting up test data (there is no way to avoid this cost IMHO), but the Fitnesse platform (on which DbFit is based) enables you to reuse data population scripts by including them within multiple tests.
Corporate culture rules the day. Some places test extensively. Other places, well, not so much.
I did a short-term contract with a Fortune 500 a few years ago. Design, build, and deploy internally. My database had to interface with two legacy systems. It was clear early on that I was going to have to spend more time testing than usual. (Some days, a query of historical data would return 35 rows. Other days the identical query would return 20,000 rows.)
I built a tool in Microsoft Access that stored and executed SQL statements. (Access was the only tool I was allowed to use.) I could build a current version of the database, populate it with test data, and run all the tests I'd built--several hundred of them--in about 20 minutes.
It helped a lot to be able to go into meetings armed with a one-page printout that said my code was working exactly like it was when they signed off on it. But it wasn't easily automated--most of the SQL was hand-coded.
Can DBUnit help you?
Not used it much myself but you should be able to set the database to a known state, execute the procedure and then verify the data has changed as expected.
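The cycle described above (known state, execute, verify) can be sketched without any particular tool. This is not DBUnit or SQLUnit; it uses Python's built-in sqlite3 module with an in-memory database, and because SQLite has no stored procedures, a plain function wrapping one UPDATE statement stands in for the procedure under test. The table, the data and the late-fee rule are all invented for the example.

```python
import sqlite3


def apply_late_fee(conn: sqlite3.Connection, fee: float) -> None:
    """Stand-in for the stored procedure under test (hypothetical logic)."""
    conn.execute(
        "UPDATE invoices SET amount = amount + ? "
        "WHERE paid = 0 AND days_overdue > 30",
        (fee,),
    )


def make_known_state() -> sqlite3.Connection:
    """Build a fresh database with a small, fixed data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, amount REAL, paid INTEGER, days_overdue INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        [(1, 100.0, 0, 45), (2, 100.0, 1, 45), (3, 100.0, 0, 10)],
    )
    return conn


def test_apply_late_fee() -> None:
    conn = make_known_state()                 # 1. known state
    apply_late_fee(conn, 5.0)                 # 2. execute
    rows = conn.execute(                      # 3. verify
        "SELECT id, amount FROM invoices ORDER BY id"
    ).fetchall()
    # Only the unpaid invoice that is more than 30 days overdue changes.
    assert rows == [(1, 105.0), (2, 100.0), (3, 100.0)]
```

Against a real server the shape stays the same; only make_known_state (restore or reseed a test database) and the call to the procedure change.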
EDIT: After looking into this more, it would seem you need something like SQLUnit rather than DBUnit. SQLUnit is described as:
"SQLUnit is a regression and unit testing harness for testing database stored procedures. An SQLUnit test suite would be written as an XML file. The SQLUnit harness, which is written in Java, uses the JUnit unit testing framework to convert the XML test specifications to JDBC calls and compare the results generated from the calls with the specified results."
There are downsides; it's Java based which might not be your preference and more importantly there doesn't seem to have been much activity on the project since June '06 :(

Testers knowledge on database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
How much should a tester be expected to know about databases?
Is it only writing queries in SQL, or do we need to know about stored procedures, triggers, etc.?
Might want to tag this with a testing tag.
Depending what your product is, a tester may or may not need to know specific things. From my perspective, the professional tester is the last line of defense before a product gets into the wild. Thus, testers should at the root of it be using the product like users (crazy, manic users who are only interested in pain and misery, of course).
That mindset in place, you can use black-box testing (testers use no more information than users), white-box testing (testers use all information, source code, etc.), or something in between.(1) In my company we are more black-box, but even for products it is helpful to have a detailed understanding of implementations. This is true not necessarily from a development perspective, but to give ideas about where the complexities - and therefore, often, bugs - lie. Depending on the quality of your developers, the testers will need to be more or less capable of determining this on their own, as it is a rare programmer that can thoroughly test her own code.
Once you have settled what you are testing and what your customers are doing, then you will know if testers need to know SQL, procedures, triggers, etc. If you are delivering, for example, a hosted database solution, then your testers will have to know these things. If you use a traditional, non-custom database server on the back end of your delivered software package and you are a black-box shop, then your testers don't necessarily need to know anything about SQL at all - the software should be handling it. (That's not to say it's not helpful in debugging, test selection, etc., but I'd always rather have a good tester with no knowledge of my field, than an average tester with coincidental domain knowledge.)
Looking at the question again, if you are just asking from a personal skills perspective - then yes, it is always a good thing to learn something else, and it will undoubtedly come in handy eventually. :)
1) I have a strong preference toward black-box testing, but there are plenty of arguments for white-box as well, so it would be worthwhile to review the differences if you're in a position to determine overall testing strategy on a project.
Shouldn't know much, but if you know how to write queries (you understand the language) you know stored procedures and triggers :)
You don't really need to know anything about db administration tho.
I've never expected a tester to have any real knowledge of sql... Unless they were testing a database implementation ;)
My feeling is that testers should be on par with end users plus certain other knowledge. The "other knowledge" is a bit ambiguous based on the project needs but usually is one or more of the following:
Has detailed knowledge of the business plan and exactly what the program is supposed to do. In this case they need to work hand in hand with your business people.
Knows how to do edge case checking. For example, entering 1/1/1800 or 12/31/2040 in a birthdate field. Is this allowed? Other examples include entering negative amounts or even alpha characters in numeric entry fields.
Has experience doing what the users typically do. This might be simply from on the job training such as sitting next to a user for a week or two every so often.
Has experience or knowledge of tools of the trade. There are a ton of testing tools out there which allow recording and playback of testing scenarios. I would expect a tester to be proficient in those tools.
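The edge case checking mentioned in the list above can be written down as a small table of inputs and expected outcomes. The sketch below is only an illustration: the function name and the acceptance rule (no future dates, nothing more than 130 years back) are assumptions, since the right answer to "is this allowed?" depends on the business rules of the product being tested.

```python
from datetime import date
from typing import Optional


def parse_birthdate(text: str, today: date) -> Optional[date]:
    """Return a date for M/D/YYYY input, or None if the input is rejected.

    Assumed rule (not from any real spec): the date must not be in the
    future and must be no more than 130 years before the current year.
    """
    try:
        month, day, year = (int(part) for part in text.split("/"))
        value = date(year, month, day)
    except ValueError:
        # Non-numeric parts, wrong number of parts, or an impossible date.
        return None
    if value > today or value.year < today.year - 130:
        return None
    return value


# Edge cases a tester might try, with the outcome this sketch expects.
EDGE_CASES = {
    "1/1/1800": False,    # implausibly old
    "12/31/2040": False,  # in the future (relative to the date used below)
    "2/30/1990": False,   # impossible calendar date
    "abc": False,         # alpha characters in a date field
    "7/4/1985": True,     # ordinary valid input
}
```

Running every entry of EDGE_CASES through parse_birthdate with a fixed "today" gives a repeatable check that does not depend on the day the test is run.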
Now, sometimes running simple SQL queries is part of the job. For example, you might have to verify a workflow process that kicks off automated systems. In this case, being able to execute SQL and read the results is important.
However, those queries are usually provided to the testing staff by development or a dba. Again, most people can be trained on how to copy / paste, maybe change a parameter, and execute a query while interpreting results in less than an hour.
Obviously, it depends on your unique situation, as any software shop will do business differently than elsewhere.
I've worked with testers who knew nothing of SQL. We trained them to simply execute the sproc tests we wrote and notice if they changed. They would write up the logic for the test cases, and we'd do the code implementation.
So I'd say the testers could know nothing of SQL if your situation would allow them to merely be trained to do whatever tasks they're really needed for.
However, in an ideal world, I'd vote to say the testers should have a good-enough knowledge of code or SQL to write their own tests themselves, though this may introduce layers of arguments in interfacing when you have 2 testing teams (1 strict dev, 1 strict testing).
You will be a better tester if you know SQL. You don't need to know it in depth, but it's useful to know for a number of reasons:
You'll be able to do simple queries to check that the data displayed on the screen is correct with respect to the database, without having to go to a developer to check. This saves time.
You can start to analyse a bug without having to go to a developer, for instance by checking whether the data in the database is incorrect or whether it is only displayed incorrectly. This will save time, and help you to target your tests more effectively. This will help you find bugs.
You'll understand (a little) of the job of the developer, and you can start to understand what's possible and what is not. This will help you with your relationship with the developers.
You can improve the quality of your testing: in the past, I've written simple Excel spreadsheets which calculated values based upon database queries (you are independently checking the developer's work). This helps the overall quality of the product.
The principle is that your core competence is testing, but it is always a good idea to know a little of all of the surrounding competences: a tester should know a little development, analysis and project management, but not to the same depth. In the same way, a developer should know a little about how a tester works (AND analysis and project management).
You don't NEED to know SQL, but it's a good idea. The same applies to triggers and stored procedures.

Why should you use an ORM? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
If you had to motivate the "pros" of an ORM to management or a client and explain why you would use one, what would those reasons be?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.
The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
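Here is a rough sketch of what an ORM does for you in the polymorphic case, written by hand so the mechanism is visible. It assumes a single shapes table with a discriminator column called kind and two generic numeric columns a and b; those names and the two classes are invented for the example, and a real ORM would generate this mapping from configuration rather than an if/elif chain.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass
class Circle:
    radius: float

    def area(self) -> float:
        return math.pi * self.radius ** 2


@dataclass
class Rectangle:
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height


def load_shapes(conn: sqlite3.Connection) -> list:
    """One query for all shapes; the 'kind' column picks the class per row."""
    shapes = []
    for kind, a, b in conn.execute("SELECT kind, a, b FROM shapes ORDER BY id"):
        if kind == "circle":
            shapes.append(Circle(a))
        elif kind == "rectangle":
            shapes.append(Rectangle(a, b))
        else:
            raise ValueError(f"unknown discriminator: {kind}")
    return shapes
```

The caller gets back a mixed list and can call area() on each element without caring which concrete class it is, which is the behaviour the answer describes.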
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But, with ORM's it is far too easy to create poor performing routines when developers don't understand how their queries work with the ORM and the SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)
Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
You can move to different database software easily because you are developing to an abstraction.
Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.
So that your object model and persistence model match.
To minimise duplication of simple SQL queries.
The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL I created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there).
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about creating my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution.
Abstract the sql away 95% of the time so not everyone on the team needs to know how to write super efficient database specific queries.
I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.
Compilation and testing of queries.
As the tooling for ORM's improves, it is easier to determine the correctness of your queries faster through compile time errors and tests.
Compiling your queries helps developers find errors faster. Right? Right. This compilation is made possible because developers are now writing queries in code using their business objects or models instead of just strings of SQL or SQL-like statements.
If using the correct data access patterns in .NET it is easy to unit test your query logic against in memory collections. This speeds the execution of your tests because you don't need to access the database, set up data in the database or even spin up a full blown data context.[EDIT]This isn't as true as I thought it was as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years.[/EDIT]
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework, where my experience lies. Plug in your own environment if possible.
.net tiers using code smith templates
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well.
convince them how much time / money you will save when changes come in and you don't have to rewrite your SQL since the ORM tool will do that for you
I think one con is that an ORM will require some changes to your POJOs, mainly related to schema, relations and queries. That is a problem in scenarios where you are not supposed to change the model objects, perhaps because they are shared among more than one project or between client and server. In such cases you will need to split the model into two levels, which requires additional effort.
I am an Android developer and, as you know, mobile apps are usually not huge in size, so this additional effort to segregate a pure model from an ORM-affected model does not seem worthwhile.
I understand that the question is a generic one, but mobile apps also fall under that generic umbrella.

What is a good balance in an MVC model to have efficient data access?

I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand as far as speed and cutting down on the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place, a method will load an object, but later on deeper in the code, it will load some other object(s) that are related to the first object.
This leads to the other part of the question which is what are people doing in situations where you have a table that in one part of the code only needs the values for a few columns, and another part needs something else? Right now (in the same project), there is one get() method for each object, and it does a "SELECT *" (or lists all the columns in the table explicitly) so that anytime you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you try to use a ORM class that comes with the framework, it wants to do just that usually. Are you stuck to choosing ORM with SELECT * vs writing the specific SQL queries by hand? It just seems to me that we're stuck between convenience and efficiency, and if I hand write the queries, if I add a column, I'm most likely going to have to add it to several places in the code.
Sorry for the long question, but I'm explaining the background to get some mindsets from other developers rather than maybe a specific solution. I know that we can always use something like Memcached, but I would rather optimize what we can before getting into that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
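The "100 queries for one page" situation from the question usually has the shape shown below: one query to load a list, then one more query per item to load something related. The sketch uses Python's sqlite3 module rather than PHP, and the authors/posts schema is invented for the example; the two functions return the same data, but the first issues N+1 queries and the second issues one JOIN that names only the columns it needs.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Small hypothetical schema: posts, each written by one author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection):
    """One query for the posts, then one more query per post."""
    queries = 1
    result = []
    posts = conn.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    for title, author_id in posts:
        row = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, row[0]))
    return result, queries


def titles_joined(conn: sqlite3.Connection):
    """The same data from a single JOIN that lists only the needed columns."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1
```

With three posts the difference is 4 queries against 1; with a home page that lists a few dozen objects, each with a couple of relations, the first style reaches the numbers described in the question very quickly.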
I got a little lost with your question, but if you are looking for a way to do database access, you can do it a couple of ways. Your MVC can use the Zend framework, which comes with database access abstractions you can use.
Also keep in mind that you should design your system well to ensure there is no contention in the database as your queries are all scattered across the php pages and may lock tables resulting in the overall web application deteriorating in performance and becoming slower over time.
That is why it is sometimes preferable to use stored procedures, as the query is in one place and can be tuned when we need to, though others may argue that it is easier to debug if query statements are on the front-end.
No ORM framework will even get close to hand-written SQL in terms of speed, although 100 queries seems unrealistic (maybe you are exaggerating a bit). Even if you have the creator of the ORM framework writing the code, it will always be far from the speed of good old SQL.
My advice is, look at the whole picture not only speed:
Does the framework improve code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the queries (after optimization) of the framework present a bottleneck?
I've never developed anything with PHP, but I think that you could mix both approaches (ORM and plain SQL), maybe after a thorough profiling of the app you can determine the real bottlenecks and only then replace that ORM code for hand written SQL (Usually in ruby you use ActiveRecord, then you profile the application with something as new relic and finally if you have a complicated AR query you replace that for some SQL)
Regards
Trust your experience.
To not repeat yourself so much in the code you could write some simple model-functions with your own SQL. This is what I am doing all the time and I am happy with it.
Much of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all it's a question of style.
Don't hesitate to add your own layer or exchange or extend a given layer with your own stuff. Keep it clean and make a good design and some documentation so you feel home when you come back later.

Where to put your code - Database vs. Application? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have been developing web/desktop applications for about 6 years now. During the course of my career, I have come across applications that were written heavily in the database using stored procedures, whereas many others had only a few basic stored procedures (to read, insert, edit and delete entity records) for each entity.
I have seen people argue that if you have paid for an enterprise database, you should use its features extensively. On the other hand, a lot of "object oriented architects" have told me it's an absolute crime to put anything more than necessary in the database, and that you should be able to drive the application using the methods on those classes.
Where do you think is the balance?
Thanks,
Krunal
I think it's a business logic vs. data logic thing. If there is logic that ensures the consistency of your data, put it in a stored procedure. Same for convenience functions for data retrieval/update.
Everything else should go into the code.
A friend of mine is developing a host of stored procedures for data analysis algorithms in bioinformatics. I think his approach is quite interesting, but not the right way in the long run. My main objections are maintainability and lacking adaptability.
I'm in the object oriented architects camp. It's not necessarily a crime to put code in the database, as long as you understand the caveats that go along with that. Here are some:
It's not debuggable
It's not subject to source control
Permissions on your two sets of code will be different
It will make it more difficult to track where an error in the data came from if you're accessing info in the database from both places
Anything that relates to Referential Integrity or Consistency should be in the database as a bare minimum. If it's in your application and someone wants to write an application against the database they are going to have to duplicate your code in their code to ensure that the data remains consistent.
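As a small illustration of keeping integrity rules in the database, the sketch below declares a foreign key and a CHECK constraint in the schema, so any client that writes to the tables is held to the same rules without duplicating them. It uses Python's sqlite3 module and an invented customers/orders schema; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which other database servers do not require.

```python
import sqlite3


def connect() -> sqlite3.Connection:
    """Open an in-memory database whose schema carries the integrity rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total REAL NOT NULL CHECK (total >= 0)
        );
        INSERT INTO customers VALUES (1, 'Ann');
        """
    )
    return conn


def try_insert_order(conn: sqlite3.Connection, customer_id: int, total: float) -> bool:
    """Return True if the database accepted the row, False if it refused it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised for both the foreign key violation and the CHECK violation.
        return False
```

An order for a customer that does not exist, or with a negative total, is rejected by the database itself, whichever application tried to insert it.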
PLSQL for Oracle is a pretty good language for accessing the database and it can also give performance improvements. Your application can also be much 'neater' as it can treat the database stored procedures as a 'black box'.
The sprocs themselves can also be tuned and modified without you having to go near your compiled application, this is also useful if the supplier of your application has gone out of business or is unavailable.
I'm not advocating that 'everything' should be in the database, far from it. Treat each case separately and logically and you will see which makes more sense: put it in the app or put it in the database.
I'm coming from almost the same background and have heard the same arguments. I do understand that there are very valid reasons to put logic into the database. However, it depends on the type of application and the way it handles data which approach you should choose.
In my experience, a typical data entry app like some customer (or xyz) management will massively benefit from using an ORM layer as there are not so many different views at the data and you can reduce the boilerplate CRUD code to a minimum.
On the other hand, assume you have an application with a lot of concurrency and calculations that span a lot of tables and that has a fine-grained column-level security concept with locking and so on, you're probably better off doing stuff like that directly in the database.
As mentioned before, it also depends on the variety of views you anticipate for your data. If there are many different combinations of columns and tables that need to be presented to the user, you may also be better off just handing back different result sets rather than map your objects one-by-one to another representation.
After all, the database is good at dealing with sets, whereas OO code is good at dealing with single entities.
Reading these answers, I'm quite confused by the lack of understanding of database programming. I am an Oracle PL/SQL developer, and we use source control for every bit of code that goes into the database. Many of the IDEs provide add-ins for most of the major source control products, from ClearCase to SourceSafe. The Oracle tools we use allow us to debug the code, so debugging isn't an issue. The issue is more one of logic and accessibility.
As a manager of support for about 5000 users, the fewer places I have to look for the logic, the better. If I want to make sure the logic is applied for ALL applications that use the data, even business logic, I put it in the DB. If the logic is different depending on the application, they can be responsible for it.
@DannySmurf:
It's not debuggable
Depending on your server, yes, they are debuggable. This provides an example for SQL Server 2000. I'm guessing the newer ones also have this. However, the free MySQL server does not have this (as far as I know).
It's not subject to source control
Yes, it is. Kind of. Database backups should include stored procedures. Those backup files might or might not be in your version control repository. But either way, you have backups of your stored procedures.
My personal preference is to try and keep as much logic and configuration out of the database as possible. I am heavily dependent on Spring and Hibernate these days so that makes it a lot easier. I tend to use Hibernate named queries instead of stored procedures and the static configuration information in Spring application context XML files. Anything that needs to go into the database has to be loaded using a script and I keep those scripts in version control.
@Thomas Owens: (re source control) Yes, but that's not source control in the same sense that I can check in a .cs file (or .cpp file or whatever) and go and pick out any revision I want. To do that with database code requires a potentially-significant amount of effort to either retrieve the procedure from the database and transfer it to somewhere in the source tree, or to do a database backup every time a minor change is made. In either case (and regardless of the amount of effort), it's not intuitive; and for many shops, it's not a good enough solution either. There is also the potential here for developers who may not be as studious at that as others to forget to retrieve and check in a revision. It's technically possible to put ANYTHING in source control; the disconnect here is what I would take issue with.
(re debuggable) Fair enough, though that doesn't provide much integration with the rest of the application (where the majority of the code could live). That may or may not be important.
Well, if you care about the consistency of your data, there are reasons to implement code within the database. As others have said, placing code (and/or RI/constraints) inside the database acts to enforce business logic, close to the data itself. And, it provides a common, encapsulated interface, so that your new developer doesn't accidentally create orphan records or inconsistent data.
Well, this one is difficult. As a programmer, you'll want to avoid TSQL and such "Database languages" as much as possible, because they are horrendous, difficult to debug, not extensible and there's nothing you can do with them that you won't be able to do using code on your application.
The only reasons I see for writing stored procedures are:
Your database isn't great (think how SQL Server doesn't implement LIMIT and you have to work around that using a procedure).
You want to be able to change a behaviour by changing code in just one place without re-deploying your client applications.
The client machines have big calculation-power constraints (think small embedded devices).
For most applications though, you should try to keep your code in the application where you can debug it, keep it under version control and fix it using all the tools provided to you by your language.

Resources