Does anyone know of any design patterns for interfacing with relational databases? For instance, is it better to have SQL inline in your methods, or instantiate a SQL object where you pass data in and it builds the SQL statements? Do you have a static method to return the connection string and each method just gets that string and connects to the DB, performs its action, then disconnects as needed or do you have other structures that are in charge of connecting, executing, disconnecting, etc?
In other words, assuming the database already exists, what is the best way for OO applications to interact with it?
Thanks for any help.
I recommend the book Patterns of Enterprise Application Architecture by Martin Fowler for a thorough review of the most common answers to these questions.
There are a few patterns you can use:
Repository Pattern
Active Record Pattern
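As a rough illustration of the first of these, here is a minimal Repository sketch in Python using the standard-library sqlite3 module. The `Customer` class, the `customer` table, and the method names are assumptions made up for this example, not something from the question. The point is only that callers see a collection-like object and never touch SQL or connection details.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical domain object used only for this sketch.
    id: Optional[int]
    name: str


class CustomerRepository:
    """Collection-like access to customers; callers never see SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, customer: Customer) -> Customer:
        # Parameter binding keeps the value out of the SQL text.
        cur = self._conn.execute(
            "INSERT INTO customer (name) VALUES (?)", (customer.name,)
        )
        customer.id = cur.lastrowid
        return customer

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
repo = CustomerRepository(conn)
saved = repo.add(Customer(id=None, name="Ada"))
found = repo.get(saved.id)
```

Because the rest of the application depends only on `add` and `get`, the SQL could later be replaced by an ORM or an in-memory fake for tests without touching the callers.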
I personally would hate to work with a database without an ORM. NHibernate is preferable but iBatis is also an option for existing databases (not to say that NH can't handle existing databases).
In general, the best way for OO applications to interface with a relational database is through an ORM; while this isn't a design pattern per se, it's a type of tool that has a specific usage pattern, so it's similar enough. Object Relational Mapping (ORM) tools provide a mapping between a database and a set of objects in memory; usually, these tools provide means for managing things such as sessions, connections, and transactions. A good example of an ORM that works fantastically well would be Hibernate (NHibernate on .NET).
In my experience it is best to have no SQL statements at all (most ORMs will allow that), and it is best not to have any knowledge of connection details (connection string, etc). Even better if you can have the exact same piece of code working with any major db vendor.
POEAA has a wealth of knowledge on the issue if you intend to roll your own.
Everything you will identify by googling for DAL describes a design pattern. Seems like standing in the middle of the forest and asking to see a tree. There are dozens if not thousands.
Here's a quote from a book I'm reading, as a starting point for finding resources.
... it is impossible to discuss ORM without talking about patterns and best practices for building persistence layers. Then again, it is also impossible to discuss ORM patterns without calling out the gurus in the industry, namely Martin Fowler, Eric Evans, Jimmy Nilsson, Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, the last four of whom are known in the industry as the Gang of Four (GoF). The purposes of this chapter are to explain and expand some of the patterns created by these gurus and to provide concrete examples using language that every developer can understand.
Related
We are in the early stages of design of a big business application that will have multiple modules. One of the requirements is that the application should be database independent, it should support SQL Server, Oracle, MySQL and DB2.
From what I have read on the web, database independence is a very bad idea: it would result in hard-to-maintain code, database design with the least-common features in all supported DBMSs, bad performance and bad scalability. My personal gut feeling is that the complexity of this feature, more than any other feature, could increase the development cost and time exponentially. The code will be dreadful.
But I cannot persuade anybody to ignore this feature. The problem is that most data on this issue are empirical data, lacking numbers to support the case. If anyone can share any numbers-supported data on the issue I would appreciate it.
One of the possible design options is to use Entity Framework for the database tier, with a provider for each DBMS. My personal feeling is that writing SQL statements manually without any ORM would be a "must": you have no control over the SQL generated by Entity Framework, and a database-independent scenario will need some SQL tweaking based on the DBMS the code is targeting. I also think that third-party Entity Framework providers will have a significant number of bugs that only appear in the complex scenarios that the application will have. I would like to hear from anyone who has experience using Entity Framework in a database-independent scenario.
Also, one of the possibilities discussed by the team is to support one DBMS (SQL Server, for example) in the first iteration and then add support for other DBMSs in successive iterations. I think that since we will need a database design with the least common features, this development strategy is bad, since we need to know all the features of all databases before we start writing code for the first DBMS. I need to hear from you about this possibility, too.
Have you looked at Comparison of different SQL implementations?
This is an interesting comparison, I believe it is reasonably current.
Designing a good relational data model for your application should be database agnostic, for the simple reason that all RDBMSs are designed to support the features of relational data models.
On the other hand, implementation of the model is normally influenced by the personal preferences of the people specifying the implementation. Everybody has their own slant on doing things, for instance you mention autoincremented identity in a comment above. These personal preferences for implementation are the hurdles that can limit portability.
Reading between the lines, the requirement for database independence has been handed down from above, with the instruction to make it so. It also seems likely that the application is intended for sale rather than in-house use. In context, the database preference of potential clients is unknown at this stage.
Given such requirements, then the practical questions include:
Who will champion each specific database for design and development? This is important, inasmuch as the personal preferences for implementation of each of these people need to be reconciled to achieve a database-neutral solution. If a specific database has no champion, chances are that implementing the application on this database will be poorly done, if at all.
Who has the depth of database experience to act as moderator for the champions? This person will have to make some hard decisions at times, but horsetrading is part of the fun.
Will the programming team be productive without all of their personal favourite features? Stored procedures, triggers etc. are the least transportable features between RDBMSs.
The specification of the application itself will also need to include a clear distinction between database-agnostic and database specific design elements/chapters/modules/whatever. Amongst other things, this allows implementation with one DBMS first, with a defined effort required to implement for each subsequent DBMS.
Database-agnostic parts should include all of the DML, or ORM if you use one.
Database-specific parts should be more-or-less limited to installation and drivers.
Believe it or not, vanilla-flavoured sql is still a very powerful programming language, and personally I find it unlikely that you cannot create a performant application without database-specific features, if you wish to.
In summary, designing database-agnostic applications is an extension of a simple precept:
Encapsulate what varies
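To make that precept a little more concrete, here is a small Python sketch in which the one thing that varies, the row-limiting syntax, is hidden behind a dialect object. The class names, the `orders` table, and its columns are hypothetical. The three syntaxes shown (`LIMIT`, `TOP`, and `FETCH FIRST ... ROWS ONLY`) are the commonly documented forms for MySQL, SQL Server, and Oracle 12c+/DB2 respectively, but check them against the versions you actually target.

```python
from abc import ABC, abstractmethod


class Dialect(ABC):
    """Encapsulates SQL syntax that differs between DBMSs."""

    @abstractmethod
    def first_rows(self, columns: str, table: str, order_by: str, n: int) -> str:
        ...


class LimitDialect(Dialect):
    # MySQL-style row limiting.
    def first_rows(self, columns, table, order_by, n):
        return f"SELECT {columns} FROM {table} ORDER BY {order_by} LIMIT {int(n)}"


class TopDialect(Dialect):
    # SQL Server-style row limiting.
    def first_rows(self, columns, table, order_by, n):
        return f"SELECT TOP ({int(n)}) {columns} FROM {table} ORDER BY {order_by}"


class FetchFirstDialect(Dialect):
    # Oracle 12c+ and DB2-style row limiting.
    def first_rows(self, columns, table, order_by, n):
        return (
            f"SELECT {columns} FROM {table} ORDER BY {order_by} "
            f"FETCH FIRST {int(n)} ROWS ONLY"
        )


DIALECTS = {
    "mysql": LimitDialect(),
    "sqlserver": TopDialect(),
    "oracle": FetchFirstDialect(),
    "db2": FetchFirstDialect(),
}


def oldest_orders_sql(dbms: str) -> str:
    # Identifiers here are trusted constants, never user input.
    return DIALECTS[dbms].first_rows("id, total", "orders", "id", 10)
```

The database-agnostic code calls `oldest_orders_sql` and never branches on the DBMS itself. Adding support for another DBMS then means adding one dialect class rather than editing every query.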
I work with Hibernate which gives me the benefits of the ORM plus the database independence. Database specific features are out of the question and this usually improves my design. Everything (domain model, business logic and data access methods) is testable so development is not painful.
Hello, Muhammed!
Database independence is neither "good" nor "bad". It is a design decision; it is a trade-off.
Let's talk about the choices:
It would result in hard-to-maintain code
This is the choice of your programmers. If you make your code database-independent, then you should use a layer between your code and the database. The best kind of layer is one that someone else has written.
...Database design with the least common features in all supported DBMSs
This is, by definition, true. Luckily, the common features in all supported databases are fairly broad; they should all implement the SQL-99 standard.
...bad performance and bad scalability
This should not be true. The layer should add minimal cost to the database.
...the complexity of this feature, more than any other feature, could increase the development cost and time exponentially. The code will be dreadful.
Again, I recommend that you use a layer between your code and the database.
You didn't specify which language or platform you're writing for. Luckily, many languages have already abstracted out databases:
Java has JDBC drivers
Python has the Python Database API
.NET has ADO.NET
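As a sketch of what such a layer buys you, here is Python code written only against the DB-API calls (`cursor`, `execute`, `fetchall`, `close`), with the choice of driver injected from outside. The `module` table and the function names are made up for the example, and only sqlite3 is exercised here because it ships with Python. Other drivers are assumed to follow the same interface.

```python
import sqlite3
from typing import Any, Callable, List


def fetch_module_names(connect: Callable[[], Any]) -> List[str]:
    """Uses only DB-API calls, so it does not care which driver is behind it."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute("SELECT name FROM module ORDER BY name")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


def connect_sqlite() -> sqlite3.Connection:
    # Driver-specific part: opens an in-memory database and seeds demo data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE module (name TEXT)")
    conn.executemany(
        "INSERT INTO module (name) VALUES (?)", [("billing",), ("audit",)]
    )
    return conn


names = fetch_module_names(connect_sqlite)
```

One caveat: the DB-API leaves the parameter placeholder style to each driver (exposed as the driver module's `paramstyle` attribute), so parameterized statements are one of the details a thin layer still has to smooth over.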
Good luck.
Database independence is an overrated application feature. In reality, it is very rare for a large business application to be moved onto a new database platform after it's built and deployed. You can also miss out on DBMS specific features and optimisations.
That said, if you really want to include database independence, you might be best to write all your database access code against interfaces or abstract classes, like those used in the .NET System.Data.Common namespace (DbConnection, DbCommand, etc.) or use an O/RM library that supports multiple databases like NHibernate.
I am developing an application which at the moment queries a (rather large) database via ADO.NET and hard-coded SQL statements. Admittedly this is ugly (i.e. no compile time errors thrown if a mistake is made in the SQL) and potentially dangerous (due to SQL injections, etc although this is unlikely to be a problem for this particular application) but this wasn't considered initially because this application is really only interested in a very small subset of tables in this database (at least for now...).
LinqToSQL seemed interesting but because this application is required to have the ability to connect to Oracle databases as well, that plan was a non-starter.
Is a project like mine suitable for integration with an ORM framework or would that be overkill?
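Whatever the answer on ORMs, the injection risk mentioned above can be removed without one by binding parameters instead of building SQL strings. The sketch below uses Python's sqlite3 module rather than ADO.NET purely to stay self-contained. The `account` table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO account (owner) VALUES (?)", [("alice",), ("bob",)])


def find_unsafe(owner: str):
    # Vulnerable: the input becomes part of the SQL text itself.
    sql = f"SELECT id FROM account WHERE owner = '{owner}' ORDER BY id"
    return conn.execute(sql).fetchall()


def find_safe(owner: str):
    # The driver sends the value separately from the statement.
    sql = "SELECT id FROM account WHERE owner = ? ORDER BY id"
    return conn.execute(sql, (owner,)).fetchall()


malicious = "x' OR '1'='1"
leaked = find_unsafe(malicious)   # the OR clause matches every row
blocked = find_safe(malicious)    # treated as a literal owner name
```

ADO.NET offers the same idea through command parameters, so hard-coded SQL does not have to mean concatenated SQL.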
I think an ORM should always at least be considered.
But it doesn't sound like you're even using business objects (Sometimes referred to as a Data Access Layer or DAL) which greatly undermines the usefulness of an object oriented language. I would address this first. If you find it's too time consuming to create all the CRUD for the business objects it's time for an ORM...
My personal favorite is nHibernate. Big learning curve but definitely worth it.
I would recommend a generated DAL instead of an ORM, or Linq.
Look into SubSonic (http://subsonicproject.com/). It is an open source DAL generator that is very easy to learn and use, and has a very low overhead.
I would definitely say that it is a candidate for an ORM framework. The overhead of setting up the ORM is quite small once you have familiarized yourself with a framework, and the benefits are many.
As you say, LinqToSQL is not appropriate if you might need Oracle support, but most other frameworks support Oracle.
If you only use a small subset of the tables, then you will only have to map a small subset of the tables and hence the setup cost will decrease even further.
Good luck!
Try using something that generates SQL (like Linq, only with Oracle support) instead of an ORM.
Why? Jeff Atwood explains.
Quote:
"At first you're like "whee! objects!" and then you realize-- hey, this is a lot of tedious, error-prone mapping code I didn't have to write before... "
I am developing a transactional application in .NET and would like to get some input on how to properly encapsulate database access so that:
I don't have connection strings all over the place
Multiple calls to the same stored procedure from different functions or WORSE, multiple stored procedures that are different by a single column
I am interested in knowing if using an ORM like NHibernate is useful, as it may just add another layer of complexity to a rapidly changing data model, and artifacts need to be produced on a tight schedule.
I am more interested in methods or patterns OTHER than ORM packages.
There are at least two widely accepted design patterns used to encapsulate data access:
repository (DDD)
DAO (Data Access Object)
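A minimal DAO sketch, in Python with sqlite3 for the sake of a runnable example, might look like the following. All names (`DB_PATH`, `open_connection`, `OrderDao`, the `orders` table) are assumptions for illustration. SQLite has no stored procedures, so a single query method stands in for the "one procedure, called from one place" idea.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager
from typing import Iterator, List, Tuple

# The only place that knows where the database lives (hypothetical path).
DB_PATH = os.path.join(tempfile.mkdtemp(), "orders_demo.db")


@contextmanager
def open_connection() -> Iterator[sqlite3.Connection]:
    """Connect, commit on success, roll back on error, always close."""
    conn = sqlite3.connect(DB_PATH)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


class OrderDao:
    """Every query against the orders table is defined here, once."""

    def create_schema(self) -> None:
        with open_connection() as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders ("
                "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
                "total REAL NOT NULL)"
            )

    def insert(self, customer: str, total: float) -> int:
        with open_connection() as conn:
            cur = conn.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)",
                (customer, total),
            )
            return cur.lastrowid

    def find_by_customer(self, customer: str) -> List[Tuple[int, str, float]]:
        # Returns whole rows; callers pick the columns they need instead of
        # each getting a near-duplicate query that differs by one column.
        with open_connection() as conn:
            return conn.execute(
                "SELECT id, customer, total FROM orders "
                "WHERE customer = ? ORDER BY id",
                (customer,),
            ).fetchall()


dao = OrderDao()
dao.create_schema()
first_id = dao.insert("acme", 120.0)
dao.insert("acme", 80.0)
dao.insert("globex", 15.5)
acme_orders = dao.find_by_customer("acme")
```

The connection details live in exactly one place, and functions elsewhere in the application call `OrderDao` methods rather than repeating the query.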
For the sake of completeness, I suggest these books:
Patterns of Enterprise Application Architecture (Fowler)
Domain Driven Design (Evans)
If, as it appears, this is an important project and the DAL is a major risk factor, get someone involved who has done it before. You're exactly right that there are too many ways to run off the rails by trying to get this right the first time without solid experience.
There are any number of patterns for accomplishing this, but I'd look for someone who has a simple set of well-defined patterns they are fully comfortable with.
As stated above, check out the repository and unit of work patterns. The books by Fowler and Evans are highly recommended. So is Karl Seguin's reader, which gave me a cooler introduction to the books just mentioned. Grab it at http://codebetter.com/blogs/karlseguin/archive/2008/06/24/foundations-of-programming-ebook.aspx
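For a feel of the unit of work idea, here is a deliberately simplified Python sketch: changes are registered with the unit of work and written out together in one transaction, or not at all. It only tracks new rows, whereas a fuller implementation would also track changed and deleted objects. The `ledger` table and all names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


class UnitOfWork:
    """Collects pending inserts and writes them in a single transaction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._pending: List[Tuple[str, int]] = []

    def register_new(self, account: str, amount: int) -> None:
        self._pending.append((account, amount))

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            # Business logic failed: discard everything, nothing was written.
            self._pending.clear()
            return False
        try:
            self._conn.executemany(
                "INSERT INTO ledger (account, amount) VALUES (?, ?)",
                self._pending,
            )
            self._conn.commit()
        except sqlite3.Error:
            self._conn.rollback()
            raise
        finally:
            self._pending.clear()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (account TEXT, amount INTEGER)")


def transfer(src: str, dst: str, amount: int) -> None:
    with UnitOfWork(conn) as uow:
        uow.register_new(src, -amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        uow.register_new(dst, amount)


transfer("a", "b", 50)
try:
    transfer("a", "b", -5)  # fails halfway; no partial write
except ValueError:
    pass
rows = conn.execute(
    "SELECT account, amount FROM ledger ORDER BY rowid"
).fetchall()
```

The calling code never issues a commit or rollback itself, which is the part of the pattern that tends to pay off when the data model changes quickly.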
As a Java developer, I would suggest reading about JDBC templates. Although it is not .NET, you could learn how the Spring framework encapsulates the data access tier and get some ideas.
I am continuing to delve into Erlang. I am thinking of starting my next web project using Erlang, and at this stage the only thing I will really miss from Ruby on Rails is ActiveRecord.
Is there a good alternative technology for Erlang?
Update:
The closest I have come to a solution is ErlyDB, a component of ErlyWeb.
ErlyDB is a database abstraction layer generator for Erlang. ErlyDB combines database metadata and user-provided metadata to generate functions that let you perform common data access operations in an intuitive manner. It also provides a single API for working with different database engines (although currently, only MySQL is supported), letting you write portable data access code.
Well, the major advantages of ActiveRecord (as I see it) are:
You can persist your objects in a relational database nearly transparently.
You can search the database by any attribute of your objects.
You can validate objects when persisting them.
You can have callbacks on deleting, updating, or inserting objects.
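For readers who have not used ActiveRecord, the four points above can be sketched roughly as follows. This is a toy written in Python with sqlite3, not Ruby or Erlang, and it is not how Rails implements any of it. The `Employee` class, its `before_save` hook, and the `employee` table are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT)")


class Employee:
    """The object carries its data and knows how to persist itself."""

    COLUMNS = ("id", "first_name")

    def __init__(self, first_name: str, id: Optional[int] = None) -> None:
        self.id = id
        self.first_name = first_name

    def validate(self) -> bool:
        # Validation runs before anything is written.
        return bool(self.first_name.strip())

    def before_save(self) -> None:
        # Callback hook: normalise the data just before persisting.
        self.first_name = self.first_name.strip()

    def save(self) -> bool:
        if not self.validate():
            return False
        self.before_save()
        if self.id is None:
            cur = conn.execute(
                "INSERT INTO employee (first_name) VALUES (?)",
                (self.first_name,),
            )
            self.id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE employee SET first_name = ? WHERE id = ?",
                (self.first_name, self.id),
            )
        conn.commit()
        return True

    @classmethod
    def find_by(cls, **attrs: object) -> List["Employee"]:
        # Column names are checked against a whitelist; values are bound.
        unknown = set(attrs) - set(cls.COLUMNS)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        sql = "SELECT id, first_name FROM employee"
        if attrs:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in attrs)
        sql += " ORDER BY id"
        rows = conn.execute(sql, tuple(attrs.values())).fetchall()
        return [cls(first_name=name, id=row_id) for row_id, name in rows]


grace = Employee("  Grace ")
saved = grace.save()
rejected = Employee("   ").save()
matches = Employee.find_by(first_name="Grace")
```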
With Mnesia:
You can persist any Erlang data absolutely transparently.
Using pattern matching, you can search the database by any attribute of your data or their combination.
QLC gives you a nice query interface for cases when pattern matching isn't enough.
No solutions for validating and callbacks, however...
So, what else do you have in ActiveRecord that is lacking in Mnesia?
I don't think there really is at the time of this writing. That may be because the kinds of systems being written in Erlang and the type of people writing them don't really call for relational databases. I see much more code using Mnesia, CouchDB, Tokyo Cabinet and other such alternative database technologies.
That's not to say someone might not want to create something like Active Record. There just hasn't really been a need yet. Maybe you will be the first? :-)
You might be interested in Chicago Boss's "BossRecords":
http://www.chicagoboss.org/api-record.html
They are quite explicitly modeled on the Active Record pattern, and use a lot of compiler magic to make the syntax squeaky clean. BossRecords support save/validate as well as has_many/belongs_to associations. Attributes in your data model are made available through generated functions (e.g. "Employee:first_name()").
Some googling turns up libraries, clients, and wrappers for CouchDB described as "ActiveRecord like libraries like CouchFoo", along with advice to steer clear of them:
http://upstream-berlin.com/2009/03/31/the-case-of-activerecord-vs-couchdb/
http://debasishg.blogspot.com/2009/04/framework-inertia-couchdb-and-case-of.html#
As to your comment on "not suited for web apps yet", I think the pieces are there: mochiweb, couch, yaws, nitrogen, erlyweb. There are some powerful tools, and certainly a very different paradigm from Rails, Django and PHP.
There is a lot of information out there on object-relational mappers and how to best avoid impedance mismatch, all of which seem to be moot points if one were to use an object database. My question is why isn't this used more frequently? Is it because of performance reasons or because object databases cause your data to become proprietary to your application or is it due to something else?
Familiarity. The administrators of databases know relational concepts; object ones, not so much.
Performance. Relational databases have been proven to scale far better.
Maturity. SQL is a powerful, long-developed language.
Vendor support. You can pick between many more first-party (SQL servers) and third-party (administrative interfaces, mappings and other kinds of integration) tools than is the case with OODBMSs.
Naturally, the object-oriented model is more familiar to the developer and, as you point out, would spare one the need for an ORM. But thus far, the relational model has proven to be the more workable option.
See also the recent question, Object Orientated vs Relational Databases.
I've been using db4o which is an OODB and it solves most of the cons listed:
Familiarity - Programmers know their language better than SQL (see Native queries)
Performance - this one is highly subjective but you can take a look at PolePosition
Vendor support and maturity - can change over time
Cannot be used by programs that don't also use the same framework - There are OODB standards and you can use different frameworks
Versioning is probably a bit of a bitch - Versioning is actually easier!
The pros I'm interested in are:
Native queries - db4o lets you write queries in your statically typed language, so you don't have to worry about mistyping a string and finding data missing at runtime.
Ease of use - Defining business logic in the domain layer, the persistence layer (mapping) and finally the SQL database is certainly a violation of DRY. With an OODB you define your domain where it belongs.
I agree - OODBs have a long way to go, but they are going. And there are domain problems out there that are better solved by an OODB.
One objection to object databases is that it creates a tight coupling between the data and your code. For certain apps this may be OK, but not for others. One nice thing that a relational database gives you is the possibility to put many views on your data.
Ted Neward explains this and a lot more about OODBMSs a lot better than this.
It has nothing to do with performance. That is to say, basically all applications would perform better with an OODB. But that would also put lots of DBAs out of work or force them to learn a new technology. Even more people would be out of work correcting errors in the data. That's unlikely to make OODBs popular with established companies. Gavin seems to be totally clueless; a better link would be Kirk.
Cons:
Cannot be used by programs that don't also use the same framework for accessing the data store, making it more difficult to use across the enterprise.
Fewer resources available online for non-SQL-based databases
No compatibility across database types (can't swap to a different db provider without changing all the code)
Versioning is probably a bit of a bitch. I'd guess adding a new property to an object isn't quite as easy as adding a new column to a table.
Sören
All of the reasons you stated are valid, but I see the problem with OODBMS is the logical data model. The object-model (or rather the network model of the 70s) is not as simple as the relational one, and is therefore inferior.
jodonnel, I don't see how the use of object databases couples application code to the data. You can still abstract your application from the OODB by using a Repository pattern, and replace it with an ORM-backed SQL database if you design things properly.
For an OO application, an OO database will provide a more natural fit for persisting objects.
What's probably true is that you tie your data to your domain model, but then that's the crux!
Wouldn't it be good to have a single way of looking at both data, business rules and processes using a domain centric view?
So, a big pro is that an OODB matches how most modern, enterprise-level object-oriented software applications are designed; there is no extra effort to design a data layer using a different (relational) design. It is cheaper to build and maintain, and in many cases generally higher performing.
Cons: just a general lack of maturity and adoption, I reckon...