We are in the early stages of design of a big business application that will have multiple modules. One of the requirements is that the application should be database independent, it should support SQL Server, Oracle, MySQL and DB2.
From what I have read on the web, database independence is a very bad idea: it results in hard-to-maintain code, a database design limited to the features common to all supported DBMSs, bad performance, and bad scalability. My gut feeling is that this feature, more than any other, could increase the development cost and time exponentially. The code will be dreadful.
But I cannot persuade anybody to ignore this feature. The problem is that most of what I have found on this issue is anecdotal, with no numbers to support the case. If anyone can share numbers-backed data on the issue, I would appreciate it.
One of the possible design options is to use Entity Framework for the database tier, with a provider for each DBMS. My feeling is that writing SQL statements manually, without any ORM, would be a must, since you have no control over the SQL that Entity Framework generates, and a database-independent scenario will need some SQL tweaking for each target DBMS. I also suspect that third-party Entity Framework providers will have a significant number of bugs that only show up in the complex scenarios this application will have. I would like to hear from anyone who has used Entity Framework in a database-independent scenario before.
Also, one of the possibilities discussed by the team is to support one DBMS (SQL Server, for example) in the first iteration and then add support for the other DBMSs in successive iterations. I think that since we will need a database design restricted to the common features, this development strategy is bad: we would need to know the features of all the target databases before we start writing code for the first one. I would like to hear your thoughts on this possibility, too.
Have you looked at Comparison of different SQL implementations?
This is an interesting comparison, I believe it is reasonably current.
Designing a good relational data model for your application should be database agnostic, for the simple reason that all RDBMSs are designed to support the features of relational data models.
On the other hand, implementation of the model is normally influenced by the personal preferences of the people specifying the implementation. Everybody has their own slant on doing things, for instance you mention autoincremented identity in a comment above. These personal preferences for implementation are the hurdles that can limit portability.
Reading between the lines, the requirement for database independence has been handed down from above, with the instruction to make it so. It also seems likely that the application is intended for sale rather than in-house use. In this context, the database preference of potential clients is unknown at this stage.
Given such requirements, then the practical questions include:
who will champion each specific database for design and development? This is important, inasmuch as the personal preferences for implementation of each of these people need to be reconciled to achieve a database-neutral solution. If a specific database has no champion, chances are that implementing the application on that database will be done poorly, if at all.
who has the depth of database experience to act as moderator for the champions? This person will have to make some hard decisions at times, but horse-trading is part of the fun.
will the programming team be productive without all of their personal favourite features? Stored procedures, triggers, etc. are the least portable features between RDBMSs.
The specification of the application itself will also need to include a clear distinction between database-agnostic and database specific design elements/chapters/modules/whatever. Amongst other things, this allows implementation with one DBMS first, with a defined effort required to implement for each subsequent DBMS.
Database-agnostic parts should include all of the DML, or ORM if you use one.
Database-specific parts should be more-or-less limited to installation and drivers.
Believe it or not, vanilla-flavoured SQL is still a very powerful programming language, and personally I think it is entirely possible to create a performant application without database-specific features, if you set out to.
In summary, designing database-agnostic applications is an extension of a simple precept:
Encapsulate what varies
I work with Hibernate, which gives me the benefits of an ORM plus database independence. Database-specific features are out of the question, and this usually improves my design. Everything (domain model, business logic and data access methods) is testable, so development is not painful.
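The poster works with Hibernate in Java; as a rough C# sketch of the same testability idea (all names below are invented), the business logic depends only on an abstraction, so it can be exercised in tests against an in-memory fake instead of a real database:

    using System;

    // Illustrative only: business rules live in a class that talks to an
    // interface, not to a concrete DBMS, so tests can supply a fake repository.
    public interface IOrderRepository
    {
        Order Load(int id);
        void Save(Order order);
    }

    public class Order
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
        public bool Approved { get; set; }
    }

    public class OrderApprovalService
    {
        private readonly IOrderRepository _orders;

        public OrderApprovalService(IOrderRepository orders) { _orders = orders; }

        // The business rule is independent of any particular database.
        public void Approve(int orderId)
        {
            var order = _orders.Load(orderId);
            if (order.Total <= 0)
                throw new InvalidOperationException("Cannot approve an empty order.");
            order.Approved = true;
            _orders.Save(order);
        }
    }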
Hello, Muhammed!
Database independence is neither "good" nor "bad". It is a design decision; it is a trade-off.
Let's talk about the choices:
It would result in hard-to-maintain code
This is the choice of your programmers. If you make your code database-independent, then you should use a layer between your code and the database. The best kind of layer is one that someone else has written.
...Database design with the least common features in all supported DBMSs
This is, by definition, true. Luckily, the common features in all supported databases are fairly broad; they should all implement the SQL-99 standard.
...bad performance and bad scalability
This should not be true. The layer should add minimal cost to the database.
...the complexity of this feature, more than any other feature, could increase the development cost and time exponentially. The code will be dreadful.
Again, I recommend that you use a layer between your code and the database.
You didn't specify which language or platform you're writing for. Luckily, many languages have already abstracted out databases:
Java has JDBC drivers
Python has the Python Database API
.NET has ADO.NET
Good luck.
Database independence is an overrated application feature. In reality, it is very rare for a large business application to be moved onto a new database platform after it's built and deployed. You also miss out on DBMS-specific features and optimisations.
That said, if you really want to include database independence, you might be best to write all your database access code against interfaces or abstract classes, like those used in the .NET System.Data.Common namespace (DbConnection, DbCommand, etc.) or use an O/RM library that supports multiple databases like NHibernate.
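As a minimal sketch of the System.Data.Common approach mentioned above (the provider name, connection string and Customers table are placeholder assumptions; real code would read them from configuration):

    using System;
    using System.Data.Common;

    public static class CustomerQueries
    {
        // The same code runs against any installed ADO.NET provider,
        // as long as the SQL stays within what every target DBMS supports.
        public static int CountCustomers(string providerName, string connectionString)
        {
            DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
            using (DbConnection connection = factory.CreateConnection())
            {
                connection.ConnectionString = connectionString;
                connection.Open();
                using (DbCommand command = connection.CreateCommand())
                {
                    command.CommandText = "SELECT COUNT(*) FROM Customers"; // plain, portable SQL
                    return Convert.ToInt32(command.ExecuteScalar());
                }
            }
        }
    }

Swapping from SQL Server to Oracle then becomes a configuration change rather than a code change, provided the SQL itself stays portable.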
This is vaguely related to:
Should I design the application or model (database) first?
Design from the database first through to UI or t’other way round?
But my question is more about modeling and artifacts and less about the right way to do design. I'm trying to figure out what sort of design artifact would best express the link between features (use cases), screens and database elements (tables and columns, most particularly). UML is very code-centric. Database models are very database-centric. And screen designs are UI-centric, of course!
Here's the deal... my team is working on the first release of the product. We used use cases, then did screen designs and database design was somewhat isolated from the two. A critical area for bugs was the lack of traceability between the use cases and their accompanying screens and the database. In our product, there's a very high degree of overlap between use cases and database elements. Many use cases touch over 75% of the database infrastructure. So we have high contention over database design areas, and it's easy for a small database change to disrupt the lower levels of business logic.
For our next release, I want the developers and our DBA to have a really clear insight into what parts of the database each feature touches. The use case/screen design approach worked well, so we're keeping it... the trick is linking each use case and screens to the database model so the relationships are really obvious and hard to forget about.
On smaller projects (we're only 10 people, but often I've worked on teams of 3 or less), I've created my own custom diagrams to show this part of the design. Sort of a fusion of screen, UML and database table, done in Visio with no link to actual code or SQL. I'm not sure it will work for a larger team, as it's highly manual to keep up to date, and it doesn't auto-generate code the way our database modeling tool does.
Any recommendations? Is there a commonly accepted mechanism for this?
FYI - we're pretty waterfall, that isn't going to change any time soon. And we do love artifacts... Saying "switch to agile" is not a viable solution for our group.
I can't tell from your question how detailed your use cases are. I get the impression that they may be high-level use cases, not broken down into detailed use cases (perhaps through include or extend relationships).
In any case, I prefer to start with Requirements, and trace them to use cases. While I'm writing the use cases and the use case diagrams, I'm also creating a domain model (a high-level class diagram). This is mostly to give me something to discuss with stakeholders ("did I get that right?").
When the use cases and domain model are finished, it's possible to begin work on screen design, and possibly also an activity model, if there are complex interactions between screens. I would treat the screens as though they were classes with UI - a screen might have a FirstName attribute, which I'd note as being related to the FirstName attribute of the Person entity in my domain model. Yet the FirstName attribute might be represented on that screen as a text box.
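As a rough illustration of treating a screen as a class whose attributes trace back to the domain model (all names below are made up):

    // Domain model entity.
    public class Person
    {
        public string FirstName { get; set; }
    }

    // The screen modelled as a class: its FirstName attribute traces to
    // Person.FirstName, even though on screen it is rendered as a text box.
    public class PersonEditScreen
    {
        public string FirstName { get; set; }

        public void BindTo(Person person)
        {
            FirstName = person.FirstName;
        }
    }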
At the same time, physical database design can occur. This would produce a class or ER diagram, with traceability back to the domain model. Eventually, you might find that some of your screen attributes or activity modeling refer to things that are part of the physical database model that are not present in the domain model. It's ok to relate a screen "PersonalName" attribute to the computed PersonalName column in the Person table in the People schema.
The tool I use for this sort of thing is Sparx Enterprise Architect. It's a great tool, and can do all of this and more, even in the Professional edition.
I also have to say for the sake of Truth that I mostly model on my own - I haven't yet worked on a project where the model, code and database were being developed by separate people. If someone told me that the above wouldn't work in "real life", I might be forced to believe them.
I am not sure I understand your question clearly, but I'll try to respond based on some reasonable assumptions...
Essentially, my recommendation is the same as what John Saunders already suggested: to consider using UML along with a good UML tool. But I would like to add a few points that might be important in your specific situation.
First and foremost, I don't think that UML is "too code-centric". To the contrary, it can be used to model pretty much any aspect of a software system, almost at any level of abstraction. With a good tool at hand (like Sparx EA), the beauty of UML is that you actually do get a well-defined model under the hood (as opposed to just a set of unconnected drawings/charts). As a result, even if the tool itself does not give you a feature that you are looking for (like traceability from DB to use cases)... well, at least you have an option to automate (or at least semi-automate) the task yourself: for example, you can export your UML model into an XMI (standard!) and then derive whatever traceability-related info you need from there (e.g. using XSL or any XML-aware library for your favorite programming/scripting language). I am not saying that it would be easy to do (especially if you want traceability on the level of individual DB columns 8-), but it's possible and it is very likely to beat any manual method if it has to scale along the size of the project.
BTW, speaking of Sparx EA... I don't know all of its capabilities yet, but it has so many that I would not be surprised if it allowed you to select a class (or even an attribute of a class) and show you other model elements that depend on it in some way. You might want to check this out.
Having said all that, I do understand that you may have at least the following two important concerns about UML:
It may appear to require too many modeling details to be in place to get what you want.
As any "universal tool", it may be grossly inferior to specialized modeling tools that you already use.
Regarding issue #1: Again, with a good UML tool at hand, you might be able to take as many shortcuts as you want. For example, instead of building a very detailed and accurate activity model for a use case, you could focus just on the classes involved in the use case (just enough to enable tracking classes back to use cases). The same applies to UI, of course.
Regarding issue #2: I don't know what exact tools you use now to model use cases, UI, and DB schema. So, theoretically it is possible that they are all so superior to UML that you wouldn't want to give any of them up in exchange for easier traceability. However, something tells me that your DB modeling tool (with its code-generating capabilities) might be the only one that is truly indispensable. If that's the case, then I would still recommend considering UML: you just do not model down to the DB schema level and "stop" at the level of the domain model (even if you do not have one in your application!). At that point, the UML tool would give you traceability from domain model elements (entities, their attributes, and their relationships) back to use cases and UI elements, and mappings between your domain model and DB schema could be left "in the air" because, in the vast majority of cases, they should be simple enough to track without drawing anything. This might not give you 100% of what you want, but it could give you the 80% that would be sufficient to mitigate most of your problems.
The bottom line: if you are using three different tools/technologies to model three different aspects of your system... well, it's obvious that any traceability between those three aspects remains at the mercy of those three tools: you can automate only as much as those tools allow (which probably means that you are going to be stuck with a lot of laborious manual tasks). As of today, UML appears to be the only well-defined and widely supported "lingua franca" that could help you connect your models and enable automation of a substantial part of your analytical activities. Just make sure you distinguish UML "just-drawing tools" (like most Visio add-ons and stencils) from true UML modeling tools (like Sparx EA and a bunch of others).
Your use cases are a good place to start from.
Convert your use cases into executable test code. This test code needs to verify that the resulting return values meet the requirements of your use cases.
The smaller the parts of the work you can identify and test, the more robustly you will be able to build your application.
This means that the interaction of your use cases with a large part of your database and the GUI will be simpler to understand.
It's hard to lock down the architecture or business-logic interplay in complex projects with complete upfront design of the different layers. You only truly learn what will satisfy your requirements once you get to the point of implementing them.
As a developer, find the techniques, tools and processes that help you do your job in the best way possible. Don't judge these on their origin. Judge them on their value in making you the best developer possible.
Some items from the agile world have certainly made a big difference to the quality and productivity of my work. Adopting them doesn't require upsetting the apple cart and throwing an experienced waterfall team into disarray.
The database should model your Problem Domain. It should model it completely enough so that you can extract solutions -- truths -- from it. Bad design is essentially "lying" to the database (allowing the possibility of invalid data or relationships), and when you "lie" to your database, it'll "lie" to you when you ask it questions.
Simple examples are modeling a many-to-many relationship where the relationship can only be one-to-many, assuming values can't be null, or treating a foreign key as an attribute. Many of these can be avoided by proper normalization, which requires you to explicitly find out what is a key and what isn't.
By "making illegal states unrepresentable" in the model, you avoid having to write "defensive" code to check for the impossible or to validate that a relation is possible, as impossible things are made unrepresentable because of table structures or declarative check constraints.
This lowers the cost of writing your code, as you can concentrate for the most part, on what it needs to do, rather than guarding against the impossible.
Is LINQ a kind of Object-Relational Mapper?
LINQ in itself is a set of language extensions to aid querying, readability and reduce code. LINQ to SQL is a kind of OR Mapper, but it isn't particularly powerful. The Entity Framework is often referred to as an OR Mapper, but it does quite a lot more.
There are several other LINQ to X implementations around, including LINQ to NHibernate and LINQ to LLBLGenPro that offer OR Mapping and supporting frameworks in a broadly similar fashion to the Entity Framework.
If you are just learning LINQ though, I'd recommend you stick to LINQ to Objects to get a feel for it, rather than diving into one of the more complicated flavours :-)
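For example, a plain LINQ to Objects query works against nothing but an in-memory collection:

    using System;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            var numbers = new[] { 5, 12, 8, 3, 21 };

            // Query syntax over an in-memory array: no database, no mapping.
            var bigOnes = from n in numbers
                          where n > 7
                          orderby n
                          select n;

            foreach (var n in bigOnes)
                Console.WriteLine(n);   // prints 8, 12, 21
        }
    }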
LINQ is not an ORM at all. LINQ is a way of querying "stuff", and can be more or less seen as a SQL-like language extension for different things (IEnumerables).
There are various types of "stuff" that can be queried, among them SQL Server databases. This is called LINQ-to-SQL. The way it works is that it generates (implicit) classes based on the structure of the DB and your query. In this sense it works much more like a code generator.
LINQ-to-SQL is not an ORM because it doesn't try at all to solve the object-relational impedance mismatch. In an ORM you design the classes and then either map them manually to tables or let the ORM generate the database. If you then change the database for whatever reason (typically refactoring, renormalization, denormalization), many times you are able to keep the classes as they are by changing the mapping.
LINQ-to-SQL does nothing of the sort. Your LINQ queries will be tightly coupled to the database structure. If you change the DB, you will probably have to change the LINQ as well.
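As a hedged sketch of that coupling (the Customers table and its columns are invented for illustration): the attributed class mirrors the table, and the query references its columns directly, so a schema change ripples straight into this code.

    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    // The class mirrors the (hypothetical) table one-to-one.
    [Table(Name = "dbo.Customers")]
    public class Customer
    {
        [Column(IsPrimaryKey = true)] public int CustomerId;
        [Column] public string City;
    }

    public static class CustomerLookup
    {
        public static int CountLondoners(string connectionString)
        {
            using (var db = new DataContext(connectionString))
            {
                // Renaming the City column means changing the mapping above
                // and, most likely, queries like this one.
                return db.GetTable<Customer>().Count(c => c.City == "London");
            }
        }
    }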
LINQ to SQL (part of Visual Studio 2008) is an OR Mapper.
LINQ is a new query language that can be used to query many different types of sources.
LINQ itself is not an ORM. LINQ is the set of language features and methods that allow you to query objects in a SQL-like way.
"LINQ to SQL" is a provider that allows us to use LINQ against SQL strongly-typed objects.
I think a good test to ascertain whether a platform or code block displays the characteristics of an O/R-M is simply:
With their solution hat on, do the developers (or their code generator) have any direct, unabstracted knowledge of what's inside the database?
With this criterion, the answer for differing LINQ implementations can be
Yes, knowledge of the database schema is entirely contained within the roll-your-own, LINQ-utilizing O/R-M code layer, or
No, knowledge of the database schema is scattered throughout the application
Further, I'd extend this characterization to three simple levels of O/R-M.
1. Abandonment.
It's a small app w/ a couple of developers and the object/data model isn't that complex and doesn't change very often. The small dev team can stay on top of it.
2. Roll your own in the data access layer.
With some manageable refactoring in a data access layer, the desired O/R-M functionality can be effected in an intermediate layer by the relatively small dev team. Enough to keep the entire team on the same page.
3. Enterprise-level O/R-M specification defining/overhead introducing tools.
At some level of complexity, the need to keep all devs on the same page just swamps any overhead introduced by the formality. No need to reinvent the wheel at this level of complexity. NHibernate or the (rough) V1.0 Entity Framework are examples of this scale.
For a richer classification, from which I borrowed and simplified, see Ted Neward's classic post at
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
where he classifies O/R-M treatments (or abdications) as
1. Abandonment. Developers simply give up on objects entirely, and return to a programming model that doesn't create the object/relational impedance mismatch. While distasteful, in certain scenarios an object-oriented approach creates more overhead than it saves, and the ROI simply isn't there to justify the cost of creating a rich domain model. ([Fowler] talks about this to some depth.) This eliminates the problem quite neatly, because if there are no objects, there is no impedance mismatch.
2. Wholehearted acceptance. Developers simply give up on relational storage entirely, and use a storage model that fits the way their languages of choice look at the world. Object-storage systems, such as the db4o project, solve the problem neatly by storing objects directly to disk, eliminating many (but not all) of the aforementioned issues; there is no "second schema", for example, because the only schema used is that of the object definitions themselves. While many DBAs will faint dead away at the thought, in an increasingly service-oriented world, which eschews the idea of direct data access but instead requires all access go through the service gateway thus encapsulating the storage mechanism away from prying eyes, it becomes entirely feasible to imagine developers storing data in a form that's much easier for them to use, rather than DBAs.
3. Manual mapping. Developers simply accept that it's not such a hard problem to solve manually after all, and write straight relational-access code to return relations to the language, access the tuples, and populate objects as necessary. In many cases, this code might even be automatically generated by a tool examining database metadata, eliminating some of the principal criticism of this approach (that being, "It's too much code to write and maintain").
4. Acceptance of O/R-M limitations. Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems. Doing so carries its own fair share of risks, however, as developers using an O/R-M must be aware of any caching the O/R-M solution does within it, because the "raw" relational access will clearly not be able to take advantage of that caching layer.
5. Integration of relational concepts into the languages. Developers simply accept that this is a problem that should be solved by the language, not by a library or framework. For the last decade or more, the emphasis on solutions to the O/R problem have focused on trying to bring objects closer to the database, so that developers can focus exclusively on programming in a single paradigm (that paradigm being, of course, objects). Over the last several years, however, interest in "scripting" languages with far stronger set and list support, like Ruby, has sparked the idea that perhaps another solution is appropriate: bring relational concepts (which, at heart, are set-based) into mainstream programming languages, making it easier to bridge the gap between "sets" and "objects". Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such as functional/object hybrid languages like Scala or F#, as well as direct integration into traditional O-O languages, such as the LINQ project from Microsoft for C# and Visual Basic. One such effort that failed, unfortunately, was the SQL/J strategy; even there, the approach was limited, not seeking to incorporate sets into Java, but simply allow for embedded SQL calls to be preprocessed and translated into JDBC code by a translator.
6. Integration of relational concepts into frameworks. Developers simply accept that this problem is solvable, but only with a change of perspective. Instead of relying on language or library designers to solve this problem, developers take a different view of "objects" that is more relational in nature, building domain frameworks that are more directly built around relational constructs. For example, instead of creating a Person class that holds its instance data directly in fields inside the object, developers create a Person class that holds its instance data in a RowSet (Java) or DataSet (C#) instance, which can be assembled with other RowSets/DataSets into an easy-to-ship block of data for update against the database, or unpacked from the database into the individual objects.
LINQ to SQL using the DBML designer, yes; otherwise LINQ is just a set of extension methods for enumerables.
The age-old question. Where should you put your business logic: in the database as stored procedures (or packages), or in the application/middle tier? And more importantly, why?
Assume database independence is not a goal.
Maintainability of your code is always a big concern when determining where business logic should go.
Integrated debugging tools and more powerful IDEs generally make maintaining middle tier code easier than the same code in a stored procedure. Unless there is a real reason otherwise, you should start with business logic in your middle tier/application and not in stored procedures.
However, when you come to reporting and data mining/searching, stored procedures can often be a better choice. This is thanks to the power of the database's aggregation/filtering capabilities and the fact that you are keeping processing very close to the source of the data. But this may not be what most people consider classic business logic anyway.
Put enough of the business logic in the database to ensure that the data is consistent and correct.
But don't fear having to duplicate some of this logic at another level to enhance the user experience.
For very simple cases you can put your business logic in stored procedures. Usually even the simple cases tend to get complicated over time. Here are the reasons I don't put business logic in the database:
Putting the business logic in the database tightly couples it to the technical implementation of the database. Changing a table will force you to change a lot of the stored procedures, again causing a lot of extra bugs and extra testing.
Usually the UI depends on business logic for things like validation. Putting these things in the database causes tight coupling between the database and the UI, or else duplicates the validation logic between the two.
It gets hard to have multiple applications work on the same database. Changes for one application cause others to break. This can quickly turn into a maintenance nightmare, so it doesn't really scale.
More practically, SQL isn't a good language for implementing business logic in an understandable way. SQL is great for set-based operations, but it lacks constructs for "programming in the large"; it's hard to maintain large amounts of stored procedure code. Modern OO languages are better suited to and more flexible for this.
This doesn't mean you can't use stored procs and views. I think it is sometimes a good idea to put an extra layer of stored procedures and views between the tables and the application(s) to decouple the two. That way you can change the layout of the database without changing its external interface, allowing you to refactor the database independently.
It's really up to you, as long as you're consistent.
One good reason to put it in your database layer: if you are fairly sure that your clients will never ever change their database back-end.
One good reason to put it in the application layer: if you are targeting multiple persistence technologies for your application.
You should also take into account core competencies. Are your developers mainly application layer developers, or are they primarily DBA-types?
While there is no one right answer - it depends on the project in question, I would recommend the approach advocated in "Domain Driven Design" by Eric Evans. In this approach the business logic is isolated in its own layer - the domain layer - which sits on top of the infrastructure layer(s) - which could include your database code, and below the application layer, which sends the requests into the domain layer for fulfilment and listens for confirmation of their completion, effectively driving the application.
This way, the business logic is captured in a model which can be discussed with those who understand the business aside from technical issues, and it should make it easier to isolate changes in the business rules themselves, the technical implementation issues, and the flow of the application which interacts with the business (domain) model.
I recommend reading the above book if you get the chance as it is quite good at explaining how this pure ideal can actually be approximated in the real world of real code and projects.
While there are certainly benefits to having the business logic in the application layer, I'd like to point out that languages/frameworks seem to change more frequently than databases.
Some of the systems that I support, went through the following UIs in the last 10-15 years: Oracle Forms/Visual Basic/Perl CGI/ ASP/Java Servlet. The one thing that didn't change - the relational database and stored procedures.
Database independence, which the questioner rules out as a consideration in this case, is the strongest argument for taking logic out of the database. The strongest argument for database independence is for the ability to sell software to companies with their own preference for a database backend.
Therefore, I'd consider the major argument for taking stored procedures out of the database to be a commercial one only, not a technical one. There may be technical reasons but there are also technical reasons for keeping it in there -- performance, integrity, and the ability to allow multiple applications to use the same API for example.
Whether or not to use SP's is also strongly influenced by the database that you are going to use. If you take database independence out of consideration then you're going to have very different experiences using T-SQL or using PL/SQL.
If you are using Oracle to develop an application, then PL/SQL is an obvious choice as a language. It is very tightly coupled with the data, is continually improved in every release, and any decent development tool will integrate PL/SQL development with CVS or Subversion or some such.
Oracle's web-based Application Express development environment is even built 100% with PL/SQL.
The only thing that goes in a database is data.
Stored procedures are a maintenance nightmare. They aren't data and they don't belong in the database. The endless coordination between developers and DBA's is little more than organizational friction.
It's hard to keep good version control over stored procedures. The code outside the database is really easy to install -- when you think you've got the wrong version you just do an SVN UP (maybe an install) and your application's back to a known state. You have environment variables, directory links, and lots of environment control over the application.
You can, with simple PATH manipulations, have variant software available for different situations (training, test, QA, production, customer-specific enhancements, etc., etc.)
The code inside the database, however, is much harder to manage. There's no proper environment -- no "PATH", directory links or other environment variables -- to provide any usable control over what software's being used; you have a permanent, globally bound set of application software stuck in the database, married to the data.
Triggers are even worse. They're both a maintenance and a debugging nightmare. I don't see what problem they solve; they seem to be a way of working around badly-designed applications where someone couldn't be bothered to use the available classes (or function libraries) correctly.
While some folks find the performance argument compelling, I still haven't seen enough benchmark data to convince me that stored procedures are all that fast. Everyone has an anecdote, but no one has side-by-side code where the algorithms are more-or-less the same.
[In the examples I've seen, the old application was a poorly designed mess; when the stored procedures were written, the application was re-architected. I think the design change had more impact than the platform change.]
Anything that affects data integrity must be put at the database level. Things other than the user interface often insert, update or delete data in the database, including imports, mass updates to change a pricing scheme, hot fixes, etc. If you need to ensure the rules are always followed, put the logic in defaults and triggers.
This is not to say that it isn't a good idea to also have it in the user interface (why bother sending information that the database won't accept), but to ignore these things in the database is to court disaster.
If you need database independence, you'll probably want to put all your business logic in the application layer since the standards available in the application tier are far more prevalent than those available to the database tier.
However, if database independence isn't the #1 factor and the skill-set of your team includes strong database skills, then putting the business logic in the database may prove to be the best solution. You can have your application folks doing application-specific things and your database folks making sure all the queries fly.
Of course, there's a big difference between being able to throw a SQL statement together and having "strong database skills" - if your team is closer to the former than the latter then put the logic in the application using one of the Hibernates of this world (or change your team!).
In my experience, in an Enterprise environment you'll have a single target database and skills in this area - in this case put everything you can in the database. If you're in the business of selling software, the database license costs will make database independence the biggest factor and you'll be implementing everything you can in the application tier.
Hope that helps.
Nowadays it is possible to keep your stored procedure code in Subversion and to debug it with good tool support.
If you use stored procs that combine SQL statements, you can reduce the amount of data traffic between the application and the database, reduce the number of database calls, and gain big performance wins.
Once we started building in C# we made the decision not to use stored procs but now we are moving more and more code to stored procs. Especially batch processing.
However don't use triggers, use stored procs or better packages. Triggers do decrease maintainability.
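A rough sketch of the round-trip point above: a single stored-procedure call (the procedure and parameter names are invented) replaces several separate statements, so the batch work happens server-side in one call.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class BatchJobs
    {
        public static void CloseMonth(string connectionString, int month)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("usp_CloseMonth", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@Month", SqlDbType.Int).Value = month;

                connection.Open();
                command.ExecuteNonQuery();   // all the set-based work runs in the database
            }
        }
    }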
Putting the code in the application layer will result in a DB independent application.
Sometimes it is better to use stored procedures for performance reasons.
It (as usual) depends on the application requirements.
The business logic should be placed in the application/middle tier as a first choice. That way it can be expressed in the form of a domain model, be placed in source control, be split or combined with related code (refactored), etc. It also gives you some database vendor independence.
Object Oriented languages are also much more expressive than stored procedures, allowing you to better and more easily describe in code what should be happening.
The only good reasons to place code in stored procedures are: if doing so produces a significant and necessary performance benefit or if the same business code needs to be executed by multiple platforms (Java, C#, PHP). Even when using multiple platforms, there are alternatives such as web-services that might be better suited to sharing functionality.
The answer in my experience lies somewhere on a spectrum of values usually determined by where your organization's skills lie.
The DBMS is a very powerful beast, which means proper or improper treatment will bring great benefit or great danger. Sadly, in too many organizations, primary attention is paid to programming staff; DBMS skills, especially query development skills (as opposed to administrative ones), are neglected. This is exacerbated by the fact that the ability to evaluate DBMS skills is also probably missing.
And there are few programmers who sufficiently understand what they don't understand about databases.
Hence the popularity of suboptimal concepts, such as Active Records and LINQ (to throw in some obvious bias). But they are probably the best answer for such organizations.
However, note that highly scaled organizations tend to pay a lot more attention to effective use of the datastore.
There is no standalone right answer to this question. It depends on the requirements of your app, the preferences and skills of your developers, and the phase of the moon.
Business logic is to be put in the application tier and not in the database.
The reason is that a database stored procedure is always dependent on the database product you use. This breaks one of the advantages of the three-tier model: you cannot easily change to another database unless you provide an extra stored procedure for that database product.
On the other hand, sometimes it makes sense to put logic into a stored procedure for performance optimization.
What I want to say is: business logic belongs in the application tier, but there are exceptions (mainly performance reasons).
Business application 'layers' are:
1. User Interface
This implements the business user's view of his or her job. It uses terms that the user is familiar with.
2. Processing
This is where calculations and data manipulation happen. Any business logic that involves changing data is implemented here.
3. Database
This could be: a normalized relational database (the standard SQL-based DBMSs); an OO database, storing objects wrapping the business data; etc.
What goes Where
In getting to the above layers you need to do the necessary analysis and design. This will indicate where business logic is best implemented: data-integrity rules and concurrency/real-time issues around data updates would normally be implemented as close to the data as possible, as would calculated fields; this is a good pointer to stored procedures/triggers, where data integrity and transaction control are absolutely necessary.
The business rules involving the meaning and use of the data would for the most part be implemented in the Processing layer, but would also appear in the User Interface as the user's workflow - linking the various processes in some sequence that reflects the user's job.
IMHO, there are two conflicting concerns when deciding where business logic goes in a relational-database-driven app:
maintainability
reliability
Re. maintainability: To allow for efficient future development, business logic belongs in the part of your application that's easiest to debug and version control.
Re. reliability: When there's significant risk of inconsistency, business logic belongs in the database layer. Relational databases can be designed to check for constraints on data, e.g. not allowing NULL values in specific columns, etc. When a scenario arises in your application design where some data needs to be in a specific state which is too complex to express with these simple constraints, it can make sense to use a trigger or something similar in the database layer.
Triggers are a pain to keep up to date, especially when your app is supposed to run on client systems you don't even have access to. But that doesn't mean it's impossible to keep track of them or update them. S.Lott's arguments in his answer that it's a pain and a hassle are completely valid; I'll second that, having been there too. But if you keep those limitations in mind when you first design your data layer, and refrain from using triggers and functions for anything but the absolute necessities, it's manageable.
In our application, most business logic is contained in the application's model layer, e.g. an invoice knows how to initialize itself from a given sales order. When a bunch of different things are modified sequentially for a complex set of changes like this, we roll them up in a transaction to maintain consistency, instead of opting for a stored procedure. Calculation of totals etc. are all done with methods in the model layer. But when we need to denormalize something for performance or insert data into a 'changes' table used by all clients to figure out which objects they need to expire in their session cache, we use triggers/functions in the database layer to insert a new row and send out a notification (Postgres listen/notify stuff) from this trigger.
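For what it's worth, a minimal sketch of that "roll it up in a transaction" idea using an ambient TransactionScope; the two steps stand in for whatever the model layer actually does.

    using System;
    using System.Transactions;

    public static class UnitOfWork
    {
        // Both steps commit together or not at all; if Complete() is never
        // reached, the ambient transaction rolls back.
        public static void Run(Action firstStep, Action secondStep)
        {
            using (var scope = new TransactionScope())
            {
                firstStep();    // e.g. save the newly created invoice
                secondStep();   // e.g. mark the sales order as invoiced
                scope.Complete();
            }
        }
    }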
After having our app in the field for about a year, used by hundreds of customers every day, the only thing I would change if we were to start from scratch would be to design our system for creating database functions (or stored procedures, however you want to call them) with versioning and updates to them in mind from the get-go.
Thankfully, we do have some system in place to keep track of schema versions, so we built something on top of that to take care of replacing database functions. It would've saved us some time now if we'd considered the need to replace them from the beginning though.
Of course, everything changes when you step outside of the realm of RDBMS's into tuple-storage systems like Amazon SimpleDB and Google's BigTable. But that's a different story :)
We put a lot of business logic in stored procedures - it's not ideal, but quite often it's a good balance between performance and reliability.
And we know where it is without having to search through acres of solutions and codebase!
Scalability is also a very important factor for pushing business logic into the middle or app layer rather than the database layer. It should be understood that the database layer is only for interacting with the database, not for manipulating the data that is returned to or from it.
I remember reading an article somewhere that pointed out that pretty well everything can be, at some level, part of the business logic, and so the question is meaningless.
I think the example given was the display of an invoice onscreen. The decision to mark an overdue one in red is a business decision...
It's a continuum. IMHO the biggest factor is speed: how can you get this sucker up and running as quickly as possible while still adhering to good tenets of programming such as maintainability, performance, scalability, security, reliability, etc.?
Often SQL is the most concise way to express something, and it also frequently happens to be the most performant, except for string operations and the like, but that's where your CLR procs can help. My belief is to liberally sprinkle business logic around wherever you feel it is best for the undertaking at hand. If you have a bunch of application developers who shit their pants when looking at SQL, then let them use their app logic. If you really want to create a high-performance application with large datasets, put as much logic in the DB as you can. Fire your DBAs and give developers ultimate freedom over their dev databases.
There is no one answer or best tool for the job. You have multiple tools, so become expert at all levels of the application, and you'll soon find that you're spending a lot more time writing nice, concise, expressive SQL where warranted and using the application layer other times.
To me, ultimately, reducing the number of lines of code is what leads to simplicity. We have just converted a SQL-rich application with a mere 2500 lines of app code and 1000 lines of SQL to a domain model which now has 15500 lines of app code and 2500 lines of SQL to achieve what the former SQL-rich app did. If you can justify a 6-fold increase in code as "simplified", then go right ahead.
This is a great question! I found this after I had already asked a similar question, but this one is more specific. It came up as a result of a design-change decision that I wasn't involved in making.
Basically, what I was told was that if you have millions of rows of data in your database tables, then look at putting business logic into stored procedures and triggers. That is what we are doing right now: converting a Java app into stored procedures for maintainability, as the Java code had become convoluted.
I found this article: The Business Logic Wars. The author also made the millions-of-rows-in-a-table argument, which I found interesting. He also put business logic in JavaScript, which is client-side and outside of the business logic tier. I hadn't thought about this before, even though I've used JavaScript for validation for years, along with server-side validation.
My opinion is that you want the business logic in the application/middle tier as a rule of thumb, but don't discount cases where it makes sense to put it into the database.
One last point, there is another group where I'm working presently that is doing massive database work for research and the amount of data they are dealing with is immense. Still, for them they don't have any business logic in the database itself, but keep it in the application/middle tier. For their design, the application/middle tier was the correct place for it, so I wouldn't use the size of tables as the only design consideration.
Business logic is usually embodied by objects and the various language constructs of encapsulation, inheritance, and polymorphism. For example, if a banking application is passing around money, there may be a Money type that defines the business elements of what "money" is, as opposed to using a primitive decimal to represent money. For this reason, well-designed OOP is where the "business logic" lives - not strictly in any layer.
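A small, illustrative sketch of such a Money type; the two-decimal rounding and the same-currency rule are just examples of business rules the type can own.

    using System;

    public struct Money
    {
        public decimal Amount { get; }
        public string Currency { get; }

        public Money(decimal amount, string currency)
        {
            Amount = decimal.Round(amount, 2);   // example business rule: two decimal places
            Currency = currency;
        }

        public Money Add(Money other)
        {
            if (Currency != other.Currency)
                throw new InvalidOperationException("Cannot add amounts in different currencies.");
            return new Money(Amount + other.Amount, Currency);
        }
    }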
There is a lot of information out there on object-relational mappers and how to best avoid impedance mismatch, all of which seem to be moot points if one were to use an object database. My question is why isn't this used more frequently? Is it because of performance reasons or because object databases cause your data to become proprietary to your application or is it due to something else?
Familiarity. The administrators of databases know relational concepts; object ones, not so much.
Performance. Relational databases have been proven to scale far better.
Maturity. SQL is a powerful, long-developed language.
Vendor support. You can pick between many more first-party (SQL servers) and third-party (administrative interfaces, mappings and other kinds of integration) tools than is the case with OODBMSs.
Naturally, the object-oriented model is more familiar to the developer, and, as you point out, would spare one of ORM. But thus far, the relational model has proven to be the more workable option.
See also the recent question, Object Orientated vs Relational Databases.
I've been using db4o which is an OODB and it solves most of the cons listed:
Familiarity - Programmers know their language better than SQL (see Native queries)
Performance - this one is highly subjective but you can take a look at PolePosition
Vendor support and maturity - can change over time
Cannot be used by programs that don't also use the same framework - There are OODB standards and you can use different frameworks
Versioning is probably a bit of a bitch - Versioning is actually easier!
The pros I'm interested in are:
Native queries - db4o lets you write queries in your statically typed language, so you don't have to worry about mistyping a string and finding data missing at runtime (see the sketch after this list).
Ease of use - Defining business logic in the domain layer, the persistence layer (mapping), and finally the SQL database is certainly a violation of DRY. With an OODB you define your domain where it belongs.
I agree - OODBs have a long way to go, but they are getting there. And there are domain problems out there that are better solved by an OODB.
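For instance, a db4o native query in C# looks roughly like this (class and method names are quoted from memory of the db4o .NET bindings and should be treated as illustrative rather than exact): the query is ordinary typed code, so a mistyped member name fails to compile instead of failing at runtime the way a mistyped SQL string would.

    using System.Collections.Generic;
    using Db4objects.Db4o;

    public class Pilot
    {
        public string Name;
        public int Points;
    }

    public static class PilotQueries
    {
        // Native query: a typed predicate instead of a query string.
        public static IList<Pilot> TopPilots(IObjectContainer db)
        {
            return db.Query<Pilot>(p => p.Points >= 100);
        }
    }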
One objection to object databases is that it creates a tight coupling between the data and your code. For certain apps this may be OK, but not for others. One nice thing that a relational database gives you is the possibility to put many views on your data.
Ted Neward explains this and a lot more about OODBMSs a lot better than this.
It has nothing to do with performance. That is to say, basically all applications would perform better with an OODB. But that would also put lots of DBA's out of work/having to learn a new technology. Even more people would be out of work correcting errors in the data. That's unlikely to make OODBs popular with established companies. Gavin seems to be totally clueless, a better link would be Kirk
Cons:
Cannot be used by programs that don't also use the same framework for accessing the data store, making it more difficult to use across the enterprise.
Fewer resources available online for non-SQL-based databases.
No compatibility across database types (can't swap to a different db provider without changing all the code).
Versioning is probably a bit of a bitch. I'd guess adding a new property to an object isn't quite as easy as adding a new column to a table.
Sören
All of the reasons you stated are valid, but I see the problem with OODBMS is the logical data model. The object-model (or rather the network model of the 70s) is not as simple as the relational one, and is therefore inferior.
jodonnel, I don't see how the use of object databases couples application code to the data. You can still abstract your application from the OODB by using a Repository pattern, and replace it with an ORM-backed SQL database if you design things properly.
For an OO application, an OO database will provide a more natural fit for persisting objects.
What's probably true is that you tie your data to your domain model, but then that's the crux!
Wouldn't it be good to have a single way of looking at both data, business rules and processes using a domain centric view?
So, a big pro is that an OODB matches how most modern, enterprise-level object-oriented software applications are designed; there is no extra effort to design a data layer using a different (relational) model. Cheaper to build and maintain, and in many cases generally higher performance.
Cons: just a general lack of maturity and adoption, I reckon...