How much of an applications "smarts" should reside in the database? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I've noticed a trend lately that people are moving more and more processing out of databases and in to applications. Some people are taking this to what seems to me to be ridiculous extremes.
I've seen application designs that not only banned all use of stored procedures, but also banned any kind of constraints enforced at the database (this would include primary key, foreign key, unique, and check constraints). I have even seen applications that required the use of only one data type stored in the database, namely varchar(2000). DateTime and number types were not allowed. Transactions and concurrency were also handled outside the database.
Has anyone seen these kind of applications implemented successfully? Both of the implementations I've dealt with that were implemented this way had all kinds of data integrity and concurrency problems. Can anyone explain this trend to move stuff (logic, processing, constraints) out of the database? What is the motivation behind it? Is it something I'm imagining?

Firstly, I really hope there is no trend towards databases without PKs and FKs and sensible datatypes. That would really be a tragedy.
But there is definitely a large core of developers who prefer putting logic in their apps than in stored procedures. I agree with Riho on the main reason for this: usually, DBAs manage databases, meaning that a developer has to go through a bunch of administrative overhead -- getting approvals from the DBA -- in order to create and update stored procs. Programmers by nature like to have control over their world, and to do things "their way."
There are also a couple of valid technical reasons:
Procedural extensions to SQL (e.g. T-SQL) used for developing stored procs have traditionally lacked user-defined datatypes, debuggability, portability, and interoperability with external systems -- all qualities helpful for developing reliable large-scale software. (And the clumsy syntax doesn't help.)
Software version control (e.g. svn) works well for managing even very large codebases, but managing DB schema changes is a harder problem and less well supported. Every serious shop uses version control for their application codebase, but many still don't have any rigorous system for managing their databases; hence stored procs can easily fall into an unversioned "black hole" that makes coders rightly nervous.
My personal view is that the closer the core business logic is to the data, the better, especially if more than one agent accesses the DB (or may do in the future). It's an unfortunate artefact of history that T-SQL and its ilk were weak languages, leading to the rise of the idea that "data and logic should be separated." My ideal world is one in which every business rule is encapsulated in a constraint enforced by the database, and all inconsistencies fail fast.

I like to keep logic out of the database. I tend to avoid stored procedures and triggers. I do, though, always use proper data types, keys, indicies and constraints. The way I see it is that the database is a database and the application is the application. The database should keep your data stored properly and efficiently whereas the application should own the logic. Perhaps I have never been in a situation where a stored procedure or trigger was needed; and thus never been inclined to use them to solve a problem. But to me, giving logic a home on the database seems "messy" to me; I would rather control everything from the application itself.

The trend results from the fact that the software technology industry is populated and driven largely by humans, and thus subject to trends and irrational behavior. To understand what's going on today requires a bit of perspective in the history of databases, and their parallel development with programming languages.
To be brief in this answer that will likely get downvoted: SQL is the IE6 of the database languages world. It breaks many of the rules of the relational model- in other words, it's a little bit like a calculator that performs multiplication incorrectly, and doesn't have a minus operator. SQL is not complete enough to be a real solution. It was never developed beyond the prototype stage, and was never meant to be used in industrial settings. But then it was naively used by oracle, which turned out to be a "killer app", SQL became industry standard instead of its technically superior competitors, and the rest is history. SQL's syntax is based around a set of command line tabular data processing tools, and COBOL. Full of bugs, inconsistencies, and a mishmash proprietary versions and features that don't have a grounding in math or logic, results in a situation where it really is unclear what goes where.
I think the trend you must be talking about is recent proliferation of ORMs: misguided and ill thought out attempts to patch over the obvious deficiencies of SQL. Database triggers and procedures are another misfeature trying to patch over SQL's problems.
If history had played out in a logical and orderly way, the answer to your question would be simple: Just follow the rules of the relational model and everything will work itself out. Unfortunately, the rules of the relational model don't fit cleanly into the current crop of SQL based DBMS's, so some application level fiddling, or triggers, or whatever other stupid patch is unfortunately necessary, and it ends up being a matter of subjective opinion, rather than reasoned argument, which stupid hack you use.
So the real answer is to just follow the relational model as close as you can, and then fudge it the rest of the way. Put the logic in the application if you're the only one using the db, and you need to keep all your source code in a version repository. If multiple applications are likely to use the database, make the DB as bullet proof and self sufficient as it can be- The main goal here is to ensure that the data remains consistent.

Ultimately the database and how you connect to it is your "persistence API" -- how much is in the database and how much is in the application is application-specific. But the important aspect is that the API boundary is responsible for producing or consuming correct data.
Personally I prefer a thin access layer in the application and sprocs/PKs/FKs in the database to enforce transactional correctness and to enable API versioning. This also allows other applications to modify the database without upsetting the data model.
And if I see another moron using *SELECT * FROM blah* I'm going to go nuts with an Uzi... :-)

"The database should keep your data stored properly and efficiently whereas the application should own the logic" - Nelson LaQeut in another answer.
This seems to be the crux of the issue: that all "logic" belongs to the application and not to the database. But what is meant by "logic"? There are various kinds of "logic", some of which belong in an application and some, I would say, better placed in the database.
I would think most developers would agree (surely?) that basic data integrity such as primary and foreign keys belongs in the database. There is less agreement on more sophisticated data integrity logic - even the humble but useful check constraint is woefully underused in general. .
The application camp see the database is "merely" a place to store the data that "belongs" to their application. The database camp (which is where I sit) see the application as "merely" one (perhaps currently the only) user of the data that "belongs" to the database - or rather that belongs to the business and is managed for the business by the database (DBMS = database management system).
If all your data logic is tied up in your application, what happens when the application needs to be rewritten in the latest trendy paradigm (or do you think J2EE for example is the last there will ever be)? As Tom Kyte often says, it's all about the data.

The database is an integral part of an application, but everyone interprets that differently. It's definitely a wise move to isolate them, but that shouldn't mean that you circumvent what they do in your programming. Correct data types and primary key references are important parts of good database design, on top of which a good application can be built.

Although I personally believe the Database should have enough smarts to defend itself, some people that don't understand that Databases aren't dumb services, think, and not incorrectly mind you, that data and logic should be separated. Now in many cases the separation of data and logic is a powerful tool, however most databases already provide us with solid implementations of atomicity, redundancy, processing, checking, etc... And many times that's where it belongs, however since the quality of these services and their API differs among vendors, many application programmers have felt like its worth trying to implement this sort of stuff in the application layer, to avoid tying themselves up with a specific database layer.

I can't say that I've seen a "trend" to create poor applications with terrible database designs. Programming is just like any other discipline in that there will be people who won't learn the tools or just want to cut corners. I've even talked to a person that just didn't "trust" databases. The applications that you described are just as you said, ridiculous nightmares. Don't follow those "trends".

I still prefer to use Stored Procedures and functions in SQL server. It adds more flexibility to application acturally. And it has a performance benefit also. Generally I don't think it is good idea to put everything to applicatons.

I think that those "developers" who created databases without indexes or with single VARCHAR(2000) column are just art majors who are making their first attempt into entering the high-priced IT world.
No-one, who has even little-bit of IT education, makes this kind of database structures.
I can understand the reason to keep logic out of the well formed database. Usually it is time-consuming to make changes (you have to convince database admins to make it, and all the red-tape that comes with it). If the business logic is in your program, then its up to you only.

Use constraints in the database, but for any sophisticated logic I would place that in a data access layer or use one of the standard Object Relational Mapping (ORM) tools such as Hibernate/NHibernate.
There is a general belief that this will affect performance; based on the view that stored procedures are pre-compiled but 'raw' queries have to be compiled on every call. However, I work mostly in SQL Server 2005/2008, and that is very efficient at handling 'raw' parameterised queries, caching the compiled query path for future calls to the database. This means that there under SQL Server there is essentially no difference between the performance of stored procedures to parameterised SQL queries.
The only downside on losing stored procedures is if you are very granular with your database security permissions, and which to enforce security at the database login level.

I have a simple philosophy.
If it's need to keep the database secure and in a consistant state, make sure to do it in the database
I do try to keep a lot of other stuff there too, in my world it's easier to update a client's database than it is to update their application...
Essentially I try to treat the database as a pseudo object. A bunch of methods I can call, etc, but I don't want the app to care about the detail of the internal data storage.

In my experience, putting any application logic in the database always results in a WTF. It doesn't matter how smart the database programmer, how advanced the database, it always ends up being a mistake. The reverse question is "how often should my C# code manage relational data using its own flat-file structure and query language", to which the answer is (almost) always never.
I think the database should be used for data storage, which is what it's good at.

Related

Does it make sense to use an OR-Mapper?

Does it make sense to use an OR-mapper?
I am putting this question of there on stack overflow because this is the best place I know of to find smart developers willing to give their assistance and opinions.
My reasoning is as follows:
1.) Where does the SQL belong?
a.) In every professional project I have worked on, security of the data has been a key requirement. Stored Procedures provide a natural gateway for controlling access and auditing.
b.) Issues with Applications in production can often be resolved between the tables and stored procedures without putting out new builds.
2.) How do I control the SQL that is generated? I am trusting parse trees to generate efficient SQL.
I have quite a bit of experience optimizing SQL in SQL-Server and Oracle, but would not feel cheated if I never had to do it again. :)
3.) What is the point of using an OR-Mapper if I am getting my data from stored procedures?
I have used the repository pattern with a homegrown generic data access layer.
If a collection needed to be cached, I cache it. I also have experience using EF on a small CRUD application and experience helping tuning an NHibernate application that was experiencing performance issues. So I am a little biased, but willing to learn.
For the past several years we have all been hearing a lot of respectable developers advocating the use of specific OR-Mappers (Entity-Framework, NHibernate, etc...).
Can anyone tell me why someone should move to an ORM for mainstream development on a major project?
edit: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html seems to have a strong discussion on this topic but it is out of date.
Yet another edit:
Everyone seems to agree that Stored Procedures are to be used for heavy-duty enterprise applications, due to their performance advantage and their ability to add programming logic nearer to the data.
I am seeing that the strongest argument in favor of OR mappers is developer productivity.
I suspect a large motivator for the ORM movement is developer preference towards remaining persistence-agnostic (don’t care if the data is in memory [unless caching] or on the database).
ORMs seem to be outstanding time-savers for local and small web applications.
Maybe the best advice I am seeing is from client09: to use an ORM setup, but use Stored Procedures for the database intensive stuff (AKA when the ORM appears to be insufficient).
I was a pro SP for many, many years and thought it was the ONLY right way to do DB development, but the last 3-4 projects I have done I completed in EF4.0 w/out SP's and the improvements in my productivity have been truly awe-inspiring - I can do things in a few lines of code now that would have taken me a day before.
I still think SP's are important for some things, (there are times when you can significantly improve performance with a well chosen SP), but for the general CRUD operations, I can't imagine ever going back.
So the short answer for me is, developer productivity is the reason to use the ORM - once you get over the learning curve anyway.
A different approach... With the raise of No SQL movement now, you might want to try object / document database instead to store your data. In this way, you basically will avoid the hell that is OR Mapping. Store the data as your application use them and do transformation behind the scene in a worker process to move it into a more relational / OLAP format for further analysis and reporting.
Stored procedures are great for encapsulating database logic in one place. I've worked on a project that used only Oracle stored procedures, and am currently on one that uses Hibernate. We found that it is very easy to develop redundant procedures, as our Java developers weren't versed in PL/SQL package dependencies.
As the DBA for the project I find that the Java developers prefer to keep everything in the Java code. You run into the occassional, "Why don't I just loop through all the Objects that just returned?" This caused a number of "Why isn't the index taking care of this?" issues.
With Hibernate your entities can contain not only their linked database properties, but can also contain any actions taken upon them.
For example, we have a Task Entity. One could Add or Modify a Task among other things. This can be modeled in the Hibernate Entity in Named Queries.
So I would say go with an ORM setup, but use procedures for the database intensive stuff.
A downside of keeping your SQL in Java is that you run the risk of developers using non-parameterized queries leaving your app open to a SQL Injection.
The following is just my private opinion, so it's rather subjective.
1.) I think that one needs to differentiate between local applications and enterprise applications. For local and some web applications, direct access to the DB is okay. For enterprise applications, I feel that the better encapsulation and rights management makes stored procedures the better choice in the end.
2.) This is one of the big issues with ORMs. They are usually optimized for specific query patterns, and as long as you use those the generated SQL is typically of good quality. However, for complex operations which need to be performed close to the data to remain efficient, my feeling is that using manual SQL code is stilol the way to go, and in this case the code goes into SPs.
3.) Dealing with objects as data entities is also beneficial compared to direct access to "loose" datasets (even if those are typed). Deserializing a result set into an object graph is very useful, no matter whether the result set was returned by a SP or from a dynamic SQL query.
If you're using SQL Server, I invite you to have a look at my open-source bsn ModuleStore project, it's a framework for DB schema versioning and using SPs via some lightweight ORM concept (serialization and deserialization of objects when calling SPs).

What should every developer know about databases? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today.
What is one important concept that developers and other software professionals ought to know about databases?
The very first thing developers should know about databases is this: what are databases for? Not how do they work, nor how do you build one, nor even how do you write code to retrieve or update the data in a database. But what are they for?
Unfortunately, the answer to this one is a moving target. In the heydey of databases, the 1970s through the early 1990s, databases were for the sharing of data. If you were using a database, and you weren't sharing data you were either involved in an academic project or you were wasting resources, including yourself. Setting up a database and taming a DBMS were such monumental tasks that the payback, in terms of data exploited multiple times, had to be huge to match the investment.
Over the last 15 years, databases have come to be used for storing the persistent data associated with just one application. Building a database for MySQL, or Access, or SQL Server has become so routine that databases have become almost a routine part of an ordinary application. Sometimes, that initial limited mission gets pushed upward by mission creep, as the real value of the data becomes apparent. Unfortunately, databases that were designed with a single purpose in mind often fail dramatically when they begin to be pushed into a role that's enterprise wide and mission critical.
The second thing developers need to learn about databases is the whole data centric view of the world. The data centric world view is more different from the process centric world view than anything most developers have ever learned. Compared to this gap, the gap between structured programming and object oriented programming is relatively small.
The third thing developers need to learn, at least in an overview, is data modeling, including conceptual data modeling, logical data modeling, and physical data modeling.
Conceptual data modeling is really requirements analysis from a data centric point of view.
Logical data modeling is generally the application of a specific data model to the requirements discovered in conceptual data modeling. The relational model is used far more than any other specific model, and developers need to learn the relational model for sure. Designing a powerful and relevant relational model for a nontrivial requirement is not a trivial task. You can't build good SQL tables if you misunderstand the relational model.
Physical data modeling is generally DBMS specific, and doesn't need to be learned in much detail, unless the developer is also the database builder or the DBA. What developers do need to understand is the extent to which physical database design can be separated from logical database design, and the extent to which producing a high speed database can be accomplished just by tweaking the physical design.
The next thing developers need to learn is that while speed (performance) is important, other measures of design goodness are even more important, such as the ability to revise and extend the scope of the database down the road, or simplicity of programming.
Finally, anybody who messes with databases needs to understand that the value of data often outlasts the system that captured it.
Whew!
Good question. The following are some thoughts in no particular order:
Normalization, to at least the second normal form, is essential.
Referential integrity is also essential, with proper cascading delete and update considerations.
Good and proper use of check constraints. Let the database do as much work as possible.
Don't scatter business logic in both the database and middle tier code. Pick one or the other, preferably in middle tier code.
Decide on a consistent approach for primary keys and clustered keys.
Don't over index. Choose your indexes wisely.
Consistent table and column naming. Pick a standard and stick to it.
Limit the number of columns in the database that will accept null values.
Don't get carried away with triggers. They have their use but can complicate things in a hurry.
Be careful with UDFs. They are great but can cause performance problems when you're not aware how often they might get called in a query.
Get Celko's book on database design. The man is arrogant but knows his stuff.
First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.
Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.
Third, developers need to understand basic SQL, including joins.
Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:
They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."
They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.
They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.
They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
Basic Indexing
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designing the database and just have to write some queries, it's still vital to understand, at a minimum:
What's indexed in your database and what's not:
The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
The concept of coverage (why you shouldn't just write SELECT *);
The difference between a clustered and non-clustered index;
Why more/bigger indexes are not necessarily better;
Why you should try to avoid wrapping filter columns in functions.
Designers should also be aware of common index anti-patterns, for example:
The Access anti-pattern (indexing every column, one by one)
The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by far the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
Normalization
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
How Indexes Work
It's probably not the most important, but for sure the most underestimated topic.
The problem with indexing is that SQL tutorials usually don't mention them at all and that all the toy examples work without any index.
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
That's because SQL databases do a very good job working as black-box:
Tell me what you need (gimme SQL), I'll take care of it.
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
Disclaimer
The arguments are borrowed from the preface of my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
I just want to point out an observation - that is that it seems that the majority of responses assume database is interchangeable with relational databases. There are also object databases, flat file databases. It is important to asses the needs of the of the software project at hand. From a programmer perspective the database decision can be delayed until later. Data modeling on the other hand can be achieved early on and lead to much success.
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
On the other hand, the type of database required can be determined based on many different factors to include environment, user volume, and available local hardware such as harddrive space.
Avoiding SQL injection and how to secure your database
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN (or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O( n ^ 2 ). Other algorithms involve B-tree lookups and are O( n log n ).
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
Also, the corollary: More Indexes are Not Better.
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
I think every developer should understand that databases require a different paradigm.
When writing a query to get at your data, a set-based approach is needed. Many people with an interative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself in their iterative-focussed minds.
Excellent question. Let's see, first no one should consider querying a datbase who does not thoroughly understand joins. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know datatypes and how to choose the best one.
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
Data integrity - if the data can't be relied on you essentially have no data - this means do not put required logic in the application as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary to data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember often actual decisions are made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company from bad queries or loose a big customer. Data has meaning, developers often seem to forget that.
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
Your database will probaly need fields that the application doesn't need to see. Things like GUIDs for replication, date inserted fields. etc. You also may need to store history of changes and who made them when and be able to restore bad changes from this storehouse. Think about how you intend to do this before you come ask a web site how to fix the problem where you forgot to put a where clause on an update and updated the whole table.
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
Evolutionary Database Design. http://martinfowler.com/articles/evodb.html
These agile methodologies make database change process manageable, predictable and testable.
Developers should know, what it takes to refactor a production database in terms of version control, continious integration and automated testing.
Evolutionary Database Design process has administrative aspects, for example a column is to be dropped after some life time period in all databases of this codebase.
At least know, that Database Refactoring concept and methodologies exist.
http://www.agiledata.org/essays/databaseRefactoringCatalog.html
Classification and process description makes it possible to implement tooling for these refactorings too.
About the following comment to Walter M.'s answer:
"Very well written! And the historical perspective is great for people who weren't doing database work at that time (i.e. me)".
The historical perspective is in a certain sense absolutely crucial. "Those who forget history, are doomed to repeat it.". Cfr XML repeating the hierarchical mistakes of the past, graph databases repeating the network mistakes of the past, OO systems forcing the hierarchical model upon users while everybody with even just a tenth of a brain should know that the hierarchical model is not suitable for general-purpose representation of the real world, etcetera, etcetera.
As for the question itself:
Every database developer should know that "Relational" is not equal to "SQL". Then they would understand why they are being let down so abysmally by the DBMS vendors, and why they should be telling those same vendors to come up with better stuff (e.g. DBMS's that are truly relational) if they want to go on sucking hilarious amounts of money out of their customers for such crappy software).
And every database developer should know everything about the relational algebra. Then there would no longer be a single developer left who had to post these stupid "I don't know how to do my job and want someone else to do it for me" questions on Stack Overflow anymore.
From my experience with relational databases, every developer should know:
- The different data types:
Using the correct type for the correct job will make your DB design more robust, your queries faster and your life easier.
- Learn about 1xM and MxM:
This is the bread and butter for relational databases. You need to understand one-to-many and many-to-many relations and apply then when appropriate.
- "K.I.S.S." principle applies to the DB as well:
Simplicity always works best. Provided you have studied how DB work, you will avoid unnecessary complexity which will lead to maintenance and speed problems.
- Indices:
It's not enough if you know what they are. You need to understand when to used them and when not to.
also:
Boolean algebra is your friend
Images: Don't store them on the DB. Don't ask why.
Test DELETE with SELECT
I would like everyone, both DBAs and developer/designer/architects, to better understand how to properly model a business domain, and how to map/translate that business domain model into both a normalized database logical model, an optimized physical model, and an appropriate object oriented class model, each one of which is (can be) different, for various reasons, and understand when, why, and how they are (or should be) different from one another.
I would say strong basic SQL skills. I've seen a lot of developers so far who know a little about databases but are always asking for tips about how to formulate a quite simple query. Queries are not always that easy and simple. You do have to use multiple joins (inner, left, etc.) when querying a well normalized database.
I think a lot of the technical details have been covered here and I don't want to add to them. The one thing I want to say is more social than technical, don't fall for the "DBA knowing the best" trap as an application developer.
If you are having performance issues with query take ownership of the problem too. Do your own research and push for the DBAs to explain what's happening and how their solutions are addressing the problem.
Come up with your own suggestions too after you have done the research. That is, I try to find a cooperative solution to the problem rather than leaving database issues to the DBAs.
Simple respect.
It's not just a repository
You probably don't know better than the vendor or the DBAs
You won't support it at 3 a.m. with senior managers shouting at you
Consider Denormalization as a possible angel, not the devil, and also consider NoSQL databases as an alternative to relational databases.
Also, I think the Entity-Relation model is a must-know for every developper even if you don't design databases. It'll let you understand thoroughly what's your database all about.
Never insert data with the wrong text encoding.
Once your database becomes polluted with multiple encodings, the best you can do is apply some kind combination of heuristics and manual labor.
Aside from syntax and conceptual options they employ (such as joins, triggers, and stored procedures), one thing that will be critical for every developer employing a database is this:
Know how your engine is going to perform the query you are writing with specificity.
The reason I think this is so important is simply production stability. You should know how your code performs so you're not stopping all execution in your thread while you wait for a long function to complete, so why would you not want to know how your query will affect the database, your program, and perhaps even the server?
This is actually something that has hit my R&D team more times than missing semicolons or the like. The presumtion is the query will execute quickly because it does on their development system with only a few thousand rows in the tables. Even if the production database is the same size, it is more than likely going to be used a lot more, and thus suffer from other constraints like multiple users accessing it at the same time, or something going wrong with another query elsewhere, thus delaying the result of this query.
Even simple things like how joins affect performance of a query are invaluable in production. There are many features of many database engines that make things easier conceptually, but may introduce gotchas in performance if not thought of clearly.
Know your database engine execution process and plan for it.
For a middle-of-the-road professional developer who uses databases a lot (writing/maintaining queries daily or almost daily), I think the expectation should be the same as any other field: You wrote one in college.
Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in college. Every web geek wrote interactive websites (usually before we had "web frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in college. Every physician dissected an entire cadaver in college, even if she's only going to take my blood pressure and tell me my cholesterol is too high today. Why would databases be any different?
Unfortunately, they do seem different, today, for some reason. People want .NET programmers to know how strings work in C, but the internals of your RDBMS shouldn't concern you too much.
It's virtually impossible to get the same level of understanding from just reading about them, or even working your way down from the top. But if you start at the bottom and understand each piece, then it's relatively easy to figure out the specifics for your database. Even things that lots of database geeks can't seem to grok, like when to use a non-relational database.
Maybe that's a bit strict, especially if you didn't study computer science in college. I'll tone it down some: You could write one today, completely, from scratch. I don't care if you know the specifics of how the PostgreSQL query optimizer works, but if you know enough to write one yourself, it probably won't be too different from what they did. And you know, it's really not that hard to write a basic one.
The order of columns in a non-unique index is important.
The first column should be the column that has the most variability in its content (i.e. cardinality).
This is to aid SQL Server ability to create useful statistics in how to use the index at runtime.
Understand the tools that you use to program the database!!!
I wasted so much time trying to understand why my code was mysteriously failing.
If you're using .NET, for example, you need to know how to properly use the objects in the System.Data.SqlClient namespace. You need to know how to manage your SqlConnection objects to make sure they are opened, closed, and when necessary, disposed properly.
You need to know that when you use a SqlDataReader, it is necessary to close it separately from your SqlConnection. You need to understand how to keep connections open when appropriate to how to minimize the number of hits to the database (because they are relatively expensive in terms of computing time).
Basic SQL skills.
Indexing.
Deal with different incarnations of DATE/ TIME/ TIMESTAMP.
JDBC driver documentation for the platform you are using.
Deal with binary data types (CLOB, BLOB, etc.)
For some projects, and Object-Oriented model is better.
For other projects, a Relational model is better.
The impedance mismatch problem, and know the common deficiencies or ORMs.
RDBMS Compatibility
Look if it is needed to run the application in more than one RDBMS. If yes, it might be necessary to:
avoid RDBMS SQL extensions
eliminate triggers and store procedures
follow strict SQL standards
convert field data types
change transaction isolation levels
Otherwise, these questions should be treated separately and different versions (or configurations) of the application would be developed.
Don't depend on the order of rows returned by an SQL query.
Three (things) is the magic number:
Your database needs version control too.
Cursors are slow and you probably don't need them.
Triggers are evil*
*almost always

Business Logic in Database versus Code? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
As a software engineer, I have a strong bias towards writing business logic in the application layer, while typically relying on the database for little more than CRUD (Create Retrieve Update and Delete) operations. On the other hand, I have run across applications (typically older ones) where a large amount of the business logic was written in stored procedures, so there are people out there that prefer to write business logic in the database layer.
For the people that have and/or enjoy written/writing business logic in a stored procedure, what were/are your reasons for using this method?
I try to seriously limit my business logic in the DB to only procs that have to do alot of querying and updating to perform a single application operation. Some may argue that even that should be in the app, but I like to keep the IO down if I can.
Databases are great for CRUD but if they get bloated with logic:
It becomes confusing where the logic is,
Typically databases are a silo and do not scale horizontally nearly as well as the app servers.
t_sql/PLsql is hard to read and procedural in nature
You forfeit all of the benefits of OOAD.
To the maximum extent possible, keep your business logic in the environment that is the most testable and debuggable. There are some valid reasons for storing business logic in the database in other people's existing answers, but they are almost always far outweighed by this.
Limiting the business logic to the application layer is short-sighted at best. Experienced professional database designers rarely allow it on their systems. Database need to have constraints and triggers and stored procs to help define how the data from any source will go into it.
If the database is to maintain its integrity and to ensure that all sources of new data or data changes follow the rules, the database is the place to put the required logic. Putting it the application layer is a data nightmare waiting to happen. Databases do not get information just from one application. Business logic in the application is often unintentionally bypassed by imports (assume you got a new customer who wanted their old historical data imported to your system or a large number of target records, no one is going to enter a million possible targets through the interface, it will happen in an import.) It is also bypassed by changes made through the query window to fix one-time issues (things like increasing the price of all products by 10%). If you have application layer logic that should have been applied to the data change, it won't be. Now it's ok to put it in the application layer as well, no sense sending bad data to the database and wasting network bandwidth, but to fail to put it in the database will sooner or later cause data problems.
Another reason to keep all of this in the database has to to with the possibility of users committing fraud. If you put all your logic in the application layer, then you must grant the users access directly to the tables. If you encapsulate all your logic in stored procs, they can be limited to doing only what the stored procs allow and not anything else. I would not consider allowing any kind of access by users to a database that stores financial records or personal information (such as health records) as I would not allow anyone except a couple of dbas to directly access the production records in any way shape or form. More fraud is committed than many developers realize and almost none of them consider the possibility in their design.
If you need to import large amount of data, going through a data access layer could slow down the import to a crawl becasue it doesn't take advanatge of the set-based operations that databases are designed to handle.
Your usage of the term "business logic" is rather vague.
It can be interpreted to mean to include the enforcement of constraints on the data (aka 'business rules'). Enforcement of these unequivocally belongs in the dbms, period.
It can also be interpreted to mean to include things like "if a new customer arrives, then within a week we send him a welcome letter." Trying to push stuff like this in the data layer is probably a big mistake. In such cases, the driver for "create a new welcome letter" should probably be the application that also triggers the new customer row insertion. Imagine every new database row insertion triggering a new welcome letter, and then suddenly we take over another company and we must integrate that company's customers in our own database ... Ouch.
We do a lot of processing in the DB tier, where appropriate. There's a lot of operations you wouldn't want to pull back large datasets to the app tier to do analysis on. It's also an easier deployment for us -- a single point vs. updating applications at all install points. But a lot depends on your application and what it does; there's no single good answer here.
On a couple of ocassions I have put 'logic' in sprocs because the CRUD might be happening in more than one place. By 'logic' I would have to say it is not really business logic but more 'integrity logic'. It might be the same - some cleanup might be necessary if something gets deleted or updated in a certain way, and if that delete or update could happen from more than one tool with different code-bases it made sense to put it in the proc they all used.
In addition, sometimes the 'business logic line' is pretty blurry. Take reports for example - they may rely on stored procedures or views that encapsulate 'smarts' about what the schema means to the business. How often have you seen CASE statements and the like that 'do things' based on column values or other critieria? Could be construed as business logic and yet it probably does belong in the DB where it can be optimized, etc.
I'd say if 'business-logic' means application flow, user control, timed operations and generally 'doing-business-stuff' then it should be in the application layer. But if it means making sure that no matter how you dig around in the data, it always makes sense and is a sensible, non-self-conflicting whole, then the checks to enforce those rules go in the DB, absolutely, no questions. There are always many ways to push data into the DB and manipulate it once its there. Not all those ways have 'business-logic' built in to them. You will find a SQL session into a DB through a DOS window on a support call at 3am is very liberal in what it allows for example! If the logic isn't in the DB to make sure that ALL data changes make sense, you can bet for sure that the data will get very, very screwed up over time. And since a system is only as valuable as the data it holds, that makes for a much lower return on investment.
Two good reasons for putting the business logic in the database are:
It secures your logic and data
against additional applications that
may access the database that don't
implement similar logic.
Database designs usually outlive the
application layer and it reduces the
work necessary when you move to new
technologies on the client side.
You often find business logic at the database layer because it can often be faster to make a change and deploy. I think often the best intentions are not to put the logic there but because of the ease of deployment it ends up there.
The primary reason I would put BL in stored procs in the past is that transactions were easier in the database.
If deployments are difficult for your app and you don't have an app-server, changing the BL in stored procedures is the most effective way to deploy a change.
I work for a financial type company where certain rules are applied by states, and these rules and their calculations are subject to change almost daily if not surely weekly. That being the case, it made more sense to move parts of the logic dealing with calculations to the database; where a change can be tested and applied without having to recompile and redistibute an application, which is impossible to do daily without disrupting business. The stored proc is tested, approved, applied and the end user is none the wiser.
With the move to web based applications, the reliance on moving the logic to the database is less but still present. Even web apps (depending on the language) must be compiled and published to the site which could cause downtime.
Sometimes business logic is too slow to run on the app layer. This is especially true on on older systems where client power and bandwidth was more limited.
The main reason for using the database to do the work is that you have a single point of control. Often, app developers re-use or rewrite code fragments in different parts of the application. Even assuming that these all work exactly the same way (which is doubtful), when the business logic changes, the app needs to be reviewed, recoded, recompiled. Unless the parameters change, this would not be necessary where the business logic is stored only in the database.
My preference is to keep any complicated business logic out of the database, simply for maintenance purposes. If I get a call at 2 o'clock in the morning I would rather debug my application code than try to step through database scripts.
I'm in a team to build-up and maintain a rather large financial system, and I find no way put the logic into the application layer for action that affect to or get constraints from dozens of thousand records.
Beside the performance issue, should errors happen, rectifying a stored procedures is much faster than debugging the application, fixing, recompiling, redeploying the code with longer downtime
I think Specially for older applications which i working on (Banking) where the Bussiness logic is huge, it's almost next to impossible to perform all these business logic in application layer, and also It's a big performenance hit when we put these logic in Application layer where the number of fetch to the database is more, results in more resource utilization(more java objects if it's done in java layer) and network issues and forget abt performenance.

Business Logic: Database or Application Layer

The age old question. Where should you put your business logic, in the database as stored procedures ( or packages ), or in the application/middle tier? And more importantly, Why?
Assume database independence is not a goal.
Maintainability of your code is always a big concern when determining where business logic should go.
Integrated debugging tools and more powerful IDEs generally make maintaining middle tier code easier than the same code in a stored procedure. Unless there is a real reason otherwise, you should start with business logic in your middle tier/application and not in stored procedures.
However when you come to reporting and data mining/searching, stored procedures can often a better choice. This is thanks to the power of the databases aggregation/filtering capabilities and the fact you are keeping processing very close the the source of the data. But this may not be what most consider classic business logic anyway.
Put enough of the business logic in the database to ensure that the data is consistent and correct.
But don't fear having to duplicate some of this logic at another level to enhance the user experience.
For very simple cases you can put your business logic in stored procedures. Usually even the simple cases tend to get complicated over time. Here are the reasons I don't put business logic in the database:
Putting the business logic in the database tightly couples it to the technical implementation of the database. Changing a table will cause you to change a lot of the stored procedures again causing a lot of extra bugs and extra testing.
Usually the UI depends on business logic for things like validation. Putting these things in the database will cause tight coupling between the database and the UI or in different cases duplicates the validation logic between those two.
It will get hard to have multiple applications work on the same database. Changes for one aplication will cause others to break. This can quickly turn into a maintenance nightmare. So it doesn't really scale.
More practically SQL isn't a good language to implement business logic in an understandable way. SQL is great for set based operations but it misses constructs for "programming in the large" it's hard to maintain big amounts of stored procedures. Modern OO languages are better suited and more flexible for this.
This doesn't mean you can't use stored procs and views. I think it sometimes is a good idea to put an extra layer of stored procedures and views between the tables and application(s) to decouple the two. That way you can change the layout of the database without changing external interface allowing you to refactor the database independently.
It's really up to you, as long as you're consistent.
One good reason to put it in your database layer: if you are fairly sure that your clients will never ever change their database back-end.
One good reason to put it in the application layer: if you are targeting multiple persistence technologies for your application.
You should also take into account core competencies. Are your developers mainly application layer developers, or are they primarily DBA-types?
While there is no one right answer - it depends on the project in question, I would recommend the approach advocated in "Domain Driven Design" by Eric Evans. In this approach the business logic is isolated in its own layer - the domain layer - which sits on top of the infrastructure layer(s) - which could include your database code, and below the application layer, which sends the requests into the domain layer for fulfilment and listens for confirmation of their completion, effectively driving the application.
This way, the business logic is captured in a model which can be discussed with those who understand the business aside from technical issues, and it should make it easier to isolate changes in the business rules themselves, the technical implementation issues, and the flow of the application which interacts with the business (domain) model.
I recommend reading the above book if you get the chance as it is quite good at explaining how this pure ideal can actually be approximated in the real world of real code and projects.
While there are certainly benefits to have the business logic on the application layer, I'd like to point out that the languages/frameworks seem to change more frequently then the databases.
Some of the systems that I support, went through the following UIs in the last 10-15 years: Oracle Forms/Visual Basic/Perl CGI/ ASP/Java Servlet. The one thing that didn't change - the relational database and stored procedures.
Database independence, which the questioner rules out as a consideration in this case, is the strongest argument for taking logic out of the database. The strongest argument for database independence is for the ability to sell software to companies with their own preference for a database backend.
Therefore, I'd consider the major argument for taking stored procedures out of the database to be a commercial one only, not a technical one. There may be technical reasons but there are also technical reasons for keeping it in there -- performance, integrity, and the ability to allow multiple applications to use the same API for example.
Whether or not to use SP's is also strongly influenced by the database that you are going to use. If you take database independence out of consideration then you're going to have very different experiences using T-SQL or using PL/SQL.
If you are using Oracle to develop an application then PL/SQL is an obvious choice as a language. It's is very tightly coupled with the data, continually improved in every relase, and any decent development tool is going to integratePL/SQL development with CVS or Subversion or somesuch.
Oracle's web-based Application Express development environment is even built 100% with PL/SQL.
The only thing that goes in a database is data.
Stored procedures are a maintenance nightmare. They aren't data and they don't belong in the database. The endless coordination between developers and DBA's is little more than organizational friction.
It's hard to keep good version control over stored procedures. The code outside the database is really easy to install -- when you think you've got the wrong version you just do an SVN UP (maybe an install) and your application's back to a known state. You have environment variables, directory links, and lots of environment control over the application.
You can, with simple PATH manipulations, have variant software available for different situations (training, test, QA, production, customer-specific enhancements, etc., etc.)
The code inside the database, however, is much harder to manage. There's no proper environment -- no "PATH", directory links or other environment variables -- to provide any usable control over what software's being used; you have a permanent, globally bound set of application software stuck in the database, married to the data.
Triggers are even worse. They're both a maintenance and a debugging nightmare. I don't see what problem they solve; they seem to be a way of working around badly-designed applications where someone couldn't be bothered to use the available classes (or function libraries) correctly.
While some folks find the performance argument compelling, I still haven't seen enough benchmark data to convince me that stored procedures are all that fast. Everyone has an anecdote, but no one has side-by-side code where the algorithms are more-or-less the same.
[In the examples I've seen, the old application was a poorly designed mess; when the stored procedures were written, the application was re-architected. I think the design change had more impact than the platform change.]
Anything that affects data integrity must be put at the database level. Other things besides the user interface often put data into, update or delete data from the database including imports, mass updates to change a pricing scheme, hot fixes, etc. If you need to ensure the rules are always followed, put the logic in defaults and triggers.
This is not to say that it isn't a good idea to also have it in the user interface (why bother sending information that the database won't accept), but to ignore these things in the database is to court disaster.
If you need database independence, you'll probably want to put all your business logic in the application layer since the standards available in the application tier are far more prevalent than those available to the database tier.
However, if database independence isn't the #1 factor and the skill-set of your team includes strong database skills, then putting the business logic in the database may prove to be the best solution. You can have your application folks doing application-specific things and your database folks making sure all the queries fly.
Of course, there's a big difference between being able to throw a SQL statement together and having "strong database skills" - if your team is closer to the former than the latter then put the logic in the application using one of the Hibernates of this world (or change your team!).
In my experience, in an Enterprise environment you'll have a single target database and skills in this area - in this case put everything you can in the database. If you're in the business of selling software, the database license costs will make database independence the biggest factor and you'll be implementing everything you can in the application tier.
Hope that helps.
It is nowadays possible to submit to subversion your stored proc code and to debug this code with good tool support.
If you use stored procs that combine sql statements you can reduce the amount of data traffic between the application and the database and reduce the number of database calls and gain big performance gains.
Once we started building in C# we made the decision not to use stored procs but now we are moving more and more code to stored procs. Especially batch processing.
However don't use triggers, use stored procs or better packages. Triggers do decrease maintainability.
Putting the code in the application layer will result in a DB independent application.
Sometimes it is better to use stored procedures for performance reasons.
It (as usual) depends on the application requirements.
The business logic should be placed in the application/middle tier as a first choice. That way it can be expressed in the form of a domain model, be placed in source control, be split or combined with related code (refactored), etc. It also gives you some database vendor independence.
Object Oriented languages are also much more expressive than stored procedures, allowing you to better and more easily describe in code what should be happening.
The only good reasons to place code in stored procedures are: if doing so produces a significant and necessary performance benefit or if the same business code needs to be executed by multiple platforms (Java, C#, PHP). Even when using multiple platforms, there are alternatives such as web-services that might be better suited to sharing functionality.
The answer in my experience lies somewhere on a spectrum of values usually determined by where your organization's skills lie.
The DBMS is a very powerful beast, which means proper or improper treatment will bring great benefit or great danger. Sadly, in too many organizations, primary attention is paid to programming staff; dbms skills, especially query development skills (as opposed to administrative) are neglected. Which is exacerbated by the fact that the ability to evaluate dbms skills is also probably missing.
And there are few programmers who sufficiently understand what they don't understand about databases.
Hence the popularity of suboptimal concepts, such as Active Records and LINQ (to throw in some obvious bias). But they are probably the best answer for such organizations.
However, note that highly scaled organizations tend to pay a lot more attention to effective use of the datastore.
There is no standalone right answer to this question. It depends on the requirements of your app, the preferences and skills of your developers, and the phase of the moon.
Business logic is to be put in the application tier and not in the database.
The reason is that a database stored procedure is always dependen on the database product you use. This break one of the advantages of the three tier model. You cannot easily change to an other database unless you provide an extra stored procedure for this database product.
on the other hand sometimes, it makes sense to put logic into a stored procedure for performance optimization.
What I want to say is business logic is to be put into the application tier, but there are exceptions (mainly performance reasons)
Bussiness application 'layers' are:
1. User Interface
This implements the business-user's view of h(is/er) job. It uses terms that the user is familiar with.
2. Processing
This is where calculations and data manipulation happen. Any business logic that involves changing data are implemented here.
3. Database
This could be: a normalized sequential database (the standard SQL-based DBMS's); an OO-database, storing objects wrapping the business-data; etc.
What goes Where
In getting to the above layers you need to do the necessary analysis and design. This would indicate where business logic would best be implemented: data-integrity rules and concurrency/real-time issues regarding data-updates would normally be implemented as close to the data as possible, same as would calculated fields, and this is a good pointer to stored-procedures/triggers, where data-integrity and transaction-control is absolutely necessary.
The business-rules involving the meaning and use of the data would for the most part be implemented in the Processing layer, but would also appear in the User-Interface as the user's work-flow - linking the various process in some sequence that reflects the user's job.
Imho. there are two conflicting concerns with deciding where business logic goes in a relational database-driven app:
maintainability
reliability
Re. maintainability:  To allow for efficient future development, business logic belongs in the part of your application that's easiest to debug and version control.
Re. reliability:  When there's significant risk of inconsistency, business logic belongs in the database layer.  Relational databases can be designed to check for constraints on data, e.g. not allowing NULL values in specific columns, etc.  When a scenario arises in your application design where some data needs to be in a specific state which is too complex to express with these simple constraints, it can make sense to use a trigger or something similar in the database layer.
Triggers are a pain to keep up to date, especially when your app is supposed to run on client systems you don't even have access too.  But that doesn't mean it's impossible to keep track of them or update them.  S.Lott's arguments in his answer that it's a pain and a hassle are completely valid, I'll second that and have been there too.  But if you keep those limitations in mind when you first design your data layer and refrain from using triggers and functions for anything but the absolute necessities it's manageable.
In our application, most business logic is contained in the application's model layer, e.g. an invoice knows how to initialize itself from a given sales order.  When a bunch of different things are modified sequentially for a complex set of changes like this, we roll them up in a transaction to maintain consistency, instead of opting for a stored procedure.  Calculation of totals etc. are all done with methods in the model layer.  But when we need to denormalize something for performance or insert data into a 'changes' table used by all clients to figure out which objects they need to expire in their session cache, we use triggers/functions in the database layer to insert a new row and send out a notification (Postgres listen/notify stuff) from this trigger.
After having our app in the field for about a year, used by hundreds of customers every day, the only thing I would change if we were to start from scratch would be to design our system for creating database functions (or stored procedures, however you want to call them) with versioning and updates to them in mind from the get-go.
Thankfully, we do have some system in place to keep track of schema versions, so we built something on top of that to take care of replacing database functions.  It would've saved us some time now if we'd considered the need to replace them from the beginning though.
Of course, everything changes when you step outside of the realm of RDBMS's into tuple-storage systems like Amazon SimpleDB and Google's BigTable.  But that's a different story :)
We put a lot of business logic in stored procedures - it's not ideal, but quite often it's a good balance between performance and reliability.
And we know where it is without having to search through acres of solutions and codebase!
Scalability is also very important factor for pusing business logic in middle or app layer than to database layer.It should be understood that DatabaseLayer is only for interacting with Database not manipulating which is returned to or from database.
I remember reading an article somewhere that pointed out that pretty well everything can be, at some level, part of the business logic, and so the question is meaningless.
I think the example given was the display of an invoice onscreen. The decision to mark an overdue one in red is a business decision...
It's a continuum. IMHO the biggest factor is speed. How can u get this sucker up and running as quickly as possible while still adhering to good tenants of programming such as maintainability, performance, scalability, security, reliability etc. Often times SQL is the most concise way to express something and also happens to be the most performant many times, except for string operations etc, but that's where your CLR Procs can help. My belief is to liberally sprinkle business logic around whereever you feel it is best for the undertaking at hand. If you have a bunch of application developers who shit their pants when looking at SQL then let them use their app logic. If you really want to create a high performance application with large datasets, put as much logic in the DB as you can. Fire your DBA's and give developers ultimate freedom over their Dev databases. There is no one answer or best tool for the job. You have multiple tools so become expert at all levels of the application and you'll soon find that you're spending a lot more time writing nice consise expressive SQL where warranted and using the application layer other times. To me, ultimately, reducing the number of lines of code is what leads to simplicity. We have just converted a sql rich application with a mere 2500 lines of app code and 1000 lines of SQL to a domain model which now has 15500 lines of app code and 2500 lines of SQL to achieve what the former sql rich app did. If you can justify a 6 fold increase in code as "simplified" then go right ahead.
This is a great question! I found this after I had already asked a simliar question, but this is more specific. It came up as a result of a design change decision that I wasn't involved in making.
Basically, what I was told was that If you have millions of rows of data in your database tables, then look at putting business logic into stored procedures and triggers. That is what we are doing right now, converting a java app into stored procedures for maintainability as the java code had become convoluted.
I found this article on: The Business Logic Wars The author also made the million rows in a table argument, which I found interesting. He also added business logic in javascript, which is client side and outside of the business logic tier. I hadn't thought about this before even though I've used javascript for validation for years, to along with server side validation.
My opinion is that you want the business logic in the application/middle tier as a rule of thumb, but don't discount cases where it makes sense to put it into the database.
One last point, there is another group where I'm working presently that is doing massive database work for research and the amount of data they are dealing with is immense. Still, for them they don't have any business logic in the database itself, but keep it in the application/middle tier. For their design, the application/middle tier was the correct place for it, so I wouldn't use the size of tables as the only design consideration.
Business logic is usually embodied by objects, and the various language constructs of encapsulation, inheritance, and and polymorphism. For example, if a banking application is passing around money, there may be a Money type that defines the business elements of what "money" is. This, opposed to using a primitive decimal to represent money. For this reason, well-designed OOP is where the "business logic" lives—not strictly in any layer.

Where to put your code - Database vs. Application? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have been developing web/desktop applications for about 6 years now. During the course of my career, I have come across application that were heavily written in the database using stored procedures whereas a lot of application just had only a few basic stored procedures (to read, insert, edit and delete entity records) for each entity.
I have seen people argue saying that if you have paid for an enterprise database use its features extensively. Whereas a lot of "object oriented architects" told me its absolute crime to put anything more than necessary in the database and you should be able to drive the application using the methods on those classes?
Where do you think is the balance?
Thanks,
Krunal
I think it's a business logic vs. data logic thing. If there is logic that ensures the consistency of your data, put it in a stored procedure. Same for convenience functions for data retrieval/update.
Everything else should go into the code.
A friend of mine is developing a host of stored procedures for data analysis algorithms in bioinformatics. I think his approach is quite interesting, but not the right way in the long run. My main objections are maintainability and lacking adaptability.
I'm in the object oriented architects camp. It's not necessarily a crime to put code in the database, as long as you understand the caveats that go along with that. Here are some:
It's not debuggable
It's not subject to source control
Permissions on your two sets of code will be different
It will make it more difficult to track where an error in the data came from if you're accessing info in the database from both places
Anything that relates to Referential Integrity or Consistency should be in the database as a bare minimum. If it's in your application and someone wants to write an application against the database they are going to have to duplicate your code in their code to ensure that the data remains consistent.
PLSQL for Oracle is a pretty good language for accessing the database and it can also give performance improvements. Your application can also be much 'neater' as it can treat the database stored procedures as a 'black box'.
The sprocs themselves can also be tuned and modified without you having to go near your compiled application, this is also useful if the supplier of your application has gone out of business or is unavailable.
I'm not advocating 'everything' should be in database, far from it. Treat each case seperately and logically and you will see which makes more sense, put it in the app or put it in the database.
I'm coming from almost the same background and have heard the same arguments. I do understand that there are very valid reasons to put logic into the database. However, it depends on the type of application and the way it handles data which approach you should choose.
In my experience, a typical data entry app like some customer (or xyz) management will massively benefit from using an ORM layer as there are not so many different views at the data and you can reduce the boilerplate CRUD code to a minimum.
On the other hand, assume you have an application with a lot of concurrency and calculations that span a lot of tables and that has a fine-grained column-level security concept with locking and so on, you're probably better off doing stuff like that directly in the database.
As mentioned before, it also depends on the variety of views you anticipate for your data. If there are many different combinations of columns and tables that need to be presented to the user, you may also be better off just handing back different result sets rather than map your objects one-by-one to another representation.
After all, the database is good at dealing with sets, whereas OO code is good at dealing with single entities.
Reading these answers, I'm quite confused by the lack of understanding of database programming. I am an Oracle Pl/sql developer, we source control for every bit of code that goes into the database. Many of the IDEs provide addins for most of the major source control products. From ClearCase to SourceSafe. The Oracle tools we use allow us to debug the code, so debugging isn't an issue. The issue is more of logic and accessibility.
As a manager of support for about 5000 users, the less places i have to look for the logic, the better. If I want to make sure the logic is applied for ALL applications that use the data , even business logic, i put it in the DB. If the logic is different depending on the application, they can be responsible for it.
#DannySmurf:
It's not debuggable
Depending on your server, yes, they are debuggable. This provides an example for SQL Server 2000. I'm guessing the newer ones also have this. However, the free MySQL server does not have this (as far as I know).
It's not subject to source control
Yes, it is. Kind of. Database backups should include stored procedures. Those backup files might or might not be in your version control repository. But either way, you have backups of your stored procedures.
My personal preference is to try and keep as much logic and configuration out of the database as possible. I am heavily dependent on Spring and Hibernate these days so that makes it a lot easier. I tend to use Hibernate named queries instead of stored procedures and the static configuration information in Spring application context XML files. Anything that needs to go into the database has to be loaded using a script and I keep those scripts in version control.
#Thomas Owens: (re source control) Yes, but that's not source control in the same sense that I can check in a .cs file (or .cpp file or whatever) and go and pick out any revision I want. To do that with database code requires a potentially-significant amount of effort to either retrieve the procedure from the database and transfer it to somewhere in the source tree, or to do a database backup every time a minor change is made. In either case (and regardless of the amount of effort), it's not intuitive; and for many shops, it's not a good enough solution either. There is also the potential here for developers who may not be as studious at that as others to forget to retrieve and check in a revision. It's technically possible to put ANYTHING in source control; the disconnect here is what I would take issue with.
(re debuggable) Fair enough, though that doesn't provide much integration with the rest of the application (where the majority of the code could live). That may or may not be important.
Well, if you care about the consistency of your data, there are reasons to implement code within the database. As others have said, placing code (and/or RI/constraints) inside the database acts to enforce business logic, close to the data itself. And, it provides a common, encapsulated interface, so that your new developer doesn't accidentally create orphan records or inconsistent data.
Well, this one is difficult. As a programmer, you'll want to avoid TSQL and such "Database languages" as much as possible, because they are horrendous, difficult to debug, not extensible and there's nothing you can do with them that you won't be able to do using code on your application.
The only reasons I see for writing stored procedures are:
Your database isn't great (think how SQL Server doesn't implement LIMIT and you have to work around that using a procedure.
You want to be able to change a behaviour by changing code in just one place without re-deploying your client applications.
The client machines have big calculation-power constraints (think small embedded devices).
For most applications though, you should try to keep your code in the application where you can debug it, keep it under version control and fix it using all the tools provided to you by your language.

Resources