Which comes first: database or application logic? - database

What is the best way or recommended best practice in the flow of database driven asp.net web application? I mean the database first or coding first or side by side?

Your data access code won't compile without an existing database - unless you stub (or Mock) it. So probably the database comes first.
But it is a bad idea to do whole chunks of the application in isolation. Ideally you should design and build slivers of the system - database and application - hand-in-hand. These slivers should be cohesive sub-sets of functionality, probably smaller than sub-systems. Inevitably, the act of coding screens and business rules will throw up problems in the data model. So it is good to have a data modeller or DBA who is happy to work incrementally alongside the developers.
edit
Stephanie makes an extremely pertinent point:
"the core tables which are persisting
your app's data really can't be
piecemealed. Most of the data is known
at project start. It has a form, you
need to find it."
I agree that the core entities are knowable at project start, and the physical data model can be derived from that logical data model. But I don't think it is ever possible to nail down completely the structure of any table, even a core table, at the start. This is because at the start of the design/build phase all we have to go on are the Requirements, and if there's one thing history tells us about the Requirements it is that they will change.
So, new tables will be needed and some existing tables will become obsolete. There will be columns which need to be added, columns which need to be modified, columns which need to be dropped. This is why Nature gave us the ALTER TABLE statement.
I am not suggesting that we don't design our tables, or assemble them piecemeal. I am merely suggesting that when we start designing the HR sub-system we need to worry about the EMPLOYEES table and the SALARIES table. We don't need to concern ourselves with INVENTORY or ORDERS until we commence work on Sales.

We personally start with the Domain and do things side-by-side. The important part is that we implement vertical slices of the application (fully working end-to-end features), not horizontal slices (e.g. first the whole database layer, then the data access, then the services, then the presentation): we build the application incrementally and demonstrate progress with working code after each iteration.

Applications are all about features.
You don't build apps to store data,
but to provide functionality. If we
can't agree on that, the discussion is
moot of course. Software should be
developed to satisfy the needs of its
users and not of its developers.
Well I have really no understanding of the second sentence. If you think my company pays me a good salary to write code that satisfies me and not my users you're crazy. So that argument is a strawman. Back to the first.
This is a common view point of application centric people (they), vs. database centric people (We). They see the entire point of the exercise to "provide features". Those are things the clients know they want and ask for them. To them, the database is just persistence required for these features. And when they are done, that's it, features delivered, database is sufficient for those features. Could be an entire Rube Goldberg inside the database with redundant data, severe violations of normal forms, constraints enforced by the application, what have you.
think overall usability alone outweighs database design
If the design of your database is affecting your usability than the design was bad. I have no doubt that one who strives for features will leave the database in such a state that it severely hampers usability.
Data Centric people, don't look at a system as a place to provide only what's been asked for, but a repository of Intellectual Capital that can be exploited by more than whatever the Application-du-jour is. I can't begin to describe the number of cases where one team has used the database of some other team's app to enhance their apps value. Just look at all the medical research that is nothing more that the meta-analysis of existing studies. None of that is possible if you believe that only the features of your app matter and subsequent uses of your apps data do not.
A good data model isn't inviolate. Sure you'll add to it, change it when requirements change. But if you don't completely understand your data, I don't know how anyone can begin to write code.

I guess you need first to define datamodel and only then going coding. You should plan everything carefully before actually writting the code.

First is a feature list.
Then, detailed spec.
Then test plan and design of all, including databases.
Then, it wouldn't matter which to implement first.

You'll probably end up doing it "side by side".
You need some data to be able to test the application, but you need the application to be able to verify that you're storing the correct data.
Do some modelling first and then build the minimum you can for one or two features. Then when these are working add the next feature and so on.
You'll need to write some database update procedures (both the code and the rules about what and when to update) as you will have to extend your tables, but you'll need those for the final system anyway as it will have to change as new requirements come along.

Having done it quite a few times, I find myself invariably doing it like so:
Define the problem I'm trying to solve.
Write out some use-cases.
Have my significant other or a friend tell me if this is even a problem.
Sketch out a few sample screens.
Write flow diagrams for the use cases.
Ask my Rubber-duck questions.
Use questions to refine 1-6.
Write out the 'nouns'. Those become my data Model.
Write out the actions. Those become application logic.
Code data Model.
Code Application Logic.
Realize I've gotten it a little wrong.
Repeat 10-12 as many times as needed.
Ask, "Have I solved the problem"?
If not, rinse, lather and repeat 1-15.

This is a trick question. IMO, they both come in parallel during your planning and design phase. They are so closely related that it make sense to do them together. Just keep in mind that your database design will be almost fully developed while your code is still in its infancy (though your application logic should be almost fully mapped out in you head or on paper)
The idea is that you're designing your solution in the context of the problem. When you're planning out your solution you will be (or should be) defining your application as a set of things and actions (nouns and verbs).
For example, a very basic helpdesk program has people and tickets. People need to create tickets, update tickets, and close tickets. The nouns that require persistent storage will comprise your database, and the nouns + actions will be contained in your application.
Sometimes your table mappings and the relationship between tables will be obvious (IE people create tickets, ticket.creatorID = people.personID) and other times the relationship doesn't really click in your head until you start working through use cases or until you start writing your code (IE different ppl have different access levels defining what they can do. At a glance this would seem like a simple field in a table, but in practice it is better as a separate table).

Related

Should data validation be done at the database level?

I am writing some stored procedures to create tables and add data. One of the fields is a column that indicates percentage. The value there should be 0-100. I started thinking, "where should the data validation for this be done? Where should data validation be done in general? Is it a case by case situation?"
It occurs to me that although today I've decided that 0-100 is a valid value for percentage, tomorrow, I might decide that any positive value is valid. So this could be a business rule, couldn't it? Should a business rule be implemented at the database level?
Just looking for guidance, we don't have a dba here anymore.
Generally, I would do validations in multiple places:
Client side using validators on the aspx page
Server side validations in the code behind
I use database validations as a last resort because database trips are generally more expensive than the two validations discussed above.
I'm definitely not saying "don't put validations in the database", but I would say, don't let that be the only place you put validations.
If your data is consumed by multiple applications, then the most appropriate place would be the middle tier that is (should be) consumed by the multiple apps.
What you are asking in terms of business rules, takes on a completely different dimension when you start thinking of your entire application in terms of business rules. If the question of validations is small enough, do it in individual places rather than build a centralized business rules system. If it is a rather large system, them you can look into a business rules engine for this.
If you have a good data access tier, it almost doesn't matter which approach you take.
That said, a database constraint is a lot harder to bypass (intentionally or accidentally) than an application-layer constraint.
In my work, I keep the business logic and constraints as close to the database as I can, ensuring that there are fewer potential points of failure. Different constraints are enforced at different layers, depending on the nature of the constraint, but everything that can be in the database, is in the database.
In general, I would think that the closer the validation is to the data, the better.
This way, if you ever need to rewrite a top level application or you have a second application doing data access, you don't have two copies of the (potentially different) code operating on the same data.
In a perfect world the only thing talking (updating, deleting, inserting) to your database would be your business api. In the perfect world databae level constraints are a waste of time, your data would already have been validated and cross checked in your business api.
In the real world we get cowboys taking shortcuts and other people writing directly to the database. In this case some constraints on the database are well worth the effort. However if you have people not using your api to read/write you have to consider where you went wrong in your api design.
It would depend on how you are interacting with the database, IMO. For example, if the only way to the database is through your application, then just do the validation there.
If you are going to allow other applications to update the database, then you may want to put the validation in the database, so that no matter how the data gets in there it gets validated at the lowest level.
But, validation should go on at various levels, to give the user the quickest opportunity possible to know that there is a problem.
You didn't mention which version of SQL Server, but you can look at user defined datatypes and see if that would help you out, as you can just centralize the validation.
I worked for a government agency, and we had a -ton- of business rules. I was one of the DBA's, and we implemented a large number of the business rules in the database; however, we had to keep them pretty simple to avoid Oracle's dreaded 'mutating table' error. Things get complicated very quickly if you want to use triggers to implement business rules which span several tables.
Our decision was to implement business rules in the database where we could because data was coming in through the application -and- through data migration scripts. Keeping the business rules only in the application wouldn't do much good when data needed to be migrated in to the new database.
I'd suggest implementing business rules in the application for the most part, unless you have data being modified elsewhere than in the application. It can be easier to maintain and modify your business rules that way.
One can make a case for:
In the database implement enough to ensure overall data integrity (e.g. in SO this could be every question/answer has at least one revision).
In the boundary between presentation and business logic layer ensure the data makes sense for the business logic (e.g. in SO ensuring markup doesn't contain dangerous tags)
But one can easily make a case for different places in the application layers for every case. Overall philosophy of what the database is there for can affect this (e.g. is the database part of the application as a whole, or is it a shared data repository for many clients).
The only thing I try to avoid is using Triggers in the database, while they can solve legacy problems (if you cannot change the clients...) they are a case of the Action at a Distance anti-pattern.
I think basic data validation like you described makes sure that the data entered is correct. The applications should be validating data, but it doesn't hurt to have the data validated again on the database. Especially if there is more than one way to access the database.
You can reasonable restrict the database so that the data always makes sense. A database will support multiple applications using the same data so some restrictions make sense.
I think the only real cost in doing so would be time. I think such restrictions aren't a big deal unless you are doing something crazy. And, you can change the rules later if needed (although some changes are obviously harder than others)
First ideal: have a "gatekeeper" so that your data's consistency does not depend upon each developer applying the same rules. Simple validation such as range validation may reasonably be implemented in the DB. If it changes at least you have somewhere to put.
Trouble is the "business rules" tend to get much more complex. It can be useful to offload processing to the application tier where OO languages can be better for managing complex logic.
The trick then is to structure the app tier so that the gatekeeper is clear and unduplicated.
In a small organisation (no DBA ergo, small?) I would tend to put the business rules where you have strong development expertise.
This does not exclude doing initial validation in higher levels, for example you might validate all the way up in the UI to help the user get it right, but you don't depend upon that initial validation - you still have the gatekeeper.
If you percentage is always 'part divided by whole' (and you don't save part and whole values elsewhere), then checking its value against [0-100] is appropriate at db level. Additional constraints should be applied at other levels.
If your percentage means some kind of growth, then it may have any kind of values and should not be checked at db level.
It is case by case situation. Usually you should check at db level only constraints, which can never change or have natural limits (like first example).
Richard is right: the question is subjective the way it has been asked here.
Another take is: what are the schools of thought on this? Do they vary by sector or technology?
I've been doing Ruby on Rails for a bit now, and there, even relationships between records (one-to-many etc.) are NOT respected on the DB level, not to mention cascade deleting and all that stuff. Neither are any kind of limits aside from basic data types, which allow the DB to do its work. Your percentage thing is not handled on the DB level but rather at the Data Model level.
So I think that one of the trends that we're seeing lately is to give more power to the app level. You MUST check the data coming in to your server (so somewhere in the presentation level) and you MIGHT check it on the client and you MIGHT check again in the business layer of your app. Why would you want to check it again at the database level?
However: the darndest things do happen and sometimes the DB gets values that are "impossible" reading the business-layer's code. So if you're managing, say, financial data, I'd say to put in every single constraint possible at every level. What do people from different sectors do?

Tracing Designs - Screen to Database Traceability

This is vaguely related to:
Should I design the application or model (database) first?
Design from the database first through to UI or t’other way round?
But my question is more about modeling and artifacts and less about the right way to do design. I'm trying to figure out what sort of design artifact would best enunciate the link between features (use cases), screens and database elements (tables and columns, most particularly). UML is very code-centric. Database models are very database centric. And of screen designs are UI centric!
Here's the deal... my team is working on the first release of the product. We used use cases, then did screen designs and database design was somewhat isolated from the two. A critical area for bugs was the lack of traceability between the use cases and their accompanying screens and the database. In our product, there's a very high degree of overlap between use cases and database elements. Many use cases touch over 75% of the database infrastructure. So we have high contention over database design areas, and it's easy for a small database change to disrupt the lower levels of business logic.
For our next release, I want the developers and our DBA to have a really clear insight into what parts of the database each feature touches. The use case/screen design approach worked well, so we're keeping it... the trick is linking each use case and screens to the database model so the relationships are really obvious and hard to forget about.
On smaller projects (we're only 10 people, but often I've worked on teams of 3 or less), I've created my own custom diagrams to show this part of the design. Sort of a fusion of screen, UML and database table, done in Visio with no link to actual code or SQL. I'm not sure it will work for a larger team, as its highly manual to keep up to date, and it doesn't auto generate code the way our database modeling tool does.
Any recommendations? Is there a commonly accepted mechanism for this?
FYI - we're pretty waterfall, that isn't going to change any time soon. And we do love artifacts... Saying "switch to agile" is not a viable solution for our group.
I can't tell from your question how detailed your use cases are. I get the impression that they may be high-level use cases, not broken down into detailed uses cases (perhaps through include or extend relationships.
In any case, I prefer to start with Requirements, and trace them to use cases. While I'm writing the use cases and the use case diagrams, I'm also creating a domain model (a high-level class diagram). This is mostly to give me something to discuss with stakeholders ("did I get that right?").
When the use cases and domain model are finished, it's possible to begin to work on screen design, and possibly also for an activity model, if there are complex interactions between screens. I would treat the screens as though they were classes with UI - a screen might have a FirstName attribute, which I'd note as being related to the FirstName attribute of the Person entity in my domain model. Yet the FirstName attribute might be represented on that screen as a text box.
At the same time, physical database design can occur. This would produce a class or ER diagram, with traceability back to the domain model. Eventually, you might find that some of your screen attributes or activity modeling refer to things that are part of the physical database model that are not present in the domain model. It's ok to relate a screen "PersonalName" attribute to the computed PersonalName column in the Person table in the People schema.
The tool I use for this sort of thing is Sparx Enterprise Architect. It's a great tool, and can do all of this and more, even in the Professional edition.
I also have to say for the sake of Truth that I mostly model on my own - I haven't yet worked on a project where the model, code and database were being developed by separate people. If someone told me that the above wouldn't work in "real life", I might be forced to believe them.
I am not sure I understand your question clearly, but I'll try to respond based on some reasonable assumptions...
Essentially, my recommendation is the same as what John Saunders already suggested: to consider using UML along with a good UML tool. But I would like to add a few points that might be important in your specific situation.
First and foremost, I don't think that UML is "too code-centric". To the contrary, it can be used to model pretty much any aspect of a software system, almost at any level of abstraction. With a good tool at hand (like Sparx EA), the beauty of UML is that you actually do get a well-defined model under the hood (as opposed to just a set of unconnected drawings/charts). As a result, even if the tool itself does not give you a feature that you are looking for (like traceability from DB to use cases)... well, at least you have an option to automate (or at least semi-automate) the task yourself: for example, you can export your UML model into an XMI (standard!) and then derive whatever traceability-related info you need from there (e.g. using XSL or any XML-aware library for your favorite programming/scripting language). I am not saying that it would be easy to do (especially if you want traceability on the level of individual DB columns 8-), but it's possible and it is very likely to beat any manual method if it has to scale along the size of the project.
BTW, speaking of Sparx EA... I don't know all of its capabilities yet, but it has so many that I would not be surprized if it allowed you to select a class (or even an attribute of a class) and show you other model elements that depend on it in some way. You might want to check this out.
Having said all that, I do understand that you may have at least the following two important concerns about UML:
It may appear to require too many modeling details to be in place to get what you want.
As any "universal tool", it may be grossly inferior to specialized modeling tools that you already use.
Regarding issue #1: Again, with a good UML tool at hand, you might be able to do as many shortcuts as you want. For example, instead of building a very detailed and accurate activity model for a use case, you could focus just on classes involved in the use case (just enough to enable tracking classes back to use cases). The same applies to UI, of course.
Regarding issue #2: I don't know what exact tools do you use now to model use cases, UI, and DB schema. So, theoretically it is possible that they are all so superior to UML that you wouldn't want to give any of them up in exchange of easier traceability. However, something tells me that your DB modeling tool (with its code-generating capabilities) might be the only one that is truly indispensable. If that's the case, then I would still recommend to consider using UML: you just do not model down to DB schema level and "stop" at the level of domain model (even if you do not have it in your application!). At that point, the UML tool would give you traceability from domain model elements (entities, their attributes, and their relationships) back to use cases and UI elements, and mappings between your domain model and DB schema could be left "in the air" because, in the vast majority of cases, they should be simple enough to track without drawing anything. This might not give you 100% of what you want, but it could give you 80% that would be sufficient to mitigate most of you problems.
The bottom line: if you are using three different tools/technologies to model three different aspects of your system... well, it's obvious that any traceability between those three aspects remains at mercy of those three tools: you could automate only as much as those tools allow (which probably means that you are going to be stuck with a lot of laborous manual tasks to do). As of today, UML appears the only well-defined and widely supported "lingua franca" that could help you to connect your models and could enable automation of substantial part of your analytical activities. Just make sure you distinguish UML "just-drawing tools" (like most of Visio add-ons and stencils) from true UML modeling tools (like Sparx EA and a bunch of others).
Your use cases is a good place to start from.
Convert your use cases into
executable test code. This test code
needs to verify that the resultant
return values are according to the
requirements of your use cases.
The smaller the parts of the work you
can identify and test, the more
robust you will be able to build
your application.
This means that the interaction of
your use cases with a large part
of your database and the GUI, will
be simpler to understand.
It's hard to lock-down the architecture or business logic interplay in complex projects with complete upfront design of the different layers. One truly learns about what will be able to facilitate your requirements only as you get to the point of implementing them.
As a developer, find the techniques, tools and processes that help you do your job in the best way possible. Don't judge these on their origin. Judge them on their value in making you the best developer possible.
Some items from the agile world has certainly made a big difference to the quality and productivity of my work. This doesn't require throwing out the apple cart and putting an experienced waterfall team into disarray.
The database should model your Problem Domain. It should model it completely enough so that you can extract solutions -- truths -- from it. Bad design is essentially "lying" to the database (allowing the possibility of invalid data or relationships), and when you "lie" to your database, it'll "lie" to you when you ask it questions.
Simple examples are modeling many-to-many relation where a relation can only be one-to-many, or assuming values can't be null, or treating a foreign key as an attribute. Many of these can be avoided by proper normalization, which requires you to explicitly find out what is a key and what isn't.
By "making illegal states unrepresentable" in the model, you avoid having to write "defensive" code to check for the impossible or to validate that a relation is possible, as impossible things are made unrepresentable because of table structures or declarative check constraints.
This lowers the cost of writing your code, as you can concentrate for the most part, on what it needs to do, rather than guarding against the impossible.

Users asking for denormalized database

I am in the early stages of developing a database-driven system and the largest part of the system revolves around an inheritance type of relationship. There is a parent entity with about 10 columns and there will be about 10 child entities inheriting from the parent. Each child entity will have about 10 columns. I thought it made sense to give the parent entity its own table and give each of the children their own tables - a table-per-subclass structure.
Today, my users requested to see the structure of the system I created. They balked at the idea of the table-per-subclass structure. They would prefer one big ~100 column table because it would be easier for them to perform their own custom queries.
Should I consider denormalizing the database for the sake of the users?
Absolutely not. You can always create a view later to show them what they want to see.
They are effectively asking for a report.
You could give them access to a view containing all the fields they require... that way you don't mess up your data model.
No. Structure the data properly and if the users need the a denormalized view of the data create it as a VIEW in the database.
Alternatively, consider that perhaps an RDBMS is not the appropriate storage tool for this project.
They are the users and not the programmers of the system for a reason. Provide a separate interface for their queries. Power users like this can both be helpful and a pain to deal with. Just explain you need the database designed a certain way so you can do your job, period. Once that is accomplished you and provide other means to make querying easier.
What do they know!? You could argue that users shouldn't even be having direct access to a database in the first place.
Doing that leaves you open to massive performance issues, just because a couple of users are running ridiculous queries.
How about if you created a VIEW in the format your users wanted while still maintaining a properly normalized table?
Aside from a lot of the technical reasons for or against your users' proposition, you need to be on same page in communicating the consequences of various scenarious and (more importantly) the costs of those consequences. If the users are your clients and they are paying you to do a job, explain that their awful "proposed" ideas may cost them more money in development time, additional hardware resources, etc.
Hopefully you can explain it in such a way that shows your expertise and why your idea is a much better value to your users in the long run.
As everyone more or less mentioned, that way lies madness, and you can always build a view.
If you just can't get them to come around on this point, consider showing them this thread and the number of pros who weighed in saying that the users are meddling with things that they don't fully understand, and the impact will be an undermined foundation.
A big part of the developer's craft is the feel for what won't work out long term, and the rules of normalization are almost canonical in that respect. There are situations where you need to denormalize (data warehouses, etc) but this doesn't sound like one of them!
It also sounds as though you may have a particularly troubling brand of user on your hand -- the amatuer developer who thinks they could do your job better themselves if only they had the time. This may or may not help, but I've found that those types respond well to presentation -- a few times now I've found that if I dress sharp and show a little bit of force in my personality, it helps them feel like I'm an expert and prevents a bunch of problems before they start.
I would strongly recommend coming up with an answer that doesn't involve someone running direct reports against your database. The moment that happens, your DB structure is set in stone and you can basically consider it legacy.
A view is a good start, but later on you'll probably want to structure this as an export, to decouple further. Of course, then you'll encounter someone who wants "real time" data. Proper business analysis usually reveals this to be unnecessary. Actual real time requirements are not best handled through reporting systems.
Just to be clear: I'd personally favour the table per subclass approach, but I don't think it's actually as big an issue as the direct reporting off transaction tables is going to be.
I would opt for a view (as others have suggested) or an inline table-valued function (the benefits of this is you require parameters - like an date range or a customer account - which can help to stop users from querying without any limits on the problem space) first. An inline TVF is really a parametrized view and is far closer to a view in terms of how the engine treats them than it is to a multi-statement table valued function or a scalar function, which can perform incredibly poorly.
However, in some cases, this can impact production performance if the view is complex or intensive. With poorly written ad hoc user queries, it can also cause locks to persist longer or be escalated further than they would on a better built query. It is also possible for users to misinterpret an E-R data model and produce multiplied numbers in cases where there are many-to-one or many-to-many relationships. The next option might be to materialize these views with indexes or make tables and keep them updated, which gets us closer to my next option...
So, given those drawbacks of the view option and already thinking of mitigating it by starting to make copies of data, the next option I would consider is to have a separate read-only (for these users) version of the data which is structured differently. Typically, I would first look at a Kimball-style star schema. You do not need to have a full-fledged time-consistent data warehouse. Of course, that's an option, but you could simply keep a reporting model up to date with data. Star-schemas are a special form of denormalization and are particularly good for numerical reporting, and a given star should not be able to be abused by users accidentally. You can keep the star up to date in a number of ways, including triggers, scheduled jobs, etc. They can be very fast for reporting needs and run on the same production installation - perhaps on a separate instance if not just a separate database.
Although such a solution may require you to effectively more than double your storage requirements, when compared with other practices it might be a really good option if you understand your data well and don't mind having two models - one for transactions and one for analysis (note that you will already start to have this logical separation anyway with the use of a the simplest first option of view).
Some architects will often double their servers and use the SAME model with some kind of replication in order to provide a reporting server which is indexed more heavily or differently. Such a second server doesn't impact production transactions with reporting requirements and can be kept up to date fairly easily. There will only be one model, but of course, this has the same usability problems with allowing users ad hoc access to the underlying model only, without the performance affects, since they get their own playground.
There are a lot of ways to skin these cats. Good luck.
The customer is always right. However, the customer is likely to back down when you convert their requirement into dollars and cents. A 100 column table will require extra dev time to write the code that does what the database would do automatically with the proper implementation. Further, their support costs will be higher since more code means more problems and lower ease of debugging.
I'm going to play devil's advocate here and say that both solutions sound like poor approximations of the actual data. There's a reason that object-oriented programming languages don't tend to be implemented with either of these data models, and it's not because Codd's 1970 ideas about relations were the ideal system for storing and querying object-oriented data structures. :-)
Remember that SQL was originally designed as a user interface language (that's why it looks vaguely like English and not at all like other languages of that era: Algol, C, APL, Prolog). The only reasons I've heard for not exposing a SQL database to users today are security (they could take down the server!) and usability (who wants to write SQL when you can clicky clicky?), but if it's their server and they want to, then why not let them?
Given that "the largest part of the system revolves around an inheritance type of relationship", then I'd seriously consider a database that lets me represent that natively, either Postgres (if SQL is important) or a native object database (which are awesome to work with, if you don't need SQL compatibility).
Finally, remember that every engineering decision is a tradeoff. By "sticking to your guns" (as somebody else proposed), you're implicitly saying the value of your users' desires are zero. Don't ask SO for a correct answer to this, because we don't know what your users want to do with your data (or even what your data is, or who your users are). Go tell them why you want a many-tables solution, and then work out a solution with them that's acceptable to both of you.
You've implemented Class Table Inheritance and they're asking for Single Table Inheritance. Both designs are valid in certain situations.
You might want to get a copy of Martin Fowler's Patterns of Enterprise Application Architecture to read more about the advantages and disadvantages of each design. That book is a classic reference to have on your bookshelf, in any case.

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then update the column types in your program manually, or does it make more sense to define the tables in your application and then use the ORM framework's table generation functions to make the schema and then build the tables on the database side for you?
Bear in mind that applications and databases tend to live in a M:M relationship in any but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to handle things modeling your Domain data. The database is also the biggest source of slow down in applications and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because really the database is simply a place to persist information from the application, it should never lead the design. It's not the only place to persist information as well. We have users that want to work from flat files or the database for instance. They could also use XML files. So by starting with your domain objects and then generating tables (or flat file or XML schema or whatever) from there will lead to a much better design in the end.
While this may depend on you using an object-oriented language, using an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you up to, and including generating the database creation scripts.
In reference to performance, performance should be one of the last things you look at in an application, it should never drive the design. After you get a good schema up and running based on your domain you can always make tweaks to improve its performance.
Alot depends on your skill level with the specific database product that you're going to use. Think of it as the difference between a "manual" and "automatic" transmission car. ORMs provide you with that "automatic" transmission, just start designing your classes, and let the ORM worry about getting it stored into the database somehow.
Sounds good. The problem with most ORMs is that in their quest to be PI "persistence ignorant", they often don't take advantage of specific database features that can provide elegant solutions for a given task. Notice, I didn't say ALL ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction, up towards the application space, or down towards the physical database. But remember, only YOU know if it's more advantageous to use a view instead of a table, should you normalize or de-normalize a table, what non-clustered index(es) make sense with this table, is a natural or surrogate key more appropriate for this table, etc... Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
One more thing, you really need to seperate the application design from the database design. They are almost never the same. How important is that data? Could another application be designed to use that data? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way. Since it's a perfect example of the DRY principle.
Having said that however, getting away with it is always going to be hard to pull off since you're practically choosing to give up most database specific optimizations. (more so, with querying, but it still applies to schemas (indexes, etc)).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 Cents
Design each based on their own requirements as much as possible. Trying to keep them in too rigid sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).

Business Logic: Database or Application Layer

The age old question. Where should you put your business logic, in the database as stored procedures ( or packages ), or in the application/middle tier? And more importantly, Why?
Assume database independence is not a goal.
Maintainability of your code is always a big concern when determining where business logic should go.
Integrated debugging tools and more powerful IDEs generally make maintaining middle tier code easier than the same code in a stored procedure. Unless there is a real reason otherwise, you should start with business logic in your middle tier/application and not in stored procedures.
However when you come to reporting and data mining/searching, stored procedures can often a better choice. This is thanks to the power of the databases aggregation/filtering capabilities and the fact you are keeping processing very close the the source of the data. But this may not be what most consider classic business logic anyway.
Put enough of the business logic in the database to ensure that the data is consistent and correct.
But don't fear having to duplicate some of this logic at another level to enhance the user experience.
For very simple cases you can put your business logic in stored procedures. Usually even the simple cases tend to get complicated over time. Here are the reasons I don't put business logic in the database:
Putting the business logic in the database tightly couples it to the technical implementation of the database. Changing a table will cause you to change a lot of the stored procedures again causing a lot of extra bugs and extra testing.
Usually the UI depends on business logic for things like validation. Putting these things in the database will cause tight coupling between the database and the UI or in different cases duplicates the validation logic between those two.
It will get hard to have multiple applications work on the same database. Changes for one aplication will cause others to break. This can quickly turn into a maintenance nightmare. So it doesn't really scale.
More practically SQL isn't a good language to implement business logic in an understandable way. SQL is great for set based operations but it misses constructs for "programming in the large" it's hard to maintain big amounts of stored procedures. Modern OO languages are better suited and more flexible for this.
This doesn't mean you can't use stored procs and views. I think it sometimes is a good idea to put an extra layer of stored procedures and views between the tables and application(s) to decouple the two. That way you can change the layout of the database without changing external interface allowing you to refactor the database independently.
It's really up to you, as long as you're consistent.
One good reason to put it in your database layer: if you are fairly sure that your clients will never ever change their database back-end.
One good reason to put it in the application layer: if you are targeting multiple persistence technologies for your application.
You should also take into account core competencies. Are your developers mainly application layer developers, or are they primarily DBA-types?
While there is no one right answer - it depends on the project in question, I would recommend the approach advocated in "Domain Driven Design" by Eric Evans. In this approach the business logic is isolated in its own layer - the domain layer - which sits on top of the infrastructure layer(s) - which could include your database code, and below the application layer, which sends the requests into the domain layer for fulfilment and listens for confirmation of their completion, effectively driving the application.
This way, the business logic is captured in a model which can be discussed with those who understand the business aside from technical issues, and it should make it easier to isolate changes in the business rules themselves, the technical implementation issues, and the flow of the application which interacts with the business (domain) model.
I recommend reading the above book if you get the chance as it is quite good at explaining how this pure ideal can actually be approximated in the real world of real code and projects.
While there are certainly benefits to have the business logic on the application layer, I'd like to point out that the languages/frameworks seem to change more frequently then the databases.
Some of the systems that I support, went through the following UIs in the last 10-15 years: Oracle Forms/Visual Basic/Perl CGI/ ASP/Java Servlet. The one thing that didn't change - the relational database and stored procedures.
Database independence, which the questioner rules out as a consideration in this case, is the strongest argument for taking logic out of the database. The strongest argument for database independence is for the ability to sell software to companies with their own preference for a database backend.
Therefore, I'd consider the major argument for taking stored procedures out of the database to be a commercial one only, not a technical one. There may be technical reasons but there are also technical reasons for keeping it in there -- performance, integrity, and the ability to allow multiple applications to use the same API for example.
Whether or not to use SP's is also strongly influenced by the database that you are going to use. If you take database independence out of consideration then you're going to have very different experiences using T-SQL or using PL/SQL.
If you are using Oracle to develop an application then PL/SQL is an obvious choice as a language. It's is very tightly coupled with the data, continually improved in every relase, and any decent development tool is going to integratePL/SQL development with CVS or Subversion or somesuch.
Oracle's web-based Application Express development environment is even built 100% with PL/SQL.
The only thing that goes in a database is data.
Stored procedures are a maintenance nightmare. They aren't data and they don't belong in the database. The endless coordination between developers and DBA's is little more than organizational friction.
It's hard to keep good version control over stored procedures. The code outside the database is really easy to install -- when you think you've got the wrong version you just do an SVN UP (maybe an install) and your application's back to a known state. You have environment variables, directory links, and lots of environment control over the application.
You can, with simple PATH manipulations, have variant software available for different situations (training, test, QA, production, customer-specific enhancements, etc., etc.)
The code inside the database, however, is much harder to manage. There's no proper environment -- no "PATH", directory links or other environment variables -- to provide any usable control over what software's being used; you have a permanent, globally bound set of application software stuck in the database, married to the data.
Triggers are even worse. They're both a maintenance and a debugging nightmare. I don't see what problem they solve; they seem to be a way of working around badly-designed applications where someone couldn't be bothered to use the available classes (or function libraries) correctly.
While some folks find the performance argument compelling, I still haven't seen enough benchmark data to convince me that stored procedures are all that fast. Everyone has an anecdote, but no one has side-by-side code where the algorithms are more-or-less the same.
[In the examples I've seen, the old application was a poorly designed mess; when the stored procedures were written, the application was re-architected. I think the design change had more impact than the platform change.]
Anything that affects data integrity must be put at the database level. Other things besides the user interface often put data into, update or delete data from the database including imports, mass updates to change a pricing scheme, hot fixes, etc. If you need to ensure the rules are always followed, put the logic in defaults and triggers.
This is not to say that it isn't a good idea to also have it in the user interface (why bother sending information that the database won't accept), but to ignore these things in the database is to court disaster.
If you need database independence, you'll probably want to put all your business logic in the application layer since the standards available in the application tier are far more prevalent than those available to the database tier.
However, if database independence isn't the #1 factor and the skill-set of your team includes strong database skills, then putting the business logic in the database may prove to be the best solution. You can have your application folks doing application-specific things and your database folks making sure all the queries fly.
Of course, there's a big difference between being able to throw a SQL statement together and having "strong database skills" - if your team is closer to the former than the latter then put the logic in the application using one of the Hibernates of this world (or change your team!).
In my experience, in an Enterprise environment you'll have a single target database and skills in this area - in this case put everything you can in the database. If you're in the business of selling software, the database license costs will make database independence the biggest factor and you'll be implementing everything you can in the application tier.
Hope that helps.
It is nowadays possible to submit to subversion your stored proc code and to debug this code with good tool support.
If you use stored procs that combine sql statements you can reduce the amount of data traffic between the application and the database and reduce the number of database calls and gain big performance gains.
Once we started building in C# we made the decision not to use stored procs but now we are moving more and more code to stored procs. Especially batch processing.
However don't use triggers, use stored procs or better packages. Triggers do decrease maintainability.
Putting the code in the application layer will result in a DB independent application.
Sometimes it is better to use stored procedures for performance reasons.
It (as usual) depends on the application requirements.
The business logic should be placed in the application/middle tier as a first choice. That way it can be expressed in the form of a domain model, be placed in source control, be split or combined with related code (refactored), etc. It also gives you some database vendor independence.
Object Oriented languages are also much more expressive than stored procedures, allowing you to better and more easily describe in code what should be happening.
The only good reasons to place code in stored procedures are: if doing so produces a significant and necessary performance benefit or if the same business code needs to be executed by multiple platforms (Java, C#, PHP). Even when using multiple platforms, there are alternatives such as web-services that might be better suited to sharing functionality.
The answer in my experience lies somewhere on a spectrum of values usually determined by where your organization's skills lie.
The DBMS is a very powerful beast, which means proper or improper treatment will bring great benefit or great danger. Sadly, in too many organizations, primary attention is paid to programming staff; dbms skills, especially query development skills (as opposed to administrative) are neglected. Which is exacerbated by the fact that the ability to evaluate dbms skills is also probably missing.
And there are few programmers who sufficiently understand what they don't understand about databases.
Hence the popularity of suboptimal concepts, such as Active Records and LINQ (to throw in some obvious bias). But they are probably the best answer for such organizations.
However, note that highly scaled organizations tend to pay a lot more attention to effective use of the datastore.
There is no standalone right answer to this question. It depends on the requirements of your app, the preferences and skills of your developers, and the phase of the moon.
Business logic is to be put in the application tier and not in the database.
The reason is that a database stored procedure is always dependen on the database product you use. This break one of the advantages of the three tier model. You cannot easily change to an other database unless you provide an extra stored procedure for this database product.
on the other hand sometimes, it makes sense to put logic into a stored procedure for performance optimization.
What I want to say is business logic is to be put into the application tier, but there are exceptions (mainly performance reasons)
Bussiness application 'layers' are:
1. User Interface
This implements the business-user's view of h(is/er) job. It uses terms that the user is familiar with.
2. Processing
This is where calculations and data manipulation happen. Any business logic that involves changing data are implemented here.
3. Database
This could be: a normalized sequential database (the standard SQL-based DBMS's); an OO-database, storing objects wrapping the business-data; etc.
What goes Where
In getting to the above layers you need to do the necessary analysis and design. This would indicate where business logic would best be implemented: data-integrity rules and concurrency/real-time issues regarding data-updates would normally be implemented as close to the data as possible, same as would calculated fields, and this is a good pointer to stored-procedures/triggers, where data-integrity and transaction-control is absolutely necessary.
The business-rules involving the meaning and use of the data would for the most part be implemented in the Processing layer, but would also appear in the User-Interface as the user's work-flow - linking the various process in some sequence that reflects the user's job.
Imho. there are two conflicting concerns with deciding where business logic goes in a relational database-driven app:
maintainability
reliability
Re. maintainability:  To allow for efficient future development, business logic belongs in the part of your application that's easiest to debug and version control.
Re. reliability:  When there's significant risk of inconsistency, business logic belongs in the database layer.  Relational databases can be designed to check for constraints on data, e.g. not allowing NULL values in specific columns, etc.  When a scenario arises in your application design where some data needs to be in a specific state which is too complex to express with these simple constraints, it can make sense to use a trigger or something similar in the database layer.
Triggers are a pain to keep up to date, especially when your app is supposed to run on client systems you don't even have access too.  But that doesn't mean it's impossible to keep track of them or update them.  S.Lott's arguments in his answer that it's a pain and a hassle are completely valid, I'll second that and have been there too.  But if you keep those limitations in mind when you first design your data layer and refrain from using triggers and functions for anything but the absolute necessities it's manageable.
In our application, most business logic is contained in the application's model layer, e.g. an invoice knows how to initialize itself from a given sales order.  When a bunch of different things are modified sequentially for a complex set of changes like this, we roll them up in a transaction to maintain consistency, instead of opting for a stored procedure.  Calculation of totals etc. are all done with methods in the model layer.  But when we need to denormalize something for performance or insert data into a 'changes' table used by all clients to figure out which objects they need to expire in their session cache, we use triggers/functions in the database layer to insert a new row and send out a notification (Postgres listen/notify stuff) from this trigger.
After having our app in the field for about a year, used by hundreds of customers every day, the only thing I would change if we were to start from scratch would be to design our system for creating database functions (or stored procedures, however you want to call them) with versioning and updates to them in mind from the get-go.
Thankfully, we do have some system in place to keep track of schema versions, so we built something on top of that to take care of replacing database functions.  It would've saved us some time now if we'd considered the need to replace them from the beginning though.
Of course, everything changes when you step outside of the realm of RDBMS's into tuple-storage systems like Amazon SimpleDB and Google's BigTable.  But that's a different story :)
We put a lot of business logic in stored procedures - it's not ideal, but quite often it's a good balance between performance and reliability.
And we know where it is without having to search through acres of solutions and codebase!
Scalability is also very important factor for pusing business logic in middle or app layer than to database layer.It should be understood that DatabaseLayer is only for interacting with Database not manipulating which is returned to or from database.
I remember reading an article somewhere that pointed out that pretty well everything can be, at some level, part of the business logic, and so the question is meaningless.
I think the example given was the display of an invoice onscreen. The decision to mark an overdue one in red is a business decision...
It's a continuum. IMHO the biggest factor is speed. How can u get this sucker up and running as quickly as possible while still adhering to good tenants of programming such as maintainability, performance, scalability, security, reliability etc. Often times SQL is the most concise way to express something and also happens to be the most performant many times, except for string operations etc, but that's where your CLR Procs can help. My belief is to liberally sprinkle business logic around whereever you feel it is best for the undertaking at hand. If you have a bunch of application developers who shit their pants when looking at SQL then let them use their app logic. If you really want to create a high performance application with large datasets, put as much logic in the DB as you can. Fire your DBA's and give developers ultimate freedom over their Dev databases. There is no one answer or best tool for the job. You have multiple tools so become expert at all levels of the application and you'll soon find that you're spending a lot more time writing nice consise expressive SQL where warranted and using the application layer other times. To me, ultimately, reducing the number of lines of code is what leads to simplicity. We have just converted a sql rich application with a mere 2500 lines of app code and 1000 lines of SQL to a domain model which now has 15500 lines of app code and 2500 lines of SQL to achieve what the former sql rich app did. If you can justify a 6 fold increase in code as "simplified" then go right ahead.
This is a great question! I found this after I had already asked a simliar question, but this is more specific. It came up as a result of a design change decision that I wasn't involved in making.
Basically, what I was told was that If you have millions of rows of data in your database tables, then look at putting business logic into stored procedures and triggers. That is what we are doing right now, converting a java app into stored procedures for maintainability as the java code had become convoluted.
I found this article on: The Business Logic Wars The author also made the million rows in a table argument, which I found interesting. He also added business logic in javascript, which is client side and outside of the business logic tier. I hadn't thought about this before even though I've used javascript for validation for years, to along with server side validation.
My opinion is that you want the business logic in the application/middle tier as a rule of thumb, but don't discount cases where it makes sense to put it into the database.
One last point, there is another group where I'm working presently that is doing massive database work for research and the amount of data they are dealing with is immense. Still, for them they don't have any business logic in the database itself, but keep it in the application/middle tier. For their design, the application/middle tier was the correct place for it, so I wouldn't use the size of tables as the only design consideration.
Business logic is usually embodied by objects, and the various language constructs of encapsulation, inheritance, and and polymorphism. For example, if a banking application is passing around money, there may be a Money type that defines the business elements of what "money" is. This, opposed to using a primitive decimal to represent money. For this reason, well-designed OOP is where the "business logic" lives—not strictly in any layer.

Resources