CMS Database design - Master database or Multi-Db per site - database

I am in the process of designing a CMS that I am about to build, and I have been thinking about how to approach the database.
Do you think it's best to create one master database for all my clients' websites, or should I have one database per site?
What are the benefits and drawbacks of each approach? I am always thinking about the future, so I was also considering adding memcache or APC caching to the project as an option for my clients.
I'm just trying to learn the best practices and what other developers' approaches would be.

I've run both. My business chooses to separate client-specific data into separate tables so that if one happens to go corrupt, not all are taken down. In an ideal world this might never happen, but Murphy's law... It also seems very easy to find things with them separated, and you will know with 100% certainty that one client's content will never show up on another's page.
If you do go down that route, be prepared to create scripts that build and configure databases for you. There's nothing fun about building a great system and having demand for it, only to spend your time manually setting up DBs and installs all day long. Also, setting db names is one additional step that's not part of using a single db table. It's a headache that will repeat itself over and over again.
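To make the "scripts that build and configure databases" point concrete, here is a minimal sketch. It uses SQLite as a stand-in for whatever server you actually run, and the `pages` table, the `cms_` file prefix and both function names are assumptions made up for illustration, not part of any particular CMS.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical core schema shared by every client database.
CORE_SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id    INTEGER PRIMARY KEY,
    slug  TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL,
    body  TEXT NOT NULL DEFAULT ''
);
"""


def database_path(base_dir, client_name):
    """Derive a safe, predictable database file name from the client name."""
    slug = re.sub(r"[^a-z0-9]+", "_", client_name.lower()).strip("_")
    if not slug:
        raise ValueError("client name must contain letters or digits")
    return Path(base_dir) / f"cms_{slug}.db"


def provision_client_db(base_dir, client_name):
    """Create (or re-use) one client's database and apply the core schema."""
    path = database_path(base_dir, client_name)
    conn = sqlite3.connect(path)
    try:
        # IF NOT EXISTS makes the script safe to run again for the same client.
        conn.executescript(CORE_SCHEMA)
        conn.commit()
    finally:
        conn.close()
    return path
```

Calling `provision_client_db("/some/dir", "Acme Corp.")` would create `cms_acme_corp.db` in that directory. The point is only that naming and schema setup happen in one repeatable place instead of by hand.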

Develop the single master DB. It will take a small amount of additional effort and add a little bit more complexity to the database design, but will give you a few nice features. The biggest is being able to share data between sites.
Designing for a master database means that you have the option to combine sites when it makes sense, but also lets you install a master per site. Best of both worlds.
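As a rough illustration of the single master design, the sketch below keys every content row by a site. SQLite is used only so the example is self-contained, and the `sites`/`pages` tables and function names are hypothetical.

```python
import sqlite3


def create_master_db():
    """Create an in-memory master database holding content for many sites."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sites (
            id     INTEGER PRIMARY KEY,
            domain TEXT NOT NULL UNIQUE
        );
        CREATE TABLE pages (
            id      INTEGER PRIMARY KEY,
            site_id INTEGER NOT NULL REFERENCES sites(id),
            slug    TEXT NOT NULL,
            title   TEXT NOT NULL,
            UNIQUE (site_id, slug)  -- the same slug may exist on different sites
        );
    """)
    return conn


def pages_for_site(conn, site_id):
    """Return (slug, title) pairs for one site only."""
    rows = conn.execute(
        "SELECT slug, title FROM pages WHERE site_id = ? ORDER BY slug",
        (site_id,),
    )
    return rows.fetchall()
```

The extra complexity is mostly that every query has to carry the `site_id` filter. In exchange, sharing data between sites is an ordinary query, and the same schema still works if you install one copy per site.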

It depends greatly upon the amount of customization each client will require. If you foresee clients asking for many one-off features specific to their deployment, separate databases based off of a single core structure might make sense. I would highly recommend trying to make any customizations usable by all clients though, and keep all structure defined in one place/database instead of duplicating it across multiple databases. By using one database, you make updating the structure straightforward and the implementation consistent across all sites so they can all use the same CMS code.

Related

Should we start with multiple small-grained databases for an app that may scale massively

We're developing a new eCommerce website and are using NHibernate for the first time. At present we are splitting our data into multiple SQL Server databases, divided per area of functionality. So we have one for UserInfo, one for Orders, one for ProductCatalogue and so on...
Our justification for this decision is twofold really:
the website has the potential to be HUGE (it is a new website for one of the largest online brands in the UK) and we feel that by partitioning our data along functional lines we will be able to move the databases onto their own servers which would give us an easy scaling route should we need it;
my team has always worked this way - partly as a consequence of following the MS Commerce Server pattern from previous projects.
However, reading up on this decision on the internet, we find that the normal response to this sort of model is extremely scathing. "Creating more work for the devs now in order to create more work for the devs later" is one sample comment from Stack Overflow!
In addition, NHibernate is much easier to use with only one database (just one SessionFactory needed). And knowing that Stack Overflow ran off just one box for a long time makes me think that maybe we should not try to be so clever.
So, my question is, "are we correct in thinking that using fine-grained databases might increase our ability to scale or should we sacrifice this for easier development"?
Why don't you just design your database properly and put the files on appropriate disk? Use a cluster if necessary. Creating multiple databases is not an inherently scaling solution. Also - cross database referential integrity? Good luck.
What's your definition of "HUGE"? SQL Server can handle massive databases, but one thing I've learnt is that people often have no idea what constitutes a lot of data.
I've never worked in a project like this. I'm used to databases with several hundred tables, which had never been a problem.
Therefore I can't say if your idea is a good idea, I never tried it. The "my team has always worked this way"-argument is a major driver for many decisions, and I can't even say that it is always wrong.
With NHibernate you organize your data in classes. They can be in different namespaces and assemblies. You usually don't work much with the database directly, you don't need this kind of structure there.
About the scalability argument: I'm not sure if it is really scaling well when you need to access several databases every time. I mean: you always need users and orders and probably more. Then you need to get all this data from several databases.
Agree fully with starskythehutch - keep your related tables together in the same DB. BUT, you may want to consider having separate databases for things that are not related or non-critical to your main product; but that are a part of the app.
For example, if you decide to log every visit/hit to the site in a DB, you should probably keep that in a separate DB.
The reasons to consider this:
1. Huge number of transactions, say hundreds of thousands per second. Having non-critical, unrelated stuff in a separate DB will ensure that transaction log contention because of it is avoided.
2. Restore, DBCC CHECKDB, and backup times. If you stuff your unrelated, non-critical data in your main DB, you are essentially increasing the size of your DB and it will affect these operations. Having it in a separate DB will help you improve the performance of these operations.
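A small sketch of the idea, with the caveat that it uses two SQLite databases rather than SQL Server, and that the `orders` and `hits` tables are invented for the example: hit logging goes through its own connection to its own database, so the main database never sees those writes.

```python
import sqlite3


def open_databases(main_path=":memory:", log_path=":memory:"):
    """Open the main database and a physically separate logging database."""
    main = sqlite3.connect(main_path)
    main.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, total_pence INTEGER NOT NULL)"
    )
    log = sqlite3.connect(log_path)
    log.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
        "at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return main, log


def record_hit(log, path):
    """Write one page hit to the logging database only."""
    with log:  # commits on success, rolls back on error
        log.execute("INSERT INTO hits (path) VALUES (?)", (path,))
```

Backing up or checking the main database then does not have to wade through hit rows, which is the point made above about restore and backup times.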

Which comes first: database or application logic?

What is the best way or recommended best practice in the flow of database driven asp.net web application? I mean the database first or coding first or side by side?
Your data access code won't compile without an existing database - unless you stub (or Mock) it. So probably the database comes first.
But it is a bad idea to do whole chunks of the application in isolation. Ideally you should design and build slivers of the system - database and application - hand-in-hand. These slivers should be cohesive sub-sets of functionality, probably smaller than sub-systems. Inevitably, the act of coding screens and business rules will throw up problems in the data model. So it is good to have a data modeller or DBA who is happy to work incrementally alongside the developers.
edit
Stephanie makes an extremely pertinent point:
"the core tables which are persisting your app's data really can't be piecemealed. Most of the data is known at project start. It has a form, you need to find it."
I agree that the core entities are knowable at project start, and the physical data model can be derived from that logical data model. But I don't think it is ever possible to nail down completely the structure of any table, even a core table, at the start. This is because at the start of the design/build phase all we have to go on are the Requirements, and if there's one thing history tells us about the Requirements it is that they will change.
So, new tables will be needed and some existing tables will become obsolete. There will be columns which need to be added, columns which need to be modified, columns which need to be dropped. This is why Nature gave us the ALTER TABLE statement.
I am not suggesting that we don't design our tables, or assemble them piecemeal. I am merely suggesting that when we start designing the HR sub-system we need to worry about the EMPLOYEES table and the SALARIES table. We don't need to concern ourselves with INVENTORY or ORDERS until we commence work on Sales.
We personally start with the Domain and do things side-by-side. The important part is that we implement vertical slices of the application (fully working end-to-end features), not horizontal slices (e.g. first the whole database layer, then the data access, then the services, then the presentation): we build the application incrementally and demonstrate progress with working code after each iteration.
Applications are all about features. You don't build apps to store data, but to provide functionality. If we can't agree on that, the discussion is moot of course. Software should be developed to satisfy the needs of its users and not of its developers.
Well I have really no understanding of the second sentence. If you think my company pays me a good salary to write code that satisfies me and not my users you're crazy. So that argument is a strawman. Back to the first.
This is a common view point of application centric people (they), vs. database centric people (We). They see the entire point of the exercise to "provide features". Those are things the clients know they want and ask for them. To them, the database is just persistence required for these features. And when they are done, that's it, features delivered, database is sufficient for those features. Could be an entire Rube Goldberg inside the database with redundant data, severe violations of normal forms, constraints enforced by the application, what have you.
think overall usability alone outweighs database design
If the design of your database is affecting your usability then the design was bad. I have no doubt that one who strives for features will leave the database in such a state that it severely hampers usability.
Data-centric people don't look at a system as a place to provide only what's been asked for, but as a repository of Intellectual Capital that can be exploited by more than whatever the Application-du-jour is. I can't begin to describe the number of cases where one team has used the database of some other team's app to enhance their app's value. Just look at all the medical research that is nothing more than the meta-analysis of existing studies. None of that is possible if you believe that only the features of your app matter and subsequent uses of your app's data do not.
A good data model isn't inviolate. Sure you'll add to it, change it when requirements change. But if you don't completely understand your data, I don't know how anyone can begin to write code.
I guess you first need to define the data model, and only then start coding. You should plan everything carefully before actually writing the code.
First is a feature list.
Then, detailed spec.
Then test plan and design of all, including databases.
Then, it wouldn't matter which to implement first.
You'll probably end up doing it "side by side".
You need some data to be able to test the application, but you need the application to be able to verify that you're storing the correct data.
Do some modelling first and then build the minimum you can for one or two features. Then when these are working add the next feature and so on.
You'll need to write some database update procedures (both the code and the rules about what and when to update) as you will have to extend your tables, but you'll need those for the final system anyway as it will have to change as new requirements come along.
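One possible shape for such update procedures, sketched with SQLite's `PRAGMA user_version` as the version marker. The `MIGRATIONS` list and the `employees`/`salaries` tables are hypothetical; a real project would more likely use a migration tool, but the mechanics are similar.

```python
import sqlite3

# Ordered schema changes. Applying entry i brings the schema to version i + 1.
MIGRATIONS = [
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE employees ADD COLUMN email TEXT",
    "CREATE TABLE salaries ("
    "employee_id INTEGER NOT NULL REFERENCES employees(id), "
    "amount INTEGER NOT NULL)",
]


def upgrade(conn):
    """Apply every migration the database has not seen yet; return the version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = MIGRATIONS[current:]
    for version, statement in enumerate(pending, start=current + 1):
        conn.execute(statement)
        # version is always an int produced by enumerate, so formatting is safe.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Adding a feature then means appending a statement to the list. Running `upgrade` on an empty database and on an older one both end at the same schema, which is what lets the tables grow feature by feature.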
Having done it quite a few times, I find myself invariably doing it like so:
1. Define the problem I'm trying to solve.
2. Write out some use-cases.
3. Have my significant other or a friend tell me if this is even a problem.
4. Sketch out a few sample screens.
5. Write flow diagrams for the use cases.
6. Ask my Rubber-duck questions.
7. Use questions to refine 1-6.
8. Write out the 'nouns'. Those become my data Model.
9. Write out the actions. Those become application logic.
10. Code data Model.
11. Code Application Logic.
12. Realize I've gotten it a little wrong.
13. Repeat 10-12 as many times as needed.
14. Ask, "Have I solved the problem"?
15. If not, rinse, lather and repeat 1-15.
This is a trick question. IMO, they both come in parallel during your planning and design phase. They are so closely related that it makes sense to do them together. Just keep in mind that your database design will be almost fully developed while your code is still in its infancy (though your application logic should be almost fully mapped out in your head or on paper).
The idea is that you're designing your solution in the context of the problem. When you're planning out your solution you will be (or should be) defining your application as a set of things and actions (nouns and verbs).
For example, a very basic helpdesk program has people and tickets. People need to create tickets, update tickets, and close tickets. The nouns that require persistent storage will comprise your database, and the nouns + actions will be contained in your application.
Sometimes your table mappings and the relationships between tables will be obvious (e.g. people create tickets, so ticket.creatorID = people.personID), and other times the relationship doesn't really click in your head until you start working through use cases or writing your code (e.g. different people have different access levels defining what they can do; at a glance this would seem like a simple field in a table, but in practice it is better as a separate table).
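The helpdesk example could be sketched roughly like this. The `creatorID` and `personID` names come from the text above; everything else (the `access_levels` table, its `can_close_tickets` flag, the function names, and SQLite itself) is an assumption chosen to keep the example small.

```python
import sqlite3

# Nouns become tables; access levels get their own table rather than a field.
SCHEMA = """
CREATE TABLE access_levels (
    levelID           INTEGER PRIMARY KEY,
    name              TEXT NOT NULL UNIQUE,
    can_close_tickets INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE people (
    personID INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    levelID  INTEGER NOT NULL REFERENCES access_levels(levelID)
);
CREATE TABLE tickets (
    ticketID  INTEGER PRIMARY KEY,
    creatorID INTEGER NOT NULL REFERENCES people(personID),
    summary   TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'open'
);
"""


def connect():
    """Open an in-memory helpdesk database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def create_ticket(conn, creator_id, summary):
    """Verb: a person creates a ticket. Returns the new ticket's id."""
    cur = conn.execute(
        "INSERT INTO tickets (creatorID, summary) VALUES (?, ?)",
        (creator_id, summary),
    )
    conn.commit()
    return cur.lastrowid


def close_ticket(conn, person_id, ticket_id):
    """Verb: close a ticket, but only if the person's access level allows it."""
    row = conn.execute(
        "SELECT a.can_close_tickets FROM people p "
        "JOIN access_levels a ON a.levelID = p.levelID "
        "WHERE p.personID = ?",
        (person_id,),
    ).fetchone()
    if row is None or not row[0]:
        raise PermissionError("this person may not close tickets")
    conn.execute(
        "UPDATE tickets SET status = 'closed' WHERE ticketID = ?", (ticket_id,)
    )
    conn.commit()
```

The tables are the nouns and the two functions are the verbs. Keeping access levels in their own table means a new permission is a new column or row there, not a change to every person.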

How to organize a database in Django with multiple, distinct apps?

I'm new to Django (and databases in general), and I'm not sure how to structure the following. The sources of data I'll have for my site are:
a blog
for a few different games:
    a high score list
    user-created levels
If I were storing the data in ordinary files, I'd just have one file for each of the above. In Django, ideally (I think) I'd have a separate database for each of these, but apparently multiple database support isn't there for Django yet. I'm worried (unnecessarily?) about keeping everything in one database for two reasons:
If I screw something up in one of the sections, I don't want to mess up the rest of the data.
When I'm working on one of these sections, I'd like the freedom to easily change the model around. Since I've learned that syncdb doesn't, in fact, sync the database, I've decided that the easiest thing to do when messing around with a model is to simply wipe the database and start over. Again, I'm worried about messing up the other sections. I looked at south, and it seems like more trouble than it's worth during the planning stages of an app (but I'll reconsider later when there's actually valuable data).
Part of the problem is that I'm not really comfortable keeping my data in a binary format. I'm used to text, so I can easily diff it, modify it in an editor, etc., without going through some magical database interface (I'm using postgresql, by the way).
Are my fears unfounded? How do people normally handle this problem?
For what it's worth, I totally understand your frustration as I went through a really similar thought process when starting. Unfortunately, there isn't much you can do (easily, anyway) besides get familiar with the tools you'll be using.
First of all, you don't need multiple databases for what you're doing - one will do. Each app will create its own tables in the database which are somewhat isolated from one another. As czarchaic mentioned, you can do python manage.py reset app_name to reset just one of them in case you change your model. You will lose that data, though.
To get data in relatively easy to work with format, you can use the command python manage.py dumpdata > file_name.json, and then to reload it later python manage.py loaddata file_name.json. You can use this for backups, local test data, or as a poor man's migration (hint: South would be easier).
Yet another option is to use a NoSQL database for any data you think will need extra flexibility. Just keep in mind that Django doesn't have support for any of these at the moment. That means no admin support or ModelForms. Of course, having a model may become unnecessary.
In short, your fears are unfounded. You should "organize" your database by project, to use the Django term. Each model in each app will have its own table, but they will all be in the same database. Putting them in separate databases isn't a good idea for a whole host of reasons, the biggest being that you cannot query across databases.
While I agree that south is probably a bit heavy for your initial design/dev stages it should be considered seriously for anything even resembling a beta version and absolutely necessary in production.
If you're going to be messing with your models a bunch during development the best thing to do is use fixtures to load data in quickly after running sync. Or, if you are going to be changing a bunch of required fields, then write some quick Python to create dummy data for you.
As for not trusting your data to binary, a simple "pg_dump " will get you a textual version of your data. It sounds to me like you're working on your app against live production data, which is a mistake. Your goal should be to get your application built, working, and tested on fake data or at least a copy of your production data and then when you're sure everything is golden migrate it into production. This is where things like south come in handy as you can automate this deployment and it will help you handle any database table/column changes you need to make.
I'm sure all of this sounds like a pain, but all of it is able to be automated and trust me it makes your life down the road much much easier.
I generally just reset the module
>>> python manage.py reset blog
this will reset all tables in INSTALLED_APPS.blog
I'm not sure if this answers your question but it's much less destructive than wiping the DB.
Syncdb should really only be used for development. That's why it doesn't really matter if you wipe the tables and start again, perhaps exporting look up data into a json file that you can import each time you sync.
When your site reaches production however, you have a little more work. Any changes you make to your models that need to be reflected in the database, need to be emitted as SQL and run manually on the database. There's a django-admin.py function to emit the suggested SQL, which you can use to build up a script to run on the database to migrate it. Like you mention, a migrations app like South can be beneficial here but it's not strictly needed.
As far as your separation of sites goes, run them as separate sites/projects. You can have a separate settings file per project which allows you to run two different databases. This is in contrast to running the two sites as separate apps within the same project. If they're totally separate they probably shouldn't be in the same project unless you need to leverage common code.

Using a common database for collaborative development

Some of the people in my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't and each developer having his own database (with periodic updated data dumps) is the best. Am I right or wrong? Have you encountered any problems in any of these approaches?
Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
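A bare-bones sketch of "applied automatically during the build": each developer's private instance is rebuilt from the SQL files checked into version control. The directory layout and numbered file names (`001_create_users.sql` and so on) are assumptions, and SQLite stands in for the real server.

```python
import sqlite3
from pathlib import Path


def build_database(ddl_dir, db_path=":memory:"):
    """Create a database by running every *.sql file in file-name order."""
    conn = sqlite3.connect(db_path)
    # Zero-padded numeric prefixes keep the scripts in the intended order.
    for script in sorted(Path(ddl_dir).glob("*.sql")):
        conn.executescript(script.read_text(encoding="utf-8"))
    conn.commit()
    return conn
```

If a change exists only in someone's private instance and not as a file in that directory, the next build simply won't have it, which is one way to enforce the policy described above.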
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.
I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.
We share a single database amongst all our developers (20-odd) but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table-prefix it uses anyway so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery and so on).
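For illustration, a configurable table prefix might look something like the following. The `articles` table, the prefixes and the helper names are invented, and SQLite is used only to keep the sketch runnable on its own.

```python
import re
import sqlite3


def table_name(prefix, base):
    """Build a prefixed table name, rejecting anything that isn't a plain identifier."""
    name = f"{prefix}{base}"
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def create_tables(conn, prefix):
    """Create this developer's (or environment's) copy of the tables."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name(prefix, 'articles')} "
        "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )


def add_article(conn, prefix, title):
    """Insert into the prefixed table; the value itself is still parameterised."""
    with conn:
        conn.execute(
            f"INSERT INTO {table_name(prefix, 'articles')} (title) VALUES (?)",
            (title,),
        )


def count_articles(conn, prefix):
    """Count rows in the prefixed table."""
    query = f"SELECT COUNT(*) FROM {table_name(prefix, 'articles')}"
    return conn.execute(query).fetchone()[0]
```

With the prefix read from configuration, `alice_articles` and `bob_articles` can live side by side in one shared database, and the same code runs unchanged against test or production by switching the prefix or connection.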
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).
Having a single point of failure is not a good thing, is it?
I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.
If you are working with Hibernate or any hibernate-based platform you can configure your database to be created when you start your server (create-drop option). This is very useful when you are adding new attributes to your classes. If this is the case each developer must have his own copy of the DB.
If you are not changing the DB structure at all then you can use a single shared DB.
In this second case it is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand, remember that some queries can take a lot of time, and this will affect your whole team if you are sharing a DB.

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then update the column types in your program manually, or does it make more sense to define the tables in your application and then use the ORM framework's table generation functions to make the schema and then build the tables on the database side for you?
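To show what "define the tables in your application and generate the schema" means mechanically, here is a deliberately tiny hand-rolled stand-in. It is not SQLAlchemy's API; the `Article` class, the `SQL_TYPES` mapping and `create_table_sql` are all hypothetical, and a real ORM handles far more (constraints, dialects, relationships).

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python annotations to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Article:
    id: int
    title: str
    rating: float


def create_table_sql(model):
    """Generate a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for field in fields(model):
        column = f"{field.name} {SQL_TYPES[field.type]}"
        if field.name == "id":
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"


def create_schema(conn, model):
    """Build the table on the database side from the application-side model."""
    conn.execute(create_table_sql(model))
```

In this direction the class is the single source of truth and the database follows it. Going the other way, you would change the table first and then update the class by hand to match.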
Bear in mind that applications and databases tend to live in a M:M relationship in any but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to handle things modeling your Domain data. The database is also the biggest source of slow down in applications and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because really the database is simply a place to persist information from the application, it should never lead the design. It's not the only place to persist information as well. We have users that want to work from flat files or the database for instance. They could also use XML files. So by starting with your domain objects and then generating tables (or flat file or XML schema or whatever) from there will lead to a much better design in the end.
While this may depend on you using an object-oriented language, using an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you up to, and including generating the database creation scripts.
In reference to performance, performance should be one of the last things you look at in an application, it should never drive the design. After you get a good schema up and running based on your domain you can always make tweaks to improve its performance.
A lot depends on your skill level with the specific database product that you're going to use. Think of it as the difference between a "manual" and an "automatic" transmission car. ORMs provide you with that "automatic" transmission: just start designing your classes, and let the ORM worry about getting it stored into the database somehow.
Sounds good. The problem with most ORMs is that in their quest to be PI "persistence ignorant", they often don't take advantage of specific database features that can provide elegant solutions for a given task. Notice, I didn't say ALL ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction, up towards the application space, or down towards the physical database. But remember, only YOU know if it's more advantageous to use a view instead of a table, should you normalize or de-normalize a table, what non-clustered index(es) make sense with this table, is a natural or surrogate key more appropriate for this table, etc... Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
One more thing, you really need to separate the application design from the database design. They are almost never the same. How important is that data? Could another application be designed to use that data? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way, since it's a perfect example of the DRY principle.
Having said that however, getting away with it is always going to be hard to pull off since you're practically choosing to give up most database specific optimizations. (more so, with querying, but it still applies to schemas (indexes, etc)).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 Cents
Design each based on their own requirements as much as possible. Trying to keep them in too rigid sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).
