Im wondering what will be the best way to organize my DB. Let me explain:
Im starting a new "big" project. This big project will be composed by few litle ones. In general the litle projects are not related to each other, they are just features of the big one.
One thing that all the projects have in common is the users that are going to use it.
So my questions are:
Should i create different DB for each one of the litle projects
(currently each project will contain 4-5 tables)
How to deal with the users? Should I create one DB for all the users
or should i
duplicate the users table in every DB? Have in mind that the
information about the users is used a lot in every litle project,
it's NOT only for identification purposes.
Thanks in advance for your advice.
This greatly depends on the database you choose to use.
If these "sub-projects" are designed to work as one coherent unit, then I strongly recommend you keep it all in the same database. One backup, one restore, one unit.
For organizational purposes, if you are using a database which supports it, select a different Schema per project. PostgreSQL and SQL Server are two databases (among others) which support this effortlessly.
In the case of a database like MySQL, I recommend you pick a short prefix for each subproject and prefix all tables accordingly. "P1_Customer" for example.
Shared data would go in it's own schema or prefix, like Global or something like that.
Actually, this was one of the many reasons we switched our main database from MySQL to PostgreSQL. We've been heavy users of both, and I really appreciate the features that PostgreSQL offers. SQL Server, if you are in a windows environment, is a great database IMO as well.
If the little projects are "features of the big one" then I don't see a reason why you wouldn't want just one user table for the main project. The way you setup the question makes this seem true "If there is a user A in little project 1, then there must be a user A in the 'big' project." If that is true, you should likely have the users in the big db instead of doing duplication unless you have more qualifying details.
i think the proper answer is 'it depends'.
Starting your organization down the path of single centralized system is good on many levels. I think in general i would recommend this.
however:
if you are going to have dramatically different development schedules, or dramatically different user experiences with the various sub projects, then you may be better off keeping them separate.
I'd have a look at OpenID or some other single sign-on protocol depending on the nature of your application. OpenID includes a mechanism called "attribute exchange", which allows applications to retrieve profile information from the OpenID provider.
This allows you to create a central user profile repository, with an authentication scheme, and have your individual apps query that repository for profile information.
The question as to how to design your database is hard to answer without more information. In most architectures, "features" within an application tend to be closely linked - "users" are related to "accounts" are related to "organisations" etc.
I'd recommend looking at the foreign key relationships to answer this question. If you have lots of foreign keys, build a single database for all tables. If you have "clusters" of foreign keys, and you want to have a different life cycle for each application (assuming the clusters map neatly to the applications), consider separate databases.
By "life cycle", I mean mostly the development lifecycle - app 1 might deploy weekly, app 2 monthly, app 3 once only and then be frozen.
Related
I've built a winforms application that i'm currently rebuilding into an ASP.NET MVC application using Web API etc. Maybe an app will be added later on.
Assume that I will provide these applications to a few customers.
My applications are made for customer accounting.
So all of my customers wil manage their customers whithin the applications I provide.
That brings me to my question. Should I work with one big database for al my customers, or should I use seperate database for each of my customers? I'd like to ask the same for web app instances, api's etc.
Technicaly I think both options are possible. If it's just a mather of preference, all input is appreciated.
Some pros and cons I could think off:
One database:
Easy to setup/maintain
Install one update for all of my customers
No possibility to restore db for one customer
Not flexible in terms of resource spreading
Performance, this db can get realy large
Multiple databases:
Preformance, databases are smaller sized and can be spread by multiple servers
Easy to restore data if customer made a 'huge mistake'
The ability to provide customer specific needs (not needed atm)
Harder to setup/maintain, every instance needs to be updated seperately.
A kind of gateway/routing thing is needed to route users to the right datbase/app
I would like to know how the 'big companies' approach this.
You seem to be talking about database multi-tenancy, and you are right about the pros and cons.
The answer to this depends a lot on the kind of application you are building and the kind of customers it will have.
I would go with multi-tenant (single DB multiple tenants) database if
Your application is a multi-tenant application.
Your users do not need to store their own data backups.
Your DB schema will not change for each customer (this is implied in multi-tenant applications anyway).
Your tenants/customers will not have a huge amount of individual data.
Your customers don't have government imposed data isolation laws they need to comply with (EU data in EU, US data in US etc.).
And for individual databases pretty much the inverse of all those points.
At some point in the next few months our app will be at the size where we need to shard our DB. We are using Heroku for hosting, Node.js/PostgreSQL stack.
Conceptually, it makes sense for our app to have each logical shard represent one user and all data associated with that user (each user of our app generates a lot of data, and there are no interactions between users). We need to retain the ability for the user to do complex ad-hoc querying on their data. I have read many articles such as this one which talk about sharding: http://www.craigkerstiens.com/2012/11/30/sharding-your-database/
Conceptually, I understand how Sharding works. However in practice I have no idea how to go about implementing this on Heroku, in terms of what code I need to write and what parts of my application I need to modify. A link to a tutorial or some pointers would be much appreciated.
Here are some resources I have already looked at:
http://www.craigkerstiens.com/2012/11/30/sharding-your-database/
MySQL sharding approaches?
Heroku takes care of multiple database servers?
http://petrohi.me/post/30848036722/scaling-out-postgres-partitioning
http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
https://devcenter.heroku.com/articles/heroku-postgres-follower-databases
Why do people use Heroku when AWS is present? What distinguishes Heroku from AWS?
As the author of the first article happy to chime in further. When it comes to sharding one of the very key components is what key are you sharding on. The complexity of sharding really comes into play when you have data that is intermingled across different physical nodes. If you're something like a multi-tenant app then modeling all your data around this idea of a tenant or customer can fit very cleanly in this setup. In that case you're going to want to break up all tables that are related to customer and shard them the same way as other tenant related tables.
As for doing this on Heroku, there are two options. You can roll your own with Heroku Postgres and application logic, or using something like Citus (which is an add-on that helps manage more of this for you.
For rolling your own, you'll first create the various application logic to handle creating all your shards and knowing where to route the appropriate queries to. For Rails there are some gems to help wtih this like activerecord-multi-tenant or apartment. When it comes to actually moving to sharding and that migration, what you'll want to do is create a Heroku follower to start. During the migration you'll have it start un-following. Then you'll remove half of the data from the original primary and the other half from the follower you separated accordingly.
I am not sure I would call this "sharding."
In LedgerSMB here is how we do things. Each company (business entity) is a separate database with fully separate data. Data cannot be shared between companies. One postgreSQL cluster can run any number of company databases. We have an administrative interface that creates the database and loads the schema. The administrative interface can also create new users, which can be shared between companies (optionally). I don't know quite how well it would work to share users between dbs on Heroku but I am including that detail in terms of how we work with PostgreSQL.
So this is a viable approach.
What you really need is something to spin up databases and manage users in an automated way. From there you can require that the user specifies a company name that you can map to a database however you'd like (this mapping could be stored in another database for example).
I know this is fairly high level. It should get you started however.
I am currently designing a web application where I will have customers signing up as companies. Each company will have its own set of users. As I am designing this I am wondering which approach would work best. I see sites like fogbugz or basecamp which use subdomains. In cases with subdomains do you have a database instance per sub domain? I'm wondering if it is recommended to have a database instance per company or if I should have some kind of company table and manage the company and user data/credentials all from one database.
Which approach is best? Is there literature on this subject (i.e. any web or book)?
thanks in advance!
You have to weigh up your options, as some of this will be a matter of opinion and might not be feasible for your implementation.
That being said, I'd consider the single database approach, for these reasons:
Maintenance: when running a database per registered 'client', you will very easily reach a situation where any changes or upgrades you make to your app's schema have to be applied to every single database instance. This will get ridiculous, fast.
Convenience: You might want analytics and usage stats, or some way to administrate all these databases. Querying a single database is comparatively trivial to trying to aggregate the same query for all your databases. This isn't going to scale.
Scalability *: As mentioned in 2, you're going to require a special sort of aggregation to query things about your clients, and your app as a whole. The bigger your app gets, the more complex your querying. The other issue is, if one client uses the app a lot more than another, what will you be encouraged to optimise? Your app, the bigger client's database, or the smaller client's? Not forgetting anything you do change has to be copied to all databases.
Backups: You can backup one database easily, just by creating a dump and stashing it somewhere. Get a thousand clients and now you have to run 1000 database dumps, and name them well enough to be able to identify them if one single database corrupts. How will you even know if this happens? Database errors will be localised to that specific one, as opposed to your entire app.
UI: A user signs up or is invited to use your app, and belongs to one particular client. Are you going to save that user account to the client's database? If so, see scalability for the issue of working with that data when the user wants to change their password, or you want to email them. So, do you tell the user to let you know which database they're in so you can find them?
Simplification: You have a database per client and want to just use a single one. How do you merge them all together without significantly breaking things? There'll be primary key conflicts if you use auto incremented IDs; bookmarked URLs will break if you decide to just regenerate the keys; foreign keys across tables will no longer point to the right records. Your data integrity will go down the pan.
You mention 'white label' services that offer their product through custom subdomains. I'm not privy to how these work, but the subdomain is only a basic CNAME or A record in their DNS zonefile. The process of adding these can be automated, and the design of the application and a bit of server configuration can deal with linking these subdomains to the correct accounts and data. They're just URLs, so maybe on the backend, the app doesn't differentiate between:
http://client.example.com
http://example.com/client
Overall though, you may decide that all these problems are things you can and would prefer to deal with. Be warned, however, that by doing so you may be shooting yourself in the foot, and you can gain a lot more from crafting a well-designed single database schema and a well-abstracted front-end.
*#xQbert mentions the very real benefit of scalability with multiple databases. I've amended this answer to clarify that I was more concerned with other aspects.
How many databases are needed for a social website? I have my tech team working on developing a social site but all their tables are in 1 database. I wanted to create separate table sets for user data, temporary tables, etc and thinking maybe have one separate database only for critical data, etc but I am not a tech person and now sure how this works? The site is going to be a local reviews website.
This is what happens when management tries to make tech decisions...
The simple answer, as always, is as few as possible.
The slightly more complicated answer is that once your begin to push the limits of your server and begin to think about multiple servers with master/slave replication then your may want your frequent write tables separated from your seldom write tables which will lower the master-slave update requirements.
If you start using seperate databases you can also run into an with you backup / restore strategy. If you have 5 databases and backup all five, what happens when you need to restore one of them, do you then need to restore all five?
I would opt for the fewest number of databases.
The reason you would want to have multiple databases is for scaling-out to multiple machines. In the context of a "social application" where large volume / high availability is a concern. If you anticipate the need to scale out to multiple machines to handle high volumes then the breakout of tables should be those that logically need to stay together.
So, for example, maybe you want to keep tables related to a specific subject area (maybe status updates) together in one database and other tables that are related to a different subject area (let's say user's picture libraries) together in a different database.
There are logical and performance reasons to keep tables in separate physical or logical databases.
What is the reason that you want it in different databases?
You could just put all tables in one database without a problem, even with for example multiple installations of an open source package. In that case you can use table prefixes.
Unless you are developing a really BIG website, one database is the way to proceed (by the way, did you consider the possible issues that may raise when working with various databases?).
If you are worried about performance, you can always configure different tablespaces on several storage devices in order to improve timings.
If you are worried about security, just increase it (better passwords, no direct root login, no port forwarding, avoid tunneling, etc.)
I am not a tech person only doing the functional analysis but I own the project so I need to oversee the tech team. My reason to have multiple database is security and performance.
Since this is going to be a new startup, there is no money to invest into strong security or getting the database designed flawless. Plus there are currently no backup policies in place so:
1) I want to separate critical data like user password/basic profile info, then separate out user media (photos they upload on their profile) and then the user content. Then separate out the system content. Current design is to have to layers of tables: Master tables for entire system and module tables for each individual module.
2) Performance: There are a lot of modules being designed and this is a data intensive social site with lots of reporting / analytic being builtin so lots of read/writes. Maybe better to distribute load across database based on purpose?
Since there isn't much funding hence I want to get it right the first time with my investment so the database can scale & work well until revenue comes in to actually invest in getting it right. Ofcourse that could be maybe 6 months away and say a million users away too.
Oh & there is plan to add staging/production mode also so seperate or same database?
You'll be fine sticking with using one database for now. Your developers can isolate/seperate application data by making use of database schema. Working with multiple databases can quickly become a journey through a world of pain and is to be avoided unless its absolutely crucial.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I'm designing a few applications that will share 2 or 3 database tables and all of the other tables will be independent of each app. The shared databases contain mostly user information, and there might occur the case where other tables need to be shared, but that's my instinct speaking.
I'm leaning over the one database for all applications solution because I want to have referential integrity, and I won't have to keep the same information up to date in each of the databases, but I'm probably going to end with a database of 100+ tables where only groups of ten tables will have related information.
The database per application approach helps me keep everything more organized, but I don't know a way to keep the related tables in all databases up to date.
So, the basic question is: which of both approaches do you recommend?
Thanks,
Jorge Vargas.
Edit 1:
When I talk about not being able to have referential integrity, it's because there's no way to have foreign keys in tables when those tables are in different databases, and at least one of the tables per application will need a foreign key to one of the shared tables.
Edit 2:
Links to related questions:
SQL design around lack of cross-database foreign key references
Keeping referential integrity across multiple databases
How to salvage referential integrity with mutiple databases
Only the second one has an accepted answer. Still haven't decided what to do.
Answer:
I've decided to go with a database per application with cross-database references to a shared database, adding views to each database mimicking the tables in the shared database, and using NHibernate as my ORM. As the membership system I'll be using the asp.net one.
I'll also use triggers and logical deletes to try and keep to a minimum the number of ID's I'll have flying around livin' la vida loca without a parent. The development effort needed to keep databases synced is too much and the payoff is too little (as you all have pointed out). So, I'd rather fight my way through orphaned records.
Since using an ORM and Views was first suggested by svinto, he gets the correct answer.
Thanks to all for helping me out with this tough decision.
Neither way looks ideal
I think you should consider not making references in database layer for cross-application relations, and make them in application layer. That would allow you to split it to one database per app.
I'm working on one app with 100+ tables. I have them in one database, and are separated by prefixes - each table has prefix for module it belongs to. Then i have built a layer on top of database functions to use this custom groups. I'm also building data administrator, which takes advantage of this table groups and makes editing data very easy.
It depends and your options are a bit different depending on the database and frameworks you're using. I'd recommend using some sort of ORM and that way you don't need to bother that much. Anyways you could probably put each app in it's own schema in the database and then either reference the shared tables by schemaname.tablename or create views in each application schema that's just a SELECT * FROM schemaname.tablename and then code against that view.
There are no hard and fast rules to choose one over the other.
Multiple databases provide modularity. As far as sync-ing across multiple databases are concerned, one can use the concept of linked servers and views thereof and can gain the advantages of integrated database (unified access) as well.
Also, keeping multiple databases can help better management of security, data, backup & restore, replication, scaling out etc!
My 2cents.
THat does not sound like "a lot of applications" at all, but like "one application system with different executables". Naturally they can share one database. Make smart usage of Schemata to isolate the different funcational areas within one database.
One database for all application in my opinion .Data would be stored once no repitation.
With the other approach you would end up replicating and in my opinion when you start replicating it will bring its own headache and data would go out of sync too
The most appropriate approach from scalability and maintenence point of view would be to make the "shared/common" tables subset self-sufficient and put it to "commons" database, for all others have 1 db per application of per logical scope (business logic determined) and maintain this structure always
This will ease the planning and execution commissioning/decommissioning/relocation/maintenence procedures of your software (you will know exactly which two affected DBs (commons+app_specific) are involved if you know which app you are going to touch and vice versa.
At our business, we went with a separate database per application, with cross database references for the small amount of shared information and an occasional linked server. This has worked pretty well with a development, staging, build and production environments.
For users, our entire user base is on windows. We use Active Directory to manage the users with application references to groups, so that the apps don't have to manage users, which is nice. We did not centralize the group management, that is each application has tables for groups and security which is not so nice but works.
I would recommend, that if your applications are really different, to have a database per application. Looking back, the central shared database for users sounds workable as well.
You can use triggers for cross database referential integrity:
Create a linked server to the server that holds the database that you want to reference. Then use 4-part naming to reference the table in the remote database that holds the reference data. Then put this in the insert and update triggers on the table.
EXAMPLE(assumes single row inserts and updates):
DECLARE #ref (datatype appropriate to your field)
SELECT #ref = refField FROM inserted
IF NOT EXISTS (SELECT *
FROM referenceserver.refDB.dbo.refTable
WHERE refField = #ref)
BEGIN
RAISERROR(...)
ROLLBACK TRAN
END
To do multi row inserts and updates you can join the tables on the reference field but it can be very slow.
I think the answer to this question depends entirely on your non functional requirements. If you are designing a application that will one day need to be deployed across 100's of nodes then you need to design your database so that if need be it could be horizontally scaled. If on the other hand this application is to be used by a hand full of users and may have a short shelf life then you approach will be different. I have recently listened to a pod cast of how EBAY's architecture is set-up, http://www.se-radio.net/podcast/2008-09/episode-109-ebay039s-architecture-principles-randy-shoup, and they have a database per application stream and they use sharding to split tables across physical nodes. Now their non-functional requirements are that the system is available 24/7, is fast, can support thousands of users and that is does not lose any important data. EBAY make millions of pounds and so can support the effort that this takes to develop and maintain.
Anyway this does not answer your question:) my personnel opinion would be to make sure your non-functional requirements have been documented and signed off by someone. That way you can decide on the best Architecture. I would be tempted to have each application using its own database and a central database for shared data. And I would try to minimise the dependencies between them, which I'm sure is not easy or you would have done it:), but I would also try to steer clear of having to produce some sort of middle ware software to keep tables in sync as this could create a headaches for you.
At the end of the day you need to get your system up and running and the guys with the pointy hair wont give a monkeys chuff about how cool your design is.
We went for splitting the database down, and having one common database for all the shared tables. Due to them all being on the save SQL Server instance it didn't affect the cost of running queries across multiple database.
The key in replication for us was that whole server was on a Virtual Machine (VM), so for replication to create Dev/Test environments, IT Support would just create a copy of that image and restore additional copies when required.