Separating data from different sites - database

We are creating a web solution that manages a large number of users along with their events, calendars and content. The solution can be white-labeled and sold to other vendors as a service: although everything is hosted on our SINGLE server, each vendor will have its own administrator, its own users and its own content, completely disconnected from the other vendors. For example, we are going to host the solution as
www.example.com/company1
www.example.com/company2
www.example.com/company3
The question is: should we use a different database for each company, or a single database for all the companies?
Thanks

You should use separate databases for each company, unless you are offering some sort of service where the companies know that data is being pooled.
This is a question of data protection. No matter how much you swear that one company can only see their data in the table, you may not be able to convince prospective clients of this fact.
In addition, you need to keep open the option of running the databases on different servers. You don't want peak load at one company to affect another company. Nor do you want a special change for one company -- which might require bringing down the application with their knowledge -- to affect other clients.
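To make the separate-database option concrete, here is a minimal sketch of routing each request to its company's own database based on the first URL path segment. It assumes one SQLite file per company purely for illustration; `DATA_DIR`, `tenant_from_path`, `connect_for` and the `events` table are hypothetical names, not part of the original setup.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical directory holding one SQLite file per company.
DATA_DIR = Path(tempfile.mkdtemp())


def tenant_from_path(url_path: str) -> str:
    """Return the first path segment, e.g. '/company1/events' -> 'company1'."""
    segments = [s for s in url_path.split("/") if s]
    if not segments:
        raise ValueError("no company in path")
    tenant = segments[0]
    # Only allow simple names so the value is safe to use in a file name.
    if not tenant.isalnum():
        raise ValueError(f"invalid company name: {tenant!r}")
    return tenant


def connect_for(url_path: str) -> sqlite3.Connection:
    """Open (and lazily create) the database that belongs to one company."""
    conn = sqlite3.connect(DATA_DIR / f"{tenant_from_path(url_path)}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, title TEXT)"
    )
    return conn


company1 = connect_for("/company1/calendar")
company2 = connect_for("/company2/calendar")
company1.execute("INSERT INTO events (title) VALUES ('Board meeting')")
company1.commit()

# Data written for company1 is simply not present in company2's database.
company1_count = company1.execute("SELECT COUNT(*) FROM events").fetchone()[0]
company2_count = company2.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the connection is chosen in one place, moving a single company to a different server later would only mean changing what `connect_for` returns for that company.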

Related

Architecture: one or multiple databases for sub customers (web)APP

I've built a WinForms application that I'm currently rebuilding as an ASP.NET MVC application using Web API etc. An app may be added later on.
Assume that I will provide these applications to a few customers.
My applications are made for customer accounting.
So all of my customers will manage their own customers within the applications I provide.
That brings me to my question. Should I work with one big database for all my customers, or should I use a separate database for each of my customers? I'd like to ask the same about web app instances, APIs etc.
Technically I think both options are possible. If it's just a matter of preference, all input is appreciated.
Some pros and cons I could think of:
One database:
Easy to setup/maintain
Install one update for all of my customers
No possibility to restore db for one customer
Not flexible in terms of resource spreading
Performance, this db can get really large
Multiple databases:
Performance, databases are smaller and can be spread across multiple servers
Easy to restore data if customer made a 'huge mistake'
The ability to provide customer specific needs (not needed atm)
Harder to setup/maintain, every instance needs to be updated separately.
A kind of gateway/routing thing is needed to route users to the right database/app
I would like to know how the 'big companies' approach this.
You seem to be talking about database multi-tenancy, and you are right about the pros and cons.
The answer to this depends a lot on the kind of application you are building and the kind of customers it will have.
I would go with a multi-tenant database (single DB, multiple tenants) if:
Your application is a multi-tenant application.
Your users do not need to store their own data backups.
Your DB schema will not change for each customer (this is implied in multi-tenant applications anyway).
Your tenants/customers will not have a huge amount of individual data.
Your customers don't have government imposed data isolation laws they need to comply with (EU data in EU, US data in US etc.).
And for individual databases pretty much the inverse of all those points.

How to structure/coordinate multiple databases?

Imagine a large corp with dozens of companies, each with their own website and each website will have their own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some are marketing websites, some are for CRM, some are online stores
there is a list of regulatory requirements that affect all products
each company should manage the status of compliance of its own products to each requirement
when a new requirement surfaces, details regarding that requirement should only be entered once
How would the multiple databases be coordinated?
edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared, silo'd within each site
shared data is only on the one enterprise-wide database, they will mostly be "types of [thing]"
no conclusive list of instances where they'll be used but currently it'd be to populate CMS dropdowns for individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume in shared data.
All DBs will be new, decision on which DB is pending current investigation.
Sub-systems will expose REST api
Here are some ways I have seen this handled, you need to think about the implications of each structure based on the details of your particular business domain. All can work, but all have to be carefully set up if they are going to work.
One database for shared information and one for each client for client-specific information. Set up the overall application so that the first thing it establishes at login is the client, and it then connects to that client's database. Users who handle multiple clients will also need a way to switch client.
Separate servers for each client if they need to be completely siloed. Database changes are made by script (kept in source control) and applied to each server as need be. Changes to the central database might then have a job that runs to push any data changes to the other servers.
All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
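A minimal sketch of that last option, with SQLite standing in for whichever database is eventually chosen. The table names, view names and sample rows are all made up for illustration. SQLite has no GRANT, so the step of restricting each user to their client's view is not shown and would depend on the real DBMS.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL REFERENCES clients(client_id),
        name       TEXT NOT NULL
    );
    INSERT INTO clients VALUES (1, 'Cereal Co'), (2, 'Book Co');
    INSERT INTO products (client_id, name) VALUES
        (1, 'Oat flakes'), (1, 'Granola'), (2, 'Atlas');

    -- One view per client; each view exposes only that client's rows.
    CREATE VIEW products_client_1 AS
        SELECT product_id, name FROM products WHERE client_id = 1;
    CREATE VIEW products_client_2 AS
        SELECT product_id, name FROM products WHERE client_id = 2;
""")


def products_for(client_id: int) -> List[str]:
    """Parameterised filter: the application-side equivalent of the views."""
    rows = conn.execute(
        "SELECT name FROM products WHERE client_id = ? ORDER BY product_id",
        (client_id,),
    )
    return [name for (name,) in rows]


client_1_products = products_for(1)
client_2_view_rows = [
    name for (name,) in conn.execute("SELECT name FROM products_client_2")
]
```

The same shape works only as long as every client's data fits the one `products` schema, which is the caveat made above.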
And since you are in a regulatory environment, I strongly urge that you create an audit database that is updated by database triggers (never audit from the application, you will lose changes to the data) for each database.
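For illustration, a trigger-based audit could look roughly like the sketch below. It uses SQLite and keeps the audit table in the same database for brevity, which is an assumption on my part (a separate audit database, as suggested above, depends on what your DBMS allows triggers to write to). The `compliance` and `compliance_audit` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compliance (
        product_id     INTEGER NOT NULL,
        requirement_id INTEGER NOT NULL,
        status         TEXT NOT NULL,
        PRIMARY KEY (product_id, requirement_id)
    );
    CREATE TABLE compliance_audit (
        audit_id       INTEGER PRIMARY KEY,
        product_id     INTEGER NOT NULL,
        requirement_id INTEGER NOT NULL,
        old_status     TEXT,
        new_status     TEXT,
        changed_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- The trigger fires for every status update, whichever application
    -- (or ad hoc SQL session) made the change.
    CREATE TRIGGER compliance_audit_update
    AFTER UPDATE OF status ON compliance
    BEGIN
        INSERT INTO compliance_audit
            (product_id, requirement_id, old_status, new_status)
        VALUES (OLD.product_id, OLD.requirement_id, OLD.status, NEW.status);
    END;
""")

conn.execute("INSERT INTO compliance VALUES (1, 10, 'pending')")
# No audit code runs in the application; the database records the change.
conn.execute("UPDATE compliance SET status = 'compliant' WHERE product_id = 1")

audit_rows = conn.execute(
    "SELECT product_id, old_status, new_status FROM compliance_audit"
).fetchall()
```

A real setup would likely add matching triggers for INSERT and DELETE as well.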
I agree with Chris that, even after both the sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have 2 separate dbs per application (one with shared stuff and one with not-shared?) - this would influence the kind of replication.
Or you could have a purely code solution, where clicking publish in a GUI that updates the central db calls a set of APIs that also update the other dbs. Or micro-services - updating the central db also creates a message on a shared queue, that is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
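The queue-based variant might be sketched like this, with an in-process `queue.Queue` standing in for a real message broker and in-memory SQLite databases standing in for the central and per-site stores. Everything named here (`publish`, `drain`, `requirement_types`, the site names) is hypothetical, and giving each site its own inbox is just one way to make sure every service sees every message.

```python
import queue
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requirement_types (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    return conn


central = make_db()
sites: Dict[str, sqlite3.Connection] = {"cereal": make_db(), "books": make_db()}
# One inbox per site service, so each service receives every update.
inboxes: Dict[str, queue.Queue] = {name: queue.Queue() for name in sites}


def publish(type_id: int, label: str) -> None:
    """Update the central database, then notify every site service."""
    central.execute(
        "INSERT OR REPLACE INTO requirement_types VALUES (?, ?)", (type_id, label)
    )
    for inbox in inboxes.values():
        inbox.put({"id": type_id, "label": label})


def drain(site_name: str) -> int:
    """Apply all pending messages to one site's database; return how many."""
    applied = 0
    inbox = inboxes[site_name]
    while not inbox.empty():
        msg = inbox.get()
        # Upsert keeps the handler idempotent if a message is delivered twice.
        sites[site_name].execute(
            "INSERT OR REPLACE INTO requirement_types VALUES (?, ?)",
            (msg["id"], msg["label"]),
        )
        applied += 1
    return applied


publish(1, "Food labelling")
publish(1, "Food labelling, revised")
applied_to_cereal = drain("cereal")

cereal_rows = sites["cereal"].execute(
    "SELECT id, label FROM requirement_types"
).fetchall()
books_rows = sites["books"].execute(
    "SELECT id, label FROM requirement_types"
).fetchall()
```

Here the `books` site has not drained its inbox yet, so it lags behind until its service runs, which fits the "within a few minutes to an hour" tolerance mentioned in the question.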
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.
In many cases where you have shared data, you want a single point of ownership for that information. It could be in a database, in an Excel file (which can then be exported to CSV and periodically loaded into all the dbs), or in some other form. The specifics depend on what exactly is shared.
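As a rough sketch of the "CSV periodically loaded on all dbs" idea, assuming the shared data is a small lookup table. The CSV content, the `requirement_types` table and the `load_shared` function are invented for the example, and a real job would read the file from wherever the owning team exports it.

```python
import csv
import io
import sqlite3

# Stand-in for the exported file; in practice this would be read from disk.
SHARED_CSV = "id,label\n1,Allergen labelling\n2,Recycling marks\n"


def load_shared(conn: sqlite3.Connection, csv_text: str) -> int:
    """Replace one site's copy of the shared table with the CSV contents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requirement_types "
        "(id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    rows = [
        (int(record["id"]), record["label"])
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    # 'with conn' wraps the delete and reload in a single transaction,
    # so readers never see a half-loaded table.
    with conn:
        conn.execute("DELETE FROM requirement_types")
        conn.executemany("INSERT INTO requirement_types VALUES (?, ?)", rows)
    return len(rows)


site_dbs = [sqlite3.connect(":memory:") for _ in range(3)]
loaded_counts = [load_shared(db, SHARED_CSV) for db in site_dbs]
# Running the job again is harmless: the table is replaced, not appended to.
reloaded = load_shared(site_dbs[0], SHARED_CSV)
first_site_rows = site_dbs[0].execute(
    "SELECT id, label FROM requirement_types ORDER BY id"
).fetchall()
```

A full replace like this is only reasonable because the shared data is described as very low volume and rarely changed.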
Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information and they will manage that data, which will then be shared to the other sites. This might be done with an application they manage which aggregates information from the other companies or it could be data which is pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.

Building a web application with multiple database instances or just a single instance

I am currently designing a web application where I will have customers signing up as companies. Each company will have its own set of users. As I am designing this I am wondering which approach would work best. I see sites like FogBugz or Basecamp which use subdomains. In cases with subdomains, do you have a database instance per subdomain? I'm wondering if it is recommended to have a database instance per company or if I should have some kind of company table and manage the company and user data/credentials all from one database.
Which approach is best? Is there literature on this subject (i.e. any web or book)?
thanks in advance!
You have to weigh up your options, as some of this will be a matter of opinion and might not be feasible for your implementation.
That being said, I'd consider the single database approach, for these reasons:
Maintenance: when running a database per registered 'client', you will very easily reach a situation where any changes or upgrades you make to your app's schema have to be applied to every single database instance. This will get ridiculous, fast.
Convenience: You might want analytics and usage stats, or some way to administer all these databases. Querying a single database is trivial compared to trying to aggregate the same query across all your databases. This isn't going to scale.
Scalability *: As mentioned under Convenience, you're going to require a special sort of aggregation to query things about your clients, and your app as a whole. The bigger your app gets, the more complex your querying. The other issue is, if one client uses the app a lot more than another, what will you be encouraged to optimise? Your app, the bigger client's database, or the smaller client's? Not forgetting that anything you do change has to be copied to all databases.
Backups: You can back up one database easily, just by creating a dump and stashing it somewhere. Get a thousand clients and now you have to run 1000 database dumps, and name them well enough to be able to identify them if one single database corrupts. How will you even know if this happens? Database errors will be localised to that specific one, as opposed to your entire app.
UI: A user signs up or is invited to use your app, and belongs to one particular client. Are you going to save that user account to the client's database? If so, see scalability for the issue of working with that data when the user wants to change their password, or you want to email them. So, do you tell the user to let you know which database they're in so you can find them?
Simplification: You have a database per client and want to just use a single one. How do you merge them all together without significantly breaking things? There'll be primary key conflicts if you use auto incremented IDs; bookmarked URLs will break if you decide to just regenerate the keys; foreign keys across tables will no longer point to the right records. Your data integrity will go down the pan.
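To illustrate the last point, here is a small sketch of what merging two per-client databases involves when both use auto-incremented IDs. The schema and names are hypothetical. Both source databases contain a user with id 1, so every row needs a new key and every foreign key has to be rewritten through a mapping.

```python
import sqlite3
from typing import Dict

SCHEMA = """
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        client TEXT,
        name   TEXT NOT NULL
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );
"""


def make_client_db(user_name: str, post_title: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (name) VALUES (?)", (user_name,))
    # The first user in every client database gets id 1.
    conn.execute("INSERT INTO posts (user_id, title) VALUES (1, ?)", (post_title,))
    return conn


client_a = make_client_db("alice", "Roadmap A")
client_b = make_client_db("bob", "Roadmap B")

merged = sqlite3.connect(":memory:")
merged.executescript(SCHEMA)


def merge_in(client: str, source: sqlite3.Connection) -> Dict[int, int]:
    """Copy one client's rows into the merged database, remapping user ids."""
    id_map: Dict[int, int] = {}
    for old_id, name in source.execute("SELECT id, name FROM users ORDER BY id"):
        cur = merged.execute(
            "INSERT INTO users (client, name) VALUES (?, ?)", (client, name)
        )
        id_map[old_id] = cur.lastrowid
    for user_id, title in source.execute(
        "SELECT user_id, title FROM posts ORDER BY id"
    ):
        # Foreign keys must be translated, or posts would point at the wrong user.
        merged.execute(
            "INSERT INTO posts (user_id, title) VALUES (?, ?)",
            (id_map[user_id], title),
        )
    return id_map


map_a = merge_in("client_a", client_a)
map_b = merge_in("client_b", client_b)
owners = merged.execute(
    "SELECT u.name, p.title FROM posts p JOIN users u ON u.id = p.user_id "
    "ORDER BY p.id"
).fetchall()
```

Even in this two-table toy, bob's id changes from 1 to 2, so any URL that embedded the old id would now resolve to alice.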
You mention 'white label' services that offer their product through custom subdomains. I'm not privy to how these work, but the subdomain is only a basic CNAME or A record in their DNS zonefile. The process of adding these can be automated, and the design of the application and a bit of server configuration can deal with linking these subdomains to the correct accounts and data. They're just URLs, so maybe on the backend, the app doesn't differentiate between:
http://client.example.com
http://example.com/client
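A minimal sketch of how an application might resolve both URL styles to the same account. `BASE_DOMAIN` and `tenant_from_url` are hypothetical names, and treating `www` as "no client" is my assumption.

```python
from typing import Optional
from urllib.parse import urlsplit

BASE_DOMAIN = "example.com"  # assumed base domain for the hosted service


def tenant_from_url(url: str) -> Optional[str]:
    """Return the client name from either a subdomain or the first path segment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    suffix = "." + BASE_DOMAIN

    # Style 1: http://client.example.com
    if host.endswith(suffix):
        subdomain = host[: -len(suffix)]
        if subdomain and subdomain != "www":
            return subdomain

    # Style 2: http://example.com/client (with or without a www prefix)
    if host in (BASE_DOMAIN, "www" + suffix):
        segments = [s for s in parts.path.split("/") if s]
        if segments:
            return segments[0]

    return None


from_subdomain = tenant_from_url("http://client.example.com")
from_path = tenant_from_url("http://example.com/client")
```

Whatever the function returns can then be looked up in a single accounts table, so the URL style stays a presentation detail.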
Overall though, you may decide that all these problems are things you can and would prefer to deal with. Be warned, however, that by doing so you may be shooting yourself in the foot, and you can gain a lot more from crafting a well-designed single database schema and a well-abstracted front-end.
* @xQbert mentions the very real benefit of scalability with multiple databases. I've amended this answer to clarify that I was more concerned with other aspects.

Database tables - how many database?

How many databases are needed for a social website? I have my tech team working on developing a social site, but all their tables are in 1 database. I wanted to create separate table sets for user data, temporary tables, etc., and I am thinking of maybe having one separate database only for critical data, but I am not a tech person and not sure how this works. The site is going to be a local reviews website.
This is what happens when management tries to make tech decisions...
The simple answer, as always, is as few as possible.
The slightly more complicated answer is that once you begin to push the limits of your server and begin to think about multiple servers with master/slave replication, you may want your frequently written tables separated from your seldom written tables, which will lower the master-slave update requirements.
If you start using separate databases you can also run into an issue with your backup / restore strategy. If you have 5 databases and back up all five, what happens when you need to restore one of them? Do you then need to restore all five?
I would opt for the fewest number of databases.
The reason you would want to have multiple databases is to scale out to multiple machines, in the context of a "social application" where large volume / high availability is a concern. If you anticipate the need to scale out to multiple machines to handle high volumes, then the breakout of tables should follow which tables logically need to stay together.
So, for example, maybe you want to keep tables related to a specific subject area (maybe status updates) together in one database and other tables that are related to a different subject area (let's say user's picture libraries) together in a different database.
There are logical and performance reasons to keep tables in separate physical or logical databases.
What is the reason that you want it in different databases?
You could just put all tables in one database without a problem, even with for example multiple installations of an open source package. In that case you can use table prefixes.
Unless you are developing a really BIG website, one database is the way to proceed (by the way, did you consider the possible issues that may arise when working with several databases?).
If you are worried about performance, you can always configure different tablespaces on several storage devices in order to improve timings.
If you are worried about security, just increase it (better passwords, no direct root login, no port forwarding, avoid tunneling, etc.)
I am not a tech person, only doing the functional analysis, but I own the project so I need to oversee the tech team. My reasons for wanting multiple databases are security and performance.
Since this is going to be a new startup, there is no money to invest in strong security or in getting the database designed flawlessly. Plus there are currently no backup policies in place, so:
1) I want to separate critical data like user password/basic profile info, then separate out user media (photos they upload on their profile) and then the user content. Then separate out the system content. The current design is to have two layers of tables: master tables for the entire system and module tables for each individual module.
2) Performance: There are a lot of modules being designed and this is a data-intensive social site with lots of reporting / analytics being built in, so lots of reads/writes. Maybe better to distribute load across databases based on purpose?
Since there isn't much funding, I want to get it right the first time with my investment so the database can scale and work well until revenue comes in to actually invest in getting it right. Of course that could be maybe 6 months away and, say, a million users away too.
Oh, and there is a plan to add staging/production modes too, so: separate or same database?
You'll be fine sticking with one database for now. Your developers can isolate/separate application data by making use of database schemas. Working with multiple databases can quickly become a journey through a world of pain and is to be avoided unless it's absolutely crucial.

Software as a service - Database

If I am building a CRM web application to sell as a membership service, what is the best method to design and deploy the database?
Do I have 1 database that houses 100s of records per table or deploy multiple databases for different clients?
Is it really an issue to use a single database since I believe sites like Flickr use them?
Multiple clients is called "multi-tenant". See for example this article "Multi-Tenant Data Architecture" from Microsoft.
In a situation like a CRM system, you will probably need to have separate instances of your database for each customer.
I say this because if you'd like larger clients, most companies have security policies in place regarding customer data. If you store their customer data in the same database as another customer's, you're running the risk of exposing one company's confidential data to another company (a competitor, etc.).
Sites like Flickr don't have to worry about this as much since the majority of us out on the Interwebs don't have such strict policies regarding our personal data.
Long term it is easiest to maintain one database with multiple clients' data in it. Think about deployment, backup, etc. However, this doesn't keep you from having several instances of this database, each containing a subset of the full client dataset. I'd recommend to grow the number of databases after you have established the usefulness/desirability of your product. Having complex infrastructure is not necessary if you have no traffic....
So, I'd just put a client id in the relevant tables and smile when client 4 comes in and the extent of your new deployment is one insert statement.
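As a sketch of what that looks like in practice (hypothetical CRM tables, SQLite used only so the example runs anywhere): onboarding a new client really is one INSERT into a `clients` table, and every client-owned row carries the `client_id`.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL REFERENCES clients(client_id),
        full_name  TEXT NOT NULL
    );
    -- Most queries filter by client, so index that column.
    CREATE INDEX contacts_by_client ON contacts (client_id);
""")


def onboard_client(name: str) -> int:
    """The whole 'deployment' for a new client: one insert statement."""
    return conn.execute("INSERT INTO clients (name) VALUES (?)", (name,)).lastrowid


def contacts_for(client_id: int) -> List[str]:
    rows = conn.execute(
        "SELECT full_name FROM contacts WHERE client_id = ? ORDER BY contact_id",
        (client_id,),
    )
    return [full_name for (full_name,) in rows]


for existing in ("Client 1", "Client 2", "Client 3"):
    onboard_client(existing)

client_4 = onboard_client("Client 4")
conn.execute(
    "INSERT INTO contacts (client_id, full_name) VALUES (?, ?)",
    (client_4, "Dana Example"),
)

contacts_of_client_4 = contacts_for(client_4)
contacts_of_client_1 = contacts_for(1)
```

The flip side, as the other answer points out, is that isolation now depends on every query including the `client_id` filter.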

Resources