Single or Multiple database? - database

I am planning to implement an product offering SaaS, while planning for architecture our team came across whether to use single database for all customers or using separate database for each customer?
Can anyone tell me what are pros and cons of using multiple database? (performance, table caching, etc..)

The pros of individual DBs are that you can tune the individual DBs to customer needs. Put the high demand customers on their own machines, bundle low demand on a single machine, etc.
The primary value of a single db is having a single view of all of the data, if that's important to you (statistics, the application, whatever). Most DBs don't give you the option of specifying "hot" parts of a table for caching and whatever, but you can do that for individual tables.
The amount of overhead that each individual DB has within a server I think is minimal. If your clients are not sharing any data, then an individual DB may well be a good plan.

Related

postgresql multitenant schemas vs databases

We've designed a multitenant system (lets say hundreds of tenants, not thousands). There is no shared data. The database is PostgreSQL. Is it better to create a separate database per tenant or schema?
What are the pros and cons? What is the impact on the filesystem, DB engine tables/views like locks, objects privileges, etc. - will they be much bigger in a multiple schema solution? Separate databases should be easier to backup/restore.
I know there are lots of similar questions, but most relate to cases with shared data, which is a major drawback to multiple databases, and we do not have such a requirement.
If you never need to use tables from multiple tenants in a single query, I'd go for separate databases.
The raw speed of queries is not really affected by this and neither is the impact on the filesystem or memory.
However the query optimizer tends to get slower with many schemas and many table (but we are talking hundreds of thousands here). Tab-completion is psql is also not as efficient in that case.
It's also a tad easier to use pg_dump/pg_restore with separated databases than with separated schemas.
But it's a blurry line and the actual answer can very well be based on personal opinion and preferences.

Need a solution to get rid of multiple database

In my company we have multiple database structure hosted in SQL Server.
for e.g., whenever a new customer sign up with us, we create a new DB in SQL Server to maintain their data.
Right now we already have 2000+ DBs in our database server. We expect more customers to sign up in near future, which might even cross 5000+ count.
Having DBs of 5000+ and increasing count of DBs might not be an advisable one, sometimes we run some task which will run across the DBs, and if we are going to run tasks across 5000+ DBs we will surely end up in performance issues.
What would be the alternative solution to avoid creating multiple DBs for each and every customer and also at the same time maintaining their data separately?
I am hearing about BigData and other DataBase solutions but could not get clear picture.
Can someone share some light on this?
If the databases have an identical schema you could combine them into one. That way each customer's table will now become a set of rows in the new database. A new customer will probably be a few new rows in the tables that store customer's profile.
You can use row level security for restricting access to customer's data:-
https://msdn.microsoft.com/en-us/library/dn765131.aspxpx
For pros and cons of using this approach over your existing see: Pros/Cons Using multiple databases vs using single database and Single or multiple databases
Using other options provide great learning opportunity but may have a significant transition cost even if there were some that were indeed better.
one solution I would suggest is to use prefix on the table name for each customer. you can then solve the security issue by limit per customer per set of tables.
the con is you will have to rewrite your application to use prefix to each table whenever it want to access it. If you have a lot of tables , that will be a problem.
I think this is how some multi Wordpress hosting site handle it database issue.
you should consider if you just store the data and access it with simple querys or if you usually do complex query's, if you just store the data and access it with simple querys and your need are not 100% relational maybe you should consider to move part of your data to HDFS file system:
https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS .
To process the data in hadoop there are many tools but the raising one for sure is spark:
https://en.wikipedia.org/wiki/Apache_Spark
probably the best solution is to start move your historic data in HDFS just for storage and keep the rest as it is until you take confidence with the hadoop and spark paradigm
hadoop is a distributed , fault tollerant file system and spark is an engine for batch processing huge amount of unstructured or structured data, consider that data in hadoop are not structure usually so you have to change the way you process your data, if you want to still use sql I suggest to check Impala and Hive as well:
http://impala.io/
https://hive.apache.org/
Take a look at cloudera web site for a more structure IT solution instead of a lot of single tool that you will need to organize
http://www.cloudera.com/content/www/en-us/solutions.html
They have a quick start VM to try all the hadoop ecosystem tools , probably thats the best way to start experimenting:
http://www.cloudera.com/content/www/en-us/downloads/quickstart_vms/5-4.html

One User database servicing multiple application databases

I am administering a rather large database that has grown in complexity and design from a single application database. Now there is a plan to add a fifth application that carries with it its own schema and specific data. I have been researching SSO solutions but that is not really what I am after. My goal is to have one point of customer registration, logins and authorization.
Ideally, each application would request authentication and be given authorization to multiple applications, where the applications would then connect to the appropriate database for operations. I do not have first hand experience dealing with this degree of separation as the one database has been churning flawlessly for years. Any best practice papers would be appreciated :)
I would envision a core database that maintained shared data - Customer/Company/Products
Core tables and primary Keys –To maintain referential integrity should I have a smaller replicated table in each “application” database. What are some ways to share keys among various databases and ensure referential integrity?
Replication – Two subscribers currently pull data from the production database where data is later batched into a DW solution for reporting. Am I going down a road that can lead to frustration?
Data integrity – How can I ensure for example that:
DATABASE_X.PREFERENCES.USER_ID =always references a= CORE_DATABASE.USERS.USER_ID
Reporting – What type of hurdles would I cross to replicate/transform data from multiple databases into one reporting database?
White Papers - Can anyone find good refernces to this strategy in practice?
Thanks
A few urls for you. Scale out implementations can vary wildly to suit requirements but hopefully these can help you.
http://blogs.msdn.com/b/sqlcat/archive/2008/06/12/sql-server-scale-out.aspx
this one is 2005 centric but is VERY good
http://msdn.microsoft.com/en-us/library/aa479364.aspx#scaloutsql_topic4
this one a good solution for reporting...
http://msdn.microsoft.com/en-us/library/ms345584.aspx
given you an analysis services one too :)
http://sqlcat.com/whitepapers/archive/2010/06/08/scale-out-querying-for-analysis-services-with-read-only-databases.aspx
I created something like this a few years ago using views and stored procedures to bring in the data from the Master database into the subordinate databases. This would allow you to fairly easily join those master tables into the other subordinate tables.
Have you looked into using RAC? You can have multiple physical databases but only one logical database. This would solve all of your integrity issues. And you can set aside nodes just for reporting.
Don't throw out the idea of having separate applications and linking the logon/off functions via webservice (esque) requests. I have seen billing/user registration systems separated in this way. Though at extremely large scales, this might not be a good idea.

We failed trying database per custom installation. Plan to recover?

There is a web application which is in production mode for 3 years or so by now. Historically, because of different reasons there was made a decision to use database-per customer installation.
Now we came across the fact that now deployments are very slow.
Should we ever consider moving all the databases back to single one to reduce environment complexity? Or is it too risky idea?
The problem I see now is that it's very hard to merge these databases with saving referential integrity(primary keys of different database' tables can not be obviously differentiated).
Databases are not that much big, so we don't have much benefits of reduced load by having multiple databases.
Your question is quite broad.
a) Ensure that merged databases don't suffer from degraded performance with things like JOIN statements when, say, 1000 databases are merged even though each is small. As for your referential integrity ... which I assume is auto_increment based ... you can replace these relationships by altering the schema and supplanting UUID or a similar unique, non-sequential value. Or even a surrogate key pair in addition to your auto increment PK.
b) Do benchmarking to ensure your application would respond within performance limits
c) Is there a direct ROI for doing this? What are the long term cost benefits vs the expense of migration? Is the decreased complexity worth increased (if any) cost?
d) How does this impact your backup and disaster recovery plans? Does it make them cheaper? Slower? More expensive?
Abstraction and management tools approach:
if it were me, depending on the situation, I would keep the scalability that comes with per-client sharding and create a set of management tools to abstractly create one virtual database. Using these tools you can acquire the simplified management without loosing technical flexibility. I suspect you want to simplify the cost of managing all these databases (based on your deployment statement). Creating a 'control panel' for your farm can be a good way to simplify a complex system (especially when deployments may use different schema versions).
For the migrated data... customer one database UUIDs can start with 10000000, Customer two database UUIDs can start with 20000000. Customer three database UUIDs can start with 30000000.....
In my opinion when you host the database for your customers, a single database that handles multiple customers is a better idea overall. Of course you need to add a "customers" table to record the customers, and a "customer_id" column on all top-level data that is within the table, and include checks in all your SQL to ensure the customer's view is limited to their own data.
I'd set up a new database with the additional columns, and then test it with a dummy customer or three for a while to ensure all bugs are wiped out. Then I'd migrate all the customers across, one by one, doing checks that the data will fit.

User Table in Separate DB

Note: I have no intention of implementing this, it's more of a thought experiment.
Suppose I had multiple services available through a web interface. At least two of which required user registration and some data in a database. A single registration would grant access to all services. Like Google (GMail, Google Docs, etc.).
Would all of these services, which are related to registered users, be located within a single database, perhaps with table-prefixes for what service they were for?
Or would each service have it's own database? The only plus I can see to doing this is that it would make table names cleaner. Any time any user interaction would be needed, interacting with at least two different databases would be needed, which would needlessly complicate sql queries.
Would this suggest that the 'big boys' use only a single database, and load it with tons of different (and perhaps completely unrelated) tables?
If you use the right DBMS, you can have the best of both strategies. In PostgreSQL, within a 'database' you can have separate schemas. The authentication service would access a single schema and provide the other services a key which is used as a reference for data in the other schemas. You can still deal with the entire database as a single entity i.e:
query across schemas without using dblink
store personally identifiable information separately (schemas can have separate per-user permissions to further protect data)
DBMS managed foreign key constrains (I believe)
consistent (re the data) backup and restore
You get these advantages at the cost of a more complex DAL (may not be supported by your favorite DAL framework) and less portability between DBMS's.
I do not think it is a good idea to make multiple services dependent on a single database. If you need to restore some service from a backup, you'll have to restore all.
You are overloading a database server probably too.
I would do that only if it is likely they will share much data at future point.
Also you might consider smaller database with only the shared user data.
I would consider having 1 user / role repository with a separate database for services.
I've never done this, but I think it would depend on performance. If there's almost no overhead to do separate databases, that might be the answer. Doing separate DBs may also make it easy to split DBs across machines.
Complexity is also an issue. Hopefully your schema would be defined in such a way that you wouldn't need to dip into several different databases for different queries.
There's always a problem with potentially overloading databases and access thereof; replication is one potential good solution.
There are several strategies.
When you move to multiple databases (or multiple servers), things get more complex. Your core user information could be in a single database. The individual services could be in other databases. The problem with that is that the database is the outer unit of referential integrity, so you cannot design in foreign keys across databases. One way around this is to distribute changes to the core master tables (additions and updates only, obviously, since deletions would be forbidden due to a foreign-key constraint) to separate databases on a regular basis, and then enforce RI against these copies of the core master database tables within the service databases. This also means that the service databases and their services can run while the other databases are down for maintenance. Obviously this is an increased architectural complexity for an improvement to your service windows and reduced coupling.
I would recommend starting with a single database. If your RDBMS supports it, I would organize components according to SCHEMAs which would allow you to at least maintain a logical separation by design. You can more easily refactor later.
Many databases have tables which can be considered unrelated. Sometimes in a system you have multiple entity networks that hardly connect (sometimes not at all). You can use SCHEMAs in these cases too.

Resources