Multiple Databases vs Single Database with logically partitioned data - sql-server

I am pondering over a database design issue. Any help would be highly appreciated.
We are designing an application that has 20 tables (which may grow to a maximum of about 30 during new feature development).
The technology stack
MVC 4, .NET 4.x, Entity Framework 5, SQL Server 2012, ASP.NET membership framework
Number of users
We intend to cater to about 1,000 clients, each with about 20 users on average.
The Question
Should we design the database and the application in such a way that the tables are logically partitioned, i.e. all clients use the same tables, with a partition GUID separating each client's data?
OR
Go for multiple databases, which could prove difficult during new feature launches and bug fixing, but could potentially allow for scaling?
Caveats: one of the tables has a binary column which stores files (maximum 5 MB per record).
In addition to this we need to consider the membership framework tables, which we will be extending with another custom table that logically maps users to a partition GUID.
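For concreteness, a minimal sketch of what the "single database, logically partitioned" option might look like; the table and column names (Tenant, Invoice, UserTenant) are purely illustrative, not from the question:

```sql
-- Illustrative sketch only: names are hypothetical.
CREATE TABLE dbo.Tenant (
    TenantId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    Name     NVARCHAR(200)    NOT NULL
);

-- Every business table carries the partition GUID.
CREATE TABLE dbo.Invoice (
    InvoiceId INT              IDENTITY(1,1) PRIMARY KEY,
    TenantId  UNIQUEIDENTIFIER NOT NULL REFERENCES dbo.Tenant (TenantId),
    Amount    DECIMAL(18, 2)   NOT NULL
);
CREATE INDEX IX_Invoice_TenantId ON dbo.Invoice (TenantId);

-- Maps ASP.NET membership users (aspnet_Users.UserId) to their partition.
CREATE TABLE dbo.UserTenant (
    UserId   UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    TenantId UNIQUEIDENTIFIER NOT NULL REFERENCES dbo.Tenant (TenantId)
);
```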

You'll wish you had used separate databases:
If you ever want to grant permissions to the databases themselves to clients or superusers.
If you ever want to restore just one client's database without affecting the data of the others.
If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger/more important customers to different hardware. In a different part of the world.
If you ever want to easily archive and decommission old customer data.
If your customers care about their data being siloed, and they find out that you did otherwise.
If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these (see the sketch after this list).
When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
When you want to have different levels of backup and recovery protection for different clients.
Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
When your database backup takes 5 hours to run and then fails.
When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.
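To illustrate the view-wrapping idea from the filtering point above, here is a minimal sketch; the app schema, dbo.Orders, and dbo.LoginCustomerMap are assumed names, and the login-to-customer mapping is just one possible way to implement SomeUserReturningFunction():

```sql
CREATE SCHEMA app;
GO

-- Assumes a table that maps database logins to customers.
CREATE FUNCTION dbo.CurrentCustomerId()
RETURNS INT
AS
BEGIN
    RETURN (SELECT CustomerID
            FROM   dbo.LoginCustomerMap
            WHERE  LoginName = SUSER_SNAME());
END;
GO

CREATE VIEW app.Orders
AS
SELECT OrderId, OrderDate, Amount
FROM   dbo.Orders
WHERE  CustomerID = dbo.CurrentCustomerId();
GO

-- The application gets SELECT rights on app.Orders only, never on dbo.Orders,
-- so a query that forgets the CustomerID filter cannot leak other customers' rows.
```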
You'll be glad you used separate databases:
When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.
You'll wish you had used a single database:
When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
When you didn't think how hard it would be to merge two different databases of data.
When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on every single one, reporting any failures in a way that is guaranteed to be seen and is easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
When you find out you have to set up replication for 1000 databases.
Just because I listed more reasons for one doesn't mean it is better.
Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition

If you are referring to your architecture as "multi-tenant", Microsoft has a good article which is worth reading here. It shows a comparison between "isolated" (multiple databases) and "shared" (single database) approaches. Generally, shared wins when the number of tenants (clients) is large, but an isolated approach is recommended when each tenant's data is large.
Those considerations, however, can really only be weighed by experienced developers.
Still, if you do go with the isolated (multiple database) architecture, you won't get a direct performance benefit as long as the databases run on the same instance. And if you use the shared (single database) architecture, consider using int instead of guid for keys, or a sequential guid if you really need a guid.
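For example, the two key choices mentioned above might look like this in SQL Server (table names are illustrative):

```sql
-- Option 1: int surrogate key; cheapest to index and join on.
CREATE TABLE dbo.OrderA (
    OrderId  INT IDENTITY(1,1) PRIMARY KEY,
    TenantId UNIQUEIDENTIFIER NOT NULL
);

-- Option 2: if the key must be a GUID, NEWSEQUENTIALID() avoids the page splits
-- that random NEWID() values cause on a clustered primary key.
CREATE TABLE dbo.OrderB (
    OrderId  UNIQUEIDENTIFIER NOT NULL
             CONSTRAINT DF_OrderB_OrderId DEFAULT NEWSEQUENTIALID()
             CONSTRAINT PK_OrderB PRIMARY KEY,
    TenantId UNIQUEIDENTIFIER NOT NULL
);
```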

Related

Does the location of a stored procedure affect its performance?

I have a large application which contains "modules" such as Finance, HR, Sales, Customer Service.
To make the application manageable and to distribute the load, I have decided to give each module its own database on a single server. There is also going to be a Master database for holding master information such as information about users, some global lookup tables, and some security stuff.
I am now trying to decide whether to place module-specific stored procedures in their corresponding databases, or whether to keep them all in the Master database. For example, there is a stored procedure named dbo.sales_customer_orders that selects data only from the Sales database tables, and this SP is going to be executed many times by users. Should it be in the Sales database, or is it okay to keep it in the Master database in terms of performance/scalability/reliability/security?
Does it matter that a stored procedure resides in a different database to the one it's selecting from?
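For illustration, the Master-database placement being asked about might look like the sketch below; dbo.sales_customer_orders is the procedure named in the question, while Sales.dbo.CustomerOrders and the parameter are assumed:

```sql
-- Created in the Master database: reaching the Sales tables requires
-- three-part names (a cross-database call on the same instance).
CREATE PROCEDURE dbo.sales_customer_orders
    @CustomerId INT
AS
BEGIN
    SELECT o.OrderId, o.OrderDate, o.Total
    FROM   Sales.dbo.CustomerOrders AS o
    WHERE  o.CustomerId = @CustomerId;
END;
GO
-- Created in the Sales database instead, the same procedure would reference
-- dbo.CustomerOrders directly and would move with the module it belongs to.
```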
In my experience you would not see an immediate performance penalty from sharding the data across multiple databases, and this is actually a common practice in large n-tier applications. You would obviously incur some minor penalty upon moving the databases to different servers.
You could see this blog post, as well as several others on the site, which talk about the correct way to shard data and the importance of using multiple connection strings for reads and writes to facilitate scaling and possibly a caching layer later on.
How do you actually plan to develop all these databases? If you want to use SSDT, you will be drowning in cross-database dependencies. Besides, having the procedure in question reside in the Master database makes no sense if, for example, some particular customer decides not to buy the Sales module (and there is no Sales database anywhere around). In that case, calling it will lead to some very unpleasant and unexpected consequences, such as the batch being aborted and (possibly) a transaction left open.
Keep similar things together; otherwise, there will be no modularity in your approach.
Performance-wise, usually there is no difference for cross-database calls within the same SQL Server instance. If your shards are located on different instances, however, the result might be anywhere between "slightly noticeable" and "detrimental" - it depends on many factors, and not all of them can be mitigated by a DBA.

SQL Server compare design method

I have a project with a big database, and I want to know which design is better: one database with 5 tables, or 5 databases with 1 table each?
My database runs on a server. If I want to extend my DB to more than one server, does that change the answer to the question?
As a general prescription, a database isn't going to perform better just because it only has one table. Further, the concept of sharding and its performance enhancements (or not) is far too broad a topic for this forum.
So, to make your life easier, make one database with five tables and optimize those tables properly. Build indexes where they are necessary, based on how you query the database. Build covering indexes where possible, and don't over-index if the application is write-intensive.
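As an example of a covering index in the sense described above (table and column names are hypothetical):

```sql
-- The included column lets the query below be answered entirely from the index,
-- without a lookup into the clustered index or heap.
CREATE INDEX IX_Order_CustomerId_OrderDate
    ON dbo.[Order] (CustomerId, OrderDate)
    INCLUDE (Total);

SELECT OrderDate, Total
FROM   dbo.[Order]
WHERE  CustomerId = 42;
```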
Remember, optimizing a database is much more involved than a general design pattern and cannot be done well by an outside source without an excessive amount of information.

Reliable alternative to replication for continous data sync between two databases

I have one central database and 25 client databases and all have same schema.
I want that whenever some changes are done in some tables of the central database then these changes flow down to the client database.
The databases used is SQL Express so I cannot use replication.
The solution that I have today is to keep track of the changes in the central database; a program then creates a text file with these changes and sends it down to the client databases. Another program reads these text files and updates the client databases.
There are three problems with this:
1. The files get lost or arrive in jumbled order, which messes up the client data.
2. The process is slow.
3. The programs are sometimes shut down, so the whole sync flow stops.
Is there a reliable alternative that is fast and secure?
I wonder how banking software is made... it never loses transactions and it is fast.
Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use merge or insert and update to merge data on the client. That's a very reliable way of doing homebrew replication. To keep track of deleted elements you would either want to mark them as deleted or have another table to keep track of entity kind and its reference, again combined with UpdateDate for replication.
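A rough sketch of the polling approach described above, run periodically on each client; CENTRAL is an assumed linked server name, and the table and column names (Price, UpdateDate, IsDeleted) are hypothetical:

```sql
-- Pull only rows changed since the last sync and merge them locally.
DECLARE @lastSync DATETIME2 =
    (SELECT ISNULL(MAX(UpdateDate), '19000101') FROM dbo.Price);

MERGE dbo.Price AS target
USING (
    SELECT PriceId, Amount, IsDeleted, UpdateDate
    FROM   CENTRAL.CentralDb.dbo.Price      -- four-part name via the linked server
    WHERE  UpdateDate > @lastSync
) AS source
ON target.PriceId = source.PriceId
WHEN MATCHED THEN
    UPDATE SET Amount     = source.Amount,
               IsDeleted  = source.IsDeleted,
               UpdateDate = source.UpdateDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PriceId, Amount, IsDeleted, UpdateDate)
    VALUES (source.PriceId, source.Amount, source.IsDeleted, source.UpdateDate);
```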
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.
I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) is really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.
If you have one central database and 25 clients, you can easily do it with one (yes, only one) SQL Server licence for the main database. Subscribers to this database can run SQL Express. As long as users access the client databases, you are not even obliged to buy SQL CALs.
Back to banking software, be sure that they are paying good money for their server licenses! So don't be surprised if these are reliable and fast ...

Use one large database or use single databases per customer

Currently I'm working on an online web application for construction materials. Companies can log in on our website and then they can use the web app.
From the beginning the idea was to create a database per customer. But now it's becoming larger and larger (100+), so we now have 100 databases to manage.
We have to run an update script for DB maintenance approximately twice a year.
The advantage that I see is that when a customer wants to quit, we delete their database and then it's finished.
When I want to add a new customer, I have to fill the database with approximately 1,000,000 unique records for that specific customer, because every customer has different prices/materials.
For backups I use a MySQL dump script that creates a *.sql file per database, which I download every day.
What is your opinion, and what do you think?
One large DB or a database per customer?
I'm using MySQL with ASP.NET/C#...
I don't want to make a suggestion because there are far too many variables.
I do want to note, however, that my employer has 1000s of deployed databases -- we use one database per customer with replication (2+ databases).
So, the idea is workable. My job isn't related to DB management, but I do recall that we do a lot in the way of automation and online tools. Backups and DB management are handled by a team.
Ultimately, you can make the 100+ deployments work, but you are going to want to start investing in the development of utilities and tools to help automate the backup and/or management of the DBs.
Ideally, nothing (DB Management) should be done by hand. Furthermore, the connection strings should be abstracted away from a given web app deployment.
But now it's becoming larger and larger (100+), so we now have 100 databases to manage
I think you have your answer right there.
Have to agree with @Hogan - the overhead of managing that many databases is probably far from ideal, especially if you ever need to make schema changes, etc. in the future.
That said, if you use a single database are you ever likely to need to separate out a given customer's data into a standalone database/site? If this is likely, how long would it take to carry out this separation?
In essence, if it's likely to take less effort to write a set of tools to handle the above case, then I'd be tempted to go for the single database approach. However, you'll also need to factor in the likely timescales for creating a unified version of the database schemas that handle datasets for each customer, etc.
Also, are the schemas precisely the same for all of the existing 100+ databases? If not, there's potentially a world of pain if you decide to migrate the existing data into a single database.
Update - Incidentally, all of the above is a bit generalised, but it's hard to be specific without knowing more about the amount of data, and traffic, etc. in use. (e.g.: If you ever had a high demand site for a customer it would be trivial to put it onto its own DB server if you were using a per-customer database.)
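As a rough illustration of what folding the existing per-customer databases into one shared schema could involve, assuming the schemas really are identical (all database, table, and column names here are hypothetical):

```sql
-- Shared schema keyed by customer.
CREATE TABLE shared.prices (
    customer_id INT            NOT NULL,
    material_id INT            NOT NULL,
    price       DECIMAL(12, 2) NOT NULL,
    PRIMARY KEY (customer_id, material_id)
);

-- One INSERT ... SELECT per existing customer database,
-- tagging the rows with the id assigned to that customer.
INSERT INTO shared.prices (customer_id, material_id, price)
SELECT 17, material_id, price
FROM   customer_17.prices;
```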
I agree with @Hogan and @middaparke... if the schemas are the same, you should put it in one instance.
Unfortunately it is impossible to tell from here whether your schemas would benefit from reusing most of those million rows or not; if they are normalized well, then certainly it would be beneficial.
It is also impossible to tell how difficult any changes to the applications would be based on this change.
Unfortunately, it sounds like you have a large customer base with working applications, and therefore momentum to keep going in that direction - which throws you into the realm of sucking it up and dealing with it by automating the management of so many DBs... not the way you would do it from scratch, but maybe cheapest since you are where you are.

why are multiple DBs actually needed?

I was looking at godaddy.com, which says they offer up to 10 MySQL DBs, but I don't know why you would ever need more than 1, since a DB can have multiple tables. Can't multiple DBs be integrated into a single DB? Is there an example case where it's better, or even necessary, to have multiple ones? And how do you differentiate between them when you want to call them - by their directory or by a name?
I guess separation of concerns would be the most obvious answer. Just as you wouldn't put all of your functionality into one humongous class in object-oriented programming, it's a good idea to keep unrelated information separate. It's easier to wrap your head around smaller chunks of data, and future developers might otherwise start to think tables are related and aggregate data in ways they were never meant to be.
Imagine that you're doing two different projects with two different teams. Maybe you don't want one team to access the other team's tables.
There can also be a space limit on each database, and each one can be configured with specific parameters to optimize performance.
On the other hand, two different end users can each be assigned to back up one entire database, and you don't want one of them to be able to back up the other DB, because they could restore it elsewhere and access the other database's data.
I'm sure there are some pretty good DBAs on the forum who can answer this in detail.
Storing tables in different databases makes sense because you are able to back them up individually. Furthermore, you will be able to control access to each database under different NT groups (e.g. admins vs. users). Although this can be done at the individual table level, sometimes it makes sense to grant or deny access to an entire database to a particular group.
When you need to call them in SQL Server, you need to prefix the query with the database name, like this: SELECT * FROM [MyDatabase].[dbo].[MyTable].
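A minimal sketch of that per-database access control, assuming a Windows group and a database named ReportingDb (both names hypothetical):

```sql
CREATE LOGIN [DOMAIN\ReportUsers] FROM WINDOWS;
GO
USE ReportingDb;
CREATE USER [DOMAIN\ReportUsers] FOR LOGIN [DOMAIN\ReportUsers];
ALTER ROLE db_datareader ADD MEMBER [DOMAIN\ReportUsers];
-- No user is created for the group in the other databases,
-- so its members cannot access those databases at all.
```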
One other reason to use separate databases relates to whether you need full transactional recovery or not. For instance, if I have a bunch of tables that are populated on a schedule through import processes and never by the users, putting them in a separate database allows me to set the recovery model to simple, which reduces the logging (a good thing when you are loading millions of records at once). I also don't have to do a transaction log backup every fifteen minutes like I do for the database with the user-inserted data. It can also make recovery faster when needed, as the databases are smaller and thus individually take less time to recover. That won't help much when the whole server crashes, but it can help a lot if only one database gets corrupted for some reason. If the data relates to different applications, having the data in separate databases simplifies security as well. And of course sometimes we have commercial databases that we can't add tables to, so we may need a separate database to handle the things we want to add to that data (we do this, for instance, with our project management software: we have a separate database where we extract and summarize data from the PM system for reporting, and then write all our custom reports off that).
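For example, the per-database recovery settings described above might be applied like this (database names and paths are hypothetical):

```sql
ALTER DATABASE ImportStaging SET RECOVERY SIMPLE;  -- bulk-loaded data, minimal logging
ALTER DATABASE UserData      SET RECOVERY FULL;    -- user-entered data, point-in-time restore

-- Frequent log backups only where they are actually needed:
BACKUP LOG UserData TO DISK = N'D:\Backups\UserData.trn';
```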
