SQL Server: comparing database designs

I have a project with a large database, and I want to know which design is better: one database with 5 tables, or 5 databases with 1 table each?
My database runs on a server. If I want to scale the database out to more than one server, does that change the answer?

As a general prescription, a database isn't going to perform better just because it only has one table. Further, the concept of sharding and its performance benefits (or lack thereof) is far too broad a topic for this forum.
So, to make your life easier, make one database with five tables, and optimize those tables properly. Build indexes where they are necessary, based on how you query the database. Build covering indexes where possible. And don't over-index if the application is write-intensive.
Remember, optimizing a database is much more involved than a general design pattern and cannot be done well by an outside source without an excessive amount of information.
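As a small illustration of the covering-index advice, assuming a hypothetical Orders table queried by customer and date:

```sql
-- The key columns support the WHERE/ORDER BY; the INCLUDE columns let the
-- query be answered entirely from the index (a "covering" index), avoiding
-- lookups into the base table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalAmount, Status);
```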


Multiple Databases Vs Single Database with logically partitioned data

I am pondering over a database design issue. Any help would be highly appreciated.
We are designing an application that has 20 tables (which may grow to a maximum of about 30 during new feature development).
The technology stack
MVC 4, .NET 4.x, Entity Framework 5, SQL Server 2012, ASP.NET membership framework
Number of users
We intend to cater to about 1,000 clients, each with on average 20 users.
The Question
Should we design the database and the application in such a way that the tables are logically partitioned, i.e. all clients use the same tables, with a partition GUID to separate their data?
OR
Go with multiple databases, which could prove difficult during new feature launches and bug fixing, but could potentially allow for scaling?
Caveat: one of the tables has a binary column that stores files (maximum 5 MB per record).
In addition to this, we need to consider the membership framework tables, which we will extend with another custom table that logically maps users to a partition GUID.
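For what it's worth, that mapping table might be sketched like this (the table and column names are assumptions, not from the post; dbo.aspnet_Users is the standard membership table):

```sql
-- Hypothetical mapping of ASP.NET membership users to a tenant partition.
CREATE TABLE dbo.UserTenantMap
(
    UserId uniqueidentifier NOT NULL
        CONSTRAINT FK_UserTenantMap_aspnet_Users
        REFERENCES dbo.aspnet_Users (UserId),
    PartitionGuid uniqueidentifier NOT NULL,
    CONSTRAINT PK_UserTenantMap PRIMARY KEY CLUSTERED (UserId)
);
```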
You'll wish you had used separate databases:
If you ever want to grant permissions to the databases themselves to clients or superusers.
If you ever want to restore just one client's database without affecting the data of the others.
If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger or more important customers to different hardware, possibly in a different part of the world.
If you ever want to easily archive and decommission old customer data.
If your customers care about their data being siloed, and they find out that you did otherwise.
If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these (see the sketch after this list).
When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
When you want to have different levels of backup and recovery protection for different clients.
Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
When your database backup takes 5 hours to run and then fails.
When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.
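Here is a minimal sketch of the view-wrapping hint from the list above. The function name, the use of CONTEXT_INFO, and the Orders table are all assumptions; the application would run SET CONTEXT_INFO with the tenant's id immediately after opening each connection.

```sql
CREATE FUNCTION dbo.CurrentCustomerID()
RETURNS int
AS
BEGIN
    -- The first 4 bytes of the session's context info hold the customer id.
    RETURN CAST(CAST(CONTEXT_INFO() AS binary(4)) AS int);
END;
GO

-- Application code queries the view, never the base table.
CREATE VIEW dbo.CustomerOrders
AS
SELECT OrderID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE CustomerID = dbo.CurrentCustomerID();
```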
You'll be glad you used separate databases:
When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.
You'll wish you had used a single database:
When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
When you didn't think how hard it would be to merge two different databases of data.
When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on every single one, reporting any failures in a way that is guaranteed to be seen and that is easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
When you find out you have to set up replication for 1000 databases.
Just because I listed more reasons for one doesn't mean it is better.
Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition.
If you would describe your architecture as "multi-tenant", Microsoft has a good article on the subject that is worth reading. It compares the "isolated" (multiple databases) and "shared" (single database) approaches. Generally, shared wins when the number of tenants (clients) is large, but when each tenant's data is large, an isolated approach is recommended.
These trade-offs, however, can really only be weighed by experienced developers.
Even if you do use the isolated (multiple-database) architecture, you won't see a direct performance benefit while the databases all run on the same instance. And if you use the shared (single-database) architecture, consider using int keys instead of GUIDs, or sequential GUIDs if you must use them.
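If GUID keys are unavoidable in the shared model, NEWSEQUENTIALID() avoids the page splits that random NEWID() values cause in a clustered index. A minimal sketch with a hypothetical table:

```sql
CREATE TABLE dbo.ClientData
(
    RowID uniqueidentifier NOT NULL
        CONSTRAINT DF_ClientData_RowID DEFAULT NEWSEQUENTIALID(),
    ClientID int NOT NULL,            -- int tenant key instead of a GUID
    Payload nvarchar(max) NULL,
    CONSTRAINT PK_ClientData PRIMARY KEY CLUSTERED (RowID)
);
```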

SQL Server Database Optimization Strategy

I am starting a new project using SQL Server for a medical office. Their current database (SQL Server 2008) has over 500,000 rows spread across 15+ tables. Currently they are complaining that their data entry application is very slow at generating reports and inserting new data.
For my new system I was thinking of developing a two-tiered database approach, where the primary SQL Server 2012 database would contain only the most recent 3 months of rows, and a second SQL Server 2012 database would maintain all of the data for the system. This way, when users insert new data it goes into a much smaller database, and when they query recent data the query should execute much faster. This system will also have reporting, but I think the reports will have to be generated from the larger data set.
My questions are as follows:
Will a solution like this improve the overall performance of the database?
Are there any scalability concerns with this solution?
What is the best way to transfer that data between the two servers each night?
If my solution makes no sense please feel free to offer any other solutions.
Don't do this. Splitting your app into multiple databases will be a management nightmare. Plus, 500k records isn't that many, assuming that the records are of reasonable size.
Instead, go after the low-hanging fruit. Turn on logging and look at the access patterns. Which queries are slow? Figure out why. Do they lack indexes? Can the queries be simplified? Debug the problem.
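For the "which queries are slow" step, the plan-cache DMVs (standard since SQL Server 2005) give a quick first pass:

```sql
-- Top 10 cached statements by total CPU; times are in microseconds.
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_us,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```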
Keep in mind that sometimes throwing hardware at the problem is the right solution. If you can solve the problem with an $800 server, do it. That's a lot cheaper than your time.
To chime in: 500K records is not that big. You ought to be able to make the DB work very speedily as-is with some tuning.

We failed trying database-per-customer installation. Plan to recover?

There is a web application that has been in production for about 3 years now. Historically, for various reasons, the decision was made to use a database-per-customer installation.
We have now found that deployments are very slow.
Should we consider merging all of the databases back into a single one to reduce environment complexity? Or is that too risky an idea?
The problem I see is that it's very hard to merge these databases while preserving referential integrity (the primary keys of the different databases' tables cannot easily be kept distinct).
The databases are not that big, so we don't gain much benefit in reduced load from having multiple databases.
Your question is quite broad.
a) Ensure that the merged database doesn't suffer degraded performance with things like JOIN statements when, say, 1000 databases are merged, even though each is small. As for your referential integrity (which I assume is auto-increment based): you can replace these relationships by altering the schema and substituting a UUID or a similar unique, non-sequential value, or even a surrogate key pair in addition to your auto-increment PK.
b) Do benchmarking to ensure your application would respond within performance limits
c) Is there a direct ROI for doing this? What are the long term cost benefits vs the expense of migration? Is the decreased complexity worth increased (if any) cost?
d) How does this impact your backup and disaster recovery plans? Does it make them cheaper? Slower? More expensive?
Abstraction and management tools approach:
If it were me, depending on the situation, I would keep the scalability that comes with per-client sharding and create a set of management tools that abstractly present one virtual database. Using these tools you get the simplified management without losing technical flexibility. I suspect you want to reduce the cost of managing all these databases (based on your deployment statement). Creating a "control panel" for your farm can be a good way to simplify a complex system (especially when deployments may use different schema versions).
For the migrated data: customer one's database UUIDs can start at 10000000, customer two's at 20000000, customer three's at 30000000, and so on.
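A sketch of that offsetting idea during migration (the database and table names here are made up):

```sql
-- Shift each source database's integer keys into a disjoint range so the
-- merged rows cannot collide (assumes well under 10,000,000 rows per customer).
DECLARE @CustomerID int = 1;
DECLARE @Offset bigint = @CustomerID * 10000000;

INSERT INTO Merged.dbo.Orders (OrderID, CustomerID, OrderDate)
SELECT o.OrderID + @Offset, @CustomerID, o.OrderDate
FROM Customer1.dbo.Orders AS o;
```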
In my opinion, when you host the database for your customers, a single database that handles multiple customers is a better idea overall. Of course you need to add a "customers" table to record the customers, a "customer_id" column on all top-level tables, and checks in all your SQL to ensure the customer's view is limited to their own data.
I'd set up a new database with the additional columns, and then test it with a dummy customer or three for a while to ensure all bugs are wiped out. Then I'd migrate all the customers across, one by one, doing checks that the data will fit.

What should I keep in mind if I wish to merge many DBs into one DB?

I am working with a half dozen DBs. The DBs all have the same schemas, the same SPs, etc. Speaking to the person who originally designed the DBs, a big part of the motivation for using many DBs was efficiency; the alternative would be to add a column to pretty much every table and SP in the database indicating which set of data was being worked on, resulting in one giant (and thus slower) DB instead of several small DBs. In place of having a column to indicate which set of data is being queried, the connection string is used to select which database is being hit.
The only reason I really dislike this organization is that it involves a lot of code duplication and thus hurts maintenance. For example, every time I wish to change a stored procedure, I need to run the alter statement on every database.
One solution I have considered is to combine all of the data into one big database, adding an extra column all over the place to indicate which database the data would be in if I had not combined it. Then, I could partition all of the tables by this column's value. In theory, the result of all of this is that the underlying representation of all of the data itself will be morally the same as it is now, but without the redundancies in the indexes, schemas, SPs, etc.
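For reference, that partitioning idea might look like the sketch below; note that table partitioning required Enterprise edition before SQL Server 2016 SP1, and every name here is illustrative.

```sql
-- Partition the combined table by a SourceDb column, one partition per
-- original database (boundary values 1..5 yield six partitions).
CREATE PARTITION FUNCTION pfSourceDb (tinyint)
    AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5);

CREATE PARTITION SCHEME psSourceDb
    AS PARTITION pfSourceDb ALL TO ([PRIMARY]);

CREATE TABLE dbo.CombinedData
(
    SourceDb tinyint NOT NULL,    -- which original database the row came from
    RecordID int NOT NULL,
    Payload nvarchar(200) NULL,
    CONSTRAINT PK_CombinedData PRIMARY KEY CLUSTERED (SourceDb, RecordID)
) ON psSourceDb (SourceDb);
```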
My questions are these:
Is this a good idea? Is there a better way to accomplish this?
Are there any gotchas in doing this?
Will this have any impact on performance?
Everyone will deal with this at some point. My own personal opinion is that multiple databases are a pain in the backside and are not faster. They are a pain because of the maintenance headaches. Adding an extra column to each table as necessary will not slow your process down that much, if indexing is set properly. And your maintenance will be much easier. Plus, doing transactions across multiple DBs can be a hassle and involve DTC (the Distributed Transaction Coordinator).
BTW, using a single database is often called a multi-tenant database. You might want to research this a bit. But I would avoid multiple DB's like this if possible.
I'm of a different mind than Randy.
The multi-tenant model has its advantages.
For one, maintenance is not really much different whether you have 5 databases or 500. At some point you stop looking at maintenance of individual databases and look at the set. Yes you must serialize backups and you can't be performing index reorg/rebuild across all databases at once.
But for code changes across multiple more-or-less identical databases, there are easy ways to script a lot of things to be done to multiple databases without really lifting an extra finger. I use a tool called SQLFarms Combine (now sold by JNetDirect), but there are other offerings such as Red Gate SQL Multi Script that I haven't played with.
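Even without a third-party tool, a low-tech cursor can run one command in every tenant database (the Tenant% naming filter is an assumption):

```sql
-- The three-part call to sys.sp_executesql runs the batch in that
-- database's context.
DECLARE @db sysname, @proc nvarchar(300);

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE name LIKE N'Tenant%';

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @proc = QUOTENAME(@db) + N'.sys.sp_executesql';
    EXEC @proc N'EXEC sp_updatestats;';   -- any batch could go here
    FETCH NEXT FROM dbs INTO @db;
END;
CLOSE dbs;
DEALLOCATE dbs;
```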
What I like most about the multi-tenant model is that when you grow and scale and suddenly need a new database server, it is very easy to move one of the tenants (say, the busiest or fastest growing) to the new server. If everybody is jammed into the same database, this extraction of only their data becomes quite difficult, especially if there is to be minimized downtime. In the multi-tenant model, you can set up mirroring for just their database, and then switch the primary when you're ready.
I'd be in favor of combining these databases. There are other facilities built into SQL Server to account for the potential performance pitfalls of a very large database, like additional indexing on a second physical disk, partitioning, clustering, etc. The headache and overhead involved in deploying schema updates to that many different databases can be time-consuming when it's easily handled in a single database. I think SQL Server scales really well in cases like this: let the database server do what it's designed to do and provide responsive access to your data. You can focus on application design and leave the storage model to SQL Server.
Also, though this isn't mentioned above, I'd suspect that there's some level of dynamic SQL involved in the applications that use this "many database" model, because you've got to switch between databases based on something you know, so it can't be hard-coded into the application or in a configuration file. That means either connection strings or actual SQL statements have to be generated on the fly, and that can be a really big security risk (read about "SQL injection" if you're unfamiliar with the potential risks of dynamic SQL).

What's the best way to manage a large number of tables in MS SQL Server?

This question is related to another:
Will having multiple filegroups help speed up my database?
The software we're developing is an analytical tool that uses MS SQL Server 2005 to store relational data. Initial analysis can be slow (since we're processing millions or billions of rows of data), but there are performance requirements on recalling previous analyses quickly, so we "save" results of each analysis.
Our current approach is to save analysis results in a series of "run-specific" tables, and the analysis is complex enough that we might end up with as many as 100 tables per analysis. Usually these tables use up a couple hundred MB per analysis (which is small compared to our hundreds of GB, or sometimes multiple TB, of source data). But overall, disk space is not a problem for us. Each set of tables is specific to one analysis, and in many cases this provides us enormous performance improvements over referring back to the source data.
The approach starts to break down once we accumulate enough saved analysis results; before we added more robust archive/cleanup capability, our testing database climbed to several million tables. But it's not a stretch for us to have more than 100,000 tables, even in production. Microsoft places a pretty enormous theoretical limit on the size of sysobjects (~2 billion), but once our database grows beyond 100,000 or so tables, simple operations like CREATE TABLE and DROP TABLE can slow down dramatically.
We have some room to debate our approach, but I think that might be tough to do without more context, so instead I want to ask the question more generally: if we're forced to create so many tables, what's the best approach for managing them? Multiple filegroups? Multiple schemas/owners? Multiple databases?
Another note: I'm not thrilled about the idea of "simply throwing hardware at the problem" (i.e. adding RAM, CPU power, disk speed). But we won't rule it out either, especially if (for example) someone can tell us definitively what effect adding RAM or using multiple filegroups will have on managing a large system catalog.
Without first seeing the entire system, my first recommendation would be to save the historical runs in combined tables with a RunID as part of the key; a dimensional model may also be relevant here. This table can be partitioned for performance, which will also allow you to spread the table across other filegroups.
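In sketch form, with made-up column names:

```sql
-- One combined results table keyed by RunID replaces the per-run tables;
-- RunID can also drive partitioning and filegroup placement.
CREATE TABLE dbo.AnalysisResults
(
    RunID int NOT NULL,
    MetricID int NOT NULL,
    Value float NOT NULL,
    CONSTRAINT PK_AnalysisResults PRIMARY KEY CLUSTERED (RunID, MetricID)
);
```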
Another possibility is to put each run in its own database and then detach them, only attaching them as needed (and in read-only form).
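That detach/attach cycle could look roughly like this (the run-database name and file paths are assumptions):

```sql
-- Detach a finished run's database...
EXEC sp_detach_db @dbname = N'Run0001';

-- ...and later reattach it read-only when someone recalls that run.
CREATE DATABASE Run0001
    ON (FILENAME = N'D:\Runs\Run0001.mdf'),
       (FILENAME = N'D:\Runs\Run0001_log.ldf')
    FOR ATTACH;

ALTER DATABASE Run0001 SET READ_ONLY;
```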
CREATE TABLE and DROP TABLE are probably performing poorly because the master or model databases are not optimized for this kind of behavior.
I also recommend talking to Microsoft about your choice of database design.
Are the tables all different structures? If they are the same structure you might get away with a single partitioned table.
If they are different structures, but just subsets of the same set of dimension columns, you could still store them in partitions in the same table with nulls in the non-applicable columns.
If this is analytic (derivative pricing computations perhaps?) you could dump the results of a computation run to flat files and reuse your computations by loading from the flat files.
This seems to be a very interesting problem/application that you are working with. I would love to work on something like this. :)
You have a very large problem surface area, and that makes it hard to start helping. There are several solution parameters that are not evident in your post. For example, how long do you plan to keep the run analysis tables? There are a LOT of other questions that need to be asked.
You are going to need a combination of serious data warehousing, and data/table partitioning. Depending on how much data you want to keep and archive you may need to start de-normalizing and flattening the tables.
This would be pretty good case where contacting Microsoft directly can be mutually beneficial. Microsoft gets a good case to show other customers, and you get help directly from the vendor.
We ended up splitting our database into multiple databases. So the main database contains a "databases" table that refers to one or more "run" databases, each of which contains distinct sets of analysis results. Then the main "run" table contains a database ID, and the code that retrieves a saved result includes the relevant database prefix on all queries.
This approach allows the system catalog of each database to be more reasonable, it provides better separation between the core/permanent tables and the dynamic/run tables, and it also makes backups and archiving more manageable. It also allows us to split our data across multiple physical disks, although using multiple filegroups would have done that too. Overall, it's working well for us now given our current requirements, and based on expected growth we think it will scale well for us too.
We've also noticed that SQL 2008 tends to handle large system catalogs better than SQL 2000 and SQL 2005 did. (We hadn't upgraded to 2008 when I posted this question.)
