SQL Server database with MASSIVE amount of tables - sql-server

I've been asked to troubleshoot performance problems in a SQL Server 2005 database.
The challenge is not a huge amount of data, but the huge number of tables. There are more than 30,000 tables in a single database. The total data size is about 650 GB.
I don't have any control over the application that creates all those tables. The application uses roughly 2,500 tables per "division" at a large company with 10-15 divisions.
How do you even start to check for performance problems? All the articles you find on VLDB (Very Large DB) are about the amount of data, not the number of tables.
Any ideas? Pointers? Hints?

Start like any other kind of performance tuning. Among other things, you should not assume that the large number of tables constitutes a performance problem. It may be a red herring.
Instead, ask the users, "What's slow?" Even if you measured the performance (using the Profiler, perhaps), your numbers might not match the perceived performance problem.

As others have noted, the number of tables is probably indicative of a bad design, but it is far from a slam dunk that it is the source of the performance problems.
The best advice I can give you for any performance optimization is to stop guessing about the source of the problem and go look for it. Above all else, don't start optimizing until you have positively identified the source of the problem.
I'd start by running some traces on the database to identify the poorly performing queries. This would also tell you which tables are used the most by the application. In all likelihood a large number of those tables are either: A) leftover temp tables; B) no longer used; or C) working tables someone didn't clean up.
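To make the trace step concrete, here is a minimal sketch of how exported trace rows could be summarized. It assumes the trace has already been saved somewhere and loaded as a list of dicts with hypothetical keys `text` and `duration_ms`; the literal-stripping and table-name regexes are rough illustrations, not a real SQL parser.

```python
import re
from collections import Counter, defaultdict

# Rough pattern for table references; an assumption, not a full T-SQL parser.
TABLE_REF = re.compile(r"\b(?:from|join|into|update)\s+([\w.\[\]]+)")


def normalize(sql):
    """Collapse literals so the same statement with different values groups together."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # stand-alone numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def summarize(events):
    """Return (statement, stats) pairs, most total duration first."""
    totals = defaultdict(lambda: {"calls": 0, "duration_ms": 0})
    for event in events:
        stats = totals[normalize(event["text"])]
        stats["calls"] += 1
        stats["duration_ms"] += event["duration_ms"]
    return sorted(totals.items(), key=lambda kv: kv[1]["duration_ms"], reverse=True)


def table_usage(events):
    """Count how often each table name appears across the traced statements."""
    usage = Counter()
    for event in events:
        usage.update(TABLE_REF.findall(normalize(event["text"])))
    return usage
```

Tables that never show up in `table_usage` over a representative period are candidates for the "leftover or unused" categories above, though that still needs confirming with the application owner.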

Putting the poor DB design aside, if no users are reporting slow response times then you don't currently have a performance problem.
If you do have a performance problem:
1) Check for fragmentation (DBCC SHOWCONTIG)
2) Check the hardware specs, RAID/drive/file placement. Check the SQL Server error logs.
If the hardware seems underspecified or poorly designed, run Performance counters (see the PAL tool)
3) Gather trace data during a normal query work load and identify expensive queries (see this SO answer: How Can I Log and Find the Most Expensive Queries?)

Is the software creating all these tables? If so, maybe the same errors are being repeated over and over. Do all the tables have a primary key? Do they all have a clustered index? Are all the necessary non-clustered indexes present (on the columns that are used for filtering and joins), and so on?
Is upgrading to SQL Server 2008 an option? If so, you could take advantage of the new Policy Based Management feature to enforce best practices across this large number of tables.
To start tuning now, I would use Profiler to find the statements with the longest duration, then see what you can do to improve them (adding indexes is usually the simplest way).
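With 30,000 tables, the primary key and clustered index questions are easier to answer from the catalog than by hand. The sketch below assumes the rows come from a query like the one in `CATALOG_QUERY` (shown for context only, not executed here); the dict keys and the `find_gaps` helper are hypothetical names for illustration.

```python
# T-SQL that could produce the input rows (SQL Server 2005 or later).
# It is only included for context; this script does not connect to a server.
CATALOG_QUERY = """
SELECT SCHEMA_NAME(schema_id) AS schema_name,
       name AS table_name,
       OBJECTPROPERTY(object_id, 'TableHasPrimaryKey') AS has_pk,
       OBJECTPROPERTY(object_id, 'TableHasClustIndex') AS has_clustered
FROM sys.tables;
"""


def find_gaps(rows):
    """Group tables by the basic structural check they fail."""
    gaps = {"no_primary_key": [], "heap": []}
    for row in rows:
        full_name = f"{row['schema_name']}.{row['table_name']}"
        if not row["has_pk"]:
            gaps["no_primary_key"].append(full_name)
        if not row["has_clustered"]:
            gaps["heap"].append(full_name)  # a table without a clustered index
    return gaps
```

If the same table name shows up as a heap in every division's set of tables, that points at the generating application rather than at one-off mistakes.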

Related

Tips for improving performance of DB that is above size 40 GB (Sql Server 2005) and growing monthly by around 3GB

The current DB of our project has crossed 40 GB this month, and on average it is growing by around 3 GB per month. All the tables are well normalized and proper indexing has been used. But as the size grows, it is taking more time to run even basic queries like 'select count(1) from table'. Can you share some more points that will help on this front? The database is SQL Server 2005. Further, if we implement partitioning, wouldn't it create an overhead?
Thanks in advance.
make sure you have suitable/appropriate indexes
make sure you have a good index maintenance strategy (e.g. rebuild/defrag/keep statistics up to date to ensure indexes stay performing well)
identify poorly performing queries and optimise them (may have been written/tested against small data volumes when performance issues would not have shown up)
consider partitioning your data (e.g. SQL 2005 and onwards has built-in support for partitioning if you have Enterprise Edition). Edit: to elaborate on SQL Server partitioning, I fully recommend a read through this MSDN article on the whys and the hows. On a general note, there was also a good talk at QCon 2008 by Randy Shoup (eBay architect) on scalability, in which one of the key points on scaling a system in general is to partition. It's summarised here.
is your db server hardware sufficient? could it benefit from more memory?
Edit: looking at your comment with your hardware info, I think you could do with (at least) throwing more RAM in it
you may benefit from some denormalisation. Difficult to be specific without knowing exact db structure, but denormalising may improve certain queries at the expense of data duplication/disk space
A 40 GB database is by no means considered a big database these days. And a 3 GB growth per month is also nothing unusual.
However, at this size you do have to be careful about some small things that you might get away with in smaller databases. Since you write about issuing a "SELECT COUNT(1) ..." query, you might want to think about the need for such queries. It sounds like a "displaying number of rows in the table" type of feature. Do you really need this kind of "basic query", as you call it, or can you do without? Considering this query in particular: do you need the result to be accurate, or would a good estimate do? If an estimate is enough, you might want to throw in a WITH (NOLOCK) hint here and there, where accuracy is not mandatory. However, use NOLOCK wisely, as it will return wrong data at an incredible speed. :-)
Plenty of good suggestions have been mentioned by AdaTheDev; let me just add one point:
Nothing gives you better performance than a sound and solid schema. And, who knows, what may have been considered appropriate at the time when you designed the schema may need to be revised now after being in production for some time. This is especially true for indices.
Your machine is quite low spec; however, as you've not even mentioned what disks you're using, that is most likely the problem. You will need very fast disks to support a 40GB database with 4GB of RAM; multiple striped drives would be a bare minimum.
I would start by using Performance Monitor and SQL Server Profiler to find out which are the most critical performance limits on your system. After that you will probably have a good idea where to start.
Here is one place to start:
Troubleshooting Performance Problems in SQL Server 2005
I think at some point you will need partitioning. It is a fundamental approach to scaling the database.
Also, you might want to go back to basics and rethink the reason the DB is growing, and its structure. 3GB a month extra (and probably increasing if you are successful) is going to get you into trouble sooner or later ;-)

What can cause bad SQL server performance?

Every time I find that data retrieval from my database is slow, I try to figure out which part of my SQL query has the problem, optimize it, and also add some indexes to the table. But this does not always solve the problem.
My question is:
Are there any other tricks to make SQL Server performance better?
What are the other reasons which can make SQL Server performance worse?
Inefficient query design
Auto-growing files
Too many indexes to be maintained on a table
Too few indexes on a table
Not properly choosing your clustered index
Index fragmentation due to poor maintenance
Heap fragmentation due to no clustered index
Too high FILLFACTORs used on indexes, causing excessive page splitting
Too low of a FILLFACTOR used on indexes, causing excessive space usage and increased scanning time
Not using covered indexes where appropriate
Non-selective indexes being used
Improper maintenance of statistics (out of date statistics)
Databases not normalized properly
Transaction logs and data sharing the same drive spindles
The wrong memory configuration
Too little memory
Too little CPU
Slow hard drives
Failing hard drives or other hardware
A 3D screensaver on your database server chewing up your CPU
Sharing the database server with other processes which compete for CPU and memory
Lock contention between queries
Queries which scan entire large tables
Front end code which searches data in an inefficient manner (nested loops, row by row)
CURSORS which are not necessary and/or are not FAST_FORWARD
Not setting NOCOUNT when you have large tables being cursored through.
Using a transaction isolation level which is too high (such as using SERIALIZABLE when it's not necessary)
Too many round trips between the client and the SQL Server (a chatty interface)
An unnecessary linked server query
A linked server query which targets a table on a remote server with no primary or candidate key defined
Selecting too much data
Excessive query recompilations
oh and there might be some others, too.
When I talk to new developers that have this problem I usually find that it is because of one of two problems. Both of them are fixed if you follow these 2 rules.
First, don’t retrieve any data that you don’t need. For example, if you are doing paging then don’t bring back 100 rows and then calculate which ones belong on the page. Have the stored proc figure it out and only retrieve the 10 you need.
Second, nothing is faster than work you don’t do. For example, I worked on a system where the full roles and rights for a user were retrieved with every page requested – this was 100’s of rows for some users. Even just saving this to session state on the first request and then using it from there for subsequent requests took a meaningful weight off of the database.
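The roles-and-rights example can be sketched as a small per-session cache. This is only an illustration under assumptions: `RoleCache`, `load_roles`, and the session identifiers are hypothetical names, and a real web framework would normally provide its own session store.

```python
class RoleCache:
    """Load a user's roles once per session instead of once per page request."""

    def __init__(self, load_roles):
        # load_roles is the expensive call that actually hits the database.
        self._load_roles = load_roles
        self._by_session = {}

    def roles_for(self, session_id, user_id):
        # Only the first request in a session reaches the database.
        if session_id not in self._by_session:
            self._by_session[session_id] = self._load_roles(user_id)
        return self._by_session[session_id]

    def end_session(self, session_id):
        # Drop cached rights on logout so a later login sees fresh data.
        self._by_session.pop(session_id, None)
```

The trade-off is staleness: a rights change made mid-session is not seen until the session ends, which may or may not be acceptable for a given application.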
Suggest you get a good book on Performance tuning for the database you use (this is very much database specific). This is an extremely complex subject and cannot really be answered other than in generalities on the web.
For instance, Dave Markle tells you that inefficient queries can cause the problem, and there are many, many ways to write inefficient queries and many more ways to fix them.
If you're new to the database and you have access to the Database Engine Tuning Advisor (DTA), you can heuristically tune your database.
You basically capture the SQL queries being run against your DB in the SQL Profiler, then feed those to DTA. DTA effectively runs the queries (without altering your data) and then works out what your database is missing (views, indexes, partitions, statistics etc.) to do the queries better.
It can then apply them for you and monitor them in the future. I'm not saying to assume that DTA is always right or to do things without understanding, but I've found that it's definitely a good way to see what your queries are doing, how long they take, and how you can index the DB appropriately.
PS: With all that said, it's much better to invest in a good DBA at the start of a project so that you have good structures and indexing to start with. But that's not the position that you're in right now...
This is a very wide question, and there are a ton of answers already. Still, I would like to add one important factor: page splits. The problem is that there are good splits and bad splits. The following are good articles explaining how to use the transaction_log extended event to identify bad/nasty page splits:
Tracking Problematic Pages Splits in SQL Server 2012 Extended Events - Jonathan Kehayias
Tracking page splits using the transaction log - Paul Randal
You mentioned:
I try to optimize it and also add some indexes
But sometimes removing unused non-clustered indexes may help to improve performance, as it helps to reduce transaction log activity. Read "Top Reasons for Log Performance Problems".
"Wait statistics, or please tell me where it hurts" gives an idea about using wait statistics for performance analysis.
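As a rough illustration of the wait-statistics idea: the counters in sys.dm_os_wait_stats are cumulative, so one common approach is to take two snapshots and look at the difference. The sketch below assumes each snapshot has already been loaded into a dict mapping wait_type to wait_time_ms; `top_waits` is a hypothetical helper, not part of any SQL Server tooling.

```python
def top_waits(before, after, limit=5):
    """Rank wait types by time accumulated between two snapshots.

    before/after map wait_type -> cumulative wait_time_ms.
    Returns (wait_type, delta_ms, percent_of_total) tuples.
    """
    deltas = {w: after[w] - before.get(w, 0) for w in after}
    deltas = {w: d for w, d in deltas.items() if d > 0}
    total = sum(deltas.values())
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return [(w, d, round(100.0 * d / total, 1)) for w, d in ranked]
```

The dominant wait type over a busy interval suggests where to look next (disk, locking, log writes, and so on) before changing any indexes.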
To see some fresh ideas for performance, take a look at
Performance Considerations - sqlmag.com
Separate tables in joins to different disks (for parallel disk I/O - filegroups).
Avoid joins on columns with few unique values.
To understand JOIN, read Advanced JOIN Techniques

RDBMS data-relation burden

Our in-house system is built on SQL Server 2008 with a 40-table 6NF schema. Most of the tables FK to 3 others, a key few as many as 7. The system will ultimately support 100s of employees working with 10s of 1000s of customers and store 100s of 1000s of transactional records -- prime-time access should peak at 1000 rows per second.
Is there any reason to think that this depth of RDBMS inter-relation would overburden a system built using modern hardware with ample RAM? I'm attempting to evaluate whether we need to adjust our design or project direction/goals before we approach the final development phase (in a couple of months).
In SQL Server terms, what you describe is a smallish database. With correct design SQL Server can handle terabytes of data.
This does not guarantee that your current design will perform well. There are many ways to construct poorly performing T-SQL and many bad database design choices.
If I were you I would load test data to twice the size you expect the tables to have and then start testing your code. Load testing might also be a good idea. It is far easier to fix database performance problems before they go to production. Far, far easier!

What's the best way to manage a large number of tables in MS SQL Server?

This question is related to another:
Will having multiple filegroups help speed up my database?
The software we're developing is an analytical tool that uses MS SQL Server 2005 to store relational data. Initial analysis can be slow (since we're processing millions or billions of rows of data), but there are performance requirements on recalling previous analyses quickly, so we "save" results of each analysis.
Our current approach is to save analysis results in a series of "run-specific" tables, and the analysis is complex enough that we might end up with as many as 100 tables per analysis. Usually these tables use up a couple hundred MB per analysis (which is small compared to our hundreds of GB, or sometimes multiple TB, of source data). But overall, disk space is not a problem for us. Each set of tables is specific to one analysis, and in many cases this provides us enormous performance improvements over referring back to the source data.
The approach starts to break down once we accumulate enough saved analysis results -- before we added more robust archive/cleanup capability, our testing database climbed to several million tables. But it's not a stretch for us to have more than 100,000 tables, even in production. Microsoft places a pretty enormous theoretical limit on the size of sysobjects (~2 billion), but once our database grows beyond 100,000 tables or so, simple statements like CREATE TABLE and DROP TABLE can slow down dramatically.
We have some room to debate our approach, but I think that might be tough to do without more context, so instead I want to ask the question more generally: if we're forced to create so many tables, what's the best approach for managing them? Multiple filegroups? Multiple schemas/owners? Multiple databases?
Another note: I'm not thrilled about the idea of "simply throwing hardware at the problem" (i.e. adding RAM, CPU power, disk speed). But we won't rule it out either, especially if (for example) someone can tell us definitively what effect adding RAM or using multiple filegroups will have on managing a large system catalog.
Without first seeing the entire system, my first recommendation would be to save the historical runs in combined tables with a RunID as part of the key - a dimensional model may also be relevant here. This table can be partitioned for improvement, which will also allow you to spread the table into other filegroups.
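A minimal sketch of the combined-table idea, using SQLite through Python's sqlite3 module purely as a stand-in so the example is runnable anywhere. The table and column names (`analysis_result`, `run_id`, `item_id`, `value`) are hypothetical, and SQL Server partitioning and filegroups are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One shared table for all runs; run_id leads the key so each run's rows
# can be found (or removed) together.
conn.execute("""
    CREATE TABLE analysis_result (
        run_id  INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        value   REAL    NOT NULL,
        PRIMARY KEY (run_id, item_id)
    )
""")


def save_run(conn, run_id, values):
    """Store one run's results as rows instead of creating new tables."""
    conn.executemany(
        "INSERT INTO analysis_result (run_id, item_id, value) VALUES (?, ?, ?)",
        [(run_id, i, v) for i, v in enumerate(values)],
    )


def load_run(conn, run_id):
    """Recall a saved run by filtering on its run_id."""
    cur = conn.execute(
        "SELECT value FROM analysis_result WHERE run_id = ? ORDER BY item_id",
        (run_id,),
    )
    return [row[0] for row in cur]


def delete_run(conn, run_id):
    """Cleanup becomes a DELETE (or a partition switch) rather than DROP TABLE."""
    conn.execute("DELETE FROM analysis_result WHERE run_id = ?", (run_id,))
```

However many runs are saved, the catalog holds one table per result shape rather than one per run, which is the point of the suggestion.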
Another possibility is to put each run in its own database and then detach them, only attaching them as needed (and in read-only form).
CREATE TABLE and DROP TABLE are probably performing poorly because the master or model databases are not optimized for this kind of behavior.
I also recommend talking to Microsoft about your choice of database design.
Are the tables all different structures? If they are the same structure you might get away with a single partitioned table.
If they are different structures, but just subsets of the same set of dimension columns, you could still store them in partitions in the same table with nulls in the non-applicable columns.
If this is analytic (derivative pricing computations perhaps?) you could dump the results of a computation run to flat files and reuse your computations by loading from the flat files.
This seems to be a very interesting problem/application that you are working with. I would love to work on something like this. :)
You have a very large problem surface area, and that makes it hard to start helping. There are several solution parameters that are not evident in your post. For example, how long do you plan to keep the run analysis tables? There are a LOT of other questions that need to be asked.
You are going to need a combination of serious data warehousing and data/table partitioning. Depending on how much data you want to keep and archive, you may need to start de-normalizing and flattening the tables.
This would be a pretty good case where contacting Microsoft directly can be mutually beneficial. Microsoft gets a good case to show other customers, and you get help directly from the vendor.
We ended up splitting our database into multiple databases. So the main database contains a "databases" table that refers to one or more "run" databases, each of which contains distinct sets of analysis results. Then the main "run" table contains a database ID, and the code that retrieves a saved result includes the relevant database prefix on all queries.
This approach allows the system catalog of each database to be more reasonable, it provides better separation between the core/permanent tables and the dynamic/run tables, and it also makes backups and archiving more manageable. It also allows us to split our data across multiple physical disks, although using multiple filegroups would have done that too. Overall, it's working well for us now given our current requirements, and based on expected growth we think it will scale well for us too.
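For readers who want to picture the routing, here is a minimal sketch of the lookup described above. The class, method, and database names are hypothetical; in the real system the two mappings live in the main database's "databases" and "run" tables rather than in memory.

```python
def quote(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


class RunRegistry:
    """In-memory stand-in for the 'databases' and 'run' tables."""

    def __init__(self):
        self._databases = {}  # database_id -> database name
        self._runs = {}       # run_id -> database_id

    def add_database(self, database_id, name):
        self._databases[database_id] = name

    def add_run(self, run_id, database_id):
        if database_id not in self._databases:
            raise KeyError(f"unknown database id {database_id}")
        self._runs[run_id] = database_id

    def qualified(self, run_id, table, schema="dbo"):
        """Build the three-part name used to query a run's table."""
        database = self._databases[self._runs[run_id]]
        return ".".join(quote(part) for part in (database, schema, table))
```

A query for a saved result then targets something like `registry.qualified(17, "Summary")` instead of a bare table name.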
We've also noticed that SQL 2008 tends to handle large system catalogs better than SQL 2000 and SQL 2005 did. (We hadn't upgraded to 2008 when I posted this question.)

Will having multiple filegroups help speed up my database?

Currently, I am developing a product that does fairly intensive calculations using MS SQL Server 2005. At a high level, the architecture of my product is based on the concept of "runs" where each time I do some analytics it gets stored in a series of run tables (~100 tables per run).
The problem I'm having is that when the number of runs grows to about 1,000 or so after a few months, performance on the database really seems to drop off. Specifically, simple queries like checking for the existence of tables or creating views can take up to a second or two.
I've heard that using multiple filegroups, which I'm not currently doing, could help. Is this true, and if so, why/how would that help? Also, if there are other suggestions, even ones like, use fewer tables, I'm open to them. I just want to speed the database up and hopefully get it in a state where it will scale.
In terms of performance, the big gain in using separate files/filegroups is that it lets you spread your data across multiple physical disks. This is beneficial because with several disks, multiple data requests can be handled simultaneously (parallel is generally faster than serial). All other things being equal, this would tend to benefit performance, but the question of how much depends on your particular data set and the queries you're running.
From your description, the slow operations you're concerned about are creating tables and checking for the existence of tables. If you are generating 100 tables per run, then after 1000 runs you have 100,000 tables. I don't have much experience with creating that many tables in a single database, but you may be pressing the limits of the system tables that track the database schema. In this case, you might see some benefit by spreading your tables across more than one database (these databases could still all live within the same instance of SQL Server).
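The arithmetic behind that suggestion is simple enough to script. In the sketch below the per-database cap is a made-up planning number, not a documented SQL Server limit; the point is only to show how quickly the table count grows and how many databases a given cap implies.

```python
def total_tables(runs, tables_per_run):
    """Number of run tables accumulated so far."""
    return runs * tables_per_run


def databases_needed(runs, tables_per_run, max_tables_per_db):
    """Databases required to keep each catalog under a chosen (hypothetical) cap."""
    total = total_tables(runs, tables_per_run)
    return (total + max_tables_per_db - 1) // max_tables_per_db  # integer ceiling


# 1,000 runs at 100 tables each is 100,000 tables; with an assumed cap of
# 10,000 tables per database that means 10 databases.
print(total_tables(1000, 100), databases_needed(1000, 100, 10000))
```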
In general, the SQL Profiler tool is the best starting point for finding slow queries. There are data columns which indicate the CPU and IO cost of each SQL batch, which should point you to the worst offenders. Once you have found the problem queries, I would use the Query Analyzer to generate query plans for each of these queries, and see if you can tell what's making them slow. Do this by opening a query window, entering your query, and hitting Ctrl+L. A complete discussion of what might be slow would fill an entire book, but good things to look for are table scans (very slow for large tables) and inefficient joins.
In the end, you may be able to improve things simply by rewriting your queries, or you may have to make broader changes to the table schema. For instance, maybe there's a way to create only one or a few tables per run, instead of 100. More specifics about your particular setup would help us give a more detailed answer.
I also recommend this website for lots of tips on how to make things faster:
http://www.sql-server-performance.com/
When you talk about 100 tables per run, do you actually mean that you're creating new SQL tables? If so, I think that the architecture of your application may be the issue. I can't imagine a situation where you would need that many new tables as opposed to reusing the same few tables multiple times and simply adding a column or two to differentiate between runs.
If you're already reusing the same group of tables and new runs just mean additional rows in those tables, then the issue could simply be that the new data over time is hurting performance in one of several ways. For example:
1) The tables/indexes could be fragmented after a while. Make sure that all of your tables have a clustered index. Check for fragmentation using sys.dm_db_index_physical_stats and issue ALTER INDEX with the REBUILD option if needed to defragment them.
2) The tables could simply be too large, so that inefficiencies that went unnoticed on small tables are now obvious on the larger tables. Look into proper indexes on the tables to improve performance.
3) SQL Server will cache query plans (especially for stored procedures), but if the data in a table changes significantly over time, that query plan may no longer be appropriate. Look into sp_recompile for your stored procedures to see if that's needed.
#2 is the culprit that I see most often in real world situations. Developers tend to develop using only a small set of test data and overlook proper indexing because you can do almost anything with a table of 20 rows and it will look fast.
Hope this helps
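The fragmentation check in the list above could be scripted roughly as follows. The sketch assumes the rows come from sys.dm_db_index_physical_stats joined to the index and table names, loaded as dicts with the keys shown. The 5% and 30% thresholds follow commonly cited Microsoft guidance, and the REORGANIZE branch is an addition here (the answer itself only mentions REBUILD), so treat both as assumptions to adjust.

```python
def quote(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def maintenance_statement(row, reorganize_above=5.0, rebuild_above=30.0):
    """Return an ALTER INDEX statement for one index, or None if none is needed."""
    if row["index_name"] is None:
        return None  # heaps have no index name; skipped in this sketch
    frag = row["avg_fragmentation_in_percent"]
    if frag <= reorganize_above:
        return None  # low fragmentation, leave the index alone
    action = "REBUILD" if frag > rebuild_above else "REORGANIZE"
    target = f"{quote(row['schema_name'])}.{quote(row['table_name'])}"
    return f"ALTER INDEX {quote(row['index_name'])} ON {target} {action};"


def maintenance_plan(rows):
    """Collect the statements worth running, in input order."""
    return [s for s in map(maintenance_statement, rows) if s is not None]
```

Printing the statements for review before running anything is the safer default, since rebuilding large indexes on a busy server has its own cost.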
About 1000 of what? Single row writes? Multiple row transactions? Deletes?
A general tip would be to place the data files and log files on separate physical drives. SQL Server records every write in the log, so having those on different drives should give you generally better performance.
But SQL Server tuning depends on what the application is actually doing. There are general tips but you have to measure your own thing...
Putting the filegroups on different physical drives is what will give you the biggest performance boost. You can also split up where the indexes are housed so that table writes and index accesses hit different disks. There's a lot you can do with partitioning, but that general concept is where the biggest speed impact comes from.
It can help with performance. Moving certain tables/elements to distinct file areas/portions of the disk can reduce, to a certain extent, the amount of external fragmentation impacting the database.
I would also look at other tools, such as a SQL trace, to determine why queries etc. are slowing down. There can be other factors, such as query statistics, SP recompiles etc., that are easier to fix and can give you greater gains in performance.
Split the tables across separate physical drives. If you have that much disk IO, you need a decent IO solution. Raid 10, fast disks, split the logs and DBs onto separate drives.
Re-examine your architecture - can you use multiple databases? If you create 1000s of tables in one go, you will soon hit some interesting bottlenecks that I've not had to deal with before. Multiple DBs should solve that. Think about having one "Controlling" DB containing all your main meta-data, and then satellite DBs containing the actual data.
You don't mention any specs about your server - but we saw a decent increase in performance when we went from 8GB to 20GB RAM.
It could if you place them on separate drives - not logical but physical drives so IO is not slowing you down so much.

Resources