I have a mid-sized SQL Server 2008 database that has actuarial data in it. All of the use cases for it are read-only queries. Are there any special optimizations I should consider given this scenario? Or should I just stick with the normal rules for optimizing a database?
One strategy is to add a read-only filegroup to your DB and put your read-only tables there. A read-only filegroup allows SQL Server to make a number of optimizations, such as not taking locks on that data at all.
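A minimal sketch of setting this up; the database name (ActuarialDB), filegroup name (FG_ReadOnly), and file path are assumptions, and marking a filegroup read-only may require that no other users are in the database:

    ALTER DATABASE ActuarialDB ADD FILEGROUP FG_ReadOnly;

    ALTER DATABASE ActuarialDB
        ADD FILE (NAME = N'ActuarialData_RO', FILENAME = N'D:\Data\ActuarialData_RO.ndf')
        TO FILEGROUP FG_ReadOnly;

    -- Create or move the read-only tables onto FG_ReadOnly, then lock the filegroup down:
    ALTER DATABASE ActuarialDB MODIFY FILEGROUP FG_ReadOnly READ_ONLY;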
In addition to standard DB optimization:
Make sure all tables and indexes have zero fragmentation (see the rebuild sketch after this list)
Consider adding indexes that you may have otherwise avoided due to excessive update costs
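A sketch of checking and fixing fragmentation, assuming a placeholder table dbo.MyTable:

    -- Check fragmentation per index:
    SELECT index_id, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'LIMITED');

    -- Rebuild every index on the table to remove fragmentation:
    ALTER INDEX ALL ON dbo.MyTable REBUILD;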
In database:
Denormalize it.
Use more indexes where needed.
Aggregate some data if you need it in your reports.
In the application:
Use the READ UNCOMMITTED isolation level (see the sketch after this list).
Use autocommit to avoid long-running transactions.
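A minimal sketch of the isolation-level suggestion; the table and column names are illustrative:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Reads take no shared locks; safe here because nothing is writing to the tables.
    SELECT PolicyId, SUM(ClaimAmount) AS TotalClaims
    FROM dbo.Claims
    GROUP BY PolicyId;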
If it is read only, one thing that you can do is put indexes on just about anything that might help (space permitting). Normally adding an index is a trade-off between a performance hit to writes and a performance gain for reads. If you get rid of the writes it's no longer a trade-off.
When you load the database you would want to drop all/most of the indexes, perform the load, then put the indexes back on the tables.
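A sketch of that load pattern, with placeholder names; note that disabling a clustered index makes the table inaccessible, so only nonclustered indexes are disabled here:

    -- Disable nonclustered indexes before the load:
    ALTER INDEX IX_Claims_PolicyId ON dbo.Claims DISABLE;

    -- ...perform the bulk load here (BULK INSERT, bcp, SSIS, etc.)...

    -- Rebuilding re-enables the disabled indexes:
    ALTER INDEX ALL ON dbo.Claims REBUILD;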
I'm not sure what you consider "normal rules", but here's some suggestions.
If you're 100% certain it's read-only, you can set the transaction isolation level to READ UNCOMMITTED. This is the fastest possible read setting, but it will lead to dirty reads and phantom reads if anything is writing to the tables.
If you have Views, use Indexed Views (create clustered indexes for them). Since they will never have to be updated, the performance penalty is negated.
Take a look at this article.
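A minimal indexed-view sketch, with assumed table and column names; the view must use WITH SCHEMABINDING, two-part names, and COUNT_BIG(*) when it aggregates, and the SUM column must be non-nullable:

    CREATE VIEW dbo.vClaimTotals
    WITH SCHEMABINDING
    AS
    SELECT PolicyId,
           SUM(ClaimAmount) AS TotalClaims,
           COUNT_BIG(*)     AS RowCnt
    FROM dbo.Claims
    GROUP BY PolicyId;
    GO

    -- The unique clustered index is what materializes (indexes) the view:
    CREATE UNIQUE CLUSTERED INDEX IX_vClaimTotals ON dbo.vClaimTotals (PolicyId);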
Denormalize the data.
Apply the appropriate indexes.
Precalculate aggregations.
Implement the database atop a striped disk.
I've never seen this done but if you could somehow load the entire thing into memory (RAM disk???) that would be super fast, right?
For a read-only table, consider altering the indexes to use a fill factor of 100%.
This will increase the amount of data on each data page. More data per page, fewer pages to read, less I/O, thus better performance.
I like this option because it improves performance without code changes or table changes.
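A one-line sketch, with dbo.Claims again standing in as a placeholder table:

    -- Pack pages completely full; fine because nothing will be inserted or updated:
    ALTER INDEX ALL ON dbo.Claims REBUILD WITH (FILLFACTOR = 100);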
For performance tuning there are several things you can do. Denormalization works. Choose clustered indexes based on how the data will be queried. I don't recommend using a NOLOCK hint; I'd use the snapshot isolation level.
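A sketch of enabling snapshot isolation; the database name is a placeholder, and turning on READ_COMMITTED_SNAPSHOT requires that no other connections are using the database:

    ALTER DATABASE ActuarialDB SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Optionally make ordinary READ COMMITTED reads use row versioning as well:
    ALTER DATABASE ActuarialDB SET READ_COMMITTED_SNAPSHOT ON;

    -- In the reading session:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;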
It's also important how your database is laid out on the disks. For read-only performance, I'd recommend RAID 10, with the mdf and ldf files on separate, isolated spindles. Normally, for a production database, it would be RAID 5 for data and RAID 1 for logs. Make sure you have one tempdb data file per CPU (tempdb is used for sorting and other work tables); a good starting size is 5 GB of data and 1 GB of log per CPU. Also make sure you run your queries or procs through showplan to optimize them as well as possible, and ensure that parallelism is enabled in the server settings.
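A sketch of adding one of those extra tempdb data files; the file name, path, and sizes are illustrative:

    ALTER DATABASE tempdb
        ADD FILE (NAME = N'tempdev2',
                  FILENAME = N'T:\TempDB\tempdev2.ndf',
                  SIZE = 5GB,
                  FILEGROWTH = 512MB);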
Also, if you have the time and space and want optimal performance, I'd map out exactly where the data lives on the disks, creating filegroups and putting them on completely separate volumes backed by isolated disks.
We have a web service that pumps data into 3 database tables and a web application that reads that data in aggregated format in a SQL Server + ASP.Net environment.
So much data arrives in these tables, and so much is read back out, at such high velocity that the system has started to fail.
The tables have indexes on them, one of them is unique. One of the tables has billions of records and occupies a few hundred gigabytes of disk space; the other table is a smaller one, with only a few million records. It is emptied daily.
What options do I have to eliminate the obvious problem of simultaneously reading from and writing to multiple database tables?
I am interested in every optimization trick, although we have tried every trick we came across.
We don't have the option to install SQL Server Enterprise edition to be able to use partitions and in-memory-optimized tables.
Edit:
The system is used to collect fitness tracker data from tens of thousands of devices and to display that data in real time on thousands of dashboards.
The requirements and specifics are too broad to give a concrete answer, but one suggestion would be to set up a second database and use log shipping to keep it in sync. The original DB would be the "write" database and the new DB would be the "read" database.
Cons
- Disk space
- The read DB would be out of date by the length of time the log transfer takes
Pros
- You could possibly drop some of the indexes on the "write" DB, which could increase write performance
- You could summarize the tables in the "read" database to increase query performance
https://msdn.microsoft.com/en-us/library/ms187103.aspx
Here are some ideas, some more complicated than others; their usefulness depends heavily on the usage patterns, which aren't fully described in the question. Disclaimer: I am not a DBA, but I have worked with some great ones on my DB projects.
[Simple] More system memory always helps
[Simple] Use multiple files for tempdb (one filegroup, 1 file for each core on your system. Even if the query is being done entirely in memory, it can still block on the number of I/O threads)
[Simple] Use the SIMPLE recovery model rather than FULL for the transaction log
[Simple] Write the transaction log to a separate spindle from the rest of the data.
[Complicated] Split your data into separate tables yourself, then union them in your queries.
[Complicated] Try and put data which is not updated into a separate table so static data indices don't need to be rebuilt.
[Complicated] If possible, make sure you are doing append-only inserts (auto-incrementing PK/clustered index should already be doing this). Avoid updates if possible, obviously.
[Complicated] If queries don't need the absolute latest data, change read queries to use WITH NOLOCK on tables and remove row and page locks from indices. You won't get incomplete rows, but you might miss a few rows if they are being written at the same time you are reading.
[Complicated] Create separate filegroups for table data and index data. Place those filegroups on separate disk spindles if possible. SQL Server has separate I/O threads for each file so you can parallelize reads/writes to a certain extent.
Also, make sure all of your large tables are in separate filegroups, on different spindles as well.
[Complicated] Remove inserts with transactional locks
[Complicated] Use bulk-insert for data
[Complicated] Remove unnecessary indices
Prefer included columns over key columns if sorting or seeking on them isn't required (see the sketch after this answer)
That's kind of a generic list of things I've done in the past on various DB projects I've worked on. Database optimizations tend to be highly specific to your situation... which is why DBAs have jobs. Some of the 'complicated' answers could be simple if your architecture already supports them.
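To illustrate the included-columns point, a sketch of a covering index with assumed table and column names:

    -- Seeks on (DeviceId, ReadingTime); HeartRate and StepCount ride along at the leaf
    -- level so the query never has to touch the base table:
    CREATE NONCLUSTERED INDEX IX_Readings_Device_Time
    ON dbo.Readings (DeviceId, ReadingTime)
    INCLUDE (HeartRate, StepCount);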
How would you scale writes without resorting to sharding (especially with SQL Server 2008)?
Normally: avoid indexes and foreign keys on big tables. Every insert or update on an indexed column means maintaining part of the index, and sometimes this can be very costly. Of course, you'll have to trade query speed against write speed, but this is a well-known issue in database design. You can combine this with a NoSQL store used as a query cache, that is, a fast NoSQL system sitting in front of your transactional system.
Another option is to use explicit transactions to batch many writes into a single commit; the transaction log is then flushed once per commit rather than once per autocommitted statement, which cuts the per-write overhead.
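A minimal sketch of that batching, with hypothetical table and column names:

    BEGIN TRANSACTION;

    INSERT INTO dbo.Readings (DeviceId, ReadingTime, HeartRate)
    VALUES (1, '2015-06-01T10:00:00', 72),
           (1, '2015-06-01T10:01:00', 75),
           (2, '2015-06-01T10:00:00', 68);  -- ...batch many more rows per transaction...

    COMMIT TRANSACTION;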
Why not shard? The complexity in the code can be avoided by using transparent sharding tools, which handle the heavy lifting associated with sharding.
Check out ScaleBase for more info
Every time I find that data retrieval from my database is slow, I try to figure out which part of my SQL query has the problem, optimize it, and add some indexes to the table. But this does not always solve the problem.
My question is :
Are there any other tricks to make SQL Server perform better?
What are the other reasons that can make SQL Server performance worse?
Inefficient query design
Auto-growing files
Too many indexes to be maintained on a table
Too few indexes on a table
Not properly choosing your clustered index
Index fragmentation due to poor maintenance
Heap fragmentation due to no clustered index
Too high FILLFACTORs used on indexes, causing excessive page splitting
Too low of a FILLFACTOR used on indexes, causing excessive space usage and increased scanning time
Not using covered indexes where appropriate
Non-selective indexes being used
Improper maintenance of statistics (out of date statistics)
Databases not normalized properly
Transaction logs and data sharing the same drive spindles
The wrong memory configuration
Too little memory
Too little CPU
Slow hard drives
Failing hard drives or other hardware
A 3D screensaver on your database server chewing up your CPU
Sharing the database server with other processes which compete for CPU and memory
Lock contention between queries
Queries which scan entire large tables
Front end code which searches data in an inefficient manner (nested loops, row by row)
CURSORS which are not necessary and/or are not FAST_FORWARD
Not setting NOCOUNT ON when cursoring through large tables (see the cursor sketch after this list)
Using a transaction isolation level which is too high (such as using SERIALIZABLE when it's not necessary)
Too many round trips between the client and the SQL Server (a chatty interface)
An unnecessary linked server query
A linked server query which targets a table on a remote server with no primary or candidate key defined
Selecting too much data
Excessive query recompilations
oh and there might be some others, too.
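On the cursor and NOCOUNT items above, a sketch with placeholder names; FAST_FORWARD is the cheapest cursor option when a cursor really can't be avoided:

    SET NOCOUNT ON;  -- suppress the per-statement "N rows affected" messages

    DECLARE @PolicyId INT;

    DECLARE claims_cur CURSOR FAST_FORWARD FOR
        SELECT PolicyId FROM dbo.Claims;

    OPEN claims_cur;
    FETCH NEXT FROM claims_cur INTO @PolicyId;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- ...row-by-row work here (still better replaced with a set-based query if possible)...
        FETCH NEXT FROM claims_cur INTO @PolicyId;
    END
    CLOSE claims_cur;
    DEALLOCATE claims_cur;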
When I talk to new developers who have this problem, I usually find that it comes down to one of two causes, both of which are fixed by following these two rules.
First, don’t retrieve any data that you don’t need. For example, if you are doing paging then don’t bring back 100 rows and then calculate which ones belong on the page. Have the stored proc figure it out and only retrieve the 10 you need.
Second, nothing is faster than work you don't do. For example, I worked on a system where the full roles and rights for a user were retrieved with every page requested; this was hundreds of rows for some users. Simply saving this to session state on the first request and reusing it for subsequent requests took a meaningful load off the database.
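A sketch of the first rule (returning only the requested page) using ROW_NUMBER(), which works on SQL Server 2005 and later; the table and column names are placeholders:

    DECLARE @PageNumber INT, @PageSize INT;
    SET @PageNumber = 3;
    SET @PageSize = 10;

    WITH Numbered AS
    (
        SELECT PolicyId, PolicyHolder,
               ROW_NUMBER() OVER (ORDER BY PolicyId) AS rn
        FROM dbo.Policies
    )
    SELECT PolicyId, PolicyHolder
    FROM Numbered
    WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize;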
I suggest you get a good book on performance tuning for the database you use (this is very much database-specific). This is an extremely complex subject and cannot really be answered other than in generalities on the web.
For instance, Dave Markle tells you inefficient queries can cause the problem, and there are many, many ways to write inefficient queries and many more ways to fix them.
If you're new to the database and you have access to the Database Engine Tuning Advisor (DTA), you can heuristically tune your database.
You basically capture the SQL queries being run against your DB with SQL Profiler, then feed those to DTA. DTA effectively runs the queries (without altering your data) and then works out what your database is missing (views, indexes, partitions, statistics, etc.) to run the queries better.
It can then apply them for you and monitor them in the future. I'm not saying to assume that DTA is always right or to apply things without understanding them, but I've found that it's definitely a good way to see what your queries are doing, how long they take, and how you can index the DB appropriately.
PS: With all that said, it's much better to invest in a good DBA at the start of a project so that you have good structures and indexing to start with. But that's not the position you're in right now...
This is a very wide question, and there are a ton of answers already. Still, I would like to add one important factor: page splits. The problem is that there are good splits and bad splits. The following are good articles explaining how to use the transaction_log extended event to identify bad/nasty page splits:
Tracking Problematic Pages Splits in SQL Server 2012 Extended Events - Jonathan Kehayias
Tracking page splits using the transaction log - Paul Randal
You mentioned:
I try to optimize it and also add some indexes
But sometimes removing unused non-clustered indexes may also help to improve performance, since it reduces transaction log activity. Read Top Reasons for Log Performance Problems.
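A sketch of finding candidates for removal with sys.dm_db_index_usage_stats; note the counters reset when the instance restarts, so judge them over a representative period:

    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID()
      AND i.index_id > 1                                      -- non-clustered only
      AND s.user_seeks + s.user_scans + s.user_lookups = 0    -- never read...
      AND s.user_updates > 0;                                 -- ...but still maintained on writes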
The article Wait statistics, or please tell me where it hurts gives an idea of how to use wait statistics for performance analysis.
To see some fresh ideas for performance, take a look at
Performance Considerations - sqlmag.com
Place tables that are joined together on different disks (using filegroups) for parallel disk I/O.
Avoid joins on columns with few unique values.
To understand JOIN, read Advanced JOIN Techniques
I'm creating a new DB and have a bunch of static data that won't change. If it does, it will be a manual process AND it will happen very rarely.
This data is a mix of varchars and Geographies.
I'm guessing it could be around 100K or so in total, over 4 or so tables.
Questions
Should I put these on a READ ONLY filegroup?
Can I create the tables in the designer and define the filegroup during creation? Or is it only possible via a script?
Once the data is in the table (on a read only filegroup), can I change it later? Is it really hard to do that?
thanks.
It is worth it for VLDB (very large databases) for assorted reasons.
For 100,000 rows or 100 KB, I wouldn't bother.
This SQL Server support engineering team article discusses one of the associated "urban legends".
There is another one (can't find it) where you need 300 GB - 1B of data before you should consider multiple files/filegroups.
But, to answer specifically:
Personal choice (there is no hard and fast rule)
Yes. (Edit:) In SSMS 2005, in design mode, go to Indexes/Keys, "data space specification". The data lives where the clustered index is. Without a clustered index, you can only do it via CREATE TABLE (..) ON filegroup
Yes, but you'll have to run ALTER DATABASE myDB MODIFY FILEGROUP foo READ_WRITE with the database in single-user (exclusive) mode
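A sketch of both points together; myDB, FG_ReadOnly, and the table are placeholders, and the table is created before the filegroup is marked read-only:

    -- Create the static table directly on the soon-to-be read-only filegroup:
    CREATE TABLE dbo.StaticLookup
    (
        LookupId INT           NOT NULL PRIMARY KEY,
        Label    NVARCHAR(100) NOT NULL
    ) ON FG_ReadOnly;

    -- To change the data later: flip the filegroup to read-write, update, flip it back.
    ALTER DATABASE myDB MODIFY FILEGROUP FG_ReadOnly READ_WRITE;  -- needs exclusive access
    UPDATE dbo.StaticLookup SET Label = N'New label' WHERE LookupId = 1;
    ALTER DATABASE myDB MODIFY FILEGROUP FG_ReadOnly READ_ONLY;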
It is unlikely to hurt to put the data into a read-only space, but I am not sure you will gain much. A read-only filegroup (or tablespace, in Oracle) gives you two advantages: less to back up each time a full backup is taken, and a higher level of security over the data (e.g. it cannot be changed by a bug or by someone accessing the DB with another tool). The backup advantage matters most with larger DBs where backup windows are tight, so putting a small amount of effort into excluding filegroups is valuable there. The security advantage depends on the nature of the site, data, etc. (If you do exclude the read-only space from regular backups, make sure you get a copy onto any retained backup tapes; I tend to back up read-only spaces once a month.)
I am not familiar with designer.
Changing to and from read only is not onerous.
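On the backup point, a sketch of the split; the database name, filegroup name, and paths are placeholders:

    -- Regular backups cover only the read-write filegroups (a partial backup):
    BACKUP DATABASE MyDB
        READ_WRITE_FILEGROUPS
        TO DISK = N'E:\Backups\MyDB_rw.bak';

    -- Back up the read-only filegroup once after it is loaded, and keep that copy:
    BACKUP DATABASE MyDB
        FILEGROUP = N'FG_ReadOnly'
        TO DISK = N'E:\Backups\MyDB_FG_ReadOnly.bak';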
I think anything you read here is likely to be speculation, unless you have any evidence that it's been actually tried and recommended - to me it looks like a novel but unlikely idea. Do you have some reason to suspect that conventional practices will be unsatisfactory? It should be fairly easy to just try it and find out. Post your results if you get a chance.
Currently, I am developing a product that does fairly intensive calculations using MS SQL Server 2005. At a high level, the architecture of my product is based on the concept of "runs" where each time I do some analytics it gets stored in a series of run tables (~100 tables per run).
The problem I'm having is that when the number of runs grows to about 1,000 after a few months, database performance really seems to drop off, and even simple queries like checking for the existence of a table or creating a view can take a second or two.
I've heard that using multiple filegroups, which I'm not currently doing, could help. Is this true, and if so, why/how would that help? Also, if there are other suggestions, even ones like, use fewer tables, I'm open to them. I just want to speed the database up and hopefully get it in a state where it will scale.
In terms of performance, the big gain in using separate files/filegroups is that it lets you spread your data across multiple physical disks. This is beneficial because with several disks, multiple data requests can be handled simultaneously (parallel is generally faster than serial). All other things being equal, this would tend to benefit performance, but the question of how much depends on your particular data set and the queries you're running.
From your description, the slow operations you're concerned about are creating tables and checking for the existence of tables. If you are generating 100 tables per run, then after 1000 runs you have 100,000 tables. I don't have much experience with creating that many tables in a single database, but you may be pressing the limits of the system tables that track the database schema. In this case, you might see some benefit by spreading your tables across more than one database (these databases could still all live within the same instance of SQL Server).
In general, the SQL Profiler tool is the best starting point for finding slow queries. There are data columns which indicate the CPU and IO cost of each SQL batch, which should point you to the worst offenders. Once you have found the problem queries, I would use the Query Analyzer to generate query plans for each of these queries, and see if you can tell what's making them slow. Do this by opening a query window, entering your query, and hitting Ctrl+L. A complete discussion of what might be slow would fill an entire book, but good things to look for are table scans (very slow for large tables) and inefficient joins.
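As a complement to Profiler, the plan-cache DMVs (available in SQL Server 2005 and later) can surface the most expensive cached statements; a sketch:

    -- Top 10 statements by total CPU since their plans were cached:
    SELECT TOP 10
        qs.total_worker_time / 1000 AS total_cpu_ms,
        qs.total_logical_reads,
        qs.execution_count,
        SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
            (CASE WHEN qs.statement_end_offset = -1
                  THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset END
             - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;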
In the end, you may be able to improve things simply by rewriting your queries, or you may have to make more broad changes to the table schema. For instance, maybe there's a way to create only one or a few tables per run, instead of 1000. More specifics about your particular setup would help us give a more detailed answer.
I also recommend this website for lots of tips on how to make things faster:
http://www.sql-server-performance.com/
When you talk about 100 tables per run, do you actually mean that you're creating new SQL tables? If so, I think that the architecture of your application may be the issue. I can't imagine a situation where you would need that many new tables as opposed to reusing the same few tables multiple times and simply adding a column or two to differentiate between runs.
If you're already reusing the same group of tables and new runs just mean additional rows in those tables, then the issue could simply be that the new data over time is hurting performance in one of several ways. For example:
The tables/indexes could be fragmented after a while. Make sure that all of your tables have a clustered index. Check for fragmentation using sys.dm_db_index_physical_stats and issue ALTER INDEX with the REBUILD option if needed to defragment them.
The tables could simply be too large, so that inefficiencies which were invisible on small tables are now obvious on the larger ones. Look into proper indexes on the tables to improve performance.
SQL Server will cache query plans (especially for stored procedures), but if the data in a table changes significantly over time that query plan may no longer be appropriate. Look into sp_recompile for your stored procedures to see if that's needed.
#2 is the culprit that I see most often in real world situations. Developers tend to develop using only a small set of test data and overlook proper indexing because you can do almost anything with a table of 20 rows and it will look fast.
Hope this helps
About 1000 of what? Single row writes? Multiple row transactions? Deletes?
A general tip would be to place the data files and log files on separate physical drives. SQL Server logs every write, so keeping the log on its own drive should generally give you better performance.
But SQL Server tuning depends on what the application is actually doing. There are general tips but you have to measure your own thing...
The file groups being on different physical drives is what will give you the biggest performance boost, can also split up where the indexes are housed so that table writes and index accesses are hitting different disks. There's a lot you can do with partitioning, but that general concept is where the biggest speed impact comes from.
It can help with performance. Moving certain tables/elements to distinct file areas or portions of the disk can reduce, to a certain extent, the amount of external fragmentation impacting the database.
I would also look at other factors, such as SQL traces, to determine why queries are slowing down; there can be other causes, such as query statistics and SP recompiles, that are easier to fix and can give you greater gains in performance.
Split the tables across separate physical drives. If you have that much disk IO, you need a decent IO solution. Raid 10, fast disks, split the logs and DBs onto separate drives.
Re-examine your architecture - can you use multiple databases? If you create 1000s of tables in a go, you will soon hit some interesting bottlenecks that I've not had to deal with before. Multiple DBs should solve that. Think about having one "Controlling" db containing all your main meta-data, and then satellite DBs containing the actual data.
You don't mention any specs about your server - but we saw a decent increase in performance when we went from 8GB to 20GB RAM.
It could, if you place them on separate physical drives (not just logical ones), so that I/O isn't slowing you down as much.