We work for a small company that cannot afford to pay for a SQL DBA or outside consultancy.
What started as a small project has now become a full-scale system with a lot of data.
I need someone to help me sort out performance improvements. I realise no one will be able to help directly and nail this issue completely, but I just want to make sure I've covered all the bases.
OK, the problem is basically that we are experiencing time-outs on queries against our cached data. I have increased the command timeout in the C# code, but I can only push that so far before it becomes ridiculous.
The current setup is a database that has data inserted every 5-10 seconds, constantly. During this process we populate tables from CSV files. Overnight we run data-caching processes that reduce the load on the "inserted" tables. Originally we were able to condense 10+ million rows into around 400,000 rows, but as users want more filtering we have had to include more data, which of course has grown the cached tables from 400,000 to 1-3 million rows.
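For context, the overnight "caching" is essentially pre-aggregation. Stripped down to a sketch (the table and column names here are illustrative, not our real schema), it amounts to something like:

    -- Nightly job: collapse the raw inserted rows into a much smaller cache table.
    -- Table and column names are illustrative only.
    TRUNCATE TABLE dbo.DataCache;

    INSERT INTO dbo.DataCache (DeviceId, ReadingDate, ReadingCount, AvgValue)
    SELECT  DeviceId,
            CAST(ReadingTime AS date) AS ReadingDate,
            COUNT(*)                  AS ReadingCount,
            AVG(ReadingValue)         AS AvgValue
    FROM    dbo.RawReadings
    GROUP BY DeviceId, CAST(ReadingTime AS date);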
On my SQL development server (which does not have data inserted every 5 seconds) it used to take 30 seconds to run queries on a data-cache table with 5 million rows; with indexing and some improvements it's now 17 seconds. The live server runs SQL Server Standard and used to take 57 seconds; it now takes 40 seconds.
We have 15+ instances running, with the same number of databases.
So far we have outlined the following ways of improving the system:
Indexing on some of the data-cache tables - the database is now bloated and the overnight processes have slowed down.
Increased CommandTimeout
Moved databases to SSD
Improvements we're likely to make next:
Move the CSV files onto another hard disk so they're not on the same SSD the SQL Server databases reside on.
Possibly use filegroups for the indexes or cached tables (see the sketch after this list) - not sure if SQL Server Standard will cover this.
Move to the Enterprise edition and partition the table data - the customer may pay for this, but we certainly can't afford it ourselves.
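For what it's worth, filegroups are available in every edition of SQL Server, including Standard; it's table/index partitioning that needs Enterprise on our version. Here is a minimal sketch of moving an index onto its own filegroup on a separate drive (the database, file path, table and index names are made up for illustration):

    -- Add a filegroup and a data file on a separate physical drive (names/paths are hypothetical).
    ALTER DATABASE CacheDb ADD FILEGROUP FG_Indexes;
    ALTER DATABASE CacheDb
        ADD FILE (NAME = N'CacheDb_Indexes1',
                  FILENAME = N'E:\SqlIndexes\CacheDb_Indexes1.ndf',
                  SIZE = 512MB, FILEGROWTH = 256MB)
        TO FILEGROUP FG_Indexes;

    -- Create a nonclustered index on that filegroup so index I/O
    -- hits a different drive than the table data.
    CREATE NONCLUSTERED INDEX IX_DataCache_FilterDate
        ON dbo.DataCache (ReadingDate, DeviceId)
        ON FG_Indexes;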
As I said, I'm looking for rough guidelines and realise no one may be able to fix this issue completely. We're a small team and no one has extensive SQL Server experience. The customer wants answers and we've tried everything we know. Incidentally, they had a small-scale version in Excel and said they found no issues, so why are we?!
Hope someone can help.
I have an Access database for which I am the only user. It's the first database I've built. It has 16 related tables, around 40 select queries, and a dozen or so update/delete queries. It is already 512 MB and will at least double in size as additional data is added to the tables and more queries and reports are created over the next 12 months. The largest table (which is accessed by most of the queries) is around 800k rows by 11 fields. This table will most likely grow to over 2M rows over the useful life of the database (c. 12 months).
Queries that ran in under 30 seconds a month ago are starting to run slower as the tables have grown, and some queries that include calculations now take 10 minutes or so to complete (and yes, I am using stacked queries as much as possible).
Does anyone have solid advice one way or the other as to the performance boost I could expect from splitting?
Thanks
No, you wouldn't get a performance boost from splitting; only query optimization and careful indexing can speed up the query time.
That said, you should split it anyway (create a backup and run the wizard), if for nothing else than to make backing up your data easier and to keep the data independent of your ongoing development of the front end.
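As a trivial example of the kind of indexing that helps (the table and field names below are hypothetical): if most of your queries filter or join the 800k-row table on the same field, make sure that field is indexed. The same CREATE INDEX syntax works in both Access SQL and SQL Server:

    -- Hypothetical: index the field most of the queries filter or join on.
    CREATE INDEX idxTransactionsTradeDate
        ON Transactions (TradeDate);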
Not sure if this is a SO or a ServerFault question, so please feel free to move if it's not in the right place:
I have a client with a large database containing a table with around 30-35 million rows, running on a SQL Server 2008 R2 instance (the server is pretty high spec: 16 cores, 92 GB of RAM, RAID, etc.). There are other tables this table may join on, but it is the main driver of several reports.
Their SSRS instance/database and the query source database are both running on the same box/SQL instance.
They regularly run ad-hoc reports from this database (which have undergone extensive optimisation), many of which may end up touching a lot of the data in the table. After looking at the report server stats it appears that the data fetch doesn't actually take that long, but a lot of data is returned and report processing takes a fair while: it can take up to 20-30 minutes to process some of the larger reports, which can have tens of thousands of pages (the data fetch in these cases is less than 10 seconds).
(Note: I realise that there is never really a need to run 25,000 pages off but the client insists and won't listen to reason...something about Excel spreadsheets *FACEPALM!*)
At the moment they are concerned about a couple of performance issues that crop up sporadically and the culprit may be the ad-hoc reporting.
We are looking at offloading the report processing anyway, so thought that this would be an ideal opportunity - but before doing so I'm wondering how much relief this will give the SQL server.
If I move the SSRS app and database onto another SQL host and remotely query the data (network conditions should be ideal as this is datacentre based), will I see any performance gains?
This is mainly based on guesswork at this stage but I see the following being the factors that could affect performance:
I/O for moving a shedload of rows from the query source to RS temp DB
CPU load when the report server is crunching all the data
In moving to another host I see these factors being reduced for the SQL server. The new server will be solely responsible for report processing (and should also be high spec), so hopefully there will be no contention when processing reports.
Do I sound like I am on the right track in my assumptions? Is there anything else that I may have missed which could adversely affect performance or improve performance?
Thanks in advance
You should look at transactional replication to send data from the main server to a database on the reporting server. Querying the tables directly over the network will only slow things down even more.
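For a flavour of what that involves, here is a very rough T-SQL sketch (the database, table, publication and server names are placeholders; in practice you'd configure this through the SSMS replication wizards, and a Distributor must be set up first):

    -- Assumes the Distributor is already configured; all names below are placeholders.
    USE SourceDb;
    EXEC sp_replicationdboption @dbname = N'SourceDb', @optname = N'publish', @value = N'true';

    -- Create the transactional publication and its snapshot agent.
    EXEC sp_addpublication @publication = N'ReportingData', @status = N'active';
    EXEC sp_addpublication_snapshot @publication = N'ReportingData';

    -- Publish the big reporting table as an article.
    EXEC sp_addarticle @publication = N'ReportingData',
                       @article = N'BigFactTable',
                       @source_owner = N'dbo',
                       @source_object = N'BigFactTable';

    -- Push the data to the database on the reporting server.
    EXEC sp_addsubscription @publication = N'ReportingData',
                            @subscriber = N'REPORTSRV',
                            @destination_db = N'ReportingDb',
                            @subscription_type = N'Push';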
I am starting a new project using SQL Server for a medical office. Their current database (SQL Server 2008) has over 500,000 rows spread across 15+ tables. Currently they are complaining that their data-entry application is very slow to generate reports and to insert new data.
For my new system I was thinking of a two-tiered database approach, where the primary SQL Server 2012 database would only contain three months' worth of rows and a second SQL Server 2012 database would hold all the data for the system. That way, when users insert new data it goes into a much smaller database, and when they query recent data the query should execute much faster. The system will also have reporting, but I think the reports will have to be generated from the larger data set.
My questions are as follows:
Will a solution like this improve the overall performance of the database?
Are there any scalability concerns with this solution?
What is the best way to transfer that data between the two servers each night?
If my solution makes no sense please feel free to offer any other solutions.
Don't do this. Splitting your app into multiple databases will be a management nightmare. Plus, 500k records isn't that many, assuming that the records are of reasonable size.
Instead, go after the low-hanging fruit. Turn on logging and look at the access patterns. Which queries are slow? Figure out why. Do they lack indexes? Can the queries be simplified? Debug the problem.
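One hedged place to start looking, assuming a reasonably recent SQL Server (2005 or later): the plan-cache DMVs will surface the heaviest queries without any extra logging. The system views here are real; the rest is just a sketch.

    -- Top 10 cached statements by total elapsed time.
    SELECT TOP (10)
           qs.total_elapsed_time / 1000 AS total_elapsed_ms,
           qs.execution_count,
           qs.total_logical_reads,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                            WHEN -1 THEN DATALENGTH(st.text)
                            ELSE qs.statement_end_offset END
                       - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_elapsed_time DESC;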
Keep in mind that sometimes throwing hardware at the problem is the right solution. If you can solve the problem with an $800 server, do it. That's a lot cheaper than your time.
To chime in: 500K records is not so big. You ought to be able to make the db work very speedily as is with some tuning.
Hi all!
My client currently has a SQL Server database that performs 3-4 million inserts, about as many updates, and even more reads a day, every day. The current DB is laid out weirdly, IMHO: the incoming data goes into a "Current" table, then nightly records are moved to the corresponding monthly tables (i.e. MarchData, AprilData, MayData, etc.), which are exact copies of the Current table (schema-wise, I mean). Reads are done from a view that UNIONs all the monthly tables and the Current table; inserts and updates are done only against the Current table.
It was explained to me that the separation of data into 13 tables was motivated by the fact that all those tables use separate data files, and those data files are written to 13 physical hard drives. So each table gets its own hard drive, supposedly speeding up the view performance. What I'm noticing is that the nightly record move to the monthly tables (which runs every 2 minutes over the 8-hour night window) coincides with the full backup, and the DB starts crawling, the web site times out, etc.
I was wondering: is this really the best approach out there, or can we consider a different one? Please bear in mind that the database is about 300-400 GB and growing by 1.5-2 GB a day. Every so often we move records that are more than 12 months old to a separate archive database.
Any insight is highly appreciated.
If you are using MS SQL Server, consider Partitioned Tables and Indexes.
In short: you can group your rows by some value, e.g. by year and month. Each group is accessible as a separate partition with its own index, so you can list, summarize and edit February 2011 sales without touching all the other rows. Partitioned tables complicate the database, but for extremely long tables they can lead to significantly better performance. Partitioning also works with filegroups, so different ranges can be stored on different disks.
This MS-made solution seems very similar to yours, except for one important difference: it doesn't move records around overnight.
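A minimal sketch of what that looks like (the names, boundary dates and filegroups below are illustrative only; on SQL Server 2008 R2 partitioning needs Enterprise edition):

    -- Partition rows by month; boundary values are illustrative only.
    CREATE PARTITION FUNCTION pfMonthly (datetime)
        AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');

    -- Map partitions to filegroups (one more filegroup than boundary values).
    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly TO (FG_Old, FG_2011Jan, FG_2011Feb, FG_2011Mar);

    -- One logical table; the engine routes each row to its partition automatically.
    CREATE TABLE dbo.SalesData
    (
        SaleId   bigint   NOT NULL,
        SaleDate datetime NOT NULL,
        Amount   money    NOT NULL,
        CONSTRAINT PK_SalesData PRIMARY KEY CLUSTERED (SaleDate, SaleId)
    )
    ON psMonthly (SaleDate);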
It was explained to me that the separation of data into 13 tables was motivated by the fact that all those tables use separate data files and those data files are written to 13 physical hard drives. So each table gets its own hard drive...
There is one statement for that: IDIOTS AT WORK.
Tables are not stored on discs, but in filegroups, which can span multiple data files. Note this: you can have one filegroup with 13 data files on 13 discs, and a table in it will be DISTRIBUTED OVER ALL 13 DISCS. No need to play stupid silly games to distribute the load; it is already possible just by reading the documentation.
Even then, I seriously doubt 13 discs are fast. Really. I run a smaller database privately (merely 800 GB) that has 6 discs for the data alone, and my current work assignment is into three digits of discs (that is, 100+). Please do not call a 13-disc setup a large database.
Anyhow, SHOULD the need arise to distribute data, not a UNION but partitioned tables (again a standard SQL Server feature, albeit Enterprise edition only) are the way to go.
Please mind, that the database is about 300-400 GB and growing by 1.5-2 GB a day.
Get a decent server.
I was wondering is this approach really the best approach out there?
Oh, hardware. Get one of the SuperMicro boxes for databases, 2 to 4 rack units high, SAS backplane, 24 to 72 slots for discs. Yes, in one computer.
Scrap that monthly blabla table crap that someone came up with who obviously should not work with databases. Put it all in one table. Use filegroups and multiple data files to distribute the load for all tables across the various discs. Unless...
...you actually realize that running discs like that is gross neglect. A RAID 5, RAID 6 or RAID 10 is in order; otherwise your server is possibly down when a disc fails, which will happen, and restoring a 600 GB database takes time. I run RAID 10 for my data discs, but then privately I have tables with about a billion rows (and at work we add about that many a day). Given the SMALL size of the database, a couple of SSDs would also help; their IOPS budget would mean you could go down to possibly 2-3 discs and get a lot more speed out. If that is not possible, my bet is that those discs are slow 3.5" discs at 7,200 RPM; an upgrade to enterprise-level discs would help. I personally use 300 GB VelociRaptors for databases, but there are 15k SAS discs to be had ;)
Anyhow, this sounds really badly set up. So bad that I would either be happy my trainee came up with something that smart (as it would definitely be over the head of a trainee), or my developer would stop working for me the moment I found that out (based on gross incompetence; feel free to challenge it in court).
Reorganize it. Also be careful with any batch processing - those NEED to be time-staggered so they do not overlap with backups. There is only so much IO a mere simple low-speed disc can deliver.
I'm interested to know how I could improve the performance of SQL Server when using sequential GUIDs, with Access 2007 as a front end to SQL Server 2008 (please note that's the only context I'm interested in).
I have run some tests (and got some fairly surprising results), in particular from SQL Server when using a sequential GUID: the insert performance degrades very, very quickly, and it doesn't seem right to me that it should degrade so fast.
Basically the test is as follow:
From the Access front-end, using VBA only, insert 100,000 records in batches of 1000, sequentially.
I tried it both with an Identity and with a sequential GUID as the PK.
I tried it against SQL Server 2008 Standard (no special tweaking, just a default install) and against an Access 2007 database as the back-end. All tables were linked into the front-end.
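For reference, the two PK variants boil down to something like this on the SQL Server side (the table and column names are simplified placeholders, not my actual test schema):

    -- Variant 1: integer IDENTITY PK.
    CREATE TABLE dbo.TestIdentity
    (
        Id      int IDENTITY(1, 1) NOT NULL,
        Payload nvarchar(100)      NULL,
        CONSTRAINT PK_TestIdentity PRIMARY KEY CLUSTERED (Id)
    );

    -- Variant 2: sequential GUID PK, generated server-side via NEWSEQUENTIALID().
    CREATE TABLE dbo.TestSeqGuid
    (
        Id      uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),
        Payload nvarchar(100)    NULL,
        CONSTRAINT PK_TestSeqGuid PRIMARY KEY CLUSTERED (Id)
    );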
Some of the results (more, including the raw data, are available in my blog entry about the test):
It's clear that, as the database grows, insert performance drops, but SQL Server isn't performing very well at all here.
http://blog.nkadesign.com/wp-content/uploads/2009/04/chart02.png
Expanded view of the results for SQL Server:
http://blog.nkadesign.com/wp-content/uploads/2009/04/chart03.png
Edit 13APR2009
I've found an issue with my server configuration and I updated the tests on my blog.
Thanks to all for your replies, they helped me a lot.
There are two things at play here. First, it's important to point out that SQL Server doesn't necessarily work very well, for a specific use case, out of the box. It is a professional product designed to be tuned by a person who knows what they're doing.
By comparison, Access is designed to work very well for most use cases without any configuration. The downside of this trade-off is covered in the second point:
SQL Server is designed for scalability. Notice how Access severely degrades with only 100,000 records. It would probably drop very steeply below SQL Server's line before a million. By comparison, SQL Server holds almost perfectly steady, with the variation stabilizing after about 45,000 records, and will continue to hold at many millions.
Edit: I think there may also be something else at play here that we're not seeing. I thought your SQL numbers looked bad, so I ran a test of my own. On my desktop running Windows Vista at 3.6 GHz with 2 GB of RAM, inserts with a sequential GUID on SQL Server performed as follows:
Average of 1382 inserts per second at 0 records
Average of 1426 inserts per second at 500k records
Averaging 1609.6 inserts per second from 0 to 500k with an average floor of 992 inserts/sec and an average ceiling of 1989 inserts/sec.
So accounting for the normal variance incurred by running this on an in-use desktop, I'd say SQL Server inserts basically scale linearly from 0 records to half a million. On a dedicated, tuned server I'd expect even more consistency (not to mention far better performance):
Excel chart, inserts per second http://img24.imageshack.us/img24/9485/insertspersecond.jpg
My question is whether your test setup represents the reality of your application or not. In short, are you testing the right thing?
Is your app going to be appending large numbers of records one at a time?
Or is it going to be appending batches of records based on a SQL SELECT?
If the latter, you might look at trying to do it all server-side, particularly if the source table(s) in the SELECT are on the server. It's important to realize that with ODBC, a batch append is going to be sent to SQL Server as a separate insert for every single row (very similar to the recordset-based approach in your test code). If you move the same process entirely server-side, it can be done as a true batch operation.
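As a hedged illustration (the table and column names are placeholders), the server-side version is a single set-based statement, e.g. run from a pass-through query or a stored procedure, instead of one ODBC round trip per row:

    -- One set-based append on the server instead of one insert per row over ODBC.
    -- Table and column names are placeholders.
    INSERT INTO dbo.TargetTable (CustomerId, OrderDate, Amount)
    SELECT CustomerId, OrderDate, Amount
    FROM   dbo.SourceTable
    WHERE  OrderDate >= '2009-01-01';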
Also, you should test again using ADO instead of DAO. It may optimize the operation completely differently.
Last of all, someone brought to my attention just this past week this fascinating article by Andy Baron:
Optimizing Microsoft Office Access Applications Linked to SQL Server
I'm still absorbing the contents of that very useful article, and it discusses several issues beyond GUIDs specifically that may help you optimize your process for maximum efficiency.
You realize at least part of the decreasing performance is the log filling up, and that a GUID is, what, 12 bytes longer than an int?
But I'm not quibbling; it's good to see someone taking actual metrics rather than just handwaving. Modded up.
Where are you getting the data from?
Does it change the numbers if you use the Access Export menu options rather than record-at-a-time-in-a-loop?
VBA is really sensitive to the connection parameters too, and there are lots of options that aren't necessarily intuitive.
If an identity column is acceptable, why are you even considering a sequential GUID (which is something of a tacked-on facility in MSSQL last I checked).
EDIT:
Looking at your code and briefly reviewing the Recordset docs on MSDN, I see you may be able to use more efficient parameters. E.g. your dbSeeChanges and dbOpenDynaset options are appropriate if you need to allow for other users messing with the same rows (or need to get back the inserted IDENTITY value, or probably the GUID), but I don't think you need them here. In essence, after every INSERT or UPDATE, you're reading the record back from the database into VBA. I'd read through those connection config settings carefully, and I bet you'll come up with something a lot more satisfactory.
The last time I saw something like that (really slow insertion with a GUID PK) it was because the log file was filling up. Insert performance was dropping like a stone, pretty fast (no hard measurements, just watching live traces, but it sure looked kind of logarithmic). This was a pre-load of historical data.
Moving over to an identity PK and taking care of actually cleaning up the log file made everything go much better afterwards (a couple of hours, where the first version took several hours and still wasn't finished).
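If you want to rule the log in or out in your own test, a quick (hedged) check is to watch log usage while the inserts run; the database name below is a placeholder:

    -- Shows log size and percent used for every database on the instance.
    DBCC SQLPERF (LOGSPACE);

    -- For a throwaway test database, SIMPLE recovery keeps the log from
    -- growing unbounded when nobody is taking log backups.
    ALTER DATABASE TestDb SET RECOVERY SIMPLE;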
Also, just a thought: are there any transactions involved? Maybe SQL Server transactions create a big performance hit that Access does not have (given that Access is not really geared towards concurrent access).