Avoid running the same query multiple times that takes lot of time - database

I am running a complex database query that takes enough time.
First I am unloading the results into a file and then I am running this query again to delete all those entries. This again takes lots of time.
Is there a better way so that I can save time here and fulfill both the purposes.
I am new to databases..dont have idea whether there is a way.
Thanks !!!

You must find bottleneck in your query or process: is it SQL query or saving data to slow media?
If it is SQL query then try to find what part of this query takes most of the time and for example add some indexes to make it faster. Have a look at similar question: Tweak Informix query . Remember that SET EXPLAIN is your friend. Also remember to UPDATE STATISTICS after you change a lot of data in database.
Why do you unload the results into a file? Do you analyze those results with different tools? If so then maybe there is a way to do it more efficiently in SQL?

Related

SQL Server 2008 R2 - optimization issue

I'm running into a performance issue with the current schema. So I built an equivalent schema to solve the issue.
I ran some tests on both schemas and the results are hard to understand. For the record, the data is the same.
I get the following from the Profiler when executing equivalent requests on the two schemas.
Old schema:
1,300,000 reads
5,000 CPU
4 seconds execution time
New schema:
30,000 reads
3,000 CPU
6 seconds execution time
The difference seems to be in the query plan used. The old schema has parallelism in the query plan. The new schema isn't using parallelism.
Has anyone faced similar situations (less IO/CPU but more execution time). How did you solve it?
Is there a way to force parallelism? I've played with query hints(http://msdn.microsoft.com/en-us/library/ms18171). I'm able to stop parallelism on the old schema but can't seem the query on the new schema to use parallelism.
Thanks in advance.
Louis,
Currently there is no way to force parallelism in SQL Server straight out of the box but Adam Machanic did some work to do that though.
http://whoisactive.com
Coming to your first question, yes we have seen cases like that too. Note that Parallelism is cpu bound and that's why you are seeing more cpu time but overall less execution time as you have multiple threads doing the work for you.
http://www.simple-talk.com/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server/
Make sure you have proper indexes in place and also stats are updated with full scan. In the long run it is best if Query Optimizer makes the decisions by itself but if you want to overwrite the QO plans then you may have to add lot more details. Schema, data and repro.
HTH

What are these sys.sp_* Stored Procedures doing?

I'm tracking down an odd and massive performance problem in my SQL server installation. On my system, a particular stored procedure takes 2 minutes to execute; on a colleague's system it takes less than 1 second. We have similar databases/data and configurations, but there's obviously something very different.
I ran the SP in question through the Profiler on both systems and noticed something odd. On My system, I see 9 entries with the following properties:
The Duration is way high relative to other rows. I have values as high as 37,698 and as low as 1734. On the "fast" system the maximum duration (for the entire SP call) is 259.
They are executed for two databases related to the one that contains the SP I'm running. (This SP makes calls via Linked Servers to these two databases).
They are executions of one of the following system SPs:
sp_tables_info_90_rowset
sp_check_constbytable_rowset
sp_columns_90_rowset
sp_table_statistics2_rowset
sp_indexes_90_rowset
I can't find any Googleable documentation on what these are, why they would be so slow, or why they would run on one system but not the other. Does anyone know what they're all about?
Try manually updating statistics on that table.
UPDATE STATISTICS [TableName]
Then double check that the database option to AutoUpdateStatistics is TRUE. Even if it is, though, I've seen cases where adding large amounts of data to a table doesn't always cause the statistics to update in a timely way, and queries can be slow.
I don't know the answer to your question. But to try to fix the problem you're having (which, I assume, is what you're actually interested in), the first thing I'd do is run a re-index on the tables you're querying. This frequently will fix any kind of slowness when the conditions are as you described (same database structure, different data/database, same query).
These are the tables created when you have linked server calls. These are called work tables created in Tempdb. They are automatically created by the database engine for temporary operations like Spooling etc.
Those sp's mean your query is hitting linked servers by using synonyms. This should be avoided whenever possible.
I'm not familiar with those specific procedures, but you can try running:
SELECT object_definition(object_id('Procedure Name'))
To get a better idea of what's going on under the hood.
Last index rebuild? Last statistics update?
Otherwise, these stored procs are used by the SQL Server client too... no? And probably won't cause these errors

Saving / Caching Stored Procedure results for better performance? (SQL Server 2005)

I have a SP that has been worked on my 2 people now that still takes 2 minutes or more to run. Is there a way to have these pre run and stored in cache or somewhere else so when my client needs to look at this data in a web browser he doesn't want to hang himself or me?
I am no where near a DBA so I am kind of at the mercy of who I hire to figure this out for me, so having a little knowledge up front would really help me out.
If it truly takes that long to run, you could schedule the process to run using SQL Agent, and have the output go to a table, then change the web application to read the table rather than execute the stored procedure. You'd have to decide how often to run the refresh, and deal with the requests that occur while it is being refreshed, but that can be dealt with as well by having two output files, one live and one for the latest refresh.
But I would take another look at the procedure, look at the execution plan and see where it is slow, make sure it is not doing full table scans.
Preferred solutions in this order:
Analyze the query and optimize accordingly
Cache it in the application (you can use httpRuntime.Cache (even if not asp.net application)
Cache SPROC results in a table in the DB and then add triggers to invalidate the cache (delete the table) so a a call to the SPROC would first look to see if there is any data in the cache table. If none, run SPROC and store the result in the cache table, if so, return the data from that table. The triggers on the "source" tables for the SPROC would just delete * from CacheTable to "clear the cache" (depending on what you sproc is doing and its dependencies, you may even be able to partially update the cache table based on the trigger, but all of this quickly gets difficult to maintain...but sometimes you gotta do what you gotta do...This approach will allow the cache table to update itself as needed. You will always have the latest data and the SPROC will only run when needed.
Try "Analyze query in database engine tuning advisor" from the Query menu.
I usually script the procedure to a new window, take out the query definition part and try different combinations of temp tables, regular tables and table variables.
You could cache the result set in the application as opposed to the database, either in memory by keeping an instance of the datatable around, or by serializing it to disk. How many rows does it return?
Is it too long to post the code here?
OK first things first, indexes:
What indexes do you have on the tables and is the execution plan using them?
Do you have indexes on all the foreign key fields?
Second, does the proc use any of the following performance killers:
a cursor
a subquery
a user-defined function
select *
a search criteria that starts with a wildcard
third
Can the where clause be rewritten to be sargeable? There is more than one way to write almost everything and some ways are better performers than others.
I suggest you buy your developers some books on performance tuning.
Likely your proc can be fixed, but without seeing the code, it is hard to guess what the problems might be.

Whats the best way to profile a sqlserver 2005 database for performance?

What techinques do you use? How do you find out which jobs take the longest to run? Is there a way to find out the offending applications?
Step 1:
Install the SQL Server Performance Dashboard.
Step2:
Profit.
Seriously, you do want to start with a look at that dashboard. More about installing and using it can be found here and/or here
To identify problematic queries start the Profiler, select following Events:
TSQL:BatchCompleted
TSQL:StmtCompleted
SP:Completed
SP:StmtCompleted
filter output for example by
Duration > x ms (for example 100ms, depends mainly on your needs and type of system)
CPU > y ms
Reads > r
Writes > w
Depending on what you want to optimize.
Be sure to filter the output enough to not having thousands of datarows scrolling through your window, because that will impact your server performance!
Its helpful to log output to a database table to analyse it afterwards.
Its also helpful to run Windows system monitor in parallel to view cpu load, disk io and some sql server performance counters. Configure sysmon to save the data to a file.
Than you have to get production typical query load and data volumne on your database to see meaningfull values with profiler.
After getting some output from profiler, you can stop profiling.
Then load the stored data from the profiling table again into profiler, and use importmenu to import the output from systemmonitor and the profiler will correlate the sysmon output to your sql profiler data. Thats a very nice feature.
In that view you can immediately identifiy bootlenecks regarding to your memory, disk or cpu sytem.
When you have identified some queries you want to omtimize, go to query analyzer and watch the execution plan and try to omtimize index usage and query design.
I have had good sucess with the Database Tuning tools provided inside SSMS or SQL Profiler when working on SQL Server 2000.
The key is to work with a GOOD sample set, track a portion of TRUE production workload for analsys, that will get the best overall bang for the buck.
I use the SQL Profiler that comes with SQL Server. Most of the poorly performing queries I've found are not using a lot of CPU but are generating a ton of disk IO.
I tend to put in filters on disk reads and look for queries that tend to do more than 20,000 or so reads. Then I look at the execution plan for those queries which usually gives you the information you need to optimize either the query or the indexes on the tables involved.
I use a few different techniques.
If you're trying to optimize a specific query, use Query Analyzer. Use the tools in there like displaying the execution plan, etc.
For your situation where you're not sure WHICH query is running slowly, one of the most powerful tools you can use is SQL Profiler.
Just pick the database you want to profile, and let it do its thing.
You need to let it run for a decent amount of time (this varies on traffic to your application) and then you can dump the results in a table and start analyzing them.
You are going to want to look at queries that have a lot of reads, or take up a lot of CPU time, etc.
Optimization is a bear, but keep going at it, and most importantly, don't assume you know where the bottleneck is, find proof of where it is and fix it.

What steps should be necessary to optimize a poorly performing query?

I know this is a broad question, but I've inherited several poor performers and need to optimize them badly. I was wondering what are the most common steps involved to optimize. So, what steps do some of you guys take when faced with the same situation?
Related Question:
What generic techniques can be applied to optimize SQL queries?
Look at the execution plan in query analyzer
See what step costs the most
Optimize the step!
Return to step 1 [thx to Vinko]
In SQL Server you can look at the Query Plan in Query Analyzer or Management Studio. This will tell you the rough percentage of time spent in each batch of statements. You'll want to look for the following:
Table scans; this means you are completely missing indexes
Index scans; your query may not be using the correct indexes
The thickness of the arrows between each step in a query tells you how many rows are being produced by that step, very thick arrows means you are processing a lot of rows, and can indicate that some joins need to be optimized.
Some other general tips:
A large amount of conditional statements, such as multiple if-else statements, can cause SQL Server to constantly rebuild the query plan. You can check for this using Profiler.
Make sure that different queries aren't blocking each other, such as an update statement blocking a select statement. This can be avoided by specifying the (nolock) hint in SQL Server select statements.
As others have mentioned, try out the Performance Tuning wizard in Management Studio.
Finally, I would highly recommend creating a set of load tests (using Visual Studio 2008 Test Edition), which you can use to simulate your application's behavior when dealing with a large amount of requests. Some SQL performance bottlenecks only manifest themselves under these circumstances, and being able to reproduce them makes it a lot easier to fix.
Indexes may be a good place to start...
The low hanging fruit can be knocked down with the SQL Server Index Tuning Wizard.
I'm not sure about other databases, but for SQL Server I recommend the Execution Plan. It very clearly (albeit with lots of vertical and horizontal scrolling, unless you've got a 400" monitor!) shows what steps of your query are sucking up the time.
If you've got one step that takes a crazy 80%, then maybe an index could be added, then after tweaking the index, re-run the Execution Plan to find your next biggest step.
After a couple tweaks you may find that there really are no steps that stand out from the others i.e. they're all 1-2% each. If that is the case, then you might then need to see if there is a way you can cut down the amount of data included in your query, do those four million closed sales orders need to be included in the "Active Sales Orders" query? No, so exclude all those with STATUS='C' ... or something like that.
Another improvement you'll see from the Execution Plan is bookmark lookups, basically it finds a match in the index, but then SQL Server has to quickly trawl through the table to find the record you want. This operation might at times take longer than just scanning the table in the first place would have, if that is the case, do you really need that index?
With indexes, and especially with SQL Server 2005 you should look to the INCLUDE clause, this basically allows you to have a column in an index without really being in the index, so if all the data you need for your query is in your index or is an included columnn then SQL Server doesn't have to even look at the table, a big performance pickup.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
The execution plan is a great start and will help you figure out what part of your query you need to tackle.
Once you figure out the where, it is time to tackle the how and why. Take a look at the type of queries you are trying to preform. Avoid loops at all cost as they are slow. Avoid cursors at all costs because they are slow. Stick to set based queries when ever possible.
There are ways to give sql hints on the type of joins to use if you are using joins. Be careful here though, while one hint may speed up your query once, it may slow down your query 10 fold the next time through depending on the data and parameters.
Finally, make sure your database is well indexed. A good place to start is any field that is contained in a where clause probably should have a index on it.
Look at the indexes on the tables that make the query. An indexes may be needed on particular fields that participate in the where clause. Also look at the fields used in the joins in the query (if joins exist). If indexes already exist, look at the type of index.
Failing that (because there are negatives to using locking hints) Look at locking hints and explicitly naming the index to use in the join. Using NOLOCKS is more obvious if you're getting a lot of deadlocked transactions.
Do what roman and Andy S mentioned first though.

Resources