I have a search procedure that is passed around 15-20 (optional) parameters, and for each parameter it calls a function that checks whether the value passed in exists in the database. So it is basically a search structure built on a number of parameters.
Now, since the database is going to hold millions of records, I expect a plain search procedure like this to perform very poorly. What are some ways to improve query performance?
What I have tried so far:
Clustered index on the FirstName column of the table (as I expect it to be searched very frequently)
Non-clustered indexes on the rest of the columns the user can search on, also using the INCLUDE keyword.
Note:
I am looking for more ways to optimize my queries.
Most of the queries are nothing but SELECT statements checked against a condition.
One of the queries uses a GROUP BY clause.
I have also created a temporary table in which I am inserting all the matched entries.
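To illustrate, a minimal sketch of the kind of procedure I mean (the names are made up; the real one takes 15-20 such parameters):

CREATE PROCEDURE dbo.SearchPeople
    @FirstName VARCHAR(50) = NULL,
    @LastName  VARCHAR(50) = NULL,
    @City      VARCHAR(50) = NULL
AS
BEGIN
    -- each optional parameter is applied only when it was supplied
    SELECT PersonID, FirstName, LastName, City
    FROM dbo.People
    WHERE (@FirstName IS NULL OR FirstName = @FirstName)
      AND (@LastName IS NULL OR LastName = @LastName)
      AND (@City IS NULL OR City = @City);
END;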
First, run the query from SQL Server Management Studio and look at the query plan to see where the bottleneck is. Anywhere you see a "table scan" or "index scan", the engine has to go through all the data to find what it is looking for. If you create appropriate indexes that can be used for these operations, it should increase performance.
Listed below are some tips for improving the performance of a SQL query.
Avoid Multiple Joins in a Single Query
Try to avoid writing a SQL query using multiple joins that include outer joins, CROSS APPLY, OUTER APPLY and other complex subqueries. This reduces the optimizer's choices when deciding the join order and join type. Sometimes the optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex CROSS APPLYs or subqueries.
Eliminate Cursors from the Query
Try to remove cursors from the query and use set-based queries; a set-based query is more efficient than a cursor-based one. If there is a need to use a cursor, then avoid dynamic cursors, as they tend to limit the choice of plans available to the query optimizer. For example, a dynamic cursor limits the optimizer to using nested loop joins.
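A minimal sketch of the difference, assuming a hypothetical Orders table:

-- Cursor-based (row by row): one UPDATE per fetched row
DECLARE @id INT;
DECLARE c CURSOR FOR SELECT OrderID FROM dbo.Orders WHERE Status = 'Pending';
OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Processed' WHERE OrderID = @id;
    FETCH NEXT FROM c INTO @id;
END;
CLOSE c;
DEALLOCATE c;

-- Set-based: one statement handles all qualifying rows
UPDATE dbo.Orders SET Status = 'Processed' WHERE Status = 'Pending';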
Avoid Use of Non-correlated Scalar Sub Query
You can rewrite your query to run a non-correlated scalar subquery as a separate query, instead of as part of the main query, and store the output in a variable that can be referred to in the main query or a later part of the batch. This gives the optimizer better options, which may help it produce accurate cardinality estimates along with a better plan.
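For example (table and column names hypothetical), instead of embedding the scalar subquery:

SELECT * FROM dbo.Orders WHERE OrderDate = (SELECT MAX(OrderDate) FROM dbo.Orders);

run it separately and store the result in a variable:

DECLARE @maxDate DATETIME;
SELECT @maxDate = MAX(OrderDate) FROM dbo.Orders;
SELECT * FROM dbo.Orders WHERE OrderDate = @maxDate;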
Avoid Multi-statement Table Valued Functions (TVFs)
Multi-statement TVFs are more costly than inline TVFs. SQL Server expands inline TVFs into the main query the way it expands views, but evaluates multi-statement TVFs in a separate context from the main query and materializes the results of the multi-statement TVF into temporary work tables. The separate context and work tables make multi-statement TVFs costly.
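The two forms side by side (a sketch with hypothetical names):

-- Inline TVF: a single RETURN of a SELECT, expanded into the calling query
CREATE FUNCTION dbo.OrdersForCustomer_Inline (@custID INT)
RETURNS TABLE
AS
RETURN (SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @custID);
GO

-- Multi-statement TVF: fills a table variable, materialized in a work table
CREATE FUNCTION dbo.OrdersForCustomer_Multi (@custID INT)
RETURNS @result TABLE (OrderID INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @result
    SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @custID;
    RETURN;
END;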
Create a Highly Selective Index
Selectivity is the percentage of qualifying rows in the table (qualifying rows / total rows). If the ratio of qualifying rows to total rows is low, the index is highly selective and is most useful. A non-clustered index is most useful if the ratio is around 5% or less, which means the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned.
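A quick way to measure this for a given predicate (table, column and value hypothetical):

SELECT 100.0 * SUM(CASE WHEN LastName = 'Smith' THEN 1 ELSE 0 END) / COUNT(*)
    AS pct_qualifying
FROM dbo.Customers;
-- around 5 or less suggests an index on LastName is likely to be used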
Position a Column in an Index
The order, or position, of a column in an index also plays a vital role in SQL query performance. An index can help query performance only if the criteria of the query match the columns that are leftmost in the index key. As a best practice, the most selective columns should be placed leftmost in the key of a non-clustered index.
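For example (a hypothetical index):

CREATE NONCLUSTERED INDEX IX_Customers_Last_First
ON dbo.Customers (LastName, FirstName);

A filter on LastName, or on LastName and FirstName together, can seek this index; a filter on FirstName alone cannot, because FirstName is not the leftmost key column.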
Drop Unused Indexes
Dropping unused indexes can help speed up data modifications without affecting data retrieval. You also need to define a strategy for batch processes that run infrequently and use certain indexes; in such cases, creating the indexes in advance of the batch processes and then dropping them when the batch processes are done helps reduce the overhead on the database.
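One way to spot candidates (note these counters reset when the server restarts, so a long uptime gives more reliable results):

SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
  AND s.user_updates > 0;
-- indexes that are maintained on every write but never read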
Statistics Creation and Updates
You need to take care of statistics creation and regular updates for computed columns and multi-column predicates referred to in the query; the query optimizer uses the information about the distribution of values in one or more columns that statistics hold to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan.
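For example (hypothetical table), multi-column statistics can be created and refreshed explicitly:

CREATE STATISTICS st_Orders_Cust_Date ON dbo.Orders (CustomerID, OrderDate);

UPDATE STATISTICS dbo.Orders WITH FULLSCAN;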
Revisit Your Schema Definitions
Last but not least, revisit your schema definitions; keep an eye out for whether appropriate FOREIGN KEY, NOT NULL and CHECK constraints are in place. Having the right constraint in the right place always helps query performance: a FOREIGN KEY constraint helps simplify joins by converting some outer or semi-joins to inner joins, and a CHECK constraint helps a bit by removing unnecessary or redundant predicates.
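A minimal sketch of such constraints (hypothetical tables; note that a foreign key created or re-enabled WITH NOCHECK is untrusted, and the optimizer will not use it for join simplification):

ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

ALTER TABLE dbo.Orders
    ADD CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0);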
I have a number of horribly large queries which work OK on small databases, but when the volume of data gets larger the performance of these queries gets slower. They are badly designed, really, and we must address that. These queries have a very large number of LEFT OUTER JOINs. I have noticed that once the number of LEFT OUTER JOINs goes past 10, performance degrades sharply each time a new join is added. If I put OPTION (FAST 1) at the end of my query, the results appear almost immediately. Of course I do not want to use this: firstly, it is not going to help all of the time (if it did, every query would have it), and secondly I want to know how to optimise these joins better. When I run the query without the OPTION set, the execution plan shows that a number of nested loops on my LEFT OUTER JOINs have a high percentage cost, but with the option on it does not. How can I find out what it does to speed the query up, so I can reflect it in the query?
I cannot get the query or the execution plans today, as the server I am on does not let me copy data from it. If they are needed I can arrange to have them sent, but that will take some time, probably until the morning.
Your comments would be really interesting to hear.
Kind regards,
Derek.
You can set a column as the primary key, and that column will automatically become the clustered index.
Clustered index -> benefit and drawback
Benefit: performance boost if implemented correctly
Drawback: requires understanding of clustered/non-clustered indexes and their storage implications
Note: varchar foreign keys can lead to poor performance as well. Change the base table to have an integer primary key instead.
And also:
I would suggest using database paging (e.g. via the ROW_NUMBER function) to partition your result set and query only the data you want to show (e.g. 20 rows per page in a GridView), as sketched below.
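A minimal paging sketch (table, column and variable names hypothetical):

DECLARE @page INT, @pageSize INT;
SET @page = 3;
SET @pageSize = 20;

SELECT *
FROM (
    SELECT r.*, ROW_NUMBER() OVER (ORDER BY r.CreatedDate DESC) AS rn
    FROM dbo.SearchResults AS r
) AS numbered
WHERE rn BETWEEN (@page - 1) * @pageSize + 1 AND @page * @pageSize;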
We have a little problem with one of our queries, which is executed inside a .NET (4.5) application via System.Data.SqlClient.SqlCommand.
The problem is that the query performs a table scan, which is very slow. The execution plan shows the table scan here:
(Screenshot of the execution plan and its predicate details omitted.)
The predicate text shows that the filters on Termine.Datum and Termine.EndDatum are causing the table scan. But why is SQL Server ignoring the indexes? There are two indexes, on Termine.Datum and Termine.EndDatum. We also tried adding a third one with Datum and EndDatum combined.
The indexes are all non-clustered indexes, and both fields are DATETIME.
It decides on a table scan based on an estimated number of rows of 124844, whereas your actual row count is only 831.
The optimizer thinks that to traverse 124844 rows it is better to scan the table than to do an index seek.
Also check which columns you are selecting apart from those in the index. If you have selected columns not covered by the index, it has to do a RID lookup for each row after the index seek, and the optimizer may decide that, instead of all those RID lookups, it prefers a table scan.
First fix: update the statistics to give the optimizer enough information to choose a better plan.
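For example, on the table from the question:

UPDATE STATISTICS dbo.Termine WITH FULLSCAN;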
Can you provide the full query? I see that you are pulling data that spans a range of 3 months. If this range is a high percentage of the dataset, it might be scanning because you are attempting to return such a large percentage of the data. If the index is not selective enough, it won't get picked up.
Also...
You have an OR clause in the filter. From looking at the predicate in the screenshot you provided, it looks like you might be missing parentheses around the two different filters. This might also lead to the scan.
One more thing...
OR clauses can sometimes lead to bad plans; an alternative is to split the query into two UNIONed queries, each with one side of the OR in it. If you provide the query, I should be able to give you a rewritten version showing this, along the lines of the sketch below.
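A rough sketch of the idea, assuming (hypothetically) a filter like WHERE Datum >= @von OR EndDatum <= @bis:

SELECT * FROM dbo.Termine WHERE Datum >= @von
UNION
SELECT * FROM dbo.Termine WHERE EndDatum <= @bis;
-- each branch can use its own index seek; UNION (not UNION ALL)
-- removes the duplicates of rows matching both predicates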
I have a SQL Server 2008 database with 30000000000 records in one of its major tables. Now we are looking at the performance of our queries. We have created all the indexes. I found that we can split our database tables into multiple partitions, so that the data is spread over multiple files and query performance increases.
But unfortunately this functionality is only available in the SQL Server Enterprise edition, which is unaffordable for us.
Is there any other way to optimize query performance? For example, the query
select * from mymajortable where date between '2000/10/10' and '2010/10/10'
takes around 15 min to retrieve around 10000 records.
A SELECT * will obviously be served less efficiently than a query that uses a covering index.
First step: examine the query plan and look for table scans and the steps taking the most effort (%).
If you don't already have an index on your 'date' column, you certainly need one (assuming sufficient selectivity). Try to reduce the columns in the select list and, if 'sufficiently' few, add them to the index as included columns (this can eliminate bookmark lookups into the clustered index and boost performance), e.g. as sketched below.
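A sketch of such a covering index (the included column names are placeholders, since the table definition wasn't posted):

CREATE NONCLUSTERED INDEX IX_mymajortable_date
ON dbo.mymajortable ([date])
INCLUDE (col1, col2, col3);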
You could break your data up into separate tables (say by a date range) and combine via a view.
It is also very dependent on your hardware (# cores, RAM, I/O subsystem speed, network bandwidth)
Suggest you post your table and index definitions.
First, always avoid SELECT *, as that causes the select to fetch all columns; if there is an index with just the columns you need, you are otherwise fetching a lot of unnecessary data. Requesting only the exact columns you need lets the server make better use of indexes.
Secondly, have a look at included columns for your indexes; that way, often-requested data can be included in the index to avoid having to fetch the rows.
Third, you might try using an int column for the date and converting the date into an int. Ints are usually more effective in range searches than dates, especially if you carry time information too; if you can skip the time information, the index will be smaller.
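For example, a date can be stored as a yyyymmdd integer (a trade-off worth benchmarking against a plain date index, not a guaranteed win):

-- 2010-10-10 becomes 20101010 (style 112 = yyyymmdd)
SELECT CAST(CONVERT(CHAR(8), GETDATE(), 112) AS INT) AS date_as_int;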
One more thing to check is the execution plan the server uses; you can see this in Management Studio if you enable showing the execution plan in the menu. It can indicate where the problem lies: you can see which indexes it tries to use, and sometimes it will suggest new indexes to add.
It can also indicate other problems. A table scan or index scan is bad, as it means the server has to scan through the whole table or index, while an index seek is good.
It is a good source to understand how the server works.
If you add an index on date, you will probably speed up your query, due to an index seek plus key lookup instead of a clustered index scan. But if your filter on date returns too many records, the index will not help you at all, because the key lookup is executed for each result of the index seek; SQL Server will then switch back to a clustered index scan.
To get the best performance you need to create a covering index, that is, include all the columns you need in the "included columns" part of your index. But that will not help you if you use SELECT *.
Another issue with the SELECT * approach is that you can't use the cache or the execution plans in an efficient way. If you really need all the columns, make sure you specify all the columns instead of the *.
You should also fully qualify the object name (e.g. dbo.mymajortable) to make sure your plan is reusable.
You might consider creating an archive database and moving anything older than, say, 10-20 years into it. This should drastically speed up your primary production database while retaining all of your historical data for reporting needs.
What type of queries are we talking about?
Is this a production table? If yes, look into normalizing a bit more and see if you cannot go a bit further with normalizing the DB.
If this is for reports, including a lot of ad hoc report queries, this screams data warehouse.
I would create a DW with separate pre-processed reports which include all the calculation and aggregation you could expect.
I am a bit worried about a business model which involves dealing with big data but does not generate enough revenue, or even attract enough venture investment, to upgrade to Enterprise.
SELECT o.oxxxxID,
m.mxxxx,
txxxx,
exxxxName,
paxxxxe,
fxxxxe,
pxxxx,
axxxx,
nxxxx,
nxxxx,
nxxxx,
ixxxx,
CONVERT(VARCHAR, o.dateCreated, 103)
FROM Offer o INNER JOIN Mxxxx m ON o.mxxxxID = m.mxxxxID
INNER JOIN EXXXX e ON e.exxxxID = o.exxxxID
INNER JOIN PXXXX p ON p.pxxxxxxID = o.pxxxxID
INNER JOIN Fxxxx f ON f.fxxxxxID = o.fxxxxxID
WHERE o.cxxxxID = 11
The above query is expected to be executed via a website by approximately 1000 visitors daily. Is it badly written, with a high chance of causing performance problems? If yes, can you please suggest how to improve it?
NOTE: every table has only one index (the primary key).
Looks good to me.
Now, for the performance piece, you need to make sure you have the proper indexes covering the columns you are filtering and joining on (foreign keys, etc.).
A good start would be to look at an actual execution plan or, the easy route, run the query through the Index Tuning Wizard.
The actual execution plan in SQL Server 2008 (perhaps 2005 as well) will already give you missing-index hints at the top.
It's hard to tell without knowing the content of the data, but it looks like a perfectly valid SQL statement. The many joins will likely degrade performance a bit, but you can use a few strategies for improving performance... I have a few ideas.
Indexed views can often improve performance.
Stored procedures will optimize the query for you and save the optimized query.
Or, if possible, create a one-off table that's not live but contains the data from this statement, only in a non-normalized format. This one-off table would need to be updated regularly, but you can get some huge performance boosts using this strategy if it's possible in your situation.
For general performance issues and ideas, this is a good place to start, if you haven't already: http://msdn.microsoft.com/en-us/library/ff647793.aspx
This one is very good as well: http://technet.microsoft.com/en-us/magazine/2006.01.boostperformance.aspx
That would depend mostly on the keys and indexes defined on the tables. If you could provide those, a better answer could be given. While the query looks OK (other than the xxx's in all the names), if you're joining on fields with no indexes, or the field in the WHERE clause has no index, then you may run into performance issues on larger data sets.
It looks pretty good to me. Probably the only improvement I might make is to output o.datecreated as is and let the client format it.
You could also add indexes to the join columns.
There may also be a potential to create an indexed view if performance is an issue and space isn't.
Actually, your query looks perfectly well written. The only thing we can't know is whether there are indexes and keys on the columns you are using in the JOINs and the WHERE statement. Other than that, I don't see anything that can be improved.
If you only have single-column indexes on the primary keys, then it is unlikely the indexes will be covering for all the data output in your select statement. So what will happen is that the query can efficiently locate the rows for each primary key, but it will need to use bookmark lookups to find the data rows and extract the additional columns.
So, although the query itself is probably fine (except for the date conversion), as long as all these columns are truly needed in the output, the execution plan could probably be improved by adding additional columns to your indexes. A clustered index key is not allowed to have included columns, it is probably also enforcing your primary key, and you are unlikely to want to add other columns to your primary key, so this would mean creating an additional non-clustered index with the PK column first and then including additional columns (see the sketch below).
At this point the indexes will cover the query and it will not need to do the bookmark lookups. Note that the indexes need to support the most common usage scenarios, and that the more indexes you add, the slower your write performance will be, since all the indexes will need to be updated.
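For the Offer table, for instance, such an index might look like this (the column names are the obfuscated placeholders from the query):

CREATE NONCLUSTERED INDEX IX_Offer_Covering
ON dbo.Offer (oxxxxID)
INCLUDE (mxxxxID, exxxxID, pxxxxID, fxxxxxID, cxxxxID, dateCreated);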
In addition, you might also want to review your constraints, since the optimizer can use these to eliminate joins when a table contributes no output columns and it can determine that there will not be an outer join or cross join which would eliminate or multiply rows.
What methods are there for identifying superfluous columns in covering indexes: columns which are never searched against, and therefore may be moved into INCLUDEs, or even removed completely without affecting the applicability of the index?
To clarify things
The idea of a covering index is that it also includes columns which may not be searched on (used in the WHERE clause and such) but may be selected (part of the SELECT column list).
There doesn't seem to be any easy way to assert the existence of unused columns in a covering index. I can only think of the painstaking process below:
For a representative period of time, record all queries being run on the server (or on the table desired)
Filter out (through regular expression) queries not involving the underlying table
For remaining queries, obtain the query plan; discard queries not involving the index in question
For the remaining queries, or rather for each "template" of query (many queries are the same except for the search criteria values), make the list of the columns from the index that are either in the SELECT or the WHERE clause (or in a JOIN...)
The columns from the index not found in that list are positively good to go.
Now, there may be a few more columns to remove, because the process above doesn't check the context in which the covering index is used: it is possible that the index is used to resolve the WHERE, but that the underlying table is still accessed as well (for example, to get to columns not in the covering index).
The above clinical approach is rather unattractive. An analytical approach may be preferable:
Find all query "templates" that may be used in all the applications using the server. For each of these patterns, find the ones which may be using the covering index. These are (again, with a few holes) the queries that:
include a reference to the underlying table
do not cite in any way a column from the underlying table that is not a column in the index
do not use a search criterion from the underlying table that is more selective than the columns of the index (in their very order...)
Or, without even going to the applications: think of all the use cases, and ask whether the queries serving these cases would benefit or not from all the columns in the index. Doing so implies that you have a relatively good idea of the selectivity of the index with regard to its first few columns.
If you do audits of your use cases and data points, obviously anything that isn't used or caught in the audit is a candidate for deletion. If the database lacks such a thorough audit, you can save a time window's worth of the queries that hit the database by running a trace and saving it. You can then analyze the trace, see what type of queries are hitting the database, and from there intuit which columns can be dropped.
Trace analysis is typically used to find candidates for missing indexes, but I'm guessing it could also be used to analyze usage trends.