SELECT o.oxxxxID,
m.mxxxx,
txxxx,
exxxxName,
paxxxxe,
fxxxxe,
pxxxx,
axxxx,
nxxxx,
nxxxx,
nxxxx,
ixxxx,
CONVERT(VARCHAR, o.dateCreated, 103)
FROM Offer o INNER JOIN Mxxxx m ON o.mxxxxID = m.mxxxxID
INNER JOIN EXXXX e ON e.exxxxID = o.exxxxID
INNER JOIN PXXXX p ON p.pxxxxxxID = o.pxxxxID
INNER JOIN Fxxxx f ON f.fxxxxxID = o.fxxxxxID
WHERE o.cxxxxID = 11
The above query is expected to be executed via the website by approximately 1000 visitors daily. Is it badly written, and is it likely to cause performance problems? If so, can you please suggest how to improve it?
NOTE: every table has only one index (Primary key).
Looks good to me.
Now, for the performance piece, you need to make sure you have the proper indexes covering the columns you are filtering and joining on (foreign keys, etc.).
A good start would be to capture an Actual Execution Plan or, the easy route, run the query against the Index Tuning Wizard.
The actual execution plan in SQL 2008 (perhaps 2005 as well) will already give you missing-index hints at the top.
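As a minimal sketch of the kind of indexes meant here (assuming SQL Server and keeping the obfuscated names from the query; adjust to your real names):

-- One non-clustered index per join/filter column on Offer
CREATE NONCLUSTERED INDEX IX_Offer_cxxxxID ON Offer (cxxxxID);   -- WHERE clause filter
CREATE NONCLUSTERED INDEX IX_Offer_mxxxxID ON Offer (mxxxxID);   -- join to Mxxxx
CREATE NONCLUSTERED INDEX IX_Offer_exxxxID ON Offer (exxxxID);   -- join to EXXXX
CREATE NONCLUSTERED INDEX IX_Offer_pxxxxID ON Offer (pxxxxID);   -- join to PXXXX
CREATE NONCLUSTERED INDEX IX_Offer_fxxxxxID ON Offer (fxxxxxID); -- join to Fxxxx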
It's hard to tell without knowing the content of the data, but it looks like a perfectly valid SQL statement. The many joins will likely degrade performance a bit, but there are a few strategies for improving performance... I have a few ideas.
indexed views can often improve performance (see the sketch after this list)
stored procedures let SQL Server optimize the query for you and cache the optimized plan
or, if possible, create a one-off table that's not live but contains the data from this statement in a non-normalized format. This one-off table would need to be updated regularly, but you can get some huge performance boosts using this strategy if it's possible in your situation.
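To illustrate the first idea, here is a minimal indexed-view sketch (SQL Server syntax; the view name is made up and only one of the joins is shown; an indexed view requires SCHEMABINDING, two-part table names, and a unique clustered index):

CREATE VIEW dbo.vOfferDetails
WITH SCHEMABINDING
AS
SELECT o.oxxxxID, o.cxxxxID, m.mxxxx, o.dateCreated
FROM dbo.Offer o
INNER JOIN dbo.Mxxxx m ON o.mxxxxID = m.mxxxxID;
GO
-- Materializes the view; oxxxxID is assumed to be Offer's unique key.
CREATE UNIQUE CLUSTERED INDEX IX_vOfferDetails ON dbo.vOfferDetails (oxxxxID);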
For general performance issues and ideas, this is a good place to start, if you haven't already: http://msdn.microsoft.com/en-us/library/ff647793.aspx
This one is very good as well: http://technet.microsoft.com/en-us/magazine/2006.01.boostperformance.aspx
That would depend mostly on the keys and indexes defined on the tables. If you could provide those, a better answer could be given. While the query looks OK (other than the xxx's in all the names), if you're joining on fields with no indexes, or the field in the WHERE clause has no index, then you may run into performance issues on larger data sets.
It looks pretty good to me. Probably the only improvement I might make is to output o.dateCreated as-is and let the client format it.
You could also add indexes to the join columns.
There may also be a potential to create an indexed view if performance is an issue and space isn't.
Actually, your query looks perfectly well written. The only thing we can't judge is whether indexes and keys exist on the columns that you are using in the JOINs and the WHERE clause. Other than that, I don't see anything that can be improved.
If you only have single indexes on the primary keys, then it is unlikely those indexes will cover all the data output by your SELECT statement. So what will happen is that the query can efficiently locate the rows for each primary key, but it will need bookmark lookups to find the data rows and extract the additional columns.
So, although the query itself is probably fine (except for the date conversion), as long as all these columns are truly needed in the output, the execution plan could probably be improved by adding additional columns to your indexes. A clustered index key is not allowed to have included columns, and the clustered index is probably also what enforces your primary key; since you are unlikely to want to add other columns to the primary key itself, this would mean creating an additional non-clustered index with the PK column first and then including additional columns.
At this point the indexes will cover the query and it will not need to do the bookmark lookups. Note that the indexes need to support the most common usage scenarios and that the more indexes you add, the slower your write performance will be, since all the indexes will need to be updated.
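As a rough illustration of that idea (the index name is made up, and only a few of the output columns are shown; column names follow the question's query), a covering non-clustered index might look like:

-- PK column first; output-only columns carried in the leaf level via INCLUDE,
-- so the query no longer needs bookmark lookups to fetch them.
CREATE NONCLUSTERED INDEX IX_Offer_Covering
ON Offer (oxxxxID)
INCLUDE (mxxxxID, exxxxID, pxxxxID, fxxxxxID, dateCreated);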
In addition, you might also want to review your constraints, since the optimizer can use them to eliminate a join entirely when a table contributes no output columns and the optimizer can determine that the join (not being an outer or cross join) cannot eliminate or multiply rows.
Related
I have a number of horribly large queries which work OK on small databases, but when the volume of data gets larger the performance of these queries gets slower. They are badly designed, really, and we must address that. These queries have a very large number of LEFT OUTER JOINs. I note that once the number of LEFT OUTER JOINs goes past 10, performance gets dramatically slower each time a new join is added.

If I put an OPTION (FAST 1) at the end of my query then the results appear almost immediately. Of course I do not want to use this: firstly, it is not going to help all of the time (if it did, every query would have it), and secondly I want to know how to optimise these joins better. When I run the query without the OPTION set, the execution plan shows a number of nested loops on my LEFT OUTER JOINs with a high percentage cost; with the option on, it does not. How can I find out what it does to speed the query up, so I can reflect it in the query?
I cannot get the query or the execution plans today, as the server I am on does not let me copy data from it. If they are needed I can arrange to have them sent, but that will take some time, until the morning.
I would be really interested in your comments.
Kind regards,
Derek.
If you set a column as the primary key, SQL Server will automatically make it the clustered index by default.
Clustered index: benefit and drawback
Benefit: a performance boost if implemented correctly
Drawback: requires understanding of clustered/non-clustered indexes and their storage implications
Note: varchar foreign keys can lead to poor performance as well. Change the base table to have an integer primary key instead.
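A minimal sketch of that advice (the table and column names are made up): declare an integer primary key, and SQL Server will make it the clustered index by default.

CREATE TABLE Customer
(
    customerID INT NOT NULL PRIMARY KEY,  -- becomes the clustered index by default
    name       VARCHAR(100) NOT NULL
);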
And also
I would suggest using database paging (e.g. via the ROW_NUMBER function) to partition your result set and query only the data you want to show (e.g. 20 rows per page in a GridView).
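A minimal paging sketch via ROW_NUMBER, reusing the first question's query (the ordering column is an assumption; here, page 2 at 20 rows per page):

WITH Paged AS
(
    SELECT o.oxxxxID, o.dateCreated,
           ROW_NUMBER() OVER (ORDER BY o.dateCreated DESC) AS rn
    FROM Offer o
    WHERE o.cxxxxID = 11
)
SELECT *
FROM Paged
WHERE rn BETWEEN 21 AND 40;  -- rows 21-40 = page 2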
I have a search procedure that is passed around 15-20 (optional) parameters, and the search procedure calls their respective functions to check whether the value passed in each parameter exists in the database. So it is basically a search structure based on a number of parameters.
Now, since the database is going to have millions of records, I expect the simple, plain search procedure to fail right away. What are the ways to improve query performance?
What I have tried so far:
A clustered index on the FirstName column (as I expect it to be used very frequently)
Non-clustered indexes on the rest of the columns that form the basis of the user search, also using the INCLUDE keyword
Note:
I am looking for more ways to optimize my queries.
Most of the queries are nothing but select statements checked against a condition.
One of the queries uses a GROUP BY clause.
I have also created a temporary table in which I am inserting all the matched entries.
First, run the query from SQL Server Management Studio and look at the query plan to see where the bottleneck is. Anywhere you see a "table scan" or "index scan", the engine has to go through all the data to find what it is looking for. If you create appropriate indexes that can be used for these operations, it should increase performance.
Listed below are some tips for improving the performance of a SQL query.
Avoid Multiple Joins in a Single Query
Try to avoid writing a SQL query using multiple joins that include outer joins, cross apply, outer apply and other complex subqueries. This reduces the optimizer's choices for deciding the join order and join type. Sometimes the optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex cross applies or subqueries.
Eliminate Cursors from the Query
Try to remove cursors from the query and use set-based queries; a set-based query is more efficient than a cursor-based one. If there is a need to use a cursor, then avoid dynamic cursors, as they tend to limit the choice of plans available to the query optimizer. For example, a dynamic cursor limits the optimizer to using nested loop joins.
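For instance, a cursor that updates one row per loop iteration can usually be collapsed into a single set-based statement, as in this sketch (the table and column names are made up):

-- One set-based UPDATE instead of FETCHing and updating row by row.
UPDATE o
SET    o.isExpired = 1
FROM   Offer o
WHERE  o.dateCreated < DATEADD(YEAR, -1, GETDATE());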
Avoid Use of Non-correlated Scalar Sub Query
You can rewrite your query to run a non-correlated scalar subquery as a separate query, instead of as part of the main query, and store the output in a variable that can be referred to in the main query or a later part of the batch. This gives the optimizer better options, which may help it produce accurate cardinality estimates along with a better plan.
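A sketch of that rewrite (hypothetical names): compute the scalar once, store it in a variable, then reference the variable in the main query.

DECLARE @avgPrice MONEY;
SELECT @avgPrice = AVG(price) FROM dbo.Product;   -- the former scalar subquery

SELECT p.productID, p.price
FROM dbo.Product p
WHERE p.price > @avgPrice;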
Avoid Multi-statement Table Valued Functions (TVFs)
Multi-statement TVFs are more costly than inline TVFs. SQL Server expands inline TVFs into the main query the way it expands views, but it evaluates multi-statement TVFs in a separate context from the main query and materializes their results into temporary work tables. The separate context and the work table make multi-statement TVFs costly.
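For comparison, an inline TVF is just a single RETURN (SELECT ...), which SQL Server can expand into the calling query; a sketch with made-up names, borrowing the first question's Offer table:

CREATE FUNCTION dbo.fn_OffersForCustomer (@customerID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT oxxxxID, dateCreated
    FROM dbo.Offer
    WHERE cxxxxID = @customerID
);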
Create a Highly Selective Index
Selectivity defines the percentage of qualifying rows in the table (qualifying number of rows / total number of rows). If the ratio of qualifying rows to total rows is low, the index is highly selective and is most useful. A non-clustered index is most useful when the ratio is around 5% or less, which means the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned.
Position a Column in an Index
The order, or position, of a column in an index also plays a vital role in SQL query performance. An index can help query performance if the criteria of the query match the columns that are leftmost in the index key. As a best practice, the most selective columns should be placed leftmost in the key of a non-clustered index.
Drop Unused Indexes
Dropping unused indexes can help to speed up data modifications without affecting data retrieval. Also, you need to define a strategy for batch processes that run infrequently and use certain indexes. In such cases, creating indexes in advance of batch processes and then dropping them when the batch processes are done helps to reduce the overhead on the database.
Statistic Creation and Updates
You need to take care of statistics creation and regular updates for computed columns and multi-column combinations referred to in the query; the query optimizer uses the information in statistics about the distribution of values in one or more columns of a table to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan.
Revisit Your Schema Definitions
Last but not least, revisit your schema definitions; keep an eye out for whether appropriate FOREIGN KEY, NOT NULL and CHECK constraints are in place. Having the right constraint in the right place always helps query performance: a FOREIGN KEY constraint helps simplify joins by converting some outer or semi-joins to inner joins, and a CHECK constraint also helps a bit by removing unnecessary or redundant predicates.
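A sketch of the FOREIGN KEY point (names follow the first question's tables): with this constraint trusted, the optimizer knows every Offer row has a matching Mxxxx row and can skip that join when no Mxxxx columns are needed.

ALTER TABLE Offer
ADD CONSTRAINT FK_Offer_Mxxxx
    FOREIGN KEY (mxxxxID) REFERENCES Mxxxx (mxxxxID);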
If I have a large table with:
varchar foo
integer foo_id
integer other_id
varchar other_field
And I might be doing queries like:
select * from table where other_id=x
Obviously I need an index on other_id to avoid a table scan.
If I'm also doing:
select * from table where other_id=x and other_field='y'
Do I want another index on other_field or is that a waste if I never do:
select * from table where other_field='y'
i.e. I only use other_field with other_id together in a query.
Would a compound index of both [other_id, other_field] be better? Or would that cause a table scan for the 1st simple query?
Use EXPLAIN and EXPLAIN ANALYZE, if you are not using these two already. Once you understand query plan basics you'll be able to optimize database queries pretty effectively.
Now to the question: saying anything without knowing a bit about the values might be misleading. If there are not that many other_field values for any specific other_id, then a simple index on other_id would be enough. If there are many other_field values (i.e. thousands), I would consider the compound index.
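A sketch of that workflow (the table name bigtable and the value 42 are placeholders): create the candidate index, then let EXPLAIN ANALYZE show whether the planner actually uses it.

-- Candidate compound index for the two-condition query.
CREATE INDEX idx_other_id_field ON bigtable (other_id, other_field);

EXPLAIN ANALYZE
SELECT * FROM bigtable WHERE other_id = 42 AND other_field = 'y';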
Do I want another index on other_field or is that a waste if I never do:
Yes, that would very probably be a waste of space. Postgres is able to combine two indexes, but the conditions must be just right for that.
Would a compound index of both [other_id, other_field] be better?
Might be.
Or would that cause a table scan for the 1st simple query?
Postgres is able to use multi-column index only for the first column (not exactly true - check answer comments).
The basic rule is - get a real data set, prepare queries you are trying to optimize. Run EXPLAIN ANALYZE on those queries. Try to rewrite them (i.e. joins instead of subselects or vice versa) and check the performance (EXPLAIN ANALYZE). Try to add indexes where you feel it might help and check the performance (EXPLAIN ANALYZE)... if it does not help, don't forget to drop the unnecessary index.
And if you are still having problems and your data set is big (tens of millions+), you might need to reconsider even running specific queries. A different approach might be needed (e.g. batch / async processing) or a different technology for the specific task.
If other_id is highly selective, then you might not need an index on other_field at all. If only a few rows match other_id=x in the index, looking at each of them to see if they also match other_field=y might be fast enough to not bother with more indexes.
If it turns out that you do need to make the query faster, then you almost surely want the compound index. A standalone index on other_field is unlikely to help.
The accepted answer is not entirely accurate - if you need all three queries mentioned in your question, then you'll actually need two indexes.
Let's see which indexes satisfy which WHERE clause in your queries:
WHERE clause                      {other_id}   {other_id, other_field}   {other_field, other_id}   {other_field}
other_id=x                        yes          yes                       no                        no
other_id=x and other_field='y'    partially    yes                       yes                       partially
other_field='y'                   no           no                        yes                       yes
So to satisfy all 3 WHERE clauses, you'll need:
either an index on {other_id} and a composite index on {other_field, other_id}
or an index on {other_field} and a composite index on {other_id, other_field}
or a composite index on {other_id, other_field} and a composite index on {other_field, other_id}.1
Depending on distribution of your data, you could also get away with {other_id} and {other_field}, but you should measure carefully before opting for that solution. Also, you may consider replacing * with a narrower set of fields and then covering them by indexes, but that's a whole other topic...
1 "Fatter" solution than the other two - consider only if you have specific covering needs.
Here is the query I am stuck with:
SELECT *
FROM customers
WHERE salesmanid = #salesrep
OR telephonenum IN (SELECT telephonenum
FROM salesmancustomers
WHERE salesmanname = #salesrepname)
ORDER BY customernum
It is SLOW and pegging my CPU at 99%. I know an index would help, but I'm not sure what kind, or whether it should be two indexes or one with both columns included.
Probably three indexes, each on a single column (sketched after the column list below). This assumes that your queries are all quite selective relative to the size of the tables.
It would help if you told us what your table schemas are, along with details of existing indexes (your PKs will get a clustered index by default if you don't specify otherwise) and some details about table sizes / selectivity.
Customers
    SalesmanId
    TelephoneNum
SalesmanCustomers
    SalesmanName
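A sketch of those three single-column indexes (the index names are made up):

CREATE INDEX IX_Customers_SalesmanId   ON Customers (SalesmanId);
CREATE INDEX IX_Customers_TelephoneNum ON Customers (TelephoneNum);
CREATE INDEX IX_SalesmanCustomers_Name ON SalesmanCustomers (SalesmanName);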
Take a look at the Query Execution Plan and see if there are any table scans going on. This will help you identify what indexes you need.
I suppose that, in addition to the columns suggested by @Martin, an index on CustomerNum is also required, since it's used in the ORDER BY clause.
If you have lots of records, the ORDER BY is something that takes a lot of time. You can also try running the query without the ORDER BY and see how long it takes.
Suppose I have a database table with two fields, "foo" and "bar". Neither of them is unique, but each of them is indexed. However, rather than being indexed together, they each have a separate index.
Now suppose I perform a query such as SELECT * FROM sometable WHERE foo='hello' AND bar='world'; My table has a huge number of rows for which foo is 'hello' and a small number of rows for which bar is 'world'.
So the most efficient thing for the database server to do under the hood is use the bar index to find all fields where bar is 'world', then return only those rows for which foo is 'hello'. This is O(n) where n is the number of rows where bar is 'world'.
However, I imagine it's possible that the process would happen in reverse, where the foo index was used and those results searched. This would be O(m), where m is the number of rows where foo is 'hello'.
So is Oracle smart enough to search efficiently here? What about other databases? Or is there some way I can tell it in my query to search in the proper order? Perhaps by putting bar='world' first in the WHERE clause?
Oracle will almost certainly use the most selective index to drive the query, and you can check that with the explain plan.
Furthermore, Oracle can combine the use of both indexes in a couple of ways: it can convert b-tree indexes to bitmaps and perform a bitmap AND operation on them, or it can perform a hash join on the rowids returned by the two indexes.
One important consideration here might be any correlation between the values being queried. If foo='hello' accounts for 80% of values in the table and bar='world' accounts for 10%, then Oracle is going to estimate that the query will return 0.8*0.1 = 8% of the table rows. However, this may not be correct; the query may actually return 10% of the rows, or even 0% of the rows, depending on how correlated the values are. Now, depending on the distribution of those rows throughout the table, it may not be efficient to use an index to find them. You may still need to access (say) 70% of the table blocks to retrieve the required rows (google "clustering factor"), in which case Oracle is going to perform a full table scan if it gets the estimate right.
In 11g you can collect multicolumn statistics to help with this situation I believe. In 9i and 10g you can use dynamic sampling to get a very good estimation of the number of rows to be retrieved.
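A sketch of the 11g feature mentioned (assuming the question's sometable): create a column group so the optimizer learns the foo/bar correlation the next time statistics are gathered.

SELECT dbms_stats.create_extended_stats(user, 'SOMETABLE', '(FOO,BAR)')
FROM   dual;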
To get the execution plan do this:
explain plan for
SELECT *
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Contrast that with:
explain plan for
SELECT /*+ dynamic_sampling(4) */
*
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Eli,
In a comment you wrote:
Unfortunately, I have a table with lots of columns each with their own index. Users can query any combination of fields, so I can't efficiently create indexes on each field combination. But if I did only have two fields needing indexes, I'd completely agree with your suggestion to use two indexes. – Eli Courtwright (Sep 29 at 15:51)
This is actually rather crucial information. Sometimes programmers outsmart themselves when asking questions. They try to distill the question down to the seminal points but quite often oversimplify and miss getting the best answer.
This scenario is precisely why bitmap indexes were invented -- to handle the times when unknown groups of columns would be used in a where clause.
Just in case someone says that BMIs (bitmap indexes) are for low-cardinality columns only and may not apply to your case: low is probably not as small as you think. The only real issue is concurrency of DML against the table; it must be single-threaded or rare for this to work.
Yes, you can give Oracle "hints" with the query. These hints are disguised as comments ("/* HINT */") to the database and are mainly vendor-specific, so a hint for one database will not work on another database.
I would use index hints here, the first hint for the small table. See here.
On the other hand, if you often search over these two fields, why not create an index on these two? I do not have the right syntax, but it would be something like
CREATE INDEX IX_BAR_AND_FOO on sometable(bar,foo);
This way data retrieval should be pretty fast. And in case the combination is unique, then you simply create a unique index, which should be lightning fast.
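For the unique case, the sketch is a one-word change (again, I may not have the exact syntax for your database):

CREATE UNIQUE INDEX UX_BAR_AND_FOO ON sometable (bar, foo);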
First off, I'll assume that you are talking about nice, normal, standard b*-tree indexes. The answer for bitmap indexes is radically different. And there are lots of options for various types of indexes in Oracle that may or may not change the answer.
At a minimum, if the optimizer is able to determine the selectivity of a particular condition, it will use the more selective index (i.e. the index on bar). But if you have skewed data (there are N values in the column bar but the selectivity of any particular value is substantially more or less than 1/N of the data), you would need to have a histogram on the column in order to tell the optimizer which values are more or less likely. And if you are using bind variables (as all good OLTP developers should), depending on the Oracle version, you may have issues with bind variable peeking.
Potentially, Oracle could even do an on the fly conversion of the two b*-tree indexes to bitmaps and combine the bitmaps in order to use both indexes to find the rows it needs to retrieve. But this is a rather unusual query plan, particularly if there are only two columns where one column is highly selective.
So is Oracle smart enough to search efficiently here?
The simple answer is "probably". There are lots of very bright people at each of the database vendors working on optimizing the query optimizer, so it's probably doing things that you haven't even thought of. And if you update the statistics, it'll probably do even more.
I'm sure you can also have Oracle display a query plan so you can see exactly which index is used first.
The best approach would be to add foo to bar's index, or add bar to foo's index (or both). If foo's index also contains an index on bar, that additional indexing level will not affect the utility of the foo index in any current uses of that index, nor will it appreciably affect the performance of maintaining that index, but it will give the database additional information to work with in optimizing queries such as in the example.
It's better than that.
Index seeks are generally quicker than full table scans. So behind the scenes, Oracle (and SQL Server, for that matter) will first locate the range of rows on both indexes. It will then look at which range is shorter (seeing that it's an inner join), and it will iterate over the shorter range to find the matches in the larger of the two.
You can provide hints as to which index to use. I'm not familiar with Oracle, but in MySQL you can use USE INDEX, IGNORE INDEX or FORCE INDEX (see here for more details). For best performance, though, you should use a combined index.
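A sketch of the MySQL form (the index name is made up):

SELECT *
FROM sometable USE INDEX (ix_bar_foo)
WHERE foo = 'hello' AND bar = 'world';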