I have a table on SQL Server with about 10 million rows. It has a nonclustered index ClearingInfo_idx which looks like:
I am running a query which isn't using the ClearingInfo_idx index, and the execution plan looks like this:
Can anyone explain why the query optimizer chooses to scan the clustered index?
I think it suggests this index because you use an exact search on the two columns immediate and clearingOrder_clearingOrderId. Those values are numbers, which are good to search on. The column status is nvarchar, which isn't the best for a search, and because of your search with IN, SQL Server needs to look for two of those values.
SQL Server would use the two number columns to get a faster result, and search on status in a second round, after the number of possible results has been reduced by the exact search on the two number columns.
Hopefully that makes my point clear. :-) Otherwise, just ask again. :-)
As Luaan already pointed out, the likely reason the system prefers to scan the clustered index is because
you're asking for all fields to be returned (SELECT *); change this to only the fields that are present in the index (= index fields + clustered-index fields) and you'll probably see it using just the index. If you need a couple of extra fields you can consider INCLUDE-ing those in the index (see the sketch after this list).
the order of the index fields isn't optimal. Additionally, it might well be that the 'content' of the fields isn't very helpful either. How many distinct values are present in the index columns and how are they spread around? If your WHERE covers 90% of the records, there is very little reason to first create a (huge) list of keys and then go fetch those from the clustered index later on. Scanning the latter directly then makes much more sense.
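As an illustration of the first point, here is a minimal sketch of a covering index; the table name is inferred from the index name, the key columns are the ones discussed in this thread, and the INCLUDE-d columns are hypothetical stand-ins for whatever extra fields the query actually selects:

CREATE NONCLUSTERED INDEX ClearingInfo_covering_idx
ON ClearingInfo (immediate, clearingOrder_clearingOrderId, status)
INCLUDE (someExtraColumn1, someExtraColumn2);  -- hypothetical extra fields needed by the SELECT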
Did you try the suggested index? Not sure what other queries run on the table, but for this particular query it seems like a valid replacement to me. Whether the replacement will satisfy the other queries is another question, of course. Adding extra indexes might negatively impact your insert/update/delete operations and it will require more disk space; there is no such thing as a free lunch =)
That said, if performance is an issue, have you considered a filtered index? (again, no such thing as a free lunch; it's all about priorities)
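A minimal sketch of such a filtered index; again the table name is inferred from the index name, and the status values are hypothetical placeholders for the two values in the query's IN list:

CREATE NONCLUSTERED INDEX ClearingInfo_filtered_idx
ON ClearingInfo (immediate, clearingOrder_clearingOrderId)
WHERE status IN (N'StatusA', N'StatusB');  -- hypothetical: the two values from the IN list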
We have a little problem with one of our queries, which is executed inside a .NET (4.5) application via System.Data.SqlClient.SqlCommand.
The problem is that the query performs a table scan, which is very slow. The execution plan shows the table scan here.
Screenshot:
The details:
So the text shows that the filter on Termine.Datum and Termine.EndDatum is causing the table scan. But why is SQL Server ignoring the indexes? There are two indexes, one on Termine.Datum and one on Termine.EndDatum. We also tried to add a third one with Datum and EndDatum combined.
The indexes are all non-clustered indexes and both fields are DateTime.
It decides on a table scan based on the estimated number of rows, 124844, whereas your actual rows are only 831.
The optimizer thinks that to traverse 124844 rows it is better to scan the table instead of doing an index seek.
Also check which other columns are selected apart from the indexed ones. If you have selected columns that are not in the index, it has to do a RID lookup for each row after the index seek, and the optimizer might prefer a table scan over that many RID lookups.
First fix: update the statistics to give the optimizer enough information to choose a better plan.
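A minimal sketch of that first fix, assuming the Termine table from the question:

UPDATE STATISTICS Termine WITH FULLSCAN;  -- recompute statistics from all rows instead of a sample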
Can you provide the full query? I see that you are pulling data that spans a range of 3 months. If this range is a high percentage of the dataset, it might be scanning because you are attempting to return such a large percentage of the data. If the index is not selective enough, it won't get picked up.
Also...
You have an OR clause in the filter. From looking at the predicate in the screenshot you provided it looks like you might be missing () around the two different filters. This might also lead to the scan.
One more thing...
OR clauses can sometimes lead to bad plans - an alternative is to split the query into two UNIONed queries, each with one of the OR branches in it. If you provide the query I should be able to give you a re-written version to show this.
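Since the full query wasn't posted, here is only a rough, hypothetical sketch of that rewrite, using the Termine.Datum and Termine.EndDatum columns from the question and made-up range parameters:

SELECT * FROM Termine WHERE Datum >= @von AND Datum < @bis        -- first OR branch
UNION                                                             -- UNION also removes duplicates between the branches
SELECT * FROM Termine WHERE EndDatum >= @von AND EndDatum < @bis; -- second OR branch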
I need to make queries such as
SELECT
    Url, COUNT(*) AS requests, AVG(TS) AS avg_timeSpent
FROM
    myTable
WHERE
    Url LIKE '%/myController/%'
GROUP BY
    Url
run as fast as possible.
The columns selected and grouped are almost always the same, the only difference being an extra column on the select and group by (the column tenantId).
What kind of index should I create to help me run this scenario?
Edit 1:
If I change my base query to '/myController/%' (note there's no % at the beginning) would it be better?
This is a query that cannot be sped up with an index. The DBMS cannot know beforehand how many records will match the condition. It may be 100% or 0.001%. There is no clue for the DBMS to guess this. And access via an index only makes sense when a small percentage of rows gets selected.
Moreover, how can such an index be structured and useful? Think of a telephone book and you want to find all names that contain 'a' or 'rs' or 'ems' or whatever. How would you order the names in the book to find all these and all other thinkable letter combinations quickly? It simply cannot be done.
So the DBMS will read the whole table record for record, no matter whether you provide an index or not.
There may be one exception: With an index on URL and TS, you'd have both columns in the index. So the DBMS might decide to read the whole index rather than the whole table then. This may make sense for instance when the table has hundreds of columns or when the table is very fragmented or whatever. I don't know. A table is usually much easier to read sequentially than an index. You can still just try, of course. It doesn't really hurt to create an index. Either the DBMS uses it or not for a query.
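If you want to try that, a minimal sketch of such a covering index (the index name is made up):

CREATE NONCLUSTERED INDEX IX_myTable_Url_TS
ON myTable (Url)
INCLUDE (TS);  -- Url serves the filter and GROUP BY, TS serves the AVG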
Columnstore indexes can be quite fast at such tasks (aggregates on global scans). But even they will have trouble handling a LIKE '%/myController/%' predicate. I recommend you parse the URL once into an additional computed field that projects out the controller from your URL. But the truth is that looking at global time spent per response URL reveals very little information: it will contain data since the beginning of time, long since made obsolete by newer deployments, and it will not capture recent trends. A filter based on time, say per hour or per day, now that is a very useful analysis. And such a filter can be served excellently by a columnstore, because of natural time order and segment elimination.
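A minimal sketch of that computed-column idea, assuming URLs shaped like '/<controller>/<action>'; the parsing expression is a hypothetical placeholder and would need adjusting to the real URL format:

ALTER TABLE myTable
ADD controller AS SUBSTRING(Url, 2, CHARINDEX('/', Url, 2) - 2) PERSISTED;  -- assumes a second '/' exists in every Url

CREATE INDEX IX_myTable_controller ON myTable (controller);  -- enables a seek on controller = 'myController'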
Based on your posted query you should have an index on the Url column. In general, columns which are involved in WHERE, HAVING, ORDER BY and JOIN ON conditions should be indexed.
You should get the generated query plan for the said query and see where it's taking the most time. Also, based on the datatype of the Url column, you may consider having a FULLTEXT index on that column.
If I have a large table with:
varchar foo
integer foo_id
integer other_id
varchar other_field
And I might be doing queries like:
select * from table where other_id=x
Obviously I need an index on other_id to avoid a table scan.
If I'm also doing:
select * from table where other_id=x and other_field='y'
Do I want another index on other_field or is that a waste if I never do:
select * from table where other_field='y'
i.e. I only use other_field with other_id together in a query.
Would a compound index of both [other_id, other_field] be better? Or would that cause a table scan for the 1st simple query?
Use EXPLAIN and EXPLAIN ANALYZE, if you are not using these two already. Once you understand query plan basics you'll be able to optimize database queries pretty effectively.
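For example, a minimal sketch of the first step (mytable is a stand-in for your real table name, since table is a reserved word):

EXPLAIN ANALYZE
SELECT * FROM mytable WHERE other_id = 42 AND other_field = 'y';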
Now to the question: saying anything without knowing a bit about the values might be misleading. If there are not that many other_field values for any specific other_id, then a simple index on other_id would be enough. If there are many other_field values (i.e. thousands), I would consider making the compound index.
Do I want another index on other_field or is that a waste if I never do:
Yes, that would very probably be a waste of space. Postgres is able to combine two indexes, but the conditions must be just right for that.
Would a compound index of both [other_id, other_field] be better?
Might be.
Or would that cause a table scan for the 1st simple query?
Postgres is able to use multi-column index only for the first column (not exactly true - check answer comments).
The basic rule is - get a real data set, prepare queries you are trying to optimize. Run EXPLAIN ANALYZE on those queries. Try to rewrite them (e.g. joins instead of subselects or vice versa) and check the performance (EXPLAIN ANALYZE). Try to add indexes where you feel it might help and check the performance (EXPLAIN ANALYZE)... if it does not help, don't forget to drop the unnecessary index.
And if you are still having problems and your data set is big (tens of millions+), you might need to reconsider even running specific queries. A different approach might be needed (e.g. batch / async processing) or a different technology for the specific task.
If other_id is highly selective, then you might not need an index on other_field at all. If only a few rows match other_id=x in the index, looking at each of them to see if they also match other_field=y might be fast enough to not bother with more indexes.
If it turns out that you do need to make the query faster, then you almost surely want the compound index. The standalone index on other_field is unlikely to help.
The accepted answer is not entirely accurate - if you need all three queries mentioned in your question, then you'll actually need two indexes.
Let's see which indexes satisfy which WHERE clause in your queries:
                                  {other_id}   {other_id, other_field}   {other_field, other_id}   {other_field}
other_id=x                        yes          yes                       no                        no
other_id=x and other_field='y'    partially    yes                       yes                       partially
other_field='y'                   no           no                        yes                       yes
So to satisfy all 3 WHERE clauses, you'll need:
either an index on {other_id} and a composite index on {other_field, other_id}
or an index on {other_field} and a composite index on {other_id, other_field}
or a composite index on {other_id, other_field} and a composite index on {other_field, other_id}.[1]
Depending on distribution of your data, you could also get away with {other_id} and {other_field}, but you should measure carefully before opting for that solution. Also, you may consider replacing * with a narrower set of fields and then covering them by indexes, but that's a whole other topic...
1 "Fatter" solution than the other two - consider only if you have specific covering needs.
On MS-SQL 2012, does it make sense to index a "Deleted" BIT field if one is always going to use it in queries (i.e. SELECT xx FROM oo WHERE Deleted = 0)?
Or does the fact that a field is BIT already come with some sort of automatic index for performance?
When you index a bit field, which holds 1, 0, or some similarly limited set of values, you are narrowing the search down to the rows matching that value. For a small number of records this may not matter much, but for a large amount of data it can give you a performance gain.
You can also include bit columns as part of a compound index.
An index on a bit field can be really helpful in scenarios where there is a large discrepancy between the number of 0's and 1's, and you are searching for the smaller of the two.
Indexing a bit field will be pretty useless under most conditions, because the selectivity is so low. An index scan on a large table is not going to be better than a table scan. If there are other conditions you can use to create filtered indexes, you could consider that.
If this field is changing the nature of the logic in such a way that you will always need to consider it in the predicate, you might consider splitting the data into other tables when reporting.
Whether to index a bit field depends on several factors which have been adequately explained in the answer to this question. Link to 231125
As others have mentioned, selectivity is the key. However, if you're always searching on one value or another and that value is highly selective, consider using a filtered index.
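A minimal sketch of that idea for the query in the question (the index name is made up; xx is the column from the question's SELECT):

CREATE NONCLUSTERED INDEX IX_oo_NotDeleted
ON oo (xx)
WHERE Deleted = 0;  -- only undeleted rows are kept in the index, and it covers SELECT xx ... WHERE Deleted = 0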
Why not put it at the front of your clustered index? If deletes are incremental, you'd have to turn your fill factor down, but they're probably daily, right? And you have way more deleted records than undeleted records? And, as you say, you only ever query undeleted records. So, yes. Don't just index that column. Cluster on it.
It can be useful as part of a composite index, when the bit column is in the first position in the index. But if you expect to use it only for selecting one value (select ... where deleted=1 and another_key=?; but never deleted=0) then create an index on another_key with a filter:
create index i_another on t(another_key) where deleted=1
If the bit column would be the last one in the composite index, then its occurrence as a key column is useless. However, you can include it for better performance:
create index i_another on t(another_key) include(deleted)
Then the DB engine gets the value while reading the index and doesn't need to pick it up from the base table page.
Suppose I have a database table with two fields, "foo" and "bar". Neither of them is unique, but each of them is indexed. However, rather than being indexed together, they each have a separate index.
Now suppose I perform a query such as SELECT * FROM sometable WHERE foo='hello' AND bar='world'; My table has a huge number of rows for which foo is 'hello' and a small number of rows for which bar is 'world'.
So the most efficient thing for the database server to do under the hood is use the bar index to find all rows where bar is 'world', then return only those rows for which foo is 'hello'. This is O(n) where n is the number of rows where bar is 'world'.
However, I imagine it's possible that the process would happen in reverse, where the foo index was used and the results searched. This would be O(m) where m is the number of rows where foo is 'hello'.
So is Oracle smart enough to search efficiently here? What about other databases? Or is there some way I can tell it in my query to search in the proper order? Perhaps by putting bar='world' first in the WHERE clause?
Oracle will almost certainly use the most selective index to drive the query, and you can check that with the explain plan.
Furthermore, Oracle can combine the use of both indexes in a couple of ways -- it can convert btree indexes to bitmaps and perform a bitmap AND operation on them, or it can perform a hash join on the rowids returned by the two indexes.
One important consideration here might be any correlation between the values being queried. If foo='hello' accounts for 80% of values in the table and bar='world' accounts for 10%, then Oracle is going to estimate that the query will return 0.8*0.1 = 8% of the table rows. However this may not be correct - the query may actually return 10% of the rows or even 0% of the rows, depending on how correlated the values are. Now, depending on the distribution of those rows throughout the table it may not be efficient to use an index to find them. You may still need to access (say) 70% of the table blocks to retrieve the required rows (google for "clustering factor"), in which case Oracle is going to perform a full table scan if it gets the estimation correct.
In 11g you can collect multicolumn statistics to help with this situation I believe. In 9i and 10g you can use dynamic sampling to get a very good estimation of the number of rows to be retrieved.
To get the execution plan do this:
explain plan for
SELECT *
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Contrast that with:
explain plan for
SELECT /*+ dynamic_sampling(4) */
*
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Eli,
In a comment you wrote:
Unfortunately, I have a table with lots of columns each with their own index. Users can query any combination of fields, so I can't efficiently create indexes on each field combination. But if I did only have two fields needing indexes, I'd completely agree with your suggestion to use two indexes. – Eli Courtwright (Sep 29 at 15:51)
This is actually rather crucial information. Sometimes programmers outsmart themselves when asking questions. They try to distill the question down to the essential points but quite often oversimplify and miss getting the best answer.
This scenario is precisely why bitmap indexes were invented -- to handle the times when unknown groups of columns would be used in a where clause.
Just in case someone says that BMIs are for low-cardinality columns only and may not apply to your case: low is probably not as low as you think. The only real issue is concurrency of DML against the table; it must be single-threaded or rare for this to work.
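A minimal sketch of what that looks like for the table in the question (the index names are made up):

CREATE BITMAP INDEX bix_sometable_foo ON sometable (foo);
CREATE BITMAP INDEX bix_sometable_bar ON sometable (bar);
-- Oracle can then AND the two bitmaps to resolve WHERE foo='hello' AND bar='world'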
Yes, you can give "hints" with the query to Oracle. These hints are disguised as comments ("/*+ HINT */") to the database and are mainly vendor-specific. So one hint for one database will not work on another database.
I would use index hints here, the first hint for the small table. See here.
On the other hand, if you often search over these two fields, why not create an index on these two? I do not have the right syntax, but it would be something like
CREATE INDEX IX_BAR_AND_FOO on sometable(bar,foo);
This way data retrieval should be pretty fast. And in case the combination is unique, then you simply create a unique index, which should be lightning fast.
First off, I'll assume that you are talking about nice, normal, standard b*-tree indexes. The answer for bitmap indexes is radically different. And there are lots of options for various types of indexes in Oracle that may or may not change the answer.
At a minimum, if the optimizer is able to determine the selectivity of a particular condition, it will use the more selective index (i.e. the index on bar). But if you have skewed data (there are N values in the column bar but the selectivity of any particular value is substantially more or less than 1/N of the data), you would need to have a histogram on the column in order to tell the optimizer which values are more or less likely. And if you are using bind variables (as all good OLTP developers should), depending on the Oracle version, you may have issues with bind variable peeking.
Potentially, Oracle could even do an on the fly conversion of the two b*-tree indexes to bitmaps and combine the bitmaps in order to use both indexes to find the rows it needs to retrieve. But this is a rather unusual query plan, particularly if there are only two columns where one column is highly selective.
So is Oracle smart enough to search efficiently here?
The simple answer is "probably". There are lots'o' very bright people at each of the database vendors working on optimizing the query optimizer, so it's probably doing things that you haven't even thought of. And if you update the statistics, it'll probably do even more.
I'm sure you can also have Oracle display a query plan so you can see exactly which index is used first.
The best approach would be to add foo to bar's index, or add bar to foo's index (or both). If foo's index also covers bar as an additional column, that additional indexing level will not affect the utility of the foo index in any current uses of that index, nor will it appreciably affect the performance of maintaining that index, but it will give the database additional information to work with in optimizing queries such as in the example.
It's better than that.
Index seeks are almost always quicker than full table scans. So behind the scenes Oracle (and SQL Server for that matter) will first locate the range of rows on both indexes. It will then look at which range is shorter (the combination effectively behaving like an inner join), and it will iterate the shorter range to find the matches with the larger of the two.
You can provide hints as to which index to use. I'm not familiar with Oracle, but in MySQL you can use USE INDEX, IGNORE INDEX, or FORCE INDEX (see here for more details). For best performance though you should use a combined index.
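A rough sketch of that MySQL hint syntax (the index name is hypothetical):

SELECT * FROM sometable FORCE INDEX (idx_bar)  -- idx_bar: made-up name for the index on bar
WHERE foo = 'hello' AND bar = 'world';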