SQL Server ignores index and performs table scan - sql-server

We have a little problem with one of our queries, which is executed inside a .NET (4.5) application via System.Data.SqlClient.SqlCommand.
The problem is that the query performs a Table Scan, which is very slow. The execution plan shows the Table Scan here
Screenshot:
The details:
So the text shows that the filters on Termine.Datum and Termine.EndDatum cause the Table Scan. But why is SQL Server ignoring the indexes? There are two indexes, one on Termine.Datum and one on Termine.EndDatum. We also tried adding a third one on Datum and EndDatum combined.
The indexes are all non-clustered indexes and both fields are DateTime.

It decides on a Table Scan based on the estimated number of rows, 124844, whereas the actual number of rows is only 831.
The optimizer thinks that for 124844 rows a scan of the table is cheaper than an Index Seek.
You also need to check which columns are selected apart from the indexed ones. If you select columns that are not in the index, the engine has to do a RID Lookup after the index seek, and the optimizer might prefer a Table Scan over all those RID lookups.
First fix: update the statistics so that the optimizer has enough information to choose a better plan.
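The trade-off described above can be sketched with a toy cost model. This is only an illustration: the page count, the index depth and the "one page read per lookup" rule are made-up assumptions, not SQL Server's real costing formulas, and the function names are hypothetical.

```python
# Toy cost model (NOT SQL Server's real formulas): compare a full table scan
# with an index seek that needs one RID lookup per matching row.

def scan_cost(table_pages: int) -> int:
    """A table scan reads every page once, however many rows match."""
    return table_pages

def seek_plus_lookup_cost(matching_rows: int, index_depth: int = 3) -> int:
    """Walk the index once, then pay roughly one page read per RID lookup."""
    return index_depth + matching_rows

def choose_plan(estimated_rows: int, table_pages: int) -> str:
    """Pick the cheaper plan for the *estimated* row count."""
    if seek_plus_lookup_cost(estimated_rows) < scan_cost(table_pages):
        return "index seek + RID lookup"
    return "table scan"

TABLE_PAGES = 20_000  # assumed table size in pages, for illustration only

print(choose_plan(124_844, TABLE_PAGES))  # row estimate quoted from the plan
print(choose_plan(831, TABLE_PAGES))      # actual row count quoted from the plan
```

With these assumed numbers the inflated estimate tips the choice to a scan, while the actual row count would favour the seek, which is why refreshing the statistics is the first thing to try.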

Can you provide the full query? I see that you are pulling data that spans a range of 3 months. If this range is a high percentage of the dataset, it might be scanning because you are attempting to return such a large percentage of the data. If the index is not selective enough it won't get picked up.
Also...
You have an OR clause in the filter. From looking at the predicate in the screenshot you provided it looks like you might be missing () around the two different filters. This might also lead to the scan.
One more thing...
OR clauses can sometimes lead to bad plans - an alternative is to split the query into two UNIONed queries, each containing one side of the OR. If you provide the query I should be able to give you a re-written version to show this.
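Since the original query was not posted, here is only a rough sketch of both points (missing parentheses and the UNION rewrite). It runs against an in-memory SQLite database from Python rather than SQL Server, and the Typ column, the sample rows and the date range are all assumptions made up for the demonstration.

```python
import sqlite3

# In-memory SQLite stand-in; the table layout and the Typ column are assumptions.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Termine (Id INTEGER PRIMARY KEY, Typ INTEGER, Datum TEXT, EndDatum TEXT)"
)
con.executemany(
    "INSERT INTO Termine VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2015-01-10", "2015-01-11"),  # Typ 1, starts and ends in range
        (2, 1, "2014-12-20", "2015-02-01"),  # Typ 1, only ends in range
        (3, 2, "2014-12-20", "2015-02-01"),  # Typ 2, only ends in range
        (4, 1, "2016-01-01", "2016-01-02"),  # Typ 1, out of range
    ],
)
lo, hi = "2015-01-01", "2015-03-31"

def ids(sql, params):
    """Run a query and return the matching Ids in sorted order."""
    return sorted(row[0] for row in con.execute(sql, params))

# AND binds tighter than OR, so without parentheses the Typ filter
# only applies to the first date condition.
no_parens = ids(
    "SELECT Id FROM Termine WHERE Typ = 1 AND Datum BETWEEN ? AND ? "
    "OR EndDatum BETWEEN ? AND ?",
    (lo, hi, lo, hi),
)
with_parens = ids(
    "SELECT Id FROM Termine WHERE Typ = 1 AND (Datum BETWEEN ? AND ? "
    "OR EndDatum BETWEEN ? AND ?)",
    (lo, hi, lo, hi),
)
# UNION rewrite: one branch per side of the OR; UNION removes duplicates.
union_rewrite = ids(
    "SELECT Id FROM Termine WHERE Typ = 1 AND Datum BETWEEN ? AND ? "
    "UNION "
    "SELECT Id FROM Termine WHERE Typ = 1 AND EndDatum BETWEEN ? AND ?",
    (lo, hi, lo, hi),
)
print(no_parens, with_parens, union_rewrite)
```

The unparenthesised version lets the Typ 2 row through, while the UNION form returns the same rows as the parenthesised one. Each UNION branch has a single range predicate, which gives the optimizer a chance to use one index per branch.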

Related

SQL Server - what kind of index should I create?

I need to make queries such as
SELECT
Url, COUNT(*) AS requests, AVG(TS) AS avg_timeSpent
FROM
myTable
WHERE
Url LIKE '%/myController/%'
GROUP BY
Url
run as fast as possible.
The columns selected and grouped are almost always the same, the only difference being an extra column (tenantId) in the SELECT and GROUP BY.
What kind of index should I create to help me run this scenario?
Edit 1:
If I change my base query to '/myController/%' (note there's no % at the beginning) would it be better?
This is a query that cannot be sped up with an index. The DBMS cannot know beforehand how many records will match the condition. It may be 100% or 0.001%. There is no clue for the DBMS to guess this. And access via an index only makes sense when a small percentage of rows gets selected.
Moreover, how can such an index be structured and useful? Think of a telephone book and you want to find all names that contain 'a' or 'rs' or 'ems' or whatever. How would you order the names in the book to find all these and all other thinkable letter combinations quickly? It simply cannot be done.
So the DBMS will read the whole table record for record, no matter whether you provide an index or not.
There may be one exception: With an index on URL and TS, you'd have both columns in the index. So the DBMS might decide to read the whole index rather than the whole table then. This may make sense for instance when the table has hundreds of columns or when the table is very fragmented or whatever. I don't know. A table is usually much easier to read sequentially than an index. You can still just try, of course. It doesn't really hurt to create an index. Either the DBMS uses it or not for a query.
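The telephone-book argument, and the difference the leading % makes, can be sketched with a sorted Python list standing in for an index on Url. The sample URLs and function names are made up; this shows the principle only, not how any DBMS actually stores its indexes.

```python
from bisect import bisect_left

# Sorted list standing in for an index on Url (sample URLs are made up).
urls = sorted([
    "/api/myController/list",
    "/home/index",
    "/myController/edit",
    "/myController/view",
    "/other/myController/x",
])

def prefix_search(sorted_keys, prefix):
    """Like LIKE 'prefix%': binary-search to the first candidate,
    then read forward until the prefix stops matching."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

def substring_search(keys, fragment):
    """Like LIKE '%fragment%': the sort order gives no starting point,
    so every key has to be examined."""
    return [key for key in keys if fragment in key]

print(prefix_search(urls, "/myController/"))
print(substring_search(urls, "/myController/"))
```

The prefix search touches only the neighbouring entries it needs, whereas the substring search has to look at all of them. That is why dropping the leading % (as asked in Edit 1) lets an index help.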
Columnstore indexes can be quite fast at such tasks (aggregates on global scans). But even they will have trouble handling a LIKE '%/myController/%' predicate. I recommend you parse the URL once into an additional computed field that projects the extracted controller of your URL. But the truth is that looking at global time spent on a response URL reveals very little information. It will contain data since the beginning of time, long since obsolete by newer deployments, and not be able to capture recent trends. A filter based on time, say per hour or per day, now that is a very useful analysis. And such a filter can be excellently served by a columnstore, because of natural time order and segment elimination.
Based on your posted query you should have an index on the Url column. In general, columns which are involved in WHERE, HAVING, ORDER BY and JOIN ON conditions should be indexed.
You should get the generated query plan for the said query and see where it's taking more time. Again, based on the datatype of the Url column, you may consider having a FULLTEXT index on that column.

SQL Server detecting slow vs fast columns

I have an ASP.Net MVC application & I use PetaPoco and SQL Server.
My use case is that I want to allow a search on a table with many fields, but hide fields that are "slow" (i.e. unindexed). I'm going to modify the PetaPoco T4 template to decorate the columns with this information.
I found this answer that gives you a list of tables vs indexes. My concern is that it shows a lot of columns for a particular table. Is the query given in the answer reliable for my use case? That is, can the columns shown be included in the WHERE clause without it being slow? I have some tables that have 40M rows. I don't want to include slow columns in the WHERE condition.
Or is there a better way to solve this problem ?
There are no slow columns in the sense of your question. You have to distinguish between two uses of a column.
Searching. When the column appears in the WHERE, or JOIN clause, it slows down your query, if there is no index for it.
Returning in recordset. If the column appears in the SELECT clause, its content must be returned with each row, whether you need it, or not. So for queries returning many rows, each additional column to be returned means a performance penalty.
Conclusion: As you can see, the performance impact of SELECTED columns does NOT DEPEND on index, but on the number of the returned rows.
Advice: Create indexes for columns used to search and do not return unnecessary columns. Let your queries be as specific as possible in terms of both, selected columns and returned rows.
I think it will not be that simple. You can check indexed columns using the suggested approach (or similar), but the fact that a column is present in an index does not mean your query will necessarily utilize it efficiently. For example if an index is created on columns A, B and C (in that order) and you only have a 'WHERE' clause on B or C (but not on A) you will probably end up with index scan rather than index seek and your query is likely to be slower than expected.
So your check should take into account the sequence of the columns in the indices - instantly fast columns (in your situation) might probably be considered the first columns of the indices (where ic.index_column_id = 1 in the post you mentioned). Columns that are not first in the indices (i.e. ic.index_column_id > 1) will be fast as long as the first columns are also included in the filter. There are other things you might also need to take into account (e.g. cardinality), but this is important to make sure you drive index seeks rather than scans.
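To illustrate why the position of a column in the index matters, here is a minimal sketch that uses a sorted list of (A, B) tuples as a stand-in for a composite index on (A, B). The data and function names are hypothetical.

```python
from bisect import bisect_left

# Sorted (A, B) tuples standing in for a composite index on (A, B); data is made up.
index_ab = sorted([(1, "x"), (1, "y"), (2, "x"), (2, "z"), (3, "y")])

def seek_on_a(index, a):
    """Filter on the leading column: all matches form one contiguous block
    that binary search can locate directly (an index seek)."""
    lo = bisect_left(index, (a,))
    hi = bisect_left(index, (a + 1,))
    return index[lo:hi]

def scan_on_b(index, b):
    """Filter only on the second column: matches are scattered through the
    index, so every entry has to be checked (an index scan)."""
    return [entry for entry in index if entry[1] == b]

print(seek_on_a(index_ab, 2))
print(scan_on_b(index_ab, "y"))
```

Both calls return the right rows, but only the first can jump straight to its matches. That is the seek versus scan distinction made above for ic.index_column_id = 1 versus higher positions.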

SQL Server not using proper index for query

I have a table on SQL Server with about 10 million rows. It has a nonclustered index ClearingInfo_idx which looks like:
I am running query which isn't using ClearingInfo_idx index and execution plan looks like this:
Can anyone explain why query optimizer chooses to scan clustered index ?
I think it suggests this index because you use an exact-match search on the two columns immediate and clearingOrder_clearingOrderId. Those values are numbers, which are good to search on. The column status is nvarchar, which isn't the best for a search, and because you search it with IN, SQL Server needs to look for two of those values.
SQL Server would use the two number columns to get a faster result and filter on status in a second step, after the number of possible results has been reduced by the exact search on the two number columns.
Hopefully that makes my point clear. :-) Otherwise, just ask again. :-)
As Luaan already pointed out, the likely reason the system prefers to scan the clustered index is because
you're asking for all fields to be returned (SELECT *). Change this to fields that are present in the index (= index fields + clustered index fields) and you'll probably see it using just the index. If you need a couple of extra fields you can consider INCLUDEing those in the index.
the order of the index fields isn't optimal. Additionally it might well be that the 'content' of the fields isn't very helpful either. How many distinct values are present in the index columns and how are they distributed? If your WHERE covers 90% of the records there is very little reason to first create a (huge) list of keys and then go fetch those from the clustered index later on. Scanning the latter directly then makes much more sense.
Did you try the suggested index? Not sure what other queries run on the table, but for this particular query it seems like a valid replacement to me. Whether the replacement will satisfy the other queries is another question, of course. Adding extra indexes might negatively impact your IUD (insert/update/delete) operations and will require more disk space; there is no such thing as a free lunch =)
That said, if performance is an issue, have you considered a filtered index? (again, no such thing as a free lunch; it's all about priorities)

How to deal with billions of records in an sql server?

I have a SQL Server 2008 database with 30000000000 records in one of its major tables. Now we are looking to improve the performance of our queries. We have already created all the indexes. I found that we can split our database tables into multiple partitions, so that the data is spread over multiple files, which should increase the performance of the queries.
But unfortunately this functionality is only available in the SQL Server Enterprise edition, which is unaffordable for us.
Is there any way to optimize the query performance? For example, the query
select * from mymajortable where date between '2000/10/10' and '2010/10/10'
takes around 15 min to retrieve around 10000 records.
A SELECT * will obviously be less efficiently served than a query that uses a covering index.
First step: examine the query plan and look for any table scans and the steps taking the most effort (%).
If you don’t already have an index on your ‘date’ column, you certainly need one (assuming sufficient selectivity). Try to reduce the columns in the select list, and if ‘sufficiently’ few, add these to the index as included columns (this can eliminate bookmark lookups into the clustered index and boost performance).
You could break your data up into separate tables (say by a date range) and combine via a view.
It is also very dependent on your hardware (# cores, RAM, I/O subsystem speed, network bandwidth)
Suggest you post your table and index definitions.
First, always avoid SELECT *, as that forces the query to fetch all columns; if there is an index with just the columns you need, you are fetching a lot of unnecessary data. Selecting only the exact columns you need lets the server make better use of indexes.
Secondly, have a look at included columns for your indexes; that way, often-requested data can be included in the index to avoid having to fetch rows.
Third, you might try to use an int column for the date and convert the date into an int. Ints are usually more effective in range searches than dates, especially if you have time information too, and if you can skip the time information the index will be smaller.
One more thing to check for is the Execution plan the server uses, you can see this in management studio if you enable show execution plan in the menu. It can indicate where the problem lies, you can see which indexes it tries to use and sometimes it will suggest new indexes to add.
It can also indicate other problems, Table Scan or Index Scan is bad as it indicates that it has to scan through the whole table or index while index seek is good.
It is a good source to understand how the server works.
If you add an index on date, you will probably speed up your query due to an index seek + key lookup instead of a clustered index scan, but if your filter on date will return too many records the index will not help you at all because the key lookup is executed for each result of the index seek. SQL server will then switch to a clustered index scan.
To get the best performance you need to create a covering index, that is, include all the columns you need in the "included columns" part of your index, but that will not help you if you use SELECT *.
Another issue with the SELECT * approach is that you can't use the cache or the execution plans in an efficient way. If you really need all columns, make sure you specify all the columns instead of the *.
You should also fully qualify the object name to make sure your plan is reusable.
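The covering-index point can be seen in a small experiment. The sketch below uses SQLite from Python as a stand-in, not SQL Server. SQLite has no INCLUDE clause, so the extra column is simply added as a second key column, and the table and column names (events, event_date, amount, note) are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table; only event_date and amount are part of the index.
con.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT, amount REAL, note TEXT)"
)
con.execute("CREATE INDEX ix_date_amount ON events (event_date, amount)")

def plan(sql):
    """Return SQLite's query plan text (the last column of EXPLAIN QUERY PLAN)."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

where = "WHERE event_date BETWEEN '2000-10-10' AND '2010-10-10'"
star_plan = plan("SELECT * FROM events " + where)
narrow_plan = plan("SELECT event_date, amount FROM events " + where)

print(star_plan)    # has to visit the table for the 'note' column
print(narrow_plan)  # can be answered from the index alone
```

When only indexed columns are selected, SQLite reports a covering index in the plan. With SELECT * it cannot, because the note column lives only in the table. The same reasoning applies to key lookups in SQL Server, although the plan output there looks different.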
You might consider creating an archive database and moving anything older than, say, 10-20 years into it. This should drastically speed up your primary production database while retaining all of your historical data for reporting needs.
What type of queries are we talking about?
Is this a production table? If yes, see whether you can normalize the DB a bit further.
If this is for reports, including a lot of Ad Hoc report queries, this screams data warehouse.
I would create a DW with separate pre-processed reports which include all the calculations and aggregations you could expect.
I am a bit worried about a business model which involves dealing with BIG data but does not generate enough revenue or even attract enough venture investment to upgrade to enterprise.

Can Multiple Indexes Work Together?

Suppose I have a database table with two fields, "foo" and "bar". Neither of them are unique, but each of them are indexed. However, rather than being indexed together, they each have a separate index.
Now suppose I perform a query such as SELECT * FROM sometable WHERE foo='hello' AND bar='world'; My table has a huge number of rows for which foo is 'hello' and a small number of rows for which bar is 'world'.
So the most efficient thing for the database server to do under the hood is use the bar index to find all rows where bar is 'world', then return only those rows for which foo is 'hello'. This is O(n) where n is the number of rows where bar is 'world'.
However, I imagine it's possible that the process would happen in reverse, where the foo index was used and the results searched. This would be O(m) where m is the number of rows where foo is 'hello'.
So is Oracle smart enough to search efficiently here? What about other databases? Or is there some way I can tell it in my query to search in the proper order? Perhaps by putting bar='world' first in the WHERE clause?
Oracle will almost certainly use the most selective index to drive the query, and you can check that with the explain plan.
Furthermore, Oracle can combine the use of both indexes in a couple of ways -- it can convert btree indexes to bitmaps and perform a bitmap AND operation on them, or it can perform a hash join on the rowids returned by the two indexes.
One important consideration here might be any correlation between the values being queried. If foo='hello' accounts for 80% of values in the table and bar='world' accounts for 10%, then Oracle is going to estimate that the query will return 0.8*0.1 = 8% of the table rows. However this may not be correct - the query may actually return 10% of the rows or even 0% of the rows depending on how correlated the values are. Now, depending on the distribution of those rows throughout the table, it may not be efficient to use an index to find them. You may still need to access (say) 70% of the table blocks to retrieve the required rows (google for "clustering factor"), in which case Oracle is going to perform a full table scan if it gets the estimation correct.
In 11g you can collect multicolumn statistics to help with this situation I believe. In 9i and 10g you can use dynamic sampling to get a very good estimation of the number of rows to be retrieved.
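The 8% versus 10% versus 0% point is easy to check by hand. The sketch below builds two synthetic 100-row tables (made-up data, chosen to match the 80% and 10% figures above) and compares the independence-based estimate with the actual number of matching rows.

```python
def estimate_and_actual(rows):
    """Return (rows estimated under the independence assumption, actual matching rows)."""
    total = len(rows)
    n_foo = sum(1 for foo, bar in rows if foo == "hello")
    n_bar = sum(1 for foo, bar in rows if bar == "world")
    n_both = sum(1 for foo, bar in rows if foo == "hello" and bar == "world")
    # Independence assumption: P(foo AND bar) = P(foo) * P(bar),
    # so estimated rows = total * (n_foo / total) * (n_bar / total).
    estimated = n_foo * n_bar // total
    return estimated, n_both

# Perfectly correlated: every 'world' row is also a 'hello' row.
correlated = [("hello", "world")] * 10 + [("hello", "x")] * 70 + [("bye", "x")] * 20
# Anti-correlated: no 'world' row is a 'hello' row.
anti_correlated = [("bye", "world")] * 10 + [("hello", "x")] * 80 + [("bye", "x")] * 10

print(estimate_and_actual(correlated))
print(estimate_and_actual(anti_correlated))
```

Both tables have the same single-column statistics (80 'hello' rows, 10 'world' rows), so the single-column estimate is identical, yet the real result sizes differ. This is the gap that multicolumn statistics or dynamic sampling are meant to close.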
To get the execution plan do this:
explain plan for
SELECT *
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Contrast that with:
explain plan for
SELECT /*+ dynamic_sampling(4) */
*
FROM sometable
WHERE foo='hello' AND bar='world'
/
select * from table(dbms_xplan.display)
/
Eli,
In a comment you wrote:
Unfortunately, I have a table with lots of columns each with their own index. Users can query any combination of fields, so I can't efficiently create indexes on each field combination. But if I did only have two fields needing indexes, I'd completely agree with your suggestion to use two indexes. – Eli Courtwright (Sep 29 at 15:51)
This is actually rather crucial information. Sometimes programmers outsmart themselves when asking questions. They try to distill the question down to the seminal points but quite often over simplify and miss getting the best answer.
This scenario is precisely why bitmap indexes were invented -- to handle the times when unknown groups of columns would be used in a where clause.
Just in case someone says that BMIs are for low cardinality columns only and may not apply to your case. Low is probably not as small as you think. The only real issue is concurrency of DML to the table. Must be single threaded or rare for this to work.
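As a rough picture of how bitmaps let arbitrary column combinations be ANDed together, here is a small sketch using Python integers as bitmaps. The rows and helper names are made up, and Oracle's real bitmap indexes are compressed and stored quite differently.

```python
def build_bitmap(rows, column, value):
    """One bit per row: bit i is set when rows[i][column] == value."""
    bitmap = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << i
    return bitmap

def matching_row_ids(bitmap):
    """Turn the set bits back into row positions."""
    ids = []
    i = 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids

rows = [
    {"foo": "hello", "bar": "x"},
    {"foo": "hello", "bar": "world"},
    {"foo": "bye", "bar": "world"},
    {"foo": "hello", "bar": "world"},
]

# Any combination of per-column bitmaps can be combined with a cheap bitwise AND.
combined = build_bitmap(rows, "foo", "hello") & build_bitmap(rows, "bar", "world")
print(matching_row_ids(combined))
```

Because each column gets its own bitmap and the AND is cheap, no composite index per column combination is needed, which is the situation described in the quoted comment.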
Yes, you can give "hints" with the query to Oracle. These hints are disguised as comments ("/*+ HINT */") to the database and are mainly vendor specific. So a hint for one database will not work on another database.
I would use index hints here, the first hint for the small table. See here.
On the other hand, if you often search over these two fields, why not create an index on these two? I do not have the right syntax, but it would be something like
CREATE INDEX IX_BAR_AND_FOO on sometable(bar,foo);
This way data retrieval should be pretty fast. And in case the concatenation is unique, then you simply create a unique index, which should be lightning fast.
First off, I'll assume that you are talking about nice, normal, standard b*-tree indexes. The answer for bitmap indexes is radically different. And there are lots of options for various types of indexes in Oracle that may or may not change the answer.
At a minimum, if the optimizer is able to determine the selectivity of a particular condition, it will use the more selective index (i.e. the index on bar). But if you have skewed data (there are N values in the column bar but the selectivity of any particular value is substantially more or less than 1/N of the data), you would need to have a histogram on the column in order to tell the optimizer which values are more or less likely. And if you are using bind variables (as all good OLTP developers should), depending on the Oracle version, you may have issues with bind variable peeking.
Potentially, Oracle could even do an on the fly conversion of the two b*-tree indexes to bitmaps and combine the bitmaps in order to use both indexes to find the rows it needs to retrieve. But this is a rather unusual query plan, particularly if there are only two columns where one column is highly selective.
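The "use the more selective index, then filter" strategy can be sketched as follows. This is a simplified illustration with dictionaries standing in for the two single-column indexes. The data and helper names are hypothetical, and a real optimizer works from statistics rather than counting index entries at query time.

```python
from collections import defaultdict

def build_index(rows, column):
    """Map each value of a column to the list of row ids that contain it."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return index

def query(rows, indexes, conditions):
    """Drive from the condition with the fewest index entries, then filter the rest."""
    driver = min(
        conditions,
        key=lambda col: len(indexes[col].get(conditions[col], [])),
    )
    candidates = indexes[driver].get(conditions[driver], [])
    result = [
        rid for rid in candidates
        if all(rows[rid][col] == val for col, val in conditions.items())
    ]
    return driver, result

# Made-up data: many 'hello' rows, few 'world' rows.
rows = (
    [{"foo": "hello", "bar": "x"}] * 5
    + [{"foo": "hello", "bar": "world"}]
    + [{"foo": "bye", "bar": "world"}]
)
indexes = {"foo": build_index(rows, "foo"), "bar": build_index(rows, "bar")}

print(query(rows, indexes, {"foo": "hello", "bar": "world"}))
```

Here the bar index has only two entries for 'world' against six for 'hello', so it is chosen as the driver and only two rows are checked against the foo condition.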
"So is Oracle smart enough to search efficiently here?"
The simple answer is "probably". There are lots of very bright people at each of the database vendors working on optimizing the query optimizer, so it's probably doing things that you haven't even thought of. And if you update the statistics, it'll probably do even more.
I'm sure you can also have Oracle display a query plan so you can see exactly which index is used first.
The best approach would be to add foo to bar's index, or add bar to foo's index (or both). If foo's index also contains an index on bar, that additional indexing level will not affect the utility of the foo index in any current uses of that index, nor will it appreciably affect the performance of maintaining that index, but it will give the database additional information to work with in optimizing queries such as in the example.
It's better than that.
Index seeks are usually quicker than full table scans when only a small share of the rows match. So behind the scenes Oracle (and SQL Server for that matter) can first locate the range of rows in both indexes. It will then look at which range is shorter (seeing that it behaves like an inner join), and it will iterate over the shorter range to find the matches in the larger of the two.
You can provide hints as to which index to use. I'm not familiar with Oracle, but in MySQL you can use USE INDEX, IGNORE INDEX or FORCE INDEX (see here for more details). For best performance though you should use a combined index.
